Recently I ran into a weird situation when trying to log in to the VMI (Lithuanian tax inspection) website via the SEB bank system (banking systems are commonly used for authenticating users in Lithuania; they act as single sign-on solutions where user identification is required). After authentication and redirection to the VMI website I received a failed-login error message. I was using the Chrome web browser.
Although I contacted them, all I received was a polite “go fix it yourself” response. So I decided to investigate the workflow myself. What motivated me even more was that I could log in using other browsers (Chrome on Android, Firefox).
tl;dr: the issue was caused by the lack of an HTTP response header specifying the character set. Always make sure your backend application returns a Content-Type header with the appropriate charset.
I used Burp to track the HTTP workflow and later to intercept the requests to validate my assumptions.
You can see the workflow from the requests and responses below.
As the response does not declare a character encoding (neither in the HTTP response headers nor in an HTML meta tag), Chrome interprets the response using its default character set, which in this case is UTF-8.
Since the surname characters š and č in the form field (PERSON_NAME) are not valid UTF-8 byte sequences, they are replaced by the Unicode replacement character (U+FFFD). When that character is encoded back to UTF-8, its byte representation is 0xEF 0xBF 0xBD.
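This mangling is easy to reproduce in a few lines of Python (a sketch; the sample string is hypothetical, but any windows-1257 text containing š or č behaves the same way):

```python
# Bytes as the server sends them: Lithuanian characters in windows-1257.
original = "šč"
raw = original.encode("windows-1257")   # two single-byte characters

# The browser wrongly decodes them as UTF-8; invalid sequences become U+FFFD.
misread = raw.decode("utf-8", errors="replace")
print(misread)                          # replacement characters only

# Submitting the form re-encodes U+FFFD to UTF-8: the 0xEF 0xBF 0xBD bytes.
resubmitted = misread.encode("utf-8")
print(resubmitted.hex())                # contains "efbfbd"
```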
Request to https://www.vmi.lt/sso/internetauth
You can see that 0xEF 0xBF 0xBD (URL-encoded: %EF%BF%BD) is posted as the HTML form value. Since it differs from the original value, the signature no longer matches and the rest of the workflow is rejected.
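For reference, the URL-encoded form of the replacement character can be verified with Python's standard library:

```python
from urllib.parse import quote, unquote

# U+FFFD is encoded to UTF-8 and percent-escaped when the form is submitted.
encoded = quote("\ufffd")
print(encoded)  # %EF%BF%BD

# Decoding the posted value yields the replacement character back.
assert unquote("%EF%BF%BD") == "\ufffd"
```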
How to make it work
The browser must interpret the character set of the response from https://deklaravimas.vmi.lt/InternetAuth.aspx correctly in order to make a proper subsequent request to https://www.vmi.lt/sso/internetauth.
There are two ways to make it work: either configure the user agent (browser) or provide the charset information in the HTTP response headers.
Setting the browser's default charset to windows-1257 solves the issue on the client side:
However, this only fixes the consequence, NOT the cause, so it is a hack rather than a solution.
Again, I used Burp to intercept and modify the traffic to validate my assumptions.
Adding the Content-Type: text/html; charset=windows-1257 response header makes the browser interpret the data correctly and solves the issue:
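The server-side fix amounts to a single header in the response pipeline. As a sketch only (the real endpoint is an ASP.NET page; the handler and form markup below are made up for illustration), a minimal Python server declaring the charset might look like this:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical page: a form value containing Lithuanian characters.
PAGE = '<html><form><input name="PERSON_NAME" value="Žemaitė šč"></form></html>'

class AuthPageHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = PAGE.encode("windows-1257")
        self.send_response(200)
        # The fix: declare the charset the body is actually encoded in.
        self.send_header("Content-Type", "text/html; charset=windows-1257")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To run: HTTPServer(("127.0.0.1", 8080), AuthPageHandler).serve_forever()
```

With the charset declared in the header, the browser decodes the form value correctly and posts the original bytes instead of replacement characters.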
Since this solution is client-agnostic and can be controlled by the asset (system) owner, this is the proper solution.
Since the fix also works in the other browsers (Firefox, Chrome on Android), it confirms that relying on user agent (browser) defaults is a bad practice.
Well, not much to summarize here - ALWAYS ensure that your web application includes the charset in the HTTP response headers, especially if you don’t use Unicode.
If possible, always use UTF-8, the de facto standard for data exchange on the web. Otherwise you will sooner or later run into problems: storing data in the DB, converting data between charsets, validating and sanitizing data, etc.
Neither the SEB bank website nor the VMI website, both of which handle PII (personally identifiable information), ensures strict transport security (via HSTS, CSP and HPKP). Therefore it might be possible to spoof certificates and intercept network traffic to these systems, capturing sensitive information.