Spring form and UTF-8 bad encoding - spring

In our web app we've run into a bad-encoding problem. To reproduce it, the user selects a non-Unicode encoding in the browser (for example, in Chrome: More tools -> Encoding -> KOI8) and then enters Cyrillic text.
The characters are already corrupted when they reach the controller (checked in the debugger), so they are stored incorrectly and rendered incorrectly.
We've followed all the recommendations in http://balusc.blogspot.com/2009/05/unicode-how-to-get-characters-right.html, and it seems the problem is with submitting application/x-www-form-urlencoded content, because it's impossible to set the charset when such forms are submitted.
For example, if we submit the same data as JSON and set the necessary content type, everything is stored correctly.
We've also tried the example from this article:
http://www.codejava.net/frameworks/spring/spring-mvc-form-handling-tutorial-and-example and additionally added a UTF-8 filter with the following method:
public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException,
ServletException {
request.setCharacterEncoding("UTF-8");
response.setCharacterEncoding("UTF-8");
chain.doFilter(request, response);
}
But the same problem was still reproducible.
Could somebody suggest how to resolve this problem?
Is it possible to handle this use case correctly in Spring MVC? We tried it on a simple example and it doesn't seem to work. Is changing the browser encoding a valid use case at all?

Try : In web.xml
<filter>
<filter-name>encoding-filter</filter-name>
<filter-class>
org.springframework.web.filter.CharacterEncodingFilter
</filter-class>
<init-param>
<param-name>encoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
<init-param>
<param-name>forceEncoding</param-name>
<param-value>true</param-value>
</init-param>
</filter>
<filter-mapping>
<filter-name>encoding-filter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
See: http://wiki.apache.org/tomcat/FAQ/CharacterEncoding
Also, if you use JSTL in the view, try setting the default encoding there as well.

This behavior can be achieved by adding the accept-charset="UTF-8" attribute to the form.
It can be added to the Spring form tag. Note that there's a bug in older versions of Struts (1.1 is affected),
https://issues.apache.org/jira/browse/STR-1636
that makes it impossible to add this attribute directly to the form. As a workaround, jQuery can be used:
jQuery( document ).ready(function() {
jQuery("#formSelector").attr("accept-charset", "UTF-8");
});
In a nutshell, this attribute forces the browser to send the data from this form using the specified encoding. If the user enters control characters into the input, they will be sent to the backend as well, so validation is required to prevent such cases. Cases where the browser encoding and the keyboard language do not work well together (for example, KOI8-U with a Chinese keyboard layout) are handled as well.
accept-charset official documentation

Somewhere in your request pipeline you're overriding the encoding (i.e. String.getBytes() or new String(bytes) is being called without the right encoding).
There are so many places where this can happen, and it's one of the reasons why Spring Boot and various other frameworks force UTF-8 for both input and output, particularly since UTF-8 is the recommended encoding.
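As a minimal sketch of the kind of mismatch described above (the class name EncodingRoundTrip and the sample text are illustrative assumptions, not code from the question), the following shows how decoding UTF-8 bytes with the wrong charset corrupts Cyrillic text, while naming the same charset on both sides keeps it intact:

```java
import java.nio.charset.StandardCharsets;

class EncodingRoundTrip {
    // The Russian word for "hello", written with Unicode escapes so the
    // encoding of this source file does not matter.
    static final String TEXT = "\u041F\u0440\u0438\u0432\u0435\u0442";

    // Encodes as UTF-8 but decodes as ISO-8859-1: every byte becomes its own char.
    static String decodeWithWrongCharset(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Encodes and decodes with the same explicit charset: lossless.
    static String decodeWithRightCharset(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String broken = decodeWithWrongCharset(TEXT);
        System.out.println(broken.equals(TEXT));                        // false
        System.out.println(broken.length());                            // 12 (6 letters x 2 bytes)
        System.out.println(decodeWithRightCharset(TEXT).equals(TEXT));  // true
    }
}
```

The no-argument forms String.getBytes() and new String(bytes) use the platform default charset instead, which is why they are a common source of this kind of corruption.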
Your users should not be changing the encoding. In fact, when the page loads, both the servlet response and the HTML itself should specify UTF-8, and this is for good reason: the server is saying "I speak UTF-8". If you want a different encoding, you will have to specify it in the HTML (i.e. the JSP) and in the servlet requests/responses so that the browser auto-selects that encoding. Even then your mileage will vary for application/x-www-form-urlencoded, particularly if you use JavaScript (probably because the spec on the encoding of other characters is somewhat ambiguous). To give one more example of why the HTML has to use exactly the same encoding as what you're sending: otherwise the names and the values of the name-value pairs will have different encodings. That is, your form has UTF-8 request parameter names (because that's what's in the HTML), but when you override the encoding you're supplying a different encoding for the parameter values (i.e. ?UTF-8Name=KoiValue&UTF-8Name=KoiValue). Hopefully you can see why that is bad, and I'm not sure Chrome is smart enough (or whether it even should be) to change the request parameter names back to KOI8.
Thus, if you absolutely must support other character encodings, you should probably use multipart/form-data (you specify this in the enctype attribute on the form element) AND NOT USE the encoding filters that set UTF-8, as those will probably cause corruption.

Related

MS Edge sending one field in ajax request with bad characters ‎. How to omit?

I'm using a time and date controller plugin for jQuery: http://jonthornton.github.com/jquery-timepicker/
In a form on a web page, the default values from a date and a time picker are combined to make a date time string and sent with an Ajax request to a servlet sitting on a Tomcat 8 server running on Red Hat Enterprise Linux.
In Chrome, this works as expected - all JSON text is received as intended.
With MS Edge (IE does not have this problem), the value comes back with the garbage sequence ‎ preceding every character. According to an online reference, this appears to be the upside-down quote character.
Example:
"timeReported":"‎6‎/‎22‎/‎2017 ‎10‎:‎29‎:‎09‎ ‎AM","description":"whatthewhat"
All other fields in the JSON submitted are just fine, the description above being an example.
I populate the fields on the form with this bit of Javascript:
$('#TimeReported .time').timepicker({
'showDuration': true,
'timeFormat': 'g:ia',
'step': 5
});
$('#TimeReported .date').datepicker({
'format': 'mm/dd/yyyy',
'autoclose': true
});
And read the values from the inputs like this:
// joined to avoid any problems with unary + which may take some values as numbers
var timeelements= [
String($('#TimeReported .date').val()),
String($('#TimeReported .time').val())
];
var issueTimeReported=timeelements.join(' ');
The value that is derived and placed into the json record debugs from IE as:
"‎6‎/‎22‎/‎2017 ‎1‎:‎00‎:‎51‎ ‎PM"
The data is submitted as JSON via jQuery with the following options:
type : "POST",
url : "submitForm.page",
async: true,
dataType: 'json',
contentType: 'application/json;charset=Windows-1252',
processData: false,
data : JSON.stringify(rdata),
Reading a bit, I find that this is commonly a misalignment of encodings, where utf-8 or ISO-whatever conflicts with Windows-1252.
I also noticed the debug console in Edge reporting the HTTP header and the page were sending conflicting encodings, so I removed all page specific encodings and applied a filter on the web.xml on tomcat to force everything to Windows-1252.
<!-- A filter that sets character encoding that is used to decode -->
<!-- parameters in a POST request -->
<filter>
<filter-name>setCharacterEncodingFilter</filter-name>
<filter-class>org.apache.catalina.filters.SetCharacterEncodingFilter</filter-class>
<init-param>
<param-name>encoding</param-name>
<param-value>Windows-1252</param-value>
</init-param>
<async-supported>true</async-supported>
</filter>
<!-- The mapping for the Set Character Encoding Filter -->
<filter-mapping>
<filter-name>setCharacterEncodingFilter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
This seems to have resolved the conflicting encoding warning, but the results received at the server are still the same. The effective doctype is X-UA-Compatible (via meta tag).
<meta http-equiv="X-UA-Compatible" content="IE=edge">
I would like to prevent these characters from being sent from the browser, but if necessary, I could explicitly filter them in the servlet. It seems that the problem is between IE's backend, submitting the Ajax request, and Tomcat, and I don't think it's on Tomcat's end.
I've worked around this problem by catching the date parsing exception my code throws when trying to ingest this data and providing a suitable substitute (it only does it with the default values from the controls, which is "now", so we substitute "now" server side when this blows.)
But that's not the answer. The fact that selecting values from the controls bypasses the issue, e.g. not default values, suggests that there may be a problem with the jQuery time and date picker plugin I'm using.
We've submitted an issue on the plugin. At this time all testing points to the way the controls are initialized. https://github.com/jonthornton/jquery-timepicker/issues/624
This problem typically arises in web development due to a mismatch between character sets on the ajax request and receiver system. Everything tried in the question would normally resolve this issue, namely ensuring whichever character set is set as expected in the SetCharacterEncodingFilter matches that provided in the incoming web request contentType.
In this specific instance, the issue lies within a third party plugin, so there is no specific resolution to this particular question, though the root problem is addressable and correctable.
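If filtering on the server is acceptable as a stopgap, one possible sketch follows. It assumes the stray sequence is the left-to-right mark U+200E (whose UTF-8 bytes E2 80 8E read as ‎ when decoded as Windows-1252), which is consistent with the output shown in the question but is not confirmed there. The class name TimestampCleaner and the date pattern are likewise illustrative assumptions.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class TimestampCleaner {
    // How the UTF-8 bytes of U+200E (E2 80 8E) appear after being decoded as Windows-1252.
    private static final String LRM_MOJIBAKE = "\u00E2\u20AC\u017D";

    // Assumed pattern, modelled on the sample value "6/22/2017 10:29:09 AM".
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("M/d/yyyy h:mm:ss a", Locale.US);

    // Removes the mis-decoded marks and any remaining invisible format characters.
    static String clean(String raw) {
        return raw.replace(LRM_MOJIBAKE, "")
                  .replaceAll("\\p{Cf}", "")
                  .trim();
    }

    static LocalDateTime parse(String raw) {
        return LocalDateTime.parse(clean(raw), FORMAT);
    }

    public static void main(String[] args) {
        // Same shape as the value in the question, with a real U+200E before each token.
        String fromBrowser =
                "\u200E6\u200E/\u200E22\u200E/\u200E2017 \u200E10\u200E:\u200E29\u200E:\u200E09\u200E \u200EAM";
        System.out.println(parse(fromBrowser)); // 2017-06-22T10:29:09
    }
}
```

This only hides the symptom on the server; the marks are still produced in the browser, so fixing how the default value is generated remains the better long-term option.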

sendRedirect in JSF 2.2

I am upgrading JSF from 1.2 to 2.2 version.
I have a simple response.sendRedirect() in my backing bean method. With JSF 2.2, it started throwing a "java.lang.IllegalStateException: Cannot change buffer size after data has been written at org.apache.catalina.connecto" exception.
After adding "FacesContext.getCurrentInstance().responseComplete();", it worked!
Can anyone help me understand how the implementation changed in JSF 2.2 such that the redirect does not work without explicitly saying that the response is complete?
Thanks!
You're supposed to use ExternalContext#redirect() for the job.
public void submit() throws IOException {
// ...
ExternalContext ec = FacesContext.getCurrentInstance().getExternalContext();
ec.redirect(ec.getRequestContextPath() + "/otherpage.jsf");
}
This has always been the case since the beginning, also in JSF 1.x. It will under the covers automatically call FacesContext#responseComplete() after performing the HttpServletResponse#sendRedirect(). The responseComplete() will basically instruct JSF that the response is already manually completed, and that JSF basically doesn't need to proceed to render response phase then (i.e. writing the navigation outcome into the response).
Moreover, any attempt to grab and downcast the raw javax.servlet.* API from under JSF's covers should be taken as a hint to think twice if there isn't already a JSF-ish way to achieve the same. In JSF 2.x there's an additional new way to perform a redirect: append faces-redirect=true query parameter to the (implicit) navigation outcome:
public String submit() {
// ...
return "otherpage?faces-redirect=true";
}
As to the illegal state exception you faced, JSF 2.2 just postpones setting the response headers to the point when it actually needs to render response. It will be too late if the response is already committed.
"java.lang.IllegalStateException: Cannot change buffer size after data has been written at org.apache.catalina.connecto" exception.
This may occur because you set the response buffer size manually (to reduce memory reallocation at render time) but your page is larger than the buffer size.
For example:
<context-param>
<param-name> javax.faces.FACELETS_BUFFER_SIZE </param-name>
<param-value> 55555 </param-value>
</context-param>

Spring message in Javascript: cannot display Spanish accent characters properly

I am facing a very common Spring message issue that so far doesn't seem to have a simple solution, so I hope someone here can enlighten me.
My current Spring MVC application has trouble displaying Spanish accented characters properly in a JavaScript alert. The alert message currently shows up like this:
Por favor elija la fecha de aplicaci&oacute;n
but it is supposed to show up like this:
Por favor elija la fecha de aplicación
The message above pops up when the user fails validation, which is handled by JavaScript:
alert("<spring:message code='message_miss_duedate' />");
but if I put the whole string in Spanish into the javascript:
alert("Por favor elija la fecha de aplicación");
the output is fine.
The cause of the issue is obvious: the &Xacute; entity is generated by the Spring message tag, which converts Spanish accented characters to HTML-friendly codes. This works fine when the output is parsed as HTML, but such codes are not recognized by JavaScript.
So far the 'EncodingFilter' is set to UTF-8
<filter>
<filter-name>CharacterEncodingFilter</filter-name>
<filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
<init-param>
<param-name>encoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
<init-param>
<param-name>forceEncoding</param-name>
<param-value>true</param-value>
</init-param>
</filter>
and same to the pom setting:
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
So, is there any way to skip the accent character conversion by Spring when using spring:message? Thanks.
UPDATE
The solution is very simple: use htmlEscape="false" in spring:message. By default the value is true, hence the escaped characters.
So:
alert('<spring:message code="message_miss_duedate" htmlEscape="false"/>');
Now the pop-up message looks fine.
END of UPDATE
I now have a workaround, but not a solution, because it has limitations, so I will leave this thread open until there is a general solution:
In the controller, use the Spring MessageSource object to pass the alert message into uiModel:
Locale currentLocale = LocaleContextHolder.getLocale();
uiModel.addAttribute("message_miss_duedate", messageSource.getMessage("message_miss_duedate", null, currentLocale));
So in JavaScript we can get the message like a normal JSTL variable:
alert("${message_miss_duedate}");
But as mentioned, this approach has limits, because it is difficult to handle messages that are built at runtime with variables, especially with code templates.

URL Form Encoded special German Characters between AngularJS and Jackson Backend

I have an AngularJS frontend and a Spring MVC backend, with Jackson taking care of serialization and JS<->Java conversion.
When I pass German characters like "ö, ä, ü, ß" to my backend via the HTTP body payload, there is no problem. I set the header "Content-Type" to "application/json;charset=UTF-8" and all works fine.
But if I have those characters in my URL, Angular encodes them. This is fine; however, I believe it encodes them differently from the way Jackson tries to decode them.
Here is what Angular makes out of "höhe": h%C3%B6he
I believe Jackson expects: h%f6he
I think this is because UTF-8 is a 2-byte encoding while ASCII is a 1-byte encoding. Is there a setting for either Jackson or Angular to make them "speak the same encoding language"?
Thanks for any help!
Kind regards,
Pascal
Jackson does not handle URL decoding, as it requires an input source such as an InputStream or String; most likely the servlet container (Jetty?) that the service runs on handles this. One problem is that which encoding a URL should use is... well, poorly defined really: "Content-Type" does NOT define this (it applies only to the payload).
So you need to figure out how to give the servlet container and the client a shared understanding of which encoding is to be used (the difference in your case looks like UTF-8 vs Latin-1).
Or: if you can make the client escape all non-ASCII characters with JSON escape sequences, that will also work.
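To illustrate the UTF-8 vs Latin-1 difference, here is a small sketch using the JDK's own URLEncoder. It assumes Java 10 or later (for the Charset overload), and the class name UrlEncodingDemo is made up for the example. It reproduces both percent-encoded forms of "höhe" from the question:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class UrlEncodingDemo {
    // Percent-encodes the UTF-8 bytes of the text (two bytes for the o-umlaut).
    static String encodeUtf8(String text) {
        return URLEncoder.encode(text, StandardCharsets.UTF_8);
    }

    // Percent-encodes the ISO-8859-1 (Latin-1) bytes of the text (one byte for the o-umlaut).
    static String encodeLatin1(String text) {
        return URLEncoder.encode(text, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String word = "h\u00F6he"; // the German word from the question, umlaut written as an escape
        System.out.println(encodeUtf8(word));   // h%C3%B6he
        System.out.println(encodeLatin1(word)); // h%F6he
    }
}
```

Both outputs are valid percent-encodings of the same text; they differ only in which charset turned the characters into bytes first, which is why both ends have to agree on it.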

url encoded character gets parsed wrongly by webflow/EL/JSF

When I submit the character Ö from a web page, the backend receives Ã. The web page is part of a Spring Webflow/JSF 1.2/Facelets application. When I inspect the POST with Firebug I see:
Content-Type: application/x-www-form-urlencoded
Content-Length: 74
rapport=krediet_aanvragen&fw1=0&fw2=%C3%96ZTEKIN&fw3=0&fw4=0&zoeken=Zoeken
The character Ö is encoded as %C3%96, using this table I can see that it is the correct hexadecimal representation of the UTF-8/Unicode character Ö.
However when it reaches the backend the character is changed into Ã. Using the same table I can see there is some code somewhere that tries to interpret the C3 and the 96 separately (or as unicode \u notation). U+00C3 happens to be Ã, 96 is not a visible character so that explains that.
Now I know this is a typical case of an encoding mismatch, I just don't know where to look to fix this.
The webpage contains
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
When debugging I can see that the library responsible for the wrong interpretation is jboss-el 2.0.0.GA, which seems plausible because the value is passed to the backend in a Webflow expression:
<evaluate expression="rapportCriteria.addParameter('fw2', flowScope.fw2)" />
It is put onto the flowScope by:
<evaluate expression="requestParameters.fw2" result="flowScope.fw2"/>
Never mind the convoluted way of getting the form input into the backend; this is code that tries to integrate Webflow with BIRT reports... but I have the same symptom in other web applications.
Any idea where I have to start looking?
I can see that it is the correct hexadecimal representation of the UTF-8/Unicode character Ö. However when it reaches the backend the character is changed into Ã.
So the client-side character encoding used to encode the POST body is correct, but the server-side character encoding used to decode the POST body is not. You need to create a Filter which basically does the following in its doFilter() method:
request.setCharacterEncoding("UTF-8");
and map it on the URL pattern of interest. Spring already provides one out of the box, the CharacterEncodingFilter, which does basically the above. All you need to do is add it to web.xml:
<filter>
<filter-name>characterEncodingFilter</filter-name>
<filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
<init-param>
<param-name>encoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
<init-param>
<param-name>forceEncoding</param-name>
<param-value>true</param-value>
</init-param>
</filter>
<filter-mapping>
<filter-name>characterEncodingFilter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
See also:
Unicode - How to get characters right? - JSP/Servlet requests - POST
By the way, the HTML meta header is irrelevant to this issue; it's ignored when the page is served over HTTP. It's the HTTP response header which tells the web browser which charset to use to display the response and to send the params back to the server. This has apparently already been set properly, since the POST body is correctly encoded. The HTML meta header is only used when the user saves the page to local disk and revisits it later from local disk.
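The server-side half of the mismatch can be reproduced outside the container with the JDK's URLDecoder. This is only a sketch: it assumes Java 10 or later, assumes the container falls back to ISO-8859-1 when no request encoding is set, and uses a made-up class name, FormDecodingDemo. It decodes the fw2 value from the POST body in the question both ways:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class FormDecodingDemo {
    // What happens when the bytes C3 96 are treated as two Latin-1 characters.
    static String decodeLatin1(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.ISO_8859_1);
    }

    // What setting the request encoding to UTF-8 achieves: C3 96 becomes one character.
    static String decodeUtf8(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String fw2 = "%C3%96ZTEKIN"; // value taken from the POST body in the question
        // Latin-1 result: U+00C3 (A with tilde) plus the invisible control character U+0096.
        System.out.println(decodeLatin1(fw2).equals("\u00C3\u0096ZTEKIN")); // true
        // UTF-8 result: the intended name starting with U+00D6 (O with diaeresis).
        System.out.println(decodeUtf8(fw2).equals("\u00D6ZTEKIN"));         // true
    }
}
```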

Resources