struts2 form to accept multi language - utf-8

I have a form that submits details such as name, address, and affiliations. The input can be in different languages, such as French, Spanish, German, and Russian. I noticed that characters not found on an English keyboard are sometimes stored as different characters, such as & or ^.
for example,
this is the input
Instituto de Quı´mica, Universidade de Sa˜ o Paulo, Sa˜ o Paulo,
Brazil
and this is the data saved in the database when I submit the form:
Instituto de Qu?´mica, Universidade de Sa˜ o Paulo, Sa˜ o Paulo, Brazil
I first set the character set to UTF-8 in the database and in the JSP page. Later I found that the Struts 2 form tag has an attribute, acceptcharset="UTF-8",
but it works for only a few languages and not for Spanish, Portuguese, and many more.
So what is the solution for this issue?

I fixed this by setting UTF-8 as the pageEncoding and charset everywhere they appear in the page, and by using acceptcharset="UTF-8" in the form. The last problem was storing the data in the DB, even though its charset was UTF-8, so I forced the DB connection to use UTF-8 by adding parameters to the connection URL: jdbc:mysql://localhost:3306/yourDB?useUnicode=true&characterEncoding=utf8
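The stored value in the question (a "?" in place of the dotless i, with the other accent marks intact) is consistent with the text passing through a single-byte charset somewhere between the form and the table. The sketch below reproduces that pattern, assuming the connection behaved like Windows-1252; that charset is an assumption, since the question does not say which one was in effect. The class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class LossyCharsetDemo {
    // Encode the text with the given charset and decode it again, roughly what
    // happens when a connection writes and reads text with that charset.
    static String roundTrip(String text, Charset charset) {
        // Characters the charset cannot represent are replaced with '?'.
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The word from the question, written with escapes:
        // U+0131 is the dotless i, U+00B4 is the spacing acute accent.
        String input = "Qu\u0131\u00B4mica";

        // Assumed charset of the misconfigured connection (hypothetical).
        Charset singleByte = Charset.forName("windows-1252");

        System.out.println(roundTrip(input, singleByte));              // lossy
        System.out.println(roundTrip(input, StandardCharsets.UTF_8)); // lossless
    }
}
```

With UTF-8 on every hop (page, form, connection, table) the round trip is lossless, which matches the fix described above.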

You can use Spring's CharacterEncodingFilter to force the encoding to UTF-8.
Add this to your web.xml:
<filter>
<filter-name>encodingFilter</filter-name>
<filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
<init-param>
<param-name>encoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
<init-param>
<param-name>forceEncoding</param-name>
<param-value>true</param-value>
</init-param>
</filter>
<filter-mapping>
<filter-name>encodingFilter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>

If you need to force JSP pages to UTF-8, you can add the following to web.xml:
<jsp-config>
<jsp-property-group >
<url-pattern>*.jsp</url-pattern>
<page-encoding>UTF-8</page-encoding>
</jsp-property-group>
</jsp-config>

Related

MS Edge sending one field in ajax request with bad characters ‎. How to omit?

I'm using a time and date controller plugin for jQuery: http://jonthornton.github.com/jquery-timepicker/
In a form on a web page, the default values from a date and a time picker are combined to make a date time string and sent with an Ajax request to a servlet sitting on a Tomcat 8 server running on Red Hat Enterprise Linux.
In Chrome, this works as expected - all JSON text is received as intended.
With MS Edge (IE does not have this problem), the characters come back with garbage characters: ‎ preceding every character. This appears to be the upside down quote character according to an online reference.
Example:
"timeReported":"‎6‎/‎22‎/‎2017 ‎10‎:‎29‎:‎09‎ ‎AM","description":"whatthewhat"
All other fields in the JSON submitted are just fine, the description above being an example.
I populate the fields on the form with this bit of Javascript:
$('#TimeReported .time').timepicker({
'showDuration': true,
'timeFormat': 'g:ia',
'step': 5
});
$('#TimeReported .date').datepicker({
'format': 'mm/dd/yyyy',
'autoclose': true
});
And read the values from the inputs like this:
// joined to avoid any problems with unary + which may take some values as numbers
var timeelements= [
String($('#TimeReported .date').val()),
String($('#TimeReported .time').val())
];
var issueTimeReported=timeelements.join(' ');
The value that is derived and placed into the json record debugs from IE as:
"‎6‎/‎22‎/‎2017 ‎1‎:‎00‎:‎51‎ ‎PM"
The data is submitted as JSON via jQuery with the following options:
type : "POST",
url : "submitForm.page",
async: true,
dataType: 'json',
contentType: 'application/json;charset=Windows-1252',
processData: false,
data : JSON.stringify(rdata),
Reading a bit, I find that this is commonly a misalignment of encodings, where utf-8 or ISO-whatever conflicts with Windows-1252.
I also noticed the debug console in Edge reporting the HTTP header and the page were sending conflicting encodings, so I removed all page specific encodings and applied a filter on the web.xml on tomcat to force everything to Windows-1252.
<!-- A filter that sets character encoding that is used to decode -->
<!-- parameters in a POST request -->
<filter>
<filter-name>setCharacterEncodingFilter</filter-name>
<filter-class>org.apache.catalina.filters.SetCharacterEncodingFilter</filter-class>
<init-param>
<param-name>encoding</param-name>
<param-value>Windows-1252</param-value>
</init-param>
<async-supported>true</async-supported>
</filter>
<!-- The mapping for the Set Character Encoding Filter -->
<filter-mapping>
<filter-name>setCharacterEncodingFilter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
This seems to have resolved the conflicting encoding warning, but the results received at the server are still the same. The effective doctype is X-UA-Compatible (via meta tag).
<meta http-equiv="X-UA-Compatible" content="IE=edge">
I would like to prevent these characters from being sent from the browser, but if necessary, I could explicitly filter them in the servlet. It seems that the problem is between IE's backend, submitting the Ajax request, and Tomcat, and I don't think it's on Tomcat's end.
I've worked around this problem by catching the date parsing exception my code throws when trying to ingest this data and providing a suitable substitute (it only does it with the default values from the controls, which is "now", so we substitute "now" server side when this blows.)
But that's not the answer. The fact that selecting values from the controls bypasses the issue, e.g. not default values, suggests that there may be a problem with the jQuery time and date picker plugin I'm using.
We've submitted an issue on the plugin. At this time all testing points to the way the controls are initialized. https://github.com/jonthornton/jquery-timepicker/issues/624
This problem typically arises in web development from a mismatch between the character set of the Ajax request and the one the receiving system expects. Everything tried in the question would normally resolve it, namely ensuring that the character set configured in the SetCharacterEncodingFilter matches the one declared in the contentType of the incoming request.
In this specific instance, the issue lies within a third party plugin, so there is no specific resolution to this particular question, though the root problem is addressable and correctable.
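If the characters have to be filtered in the servlet, as the question suggests as a fallback, a minimal sketch could look like the following. It assumes the stray characters are U+200E (LEFT-TO-RIGHT MARK), which is what the invisible characters in the sample payload appear to be; the class and method names are hypothetical.

```java
final class DirectionalMarkStripper {
    // Remove every U+200E LEFT-TO-RIGHT MARK from the value before it is
    // handed to the date parser. Other characters are left untouched.
    static String strip(String value) {
        return value.replace("\u200E", "");
    }

    public static void main(String[] args) {
        // Same shape as the timeReported field in the question.
        String received = "\u200E6\u200E/\u200E22\u200E/\u200E2017 "
                + "\u200E10\u200E:\u200E29\u200E:\u200E09\u200E \u200EAM";
        System.out.println(strip(received));
    }
}
```

This only cleans up the symptom on the server; the browser still sends the marks.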

Spring form and UTF-8 bad encoding

In our web app we've run into a bad encoding problem. To reproduce it, the user selects a non-Unicode encoding in the browser (for example, in Chrome: More tools -> Encoding -> KOI8) and enters Cyrillic text.
The characters are already corrupted when they reach the controller (checked in the debugger), so they are stored and rendered incorrectly.
We've followed all the recommendations in http://balusc.blogspot.com/2009/05/unicode-how-to-get-characters-right.html and it seems the problem is with submitting the application/x-www-form-urlencoded content type, because it's impossible to set the charset for such form submits.
For example, if we submit the same data as JSON and set the necessary content type, everything is stored correctly.
We've also tried the example from this article:
http://www.codejava.net/frameworks/spring/spring-mvc-form-handling-tutorial-and-example and additionally added a UTF-8 filter with the following method:
public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException,
ServletException {
request.setCharacterEncoding("UTF-8");
response.setCharacterEncoding("UTF-8");
chain.doFilter(request, response);
}
But the problem was still reproducible.
Could somebody suggest how to resolve this problem?
Is it possible to handle this use case correctly in Spring MVC? We tried it on a simple example and it doesn't seem to work. Is this use case of changing the browser encoding valid at all?
Try : In web.xml
<filter>
<filter-name>encoding-filter</filter-name>
<filter-class>
org.springframework.web.filter.CharacterEncodingFilter
</filter-class>
<init-param>
<param-name>encoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
<init-param>
<param-name>forceEncoding</param-name>
<param-value>true</param-value>
</init-param>
</filter>
<filter-mapping>
<filter-name>encoding-filter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
Refer : http://wiki.apache.org/tomcat/FAQ/CharacterEncoding
Also, if you use JSTL in the view, try to set the default encoding there.
This behavior can be achieved by using the accept-charset="UTF-8" attribute on the form.
It can be added to the Spring form tag. There is also a bug in older versions of Struts (1.1 is affected)
https://issues.apache.org/jira/browse/STR-1636
that makes it impossible to add this attribute directly to the form. As a workaround, jQuery can be used:
jQuery( document ).ready(function() {
jQuery("#formSelector").attr("accept-charset", "UTF-8");
});
In a nutshell, this attribute forces the browser to send the data from this form using the specified encoding. If the user enters control characters into the input, they will be sent to the backend as well, so validation is required to prevent such cases. Cases where the browser encoding and the keyboard language do not work well together (for example KOI8-U and a Chinese keyboard language) will also be handled.
accept-charset official documentation
Somewhere in your request pipeline you're overriding the encoding (i.e. String.getBytes() or new String(bytes) is being called without the right encoding).
There are many places where this can happen, and it's one of the reasons why Spring Boot and various other frameworks force UTF-8 for both input and output, particularly since UTF-8 is the recommended encoding.
Your users should not be changing the encoding. In fact, when the page loads, both the servlet response and the HTML itself should specify UTF-8, and for good reason: the server is saying "I speak UTF-8". If you want a different encoding, you will have to specify it in the HTML (i.e. the JSP) and in the servlet request/responses so that the browser auto-selects it. Even then your mileage will vary for application/x-www-form-urlencoded, particularly if you use JavaScript (probably because the spec on the encoding of other characters is somewhat ambiguous). As a further example of why the HTML has to have exactly the same encoding as what you're sending: the name/value pairs would otherwise have different encodings. That is, your form has UTF-8 request parameter names (because that's what's in the HTML), but when you override the encoding you're supplying a different encoding for the parameter values (i.e. ?UTF-8Name=KoiValue&UTF-8Name=KoiValue). Hopefully you can see why that is bad, and I'm not sure Chrome is smart enough (or whether it even should) to change the request parameter names back to KOI8.
Thus, if you absolutely must support other character encodings, you should probably use multipart/form-data (you specify this in the enctype attribute on the form element) AND NOT USE the encoding filters that set UTF-8, as those will probably cause corruption.
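To make the failure mode concrete, here is a small sketch of what happens when a form body encoded in a legacy charset is decoded as UTF-8. It assumes the browser used KOI8-R for the percent-encoding and that the server decodes with UTF-8; the class and method names are invented for the example, and the Charset overloads of URLEncoder/URLDecoder need Java 10 or later.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class FormCharsetMismatchDemo {
    // Assumed browser-side charset after the user overrides the encoding.
    static final Charset KOI8_R = Charset.forName("KOI8-R");

    // Percent-encode a value the way a form submit in KOI8-R would.
    static String encodeAsBrowser(String value) {
        return URLEncoder.encode(value, KOI8_R);
    }

    // Decode the form body with whatever charset the server believes is in use.
    static String decodeAsServer(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        // A short Cyrillic word, written with escapes to keep the source ASCII-only.
        String text = "\u041F\u0440\u0438\u0432\u0435\u0442";
        String body = encodeAsBrowser(text);

        // Matching charsets round-trip; decoding as UTF-8 yields U+FFFD
        // replacement characters, and the original text cannot be recovered.
        System.out.println(decodeAsServer(body, KOI8_R).equals(text));
        System.out.println(decodeAsServer(body, StandardCharsets.UTF_8).indexOf('\uFFFD') >= 0);
    }
}
```

Because the bytes are not valid UTF-8, the damage is done at decode time, before any controller code runs, which fits the observation that the text is already corrupted in the debugger.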

Spring message in Javascript: cannot display Spanish accent characters properly

I am facing a very common Spring message issue that so far doesn't have a simple solution, so I hope someone here can enlighten me.
Our Spring MVC application fails to display Spanish accented characters properly in a JavaScript alert. The alert message currently shows up like this:
Por favor elija la fecha de aplicaci&oacute;n
but it is supposed to show up like this:
Por favor elija la fecha de aplicación
The above message pops up when the user fails validation, which is handled by JavaScript:
alert("<spring:message code='message_miss_duedate' />");
but if I put the whole Spanish string directly into the JavaScript:
alert("Por favor elija la fecha de aplicación");
the output is fine.
The cause of the issue is obvious: the &Xacute; entity is generated by Spring message, which converts Spanish accented characters to HTML-friendly codes. That works fine when parsed as HTML, but such codes are not recognized by JavaScript.
So far the 'EncodingFilter' is set to UTF-8:
<filter>
<filter-name>CharacterEncodingFilter</filter-name>
<filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
<init-param>
<param-name>encoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
<init-param>
<param-name>forceEncoding</param-name>
<param-value>true</param-value>
</init-param>
</filter>
and the same goes for the pom setting:
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
So, is there any way to skip the accent character conversion that Spring performs when using Spring message? Thanks.
UPDATE
The solution is very simple: use htmlEscape="false" in spring:message. By default the value is true, hence the escaped characters.
So:
alert('<spring:message code="message_miss_duedate" htmlEscape="false"/>');
Now the pop-up message looks right.
END of UPDATE
I now have a workaround, but not a solution because it has limitations, so I will leave this thread open until there is a general solution.
In the controller, use the Spring MessageSource object to pass the alert message into uiModel:
Locale currentLocale = LocaleContextHolder.getLocale();
uiModel.addAttribute("message_miss_duedate", messageSource.getMessage("message_miss_duedate", null, currentLocale));
Then in JavaScript we can read the message like a normal JSTL variable:
alert("${message_miss_duedate}");
But as mentioned, this approach has limits, because it is difficult to handle messages built at runtime with variables, especially with code templates.

com.oreilly.servlet.MultipartRequest can't handle big file in post request

I am receiving big files from the client and using MultipartRequest to handle the request,
but it throws this exception:
java.io.IOException: Posted content length of 3921442 exceeds limit of 1048576
I tried adding the following filter to web.xml, but it's not working:
<filter>
<filter-name>multipartFilter</filter-name>
<filter-class>com.oreilly.servlet.MultipartFilter</filter-class>
<init-param>
<param-name>maxSize</param-name>
<param-value>5000000</param-value>
</init-param>
</filter>
Use the MultipartRequest constructor that takes a size limit:
public MultipartRequest(HttpServletRequest request,
String saveDirectory,
int maxPostSize)
There you can specify maxPostSize, which is the limit on the size of the posted content (the "limit of 1048576" in the exception is the value currently in effect).

url encoded character gets parsed wrongly by webflow/EL/JSF

When I submit the character Ö from a web page, the backend receives Ã. The web page is part of a Spring Webflow/JSF1.2/Facelets application. When I inspect the POST with Firebug I see:
Content-Type: application/x-www-form-urlencoded
Content-Length: 74
rapport=krediet_aanvragen&fw1=0&fw2=%C3%96ZTEKIN&fw3=0&fw4=0&zoeken=Zoeken
The character Ö is encoded as %C3%96, using this table I can see that it is the correct hexadecimal representation of the UTF-8/Unicode character Ö.
However when it reaches the backend the character is changed into Ã. Using the same table I can see there is some code somewhere that tries to interpret the C3 and the 96 separately (or as unicode \u notation). U+00C3 happens to be Ã, 96 is not a visible character so that explains that.
Now I know this is a typical case of an encoding mismatch, I just don't know where to look to fix this.
The webpage contains
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
When debugging I can see that the library responsible for the wrong interpretation is jboss-el 2.0.0.GA, which seems right because the value is passed to the backend in a webflow expression:
<evaluate expression="rapportCriteria.addParameter('fw2', flowScope.fw2)" />
It is put onto the flowScope by:
<evaluate expression="requestParameters.fw2" result="flowScope.fw2"/>
Never mind the convoluted way of getting the form input into the backend; this is code that tries to integrate Webflow with BIRT reports... but I have the same symptom in other web applications.
Any idea where I have to start looking?
I can see that it is the correct hexadecimal representation of the UTF-8/Unicode character Ö. However when it reaches the backend the character is changed into Ã.
So the client-side character encoding used to encode the POST body is correct, but the server-side character encoding used to decode the POST body is not. You need to create a Filter which basically does the following in its doFilter() method
request.setCharacterEncoding("UTF-8");
and map it to the URL pattern of interest. Spring already provides one out of the box, the CharacterEncodingFilter, which does basically the above. All you need to do is add it to web.xml:
<filter>
<filter-name>characterEncodingFilter</filter-name>
<filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
<init-param>
<param-name>encoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
<init-param>
<param-name>forceEncoding</param-name>
<param-value>true</param-value>
</init-param>
</filter>
<filter-mapping>
<filter-name>characterEncodingFilter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
See also:
Unicode - How to get characters right? - JSP/Servlet requests - POST
The HTML meta header is, by the way, irrelevant to the issue; it's ignored when the page is served over HTTP. It's the HTTP response header that tells the web browser which charset to use to display the response and to send the params back to the server. This has apparently already been set properly, since the POST body is correctly encoded. The HTML meta header is only used when the user saves the page to local disk and revisits it later from there.
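The symptom in the question can be reproduced outside the container. The sketch below decodes the fw2 value from the captured POST body twice: once with ISO-8859-1, which is what a servlet container falls back to when no request encoding has been set, and once with UTF-8. The class and method names are only for illustration, and the Charset overload of URLDecoder.decode needs Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class PercentDecodingDemo {
    // Decode a percent-encoded form value with an explicit charset.
    static String decode(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        String fw2 = "%C3%96ZTEKIN"; // value taken from the POST body in the question

        // Each byte becomes its own character: U+00C3 followed by the
        // invisible control character U+0096.
        String asLatin1 = decode(fw2, StandardCharsets.ISO_8859_1);

        // The two bytes C3 96 are read together as the single character U+00D6.
        String asUtf8 = decode(fw2, StandardCharsets.UTF_8);

        System.out.println(asLatin1.length()); // one character longer than the UTF-8 result
        System.out.println(asUtf8.length());
    }
}
```

Calling request.setCharacterEncoding("UTF-8") before the parameters are first read makes the container take the second path.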
