Camel CXF rsClient: unable to read French characters in UTF-8

Here I am calling a REST web service using the Camel CXF rsClient. The web service returns a JSON response with encoding ISO-8859-1, and Camel CXF reads it in that encoding, which garbles the French characters in the JSON string.
I want to change the charset to UTF-8 so that both English and French characters are read correctly.
I have tried to read the CXF RS client response as
<convertBodyTo type="String"/>
Which changed the French character è to Ã¨
I have also tried converting the response into a byte[] and then creating a UTF-8 string from the bytes:
<convertBodyTo type="byte[]"/>
and in the processor
byte[] body = (byte[]) exchange.getIn().getBody();
String convertedString = new String(body, "UTF-8");
This attempt also failed to read the French characters properly.
Response headers from the external web service:
Content-Type: application/json
Encoding: ISO-8859-1
How do we make the Camel CXF rsClient ignore the encoding coming with the JSON response and change it to
Content-Type: application/json;charset=UTF-8
Encoding: UTF-8
Can we use an interceptor to do so?
Update:
Tried
<convertBodyTo type="java.lang.String" charset="UTF-8"/>
It works on a local Windows server but NOT on remote Red Hat Linux servers.

How do we make the Camel CXF rsClient ignore the encoding coming with the JSON response and change it to
You can instruct Camel to use a particular charset when performing an HTTP request over CXF by setting the Content-Type header on the exchange to the desired value, as below (given that the upstream service supports content negotiation properly and can provide the desired encoding):
exchange.getMessage().setHeader(Exchange.CONTENT_TYPE, "application/json;charset=UTF-8");
Of course, you need to find a proper way of injecting this header into your route, given that you haven't described how it is all plugged together.
This may resolve your issue if the requested resource is sent properly encoded.
Alternative solution
Which changed the French character è to Ã¨
This translation contradicts what you described in your main topic: having è turn into Ã¨ means the input was initially encoded in UTF-8 and was then decoded as if it were ISO-8859-1.
As you haven't described your route declaration, here is a generic solution where you declare a bean / processor that re-decodes the incoming message body (mis-read as Latin-1) as UTF-8:
import java.nio.charset.StandardCharsets;
import org.apache.camel.Exchange;

public class CharsetConverter {

    public String convert(Exchange exchange) {
        // the body arrives as a String that was decoded as ISO-8859-1,
        // while the underlying bytes were actually UTF-8
        String body = exchange.getMessage().getBody(String.class);
        // re-encode with ISO-8859-1 to recover the raw bytes, then decode as UTF-8
        return new String(body.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }
}
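The re-decoding trick the converter relies on can be checked in isolation. This sketch (plain JDK, no Camel involved) simulates the failure mode described above and then repairs it:

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    public static void main(String[] args) {
        String original = "è";
        // Simulate the failure: UTF-8 bytes decoded with ISO-8859-1
        String garbled = new String(original.getBytes(StandardCharsets.UTF_8),
                                    StandardCharsets.ISO_8859_1);
        System.out.println(garbled); // Ã¨
        // Repair: re-encode with ISO-8859-1 to recover the raw bytes, decode as UTF-8
        String repaired = new String(garbled.getBytes(StandardCharsets.ISO_8859_1),
                                     StandardCharsets.UTF_8);
        System.out.println(repaired.equals(original)); // true
    }
}
```

The repair only works because ISO-8859-1 maps every byte to a character one-to-one, so no information is lost in the bad decode.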
Additional notes
It works on a local Windows server but NOT on remote Red Hat Linux servers.
First, you may need to check the default charset (aka encoding scheme) used by your application, as it varies between environments. You can inspect the default charset used across Camel exchanges by evaluating the expression below:
ExchangeHelper.getCharsetName(exchange); // where you would / can intercept different exchanges
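The Windows-vs-Linux difference usually traces back to conversions that fall back to the JVM default charset (often windows-1252 on Windows, UTF-8 on modern Linux). A minimal JDK-only illustration of why an explicit charset behaves the same everywhere while an implicit one does not:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {
    public static void main(String[] args) {
        // Environment-dependent: differs between your Windows and Linux servers
        System.out.println(Charset.defaultCharset());

        byte[] utf8Bytes = "è".getBytes(StandardCharsets.UTF_8);
        // Platform-dependent: decoded with Charset.defaultCharset()
        String implicit = new String(utf8Bytes);
        // Stable on every platform
        String explicit = new String(utf8Bytes, StandardCharsets.UTF_8);
        System.out.println(explicit); // è
    }
}
```

A common workaround is to pin the default with `-Dfile.encoding=UTF-8` on the JVM command line, but passing the charset explicitly, as in the converter above, is the robust fix.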

Related

Why is Spring de-coding + (the plus character) on application/json get requests? and what should I do about it?

I have a Spring application that receives a request like http://localhost/foo?email=foo+bar@example.com. This triggers a controller that roughly looks like this:
@RestController
@RequestMapping("/foo")
public class FooController extends Controller {
    @GetMapping
    public void foo(@RequestParam("email") String email) {
        System.out.println(email);
    }
}
By the time I can access email, it's been converted to foo bar@example.com instead of the original foo+bar@example.com. According to When to encode space to plus (+) or %20? this should only happen in requests where the content is application/x-www-form-urlencoded. My request has a content type of application/json. The full MIME headers of the request look like this:
=== MimeHeaders ===
accept = application/json
content-type = application/json
user-agent = Dashman Configurator/0.0.0-dev
content-length = 0
host = localhost:8080
connection = keep-alive
Why is Spring then decoding the plus as a space? And if this is the way it should work, why isn't it encoding pluses as %2B when making requests?
I found this bug report about it: https://jira.spring.io/browse/SPR-6291 which may imply that this was fixed in version 3.0.5, and I'm using Spring > 5.0.0. It is possible that I may be misinterpreting something about the bug report.
I also found this discussion about RestTemplate treatment of these values: https://jira.spring.io/browse/SPR-5516 (my client is using RestTemplate).
So, my questions are, why is Spring doing this? How can I disable it? Should I disable it or should I encode pluses on the client, even if the requests are json?
Just to clarify, I'm not using either HTML or JavaScript anywhere here. There's a Spring Rest Controller, and the client is Spring's RestTemplate with UriTemplate or UriComponentsBuilder, neither of which encodes the plus sign the way Spring decodes it.
Original Answer
You are mixing two things: a + in the body of the request means a space only when the headers declare application/x-www-form-urlencoded. The body or content of the request depends on the headers, but a request can consist of just a URL, with no headers and no body.
So the encoding of a URI cannot be controlled by any headers as such.
See the URL Encoding section in https://en.wikipedia.org/wiki/Query_string:
Some characters cannot be part of a URL (for example, the space) and some other characters have a special meaning in a URL: for example, the character # can be used to further specify a subsection (or fragment) of a document. In HTML forms, the character = is used to separate a name from a value. The URI generic syntax uses URL encoding to deal with this problem, while HTML forms make some additional substitutions rather than applying percent encoding for all such characters. SPACE is encoded as '+' or "%20".
HTML 5 specifies the following transformation for submitting HTML forms with the "get" method to a web server. The following is a brief summary of the algorithm:
Characters that cannot be converted to the correct charset are replaced with HTML numeric character references
SPACE is encoded as '+' or '%20'
Letters (A–Z and a–z), numbers (0–9) and the characters '*', '-', '.' and '_' are left as-is
All other characters are encoded as %HH hex representation with any non-ASCII characters first encoded as UTF-8 (or other specified encoding)
The octet corresponding to the tilde ("~") is permitted in query strings by RFC 3986 but required to be percent-encoded in HTML forms to "%7E".
The encoding of SPACE as '+' and the selection of "as-is" characters distinguishes this encoding from RFC 3986.
You can observe the same behaviour on google.com, and in other frameworks as well, for example Python Flask.
So what you are seeing is correct; you are just comparing it with a document that refers to the body content of a request, not the URL.
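The distinction can be reproduced with the JDK alone: URLEncoder and URLDecoder implement application/x-www-form-urlencoded semantics, not generic RFC 3986 percent-encoding:

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PlusEncodingDemo {
    public static void main(String[] args) {
        // Form encoding turns a space into '+' ...
        System.out.println(URLEncoder.encode("foo bar", StandardCharsets.UTF_8)); // foo+bar
        // ... so form decoding turns '+' back into a space
        System.out.println(URLDecoder.decode("foo+bar", StandardCharsets.UTF_8)); // foo bar
        // A literal '+' must be sent as %2B to survive form decoding
        System.out.println(URLDecoder.decode("foo%2Bbar", StandardCharsets.UTF_8)); // foo+bar
    }
}
```

The `Charset` overloads shown here require Java 10 or newer; on older JVMs use the `(String, String)` overloads, which throw a checked UnsupportedEncodingException.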
Edit-1: 22nd May
After debugging, it seems the decoding doesn't even happen in Spring. It happens in the package org.apache.tomcat.util.buf, in the UDecoder class:
/**
* URLDecode, will modify the source.
* @param mb The URL encoded bytes
* @param query <code>true</code> if this is a query string
* @throws IOException Invalid %xx URL encoding
*/
public void convert( ByteChunk mb, boolean query )
throws IOException
{
int start=mb.getOffset();
And below is where the conversion stuff actually happens
if( buff[ j ] == '+' && query) {
buff[idx]= (byte)' ' ;
} else if( buff[ j ] != '%' ) {
This means that it is the embedded Tomcat server which does this translation; Spring doesn't even participate in it. There is no config to change this behaviour, as seen in the class code, so you have to live with it.
SPR-6291 fixed this problem in v3.0.5, but the problem remains unresolved in some other cases: SPR-11047 is still open. While SPR-6291's priority was Major, SPR-11047's priority is Minor.
I faced this problem when I was working on a REST API in an old version of Spring last year. There are multiple ways to receive data in a Spring controller; two of them are the @RequestParam and @PathVariable annotations.
As others mentioned, I think it's Spring's internal issue and does not specifically belong to URL encoding, because I was sending data over a POST request, but it is somewhat of an encoding problem. I also agree with the others that it now remains problematic only in URLs.
So there are two solutions I know:
You can use @PathVariable instead of @RequestParam, because as of SPR-6291 this plus-sign issue is fixed for @PathVariable but still remains open for @RequestParam as SPR-11047.
My version of Spring was not even accepting a plus sign via the @PathVariable annotation, so this is how I overcame the problem (I don't remember it step by step, but it will give you a hint).
In your case you can get the fields via JS and escape the plus sign before sending the request. Something like this:
var email = document.getElementById("emailField").value;
email = email.replace(/\+/g, '%2B'); // a regex with the g flag replaces every plus, not just the first
If you have this request:
http://localhost/foo?email=foo+bar@example.com
then the original is foo bar@example.com. If you say the original should be foo+bar@example.com, then the request should be:
http://localhost/foo?email=foo%2Bbar@example.com
So Spring is working as it is supposed to. Maybe on the client you should check whether the URI is properly encoded; the client-side URL encoding is responsible for building a correct HTTP request.
See encodeURI() if you generate the request in JavaScript, or uriToString() if you generate it in Spring.
Build your request string (the part after ?) with unencoded values like foo+bar@email.com, and only at the end, before actually using it in a GET, encode all of it with whatever is available on the client platform. If you want to use POST, then encode it according to the MIME type of your choice.
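One way to see the client-side pitfall with the JDK alone: the multi-argument java.net.URI constructor percent-encodes characters that are illegal in a query, but '+' is legal there, so it passes through untouched and a form-decoding server will read it as a space:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class PlusInQueryDemo {
    public static void main(String[] args) throws URISyntaxException {
        // The constructor quotes illegal characters, but '+' and '@' are
        // legal query characters, so they survive as-is
        URI uri = new URI("http", "localhost", "/foo", "email=foo+bar@example.com", null);
        System.out.println(uri.toASCIIString()); // http://localhost/foo?email=foo+bar@example.com
    }
}
```

This is why a literal plus in a query value has to be replaced with %2B by hand (or by an encoder that applies strict form encoding) before the request is sent.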

What's the best way to set charset used by jetty and jersey?

After some googling, I found that:
A method to set the charset for Jersey:
@Produces(MediaType.TEXT_HTML + "; charset=utf-8")
That means I have to add "; charset=utf-8" to every RESTful method. Is there a better way?
A method to set the charset for Jetty:
modify encoding.properties in the jar, which by default is:
text/html = ISO-8859-1
text/plain = ISO-8859-1
text/xml = UTF-8
text/json = UTF-8
But I don't think modifying jars is a good way.
Actually, I want to use UTF-8 only, to avoid annoying garbled text. What is the best way to achieve this?
If you don't want to add the charset in the @Produces annotation everywhere, you can add it in a response filter for all responses:
response.getHttpHeaders().putSingle("Content-Type", contentType.toString() + ";charset=UTF-8");
Check this link Jersey / Rest default character encoding
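The only logic such a response filter needs is appending the charset when it is missing. That core step can be sketched as a plain method (the class name and fallback type here are hypothetical; in a real Jersey ContainerResponseFilter you would apply this to the Content-Type header of each response):

```java
public class CharsetDefaulting {
    /** Append ;charset=UTF-8 to a Content-Type value that lacks an explicit charset. */
    static String withUtf8Charset(String contentType) {
        if (contentType == null || contentType.isEmpty()) {
            return "text/html;charset=UTF-8"; // hypothetical fallback type
        }
        if (contentType.toLowerCase().contains("charset")) {
            return contentType; // respect an explicitly declared charset
        }
        return contentType + ";charset=UTF-8";
    }

    public static void main(String[] args) {
        System.out.println(withUtf8Charset("text/html"));               // text/html;charset=UTF-8
        System.out.println(withUtf8Charset("text/xml; charset=UTF-8")); // unchanged
    }
}
```

Guarding against an already-present charset matters because some resources may legitimately declare a different encoding.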

How to send message hl7 to Mirth HTTP connector using POST

I have a Mirth instance (version 3.0.1) sending messages out using a POST method to a Web API RESTful service:
[POST("MessagesHl7/OML_O21")]
public HttpResponseMessage PostOmlo21([FromBody] string receivedmessage) {..}
The problem is that the HL7 message sent to the service in an outbound message is cut off after the first characters. For example, in the message:
MSH|^~\&|CPSI^1.3.6.1.4.1.27248.1.17^ ISO|CGH|...
only the text MSH|^~\ is received in the receivedmessage variable.
How can I make sure the message is not cut off?
In the HTTP channel, the configuration is: POST, no query parameters; in the headers, content-type application/x-www-form-urlencoded with Content-Type value application/xml; and the value that is sent is =${message.encodedData}.
Change your action method to not use binding, and just read the request body as a string:
[POST("MessagesHl7/OML_O21")]
public async Task<HttpResponseMessage> PostOmlo21()
{
    string receivedMessage = await Request.Content.ReadAsStringAsync();
    // ... process receivedMessage and return a response
}
I would suggest using Base64 encoding for the HL7 piped message, since there are many special characters within the message which can be interpreted the wrong way during parsing, especially the parsing of XML.
Of course, you have to decode the HL7 message on the server side, but I think Mirth gives you all the functionality to do that.
I don't know which class to use in C#/ASP, but in Java appropriate classes and frameworks for encoding and decoding Base64 exist. I believe the same is true for C# and ASP.
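On the Java side the round trip is a one-liner in each direction with java.util.Base64; the message below is just an illustrative fragment:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Hl7Base64Demo {
    public static void main(String[] args) {
        String hl7 = "MSH|^~\\&|CPSI^1.3.6.1.4.1.27248.1.17^ISO|CGH|...";
        // Sender side: encode before putting the message into the request body
        String encoded = Base64.getEncoder().encodeToString(hl7.getBytes(StandardCharsets.UTF_8));
        // Receiver side: decode back to the original piped message
        String decoded = new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
        System.out.println(decoded.equals(hl7)); // true
    }
}
```

Base64 output contains only characters that are safe in XML and form-encoded bodies, which is exactly why it sidesteps the truncation problem.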

URL Form Encoded special German Characters between AngularJS and Jackson Backend

I have an AngularJS frontend and a Spring MVC backend with Jackson to take care of the serialization and JS<->Java conversion.
When I pass German characters like "ö, ä, ü, ß" to my backend via the HTTP body payload, there is no problem. I have the header "Content-Type: application/json;charset=UTF-8" and all works fine.
But if I have those characters in my URL, Angular encodes them. This is fine; however, I believe it encodes them in a different way than Jackson tries to decode them.
Here is what Angular makes out of "höhe": h%C3%B6he
I believe Jackson expects: h%f6he
I think this is because 'ö' is two bytes in UTF-8 but only one byte in a single-byte encoding like ISO-8859-1. However, is there a setting for either Jackson or Angular to "speak the same encoding language"?
Thanks for any help!
Kind regards,
Pascal
Jackson does not handle URL decoding, as it requires an input source such as an InputStream or String: it is most likely the Servlet container (Jetty?) that the service runs on that handles this. One problem is that the definition of which encoding a URL should use is... well, poorly defined really: "Content-Type" does NOT define this (it's just for the payload).
So you need to figure out how to make the servlet container and the client have a shared understanding of which encoding is to be used (the difference in your case looks like UTF-8 vs Latin-1).
Or: if you can make the client escape all non-ASCII characters with JSON escape sequences, that will also work.
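The two escape forms in question can be reproduced with the JDK's URLEncoder simply by switching the charset, which confirms the UTF-8 vs Latin-1 diagnosis:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class UmlautEncodingDemo {
    public static void main(String[] args) {
        // UTF-8: 'ö' is two bytes, hence two percent-escapes (what Angular sends)
        System.out.println(URLEncoder.encode("höhe", StandardCharsets.UTF_8));      // h%C3%B6he
        // Latin-1: 'ö' is a single byte (what the server apparently expects)
        System.out.println(URLEncoder.encode("höhe", StandardCharsets.ISO_8859_1)); // h%F6he
    }
}
```

Percent-escapes are case-insensitive, so %F6 and %f6 denote the same byte.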

Spring REST URL Encoding Scheme: %20 or + Which one?

I made a Spring REST application where you can perform CRUD operations via the HTTP methods POST, PUT, GET, DELETE. I have the typical URI template of
http://host/root/{id}/{name}/{address} and so on.
We have a client who is accessing this REST service. Apparently they are sending parameters for multi-word name and address values in the following form:
http://host/root/11/John+Smith/10+Las+Vegas+USA
They are using the HTML encoding scheme based on the application/x-www-form-urlencoded type. According to the article in Wikipedia:
The application/x-www-form-urlencoded type
The encoding used by default is based on a very early version of the general URI percent-encoding rules, with a number of modifications such as newline normalization and replacing spaces with "+" instead of "%20". - http://en.wikipedia.org/wiki/Percent-encoding
However, it appears the standard URL encoding scheme is to use %20 to replace spaces in URI templates. Which one is correct?
My Spring REST service automatically converts %20 to spaces; it's interpreted correctly. I'm using Spring 3.0.4. When a + is encountered by my REST service, it's accepted as-is. Of course, when I add validation to exclude +, it is indeed excluded as expected.
Am I within the standards, or are there double standards here? Or is the client using an ancient scheme?
The point is that application/x-www-form-urlencoded can be used only in request parameters, whereas percent encoding is also supported in a path.
So,
http://host/root/11/?name=John+Smith&address=10+Las+Vegas+USA
is fine and will be properly decoded by Spring MVC, but
http://host/root/11/John+Smith/10+Las+Vegas+USA
is wrong and Spring MVC doesn't decode it, because the following form should be used instead:
http://host/root/11/John%20Smith/10%20Las%20Vegas%20USA
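The asymmetry is visible with the JDK alone: form decoding (what applies to query parameters) turns '+' into a space, whereas in a path segment only percent-escapes carry meaning, so a space there must be sent as %20:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class PathVsQueryDemo {
    public static void main(String[] args) {
        // Query-parameter (form) semantics: '+' means space
        System.out.println(URLDecoder.decode("John+Smith", StandardCharsets.UTF_8));  // John Smith
        // Path semantics: only percent-escapes are decoded; a '+' in a path
        // stays a literal plus, so spaces in paths must use %20
        System.out.println(URLDecoder.decode("John%20Smith", StandardCharsets.UTF_8)); // John Smith
    }
}
```

This is why Spring MVC decodes %20 in both positions but honors '+' as a space only in query parameters.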
