Spring RestTemplate is not preserving accent characters

I have been trying to use accented characters in a URL to call Solr.
My URL looks like this:
"http://host:8983/solr/principal/select?q=name:%22Michaël.e%22"
When I fire the URL from a browser I get the correct result, but when I try it with RestTemplate.exchange(URI, HttpMethod.GET, entity, String.class), the Solr log shows the accented character converted to "?", as below:
q=(name:"Micha?.e")
I have set the RestTemplate request charset to "UTF-8", but it still does the same.
My Solr is running on Jetty.

You can try percent-encoding the name with URLEncoder before calling RestTemplate. Pass the result as a java.net.URI: if you pass a String, RestTemplate treats it as a URI template and by default encodes it again, turning each % into %25.
String baseUri = "http://host:8983/solr/principal/select?q=name:%22";
// TODO get name from somewhere
String name = "Michaël.e";
String encodedName = URLEncoder.encode(name, "UTF-8");
restTemplate.exchange(URI.create(baseUri + encodedName + "%22"), HttpMethod.GET, entity, String.class);
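As an alternative to concatenating pre-encoded fragments, java.net.URI can do the encoding itself. The plain-Java sketch below (class and method names are illustrative; host and core are copied from the question) builds the query from the raw name and returns a URI whose string form is pure ASCII. That way a client that calls toString() or toURL() on it never sees the raw ë, which is one plausible way it could end up as "?". The result is meant to be passed straight to restTemplate.exchange(uri, HttpMethod.GET, entity, String.class).

```java
import java.net.URI;
import java.net.URISyntaxException;

public class SolrQueryUri {

    // Builds the Solr select URI for a name query; host and core are the example values.
    static URI build(String name) {
        try {
            // The multi-argument constructor quotes illegal characters such as '"',
            // but keeps non-ASCII characters like 'ë' as they are.
            URI raw = new URI("http", "host:8983", "/solr/principal/select",
                    "q=name:\"" + name + "\"", null);
            // toASCIIString() percent-encodes the remaining non-ASCII characters as UTF-8,
            // so the returned URI contains only ASCII.
            return URI.create(raw.toASCIIString());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI uri = build("Micha\u00EBl.e");
        // http://host:8983/solr/principal/select?q=name:%22Micha%C3%ABl.e%22
        System.out.println(uri);
        // With Spring on the classpath, the URI would be passed as-is:
        // restTemplate.exchange(uri, HttpMethod.GET, entity, String.class);
    }
}
```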

Related

Spring Web Client response with polish characters

I'm using Spring WebClient to fetch HTML. The response contains Polish characters such as ą, ę, ż and so on.
After calling the service I expect the response to look like this: <div>plan zajęć</div>
But the actual response looks like this: <div>plan zaj�ć</div>, and this sign replaces all Polish characters.
Here's a WebClient bean config:
@Bean
WebClient webClient() {
return WebClient.builder()
.build();
}
And here's how I use it:
Optional<String> resp = webClient.get()
.uri(uri)
.retrieve()
.bodyToMono(String.class)
.blockOptional();
And here's a link to the page that I'm trying to scrape: https://plan.polsl.pl/plan.php?winW=1000&winH=1000&type=0&id=343126158
I've no idea what to change in the WebClient configuration to get the desired effect, so I'm asking for help.
I don't know Polish, but your problem is very likely related to the encoding of the response.
You can try to specify UTF-8 as the accepted charset and see if that helps:
WebClient webClient = WebClient.create();
Mono<String> response = webClient.get()
.uri(uri)
.acceptCharset(StandardCharsets.UTF_8)
.retrieve()
.bodyToMono(String.class);
String responseString = response.block();
== Updated 1/2/2023 ==
WebClient decodes the response body into a Java String using the charset from the Content-Type header, and falls back to UTF-8 when none is given. That's why we asked the web server to return the document in UTF-8. Unfortunately, the web server that you specified above returns ISO-8859-2 even though WebClient requests UTF-8 (Accept-Charset is only a preference, which a server may ignore). You will have to decode the response bytes as ISO-8859-2 yourself. Here is sample code to do that. I tested it with your web server.
WebClient webClient = WebClient.create();
Mono<ByteArrayResource> responseBody = webClient.get()
.uri(uri)
.retrieve()
.bodyToMono(ByteArrayResource.class);
String responseString = new String(responseBody.block().getByteArray(), Charset.forName("ISO-8859-2"));
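To see why the UTF-8 fallback produces the � characters, here is a plain-Java sketch with no WebClient involved (class name illustrative). It encodes the sample text as ISO-8859-2 bytes, as the server does, then decodes the same bytes both ways.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Iso88592Demo {

    static final Charset ISO_8859_2 = Charset.forName("ISO-8859-2");

    // Decodes raw body bytes with the given charset.
    static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        // Simulate the server: the text encoded as ISO-8859-2 bytes.
        byte[] body = "<div>plan zaj\u0119\u0107</div>".getBytes(ISO_8859_2);

        // Decoding these bytes as UTF-8 yields U+FFFD replacement characters.
        System.out.println(decode(body, StandardCharsets.UTF_8));
        // Decoding with the charset the server actually used restores the Polish letters.
        System.out.println(decode(body, ISO_8859_2));
    }
}
```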
If you are building a generic web crawler, instead of always hardcoding ISO-8859-2 as above, you will need to get the charset from the Content-Type header. Most web servers state both the media type and the charset there, e.g. text/html; charset=ISO-8859-2. You can then use that charset instead of the hardcoded one. Here is sample code to find the charset.
WebClient webClient = WebClient.create();
Mono<String> charset = webClient
.get()
.uri("http://example.com")
// exchangeToMono (Spring 5.3+) replaces the deprecated exchange() and releases the response afterwards
.exchangeToMono(res -> Mono.justOrEmpty(
res.headers().contentType()   // Optional<MediaType>: empty if there is no Content-Type header
.map(MediaType::getCharset)   // empty if the header has no charset parameter
.map(Charset::name)));
String charsetName = charset.block(); // null when the server did not declare a charset
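If you only have the raw Content-Type value as a String (for example from another HTTP client), a regular expression also works. The plain-Java sketch below (names illustrative) matches the charset parameter case-insensitively, tolerates quotes, and returns an empty Optional when the header is missing, has no charset, or names a charset the JVM does not support.

```java
import java.nio.charset.Charset;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContentTypeCharset {

    // Matches e.g. "charset=ISO-8859-2" or "Charset=\"utf-8\"".
    private static final Pattern CHARSET_PARAM =
            Pattern.compile("(?i)charset\\s*=\\s*\"?([^\";\\s]+)\"?");

    // Returns the charset named in a Content-Type header value, if any.
    static Optional<Charset> fromContentType(String contentType) {
        if (contentType == null) {
            return Optional.empty();
        }
        Matcher m = CHARSET_PARAM.matcher(contentType);
        if (!m.find()) {
            return Optional.empty();
        }
        try {
            return Optional.of(Charset.forName(m.group(1)));
        } catch (IllegalArgumentException e) {
            // IllegalCharsetNameException and UnsupportedCharsetException both land here.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(fromContentType("text/html; charset=ISO-8859-2")
                .map(Charset::name).orElse("none")); // ISO-8859-2
        System.out.println(fromContentType("text/html")
                .map(Charset::name).orElse("none")); // none
    }
}
```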
Unfortunately, the web server that you specified doesn't state the charset in its Content-Type header either. In this case, you need to look elsewhere in the response to determine the character encoding.
One place you can check is the charset attribute of the <meta> element in the HTML document. Many pages include a <meta> element with a charset attribute that specifies the character encoding of the document. This is how I found out that your document uses ISO-8859-2.
WebClient doesn't have an easy way to extract the charset from the <meta> tag, but you can use a regular expression to extract it. Here is sample code:
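WebClient webClient = WebClient.create();
Mono<String> responseBody = webClient
.get()
.uri("http://example.com")
.retrieve()
.bodyToMono(String.class);
// the <meta> markup is ASCII, so it stays readable even if the body was decoded with the wrong charset
Pattern metaCharset = Pattern.compile("(?i)<meta[^>]+charset=[\"']?([^\"'>]+)[\"']?");
String charset = responseBody
.flatMap(html -> {
// use a regular expression to extract the charset attribute from the <meta> element
Matcher m = metaCharset.matcher(html);
// Reactor operators must not emit null, so a missing charset becomes an empty Mono
return Mono.justOrEmpty(m.find() ? m.group(1) : null);
})
.block(); // null when no <meta> charset was found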
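A variant that avoids decoding the body twice: fetch the raw bytes (for example with bodyToMono(byte[].class)), look for the <meta> charset in an ISO-8859-1 view of those bytes, then decode with whatever charset it names. The sketch below covers only the byte handling in plain Java; the class name is illustrative, and falling back to UTF-8 is an assumption, not something the server tells you.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetaCharsetDecoder {

    // Case-insensitive, so <META CHARSET=...> and the http-equiv form both match.
    private static final Pattern META_CHARSET =
            Pattern.compile("(?i)<meta[^>]+charset=[\"']?([^\"'>;\\s]+)");

    // Decodes raw HTML bytes using the charset declared in a <meta> tag,
    // falling back to UTF-8 when none is declared or the name is unknown.
    static String decode(byte[] body) {
        // ISO-8859-1 maps every byte to exactly one char, so the ASCII <meta>
        // markup survives this first pass whatever the real encoding is.
        String view = new String(body, StandardCharsets.ISO_8859_1);
        Charset charset = StandardCharsets.UTF_8;
        Matcher m = META_CHARSET.matcher(view);
        if (m.find()) {
            try {
                charset = Charset.forName(m.group(1));
            } catch (IllegalArgumentException e) {
                // Unknown or malformed charset name: keep the UTF-8 default.
            }
        }
        return new String(body, charset);
    }

    public static void main(String[] args) {
        String html = "<html><head><meta http-equiv=\"Content-Type\" "
                + "content=\"text/html; charset=iso-8859-2\"></head>"
                + "<body><div>plan zaj\u0119\u0107</div></body></html>";
        byte[] body = html.getBytes(Charset.forName("ISO-8859-2"));
        System.out.println(decode(body));
    }
}
```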

Postman correctly url encodes ø but RestTemplate will not

I have a url, that has ø in it. For example this:
https://server/ø
When I make the GET call in postman, the url is converted correctly into
https://server/%C3%B8
But when I use Java code like this:
UriComponentsBuilder builder = UriComponentsBuilder.fromUri(new URI(url));
String out = restTemplate.getForObject(builder.toUriString(), String.class);
it turns the URL into this instead:
https://server/%25C3%25B8
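The doubled %25 suggests the URL is being encoded twice: UriComponentsBuilder.toUriString() already encodes, and getForObject(String, ...) treats its String argument as a URI template and encodes it again. Passing a java.net.URI instead, e.g. restTemplate.getForObject(builder.build().encode().toUri(), String.class), should avoid the second pass. The plain-Java sketch below (no Spring; names illustrative) reproduces the effect with java.net.URI, whose multi-argument constructors always quote '%'.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class DoubleEncodingDemo {

    // Builds an ASCII-only URL from an unencoded path: java.net.URI quotes illegal
    // characters, and toASCIIString() percent-encodes non-ASCII ones as UTF-8.
    static String encodeOnce(String scheme, String host, String rawPath) {
        try {
            return new URI(scheme, host, rawPath, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Raw input: encoded exactly once.
        System.out.println(encodeOnce("https", "server", "/\u00f8"));   // https://server/%C3%B8

        // Already-encoded input: '%' is quoted again, so the server receives a different path.
        System.out.println(encodeOnce("https", "server", "/%C3%B8"));   // https://server/%25C3%25B8
    }
}
```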

Spring RestTemplate API query parameter encoding for doing a GET HTTP Request

The url-string contains a back-slash character that needs to be encoded. The url string is as follows.
String folder = "\\Foo\\Bar\\"; // some folder search path.
String urlString = "http://localhost:8081/certificates/?mypath=%5CFoo%5CBar%5C"; // (after encoding)
Here I use Spring RestTemplate to do a GET request. I set up a mock server to examine the request in detail (mock server set up using MuleSoft, if you must know!).
ResponseEntity<String> responseEntity = api.exchange(urlString, HttpMethod.GET, new HttpEntity<>(new HttpHeaders()), String.class);
Here I use plain vanilla Java URLConnection to perform the request. Attached image with detailed request snapshot.
// 2. Plain vanilla java URLConnection. "result.toString()" has certificate match.
StringBuilder result = new StringBuilder();
URL url = new URL(urlString);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestProperty("X-Venafi-Api-Key", apiKey);
conn.setRequestMethod("GET");
BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
String line;
while ((line = rd.readLine()) != null) {
result.append(line);
}
rd.close();
System.out.println(result.toString());
In the images, you can see that the queryString value is different for these two requests. One of them shows \\ while the other shows %5C, although the parsed parameter value for myPath is still the same.
I am having to deal with an api that seems to work if-and-only-if the queryString looks like the former (i.e. "\\"). Why does the parsed queryString for Spring show "%5C" while this value shows double-backslash for requests originating from plain Java, curl, and even a simple browser?
What baffles me EVEN more is that just about everything about the two HTTP requests is IDENTICAL! So why does the queryString/requestUri parse differently for these two requests? Shouldn't an HTTP GET request be completely defined by its headers and the requestUri? What am I failing to capture in these two GET requests?
Lots of questions. Spent an entire day, but at least I could verify that the way the requestUri/queryString is parsed seems to align with how the remote api-server responds.
Thanks.
Did some digging around the following morning. It turns out that with
ResponseEntity<String> responseEntity = api.exchange(urlString, HttpMethod.GET, new HttpEntity<>(new HttpHeaders()), String.class);
you should NOT pass an already-encoded "urlString". The exchange method does that encoding for you under the hood, so a pre-encoded string ends up encoded twice.
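A plain-Java illustration of the encode-once rule (no Spring; class name illustrative): the server decodes the query value once, so only a single round of encoding gets \Foo\Bar\ back intact. In Spring terms, one way to hand RestTemplate the raw value is as a URI variable, e.g. api.exchange("http://localhost:8081/certificates/?mypath={mypath}", HttpMethod.GET, entity, String.class, folder), and let it do the encoding.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EncodeOnceDemo {

    // Form-style percent-encoding, as used for query parameter values.
    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // What the server effectively does to the query value: decode exactly once.
    static String decode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String folder = "\\Foo\\Bar\\";

        String once = encode(folder);   // %5CFoo%5CBar%5C
        String twice = encode(once);    // %255CFoo%255CBar%255C

        System.out.println(decode(once).equals(folder));  // true: original path recovered
        System.out.println(decode(twice).equals(folder)); // false: server sees "%5CFoo%5CBar%5C"
    }
}
```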

Spring MVC URL: why do I get a 404 when I encode a linefeed in the URL?

I am sending a URL with certain parameters to my controller, which generally works fine. I am using the JavaScript function encodeURI() to encode the parameter.
But as soon as there is a linefeed, I receive a 404 error.
This is a working url:
http://localhost:8080/Weasy/virtualtable/execQuery/46/select%20*%20from%20payment
This is a non-working url:
http://localhost:8080/Weasy/virtualtable/execQuery/46/select%20*%20%0Afrom%20payment
And this is my controller method:
@RequestMapping("execQuery/{schema_id}/{query}")
public ModelAndView execQuery(
@PathVariable("schema_id") Integer schemaId
, @PathVariable("query") String query) throws Exception {
SrcSchema schema = this.srcschemaService.getRowById(schemaId);
ModelAndView mav = new ModelAndView("virtualtable/form");
mav.addObject("schema", schema);
mav.addObject("query", query);
try {
int limit = 10;
List<Map<String, Object>> rows = jdbcService.executeQuery(schema.getConnection(), query, limit);
mav.addObject("rows", rows);
mav.addObject("message", "<span class='msg-info'>Result Set reduced to "+limit+" rows</span>");
} catch (Exception ex) {
logger.error("Error executing sql", ex);
mav.addObject("message", "<span class='msg-error'>"+ex.getMessage()+"</span>");
}
return mav;
}
Why does it not work?
I do not think it is your app. I guess it is your web server blocking the request because you are using an unusual character. Apache, for example, denies access to URLs containing %2F (/), %5C (\) or %00 (NULL). As a rule, the ASCII characters between %00 and %1F, called control characters, should not be present in URLs, and %0A is one of them.
My advice is to sanitize your query and remove not only %0A but any other problematic characters before making the request.
If you still want to make it work, I think you need to add a RewriteRule to your .htaccess (I guess you are using Apache) that uses a regular expression to remove the linefeed and redirect to the same URL without that character.
Apache URL Rewriting Guide
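If you follow the advice to clean the query before sending it, a minimal Java sketch could look like this (class name illustrative). It assumes that replacing each control character with a space is acceptable for your SQL; deleting it outright could glue two keywords together, e.g. "payment\nwhere" would become "paymentwhere". In the browser, the same cleanup would happen before calling encodeURI().

```java
public class QuerySanitizer {

    // Replaces ASCII control characters (U+0000-U+001F and U+007F), which include the
    // linefeed that encodeURI() turns into %0A, with a plain space.
    static String stripControlChars(String query) {
        return query.replaceAll("\\p{Cntrl}", " ");
    }

    public static void main(String[] args) {
        String query = "select * \nfrom payment";
        System.out.println(stripControlChars(query)); // select *  from payment
    }
}
```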

Rest query params in Java

I need to do some queries against my datastore in Java but I can't seem to get the parameters syntax right. I tried like this:
String params = "?Active=1";
String urlString = "https://api.parse.com/1/classes/Cars" + params;
Or, as per the documentation:
String params = "where={Active:1}";
But both ways generate an exception.
If I don't do the query and simply try to get all the objects with this request string:
String urlString = "https://api.parse.com/1/classes/Cars"
everything works fine. So the problem is definitely the params sequence. So is there a way to do Parse.com REST queries in Java?
EDIT: adding the exception string in response to a request from the first comment:
java.io.IOException: Server returned HTTP response code: 400 for URL: https://api.parse.com/1/classes/Cars?where={Active:1}
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1838)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1439)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:254)
I should also note that when I use the regular http syntax, as in
params = "?Dealer=asdf";
the query comes back with all the objects, as if the parameter wasn't there.
Here are a couple of working examples for the params string:
String params = "where={\"objectId\":\"ldl49l3kd98\"}";
String params = "where={\"CompanyName\":\"BMW\", \"Price\":{\"$gte\":29000,\"$lte\":49000}}";
And if you need non-English characters, like I do, URL-encode the JSON value (not the where= prefix, whose = must stay literal) like this:
params = "where=" + URLEncoder.encode("{\"CompanyName\":\"BMW\"}", "UTF-8");
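Putting the working examples together, here is a plain-Java sketch that builds the request URL with only the JSON value encoded (the method name classQueryUrl is illustrative). Note that the JSON keys are quoted, as in the working examples; where={Active:1} is not valid JSON. Encoding the braces and quotes also keeps the URL free of characters that are not legal in a query string.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class ParseQueryUrl {

    // Builds a Parse.com class query URL with a JSON "where" constraint.
    // Only the JSON value is encoded; "where=" stays literal.
    static String classQueryUrl(String className, String whereJson) {
        return "https://api.parse.com/1/classes/" + className
                + "?where=" + URLEncoder.encode(whereJson, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // https://api.parse.com/1/classes/Cars?where=%7B%22Active%22%3A1%7D
        System.out.println(classQueryUrl("Cars", "{\"Active\":1}"));
    }
}
```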
