Apache HTTP Client forcing UTF-8 encoding

I'm making a REST call using the org.apache.http package, as below. I'm expecting user profile details in the response in English and other international languages.
HttpGet req = new HttpGet(baseUrl + uri);
HttpResponse res = closeableHttpClient.execute(req);
The response has UTF-8 as its character set, which is what I wanted. From here, I used two approaches to unmarshal the response into a map.
Approach-1:
String response = EntityUtils.toString(res.getEntity(), "UTF-8");
// String response = EntityUtils.toString(res.getEntity(), Charset.forName("UTF-8"));
map = jsonConversionUtil.convertStringtoMap(response);
Issue:
res.getEntity() was returning a StringEntity object whose default charset was ISO-8859-1, and even when I force a conversion to UTF-8 (I tried both the uncommented line and the commented line above), I'm not able to override it to UTF-8.
Approach-2:
HttpEntity responseEntity = res.getEntity();
if (responseEntity != null) {
    InputStream contentStream = responseEntity.getContent();
    if (contentStream != null) {
        String response = IOUtils.toString(contentStream, "UTF-8");
        map = jsonConversionUtil.convertStringtoMap(response);
    }
}
Issue:
IOUtils.toString(contentStream, "UTF-8") is not producing UTF-8.
I am using the httpclient 4.3.2 and httpcore 4.3.1 jars. The Java version used is Java 6; I can't upgrade to a higher Java version.
Can you please guide me on how I can force the UTF-8 format?

If the StringEntity object has an ISO-8859-1 encoding, then the server has returned its response encoded as ISO-8859-1. Your assumption that "the response has UTF-8 as character set" is most likely wrong.
Since it's ISO-8859-1, neither of your approaches works:
Approach 1: The "UTF-8" parameter has no effect, as it only specifies the default encoding for the case where the server doesn't specify one (see EntityUtils.toString()). But the server has obviously specified one.
Approach 2: Reading the binary content as UTF-8 when it is in fact encoded in ISO-8859-1 will likely result in garbage (though all ASCII characters have the same representation in both encodings).
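As a quick illustration (a hypothetical snippet, not from the original question): decoding ISO-8859-1 bytes as if they were UTF-8 turns every non-ASCII character into the replacement character.
import java.nio.charset.Charset;

public class MojibakeDemo {
    public static void main(String[] args) {
        // Charset.forName instead of StandardCharsets keeps this Java 6 compatible.
        Charset iso = Charset.forName("ISO-8859-1");
        Charset utf8 = Charset.forName("UTF-8");
        // "£" is the single byte 0xA3 in ISO-8859-1 ...
        byte[] isoBytes = "£25".getBytes(iso);
        // ... but a lone 0xA3 is not a valid UTF-8 sequence, so decoding it as
        // UTF-8 yields the replacement character: "\uFFFD25"
        System.out.println(new String(isoBytes, utf8));
    }
}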
So try to ask the server to return UTF-8:
HttpGet req = new HttpGet(baseUrl + uri);
req.addHeader("Accept", "application/json");
req.addHeader("Accept-Charset", "utf-8");
HttpResponse res = closeableHttpClient.execute(req);
If it disregards the specified character set and still returns JSON in ISO-8859-1, then it will be unable to use characters outside the ISO-8859-1 range (unless it uses escaping within the JSON).
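To confirm which charset the server actually declared, you can inspect the entity's Content-Type; here is a minimal sketch using the ContentType helper that ships with httpcore 4.3 (variable names are illustrative):
import java.nio.charset.Charset;
import org.apache.http.HttpEntity;
import org.apache.http.entity.ContentType;

HttpEntity entity = res.getEntity();
// getOrDefault falls back to text/plain with ISO-8859-1 when no Content-Type header is present.
ContentType contentType = ContentType.getOrDefault(entity);
Charset charset = contentType.getCharset(); // null if no charset was declared
System.out.println("Declared charset: " + charset);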

Related

Why is the signature verification not working when the signature is constructed by node-forge?

I have a Nuxt application that needs to retrieve some information from a Spring Boot-based auth service.
Right now I sign a text message in the Nuxt app (the auth server knows that text message) using node-forge, and then I send it, encrypted and with the signature, for verification on the auth service.
The problem is that the auth service keeps telling me that the size of the signature is wrong, with a java.security.SignatureException: Signature length not correct: got 3XX but was expecting 256.
Here is the code generating the encrypted message and signature on the Nuxt side:
var md = forge.md.sha256.create();
md.update("123"); // for example purposes
var sign = pPrivateKey.sign(md);
var digestBytes = md.digest().bytes();
console.log("Signature:", sign );
console.log("Encrypted:", digestBytes);
console.log("Encrypted B64:", Buffer.from(digestBytes).toString("base64"));
var keyAuthB64Url = Buffer.from(digestBytes).toString("base64url");
var signB64Url = Buffer.from(sign).toString("base64url");
var jwt = await axios.get(process.env.URL + "/auth", { params: { encrypted: keyAuthB64Url, signature: signB64Url } });
On the auth service I have the following code:
byte[] messageBytes = Base64.getUrlDecoder().decode(encryptedMessage);
byte[] signatureBytes = Base64.getUrlDecoder().decode(signature);
Signature sign = Signature.getInstance("SHA256withRSA");
sign.initVerify(certPublicKey);
sign.update(messageBytes);
boolean verified = sign.verify(signatureBytes);
if (!verified) {
    throw new Exception("Not verified!");
}
From all the debugging I have done, it seems the Spring Boot app has a problem with the signature generated by node-forge on the Nuxt side; with a signature generated in the Spring Boot app itself, the verification works.
There are several issues:
First, the bug already mentioned in the comments: the NodeJS code hashes the message explicitly, while SHA256withRSA on the Java side hashes implicitly. Therefore, the Java side must not hash again; it has to be given the raw message rather than the digest:
byte[] messageBytes = "123".getBytes("utf-8");
...
sign.update(messageBytes); // Fix 1: Don't hash
Also, in the NodeJS code, sign() returns the signature as a binary string, which must therefore be imported into a NodeJS Buffer with the 'binary' encoding:
var keyAuthB64Url = Buffer.from(digestBytes, "binary").toString("base64url"); // Fix 2: Import via 'binary' encoding
Without explicit specification of the encoding, a UTF-8 encoding is performed by default, which irreversibly corrupts the data.
And third, latin1 is implicitly used as encoding when generating the hash in the NodeJS code. Other encodings must be specified explicitly, e.g. for the common UTF-8 with utf8:
md.update("123", "utf8"); // Fix 3: Specify the encoding
For the example data 123 used here, this fix has no effect, which changes as soon as characters with a Unicode value larger than 0x7f are included, e.g. 123§. Note that there is little margin for error in the specification of the encoding, e.g. utf-8 would be ignored (because of the hyphen) and latin1 would be used silently.
With these fixes, verification with the Java code works.
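For reference, the corrected Java side might look like this as a whole (a sketch under the assumptions above: certPublicKey is the RSA public key matching pPrivateKey, and signature is the base64url string sent by the Nuxt app):
import java.nio.charset.StandardCharsets;
import java.security.Signature;
import java.util.Base64;

byte[] signatureBytes = Base64.getUrlDecoder().decode(signature);
Signature sign = Signature.getInstance("SHA256withRSA");
sign.initVerify(certPublicKey);
// Fix 1: update with the raw message; SHA256withRSA hashes it internally.
sign.update("123".getBytes(StandardCharsets.UTF_8));
boolean verified = sign.verify(signatureBytes);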

Wrong filename when downloading file whose name contains double quote(") from Springboot server

The code for setting the filename of the file to be downloaded:
String originalFileNameDecoded = URLDecoder.decode(originalFileName, "UTF-8");
URI uri = new URI(null, null, originalFileNameDecoded, null);
return ResponseEntity.ok()
.header("Content-Disposition", "attachment; filename=\"" + uri.toASCIIString() + "\"")
.contentLength(resource.contentLength())
.contentType(org.springframework.http.MediaType.APPLICATION_OCTET_STREAM)
.body(resource);
The reason I decode the filename first is that originalFileName may contain URL-encoded characters.
For files with regular names (only numbers and English letters), it works fine. However, when I try to download a file with a name like pic201;9050.814,3"731(copy).png in the browser (Chrome on Linux), the filename becomes pic201;9050.814,3_731(copy).png.
At first I believed this was browser behaviour, but I tried it in Edge and the same thing happened.
So I wonder if there is something wrong with my code, or if something else is going on.
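There is no accepted answer in this excerpt, but one commonly used approach (an assumption on my part, not from the thread) is to send the name via an RFC 5987 filename* parameter rather than embedding the raw quote inside filename="..."; Spring Framework 5+ can build such a header:
import java.nio.charset.StandardCharsets;
import org.springframework.http.ContentDisposition;
import org.springframework.http.HttpHeaders;

// Hypothetical sketch: ContentDisposition percent-encodes the name into a
// filename* parameter, so characters like '"' survive instead of being
// replaced by the browser.
ContentDisposition disposition = ContentDisposition.builder("attachment")
        .filename(originalFileNameDecoded, StandardCharsets.UTF_8)
        .build();
return ResponseEntity.ok()
        .header(HttpHeaders.CONTENT_DISPOSITION, disposition.toString())
        .contentLength(resource.contentLength())
        .contentType(org.springframework.http.MediaType.APPLICATION_OCTET_STREAM)
        .body(resource);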

Web Api Request Content Length Double the File Size

I upload a 3.5 MB file from my MVC app. When I go to send that file to my Web API endpoint, I notice the request's content-length is double the size of the file (7 MB).
I tested this theory with a 5 MB file, and sure enough the content-length when sending to the Web API was 10 MB.
Below is how I am sending the file to my web api endpoint:
using (HttpClient client = new HttpClient())
{
    client.BaseAddress = new Uri(url);
    client.DefaultRequestHeaders.Accept.Clear();
    client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
    return await client.PostAsync(requestUri, new StringContent(serializedContent, Encoding.Unicode, "application/json"));
}
I am calling this method from my MVC controller in the POST method. Why does my content-length get doubled?
UPDATE:
I should note that I am using JSON.NET's JsonConvert.SerializeObject method to convert the object that contains the byte array to a string.
You're using Encoding.Unicode, which is UTF-16 and uses two bytes per character. If you want to save roughly half the space, use Encoding.UTF8, which uses one byte per ASCII character. Note that characters which can't be expressed in a single byte will use multiple bytes.
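To see why the payload doubles, compare the byte counts directly (an illustrative snippet in Java; C#'s Encoding.Unicode corresponds to UTF-16LE here):
import java.nio.charset.StandardCharsets;

public class EncodingSizeDemo {
    public static void main(String[] args) {
        String json = "{\"data\":\"hello\"}";
        // UTF-16 (what Encoding.Unicode produces) spends two bytes per ASCII character...
        System.out.println(json.getBytes(StandardCharsets.UTF_16LE).length); // 32
        // ...while UTF-8 spends one, halving the payload for ASCII-heavy JSON.
        System.out.println(json.getBytes(StandardCharsets.UTF_8).length);    // 16
    }
}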

saveAsNewAPIHadoopFile changing the character encoding to UTF-8

I am trying to save an RDD encoded in the ISO-8859-1 charset to an AWS S3 bucket using saveAsNewAPIHadoopFile, but the character encoding changes to UTF-8 when it is saved to the S3 bucket.
Code snippet:
val cell = " MYCOST £25" // this String is UTF-8 encoded
val charset: Charset = Charset.forName("ISO-8859-1")
val cellData = cell.padTo(50, ' ').mkString
val isoData = new String(cellData.getBytes(charset), charset) // here it converts the string from UTF-8 to ISO-8859-1
But when I save the file using saveAsNewAPIHadoopFile, it changes to UTF-8.
I think the TextOutputFormat used by saveAsNewAPIHadoopFile automatically converts the output encoding to UTF-8. Is there a way I can save the content to the S3 bucket in the original encoding (ISO-8859-1)?
ds.rdd.map { record =>
  val cellData = record.padTo(50, ' ').mkString
  new String(cellData.getBytes("ISO-8859-1"), "ISO-8859-1")
}.mapPartitions { iter =>
  val text = new Text()
  iter.map { item =>
    text.set(item)
    (NullWritable.get(), text)
  }
}.saveAsNewAPIHadoopFile("s3://mybucket/", classOf[NullWritable], classOf[Text], classOf[TextOutputFormat[NullWritable, Text]])
Appreciate your help.
I still haven't found the correct answer, but as a workaround I am copying the file to HDFS, converting it to ISO-8859-1 using iconv, and saving it back to the S3 bucket. This does the job for me, but it requires two extra steps in the EMR cluster.
I thought it might be useful to anyone who comes across the same problem.
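A possible alternative to the iconv round trip (my assumption, not verified in the thread): Hadoop's Text stores raw bytes, and TextOutputFormat writes a Text value's bytes verbatim, so filling it with ISO-8859-1 bytes instead of a Java String should keep the output in ISO-8859-1:
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.io.Text;

// 'record' stands for one padded cell string from the RDD.
String record = " MYCOST £25";
byte[] isoBytes = record.getBytes(StandardCharsets.ISO_8859_1);
Text text = new Text();
// Text.set(byte[]) copies the bytes as-is; TextOutputFormat then writes them
// verbatim, bypassing the usual String -> UTF-8 conversion.
text.set(isoBytes);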

Encoding to UTF-8 files in hadoop

I'm writing a MapReduce program to clean some files stored in HDFS. For that I have to encode all files in UTF-8. I tried to encode the Text value in my mapper, but I still have errors in my result file.
if (encoding.compareTo("UTF-8") != 0) {
    final Charset fromCharset = Charset.forName(encoding);
    final Charset toCharset = Charset.forName("UTF-8");
    String fixed = new String(value.toString().getBytes(fromCharset), toCharset);
    result = fixed;
}
I also customized the LineReader to encode the bytes read in a line into UTF-8 before they are stored in the Text object.
// buffer contains the data read from a line of the file
String s = new String(buffer, startPosn, appendLength);
byte[] ptext = Charset.forName("UTF-8").encode(s).array();
str.append(ptext, 0, ptext.length);
Can you help me, please?
I found the solution:
if (encoding.compareTo("CP1252") == 0)
    valueInString = new String(value.getBytes(), 0, value.getLength(),
            StandardCharsets.ISO_8859_1);
else
    valueInString = value.toString();
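In context, the fix might sit in a mapper roughly like this (a sketch with assumed names; 'encoding' would come from the job configuration):
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CleaningMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
    private final String encoding = "CP1252"; // assumed to be read from the job configuration

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        final String valueInString;
        if (encoding.compareTo("CP1252") == 0) {
            // Decode the raw bytes as ISO-8859-1 (close to CP1252) instead of
            // using Text.toString(), which assumes the bytes are valid UTF-8.
            valueInString = new String(value.getBytes(), 0, value.getLength(),
                    StandardCharsets.ISO_8859_1);
        } else {
            valueInString = value.toString();
        }
        context.write(key, new Text(valueInString)); // Text re-encodes as UTF-8
    }
}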
