HtmlUnit: Encoding for Chinese Website - utf-8

I expect this is pretty basic:
When downloading pages from a Chinese website, all Chinese characters appear as "?" in the saved file (written via Java NIO Files.write).
I know the Chinese webpage is retrieved as UTF-8 (page.getPageEncoding() returns "UTF-8"), but something goes wrong in my saving of the webpage.
My code is as follows:
final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_45);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setTimeout(15000);
final HtmlPage page = webClient.getPage(urlNow);
pageAsXml = page.asXml();
NioLog.getLogger().debug(page.getPageEncoding());
Files.write(Paths.get(outputPath + File.separator + fileNameTruncated + TXT), pageAsXml.getBytes());

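The answer is as follows. Calling getBytes() with no argument encodes with the platform default charset, so pass UTF-8 explicitly:
byte[] barrayXml = page.asXml().getBytes(StandardCharsets.UTF_8);
Files.write(Paths.get(outputPath + File.separator + fileNameTruncated + TXT), barrayXml);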
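To see why the charset argument matters, here is a minimal sketch that does not use HtmlUnit at all. The sample string and the temp file are hypothetical stand-ins for page.asXml() and the real output path; the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8WriteDemo {
    // Writes text as UTF-8, independent of the platform default charset.
    static void writeUtf8(Path target, String text) throws IOException {
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }

    // Reads the file back, decoding it as UTF-8.
    static String readUtf8(Path source) throws IOException {
        return new String(Files.readAllBytes(source), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for page.asXml(): two Chinese characters in a tag.
        String sample = "<p>\u4e2d\u6587</p>";
        Path tmp = Files.createTempFile("page", ".txt");

        writeUtf8(tmp, sample);
        System.out.println("round trip intact: " + sample.equals(readUtf8(tmp)));

        // A charset that cannot represent the characters substitutes '?',
        // which is the symptom described in the question.
        byte[] lossy = sample.getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(lossy, StandardCharsets.US_ASCII));

        Files.delete(tmp);
    }
}
```

Whether the default charset on a given machine is lossy depends on the JVM and OS settings, so naming the charset explicitly is the safer habit.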

Related

Request signature failing for Alibaba Cloud API call

I tried creating a method in Postman and got really close but am having issues with the signature. We are trying to query the IP ranges for VPCs to add to a WAF rule, in order to allow traffic to a secure application.
Postman allows a pre-request script, written in JavaScript, and supports a handful of bundled JS libraries, including CryptoJS.
The code here creates exactly the request that Ali Cloud says needs to be signed. It signs with HMAC-SHA1 from CryptoJS and encodes to base 64.
All of the variables are included in the request parameters. I'm not sure what else it could be complaining about.
var dateIso = new Date().toISOString();
var randomString = function(length) {
var text = "";
var possible = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
for(var i = 0; i < length; i++) {
text += possible.charAt(Math.floor(Math.random() * possible.length));
}
return text;
}
var accesskeyid = "LTAI4GC7VEijsm5bV3zwcZxZ"
var action = "DescribePublicIPAddress"
var format = "XML"
var regionid = "cn-shanghai-eu13-a01"
var signaturemethod = "HMAC-SHA1"
var signaturenonce = randomString(16)
var signatureversion = "1.0"
var timestamp = dateIso.replace(/:/gi, "%253A")
var version = "2016-04-28"
pm.environment.set("AccessKeyId", accesskeyid)
pm.environment.set("Action", action)
pm.environment.set("Format", format)
pm.environment.set("RegionID", regionid)
pm.environment.set("SignatureMethod", signaturemethod)
pm.environment.set("SignatureNonce", signaturenonce)
pm.environment.set("SignatureVersion", signatureversion)
pm.environment.set("Timestamp", dateIso)
pm.environment.set("Version", version)
var request = "GET&%2F&" + "AccessKeyID%3D" + accesskeyid + "%26Action%3D" + action + "%26Format%3D" + format + "%26RegionID%3D" + regionid + "%26SignatureMethod%3D" + signaturemethod + "%26SignatureNonce%3D" + signaturenonce + "%26SignatureVersion%3D" + signatureversion + "%26Timestamp%3D" + timestamp + "%26Version%3D" + version
pm.environment.set("Request", request)
var hash = CryptoJS.HmacSHA1(request, "spwH5dNeEm4t4dlpqvYWVGgf7aEAxB&")
var base64 = CryptoJS.enc.Base64.stringify(hash)
var encodesig = encodeURIComponent(base64)
pm.environment.set("Signature", encodesig);
console.log(base64)
console.log(request)
The console output shows:
Signature: XbVi12iApzZ0rRgJLBv0ytJJ0LY=
Parameter string to be signed:
GET&%2F&AccessKeyID%3DLTAI4GC7VEijsm5bV3zwcZvC%26Action%3DDescribePublicIPAddress%26Format%3DXML%26RegionID%3Dcn-shanghai-eu13-a01%26SignatureMethod%3DHMAC-SHA1%26SignatureNonce%3DiP1QJtbasQNSOxVY%26SignatureVersion%3D1.0%26Timestamp%3D2020-06-01T15%253A38%253A12.266Z%26Version%3D2016-04-28
Request sent:
GET https://vpc.aliyuncs.com/?AccessKeyID=LTAI4GC7VEijsm5bV3zwcZvC&Action=DescribePublicIPAddress&Format=XML&RegionID=cn-shanghai-eu13-a01&SignatureMethod=HMAC-SHA1&SignatureNonce=iP1QJtbasQNSOxVY&SignatureVersion=1.0&Timestamp=2020-06-01T15:38:12.266Z&Version=2016-04-28&Signature=XbVi12iApzZ0rRgJLBv0ytJJ0LY%3D
Response Received:
<?xml version='1.0' encoding='UTF-8'?><Error><RequestId>B16D216F-56ED-4D16-9CEC-633C303F2B61</RequestId><HostId>vpc.aliyuncs.com</HostId><Code>IncompleteSignature</Code><Message>The request signature does not conform to Aliyun standards. server string to sign is:GET&%2F&AccessKeyID%3DLTAI4GC7VEijsm5bV3zwcZvC%26Action%3DDescribePublicIPAddress%26Format%3DXML%26RegionID%3Dcn-shanghai-eu13-a01%26SignatureMethod%3DHMAC-SHA1%26SignatureNonce%3DiP1QJtbasQNSOxVY%26SignatureVersion%3D1.0%26Timestamp%3D2020-06-01T15%253A38%253A12.266Z%26Version%3D2016-04-28</Message><Recommend><![CDATA[https://error-center.aliyun.com/status/search?Keyword=IncompleteSignature&source=PopGw]]></Recommend></Error>
When I check the "server string to sign" from the response and the parameter string that was signed in a compare, they are identical.
It looks like everything is built as needed but the signature is still barking. Guessing I missed something simple but haven't found it yet.
Note: The accesskeyID and key posted are for example purposes and not a real account so this code will not copy and paste to execute in Postman.
PS - I learned quite a bit from the other few threads on this topic, which is how I got to this point. akostadinov was super helpful on another thread.
I believe you have double-encoded the &. I have implemented other Alibaba Cloud REST APIs successfully. Could you please check this?
Following is the expected string to sign format:
GET&%2F&AccessKeyId%3Dtestid&Action%3DDescribeVpcs&Format%3DXML&SignatureMethod%3DHMAC-SHA1&SignatureNonce%3D3ee8c1b8-83d3-44af-a94f-4e0ad82fd6cf&SignatureVersion%3D1.0&TimeStamp%3D2016-02-23T12%253A46%253A24Z&Version%3D2014-05-15
A bit late to the party, but as this is the first result when googling for the IncompleteSignature error, I thought I might comment and hopefully save someone else the grief I have been through.
For me, the subtle detail that I missed in the official documentation here is that the key used for the signature requires an ampersand & to be added to the end, before being used.
As soon as I caught that, everything else worked perfectly.
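The key detail from this answer (append "&" to the secret before using it as the HMAC key) can be sketched in Java as follows. This is only an illustration of that one step, not a complete Alibaba Cloud client: the class name, the shortened string to sign, and the secret are all hypothetical, and the string to sign is assumed to have been built already.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class AliyunSignSketch {
    // Returns Base64(HMAC-SHA1(stringToSign)), keyed with accessKeySecret + "&".
    static String sign(String stringToSign, String accessKeySecret)
            throws GeneralSecurityException {
        // The trailing ampersand is the detail the answer above points out.
        byte[] key = (accessKeySecret + "&").getBytes(StandardCharsets.UTF_8);
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(key, "HmacSHA1"));
        byte[] digest = mac.doFinal(stringToSign.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(digest);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        // Hypothetical, shortened string to sign and a made-up secret.
        String stringToSign = "GET&%2F&Action%3DDescribeVpcs";
        System.out.println(sign(stringToSign, "exampleSecret"));
    }
}
```

The Base64 result still has to be URL-encoded before it is placed in the Signature query parameter, as the question's script does with encodeURIComponent.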

Wrong filename when downloading file whose name contains double quote(") from Springboot server

The code for setting filename for the file to be downloaded :
String originalFileNameDecoded = URLDecoder.decode(originalFileName, "UTF-8");
URI uri = new URI(null, null, originalFileNameDecoded, null);
return ResponseEntity.ok()
.header("Content-Disposition", "attachment; filename=\"" + uri.toASCIIString() + "\"")
.contentLength(resource.contentLength())
.contentType(org.springframework.http.MediaType.APPLICATION_OCTET_STREAM)
.body(resource);
I decode the filename first because originalFileName may contain URL-encoded characters.
For files with regular names (only digits and English letters), it works fine. However, when I download a file with a name like pic201;9050.814,3"731(copy).png in the browser (Chrome on Linux), the filename becomes pic201;9050.814,3_731(copy).png.
I assumed this was browser behaviour, but I tried Edge and the same thing happened.
So I wonder if there is something wrong with my code or something else happened.
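The thread has no answer, so the following is only one possible direction, not a confirmed fix. It is a plain-Java sketch that builds a Content-Disposition value with a backslash-escaped quoted filename plus an RFC 5987 style filename* parameter. The class and method names are hypothetical, and a browser may still replace characters such as " with _ when it saves the file, regardless of what the header says.

```java
import java.nio.charset.StandardCharsets;

public class ContentDispositionSketch {
    // Characters that may appear unencoded in an RFC 5987 ext-value (attr-char).
    private static final String ATTR_CHARS =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!#$&+-.^_`|~";

    // Percent-encodes the UTF-8 bytes of the name for the filename* parameter.
    static String encodeExtValue(String name) {
        StringBuilder sb = new StringBuilder();
        for (byte b : name.getBytes(StandardCharsets.UTF_8)) {
            char c = (char) (b & 0xFF);
            if (ATTR_CHARS.indexOf(c) >= 0) {
                sb.append(c);
            } else {
                sb.append(String.format("%%%02X", b & 0xFF));
            }
        }
        return sb.toString();
    }

    // ASCII-only fallback for the quoted filename parameter:
    // escapes backslash and double quote, replaces anything non-printable or non-ASCII with '_'.
    static String quotedFallback(String name) {
        StringBuilder sb = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20 || c > 0x7E) {
                sb.append('_');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Combines both parameters into one header value.
    static String attachment(String name) {
        return "attachment; filename=\"" + quotedFallback(name)
            + "\"; filename*=UTF-8''" + encodeExtValue(name);
    }

    public static void main(String[] args) {
        // The file name from the question.
        System.out.println(attachment("pic201;9050.814,3\"731(copy).png"));
    }
}
```

In the controller from the question, the returned string would be passed as the value of the "Content-Disposition" header in place of the hand-built one.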

UpdateListItems method from SharePoint Lists Web Service fails with a Soap Server Exception

I'm developing an MS Office 2010 Excel add-in on a client machine that doesn't have SharePoint installed. I imported a Lists web service reference from a remote SharePoint server and developed a WPF user control that loads data from the list and shows it in the Excel worksheet. That works perfectly. I then extended the client application to update list items on the server, using the UpdateListItems method of the web service reference.
But it failed with the exception "Soap Server Exception.". I can't figure out what's wrong, since I can import data without any problem. Following is my code block.
SPListsWS.Lists myListUpdateProxy = new SPListsWS.Lists();
myListUpdateProxy.Credentials = CredentialCache.DefaultCredentials;
myListUpdateProxy.Url = "http://uvo1y1focm66gonf7gw.env.cloudshare.com/_vti_bin/Lists.asmx";
XmlNode listView = myListUpdateProxy.GetListAndView("Products", "");
string listID = listView.ChildNodes[0].Attributes["Name"].Value;
string viewID = listView.ChildNodes[1].Attributes["Name"].Value;
XmlDocument Xdoc = new XmlDocument();
XmlElement updateElement = Xdoc.CreateElement("updateElement");
updateElement.SetAttribute("OnError", "Continue");
updateElement.SetAttribute("ListVersion", "1");
updateElement.SetAttribute("ViewName", viewID);
updateElement.InnerXml = "<Method ID='1' Cmd='Update'>"
+ "<Field Name = 'ID'>" + index + "</Field>"
+ "<Field Name = 'Title'>" + prodTitle + "</Field>"
+ "<Field Name = 'Product_SKU'>" + prodSKU + "</Field>"
+ "<Field Name = 'Product_Price'>" + prodPrice + "</Field>"
+ "</Method>";
XmlNode responseXml = myListUpdateProxy.UpdateListItems("Products", updateElement);
MessageBox.Show(responseXml.OuterXml);
To update items you should use UpdateListItems instead of GetListItems. Also, when using UpdateListItems, wrap your <Method> tags in a <Batch> element. This would be in place of your updateElement. See if that works, and, if not, please include the responseText of the actual error message along with the version of SharePoint you are using.

Exporting a CSV file from MVC3 - Internet Explorer Issue

I'm trying to create a CSV export for some data I have. Seems simple enough, and works beautifully in Firefox and Chrome, but in Internet Explorer I just get a message saying the file could not be downloaded. No other error messages, no break in Visual Studio, no debugging information that I can find.
Here's my code. Perhaps I'm doing something wrong?
public ActionResult ExportStudentsCSV(IEnumerable<Student> students) {
MemoryStream output = new MemoryStream();
StreamWriter writer = new StreamWriter(output, System.Text.Encoding.UTF8);
writer.WriteLine("Username,Year Level,School Name,State,Date Joined");
foreach (Student student in students) {
writer.WriteLine(
"\"" + student.username
+ "\",\"" + student.year_level
+ "\",\"" + student.SchoolName
+ "\",\"" + student.state
+ "\",\"" + student.join_date
+ "\""
);
}
writer.Flush();
output.Seek(0, SeekOrigin.Begin);
return File(output, "text/csv", "Students_" + DateTime.Now.ToShortDateString().Replace('/', '-') + ".csv");
}
And I'm calling this function in my controller with:
return ExportStudentsCSV(model.StudentReport.StudentList);
You may need to add a Content-Disposition header.
In your ExportStudentsCSV function, before returning:
var cd = new System.Net.Mime.ContentDisposition();
cd.FileName = "filename.csv";
Response.AddHeader("Content-Disposition", cd.ToString());
Or if you'd rather be brief about it (equivalent to above):
Response.AddHeader("Content-Disposition", "attachment;filename=filename.csv");
It may seem dodgy to be answering my own question, but I thought my experience may help someone. I did some more digging and found a completely alternate way of doing this using DataTables and a specific CsvActionResult which inherits from FileResult.
See this gist: https://gist.github.com/777376
It probably has something to do with the Content-Type/Content-Disposition headers, because IE follows standards only when it wants to.
Check out the question "ASP MVC3 FileResult with accents + IE8 - bugged?"

Chrome, pdf display, Duplicate headers received from the server

I have a section on a website where I display a PDF inside a lightbox. A recent Chrome upgrade has broken this, and the page now shows:
Error 349 (net::ERR_RESPONSE_HEADERS_MULTIPLE_CONTENT_DISPOSITION):
Multiple Content-Disposition headers received. This is disallowed to
protect against HTTP response-splitting attacks.
This still works correctly in IE.
I'm using ASP.NET MVC3 on IIS6
The code I use to generate the file is as follows.
If I remove the inline statement then the file downloads, however that breaks the lightbox functionality.
Problem Code
public FileResult PrintServices()
{
//... unrelated code removed
MemoryStream memoryStream = new MemoryStream();
pdfRenderer.PdfDocument.Save(memoryStream);
string filename = "ServicesSummary.pdf";
Response.AppendHeader("Content-Disposition", "inline;");
return File(memoryStream.ToArray(), "application/pdf", filename);
}
The Fix
Remove
Response.AppendHeader("Content-Disposition", "inline;");
Then Change
return File(memoryStream.ToArray(), "application/pdf", filename);
to
return File(memoryStream.ToArray(), "application/pdf");
The solution above is fine if you don't need to specify the filename, but we wanted to keep a default filename for the user.
Our problem turned out to be the filename itself, which contained commas. I replaced the commas with "" and the document is now delivered as expected in Chrome.
FileName = FileName.Replace(",", "")
Response.ContentType = "application/pdf"
Response.AddHeader("content-disposition", "attachment; filename=" & FileName)
Response.BinaryWrite(myPDF)
I used @roryok's comment, wrapping the filename in quotes:
Response.AddHeader("content-disposition", "attachment; filename=\"" + FileName + "\"")
@a coder's answer of using single quotes did not work as expected in IE. The file downloaded with the single quotes still in the name.
Had this problem today. Per roryok and others, the solution was to put the filename in quotes.
Previous, Chrome FAIL:
header("Content-Disposition: attachment; filename=$file");
Current, Chrome OK:
header("Content-Disposition: attachment; filename='$file'");
Note the quotes around $file.
I was having the same issue and fixed it by just removing the file name from the return statement.
Change:
return File(outStream.ToArray(), "application/pdf", "Certificate.pdf");
to:
return File(outStream.ToArray(), "application/pdf");
And KEPT the:
Response.AddHeader("content-disposition", "attachment;filename=\"" + "Certificate.pdf" + "\"");
This still keeps the name for the downloaded file.
To fix this for any file type with a custom file name, remove this header (or similar ones):
Response.AppendHeader("Content-Disposition", "inline;");
and add
string fileName = "myfile.xlsx";
return File(fileStream, System.Web.MimeMapping.GetMimeMapping(Path.GetFileName(filePath)), fileName);
You can also pass the file path instead of a stream as the first parameter.
My issue was due to the double quote as shown below:
var encoding = System.Text.Encoding.UTF8;
Response.AddHeader("Content-Disposition", string.Format("attachment; filename=\"{0}\"", HttpUtility.UrlEncode(file, encoding)));
Changing the above to this worked!
Response.AddHeader("Content-Disposition", string.Format("attachment; filename={0}", HttpUtility.UrlEncode(file, encoding)));
This solution preserves the filename AND opens the file in the browser (.NET MVC):
Response.Headers["Content-Disposition"] = "inline;filename=\"" + theFile.filename + "\"";
return File(filePath, mimeType);//don't specify filename. It will create second Content-Disposition header
