Detecting querystring encoding in classic ASP - utf-8

I'm having an encoding problem I'm unsure of how to handle. We have a link that we give out (email and other means), that contains non-ascii characters (specifically æ,ø,å and some more). The problem is that depending on what browser the user opens said link in, the querystring we receive is encoded differently.
In most browsers it's encoded as UTF8 like this:
%C3%A5 = å
However, in Internet Explorer it's encoded as ISO-8859-1, like this:
%E5 = å (or it's just sent as å without any querystring encoding at all).
The problem is that if I read the querystring in Chrome I get Ã¥ instead of å, while in IE I do get å. So I need some way of either making the server handle the encoding or making the client always send a specific encoding. As I said, I don't have control over where the link is used.
Any help would be appreciated.

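Try putting this at the top of your ASP page:
Response.ContentType = "text/html"
Response.AddHeader "Content-Type", "text/html;charset=UTF-8"
Response.CodePage = 65001 ' 65001 is the Windows code page for UTF-8
Response.CharSet = "UTF-8"
This declares the response as UTF-8 and switches the ASP code page to UTF-8, which should make all browsers treat the page as UTF-8.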
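If the links themselves cannot be changed, another option is to guess the encoding on the server from the raw bytes. The sketch below is in Python rather than classic ASP, purely to illustrate the idea. `decode_query_value` is a hypothetical helper, not part of any framework. It assumes the value is either UTF-8 or ISO-8859-1, tries a strict UTF-8 decode first, and falls back to ISO-8859-1 when the bytes are not valid UTF-8.

```python
from urllib.parse import unquote_to_bytes


def decode_query_value(raw: str) -> str:
    """Decode one percent-encoded query value of unknown encoding.

    Assumption: the sender used either UTF-8 or ISO-8859-1.
    """
    # Turn %XX escapes back into raw bytes without guessing an encoding yet.
    data = unquote_to_bytes(raw)
    try:
        # Strict UTF-8 rejects byte sequences such as a lone 0xE5.
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Every byte is valid in ISO-8859-1, so this fallback never fails.
        return data.decode("iso-8859-1")
```

With this helper both `%C3%A5` and `%E5` decode to å. The heuristic can be fooled by ISO-8859-1 text that happens to form valid UTF-8 (a literal Ã¥, for example), so it is a best guess rather than a guarantee.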

Related

Why it is possible to show a base64 encoded PNG with an "image/jpeg" data URL?

Here's an example of a base64 encoded PNG data URL (using the "image/png" data type, of course):
data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABIAAAASCAYAAABWzo5XAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsMAAA7DAcdvqGQAAAAZdEVYdFNvZnR3YXJlAHBhaW50Lm5ldCA0LjAuMTczbp9jAAAA1UlEQVQ4T62SsQ0CMQxFUzAEA1BQMgIlI1zJCIxAT0HJCBSUlFdQMAQlJWOE78hnObYDIuJLTzr/+/5FkpRz/guh2UM9pLQBWzDXvgVacG4lngncAH08QFgGLcEL0LAX34ROHCBcGaRLiEH+meAMXDhESBlkS3bVrh6KEZetQbOk7FmjmL5M40rKTmQSEJVdeXlCDtcSmgRkz4Ro32Zo+pK7+g7LqqEYjduBjsrzT6Mavl3xhzIJcXDkEBHfTl3WfNln8ARhyQR0sDkX6iU0ewjN38npDdYczGIKuRnZAAAAAElFTkSuQmCC
I noticed that (in Firefox and Chrome) things work even if data type is set to "image/jpeg" (and leaving all the rest untouched) like this:
data:image/jpeg;base64,iVBORw0KGgoAAAANSUhEUgAAABIAAAASCAYAAABWzo5XAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsMAAA7DAcdvqGQAAAAZdEVYdFNvZnR3YXJlAHBhaW50Lm5ldCA0LjAuMTczbp9jAAAA1UlEQVQ4T62SsQ0CMQxFUzAEA1BQMgIlI1zJCIxAT0HJCBSUlFdQMAQlJWOE78hnObYDIuJLTzr/+/5FkpRz/guh2UM9pLQBWzDXvgVacG4lngncAH08QFgGLcEL0LAX34ROHCBcGaRLiEH+meAMXDhESBlkS3bVrh6KEZetQbOk7FmjmL5M40rKTmQSEJVdeXlCDtcSmgRkz4Ro32Zo+pK7+g7LqqEYjduBjsrzT6Mavl3xhzIJcXDkEBHfTl3WfNln8ARhyQR0sDkX6iU0ewjN38npDdYczGIKuRnZAAAAAElFTkSuQmCC
But... Why?
They're both using the image handling subsystem, which ignores the mime type and just goes with the actual format of the image.
Specifically, most browsers will translate the viewing of an image into the viewing of an HTML webpage with an <img> tag in it. Since servers lie and browsers are supposed to be able to show even badly-configured websites, the part of the browser that deals with images will in most cases completely ignore any extensions or MIME types. There was no point programming in an exception for data: URIs.
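The idea of going by the actual format rather than the declared type can be sketched by checking the first bytes of the payload. This is only an illustration in Python, not how any browser implements it. The function names are made up for this example, and only the PNG, JPEG and GIF signatures are checked.

```python
import base64
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def sniff_image_type(data: bytes) -> str:
    """Guess the image type from the leading bytes, ignoring any declared type."""
    if data.startswith(PNG_SIGNATURE):
        return "image/png"
    if data.startswith(JPEG_SIGNATURE):
        return "image/jpeg"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "image/gif"
    return "unknown"


def declared_vs_actual(data_url: str) -> Tuple[str, str]:
    """Return (declared MIME type, sniffed MIME type) for a base64 data URL."""
    header, _, payload = data_url.partition(",")
    declared = header[len("data:"):].split(";")[0]
    return declared, sniff_image_type(base64.b64decode(payload))


# Only the first 24 base64 characters of the PNG from the question are used here.
url = "data:image/jpeg;base64,iVBORw0KGgoAAAANSUhEUgAA"
print(declared_vs_actual(url))  # ('image/jpeg', 'image/png')
```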

Convert binary data from URL to an image file

I read several articles on StackOverflow, but none of them seems to work in my case so here is the situation.
I have a webpage that is not under my control. It contains an image that is referenced in the markup as something like <img src="getimage.asp?pic=4c54aae0ea..." />. Given the URL of that image, I would like to download it, save it to disk and do something with it.
When I enter the URL directly in my browser I get a binary stream. This is the first load of characters.
ÿØÿàJFIFHHÿþLEAD Technologies Inc. V1.01ÿÛ„ÿÄ¢ }!1AQa"q2‘¡#B±ÁRÑð$3br‚ %&'()*456789:CDEFGHIJSTUVWXYZcdefghijstuvwxyzƒ„…†‡ˆ‰Š’“”•–—˜™š¢£¤¥¦§¨©ª²³´µ¶·¸¹ºÂÃÄÅÆÇÈÉÊÒÓÔÕÖ×ØÙÚáâãäåæçèéêñòóôõö÷øùúw!1AQaq"2B‘¡±Á #
How do I convert that data to an image using e.g. C# or any other language? Since I do not control the page I have no idea of how the data is encoded, so can I still decode it?
As can be seen from the first couple of characters, the string "LEAD Technologies Inc." is included in the data, so I guess it's not all image data. But at least Chrome obviously knows how to decode it. A quick Google check reveals that "LEAD Technologies" is an imaging SDK, but their website doesn't seem to offer much information about its use, and I'm also not proficient in image manipulation. Any ideas would be appreciated.
The first couple of characters indicate that the response is probably a JPEG file interpreted as text. I guess the Content-Type header in the HTTP response has the wrong value, probably something like text/plain or text/html instead of image/jpeg. This makes Chrome display the image as plain text.
I don't think you have to convert the data. Just save the response stream to a file and you will have a proper jpeg file:
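using System.Net; // WebClient lives in this namespace
string url = "http://my-domain/getimage.asp?pic=4c54aae0ea...";
string fileLocation = @"C:\MyImage.jpg"; // verbatim string, so the backslash is literal
var client = new WebClient();
client.DownloadFile(url, fileLocation); // writes the response bytes to disk unchanged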
The reason I think that the response is probably jpeg, is that a jpeg file begins with 0xFFD8FFE0 which looks like ÿØÿà when displayed as ISO 8859-1 encoded text.
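Both points are easy to check in a small Python sketch: the JPEG header bytes do render as ÿØÿà when decoded as ISO 8859-1, and the response only needs to be written to disk byte for byte, with no text decoding in between. The helper names are hypothetical, and the download step is left out because the real URL is not known.

```python
JPEG_MAGIC = b"\xff\xd8\xff"  # start-of-image marker plus the first byte of the next marker


def looks_like_jpeg(data: bytes) -> bool:
    """True if the data starts with the usual JPEG signature."""
    return data.startswith(JPEG_MAGIC)


def as_latin1_text(data: bytes) -> str:
    """Show what the bytes look like when misread as ISO 8859-1 text."""
    return data.decode("iso-8859-1")


def save_image(data: bytes, path: str) -> None:
    """Write the bytes unchanged; binary mode avoids any re-encoding."""
    with open(path, "wb") as handle:
        handle.write(data)
```

For example, `as_latin1_text(bytes.fromhex("FFD8FFE0"))` gives the four characters ÿØÿà seen at the start of the question's output.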

UTF-8 character encoding not working in Firefox [duplicate]

Possible Duplicate:
UTF-8 issue in Firefox - response header overriding meta tag?
I have a jPlayer playlist that works fine in all browsers except Firefox.
The issue is with non-ASCII characters, i.e. characters with accents or Asian characters. I have set up a demo playlist so that you can see here.
When I enter the characters in UTF-8 form (track 1 in the playlist) they work on all browsers except for Firefox, and when I enter them in ISO Latin 1 (track 2 in the playlist) they work in Firefox but no other browsers.
So, for instance in Firefox 大å°æ¸æ¿.mp3 works, whereas 大地書房.mp3 doesn't.
When I use 大地書房.mp3 in the Firebug console I see the following error:
"NetworkError: 404 Not Found - http://monthlymixup.com/mixups/july_2012/media/simon/03%20????.mp3"
So, for some reason 大地書房.mp3 becomes %20????. When I inspect the page the link to the audio file shows as 大地書房.mp3 though.
There is a meta tag for UTF-8 on the demo page, i.e. <meta charset=utf-8 />
My understanding is that Firefox overwrites this with the response header if a default encoding isn't set in FF. I have however set UTF-8 to be the default encoder and I have checked that the page is using UTF-8 by going to Tools/Page Info (I am on a Mac and I believe this is the way to check the encoding on the page).
So, I'm at a loss as to what is going on, and would be glad of some help.
This seems to be an encoding issue in jQuery or other software used. Entry 2 is in an odd format: looking at the source as UTF-8, I see
mp3:"media/nick/Guessi-GuÃ©rÃ©-Guessi (Pop Bariba).mp3"
This means that the letter “é” has been represented in UTF-8, as two octets, and then these octets have been interpreted as if they were ISO-8859-1 encoded, and the resulting characters have been UTF-8 encoded. Presumably the software deals with the mess by performing the opposite double decoding. In any case, it does not work with
mp3:"media/simon/03 大地書房.mp3"
which is just UTF-8 encoded.
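The double encoding described above can be reproduced in a few lines of Python. This is only a sketch of the mechanism. Whether the player software really performs the reverse step this way is a guess, and the function names are invented for the example.

```python
def double_encode(text: str) -> str:
    """UTF-8 encode, then misread the resulting octets as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def undo_double_encode(text: str) -> str:
    """The opposite step: map each character back to one octet, then read as UTF-8."""
    return text.encode("iso-8859-1").decode("utf-8")


def lossy_latin1(text: str) -> bytes:
    """What a forgiving converter might do with text that has no ISO-8859-1 form."""
    return text.encode("iso-8859-1", errors="replace")
```

`double_encode("é")` gives the two characters Ã©, and `undo_double_encode` turns them back into é. Plain UTF-8 text such as 大地書房 cannot take the reverse step, because those characters have no ISO-8859-1 octets: `undo_double_encode` raises `UnicodeEncodeError`, while `lossy_latin1("大地書房")` yields `b"????"`, which at least resembles the `????` in the 404 URL.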
It puzzles me how it works on any browser, but presumably the code is browser-dependent.
The software should be changed to deal with UTF-8 as such and pass it forward, if possible. All modern browsers, including Firefox, can then deal with it properly.
As a quick fix, though, you might try to use a percent-encoded form (see e.g. online percent-encoder):
mp3: "media/simon/03%20%E5%A4%A7%E5%9C%B0%E6%9B%B8%E6%88%BF.mp3"
But this is just a guess; the software might munge this, percent-encoding the “%” sign.
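For reference, the percent-encoded form can be produced without an online tool. A minimal Python sketch using the standard library's `urllib.parse.quote`, which encodes as UTF-8 by default and leaves `/` alone, is shown here. The second call illustrates the munging mentioned above, where an already encoded string is encoded again and every `%` becomes `%25`.

```python
from urllib.parse import quote, unquote

path = "media/simon/03 大地書房.mp3"

# UTF-8 percent-encoding; "/" is kept as-is by default.
encoded = quote(path)

# Encoding a second time escapes the "%" signs themselves.
double_encoded = quote(encoded)

# Decoding the singly encoded form restores the original path.
restored = unquote(encoded)
```

Here `encoded` matches the string suggested in the quick fix, and `double_encoded` starts with `media/simon/03%2520%25E5`.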

UTF-8 but still not showing ÆØÅ (danish chars)

Take a look at this:
http://thebekker.dk/_skole/GFeksamen/
You can see that the 2nd menu item shows some weird sign instead of "Ø".
I've set UTF-8 in the meta tag, and even tried AddDefaultCharset UTF-8 in .htaccess...
Still no result. If I change to ISO-8859-1 it works fine, but that causes problems when I start making AJAX calls for content...
I don't get it.
How do I get it to use UTF-8 and show ÆØÅ?
If you declare that your content is encoded in UTF-8 with the meta tags or default charset, then your content needs to be actually encoded in UTF-8. The fact that it shows correctly when declaring your content to be encoded in ISO-8859 means that your content is actually encoded in ISO-8859. Save your source code file as UTF-8 or otherwise make sure that your content is UTF-8 encoded.
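The mismatch between declared and actual encoding can be shown in a short Python sketch. The helper names are made up for the example, and the conversion assumes the existing file really is ISO-8859-1.

```python
def view_as(data: bytes, declared: str) -> str:
    """Decode bytes the way a browser would under the declared charset.

    Invalid sequences become U+FFFD, the replacement character.
    """
    return data.decode(declared, errors="replace")


def transcode(data: bytes, source: str = "iso-8859-1", target: str = "utf-8") -> bytes:
    """Re-encode content so that it matches a UTF-8 declaration."""
    return data.decode(source).encode(target)
```

Bytes produced by `"ÆØÅ".encode("iso-8859-1")` read correctly with `view_as(..., "iso-8859-1")` but come out as replacement characters with `view_as(..., "utf-8")`. After `transcode`, the same text reads correctly as UTF-8.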
Saving the source file in "Western European (Windows)" in EditPlus text editor did it for me + in PHP I used utf8_encode.
You can set these characters using Unicode, like € and many others. In my company we work with many translations and languages such as French, which has many special characters.
Set your website's encoding to UTF-8 and use functions like utf8_encode in PHP,
or convert manually: http://www.sql-und-xml.de/unicode-database/online-tools/

Encoding issues with Microsoft Word characters in an AJAX request

I'm writing a function to convert MS Word-styled text into Adobe InDesign-formatted text (it uses a kind of XML to indicate styling). The text is pasted into a TinyMCE rich text editor, which then sends the HTML-formatted code to a php function.
I've tried this function to clean up the code once it reaches my conversion code:
$text = iconv("windows-1250", "UTF-8", $html);
When I use any 'special' kind of characters, things go wrong. £ signs, é (or any other accents), and a variety of 'curly' apostrophes/quote marks seem to break things. For example, if I try to convert a £ sign, the code returns \u0141, but I get the Ł symbol displayed onscreen when the function returns.
Does anybody know what I can do to prevent Word's weird characters breaking everything I'm doing?
I seem to have fixed this. I was using escape() to pass the values, but replaced this with encodeURIComponent() instead (and removed the iconv() call in my php code), which seems to have fixed it.
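A likely explanation, sketched in Python: JavaScript's `escape()` writes characters up to U+00FF as a single Latin-1 style `%XX` escape, while `encodeURIComponent()` writes UTF-8 octets. The two functions below only approximate that behaviour for characters in the Latin-1 range, and their names are invented for the example. Reading the single byte `A3` as windows-1250 then gives Ł (U+0141), which matches the symptom in the question.

```python
from urllib.parse import quote, unquote_to_bytes


def escape_like(text: str) -> str:
    """Rough stand-in for escape(): one %XX per character (Latin-1 range only)."""
    return quote(text, safe="", encoding="iso-8859-1")


def encode_uri_component_like(text: str) -> str:
    """Rough stand-in for encodeURIComponent(): percent-encoded UTF-8 octets."""
    return quote(text, safe="", encoding="utf-8")


def decode_bytes(escaped: str, charset: str) -> str:
    """Undo the %XX escapes and interpret the bytes in the given charset."""
    return unquote_to_bytes(escaped).decode(charset)
```

`escape_like("£")` gives `%A3`, and `decode_bytes("%A3", "cp1250")` gives Ł. `encode_uri_component_like("£")` gives `%C2%A3`, which decodes back to £ as UTF-8 with no iconv step needed.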

Resources