Server.URLEncode started to replace blank with plus ("+") instead of percent-20 ("%20") - vbscript

Given this piece of code:
<%
Response.Write Server.URLEncode("a doc file.asp")
%>
For a while it output this (like a JavaScript encodeURI call):
a%20doc%20file.asp
Now, for some unknown reason, I get:
a+doc+file%2Easp
I'm not sure what I changed to make this happen (maybe the file's content encoding, ANSI/UTF-8). Why did this happen, and how can I get the first behaviour of Server.URLEncode back, i.e. percent encoding?

Classic ASP hasn't been updated in nearly 20 years, so Server.URLEncode still follows the RFC 1866 standard, which specifies that spaces be encoded as + symbols (a hangover from the old application/x-www-form-urlencoded media type). You must be mistaken in thinking it was encoding spaces as %20 at some point, unless there's an IIS setting you can change that I'm unaware of.
More modern languages use the RFC 3986 standard for encoding URLs, which is why JavaScript's encodeURI function returns spaces encoded as %20.
Both + and %20 should be treated exactly the same when decoded by any browser, thanks to RFC backwards compatibility. However, it's generally considered best to use %20 when encoding spaces in a URL, as it's the more modern standard, and some decoding functions (such as JavaScript's decodeURIComponent) won't recognise + symbols as spaces and will fail to properly decode URLs that use them instead of %20.
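You can see that failure from any browser console; decodeURIComponent happily decodes %20 but passes + through untouched:
decodeURIComponent("a%20doc%20file.asp"); // "a doc file.asp" - %20 decodes to a space
decodeURIComponent("a+doc+file%2Easp");   // "a+doc+file.asp" - the + signs survive undecoded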
You can always use a custom function to encode spaces as %20:
function URL_encode(ByVal url)
    ' Let Server.URLEncode handle everything else...
    url = Server.URLEncode(url)
    ' ...then swap its + escapes for the modern %20
    url = Replace(url, "+", "%20")
    URL_encode = url
end function

Related

How to handle the JavaScript decoding error?

When the code is implemented, some characters cannot be decoded. I am getting a bunch of question marks like ??. How can I fix this?
HtmlInput inputBox2 = (HtmlInput)currentPage.getHtmlElementById("classNo");
inputBox2.setValueAttribute("2016同學15");
ScriptResult result = currentPage.executeJavaScript("javascript:Search(2)");
I found this in the compiler: ScriptResult[result=net.sourceforge.htmlunit.corejs.javascript.Undefined#24d7aac3 page=HtmlPage(http://www.xx.org/classNo=2016??15)#1330510442]
You might try to use URL-encoding for some ASCII and all non-ASCII characters,
e.g. a space becomes %20.
Here is a web site explaining this: the HTML URL Encoding Reference.
You can also interactively encode strings there.
Your "2016同學15" would be encoded as:
"2016%E5%90%8C%E5%AD%B815"

Can I treat all domain names as being IDNs without any ill effects?

From testing, it seems like converting both IDNs and regular domain names 'just works' - e.g. if the input doesn't need to be changed, punycode will just return the input.
punycode.toASCII('lancôme.com');
returns:
'xn--lancme-lxa.com'
And
punycode.toASCII('apple.com');
returns:
'apple.com'
This looks great, but is it specified anywhere? Can I safely convert everything to punycode?
That is correct. If you look at the procedure for converting Unicode strings to ASCII Punycode, you'll see that it only alters non-ASCII characters. Since regular domains cannot contain non-ASCII characters, a correctly implemented converter will never transform a pure-ASCII string.
You can read more about how unicode is converted to punycode here: https://en.wikipedia.org/wiki/Punycode
Punycode is specified in RFC 3492: https://www.ietf.org/rfc/rfc3492.txt, and it clearly says:
"Basic code point segregation" is a very simple and
efficient encoding for basic code points occurring in the extended
string: they are simply copied all at once.
Therefore, if your extended string is made of basic code points, it will just be copied without change.
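A quick sketch of that pass-through behaviour, assuming Node.js with the punycode.js library used in the question (the WHATWG URL API applies the same conversion when parsing hostnames):
var punycode = require("punycode");

punycode.toASCII("lancôme.com"); // "xn--lancme-lxa.com" - the non-ASCII label is converted
punycode.toASCII("apple.com");   // "apple.com" - a pure-ASCII domain is copied unchanged

// The URL API behaves the same way when parsing hostnames:
new URL("https://lancôme.com/").hostname; // "xn--lancme-lxa.com"
new URL("https://apple.com/").hostname;   // "apple.com"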

How can I send a parameter with a space to .NET Web API

I would like to receive a long string that contains spaces in my Web API method.
To my understanding I can't send a parameter with white space; does it have to be encoded in some way?
EDIT:
My content type is:
Content-Type: application/x-www-form-urlencoded
I've changed it to several other types, but none of them allows me to receive a parameter with + instead of spaces.
My POST method signature is:
public HttpResponseMessage EditCommentForExtension(string did, string extention, string comment)
Usually, parameters to an HTTP GET request are URL encoded. This means (among other things) that spaces are replaced by +.
Using + to mean "space" in a URL is a convention inherited from HTML form encoding, not part of the generic URL encoding standard. If you want to use + to mean spaces, you are going to have to convert them yourself.
As you discovered, spaces (like everything else that needs encoding) should be encoded as %XX, where each X stands for a hex digit.
http://www.w3.org/Addressing/rfc1738.txt
The only thing that worked for me was to use %20 instead of the spaces.
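That matches the split described above: the form-urlencoded format uses + for spaces, while RFC 3986 percent-encoding uses %20. If your client code is JavaScript, you can produce and compare both:
// application/x-www-form-urlencoded style: spaces serialize as +
new URLSearchParams({ comment: "a long comment" }).toString(); // "comment=a+long+comment"

// RFC 3986 percent-encoding: spaces serialize as %20
encodeURIComponent("a long comment"); // "a%20long%20comment"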

How to force Firefox addon charset/encoding?

I'm having trouble with a mobile addon: it shows the new elements added by scripting with a different charset from the page's. E.g. I can read "cuadrúpedo", but the same word in my plugin shows "cuadr¡pedo".
I tried writing the following line at the beginning of my addon, but it didn't work:
document.getElementsByTagName("html")[0].setAttribute("lang", "es");
Then I wrote a "converter function" which replaces the special characters with Unicode escapes, like the next line, but that didn't work either:
str.replace(/ú/g, "\xfa");
What can I do?
Probably it's a matter of text encoding.
Make sure the file that contains the literal "cuadrúpedo" is saved as UTF-8, not ANSI.
Keep in mind that a few key files must stay ANSI encoded: install.rdf, chrome.manifest and bootstrap.js. In those files, use Unicode escapes instead: "cuadr\u00fapedo".
When a JavaScript file is loaded from a chrome:// URL (in Gecko 1.8 and later), a Byte Order Mark is used to determine its character encoding; otherwise, the character encoding will be the same as the one used by the XUL file. Another solution is to declare the character encoding as part of the Content-Type HTTP header, for example:
Content-Type: application/javascript; charset=UTF-8
For cross-version compatibility you must limit yourself to ASCII. However, you can use Unicode escapes – the earlier example rewritten using them would be:
var text = "Ein sch\u00F6nes Beispiel eines mehrsprachigen Textes: \u65E5\u672C\u8A9E";
JavaScript and Navigator support for UTF-8/Unicode means you can use non-Latin, international, and localized characters, plus special technical symbols in JavaScript programs. Unicode provides a standard way to encode multilingual text: since the UTF-8 encoding of Unicode is compatible with ASCII, programs can use ASCII characters. To receive non-ASCII character input, the client needs to send the input as Unicode.
There is a webpage for text escaping and unescaping in JavaScript:
http://0xcc.net/jsescape/
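If you'd rather not depend on a web page, here is a minimal sketch of such a converter (the escapeNonAscii name is ours, not a standard API):
// Rewrite every non-ASCII character as a \uXXXX escape so the
// source file itself can be saved as pure ASCII
function escapeNonAscii(str) {
    return str.replace(/[\u0080-\uffff]/g, function (ch) {
        return "\\u" + ("0000" + ch.charCodeAt(0).toString(16)).slice(-4);
    });
}

escapeNonAscii("cuadrúpedo"); // "cuadr\u00fapedo"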
Sources:
https://developer.mozilla.org/en-US/docs/International_characters_in_XUL_JavaScript
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Values,_variables,_and_literals#Unicode

ISO-8859-1 characters treated as UTF-8 in XSLT attributes

The ¬ character (0xAC in ISO-8859-1) works for normal text if I ensure that ISO-8859-1 is always used as the encoding throughout. However, when using it in attributes it is escaped to: %C2%AC. I understand that it needs to be escaped for urls, but not why it escapes it in the same way as it would for UTF-8, rather than just %AC as I'd expect it to for ISO-8859-1.
Since the escapes are in the output html file the only conclusion is that the xslt processor is the cause.
Example:
input.xml
stylesheet.xslt
makefile
Which for me generates:
output.html
Output was generated using xsltproc, compiled against libxml 20707, libxslt 10126 and libexslt 815, on #! Linux (amd64). I have also tried xmlstarlet tr (which also uses libxml), xalan, and Google Chrome (by adding an <?xml-stylesheet ... ?> tag, see input_ss.xml), all with the same result.
Opera doesn't escape it at all, and it allows ¬ to be used literally in the url and attribute.
Is this standard behaviour for xslt or is this a bug in the way the attributes are escaped? And either way, is there a solution other than replacing %C2%AC with %AC bearing in mind it is almost certainly the same for other characters that are valid ISO-8859-1 and invalid in UTF-8.
There are 3 different text-based technologies in use here, XML, HTML and URIs.
All of these have escape mechanisms - that is to say, ways to use text to indicate other text that it is impossible or difficult to indicate in a given context.
The not-sign character ¬ (U+00AC) could be escaped in the first two as &#xAC; or &#172; (perhaps with some leading zeros) in both XML and HTML (&not; would also work in HTML). This escape would be used no matter what encoding the XML or HTML was in, because it relates to the character ¬, not to its set of octets in a given character encoding - indeed, we would generally only use it in the case where there was no such set of octets in the encoding being used.
In this case, that is unnecessary, since the output is in a character encoding in which there is no need to escape it, and so in the source you can see "The ¬ character" unescaped.
This HTML includes the text of a URI. The encoding of the HTML has nothing to do with this, because the encoding is how we get the text of the HTML from one machine to another, but when the HTML is being parsed to read this URI we're past that point and are dealing with some text at the level of text - that is to say, it doesn't have an encoding any more.
Now, URIs have their own escape mechanisms. This must be used in the case of ¬, as it is not a character allowed in URIs (as opposed to IRIs). Sadly, unlike the escapes in XML and HTML, these escapes are based on octets in a given encoding rather than the code-point of the character itself.
It's easy to see this as a mistake now, but URIs were specified in 1994, formalising work going back to 1989/1990, while Unicode 1.0 was released in 1991 and didn't have the ground-breaking 2.0 until 1996 - so hindsight gives us considerably more benefit than URI's inventors had. (HTML had the same problem many years ago, but the format of its encodings made it much easier to fix without as many backwards-compatibility issues.)
So, what encoding should we use for those octets? The original specs left this undefined, but really the only possible choice is UTF-8. It's the only encoding that leaves the octets for ASCII characters (including those special to URIs) in the range 0x20 - 0x7F while also covering all of the UCS.
There's also no way to indicate another choice could be more appropriate. Remember, we're working at the level of text, so your use of ISO-8859-1 is completely irrelevant. Even if we kept track of the encoding while parsing the HTML, the URI is going to be made use of in a way that is nothing to do with the document, so we still couldn't use it. In all, if we have to make use of an octet-based encoding, and we have to keep characters in the ASCII range matching the octets they'd have in ASCII, the only possible basis for the encoding is UTF-8.
For that reason, the escape in any URI for ¬ must always be %C2%AC.
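JavaScript's encodeURIComponent demonstrates this: it always escapes the UTF-8 octets of a character, regardless of the encoding of the page it runs in:
encodeURIComponent("¬");      // "%C2%AC" - the two UTF-8 octets of U+00AC
decodeURIComponent("%C2%AC"); // "¬"
// decodeURIComponent("%AC"); // throws URIError - %AC alone is not valid UTF-8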
There can be some legacy systems that expect URIs to use other encodings, but the solution is to fix the bit that's broken, not the bit that works: if something expects ¬ to be %AC, catch it close to that point by converting %C2%AC just before it's used (and if that system outputs %AC itself, you'll of course need to convert it back to %C2%AC before it reaches the outside world).
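A minimal sketch of that kind of compatibility shim, assuming the not-sign is the only character affected (the helper names are ours):
// Convert the standard UTF-8 escape to the legacy ISO-8859-1 escape
// immediately before handing a URI to the legacy system
function toLegacyEscape(uri) {
    return uri.replace(/%C2%AC/gi, "%AC");
}

// ...and repair the legacy system's own output before it reaches the
// outside world (assumes its output never already contains %C2%AC)
function toStandardEscape(uri) {
    return uri.replace(/%AC/gi, "%C2%AC");
}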
The XSLT spec says that when serializing URI-valued attributes, all non-ASCII characters are escaped using the %HH-escaping of the UTF-8 octets that represent the character. Although %HH-escaping of other encodings has been used in the past, it is no longer used today. This is quite independent of the encoding of the document itself.
