After checking how others approach URL encoding in Cocoa (e.g. "How do I URL encode a string", "Swift - encode URL", etc.), I still have no clue how to URL-encode correctly in Cocoa if the URLs
come from external sources, so their structure (parts) is not known ahead of time
can be already encoded or plain URL strings
BONUS: can be relative and local file URLs
I do not want to blindly encode all the characters every time, but to encode according to RFC 3986 (RFC 2396, RFC 1738, RFC 1808).
The catch-22:
stringByAddingPercentEscapesUsingEncoding: converts lazily, so the preferred method would be to use stringByAddingPercentEncodingWithAllowedCharacters: on each URL component, one by one
[NSURL URLWithString:], [NSURLComponents componentsWithString:] and companions will fail if the incoming string is not (at least partially) encoded, but if I pass a string encoded with stringByAddingPercentEscapesUsingEncoding:, then the component splitting will fail (e.g. the encoded # will confuse the splitter, and the fragment will be treated as part of a possible query section)
How can I URL-encode correctly in this case without writing my own URL parser and encoder?
You should read all of Apple's release note discussion of this subject, but in particular this part may be most relevant for your case:
If you need to percent-encode an entire URL string, you can use this
code to encode a NSString intended to be a URL (in urlStringToEncode):
NSString *percentEncodedURLString =
[[NSURL URLWithDataRepresentation:[urlStringToEncode dataUsingEncoding:NSUTF8StringEncoding] relativeToURL:nil] relativeString];
(The CoreFoundation equivalent to URLWithDataRepresentation: is
CFURLCreateWithBytes() using the encoding kCFStringEncodingUTF8 with a
fall back to kCFStringEncodingISOLatin1 if kCFStringEncodingUTF8
fails.)
Basically, +URLWithDataRepresentation:relativeToURL: does its best to make a proper URL from the provided bytes. Given that you can guarantee almost nothing about the input, there can't be any promise that it will get it "right" (because "right" isn't well defined in that case), but it's probably your best hope.
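For readers who want to see the "split first, then encode each component" idea from the question spelled out, here is a minimal sketch. It is written in Python rather than Cocoa, purely as an illustration, and it is not what +URLWithDataRepresentation:relativeToURL: does internally. The function name normalize_url and the choice of safe characters are my own assumptions; "%" is deliberately treated as safe so that existing escapes are not encoded twice, and the host part is left untouched.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched in each component (sub-delims, ":", "@", "/").
# "%" is included on purpose so already-encoded input is not double-encoded.
# Unreserved characters (letters, digits, "-._~") are never encoded by quote().
PATH_SAFE = "/:@!$&'()*+,;=%"
QUERY_FRAGMENT_SAFE = PATH_SAFE + "?"


def normalize_url(raw: str) -> str:
    """Split a URL string first, then percent-encode each component on its own."""
    parts = urlsplit(raw)
    path = quote(parts.path, safe=PATH_SAFE)
    query = quote(parts.query, safe=QUERY_FRAGMENT_SAFE)
    fragment = quote(parts.fragment, safe=QUERY_FRAGMENT_SAFE)
    # The network location (host, port, userinfo) is passed through unchanged;
    # internationalized host names would need separate handling.
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


print(normalize_url("http://example.com/a b?q=x y#frag ment"))
print(normalize_url("http://example.com/a%20b"))
print(normalize_url("docs/my file.txt"))
```

A raw "#" or "?" in the input is still taken as a delimiter, and a stray "%" that is not followed by two hex digits is passed through as-is, so the ambiguity described in the question does not go away; the sketch only avoids double-encoding.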
Due to some odd circumstances I need to use uriQuery() in a Power Automate flow in order to extract the query string from a URL.
This works as expected in most circumstances, except when the URL contains special characters like accented letters, for example
http://www.example.com/peppers/Jalapeño/recipe #1.docx
In such cases the call triggers an error and the exception message shows a (partially) encoded version of my url (why?).
The template language function 'uriQuery' expects its parameter to be a well-formed absolute URI. The provided value was '......'
Obviously the url was indeed a well-formed, absolute URI.
Since the error only triggers when the URL contains special characters, I assumed that I had to encode the value before calling uriQuery(), yet nothing I tried seems to work (for example encodeUriComponent()). And as expected, nothing I could find on the web mentioned a similar issue.
As a last attempt I am asking here - does uriQuery() support this use-case? And if it does... how?
Given this piece of code:
<%
Response.Write Server.URLEncode("a doc file.asp")
%>
It output this for a while (like JavaScript's encodeURI):
a%20doc%20file.asp
Now, for some unknown reason, I get:
a+doc+file%2Easp
I'm not sure what I touched to make this happen (maybe the file content encoding, ANSI/UTF-8). Why did this happen, and how can I get back the first behavior of Server.URLEncode, i.e. percent-encoding?
Classic ASP hasn't been updated in nearly 20 years, so Server.URLEncode still follows RFC 1866, which specifies that spaces be encoded as + symbols (a holdover from the old application/x-www-form-urlencoded media type). You must be mistaken in thinking it encoded spaces as %20 at some point, unless there's an IIS setting you can change that I'm unaware of.
More modern languages follow RFC 3986 for encoding URLs, which is why JavaScript's encodeURI function returns spaces encoded as %20.
Both + and %20 should be treated exactly the same when decoded by any browser thanks to RFC backwards compatibility, but it's generally considered best to use %20 when encoding spaces in a URL, as it's the more modern standard. Some decoding functions (such as JavaScript's decodeURIComponent) don't recognise + as a space and will leave it as a literal +, so URLs that use + instead of %20 won't decode properly.
You can always use a custom function to encode spaces as %20:
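function URL_encode(ByVal url)
    ' Server.URLEncode escapes a literal "+" as %2B, so every "+" left in the result stands for a space
    url = Server.URLEncode(url)
    url = replace(url, "+", "%20")
    URL_encode = url
end function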
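To make the difference between the two conventions concrete, here is a small illustration in Python (not Classic ASP), whose standard library happens to expose both styles. quote and unquote follow the %20 style, while quote_plus and unquote_plus follow the form-encoding style with +.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

name = "a doc file.asp"

percent_style = quote(name)     # RFC 3986 style: spaces become %20
form_style = quote_plus(name)   # form-encoding style: spaces become +

print(percent_style)  # a%20doc%20file.asp
print(form_style)     # a+doc+file.asp

# A decoder that does not know the form convention leaves "+" untouched.
print(unquote("a+doc%20file.asp"))       # a+doc file.asp
print(unquote_plus("a+doc%20file.asp"))  # a doc file.asp
```

The third print shows the same pitfall as decodeURIComponent: the + survives decoding as a literal plus sign.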
Is there a way to tell the Google URL shortener (programmatically, I mean through its API) not to produce short URLs containing characters like:
0 O
1 l
I ask because people often make mistakes when reading those characters from a display and typing them elsewhere.
You cannot request the API to use a custom charset, so no.
Not a proper solution, but you could check the URL for unwanted characters and request another short URL for the same long URL until you get one you like. The Google URL shortener issues a unique short URL for an already shortened URL if you provide an OAuth token with the request. However, I am not sure whether a user is limited to one unique short URL per long URL, in which case this won't work either.
Since you're doing it programmatically, you could swap out those chars for their percent-encoded ASCII value, '%6F' for the letter o, for instance. In that case, just tell the users that, when in doubt, an ambiguous character is a numeral.
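The check-and-retry idea could be sketched roughly like this, in Python. Nothing here calls the real shortener: request_short_url is a hypothetical callable you would supply yourself to wrap the API request, and the short.example host and max_attempts limit are likewise assumptions for illustration.

```python
from typing import Callable, Optional

# The look-alike characters named in the question.
AMBIGUOUS = set("0O1l")


def has_ambiguous_chars(short_url: str) -> bool:
    """Return True if the last path segment contains a look-alike character."""
    code = short_url.rstrip("/").rsplit("/", 1)[-1]
    return any(ch in AMBIGUOUS for ch in code)


def shorten_without_ambiguity(
    long_url: str,
    request_short_url: Callable[[str], str],  # hypothetical API wrapper
    max_attempts: int = 5,
) -> Optional[str]:
    """Ask for short URLs until one has no ambiguous characters, or give up."""
    for _ in range(max_attempts):
        candidate = request_short_url(long_url)
        if not has_ambiguous_chars(candidate):
            return candidate
    return None  # every attempt contained an ambiguous character


# Usage with a stand-in for the API call that returns canned answers.
canned = iter(["https://short.example/a1b", "https://short.example/xyz"])
print(shorten_without_ambiguity("https://example.com/long", lambda _: next(canned)))
```

Whether repeated requests actually return different short URLs depends on the service, as noted above, so the loop is bounded rather than open-ended.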
Alternatively, use a font that distinguishes ambiguous chars, or better yet, color-code them (or underline numerals, or whatever visual mark)
I would like to receive a long string that contains spaces in a method of my Web API.
To my understanding I can't send a parameter with white space. Does it have to be encoded in some way?
EDIT:
My content type is:
Content-Type: application/x-www-form-urlencoded
I've changed it to several other types, but none of them allows me to receive a parameter with + instead of spaces.
My POST method signature is
public HttpResponseMessage EditCommentForExtension(string did, string extention, string comment)
Usually, parameters to an HTTP GET request are URL encoded. This means (among other things) that spaces are replaced by "+".
Using + to mean "space" in a URL is an internal convention used by some web sites, but it's not part of the URL encoding standard. If you want to use + to mean spaces, you are going to have to convert them yourself.
As you discovered, spaces (like everything else that needs encoding) should be encoded as %XX, where each X stands for a hex digit.
http://www.w3.org/Addressing/rfc1738.txt
The only thing that worked for me was to send %20 instead of the spaces.
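For comparison, here is how a form-encoded body with either convention can be produced and parsed. This is a Python illustration, not the ASP.NET Web API side; the parameter names are taken from the method signature in the question, while the values are made up.

```python
from urllib.parse import parse_qs, quote, urlencode

# Parameter names come from the question; the values are invented examples.
params = {"did": "42", "extention": "101", "comment": "a long comment with spaces"}

# Default form encoding: spaces become "+".
body_plus = urlencode(params)
# Same body, but with spaces percent-encoded as %20.
body_pct = urlencode(params, quote_via=quote)

print(body_plus)  # did=42&extention=101&comment=a+long+comment+with+spaces
print(body_pct)   # did=42&extention=101&comment=a%20long%20comment%20with%20spaces

# A form-aware parser decodes both variants to the same value.
print(parse_qs(body_plus)["comment"])  # ['a long comment with spaces']
print(parse_qs(body_pct)["comment"])   # ['a long comment with spaces']
```

If the receiving side does not treat + as a space, sending %20 as in body_pct sidesteps the question entirely.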
I'm trying to do it in VBScript/JScript, to avoid re-encoding.
Should I check whether there is a "%"? Does "%" have other uses in a URL?
Thanks.
Edit: Oh, the original encoding function may not be encodeURI.
I'm trying to collect URLs from the browser, and store them after encoding with encodeURI.
But if the URL is already encoded, another encoding will make it wrong.
I might try decoding it and comparing the result to the original URL. If it changed or got shorter, your original URL was probably already encoded.
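A minimal sketch of that decode-and-compare heuristic, in Python rather than VBScript/JScript (the function name looks_percent_encoded is my own):

```python
from urllib.parse import unquote


def looks_percent_encoded(url: str) -> bool:
    """Heuristic: decoding changes the string only if it contains %XX escapes."""
    return unquote(url) != url


print(looks_percent_encoded("http://example.com/a%20b"))  # True
print(looks_percent_encoded("http://example.com/a b"))    # False
print(looks_percent_encoded("http://example.com/100%"))   # False
```

It stays a heuristic: an unencoded URL that happens to contain something like "%41" as literal text would be reported as already encoded.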
Iterate over the chars in the URL and test for characters that aren't allowed in a URL.
If there are any, encode it.
If there aren't any illegal characters, it doesn't matter.
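One way to sketch this check, again in Python as an illustration. The allowed set below is my reading of RFC 3986 (unreserved and reserved characters) plus "%", which is an assumption made so that existing escapes survive; the name encode_if_needed is hypothetical.

```python
import re
from urllib.parse import quote

# Reserved characters from RFC 3986 plus "%" (kept so existing escapes survive).
# Letters, digits and "-._~" are unreserved and always allowed.
URI_PUNCTUATION = ":/?#[]@!$&'()*+,;=%"
ILLEGAL = re.compile("[^A-Za-z0-9._~" + re.escape("-" + URI_PUNCTUATION) + "]")


def encode_if_needed(url: str) -> str:
    """Percent-encode only when the string contains a character not allowed in a URL."""
    if ILLEGAL.search(url) is None:
        return url  # nothing illegal, so it can be stored as-is
    # Encode just the illegal characters; everything allowed is left alone.
    return quote(url, safe=URI_PUNCTUATION)


print(encode_if_needed("http://example.com/a b?x=1#f"))  # http://example.com/a%20b?x=1#f
print(encode_if_needed("http://example.com/a%20b"))      # http://example.com/a%20b
```

Because "%" counts as allowed, a partially encoded URL such as "http://example.com/a%20b c" comes out as "http://example.com/a%20b%20c" rather than being double-encoded.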