Every time I get an error when validating:
<iframe class="forecast" src="http://forecast.io/embed/#lat=-26.201560&lon=28.038995&name=Johannesburg,%20ZA&text-color=#ffffff&color=#ffffff&font=Helvetica&units=ca"></iframe>
Error (screenshot):
http://postimg.org/image/5h1kvzzuh/
I escaped the characters, but it didn't work.
Thanks.
W3C validator maintainer here. The short answer is to use the following instead:
<iframe class="forecast" src="http://forecast.io/embed/%23lat=-26.201560&lon=28.038995&name=Johannesburg,%20ZA&text-color=#ffffff&color=%23ffffff&font=Helvetica&units=ca"></iframe>
That is, the fix is just to replace # with %23 (the percent-encoding of the # character).
Explanation
The specific problem in that URL is the character references for # that it contains (for example &#35;).
A character reference such as &#35; is just # (the “number-sign” or “hash” character), which is not a valid URL code point per the URL Standard, and so it’s not allowed in a URL.
The # character is only ever allowed in an absolute URL with fragment or relative URL with fragment, and then it is explicitly allowed only after the part the URL spec defines as the actual URL.
And for the purposes of URLs, the character reference and a literal # are exactly the same.
Hence, you must use it as %23 (that is, percent-encoded).
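To make the replacement concrete, here is a minimal Python sketch. It assumes the goal is simply to turn every literal # in the src value into %23, as described above; the function name percent_encode_hashes and the shortened URL are made up for this example.

```python
from urllib.parse import quote

def percent_encode_hashes(url: str) -> str:
    """Replace every literal '#' with its percent-encoded form."""
    # quote("#", safe="") yields "%23"; building it this way avoids hard-coding it.
    return url.replace("#", quote("#", safe=""))

# Shortened version of the URL from the question (an assumption for brevity).
src = "http://forecast.io/embed/#lat=-26.201560&color=#ffffff&units=ca"
encoded = percent_encode_hashes(src)
print(encoded)
# http://forecast.io/embed/%23lat=-26.201560&color=%23ffffff&units=ca
```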
P.S. I plan to get the URL checker in the validator updated to actually report the particular illegal characters it finds in URLs but it will be a while yet before I can get that refinement made.
Related
Due to some odd circumstances I need to use uriQuery() in a Power Automate flow in order to extract the query string from a URL.
This works as expected in most circumstances, except when the url contains special characters like accented letters, for example
http://www.example.com/peppers/Jalapeño/recipe #1.docx
In such cases the call triggers an error and the exception message shows a (partially) encoded version of my url (why?).
The template language function 'uriQuery' expects its parameter to be a well-formed absolute URI. The provided value was '......'
Obviously the url was indeed a well-formed, absolute URI.
Since the error only triggers when the url contains special characters I assumed that I had to encode the value before calling uriQuery(), yet nothing I tried seems to work (for example encodeUriComponent() ). And as expected nothing I could find on the web mentioned a similar issue.
As a last attempt I am asking here - does uriQuery() support this use-case? And if it does... how?
I tried to find a solution for this issue but nothing worked. When my REST api URI request is, ex. https://serverip/meeting/userlist/0
I always get the error "The URI you submitted has disallowed characters". I have even tried to leave this parameter in the config file blank:
$config['permitted_uri_chars'] = 'a-z 0-9~%.:_-+';
But I get the same error.
Is it not allowed to have a 0 at the end of the URI as the only content of that segment? I need that to retrieve the user with id = 0.
Thanks a lot.
EDIT - SOLVED:
Hi Again,
I finally solved it. A long time ago we commented out a check related to UTF-8 encoding in URI.php:
if ( ! empty($str) && ! empty($this->_permitted_uri_chars) && ! preg_match('/^['.$this->_permitted_uri_chars.']+$/i'.(UTF8_ENABLED ? 'u' : ''), $str))
We had left only the first condition. We had some code issues that no longer seem to reproduce after reverting that change, and /0 now works fine.
So sorry, in the end it was a problem related to our own modifications.
Thanks.
$config['permitted_uri_chars'] is used as a PCRE character class pattern.
With the last character in there being a dash, it looks for a dash. However, when a dash is between two characters, it triggers a range search. So ... when you append the + (plus) sign after the dash, you get:
[_-+] // a range between underscore and plus in the ASCII table
You might be thinking "So what? Zeros are already allowed previously via 0-9", and you'd be correct, but that's not the problem. The problem is that the plus sign has a lower ASCII number than the underscore, and ranges don't work backwards, so _-+ is invalid and triggers a PCRE compilation failure, which in turn means the entire check fails and nothing is actually allowed.
You would see this if you had error_reporting enabled and/or looked at the error logs.
This doesn't happen if you only append the plus sign to the default pattern - the dash is not only the last character, but also escaped with a backslash - as you'd have this instead:
[_\-+] // Underscore, dash and plus sign as individual characters; not a range
I guess you thought it was an actual character to be allowed and removed it. Just add it back:
$config['permitted_uri_chars'] = 'a-z 0-9~%.:_\-+';
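The reversed-range failure is easy to reproduce outside CodeIgniter. The sketch below uses Python's re module as a stand-in for PHP's PCRE, which is an assumption: Python raises an exception where PHP emits a warning and makes preg_match() fail, but the cause (a range whose end sorts before its start) is the same. The helper name compiles is made up for this example.

```python
import re

def compiles(char_class: str) -> bool:
    """Return True if the character class body compiles inside ^[...]+$."""
    try:
        re.compile("^[" + char_class + "]+$", re.IGNORECASE)
        return True
    except re.error:
        # Raised for '_-+' because '+' (43) sorts before '_' (95).
        return False

broken = "a-z 0-9~%.:_-+"    # '_-+' is read as a reversed range
fixed = r"a-z 0-9~%.:_\-+"   # the escaped dash is a literal character

print(compiles(broken))  # False
print(compiles(fixed))   # True
print(bool(re.match("^[" + fixed + "]+$", "0")))  # True
```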
I'm looking over Section 3.4 of RFC 3986 trying to understand what constitutes a valid URI query parameter key, but I'm not seeing a clear answer.
The reason I'm asking is because I'm writing a Ruby class that composes a URI with query parameters. When a new parameter is added I want to validate the key. Based on experience, it seems like the key will be invalid if it requires any escaping.
I should also say that I plan to validate the key. I'm not sure how to go about validating this data either, but I do know that in all cases I should escape this value.
Advice is appreciated. Advice in the context of how validation might already be possible through say a Ruby Gem would also be a plus.
I could well be wrong, but that spec seems to say that anything following '?' or '#' is valid as long. I wonder if you should be looking more at the spec for 'application/x-www-form-urlencoded' (ie. the key/value pairs we're all used to)?
http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.1
This is the default content type. Forms submitted with this content
type must be encoded as follows:
Control names and values are escaped. Space characters are replaced by `+', and then reserved characters are escaped as described in [RFC1738], section 2.2: Non-alphanumeric characters are replaced by `%HH', a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks are represented as "CR LF" pairs (i.e., `%0D%0A').
The control names/values are listed in the order they appear in the document. The name is separated from the value by `=' and name/value pairs are separated from each other by `&'.
I don't believe key=value is part of the RFC; it's a convention that has emerged. Wikipedia suggests this is a 'W3C recommendation'.
Seems like some good stuff to be found searching on the application/x-www-form-urlencoded content type.
http://www.w3.org/TR/REC-html40/interact/forms.html#form-data-set
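For a feel of what that encoding does to keys and values, here is a small sketch. It is in Python rather than Ruby (an assumption on my part, chosen only for illustration) and uses the standard library's urlencode and parse_qsl; the sample keys and values are invented.

```python
from urllib.parse import urlencode, parse_qsl

# Keys and values that both need escaping under form-urlencoding.
params = [("full name", "Ada & Bob"), ("q", "a=b")]

query = urlencode(params)
print(query)  # full+name=Ada+%26+Bob&q=a%3Db

# Decoding restores the original pairs, so a key that needs escaping
# is not invalid as such; it just has to be escaped when it is written out.
print(parse_qsl(query))
```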
Is there a way to tell the Google URL shortener (programmatically, I mean through their API) not to produce short URLs with characters like:
0 O
1 l
People often make mistakes when reading those characters from displays and typing them elsewhere.
You cannot request the API to use a custom charset, so no.
Not a proper solution, but you could check the URL for unwanted characters and request another short URL for the same long URL until you get one you like. The Google URL shortener issues a unique short URL for an already shortened URL if you provide an OAuth token with the request. However, I am not sure whether a user is limited to one unique short URL per long URL, in which case this won't work either.
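A rough sketch of that retry idea, in Python. The shorten argument is a hypothetical callable standing in for whatever API call you use (nothing here talks to Google), and the loop only helps if the service really does hand back a different short URL on each request, which is the uncertain part noted above.

```python
from typing import Callable, Optional

AMBIGUOUS = set("0O1l")  # the characters listed in the question

def is_readable(short_url: str) -> bool:
    """True if the last path segment has no easily confused characters."""
    # Only the code after the final '/' is checked, not the host name.
    code = short_url.rstrip("/").rsplit("/", 1)[-1]
    return not (set(code) & AMBIGUOUS)

def shorten_readable(long_url: str,
                     shorten: Callable[[str], str],
                     max_attempts: int = 5) -> Optional[str]:
    """Call shorten() until a readable short URL comes back, or give up."""
    for _ in range(max_attempts):
        candidate = shorten(long_url)
        if is_readable(candidate):
            return candidate
    return None

# Usage with a fake shortener that returns canned results.
fake_results = iter(["http://goo.gl/a0Bc", "http://goo.gl/xYz9"])
result = shorten_readable("http://example.com/long", lambda _url: next(fake_results))
print(result)  # http://goo.gl/xYz9
```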
Since you're doing it programmatically, you could swap out those chars for their ascii value, '%6F' for the letter o, for instance. In this case, just warn the users that in doubt, it's a numeral.
Alternatively, use a font that distinguishes ambiguous chars, or better yet, color-code them (or underline numerals, or whatever visual mark)
I've written a more detailed post about this on my blog at:
http://idisposable.co.uk/2010/07/chrome-are-you-sanitising-my-inputs-without-my-permission/
but basically, I have a string which is:
||abcdefg
hijklmn
opqrstu
vwxyz
||
I've added the pipes to give an indication of where the string starts and ends; in particular, note the final carriage return on the last line.
I need to put this into a hidden form variable to post off to a supplier.
In basically any browser except Chrome, I get the following:
<input type="hidden" id="pareqMsg" value="abcdefg
hijklmn
opqrstu
vwxyz
" />
but in chrome, it seems to apply a .Trim() or something else that gives me:
<input type="hidden" id="pareqMsg" value="abcdefg
hijklmn
opqrstu
vwxyz" />
Notice it's cut off the last carriage return. These carriage returns (when Encoded) come up as %0A if that helps.
Basically, in any browser except chrome, the whole thing just works and I get the desired response from the third party. In Chrome, I get an 'invalid pareq' message (which suggests to me that those last carriage returns are important to the supplier).
Chrome version is 5.0.375.99
Am I going mad, or is this a bug?
Cheers,
Terry
You can't rely on form submission to preserve the exact character data you include in the value of a hidden field. I've had issues in the past with Firefox converting CRLF (\r\n) sequences into bare LFs, and your experience shows that Chrome's behaviour is similarly confusing.
And it turns out, it's not really a bug.
Remember that what you're supplying here is an HTML attribute value - strictly, the HTML 4 DTD defines the value attribute of the <input> element as of type CDATA. The HTML spec has this to say about CDATA attribute values:
User agents should interpret attribute values as follows:
Replace character entities with characters,
Ignore line feeds,
Replace each carriage return or tab with a single space.
User agents may ignore leading and trailing white space in CDATA attribute values (e.g., " myval " may be interpreted as "myval"). Authors should not declare attribute values with leading or trailing white space.
So whitespace within the attribute value is subject to a number of user agent transformations - conforming browsers should apparently be discarding all your linefeeds, not only the trailing one - so Chrome's behaviour is indeed buggy, but in the opposite direction to the one you want.
However, note that the browser is also expected to replace character entities with characters - which suggests you ought to be able to encode your CRs and LFs as &#13; and &#10;, and even spaces as &#32;, eliminating any actual whitespace characters from your value field altogether.
However, browser compliance with these SGML parsing rules is, as you've found, patchy, so your mileage may certainly vary.
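If you generate the hidden field on the server, the character-reference idea could look roughly like this. It is a Python sketch under the assumption that you build the attribute value yourself; encode_attr_whitespace is a made-up helper name, and whether a given browser then preserves the characters is exactly the part that may vary.

```python
import html

def encode_attr_whitespace(value: str) -> str:
    """Escape a string for an HTML attribute, with CR, LF and space
    written as numeric character references instead of raw whitespace."""
    # Escape &, <, > and quotes first so the references added below
    # are not themselves escaped.
    escaped = html.escape(value, quote=True)
    return (escaped.replace("\r", "&#13;")
                   .replace("\n", "&#10;")
                   .replace(" ", "&#32;"))

message = "abcdefg\r\nhijklmn\r\n"
attr = encode_attr_whitespace(message)
print(attr)  # abcdefg&#13;&#10;hijklmn&#13;&#10;
print('<input type="hidden" id="pareqMsg" value="' + attr + '" />')
```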
Confirmed it here. It trims trailing CRLFs, they don't get parsed into the browser's DOM (I assume for all HTML attributes).
If you append CRLF with script, e.g.
var pareqMsg = document.forms[0]['pareqMsg'];
if (!/\r\n$/.test(pareqMsg.value)) {
    pareqMsg.value += '\r\n';
}
...they do get maintained and POSTed back to the server. Although the hidden <textarea> idea suggested by Gaby might be easier!
Normally you cannot enter a newline in an input box from the keyboard, so perhaps Chrome enforces this even for values embedded through attributes.
Try using a textarea (with display:none).