What constitutes a valid URI query parameter key? - ruby

I'm looking over Section 3.4 of RFC 3986 trying to understand what constitutes a valid URI query parameter key, but I'm not seeing a clear answer.
The reason I'm asking is because I'm writing a Ruby class that composes a URI with query parameters. When a new parameter is added I want to validate the key. Based on experience, it seems like the key will be invalid if it requires any escaping.
I should also say that I plan to validate the key. I'm not sure how to go about validating this data either, but I do know that in all cases I should escape this value.
Advice is appreciated. Advice in the context of how validation might already be possible through say a Ruby Gem would also be a plus.

I could well be wrong, but that spec seems to say that anything following '?' or '#' is valid as long. I wonder if you should be looking more at the spec for 'application/x-www-form-urlencoded' (ie. the key/value pairs we're all used to)?
http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.1
This is the default content type. Forms submitted with this content
type must be encoded as follows:
Control names and values are escaped. Space characters are replaced by +', and then reserved characters are escaped as described in [RFC1738], section 2.2: Non-alphanumeric characters are replaced by %HH', a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks are represented as "CR LF" pairs (i.e., `%0D%0A').
The control names/values are listed in the order they appear in the document. The name is separated from the value by =' and name/value pairs are separated from each other by &'.

I don't believe key=value is part of the RFC, it's a convention that has emerged. Wikipedia suggests this is an 'W3C recommendation'.
Seems like some good stuff to be found searching on the application/x-www-form-urlencoded content type.
http://www.w3.org/TR/REC-html40/interact/forms.html#form-data-set

Related

How to use a DN containing commas as the attribute value in an LDAP search filter?

Was attempting to search our directory based on an attribute whose value is a DN. However, our user RDNs are of the form CN=Surname, GivenName, which requires that the comma be quoted in the full DN. But given an attribute like manager whose value is the DN of another user, I was unable to search for all users having specific manager. I tried (manager=CN=Surname\, GivenName,CN=users,DC=mydomain,DC=com), but got a syntax error "Bad search filter". I tried various options for quoting the DN, but all either gave me a syntax error or failed to match any objects. What am I doing wrong?
(Note that if I were looking for user objects directly, I could search for simply (CN=Surname, GivenName), with no quoting required, but I was searching for users having a specific manager. The comma-containing attribute value only becomes a problem when part of a Distinguished Name.)
The problem is that quoting the comma in the Common Name is not for the benefit of the filter parser, but for the benefit of the DN parser; the attribute value passed to that by the filter has to literally contain the backslash character. Unfortunately, the backslash is also (differently) special in LDAP filters, thus the syntax errors.
The solution is simple, but it isn't as obvious as doubling the backslash; backslash in LDAP filters works like % in URIs, so you have to use a literal backslash followed by the 2-digit hexadecimal code point for a backslash:
(manager=CN=Surname\5c, Givenname,OU=org,DC=mydomain,DC=com)
It turns out there's an example of this specific use case at the very bottom of https://docs.oracle.com/cd/E19424-01/820-4811/gdxpo/index.html#6ng8i269q.

How to Escape Double Quotes from Ruby Page Object text

In using the Page Object gem, I'm trying to pull text from a page to verify error messages. One of these error messages contains double-quotes, but when the page object pulls the text from the page, it pulls some other characters.
expected ["Please select a category other than the Default â?oEMSâ?? before saving."]
to include "Please select a category other than the Default \"EMS\" before saving."
(RSpec::Expectations::ExpectationNotMetError)
I'm not quite sure how to escape these - I'm not sure where I could use Regexs and be able to escape these odd characters.
Honestly you are over complicating your validation.
I would recommend simplifying what you are trying to do, start by asking yourself: Is the part in quotes a critical part of your validation?
If it is, isolate it by doing a String.contains("EMS")
If it is not, then you are probably doing too much work, only check for exactly what you need in validation:
String.beginsWith("Please select a category other than the Default")
With respect to the actual issue you are having, on a technical level you have an encoding issue. Encode your result string with utf-8 before you pass it to your validation and you will be fine.
Good luck
It's pretty likely that somewhere along the line encoded the string improperly. (A tipoff is the accented characters followed by ?.) It seems pretty likely that the quotes were converted to "smart quotes" somewhere. This table compares Window-1252 to UTF-8:
Code Point Characters UTF-8 Bytes
Unicode Windows
1252 Expected Actual
------ ---- - --- -----------
U+201C 0x93 “ “ %E2 %80 %9C
U+201D 0x94 ” †%E2 %80 %9D
What you'll want to do is spot check various places in the code to find the first place the string is encoded in something other than UTF-8:
puts error_str.encoding
(For clarity, error_str is the variable that holds the string you are testing. I'm using puts, but you might want have another way to log diagnostic messages.)
Once you find the string that's not encoded UTF-8, you can convert it:
error_str.encode('UTF-8')
Or, if the string is hardcoded somewhere, just replace the string.
For more debugging advice, see: 3 Steps to Fix Encoding Problems in Ruby and How to Get From They’re to They’re.

Can I treat all domain names as being IDNs without any ill effects?

From testing, it seems like trying to convert both IDNs and regular domain names 'just works' - eg, if the input doesn't need to be changed punycode will just return the input.
punycode.toASCII('lancôme.com');
returns:
'xn--lancme-lxa.com'
And
punycode.toASCII('apple.com');
returns:
'apple.com'
This looks great, but is it specified anywhere? Can I safely convert everything to punycode?
That is correct. If you look at how the procedure for converting unicode strings to ascii punycode, the process only alters any non-ascii character. Since regular domains cannot contain non-ascii characters, if your conversor is correctly implemented, it will never transform any pure-ascii string.
You can read more about how unicode is converted to punycode here: https://en.wikipedia.org/wiki/Punycode
Punycode is specified in RFC 3492: https://www.ietf.org/rfc/rfc3492.txt, and it clearly says:
"Basic code point segregation" is a very simple and
efficient encoding for basic code points occurring in the extended
string: they are simply copied all at once.
Therefore, if your extended string is made of basic code points, it will just be copied without change.

How do you check for a changing value within a string

I am doing some localization testing and I have to test for strings in both English and Japaneses. The English string might be 'Waiting time is {0} minutes.' while the Japanese string might be '待ち時間は{0}分です。' where {0} is a number that can change over the course of a test. Both of these strings are coming from there respective property files. How would I be able to check for the presence of the string as well as the number that can change depending on the test that's running.
I should have added the fact that I'm checking these strings on a web page which will display in the relevant language depending on the location of where they are been viewed. And I'm using watir to verify the text.
You can read elsewhere about various theories of the best way to do testing for proper language conversion.
One typical approach is to replace all hard-coded text matches in your code with constants, and then have a file that sets the constants which can be updated based on the language in use. (I've seen that done by wrapping the require of that file in a case statement based on the language being tested. Another approach is an array or hash for each value, enumerated by a variable with a name like 'language', which lets the tests change the language on the fly. So validations would look something like this
b.div(:id => "wait-time-message).text.should == WAIT_TIME_MESSAGE[language]
To match text where part is expected to change but fall within a predictable pattern, use a regular expression. I'd recommend a little reading about regular expressions in ruby, especially using unicode regular expressions in ruby, as well as some experimenting with a tool like Rubular to test regexes
In the case above a regex such as:
/Waiting time is \d+ minutes./ or /待ち時間は\d+分です。/
would match the messages above and expect one or more digits in the middle (note that it would fail if no digits appear, if you want zero or more digits, then you would need a * in place of the +
Don't check for the literal string. Check for some kind of intermediate form that can be used to render the final string.
Sometimes this is done by specifying a message and any placeholder data, like:
[ :waiting_time_in_minutes, 10 ]
Where that would render out as the appropriate localized text.
An alternative is to treat one of the languages as a template, something that's more limited in flexibility but works most of the time. In that case you could use the English version as the string that's returned and use a helper to render it to the final page.

Allowed characters in map key identifier in YAML?

Which characters are and are not allowed in a key (i.e. example in example: "Value") in YAML?
According to the YAML 1.2 specification simply advises using printable characters with explicit control characters being excluded (see here):
In constructing key names, characters the YAML spec. uses to denote syntax or special meaning need to be avoided (e.g. # denotes comment, > denotes folding, - denotes list, etc.).
Essentially, you are left to the relative coding conventions (restrictions) by whatever code (parser/tool implementation) that needs to consume your YAML document. The more you stick with alphanumerics the better; it has simply been our experience that the underscore has worked with most tooling we have encountered.
It has been a shared practice with others we work with to convert the period character . to an underscore character _ when mapping namespace syntax that uses periods to YAML. Some people have similarly used hyphens successfully, but we have seen it misconstrued in some implementations.
Any character (if properly quoted by either single quotes 'example' or double quotes "example"). Please be aware that the key does not have to be a scalar ('example'). It can be a list or a map.

Resources