Characters in string within post are being auto escaped - ruby

I am sending tokens via a POST request, but when I see them on the server it doesn't match up with what was sent.
"U2FsdGVkX1+pxBHFdSU4NiSIOdR2GCCBr/WF7AOSF5zQjRqjSoTeOKR0Dzwm\nNT+g\n" <-- Original
"U2FsdGVkX1+pxBHFdSU4NiSIOdR2GCCBr/WF7AOSF5zQjRqjSoTeOKR0Dzwm\\nNT+g\\n" <-- Result
Notice that the \n has been replaced with \\n. When I do the token lookup verification, of course, no result is found because the string I'm looking for is not the proper string anymore!
I'm not sure why this string is being auto changed like this or quite how to correct it. I'm just accessing this through the standard params like so.
token.verify(params["token"])
EDIT for further clarity
I'm viewing this from the terminal using the debugger gem. I have autoeval enabled and display with params["token"] without p or puts. I am not trying to create newline characters with \n. The literal \n is an actual part of the string that is received in the post. I randomly generate a token using a hashing and encryption library and the strings sometimes end up with these characters in them. If I run token.verify(params["token"]) from the debugger terminal I get nil back from the database as there is no match due to the extra backslash characters being added into the string.
If I directly run token.verify("U2FsdGVkX1+pxBHFdSU4NiSIOdR2GCCBr/WF7AOSF5zQjRqjSoTeOKR0Dzwm\nNT+g\n") from the debugger terminal I get the correct record back from the database. This leaves me thinking that either Rack or Sinatra is auto escaping the "special" characters in the string before I get a chance to even touch it.

This has something to with the way Ruby is handling special characters. From irb you can see this with a quick check like this.
"\\n" == '\n'
Unexpectedly; at least to me, this returns true as they are treated the same. Rather than trying to deal with special characters coming across the wire I ended up just base 64 encoding everything.

Related

Get the same results from string.start_with? and string[ ]

Basically, I want to check if a string (main) starts with another string (sub), using both of the above methods. For example, following is my code:
main = gets.chomp
sub = gets.chomp
p main.start_with? sub
p main[/^#{sub}/]
And, here is an example with I/O - Try it online!
If I enter simple strings, then both of them works exactly the same, but when I enter strings like "1\2" in stdin, then I get errors in the Regexp variant, as seen in TIO example.
I guess this is because of the reason that the string passed into second one isn't raw. So, I tried passing sub.dump into second one - Try it online!
which gives me nil result. How to do this correctly?
As a general rule, you should never ever blindly execute inputs from untrusted sources.
Interpolating untrusted input into a Regexp is not quite as bad as interpolating it into, say, Kernel#eval, because the worst thing an attacker can do with a Regexp is to construct an Evil Regex to conduct a Regular expression Denial of Service (ReDoS) attack (see also the section on Performance in the Regexp documentation), whereas with eval, they could execute arbitrary code, including but not limited to, deleting the entire file system, scanning memory for unencrypted passwords / credit card information / PII and exfiltrate that via the network, etc.
However, it is still a bad idea. For example, when I say "the worst thing that happen is a ReDoS", that assumes that there are no bugs in the Regexp implementation (Onigmo in the case of YARV, Joni in the case of JRuby and TruffleRuby, etc.) Ruby's Regexps are quite powerful and thus Onigmo, Joni and co. are large and complex pieces of code, and may very well have their own security holes that could be used by a specially crafted Regexp.
You should properly sanitize and escape the user input before constructing the Regexp. Thankfully, the Ruby core library already contains a method which does exactly that: Regexp::escape. So, you could do something like this:
p main[/^#{Regexp.escape(sub)}/]
The reason why your attempt at using String#dump didn't work, is that String#dump is for representing a String the same way you would have to write it as a String literal, i.e. it is escaping String metacharacters, not Regexp metacharacters and it is including the quote characters around the String that you need to have it recognized as a String literal. You can easily see that when you simply try it out:
sub.dump
#=> "\"1\\\\2\""
# equivalent to '"1\\2"'
So, that means that String#dump
includes the quotes (which you don't want),
escapes characters that don't need escaping in Regexp just because they need escaping in Strings (e.g. # or "), and
doesn't escape characters that don't need escaping in Strings (e.g. [, ., ?, *, +, ^, -).

How to Escape Double Quotes from Ruby Page Object text

In using the Page Object gem, I'm trying to pull text from a page to verify error messages. One of these error messages contains double-quotes, but when the page object pulls the text from the page, it pulls some other characters.
expected ["Please select a category other than the Default â?oEMSâ?? before saving."]
to include "Please select a category other than the Default \"EMS\" before saving."
(RSpec::Expectations::ExpectationNotMetError)
I'm not quite sure how to escape these - I'm not sure where I could use Regexs and be able to escape these odd characters.
Honestly you are over complicating your validation.
I would recommend simplifying what you are trying to do, start by asking yourself: Is the part in quotes a critical part of your validation?
If it is, isolate it by doing a String.contains("EMS")
If it is not, then you are probably doing too much work, only check for exactly what you need in validation:
String.beginsWith("Please select a category other than the Default")
With respect to the actual issue you are having, on a technical level you have an encoding issue. Encode your result string with utf-8 before you pass it to your validation and you will be fine.
Good luck
It's pretty likely that somewhere along the line encoded the string improperly. (A tipoff is the accented characters followed by ?.) It seems pretty likely that the quotes were converted to "smart quotes" somewhere. This table compares Window-1252 to UTF-8:
Code Point Characters UTF-8 Bytes
Unicode Windows
1252 Expected Actual
------ ---- - --- -----------
U+201C 0x93 “ “ %E2 %80 %9C
U+201D 0x94 ” †%E2 %80 %9D
What you'll want to do is spot check various places in the code to find the first place the string is encoded in something other than UTF-8:
puts error_str.encoding
(For clarity, error_str is the variable that holds the string you are testing. I'm using puts, but you might want have another way to log diagnostic messages.)
Once you find the string that's not encoded UTF-8, you can convert it:
error_str.encode('UTF-8')
Or, if the string is hardcoded somewhere, just replace the string.
For more debugging advice, see: 3 Steps to Fix Encoding Problems in Ruby and How to Get From They’re to They’re.

What constitutes a valid URI query parameter key?

I'm looking over Section 3.4 of RFC 3986 trying to understand what constitutes a valid URI query parameter key, but I'm not seeing a clear answer.
The reason I'm asking is because I'm writing a Ruby class that composes a URI with query parameters. When a new parameter is added I want to validate the key. Based on experience, it seems like the key will be invalid if it requires any escaping.
I should also say that I plan to validate the key. I'm not sure how to go about validating this data either, but I do know that in all cases I should escape this value.
Advice is appreciated. Advice in the context of how validation might already be possible through say a Ruby Gem would also be a plus.
I could well be wrong, but that spec seems to say that anything following '?' or '#' is valid as long. I wonder if you should be looking more at the spec for 'application/x-www-form-urlencoded' (ie. the key/value pairs we're all used to)?
http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.1
This is the default content type. Forms submitted with this content
type must be encoded as follows:
Control names and values are escaped. Space characters are replaced by +', and then reserved characters are escaped as described in [RFC1738], section 2.2: Non-alphanumeric characters are replaced by %HH', a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks are represented as "CR LF" pairs (i.e., `%0D%0A').
The control names/values are listed in the order they appear in the document. The name is separated from the value by =' and name/value pairs are separated from each other by &'.
I don't believe key=value is part of the RFC, it's a convention that has emerged. Wikipedia suggests this is an 'W3C recommendation'.
Seems like some good stuff to be found searching on the application/x-www-form-urlencoded content type.
http://www.w3.org/TR/REC-html40/interact/forms.html#form-data-set

Ruby hexacode to unicode conversion

I crawled a website which contains unicode, an the results look something like, if in code
a = "\\u2665 \\uc624 \\ube60! \\uc8fd \\uae30 \\uc804 \\uc5d0"
May I know how do I do it in Ruby to convert it back to the original Unicode text which is in UTF-8 format?
If you have ruby 1.9, you can try:
a.force_encoding('UTF-8')
Otherwise if you have < 1.9, I'd suggest reading this article on converting to UTF-8 in Ruby 1.8.
short answer: you should be able to 'puts a', and see the string printed out. for me, at least, I can print out that string in both 1.8.7 and 1.9.2
long answer:
First thing: it depends on if you're using ruby 1.8.7, or 1.9.2, since the way strings and encodings were handled changed.
in 1.8.7:
strings are just lists of bytes. when you print them out, if your OS can handle it, you can just 'puts a' and it should work correctly. if you do a[0], you'll get the first byte. if you want to get each character, things are pretty darn tricky.
in 1.9.2
strings are lists of bytes, with an encoding. If the webpage was sent with the correct encoding, your string should already be encoded correctly. if not, you'll have to set it (as per Mike Lewis's answer). if you do a[0], you'll get the first character (the heart). if you want each byte, you can do a.bytes.
If your OS, for whatever reason, is giving you those literal ascii characters,my previous answer is obviously invalid, disregard it. :P
here's what you can do:
a.gsub(/\\u([a-z0-9]+)/){|p| [$1.to_i(16)].pack("U")}
this will scan for the ascii string '\u' followed by a hexadecimal number, and replace it with the correct unicode character.
You can also specify the encoding when you open a new IO object: http://www.ruby-doc.org/core/classes/IO.html#M000889
Compared to Mike's solution, this may prevent troubles if you forget to force the encoding before exposing the string to the rest of your application, if there are multiple mechanisms for retrieving strings from your module or class. However, if you begin crawling SJIS or KOI-8 encoded websites, then Mike's solution will be easier to adapt for the character encoding name returned by the web server in its headers.

Bug in Chrome, or Stupidity in User? Sanitising inputs on forms?

I've written a more detailed post about this on my blog at:
http://idisposable.co.uk/2010/07/chrome-are-you-sanitising-my-inputs-without-my-permission/
but basically, I have a string which is:
||abcdefg
hijklmn
opqrstu
vwxyz
||
the pipes I've added to give an indiciation of where the string starts and ends, in particular note the final carriage return on the last line.
I need to put this into a hidden form variable to post off to a supplier.
In basically, any browser except chrome, I get the following:
<input type="hidden" id="pareqMsg" value="abcdefg
hijklmn
opqrstu
vwxyz
" />
but in chrome, it seems to apply a .Trim() or something else that gives me:
<input type="hidden" id="pareqMsg" value="abcdefg
hijklmn
opqrstu
vwxyz" />
Notice it's cut off the last carriage return. These carriage returns (when Encoded) come up as %0A if that helps.
Basically, in any browser except chrome, the whole thing just works and I get the desired response from the third party. In Chrome, I get an 'invalid pareq' message (which suggests to me that those last carriage returns are important to the supplier).
Chrome version is 5.0.375.99
Am I going mad, or is this a bug?
Cheers,
Terry
You can't rely on form submission to preserve the exact character data you include in the value of a hidden field. I've had issues in the past with Firefox converting CRLF (\r\n) sequences into bare LFs, and your experience shows that Chrome's behaviour is similarly confusing.
And it turns out, it's not really a bug.
Remember that what you're supplying here is an HTML attribute value - strictly, the HTML 4 DTD defines the value attribute of the <input> element as of type CDATA. The HTML spec has this to say about CDATA attribute values:
User agents should interpret attribute values as follows:
Replace character entities with characters,
Ignore line feeds,
Replace each carriage return or tab with a single space.
User agents may ignore leading and trailing white space in CDATA attribute values (e.g., " myval " may be interpreted as "myval"). Authors should not declare attribute values with leading or trailing white space.
So whitespace within the attribute value is subject to a number of user agent transformations - conforming browsers should apparently be discarding all your linefeeds, not only the trailing one - so Chrome's behaviour is indeed buggy, but in the opposite direction to the one you want.
However, note that the browser is also expected to replace character entities with characters - which suggests you ought to be able to encode your CRs and LFs as 
 and
, and even spaces as , eliminating any actual whitespace characters from your value field altogether.
However, browser compliance with these SGML parsing rules is, as you've found, patchy, so your mileage may certainly vary.
Confirmed it here. It trims trailing CRLFs, they don't get parsed into the browser's DOM (I assume for all HTML attributes).
If you append CRLF with script, e.g.
var pareqMsg = document.forms[0]['pareqMsg']
if (/\r\n$/.test(pareqMsg.value) == false)
pareqMsg.value += '\r\n';
...they do get maintained and POSTed back to the server. Although the hidden <textarea> idea suggested by Gaby might be easier!
Normally in an input box you cannot enter (by keyboard) a newline.. so perhaps chrome enforces this even for embedded, through the attributes, values ..
try using a textarea (with display:none)..

Resources