In using the Page Object gem, I'm trying to pull text from a page to verify error messages. One of these error messages contains double quotes, but when the page object pulls the text from the page, the quotes come back as garbled characters.
expected ["Please select a category other than the Default â?oEMSâ?? before saving."]
to include "Please select a category other than the Default \"EMS\" before saving."
(RSpec::Expectations::ExpectationNotMetError)
I'm not quite sure how to escape these, and I'm not sure where I could use a regex to match these odd characters.
Honestly, you are overcomplicating your validation.
I would recommend simplifying what you are trying to do. Start by asking yourself: is the part in quotes a critical part of your validation?
If it is, isolate it:
error_str.include?("EMS")
If it is not, then you are probably doing too much work; only check for exactly what you need in the validation:
error_str.start_with?("Please select a category other than the Default")
With respect to the actual issue you are having: on a technical level, you have an encoding issue. Encode your result string as UTF-8 before you pass it to your validation and you will be fine.
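For example, a minimal sketch (error_str stands in for whatever variable holds the text the page object returned):
error_str = error_str.encode('UTF-8')
expect(error_str).to include('EMS')
expect(error_str).to start_with('Please select a category other than the Default')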
Good luck
It's pretty likely that something along the line encoded the string improperly. (A tip-off is the accented characters followed by ?.) The quotes were most likely converted to "smart quotes" somewhere. This table compares Windows-1252 to UTF-8:
Code Point           Characters              UTF-8 Bytes
Unicode   Win-1252   Expected   Actual
-------   --------   --------   ------       -----------
U+201C    0x93       “          â€œ          %E2 %80 %9C
U+201D    0x94       ”          â€?          %E2 %80 %9D
(In the last row, ? stands for the byte 0x9D, which has no character assigned in Windows-1252.)
What you'll want to do is spot check various places in the code to find the first place the string is encoded in something other than UTF-8:
puts error_str.encoding
(For clarity, error_str is the variable that holds the string you are testing. I'm using puts, but you might want another way to log diagnostic messages.)
Once you find the string that's not encoded UTF-8, you can convert it:
error_str.encode('UTF-8')
Or, if the string is hardcoded somewhere, just replace the string.
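As a concrete sketch of the repair, assuming the text picked up a round trip through Windows-1252 (the usual cause of â€œ-style mojibake):
mangled = "â€œEMS"                        # what the page object handed back
fixed = mangled.encode('Windows-1252').force_encoding('UTF-8')
puts fixed                                # => “EMS
(The closing quote cannot round-trip this way: its 0x9D byte is unmapped in Windows-1252, which is why it showed up as ? in the first place.)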
For more debugging advice, see: 3 Steps to Fix Encoding Problems in Ruby and How to Get From Theyâ€™re to They’re.
From testing, it seems like converting both IDNs and regular domain names 'just works'; e.g., if the input doesn't need to be changed, punycode will just return the input.
punycode.toASCII('lancôme.com');
returns:
'xn--lancme-lxa.com'
And
punycode.toASCII('apple.com');
returns:
'apple.com'
This looks great, but is it specified anywhere? Can I safely convert everything to punycode?
That is correct. If you look at the procedure for converting Unicode strings to ASCII Punycode, you'll see that it only alters non-ASCII characters. Since regular domains cannot contain non-ASCII characters, a correctly implemented converter will never transform a pure-ASCII string.
You can read more about how unicode is converted to punycode here: https://en.wikipedia.org/wiki/Punycode
Punycode is specified in RFC 3492: https://www.ietf.org/rfc/rfc3492.txt, and it clearly says:
"Basic code point segregation" is a very simple and
efficient encoding for basic code points occurring in the extended
string: they are simply copied all at once.
Therefore, if your extended string is made up entirely of basic code points, it will just be copied without change.
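That means a blanket normalization step is safe. A sketch using the same punycode library as in the question (normalize is just a hypothetical helper name):
function normalize(host) {
  return punycode.toASCII(host.toLowerCase());
}
normalize('lancôme.com'); // 'xn--lancme-lxa.com'
normalize('APPLE.com');   // 'apple.com', pure ASCII passes through untouched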
I have a DB using windows-1252 character encoding and dynamic SQL that does simple single quote escaping like this...
l_str := REPLACE(TRIM(someUserInput),'''','''''');
Because the DB is windows-1252, when the notorious Unicode character 'MODIFIER LETTER APOSTROPHE' (U+02BC) is sent, it gets converted.
Example: The front end app submits this...
TESTʼEND
But ends up searching on this...
and someColumn like '%TESTÊ¼END%'
What I want to know is: since the ʼ was converted into Ê¼ (which luckily is harmless and just yields wrong search results), is there any scenario where a non-windows-1252 character can be converted into something that WILL break this, thus making SQL injection possible?
I know about bind variables, and I know the DB should be Unicode as well; that's not what I'm asking here. I need proof that what you see above is not safe. I have searched for days and cannot find a way to cause SQL injection through simple single-quote escaping like this when the DB is windows-1252. Thanks!
Oh, and always assume the column being searched is a varchar, not a number. I am aware of the issues and how things change when dealing with numbers. So assume this is always the case:
l_str := REPLACE(TRIM(someUserInput),'''','''''');
...
... and someVarcharColumn like '%'||l_str||'%'
Putting the argument for bind variables aside, since you said you wanted proof that it could break without them, here's what's going on in your example.
The Unicode character 'MODIFIER LETTER APOSTROPHE' (U+02BC) in UTF-8 is made up of 2 bytes - 0xCA 0xBC.
Of those, 0xCA is 'LATIN CAPITAL LETTER E WITH CIRCUMFLEX', which looks like Ê, and 0xBC is 'VULGAR FRACTION ONE QUARTER', which looks like ¼.
This happens because your client probably uses an encoding that supports multi-byte characters but your DB doesn't. You would want to make sure that the encoding in both database and client is the same to avoid these issues.
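You can watch this happen from the database side. On Oracle, DUMP shows the bytes that were actually stored (a sketch, assuming a WE8MSWIN1252 database; the output line is illustrative):
SELECT DUMP('TESTÊ¼END', 16) FROM dual;
-- Typ=96 Len=9: 54,45,53,54,ca,bc,45,4e,44  (0xCA 0xBC where a single U+02BC was meant)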
Coming back to the question: is it possible that dynamic SQL without bind variables can be injected into because of these special Unicode characters? The answer is probably yes.
All you need to break that dynamic SQL using this encoding difference is a multibyte character, one of whose bytes is 0x27, which is an apostrophe.
I said 'probably' because a quick search on fileformat.info for a multibyte character containing 0x27 didn't give me anything back (and in UTF-8 it never will: every byte of a multi-byte sequence has its high bit set, so 0x27 cannot appear inside one). However, that doesn't mean it isn't possible; a different client could use a different encoding.
I would recommend never using dynamic SQL where input parameter values are concatenated in without bind variables, irrespective of whatever encoding you choose. You're just setting yourself up for so many problems going forward, apart from the performance penalty of a hard parse every single time.
Edit: And of course, most importantly, there is nothing stopping your client from sending an actual apostrophe instead of the Unicode multibyte character, and that would be your definitive proof that the SQL is not safe and can be injected into.
Edit 2: I missed your first part, where you replace one apostrophe with two. That should technically take care of the multibyte characters too. I'd still advise against this approach.
Your problem is not really SQL injection; the problem is the character set of your front-end app.
Your front end app sends the text in UTF-8, however the database "thinks" it is a Windows-1252 string.
Set your client NLS_LANG value to AMERICAN_AMERICA.AL32UTF8 (you may choose a different territory and/or language), then it should look better.
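For example, in the environment that launches your client (a sketch; adjust territory and language as needed):
export NLS_LANG=AMERICAN_AMERICA.AL32UTF8
or, on Windows:
set NLS_LANG=AMERICAN_AMERICA.AL32UTF8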
Then your front-end app sends the string in UTF-8 and the database recognizes it as UTF-8; it is converted to Windows-1252 internally. In case you enter a string which is not supported by CP1252 (e.g. CYRILLIC CAPITAL LETTER ZHE, Ж), it will end up as a replacement character (something like ¿), which should still be fine in terms of SQL injection.
See this answer to get more information about database and client character sets.
I scraped some text from the internet, which I put in a UTF8String. I can use this string normally, but when I select some specific characters (strange accented characters, like ú in my case), I get an error saying that I used invalid indexes. This only happens when the string contains strange characters; my code works with normal strings that do not contain them.
Any way to solve this?
EDIT:
I have a variable word of type SubString{UTF8String}.
When I call method(word), no problems occur. When I call method(word[2:end]) (assuming a length of at least 2), I get an error in case the second character is strange (i.e. not ASCII).
Julia indexes strings by byte position instead of character position. That is much more efficient for a variable-length encoding like UTF-8, but it means some operations need a bit more boilerplate.
The problem is that some code points are encoded as multiple bytes, so when you slice the string from 2:end you can get half of the first character, which is an invalid index and raises an error.
The solution is to start the slice at the next valid index instead of 2. That is something like str[nextind(str, 1):end].
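For example (a sketch; in current Julia the same idea applies to String, which replaced UTF8String):
str = "úvw"                  # 'ú' occupies bytes 1 and 2
str[2:end]                   # error: byte 2 is in the middle of 'ú'
str[nextind(str, 1):end]     # "vw": nextind returns the next character boundary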
PS. Sorry for a less than clear answer on my phone.
EDIT:
I tried this, and it seems like SubString{UTF8String} and UTF8String have different behaviour on slicing. I've reported it as bug #7811 on GitHub.
I am sending tokens via a POST request, but when I see them on the server they don't match what was sent.
"U2FsdGVkX1+pxBHFdSU4NiSIOdR2GCCBr/WF7AOSF5zQjRqjSoTeOKR0Dzwm\nNT+g\n" <-- Original
"U2FsdGVkX1+pxBHFdSU4NiSIOdR2GCCBr/WF7AOSF5zQjRqjSoTeOKR0Dzwm\\nNT+g\\n" <-- Result
Notice that the \n has been replaced with \\n. When I do the token lookup verification, of course, no result is found, because the string I'm looking up is no longer the proper string!
I'm not sure why this string is being changed automatically, or quite how to correct it. I'm just accessing it through the standard params, like so.
token.verify(params["token"])
EDIT for further clarity
I'm viewing this from the terminal using the debugger gem. I have autoeval enabled and display the value with params["token"], without p or puts. I am not trying to create newline characters with \n; the literal \n is an actual part of the string received in the POST. I randomly generate a token using a hashing and encryption library, and the strings sometimes end up with these characters in them. If I run token.verify(params["token"]) from the debugger terminal, I get nil back from the database, as there is no match due to the extra backslash characters being added to the string.
If I directly run token.verify("U2FsdGVkX1+pxBHFdSU4NiSIOdR2GCCBr/WF7AOSF5zQjRqjSoTeOKR0Dzwm\nNT+g\n") from the debugger terminal, I get the correct record back from the database. This leaves me thinking that either Rack or Sinatra is auto-escaping the "special" characters in the string before I get a chance to even touch it.
This has something to do with the way Ruby handles special characters. From irb you can see it with a quick check like this.
"\\n" == '\n'
Unexpectedly (at least to me), this returns true: both literals are the same two-character string, a backslash followed by an n. Rather than trying to deal with special characters coming across the wire, I ended up just Base64-encoding everything.
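A quick irb sketch of both points (the token is just the example value from the question):
require 'base64'
'\n'.length    # => 2, single quotes: a backslash and an n
"\n".length    # => 1, double quotes: one real newline
'\n' == "\\n"  # => true
raw_token = "U2FsdGVkX1+pxBHFdSU4NiSIOdR2GCCBr/WF7AOSF5zQjRqjSoTeOKR0Dzwm\nNT+g\n"
wire = Base64.strict_encode64(raw_token)   # wire-safe: only A-Z, a-z, 0-9, +, /, =
Base64.strict_decode64(wire) == raw_token  # => true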
I have a string containing a special char like "\u2012", i.e. FIGURE DASH. When I try to print this on the console, I get a '?' mark instead of the symbol. I have an editor where I can insert the symbol using alt+numpad, like alt+2012. In the editor I can see the symbol, but when I save it in an XML file and read the value back using nodeValue, I get a '?' mark.
To summarize, I am having trouble reading extended Latin characters. What I need is: when I insert such symbols and read them back, I should get something like &#xXXXX;.
Please help!
TIA :)
Put simply: I have String inpath = "À"; and I want to get its Unicode value, like &#xXXXX;.
The default console encoding in Windows is some MS-DOS code page, and those don't support the character. You can try running chcp 65001 before running the program, but you might need to change the console font as well.
You don't need to do anything you wouldn't do with any other character, as long as you use UTF-8 consistently, and you aren't doing that in several places. You need to explicitly write your code to save and read the file in UTF-8, and not rely on the platform default encoding.
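For the &#xXXXX; output the question asks for, something like this works regardless of the console encoding (a sketch; the class and method names are made up):
public class Ncr {
    // Replace every non-ASCII code point with a numeric character
    // reference like &#x2012; so the output survives any encoding.
    static String toNcr(String s) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (cp < 0x80) {
                sb.appendCodePoint(cp);
            } else {
                sb.append(String.format("&#x%X;", cp));
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toNcr("À and \u2012")); // prints: &#xC0; and &#x2012;
    }
}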