Why is Ruby base64 encoded string different from all other base64 encoded strings? - ruby

For end-to-end encrypted communication between a client and a server, I am implementing an encryption/decryption scheme.
However, the encryption/decryption and Base64 encoding/decoding only work correctly when both ends are in Ruby.
The actual problem I see is with Ruby's Base64 encoding.
For example, let's say I have this (32 bytes) AES key:
"\"1\xAF\xC7\xC0\xA6\xC9\xBA\xD6\x9F\xBA\xD2\xC9\xBE\x0F\x8E*\x88\x87(\x9B\xCBp\x15!/\x13\x8F\xCE\xFB\x15\x9B"
that I am using to encrypt data in AES algorithm.
I want to send this key to a client in Base64 encoded format. For that, I have tried two ways, each of which produces different encoded output:
Key in double quotes
Base64.urlsafe_encode64("\"1\xAF\xC7\xC0\xA6\xC9\xBA\xD6\x9F\xBA\xD2\xC9\xBE\x0F\x8E*\x88\x87(\x9B\xCBp\x15!/\x13\x8F\xCE\xFB\x15\x9B")
# => "IjGvx8CmybrWn7rSyb4PjiqIhyiby3AVIS8Tj877FZs="
Key in single quotes
Base64.urlsafe_encode64('\"1\xAF\xC7\xC0\xA6\xC9\xBA\xD6\x9F\xBA\xD2\xC9\xBE\x0F\x8E*\x88\x87(\x9B\xCBp\x15!/\x13\x8F\xCE\xFB\x15\x9B')
# => "XCIxXHhBRlx4QzdceEMwXHhBNlx4QzlceEJBXHhENlx4OUZceEJBXHhEMlx4QzlceEJFXHgwRlx4OEUqXHg4OFx4ODcoXHg5Qlx4Q0JwXHgxNSEvXHgxM1x4OEZceENFXHhGQlx4MTVceDlC"
Output 1 is different from all the other libraries: Java, Swift, an online site, and another site all produce the same output as each other.
Output 2 matches the other libraries' output encoding. But I can't convert the AES key and the AES-encrypted data into single-quoted form: the encrypted data already contains single quotes and other illegal characters, and for that data Ruby's Base64 encoding does not work correctly.
Any help would be appreciated.

The problem is that when you use single quotes you get a different string:
a = "\"1\xAF\xC7\xC0\xA6\xC9\xBA\xD6\x9F\xBA\xD2\xC9\xBE\x0F\x8E*\x88\x87(\x9B\xCBp\x15!/\x13\x8F\xCE\xFB\x15\x9B"
#=> "\"1\xAF\xC7\xC0\xA6ɺ֟\xBA\xD2ɾ\u000F\x8E*\x88\x87(\x9B\xCBp\u0015!/\u0013\x8F\xCE\xFB\u0015\x9B"
b = '\"1\xAF\xC7\xC0\xA6\xC9\xBA\xD6\x9F\xBA\xD2\xC9\xBE\x0F\x8E*\x88\x87(\x9B\xCBp\x15!/\x13\x8F\xCE\xFB\x15\x9B'
#=> "\\\"1\\xAF\\xC7\\xC0\\xA6\\xC9\\xBA\\xD6\\x9F\\xBA\\xD2\\xC9\\xBE\\x0F\\x8E*\\x88\\x87(\\x9B\\xCBp\\x15!/\\x13\\x8F\\xCE\\xFB\\x15\\x9B"
a == b
=> false
a.bytes.count
=> 32
b.bytes.count
=> 108
a.length
=> 29
b.length
=> 108
So you can see that single quotes do not interpret the \xNN escape sequences: the backslashes stay in the string as literal characters (which inspect then shows double-escaped), leaving you with a 108-byte ASCII string instead of the 32 raw key bytes.
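A quick round-trip check (a minimal sketch, reusing the key literal from your question) shows that the double-quoted form really does hold the 32 raw key bytes:
require 'base64'

key = "\"1\xAF\xC7\xC0\xA6\xC9\xBA\xD6\x9F\xBA\xD2\xC9\xBE\x0F\x8E*\x88\x87(\x9B\xCBp\x15!/\x13\x8F\xCE\xFB\x15\x9B"
encoded = Base64.urlsafe_encode64(key)  # => "IjGvx8CmybrWn7rSyb4PjiqIhyiby3AVIS8Tj877FZs="
decoded = Base64.urlsafe_decode64(encoded)
decoded.bytes == key.bytes              # => true -- the same 32 bytes come back
decoded.bytesize                        # => 32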
However, as Jordan Running mentions in the comments, you should not roll your own encryption; it is not secure. Please read this answer to understand why.
It is a much better idea to use Ruby's openssl gem. For instructions on how to use it, please refer to the documentation here.
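For example, a minimal sketch of the kind of flow the openssl gem supports (the payload field names here are purely illustrative, not part of any particular protocol):
require 'openssl'
require 'base64'

cipher = OpenSSL::Cipher.new('aes-256-cbc')
cipher.encrypt
key = cipher.random_key   # 32 random bytes, also installed into the cipher
iv  = cipher.random_iv    # 16 random bytes, also installed into the cipher
ciphertext = cipher.update('message for the client') + cipher.final

# everything that leaves the process goes out as Base64 text
payload = {
  'key'  => Base64.urlsafe_encode64(key),
  'iv'   => Base64.urlsafe_encode64(iv),
  'data' => Base64.urlsafe_encode64(ciphertext)
}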

If you need single-quote behavior but cannot use single quotes, Ruby has %q literals: %q followed by brackets, parentheses, or almost any other delimiter character surrounding your string acts like a single-quoted string, e.g. %q|it's your string| or %q{it's your string}.

Related

How does one read files/text encrypted in AES 256 CBC by Ruby in Clojure

We have some JSON files that live on a filesystem in our internal network that are written by Ruby and read by Clojure (and also Ruby). We'd like to encrypt them to increase their security. We've used AES 256 CBC inside our Ruby project for other things that need symmetric encryption, so we'd like to use that. However, this time, the encryption will need to be decrypted in a Clojure application.
The output of encryption (using this as a guide: OpenSSL::Cipher), as represented in a Ruby string, looks like:
"a\x96\xECLI\xBC%\xC4#{\xBD\x99%\xA1\x84\x84"
Putting that into a Clojure REPL results in a bunch of "Syntax error... Unsupported escape character: \x". I tried making every "\" a "\\", but then using Clojure's library "Buddy-Core" and :aes256-cbc-hmac-sha512 as the encryption algorithm results in:
Execution error (AssertionError) at buddy.core.crypto/eval1558$fn (crypto.clj:478).
Assert failed: (keylength? key 64)
Even though the key/iv were a fine length when used to encrypt the string in Ruby.
To sum up:
To convert from a Ruby String to a Clojure String is it correct to
replace all backslashes with double backslashes?
Is :aes256-cbc-hmac-sha512 the correct algorithm for use with Clojure's Buddy-Core to decrypt AES 256 CBC?
Would I be better off doing this in Java inside of Clojure? (sub question: Please do advise on converting Ruby Strings to Java/Clojure Strings)
Whoops, I misread the question!
Try to avoid strings at all costs when working with raw binary data.
The Ruby string is not a valid Java string (i.e. \x is not a valid escape character)
The allowed Java escape characters are here:
https://docs.oracle.com/javase/specs/jls/se13/html/jls-3.html#jls-EscapeSequence
I'm not sure if Ruby strings are Unicode by default, so this is a bit tricky.
The safest bet is to turn your encryption result into Base64 and use that.
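For example, on the Ruby side something along these lines (a hedged sketch; the file name and JSON field names are made up for illustration) writes only Base64 text, which Clojure or Java can read back with any standard Base64 decoder:
require 'openssl'
require 'base64'
require 'json'

cipher = OpenSSL::Cipher.new('aes-256-cbc')
cipher.encrypt
key = cipher.random_key   # manage/derive the key however your project already does
iv  = cipher.random_iv
ciphertext = cipher.update('{"some":"json"}') + cipher.final

payload = {
  'iv'   => Base64.strict_encode64(iv),
  'data' => Base64.strict_encode64(ciphertext)
}
File.write('encrypted.json', JSON.generate(payload))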

Password Hashes Characters in Applications

I appreciate your time.
I'm trying to understand something. Why is it that hashes that I generate manually only seem to include alphanumeric characters 0-9, a-f, but all of the hashes used by our favorite applications seem to contain all of the letters [and capitalized ones at that]?
Example:
Manual hash using sha256:
# sha256sum <<< asdf
d1bc8d3ba4afc7e109612cb73acbdddac052c93025aa1f82942edabb7deb82a1 -
You never see any letters above f. And nothing is capitalized.
But if I create a SHA hash using htpasswd, it's got all the alphanumerics:
# htpasswd -snb test asdf
test:{SHA}PaVBVZkYqAjCQCu6UBL2xgsnZhw=
Same thing happens if you look at a password hash in a website CMS database for example. There must be some extra step I'm missing or the end format is different than the actual hash format. I thought it might be base64 encoded or something, but it did not seem to decode.
Can someone please explain what's happening behind the scenes here? My friend explained that piping "asdf" to sha256sum is showing the checksum, which is not the actual hash itself. Is that correct? If so, how can I see the actual hash?
Thank you so much in advance!
There are two things going on here.
First, your manual hash is using a different algorithm than htpasswd. The -s flag causes htpasswd to use SHA1, not SHA256. Use sha1sum instead of sha256sum.
Second, the encoding of the hashes is different. Your manual hash is hex encoded; the htpasswd hash is Base64 encoded. The htpasswd hash does decode, it just decodes to binary. If you try to print this binary it will look like =¥AU™¨Â#+ºPöÆ'f (depending on what character encoding you're using), and that may be why you believe it's not decoding.
If you convert the Base64 directly to Hex (you can use an online tool like this one), you'll find that sha1sum will generate the same hash.
My friend explained that piping "asdf" to sha256sum is showing the checksum, which is not the actual hash itself.
Your friend is incorrect: you're seeing the hex encoding of the hash. But the way you fed in the input does affect the hash that's generated. The <<< here-string appends a newline character, so what you're actually hashing is asdf\n. Use this command instead:
echo -n "asdf" | sha1sum
It is Base64 encoded.
Base64-encoded data often ends in one or two equals signs (padding), so that is the first indicator. Although the htpasswd man page doesn't mention it, other Apache docs about "the password encryption formats generated and understood by Apache" do say that the SHA format understood by Apache is Base64 encoded.

Create a SHA1 hash in Ruby via .net technique

OK hopefully the title didn't scare you away. I'm creating a sha1 hash using Ruby but it has to follow a formula that our other system uses to create the hash.
How can I do the following via Ruby? I'm creating hashes fine, but the format stuff is confusing me; I'm curious whether there's something equivalent in the Ruby standard library.
System Security Cryptography (MSDN)
Here's the C# code that I'm trying to convert to Ruby. I'm making my hash fine, but not sure about the 'String.Format("{0,2:X2}' part.
// create our SHA1 provider
SHA1 sha = new SHA1CryptoServiceProvider();
String str = "Some value to hash";
String hashedValue = "";  // will hold the hashed value of the string
// hash the data -- use ASCII encoding
byte[] hashedDataBytes = sha.ComputeHash(Encoding.ASCII.GetBytes(str));
// loop through each byte in the byte array
foreach (byte b in hashedDataBytes)
{
    // convert each byte to two uppercase hex digits -- append to result
    hashedValue += String.Format("{0,2:X2}", b);
}
A SHA1 hash of a specific piece of data is always the same hash (effectively just a large number); the only variation should be how you need to format it in order to, e.g., send it to the other system. (Although particularly obscure systems might post-process the data, truncate it, only take alternate bytes, etc.)
At a very rough guess from reading the C# code, this ends up a standard looking 40 character hex string. So my initial thought in Ruby is:
require 'digest/sha1'
Digest::SHA1.hexdigest("Some value to hash").upcase
. . . I don't know, however, what the forced ASCII encoding in C# would do when it starts with, e.g., a Latin-1 or UTF-8 string. Those would be useful example inputs, if only to see C# throw an exception; you would then know whether you need to worry about character encoding for your Ruby version. The data input to SHA1 is bytes, not characters, so which encoding to use and how to convert are part of your problem.
My current understanding is that Encoding.ASCII.GetBytes will force anything over character number 127 to a '?', so you may need to emulate that in Ruby using a .gsub or similar, especially if you are actually expecting that to come in from the data source.
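If you do need to emulate that, here is a rough sketch (it assumes replacing every non-ASCII character with '?' is acceptable for your inputs; it is not a full emulation of .NET's ASCII encoder):
require 'digest/sha1'

str = "Some value to hash"

# replace any character above 127 with '?', roughly like Encoding.ASCII.GetBytes
ascii_str = str.each_char.map { |c| c.ord < 128 ? c : '?' }.join

Digest::SHA1.hexdigest(ascii_str).upcase   # 40 uppercase hex characters, like the C# loop builds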

Escaping special characters in ruby

This is a common question, but I just can't seem to find the answer without resorting to unreliable regular expressions.
Basically if there is a \302\240 or similar combination in a string I want to replace it with the real character.
I am using PLruby for this, hence the warn.
obj = {"a"=>"some string with special chars"}
warn obj.inspect
NOTICE: {"Outputs"=>["a\302\240b"]} <- chars are escaped
warn "\302\240"
NOTICE: <-- there is a non breaking space here, like I want
warn "#{json.inspect}"
NOTICE: {"Outputs"=>["a\302\240"b]} <- chars are escaped
So these can be decoded when I use a string literal, but with the "#{x}" format the \xxx placeholders are never decoded into characters.
How would I assign the same string as the middle command yields?
Ruby Version: 1.8.5
You mentioned that you're using PL/ruby. That suggests that your strings are actually bytea values (the PostgreSQL version of a BLOB) using the old "escape" format. The escape format encodes non-ASCII values in octal with a leading \ so a bit of gsub and Array#pack should sort you out:
bytes = s.gsub(/\\([0-7]{3})/) { [ $1.to_i(8) ].pack('C') }
That will expand the escape values in s to raw bytes and leave them in bytes. You're still dealing with binary data though so just trying to display it on a console won't necessarily do anything useful. If you know that you're dealing with comprehensible strings then you'll have to figure out what encoding they're in and use String methods to sort out the encoding.
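For example (a small illustration; the input string is made up, and the final force_encoding step needs Ruby 1.9+):
s = 'a\302\240b'   # the escaped form as PL/ruby reports it
bytes = s.gsub(/\\([0-7]{3})/) { [ $1.to_i(8) ].pack('C') }
bytes.unpack('C*')   # => [97, 194, 160, 98] -- 0xC2 0xA0 is the UTF-8 no-break space
# on Ruby 1.9+ you could then tag the result as UTF-8: bytes.force_encoding('UTF-8')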
Perhaps you just want to use .to_s instead?

How do I escape a Unicode string with Ruby?

I need to encode/convert a Unicode string to its escaped form, with backslashes. Anybody know how?
In Ruby 1.8.x, String#inspect may be what you are looking for, e.g.
>> multi_byte_str = "hello\330\271!"
=> "hello\330\271!"
>> multi_byte_str.inspect
=> "\"hello\\330\\271!\""
>> puts multi_byte_str.inspect
"hello\330\271!"
=> nil
In Ruby 1.9 if you want multi-byte characters to have their component bytes escaped, you might want to say something like:
>> multi_byte_str.bytes.to_a.map(&:chr).join.inspect
=> "\"hello\\xD8\\xB9!\""
In both Ruby 1.8 and 1.9 if you are instead interested in the (escaped) unicode code points, you could do this (though it escapes printable stuff too):
>> multi_byte_str.unpack('U*').map{ |i| "\\u" + i.to_s(16).rjust(4, '0') }.join
=> "\\u0068\\u0065\\u006c\\u006c\\u006f\\u0639\\u0021"
To use a Unicode character in Ruby use the "\uXXXX" escape, where XXXX is the Unicode code point as four hexadecimal digits. See http://leejava.wordpress.com/2009/03/11/unicode-escape-in-ruby/
If you have Rails kicking around you can use the JSON encoder for this:
require 'active_support'
x = ActiveSupport::JSON.encode('µ')
# x is now "\u00b5"
The usual non-Rails JSON encoder doesn't "\u"-ify Unicode.
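If you don't have Rails around, a small hand-rolled sketch like this does the same kind of thing (assuming Ruby 1.9+ and characters within the BMP, since a \u escape takes exactly four hex digits; the method name is just illustrative):
def escape_unicode(str)
  # replace every non-ASCII character with a literal \uNNNN escape
  str.each_char.map { |c| c.ord > 127 ? format('\u%04x', c.ord) : c }.join
end

escape_unicode('µ')   # => "\\u00b5", i.e. the six characters \u00b5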
There are two components to your question as I understand it: Finding the numeric value of a character, and expressing such values as escape sequences in Ruby. Further, the former depends on what your starting point is.
Finding the value:
Method 1a: from Ruby with String#dump:
If you already have the character in a Ruby String object (or can easily get it into one), this may be as simple as displaying the string in the repl (depending on certain settings in your Ruby environment). If not, you can call the #dump method on it. For example, with a file called unicode.txt that contains some UTF-8 encoded data in it – say, the currency symbols €£¥$ (plus a trailing newline) – running the following code (executed either in irb or as a script):
s = File.read("unicode.txt", :encoding => "utf-8") # this may be enough, from irb
puts s.dump # this will definitely do it.
... should print out:
"\u20AC\u00A3\u00A5$\n"
Thus you can see that € is U+20AC, £ is U+00A3, and ¥ is U+00A5. ($ is not converted, since it's straight ASCII, though it's technically U+0024. The code below could be modified to give that information, if you actually need it. Or just add leading zeroes to the hex values from an ASCII table – or reference one that already does so.)
(Note: a previous answer suggested using #inspect instead of #dump. That sometimes works, but not always. For example, running ruby -E UTF-8 -e 'puts "\u{1F61E}".inspect' prints an unhappy face for me, rather than an escape sequence. Changing inspect to dump, though, gets me the escape sequence back.)
Method 1b: with Ruby using String#encode and rescue:
Now, if you're trying the above with a larger input file, the above may prove unwieldy – it may be hard to even find escape sequences in files with mostly ASCII text, or it may be hard to identify which sequences go with which characters. In such a case, one might replace the second line above with the following:
encodings = {} # hash to store mappings in
s.split("").each do |c| # loop through each "character"
begin
c.encode("ASCII") # try to encode it to ASCII
rescue Encoding::UndefinedConversionError # but if that fails
encodings[c] = $!.error_char.dump # capture a dump, mapped to the source character
end
end
# And then print out all the captured non-ASCII characters:
encodings.each do |char, dumped|
puts "#{char} encodes to #{dumped}."
end
With the same input as above, this would then print:
€ encodes to "\u20AC".
£ encodes to "\u00A3".
¥ encodes to "\u00A5".
Note that it's possible for this to be a bit misleading. If there are combining characters in the input, the output will print each component separately. For example, for input of 🙋🏾 ў ў, the output would be:
🙋 encodes to "\u{1F64B}".
🏾 encodes to "\u{1F3FE}".
ў encodes to "\u045E".
у encodes to "\u0443".
̆ encodes to "\u0306".
This is because 🙋🏾 is actually encoded as two code points: a base character (🙋 - U+1F64B), with a modifier (🏾, U+1F3FE; see also). Similarly with one of the letters: the first, ў, is a single pre-combined code point (U+045E), while the second, ў – though it looks the same – is formed by combining у (U+0443) with the modifier ̆ (U+0306 - which may or may not render properly, including on this page, since it's not meant to stand alone). So, depending on what you're doing, you may need to watch out for such things (which I leave as an exercise for the reader).
Method 2a: from web-based tools: specific characters:
Alternatively, if you have, say, an e-mail with a character in it, and you want to find the code point value to encode, if you simply do a web search for that character, you'll frequently find a variety of pages that give unicode details for the particular character. For example, if I do a google search for ✓, I get, among other things, a wiktionary entry, a wikipedia page, and a page on fileformat.info, which I find to be a useful site for getting details on specific unicode characters. And each of those pages lists the fact that that check mark is represented by unicode code point U+2713. (Incidentally, searching in that direction works well, too.)
Method 2b: from web-based tools: by name/concept:
Similarly, one can search for unicode symbols to match a particular concept. For example, I searched above for unicode check marks, and even on the Google snippet there was a listing of several code points with corresponding graphics, though I also find this list of several check mark symbols, and even a "list of useful symbols" which has a bunch of things, including various check marks.
This can similarly be done for accented characters, emoticons, etc. Just search for the word "unicode" along with whatever else you're looking for, and you'll tend to get results that include pages that list the code points. Which then brings us to putting that back into ruby:
Representing the value, once you have it:
The Ruby documentation for string literals describes two ways to represent unicode characters as escape sequences:
\unnnn Unicode character, where nnnn is exactly 4 hexadecimal digits ([0-9a-fA-F])
\u{nnnn ...} Unicode character(s), where each nnnn is 1-6 hexadecimal digits ([0-9a-fA-F])
So for code points with a 4-digit representation, e.g. U+2713 from above, you'd enter (within a string literal that's not in single quotes) this as \u2713. And for any unicode character (whether or not it fits in 4 digits), you can use braces ({ and }) around the full hex value for the code point, e.g. \u{1f60d} for 😍. This form can also be used to encode multiple code points in a single escape sequence, separating characters with whitespace. For example, \u{1F64B 1F3FE} would result in the base character 🙋 plus the modifier 🏾, thus ultimately yielding the abstract character 🙋🏾 (as seen above).
This works with shorter code points, too. For example, that currency character string from above (€£¥$) could be represented with \u{20AC A3 A5 24} – requiring only 2 digits for three of the characters.
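A quick way to sanity-check these forms in irb (these are the same examples discussed above):
"\u2713"             # => "✓"
"\u{1f60d}"          # => "😍"
"\u{20AC A3 A5 24}"  # => "€£¥$"
"\u{1F64B 1F3FE}"    # => "🙋🏾"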
You can use Unicode characters directly in your source if you just add # encoding: UTF-8 to the top of your file (on Ruby 2.0 and later, UTF-8 is already the default source encoding). Then you can freely use ä, ǹ, ú, and so on in your source code.
Try this gem. It converts Unicode or non-ASCII punctuation and symbols to the nearest ASCII punctuation and symbols:
https://github.com/qwuen/punctuate
example usage:
"100٪".punctuate
=> "100%"
the gem uses the reference in https://lexsrv3.nlm.nih.gov/LexSysGroup/Projects/lvg/current/docs/designDoc/UDF/unicode/DefaultTables/symbolTable.html for the conversion.
