Changing Double Quotes to Single - ruby

I'm working on a project in Ruby. The library I'm using returns a string in double quotes, for example: "\x00\x40". Since the string is in double quotes, any hex escape that can be shown as an ASCII character is converted. Therefore, when I print it, I actually see: "\x00@".
I figured out that, if I use single quotes, then the string will print in pure hex (without conversion), which is what I want. How do I change a double quoted string to single quoted?
I do not have any way to change the return type in the library since it is a C extension, and I can't figure out where the value is being returned from. Any ideas greatly appreciated.

"\x00\x40" and '\x00\x40' produce totally different strings.
"\x00\x40" creates a 2 byte string with hex values 0x00 and 0x40:
"\x00\x40".length
# => 2
"\x00\x40".chars.to_a
# => ["\u0000", "@"]
'\x00\x40' creates a string with 8 characters:
'\x00\x40'.length
# => 8
'\x00\x40'.chars.to_a
# => ["\\", "x", "0", "0", "\\", "x", "4", "0"]
This is done by Ruby's parser and you cannot change it once the string is created.
However, you can convert the string to get its hexadecimal representation.
String#unpack with the "H*" directive decodes the string as a hex string, i.e. it returns the hex value of each byte (high nibble first) as a single string:
hex = "\x00\x40".unpack("H*")[0]
# => "0040"
String#gsub then inserts \x before every two hex digits:
hex.gsub(/../) { |s| '\x' + s }
# => "\\x00\\x40"
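A variation on the same idea, sketched here rather than taken from the answer, formats each byte directly with String#each_byte and Kernel#format, and also shows the way back with Array#pack. The helper name hex_escape is hypothetical; the single-quoted format string keeps \x as literal text:

```ruby
# Hypothetical helper: turn every byte into a literal "\xNN" sequence.
def hex_escape(bytes)
  # '\x%02x' is single-quoted, so the backslash and the x stay plain text
  bytes.each_byte.map { |b| format('\x%02x', b) }.join
end

escaped = hex_escape("\x00\x40")
puts escaped        # prints \x00\x40
p escaped.length    # => 8

# Going back: drop the literal "\x" markers and pack the remaining hex digits.
p [escaped.gsub('\x', "")].pack("H*") == "\x00\x40".b  # => true
```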

Related

Decode in Ruby on rails

Is there any way to decode the string below?
"location.replace(i+\"&utm_content=\"+s)}(document,window,navigator,screen,\"\\x68\\x74\\x74\\x70\\x3a\\x2f\\x2f\\x6d\\x6f\\x62\\x76\\x69\\x64\\x69\\x2e\\x6d\\x6f\\x62\\x73\\x74\\x61\\x72\\x72\\x2e\\x63\\x6f\\x6d\\x2f\\x3f\\x75\\x74\\x6d\\x5f\\x74\\x65\\x72\\x6d\\x3d\\x36\\x35\\x34\\x33\\x34\\x39\\x39\\x37\\x36\\x39\\x31\\x38\\x32\\x39\\x34\\x36\\x33\\x30\\x32\\x26\\x63\\x6c\\x69\\x63\\x6b\\x76\\x65\\x72\\x69\\x66\\x79\\x3d\\x31\",fi
I have tried:
URI.unescape string
but it's not working.
There may be another way to do this, but here's one way:
>> hex = "\\x68\\x74\\x74\\x70\\x3a\\x2f\\x2f\\x6d\\x6f\\x62\\x76\\x69\\x64\\x69\\x2e\\x6d\\x6f\\x62\\x73\\x74\\x61\\x72\\x72\\x2e\\x63\\x6f\\x6d\\x2f\\x3f\\x75\\x74\\x6d\\x5f\\x74\\x65\\x72\\x6d\\x3d\\x36\\x35\\x34\\x33\\x34\\x39\\x39\\x37\\x36\\x39\\x31\\x38\\x32\\x39\\x34\\x36\\x33\\x30\\x32\\x26\\x63\\x6c\\x69\\x63\\x6b\\x76\\x65\\x72\\x69\\x66\\x79\\x3d\\x31"
=> "\\x68\\x74\\x74\\x70\\x3a\\x2f\\x2f\\x6d\\x6f\\x62\\x76\\x69\\x64\\x69\\x2e\\x6d\\x6f\\x62\\x73\\x74\\x61\\x72\\x72\\x2e\\x63\\x6f\\x6d\\x2f\\x3f\\x75\\x74\\x6d\\x5f\\x74\\x65\\x72\\x6d\\x3d\\x36\\x35\\x34\\x33\\x34\\x39\\x39\\x37\\x36\\x39\\x31\\x38\\x32\\x39\\x34\\x36\\x33\\x30\\x32\\x26\\x63\\x6c\\x69\\x63\\x6b\\x76\\x65\\x72\\x69\\x66\\x79\\x3d\\x31"
>> Array(hex.gsub("\\x","")).pack('H*')
=> "http://mobvidi.mobstarr.com/?utm_term=6543499769182946302&clickverify=1"
I assigned the hex string to a variable and then stripped out every "\x" sequence (the backslash and the 'x'). The result is wrapped in an array so we can call Array#pack with the capital H directive, which reads a hex string high nibble first (see the Array#pack documentation).
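Stripping every \x and packing only works once the run of escapes has been cut out of the surrounding JavaScript by hand. A sketch of an alternative, assuming every \xHH sequence stands for a plain ASCII byte, decodes the escapes in place with a regex. The name decode_hex_escapes is hypothetical, and the sample is a shortened piece of the question's string:

```ruby
# Hypothetical helper: replace each literal \xHH with the character it encodes.
# Assumes the escapes decode to ASCII bytes (0x00..0x7f).
def decode_hex_escapes(text)
  # \h matches one hex digit; the block converts the two digits to a character
  text.gsub(/\\x(\h{2})/) { Regexp.last_match(1).hex.chr }
end

js = 'screen,"\x68\x74\x74\x70\x3a\x2f\x2f\x6d\x6f\x62",fi'
puts decode_hex_escapes(js)   # prints screen,"http://mob",fi
```

Only well-formed two-digit escapes are touched, so the surrounding text passes through unchanged. Escapes above \x7f would produce binary bytes and may need explicit encoding handling.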

How does pack work in Ruby?

I am a tad confused about what I see here:
a = [ "a", "b", "c" ]
n = [ 65, 66, 67 ]
a.pack("A3A3A3") #=> "a  b  c  "
a.pack("a3a3a3") #=> "a\000\000b\000\000c\000\000"
n.pack("ccc") #=> "ABC"
From the docs:
Packs the contents of arr into a binary sequence according to the directives in aTemplateString (see the table below). Directives "A", "a", and "Z" may be followed by a count, which gives the width of the resulting field.
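Here are the relevant directives:
A: arbitrary binary string (space padded, count is width)
a: arbitrary binary string (null padded, count is width)
c: 8-bit signed integer (signed char)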
So we're using the A directive 3 times, it seems? What does it mean to pack the string a into an arbitrary binary string (space padded, count is width)? Can you help me understand the output? Why are there so many 0s?
In the first case, you're packing "a" and padding its length to 3 with spaces, hence the two spaces to get the total length to 3.
In the second case, you're doing the same but padding with null bytes instead (ASCII value 0). Null bytes in Ruby are printed (and can be read) using the escape syntax \000 (this is one character), so \000\000 is actually just two null bytes.
The variable n is irrelevant to the padding question, so you can ignore it.
In the pack statements, the strings "a", "b" and "c" are each padded and then concatenated ("packed") into a single string. The padding is such that the number of bytes (the width) taken up by the contents plus the padding equals the number provided.
So in the first pack statement, the "a" is padded with two spaces to make these three bytes: "a.." where I've put a . in place of the spaces to make it clear. That is concatenated with the "b" and the "c" similarly padded, to produce "a..b..c..".
In the second pack statement, null characters ('\000') are used instead of spaces. The \nnn notation (called an "escape sequence") means the byte with octal value nnn. It's used when there is no printable ASCII character (like 'a' or ' ') to show. A null character is not printable, so the escape notation is used instead.
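Looking at the raw byte values sidesteps the question of how a given Ruby version displays a NUL byte. A short sketch using the same array:

```ruby
letters = ["a", "b", "c"]

spaced = letters.pack("A3A3A3")  # "A": pad each field with spaces to width 3
nulled = letters.pack("a3a3a3")  # "a": pad each field with NUL bytes to width 3

p spaced.bytes   # => [97, 32, 32, 98, 32, 32, 99, 32, 32]
p nulled.bytes   # => [97, 0, 0, 98, 0, 0, 99, 0, 0]
p nulled.length  # => 9, because each escaped NUL is a single byte

# Unpacking with "A" strips the trailing padding again.
p spaced.unpack("A3A3A3")   # => ["a", "b", "c"]

# "c" packs each integer as one 8-bit signed byte: 65, 66, 67 are "A", "B", "C".
p [65, 66, 67].pack("ccc")  # => "ABC"
```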

Unable to substitute escaped characters in string

I have this string:
str = "no,\"contact_last_name\",\"token\""
=> "no,\"contact_last_name\",\"token\""
I want to remove the escaped double quoted string character \". I use gsub:
result = str.gsub('\\"','')
=> "no,\"contact_last_name\",\"token\""
It appears that gsub has not removed the escaped double quotes from the string.
Why am I trying to do this? I have this csv file:
no,"contact_last_name","token",company,urbanization,sec-"property_address","property_address",city-state-zip,ase,oel,presorttrayid,presortdate,imbno,encodedimbno,fca,"property_city","property_state","property_zip"
1,MARIE A JEANTY,1083123,,,,17 SW 6TH AVE,DANIA BEACH FL 33004-3260,Electronic Service Requested,,T00215,12/14/2016,00-314-901373799-105112-33004-3260-17,TATTTADTATTDDDTTFDDFATFTDDDTTFADTTDFAAADDATDAATTFDTDFTTAFFTTATFFF,017,DANIA BEACH,FL, 33004-3260
When I try to open it with CSV, I get the following error:
CSV.foreach(path, headers: true) do |row|
end
CSV::MalformedCSVError: Illegal quoting in line 1.
Once I removed those double quoted strings in the first row (the header), the error went away. So I am trying to remove those double quoted strings before I run it through CSV:
file = File.open "file.csv"
contents = file.read
"no,\"contact_last_name\",\"token\" ... "
contents.gsub!('\\"','')
So again my question is: why is gsub not removing the specified characters? Note that this actually does work:
contents.gsub /"/, ""
as if the string is ignoring the \ character.
There is no escaped double quote in this string:
"no,\"contact_last_name\",\"token\""
The interpreter recognizes the text above as a string because it is enclosed in double quotes. For the same reason, the double quotes embedded in the string must be escaped; otherwise they would signal the end of the string.
The enclosing double quote characters are part of the language syntax, not part of the string. Using the backslash (\) as an escape character is also the language's way to put characters that otherwise have special meaning (double quotes, for example) inside a string.
The actual string stored in the str variable is:
no,"contact_last_name","token"
You can check this for yourself if you tell the interpreter to put the string on screen (puts str).
To answer the question in the title: all your efforts to substitute escaped characters were in vain simply because the string doesn't contain the character sequence you tried to find and replace.
And the actual problem is that the CSV file is malformed. The 6th value on the first row (sec-"property_address") doesn't follow the format of a correctly encoded CSV file.
It should read either sec-property_address or "sec-property_address"; i.e. the value should be either not enclosed in quotes at all or completely enclosed in quotes. Having it partially enclosed in quotes confuses Ruby's CSV parser.
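A minimal sketch of acting on this, assuming only the header row is malformed as in the sample file, removes the double quotes from the first line before parsing. The data is a shortened, hypothetical excerpt of the file; for the real file, File.read(path) would supply raw:

```ruby
require "csv"

# Hypothetical excerpt mirroring the question's malformed header.
raw = <<~DATA
  no,"contact_last_name","token",sec-"property_address"
  1,MARIE A JEANTY,1083123,17 SW 6TH AVE
DATA

header, body = raw.split("\n", 2)
# Drop every double quote from the header line only; data rows are left alone.
fixed = header.delete('"') + "\n" + body

rows = CSV.parse(fixed, headers: true)
p rows.headers                        # => ["no", "contact_last_name", "token", "sec-property_address"]
p rows.first["sec-property_address"]  # => "17 SW 6TH AVE"
```

Rewriting only the header keeps any legitimately quoted values in the data rows intact.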
The string looks fine; you're just not understanding what you're seeing. Meditate on this:
"no,\"contact_last_name\",\"token\"" # => "no,\"contact_last_name\",\"token\""
'no,"contact_last_name","token"' # => "no,\"contact_last_name\",\"token\""
%q[no,"contact_last_name","token"] # => "no,\"contact_last_name\",\"token\""
%Q#no,"contact_last_name","token"# # => "no,\"contact_last_name\",\"token\""
When looking at a string that is delimited by double-quotes, it's necessary to escape certain characters, such as embedded double-quotes. Ruby, along with many other languages, has multiple ways of defining a string to remove that need.
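The same point can be checked from code rather than by eye. This sketch inspects what is actually stored; if the backslashes were really in the string, it would be 34 characters long rather than 30:

```ruby
str = "no,\"contact_last_name\",\"token\""

puts str               # prints no,"contact_last_name","token"
p str.length           # => 30
p str.include?("\\")   # => false: no backslash is stored
p str.count('"')       # => 4
p str.gsub('"', "")    # => "no,contact_last_name,token"
```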

How to encode a sequence of bytes into a Ruby string of characters

How do I encode the sequence of bytes in a Ruby string as human-readable characters?
This is the input string:
"\x127\x00\x06\x00\x00\x00\x01\x00\xA2\x8F"
So how do I parse this string into an array of bytes,
and encode every element of the array as an ASCII character?
P.S. However, I can't find a way to round-trip from the array of bytes back to a string. I tried to use Array#pack with the U* option, but that doesn't work for multibyte characters.
You can try something like:
"string\xaa".each_byte.map {|b| "%c(%x)" % [ b, b ] }.join( ' ' )
# => "s(73) t(74) r(72) i(69) n(6e) g(67) ª(aa)"
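A sketch of both directions using only core methods: String#bytes gives the integers, format renders them as hex, and Array#pack with "C*" turns them back into the same bytes. Note that \x takes at most two hex digits, so "\x127" is the byte 0x12 followed by the character "7":

```ruby
input = "\x127\x00\x06\x00\x00\x00\x01\x00\xA2\x8F"

# String#bytes gives one Integer (0..255) per byte.
bytes = input.bytes
p bytes   # => [18, 55, 0, 6, 0, 0, 0, 1, 0, 162, 143]

# Human-readable hex dump of the same bytes.
puts bytes.map { |b| format("%02X", b) }.join(" ")  # prints 12 37 00 06 00 00 00 01 00 A2 8F

# "C*" writes each Integer back as exactly one byte, so the round trip is lossless.
restored = bytes.pack("C*")
p restored == input.b   # => true

# "U*" encodes each Integer as a UTF-8 code point instead: 162 and 143 take two bytes each.
p bytes.pack("U*").bytesize   # => 13
```

The last line shows why U* did not round-trip: it treats each integer as a Unicode code point, so values above 127 grow to multiple bytes, while C* always writes one byte per integer.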

Ruby regex remove ^C character from string

There is a file that uses control-B and control-C characters to separate fields of text. It looks like this:
"TEST\003KEY\002TEST\003KEY"
I tried to create a regex that will match this and remove it. I am not sure why this regex is not working:
"TEST\003KEY\002TEST\003KEY".gsub(/\00[23]/, ',')
Try the following:
"TEST\003KEY\002TEST\003KEY".gsub(/\002|\003/, ',')
Here it is demonstrated in irb on my machine:
$ irb
1.9.3p448 :007 > "TEST\003KEY\002TEST\003KEY".gsub(/\002|\003/, ',')
=> "TEST,KEY,TEST,KEY"
The syntax \002|\003 means "match the character literal \002 or the character literal \003". The expression given in the original question, \00[23], does not do what you want: it is the character literal \00 (a null character) followed by the character class [23], i.e. it matches two-character sequences.
You can also use the [[:cntrl:]] character class to match all control characters:
$ irb
1.9.3p448 :007 > "TEST\003KEY\002TEST\003KEY\005TEST".gsub(/[[:cntrl:]]/, ',')
=> "TEST,KEY,TEST,KEY,TEST"
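One caution about [[:cntrl:]]: it also matches newline, tab and the other control characters, which matters if the whole file is processed at once rather than line by line. A narrower character class lists just the two separators. This is a sketch; \x02 and \x03 are simply the hex spellings of \002 and \003, and the sample string is made up:

```ruby
record = "TEST\003KEY\002TEST\003KEY\nNEXT\003LINE"

# [[:cntrl:]] also replaces the newline between records.
p record.gsub(/[[:cntrl:]]/, ",")  # => "TEST,KEY,TEST,KEY,NEXT,LINE"

# An explicit class touches only STX (\x02) and ETX (\x03).
p record.gsub(/[\x02\x03]/, ",")   # => "TEST,KEY,TEST,KEY\nNEXT,LINE"
```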
Here's the deal. First and foremost, computers cannot store characters; they can only store numbers. So when a computer stores a string, it converts every character to a number. The numbers for all the basic characters are given by an ASCII chart (you can search online for one).
When you tell a computer to print a string, it retrieves the numbers saved for the string and outputs them as characters (using an ASCII chart to convert the numbers to characters).
Double quoted strings can contain what are called escape sequences. The most common escape sequence is "\n":
puts "hello\nworld"
--output:--
hello
world
A double quoted string converts the escape sequence "\n" to the ASCII code 10:
puts "\n".ord #=> 10 (ord returns the ASCII code of a character)
A double quoted string can also contain escape sequences of the form \ddd, e.g. \002. These are called octal escape sequences, which means 002 is the octal representation of an ASCII code.
In an octal number, the right most digit is the 1's column, and the next digit to the left is the 8's column and the next digit to the left is the 64's column. For instance, this octal number:
\123
is equivalent to 3*1 + 2*8 + 1*64 = 83. It so happens that "S" has the ASCII code 83:
puts "\123" #=>S
Because you can also use octal escape sequences in a double quoted string, instead of the escape sequence "\n" you could use the octal escape "\012" (2*1 + 1*8 + 0*64 = 10). A double quoted string converts the octal escape sequence "\012" to the ASCII code 10, which is the same thing it does to "\n". Here is an example:
puts "hello" + "\012" + "world"
--output:--
hello
world
The final thing to note about octal escape sequences is that you can optionally leave off any leading 0s:
puts "hello" + "\12" + "world"
--output:--
hello
world
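The octal arithmetic above can be checked directly in Ruby. This small sketch uses core methods for octal literals and conversions:

```ruby
# The octal escape \123 is one character whose code is 1*64 + 2*8 + 3*1 = 83.
p "\123".ord      # => 83
p "\123"          # => "S"
p 0o123           # => 83 (octal integer literal)
p "123".oct       # => 83 (parse a string as octal)
p 83.to_s(8)      # => "123"

# "\012", "\12" and "\n" all denote the same single character (code 10).
p ["\012", "\12", "\n"].uniq   # => ["\n"]
```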
Okay, now examine your string:
"TEST\003KEY\002TEST\003KEY"
You can see that it contains three octal escape sequences. A double quoted string converts the octal escape sequence \003 to the ASCII code 3*1 + 0*8 + 0*64 = 3. If you check an ASCII chart, the ASCII code 3 represents a character called "end of text". A double quoted string converts the octal escape sequence \002 to the ASCII code 2*1 + 0*8 + 0*64 = 2, which represents a character called "start of text". The "control B" and "control C" names come from caret notation, in which ^B and ^C denote the control characters with codes 2 and 3.
Next, a regex literal acts like a double quoted string: inside /.../ you can use the same escape sequences as in a double quoted string, and the regex will convert the escape sequences to ASCII codes.
Now, in light of all the above, examine your regex:
/\00[23]/
As Richard Cook pointed out, your regex gets interpreted as the octal escape sequence \00 followed by the character class [23]. The octal escape sequence \00 gets converted to the ASCII code 0*1 + 0*8 = 0. If you look at an ASCII chart, the number 0 represents a character called "null". So your regex is looking for a null character followed by either a "2" or a "3", which means it is looking for a two-character string. But a two-character string will never match the octal escape sequence "\003" (or "\002"), which represents only one character.
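A quick way to confirm this reading of the regex is to test it against a string that really does contain a null character followed by "3". This is a sketch; the sample strings are made up for illustration:

```ruby
text = "TEST\003KEY"

p text =~ /\00[23]/             # => nil: no NUL followed by "2" or "3"

# "\0003" is NUL (octal escape \000) followed by the character "3".
sample = "A\0003B"
p sample =~ /\00[23]/           # => 1
p sample[/\00[23]/].length      # => 2: the match spans two characters

# A one-character class finds the real separator.
p text =~ /[\002\003]/          # => 4
```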
The main thing to take away from all this is that when you see a string that contains an octal escape sequence:
"hello\012world"
...that string does not contain the characters \, 0, 1, and 2. A double quoted string converts that sequence of characters into one ASCII code, which represents ONE character. You can prove that very easily:
puts "hello".length #=>5
puts "hello\012".length #=>6
There are also many other types of escape sequences that can appear in double quoted strings. They are not listed in the String class docs; Ruby documents them in its syntax reference for literals.
s = "TEST\003KEY\002TEST\003KEY"
s.split(/[[:cntrl:]]/) * ","
# => "TEST,KEY,TEST,KEY"

Resources