I have the following hex as a string: "\xfe\xff". I'd like to convert this to "feff". How do I do this?
The closest I got was "\xfe\xff".inspect.gsub("\\x", ""), which returns "\"FEFF\"".
"\xfe\xff".unpack("H*").first
# => "feff"
You are dealing with what's called an escape sequence in your double quoted string. The most common escape sequence in a double quoted string is "\n", but ruby allows you to use other escape sequences in strings too. Your string, "\xfe\xff", contains two hex escape sequences, which are of the form:
\xNN
Escape sequences represent ONE character. When ruby processes the string, it notices the "\" and converts the whole hex escape sequence to one character. After ruby processes the string, there is no \x left anywhere in the string. Therefore, looking for a \x in the string is fruitless--it doesn't exist. The same is true for the characters 'f' and 'e' found in your escape sequences: they do not exist in the string after ruby processes the string.
Note that ruby processes hex escape sequences in double quoted strings only, so the type of string--double or single quoted--is entirely relevant. In a single quoted string, the series of characters '\xfe' is four characters long because there is no such thing as a hex escape sequence in a single quoted string:
str = "\xfe"
puts str.length #=>1
str = '\xfe'
puts str.length #=>4
Regexes behave like double quoted strings, so it is possible to use an entire escape sequence in a regex:
/\xfe/
When ruby processes the regex, then just like with a double quoted string, ruby converts the hex escape sequence to a single character. That allows you to search for the single character in a string containing the same hex escape sequence:
if "abc\xfe" =~ /\xfe/
If you pretend for a minute that the character ruby converts the escape sequence "\xfe" to is the character 'z', then that if statement is equivalent to:
if "abcz" =~ /z/
It's important to realize that the regex is not searching the string for a '\' followed by an 'x' followed by an 'f' followed by an 'e'. Those characters do not exist in the string.
The inspect() method allows you to see the escape sequences in a string by nullifying the escape sequences, like this:
str = "\\xfe\\xff"
puts str
--output:--
\xfe\xff
In a double quoted string, "\\" represents a literal backslash, while an escape sequence begins with only one slash.
Once you've nullified the escape sequences, then you can match the literal characters, like the two character sequence '\x'. But it's easier to just pick out the parts you want rather than matching the parts you don't want:
str = "\xfe\xff"
str = str.inspect #=> "\"\\xFE\\xFF\""
result = ""
str.scan /x(..)/ do |groups_arr|
result << groups_arr[0]
end
puts result.downcase
--output:--
feff
Here it is with gsub:
str = "\xfe\xff"
str = str.inspect #=>"\"\\xFE\\xFF\""
str.gsub!(/
"? #An optional quote mark
\\ #A literal '\'
x #An 'x'
(..) #Any two characters, captured in group 1
"? #An optional quote mark
/xm) do
Regexp.last_match(1)
end
puts str.downcase
--output:--
feff
Remember, a regex acts like a double quoted string, so to specify a literal \ in a regex, you have to write \\. However, in a regex you don't have to worry about a " being mistaken for the end of the regex, so you don't need to escape it, like you do in a double quoted string.
Just for fun:
str = "\xfe\xff"
result = ""
str.each_byte do |int_code|
result << sprintf('%x', int_code)
end
p result
--output:--
"feff"
Why are you calling inspect? That's adding the extra quotes..
Also, putting that in double quotes means the \x is interpolated. Put it in single quotes and everything should be good.
'\xfe\xff'.gsub("\\x","")
=> "feff"
Related
I have:
long_string # => "\nIt was the best of times,\nIt was the worst of times.\n"
I get:
long_string[0,1] # => "\n"
I am curious why I get two characters rather than merely "\" as in other cases.
Is this how escaped characters are treated in substrings and beyond?
From documentation of String#[]
str[start, length] → new_str or nil
If passed a start index and a length, returns a substring containing length characters starting at the start index
For example
"Hello"[0, 1] #=> "H"
'Hello'[0, 1] #=> "H"
But there is difference between single quotes and double quotes.
Double quotes allow for many escape sequences, e.g. "\n", "\t", "\s", "\r" and others. All this is not two, but one character.
"\n" is just one (newline) character. But '\n' contains two characters (backslash and letter).
"\n".size #=> 1
'\n'.size #=> 2
Compare the different behavior of double quotes and single quotes when you try to return one character starting from zero index
"\n"[0, 1] #=> "\n"
'\n'[0, 1] #=> "\\"
As is clear from the above "\\" is just one character (backslash). Another backslash is used to escape.
Solved - flexible quotes as above store the string as "\nIt was the best of times,\nIt was the worst of times.\n" (in double quotes). Double quoted strings interpret escaped characters, whereas single quoted strings do not.
E.g.
string = "\n"
string.size == 1 in above
string.size == 2 in below
string = '\n'
[0,1] will always return 2 characters - character 0 and character 1. [0,0] will return the first character.
Im trying to match double back slashes in a string, but only when there is 2 and not 3 so I can swap out the 2 for 3.
I know that \\{2} will match double back slash except it will also match the first 2 slashes when 3 are present.
For example in the string
{"files":{"windows": {"%windir%\\\System32\\drivers\\etc\\lmhosts.sam":{"ignore":{"id":32}},"%windir%\\System32\\\drivers\\etc":{"ignore":{"id":32}},"%windir%\\System32\\drivers\\etc\\hosts":{"ignore":{"id":32}}}}}
There are multiple double slashes that I wish to match and replace, but there are also a few triple slashes which I wish to leave alone.
So, my question, how do match the double slash when it does not sit adjacent to another slash?
Heres a Regex101 link to toy with.
https://regex101.com/r/kWIscW/1
Also, doing this in Ruby.
How about:
\b\\{2}\b
To define that you \\ are the only one characters evaluated
Another possibility to is looking behind and look ahead, however, not sure your regex engine supports it:
(?<=[^\\])\\{2}(?=[^\\])
r = /
(?<!\\) # do not match a backslash, negative lookbehind
\\\\ # match two backslashes
(?!\\) # do not match a backslash, negative lookahead
/x # free-spacing regex definition mode
str = "\\\\\ are two backslashes and here are three \\\\\\ of 'em"
puts str
# \\ are two backslashes and here are three \\\ of 'em
str.scan(r)
#=> ["\\\\"]
Note that s = "\\\\\ " is two backslashes followed by an escaped space.
s.size
#=> 3
s[0].ord
#=> 92
92.chr
#=> "\\"
s[1].ord
#=> 92
s[2].ord
#=> 32
Let's first address backslashes in string literals,
"\\" is one backslash
"\\\\" are two backslashes
"\\\\\\" are three backslashes
Why? Backslash is the escape sequence in string literals, eg "\n" is a linebreak, and hence a backslash must be escaped with a backslash to encode one backslash.
Now, try this
string = "\\\\aaa\\\\bbb\\\\\\ccc"
string.gsub(/\\+/) { |match| match.size == 2 ? '/' : match }
# => "/aaa/bbb\\\\\\ccc"
How does this work?
/\\+/ matches any sequence of backslashes
match.size == 2 filters those that have length 2
And then we just replace those
Why might you use ''' instead of """, as in Learn Ruby the Hard Way, Chapter 10 Study Drills?
There are no triple quotes in Ruby.
Two String literals which are juxtaposed are parsed as a single String literal. So,
'Hello' 'World'
#=> "HelloWorld"
is the same as
'HelloWorld'
#=> "HelloWorld"
And
'' 'Hello' ''
#=> "Hello"
is the same as
'''Hello'''
#=> "Hello"
is the same as
'Hello'
#=> "Hello"
Since adding an empty string literal does not change the result, you can add as many empty strings as you want:
""""""""""""'''''Hello'''''''''
#=> "Hello"
There are no special rules for triple single quotes vs. triple double quotes, because there are no triple quotes. The rules are simply the same as for quotes.
I assume the author confused Ruby and Python, because a triple-quote will not work in Ruby the way author thought it would. It'll just work like three separate strings ('' '' '').
For multi-line strings one could use:
%q{
your text
goes here
}
=> "\n your text\n goes here\n "
or %Q{} if you need string interpolation inside.
Triple-quotes ''' are the same as single quotes ' in that they don't interpolate any #{} sequences, escape characters (like "\n"), etc.
Triple-double-quotes (ugh) """ are the same as double-quotes " in that they do interpolation and escape sequences.
This is further down on the same page you linked.
The triple-quoted versions """ ''' allows for multi-line strings... as does the singly-quoted ' and ", so I don't know why both are available.
In Ruby """ supports interpolation, ''' does not.
Rubyists use triple quotes for multi-line strings (similar to 'heredocs').
You could just as easily use one of these characters.
Just like normal strings the double quotes will allow you to use variables inside of your strings (also known as 'interpolation').
Save this to a file called multiline_example.rb and run it:
interpolation = "(but this one can use interpolation)"
single = '''
This is a multi-line string.
'''
double = """
This is also a multi-line string #{interpolation}.
"""
puts single
puts double
This is the output:
$ ruby multiline_string_example.rb
This is a multi-line string.
This is also a multi-line string (but this one can use interpolation).
$
Now try it the other way around:
nope = "(this will never get shown)"
single = '''
This is a multi-line string #{nope}.
'''
double = """
This is also a multi-line string.
"""
puts single
puts double
You'll get this output:
$ ruby multiline_example.rb
This is a multi-line string #{nope}.
This is also a multi-line string.
$
Note that in both examples you got some extra newlines in your output. That's because multiline strings keep any newlines inside them, and puts adds a newline to every string.
I'm trying to build a regex from a string object, which happens to be stored in a variable.
The problem I'm facing is that escaped sequences (in the string) such "\d" doesn't make to the resulting regex.
Regexp.new("\d") => /d/
If I use single quotes, tough, it works flawless.
Regexp.new('\d') => /\d/
But, as my string is stored in a variable, I always get the double-quoted string.
Is there a way to turn a double-quoted string to single-quoted string, so that I could use in the Regexp constructor ?
(I'd like to use the string interpolation feature of the double quotes)
ex.:
email_pattern = "/[a-z]*\.com"
whole_pattern = "to: #{email_pattern}"
Regexp.new(whole_pattern)
For better readability, I'd like to avoid escaping escape characters.
"\\d"
The problem is, that you end up with completely different strings, depending on whether you use single or double quotes:
"\d".chars.to_a
#=> ["d"]
'\d'.chars.to_a
#=> ["\\", "d"]
so when you are using double quotes, the single \ is immediately lost and cannot be recovered by definition, for example:
"\d" == "d"
#=> true
so you can never know what the string contained before the escaping took place. As #FrankSchmitt suggested, use the double backslash or stick with single quotes. There's no other way.
There's an option, though. You can define your regex parts as regexes themselves, instead of strings. They behave exactly as expected:
regex1 = /\d/
#=> /\d/
regex2 = /foobar/
#=> /foobar/
Then, you can build your final regex with #{}-style interpolation, instead of building the regex source from strings:
regex3 = /#{regex1} #{regex2}/
#=> /(?-mix:\d) (?-mix:foobar)/
Reflecting your example this would translate to:
email_regex = /[a-z]*\.com/
whole_regex = /to: #{email_regex}/
#=> /to: (?-mix:[a-z]*\.com)/
You may also find Regexp#escape interesting. (see the docs)
If you run into further escaping problems (with the slashes), you can also use the alternative Regexp literal syntax with %r{<your regex here>}, in which you do not need to escape the / character. For example:
%r{/}
#=> /\//
There's no getting around escaping the backslash \ with \\, though.
Either create your string with single quotes:
s = '\d'
r = Regexp.new(s)
or quote the backslash:
s = "\\d"
r = Regexp.new(s)
Both should work.
I've been working my way through the Ruby Koans and am confused by the "escape clauses and single quoted strings" examples.
One example shows that I can't really use escape characters in this way, but immediately after, the following example is given:
def test_single_quotes_sometimes_interpret_escape_characters
string = '\\\''
assert_equal 2, string.size # <-- my answer is correct according to the program
assert_equal "\\'", string # <-- my answer is correct according to the program
end
This has confused me on two fronts:
Single quotes can sometimes be used with escape characters.
Why is the string size 2, when assert_equal is "\\\'"? (I personally thought the answer was "\'", which would make more sense with regards to size).
You can break your string into two pieces to clarify things:
string = '\\' + '\''
Each part is a string of length one; '\\' is the single character \ and '\'' is the single character '. When you put them together you get the two character string \'.
There are two characters that are special within a single quoted string literal: the backslash and the single quote itself. The single quote character is, of course, used to delimit the string so you need something special to get a single quote into a single quoted string, the something special is the backslash so '\'' is a single quoted string literal that represents a string containing one single quote character. Similarly, if you need to get a backslash into a single quoted string literal you escape it with another backslash so '\\' has length one and contains one backslash.
The single quote character has no special meaning within a double quoted string literal so you can say "'" without any difficulty. The backslash, however, does have a special meaning in double quoted strings so you have to say "\\" to get a single backslash into your double quoted string.
Consider your guess off "\'". The single quote has no special meaning within a double quoted string and escaping something that doesn't need escaping just gives you your something back; so, if c is a character that doesn't need to be escaped within a double quoted string, then \c will be just c. In particular, "\'" evaluates to "'" (i.e. one single quote within a double quoted string).
The result is that:
'\\\'' == "\\'"
"\\\"" == '\\"'
"\'" == '\''
"\'" == "'"
'\\\''.length == 2
"\\\"".length == 2
"\'".length == 1
"'".length == 1
The Wikibooks reference that Kassym gave covers these things.
I usually switch to %q{} (similar to single quoting) or %Q{} (similar to double quoting) when I need to get quotes into strings, all the backslashes make my eyes bleed.
This might be worth a read : http://en.wikibooks.org/wiki/Ruby_Programming/Strings
ruby-1.9.3-p0 :002 > a = '\\\''
=> "\\'"
ruby-1.9.3-p0 :003 > a.size
=> 2
ruby-1.9.3-p0 :004 > puts a
\'
In single quotes there are only two escape characters : \\ and \'.