How to split a string on "\" using Ruby?

Let's say I have a string:
str = "12345\56789"
How can I split the above string into two parts?
["12345","56789"]

str = "12345/56789"
print str.split('/') # => ["12345", "56789"]
Edit: With the change to a backslash, it should be:
str = '12345\56789'
print str.split('\\') # => ["12345", "56789"]
You need the double backslash to avoid escaping the closing quote mark.
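A minimal sketch of the same idea, assuming the input really contains a backslash character. The helper name split_on_backslash is made up for illustration. It also shows that '\\' is a one-character string:

```ruby
# Hypothetical helper: split text on a literal backslash.
def split_on_backslash(text)
  text.split('\\') # '\\' in single quotes is a single backslash character
end

sample = '12345\56789'       # single quotes keep the backslash as-is
p '\\'.length                # => 1
p split_on_backslash(sample) # => ["12345", "56789"]
```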

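Regexp.quote returns a copy of the string with regex metacharacters, including the backslash, escaped. The returned string can then be split on '\\'. Because the escaping doubles the backslash, the split produces an empty element between the two parts, so this approach is only convenient for picking out the first part.
So the solution is: Regexp.quote('00050\00050').split('\\')[0] # => "00050"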

Related

How to add double quotes to elements inside a array in Ruby

I am very new to Ruby. I have an array and I want to add double quotes to all of its alphanumeric elements.
The array I have: a = [a255b78, wr356672]
The array I require: a = ["a255b78", "wr356672"]
Is there any direct way to do this?
Get rid of the "[]" and split on ", ":
str = "[a255b78, wr356672]"
p arr = str[1..-2].split(", ") # => ["a255b78", "wr356672"]
This does NOT add double quotes to all alpha-numeric elements; it converts a String to an Array of Strings.
str = "[a255b78, wr356672]"
str.scan(/\w+/)
#=> ["a255b78", "wr356672"]
See String#scan. The regular expression matches one or more word characters. Word characters (matching \w) are letters, digits and the underscore.
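If literal double-quote characters are wanted around each element, rather than just an Array of Strings, one possible sketch is to wrap each scanned token. Both helper names below are invented for this example:

```ruby
# Hypothetical helper: pull the word tokens out of a bracketed list.
def tokens(text)
  text.scan(/\w+/)
end

# Hypothetical helper: wrap each token in literal double-quote characters.
def quoted_tokens(text)
  tokens(text).map { |token| %("#{token}") }
end

p tokens("[a255b78, wr356672]")                    # => ["a255b78", "wr356672"]
puts quoted_tokens("[a255b78, wr356672]").join(", ") # prints "a255b78", "wr356672"
```

The quotes shown by p in the first call are only how Ruby displays strings; the second call produces strings that actually contain quote characters.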

String does not match when it contains text after parentheses

This works:
str = "California (LA) rocks"
match_string = "rocks"
str.match(match_string) # => #<MatchData "rocks">
Why does this not work?
match_string = "(LA) rocks"
str.match(match_string) # => nil
You must escape the parentheses for match to work; otherwise they will be interpreted as part of the regex pattern (i.e. as a capturing group), and not as part of the string to be matched.
To escape the parentheses you can use a \:
match_string = '\(LA\) rocks'
str.match(match_string)
#=> #<MatchData "(LA) rocks">
Notice the use of single quotes (') instead of double quotes ("); if you want to use double quotes instead, you will need to double each backslash:
match_string = "\\(LA\\) rocks"
Because the argument of match is converted into a regex. In particular, the parentheses in "(LA) rocks" are interpreted as meta characters, not as literal parentheses. In fact, the following matches:
"California LA rocks".match("(LA) rocks")
# => #<MatchData "LA rocks" 1:"LA">
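When the text to find comes from a variable and should be treated as plain text, escaping by hand can be avoided. The sketch below uses Regexp.escape, with String#include? as an alternative when no MatchData is needed. The helper name literal_match is hypothetical:

```ruby
# Hypothetical helper: match needle as plain text, not as a regex pattern.
def literal_match(haystack, needle)
  haystack.match(Regexp.escape(needle))
end

str = "California (LA) rocks"
p literal_match(str, "(LA) rocks") # => #<MatchData "(LA) rocks">
p str.include?("(LA) rocks")       # => true
```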

Remove hex escape from string

I have the following hex as a string: "\xfe\xff". I'd like to convert this to "feff". How do I do this?
The closest I got was "\xfe\xff".inspect.gsub("\\x", ""), which returns "\"FEFF\"".
"\xfe\xff".unpack("H*").first
# => "feff"
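In the "H*" directive, H reads each byte as two hex digits (high nibble first) and * consumes the whole string. A small sketch, assuming Ruby 2.4 or later for String#unpack1; the helper name hex_digits is made up here:

```ruby
# Hypothetical helper: render every byte of a string as two lowercase hex digits.
def hex_digits(bytes)
  bytes.unpack1("H*") # unpack1 returns the first unpacked value directly
end

p hex_digits("\xfe\xff") # => "feff"
p hex_digits("\x01\x0a") # => "010a" (leading zeros are kept)
```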
You are dealing with what's called an escape sequence in your double quoted string. The most common escape sequence in a double quoted string is "\n", but ruby allows you to use other escape sequences in strings too. Your string, "\xfe\xff", contains two hex escape sequences, which are of the form:
\xNN
Escape sequences represent ONE character. When ruby processes the string, it notices the "\" and converts the whole hex escape sequence to one character. After ruby processes the string, there is no \x left anywhere in the string. Therefore, looking for a \x in the string is fruitless--it doesn't exist. The same is true for the characters 'f' and 'e' found in your escape sequences: they do not exist in the string after ruby processes the string.
Note that ruby processes hex escape sequences in double quoted strings only, so the type of string--double or single quoted--is entirely relevant. In a single quoted string, the series of characters '\xfe' is four characters long because there is no such thing as a hex escape sequence in a single quoted string:
str = "\xfe"
puts str.length #=>1
str = '\xfe'
puts str.length #=>4
Regexes behave like double quoted strings, so it is possible to use an entire escape sequence in a regex:
/\xfe/
When ruby processes the regex, then just like with a double quoted string, ruby converts the hex escape sequence to a single character. That allows you to search for the single character in a string containing the same hex escape sequence:
if "abc\xfe" =~ /\xfe/
If you pretend for a minute that the character ruby converts the escape sequence "\xfe" to is the character 'z', then that if statement is equivalent to:
if "abcz" =~ /z/
It's important to realize that the regex is not searching the string for a '\' followed by an 'x' followed by an 'f' followed by an 'e'. Those characters do not exist in the string.
The inspect() method lets you see the escape sequences in a string by nullifying them, that is, by turning each one into literal characters, as if you had written:
str = "\\xfe\\xff"
puts str
--output:--
\xfe\xff
In a double quoted string, "\\" represents a literal backslash, while an escape sequence begins with only one backslash.
Once you've nullified the escape sequences, then you can match the literal characters, like the two character sequence '\x'. But it's easier to just pick out the parts you want rather than matching the parts you don't want:
str = "\xfe\xff"
str = str.inspect #=> "\"\\xFE\\xFF\""
result = ""
str.scan(/x(..)/) do |groups_arr|
  result << groups_arr[0]
end
puts result.downcase
--output:--
feff
Here it is with gsub:
str = "\xfe\xff"
str = str.inspect #=>"\"\\xFE\\xFF\""
str.gsub!(/
  "?    # An optional quote mark
  \\    # A literal '\'
  x     # An 'x'
  (..)  # Any two characters, captured in group 1
  "?    # An optional quote mark
/xm) do
  Regexp.last_match(1)
end
puts str.downcase
--output:--
feff
Remember, a regex acts like a double quoted string, so to specify a literal \ in a regex, you have to write \\. However, in a regex you don't have to worry about a " being mistaken for the end of the regex, so you don't need to escape it, like you do in a double quoted string.
Just for fun:
str = "\xfe\xff"
result = ""
str.each_byte do |int_code|
  result << sprintf('%02x', int_code) # %02x keeps a leading zero for bytes below 0x10
end
p result
--output:--
"feff"
Why are you calling inspect? That's adding the extra quotes.
Also, putting that in double quotes means the \x is processed as an escape sequence. Put it in single quotes and everything should be good.
'\xfe\xff'.gsub("\\x","")
=> "feff"
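That only helps when the string literally contains backslash-x text, which is not the same data as the double quoted version. A short sketch of the difference; the helper name strip_hex_markers is invented for illustration:

```ruby
# Hypothetical helper: remove literal backslash-x markers from text.
def strip_hex_markers(text)
  text.gsub('\x', '') # '\x' in single quotes is two characters: \ and x
end

escaped = '\xfe\xff' # single quotes: eight literal characters
raw     = "\xfe\xff" # double quotes: two bytes

p escaped.length             # => 8
p raw.bytesize               # => 2
p strip_hex_markers(escaped) # => "feff"
p raw.unpack1("H*")          # => "feff"
```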

Replace single quote with backslash single quote

I have a very large string that needs to escape all the single quotes in it, so I can feed it to JavaScript without upsetting it.
I have no control over the external string, so I can't change the source data.
Example:
Cote d'Ivoir -> Cote d\'Ivoir
(the actual string is very long and contains many single quotes)
I'm trying to do this by using gsub on the string, but can't get this to work:
a = "Cote d'Ivoir"
a.gsub("'", "\\\'")
but this gives me:
=> "Cote dIvoirIvoir"
I also tried:
a.gsub("'", 92.chr + 39.chr)
but got the same result; I know it's something to do with regular expressions, but I never get those.
The %q delimiters come in handy here:
# %q(a string) is equivalent to a single-quoted string
puts "Cote d'Ivoir".gsub("'", %q(\\\')) #=> Cote d\'Ivoir
The problem is that \' in a gsub replacement means "part of the string after the match".
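To make that concrete, here is a small sketch of how gsub reads a few replacement strings. The helper name replace_quote is hypothetical; the replacement strings are written in double quotes, so each "\\" is one backslash character:

```ruby
# Hypothetical helper: replace every single quote using the given replacement string.
def replace_quote(replacement)
  "Cote d'Ivoir".gsub("'", replacement)
end

p replace_quote("\\'")   # \' inserts the text after the match  => "Cote dIvoirIvoir"
p replace_quote("\\`")   # \` inserts the text before the match => "Cote dCote dIvoir"
p replace_quote("\\\\'") # \\ is a literal backslash, then '    => "Cote d\\'Ivoir"
```

The last result is shown in inspect form; the string itself contains a single backslash before the quote.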
You're probably best to use either the block syntax:
a = "Cote d'Ivoir"
a.gsub(/'/) {|s| "\\'"}
# => "Cote d\\'Ivoir"
or the Hash syntax:
a.gsub(/'/, {"'" => "\\'"})
There's also the hacky workaround:
a.gsub(/'/, '\#').gsub(/#/, "'")
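One caveat with the placeholder workaround is that it also rewrites any '#' already present in the input, whereas the block form leaves other characters alone. A sketch comparing the two, with both helper names made up for this example:

```ruby
# Hypothetical helper: block form, the block's return value is inserted verbatim.
def escape_quotes(text)
  text.gsub("'") { "\\'" }
end

# Hypothetical helper: the placeholder workaround from above.
def escape_quotes_via_placeholder(text)
  text.gsub(/'/, '\#').gsub(/#/, "'")
end

puts escape_quotes("Cote d'Ivoir")          # prints Cote d\'Ivoir
puts escape_quotes("a#b'c")                 # prints a#b\'c
puts escape_quotes_via_placeholder("a#b'c") # prints a'b\'c (the original # is lost)
```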
# prepare a text file containing [ abcd\'efg ]
require "pathname"
backslashed_text = Pathname("/path/to/the/text/file.txt").readlines.first.strip
# puts backslashed_text => abcd\'efg
unslashed_text = "abcd'efg"
unslashed_text.gsub("'", Regexp.escape(%q|\'|)) == backslashed_text # true
# puts unslashed_text.gsub("'", Regexp.escape(%q|\'|)) => abcd\'efg

How to split a string by colons NOT in quotes

I have a CSV-file delimited by colons, but it contains text-fields wrapped in quotes, which themselves contain several colons.
I would like a simple solution for getting the data fields, but in Ruby, for example, the split method splits on every colon.
Is there a regex which matches all colons, except those wrapped in quotes?
Given:
str = 'foo:bar:"jim:jam":jar'
You can do this:
a = str.scan( /([^":]+)|"([^"]+)"/ ).flatten.compact
p a
#=> ["foo", "bar", "jim:jam", "jar"]
Or you can do this:
a = []
str.scan( /([^":]+)|"([^"]+)"/ ){ a << ($1 || $2) }
p a
#=> ["foo", "bar", "jim:jam", "jar"]
Both versions use the same regex, which says to find either:
One or more characters that are not a-quote-or-a-colon, or
A quote, followed by one or more characters that are not a quote, followed by a quote.
Just use http://ruby-doc.org/stdlib/libdoc/csv/rdoc/index.html
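A minimal sketch of the CSV approach, assuming the standard csv library is available (it ships with Ruby, as a bundled gem in recent versions). The helper name parse_colon_line is hypothetical; col_sep is the CSV option that sets the field separator:

```ruby
require "csv"

# Hypothetical helper: parse one colon-delimited line, honoring double-quoted fields.
def parse_colon_line(line)
  CSV.parse_line(line, col_sep: ":")
end

p parse_colon_line('foo:bar:"jim:jam":jar') # => ["foo", "bar", "jim:jam", "jar"]
p parse_colon_line('foo::jar')              # => ["foo", nil, "jar"]
```

Unlike the scan-based regex, this keeps a placeholder (nil) for empty fields, so column positions stay aligned.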
You can split on double quotes instead of colons:
>> str = 'foo:bar:"jim:jam":jar'
=> "foo:bar:\"jim:jam\":jar"
>> str.split("\"").each_with_index do |x,y|
?> puts y%2==0 ? x.split(":") : x
>> end
foo
bar
jim:jam
jar
First attempt was so bad, revised the entire thing. This is my regex solution:
GETS LAST delimiter field ':' = :last
Trims: /(?:^\s*:|:|^)\s*(".*?"|.*?)(?=\s*(?:\:|$))/
No-trim: /(?:(?<!^):|^)(\s*".*?"\s*|.*?)(?=\:|$)/
------------------
GETS FIRST AND LAST delimiter fields ':' = first:last
Trims: /(?:^|:)\s*(".*?"|(?<!^).*?|)(?=\s*(?:\:|$))/
No trim: /(?:^|:)(\s*".*?"\s*|\s*(?<!^).*?|)(?=\:|$)/
And yes, it's not as easy as one thinks.
