Extracting all but a certain phrase using regular expressions - ruby

I have a string, and I want to extract everything except a certain pattern into another variable.
first_string = "Q13 Hello, World!"
I'd like to get the Hello, World! out of the string and into another variable so that: second_string = "Hello, World!".
I attempted to create a regex that extracts all but the "Q13" and it works on Rubular but not in the console.
> first_string = "Q13 Hello, World!"
> second_string = first_string.scan(/[^(Q[0-9]{1,})]/)
=> [" ", "H", "e", "l", "l", "o", ",", " ", "W", "o", "r", "l", "d", "!"]
> second_string.join()
=> " Hello World!"
This is fine but I can't lose the leading space using the regex. That wouldn't be a problem except I have some application specific caveats...
Not all strings will have "Q13"... the "Q" will be there but the number will change. I don't know if "Q13" will come at the beginning or end of the text. I can't be certain what text will be in the string.
I can't rely on the leading space being there. It might also be a trailing space.
Any ideas?

Assuming you want to omit the Q[number] and any surrounding whitespace:
second_string = first_string.gsub(/\s?Q\d+\s?/, "")
If you want to omit the Q[number] but not the surrounding whitespace:
second_string = first_string.gsub(/Q\d+/, "")

Try this:
second_string = first_string.scan(/\A(?:Q[0-9]+)?(?: )?(.*?)(?: )?(?:Q[0-9]+)?\z/).flatten.first
Live test in Ruby console
2.0.0p247 :001 > first_string = "Q12 Hello World! Q87"
=> "Q12 Hello World! Q87"
2.0.0p247 :002 > second_string = first_string.scan(/\A(?:Q[0-9]+)?(?: )?(.*?)(?: )?(?:Q[0-9]+)?\z/).flatten.first
=> "Hello World!"
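The two approaches above can be folded into one small helper. This is only a minimal sketch, not code from either answer: the constant Q_TOKEN and the method name without_q_token are hypothetical, and it assumes the token is always a standalone word made of Q followed by digits.

```ruby
# Hypothetical helper: removes every standalone "Q<digits>" token,
# then trims whatever whitespace is left at either end of the string.
Q_TOKEN = /\bQ\d+\b/

def without_q_token(text)
  text.gsub(Q_TOKEN, "").strip
end

puts without_q_token("Q13 Hello, World!")    # token at the start
puts without_q_token("Hello, World! Q7")     # token at the end
puts without_q_token("Q12 Hello World! Q87") # token at both ends
```

Using strip instead of matching optional spaces in the regex keeps the pattern simple. Note that a token in the middle of the text would leave two adjacent spaces behind, which this sketch does not collapse.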

Related

Replacing characters that don't match a particular regex expression

I have the following regex expression from Amazon Web Services (AWS) which is required for the Instance Name:
^([\p{L}\p{Z}\p{N}_.:/=+-#]*)$
However, I am unsure of an efficient way to find the characters that do not match this expression and replace each of them with a simple space character.
For example, the string Hello (World) should become Hello World (the parentheses have been replaced with spaces). This is just one of numerous examples of characters that do not match the expression.
The only way I've been able to do this is by using the following code:
first_test_string.split('').each do |char|
if char[/^([\p{L}\p{Z}\p{N}_.:\/=+-#]*)$/] == nil
second_test_string = second_test_string.gsub(char, " ")
end
end
When using this code, I get the following result:
irb(main):037:0> first_test_string = "Hello (World)"
=> "Hello (World)"
irb(main):038:0> second_test_string = first_test_string
=> "Hello (World)"
irb(main):039:0>
irb(main):040:0> first_test_string.split('').each do |char|
irb(main):041:1* if char[/^([\p{L}\p{Z}\p{N}_.:\/=+-#]*)$/] == nil
irb(main):042:2> second_test_string = second_test_string.gsub(char, " ")
irb(main):043:2> end
irb(main):044:1> end
=> ["H", "e", "l", "l", "o", " ", "(", "W", "o", "r", "l", "d", ")"]
irb(main):045:0> first_test_string
=> "Hello (World)"
irb(main):046:0> second_test_string
=> "Hello  World "
irb(main):047:0>
Is there another way to do this, one that is less hacky? I was hoping for a solution where I could just provide a regex and then replace everything except the characters that match it.
Use String#gsub and negate the character class of acceptable characters with [^...].
2.6.5 :014 > "Hello (World)".gsub(%r{[^\p{L}\p{Z}\p{N}_.:/=+\-#]}, " ")
=> "Hello  World "
Note I've also escaped - as [+-#] may be interpreted as the range of characters between + and #. For example, , lies between + and #.
2.6.5 :004 > "Hello, World".gsub(%r{[^\p{L}\p{Z}\p{N}_.:/=+-#]+}, " ")
=> "Hello, World"
2.6.5 :005 > "Hello, World".gsub(%r{[^\p{L}\p{Z}\p{N}_.:/=+\-#]+}, " ")
=> "Hello  World"
Add a + if you want multiple consecutive invalid characters to be replaced with a single space.
2.6.5 :024 > "((Hello~(World)))".gsub(%r{[^\p{L}\p{Z}\p{N}_.:/=+\-#]}, " ")
=> "  Hello  World   "
2.6.5 :025 > "((Hello~(World)))".gsub(%r{[^\p{L}\p{Z}\p{N}_.:/=+\-#]+}, " ")
=> " Hello World "
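The negated character class can be wrapped in a small method so both variants are available. This is a sketch under assumptions: the allowed set is copied from the question as written, and the names INVALID_CHAR, INVALID_RUN and sanitize_name are hypothetical.

```ruby
# Allowed characters are copied from the question; the hyphen is escaped
# so it is a literal "-" rather than a range operator.
INVALID_CHAR = %r{[^\p{L}\p{Z}\p{N}_.:/=+\-#]}
INVALID_RUN  = %r{[^\p{L}\p{Z}\p{N}_.:/=+\-#]+}

# Replaces each invalid character with a space, or each run of invalid
# characters with a single space when collapse is true.
def sanitize_name(name, collapse: false)
  name.gsub(collapse ? INVALID_RUN : INVALID_CHAR, " ")
end

p sanitize_name("((Hello~(World)))")
p sanitize_name("((Hello~(World)))", collapse: true)
```

Without collapse, every invalid character yields its own space, so the result keeps the original length. With collapse, adjacent invalid characters share one space.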

gsub escaped double quote to an escaped single quote in string

I have the following input string:
string = "\"Newegg.com\" <Promo#email.newegg.com>"
I want to replace the \" with \'. I tried this:
string.gsub(/\"/) {|i| "\\'" }
# => "\\'Newegg.com\\' <Promo#email.newegg.com>"
string.gsub(/\"/,%q(\\\'))
# => "\\'Newegg.com\\' <Promo#email.newegg.com>"
Either way, the result appears to contain two \ characters, but I want only one. It seems to be an issue with backslash escaping, because this works otherwise:
string.gsub(/\"/,%q('))
# => "'Newegg.com' <Promo#email.newegg.com>"
-- Update 1--
yes, puts does display the "correct" value
temp = string.gsub(/\"/,%q(\\\'))
# => "\\'Newegg.com\\' <Promo#email.newegg.com>"
puts temp
# >> \'Newegg.com\' <Promo#email.newegg.com>
but I want to store this exact value displayed on the last line.
Your actual string doesn't include \
puts "\"Newegg.com\" <Promo#email.newegg.com>"
> "Newegg.com" <Promo#email.newegg.com>
This will replace " with ' as you wished:
puts "\"Newegg.com\" <Promo#email.newegg.com>".gsub('"', "'")
> 'Newegg.com' <Promo#email.newegg.com>
If you really want a literal \" in the string, escape the backslash as well:
puts "\\\"Newegg.com\\\" <Promo#email.newegg.com>"
> \"Newegg.com\" <Promo#email.newegg.com>
Same replace should work:
puts "\\\"Newegg.com\\\" <Promo#email.newegg.com>".gsub('"', "'")
> \'Newegg.com\' <Promo#email.newegg.com>
Looks like you're getting a bit confused (understandably so) by the returned result.
Keep in mind that in irb, the last result is formatted using .inspect, which means that it wraps strings in double quotes and then escapes the characters (backslashes and double quotes) that would need to be escaped in a double-quoted string. This is to distinguish between strings and other values such as numbers, arrays, hashes, etc.
However, that is just the result of inspect. If you use puts to output the value, it will output it without any escaping, which is a more accurate representation of your value. The value displayed by puts is the real value, and it is what is stored when you assign the result to a variable.
If you still can't tell what your string looks like, try this:
temp = string.gsub(/\"/,%q(\\\'))
temp.split('')
=> ["\\", "'", "N", "e", "w", "e", "g", "g", ".", "c", "o", "m", "\\", "'", " ", "<", "P", "r", "o", "m", "o", "#", "e", "m", "a", "i", "l", ".", "n", "e", "w", "e", "g", "g", ".", "c", "o", "m", ">"]
This explodes your string into an array of single characters, and can make it easier to see exactly what is in your string. Notice you have a \ character (displayed as "\\", but since each string is guaranteed to be exactly one character long, you know it is being displayed that way because of inspect) and a ' character at the beginning.
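The difference between the stored value and its inspect form can be checked directly. The snippet below is a minimal sketch using the question's input string; it uses the block form of gsub, where the block's return value is inserted as-is, and the variable name result is just for illustration.

```ruby
string = "\"Newegg.com\" <Promo#email.newegg.com>"

# Block form: the returned string is inserted literally, so the backslash
# is not interpreted as the start of a backreference.
result = string.gsub('"') { "\\'" }

puts result                    # the characters actually stored
p result                       # the inspect form that irb shows
puts result.chars.count("\\")  # how many backslash characters the string holds
```

Counting the characters confirms that each replaced quote contributes exactly one backslash, even though inspect displays it as two.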

In Ruby, how do I split on two or more spaces or a tab?

With Ruby, how do I split on two or more spaces or a tab? That is, I have:
2.4.0 :005 > str = "a\t\tb  c d"
=> "a\t\tb  c d"
and applying my rules above, I would like the result to be:
["a", "", "b", "c d"]
since the consecutive tabs are capturing an empty string. But when I try the below:
2.4.0 :007 > str.split(/(?:[[:space:]][[:space:]]+|\t)/)
=> ["a", "b", "c d"]
The tabs are getting merged into a single [[:space:]].
How do I adjust my regular expression to split on two or more spaces or a tab character?
You could try this:
"a\t\tb  c d".split(/\t| {2,}/)
#=> ["a", "", "b", "c d"]
"ab \t\t\tf".split(/\t| {2,}/)
#=> ["ab ", "", "", "f"]
Where \t is for a tab and {2,} for two or more spaces. Notice that there is a space before {2,}.
To include non-breaking spaces you could add \u00A0 to a character class, like this:
str.split(/\t|[ \u00A0]{2,}/)
Examples:
str = "a\t\tb \u00A0 c d" #=> "a\t\tb   c d"
str.split(/\t|[ \u00A0]{2,}/) #=> ["a", "", "b", "c d"]
str = "ab \t\t\tf" #=> "ab \t\t\tf"
str.split(/\t|[ \u00A0]{2,}/) #=> ["ab ", "", "", "f"]
Where [ \u00A0]{2,} matches 2 or more occurrences of either a space or a non-breaking space. No | is needed inside the brackets: within a character class it is a literal pipe character, not an alternation.
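The separator can be kept in one place and reused. The following is a sketch only: FIELD_SEPARATOR and split_fields are hypothetical names, the character class lists just the space and the non-breaking space, and the keep_trailing option is an addition that relies on the negative-limit behaviour of String#split.

```ruby
# Separator: a single tab, or a run of two or more spaces / non-breaking spaces.
FIELD_SEPARATOR = /\t|[ \u00A0]{2,}/

def split_fields(line, keep_trailing: false)
  # A negative limit tells String#split to keep trailing empty fields,
  # which it drops by default.
  keep_trailing ? line.split(FIELD_SEPARATOR, -1) : line.split(FIELD_SEPARATOR)
end

p split_fields("a\t\tb  c d")
p split_fields("a\t\t")
p split_fields("a\t\t", keep_trailing: true)
```

The last two calls differ only in whether the empty fields after the final tabs survive, which matters if the column count has to stay fixed.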

Why the Ruby each iterator goes first in the execution?

I've come across a weird thing while doing simple tasks in Ruby. I just want to iterate over the alphabet with the each method, but the iteration output appears before the rest of the string:
alfawit = ("a".."z")
puts "That's an alphabet: \n\n #{ alfawit.each { |litera| puts litera } } "
and this code results in this: (abbreviated)
a
b
c
⋮
x
y
z
That's an alphabet:
a..z
Any ideas why it works like this, or what I did wrong?
Thanks in advance.
Because the each call is interpolated into your string literal, it runs while the string is being built, before puts prints anything. Also, each returns its receiver (the range), and that return value is what gets interpolated and printed as a..z. Try this instead:
alfawit = ("a".."z")
puts "That's an alphabet: \n\n"
alfawit.each { |litera| puts litera }
or
puts "That's an alphabet: \n\n"
("a".."z").each { |litera| puts litera }
You can still use interpolation if you want, but do it this way:
alfawit = ("a".."z")
puts "That's an alphabet: \n\n#{alfawit.to_a.join("\n")}"
You can easily see what's going on if you extract the interpolation part into a variable:
alfawit = ("a".."z")
foo = alfawit.each { |litera| puts litera }
puts "That's an alphabet: \n\n #{ foo } "
The second line is causing the trouble: each invokes the block for each element of the range and then returns the receiver, so that foo becomes alfawit.
Here's another way to get the desired result:
alfawit = "a".."z"
puts "That's an alphabet:", alfawit.to_a
puts outputs each argument on a new line, but for array arguments, it outputs each element on a new line. Result:
That's an alphabet:
a
b
c
⋮
x
y
z
Likewise, you can turn the range into an argument list via *:
alfawit = "a".."z"
puts "That's an alphabet:", *alfawit
That's equivalent to:
puts "That's an alphabet:", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"
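Both points made in the answers, that each returns its receiver and that the message should be built before it is printed, can be checked with a short sketch. The variable names returned and message are only illustrative.

```ruby
alfawit = ("a".."z")

# each runs the block for every element and then returns its receiver,
# so the return value is the range itself, not the block results.
returned = alfawit.each { |litera| litera }
puts returned.equal?(alfawit)

# Build the whole message first, then print it with a single puts.
message = "That's an alphabet:\n\n#{alfawit.to_a.join("\n")}"
puts message
```

Because the interpolated expression here has no side effects of its own, nothing is printed until the final puts.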

How could I split string and keep the whitespaces, as well?

I did the following in Python:
import re
s = 'This is a text'
re.split(r'(\W)', s)
# => ['This', ' ', 'is', ' ', 'a', ' ', 'text']
It worked just great. How do I do the same split in Ruby?
I've tried this, but it eats up my whitespace:
s = "This is a text"
s.split(/[\W]/)
# => ["This", "is", "a", "text"]
From the String#split documentation:
If pattern contains groups, the respective matches will be returned in
the array as well.
This works in Ruby the same way as in Python. Square brackets are for specifying character classes, not capture groups; use parentheses instead:
"foo bar baz".split(/(\W)/)
# => ["foo", " ", "bar", " ", "baz"]
toro2k's answer is most straightforward. Alternatively,
string.scan(/\w+|\W+/)
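The two approaches agree on the question's input but differ once separators repeat. The comparison below is a minimal sketch; the variable names are only illustrative.

```ruby
text = "This is a text"

# A capture group in the pattern makes split return the separators too.
with_split = text.split(/(\W)/)
# scan collects alternating runs of word and non-word characters.
with_scan = text.scan(/\w+|\W+/)

p with_split
p with_scan
puts with_split == with_scan

# With two consecutive spaces the results diverge: split emits each
# separator individually (with an empty string between them), while
# scan keeps the run together.
spaced = "a  b"
p spaced.split(/(\W)/)
p spaced.scan(/\w+|\W+/)
```

Which form is preferable depends on whether runs of whitespace should stay grouped or be reported one character at a time.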
