Replacing characters that don't match a particular regex expression - ruby

I have the following regular expression from Amazon Web Services (AWS), which the Instance Name is required to match:
^([\p{L}\p{Z}\p{N}_.:/=+-#]*)$
However, I am unsure of an efficient way to find the characters that do not match this pattern and replace each of them with a single space character.
For example, the string Hello (World) should become Hello World (the parentheses have been replaced with spaces). This is just one of numerous examples of characters that do not match the pattern.
The only way I've been able to do this is by using the following code:
first_test_string.split('').each do |char|
if char[/^([\p{L}\p{Z}\p{N}_.:\/=+-#]*)$/] == nil
second_test_string = second_test_string.gsub(char, " ")
end
end
When using this code, I get the following result:
irb(main):037:0> first_test_string = "Hello (World)"
=> "Hello (World)"
irb(main):038:0> second_test_string = first_test_string
=> "Hello (World)"
irb(main):039:0>
irb(main):040:0> first_test_string.split('').each do |char|
irb(main):041:1* if char[/^([\p{L}\p{Z}\p{N}_.:\/=+-#]*)$/] == nil
irb(main):042:2> second_test_string = second_test_string.gsub(char, " ")
irb(main):043:2> end
irb(main):044:1> end
=> ["H", "e", "l", "l", "o", " ", "(", "W", "o", "r", "l", "d", ")"]
irb(main):045:0> first_test_string
=> "Hello (World)"
irb(main):046:0> second_test_string
=> "Hello  World "
irb(main):047:0>
Is there another way to do this, one that is less hacky? I was hoping for a solution where I could just provide a regex and then simply replace everything except the characters that match it.

Use String#gsub and negate the character class of acceptable characters with [^...].
2.6.5 :014 > "Hello (World)".gsub(%r{[^\p{L}\p{Z}\p{N}_.:/=+\-#]}, " ")
=> "Hello  World "
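Note that I've also escaped the -. Unescaped, [+-#] may be interpreted as a range of characters starting at + rather than as the three literal characters +, - and #. In the first call below, the comma is treated as part of that range, so it counts as valid and is left in place; with the hyphen escaped, it is replaced.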
2.6.5 :004 > "Hello, World".gsub(%r{[^\p{L}\p{Z}\p{N}_.:/=+-#]+}, " ")
=> "Hello, World"
2.6.5 :005 > "Hello, World".gsub(%r{[^\p{L}\p{Z}\p{N}_.:/=+\-#]+}, " ")
=> "Hello World"
Add a + if you want multiple consecutive invalid characters to be replaced with a single space.
2.6.5 :024 > "((Hello~(World)))".gsub(%r{[^\p{L}\p{Z}\p{N}_.:/=+\-#]}, " ")
=> "  Hello  World   "
2.6.5 :025 > "((Hello~(World)))".gsub(%r{[^\p{L}\p{Z}\p{N}_.:/=+\-#]+}, " ")
=> " Hello World "
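The two variants above can be wrapped in one small helper. This is only a sketch: the method name sanitize_name and the squeeze: keyword are made up for illustration, and the character class is copied from the question with the hyphen escaped.

```ruby
# Any single character outside the allowed set from the question.
# The hyphen is escaped so it is a literal, not a range operator.
INVALID_CHAR = %r{[^\p{L}\p{Z}\p{N}_.:/=+\-#]}

# Hypothetical helper: replaces every disallowed character with a space.
# With squeeze: true, a run of disallowed characters becomes one space.
def sanitize_name(name, squeeze: false)
  pattern = squeeze ? /#{INVALID_CHAR}+/ : INVALID_CHAR
  name.gsub(pattern, " ")
end

puts sanitize_name("Hello (World)").inspect
puts sanitize_name("((Hello~(World)))", squeeze: true).inspect
```

Characters that are already allowed, including existing spaces, pass through untouched, so squeeze: true only collapses runs of invalid characters, not runs of spaces.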

Related

Replace the pipe character “|” with line blank space

I'm trying to solve a code challenge about Morse Code, the idea is to:
Receive: morse_text = '.... ..|--. ..- -.-- ...'
Return: 'HI GUYS'
But I am getting 'HIGUYS'
where the pipe should be converted into a space between the 2 words. So far I got:
def decode(morse_text)
# TODO: Decode the morse text
morse_text = morse_text.tr("|", " ")
array = morse_text.split(" ").map { |word| encode_word.invert[word].upcase }
array.join
end
def encode_word
morse_code = {
"a" => ".-",
"b" => "-...",
"c" => "-.-.",
"d" => "-..",
"e" => ".",
"f" => "..-.",
"g" => "--.",
"h" => "....",
"i" => "..",
"j" => ".---",
"k" => "-.-",
"l" => ".-..",
"m" => "--",
"n" => "-.",
"o" => "---",
"p" => ".--.",
"q" => "--.-",
"r" => ".-.",
"s" => "...",
"t" => "-",
"u" => "..-",
"v" => "...-",
"w" => ".--",
"x" => "-..-",
"y" => "-.--",
"z" => "--..",
" " => "|"
}
end
I'm struggling to convert the pipe into a blank space so that I get the desired result.
The issue is that you're converting the pipe to a space which means you lose the unique separator for words and treat it as just a standard separator of characters. Instead, split by the pipe and operate on an array of words:
def decode(morse_text)
# Split the morse code into an array of encoded words
encoded_words = morse_text.split('|')
# Decode each word letter by letter
decoded_words = encoded_words.map do |word|
word.split(' ').map { |letter| encode_word.invert[letter].upcase }
end
# Join each decoded word into a string
joined_words = decoded_words.map { |word| word.join }
# Join each word into a single string
decoded_text = joined_words.join(' ')
end
The result is:
decode('.... ..|--. ..- -.-- ...')
=> "HI GUYS"
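A more compact variant of the same idea, as a sketch: the table is the one from the question (minus the " " => "|" entry, which is no longer needed), and the constant names MORSE and LETTERS are my own. Inverting the hash once avoids rebuilding it for every letter, and fetch raises a KeyError on an unknown code instead of failing later on nil.

```ruby
# Letter-to-Morse table from the question.
MORSE = {
  "a" => ".-",   "b" => "-...", "c" => "-.-.", "d" => "-..",  "e" => ".",
  "f" => "..-.", "g" => "--.",  "h" => "....", "i" => "..",   "j" => ".---",
  "k" => "-.-",  "l" => ".-..", "m" => "--",   "n" => "-.",   "o" => "---",
  "p" => ".--.", "q" => "--.-", "r" => ".-.",  "s" => "...",  "t" => "-",
  "u" => "..-",  "v" => "...-", "w" => ".--",  "x" => "-..-", "y" => "-.--",
  "z" => "--.."
}.freeze

# Morse-to-letter table, built once and upcased up front.
LETTERS = MORSE.invert.transform_values(&:upcase).freeze

def decode(morse_text)
  # Words are separated by "|", letters within a word by spaces.
  words = morse_text.split("|").map do |word|
    word.split(" ").map { |code| LETTERS.fetch(code) }.join
  end
  words.join(" ")
end

puts decode(".... ..|--. ..- -.-- ...")
```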
Alternatively, use the form of String#gsub that takes a hash for making substitutions.
If the variable morse_code holds your hash, with the additional key-value pair ""=>" ", compute the following hash.
decoding_map = morse_code.invert.transform_values(&:upcase)
#=> {".-"=>"A", "-..."=>"B", "-.-."=>"C", "-.."=>"D", "."=>"E",
# ...
# "-..-"=>"X", "-.--"=>"Y", "--.."=>"Z", "|"=>" ", " "=>""}
Then
morse_text = '.... ..|--. ..- -.-- ...'
morse_text.gsub(/[| ]|[.-]+/, decoding_map)
#=> "HI GUYS"
The regular expression reads, "match a pipe or space or a mix of one or more periods or hyphens".
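To see the hash form of gsub in isolation, here is a cut-down sketch with a deliberately tiny map (only H and I, chosen for illustration). Each match is looked up in the hash, and a match that is not a key is replaced with an empty string.

```ruby
# Tiny illustrative map: two letters, the word separator and the
# letter separator (which is dropped by mapping it to "").
tiny_map = { "...." => "H", ".." => "I", "|" => " ", " " => "" }

decoded = ".... ..|.... ..".gsub(/[| ]|[.-]+/, tiny_map)
puts decoded

# ".-" matches the pattern but is not a key, so it is removed.
unknown = ".-".gsub(/[| ]|[.-]+/, tiny_map)
puts unknown.inspect
```

That silent removal of unknown codes is worth keeping in mind: unlike a lookup with fetch, a typo in the input will not raise an error.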

How to break a string into two arrays in Ruby

Is there a way to extract the strings removed by String#split into a separate array?
s = "This is a simple, uncomplicated sentence."
a = s.split( /,|\./ ) #=> [ "This is a simple", "uncomplicated sentence" ]
x = ... => should contain [ ",", "." ]
Note that the actual regex I need to use is much more complex than this example.
Something like this?
a = s.scan( /,|\./ )
When you want both the matched delimiters and the substrings in between as in Stefan's comment, then you should use split with captures.
"This is a simple, uncomplicated sentence."
.split(/([,.])/)
# => ["This is a simple", ",", " uncomplicated sentence", "."]
If you want to separate them into different arrays, then do:
a, x =
"This is a simple, uncomplicated sentence."
.split(/([,.])/).each_slice(2).to_a.transpose
a # => ["This is a simple", " uncomplicated sentence"]
x # => [",", "."]
or
a =
"This is a simple, uncomplicated sentence."
.split(/([,.])/)
a.select.with_index{|_, i| i.even?}
# => ["This is a simple", " uncomplicated sentence"]
a.select.with_index{|_, i| i.odd?}
# => [",", "."]
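Another way to separate the two kinds of elements in a single pass, sketched with Enumerable#partition; the variable names are only illustrative. It relies on the same fact as the versions above: with a capture group, split alternates between text (even indexes) and delimiters (odd indexes).

```ruby
s = "This is a simple, uncomplicated sentence."

# The capture group makes split keep the delimiters in the result.
parts = s.split(/([,.])/)

# Even positions hold the text pieces, odd positions the delimiters.
texts, delimiters = parts.partition.with_index { |_, i| i.even? }

puts texts.inspect
puts delimiters.inspect
```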
try this:
a = s.split(/,/)[1..-1]

String::scan with string argument behave strange

According to the documentation, #scan should accept both String and Regexp instances as a parameter. But tests show strange behaviour:
▶ cat scantest.rb
#!/usr/bin/ruby
puts '='*10
puts 'foo'.scan '.'
puts '='*10
puts 'foo'.scan /./
puts '='*10
▶ rb scantest.rb
# ⇒ ==========
# ⇒ ==========
# ⇒ f
# ⇒ o
# ⇒ o
# ⇒ ==========
Inside both pry and irb, it doesn't properly scan for a string as well. What am I doing wrong?
With string '.', it scans for literal dots:
'foo'.scan '.'
# => []
'fo.o'.scan '.'
# => ["."]
With the regular expression /./, on the other hand, it matches any character (except newline):
'foo'.scan /./
# => ["f", "o", "o"]
"foo\nbar".scan /./
# => ["f", "o", "o", "b", "a", "r"]
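Put differently, a String argument behaves like a regexp in which every metacharacter has been escaped. A short sketch that compares the three forms side by side (the variable names are only for illustration):

```ruby
text = "fo.o"

# A String pattern is matched literally.
literal = text.scan(".")

# Equivalent regexp: the dot is escaped, so it only matches a real dot.
escaped = text.scan(Regexp.new(Regexp.escape(".")))

# An unescaped dot in a regexp matches any character except newline.
any_char = text.scan(/./)

puts literal.inspect
puts escaped.inspect
puts any_char.inspect
```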
The argument to scan has to match something in the string you are scanning; otherwise it returns an empty array.
My case:
irb(main):039:0> "foo".scan("o")
=> ["o", "o"]
Your case
'foo'.scan '.'
# => []
There is no dot in the 'foo' string, so scan returns an empty array.

Extracting all but a certain phrase using regular expressions

I have a string from which I want to extract everything except a certain pattern into another variable.
first_string = "Q13 Hello, World!"
I'd like to get the Hello, World! out of the string and into another variable so that: second_string = "Hello, World!".
I attempted to create a regex that extracts all but the "Q13" and it works on Rubular but not in the console.
> first_string = "Q13 Hello, World!"
> second_string = first_string.scan(/[^(Q[0-9]{1,})]/)
=> [" ", "H", "e", "l", "l", "o", ",", " ", "W", "o", "r", "l", "d", "!"]
> second_string.join()
=> " Hello World!"
This is fine, but I can't lose the leading space using the regex. That wouldn't be a problem except that I have some application-specific caveats...
Not all strings will have "Q13"... the "Q" will be there but the number will change. I don't know if "Q13" will come at the beginning or end of the text. I can't be certain what text will be in the string.
I can't rely on the leading space being there. It might also be a trailing space.
Any ideas?
Assuming you want to omit the Q[number] and any surrounding whitespace:
second_string = first_string.gsub(/\s?Q\d+\s?/, "")
If you want to omit the Q[number] but not the surrounding whitespace:
second_string = first_string.gsub(/Q\d+/, "")
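A quick sketch of how the first pattern behaves when the tag is at the start or at the end of the string, which are the two cases described in the question. The helper name strip_q_tag is hypothetical.

```ruby
# Hypothetical helper: removes a Q<number> tag together with at most
# one whitespace character on each side of it.
def strip_q_tag(text)
  text.gsub(/\s?Q\d+\s?/, "")
end

puts strip_q_tag("Q13 Hello, World!").inspect
puts strip_q_tag("Hello, World! Q7").inspect

# Caveat: a tag in the middle takes both neighbouring spaces with it.
puts strip_q_tag("Hello Q13 World").inspect
```

If the tag could also appear mid-sentence, removing only Q\d+ and then normalising whitespace separately may be the safer choice.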
Try this:
second_string = first_string.scan(/\A(?:Q[0-9]+)?(?: )?(.*?)(?: )?(?:Q[0-9]+)?\z/).flatten.first
Live test in Ruby console
2.0.0p247 :001 > first_string = "Q12 Hello World! Q87"
=> "Q12 Hello World! Q87"
2.0.0p247 :002 > second_string = first_string.scan(/\A(?:Q[0-9]+)?(?: )?(.*?)(?: )?(?:Q[0-9]+)?\z/).flatten.first
=> "Hello World!"

How could I split string and keep the whitespaces, as well?

I did the following in Python:
s = 'This is a text'
re.split('(\W)', s)
# => ['This', ' ', 'is', ' ', 'a', ' ', 'text']
It worked just great. How do I do the same split in Ruby?
I've tried this, but it eats up my whitespace:
s = "This is a text"
s.split(/[\W]/)
# => ["This", "is", "a", "text"]
From the String#split documentation:
If pattern contains groups, the respective matches will be returned in
the array as well.
This works in Ruby the same way as in Python. Square brackets specify character classes, not match groups, so use parentheses instead:
"foo bar baz".split(/(\W)/)
# => ["foo", " ", "bar", " ", "baz"]
toro2k's answer is the most straightforward. Alternatively,
string.scan(/\w+|\W+/)
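The two approaches agree when words are separated by single spaces but differ on runs of non-word characters, as this small comparison sketches (the sample strings are my own):

```ruby
single = "This is a text"
double = "a  b" # two spaces between the words

# With single separators both approaches return the same array.
split_single = single.split(/(\W)/)
scan_single  = single.scan(/\w+|\W+/)

# split emits each separator on its own, with an empty string between
# adjacent separators; scan groups the whole run into one element.
split_double = double.split(/(\W)/)
scan_double  = double.scan(/\w+|\W+/)

puts split_single.inspect
puts split_double.inspect
puts scan_double.inspect
```

Which one fits depends on whether the empty strings between adjacent separators are wanted.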
