String#scan with a string argument behaves strangely - ruby

According to documentation, #scan should accept both String and Regexp instances as parameter. But tests show strange behaviour:
▶ cat scantest.rb
#!/usr/bin/ruby
puts '='*10
puts 'foo'.scan '.'
puts '='*10
puts 'foo'.scan /./
puts '='*10
▶ rb scantest.rb
# ⇒ ==========
# ⇒ ==========
# ⇒ f
# ⇒ o
# ⇒ o
# ⇒ ==========
Inside both pry and irb, it doesn't properly scan for a string as well. What am I doing wrong?

With string '.', it scans for literal dots:
'foo'.scan '.'
# => []
'fo.o'.scan '.'
# => ["."]
While with regular expression /./, it matches any characters (except newline):
'foo'.scan /./
# => ["f", "o", "o"]
"foo\nbar".scan /./
# => ["f", "o", "o", "b", "a", "r"]
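The difference can be checked directly. The following is a minimal sketch (the variable names are only illustrative): a String pattern is matched literally, which is the same as scanning with a regular expression built from the escaped string.

```ruby
text = 'fo.o'

# A String pattern is matched literally.
literal = text.scan('.')

# Equivalent regular expression: the dot is escaped, so it is no longer a wildcard.
escaped = text.scan(Regexp.new(Regexp.escape('.')))

# An unescaped dot in a Regexp matches any character except a newline.
wildcard = text.scan(/./)

p literal   # ["."]
p escaped   # ["."]
p wildcard  # ["f", "o", ".", "o"]
```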

The argument to scan must match something in the string you are scanning; otherwise scan returns an empty array.
My case:
irb(main):039:0> "foo".scan("o")
=> ["o", "o"]
Your case
'foo'.scan '.'
# => []
There is no dot (.) in the string 'foo', so scan returns an empty array.

Related

Replacing characters that don't match a particular regex expression

I have the following regular expression from Amazon Web Services (AWS), which the Instance Name must match:
^([\p{L}\p{Z}\p{N}_.:/=+-@]*)$
However, I am unsure of an efficient way to find the characters that do not match this pattern and replace each of them with a space.
For example, the string Hello (World) should become Hello World (the parentheses have been replaced with spaces). This is just one of many possible characters that do not match the pattern.
The only way I've been able to do this is by using the following code:
first_test_string.split('').each do |char|
  if char[/^([\p{L}\p{Z}\p{N}_.:\/=+-@]*)$/] == nil
    second_test_string = second_test_string.gsub(char, " ")
  end
end
When using this code, I get the following result:
irb(main):037:0> first_test_string = "Hello (World)"
=> "Hello (World)"
irb(main):038:0> second_test_string = first_test_string
=> "Hello (World)"
irb(main):039:0>
irb(main):040:0> first_test_string.split('').each do |char|
irb(main):041:1* if char[/^([\p{L}\p{Z}\p{N}_.:\/=+-@]*)$/] == nil
irb(main):042:2> second_test_string = second_test_string.gsub(char, " ")
irb(main):043:2> end
irb(main):044:1> end
=> ["H", "e", "l", "l", "o", " ", "(", "W", "o", "r", "l", "d", ")"]
irb(main):045:0> first_test_string
=> "Hello (World)"
irb(main):046:0> second_test_string
=> "Hello  World "
irb(main):047:0>
Is there another way to do this, one that is less hacky? I was hoping for a solution where I could just provide a regex string and then simply look for everything but the characters that match the regex string.
Use String#gsub and negate the character class of acceptable characters with [^...].
2.6.5 :014 > "Hello (World)".gsub(%r{[^\p{L}\p{Z}\p{N}_.:/=+\-@]}, " ")
=> "Hello  World "
Note that I've also escaped the -, because an unescaped +-@ inside a character class is interpreted as the range of characters from + to @. For example, the comma (,) lies between + and @, so the unescaped version accepts it:
2.6.5 :004 > "Hello, World".gsub(%r{[^\p{L}\p{Z}\p{N}_.:/=+-@]+}, " ")
=> "Hello, World"
2.6.5 :005 > "Hello, World".gsub(%r{[^\p{L}\p{Z}\p{N}_.:/=+\-@]+}, " ")
=> "Hello  World"
Add a + if you want each run of consecutive invalid characters to be replaced with a single space.
2.6.5 :024 > "((Hello~(World)))".gsub(%r{[^\p{L}\p{Z}\p{N}_.:/=+\-@]}, " ")
=> "  Hello  World   "
2.6.5 :025 > "((Hello~(World)))".gsub(%r{[^\p{L}\p{Z}\p{N}_.:/=+\-@]+}, " ")
=> " Hello World "
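One way to avoid repeating the character class is to define it once and derive both the "invalid run" pattern and a whole-string validity check from it. This is only a sketch: the helper name sanitize_name and the constants are hypothetical, the allowed set is assumed to end in @ with the hyphen escaped, and String#match? needs Ruby 2.4 or later.

```ruby
# Assumption: this allowed set mirrors the character class discussed above
# (ending in @, hyphen escaped); adjust it to what your service accepts.
ALLOWED_CHARS = '\p{L}\p{Z}\p{N}_.:/=+\-@'

# Matches one or more consecutive characters outside the allowed set.
INVALID_RUN = /[^#{ALLOWED_CHARS}]+/

# Matches only strings made up entirely of allowed characters.
# \A and \z anchor to the whole string, whereas ^ and $ anchor to lines.
FULLY_VALID = /\A[#{ALLOWED_CHARS}]*\z/

# sanitize_name is a hypothetical helper name used for this sketch.
def sanitize_name(name)
  name.gsub(INVALID_RUN, " ")
end

cleaned = sanitize_name("((Hello~(World)))")
p cleaned                      # " Hello World "
p FULLY_VALID.match?(cleaned)  # true
```

If collapsing runs is not wanted, dropping the trailing + from INVALID_RUN replaces each invalid character with its own space.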

Verifying if a string (several words) in an array of strings matches with another string in ruby

I am a beginner in ruby. I have an array of strings like this ["a b c", "d e f"] and I have a string like this "xapqbrc". I want to verify if "xapqbrc" contains all the words in each string but not necessarily one next to other. How can I do that in ruby?
["a b c", "d e f"].include? "xapqbrc"
is not working as expected
Array#include? only checks whether the Array contains an element equal to its argument. ["a b c", "d e f"].include? "xapqbrc" would be true only if the whole string "xapqbrc" were an element of the Array.
Let's split the problem into parts. First, check whether one string contains all the words in another. To do that, split the string of words into an Array.
words = "a b c".split(/ /) # ["a", "b", "c"]
Now we can use include? but on a String to check if the String contains another string. "food".include?("foo") is true. But that's only for one word, we need to do this for all words. Use all? to check if a thing is true for all items in an Array.
words.all? { |word| "xapqbrc".include?(word) }
Finally, we need to do this for an Array of those words. We can use select to get only the items in the Array for which the block is true.
# ["a b c"]
matches = ["a b c", "d e f"].select { |string|
  words = string.split(/ /)
  # The last statement in the block determines the truthiness of the block.
  words.all? { |word| "xapqbrc".include?(word) }
}
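The same steps can be wrapped in a small method so the phrases and the text are passed in rather than hard-coded. This is a sketch only; phrases_found_in is a hypothetical name, and it uses split with no argument, which splits on any whitespace.

```ruby
# Returns the phrases whose words all occur somewhere in text.
# phrases_found_in is a hypothetical helper name for this sketch.
def phrases_found_in(phrases, text)
  phrases.select do |phrase|
    phrase.split.all? { |word| text.include?(word) }
  end
end

p phrases_found_in(["a b c", "d e f"], "xapqbrc")  # ["a b c"]
```

To get a plain true/false answer instead of the matching phrases, select can be swapped for any? or all?, depending on which question is being asked.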
To check if all characters in a string occur in the arrays of strings:
('xapqbrc'.chars - ['a b c', 'd e f'].map(&:chars).flatten).empty?
# => false
Here is a breakdown of the above statement.
The method chars returns the array of all characters in a string. Note that for a string with blanks, this array will include blanks. If you want to extract the characters separated by blanks, or, more generally, whitespace, use split like so (as in the answer by Schwern):
['a b c', 'd e f'].map{ |s| s.split(/ /) }.flatten
The method flatten takes the array of arrays and flattens it into a single array.
puts 'xapqbrc'.chars.inspect
# => ["x", "a", "p", "q", "b", "r", "c"]
puts ['a b c', 'd e f'].map(&:chars).flatten.inspect
# => ["a", " ", "b", " ", "c", "d", " ", "e", " ", "f"]
puts ('xapqbrc'.chars - ['a b c', 'd e f'].map(&:chars).flatten).inspect
# => ["x", "p", "q", "r"]
puts ('xapqbrc'.chars - ['a b c', 'd e f'].map(&:chars).flatten).empty?
# => false
A few more examples:
strings1 = ['a b c', 'd e f']
['xapqbrc', 'accbbb', 'xpq', ' ', ''].each do |string2|
  puts "string2=#{string2};"
  puts (string2.chars - strings1.map(&:chars).flatten).empty?
end
# => string2=xapqbrc;
# => false
# => string2=accbbb;
# => true
# => string2=xpq;
# => false
# => string2= ;
# => true
# => string2=;
# => true

How do I split a string without keeping the delimiter?

In Ruby, how do I split a string and not keep the delimiter in the resulting split array? I thought this was the default, but when I try
2.4.0 :016 > str = "a b c"
=> "a b c"
2.4.0 :017 > str.split(/([[:space:]]|,)+/)
=> ["a", " ", "b", " ", "c"]
I see the spaces included in my result. I would like the result to simply be
["a", "b", "c"]
From the String#split documentation:
If pattern contains groups, the respective matches will be returned in the array as well.
Answering your explicitly stated question: do not match the group:
# ⇓⇓ HERE
str.split(/(?:[[:space:]]|,)+/)
or, even without groups:
str.split(/[[:space:],]+/)
or, in more Rubyish way:
'a b, c,d e'.split(/[\p{Space},]+/)
#⇒ ["a", "b", "c", "d", "e"]
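The effect of the capturing group can be seen side by side. The following is a small sketch with illustrative variable names; the first call reproduces the result from the question.

```ruby
# A capturing group makes split include the captured delimiter text.
captured = "a b c".split(/([[:space:]]|,)+/)

# A non-capturing group (?:...) keeps the delimiters out of the result.
non_captured = "a b c".split(/(?:[[:space:]]|,)+/)

# A single character class needs no group at all.
char_class = "a b, c,d e".split(/[[:space:],]+/)

p captured      # ["a", " ", "b", " ", "c"]
p non_captured  # ["a", "b", "c"]
p char_class    # ["a", "b", "c", "d", "e"]
```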
String#split splits on whitespace by default, so don't bother with a regex:
"a b c".split # => ["a", "b", "c"]
Try this please
str.split(' ')

Checking if a certain letter comes x spaces after another in a string

I'm solving Coderbyte problems, and came across one called ABCheck, which takes a string and returns true if the letters 'a' and 'b' are separated by exactly three places. I know there's an easier way to do this with regexes, but I'm trying to do it the logical way first, for learning purposes.
Here's the code I have:
def ABCheck(str)
  str = str.downcase.split('')
  str.each_with_index do |char,index|
    if char == 'a' && str[index+4] == 'b'
      return "true"
    elsif char == 'b' && str[index+4] == 'a'
      return "true"
    else
      return "false"
    end
  end
end
ABCheck("Laura sobs")
My code isn't returning the correct answer. It returns false even though the answer should be true.
As @Arkku diagnosed your problem, I will confine my comments to an alternative method for the non-regex solution. (In real life you certainly would want to use a regular expression.)
The Ruby way, as I see it, would be to use Enumerable#each_cons rather than indices:
def a3b_match?(str)
  str.each_char.each_cons(5).any? { |f,*_,l|
    (f=='a' && l=='b') || (f=='b' && l=='a') }
end
a3b_match?('xadogbite') #=> true
a3b_match?('xbdogaite') #=> true
a3b_match?('xbdgaite') #=> false
a3b_match?('xadoggybite') #=> false
If you instead wanted the number of matches, change Enumerable#any? to Enumerable#count.
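A sketch of that counting variant, assuming the same five-character window; a3b_count is a hypothetical name, not part of the original answer.

```ruby
# Counts the windows of five characters whose first and last characters
# are 'a' and 'b' in either order. a3b_count is a hypothetical name.
def a3b_count(str)
  str.each_char.each_cons(5).count do |first, *_, last|
    (first == 'a' && last == 'b') || (first == 'b' && last == 'a')
  end
end

p a3b_count('xadogbite')  # 1
p a3b_count('axxxbxxxa')  # 2
p a3b_count('xbdgaite')   # 0
```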
Here are the steps:
str = 'xadogbite'
enum0 = str.each_char
#=> #<Enumerator: "xadogbite":each_char>
enum1 = enum0.each_cons(5)
#=> #<Enumerator: #<Enumerator: "xadogbite":each_char>:each_cons(5)>
Carefully examine the return values for the calculations of the enumerators enum0 and enum1. You can think of enum1 as a "compound" enumerator.
We can see the (five) values of enum1 that any? will pass into the block by converting that enumerator to an array:
enum1.to_a
#=> [["x", "a", "d", "o", "g"],
# ["a", "d", "o", "g", "b"],
# ["d", "o", "g", "b", "i"],
# ["o", "g", "b", "i", "t"],
# ["g", "b", "i", "t", "e"]]
Let's simulate the passing of the first value of enum1 into the block and assign it to the block variables¹:
f,*m,l = enum1.next
f #=> "x"
m #=> ["a", "d", "o"]
l #=> "g"
We then perform the block calculation:
(f=='a' && l=='b') || (f=='b' && l=='a')
#=> ('x'=='a' && 'g'=='b') || ('x'=='b' && 'g'=='a')
#=> false || false => false
any? must therefore pass the next element of enum1 into the block:
f,*_,l = enum1.next
#=> ["a", "d", "o", "g", "b"]
f #=> "a"
l #=> "b"
(f=='a' && l=='b') || (f=='b' && l=='a')
#=> ('a'=='a' && 'b'=='b') => true
Since we have a match on (f=='a' && l=='b'), there is no need for Ruby to evaluate (f=='b' && l=='a') or to perform similar calculations for the rest of the elements of enum1, so she doesn't. any? returns true.
¹ I used the local variable m instead of _ because IRB uses the latter for its own purpose. When run from the command line, _ works just fine.
The problem is that you only check the first character: if that first character is not an a or b meeting the search condition, you immediately return "false". You need to search through all the possible positions in the string before you know that none of them matched.
(This is a common pattern when searching for a match in some sort of collection; if you find it you can return immediately, but if you don't you must keep searching until the end.)
Also note that you return the string "false", not the boolean false.
Example solution (without regex):
def axb_match?(str, in_between = 3)
  distance = in_between + 1 # chars in between + the 'b'
  str, i = str.downcase, -1
  while i = str.index('a', i + 1)
    return true if (str[i + distance] == 'b') || (i >= distance && str[i - distance] == 'b')
  end
  false # no match was found (finding would return immediately)
end
axb_match? "Laura sobs" # -> true
And of course with regex it's quite simple:
str =~ /(a...b)|(b...a)/i
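Since =~ returns a match position or nil rather than a boolean, the regex can be wrapped in a predicate method. This is a sketch only: axb_regex_match? is a hypothetical name, and String#match? assumes Ruby 2.4 or later. Note that . does not match a newline, so the three characters in between must be on the same line.

```ruby
# Returns true if 'a' and 'b' (in either order, any case) are separated
# by exactly three characters. axb_regex_match? is a hypothetical name.
def axb_regex_match?(str)
  str.match?(/a...b|b...a/i)
end

p axb_regex_match?("Laura sobs")   # true
p axb_regex_match?("xbdgaite")     # false
p axb_regex_match?("xadoggybite")  # false
```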

Match sequences of consecutive characters in a string

I have the string "111221" and want to match all sets of consecutive equal integers: ["111", "22", "1"].
I know that there is a special regex thingy to do that but I can't remember and I'm terrible at Googling.
Using regex in Ruby 1.8.7+:
p s.scan(/((\d)\2*)/).map(&:first)
#=> ["111", "22", "1"]
This works because (\d) captures any digit, and then \2* captures zero-or-more of whatever that group (the second opening parenthesis) matched. The outer (…) is needed to capture the entire match as a result in scan. Finally, scan alone returns:
[["111", "1"], ["22", "2"], ["1", "1"]]
…so we need to run through and keep just the first item in each array. In Ruby 1.8.6 (which doesn't have Symbol#to_proc for convenience):
p s.scan(/((\d)\2*)/).map{ |x| x.first }
#=> ["111", "22", "1"]
With no Regex, here's a fun one (matching any char) that works in Ruby 1.9.2:
p s.chars.chunk{|c|c}.map{ |n,a| a.join }
#=> ["111", "22", "1"]
Here's another version that should work even in Ruby 1.8.6:
p s.scan(/./).inject([]){|a,c| (a.last && a.last[0]==c[0] ? a.last : a)<<c; a }
# => ["111", "22", "1"]
"111221".gsub(/(.)(\1)*/).to_a
#=> ["111", "22", "1"]
This uses the form of String#gsub that does not have a block and therefore returns an enumerator. It appears gsub was bestowed with that option in v2.0.
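The reason gsub is convenient here is that scan returns only the captures once the pattern contains a group, while the gsub enumerator yields whole matches. A minimal sketch comparing the two (variable names are illustrative):

```ruby
s = "111221"
run = /(.)\1*/

# With a capture group, scan returns the captures, not the whole match.
captures_only = s.scan(run)

# gsub without a block or replacement returns an enumerator over whole matches.
whole_matches = s.gsub(run).to_a

p captures_only  # [["1"], ["2"], ["1"]]
p whole_matches  # ["111", "22", "1"]
```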
I found that this works. It first matches a single character in one group and then matches any repetitions of that same character that follow it. The result is an array of two-element arrays: the first element of each is the initial match, and the second is any additional repeated characters that match the first. Joining each pair back together gives an array of runs of repeated characters:
input = "WWBWWWWBBBWWWWWWWB3333!!!!"
repeated_chars = input.scan(/(.)(\1*)/)
# => [["W", "W"], ["B", ""], ["W", "WWW"], ["B", "BB"], ["W", "WWWWWW"], ["B", ""], ["3", "333"], ["!", "!!!"]]
repeated_chars.map(&:join)
# => ["WW", "B", "WWWW", "BBB", "WWWWWWW", "B", "3333", "!!!!"]
As an alternative, I found that I could create a new Regexp object that matches one or more occurrences of each unique character in the input string, as follows:
input = "WWBWWWWBBBWWWWWWWB3333!!!!"
regexp = Regexp.new("#{input.chars.uniq.join("+|")}+")
#=> regexp created for this example will look like: /W+|B+|3+|!+/
and then use that Regex object as an argument for scan to split out all the repeated characters, as follows:
input.scan(regexp)
# => ["WW", "B", "WWWW", "BBB", "WWWWWWW", "B", "3333", "!!!!"]
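One caveat with building the pattern by string interpolation is that characters with a special meaning in a regex (such as . or +) would be misread. A possible variation, sketched here with the hypothetical helper name run_pattern_for, escapes each character and combines the pieces with Regexp.union:

```ruby
# Builds a pattern matching a run of any one of the input's unique characters.
# Each character is escaped so regex metacharacters are treated literally.
# run_pattern_for is a hypothetical helper name.
def run_pattern_for(input)
  Regexp.union(input.chars.uniq.map { |char| /#{Regexp.escape(char)}+/ })
end

input = "aa..++b"
p input.scan(run_pattern_for(input))  # ["aa", "..", "++", "b"]
```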
You can try this (C# syntax):
string str = "111221";
string pattern = @"(\d)\1*";
Hope this helps.

Resources