Find elements of an array in a string in Ruby

What is an efficient and straightforward way to find which array elements occur in a string?
For example:
a = Array['aaa', 'bbb', 'ccc', 'ddd', 'eee']
b = "This is the sample string with aaa , blah blah"
c = someFunction(b, a)
puts c
=> ['aaa']
Suppose the array has 100 elements; I want to know which of them are found in the string.
I need to match exact words, so xbbb, bbaa, ... should not match.

I think this is one of the possible solutions:
def some_method(string, array)
  string.split & array
end
a = Array['aaa', 'bbb', 'ccc', 'ddd', 'eee']
b = "This is the sample string with aaa , blah blah"
> some_method(b, a)
=> ['aaa']
a = Array['aaa', 'bbb', 'ccc', 'ddd', 'eee']
b = "This is the sample string with xaaa , blah blah"
> some_method(b, a)
=> []

def find_elements(my_string, my_array)
  my_string.split & my_array
end
You can split the string into an array and then find the intersection of both arrays using &, or Array#intersection if you are on Ruby 2.7 or later. This returns an array containing the unique matching elements.
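One caveat with splitting on whitespace: a token such as "aaa," keeps its punctuation and would not intersect with "aaa". A minimal sketch that works around this, assuming words consist only of \w characters; the name words_found is made up for illustration:

```ruby
# Hypothetical helper: returns the elements of `words` that occur as whole words in `text`.
def words_found(text, words)
  # scan(/\w+/) extracts runs of word characters, so "aaa," yields "aaa"
  text.scan(/\w+/) & words
end

a = %w[aaa bbb ccc ddd eee]
p words_found("This is the sample string with aaa, blah bbb.", a)  # => ["aaa", "bbb"]
p words_found("This is the sample string with xaaa , blah", a)     # => []
```

The result follows the order in which the words appear in the string, because & keeps the order of its left operand.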

One way I found is the following:
array = Array['aaa', 'bbb', 'ccc', 'ddd', 'eee']
string = "This is the sample string with aaa , blah blah"
found = []
array.each { |a| found << a if string.include? a }
puts found
=> ["aaa"]
EDIT
After learning of another use case that requires an exact match (include? matches 'aaa' even when it is part of 'xxaaa'), one possible solution is to use array intersection in Ruby:
def some_method(array, string)
  string.split & array
end
Then it will return the exact match.
=> ["aaa"]

You can also use #select with a regular expression to determine which array elements are in the string.
def check_string(ary, str)
  ary.select do |e|
    str =~ /\b#{e}\b/
  end
end
p check_string(%w(aaa bbb ccc), 'Here is a saaample bbb string ccc') # => ['bbb', 'ccc']
This gives you a lot of flexibility as to what matches and what doesn't, since if you want to change that, all you have to do is change the regex. This example assumes that you want whole word matches with words in an array.
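If the array elements may contain regex metacharacters, interpolating them directly changes the pattern (for example, "a.c" would also match "abc"). A hedged sketch that escapes each element first; check_string_escaped is an illustrative name, not part of the answer above:

```ruby
# Hypothetical variant of check_string that treats each element as literal text.
def check_string_escaped(ary, str)
  ary.select do |e|
    # Regexp.escape turns "a.c" into "a\\.c", so "." no longer matches any character
    str.match?(/\b#{Regexp.escape(e)}\b/)
  end
end

p check_string_escaped(%w[aaa bbb a.c], 'Here is a saaample bbb string abc')  # => ["bbb"]
```

String#match? is available from Ruby 2.4; on older versions =~ can be used in its place.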

Related

How to calculate elements of many arrays in Ruby?

I have a question about arrays in Ruby.
I have an array containing many strings with uppercase and lowercase letters, and I want to know how many strings in this array contain an uppercase letter.
I can get the matching letters, but I don't know how to count them.
Thank you.
array.each do |arr|
  print arr.scan(/[A-Z]/)
end
Following your example, what you need is match? if you want a boolean result indicating whether each element contains an uppercase letter:
['foo', 'Foo', 'FoO'].each { |string| p string.match?(/[A-Z]/) }
# false
# true
# true
You can use count and pass a block to check if the current element returns true when evaluating if it contains uppercase characters. The result is the total of elements yielding a true value from the block:
['foo', 'Foo', 'FoO'].count { |string| /[[:upper:]]/ =~ string }
# 2
So I did:
a = ["HellO", "hello", "World", "worlD"]
b = 0
a.each do |x|
  b += x.scan(/[A-Z]/).length
end
puts b # Which equals 4 in this case
This is the problem I had with some of the answers above when using my array.
With Cary's answer I get 3, which misses one of the capital letters.
With Sebastian's answer I get 3 as well, which also misses one of the capital letters.
My array has 2 capitals in the first string, 1 in the third and 1 in the fourth.
Of course the more normal ruby way would be with b += x.scan(/[A-Z]/).count instead of .length but it worked for me in irb.
Some sample output from my console of the three methods:
:030 > a.grep(/\p{Lu}/).size
=> 3
:031 > a.count {|string| /[[:upper:]]/ =~ string}
=> 3
:026 > a.each do |x|
:027 > b += x.scan(/[A-Z]/).length
:028?> end
=> ["HellO", "hello", "World", "worlD"]
:029 > b
=> 4
It appears that the two regex examples above only check whether a string contains any capital letter and count that string once, so a string with several capitals, like my first "HellO", counts only as one:
:039 > ["HellO"].grep(/\p{Lu}/).size
=> 1
:040 > ["HellO"].count {|string| /[[:upper:]]/ =~ string}
=> 1
Of course this may not matter to you but if the string is longer than one word it very well may:
2.5.3 :045 > a = ["Hello World"]
2.5.3 :047 > a.each do |x|
2.5.3 :048 > b += x.scan(/[A-Z]/).count
2.5.3 :049?> end
=> ["Hello World"]
2.5.3 :050 > b
=> 2
2.5.3 :051 > a.count {|string| /[[:upper:]]/ =~ string}
=> 1
2.5.3 :052 > a.grep(/\p{Lu}/).size
=> 1
With two words in one string you can see the difference.
Of course, I am counting the total number of capital letters, whereas you asked:
i want to know how many element (how many string) in this array contains an uppercase letters
In which case either of the other two answers above does the job beautifully :)
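The two quantities discussed above can be computed side by side. A small sketch, assuming only ASCII capitals matter; the variable names are chosen here for illustration:

```ruby
a = ["HellO", "hello", "World", "worlD"]

# Number of strings containing at least one capital letter
strings_with_capitals = a.count { |s| s.match?(/[A-Z]/) }

# Total number of capital letters across all strings;
# String#count accepts a character range such as "A-Z"
total_capitals = a.sum { |s| s.count("A-Z") }

p strings_with_capitals  # => 3
p total_capitals         # => 4
```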

Regex to find a number in a string in Ruby

'xyz_1_yx'
'xyz-1-yx'
I have tried to find the correct regular expression to extract the 1:
def find_number(image_basename)
  startIndex = 1
  endIndex = 10
  (startIndex..endIndex).each do |n|
    return n if /\b-|_#{n}-|_\b/.match(image_basename)
  end
  nil
end
2.5.0 :376 > find_number('xyz_1_yx')
=> nil
b
=> "asdas-12-asd"
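b.scan(/([A-Za-z]*)([_-])(\d*)([_-])([A-Za-z]*)/)
=> [["asdas", "-", "12", "-", "asd"]]
UPDATE
Here it is with non-capturing groups. Note that [A-Za-z] is used rather than [A-z], because [A-z] also matches "[", "\", "]", "^", "_" and "`":
b.scan(/(?:[A-Za-z]*)(?:[_-])(\d*)(?:[_-])(?:[A-Za-z]*)/)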
=> [["12"]]
str = 'xyz_1_y1x'
If you want the first '1',
r = /1/
str[r]
#=> "1"
which uses the method String#[]. Alternatively:
str.each_char.find { |c| c == '1' }
#=> "1"
If you want all '1''s, use String#scan:
str.scan r
#=> ["1", "1"]
Note that str[r] only tells us whether the string contains at least one '1', and str.scan(r) tells us nothing more than how many '1's the string contains.
If you want to extract any digit, change the regular expression: r = /\d/.
As I understand your question, you want the characters between - or _. If that's the case, you can simply split on those two characters (the character class must not contain a comma, or commas would become separators too):
str = 'xyz_1_yx'
str.split(/[-_]/)[1]
#=> '1'
str = 'xyz-129-yx'
str.split(/[-_]/)[1]
#=> '129'
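Another option is to capture the digits between the separators directly with String#[] and a capture group. A sketch, assuming the number is always delimited by "-" or "_" on both sides; extract_number is an illustrative name:

```ruby
# Hypothetical helper: returns the integer enclosed by "-" or "_", or nil if none is found.
def extract_number(image_basename)
  # str[regex, 1] returns the first capture group, or nil when there is no match
  digits = image_basename[/[-_](\d+)[-_]/, 1]
  digits&.to_i
end

p extract_number('xyz_1_yx')    # => 1
p extract_number('xyz-129-yx')  # => 129
p extract_number('xyz')         # => nil
```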

Check whether a string contains all the characters of another string in Ruby

Let's say I have a string, like string= "aasmflathesorcerersnstonedksaottersapldrrysaahf". If you haven't noticed, you can find the phrase "harry potter and the sorcerers stone" in there (minus the space).
I need to check whether string contains all the characters of another string.
string.include?("sorcerer") #=> true
string.include?("harrypotterandtheasorcerersstone") #=> false, even though it contains all the letters to spell harrypotterandthesorcerersstone
include? does not work on a shuffled string.
How can I check if a string contains all the elements of another string?
Sets and array intersection don't account for repeated chars, but a histogram / frequency counter does:
require 'facets'
s1 = "aasmflathesorcerersnstonedksaottersapldrrysaahf"
s2 = "harrypotterandtheasorcerersstone"
freq1 = s1.chars.frequency
freq2 = s2.chars.frequency
freq2.all? { |char2, count2| freq1[char2] >= count2 }
#=> true
Write your own Array#frequency if you don't want the facets dependency:
class Array
  def frequency
    Hash.new(0).tap { |counts| each { |v| counts[v] += 1 } }
  end
end
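On Ruby 2.7 or later, Enumerable#tally builds the same kind of frequency hash without an extra gem or a monkey patch. A minimal sketch under that assumption; contains_all_chars? is a name chosen here for illustration:

```ruby
# Hypothetical helper: true if `haystack` has at least as many of each character as `needle`.
def contains_all_chars?(haystack, needle)
  available = haystack.chars.tally  # character => number of occurrences
  available.default = 0             # characters absent from haystack count as 0
  needle.chars.tally.all? { |char, count| available[char] >= count }
end

s1 = "aasmflathesorcerersnstonedksaottersapldrrysaahf"
p contains_all_chars?(s1, "harrypotterandthesorcerersstone")  # => true
p contains_all_chars?(s1, "zebra")                            # => false
```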
I presume that if the string to be checked is "sorcerer", string must include, for example, three "r"s. If so, you could use the method Array#difference, which I've proposed be added to the Ruby core. (Ruby 2.6 and later already define an Array#difference that behaves like Array#-; the definition below replaces it.)
class Array
  def difference(other)
    h = other.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
    reject { |e| h[e] > 0 && h[e] -= 1 }
  end
end
str = "aasmflathesorcerersnstonedksaottersapldrrysaahf"
target = "sorcerer"
target.chars.difference(str.chars).empty?
#=> true
target = "harrypotterandtheasorcerersstone"
target.chars.difference(str.chars).empty?
#=> true
If the characters of target must not only be in str, but must be in the same order, we could write:
target = "sorcerer"
r = Regexp.new "#{ target.chars.join "\.*" }"
#=> /s.*o.*r.*c.*e.*r.*e.*r/
str =~ r
#=> 2 (truthy)
(or !!(str =~ r) #=> true)
target = "harrypotterandtheasorcerersstone"
r = Regexp.new "#{ target.chars.join "\.*" }"
#=> /h.*a.*r.*r.*y.* ... o.*n.*e/
str =~ r
#=> nil
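If target can contain characters that are special in a regex, escaping each one keeps the pattern literal. A sketch of the same in-order check, with in_order? as a made-up name:

```ruby
# Hypothetical helper: true if the characters of `target` appear in `str` in the same order.
def in_order?(str, target)
  # Escape every character, then allow anything in between with ".*"
  pattern = target.chars.map { |c| Regexp.escape(c) }.join(".*")
  str.match?(Regexp.new(pattern))
end

str = "aasmflathesorcerersnstonedksaottersapldrrysaahf"
p in_order?(str, "sorcerer")     # => true
p in_order?(str, "harrypotter")  # => false
```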
A different albeit not necessarily better solution using sorted character arrays and sub-strings:
Given your two strings...
subject = "aasmflathesorcerersnstonedksaottersapldrrysaahf"
search = "harrypotterandthesorcerersstone"
You can sort your subject string using .chars.sort.join...
subject = subject.chars.sort.join # => "aaaaaaacddeeeeeffhhkllmnnoooprrrrrrssssssstttty"
And then produce a list of substrings to search for:
search = search.chars.group_by(&:itself).values.map(&:join)
# => ["hh", "aa", "rrrrrr", "y", "p", "ooo", "tttt", "eeeee", "nn", "d", "sss", "c"]
You could alternatively produce the same set of substrings using this method
search = search.chars.sort.join.scan(/((.)\2*)/).map(&:first)
And then simply check whether every search sub-string appears within the sorted subject string:
search.all? { |c| subject[c] }
Create a 2 dimensional array out of your string letter bank, to associate the count of letters to each letter.
Create a 2 dimensional array out of the harry potter string in the same way.
Loop through both and do comparisons.
I have no experience in Ruby but this is how I would start to tackle it in the language I know most, which is Java.

Compare string against array and extract array elements present in ruby

I have the following string:
str = "This is a string"
What I want to do is compare it with this array:
a = ["this", "is", "something"]
The result should be an array with "this" and "is" because both are present in the array and in the given string. "something" is not present in the string so it shouldn't appear. How can I do this?
One way to do this:
str = "This is a string"
a = ["this","is","something"]
str.downcase.split & a
# => ["this", "is"]
I am assuming the elements of a are always lowercase.
There are always many ways to do this sort of thing:
str = "this is the example string"
words_to_compare = ["dogs", "ducks", "seagulls", "the"]
words_to_compare.select{|word| word =~ Regexp.union(str.split) }
#=> ["the"]
Your question has an XY problem smell to it. Usually when we want to find what words exist the next thing we want to know is how many times they exist. Frequency counts are all over the internet and Stack Overflow. This is a minor modification to such a thing:
str = "This is a string"
a = ["this", "is", "something"]
a_hash = a.each_with_object({}) { |i, h| h[i] = 0 } # => {"this"=>0, "is"=>0, "something"=>0}
That defined a_hash with the keys being the words to be counted.
str.downcase.split.each{ |k| a_hash[k] += 1 if a_hash.key?(k) }
a_hash # => {"this"=>1, "is"=>1, "something"=>0}
a_hash now contains the counts of the word occurrences. if a_hash.key?(k) is the main difference we'd see compared to a regular word-count as it's only allowing word-counts to occur for the words in a.
a_hash.keys.select{ |k| a_hash[k] > 0 } # => ["this", "is"]
It's easy to find the words that were in common because the counter is > 0.
This is a very common problem in text processing so it's good knowing how it works and how to bend it to your will.
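On Ruby 2.7 or later, the counting step can also be written with Enumerable#tally. A sketch under that assumption, using the same str and a as above; counts and common are illustrative names:

```ruby
str = "This is a string"
a = ["this", "is", "something"]

# Count every lowercased word in the string
counts = str.downcase.split.tally  # => {"this"=>1, "is"=>1, "a"=>1, "string"=>1}

# Keep only the words from `a` that occur at least once
common = a.select { |word| counts.key?(word) }

p common  # => ["this", "is"]
```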

Regexp to match repeated substring

I would like to verify a string containing repeated substrings. Each substring has a particular structure, and so does the whole string (substrings separated by "|"). For instance, the string can be:
1=23.00|6=22.12|12=21.34|112=20.34
1=23.00|6=22.12|12=21.34
1=23.00|12=21.34
1=23.00**
How can I check that all repeated substrings match a regexp? I tried to check it with:
"1=23.00|6=22.12|12=21.34".match(/([1-9][0-9]*[=][0-9\.]+)+/)
But checking gives true even when several substrings do not match the regexp:
"1=23.00|6=ass|=21.34".match(/([1-9][0-9]*[=][0-9\.]+)+/)
# => #<MatchData "1=23.00" 1:"1=23.00">
The question is whether every repeated substring matches a regex. I understand that the substrings are separated by the character | or $/, the latter being the end of a line. We first need to obtain the repeated substrings:
a = str.split(/[#{$/}\|]/)
       .map(&:strip)
       .group_by {|s| s}
       .select {|_,v| v.size > 1 }
       .keys
Next we specify whatever regex you wish to use. I am assuming it is this:
REGEX = /[1-9][0-9]*=[1-9]+\.[0-9]+/
but it could be altered if you have other requirements.
As we wish to determine if all repeated substrings match the regex, that is simply:
a.all? {|s| s =~ REGEX}
Here are the calculations:
str =<<_
1=23.00|6=22.12|12=21.34|112=20.34
1=23.00|6=22.12|12=21.34
1=23.00|12=21.34
1=23.00**
_
c = str.split(/[#{$/}\|]/)
#=> ["1=23.00", "6=22.12", "12=21.34", "112=20.34", "1=23.00",
# "6=22.12", "12=21.34", "1=23.00", "12=21.34", "1=23.00**"]
d = c.map(&:strip)
# same as c, possibly not needed or not wanted
e = d.group_by {|s| s}
# => {"1=23.00" =>["1=23.00", "1=23.00", "1=23.00"],
# "6=22.12" =>["6=22.12", "6=22.12"],
# "12=21.34" =>["12=21.34", "12=21.34", "12=21.34"],
# "112=20.34"=>["112=20.34"], "1=23.00**"=>["1=23.00**"]}
f = e.select {|_,v| v.size > 1 }
#=> {"1=23.00"=>["1=23.00", "1=23.00" , "1=23.00"],
# "6=22.12"=>["6=22.12", "6=22.12"],
# "12=21.34"=>["12=21.34", "12=21.34", "12=21.34"]}
a = f.keys
#=> ["1=23.00", "6=22.12", "12=21.34"]
a.all? {|s| s =~ REGEX}
#=> true
This will return true if there are any duplicates, false if there are not:
s = "1=23.00|6=22.12|12=21.34|112=20.34|3=23.00"
arr = s.split(/\|/).map { |item| item.sub(/\A\d+=/, "") }
arr != arr.uniq # => true
If you want to solve it with a regexp alone (not Ruby code), you should match the whole string, not substrings. I added the [|] symbol and anchors for the start and end of the string to your regexp, so it should work the way you want:
\A([1-9][0-9]*[=][0-9\.]+[|]*)+\z
Try it out.
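In Ruby, a stricter variant requires exactly one "|" between items and none at the end. A sketch, assuming each item is a positive integer key, "=", and a decimal value; valid_list? and item are illustrative names:

```ruby
# Hypothetical validator for strings such as "1=23.00|6=22.12".
def valid_list?(str)
  # Assumed item format: key without leading zero, "=", digits with optional fraction
  item = /[1-9][0-9]*=[0-9]+(?:\.[0-9]+)?/
  # One item followed by zero or more "|item" groups, spanning the entire string
  str.match?(/\A#{item}(?:\|#{item})*\z/)
end

p valid_list?("1=23.00|6=22.12|12=21.34")  # => true
p valid_list?("1=23.00|6=ass|=21.34")      # => false
p valid_list?("1=23.00")                   # => true
```

\A and \z anchor to the start and end of the whole string, whereas ^ and $ in Ruby anchor to individual lines.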
