Convert named matches in MatchData to Hash

I have a rather simple regexp, but I wanted to use named regular expressions to make it cleaner and then iterate over results.
Testing string:
testing_string = "111x222b333"
My regexp:
regexp = %r{
(?<width> [0-9]{3} ) {0}
(?<height> [0-9]{3} ) {0}
(?<depth> [0-9]+ ) {0}
\g<width>x\g<height>b\g<depth>
}x
dimensions = regexp.match(testing_string)
This works like a charm, but here's where the problem comes:
dimensions.each { |k, v| dimensions[k] = my_operation(v) }
# ERROR !
undefined method `each' for #<MatchData "111x222b333" width:"111" height:"222" depth:"333">.
There is no .each method in MatchData object, and I really don't want to monkey patch it.
How can I fix this problem?
I wasn't as clear as I thought: the point is to keep names and hash-like structure.

If you need a full Hash:
captures = Hash[ dimensions.names.zip( dimensions.captures ) ]
p captures
#=> {"width"=>"111", "height"=>"222", "depth"=>"333"}
If you just want to iterate over the name/value pairs:
dimensions.names.each do |name|
value = dimensions[name]
puts "%6s -> %s" % [ name, value ]
end
#=> width -> 111
#=> height -> 222
#=> depth -> 333
Alternatives:
dimensions.names.zip( dimensions.captures ).each do |name,value|
# ...
end
[ dimensions.names, dimensions.captures ].transpose.each do |name,value|
# ...
end
dimensions.names.each.with_index do |name,i|
value = dimensions.captures[i]
# ...
end
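To tie this back to the question, here is a minimal sketch that builds the Hash and applies an operation to every value in one pass. The body of my_operation is a hypothetical stand-in (the question never shows it), and the regexp is a simplified equivalent of the one in the question.

```ruby
# Hypothetical stand-in for the asker's my_operation: convert to Integer.
def my_operation(value)
  value.to_i
end

# Simplified equivalent of the question's regexp.
regexp = /(?<width>\d{3})x(?<height>\d{3})b(?<depth>\d+)/
dimensions = regexp.match("111x222b333")

# Pair each name with its capture, transform the value, and build a Hash.
converted = dimensions.names.zip(dimensions.captures).map { |name, value|
  [name, my_operation(value)]
}.to_h

p converted
```

The names stay as String keys, so converted["width"] gives the transformed value.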

So today a new Ruby version (2.4.0) was released which includes many new features, amongst them feature #11999, aka MatchData#named_captures. This means you can now do this:
h = '12'.match(/(?<a>.)(?<b>.)(?<c>.)?/).named_captures
#=> {"a"=>"1", "b"=>"2", "c"=>nil}
h.class
#=> Hash
So in your code change
dimensions = regexp.match(testing_string)
to
dimensions = regexp.match(testing_string).named_captures
And you can use the each method on your regex match result just like on any other Hash, too.
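One thing to watch for: match returns nil when the string does not match, so calling named_captures directly would raise NoMethodError. A small sketch, assuming Ruby 2.4+ (for named_captures and Hash#transform_values); the helper name dimensions_for and the to_i conversion are illustrative assumptions.

```ruby
# Simplified equivalent of the question's regexp.
REGEXP = /(?<width>\d{3})x(?<height>\d{3})b(?<depth>\d+)/

# Hypothetical helper: returns a Hash of Integer dimensions,
# or an empty Hash when the string does not match.
def dimensions_for(string)
  match = REGEXP.match(string)
  return {} unless match

  match.named_captures.transform_values { |v| v.to_i }
end

p dimensions_for("111x222b333")
p dimensions_for("no dimensions here")
```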

I'd attack the whole problem of creating the hash a bit differently:
irb(main):052:0> testing_string = "111x222b333"
"111x222b333"
irb(main):053:0> hash = Hash[%w[width height depth].zip(testing_string.scan(/\d+/))]
{
"width" => "111",
"height" => "222",
"depth" => "333"
}
While regular expressions are powerful, their siren call can be too alluring, and we get sucked into using them when there are simpler, more straightforward ways of accomplishing something. It's just something to think about.
To keep track of the number of elements scanned, per the OP's comment:
hash = Hash[%w[width height depth].zip(scan_result = testing_string.scan(/\d+/))]
=> {"width"=>"111", "height"=>"222", "depth"=>"333"}
scan_result.size
=> 3
Also hash.size will return that, as would the size of the array containing the keys, etc.
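Note that zip pads with nil when scan finds fewer values than there are keys, so checking the count before building the hash can catch malformed input. A minimal sketch; the helper name dimensions_hash and the choice to raise ArgumentError are assumptions for illustration.

```ruby
KEYS = %w[width height depth]

# Hypothetical helper: build the hash only when the number of scanned
# values matches the number of keys.
def dimensions_hash(string)
  values = string.scan(/\d+/)
  unless values.size == KEYS.size
    raise ArgumentError, "expected #{KEYS.size} numbers, got #{values.size}"
  end

  KEYS.zip(values).to_h
end

p dimensions_hash("111x222b333")

# Without the check, a short input silently yields a nil value:
p KEYS.zip("111x222".scan(/\d+/)).to_h
```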

@Phrogz's answer is correct if all of your captures have unique names, but you're allowed to give multiple captures the same name. Here's an example from the Regexp documentation.
This code supports captures with duplicate names:
captures = Hash[
dimensions.regexp.named_captures.map do |name, indexes|
[
name,
indexes.map { |i| dimensions.captures[i - 1] }
]
end
]
# Iterate over the captures
captures.each do |name, values|
# name is a String
# values is an Array of Strings
end
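As a self-contained illustration of the duplicate-name case (this regexp is made up for the sketch, it is not the one from the documentation), Regexp#named_captures maps each name to the list of group indexes that carry it:

```ruby
# Two groups share the name "digit"; one group is named "letter".
regexp = /(?<digit>\d)-(?<digit>\d)-(?<letter>[a-z])/
match = regexp.match("1-2-x")

# Name => 1-based group indexes.
indexes_by_name = regexp.named_captures

# Same technique as above: look up every index for every name.
captures = indexes_by_name.map { |name, indexes|
  [name, indexes.map { |i| match.captures[i - 1] }]
}.to_h

p indexes_by_name
p captures
```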

If you want to keep the names, you can do
new_dimensions = {}
dimensions.names.each { |k| new_dimensions[k] = my_operation(dimensions[k]) }

Related

Check whether a string contains all the characters of another string in Ruby

Let's say I have a string, like string= "aasmflathesorcerersnstonedksaottersapldrrysaahf". If you haven't noticed, you can find the phrase "harry potter and the sorcerers stone" in there (minus the space).
I need to check whether string contains all the characters of another string.
string.include? ("sorcerer") #=> true
string.include? ("harrypotterandtheasorcerersstone") #=> false, even though it contains all the letters to spell harrypotterandthesorcerersstone
include? does not work on a shuffled string.
How can I check if a string contains all the elements of another string?

Sets and array intersection don't account for repeated chars, but a histogram / frequency counter does:
require 'facets'
s1 = "aasmflathesorcerersnstonedksaottersapldrrysaahf"
s2 = "harrypotterandtheasorcerersstone"
freq1 = s1.chars.frequency
freq2 = s2.chars.frequency
freq2.all? { |char2, count2| freq1[char2] >= count2 }
#=> true
Write your own Array#frequency if you don't want the facets dependency.
class Array
def frequency
Hash.new(0).tap { |counts| each { |v| counts[v] += 1 } }
end
end
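If monkey patching Array is also undesirable, the same frequency comparison can be sketched with Enumerable#tally, assuming Ruby 2.7 or later. The method name contains_all_chars? is an assumption for illustration.

```ruby
# Returns true when haystack has at least as many of each character
# as needle requires.
def contains_all_chars?(haystack, needle)
  available = haystack.chars.tally
  needle.chars.tally.all? do |char, count|
    # fetch with a default of 0 covers characters missing from haystack.
    available.fetch(char, 0) >= count
  end
end

s1 = "aasmflathesorcerersnstonedksaottersapldrrysaahf"
s2 = "harrypotterandtheasorcerersstone"

p contains_all_chars?(s1, s2)
p contains_all_chars?("abc", "aab")
```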

I presume that if the string to be checked is "sorcerer", string must include, for example, three "r"'s. If so you could use the method Array#difference, which I've proposed be added to the Ruby core.
class Array
def difference(other)
h = other.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
reject { |e| h[e] > 0 && h[e] -= 1 }
end
end
str = "aasmflathesorcerersnstonedksaottersapldrrysaahf"
target = "sorcerer"
target.chars.difference(str.chars).empty?
#=> true
target = "harrypotterandtheasorcerersstone"
target.chars.difference(str.chars).empty?
#=> true
If the characters of target must not only be in str, but must be in the same order, we could write:
target = "sorcerer"
r = Regexp.new "#{ target.chars.join "\.*" }"
#=> /s.*o.*r.*c.*e.*r.*e.*r/
str =~ r
#=> 2 (truthy)
(or !!(str =~ r) #=> true)
target = "harrypotterandtheasorcerersstone"
r = Regexp.new "#{ target.chars.join "\.*" }"
#=> /h.*a.*r.*r.*y.* ... o.*n.*e/
str =~ r
#=> nil
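If target can contain regexp metacharacters (a literal "." or "*", say), each character should be escaped before joining. A sketch of that variation; the method name subsequence_regexp is an assumption.

```ruby
# Build a regexp that matches when the characters of target appear
# in order, with anything (or nothing) in between.
def subsequence_regexp(target)
  Regexp.new(target.chars.map { |c| Regexp.escape(c) }.join(".*"))
end

str = "aasmflathesorcerersnstonedksaottersapldrrysaahf"

p str =~ subsequence_regexp("sorcerer")

# The "." in the target is matched literally, not as a wildcard.
dotted = subsequence_regexp("a.b")
p "a.b" =~ dotted
p "axb" =~ dotted
```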

A different albeit not necessarily better solution using sorted character arrays and sub-strings:
Given your two strings...
subject = "aasmflathesorcerersnstonedksaottersapldrrysaahf"
search = "harrypotterandthesorcerersstone"
You can sort your subject string using .chars.sort.join...
subject = subject.chars.sort.join # => "aaaaaaacddeeeeeffhhkllmnnoooprrrrrrssssssstttty"
And then produce a list of substrings to search for:
search = search.chars.group_by(&:itself).values.map(&:join)
# => ["hh", "aa", "rrrrrr", "y", "p", "ooo", "tttt", "eeeee", "nn", "d", "sss", "c"]
You could alternatively produce the same set of substrings using this method
search = search.chars.sort.join.scan(/((.)\2*)/).map(&:first)
And then simply check whether every search sub-string appears within the sorted subject string:
search.all? { |c| subject[c] }
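Put together as one method, the steps above might look like the following sketch. The method name letters_available? is an assumption, and group_by(&:itself) needs Ruby 2.2 or later.

```ruby
def letters_available?(subject, search)
  # Sorting puts identical characters next to each other, e.g. "aaab...".
  sorted = subject.chars.sort.join
  # One run per distinct character in search, e.g. "rrrrrr" for six r's.
  runs = search.chars.group_by(&:itself).values.map(&:join)
  # Every run must appear as a substring of the sorted subject.
  runs.all? { |run| sorted.include?(run) }
end

subject = "aasmflathesorcerersnstonedksaottersapldrrysaahf"
search = "harrypotterandthesorcerersstone"

p letters_available?(subject, search)
p letters_available?("abc", "aab")
```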

Create a 2 dimensional array out of your string letter bank, to associate the count of letters to each letter.
Create a 2 dimensional array out of the harry potter string in the same way.
Loop through both and do comparisons.
I have no experience in Ruby but this is how I would start to tackle it in the language I know most, which is Java.
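In Ruby, the count-per-letter structure described above is more naturally a Hash than a two-dimensional array. A rough sketch of the idea; the method names letter_counts and missing_letters are assumptions, and returning the shortfall per letter (rather than just true or false) is a design choice made here to show the comparison loop.

```ruby
# Count how often each character occurs.
def letter_counts(string)
  string.chars.each_with_object(Hash.new(0)) { |char, counts| counts[char] += 1 }
end

# Returns a Hash of letters the bank is short of, with the missing amount.
# An empty Hash means the phrase can be spelled from the bank.
def missing_letters(bank, phrase)
  have = letter_counts(bank)
  shortfall = {}
  letter_counts(phrase).each do |char, needed|
    shortfall[char] = needed - have[char] if have[char] < needed
  end
  shortfall
end

bank = "aasmflathesorcerersnstonedksaottersapldrrysaahf"
p missing_letters(bank, "harrypotterandthesorcerersstone")
p missing_letters("abc", "aabd")
```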

Array of strings Group by first common letters [closed]

Is there any way of grouping the strings in an array by their common leading letters?
For example:
array = [ 'hello', 'hello you', 'people', 'finally', 'finland' ]
so when I do
array.group_by{ |string| some_logic_with_string }
the result should be:
{
'hello' => ['hello', 'hello you'],
'people' => ['people'],
'fin' => ['finally', 'finland']
}

NOTE: Some of your test cases are ambiguous and their expectations conflict with other tests; you need to fix them.
I guess plain group_by may not work; further processing is needed.
I have come up with the code below, which seems to work consistently for all the given test cases.
I have left notes in the code to explain the logic. The only way to fully understand it is to inspect the value of h and follow the flow for a simple test case.
def group_by_common_chars(array)
# We will iteratively group by as many time as there are characters
# in a largest possible key, which is max length of all strings
max_len = array.max_by {|i| i.size}.size
# First group by first character.
h = array.group_by{|i| i[0]}
# Now iterate remaining (max_len - 1) times
(1...max_len).each do |c|
# Let's perform a group by next set of starting characters.
t = h.map do |k,v|
h1 = v.group_by {|i| i[0..c]}
end.reduce(&:merge)
# We need to merge the previously generated hash
# with the hash generated in this iteration. Here things get tricky.
# If previously, we had
# {"a" => ["a"], "ab" => ["ab", "abc"]},
# and now, we have
# {"a"=>["a"], "ab"=>["ab"], "abc"=>["abc"]},
# We need to merge the two hashes such that we have
# {"a"=>["a"], "ab"=>["ab", "abc"], "abc"=>["abc"]}.
# Note that `Hash#merge`'s block is called only for common keys, so, "abc"
# will get merged, we can't do much about it now. We will process
# it later in the loop
h = h.merge(t) do |k, o, n|
if (o.size != n.size)
diff = [o,n].max - [o,n].min
if diff.size == 1 && t.value?(diff)
[o,n].max
else
[o,n].min
end
else
o
end
end
end
# Sort by key length, smallest in the beginning.
h = h.sort {|i,j| i.first.size <=> j.first.size }.to_h
# Get rid of those key-value pairs, where value is single element array
# and that single element is already part of another key-value pair, and
# that value array has more than one element. This step will allow us
# to get rid of key-value like "abc"=>["abc"] in the example discussed
# above.
h = h.tap do |h|
keys = h.keys
keys.each do |k|
v = h[k]
if (v.size == 1 &&
h.key?(v.first) &&
h.values.flatten.count(v.first) > 1) then
h.delete(k)
end
end
end
# Get rid of those keys whose value array consist of only elements that
# already part of some other key. Since, hash is ordered by key's string
# size, this process allows us to get rid of those keys which are smaller
# in length but consists of only elements that are present somewhere else
# with a key of larger length. For example, it lets us to get rid of
# "a"=>["aba", "abb", "aaa", "aab"] from a hash like
# {"a"=>["aba", "abb", "aaa", "aab"], "ab"=>["aba", "abb"], "aa"=>["aaa", "aab"]}
h.tap do |h|
keys = h.keys
keys.each do |k|
values = h[k]
other_values = h.values_at(*(h.keys-[k])).flatten
already_present = values.all? do |v|
other_values.include?(v)
end
h.delete(k) if already_present
end
end
end
Sample Run:
p group_by_common_chars ['hello', 'hello you', 'people', 'finally', 'finland']
#=> {"fin"=>["finally", "finland"], "hello"=>["hello", "hello you"], "people"=>["people"]}
p group_by_common_chars ['a', 'ab', 'abc']
#=> {"a"=>["a"], "ab"=>["ab", "abc"]}
p group_by_common_chars ['aba', 'abb', 'aaa', 'aab']
#=> {"ab"=>["aba", "abb"], "aa"=>["aaa", "aab"]}
p group_by_common_chars ["Why", "haven't", "you", "answered", "the", "above", "questions?", "Please", "do", "so."]
#=> {"a"=>["answered", "above"], "do"=>["do"], "Why"=>["Why"], "you"=>["you"], "so."=>["so."], "the"=>["the"], "Please"=>["Please"], "haven't"=>["haven't"], "questions?"=>["questions?"]}

I'm not sure whether you can group by all common letters, but if you only want to group by the first letter, here it is:
array = [ 'hello', 'hello you', 'people', 'finally', 'finland' ]
result = {}
array.each { |st| result[st[0]] = result.fetch(st[0], []) + [st] }
pp result
{"h"=>["hello", "hello you"], "p"=>["people"], "f"=>["finally", "finland"]}
Now result contains your desired hash.
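The same first-letter grouping can also be written with Enumerable#group_by, which the question already hints at. A short sketch:

```ruby
array = ['hello', 'hello you', 'people', 'finally', 'finland']

# The block's return value becomes the key for each element.
result = array.group_by { |st| st[0] }

p result
```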

Hmm, you're trying to do something that's pretty custom. I can think of two classical approaches that sort of do what you want: 1) Stemming and 2) Levenshtein Distance.
With stemming you're finding the root word to a longer word. Here's a gem for it.
Levenshtein is a famous algorithm which calculates the difference between two strings. There is a gem for it that runs pretty fast due to a native C extension.
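For a sense of what Levenshtein distance computes, here is a minimal pure-Ruby sketch of the standard dynamic-programming formulation. It is not the gem's API and will be much slower than a native extension; the method name levenshtein is an assumption.

```ruby
# Number of single-character insertions, deletions, or substitutions
# needed to turn a into b.
def levenshtein(a, b)
  # previous[j] holds the distance between the processed prefix of a
  # and the first j characters of b.
  previous = (0..b.length).to_a

  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [
        previous[j] + 1,        # deletion
        current[j - 1] + 1,     # insertion
        previous[j - 1] + cost  # substitution (or match)
      ].min
    end
    previous = current
  end

  previous.last
end

p levenshtein("kitten", "sitting")
```

A small distance between two strings suggests they are near-duplicates, which is how it could feed into a grouping heuristic.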

Compare string against array and extract array elements present in ruby

I have the following string:
str = "This is a string"
What I want to do is compare it with this array:
a = ["this", "is", "something"]
The result should be an array with "this" and "is" because both are present in the array and in the given string. "something" is not present in the string so it shouldn't appear. How can I do this?

One way to do this:
str = "This is a string"
a = ["this","is","something"]
str.downcase.split & a
# => ["this", "is"]
I am assuming the elements of array a are always lowercase.

There are always many ways to do this sort of thing:
str = "this is the example string"
words_to_compare = ["dogs", "ducks", "seagulls", "the"]
words_to_compare.select{|word| word =~ Regexp.union(str.split) }
#=> ["the"]
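One caveat with this approach: Regexp.union produces an unanchored pattern, so a candidate that merely contains one of the words also matches. A sketch comparing it with an anchored variant; the candidate list is made up for illustration.

```ruby
str = "this is the example string"

# Unanchored: matches any candidate containing "this", "is", "the", ...
loose = Regexp.union(str.split)
# Anchored: the candidate has to equal one of the words exactly.
strict = /\A#{loose}\z/

candidates = ["dogs", "thesis", "the"]

loose_hits  = candidates.select { |word| word =~ loose }
strict_hits = candidates.select { |word| word =~ strict }

p loose_hits
p strict_hits
```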

Your question has an XY problem smell to it. Usually when we want to find what words exist the next thing we want to know is how many times they exist. Frequency counts are all over the internet and Stack Overflow. This is a minor modification to such a thing:
str = "This is a string"
a = ["this", "is", "something"]
a_hash = a.each_with_object({}) { |i, h| h[i] = 0 } # => {"this"=>0, "is"=>0, "something"=>0}
That defined a_hash with the keys being the words to be counted.
str.downcase.split.each{ |k| a_hash[k] += 1 if a_hash.key?(k) }
a_hash # => {"this"=>1, "is"=>1, "something"=>0}
a_hash now contains the counts of the word occurrences. if a_hash.key?(k) is the main difference we'd see compared to a regular word-count as it's only allowing word-counts to occur for the words in a.
a_hash.keys.select{ |k| a_hash[k] > 0 } # => ["this", "is"]
It's easy to find the words that were in common because the counter is > 0.
This is a very common problem in text processing so it's good knowing how it works and how to bend it to your will.

How to make this hash creation prettier

I was wondering if there is a more elegant way of writing the following lines:
section_event_hash = []
sections.each do |s|
section_event_hash << { s => s.find_all_events }
end
I want to create a hash whose keys are the elements of sections, and the values are arrays of elements returned by the find_all_events method.

If you want section_event_hash to really be a Hash rather than an Array, then you could use each_with_object:
section_event_hash = sections.each_with_object({}) { |s, h| h[s] = s.find_all_events }
You could use map to build an array of arrays and then feed that to Hash[]:
section_event_hash = Hash[sections.map { |s| [s, s.find_all_events] }]
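On Ruby 2.6 or later, to_h also accepts a block, which removes the intermediate map. A sketch, using a hypothetical Section struct as a stand-in for the asker's objects:

```ruby
# Hypothetical stand-in for the asker's section objects.
Section = Struct.new(:name, :events) do
  def find_all_events
    events
  end
end

sections = [Section.new("intro", ["welcome"]), Section.new("outro", [])]

# The block returns a [key, value] pair for each element.
section_event_hash = sections.to_h { |s| [s, s.find_all_events] }

p section_event_hash
```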

The code you posted isn't quite doing what you said you want. Let's take a closer look at it by testing like so:
sections = ["ab", "12"]
section_event_hash = []
sections.each do |s|
section_event_hash << { s => s.split("") }
end
puts section_event_hash.inspect
Gives:
[{"ab"=>["a", "b"]}, {"12"=>["1", "2"]}]
So you've actually created an array of hashes, where each hash contains one key-value pair.
The following code produces one hash with multiple elements. Notice how an empty hash is created with {} instead of []. Curly braces create a hash literal, whereas [] on its own creates an empty array; square brackets after the hash (as in section_event_hash[s]) refer to a particular key.
section_event_hash = {}
sections.each do |s|
section_event_hash[s] = s.split("")
end
puts section_event_hash.inspect
=> {"ab"=>["a", "b"], "12"=>["1", "2"]}
As for a "more elegant" way of doing it, well, that depends on your definition. As the other answers here demonstrate, there is usually more than one way to do something in Ruby. seph's produces the same data structure as your original code, while mu's produces the hash you describe. Personally, I'd just aim for code that is easy to read, understand, and maintain.

array_of_section_event_hashes = sections.map do |s|
{s => s.find_all_events}
end

RoR different bracket notation

I'm getting to grips with rails and whilst I feel I am progressing there is one thing that I am struggling to get to grips with and it's very basic. I am trying to understand the different usage of [] {} and () Are there any good sources of their usage and are there any tips you can give to a beginner in recognizing when to use one or the other, or as I seem to see in some cases when they are not required at all?
I know this is extremely basic but I have struggled to find literature which explains concisely the interplay between them and Ruby or specifically RoR

It has nothing to do with RoR; the various brackets are Ruby language constructs.
[] is the array operator, for arrays and other classes that implement it (like a string taking a range to get substrings, or hashes to look up a key's value):
a = [1, 2, 3]
a.each { |n| puts n }
s = "ohai"
puts s[1..-1]
h = { foo: "bar", baz: "plugh" }
puts h[:foo]
{} is for hashes, and one of two ways of delimiting blocks (the other being do/end). (And used with # for string interpolation.)
h = { foo: "bar", baz: "plugh" }
h.each { |k, v| puts "#{k} == #{v}" }
() is for method parameters, or for enforcing evaluation order in an expression.
> puts 5 * 3 + 5 # Normal precedence has * ahead of +
=> 20
> puts 5 * (3 + 5) # Force 3+5 to be evaluated first
=> 40
def foo(s)
puts(s)
end
They're sometimes optional if the statement has no ambiguity:
def foo s
puts s
end
(They're not always optional, and putting a space between the method call and its parenthetical parameter list can cause issues--best not to, IMO.)
(I probably missed something, too, but there's the nutshell.)
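To illustrate the caveat about a space before the parenthesis, here is a small sketch. The method inc is hypothetical; the point is that with a space Ruby treats the parentheses as part of the argument expression rather than as the argument list (and may print a warning about it).

```ruby
def inc(n)
  n + 1
end

# No space: (1 + 2) is the argument list, then the result is multiplied.
without_space = inc(1 + 2) * 3

# Space: the whole expression (1 + 2) * 3 becomes the argument.
with_space = inc (1 + 2) * 3

p without_space
p with_space
```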

[] are used to access objects within a hash (via a key) or within an array (via an index).
hash[:key] # returns a value
array[0] # returns the first array element
[] is used to describe an array.
array = ['a', 'b', 'c']
Of course this can be nested.
nested = [['a','b','c'], [1,2,3]]
[] can be used to declare a hash, but that's because the Hash class can accept an array.
hash = Hash[[['a',1], ['b',2]]] # { 'a' => 1, 'b' => 2 }
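Hash[] is picky about the shape of its arguments, so a quick sketch of the forms it accepts may help. Note that passing the pairs as two separate arguments, rather than as one array of pairs, treats the first pair as a key and the second as its value.

```ruby
pairs = [['a', 1], ['b', 2]]

from_pairs = Hash[pairs]             # one argument: an array of [key, value] pairs
from_flat  = Hash['a', 1, 'b', 2]    # an even number of alternating keys and values
from_to_h  = pairs.to_h              # Array#to_h does the same as Hash[pairs]

# Two separate array arguments are read as a single key and value.
surprising = Hash[['a', 1], ['b', 2]]

p from_pairs
p surprising
```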
{} is used to declare a hash.
hash = { 'a' => 1, 'b' => 2 }
This too can be nested.
hash = { 'a' => { 'c' => 3 }, 'b' => { 'd' => 4 } }
{} is also used to delimit blocks. The .each method is a common one. The following two blocks of code are equivalent.
array.each do |n|
puts n
end
array.each { |n| puts n }
() is used for grouping in cases where ambiguity needs to be resolved. This is especially true of methods that take many arguments, some of which may be nil, some of which may be objects, etc. You'll see a lot of code that omits them entirely when no grouping is needed for clarity.
puts(string)
puts string
I recommend firing up the Rails console and declaring some variables and accessing them.
