Ruby regex selecting multiple words at the same time - ruby

I have a hash that I am using regex on to select what key/value pairs I want. Here is the method I have written:
def extract_gender_race_totals(gender, race)
totals = @data.select {|k,v| k.to_s.match(/(#{gender})(#{race})/)}
temp = 0
totals.each {|key, value| temp += value}
temp
end
the hash looks like this:
@data = {
:number_of_african_male_senior_managers=>2,
:number_of_coloured_male_senior_managers=>0,
:number_of_indian_male_senior_managers=>0,
:number_of_white_male_senior_managers=>0,
:number_of_african_female_senior_managers=>0,
:number_of_coloured_female_senior_managers=>0,
:number_of_indian_female_senior_managers=>0,
:number_of_white_female_senior_managers=>0,
:number_of_african_male_middle_managers=>2,
:number_of_coloured_male_middle_managers=>0,
:number_of_indian_male_middle_managers=>0,
:number_of_white_male_middle_managers=>0,
:number_of_african_female_middle_managers=>0,
:number_of_coloured_female_middle_managers=>0,
:number_of_indian_female_middle_managers=>0,
:number_of_white_female_middle_managers=>0,
:number_of_african_male_junior_managers=>0,
:number_of_coloured_male_junior_managers=>0,
:number_of_indian_male_junior_managers=>0,
:number_of_white_male_junior_managers=>0,
:number_of_african_female_junior_managers=>0,
:number_of_coloured_female_junior_managers=>0,
:number_of_indian_female_junior_managers=>0,
:number_of_white_female_junior_managers=>0
}
but it's re-populated with data after a SQL Query.
I would like to make it so that the key must contain both the race and the gender in order for it to return something. Otherwise it must return 0. Is this right or is the regex syntax off?
It's returning 0 for all, which it shouldn't.
So the example would be
%td.total_cell= @ee_demographics_presenter.extract_gender_race_totals("male","african")
This should return 4, since there are 4 African male managers.

Try something like this.
def extract_gender_race_totals(gender, race)
@data.select{|k, v| k.to_s.match(/#{race}_#{gender}/)}.values.reduce(0, :+)
end
extract_gender_race_totals("male", "african")
# => 4
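For a version that can be run on its own, here is a minimal sketch. The class name DemographicsPresenter and the three-key sample hash are assumptions made for illustration; only the race_gender pattern comes from the answer. It uses sum with a block (Ruby 2.4+), which returns 0 when no key matches, as the question asks.

```ruby
# Hypothetical wrapper class; not part of the original code.
class DemographicsPresenter
  def initialize(data)
    @data = data
  end

  def extract_gender_race_totals(gender, race)
    pattern = /#{race}_#{gender}/
    # Add the value of every key that matches; non-matching keys add 0.
    @data.sum { |key, value| key.to_s.match?(pattern) ? value : 0 }
  end
end

presenter = DemographicsPresenter.new(
  number_of_african_male_senior_managers: 2,
  number_of_african_male_middle_managers: 2,
  number_of_white_female_junior_managers: 0
)

african_male  = presenter.extract_gender_race_totals("male", "african")
indian_female = presenter.extract_gender_race_totals("female", "indian")
puts african_male   # 4
puts indian_female  # 0
```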

gmalete's answer gives an elegant solution, but here is an explanation of why your regexp isn't quite right. If you corrected the regexp, I think your approach would work; it just isn't as idiomatic Ruby.
/(#{gender})(#{race})/ won't match number_of_african_male_senior_managers for 2 reasons:
1) the race comes before the gender in the hash key and 2) there is an underscore in the hash key that needs to be in the regexp. e.g.
/(#{race})_(#{gender})/
would work, but the parentheses aren't needed so this can be simplified to
/#{race}_#{gender}/
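The difference between the two patterns can be checked directly. This is a small sketch using one key from the question; Regexp.escape is an added precaution that is not in the original code, and the variable names are only illustrative.

```ruby
key = :number_of_african_male_senior_managers.to_s
gender = "male"
race = "african"

# Original pattern: gender immediately followed by race, no underscore.
original = /(#{gender})(#{race})/
# Corrected pattern: race, underscore, gender. Regexp.escape is an
# assumption added here in case the arguments contain regexp metacharacters.
corrected = /#{Regexp.escape(race)}_#{Regexp.escape(gender)}/

original_matches  = !key.match(original).nil?
corrected_matches = !key.match(corrected).nil?
# "male" is a substring of "female", but "african_male" is not a
# substring of "african_female", so the female key is not selected.
female_matches = !"number_of_african_female_senior_managers".match(corrected).nil?

puts original_matches   # false
puts corrected_matches  # true
puts female_matches     # false
```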

Rather than having specific methods to query pieces of your keys (i.e. "gender_race"), you could make a general method to query any attribute in any order:
def extract_totals(*keywords)
keywords.inject(@data) { |memo, keyword| memo.select { |k, v| k.to_s =~ /_#{keyword}(?:_|\b)/ } }.values.reduce(0, :+)
end
Usage:
extract_totals("senior")
extract_totals("male", "african")
extract_totals("managers") # maybe you'll have _employees later...
# etc.
Not exactly what you asked for, but maybe it will help.

Related

Ruby Nokogiri parsing omit duplicates

I'm parsing XML files and want to keep duplicate values from being added to my Array. As it stands, the XML looks like this:
<vulnerable-software-list>
<product>cpe:/a:octopus:octopus_deploy:3.0.0</product>
<product>cpe:/a:octopus:octopus_deploy:3.0.1</product>
<product>cpe:/a:octopus:octopus_deploy:3.0.2</product>
<product>cpe:/a:octopus:octopus_deploy:3.0.3</product>
<product>cpe:/a:octopus:octopus_deploy:3.0.4</product>
<product>cpe:/a:octopus:octopus_deploy:3.0.5</product>
<product>cpe:/a:octopus:octopus_deploy:3.0.6</product>
</vulnerable-software-list>
document.xpath("//entry[
number(substring(translate(last-modified-datetime,'-.T:',''), 1, 12)) > #{last_imported_at} and
cvss/base_metrics/access-vector = 'NETWORK'
]").each do |entry|
product = entry.xpath('vulnerable-software-list/product').map { |product| product.content.split(':')[-2] }
effected_versions = entry.xpath('vulnerable-software-list/product').map { |product| product.content.split(':').last }
puts product
end
However, because of the XML input, that parses quite a few duplicates, so I end up with an array like ['Redhat','Redhat','Redhat','Fedora']
I already have the effected_versions taken care of, since those values don't duplicate.
Is there a method of .map to only add unique values?
If you need an array of unique values, just call the uniq method on the result:
product =
entry.xpath('vulnerable-software-list/product').map do |product|
product.content.split(':')[-2]
end.uniq
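To try this without Nokogiri, here is a minimal sketch in which plain strings stand in for the content of each product node. The cpes array is sample data copied from the XML in the question.

```ruby
# Sample CPE strings standing in for the <product> node contents.
cpes = [
  "cpe:/a:octopus:octopus_deploy:3.0.0",
  "cpe:/a:octopus:octopus_deploy:3.0.1",
  "cpe:/a:octopus:octopus_deploy:3.0.2"
]

# The second-to-last segment is the product name, the last is the version.
products = cpes.map { |cpe| cpe.split(":")[-2] }.uniq
versions = cpes.map { |cpe| cpe.split(":").last }

p products  # ["octopus_deploy"]
p versions  # ["3.0.0", "3.0.1", "3.0.2"]
```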
There are many ways to do this:
input = ['Redhat','Redhat','Redhat','Fedora']
# approach 1
# self explanatory
result = input.uniq
# approach 2
# iterate through vals, and build a hash with the vals as keys
# since hashes cannot have duplicate keys, it provides a 'unique' check
result = input.each_with_object({}) { |val, memo| memo[val] = true }.keys
# approach 3
# Similar to the previous, we iterate through vals and add them to a Set.
# Adding a duplicate value to a set has no effect, and we can convert it to array
require 'set'
result = input.each_with_object(Set.new) { |val, memo| memo.add(val) }.to_a
If you're not familiar with each_with_object, it's very similar to reduce.
Regarding performance, you can find some info if you search for it, for example What is the fastest way to make a uniq array?
From a quick test, I see these performing in increasing time. uniq is 5 times faster than each_with_object, which is 25% slower than the Set.new approach. Probably because uniq is implemented in C. I only tested with one arbitrary input though, so it might not be true for all cases.
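If you want to measure this on your own data, something like the following sketch could be a starting point. The timed helper and the repeated input are assumptions made for illustration, and the numbers will vary by Ruby version and input, so no timings are claimed here.

```ruby
require 'set'

# Hypothetical input: a few distinct values repeated many times.
input = %w[Redhat Redhat Redhat Fedora] * 10_000

# Hypothetical helper: runs the block once and returns [result, seconds].
def timed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - start]
end

by_uniq, t_uniq = timed { input.uniq }
by_hash, t_hash = timed { input.each_with_object({}) { |val, memo| memo[val] = true }.keys }
by_set,  t_set  = timed { input.each_with_object(Set.new) { |val, memo| memo.add(val) }.to_a }

puts format("uniq: %.5fs, hash: %.5fs, set: %.5fs", t_uniq, t_hash, t_set)
```

All three approaches should produce the same array, so the only difference worth measuring is the time taken.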

Find key of value within array in hash

I have a hash categories as following:
categories = {"horeca" => ["bar", "waiter", "kitchen"],
"retail" => ["eerste", "tweede"]}
I want to find the key if the value is included in the array of values.
Something like following
categories.key("bar")
which would return "horeca"
As of now, I can only get "horeca" if I do
categories.key(["bar", "waiter", "kitchen"])
Try Enumerable#find:
categories.find { |key, values|
values.include?("bar")
}.first
As Máté mentioned, you can use find if you want to find the first matching element. Use select if you want all matching elements. To just get the keys you would do:
categories.select { |key, values| values.include?("bar") }.map(&:first)
See https://ruby-doc.org/core-2.2.3/Enumerable.html#method-i-select
Creating an intermediate array and then calling first on it is unnecessary. If the hash is large and you only want the first matching key, the following solution is better:
categories.each{ |k,v| break k if v.include?('bar') }
#=> "horeca"
Md. Farhan Memon's solution is generally the preferable solution but it has one downside: If there's no match in the collection, it returns the collection itself – which probably isn't a desirable result. You can fix this with a simple adjustment that combines both detect/find and break:
categories.detect { |key, values| break key if values.include?('bar') }
This breaks and returns the key if it finds a match and otherwise returns nil (which I assume to be the preferable behavior).
If your collection may also contain nil values and/or non-arrays, you can improve it further:
categories.detect { |key, values| break key if Array(values).include?('bar') }
The only downside of this general approach is that it's not particularly intuitive to newcomers: You have to know a bit more than just basic Ruby to understand what's going on without running the code first.
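The difference between these approaches shows up when nothing matches. The sketch below uses the hash from the question; the search value "nope" and the variable names are made up for illustration, and the safe navigation operator &. (Ruby 2.3+) is an addition to the find answer.

```ruby
categories = {
  "horeca" => ["bar", "waiter", "kitchen"],
  "retail" => ["eerste", "tweede"]
}

# find returns the [key, values] pair, or nil when nothing matches,
# so &. avoids calling first on nil.
found   = categories.find { |_key, values| values.include?("bar") }&.first
missing = categories.find { |_key, values| values.include?("nope") }&.first

# each + break returns the hash itself when nothing matches.
each_miss = categories.each { |key, values| break key if values.include?("nope") }

# detect + break returns nil when nothing matches.
detect_miss = categories.detect { |key, values| break key if values.include?("nope") }

p found        # "horeca"
p missing      # nil
p each_miss.equal?(categories)  # true
p detect_miss  # nil
```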

In Ruby, group_by where I know there's only 1 element per group

I have a CSV file where one column is a primary key. When I do this:
CSV.read(ARGV[0], headers: true).group_by {|r| r['myKey']}
I get a hash table from key to a list of rows, where the list is always length 1.
Is there a version of group_by which asserts that there's only a single value per key, and creates a hash from key to that single value?
Failing that, is there something like .first which asserts that there's exactly one element in the array/enumerable? I like my scripts to fail when my assumptions are wrong, rather than silently return the wrong thing.
If you use Rails you can use index_by method.
If you know the values r['myKey'] are unique, there's no point in using group_by. As I understand the question, you could do this:
rows = CSV.read(ARGV[0], headers: true)
Hash[rows.map { |r| r['myKey'] }.zip(rows)]
In Ruby 2.0+ the second row could be written:
rows.map { |r| r['myKey'] }.zip(rows).to_h
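On Ruby 2.6 and later, to_h also accepts a block, which avoids the intermediate zip. In this sketch plain hashes stand in for CSV rows (both respond to []), and the sample data is made up. Note that, like the zip version, it silently keeps the last row if a key repeats.

```ruby
# Plain hashes stand in for CSV::Row objects here.
rows = [
  { "myKey" => "a", "name" => "first" },
  { "myKey" => "b", "name" => "second" }
]

# Enumerable#to_h with a block (Ruby 2.6+) builds the hash in one pass.
by_key = rows.to_h { |r| [r["myKey"], r] }

p by_key.keys         # ["a", "b"]
p by_key["b"]["name"] # "second"
```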
No. I don't believe there is. But you can solve your problem with each_with_object like so:
CSV.
read(ARGV[0], headers: true).
each_with_object({}) do |r, hash|
key = r['myKey']
value = r
hash[key] = value
end
It's a shame Ruby doesn't have this. Here's what I decided to go on, based on Humza's answer:
module Enumerable
def group_by_uniq
each_with_object({}) do |value, hash|
key = yield value
raise "Multiple values for key \"#{key}\"!" if hash.key?(key)
hash[key] = value
end
end
end
If you use the code from your first example, you can run this to check that every array in the hash has length 1:
raise 'multiple entries per key!' if my_hash.values.any? { |val| val.size != 1 }
If you can get the keys into an array, you can check that they do not include duplicates with:
raise 'multiple entries per key!' unless my_keys.uniq.size == my_keys.size
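If it helps to know which keys are duplicated, the check and the indexing can be combined without reopening Enumerable. This is only a sketch: index_by_unique is a hypothetical helper name, the rows are made-up sample data, and transform_values needs Ruby 2.4+.

```ruby
rows = [
  { "myKey" => "a", "name" => "first" },
  { "myKey" => "b", "name" => "second" },
  { "myKey" => "a", "name" => "third" }
]

# Hypothetical helper: index rows by key, raising if any key repeats.
def index_by_unique(rows, &key_of)
  grouped = rows.group_by(&key_of)
  duplicates = grouped.select { |_key, group| group.size > 1 }.keys
  unless duplicates.empty?
    raise ArgumentError, "multiple entries for keys: #{duplicates.inspect}"
  end
  # Every group has exactly one element at this point.
  grouped.transform_values(&:first)
end

error_message = nil
begin
  index_by_unique(rows) { |r| r["myKey"] }
rescue ArgumentError => e
  error_message = e.message
end

unique = index_by_unique(rows.first(2)) { |r| r["myKey"] }

puts error_message          # multiple entries for keys: ["a"]
puts unique["a"]["name"]    # first
```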

Ruby Hash destructive vs. non-destructive method

Could not find a previous post that answers my question...I'm learning how to use destructive vs. non-destructive methods in Ruby. I found an answer to the exercise I'm working on (destructively adding a number to hash values), but I want to be clear on why some earlier solutions of mine did not work. Here's the answer that works:
def modify_a_hash(the_hash, number_to_add_to_each_value)
the_hash.each { |k, v| the_hash[k] = v + number_to_add_to_each_value}
end
These two solutions come back as non-destructive. Since they all use "each", I cannot figure out why. Is it the equals sign above that makes a method destructive?
def modify_a_hash(the_hash, number_to_add_to_each_value)
the_hash.each_value { |v| v + number_to_add_to_each_value}
end
def modify_a_hash(the_hash, number_to_add_to_each_value)
the_hash.each { |k, v| v + number_to_add_to_each_value}
end
The terms "destructive" and "non-destructive" are a bit misleading here. Better is to use the conventional "in-place modification" vs. "returns a copy" terminology.
Generally methods that modify in-place have ! at the end of their name to serve as a warning, like gsub! for String. Some methods that pre-date this convention do not have them, like push for Array.
The = performs an assignment within the loop. Your other examples don't actually do anything useful since each returns the original object being iterated over regardless of any results produced.
If you wanted to return a copy you'd do this:
def modify_a_hash(the_hash, number_to_add)
Hash[
the_hash.collect do |k, v|
[ k, v + number_to_add ]
end
]
end
That would return a copy. The inner operation collect transforms key-value pairs into new key-value pairs with the adjustment applied. No = is required since there's no assignment.
The outer method Hash[] transforms those key-value pairs into a proper Hash object. This is then returned and is independent of the original.
Generally a non-destructive or "return a copy" method needs to create a new, independent version of the thing it's manipulating for the purpose of storing the results. This applies to String, Array, Hash, or any other class or container you might be working with.
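On Ruby 2.4 and later, Hash has a built-in pair of methods that illustrates the same distinction: transform_values returns a copy, and transform_values! modifies in place. The hash and variable names below are made up for illustration.

```ruby
scores = { a: 1, b: 2 }

# each_value returns the receiver; the block's results are discarded.
returned = scores.each_value { |v| v + 10 }
same_object = returned.equal?(scores)

# transform_values builds a new hash and leaves the receiver untouched.
copy = scores.transform_values { |v| v + 10 }
untouched = scores == { a: 1, b: 2 }

# transform_values! replaces the values in the receiver itself.
scores.transform_values! { |v| v + 10 }

p same_object  # true
p untouched    # true
p copy         # {:a=>11, :b=>12} (shown as {a: 11, b: 12} on Ruby 3.4+)
p scores == copy  # true
```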
Maybe this slightly different example will be helpful.
We have a hash:
2.0.0-p481 :014 > hash
=> {1=>"ann", 2=>"mary", 3=>"silvia"}
Then we iterate over it and change all the letters to uppercase:
2.0.0-p481 :015 > hash.each { |key, value| value.upcase! }
=> {1=>"ANN", 2=>"MARY", 3=>"SILVIA"}
The original hash has changed because we used the upcase! method, which modifies each string in place.
Compare this to the method without the ! sign, which doesn't modify the hash values:
2.0.0-p481 :017 > hash.each { |key, value| value.downcase }
=> {1=>"ANN", 2=>"MARY", 3=>"SILVIA"}

parsing in ruby

I have this Hash:
cookie = {"fbs_138415639544444"=>["\"access_token=138415639544444|5c682220fa7ebccafd97ec58-503523340|9HHx3z7GzOBPdk444wtt&expires=0
&secret=64aa8b3327eafbfd22ba070b&session_key=5c682220fa7dsfdsafas3523340
&sig=4a494b851ff43d3a58dfa8757b702dfe&uid=503523340\""],
"_play_session"=>["fdasdfasdf"]}
I need to get the substring from right after access_token= to right before &expires. The problem is that the number in the key fbs_138415639544444 changes every time, just the part fbs_ remains constant.
Any idea how to only get:
"138415639544444|5c682220fa7ebccafd97ec58-503523340|9HHx3z7GzOBPdk444wtt"
This is a common task when decoding parameters and queries in HTML URLs. Here's a little method to break down the parameters into a hash. From there it's easy to get the value you want:
def get_params_hash(params)
Hash[ *params.split('&').map{ |q| q.split('=') }.flatten ]
end
p get_params_hash(cookie['fbs_138415639544444'].first)['"access_token']
# >> "138415639544444|5c682220fa7ebccafd97ec58-503523340|9HHx3z7GzOBPdk444wtt"
In Ruby 1.9+, hashes retain their insertion order, so if the hash always has the value you want as its first entry, you can use
cookie.keys.first #=> "fbs_138415639544444"
otherwise use:
cookie.keys.select{ |k| k[/^fbs_/] }.first #=> "fbs_138415639544444"
I never code in ruby, but this sounds like a typical task for split function.
you just need to split this
"\"access_token=138415639544444|5c682220fa7ebccafd97ec58-503523340|9HHx3z7GzOBPdk444wtt&expires=0
&secret=64aa8b3327eafbfd22ba070b&session_key=5c682220fa7dsfdsafas3523340
&sig=4a494b851ff43d3a58dfa8757b702dfe&uid=503523340\""
by & symbol. The first element of result array will be:
"\"access_token=138415639544444|5c682220fa7ebccafd97ec58-503523340|9HHx3z7GzOBPdk444wtt"
Then split that by =, and the second element of the resulting array should be:
138415639544444|5c682220fa7ebccafd97ec58-503523340|9HHx3z7GzOBPdk444wtt
If you only need the access_token part, then a regex is probably easiest.
cookie["fbs_138415639544444"][0] =~ /access_token\=([-\w\d\|]*)&/
access_key = $1
Here the access token is in the first capture group and you can get it with $1.
A better option if you'll need other parts of the string (say the session_key) would probably be to use a couple of splits and parse the string into its own hash.
Edit: Just realized you need the key too.
key = cookie.each_key.find { |k| k.start_with? "fbs_" }
Then you can use key to get the value.
Since the key changes, the first step is to get the right key:
key = cookie.keys.select {|k| k =~ /^fbs_/}.first
This matches them if they begin with the text "fbs_". The first match is returned.
Next you can get the other value by a few (ugly) splits:
cookie[key].first.split('=')[1].split('&').first
Using a regex might be a bit cleaner, but it depends on what the valid characters are in that string.
Regexes are brittle, so I wouldn't use them here. In the end you are parsing query string params, so use the CGI lib:
> require 'cgi'
=> true
> cookie = {"fbs_138415639544444"=>["\"access_token=138415639544444|5c682220fa7ebccafd97ec58-503523340|9HHx3z7GzOBPdk444wtt&expires=0&secret=64aa8b3327eafbfd22ba070b&session_key=5c682220fa7dsfdsafas3523340&sig=4a494b851ff43d3a58dfa8757b702dfe&uid=503523340\""], "_play_session"=>["fdasdfasdf"]}
> CGI.parse(cookie.select {|k,v| k =~ /^fbs_/}.first[1][0])["\"access_token"][0]
=> "138415639544444|5c682220fa7ebccafd97ec58-503523340|9HHx3z7GzOBPdk444wtt"
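The same idea can be sketched with URI.decode_www_form from the standard uri library, swapped in here as an alternative to CGI.parse. Stripping the embedded double quotes first (an extra step not in the answer above) means the key is plain "access_token". The cookie is the sample from the question, written on one line.

```ruby
require 'uri'

cookie = {
  "fbs_138415639544444" => ["\"access_token=138415639544444|5c682220fa7ebccafd97ec58-503523340|9HHx3z7GzOBPdk444wtt&expires=0&secret=64aa8b3327eafbfd22ba070b&session_key=5c682220fa7dsfdsafas3523340&sig=4a494b851ff43d3a58dfa8757b702dfe&uid=503523340\""],
  "_play_session" => ["fdasdfasdf"]
}

# The numeric suffix changes, so look the key up by its constant prefix.
key = cookie.keys.find { |k| k.start_with?("fbs_") }

# Remove the literal double quotes wrapped around the stored value.
raw = cookie[key].first.delete('"')

# decode_www_form returns [name, value] pairs; to_h turns them into a hash.
params = URI.decode_www_form(raw).to_h
access_token = params["access_token"]

puts access_token
# 138415639544444|5c682220fa7ebccafd97ec58-503523340|9HHx3z7GzOBPdk444wtt
```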
This is how I solved the problem:
access_token_key = cookies.keys.find{|item| item.start_with?('fbs_') }
token = cookies[access_token_key].first
access_token = token.split("&").find{|item| item.include?('access_token') }
fb_access_token = access_token.split("=").find{|item| !item.include?('access_token') }
