Sum of same values in ruby hash - ruby

Hello everybody.
I have a hash, for example:
{-2=>"a", -1=>"c", 1=>"a", 3=>"a", 49=>"a", -43=>"ab", 5=>"ab"}
Values can repeat. My task is to sum the keys that share the same value. Expected result:
{51=>"a", -1=>"c", -38=>"ab"}
How can I do this?
hash.group_by{|key,val| val}
gives an awful result.

hash = {-2=>"a", -1=>"c", 1=>"a", 3=>"a", 49=>"a", -43=>"ab", 5=>"ab"}
hash.reduce({}) do |memo, (k,v)|
memo[v] ||= 0
memo[v] += k
memo
end.invert
# => {51=>"a", -1=>"c", -38=>"ab"}
reduce - lets you build up a new value by iterating over the values of a collection, in this case hash. See the docs for more.
invert - swaps the keys and values of a hash. See the docs for more.
Other ways to do this:
hash.reduce(Hash.new(0)) { |memo, (k,v)| memo[v] += k; memo }.invert

h = {-2=>"a", -1=>"c", 1=>"a", 3=>"a", 49=>"a", -43=>"ab", 5=>"ab"}
then
h.group_by(&:last).each_with_object({}) { |(k,v),h| h[v.map(&:first).sum] = k }
#=> {51=>"a", -1=>"c", -38=>"ab"}
but that would be crazy as it relies on the sums being unique. (Recall that hashes have unique keys.) Suppose
h = {-54=>"a", -1=>"c", 1=>"a", 3=>"a", 49=>"a", -43=>"ab", 5=>"ab"}
then
h.group_by(&:last).each_with_object({}) { |(k,v),h| h[v.map(&:first).sum] = k }
#=> {-1=>"c", -38=>"ab"}
as -1=>"a" is overwritten by -1=>"c". I doubt that this is wanted.
It would be better to save the contents of h in an array:
a = [[-2, "a"], [-1, "c"], [-1, "a"], [49, "a"], [-43, "ab"], [5, "ab"]]
(as it permits duplicate values of the integers--here -1) and then compute
a.group_by(&:last).each_with_object({}) { |(e,ar),h| h[e] = ar.map(&:first).sum }
#=> {"a"=>46, "c"=>-1, "ab"=>-38}
Note that (for the original value of h)
h.group_by(&:last)
#=> {"a"=>[[-2, "a"], [1, "a"], [3, "a"], [49, "a"]],
# "c"=>[[-1, "c"]], "ab"=>[[-43, "ab"], [5, "ab"]]}
and v.map(&:first).sum could be replaced with
v.reduce(0) { |t,(n,_)| t+n }
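The collision problem described above can be sketched as follows. This is a minimal illustration, not code from either answer: sum_keys_by_value is a hypothetical helper name, and it keeps the result keyed by value so that equal sums cannot overwrite each other.

```ruby
# Hypothetical helper (name is an assumption): sum the integer keys
# that share a value, keeping the result keyed by the value.
def sum_keys_by_value(pairs)
  pairs.each_with_object(Hash.new(0)) { |(k, v), sums| sums[v] += k }
end

h = {-2=>"a", -1=>"c", 1=>"a", 3=>"a", 49=>"a", -43=>"ab", 5=>"ab"}
sums = sum_keys_by_value(h)
# "a" sums to 51, "c" to -1, "ab" to -38

colliding = {-54=>"a", -1=>"c", 1=>"a", 3=>"a", 49=>"a", -43=>"ab", 5=>"ab"}
colliding_sums = sum_keys_by_value(colliding)
# Both "a" and "c" sum to -1; the value-keyed hash still holds both.

inverted = colliding_sums.invert
# Inverting makes the sums the keys, so one of the two -1 entries is lost.
```

Calling invert only at the very end, and only when the sums are known to be unique, keeps the lossy step explicit.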

Related

Can someone explain how inject(Hash.new(0)) { |total, bigram| total[bigram] += 1; total }.sort_by { |_key, value| value }.reverse.to_h works?

alphabet = ["A","B","C","D","E","F","G","H","I","J",
"K","L","M","N","O","P","Q","R","S","T",
"U","V","W","X","Y","Z"," ",".",",",";",
"-","'"
]
file = File.read("vt_00.txt")
i = 0
while i < alphabet.count do
single_char = alphabet[i]
single_char_count = file.count(single_char)
print "#{alphabet[i]} = #{single_char_count} "
j = 0
while j < alphabet.count do
two_chars = alphabet[i] + alphabet[j]
two_chars_count = file.scan(two_chars).count
if two_chars_count > 10 && two_chars_count < 15
print "#{two_chars} = #{two_chars_count} "
end
k = 0
while k < alphabet.count do
three_chars = alphabet[i] + alphabet[j] + alphabet[k]
three_chars_count = file.scan(three_chars).count
if three_chars_count > 10 && three_chars_count < 15
print "#{three_chars} = #{three_chars_count} "
end
k += 1
end
j += 1
end
i += 1
end
I had code like the above. Then I found a solution that uses each_cons. Can you explain how it works?
I don't understand the .inject part.
count = string.each_cons(1).inject(Hash.new(0)) { |total, bigram| total[bigram] += 1; total }.sort_by { |_key, value| value }.reverse.to_h
A more elaborate way to write it would be:
total = Hash.new(0)
string.each_cons(1).each{|bigram| total[bigram] += 1}
inject lets you inject a start value (here Hash.new(0); the default of 0 means we can safely use the += operator), and whatever the block returns is injected into the next iteration. So in this case we have to explicitly return the hash (total) to be able to manipulate it in the next step.
A simple example is adding all values of an array:
[1,4,5,23,2,66,123].inject(0){|sum, value| sum + value}
We start with 0; the first iteration computes 0 + 1, and that result is then injected into the next iteration.
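To make the hand-off between iterations visible, here is a small sketch that records what the block receives on each step. The steps array is only there for illustration and is not part of the original code.

```ruby
steps = []

total = [1, 4, 5].inject(0) do |sum, value|
  steps << [sum, value]  # record the accumulator and element the block receives
  sum + value            # the block's return value becomes the next sum
end

# steps now holds the pairs [0, 1], [1, 4], [5, 5]; total is 10
```

If the block returned something other than the running sum (for example nil from puts), that value would be passed in as sum on the next iteration.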
Note: in your original code, instead of using while loops and maintaining counters, you could more easily iterate over the arrays as follows:
alphabet.each do |single_char|
single_char_count = file.count(single_char)
print "#{single_char} = #{single_char_count} "
alphabet.each do |second_char|
two_chars = single_char + second_char
# do something with two_chars
alphabet.each do |third_char|
three_chars = single_char + second_char + third_char
# do something with three_chars
end
end
end
I am guessing it depends on the size of the file whether iterating over all each_cons (1-2-3) or using file.scan will be more efficient.
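As a rough sketch of the each_cons approach, assuming the text is first turned into a character enumerator with each_char (String itself has no each_cons), n-grams could be counted in a single pass. count_ngrams is a hypothetical helper name.

```ruby
# Hypothetical helper: count every run of n consecutive characters.
def count_ngrams(text, n)
  text.each_char.each_cons(n).each_with_object(Hash.new(0)) do |chars, counts|
    counts[chars.join] += 1  # chars is an array of n single-character strings
  end
end

bigrams = count_ngrams("abab", 2)
# windows are "ab", "ba", "ab"

# each_cons windows overlap, while String#scan finds non-overlapping matches,
# so the two approaches can disagree on repeated characters.
overlapping     = count_ngrams("aaa", 2)["aa"]
non_overlapping = "aaa".scan("aa").count
```

This visits the text once per n rather than once per candidate n-gram, which is likely to matter more as the alphabet or the file grows.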
The question
You wish to know how the following works:
g = Hash.new(0)
count = str.each_char.inject(g) do |h, s|
h[s] += 1
h
end.sort_by { |_key, value| value }.reverse.to_h
str.each_cons(1) does not work because the class String, of which str is an instance, does not have an instance method each_cons. There is a method Enumerable#each_cons, but the class String does not include that module, so strings do not respond to that method:
String.included_modules
#=> [Comparable, Kernel]
String#each_char does make sense here, as it returns an enumerator that generates each character of the string. I therefore assume that each_char was meant where each_cons(1) was written.
I have changed the variable names to something more generic, and have moved
g = Hash.new(0)
to a separate line.
An example
Suppose str is as follows:
str = "The Cat and the Hat"
Examine steps performed
Let's break the calculation into pieces:
g = Hash.new(0)
#=> {}
h = str.each_char.inject(g) do |h,s|
h[s] += 1
h
end
#=> {"T"=>1, "h"=>2, "e"=>2, " "=>4, "C"=>1,
# "a"=>3, "t"=>3, "n"=>1, "d"=>1, "H"=>1}
a = h.sort_by { |_key, value| value }
#=> [["T", 1], ["C", 1], ["n", 1], ["d", 1], ["H", 1],
# ["h", 2], ["e", 2], ["a", 3], ["t", 3], [" ", 4]]
b = a.reverse
#=> [[" ", 4], ["t", 3], ["a", 3], ["e", 2], ["h", 2],
# ["H", 1], ["d", 1], ["n", 1], ["C", 1], ["T", 1]]
count = b.to_h
#=> {" "=>4, "t"=>3, "a"=>3, "e"=>2, "h"=>2,
# "H"=>1, "d"=>1, "n"=>1, "C"=>1, "T"=>1}
The calculations of a, b and count are straightforward, so let's consider them first.
Calculation of a
Like all Enumerable methods, Enumerable#sort_by requires that its receiver responds to the method each. Here sort_by's receiver is a hash so h must respond to Hash#each. Indeed, sort_by's first operation is to convert h to an enumerator by sending it the method Hash#each:
enum = h.each
#=> #<Enumerator: {"T"=>1, "h"=>2, "e"=>2, " "=>4, "C"=>1, "a"=>3,
# "t"=>3, "n"=>1, "d"=>1, "H"=>1}:each>
We can see the values that are generated by this enumerator by repeatedly sending it the method Enumerator#next:
enum.next #=> ["T", 1]
enum.next #=> ["h", 2]
enum.next #=> ["e", 2]
...
enum.next #=> ["H", 1]
enum.next #=> StopIteration (iteration reached an end)
It is seen that enum generates a sequence of the hash's key-value pairs. Therefore,
h.sort_by { |_key, value| value }
is equivalent to
[["T", 1], ["h", 2], ["e", 2],..., ["H", 1]].sort_by { |_key, value| value }
which explains why a equals the array shown above.
Calculation of b
This calculation could not be more straightforward. Note that we could save a step by replacing b = h.sort_by { |_key, value| value }.reverse with
b = h.sort_by { |_key, value| -value }
#=> [[" ", 4], ["a", 3], ["t", 3], ["h", 2], ["e", 2],
# ["T", 1], ["C", 1], ["n", 1], ["d", 1], ["H", 1]]
This sorts the key-value pairs of h in decreasing order of value, as before, though ties are ordered somewhat differently.
Calculation of count
This is a straightforward application of the method Array#to_h.
Calculation of h
The first step in this calculation is to use the method Hash::new to create an empty hash with a default value of zero:
h = Hash.new(0)
#=> {}
This simply causes h[k] to return the default value of zero when h does not have a key k. For example, since h now has no keys:
h['cat']
#=> 0
If we now set
h['cat'] = 3
then
h['cat']
#=> 3
as the default value no longer applies. A hash h created this way is often called a counting hash. Ruby's first step in parsing the expression h[s] += 1 is to expand it to:
h[s] = h[s] + 1
If h does not have a key s the expression reduces to
h[s] = 0 + 1
because h[s] on the right of the equals sign (the method Hash#[], as contrasted with the method Hash#[]= on the left) returns the default value of zero. If the string were "aaa", the following calculations would be made:
h['a'] = h['a'] + 1 => 0 + 1 => 1
h['a'] = h['a'] + 1 => 1 + 1 => 2
h['a'] = h['a'] + 1 => 2 + 1 => 3
h['a'] on the right returns the default value of zero in the first step. After that step h has the key 'a', so in the second and third steps the current value of h['a'] is returned instead.
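A short sketch of the default-value behaviour described above: reading a missing key returns the default without storing anything, and only the assignment half of += adds the key. The variable names are illustrative only.

```ruby
counts = Hash.new(0)

missing_read      = counts['cat']        # default value, nothing is stored
stored_after_read = counts.key?('cat')   # the read did not create the key

counts['cat'] += 1                       # expands to counts['cat'] = counts['cat'] + 1
stored_after_increment = counts.key?('cat')
```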
Enumerable#inject (a.k.a. reduce) can be used here, but the calculation of h is more commonly written as follows:
h = str.each_char.each_with_object(Hash.new(0)) { |s,h| h[s] += 1 }
#=> {"T"=>1, "h"=>2, "e"=>2, " "=>4, "C"=>1,
# "a"=>3, "t"=>3, "n"=>1, "d"=>1, "H"=>1}
See Enumerable#each_with_object.
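For comparison, here is a sketch that runs both forms side by side on the example string. The only difference is that the inject block must return the hash itself, while each_with_object always passes the same object along and takes its block arguments in the opposite order.

```ruby
str = "The Cat and the Hat"

# inject: the block's last expression must be the hash.
with_inject = str.each_char.inject(Hash.new(0)) { |h, s| h[s] += 1; h }

# each_with_object: the memo comes second and is carried along automatically.
with_object = str.each_char.each_with_object(Hash.new(0)) { |s, h| h[s] += 1 }

# Sort by descending count and turn the pairs back into a hash.
sorted = with_object.sort_by { |_char, n| -n }.to_h
```

On Ruby 2.7 or later, str.each_char.tally should give the same counts without an explicit accumulator.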

Create a hash out of an array where the values are the indices of the elements

I have an array and I want to create a hash whose keys are the elements of the array and whose values are (an array of) the indices of the array. I want to get something like:
array = [1,3,4,5]
... # => {1=>0, 3=>1, 4=>2, 5=>3}
array = [1,3,4,5,6,6,6]
... # => {1=>0, 3=>1, 4=>2, 5=>3, 6=>[4,5,6]}
This code:
hash = Hash.new 0
array.each_with_index do |x, y|
hash[x] = y
end
works fine only if I don't have duplicate elements. When I have duplicate elements, it does not.
Any idea on how I can get something like this?
You can change the logic to special-case the situation when the key already exists, turning it into an array and pushing the new index:
arr = %i{a a b a c}
result = arr.each.with_object({}).with_index do |(elem, memo), idx|
memo[elem] = memo.key?(elem) ? [*memo[elem], idx] : idx
end
puts result
# => {:a=>[0, 1, 3], :b=>2, :c=>4}
It's worth mentioning, though, that whatever you're trying to do here could possibly be accomplished in a different way ... we have no context. In general, it's a good idea to keep key-val data types uniform, e.g. the fact that values here can be numbers or arrays is a bit of a code smell.
Also note that it doesn't make sense to use Hash.new(0) here unless you're intentionally setting a default value (which there's no reason to do). Use {} instead.
I'm adding my two cents:
array = [1,3,4,5,6,6,6,8,8,8,9,7,7,7]
hash = {}
array.map.with_index {|val, idx| [val, idx]}.group_by(&:first).map do |k, v|
hash[k] = v[0][1] if v.size == 1
hash[k] = v.map(&:last) if v.size > 1
end
p hash #=> {1=>0, 3=>1, 4=>2, 5=>3, 6=>[4, 5, 6], 8=>[7, 8, 9], 9=>10, 7=>[11, 12, 13]}
Duplicated elements that are not adjacent are grouped as well, since group_by collects elements by value regardless of position.
This is the expanded version, step by step, to show how it works.
The basic idea is to build a temporary array with pairs of value and index, then work on it.
array = [1,3,4,5,6,6,6]
tmp_array = []
array.each_with_index do |val, idx|
tmp_array << [val, idx]
end
p tmp_array #=> [[1, 0], [3, 1], [4, 2], [5, 3], [6, 4], [6, 5], [6, 6]]
tmp_hash = tmp_array.group_by { |e| e[0] }
p tmp_hash #=> {1=>[[1, 0]], 3=>[[3, 1]], 4=>[[4, 2]], 5=>[[5, 3]], 6=>[[6, 4], [6, 5], [6, 6]]}
hash = {}
tmp_hash.map do |k, v|
hash[k] = v[0][1] if v.size == 1
hash[k] = v.map {|e| e[1]} if v.size > 1
end
p hash #=> {1=>0, 3=>1, 4=>2, 5=>3, 6=>[4, 5, 6]}
It can be written as one line as:
hash = {}
array.map.with_index.group_by(&:first).map { |k, v| v.size == 1 ? hash[k] = v[0][1] : hash[k] = v.map(&:last) }
p hash
If you are prepared to accept
{ 1=>[0], 3=>[1], 4=>[2], 5=>[3], 6=>[4,5,6] }
as the return value you may write the following.
array.each_with_index.group_by(&:first).transform_values { |v| v.map(&:last) }
#=> {1=>[0], 3=>[1], 4=>[2], 5=>[3], 6=>[4, 5, 6]}
The first step in this calculation is the following.
array.each_with_index.group_by(&:first)
#=> {1=>[[1, 0]], 3=>[[3, 1]], 4=>[[4, 2]], 5=>[[5, 3]], 6=>[[6, 4], [6, 5], [6, 6]]}
This may help readers to follow the subsequent calculations.
I think you will find this return value generally more convenient to use than the one given in the question.
Here are a couple of examples where it's clearly preferable for all values to be arrays. Let:
h_orig = { 1=>0, 3=>1, 4=>2, 5=>3, 6=>[4,5,6] }
h_mod = { 1=>[0], 3=>[1], 4=>[2], 5=>[3], 6=>[4,5,6] }
Create a hash h whose keys are unique elements of array and whose values are the numbers of times the key appears in the array
h_mod.transform_values(&:count)
#=> {1=>1, 3=>1, 4=>1, 5=>1, 6=>3}
h_orig.transform_values { |v| v.is_a?(Array) ? v.count : 1 }
Create a hash h whose keys are unique elements of array and whose values equal the index of the first instance of the element in the array.
h_mod.transform_values(&:min)
#=> {1=>0, 3=>1, 4=>2, 5=>3, 6=>4}
h_orig.transform_values { |v| v.is_a?(Array) ? v.min : v }
In these examples, given h_orig, we could alternatively convert values that are indices to arrays containing a single index.
h_orig.transform_values { |v| [*v].count }
h_orig.transform_values { |v| [*v].min }
This is hardly proof that it is generally more convenient for all values to be arrays, but that has been my experience and the experience of many others.
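One way to combine the suggestions above, sketched under the assumption that uniform array values are built first and collapsed only if the mixed shape from the question is really required. Both helper names are hypothetical.

```ruby
# Hypothetical helper: map each element to the list of all its indices.
def index_lists(array)
  array.each_with_index.with_object(Hash.new { |h, k| h[k] = [] }) do |(elem, idx), acc|
    acc[elem] << idx
  end
end

# Hypothetical helper: turn one-element lists into bare indices,
# reproducing the mixed shape asked for in the question.
def collapse_singletons(index_hash)
  index_hash.transform_values { |idxs| idxs.size == 1 ? idxs.first : idxs }
end

uniform = index_lists([1, 3, 4, 5, 6, 6, 6])
mixed   = collapse_singletons(uniform)
```

Keeping the collapse as a separate last step means all intermediate work can rely on every value being an array.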

Reduce hash with key, value and index as block parameters

h = { "a" => 1, "b" => 2 }
Is there a way to reduce a hash and have the key, value and index as block parameters?
As a starting point I can iterate over a hash getting key, value and index:
h.each_with_index { |(k,v), i| puts [k,v,i].inspect }
# => ["a", 1, 0]
# => ["b", 2, 1]
However, when I add reduce I seem to lose the ability to have the key and value as separate values; instead they are provided as a two-element array:
h.each_with_index.reduce([]) { |memo, (kv,i)| puts [kv,i].inspect }
# => [["a", 1], 0]
# => [["b", 2], 1]
This is okay, I can in the block do kv[0] and kv[1], but I'd like something like this:
h.each_with_index.reduce([]) { |memo, (k,v), i| puts [k,v,i].inspect }
I'd like to do this without monkey-patching.
Maybe something like this?:
h.each_with_index.reduce([]) { |memo, ((k,v), i)| puts [k,v,i].inspect }
#=> ["a", 1, 0]
#=> ["b", 2, 1]
#=> nil
All you need is scoping: ((k,v), i).
Keep in mind that with reduce the block must return the memo object at the end of every iteration. That is a bit of extra overhead unless the last operation in the block is one on memo that returns memo itself. Otherwise reduce won't return the desired result.
Same thing can be achieved with each_with_index chained with with_object like so:
h.each_with_index.with_object([]) { |((k,v), i), memo| puts [k,v,i].inspect }
#=> ["a", 1, 0]
#=> ["b", 2, 1]
#=> []
See the array on the last line of the output? That's our memo object, which is returned automatically, unlike with the reduce call we used above.
When in doubt what the block arguments are, create an instance of an Enumerator and call #next on it:
▶ h = {a: 1, b: 2}
#⇒ {:a=>1, :b=>2}
▶ enum = h.each.with_index.with_object([])
#⇒ #<Enumerator: ...>
▶ enum.next
#⇒ [[[:a, 1], 0], []]
The returned value consists of:
array of key and value, joined into:
array with an index, joined into:
array with an accumulator (for reduce it'd go in front, if reduce returned an enumerator when called without a block; credits to @Stefan for nitpicking.)
Hence, the proper parentheses for decomposing it would be:
# ⇓ ⇓ ⇓ ⇓
# [ [ [:a, 1], 0 ], [] ]
{ | ( (k, v), idx ), memo| ...
Enumerable#each_with_index yields two values into the block: the item and its index. When it is invoked for a Hash, the item is an array that contains two elements: the key and the associated value.
When you declare the block arguments |(k,v), i| you, in fact, deconstruct the first block argument (the item) into its two components: the key and the value. Without a block h.each_with_index produces an Enumerator that yields both arguments of the previously used block wrapped into an array.
This array is the second argument of Enumerator#reduce.
You can tell this by running:
irb> h.each_with_index.reduce([]) { |memo, j| p j }
[["a", 1], 0]
[["b", 2], 1]
Now, the answer to your question is easy: just deconstruct j and you get:
irb> h.each_with_index.reduce([]) { |memo, ((k,v), i)| puts [k,v,i].inspect }
["a", 1, 0]
["b", 2, 1]
Of course, you should use memo << [k,v,i], or put the values in memo according to some other rule, and return memo to get your final desired result.
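Putting the pieces together, a minimal sketch that actually accumulates into memo might look like this. The nested parentheses mirror the nesting of the yielded arrays, and Array#<< returns the array itself, so the reduce block returns memo without an extra line.

```ruby
h = { "a" => 1, "b" => 2 }

# reduce: memo first, then the [[key, value], index] element.
rows_reduce = h.each_with_index.reduce([]) do |memo, ((k, v), i)|
  memo << [k, v, i]   # << returns memo, which feeds the next iteration
end

# with_object: element first, memo last; the block's return value is ignored.
rows_object = h.each_with_index.with_object([]) do |((k, v), i), memo|
  memo << [k, v, i]
end
```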

Inverting a hash value (that's an array) into new individual keys

I have the following:
lumpy_hash = { 1 => ["A", "B"] }
then if I invoke Hash#invert on this hash, I'd like to get:
lumpy_hash = {"A" => 1, "B" => 1}
I don't get that from using Hash#invert. Any ideas on doing this? I'm not sure if I should try Hash#map or Hash#invert.
There are many ways to do this. Here is one:
Hash[lumpy_hash.map { |k,v| v.product([k]) }.first]
#=> {"A"=>1, "B"=>1}
I don't think the method Hash#invert is useful here.
The steps:
enum = lumpy_hash.map
#=> #<Enumerator: {1=>["A", "B"]}:map>
k,v = enum.next
#=> [1, ["A", "B"]]
k #=> 1
v #=> ["A", "B"]
a = v.product([k])
#=> ["A", "B"].product([1])
#=> [["A", 1], ["B", 1]]
Hash[a]
#=> {"A"=>1, "B"=>1}
Here's another way that makes use of a hash's default value. This one is rather interesting:
key,value = lumpy_hash.to_a.first
#=> [1, ["A","B"]]
Hash.new { |h,k| h[k]=key }.tap { |h| h.values_at(*value) }
#=> {"A"=>1,"B"=>1}
Object#tap passes the (initially empty) hash to its block, assigning it to the block variable h. tap then returns h after the block has added two key-value pairs, each with the value assigned by the hash's default block. The pairs are added merely by looking up keys the hash doesn't have!
Here's another, more pedestrian, method:
lumpy_hash.flat_map{|k,vs| vs.map{|v| {v => k}}}.reduce(&:merge)
#=> {"A"=>1, "B"=>1}
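The examples above use a hash with a single pair. As a sketch for hashes with several pairs (invert_lists is a hypothetical name, and later keys are assumed to win when a value appears in more than one list), the same idea can be written with each_with_object:

```ruby
# Hypothetical helper: turn { key => [v1, v2, ...] } into { v1 => key, v2 => key, ... }.
def invert_lists(hash)
  hash.each_with_object({}) do |(key, values), inverted|
    values.each { |value| inverted[value] = key }
  end
end

single   = invert_lists({ 1 => ["A", "B"] })
multiple = invert_lists({ 1 => ["A", "B"], 2 => ["C"] })
```

Note that the Hash[...] version earlier only looks at the first pair because of the call to first.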

How can I sort by word frequency and then sort alphabetically within each frequency in Ruby?

wordfrequency = Hash.new(0)
splitfed.each { |word| wordfrequency[word] += 1 }
wordfrequency = wordfrequency.sort_by {|x,y| y }
wordfrequency.reverse!
puts wordfrequency
I have added the words into a hash table and have gotten it to sort by word frequency, but then order within each frequency is random when I want it to be in alphabetical order. Any quick fixes? Thanks! Much appreciated.
You can use:
wordfrequency = wordfrequency.sort_by{|x,y| [y, x] }
to sort by the value then the key.
In your case,
splitfed = ["bye", "hi", "hi", "a", "a", "there", "alphabet"]
wordfrequency = Hash.new(0)
splitfed.each { |word| wordfrequency[word] += 1 }
wordfrequency = wordfrequency.sort_by{|x,y| [y, x] }
wordfrequency.reverse!
puts wordfrequency.inspect
will output:
[["hi", 2], ["a", 2], ["there", 1], ["bye", 1], ["alphabet", 1]]
which is reverse ordered by the occurrence of the word then the word itself.
Make sure you note (which might be pretty obvious) that wordfrequency is now an array.
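If the goal is highest frequency first but alphabetical order within each frequency, reversing the whole array also reverses the alphabetical part. A sketch of one alternative, using the same sample data, is to negate the count inside the sort key and skip the reverse:

```ruby
splitfed = ["bye", "hi", "hi", "a", "a", "there", "alphabet"]

wordfrequency = Hash.new(0)
splitfed.each { |word| wordfrequency[word] += 1 }

# Negating the count sorts frequencies in descending order,
# while the word itself still sorts in ascending (alphabetical) order.
sorted = wordfrequency.sort_by { |word, count| [-count, word] }
# "a" and "hi" (2 each) come first, then "alphabet", "bye", "there"
```

As with the version above, sorted is an array of [word, count] pairs; calling to_h on it gives a hash in that order.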
Hashes do not necessarily sort in natural order; it is down to the individual data structure. If you want to pretty print a hash, you need to sort the keys, then iterate over that sorted list of keys, outputting the value for each key as you go.
There are tricks you can do to do this on a single line, or collect the entries from the hash into a sorted array of arrays, but ultimately they all come back to sorting the keys then retrieving the data for the sorted key list.
Some hashes maintain insertion order, some hashes maintain a sorted structure which you can then traverse as you process the hash, but these are exceptions to the rule.
Ruby's group_by is the basis for this:
words = %w[foo bar bar baz]
words.group_by{ |w| w }
# => {"foo"=>["foo"], "bar"=>["bar", "bar"], "baz"=>["baz"]}
words.group_by{ |w| w }.map{ |k, v| [k, v.size ] }
# => [["foo", 1], ["bar", 2], ["baz", 1]]
If you want to sort by the words then by their frequency:
words.group_by{ |w| w }.map{ |k, v| [k, v.size ] }.sort_by{ |k, v| [k, v] }
# => [["bar", 2], ["baz", 1], ["foo", 1]]
If you want to sort by the frequency then by the words:
words.group_by{ |w| w }.map{ |k, v| [k, v.size ] }.sort_by{ |k, v| [v, k] }
# => [["baz", 1], ["foo", 1], ["bar", 2]]

Resources