Consolidate CSV data in Ruby to get totals/sums of unique values

I'm still struggling with a basic problem I have not found an answer to online.
I am getting CSV like data as name and quantity:
Foo, 1.5
Bar, 1.2
Foo, 1.1
...
And want to consolidate it to unique names with the totals as a new value:
Foo, 2.6 #total of both Foo lines
Bar, 1.2
...
The data set is never large, but the task is quite repetitive.
I tried to convert it into an array of hashes, finding uniq names, and then use inject, but somehow it got quite complicated and did not work. Also, looping through everything seems not to be the ideal approach.
Does anyone have a nice and easy idea or solution I am missing? (I only found "Extract value from row in csv and sum it" for PHP.)

First of all, you can use Ruby's CSV library to parse and convert your CSV data:
require 'csv'
csv_data = "Foo, 1.5\nBar, 1.2\nFoo, 1.1"
data_array = CSV.parse(csv_data, converters: :numeric)
#=> [["Foo", 1.5], ["Bar", 1.2], ["Foo", 1.1]]
To sum the values I'd use a hash along with each_with_object:
data_array.each_with_object(Hash.new(0)) { |(k, v), h| h[k] += v }
#=> {"Foo"=>2.6, "Bar"=>1.2}

Passing 0.0 as the default option for your Hash accounts nicely for the first occurrence of each item:
input = [ ['Foo', 1.5],
          ['Bar', 1.2],
          ['Foo', 1.1] ]
result = input.inject(Hash.new(0.0)) do |sum, (key, value)|
  sum[key] += value
  sum
end
p result

Accumulating into a hash seems to be the easiest approach.
Let's say that:
myCSV=[["foo",1.5],["bar",1.2],["foo",1.1]]
Just do:
myCSV.each_with_object(Hash.new(0.0)){|row,sum| sum[row[0]]+=row[1]}
=> {
"foo" => 2.6,
"bar" => 1.2
}
If you are reading from a file, it's more or less the same using the CSV library. Fields are read as strings, so convert the quantity with to_f before adding it:
require 'csv'
sum=Hash.new(0.0)
CSV.foreach("path/to/file.csv") do |row|
  sum[row[0]]+=row[1].to_f
end
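Putting the pieces from the answers above together, here is a minimal sketch that goes from CSV text to consolidated output lines. The helper names consolidate and to_csv_lines are made up for this illustration, and the sketch assumes the input looks like the sample in the question (a name, a comma, then a number that may be preceded by a space).

```ruby
require 'csv'

# Hypothetical helper: sums the second column for each unique first column.
def consolidate(csv_text)
  totals = Hash.new(0.0)
  CSV.parse(csv_text) do |name, quantity|
    next if name.nil?                    # skip blank lines
    totals[name.strip] += quantity.to_f  # to_f tolerates the leading space in " 1.5"
  end
  totals
end

# Hypothetical helper: formats the totals back into "name, total" lines.
def to_csv_lines(totals)
  totals.map { |name, total| "#{name}, #{total.round(2)}" }
end

puts to_csv_lines(consolidate("Foo, 1.5\nBar, 1.2\nFoo, 1.1\n"))
```

Rounding to two decimals is an assumption made here to hide floating-point noise in the printed totals; drop it if you need the raw sums.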

Related

One-liner version of this function? (Ruby 1.8.6)

In the version of Ruby I'm using (1.8.6 - don't ask), the Hash class doesn't define the Hash#hash method, which means that calling uniq on an array of hashes doesn't test whether the content is the same - it tests whether the objects are the same (using the default base Object#hash method).
To get around this, I can use include?, like so:
hashes = <a big list of hashes>
uniq_hashes = []
hashes.each do |hash|
  unless uniq_hashes.include?(hash)
    uniq_hashes << hash
  end
end;uniq_hashes.size
Can anyone think of a way to condense this into a one-line method?
Can you use each_with_object?
hashes = [{title: 'a'}, {title: 'b'}, {title: 'c'}, {title: 'a'}]
p hashes.each_with_object([]) { |el, array| array << el unless array.include? el }.size
# 3
Rather than using include? to check if each hash matches a previously-examined hash, one can speed things up by making use of a set. Recall that a set is implemented with a hash under the covers, which explains why lookups are so fast.
require 'set'
def uniq_hashes(arr)
  st = Set.new
  arr.select { |h| st.add?(h) }
end
uniq_hashes [{ a: 1, b: 2 }, { b: 2, a: 1 }, { a: 1, c: 2 }]
#=> [{:a=>1, :b=>2}, {:a=>1, :c=>2}]
See Set#add?.
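To make the Set#add? behaviour concrete: it returns the set itself when the element was newly added and nil when the element was already present, which is why it works as a select predicate. The sketch below assumes a Ruby where Hash#hash is based on content (so not the 1.8.6 from the question, where a Set of hashes would have the same problem as uniq). The helper name first_occurrences is made up for this example.

```ruby
require 'set'

# Hypothetical helper: keeps the first occurrence of each element, in order.
def first_occurrences(items)
  seen = Set.new
  items.select { |item| seen.add?(item) }  # add? is nil for repeats, so they are dropped
end

hashes = [{ title: 'a' }, { title: 'b' }, { title: 'c' }, { title: 'a' }]
p first_occurrences(hashes).size  # 3
p hashes.uniq.size                # 3, uniq compares hash contents on such Rubies
```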
hashes = <a big list of hashes>
uniq_hashes = []
hashes.each do |hash|
  unless uniq_hashes.include?(hash)
    uniq_hashes << hash
  end
end;uniq_hashes.size
Can anyone think of a way to condense this into a one-line method?
Easy:
hashes = <a big list of hashes>; uniq_hashes = []; hashes.each do |hash| unless uniq_hashes.include?(hash) then uniq_hashes << hash end end;uniq_hashes.size
In fact, you can always condense any Ruby code into one line, since newlines are completely optional. Newlines can always be replaced with either semicolons, separator keywords, or just nothing.

How to output sorted hash in ruby template

I'm building a config file for one of our inline apps. It's essentially a JSON file. I'm having a lot of trouble getting Puppet/Ruby 1.8 to output the hash/JSON the same way each time.
I'm currently using
<%= require "json"; JSON.pretty_generate data %>
But while outputting human readable content, it doesn't guarantee the same order each time. Which means that puppet will send out change notifications often for the same data.
I've also tried
<%= require "json"; JSON.pretty_generate Hash[*data.sort.flatten] %>
Which will generate the same data/order each time. The problem comes when data has a nested array.
data => { beanstalkd => [ "server1", ] }
becomes
"beanstalkd": "server1",
instead of
"beanstalkd": ["server1"],
I've been fighting with this for a few days on and off now, so I would like some help.
Since hashes in Ruby 1.9+ preserve insertion order, and the question is tagged with ruby, here's a method that will sort a hash recursively (without affecting ordering of arrays):
def sort_hash(h)
  {}.tap do |h2|
    h.sort.each do |k,v|
      h2[k] = v.is_a?(Hash) ? sort_hash(v) : v
    end
  end
end
h = {a:9, d:[3,1,2], c:{b:17, a:42}, b:2 }
p sort_hash(h)
#=> {:a=>9, :b=>2, :c=>{:a=>42, :b=>17}, :d=>[3, 1, 2]}
require 'json'
puts sort_hash(h).to_json
#=> {"a":9,"b":2,"c":{"a":42,"b":17},"d":[3,1,2]}
Note that this will fail catastrophically if your hash has keys that cannot be compared. (If your data comes from JSON, this will not be the case, since all keys will be strings.)
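One possible variation, sketched under the assumption that you are on Ruby 1.9 or later (where hashes keep insertion order): sort keys by their string form so mixed key types do not raise, and also descend into hashes nested inside arrays. The name deep_sort is made up for this example and is not part of the answer above.

```ruby
require 'json'

# Hypothetical helper: returns a copy with hash keys sorted at every level.
# Keys are ordered by their string form, so mixed key types do not raise.
def deep_sort(obj)
  case obj
  when Hash
    obj.sort_by { |key, _| key.to_s }
       .each_with_object({}) { |(key, value), out| out[key] = deep_sort(value) }
  when Array
    obj.map { |element| deep_sort(element) }  # array order is kept, hashes inside are sorted
  else
    obj
  end
end

data = { 'b' => 2, 'a' => [{ 'd' => 1, 'c' => 2 }] }
puts JSON.generate(deep_sort(data))
# prints {"a":[{"c":2,"d":1}],"b":2}
```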
Hash is an unordered data structure. In some languages (ruby, for example) there's an ordered version of hash, but in most cases in most languages you shouldn't rely on any specific order in a hash.
If order is important to you, you should use an array. So, your hash
{a: 1, b: 2}
becomes this
[{a: 1}, {b: 2}]
I think it doesn't force too many changes in your code.
Workaround to your situation
Try this:
data = {beanstalkId: ['server1'], ccc: 2, aaa: 3}
data2 = data.keys.sort.map {|k| [k, data[k]]}
puts Hash[data2]
#=> {:aaa=>3, :beanstalkId=>["server1"], :ccc=>2}
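For context on why the attempt in the question loses the nested array: flatten with no argument flattens every level, so the value array is merged into the key/value list. A small sketch of the difference, assuming a Ruby where Hash[] accepts an array of pairs; the method names are invented for this illustration.

```ruby
# Hypothetical name: reproduces the problem from the question.
def sorted_hash_lossy(data)
  Hash[*data.sort.flatten]   # flatten also unwraps ["server1"]
end

# Hypothetical name: passes the sorted pairs directly, so values are untouched.
def sorted_hash(data)
  Hash[data.sort]
end

data = { 'beanstalkd' => ['server1'], 'aaa' => 3 }
p sorted_hash_lossy(data)  # {"aaa"=>3, "beanstalkd"=>"server1"} (array lost; newer Rubies print with spaces around =>)
p sorted_hash(data)        # {"aaa"=>3, "beanstalkd"=>["server1"]} (array kept)
```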

Convert array-of-hashes to a hash-of-hashes, indexed by an attribute of the hashes

I've got an array of hashes representing objects as a response to an API call. I need to pull data from some of the hashes, and one particular key serves as an id for the hash object. I would like to convert the array into a hash with the keys as the ids, and the values as the original hash with that id.
Here's what I'm talking about:
api_response = [
  { :id => 1, :foo => 'bar' },
  { :id => 2, :foo => 'another bar' },
  # ..
]
ideal_response = {
  1 => { :id => 1, :foo => 'bar' },
  2 => { :id => 2, :foo => 'another bar' },
  # ..
}
There are two ways I could think of doing this.
1. Map the data to the ideal_response (below)
2. Use api_response.find { |x| x[:id] == i } for each record I need to access.
3. A method I'm unaware of, possibly involving a way of using map to build a hash, natively.
My method of mapping:
keys = data.map { |x| x[:id] }
mapped = Hash[*keys.zip(data).flatten]
I can't help but feel like there is a more performant, tidier way of doing this. Option 2 is very performant when there are a very minimal number of records that need to be accessed. Mapping excels here, but it starts to break down when there are a lot of records in the response. Thankfully, I don't expect there to be more than 50-100 records, so mapping is sufficient.
Is there a smarter, tidier, or more performant way of doing this in Ruby?
Ruby <= 2.0
> Hash[api_response.map { |r| [r[:id], r] }]
#=> {1=>{:id=>1, :foo=>"bar"}, 2=>{:id=>2, :foo=>"another bar"}}
However, Hash::[] is pretty ugly and breaks the usual left-to-right OOP flow. That's why Facets proposed Enumerable#mash:
> require 'facets'
> api_response.mash { |r| [r[:id], r] }
#=> {1=>{:id=>1, :foo=>"bar"}, 2=>{:id=>2, :foo=>"another bar"}}
This basic abstraction (convert enumerables to hashes) was asked to be included in Ruby long ago, alas, without luck.
Note that your use case is covered by Active Support: Enumerable#index_by
Ruby >= 2.1
[UPDATE] Still no love for Enumerable#mash, but now we have Array#to_h. It creates an intermediate array, but it's better than nothing:
> object = api_response.map { |r| [r[:id], r] }.to_h
Something like:
ideal_response = api_response.group_by{|i| i[:id]}
#=> {1=>[{:id=>1, :foo=>"bar"}], 2=>[{:id=>2, :foo=>"another bar"}]}
It uses Enumerable's group_by, which works on collections, returning matches for whatever key value you want. Because it expects to find multiple occurrences of matching key-value hits it appends them to arrays, so you end up with a hash of arrays of hashes. You could peel back the internal arrays if you wanted but could run a risk of overwriting content if two of your hash IDs collided. group_by avoids that with the inner array.
Accessing a particular element is easy:
ideal_response[1][0] #=> {:id=>1, :foo=>"bar"}
ideal_response[1][0][:foo] #=> "bar"
The way you show at the end of the question is another valid way of doing it. Both are reasonably fast and elegant.
For this I'd probably just go:
ideal_response = api_response.each_with_object(Hash.new) { |o, h| h[o[:id]] = o }
Not super pretty with the multiple brackets in the block but it does the trick with just a single iteration of the api_response.
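On Ruby 2.6 and later, to_h also accepts a block, which avoids the intermediate array that map followed by to_h creates. A minimal sketch, with index_by_id as a made-up helper name:

```ruby
# Hypothetical helper; the block form of to_h requires Ruby 2.6+.
def index_by_id(records)
  records.to_h { |record| [record[:id], record] }
end

api_response = [
  { :id => 1, :foo => 'bar' },
  { :id => 2, :foo => 'another bar' }
]
p index_by_id(api_response)[2][:foo]  # "another bar"
```

If two records share an id, the later one wins, the same as with the Hash[] and each_with_object versions above.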

Best way to convert strings to symbols in hash

What's the (fastest/cleanest/straightforward) way to convert all keys in a hash from strings to symbols in Ruby?
This would be handy when parsing YAML.
my_hash = YAML.load_file('yml')
I'd like to be able to use:
my_hash[:key]
Rather than:
my_hash['key']
In Ruby >= 2.5 you can use:
my_hash.transform_keys(&:to_sym)
Using older Ruby version? Here is a one-liner that will copy the hash into a new one with the keys symbolized:
my_hash = my_hash.inject({}){|memo,(k,v)| memo[k.to_sym] = v; memo}
With Rails you can use:
my_hash.symbolize_keys
my_hash.deep_symbolize_keys
Here's a better method, if you're using Rails:
params.symbolize_keys
The end.
If you're not, just rip off their code (it's also in the link):
myhash.keys.each do |key|
myhash[(key.to_sym rescue key) || key] = myhash.delete(key)
end
For the specific case of YAML in Ruby, if the keys begin with ':', they will be automatically interned as symbols.
require 'yaml'
require 'pp'
yaml_str = "
connections:
  - host: host1.example.com
    port: 10000
  - host: host2.example.com
    port: 20000
"
yaml_sym = "
:connections:
  - :host: host1.example.com
    :port: 10000
  - :host: host2.example.com
    :port: 20000
"
pp yaml_str = YAML.load(yaml_str)
puts yaml_str.keys.first.class
pp yaml_sym = YAML.load(yaml_sym)
puts yaml_sym.keys.first.class
Output:
# /opt/ruby-1.8.6-p287/bin/ruby ~/test.rb
{"connections"=>
[{"port"=>10000, "host"=>"host1.example.com"},
{"port"=>20000, "host"=>"host2.example.com"}]}
String
{:connections=>
[{:port=>10000, :host=>"host1.example.com"},
{:port=>20000, :host=>"host2.example.com"}]}
Symbol
If you're using Rails, it is much simpler: you can use a HashWithIndifferentAccess and access the keys both as Strings and as Symbols:
my_hash.with_indifferent_access
see also:
http://api.rubyonrails.org/classes/ActiveSupport/HashWithIndifferentAccess.html
Or you can use the awesome "Facets of Ruby" Gem, which contains a lot of extensions to Ruby Core and Standard Library classes.
require 'facets'
> {'some' => 'thing', 'foo' => 'bar'}.symbolize_keys
=> {:some=>"thing", :foo=>"bar"}
see also:
http://rubyworks.github.io/rubyfaux/?doc=http://rubyworks.github.io/facets/docs/facets-2.9.3/core.json#api-class-Hash
Even more terse:
Hash[my_hash.map{|(k,v)| [k.to_sym,v]}]
Since Ruby 2.5.0 you can use Hash#transform_keys or Hash#transform_keys!.
{'a' => 1, 'b' => 2}.transform_keys(&:to_sym) #=> {:a => 1, :b => 2}
http://api.rubyonrails.org/classes/Hash.html#method-i-symbolize_keys
hash = { 'name' => 'Rob', 'age' => '28' }
hash.symbolize_keys
# => { name: "Rob", age: "28" }
If you are using json, and want to use it as a hash, in core Ruby you can do it:
json_obj = JSON.parse(json_str, symbolize_names: true)
symbolize_names: If set to true, returns symbols for the names (keys) in a JSON object. Otherwise strings are returned. Strings are the default.
Doc: Json#parse symbolize_names
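A short sketch of what symbolize_names does with nested data. The JSON string here is invented sample data, not taken from the answer.

```ruby
require 'json'

# Made-up sample document with a nested object and an array of objects.
json_str = '{"name": "Rob", "address": {"city": "Oslo"}, "tags": [{"id": 1}]}'

parsed = JSON.parse(json_str, symbolize_names: true)
p parsed[:name]             # "Rob"
p parsed[:address][:city]   # "Oslo"
p parsed[:tags].first[:id]  # 1
```

The keys are symbolized at every level, so no separate deep-symbolize pass is needed for data that arrives as JSON.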
Here's a way to deep symbolize an object
def symbolize(obj)
  return obj.inject({}){|memo,(k,v)| memo[k.to_sym] = symbolize(v); memo} if obj.is_a? Hash
  return obj.inject([]){|memo,v | memo << symbolize(v); memo} if obj.is_a? Array
  return obj
end
I really like the Mash gem.
You can do mash['key'], or mash[:key], or mash.key
A modification to @igorsales's answer:
class Object
  def deep_symbolize_keys
    return self.inject({}){|memo,(k,v)| memo[k.to_sym] = v.deep_symbolize_keys; memo} if self.is_a? Hash
    return self.inject([]){|memo,v | memo << v.deep_symbolize_keys; memo} if self.is_a? Array
    return self
  end
end
params.symbolize_keys will also work. This method turns hash keys into symbols and returns a new hash.
In Rails you can use:
{'g'=> 'a', 2 => {'v' => 'b', 'x' => { 'z' => 'c'}}}.deep_symbolize_keys!
Converts to:
{:g=>"a", 2=>{:v=>"b", :x=>{:z=>"c"}}}
So many answers here, but the one-method Rails solution is hash.symbolize_keys.
This is my one liner for nested hashes
def symbolize_keys(hash)
  hash.each_with_object({}) { |(k, v), h| h[k.to_sym] = v.is_a?(Hash) ? symbolize_keys(v) : v }
end
In case the reason you need to do this is because your data originally came from JSON, you could skip any of this parsing by just passing in the :symbolize_names option upon ingesting JSON.
No Rails required and works with Ruby >1.9
JSON.parse(my_json, :symbolize_names => true)
You could be lazy, and wrap it in a lambda:
my_hash = YAML.load_file('yml')
my_lamb = lambda { |key| my_hash[key.to_s] }
my_lamb[:a] == my_hash['a'] #=> true
But this would only work for reading from the hash - not writing.
To do that, you could use Hash#merge
my_hash = Hash.new { |h,k| h[k] = h[k.to_s] }.merge(YAML.load_file('yml'))
The init block will convert the keys one time on demand, though if you update the value for the string version of the key after accessing the symbol version, the symbol version won't be updated.
irb> x = { 'a' => 1, 'b' => 2 }
#=> {"a"=>1, "b"=>2}
irb> y = Hash.new { |h,k| h[k] = h[k.to_s] }.merge(x)
#=> {"a"=>1, "b"=>2}
irb> y[:a] # the key :a doesn't exist for y, so the init block is called
#=> 1
irb> y
#=> {"a"=>1, :a=>1, "b"=>2}
irb> y[:a] # the key :a now exists for y, so the init block isn't called
#=> 1
irb> y['a'] = 3
#=> 3
irb> y
#=> {"a"=>3, :a=>1, "b"=>2}
You could also have the init block not update the hash, which would protect you from that kind of error, but you'd still be vulnerable to the opposite - updating the symbol version wouldn't update the string version:
irb> q = { 'c' => 4, 'd' => 5 }
#=> {"c"=>4, "d"=>5}
irb> r = Hash.new { |h,k| h[k.to_s] }.merge(q)
#=> {"c"=>4, "d"=>5}
irb> r[:c] # init block is called
#=> 4
irb> r
#=> {"c"=>4, "d"=>5}
irb> r[:c] # init block is called again, since this key still isn't in r
#=> 4
irb> r[:c] = 7
#=> 7
irb> r
#=> {:c=>7, "c"=>4, "d"=>5}
So the thing to be careful of with these is switching between the two key forms. Stick with one.
Would something like the following work?
new_hash = Hash.new
my_hash.each { |k, v| new_hash[k.to_sym] = v }
It'll copy the hash, but you won't care about that most of the time. There's probably a way to do it without copying all the data.
a shorter one-liner fwiw:
my_hash.inject({}){|h,(k,v)| h.merge({ k.to_sym => v}) }
How about this:
my_hash = HashWithIndifferentAccess.new(YAML.load_file('yml'))
# my_hash['key'] => "val"
# my_hash[:key] => "val"
This is for people who use mruby and do not have any symbolize_keys method defined:
class Hash
  def symbolize_keys!
    self.keys.each do |k|
      if self[k].is_a? Hash
        self[k].symbolize_keys!
      end
      if k.is_a? String
        raise RuntimeError, "Symbolizing key '#{k}' means overwrite some data (key :#{k} exists)" if self[k.to_sym]
        self[k.to_sym] = self[k]
        self.delete(k)
      end
    end
    return self
  end
end
The method:
- symbolizes only keys that are Strings
- raises a RuntimeError if symbolizing a string would lose information (overwrite part of the hash)
- also symbolizes nested hashes recursively
- returns the symbolized hash
- works in place!
The array we want to change:
strings = ["HTML", "CSS", "JavaScript", "Python", "Ruby"]
Make a new variable as an empty array so we can ".push" the symbols in:
symbols = [ ]
Here's where we call the each method with a block:
strings.each {|x| symbols.push(x.intern)}
End of code.
So this is probably the most straightforward way to convert strings to symbols in your array(s) in Ruby. Make an array of strings, then make a new variable and set it to an empty array. Then go through each element of the first array with the ".each" method, and use a block to ".push" each element into the new array, using ".intern" or ".to_sym" to convert it to a symbol.
Symbols can be faster than strings because each symbol is stored only once in memory, however many times you use it. Symbols are most commonly used as hash keys, which is great. I'm not the best Ruby programmer, but this form of code helped me a lot. If anyone knows a better way, please share. You can use this method for hashes too!
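A shorter alternative to the each/push loop is map, which builds the new array directly. The sketch below wraps it in a method named symbolize_all, a name made up for this example, and also shows that two occurrences of the same symbol are the same object.

```ruby
# Hypothetical helper: converts every string in the array to a symbol.
def symbolize_all(strings)
  strings.map(&:to_sym)
end

strings = ["HTML", "CSS", "JavaScript", "Python", "Ruby"]
p symbolize_all(strings)  # [:HTML, :CSS, :JavaScript, :Python, :Ruby]

# The same symbol is always the same object; equal strings usually are not.
p "Ruby".to_sym.equal?(:Ruby)        # true
p "Ruby".dup.equal?("Ruby".dup)      # false
```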
If you would like a vanilla Ruby solution and, like me, do not have access to ActiveSupport, here is a deep symbolize solution (very similar to the previous ones):
def deep_convert(element)
  return element.collect { |e| deep_convert(e) } if element.is_a?(Array)
  return element.inject({}) { |sh,(k,v)| sh[k.to_sym] = deep_convert(v); sh } if element.is_a?(Hash)
  element
end
Starting on Psych 3.0 you can add the symbolize_names: option
Psych.load("---\n foo: bar")
# => {"foo"=>"bar"}
Psych.load("---\n foo: bar", symbolize_names: true)
# => {:foo=>"bar"}
Note: if you have a lower Psych version than 3.0 symbolize_names: will be silently ignored.
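A sketch of the same option applied to nested YAML, assuming Psych 3.0 or later and Ruby 2.3+ for the squiggly heredoc. The helper name load_symbolized and the sample document are made up for this illustration.

```ruby
require 'yaml'

# Hypothetical helper: parses YAML text and symbolizes keys at every level.
def load_symbolized(yaml_text)
  YAML.load(yaml_text, symbolize_names: true)
end

yaml_text = <<~DOC
  connections:
    - host: host1.example.com
      port: 10000
DOC

config = load_symbolized(yaml_text)
p config[:connections].first[:host]  # "host1.example.com"
p config[:connections].first[:port]  # 10000
```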
My Ubuntu 18.04 includes it out of the box with ruby 2.5.1p57
ruby-1.9.2-p180 :001 > h = {'aaa' => 1, 'bbb' => 2}
=> {"aaa"=>1, "bbb"=>2}
ruby-1.9.2-p180 :002 > Hash[h.map{|a| [a.first.to_sym, a.last]}]
=> {:aaa=>1, :bbb=>2}
This is not exactly a one-liner, but it turns all string keys into symbols, also the nested ones:
def recursive_symbolize_keys(my_hash)
  case my_hash
  when Hash
    Hash[
      my_hash.map do |key, value|
        [ key.respond_to?(:to_sym) ? key.to_sym : key, recursive_symbolize_keys(value) ]
      end
    ]
  when Enumerable
    my_hash.map { |value| recursive_symbolize_keys(value) }
  else
    my_hash
  end
end
I like this one-liner, when I'm not using Rails, because then I don't have to make a second hash and hold two sets of data while I'm processing it:
my_hash = { "a" => 1, "b" => "string", "c" => true }
my_hash.keys.each { |key| my_hash[key.to_sym] = my_hash.delete(key) }
my_hash
=> {:a=>1, :b=>"string", :c=>true}
Hash#delete returns the value of the deleted key
Facets' Hash#deep_rekey is also a good option, especially:
- if you find use for other sugar from facets in your project,
- if you prefer code readability over cryptic one-liners.
Sample:
require 'facets/hash/deep_rekey'
my_hash = YAML.load_file('yml').deep_rekey
In ruby I find this to be the most simple and easy to understand way to turn string keys in hashes to symbols :
my_hash.keys.each { |key| my_hash[key.to_sym] = my_hash.delete(key)}
For each key in the hash we call delete on it which removes it from the hash (also delete returns the value associated with the key that was deleted) and we immediately set this equal to the symbolized key.
Similar to previous solutions but written a bit differently.
This allows for a hash that is nested and/or has arrays.
Get conversion of keys to a string as a bonus.
Code does not mutate the hash being passed in.
module HashUtils
  def symbolize_keys(hash)
    transformer_function = ->(key) { key.to_sym }
    transform_keys(hash, transformer_function)
  end
  def stringify_keys(hash)
    transformer_function = ->(key) { key.to_s }
    transform_keys(hash, transformer_function)
  end
  def transform_keys(obj, transformer_function)
    case obj
    when Array
      obj.map{|value| transform_keys(value, transformer_function)}
    when Hash
      obj.each_with_object({}) do |(key, value), hash|
        hash[transformer_function.call(key)] = transform_keys(value, transformer_function)
      end
    else
      obj
    end
  end
end

Inserting into Set changing the order of Elements in an Array in Ruby

@search_results = Array.new
duplicates = Set.new
results.each { |result| @search_results.push(result) unless duplicates.add?(result[:url]) }
This piece of code is garbling the order of elements in the array @search_results. Why would inserting the same element in a set and an array change the insertion order for the Array? Seems like some issue with element references. Can someone explain?
Edit 1:
I am using an Array. Sorry for the earlier typo. I double-checked my code and it uses an Array too (there is no push method for Hash anyway).
The order of elements in a Hash is not guaranteed. You'll have to sort the keys if you want a guaranteed order.
This is supposedly fixed in Ruby 1.9 I believe.
Edit: I'm assuming your results is an Array. If it's a Hash, then order isn't guaranteed and you'll have to sort the keys. Here's what my test looks like:
#!/usr/bin/ruby -W
require 'pp'
require 'set'
results = Array.new
results << {:url => 'http://lifehacker.com'}
results << {:url => 'http://stackoverflow.com'}
results << {:url => 'http://43folders.com'}
results << {:url => 'http://lolindrath.com'}
results << {:url => 'http://stackoverflow.com'}
results << {:url => 'http://lifehacker.com'}
@search_results = Array.new
duplicates = Set.new
results.each { |result| @search_results.push(result) unless duplicates.add?(result[:url])}
puts "## @search_results"
pp @search_results
If I run that, here's the result:
## @search_results
[{:url=>"http://stackoverflow.com"}, {:url=>"http://lifehacker.com"}]
I found that odd, so just to be sure, I put a .nil? at the end of .add? and here was my result:
## @search_results
[{:url=>"http://lifehacker.com"},
{:url=>"http://stackoverflow.com"},
{:url=>"http://43folders.com"},
{:url=>"http://lolindrath.com"}]
Now that was what I was expecting: is this what you mean by "garbled"?
Edit 2: Upon further investigation, I think this is because of Ruby's strict rules when converting non-Boolean data to Booleans (see Ruby Gotchas on Wikipedia and Stack Overflow, of course): basically only false and nil are false and everything else, including 0, is true. Set#add? returns the set itself when the element was added and nil when it was already present, so the .nil? converts that explicitly to true/false.
irb(main):007:0> puts "zero is true" if 0
zero is true
=> nil
irb(main):008:0> puts "zero is false" unless 0
=> nil
Garbled how? What kind of object is results? If results is a Set or a Hash, then you're not guaranteed that any two traversals of results will be in the same order.
Also, you could do
@search_results = results.uniq
if results is an Array to get all the unique results.
------------------------------------------------------------- Array#uniq
array.uniq -> an_array
------------------------------------------------------------------------
Returns a new array by removing duplicate values in self.
a = [ "a", "a", "b", "b", "c" ]
a.uniq #=> ["a", "b", "c"]
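Pulling the answers above together: the original condition kept only the repeated URLs because Set#add? is truthy for new elements and nil for repeats. A minimal sketch of the intended behaviour (keep the first result for each URL, in input order) follows; unique_by_url is a made-up helper name, and the uniq-with-block form assumes Ruby 1.9.2 or later.

```ruby
require 'set'

# Hypothetical helper: keeps the first result seen for each URL, in input order.
def unique_by_url(results)
  seen = Set.new
  results.select { |result| seen.add?(result[:url]) }  # select keeps rows where add? is truthy
end

results = [
  { :url => 'http://lifehacker.com' },
  { :url => 'http://stackoverflow.com' },
  { :url => 'http://43folders.com' },
  { :url => 'http://stackoverflow.com' }
]

p unique_by_url(results).map { |r| r[:url] }
# ["http://lifehacker.com", "http://stackoverflow.com", "http://43folders.com"]

# Equivalent on Ruby 1.9.2+: uniq with a block compares by the block's value.
p results.uniq { |r| r[:url] } == unique_by_url(results)  # true
```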
