Most efficient way to format a hash of data? - ruby

I am using an array of hashes to do a batch insert into a MongoDB database. Each hash was populated by parsing a text file, so the fields arrive in unpredictable formats. A hash might look something like:
{:date => "March 5", :time => "05:22:21", :first_name => "John", :middle_initial => "JJ", ...}
And I would have a series of formatting functions. So maybe:
def format_date
..convert if needed..
end
def format_time
...
end
How would I go about calling the formatting functions on the various fields? I could see doing some kind of lambda call where I iterate through the hash and call a format_<field_name> function, but not all fields will have formatting functions. For instance, the first_name field above wouldn't need one. Any ideas?

Just keep a list of the keys that you do want to handle. You could even tie it to the transformation functions with a Hash:
transformations = {
:date => lambda {|date| whatever},
:time => lambda {|time| whatever}
}
transformations.default = lambda {|v| v}
data.map do |hash|
Hash[ hash.map {|key, val| [key, transformations[key][val]] } ]
end
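As a minimal runnable sketch of the lookup-table idea above, the following fills in the lambdas with made-up formatting rules. The output formats, the IDENTITY constant, and the format_records helper are assumptions for illustration, not part of the original answer; Hash#fetch with a fallback stands in for setting a default on the table.

```ruby
require 'date'

# Fallback for fields that need no formatting (hypothetical name).
IDENTITY = ->(value) { value }

# Hypothetical formatting rules; only the listed keys are transformed.
TRANSFORMATIONS = {
  date: ->(value) { Date.parse(value).strftime('%m-%d') },   # "March 5"  -> "03-05"
  time: ->(value) { value.split(':').first(2).join(':') }    # "05:22:21" -> "05:22"
}.freeze

# Returns new hashes; the input records are left untouched.
def format_records(records)
  records.map do |record|
    record.map { |key, value| [key, TRANSFORMATIONS.fetch(key, IDENTITY).call(value)] }.to_h
  end
end

data = [
  { date: 'March 5', time: '05:22:21', first_name: 'John', middle_initial: 'JJ' }
]
formatted = format_records(data)
```

Using fetch with a fallback keeps the table free of hidden state; setting a default lambda on the hash, as in the answer, works just as well.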

Here's one idea, pretty similar to what you stated. You might just have an identity function for the fields you don't want to format:
def pass(x)
x
end
method_hash = {:date=>method(:your_format_date)}
method_hash.default = method(:pass)
x = {:date => "March 5", :time => "05:22:21", :first_name => "John", :middle_initial => "JJ"}
x.reduce({}) { |hsh,k| hsh[k[0]] = method_hash[k[0]].call(k[1]); hsh }

Make use of Ruby's Singleton (or Eigen) class and then the following one liner solves your problem:
require 'date'
module Formatter
def format_date
Date.parse(self[:date]).strftime('%Y-%m-%d')
end
def format_time
self[:time].split(':')[0,2].join('-')
end
def format_first_name
self[:first_name].upcase
end
def format
{:date => format_date, :time => format_time, :first_name => format_first_name, :last_name => self[:last_name]}
end
end
records = [
{:date => 'March 05', :time => '12:13:00', :first_name => 'Wes', :last_name => 'Bailey'},
{:date => 'March 06', :time => '09:15:11', :first_name => 'Joe', :last_name => 'Buck'},
{:date => 'March 07', :time => '18:35:48', :first_name => 'Troy', :last_name => 'Aikmen'},
]
records.map {|h| h.extend(Formatter).format}
=> [{:date=>"2011-03-05", :time=>"12-13", :first_name=>"WES", :last_name=>"Bailey"},
{:date=>"2011-03-06", :time=>"09-15", :first_name=>"JOE", :last_name=>"Buck"},
{:date=>"2011-03-07", :time=>"18-35", :first_name=>"TROY", :last_name=>"Aikmen"}]

class Formatters
def self.time(value)
"FORMATTED TIME"
end
def self.date(value)
"FORMATTED DATE"
end
def self.method_missing(name, arg)
arg
end
end
your_data = [{:date => "March 5", :time => "05:22:21", :first_name => "John", :middle_initial => "JJ"},
{:date => "March 6", :time => "05:22:22", :first_name => "Peter", :middle_initial => "JJ"},
{:date => "March 7", :time => "05:22:23", :first_name => "Paul", :middle_initial => "JJ"}]
formatted_data = your_data.map do |item|
Hash[ *item.map { |k, v| [k, Formatters.send(k, v)] }.flatten ]
end
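One caveat with dispatching through send and method_missing: a key such as :name or :hash matches a method that every class already responds to, so method_missing never sees it. A hedged alternative, assuming a hypothetical FieldFormatters module and format_field helper (neither is in the original answer), is to dispatch only to formatters that were defined explicitly:

```ruby
# Hypothetical formatter module; each singleton method handles one field.
module FieldFormatters
  def self.time(value)
    value.split(':').first(2).join(':')
  end

  def self.first_name(value)
    value.upcase
  end
end

# Calls a formatter only if one was defined directly on FieldFormatters;
# every other field is passed through unchanged.
def format_field(key, value)
  if FieldFormatters.singleton_methods(false).include?(key)
    FieldFormatters.public_send(key, value)
  else
    value
  end
end

record = { time: '05:22:21', first_name: 'John', name: 'unchanged' }
formatted = record.map { |key, value| [key, format_field(key, value)] }.to_h
```

Here the :name key falls through to the pass-through branch instead of reaching Module#name.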

Related

Is there a built-in way to get a new hash consisting of only certain keys in Ruby?

Say I have a data array like this:
data = [{..}, {..}, {..}]
Each hash looks like this:
{ :city => 'sdfd', :pop => 33, :best_food => 'sdfa'....}
Now, how can I get an array of hashes containing only certain key/value pairs? For example, if I pick city, I want a new array of hashes containing only city.
I know I can loop and filter manually, but is there a built-in method I am missing?
map will help:
original_array_of_hashes.map do |hash|
{ city: hash[:city] }
end
If you're using Rails (or Ruby 2.5+, where Hash#slice is part of core), the slice method will be available:
original_array_of_hashes.map do |hash|
hash.slice(:city)
end
For multiple keys:
# without 'slice'
original_array_of_hashes.map do |hash|
{ key_one: hash[:key_one], key_two: hash[:key_two] }
end
# with 'slice'
original_array_of_hashes.map do |hash|
hash.slice(:key_one, :key_two)
end
arr = [{ :city => 'Paris', :country => 'France', :pop => 2240001 },
{ :city => 'Bardelona', :country => 'Spain', :pop => 1600002},
{ :city => 'Vancouver', :country => 'Canada', :pop => 603503 }]
def some_keys(arr, *keys_to_keep)
arr.map { |h| h.select { |k,_| keys_to_keep.include? k } }
end
some_keys(arr)
#=> [{}, {}, {}]
some_keys(arr, :city)
#=> [{:city=>"Paris"}, {:city=>"Bardelona"}, {:city=>"Vancouver"}]
some_keys(arr, :city, :pop)
#=> [{:city=>"Paris", :pop=>2240001},
# {:city=>"Bardelona", :pop=>1600002},
# {:city=>"Vancouver", :pop=>603503}]
some_keys(arr, :city, :country, :pop)
#=> [{:city=>"Paris", :country=>"France", :pop=>2240001},
# {:city=>"Bardelona", :country=>"Spain", :pop=>1600002},
# {:city=>"Vancouver", :country=>"Canada", :pop=>603503}]
This uses Enumerable#map and Hash#select (not Enumerable#select).
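On Ruby 2.5 or later, the same helper can be sketched with Hash#slice instead of a select block. The sample data below is made up for illustration.

```ruby
cities = [
  { city: 'Paris', country: 'France', pop: 2_240_001 },
  { city: 'Vancouver', country: 'Canada', pop: 603_503 }
]

# Keeps only the requested keys in each hash; keys that a hash
# does not contain are silently skipped.
def some_keys(hashes, *keys_to_keep)
  hashes.map { |h| h.slice(*keys_to_keep) }
end

only_city    = some_keys(cities, :city)
city_and_pop = some_keys(cities, :city, :pop)
with_missing = some_keys(cities, :city, :mayor)
```

Unlike building the hash literally with { city: hash[:city] }, slice does not insert a nil value for a key that is absent.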

How would I construct a Hash from this scenario in Ruby?

Given I have the following code:
ENDPOINT = 'http://api.eventful.com'
API_KEY = 'PbFVZfjTXJQWrnJp'
def get_xml(url, options={})
compiled_url = "#{ENDPOINT}/rest#{url}" << "?app_key=#{API_KEY}&sort_order=popularity"
options.each { |k, v| compiled_url << "&#{k.to_s}=#{v.to_s}" }
REXML::Document.new((Net::HTTP.get(URI.parse(URI.escape(compiled_url)))))
end
def event_search(location, date)
get_xml('/events/search',
:location => "#{location}, United Kingdom",
:date => date
)
end
And we access the XML data formatted by REXML::Document like this:
events = event_search('London', 'Today').elements
And we can access these elements like this (this prints all the titles in the events):
events.each('search/events/event/title') do |title|
puts title.text
end
The XML I'm using can be found here. I would like to construct a Hash like so:
{"Title1" => {:title => 'Title1', :date => 'Date1', :post_code => 'PostCode1'},
"Title2" => {:title => 'Title2', :date => 'Date2', :post_code => 'PostCode2'}}
I want to build it from the values returned by events.each('search/events/event/title'), events.each('search/events/event/date'), and events.each('search/events/event/post_code'), that is, from the XML provided by the URL I have included above. Thanks!
You should loop over the events themselves, not the titles. Something like this:
events_by_title = {}
events.each('search/events/event') do |event|
title = event.get_elements('title').first.text
events_by_title[title] = {
:title => title,
:date => event.get_elements('start_time').first.text,
:post_code => event.get_elements('postal_code').first.text
}
end
Get the root element by calling root() on the REXML::Document object, then use each_element("events/event") to iterate over the "event" nodes (the path is relative to the root "search" element). You can then extract the individual values using the methods on Element: http://ruby-doc.org/stdlib-1.9.3/libdoc/rexml/rdoc/REXML/Element.html
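A self-contained sketch of the root-element approach, run against a small inline document instead of the live API. The sample XML is an assumption about the response shape, inferred from the paths used in the question and the element names used in the answer above.

```ruby
require 'rexml/document'

# Made-up sample mirroring the search/events/event structure.
xml = <<~XML
  <search>
    <events>
      <event>
        <title>Title1</title>
        <start_time>Date1</start_time>
        <postal_code>PostCode1</postal_code>
      </event>
      <event>
        <title>Title2</title>
        <start_time>Date2</start_time>
        <postal_code>PostCode2</postal_code>
      </event>
    </events>
  </search>
XML

doc = REXML::Document.new(xml)
events_by_title = {}

# The path is relative to the root <search> element.
doc.root.each_element('events/event') do |event|
  title = event.elements['title'].text
  events_by_title[title] = {
    title: title,
    date: event.elements['start_time'].text,
    post_code: event.elements['postal_code'].text
  }
end
```

If two events share a title, the later one overwrites the earlier one, since the title is the hash key.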

Convert Nested Array into Nested Hash in Ruby

Without knowing the dimension of array, how do I convert an array to a nested hash?
For example:
[["Message", "hello"]]
to:
{:message => "hello"}
Or:
[["Memory", [["Internal Memory", "32 GB"], ["Card Type", "MicroSD"]]]]
to:
{:memory => {:internal_memory => "32 GB", :card_type => "MicroSD"}}
or:
[["Memory", [["Internal Memory", "32 GB"], ["Card Type", "MicroSD"]]], ["Size", [["Width", "12cm"], ["height", "20cm"]]]]
to:
{:memory => {:internal_memory => "32 GB", :card_type => "MicroSD"}, :size => {:width => "12cm", :height => "20cm"}}
Given your format of nested arrays of pairs, the following function transforms it into the hash you'd like:
def nested_arrays_of_pairs_to_hash(array)
result = {}
array.each do |elem|
second = if elem.last.is_a?(Array)
nested_arrays_of_pairs_to_hash(elem.last)
else
elem.last
end
result.merge!({elem.first.to_sym => second})
end
result
end
A shorter version
def nested_arrays_to_hash(array)
return array unless array.is_a? Array
array.inject({}) do |result, (key, value)|
result.merge!(key.to_sym => nested_arrays_to_hash(value))
end
end
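Both functions above call to_sym on the labels as they are, so "Internal Memory" becomes :"Internal Memory" rather than the lower-case, underscored key shown in the question. A minimal sketch that also normalizes the keys, assuming labels are plain strings separated by whitespace (to_key and pairs_to_hash are hypothetical names):

```ruby
# Turns a label such as "Internal Memory" into :internal_memory.
def to_key(label)
  label.strip.downcase.gsub(/\s+/, '_').to_sym
end

# Recursively converts nested arrays of [label, value] pairs into a hash.
# Any value that is not an Array is returned unchanged.
def pairs_to_hash(value)
  return value unless value.is_a?(Array)

  value.each_with_object({}) do |(label, inner), result|
    result[to_key(label)] = pairs_to_hash(inner)
  end
end

specs = [
  ['Memory', [['Internal Memory', '32 GB'], ['Card Type', 'MicroSD']]],
  ['Size', [['Width', '12cm'], ['height', '20cm']]]
]
converted = pairs_to_hash(specs)
```

This treats every Array as a list of pairs, so it is not suitable if a leaf value can itself be an array.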
> [:Message => "hello"]
=> [{:Message=>"hello"}]
Thus:
> [:Message => "hello"][0]
=> {:Message=>"hello"}

How do I create a diff of hashes with a correction factor?

I want to compare hashes inside an array:
h_array = [
{:name => "John", :age => 23, :eye_color => "blue"},
{:name => "John", :age => 22, :eye_color => "green"},
{:name => "John", :age => 22, :eye_color => "black"}
]
get_diff(h_array, correct_factor = 2)
# should return [{:eye_color => "blue"}, {:eye_color => "green"}, {:eye_color => "black"}]
get_diff(h_array, correct_factor = 3)
# should return
# [[{:age => 23}, {:age => 22}, {:age => 22}],
# [{:eye_color => "blue"}, {:eye_color => "green"}, {:eye_color => "black"}]]
I want to diff the hashes contained in the h_array. It looks like a recursive call/method because the h_array can have multiple hashes but with the same number of keys and values. How can I implement the get_diff method?
def get_diff h_array, correct_factor
h_array.first.keys.reject{|k|
h_array.map{|h| h[k]}.sort.chunk{|e| e}.map{|_,e| e.size}.max >= correct_factor
}.map{|k|
h_array.map{|hash| hash.select{|key,_| k == key}}
}
end
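The same idea can be sketched with Enumerable#tally (Ruby 2.7+), which counts occurrences without sorting and so also copes with values that cannot be compared to each other. This reads the question as: a key counts as "the same" when at least correct_factor hashes share one value for it. The name diff_columns is hypothetical.

```ruby
# Returns, for each key whose most common value appears fewer than
# correct_factor times, an array of single-pair hashes (one per input hash).
def diff_columns(h_array, correct_factor)
  h_array.first.keys
         .reject { |key| h_array.map { |h| h[key] }.tally.values.max >= correct_factor }
         .map { |key| h_array.map { |h| { key => h[key] } } }
end

people = [
  { name: 'John', age: 23, eye_color: 'blue' },
  { name: 'John', age: 22, eye_color: 'green' },
  { name: 'John', age: 22, eye_color: 'black' }
]

with_factor_two   = diff_columns(people, 2)
with_factor_three = diff_columns(people, 3)
```

Note that the result is always an array of columns, so with a factor of 2 the single eye_color column comes back wrapped in an outer array.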
class Array
def find_ndups # also returns the number of items
uniq.map { |v| diff = (self.size - (self-[v]).size); (diff > 1) ? [v, diff] : nil}.compact
end
end
h_array = [
{:name => "John", :age => 22, :eye_color => "blue", :hair => "black"},
{:name => "John", :age => 33, :eye_color => "orange", :hair => "green"},
{:name => "John", :age => 22, :eye_color => "black", :hair => "yello"}
]
def get_diff(h_array, correct_factor)
temp = h_array.inject([]){|result, element| result << element.to_a}
master_array = []
unmatched_arr = []
matched_arr = []
temp = temp.transpose
temp.each_with_index do |arr, index|
ee = arr.find_ndups
if ee.length == 0
unmatched_arr << temp[index].inject([]){|result, arr| result << {arr.first => arr.last} }
elsif ee[0][1] < correct_factor
# fewer duplicates than correct_factor: treat the column as differing
unmatched_arr << temp[index].inject([]){|result, arr| result << {arr.first => arr.last} }
elsif ee[0][1] >= correct_factor
matched_arr << temp[index].inject([]){|result, arr| result << {arr.first => arr.last} }
end
end
return [matched_arr, unmatched_arr]
end
puts get_diff(h_array, 2).inspect
hope it helps
I found the ActiveSupport::CoreExtensions::Hash::Diff module.
ActiveSupport 2.3.2 and 2.3.4 have a built-in Hash::Diff module, which returns a hash that represents the difference between two hashes.

What is the best way to remap a Hash in Ruby?

Is there a simple way of remapping a hash in ruby the following way:
from:
{:name => "foo", :value => "bar"}
to:
{"foo" => "bar"}
Preferably in a way that makes it simple to do this operation while iterating over an array of this type of hashes:
from:
[{:name => "foo", :value => "bar"}, {:name => "foo2", :value => "bar2"}]
to:
{"foo" => "bar", "foo2" => "bar2"}
Thanks.
arr = [ {:name=>"foo", :value=>"bar"}, {:name=>"foo2", :value=>"bar2"}]
result = {}
arr.each{|x| result[x[:name]] = x[:value]}
# result is now {"foo"=>"bar", "foo2"=>"bar2"}
A modified version of Vanson Samuel's code does what is intended.
It's a one-liner, but quite a long one.
arr = [{:name=>"foo", :value=>"bar"}, {:name=>"foo2", :value=>"bar2"}]
arr.inject({}){|r,item| r.merge({item[:name] => item[:value]})}
# result {"foo" => "bar", "foo2" => "bar2"}
I wouldn't say that it's prettier than Gishu's version, though.
As a general rule of thumb, if you have a hash of the form {:name => "foo", :value => "bar"}, you're usually better off using a tuple of ["foo", "bar"].
arr = [["foo", "bar"], ["foo2", "bar2"]]
arr.inject({}) { |accu, (key, value)| accu[key] = value; accu }
I know this is old, but the neatest way to achieve this is to map the array of hashes to an array of tuples, then use Hash[] to build a hash from that, as follows:
arr = [{:name => "foo", :value => "bar"}, {:name => "foo2", :value => "bar2"}]
Hash[ arr.map { |item| [ item[:name], item[:value] ] } ]
# => {"foo"=>"bar", "foo2"=>"bar2"}
a bit late but:
[{ name: "foo", value: "bar" },{name: "foo2", value: "bar2"}].map{ |k| k.values }.to_h
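The values-based one-liner depends on each hash listing :name before :value and containing nothing else. A sketch that looks the keys up explicitly, assuming Ruby 2.6+ where Array#to_h accepts a block:

```ruby
pairs = [{ name: 'foo', value: 'bar' }, { name: 'foo2', value: 'bar2' }]

# Each block call returns a [key, value] pair for the resulting hash.
remapped = pairs.to_h { |item| [item[:name], item[:value]] }

# Key order and extra keys inside each hash no longer matter.
shuffled = [{ value: 'bar', note: 'ignored', name: 'foo' }]
remapped_shuffled = shuffled.to_h { |item| [item[:name], item[:value]] }
```

On older Rubies, the Hash[ arr.map { ... } ] form shown earlier gives the same result.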
