How to create a Hash from a nested CSV in Ruby?

I have a CSV in the following format:
name,contacts.0.phone_no,contacts.1.phone_no,codes.0,codes.1
YK,1234,4567,AB001,AK002
As you can see, this is a nested structure. The CSV may contain multiple rows. I would like to convert this into an array of hashes like this:
[
  {
    name: 'YK',
    contacts: [
      {
        phone_no: '1234'
      },
      {
        phone_no: '4567'
      }
    ],
    codes: ['AB001', 'AK002']
  }
]
The structure uses numbers in the given format to represent arrays. There can be hashes inside arrays. Is there a simple way to do that in Ruby?
The CSV headers are dynamic; they can change, so I will have to create the hash on the fly based on the CSV file.
There is a similar Node library called csvtojson that does this for JavaScript.

Just read and parse it line by line. The arr variable in the code below will hold the array of hashes you need:
arr = []
File.readlines('data.csv').drop(1).each do |line| # 'data.csv' stands in for your CSV path
  fields = line.split(',').map(&:strip)
  hash = { name: fields[0],
           contacts: [{ phone_no: fields[1] }, { phone_no: fields[2] }],
           codes: [fields[3], fields[4]] }
  arr.push(hash)
end

Let's first construct a CSV file.
str = <<~END
name,contacts.0.phone_no,contacts.1.phone_no,codes.0,IQ,codes.1
YK,1234,4567,AB001,173,AK002
ER,4321,7654,BA001,81,KA002
END
FName = 't.csv'
File.write(FName, str)
#=> 121
I have written a helper method that constructs a pattern, which is then used to convert each row of the CSV file (after the header row) to an element (hash) of the desired array.
require 'csv'
def construct_pattern(csv)
  csv.headers.group_by { |col| col[/[^.]+/] }.
      transform_values do |arr|
        case arr.first.count('.')
        when 0
          arr.first
        when 1
          arr
        else
          key = arr.first[/(?<=\d\.).*/]
          arr.map { |v| { key=>v } }
        end
      end
end
For the example being considered, this produces:
construct_pattern(csv)
#=> {"name"=>"name",
# "contacts"=>[{"phone_no"=>"contacts.0.phone_no"},
# {"phone_no"=>"contacts.1.phone_no"}],
# "codes"=>["codes.0", "codes.1"],
# "IQ"=>"IQ"}
By tacking if pattern.empty? onto that call (as in the code below) we ensure the pattern is constructed only once.
We may now construct the desired array.
pattern = {}
CSV.foreach(FName, headers: true).map do |csv|
  pattern = construct_pattern(csv) if pattern.empty?
  pattern.each_with_object({}) do |(k,v),h|
    h[k] =
      case v
      when Array
        case v.first
        when Hash
          v.map { |g| g.transform_values { |s| csv[s] } }
        else
          v.map { |s| csv[s] }
        end
      else
        csv[v]
      end
  end
end
#=> [{"name"=>"YK",
# "contacts"=>[{"phone_no"=>"1234"}, {"phone_no"=>"4567"}],
# "codes"=>["AB001", "AK002"],
# "IQ"=>"173"},
# {"name"=>"ER",
# "contacts"=>[{"phone_no"=>"4321"}, {"phone_no"=>"7654"}],
# "codes"=>["BA001", "KA002"],
# "IQ"=>"81"}]
The CSV methods I've used are documented in CSV. See also Enumerable#group_by and Hash#transform_values.
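If the nesting can go deeper than in this example (say a header like contacts.0.address.city), a fully generic variant is possible. Below is a minimal sketch of my own, not part of the answer above; row_to_nested is a hypothetical helper that treats purely numeric header segments as array indices and builds each row's structure segment by segment:

require 'csv'

# Hypothetical helper: convert one CSV::Row to a nested structure by walking
# each dot-separated header; numeric segments become array indices.
def row_to_nested(row)
  row.headers.each_with_object({}) do |header, result|
    keys = header.split('.').map { |k| k.match?(/\A\d+\z/) ? k.to_i : k }
    node = result
    keys.each_with_index do |key, i|
      if i == keys.size - 1
        node[key] = row[header]                            # leaf: assign the cell value
      else
        node[key] ||= keys[i + 1].is_a?(Integer) ? [] : {} # create array or hash as needed
        node = node[key]
      end
    end
  end
end

CSV.read(FName, headers: true).map { |row| row_to_nested(row) }
#=> [{"name"=>"YK",
#     "contacts"=>[{"phone_no"=>"1234"}, {"phone_no"=>"4567"}],
#     "codes"=>["AB001", "AK002"],
#     "IQ"=>"173"}, ...]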

Related

Converting Ruby Hash into string with escapes

I have a Hash which needs to be converted into a String with escaped characters.
{name: "fakename"}
and should end up like this:
'name:\'fakename\''
I don't know what this type of string is called. Maybe there is an already existing method, which I simply don't know about...
At the end I would do something like this:
name = {name: "fakename"}
metadata = {}
metadata['foo'] = 'bar'
"#{name} AND #{metadata}"
which ends up in that:
'name:\'fakename\' AND metadata[\'foo\']:\'bar\''
Context: this query is required to search the Stripe API: https://stripe.com/docs/api/customers/search
If possible I would use Stripe's gem.
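For reference, a rough sketch of what using the gem for this search might look like (assuming a recent stripe-ruby version that supports the Search API and that your API key is configured; check the linked docs for the exact interface):

require 'stripe'

Stripe.api_key = 'sk_test_...' # your secret key

# Search syntax as described at https://stripe.com/docs/api/customers/search
customers = Stripe::Customer.search(query: "name:'fakename' AND metadata['foo']:'bar'")
customers.data.each { |customer| puts customer.id }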
In case you can't use it, this piece of code extracted from the gem should help you encode the query parameters.
require 'cgi'
# Copied from here: https://github.com/stripe/stripe-ruby/blob/a06b1477e7c28f299222de454fa387e53bfd2c66/lib/stripe/util.rb
class Util
  def self.flatten_params(params, parent_key = nil)
    result = []
    # do not sort the final output because arrays (and arrays of hashes
    # especially) can be order sensitive, but do sort incoming parameters
    params.each do |key, value|
      calculated_key = parent_key ? "#{parent_key}[#{key}]" : key.to_s
      if value.is_a?(Hash)
        result += flatten_params(value, calculated_key)
      elsif value.is_a?(Array)
        result += flatten_params_array(value, calculated_key)
      else
        result << [calculated_key, value]
      end
    end
    result
  end

  def self.flatten_params_array(value, calculated_key)
    result = []
    value.each_with_index do |elem, i|
      if elem.is_a?(Hash)
        result += flatten_params(elem, "#{calculated_key}[#{i}]")
      elsif elem.is_a?(Array)
        result += flatten_params_array(elem, calculated_key)
      else
        result << ["#{calculated_key}[#{i}]", elem]
      end
    end
    result
  end

  def self.url_encode(key)
    CGI.escape(key.to_s).
      # Don't use strict form encoding by changing the square bracket control
      # characters back to their literals. This is fine by the server, and
      # makes these parameter strings easier to read.
      gsub("%5B", "[").gsub("%5D", "]")
  end
end
params = { name: 'fakename', metadata: { foo: 'bar' } }
Util.flatten_params(params).map { |k, v| "#{Util.url_encode(k)}=#{Util.url_encode(v)}" }.join("&")
I use it now with this string, which works... quite straightforward:
"email:\'#{email}\'"
email = "test#test.com"
key = "foo"
value = "bar"
["email:\'#{email}\'", "metadata[\'#{key}\']:\'#{value}\'"].join(" AND ")
=> "email:'test#test.com' AND metadata['foo']:'bar'"
which is accepted by the Stripe API.
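If you need to build that query from a hash rather than interpolating by hand, a small helper along these lines could work (a sketch of my own, not something from the Stripe gem):

# Build a Stripe-search-style query string from a flat hash,
# e.g. { email: 'test#test.com', metadata: { foo: 'bar' } }
def stripe_search_query(params)
  params.flat_map do |key, value|
    if value.is_a?(Hash)
      value.map { |k, v| "#{key}['#{k}']:'#{v}'" }
    else
      ["#{key}:'#{value}'"]
    end
  end.join(' AND ')
end

stripe_search_query(email: 'test#test.com', metadata: { foo: 'bar' })
#=> "email:'test#test.com' AND metadata['foo']:'bar'"

Note it does no escaping of quotes inside values, so treat it as a starting point only.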

Ruby - Delete rows in csv file using enumerator CSV.open

I know how to do it with CSV.read, but I'm not sure how with CSV.open and an enumerator. Or how do I omit those specific rows before loading them into new_csv?
Thanks!
new_csv = []
CSV.open(file, headers: true) do |unit|
  units = unit.each
  units.select do |row|
    # delete row [0][1][2][3]
    new_csv << row
  end
end
If you want to skip the first four rows plus the header, these are some options.
Get pure array:
new_csv = CSV.read(filename)[5..]
or keep the csv object
new_csv = []
CSV.open(filename, headers:true) do |csv|
csv.each_with_index do |row, i|
new_csv << row if i > 3
end
end
or using Enumerable#each_with_object:
csv = CSV.open(filename, headers:true)
new_csv = csv.each_with_index.with_object([]) do |(row, i), ary|
ary << row if i > 3
end
Let's begin by creating a CSV file:
contents =<<~END
name,nickname,age
Robert,Bobbie,23
Wilma,Stretch,45
William,Billy-Bob,72
Henrietta,Mama,53
END
FName = 'x.csv'
File.write(FName, contents)
#=> 91
We can use CSV::foreach without a block to return an enumerator.
csv = CSV.foreach(FName, headers:true)
#=> #<Enumerator: CSV:foreach("x.csv", "r", headers: true)>
The enumerator csv generates CSV::Row objects:
obj = csv.next
#=> #<CSV::Row "name":"Robert" "nickname":"Bobbie" "age":"23">
obj.class
#=> CSV::Row
Before continuing let me Enumerator#rewind csv so that csv.next will once again generate its first element.
csv.rewind
Suppose we wish to skip the first two records. We can do that using Enumerator#next:
2.times { csv.next }
Now continue generating elements with the enumerator, mapping them to an array of hashes:
loop.map { csv.next.to_h }
#=> [{"name"=>"William", "nickname"=>"Billy-Bob", "age"=>"72"},
# {"name"=>"Henrietta", "nickname"=>"Mama", "age"=>"53"}]
See Kernel#loop and CSV::Row#to_h. The enumerator csv raises a StopIteration exception when next is invoked after the enumerator has generated its last element. As you can see from its doc, loop handles that exception by breaking out of the loop.
loop is a very versatile method. I generally use it in place of while and until, as well as when I need it to handle a StopIteration exception.
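To illustrate (a toy example of my own, not part of the answer): loop silently rescues the StopIteration raised by next once an enumerator is exhausted, so execution simply continues after the loop:

enum = [1, 2, 3].each   # any enumerator will do
loop do
  puts enum.next        # raises StopIteration after the third element
end
puts 'done'             # loop rescued the exception, so we get here
# prints 1, 2, 3, done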
If you just want the values, then:
csv.rewind
2.times { csv.next }
loop.with_object([]) { |_,arr| arr << csv.next.map(&:last) }
#=> [["William", "Billy-Bob", "72"],
# ["Henrietta", "Mama", "53"]]

How can I shorten the following Ruby code?

I have an array of hashes, all with the same keys, all with a key of id. I need a comma delimited string of ids.
arr_widgets = []
widgets.each do |widget|
arr_widgets.push(widget['id'])
end
str_widgets = arr_widgets.join(',')
Have you tried something like this?
str_widgets = widgets.map { |w| w['id'] }.join(',')
There is no need to create an intermediate array.
widgets = [
{"id"=>"dec"}, {"id"=>21}, {"id"=>2020}, {"id"=>"was"},
{"id"=>"the"}, {"id"=>"shortest"}, {"id"=>"day"}, {"id"=>"of"},
{"id"=>"the"}, {"id"=>"longest"}, {"id"=>"year"}
]
Note that two values are integers.
s = widgets.each_with_object(String.new) do |widget, s|
s << ',' unless s.empty?
s << widget["id"].to_s
end
puts s
#=> "dec,21,2020,was,the,shortest,day,of,the,longest,year"

How to group Date and time data from API

I am trying to group data I am getting from an API to serve to our front-end application. I mean, group "time" by "date".
dates: {date1: [time1, time2, timeN], date2: [time1...]}
My input is like this:
{"date"=>"2017-04-04T00:00:00", "time"=>"1754-01-01T13:00:00"}
{"date"=>"2017-04-04T00:00:00", "time"=>"1754-01-01T14:00:00"}
{"date"=>"2017-04-05T00:00:00", "time"=>"1754-01-01T12:00:00"}
{"date"=>"2017-04-05T00:00:00", "time"=>"1754-01-01T13:00:00"}
And my output should be like this:
dates: [{date: "2017-04-04T00:00:00", availableTimes: ["1754-01-01T13:00:00", "1754-01-01T14:00:00"]}, {date: "2017-04-05T00:00:00", availableTimes: ["1754-01-01T12:00:00", "1754-01-01T13:00:00"]}]
I am trying to do this without going into loop madness. I have the following:
dates = Hash[input_data.map{|sd| [sd.date, [""]]}]
This gives me output like this:
{"2017-04-04T00:00:00"=>[""],
"2017-04-05T00:00:00"=>[""],
"2017-04-11T00:00:00"=>[""],
"2017-04-12T00:00:00"=>[""],
"2017-04-18T00:00:00"=>[""],
"2017-04-19T00:00:00"=>[""],
"2017-04-25T00:00:00"=>[""],
"2017-04-26T00:00:00"=>[""]}
Just one possible way:
input.each_with_object(Hash.new { |h, k| h[k] = [] }) do |h, m|
  m[h['date']] << h['time']
end.map { |k, v| { date: k, available_times: v } }
#=> [{:date=>"2017-04-04T00:00:00", :available_times=>["1754-01-01T13:00:00", "1754-01-01T14:00:00"]},
#    {:date=>"2017-04-05T00:00:00", :available_times=>["1754-01-01T12:00:00", "1754-01-01T13:00:00"]}]
Actually, it seems like your data structure would be more concise without the last map, I mean:
#=> {"2017-04-04T00:00:00"=>["1754-01-01T13:00:00", "1754-01-01T14:00:00"],
# "2017-04-05T00:00:00"=>["1754-01-01T12:00:00", "1754-01-01T13:00:00"]}
You are getting that output because your map function is not actually modifying any sort of data structure. It is simply returning a new array full of arrays that contain the date and an array with an empty string. Basically, this isn't going to be done with just a single map call.
So, the basic algorithm would be:
Find array of all unique dates
Loop through unique dates and use select to only get the date/time pairs for the current date in the loop iteration
Set up the data in the format you prefer
The following code will leave filteredDates in the format you need:
filteredDates = { dates: [] }
uniqueDates = input_data.map { |d| d["date"] }.uniq # This is an array of only unique dates
uniqueDates.each do |date|
dateTimes = input_data.select { |d| d["date"] == date }
newObj = { date: date }
newObj[:availableTimes] = dateTimes.map { |d| d["time"] }
filteredDates[:dates].push(newObj)
end
Here is what filteredDates will look like:
{:dates=>[{:date=>"2017-04-04T00:00:00", :availableTimes=>["1754-01-01T13:00:00", "1754-01-01T14:00:00"]}, {:date=>"2017-04-05T00:00:00", :availableTimes=>["1754-01-01T12:00:00", "1754-01-01T13:00:00"]}]}
There are many ways you can do this; one way is to create a new hash with a default value of an array, then loop over the results and insert the times:
dates = Hash.new { |hash, key| hash[key] = [] }
input_data.each{ |sd| dates[sd["date"]] << sd["time"] }
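For the sample input, dates then holds the hash-of-arrays form (the same shape suggested above); mapping it gives the array-of-hashes shape if you still need it:

dates
#=> {"2017-04-04T00:00:00"=>["1754-01-01T13:00:00", "1754-01-01T14:00:00"],
#    "2017-04-05T00:00:00"=>["1754-01-01T12:00:00", "1754-01-01T13:00:00"]}

dates.map { |date, times| { date: date, availableTimes: times } }
#=> [{:date=>"2017-04-04T00:00:00", :availableTimes=>["1754-01-01T13:00:00", "1754-01-01T14:00:00"]},
#    {:date=>"2017-04-05T00:00:00", :availableTimes=>["1754-01-01T12:00:00", "1754-01-01T13:00:00"]}]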
I would use Enumerable#group_by.
dates = [{"date"=>"2017-04-04T00:00:00", "time"=>"1754-01-01T13:00:00"},
{"date"=>"2017-04-04T00:00:00", "time"=>"1754-01-01T14:00:00"},
{"date"=>"2017-04-05T00:00:00", "time"=>"1754-01-01T12:00:00"},
{"date"=>"2017-04-05T00:00:00", "time"=>"1754-01-01T13:00:00"}]
dates.group_by { |g| g["date"] }.
map { |k,v| { date: k, available_times: v.map { |h| h["time"] } } }
#=> [{:date=>"2017-04-04T00:00:00",
# :available_times=>["1754-01-01T13:00:00", "1754-01-01T14:00:00"]},
# {:date=>"2017-04-05T00:00:00",
# :available_times=>["1754-01-01T12:00:00", "1754-01-01T13:00:00"]}]
The first step produces the following intermediate value:
dates.group_by { |g| g["date"] }
#=> {"2017-04-04T00:00:00"=>
# [{"date"=>"2017-04-04T00:00:00", "time"=>"1754-01-01T13:00:00"},
# {"date"=>"2017-04-04T00:00:00", "time"=>"1754-01-01T14:00:00"}],
# "2017-04-05T00:00:00"=>
# [{"date"=>"2017-04-05T00:00:00", "time"=>"1754-01-01T12:00:00"},
# {"date"=>"2017-04-05T00:00:00", "time"=>"1754-01-01T13:00:00"}]}
There are probably more elegant ways, but
results = Hash.new
dates.each do |date|
  d = date['date'].split('T').first  # keep only the date part
  t = date['time'].split('T').last   # keep only the time part
  results[d] = Array.new unless results.key?(d)
  results[d] << t
end
puts results
# => {"2017-04-04"=>["13:00:00", "14:00:00"], "2017-04-05"=>["12:00:00", "13:00:00"]}

Ruby library function to transform Enumerable to Hash

Consider this extension to Enumerable:
module Enumerable
  def hash_on
    h = {}
    each do |e|
      h[yield(e)] = e
    end
    h
  end
end
It is used like so:
people = [
{:name=>'fred', :age=>32},
{:name=>'barney', :age=>42},
]
people_hash = people.hash_on { |person| person[:name] }
p people_hash['fred'] # => {:age=>32, :name=>"fred"}
p people_hash['barney'] # => {:age=>42, :name=>"barney"}
Is there a built-in function which already does this, or close enough to it that this extension is not needed?
Enumerable#to_h converts a sequence of [key, value] pairs into a Hash, so you can do:
people.map {|p| [p[:name], p]}.to_h
If you have multiple values mapped to the same key this keeps the last one.
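On Ruby 2.6 or newer you can also pass the block straight to to_h and skip the intermediate array of pairs (same last-one-wins behaviour for duplicate keys):

people.to_h { |person| [person[:name], person] }
#=> {"fred"=>{:name=>"fred", :age=>32}, "barney"=>{:name=>"barney", :age=>42}}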
[ {:name=>'fred', :age=>32},
{:name=>'barney', :age=>42},
].group_by { |person| person[:name] }
=> {"fred"=>[{:name=>"fred", :age=>32}],
"barney"=>[{:name=>"barney", :age=>42}]}
The values are arrays so that several Freds or Barneys are possible, but you can use .map to reconstruct single values if you really need to.
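Also worth noting: if ActiveSupport is available (e.g. in a Rails app), Enumerable#index_by does exactly what the hash_on extension in the question does:

require 'active_support/core_ext/enumerable'

people.index_by { |person| person[:name] }
#=> {"fred"=>{:name=>"fred", :age=>32}, "barney"=>{:name=>"barney", :age=>42}}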
