Hash to CSV
hash = {
  "employee" => [
    {
      "name" => "Claude",
      "lastname" => "David",
      "profile" => [
        "age" => "43",
        "jobs" => [
          {
            "name" => "Ingeneer",
            "year" => "5"
          }
        ],
        "graduate" => [
          {
            "place" => "Oxford",
            "year" => "1990"
          }
        ],
        "kids" => [
          {
            "name" => "Viktor",
            "age" => "18"
          }
        ]
      ]
    }
  ]
}
This is an example of the hash I'm working with. As you can see, there are many levels of nested arrays in it.
My question is: how do I write it properly to a CSV file?
I tried this:
require 'csv'

column_names = hash['employee'].first.keys
s = CSV.generate do |csv|
  csv << column_names
  hash['employee'].each do |x|
    csv << x.values
  end
end
File.write('myCSV.csv', s)
but I only get name, lastname and profile as keys, whereas I want to capture all of them (age, jobs, name, year, graduate, place...).
Besides, how can I put one value in each cell? Right now each employee[x] ends up in a single cell on its own. Is there a parameter I have missed?
PS: this could be a follow-up to this post
A valid CSV output has a fixed number of columns, but your hash has a variable number of values: the keys jobs, graduate and kids could each hold multiple entries.
If your only goal is to produce CSV output that can be read in Excel, for example, you could enumerate your Hash, take the maximum number of key/value pairs per key, total it, and then write your CSV output, filling the blank values with "" (a rough sketch of that idea follows below).
There are plenty of examples here on Stack Overflow, search for "deep hash" to start with.
Your result would have a different number of columns with each Hash you provide it.
That's too much work if you ask me.
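Still, here is a minimal sketch of that padding idea for an array of flat hashes with varying keys (the nested case would need flattening first; the sample rows here are made up):
require 'csv'

rows = [
  { "name" => "Claude", "age" => "43" },
  { "name" => "Viktor" }
]

# The union of all keys becomes the fixed header row.
headers = rows.flat_map(&:keys).uniq

csv = CSV.generate do |out|
  out << headers
  # Fill the blanks with "" so every row has the same number of columns.
  rows.each { |row| out << headers.map { |h| row.fetch(h, "") } }
end

puts csv
# name,age
# Claude,43
# Viktor,""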
If you just want to present a readable result, your best and easiest option is to convert the Hash to YAML, which was created for readability:
require 'yaml'
hash = {.....}
puts hash.to_yaml
employee:
- name: Claude
  lastname: David
  profile:
  - age: '43'
    jobs:
    - name: Ingeneer
      year: '5'
    graduate:
    - place: Oxford
      year: '1990'
    kids:
    - name: Viktor
      age: '18'
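If you'd rather write that YAML to a file than print it, a one-liner like this works (the filename is just an example):
File.write('employees.yml', hash.to_yaml)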
If you want to convert the hash to a CSV file or record, you'll need to get a 'flat' representation of your keys and values. Something like the following:
h = {
  a: 1,
  b: {
    c: 3,
    d: 4,
    e: {
      f: 5
    },
    g: 6
  }
}

# non-Hash keys at this level first, then recurse into nested hashes
def flat_keys(h)
  h.keys.reject { |k| h[k].is_a?(Hash) } +
    h.values.select { |v| v.is_a?(Hash) }.flat_map { |v| flat_keys(v) }
end

flat_keys(h)
#=> [:a, :c, :d, :g, :f]

# leaf values, depth-first in encounter order
def flat_values(h)
  h.values.flat_map { |v| v.is_a?(Hash) ? flat_values(v) : v }
end

flat_values(h)
#=> [1, 3, 4, 5, 6]
Then you can apply that to create a CSV output.
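One caveat if you zip those two results together: flat_keys lists the non-Hash keys of a level before recursing (so :g comes before :f), while flat_values recurses in encounter order (so 5 comes before 6), which would mispair :g and :f with each other's values. A minimal sketch that keeps each key with its value in a single pass and then builds the CSV (flat_pairs is my own name, not from the answer above):
require 'csv'

# Walk the hash once, keeping each leaf key next to its value.
def flat_pairs(h)
  h.flat_map { |k, v| v.is_a?(Hash) ? flat_pairs(v) : [[k, v]] }
end

pairs = flat_pairs(h)
s = CSV.generate do |csv|
  csv << pairs.map(&:first) # header row: a,c,d,f,g
  csv << pairs.map(&:last)  # value row:  1,3,4,5,6
end
puts s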
It depends on how those fields are represented in the database.
For example, your jobs has a hash with a name key and your kids also has a hash with a name key, so you can't just 'flatten' them, because keys have to be unique.
jobs is probably another model (database table), so you probably would have to (depending on the database) write it separately, including things like the id of the related object and so on.
Are you sure you're not in over your head? Judging from your last question, you seem to treat CSVs as simple key-value pairs, omitting all the database representation and relations.
Related
I have a matrix like this:
[
["name", "company1", "company2", "company3"],
["hr_admin", "Tom", "Joane", "Kris"],
["manager", "Philip", "Daemon", "Kristy"]
]
How can I convert it into this data structure?
{
"company1" => {
"hr_admin"=> "Tom",
"manager" => "Philip"
},
"Company2" => {
"hr_admin"=> "Joane",
"manager" => "Daemon"
},
"company3" => {
"hr_admin"=> "Kris",
"manager" => "Kristy"
}
}
I have tried an approach like taking out the first row of the matrix (the header) and zipping the rest of the matrix to change their positions. It worked to some extent, but it doesn't look very good, so I'm turning here for help.
matrix[0][1...matrix[0].length].each_with_index.map do |company, i|
  values = matrix[1..matrix.length].map do |row|
    [row[0], row[i + 1]]
  end.to_h
  [company, values]
end.to_h
matrix[0].length and matrix.length can be omitted on newer Ruby versions (2.6+ supports endless ranges, so matrix[0][1..] and matrix[1..] work).
First you take all elements of the first row except the first one.
Then you map each of them, together with its index, to e.g. [["hr_admin", "Tom"], ["manager", "Philip"]], using the index to pick the right column.
Then you call to_h on every element and on the whole array, as the usage example below shows.
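Wrapped in a method and run against the sample matrix, it produces the desired hash (the method name columns_to_hash is just for illustration):
def columns_to_hash(matrix)
  matrix[0][1...matrix[0].length].each_with_index.map do |company, i|
    values = matrix[1..matrix.length].map { |row| [row[0], row[i + 1]] }.to_h
    [company, values]
  end.to_h
end

matrix = [
  ["name", "company1", "company2", "company3"],
  ["hr_admin", "Tom", "Joane", "Kris"],
  ["manager", "Philip", "Daemon", "Kristy"]
]

columns_to_hash(matrix)
#=> {"company1"=>{"hr_admin"=>"Tom", "manager"=>"Philip"},
#    "company2"=>{"hr_admin"=>"Joane", "manager"=>"Daemon"},
#    "company3"=>{"hr_admin"=>"Kris", "manager"=>"Kristy"}}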
arr = [
["name", "company1", "company2", "company3"],
["hr_admin", "Tom", "Joane", "Kris"],
["manager", "Philip", "Daemon", "Kristy"]
]
Each key-value pair of the hash to be constructed is formed from the "columns" of arr. It therefore is convenient to compute the transpose of arr:
(_, *positions), *by_company = arr.transpose
#=> [["name", "hr_admin", "manager"],
# ["company1", "Tom", "Philip"],
# ["company2", "Joane", "Daemon"],
# ["company3", "Kris", "Kristy"]]
I made use of Ruby's array decomposition (a.k.a. array destructuring) feature (see this blog for elaboration) to assign different parts of the transpose of arr to variables. Those values are as follows1.
_ #=> "name"
positions
#=> ["hr_admin", "manager"]
by_company
#=> [["company1", "Tom", "Philip"],
# ["company2", "Joane", "Daemon"],
# ["company3", "Kris", "Kristy"]]
It is now a simple matter to form the desired hash. Once again I will use array decomposition to advantage.
by_company.each_with_object({}) do |(company_name, *employees),h|
h[company_name] = positions.zip(employees).to_h
end
#=> {"company1"=>{"hr_admin"=>"Tom", "manager"=>"Philip"},
# "company2"=>{"hr_admin"=>"Joane", "manager"=>"Daemon"},
# "company3"=>{"hr_admin"=>"Kris", "manager"=>"Kristy"}}
When, for example,
company_name, *employees = ["company1", "Tom", "Philip"]
company_name
#=> "company1"
employees
#=> ["Tom", "Philip"]
so
h[company_name] = positions.zip(employees).to_h
h["company1"] = ["hr_admin", "manager"].zip(["Tom", "Philip"]).to_h
= [["hr_admin", "Tom"], ["manager", "Philip"]].to_h
= {"hr_admin"=>"Tom", "manager"=>"Philip"}
Note that these calculations do not depend on the numbers of rows or columns of arr.
1. As is common practice, I used the special variable _ to signal to the reader that its value is not used in subsequent calculations.
For example, I have
array = [ {name: 'robert', nationality: 'asian', age: 10},
{name: 'robert', nationality: 'asian', age: 5},
{name: 'sira', nationality: 'african', age: 15} ]
I want to get the result as
array = [ {name: 'robert', nationality: 'asian', age: 15},
{name: 'sira', nationality: 'african', age: 15} ]
since there are two Roberts with the same nationality.
Any help would be much appreciated.
I have tried array.uniq! { |e| e[:name] && e[:nationality] }, but I also want to add together the ages from the two matching hashes, i.e. 10 + 5.
P.S.: The array can contain any number of hashes.
I would start with something like this:
array = [
{ name: 'robert', nationality: 'asian', age: 10 },
{ name: 'robert', nationality: 'asian', age: 5 },
{ name: 'sira', nationality: 'african', age: 15 }
]
array.group_by { |e| e.values_at(:name, :nationality) }
.map { |_, vs| vs.first.merge(age: vs.sum { |v| v[:age] }) }
#=> [
# {
# :name => "robert",
# :nationality => "asian",
# :age => 15
# }, {
# :name => "sira",
# :nationality => "african",
# :age => 15
# }
# ]
Let's take a look at what you want to accomplish and go from there. You have a list of objects, and you want to merge certain objects together if they have the same name and nationality. So we have a key by which we will merge. Let's put that in programming terms.
key = proc { |x| [x[:name], x[:nationality]] }
We've defined a procedure which takes a hash and returns its "key" value. If this procedure returns the same value (according to eql?) for two hashes, then those two hashes need to be merged together. Now, what do we mean by "merge"? You want to add the ages together, so let's write a merge function.
merge = proc { |x, y| x.dup.tap { |x1| x1[:age] += y[:age] } }
If we have two values x and y such that key[x] and key[y] are the same, we want to merge them by making a copy of x and adding y's age to it. That's exactly what this procedure does. Now that we have our building blocks, we can write the algorithm.
We want to produce an array at the end, after merging using the key procedure we've written. Fortunately, Ruby has a handy function called each_with_object which will do something very nice for us. The method each_with_object will execute its block for each element of the array, passing in a predetermined value as the other argument. This will come in handy here.
result = array.each_with_object({}) do |x, hsh|
# ...
end.values
Since we're using keys and values to do the merge, the most efficient way to do this is going to be with a hash. Hence, we pass in an empty hash as the extra object, which we'll modify to accumulate the merge results. At the end, we don't care about the keys anymore, so we write .values to get just the objects themselves. Now for the final pieces.
if hsh.include? key[x]
hsh[ key[x] ] = merge.call hsh[ key[x] ], x
else
hsh[ key[x] ] = x
end
Let's break this down. If the hash already includes key[x], which is the key for the object x that we're looking at, then we want to merge x with the value that is currently at key[x]. This is where we add the ages together. This approach only works if the merge function is what mathematicians call a semigroup, which is a fancy way of saying that the operation is associative. You don't need to worry too much about that; addition is a very good example of a semigroup, so it works here.
Anyway, if the key doesn't exist in the hash, we want to put the current value in the hash at the key position. The resulting hash from merging is returned, and then we can get the values out of it to get the result you wanted.
key = proc { |x| [x[:name], x[:nationality]] }
merge = proc { |x, y| x.dup.tap { |x1| x1[:age] += y[:age] } }
result = array.each_with_object({}) do |x, hsh|
if hsh.include? key[x]
hsh[ key[x] ] = merge.call hsh[ key[x] ], x
else
hsh[ key[x] ] = x
end
end.values
Now, my complexity theory is a bit rusty, but if Ruby implements its hash type efficiently (which I'm fairly certain it does), then this merge algorithm is O(n), which means it will take a linear amount of time to finish, given the problem size as input.
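Running this against the sample array from the question prints the merged result:
p result
#=> [{:name=>"robert", :nationality=>"asian", :age=>15},
#    {:name=>"sira", :nationality=>"african", :age=>15}]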
array.each_with_object(Hash.new(0)) { |g, h| h[[g[:name], g[:nationality]]] += g[:age] }.
      map { |(name, nationality), age| { name: name, nationality: nationality, age: age } }
  #=> [{ :name=>"robert", :nationality=>"asian", :age=>15 },
  #    { :name=>"sira", :nationality=>"african", :age=>15 }]
The two steps are as follows.
a = array.each_with_object(Hash.new(0)) { |g,h| h[[g[:name], g[:nationality]]] += g[:age] }
#=> { ["robert", "asian"]=>15, ["sira", "african"]=>15 }
This uses the class method Hash::new to create a hash with a default value of zero (the hash is represented by the block variable h). Once this hash has been obtained it is a simple matter to construct the desired hash:
a.map { |(name, nationality),age| { name:name, nationality:nationality, age:age } }
h = {
  users: {
    u_548912: {
      name: "John",
      age: 30
    },
    u_598715: {
      name: "Doe",
      age: 30
    }
  }
}
Given a hash like above, say I want to get user John, I can do
h[:users].values.first[:name] # => "John"
In Ruby 2.3+, Hash#dig can do the same thing:
h.dig(:users, :u_548912, :name) # => "John"
But given that u_548912 is just a random number (no way to know it beforehand), is there a way to get the information while still using Hash#dig?
You can, of course, pass an expression as an argument to #dig:
h.dig(:users, h.dig(:users)&.keys&.first, :name)
#=> John
Extract the key if you want more legibility, at the cost of lines of code:
first_user_id = h.dig(:users)&.keys&.first
h.dig(:users, first_user_id, :name)
#=> John
Another option would be to chain your #dig method calls. This is shorter, but a bit less legible.
h.dig(:users)&.values&.dig(0, :name)
#=> John
I'm afraid there is no "neater" way of doing this while still having safe navigation.
I have an array of hashes (CSV rows, actually) and I need to find and keep all the rows that match two specific keys (user, section). Here is a sample of the data:
[
{ user: 1, role: "staff", section: 123 },
{ user: 2, role: "staff", section: 456 },
{ user: 3, role: "staff", section: 123 },
{ user: 1, role: "exec", section: 123 },
{ user: 2, role: "exec", section: 456 },
{ user: 3, role: "staff", section: 789 }
]
So what I would need to return is an array that contained only the rows where the same user/section combo appears more than once, like so:
[
{ user: 1, role: "staff", section: 123 },
{ user: 1, role: "exec", section: 123 },
{ user: 2, role: "staff", section: 456 },
{ user: 2, role: "exec", section: 456 }
]
The double loop solution I'm trying looks like this:
enrollments.each_with_index do |a, ai|
  enrollments.each_with_index do |b, bi|
    next if ai == bi
    duplicates << b if a[2] == b[2] && a[6] == b[6]
  end
end
but since the CSV is 145K rows it's taking forever.
How can I more efficiently get the output I need?
In terms of efficiency you might want to try this:
grouped = csv_arr.group_by{|row| [row[:user],row[:section]]}
filtered = grouped.values.select { |a| a.size > 1 }.flatten
The first statement groups the records by the :user and :section keys. The result is:
{[1, 123]=>[{:user=>1, :role=>"staff", :section=>123}, {:user=>1, :role=>"exec", :section=>123}],
[2, 456]=>[{:user=>2, :role=>"staff", :section=>456}, {:user=>2, :role=>"exec", :section=>456}],
[3, 123]=>[{:user=>3, :role=>"staff", :section=>123}],
[3, 789]=>[{:user=>3, :role=>"staff", :section=>789}]}
The second statement only selects the values of the groups with more than one member and then it flattens the result to give you:
[{:user=>1, :role=>"staff", :section=>123},
{:user=>1, :role=>"exec", :section=>123},
{:user=>2, :role=>"staff", :section=>456},
{:user=>2, :role=>"exec", :section=>456}]
This should improve the speed of your operation, but memory-wise I can't say what the effect would be with a large input; it depends on your machine, its resources, and the size of the file.
To do this check in memory you don't need a double loop; you can keep an array of unique values and check each new CSV line against it:
require 'csv'

found = []
unique_enrollments = []

CSV.foreach('/path/to/csv') do |row|
  # do whatever you're doing to parse this row into the hash you show in your question:
  # => { user: 1, role: "staff", section: 123 }
  # you might have to do `next if row.header_row?` if the first row is the header
  enrollment = parse_row_into_enrollment_hash(row)
  unique_tuple = [enrollment[:user], enrollment[:section]]
  unless found.include? unique_tuple
    found << unique_tuple
    unique_enrollments << enrollment
  end
end
Now you have unique_enrollments. With this approach you parse the CSV line by line, so you're not keeping the whole file in memory; you build only a small array of [user, section] tuples for the uniqueness check, alongside the array of unique rows.
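One performance caveat: found.include? scans the whole array on every row, which adds up over 145K rows. Swapping the array for a Set from Ruby's standard library makes the membership check effectively constant time; a sketch of the same loop with that change:
require 'csv'
require 'set'

found = Set.new
unique_enrollments = []

CSV.foreach('/path/to/csv') do |row|
  enrollment = parse_row_into_enrollment_hash(row) # same hypothetical parser as above
  unique_tuple = [enrollment[:user], enrollment[:section]]
  # Set#add? returns nil if the tuple was already present, so this
  # keeps only the first row seen for each user/section combination.
  unique_enrollments << enrollment if found.add?(unique_tuple)
end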
You can further optimize this by not saving the unique_enrollments in a big array, but rather just building your Model objects and saving them straight to the db:
unless found.include? unique_tuple
  found << unique_tuple
  Enrollment.create enrollment
end
With the above tweak you'll save on memory by not keeping a big array of enrollments, although the drawback is that if something blows up you won't be able to roll back. Had we kept the array of unique_enrollments instead, at the end you could do:
Enrollment.transaction do
unique_enrollments.each &:save!
end
And now you have the ability to roll back if any of those saves blow up. Also, wrapping a bunch of db calls in a single transaction is much faster. I'd go with this approach.
Edit: using the array of unique_enrollments, you can iterate over it at the end and create a new CSV:
CSV.open('path/to/new/csv', 'w') do |csv|
  csv << ['user', 'role', 'section'] # write the header
  unique_enrollments.each do |enrollment|
    csv << enrollment.values # just the values, not the keys
  end
end
I have a set of data that is an array of hashes, with each hash representing one record of data:
data = [
  {
    :id => "12345",
    :bucket_1_rank => "2",
    :bucket_1_count => "12",
    :bucket_2_rank => "7",
    :bucket_2_count => "25"
  },
  {
    :id => "45678",
    :bucket_1_rank => "2",
    :bucket_1_count => "15",
    :bucket_2_rank => "9",
    :bucket_2_count => "68"
  },
  {
    :id => "78901",
    :bucket_1_rank => "5",
    :bucket_1_count => "36"
  }
]
The rank values are always between 1 and 10.
What I am trying to do is select each of the possible values of the rank fields (:bucket_1_rank and :bucket_2_rank) as keys in my final result set, where the value for each key is an array of all the values in the associated count field. So, for the data above, the final resulting structure I have in mind is something like:
bucket 1:
{"2" => ["12", "15"], "5" => ["36"]}
bucket 2:
{"7" => ["25"], "9" => ["68"]}
I can do this if I assume the field names stay the same, by hard-coding the field/key names or just using group_by on the fields I need. My problem is that I work with a different data set each month, and the rank fields are named slightly differently depending on the project specs, so I want to identify the count and rank field names dynamically instead of hard-coding them.
I wrote two quick helpers, get_ranks and get_counts, that use a regex to return an array of field names that are either rank or count fields, since these fields will always have the literal string "_rank" or "_count" in their names:
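(The helpers themselves aren't shown in the question; a minimal sketch of what they might look like, written here to take the data as an argument, whereas the snippet below calls them without arguments, presumably closing over the data:)
# Hypothetical versions of the helpers described above.
def get_ranks(data)
  data.flat_map(&:keys).uniq.select { |k| k.to_s.end_with?("_rank") }
end

def get_counts(data)
  data.flat_map(&:keys).uniq.select { |k| k.to_s.end_with?("_count") }
end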
ranks = get_ranks
counts = get_counts

results = Hash.new { |h, k| h[k] = [] }
data.each do |i|
  ranks.each do |r|
    unless i[r].nil?
      counts.each do |c|
        results[i[r]] << i[c]
      end
    end
  end
end
p results
This seems close, but it feels awkward, and it seems to me there has to be a better way to iterate through this data set. Since I haven't worked much in Ruby on this project, I'd like to use this as an opportunity to improve my understanding of iterating through arrays of hashes, populating a hash with arrays as values, etc. Any resources/suggestions would be much appreciated.
You could shorten it to:
result = Hash.new { |h, k| h[k] = Hash.new { |h2, k2| h2[k2] = [] } }

data.each do |hsh|
  hsh.each do |key, value|
    result[$1][value] << hsh["#{$1}_count".to_sym] if key =~ /(.*)_rank$/
  end
end

puts result
#=> {"bucket_1"=>{"2"=>["12", "15"], "5"=>["36"]}, "bucket_2"=>{"7"=>["25"], "9"=>["68"]}}
Though this is assuming that :bucket_2_item_count is actually supposed to be :bucket_2_count.