Transform data from one column to multiple columns in Ruby

I have two columns of data in CSV format, as shown below, from a prediction server. The first column is an index position for each variable within a prediction, so a new prediction starts whenever the index resets to 1.
1,2.0
2,1.5
3,1.4
1,1.1
2,2.0
3,1.5
4,2.0
5,1.6
1,2.0
2,4.0
.
.
.
I would like to have the data in this format instead,
2.0,1.1,2.0
1.5,2.0,4.0
1.4,1.5
2.0
1.6
For ease of work, the empty 'cells' can be filled with zeros or '#', e.g.:
2.0,1.1,2.0
1.5,2.0,4.0
1.4,1.5,0
0,2.0,0
0,1.6,0
Does anyone have an elegant way to do this in Ruby?

Let's try to transpose it with Array#transpose:
require 'csv'
# first get a 2D representation of the data: one array of values per prediction
# (fn is the path to your csv file)
rows = CSV.read(fn).slice_before { |row| row[0] == "1" }.map { |group| group.map { |row| row[1] } }
# we want to transpose the array, but first we have to pad the shorter groups
max_length = rows.map(&:length).max
rows.each { |row| row.fill('#', row.length...max_length) }
# now we can transpose the array
pp rows.transpose
For the sample data in the question this produces:
[["2.0", "1.1", "2.0"],
 ["1.5", "2.0", "4.0"],
 ["1.4", "1.5", "#"],
 ["#", "2.0", "#"],
 ["#", "1.6", "#"]]

This should work for you:
require 'csv'
# read csv contents from the input file into an array of [index, value] rows
rows = CSV.read("path/to/in_file.csv")
res = Hash.new { |h, k| h[k] = [] }
rows.each do |(key, val)|
  res[key] << val
end
# write to output csv file
CSV.open("path/to/out_file.csv", "wb") do |csv|
  # sort res by its numeric keys, keep just the value arrays, and add each as a row
  res.sort_by { |k, _v| k.to_i }.map { |_k, v| v }.each do |r|
    csv << r
  end
end
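For the sample rows at the top of the question, res ends up grouping the values by their index, so each value array becomes one output row (shorter rows are simply left unpadded):
res #=> {"1"=>["2.0", "1.1", "2.0"],
    #    "2"=>["1.5", "2.0", "4.0"],
    #    "3"=>["1.4", "1.5"],
    #    "4"=>["2.0"],
    #    "5"=>["1.6"]}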

Related

Create CSV::Table from 2D array

I have a 2D array... is there any way to create a CSV::Table with the first row treated as headers, assuming all rows have the same number of columns?
You can create a CSV::Table object with headers from a 2D array using CSV.parse.
First convert your 2D array to a string in which the values in each row are joined by commas and the rows are joined by newlines, then pass that string to CSV.parse along with the headers: true option.
require 'csv'
sample_array = [
  ["column1", "column2", "column3"],
  ["r1c1", "r1c2", "r1c3"],
  ["r2c1", "r2c2", "r2c3"],
  ["r3c1", "r3c2", "r3c3"],
]
csv_data = sample_array.map { _1.join(",") }.join("\n")
table = CSV.parse(csv_data, headers: true)
p table
p table.headers
p table[0]
p table[1]
p table[2]
=>
#<CSV::Table mode:col_or_row row_count:4>
["column1", "column2", "column3"]
#<CSV::Row "column1":"r1c1" "column2":"r1c2" "column3":"r1c3">
#<CSV::Row "column1":"r2c1" "column2":"r2c2" "column3":"r2c3">
#<CSV::Row "column1":"r3c1" "column2":"r3c2" "column3":"r3c3">
Below is a basic example of creating a CSV file in Ruby:
hash = {a: [1, 2, 3], b: [4, 5, 6]}
require 'csv'
CSV.open("my_file.csv", "wb") do |csv|
  csv << %w(header1 header2 header3)
  hash.each_value do |array|
    csv << array
  end
end
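For that hash, the resulting my_file.csv would contain:
header1,header2,header3
1,2,3
4,5,6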
@diwanshu-tyagi, will this help resolve your question? If not, please add an example of your input values and I'll update this answer.
Thanks

How to generate the expected output by using split method used in my code?

Question:
Create a method for Array that returns a hash having 'key' as length of the element and value as an array of all the elements of that length. Make use of Array#each.
Returned Hash should be sorted by key.
I have tried to do it by sorting a hash keyed on length. I have almost solved it with another method, but I want to use split and a hash to achieve the expected output.
Can anyone suggest any amendments in my code below?
Input argument:
array-hash.rb "['abc','def',1234,234,'abcd','x','mnop',5,'zZzZ']"
Expected output:
{1=>["x", "5"], 3=>["abc", "def", "234"], 4=>["1234", "abcd", "mnop", "zZzZ"]}
class String
  def key_length(v2)
    hash = {}
    v2.each do |item|
      item_length = item.to_s.length
      hash[item_length] ||= []
      hash[item_length].push(item)
    end
    Hash[hash.sort]
  end
end
reader = ''
if ARGV.empty?
  puts 'Please provide an input'
else
  v1 = ARGV[0]
  v2 = v1.tr("'[]''", '').split
  p reader.key_length(v2)
end
Actual output:
{35=>["abc,def,1234,234,abcd,x,mnop,5,zZzZ"]}
Given the array (converted from the string; note that the integers end up as strings):
ary = str[1..-2].delete('\'').split(',')
ary #=> ["abc", "def", "1234", "234", "abcd", "x", "mnop", "5", "zZzZ"]
The most "idiomatic" way should be using group_by:
ary.group_by(&:size)
If you want to use each, then you could use Enumerable#each_with_object, where the object is a Hash.new with an empty array as its default:
ary.each_with_object(Hash.new{ |h,k| h[k] = []}) { |e, h| h[e.size] << e }
Which is the same as
res = Hash.new{ |h,k| h[k] = []}
ary.each { |e| res[e.size] << e }
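Since the exercise also wants the result sorted by key, you can sort the grouped hash afterwards; for the ary above:
ary.group_by(&:length).sort.to_h
#=> {1=>["x", "5"], 3=>["abc", "def", "234"], 4=>["1234", "abcd", "mnop", "zZzZ"]}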
Not sure why you need to monkeypatch Array here; is this a school exercise or something?
I think your bug is that you need to pass the comma delimiter argument to split.
I would solve the underlying problem as a reduce/inject/fold thing, myself.
s = "['abc','def',1234,234,'abcd','x','mnop',5,'zZzZ']"
splits = s.tr("'[]''",'').split(',') # need to pass in the comma for the split
Hash[splits.inject({}) { |memo,s| memo[s.length] ||= []; memo[s.length] << s; memo }.sort] # doesn't use Array.each but?
{1=>["x", "5"], 3=>["def", "234"], 4=>["1234", "abcd", "mnop"],
5=>["['abc"], 6=>["zZzZ']"]}

Merge duplicate values in json using ruby

I have the following item.json file
{
  "items": [
    {
      "brand": "LEGO",
      "stock": 55,
      "full-price": "22.99"
    },
    {
      "brand": "Nano Blocks",
      "stock": 12,
      "full-price": "49.99"
    },
    {
      "brand": "LEGO",
      "stock": 5,
      "full-price": "199.99"
    }
  ]
}
There are two items with the brand LEGO, and I want to get the total stock for each individual brand.
In the Ruby file item.rb I have code like:
require 'json'
path = File.join(File.dirname(__FILE__), '../data/products.json')
file = File.read(path)
products_hash = JSON.parse(file)
products_hash["items"].each do |brand|
puts "Stock no: #{brand["stock"]}"
end
This prints the stock number for each item individually, but I need the stock for the two items with brand "LEGO" to be summed and displayed as one.
Does anyone have a solution for this?
json = File.open(path,'r:utf-8',&:read) # in case the JSON uses UTF-8
items = JSON.parse(json)['items']
stock_by_brand = items
  .group_by { |h| h['brand'] }
  .map do |brand, array|
    [brand,
     array
       .map { |item| item['stock'] }
       .inject(:+)]
  end
  .to_h
#=> {"LEGO"=>60, "Nano Blocks"=>12}
It works like this:
Enumerable#group_by takes the array of items and creates a hash mapping the brand name to an array of all item hashes with that brand
Enumerable#map turns each brand/array pair in that hash into an array of the brand (unchanged) followed by:
Enumerable#map on the array of items picks out just the "stock" counts, and then
Enumerable#inject sums them all together
Array#to_h then turns that array of two-value arrays into a hash, mapping the brand to the sum of stock values.
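For example, with the items above, the intermediate group_by result looks like this (full-price keys elided for brevity):
items.group_by { |h| h['brand'] }
#=> {"LEGO"        => [{"brand"=>"LEGO", "stock"=>55, ...}, {"brand"=>"LEGO", "stock"=>5, ...}],
#    "Nano Blocks" => [{"brand"=>"Nano Blocks", "stock"=>12, ...}]}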
If you want simpler code that's less functional and possibly easier to understand:
stock_by_brand = {} # an empty hash
items.each do |item|
  stock_by_brand[item['brand']] ||= 0 # initialize to zero if unset
  stock_by_brand[item['brand']] += item['stock']
end
p stock_by_brand #=> {"LEGO"=>60, "Nano Blocks"=>12}
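If you are on Ruby 2.4 or newer, the same result can also be had with group_by plus transform_values; a short sketch of that variant:
stock_by_brand = items
  .group_by { |item| item['brand'] }
  .transform_values { |group| group.sum { |item| item['stock'] } }
#=> {"LEGO"=>60, "Nano Blocks"=>12}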
To see what your JSON string looks like, let's create it from your hash, which I've denoted h:
require 'json'
j = JSON.generate(h)
#=> "{\"items\":[{\"brand\":\"LEGO\",\"stock\":55,\"full-price\":\"22.99\"},{\"brand\":\"Nano Blocks\",\"stock\":12,\"full-price\":\"49.99\"},{\"brand\":\"LEGO\",\"stock\":5,\"full-price\":\"199.99\"}]}"
After reading that from a file, into the variable j, we can now parse it to obtain the value of "items":
arr = JSON.parse(j)["items"]
#=> [{"brand"=>"LEGO", "stock"=>55, "full-price"=>"22.99"},
# {"brand"=>"Nano Blocks", "stock"=>12, "full-price"=>"49.99"},
# {"brand"=>"LEGO", "stock"=>5, "full-price"=>"199.99"}]
One way to obtain the desired tallies is to use a counting hash:
arr.each_with_object(Hash.new(0)) {|g,h| h.update(g["brand"]=>h[g["brand"]]+g["stock"])}
#=> {"LEGO"=>60, "Nano Blocks"=>12}
Hash.new(0) creates an empty hash (represented by the block variable h) with a default value of zero¹. That means that h[k] returns zero if the hash does not have the key k.
For the first element of arr (represented by the block variable g) we have:
g["brand"] #=> "LEGO"
g["stock"] #=> 55
Within the block, therefore, the calculation is:
g["brand"] => h[g["brand"]]+g["stock"]
#=> "LEGO" => h["LEGO"] + 55
Initially h has no keys, so h["LEGO"] returns the default value of zero, resulting in { "LEGO"=>55 } being merged into the hash h. As h now has the key "LEGO", h["LEGO"] will not return the default value in subsequent calculations.
Another approach is to use the form of Hash#update (aka merge!) that employs a block to determine the values of keys that are present in both hashes being merged:
arr.each_with_object({}) {|g,h| h.update(g["brand"]=>g["stock"]) {|_,o,n| o+n}}
#=> {"LEGO"=>60, "Nano Blocks"=>12}
¹ k=>v is shorthand for { k=>v } when it appears as a method's argument.

open two csv file in ruby and copy some of the columns from both files into third file

Open two CSV files. The field names and values for label and sample are the same in both files; combine average_old and average_new into a third CSV.
I could find some results mentioning FasterCSV, but this is totally new to me. Any small snippet is appreciated. If a row exists in only one file, I want to keep it in the new file.
For example:
File1.csv
label,sample,average_old
t1,10,12
t2,11,13
File2.csv
label,sample,average_new
t1,10,16
t2,11,15
File3.csv should be
label,sample,average_old,average_new
t1,10,12,16
t2,11,13,15
require 'csv'
CSV.foreach('/tmp/File2.csv').inject(
  CSV.foreach('/tmp/File1.csv').inject({}) do |memo, row|
    (memo[row.first] ||= []) << row
    memo
  end
) do |memo, row|
  (memo[row.first] ||= []) << row
  memo
end.map(&:flatten).map(&:uniq)
#⇒ [
# ["label", "sample", "average_old", "average_new"],
# ["t1", "10", "12", "16"],
# ["t2", "11", "13", "15"]
# ]
Writing this array to file should be easy.
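For instance, a minimal sketch (assuming the merged array from above is stored in a variable named merged):
CSV.open('/tmp/File3.csv', 'w') do |csv|
  merged.each { |row| csv << row }
end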

How to replace CSV headers

If using the 'csv' library in ruby, how would you replace the headers without re-reading in a file?
foo.csv
'date','foo','bar'
1,2,3
4,5,6
Using a CSV::Table because of this answer
Here is a working solution; however, it requires writing to and reading from a file twice.
require 'csv'
csv = CSV.table('foo.csv')
# Perform additional operations, like removing specific pieces of information.
# Save the fixed csv to a file (with the incorrect headers)
File.open('bar.csv', 'w') do |f|
  f.write(csv.to_csv)
end
# New headers
new_keywords = ['dur', 'hur', 'whur']
# Reopen the file, replace the headers, and print it out for debugging.
# Not sure how to replace the headers of a CSV::Table object; however I *can* replace
# the headers of an array of arrays (hence the File.open).
lines = File.readlines('bar.csv')
lines.shift
lines.unshift(new_keywords.join(',') + "\n")
puts lines.join('')
# TODO: re-save file to disk
How could I modify the headers without reading from disk twice?
'dur','hur','whur'
1,x,3
4,5,x
Update
For those curious, here is the unabridged code. In order to use things like delete_if() the CSV must be imported with the CSV.table() function.
Perhaps the headers could be changed by converting the CSV table into an array of arrays; however, I'm not sure how to do that.
Given a test.csv file whose contents look like this:
id,name,age
1,jack,8
2,jill,9
You can replace the header row using this:
require 'csv'
array_of_arrays = CSV.read('test.csv')
p array_of_arrays # => [["id", "name", "age"],
                  # =>  ["1", "jack", "8"],
                  # =>  ["2", "jill", "9"]]
new_keywords = ['dur', 'hur', 'whur']
array_of_arrays[0] = new_keywords
p array_of_arrays # => [["dur", "hur", "whur"],
                  # =>  ["1", "jack", "8"],
                  # =>  ["2", "jill", "9"]]
Or if you'd rather preserve your original two-dimensional array:
new_array = Array.new(array_of_arrays)
new_array[0] = new_keywords
p new_array       # => [["dur", "hur", "whur"],
                  # =>  ["1", "jack", "8"],
                  # =>  ["2", "jill", "9"]]
p array_of_arrays # => [["id", "name", "age"],
                  # =>  ["1", "jack", "8"],
                  # =>  ["2", "jill", "9"]]
