Convert CSV file into array of hashes - ruby

I have a csv file, some hockey stats, for example:
09.09.2008,1,HC Vitkovice Steel,BK Mlada Boleslav,1:0 (PP)
09.09.2008,1,HC Lasselsberger Plzen,RI OKNA ZLIN,6:2
09.09.2008,1,HC Litvinov,HC Sparta Praha,3:5
I want to save them in an array of hashes. The file has no headers, and I would like to assign a key to each value, like "time" => "09.09.2008" and so on. Each line should be accessible as arr[i], and each value as, for example, arr[i]["time"]. I'd prefer to use the CSV class rather than FasterCSV or split. Can you show me the way, or point me to a thread where a similar problem was solved?

Just pass headers: true
CSV.foreach(data_file, headers: true) do |row|
puts row.inspect # a CSV::Row, which behaves like a hash (row.to_h gives a plain Hash)
end
From there, you can manipulate the hash however you like.
(Tested with Ruby 2.0, but I think this has worked for quite a while.)
Edit
You say you don't have any headers - could you add a header line to the beginning of the file contents after reading them?
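For example, a rough sketch (the header names and the file name here are just assumptions based on the sample rows):
require 'csv'
content = "time,number,team_1,team_2,score\n" + File.read('stats.csv')
arr = CSV.parse(content, headers: true).map(&:to_h)
arr[0]['time'] # => "09.09.2008"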

You can use the Ruby CSV parser to parse it, and then use Hash[ keys.zip(values) ] to make it a hash.
Example:
require 'csv'

test = <<~DATA
  09.09.2008,1,HC Vitkovice Steel,BK Mlada Boleslav,1:0 (PP)
  09.09.2008,1,HC Lasselsberger Plzen,RI OKNA ZLIN,6:2
  09.09.2008,1,HC Litvinov,HC Sparta Praha,3:5
DATA
keys = ['time', etc... ]
CSV.parse(test).map {|a| Hash[ keys.zip(a) ] }

This is a fantastic post by Josh Nichols which explains how to do what you're asking.
To summarize, here is his code:
csv = CSV.new(body, :headers => true, :header_converters => :symbol, :converters => [:all, :blank_to_nil])
csv.to_a.map {|row| row.to_hash }
=> [{:year=>1997, :make=>"Ford", :model=>"E350", :description=>"ac, abs, moon", :price=>3000.0}, {:year=>1999, :make=>"Chevy", :model=>"Venture \"Extended Edition\"", :description=>nil, :price=>4900.0}, {:year=>1999, :make=>"Chevy", :model=>"Venture \"Extended Edition, Very Large\"", :description=>nil, :price=>5000.0}, {:year=>1996, :make=>"Jeep", :model=>"Grand Cherokee", :description=>"MUST SELL!\nair, moon roof, loaded", :price=>4799.0}]
So, you could save the body of your CSV file into a string called body.
body = "09.09.2008,1,HC Vitkovice Steel,BK Mlada Boleslav,1:0 (PP)
09.09.2008,1,HC Lasselsberger Plzen,RI OKNA ZLIN,6:2
09.09.2008,1,HC Litvinov,HC Sparta Praha,3:5"
And then run his code as listed above on it.
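One note: :blank_to_nil is not one of CSV's built-in converters; Josh defines it as a custom converter in his post, roughly like this:
CSV::Converters[:blank_to_nil] = lambda do |field|
  field && field.empty? ? nil : field
end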

A slightly shorter solution
Parse string:
CSV.parse(content, headers: :first_row).map(&:to_h)
Parse file:
CSV.open(filename, headers: :first_row).map(&:to_h)

Slight variation on Nathan Long's answer
data_file = './sheet.csv'
data = CSV.foreach(data_file, headers: true).map(&:to_h)
Now data is an array of hashes to do your bidding with!

The headers option to the CSV module accepts an array of strings to be used as the headers, when they're not present as the first row in the CSV content.
CSV.parse(content, headers: %w(time number team_1 team_2 score))
This returns a CSV::Table of CSV::Row objects keyed by the given headers; call .map(&:to_h) if you want plain hashes.
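With the sample rows, that looks like this:
table = CSV.parse(content, headers: %w(time number team_1 team_2 score))
table[0]['time']        # => "09.09.2008"
table[0]['score']       # => "1:0 (PP)"
table.map(&:to_h).first # => {"time"=>"09.09.2008", "number"=>"1", "team_1"=>"HC Vitkovice Steel", ...}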

You can try the following gem also
require 'csv_hasher'
arr_of_hashes = CSVHasher.hashify('/path/to/csv/file')
The keys of the returned hashes will be the header values of the csv file.
If you want to pass your own keys then
keys = [:key1, :key2, ... ]
arr_of_hashes = CSVHasher.hashify('/path/to/csv/file', { keys: keys })

I guess this is the shortest version:
keys = ["time", ...]
CSV.parse(content, headers: keys).map(&:to_h)

You could also use the SmarterCSV gem,
which returns data from CSV files as Ruby hashes by default.
It has a lot of features, including processing the data in chunks, which is very beneficial for huge data files.
require 'smarter_csv'
options = {} # see GitHub README
data = SmarterCSV.process(your_file_name, options)
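For example, a rough sketch of the chunked mode (the chunk size and file name are arbitrary; check the README for the exact options your version supports):
require 'smarter_csv'
SmarterCSV.process('huge_file.csv', chunk_size: 500) do |chunk|
  # chunk is an array of up to 500 hashes
  chunk.each { |row_hash| puts row_hash.inspect }
end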

Related

How to add/read rows from Ruby CSV instance

Though it seems far more common for people to use the Ruby CSV class methods, I have an occasion to use a CSV instance, but it seems completely uncooperative.
What I'd like to do is create a CSV instance, add some rows to it, then be able to retrieve all those rows and write them to a file. Sadly, the following code doesn't work as I would like at all.
require 'csv'
csv = CSV.new('', headers: ['name', 'age'])
csv.read # Apparently I need to do this so that the headers are actually read in.
csv.add_row(['john', '22'])
csv.add_row(['jane', '24'])
csv.read
csv.to_a
csv.to_s
All I want is to be able to retrieve the information I put into the CSV and then write it to a file, but I can't seem to do that :/
What am I doing wrong?
You need to use CSV#rewind
Here is a sample:
require 'csv'
csv = CSV.new(File.new("data1.csv", "r+"), headers: ['name', 'age'], write_headers: true)
csv.add_row(['john', '22'])
csv.add_row(['jane', '24'])
p csv.to_a # Empty array
csv.rewind
p csv.to_a # Array with three CSV::Row objects (including header)
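If the end goal is simply to get those rows into a file, you can also let CSV manage the file directly instead of juggling an in-memory instance; a minimal sketch:
CSV.open('data2.csv', 'w', headers: ['name', 'age'], write_headers: true) do |csv|
  csv << ['john', '22']
  csv << ['jane', '24']
end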

How can I convert this CSV to JSON with Ruby?

I am trying to convert a CSV file to JSON using Ruby. I am very, very green when it comes to working with Ruby (or any language, for that matter), so the answers may need to be dumbed down for me. Putting it in JSON seems like the most reasonable solution to me because I understand how to work with JSON when assigning variables equal to the attributes that come in the response. If there is a better way to do it, feel free to teach me.
My CSV is in the following format:
Header1,Header,Header3
ValueX,ValueY,ValueZ
I would like to be able to use the data to say something along the lines of this:
For each ValueX in Row 1 after the headers, check if ValueZ is > ValueY. If yes, do this; if no, do that. I understand how to do the if statement, just not how to parse out my information into variables/arrays.
Any ideas here?
require 'csv'
require 'json'
rows = []
CSV.foreach('a.csv', headers: true, converters: :all) do |row|
rows << row.to_hash
end
puts rows.to_json
# => [{"Header1":"ValueX","Header":"ValueY","Header3":"ValueZ"}]
Here is a first pointer:
require 'csv'
data = CSV.read('your_file.csv', :col_sep => ',')
Now you should have the data in data; you can test in irb.
I don't entirely understand the question:
if z > y
# do this
else
# do that
end
For JSON output, you should be able to require 'json' and call to_json on the resulting structure (JSON.parse() goes the other way, from a JSON string back to Ruby objects).
The usual structure to serialize is a Hash, or an array of hashes.
You can populate your hash with the dataset from the CSV:
hash = Hash.new
hash[key_goes_here] = value_here
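A minimal sketch of that idea, keyed by the first column (which column to use as the key is just an assumption):
require 'csv'
require 'json'

hash = {}
CSV.read('your_file.csv', headers: true).each do |row|
  hash[row['Header1']] = row.to_h
end
puts hash.to_json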

Merging CSV tables with Ruby

I'm trying to join CSV files containing stock indexes with Ruby, and having a surprisingly hard time understanding the documentation out there. It's late, and I could use a friend, so go easy on me:
I have several files, with identical headers:
["Date", "Open", "High", "Low", "Close", "Volume"]
I would like my Ruby script to read each "Date" column and write a new CSV compiling an all-encompassing date range, from the earliest date to the latest.
Bonus:
Ideally, I would like to add all of the other column data ("Open", "High", etc.) into this new CSV file, with an extra column simply containing the source CSV's filename for reference.
Thanks for any consideration given to this. What I'd really like to do is sit down with a Ruby sensei to help me make sense of the documentation. How can I use the CSV.read() or CSV.foreach() do |x| methods to create arrays / hashes to perform upon?
(Theoretical and intelligent responses welcomed)
hypothetical:
CSV.read("data/DOW.csv") do |output|
puts output
end
returns:
[["Date", "Open", "High", "Low", "Close", "Volume"], ["2014-07-14", "71.35", "71.52", "70.82", "71.28", "823063.0"], ["2014-07-15", "71.32", "71.76", "71.0", "71.28", "813861.0"], ["2014-07-16", "71.34", "71.58", "70.68", "71.02", "843347.0"], ["2014-07-17", "70.54", "71.46", "70.54", "71.13", "1303839.0"], ["2014-07-18", "71.46", "72.95", "71.09", "72.46", "1375922.0"], ["2014-07-21", "72.21", "73.46", "71.88", "73.38", "1603854.0"], ["2014-07-22", "73.46", "74.76", "73.46", "74.57", "1335305.0"], ["2014-07-23", "74.54", "75.1", "73.77", "74.88", "1834953.0"]]
How can I identify rows, columns, etc? I'm looking for methods or ways to transform this array into hashes etc. Honestly, an overarching theoretical approach would suit my needs.
I've been playing with Ruby and CSV most of the day. I might be able to help (even though I am a beginner myself), but I don't understand what you want as output (a little example would help).
This example would load only columns "Date", "High" and "Volume" into "my_array".
my_array = []
CSV.foreach("data.csv") do |row|
my_array.push([row[0], row[2], row[5]])
end
If you want every column try:
my_array = []
CSV.foreach("data.csv") do |row|
my_array.push(row)
end
If you want to access an element of an array inside the array:
puts my_array[0][0].inspect #=> "Date"
puts my_array[1][0].inspect #=> "2014-07-14"
When you finally get the output you want, you can save it by redirecting from the command prompt:
ruby my_file.rb > output_in_text_form.txt
You can do something like this:
#!/usr/bin/env ruby
require 'csv'
input = ARGV.shift
output = ARGV.shift
File.open(output, 'w') do |o|
  csv_string = File.read(input)
  CSV.parse(csv_string).each do |r|
    # r is an array of columns. Do something with it.
    # ...
    # Generate the string version of the row.
    new_csv_row = CSV.generate_line(r, {:force_quotes => true})
    # Write it to the output file.
    o.puts new_csv_row
  end
end
Using files is optional. You can use shell redirection and directly read from STDIN and/or directly write to STDOUT.
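To tie this back to the bonus part of the question, here is one rough way to merge several files with identical headers into one CSV, tagging each row with its source file and sorting by date (the file names and output path are placeholders):
require 'csv'

headers = ['Date', 'Open', 'High', 'Low', 'Close', 'Volume']
files   = ['data/DOW.csv', 'data/SPX.csv'] # placeholder file names

# Read every file, turn each row into a hash, and remember where it came from.
merged = files.flat_map do |path|
  CSV.read(path, headers: true).map { |row| row.to_h.merge('File' => File.basename(path)) }
end

# ISO dates (YYYY-MM-DD) sort correctly as strings.
merged.sort_by! { |row| row['Date'] }

CSV.open('data/merged.csv', 'w') do |csv|
  csv << headers + ['File']
  merged.each { |row| csv << row.values_at(*(headers + ['File'])) }
end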

Parse CSV file with header fields as attributes for each row

I would like to parse a CSV file so that each row is treated like an object with the header row being the names of the attributes in the object. I could write this, but I'm sure it's already out there.
Here is my CSV input:
"foo","bar","baz"
1,2,3
"blah",7,"blam"
4,5,6
The code would look something like this:
CSV.open('my_file.csv','r') do |csv_obj|
puts csv_obj.foo #prints 1 the 1st time, "blah" 2nd time, etc
puts csv_obj.bar #prints 2 the first time, 7 the 2nd time, etc
end
With Ruby's CSV module I believe I can only access the fields by index. I think the above code would be a bit more readable. Any ideas?
Using Ruby 1.9 and above, you can get an indexable object:
CSV.foreach('my_file.csv', :headers => true) do |row|
puts row['foo'] # prints 1 the 1st time, "blah" 2nd time, etc
puts row['bar'] # prints 2 the first time, 7 the 2nd time, etc
end
It's not dot syntax but it is much nicer to work with than numeric indexes.
As an aside, for Ruby 1.8.x FasterCSV is what you need to use the above syntax.
Here is an example of the symbolic syntax using Ruby 1.9. In the examples below, the code reads a CSV file named data.csv from the Rails db directory.
:headers => true treats the first row as a header instead of a data row. The :header_converters => :symbol option then converts each cell in the header row into a Ruby symbol.
CSV.foreach("#{Rails.root}/db/data.csv", {:headers => true, :header_converters => :symbol}) do |row|
puts "#{row[:foo]},#{row[:bar]},#{row[:baz]}"
end
In Ruby 1.8:
require 'fastercsv'
FasterCSV.foreach("#{Rails.root}/db/data.csv", {:headers => true, :header_converters => :symbol}) do |row|
  puts "#{row[:foo]},#{row[:bar]},#{row[:baz]}"
end
Based on the CSV provided by Poul (the Stack Overflow asker), the output from the example code above will be:
1,2,3
blah,7,blam
4,5,6
Depending on the characters used in the headers of the CSV file, it may be necessary to output the headers in order to see how CSV (FasterCSV) converted the string headers to symbols. You can output the array of headers from within the CSV.foreach.
row.headers
Easy to get a hash in Ruby 2.3:
CSV.foreach('my_file.csv', headers: true, header_converters: :symbol) do |row|
puts row.to_h[:foo]
puts row.to_h[:bar]
end
Although I am pretty late to the discussion, a few months ago I started a "CSV to object mapper" at https://github.com/vicentereig/virgola.
Given your CSV contents, mapping them to an array of FooBar objects is pretty straightforward:
"foo","bar","baz"
1,2,3
"blah",7,"blam"
4,5,6
require 'virgola'
class FooBar
include Virgola
attribute :foo
attribute :bar
attribute :baz
end
csv = <<CSV
"foo","bar","baz"
1,2,3
"blah",7,"blam"
4,5,6
CSV
foo_bars = FooBar.parse(csv).all
foo_bars.each { |foo_bar| puts foo_bar.foo, foo_bar.bar, foo_bar.baz }
Since I hit this question with some frequency:
array_of_hashmaps = CSV.read("path/to/file.csv", headers: true)
puts array_of_hashmaps.first["foo"] # 1
This is the non-block version, for when you want to slurp the whole file. (Strictly speaking, CSV.read with headers: true returns a CSV::Table of CSV::Row objects rather than an array of hashes, but rows can be indexed by header name as shown; call .map(&:to_h) if you need plain hashes.)

how to store a Ruby array into a file?

How to store a Ruby array into a file?
I am not sure exactly what you want, but to serialize an array, write it to a file, and read it back, you can use this:
fruits = %w{mango banana apple guava}
=> ["mango", "banana", "apple", "guava"]
serialized_array = Marshal.dump(fruits)
=> "\004\b[\t\"\nmango\"\vbanana\"\napple\"\nguava"
File.open('/tmp/fruits_file.txt', 'w') {|f| f.write(serialized_array) }
=> 33
# read the file back
fruits = Marshal.load File.read('/tmp/fruits_file.txt')
=> ["mango", "banana", "apple", "guava"]
There are alternatives you can explore, like JSON and YAML.
To just dump the array to a file in the standard [a,b,c] format:
require 'pp'
$stdout = File.open('path/to/file.txt', 'w')
pp myArray
That might not be so helpful if you want to read it back later. In that case you could use JSON. Install it using RubyGems with gem install json.
require 'rubygems'
require 'json'
$stdout = File.open('path/to/file.txt', 'w')
puts myArray.to_json
Read it back:
require 'rubygems'
require 'json'
buffer = File.open('path/to/file.txt', 'r').read
myArray = JSON.parse(buffer)
There are multiple ways to dump an array to disk. You need to decide if you want to serialize in a binary format or in a text format.
For binary serialization you can look at Marshal
For a text format you can use JSON, YAML, or XML (with rexml, builder, ...).
Some standard options for serializing data in Ruby:
Marshal
YAML
JSON (built-in as of 1.9, various gems available as well)
(There are other, arguably better/faster implementations of YAML and JSON, but the built-ins are a good start.)
In practice, I seem to see YAML most often, but that may not be indicative of anything real.
Here's a quick yaml example
config = {"rank" => "Admiral", "name"=>"Akbar",
"wallet_value" => 9, "bills" => [5,1,1,2]}
open('store.yml', 'w') {|f| YAML.dump(config, f)}
loaded = open('store.yml') {|f| YAML.load(f) }
p loaded
# => {"name"=>"Akbar", "wallet_value"=>9, \
# "bills"=>[5, 1, 1, 2], "rank"=>"Admiral"}
Example: write text_area to a file where text_area is an array of strings.
File.open('output.txt', 'w') { |f| text_area.each { |line| f << line } }
Don't forget to do error checking on file operations :)
AFAIK, files contain lines, not arrays. When you read a file, the data can then be stored in an array or some other data structure. I am eager to know if there is another way.
