How to parse a Hash of Hashes from a CSV file - ruby

I have a CSV file that I need to read and extract all rows which have a "created_at" within a certain range. The CSV itself is about 5000 lines in Excel.
This is how I am pulling the info from the file:
CSV.foreach("sample_data.csv", :headers => true, :header_converters => :symbol, :converters => :all) do |row|
data[row.fields[0]] = Hash[row.headers[1..-1].zip(row.fields[1..-1])]
end
Here's the last Hash created after using CSV.foreach:
2760=>{:created_at=>1483189568, :readable_date=>"12/31/2016", :first_name=>"Louise", :last_name=>"Garza", :email=>"lgarza24n#drupal.org", :gender=>"Female", :company=>"Cogilith", :currency=>"EUR", :word=>"orchestration", :drug_brand=>"EPIVIR", :drug_name=>"lamivudine", :drug_company=>"State of Florida DOH Central Pharmacy", :pill_color=>"Maroon", :frequency=>"Yearly", :token=>"_", :keywords=>"in faucibus", :bitcoin_address=>"19jTjXLPQUL1nEmHrpqeqM1FdtDFZmUZ2E"}}
When I run data[2759].first I get:
created_at
1309380645
I need to pull every hash where created_at is between range = 1403321503..1406082945. I tried about twenty different methods using each and collect on the data hash with no success. My last attempt printed out an empty {} for each original hash.
I'm trying to test this with no success:
data.each do |hash|
if hash.first.to_s.to_i > 1403321503 && hash.first.to_s.to_i < 1406082945
puts hash
end
end
I'm not sure how to isolate the value of key:created_at and then see if it is within the range. I also tried doing hash.first.to_s.to_i =/== range.
I am able to get just the :created_at value by using data[1].first.last but when I try to use that in a method it errors out.
Here is a link to the original CSV: goo.gl/NOjAPo
It is not on my work computer so I can't do a pastebin of it.

I would only store rows in the data hash that are within the range. IMO that performs betters, because it needs less memory than reading all data into data and remove the unwanted entries in a second step.
DATE_RANGE = (1403321503..1406082945)
CSV.foreach("sample_data.csv",
:headers => true,
:header_converters => :symbol,
:converters => :all) do |row|
attrs = Hash[row.headers[1..-1].zip(row.fields[1..-1])]
data[row.fields[0]] = attrs if DATE_RANGE.cover?(attrs[:created_at])
end
It might make sense to check the condition before actually creating the hash by checking DATE_RANGE.cover? against the column number (is created_at in row.fields[1]?).

Use Enumerable#select
hash.select do |_, v|
(1403321503..1406082945) === v[:created_at]
end
Here we also use Range#=== also known as case-equal, or triple-equal, to check if the value is inside the range.

Related

Access csv data from htttp request in ruby

I'm trying to access the csv data, which I recive if I make a http-request.
I don't save it to a csv file, so I save it to the variable.
Let's say this is the response I get, how can I print food?
uuid,event_id,category
12,1,food
13,2,cars
And this is the part of the ruby code which is important.
That's something I found, but it was originally used with a file, so it doesn't work.
csvdata = request(action,parameter)
#data_hash = {}
CSV.foreach(csvdata) do |row|
uuid, event_id, category = row
#data_hash[uuid] = event_id
end
Do I really need files for that or is there a easy way I can access the values?
Update
CSV.parse(csvdata,data = Hash.new) do |row|
puts data
end
The hash should look like this so I can use the column names
{"uuid" => "12,13", "event_id" => "323,3243", "category" => "food,cars"}
csv_data = Hash.new{|k, v| k[v] = []}
CSV.parse(csv_string, headers: true) do |row|
row.each{|k, v| csv_data[k] << v}
end
csv_data = Hash[csv_data.map{|k, v| [k, v.join(",")]}]
Update after specification Requested output.
Try this:
csvdata = request(action,parameter)
#data_hash = {}
CSV.parse(csvdata, headers: true) do |row|
#data_hash[row['uuid']] = row['event_id']
end
#data_hash
# => {"12"=>"1", "13"=>"2"}
When you parse a CSV, the seconds parameter (data = Hash.new in your code) is actually an options parameter. You can see the available options here:
:headers
If set to :first_row or true, the initial row of the CSV file will be treated as a row of headers. If set to an Array, the contents will be used as the headers. If set to a String, the String is run through a call of ::parse_line with the same :col_sep, :row_sep, and :quote_char as this instance to produce an Array of headers. This setting causes #shift to return rows as CSV::Row objects instead of Arrays and #read to return CSV::Table objects instead of an Array of Arrays.
When passing headers: true - values are parsed into a Row object, where they can be accessed by name.

How to remove a row from a CSV with Ruby

Given the following CSV file, how would you remove all rows that contain the word 'true' in the column 'foo'?
Date,foo,bar
2014/10/31,true,derp
2014/10/31,false,derp
I have a working solution, however it requires making a secondary CSV object csv_no_foo
#csv = CSV.read(#csvfile, headers: true) #http://bit.ly/1mSlqfA
#headers = CSV.open(#csvfile,'r', :headers => true).read.headers
# Make a new CSV
#csv_no_foo = CSV.new(#headers)
#csv.each do |row|
# puts row[5]
if row[#headersHash['foo']] == 'false'
#csv_no_foo.add_row(row)
else
puts "not pushing row #{row}"
end
end
Ideally, I would just remove the offending row from the CSV like so:
...
if row[#headersHash['foo']] == 'false'
#csv.delete(true) #Doesn't work
...
Looking at the ruby documentation, it looks like the row class has a delete_if function. I'm confused on the syntax that that function requires. Is there a way to remove the row without making a new csv object?
http://ruby-doc.org/stdlib-1.9.2/libdoc/csv/rdoc/CSV/Row.html#method-i-each
You should be able to use CSV::Table#delete_if, but you need to use CSV::table instead of CSV::read, because the former will give you a CSV::Table object, whereas the latter results in an Array of Arrays. Be aware that this setting will also convert the headers to symbols.
table = CSV.table(#csvfile)
table.delete_if do |row|
row[:foo] == 'true'
end
File.open(#csvfile, 'w') do |f|
f.write(table.to_csv)
end
You might want to filter rows in a ruby manner:
require 'csv'
csv = CSV.parse(File.read(#csvfile), {
:col_sep => ",",
:headers => true
}
).collect { |item| item[:foo] != 'true' }
Hope it help.

import from CSV into Ruby array, with 1st field as hash key, then lookup a field's value given header row

Maybe somebody can help me.
Starting with a CSV file like so:
Ticker,"Price","Market Cap"
ZUMZ,30.00,933.90
XTEX,16.02,811.57
AAC,9.83,80.02
I manage to read them into an array:
require 'csv'
tickers = CSV.read("stocks.csv", {:headers => true, :return_headers => true, :header_converters => :symbol, :converters => :all} )
To verify data, this works:
puts tickers[1][:ticker]
ZUMZ
However this doesn't:
puts tickers[:ticker => "XTEX"][:price]
How would I go about turning this array into a hash using the ticker field as unique key, such that I could easily look up any other field associatively as defined in line 1 of the input? Dealing with many more columns and rows.
Much appreciated!
Like this (it works with other CSVs too, not just the one you specified):
require 'csv'
tickers = {}
CSV.foreach("stocks.csv", :headers => true, :header_converters => :symbol, :converters => :all) do |row|
tickers[row.fields[0]] = Hash[row.headers[1..-1].zip(row.fields[1..-1])]
end
Result:
{"ZUMZ"=>{:price=>30.0, :market_cap=>933.9}, "XTEX"=>{:price=>16.02, :market_cap=>811.57}, "AAC"=>{:price=>9.83, :market_cap=>80.02}}
You can access elements in this data structure like this:
puts tickers["XTEX"][:price] #=> 16.02
Edit (according to comment): For selecting elements, you can do something like
tickers.select { |ticker, vals| vals[:price] > 10.0 }
CSV.read(file_path, headers:true, header_converters: :symbol, converters: :all).collect do |row|
Hash[row.collect { |c,r| [c,r] }]
end
CSV.read(file_path, headers:true, header_converters: :symbol, converters: :all).collect do |row|
row.to_h
end
To add on to Michael Kohl's answer, if you want to access the elements in the following manner
puts tickers[:price]["XTEX"] #=> 16.02
You can try the following code snippet:
CSV.foreach("Workbook1.csv", :headers => true, :header_converters => :symbol, :converters => :all) do |row|
hash_row = row.headers[1..-1].zip( (Array.new(row.fields.length-1, row.fields[0]).zip(row.fields[1..-1])) ).to_h
hash_row.each{|key, value| tickers[key] ? tickers[key].merge!([value].to_h) : tickers[key] = [value].to_h}
end
To get the best of both worlds (very fast reading from a huge file AND the benefits of a native Ruby CSV object) my code had since evolved into this method:
$stock="XTEX"
csv_data = CSV.parse IO.read(%`|sed -n "1p; /^#{$stock},/p" stocks.csv`), {:headers => true, :return_headers => false, :header_converters => :symbol, :converters => :all}
# Now the 1-row CSV object is ready for use, eg:
$company = csv_data[:company][0]
$volatility_month = csv_data[:volatility_month][0].to_f
$sector = csv_data[:sector][0]
$industry = csv_data[:industry][0]
$rsi14d = csv_data[:relative_strength_index_14][0].to_f
which is closer to my original method, but only reads in one record plus line 1 of the input csv file containing the headers. The inline sed instructions take care of that--and the whole thing is noticably instant. This this is better than last because now I can access all the fields from Ruby, and associatively, not caring about column numbers anymore as was the case with awk.
Not as 1-liner-ie but this was more clear to me.
csv_headers = CSV.parse(STDIN.gets)
csv = CSV.new(STDIN)
kick_list = []
csv.each_with_index do |row, i|
row_hash = {}
row.each_with_index do |field, j|
row_hash[csv_headers[0][j]] = field
end
kick_list << row_hash
end
While this isn't a 100% native Ruby solution to the original question, should others stumble here and wonder what awk call I wound up using for now, here it is:
$dividend_yield = IO.readlines("|awk -F, '$1==\"#{$stock}\" {print $9}' datafile.csv")[0].to_f
where $stock is the variable I had previously assigned to a company's ticker symbol (the wannabe key field).
Conveniently survives problems by returning 0.0 if: ticker or file or field #9 not found/empty, or if value cannot be typecasted to a float. So any trailing '%' in my case gets nicely truncated.
Note that at this point one could easily add more filters within awk to have IO.readlines return a 1-dim array of output lines from the smaller resulting CSV, eg.
awk -F, '$9 >= 2.01 && $2 > 99.99 {print $0}' datafile.csv
outputs in bash which lines have a DivYld (col 9) over 2.01 and price (col 2) over 99.99. (Unfortunately I'm not using the header row to to determine field numbers, which is where I was ultimately hoping for some searchable associative Ruby array.)

Parse CSV file with header fields as attributes for each row

I would like to parse a CSV file so that each row is treated like an object with the header-row being the names of the attributes in the object. I could write this, but I'm sure its already out there.
Here is my CSV input:
"foo","bar","baz"
1,2,3
"blah",7,"blam"
4,5,6
The code would look something like this:
CSV.open('my_file.csv','r') do |csv_obj|
puts csv_obj.foo #prints 1 the 1st time, "blah" 2nd time, etc
puts csv.bar #prints 2 the first time, 7 the 2nd time, etc
end
With Ruby's CSV module I believe I can only access the fields by index. I think the above code would be a bit more readable. Any ideas?
Using Ruby 1.9 and above, you can get a an indexable object:
CSV.foreach('my_file.csv', :headers => true) do |row|
puts row['foo'] # prints 1 the 1st time, "blah" 2nd time, etc
puts row['bar'] # prints 2 the first time, 7 the 2nd time, etc
end
It's not dot syntax but it is much nicer to work with than numeric indexes.
As an aside, for Ruby 1.8.x FasterCSV is what you need to use the above syntax.
Here is an example of the symbolic syntax using Ruby 1.9. In the examples below, the code reads a CSV file named data.csv from Rails db directory.
:headers => true treats the first row as a header instead of a data row. :header_converters => :symbolize parameter then converts each cell in the header row into Ruby symbol.
CSV.foreach("#{Rails.root}/db/data.csv", {:headers => true, :header_converters => :symbol}) do |row|
puts "#{row[:foo]},#{row[:bar]},#{row[:baz]}"
end
In Ruby 1.8:
require 'fastercsv'
CSV.foreach("#{Rails.root}/db/data.csv", {:headers => true, :header_converters => :symbol}) do |row|
puts "#{row[:foo]},#{row[:bar]},#{row[:baz]}"
end
Based on the CSV provided by the Poul (the StackOverflow asker), the output from the example code above will be:
1,2,3
blah,7,blam
4,5,6
Depending on the characters used in the headers of the CSV file, it may be necessary to output the headers in order to see how CSV (FasterCSV) converted the string headers to symbols. You can output the array of headers from within the CSV.foreach.
row.headers
Easy to get a hash in Ruby 2.3:
CSV.foreach('my_file.csv', headers: true, header_converters: :symbol) do |row|
puts row.to_h[:foo]
puts row.to_h[:bar]
end
Although I am pretty late to the discussion, a few months ago I started a "CSV to object mapper" at https://github.com/vicentereig/virgola.
Given your CSV contents, mapping them to an array of FooBar objects is pretty straightforward:
"foo","bar","baz"
1,2,3
"blah",7,"blam"
4,5,6
require 'virgola'
class FooBar
include Virgola
attribute :foo
attribute :bar
attribute :baz
end
csv = <<CSV
"foo","bar","baz"
1,2,3
"blah",7,"blam"
4,5,6
CSV
foo_bars = FooBar.parse(csv).all
foo_bars.each { |foo_bar| puts foo_bar.foo, foo_bar.bar, foo_bar.baz }
Since I hit this question with some frequency:
array_of_hashmaps = CSV.read("path/to/file.csv", headers: true)
puts array_of_hashmaps.first["foo"] # 1
This is the non-block version, when you want to slurp the whole file.

How do you change headers in a CSV File with FasterCSV then save the new headers?

I'm having trouble understanding the :header_converters and :converters in FasterCSV. Basically, all I want to do is change column headers to their appropriate column names.
something like:
FasterCSV.foreach(csv_file, {:headers => true, :return_headers => false, :header_converters => :symbol, :converters => :all} ) do |row|
puts row[:some_column_header] # Would be "Some Column Header" in the csv file.
execpt I don't umderstand :symbol and :all in the converter parameters.
The :all converter means that it tries all of the built-in converters, specifically:
:integer: Converts any field Integer() accepts.
:float: Converts any field Float() accepts.
:date: Converts any field Date::parse() accepts.
:date_time: Converts any field DateTime::parse() accepts.
Essentially, it means that it will attempt to convert any field into those values (if possible) instead of leaving them as a string. So if you do row[i] and it would have returned the String value '9', it will instead return an Integer value 9.
Header converters change the way the headers are used to index a row. For example, if doing something like this:
FastCSV.foreach(some_file, :header_converters => :downcase) do |row|
You would index a column with the header "Some Header" as row['some header'].
If you used :symbol instead, you would index it with row[:some_header]. Symbol downcases the header name, replaces spaces with underscores, and removes characters other than a-z, 0-9, and _. It's useful because comparison of symbols is far faster than comparison of strings.
If you want to index a column with row['Some Header'], then just don't provide any :header_converter option.
EDIT:
In response to your comment, headers_convert won't do what you want, I'm afraid. It doesn't change the values of the header row, just how they are used as an index. Instead, you'll have to use the :return_headers option, detect the header row, and make your changes. To change the file and write it out again, you can use something like this:
require 'fastercsv'
input = File.open 'original.csv', 'r'
output = File.open 'modified.csv', 'w'
FasterCSV.filter input, output, :headers => true, :write_headers => true, :return_headers => true do |row|
change_headers(row) if row.header_row?
end
input.close
output.close
If you need to completely replace the original file, add this line after doing the above:
FileUtils.mv 'modified.csv', 'original.csv', :force => true
I've found a simple approach for solving this problem. The FasterCSV library works just fine. I'm sure that ~7 years between when the post was created to now may have something to do with it, but I thought it was worth noting here.
When reading CSV files, the FasterCSV :header_converters option isn't well documented, in my opinion. But, instead of assigning a symbol (header_converters: :symbol) one can assign a lambda (header_converters: lambda {...}). When the CSV library reads the file it transforms the headers using the lambda. Then, one can save a new CSV file that reflects the transformed headers.
For example:
options = {
headers: true,
header_converters: lambda { |h| HEADER_MAP.keys.include?(h.to_sym) ? HEADER_MAP[h.to_sym] : h }
}
table = CSV.read(FILE_TO_PROCESS, options)
File.open(PROCESSED_FILE, "w") do |file|
file.write(table.to_csv)
end
Rewriting CSV file headers is a common requirement for anyone converting exported CSV files into imports.
I found the following approach gave me what I needed:
lookup_headers = { "old": "new", "cat": "dog" } # The desired header swaps
CSV($>, headers: true, write_headers: true) do |csv_out|
CSV.foreach( ARGV[0],
headers: true,
# the following lambda replaces the header if it is found, leaving it if not...
header_converters: lambda{ |h| lookup_headers[h] || h},
return_headers: true) do |master_row|
if master_row.header_row?
# The headers are now correctly replaced by calling the updated headers
csv_out << master_row.headers
else
csv_out << master_row
end
end
end
Hope this helps!

Resources