Import CSV column where header includes X? - ruby

I have a CSV file with a column header like so:
[NPSScore] On a scale of 0 to 10, how likely would you b...
But sometimes the header may look like this:
[NPSScore]
I need to read this column into an array. My code looks like this:
CSV.foreach(csv_file_to_read_for_is_average_nps_score, :col_sep => "\t", :encoding => "BOM|UTF-16LE:UTF-8", :headers => true) do |column|
is_average_nps_score_arr << column['[NPSScore]']
end
However, this will only pick up the latter of the two header options.
How can I write this to pick up any column that includes NPSScore?

To get the first column that includes the string '[NPSScore]' do
column.find{|k,_| k.include? '[NPSScore]' }.last
instead of
column['[NPSScore]']

Related

Ruby CSV converter, remove all converters?

I have some data I was writing from one CSV to another CSV because I need to do some data manipulation.
I noticed the CSV library has some default converters that are taking my values that look like dates and parsing those into new date strings.
I was wondering if I could remove all converters? I tried using my custom converter, but no matter what I do it seems that the dates keep getting parsed.
Here is my code simplified:
require 'csv'
CSV::Converters[:my_converter] = lambda do |value|
value
end
CSV.open('new-data.csv', 'w') do |csv|
data = CSV.read('original-data.csv', :converters => [:my_converter]).each do |row|
csv << row
end
end
The value 9/30/14 0:00 is getting changed to 9/30/2014 0:00, for example.
Are you sure that your CSV file doesn't actually contain the 4-digit year? Try looking at puts File.read('original-data.csv')
When I tried this on Ruby 2.1.8, it didn't change the value
require 'csv'
my_csv_data = 'hello,"9/30/14 0:00",world'
CSV.new(my_csv_data).each do |row|
puts row.inspect # prints ["hello", "9/30/14 0:00", "world"], as expected
end
CSV files are not parsed and converted into objects, the data in the fields is returned as a string. Always. This behavior is different than YAML or JSON, which do convert back to their base types.
Consider this:
require 'csv'
CSV.parse("1,10/1/14,foo") # => [["1", "10/1/14", "foo"]]
All values are strings.
csv = ["foo", 'bar', 1, Date.new(2014, 10, 1)].to_csv # => "foo,bar,1,2014-10-01\n"
Converting an array containing native Ruby objects results in a string of comma-delimited values.
CSV.parse(csv) # => [["foo", "bar", "1", "2014-10-01"]]
Reparsing that string returns the string versions but doesn't attempt to return them to their original types as CSV doesn't have a way of knowing what those were. The developer (you) has to know and do that.
The end-result of all that is that CSV won't change a year from '14' to '2014'. It doesn't know that it's a date, and, because it's not CSV's place to convert to objects, it only splits the fields appropriately and passes the information on to be massaged by the developer.

How to read specific columns of a zipped CSV file

I used the code below to read the contents of a zipped CSV file.
Zip::ZipFile.foreach(file) do |entry|
istream = entry.get_input_stream
data = istream.read
#...
end
It gives me the entire content of the text (CSV) file with headers like below:
NAME AGE GENDER NAME1 29 MALE NAME2 30 FEMALE
but I need specific data of the column. For example, I want to display only the names (NAME). Please help me proceed with this.
Though your example shows ZipFile, you're really asking a CSV question. First, you should check the docs in http://www.ruby-doc.org/stdlib-2.0/libdoc/csv/rdoc/CSV.html
You'll find that if you parse your data with the :headers => true option, you'll get a CSV::table object that knows how to extract a column of data as follows. (For obvious reasons, I wouldn't code it this way -- this is for example only.)
require 'zip'
require 'csv'
csv_table = nil
Zip::ZipFile.foreach("x.csv.zip") do |entry|
istream = entry.get_input_stream
data = istream.read
csv_table = CSV.parse(data, :col_sep => " ", :headers => true)
end
With the data you gave, we need `col_sep => " " since you're using spaces as column separators. But now we can do:
>> csv_table["NAME"] # extract the NAME column
=> ["NAME1", "NAME2"]
First you can use this for reference:
http://www.ruby-doc.org/stdlib-2.0/libdoc/csv/rdoc/CSV.html
If you have a string you can do
array = CSV.parse("data")
This would give you an array of arrays, one for each line.
Now if you know that the first column for each line is the name you can just manipulate that array i.e
array.map { |line| line[0] }.join(",") # returns NAME,<name>,<name>,<name> ...

How do I make an array of arrays out of a CSV?

I have a CSV file that looks like this:
Jenny, jenny#example.com ,
Ricky, ricky#example.com ,
Josefina josefina#example.com ,
I'm trying to get this output:
users_array = [
['Jenny', 'jenny#example.com'], ['Ricky', 'ricky#example.com'], ['Josefina', 'josefina#example.com']
]
I've tried this:
users_array = Array.new
file = File.new('csv_file.csv', 'r')
file.each_line("\n") do |row|
puts row + "\n"
columns = row.split(",")
users_array.push columns
puts users_array
end
Unfortunately, in Terminal, this returns:
Jenny
jenny#example.com
Ricky
ricky#example.com
Josefina
josefina#example.com
Which I don't think will work for this:
users_array.each_with_index do |user|
add_page.form_with(:id => 'new_user') do |f|
f.field_with(:id => "user_email").value = user[0]
f.field_with(:id => "user_name").value = user[1]
end.click_button
end
What do I need to change? Or is there a better way to solve this problem?
Ruby's standard library has a CSV class with a similar api to File but contains a number of useful methods for working with tabular data. To get the output you want, all you need to do is this:
require 'csv'
users_array = CSV.read('csv_file.csv')
PS - I think you are getting the output you expected with your file parsing as well, but maybe you're thrown off by how it is printing to the terminal. puts behaves differently with arrays, printing each member object on a new line instead of as a single array. If you want to view it as an array, use puts my_array.inspect.
Assuming that your CSV file actually has a comma between the name and email address on the third line:
require 'csv'
users_array = []
CSV.foreach('csv_file.csv') do |row|
users_array.push row.delete_if(&:nil?).map(&:strip)
end
users_array
# => [["Jenny", "jenny#example.com"],
# ["Ricky", "ricky#example.com"],
# ["Josefina", "josefina#example.com"]]
There may be a simpler way, but what I'm doing there is discarding the nil field created by the trailing comma and stripping the spaces around the email addresses.

Parse CSV file with header fields as attributes for each row

I would like to parse a CSV file so that each row is treated like an object with the header-row being the names of the attributes in the object. I could write this, but I'm sure its already out there.
Here is my CSV input:
"foo","bar","baz"
1,2,3
"blah",7,"blam"
4,5,6
The code would look something like this:
CSV.open('my_file.csv','r') do |csv_obj|
puts csv_obj.foo #prints 1 the 1st time, "blah" 2nd time, etc
puts csv.bar #prints 2 the first time, 7 the 2nd time, etc
end
With Ruby's CSV module I believe I can only access the fields by index. I think the above code would be a bit more readable. Any ideas?
Using Ruby 1.9 and above, you can get a an indexable object:
CSV.foreach('my_file.csv', :headers => true) do |row|
puts row['foo'] # prints 1 the 1st time, "blah" 2nd time, etc
puts row['bar'] # prints 2 the first time, 7 the 2nd time, etc
end
It's not dot syntax but it is much nicer to work with than numeric indexes.
As an aside, for Ruby 1.8.x FasterCSV is what you need to use the above syntax.
Here is an example of the symbolic syntax using Ruby 1.9. In the examples below, the code reads a CSV file named data.csv from Rails db directory.
:headers => true treats the first row as a header instead of a data row. :header_converters => :symbolize parameter then converts each cell in the header row into Ruby symbol.
CSV.foreach("#{Rails.root}/db/data.csv", {:headers => true, :header_converters => :symbol}) do |row|
puts "#{row[:foo]},#{row[:bar]},#{row[:baz]}"
end
In Ruby 1.8:
require 'fastercsv'
CSV.foreach("#{Rails.root}/db/data.csv", {:headers => true, :header_converters => :symbol}) do |row|
puts "#{row[:foo]},#{row[:bar]},#{row[:baz]}"
end
Based on the CSV provided by the Poul (the StackOverflow asker), the output from the example code above will be:
1,2,3
blah,7,blam
4,5,6
Depending on the characters used in the headers of the CSV file, it may be necessary to output the headers in order to see how CSV (FasterCSV) converted the string headers to symbols. You can output the array of headers from within the CSV.foreach.
row.headers
Easy to get a hash in Ruby 2.3:
CSV.foreach('my_file.csv', headers: true, header_converters: :symbol) do |row|
puts row.to_h[:foo]
puts row.to_h[:bar]
end
Although I am pretty late to the discussion, a few months ago I started a "CSV to object mapper" at https://github.com/vicentereig/virgola.
Given your CSV contents, mapping them to an array of FooBar objects is pretty straightforward:
"foo","bar","baz"
1,2,3
"blah",7,"blam"
4,5,6
require 'virgola'
class FooBar
include Virgola
attribute :foo
attribute :bar
attribute :baz
end
csv = <<CSV
"foo","bar","baz"
1,2,3
"blah",7,"blam"
4,5,6
CSV
foo_bars = FooBar.parse(csv).all
foo_bars.each { |foo_bar| puts foo_bar.foo, foo_bar.bar, foo_bar.baz }
Since I hit this question with some frequency:
array_of_hashmaps = CSV.read("path/to/file.csv", headers: true)
puts array_of_hashmaps.first["foo"] # 1
This is the non-block version, when you want to slurp the whole file.

How do you change headers in a CSV File with FasterCSV then save the new headers?

I'm having trouble understanding the :header_converters and :converters in FasterCSV. Basically, all I want to do is change column headers to their appropriate column names.
something like:
FasterCSV.foreach(csv_file, {:headers => true, :return_headers => false, :header_converters => :symbol, :converters => :all} ) do |row|
puts row[:some_column_header] # Would be "Some Column Header" in the csv file.
execpt I don't umderstand :symbol and :all in the converter parameters.
The :all converter means that it tries all of the built-in converters, specifically:
:integer: Converts any field Integer() accepts.
:float: Converts any field Float() accepts.
:date: Converts any field Date::parse() accepts.
:date_time: Converts any field DateTime::parse() accepts.
Essentially, it means that it will attempt to convert any field into those values (if possible) instead of leaving them as a string. So if you do row[i] and it would have returned the String value '9', it will instead return an Integer value 9.
Header converters change the way the headers are used to index a row. For example, if doing something like this:
FastCSV.foreach(some_file, :header_converters => :downcase) do |row|
You would index a column with the header "Some Header" as row['some header'].
If you used :symbol instead, you would index it with row[:some_header]. Symbol downcases the header name, replaces spaces with underscores, and removes characters other than a-z, 0-9, and _. It's useful because comparison of symbols is far faster than comparison of strings.
If you want to index a column with row['Some Header'], then just don't provide any :header_converter option.
EDIT:
In response to your comment, headers_convert won't do what you want, I'm afraid. It doesn't change the values of the header row, just how they are used as an index. Instead, you'll have to use the :return_headers option, detect the header row, and make your changes. To change the file and write it out again, you can use something like this:
require 'fastercsv'
input = File.open 'original.csv', 'r'
output = File.open 'modified.csv', 'w'
FasterCSV.filter input, output, :headers => true, :write_headers => true, :return_headers => true do |row|
change_headers(row) if row.header_row?
end
input.close
output.close
If you need to completely replace the original file, add this line after doing the above:
FileUtils.mv 'modified.csv', 'original.csv', :force => true
I've found a simple approach for solving this problem. The FasterCSV library works just fine. I'm sure that ~7 years between when the post was created to now may have something to do with it, but I thought it was worth noting here.
When reading CSV files, the FasterCSV :header_converters option isn't well documented, in my opinion. But, instead of assigning a symbol (header_converters: :symbol) one can assign a lambda (header_converters: lambda {...}). When the CSV library reads the file it transforms the headers using the lambda. Then, one can save a new CSV file that reflects the transformed headers.
For example:
options = {
headers: true,
header_converters: lambda { |h| HEADER_MAP.keys.include?(h.to_sym) ? HEADER_MAP[h.to_sym] : h }
}
table = CSV.read(FILE_TO_PROCESS, options)
File.open(PROCESSED_FILE, "w") do |file|
file.write(table.to_csv)
end
Rewriting CSV file headers is a common requirement for anyone converting exported CSV files into imports.
I found the following approach gave me what I needed:
lookup_headers = { "old": "new", "cat": "dog" } # The desired header swaps
CSV($>, headers: true, write_headers: true) do |csv_out|
CSV.foreach( ARGV[0],
headers: true,
# the following lambda replaces the header if it is found, leaving it if not...
header_converters: lambda{ |h| lookup_headers[h] || h},
return_headers: true) do |master_row|
if master_row.header_row?
# The headers are now correctly replaced by calling the updated headers
csv_out << master_row.headers
else
csv_out << master_row
end
end
end
Hope this helps!

Resources