How can I control the output formats used by Ruby CSV?

I'd like to be able to change the date and time formats used by CSV when generating csv output. For example, instead of generating '2004-1-30' for a date, I'd like it to generate '1/30/2004'.
How can I do that?

Here is a complete example:
require 'csv'
require 'date'

str = <<_
2004-1-30,foo
2004-11-20,bar
_
File.write('a', str)

CSV::Converters[:cdate] = lambda do |s|
  begin
    Date.strptime(s, "%Y-%m-%d").strftime("%-m/%d/%Y")
  rescue ArgumentError
    s
  end
end

CSV.foreach('a', :converters => :cdate) do |row|
  p row
end
# >> ["1/30/2004", "foo"]
# >> ["11/20/2004", "bar"]
Look at the documentation of Converters.
An Array of names from the Converters Hash and/or lambdas that handle custom conversion. A single converter doesn’t have to be in an Array. All built-in converters try to transcode fields to UTF-8 before converting. The conversion will fail if the data cannot be transcoded, leaving the field unchanged.
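For instance (a small sketch of that), a lone lambda can be passed straight to :converters without registering it in the Converters Hash:
require 'csv'
require 'date'

to_us_date = lambda do |s|
  Date.strptime(s, "%Y-%m-%d").strftime("%-m/%d/%Y") rescue s
end

p CSV.parse_line("2004-1-30,foo", :converters => to_us_date)
# >> ["1/30/2004", "foo"]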

Related

Ruby CSV converter, remove all converters?

I have some data I was writing from one CSV to another CSV because I need to do some data manipulation.
I noticed the CSV library has some default converters that are taking my values that look like dates and parsing those into new date strings.
I was wondering if I could remove all converters? I tried using my custom converter, but no matter what I do it seems that the dates keep getting parsed.
Here is my code simplified:
require 'csv'
CSV::Converters[:my_converter] = lambda do |value|
  value
end

CSV.open('new-data.csv', 'w') do |csv|
  data = CSV.read('original-data.csv', :converters => [:my_converter]).each do |row|
    csv << row
  end
end
The value 9/30/14 0:00 is getting changed to 9/30/2014 0:00, for example.
Are you sure that your CSV file doesn't actually contain the 4-digit year? Try looking at puts File.read('original-data.csv')
When I tried this on Ruby 2.1.8, it didn't change the value
require 'csv'
my_csv_data = 'hello,"9/30/14 0:00",world'
CSV.new(my_csv_data).each do |row|
  puts row.inspect # prints ["hello", "9/30/14 0:00", "world"], as expected
end
CSV files are not parsed and converted into objects; the data in the fields is always returned as strings. This behavior is different from YAML or JSON, which do convert values back to their base types.
Consider this:
require 'csv'
CSV.parse("1,10/1/14,foo") # => [["1", "10/1/14", "foo"]]
All values are strings.
csv = ["foo", 'bar', 1, Date.new(2014, 10, 1)].to_csv # => "foo,bar,1,2014-10-01\n"
Converting an array containing native Ruby objects results in a string of comma-delimited values.
CSV.parse(csv) # => [["foo", "bar", "1", "2014-10-01"]]
Reparsing that string returns the string versions, but doesn't attempt to return them to their original types, because CSV has no way of knowing what those were. The developer (you) has to know that and do the conversion.
The end result of all that is that CSV won't change a year from '14' to '2014'. It doesn't know the field is a date, and, because it's not CSV's place to convert to objects, it only splits the fields appropriately and passes the information on to be massaged by the developer.
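For example, if you know the second column holds m/d/yy dates, the conversion back is on you (a small sketch, not from the original answer):
require 'csv'
require 'date'

CSV.parse("1,10/1/14,foo").each do |row|
  row[1] = Date.strptime(row[1], '%m/%d/%y') # convert the known date column back yourself
  p row
end
# >> ["1", #<Date: 2014-10-01 ...>, "foo"]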

How can I convert this CSV to JSON with Ruby?

I am trying to convert a CSV file to JSON using Ruby. I am very, very, green when it comes to working with Ruby (or any language for that matter) so the answers may need to be dumbed down for me. Putting it in JSON seems like the most reasonable solution to me because I understand how to work with JSON when assigning variables equal to the attributes that come in the response. If there is a better way to do it, feel free to teach me.
My CSV is in the following format:
Header1,Header,Header3
ValueX,ValueY,ValueZ
I would like to be able to use the data to say something along the lines of this:
For each ValueX in Row 1 after the headers, check if ValueZ is > ValueY. If yes, do this; if no, do that. I understand how to do the if statement, just not how to parse out my information into variables/arrays.
Any ideas here?
require 'csv'
require 'json'
rows = []
CSV.foreach('a.csv', headers: true, converters: :all) do |row|
  rows << row.to_hash
end
puts rows.to_json
# => [{"Header1":"ValueX","Header":"ValueY","Header3":"ValueZ"}]
Here is a first pointer:
require 'csv'
data = CSV.read('your_file.csv', { :col_sep => ',' })
Now you should have the data in data; you can test in irb.
I don't entirely understand the question:
if z > y
  # do this
else
  # do that
end
To produce JSON, you should be able to use JSON.generate (or call to_json on the data); JSON.parse is for going the other way, from JSON back into Ruby.
I am not sure what target structure the JSON library requires, probably a Hash (or an Array of Hashes).
You can populate your hash with the dataset from the CSV:
hash = Hash.new
hash[key_goes_here] = value_here
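A minimal sketch of that idea (reusing the your_file.csv name from above and the headers from the question):
require 'csv'
require 'json'

rows = []
CSV.foreach('your_file.csv', headers: true) do |row|
  hash = {}
  row.headers.each { |header| hash[header] = row[header] } # hash[key_goes_here] = value_here
  rows << hash
end
puts rows.to_json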

CSV.generate and converters?

I'm trying to create a converter to remove newline characters from CSV output.
I've got:
nonewline = lambda do |s|
  s.gsub(/(\r?\n)+/,' ')
end
I've verified that this works properly IF I load a variable and then run something like:
csv=CSV(variable,:converters=>[nonewline])
However, I'm attempting to use this code to update a bunch of preexisting code using CSV.generate, and it does not appear to work at all.
CSV.generate(:converters=>[nonewline]) do |csv|
csv << ["hello\ngoodbye"]
end
returns:
"\"hello\ngoodbye\"\n"
I've tried quite a few things as well as trying other examples I've found online, and it appears as though :converters has no effect when used with CSV.generate.
Is this correct, or is there something I'm missing?
You need to write your converter as below:
CSV::Converters[:nonewline] = lambda do |s|
  s.gsub(/(\r?\n)+/,' ')
end
Then do:
CSV.generate(:converters => [:nonewline]) do |csv|
  csv << ["hello\ngoodbye"]
end
Read the documentation for Converters.
Okay, I didn't remove the part above, so as to show you how to write a custom CSV converter; the way you wrote yours was incorrect.
Read the documentation of CSV::generate
This method wraps a String you provide, or an empty default String, in a CSV object which is passed to the provided block. You can use the block to append CSV rows to the String and when the block exits, the final String will be returned.
After reading the docs, it is quite clear that this method is for writing CSV, not for reading it. The converter options (like :converters and :header_converters) are applied when you are reading a CSV file, but not when you are writing to one.
Let me show you 2 examples to illustrate this more clearly.
require 'csv'
string = <<_
foo,bar
baz,quack
_
File.write('a',string)
CSV::Converters[:upcase] = lambda do |s|
  s.upcase
end
I am reading from the CSV file, so the :converters option is applied to it.
CSV.open('a', 'r', :converters => :upcase) do |csv|
  puts csv.read
end
Output:
# >> FOO
# >> BAR
# >> BAZ
# >> QUACK
Now I am writing into the CSV file; the :converters option is not applied.
CSV.open('a', 'w', :converters => :upcase) do |csv|
  csv << ['dog', 'cat']
end
CSV.read('a') # => [["dog", "cat"]]
Attempting to remove newlines using :converters did not work.
I had to override the << method from csv.rb adding the following code to it:
# Change all CR/NL's into one space
row.map! { |element|
  if element.is_a?(String)
    element.gsub(/(\r?\n)+/,' ')
  else
    element
  end
}
Placed right before
output = row.map(&@quote).join(@col_sep) + @row_sep # quote and separate
at line 21.
I would think this would be a good patch to CSV, as newlines will always produce bad CSV output.
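An alternative that avoids patching csv.rb (just a sketch, not from the original post) is to scrub each row yourself before appending it:
require 'csv'

strip_newlines = ->(field) { field.is_a?(String) ? field.gsub(/(\r?\n)+/, ' ') : field }

CSV.generate do |csv|
  csv << ["hello\ngoodbye", 42].map(&strip_newlines)
end
# => "hello goodbye,42\n"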

How do I make an array of arrays out of a CSV?

I have a CSV file that looks like this:
Jenny, jenny@example.com ,
Ricky, ricky@example.com ,
Josefina josefina@example.com ,
I'm trying to get this output:
users_array = [
  ['Jenny', 'jenny@example.com'], ['Ricky', 'ricky@example.com'], ['Josefina', 'josefina@example.com']
]
I've tried this:
users_array = Array.new
file = File.new('csv_file.csv', 'r')
file.each_line("\n") do |row|
  puts row + "\n"
  columns = row.split(",")
  users_array.push columns
  puts users_array
end
Unfortunately, in Terminal, this returns:
Jenny
jenny@example.com
Ricky
ricky@example.com
Josefina
josefina@example.com
Which I don't think will work for this:
users_array.each_with_index do |user|
  add_page.form_with(:id => 'new_user') do |f|
    f.field_with(:id => "user_email").value = user[0]
    f.field_with(:id => "user_name").value = user[1]
  end.click_button
end
What do I need to change? Or is there a better way to solve this problem?
Ruby's standard library has a CSV class with an API similar to File's, but with a number of useful methods for working with tabular data. To get the output you want, all you need to do is this:
require 'csv'
users_array = CSV.read('csv_file.csv')
PS - I think you are getting the output you expected with your file parsing as well, but maybe you're thrown off by how it is printing to the terminal. puts behaves differently with arrays, printing each member object on a new line instead of as a single array. If you want to view it as an array, use puts my_array.inspect.
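A tiny illustration of that difference:
users_array = [['Jenny', 'jenny@example.com']]
puts users_array          # >> Jenny
                          # >> jenny@example.com
puts users_array.inspect  # >> [["Jenny", "jenny@example.com"]]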
Assuming that your CSV file actually has a comma between the name and email address on the third line:
require 'csv'
users_array = []
CSV.foreach('csv_file.csv') do |row|
  users_array.push row.delete_if(&:nil?).map(&:strip)
end
users_array
# => [["Jenny", "jenny#example.com"],
# ["Ricky", "ricky#example.com"],
# ["Josefina", "josefina#example.com"]]
There may be a simpler way, but what I'm doing there is discarding the nil field created by the trailing comma and stripping the spaces around the email addresses.
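Roughly the same result in one pass with CSV.read (a sketch; compact drops the nil produced by the trailing comma):
require 'csv'

users_array = CSV.read('csv_file.csv').map { |row| row.compact.map(&:strip) }
# => [["Jenny", "jenny@example.com"], ...] (assuming the third line gets its missing comma)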

Parse CSV file with header fields as attributes for each row

I would like to parse a CSV file so that each row is treated like an object with the header-row being the names of the attributes in the object. I could write this, but I'm sure its already out there.
Here is my CSV input:
"foo","bar","baz"
1,2,3
"blah",7,"blam"
4,5,6
The code would look something like this:
CSV.open('my_file.csv', 'r') do |csv_obj|
  puts csv_obj.foo # prints 1 the 1st time, "blah" 2nd time, etc
  puts csv.bar # prints 2 the first time, 7 the 2nd time, etc
end
With Ruby's CSV module I believe I can only access the fields by index. I think the above code would be a bit more readable. Any ideas?
Using Ruby 1.9 and above, you can get an indexable object:
CSV.foreach('my_file.csv', :headers => true) do |row|
  puts row['foo'] # prints 1 the 1st time, "blah" 2nd time, etc
  puts row['bar'] # prints 2 the first time, 7 the 2nd time, etc
end
It's not dot syntax but it is much nicer to work with than numeric indexes.
As an aside, for Ruby 1.8.x FasterCSV is what you need to use the above syntax.
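If you really do want dot access, one possible approach (just a sketch, not from the original answers) is to build a Struct from the headers and wrap each row in it:
require 'csv'

rows = CSV.read('my_file.csv', headers: true)
Record = Struct.new(*rows.headers.map(&:to_sym))

rows.each do |row|
  record = Record.new(*row.fields)
  puts record.foo # "1" the 1st time, "blah" the 2nd time, etc
  puts record.bar
end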
Here is an example of the symbolic syntax using Ruby 1.9. In the examples below, the code reads a CSV file named data.csv from Rails db directory.
:headers => true treats the first row as a header instead of a data row. The :header_converters => :symbol option then converts each cell in the header row into a Ruby symbol.
CSV.foreach("#{Rails.root}/db/data.csv", {:headers => true, :header_converters => :symbol}) do |row|
puts "#{row[:foo]},#{row[:bar]},#{row[:baz]}"
end
In Ruby 1.8, with the fastercsv gem:
require 'fastercsv'

FasterCSV.foreach("#{Rails.root}/db/data.csv", {:headers => true, :header_converters => :symbol}) do |row|
  puts "#{row[:foo]},#{row[:bar]},#{row[:baz]}"
end
Based on the CSV provided by Poul (the Stack Overflow asker), the output from the example code above will be:
1,2,3
blah,7,blam
4,5,6
Depending on the characters used in the headers of the CSV file, it may be necessary to output the headers in order to see how CSV (FasterCSV) converted the string headers to symbols. You can output the array of headers from within the CSV.foreach.
row.headers
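For instance (a short sketch along those lines, reusing the data.csv path from above):
CSV.foreach("#{Rails.root}/db/data.csv", :headers => true, :header_converters => :symbol) do |row|
  p row.headers # e.g. [:foo, :bar, :baz]
  break         # one row is enough to inspect the header symbols
end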
Easy to get a hash in Ruby 2.3:
CSV.foreach('my_file.csv', headers: true, header_converters: :symbol) do |row|
  puts row.to_h[:foo]
  puts row.to_h[:bar]
end
Although I am pretty late to the discussion, a few months ago I started a "CSV to object mapper" at https://github.com/vicentereig/virgola.
Given your CSV contents, mapping them to an array of FooBar objects is pretty straightforward:
"foo","bar","baz"
1,2,3
"blah",7,"blam"
4,5,6
require 'virgola'

class FooBar
  include Virgola

  attribute :foo
  attribute :bar
  attribute :baz
end

csv = <<CSV
"foo","bar","baz"
1,2,3
"blah",7,"blam"
4,5,6
CSV

foo_bars = FooBar.parse(csv).all
foo_bars.each { |foo_bar| puts foo_bar.foo, foo_bar.bar, foo_bar.baz }
Since I hit this question with some frequency:
array_of_hashmaps = CSV.read("path/to/file.csv", headers: true)
puts array_of_hashmaps.first["foo"] # 1
This is the non-block version, when you want to slurp the whole file.
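For what it's worth, with headers: true CSV.read actually returns a CSV::Table rather than a plain array; if you want a literal array of hashes, one way (a sketch, assuming a Ruby recent enough for CSV::Row#to_h) is:
require 'csv'

table = CSV.read("path/to/file.csv", headers: true) # a CSV::Table
array_of_hashmaps = table.map(&:to_h)               # e.g. [{"foo"=>"1", "bar"=>"2", "baz"=>"3"}, ...]
puts array_of_hashmaps.first["foo"] # 1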
