Ruby - reading CSV from STDIN

I'm trying to read a .csv file and create an object with the attributes of every row.
My code works fine:
def self.load_csv
  puts "Name of a file?"
  filename = STDIN.gets.chomp
  rows = []
  text = File.read(filename).gsub(/\\"/,'""')
  CSV.parse(text, headers: true, header_converters: :symbol) do |row|
    row = row.to_h
    row = row.each_with_object({}){|(k,v), h| h[k.to_sym] = v}
    rows << row
  end
  rows.map do |row|
    Call.new(row)
  end
end
end
Now I want to pass the filename when starting the program instead of prompting for it. I simply changed it to:
def self.load_csv(filename)
  rows = []
  text = File.read(filename).gsub(/\\"/,'""')
  CSV.parse(text, headers: true, header_converters: :symbol) do |row|
    row = row.to_h
    row = row.each_with_object({}){|(k,v), h| h[k.to_sym] = v}
    rows << row
  end
  rows.map do |row|
    Call.new(row)
  end
end
end
When I run ruby program.rb filename.csv I get the error no implicit conversion of String into IO, and after removing the line with File.read it does nothing, like an infinite loop maybe? Of course, I invoke certain methods with an STDIN argument in different parts of the code. I have used similar code for reading from STDIN successfully in the past; what am I doing wrong this time?

This code works when run as ruby program.rb filename.csv; the filename comes from ARGV[0]:
require 'csv'
class Call
  def initialize(args)
  end
end
def load_csv(filename)
  rows = []
  text = File.read(filename).gsub(/\\"/,'""')
  CSV.parse(text, headers: true, header_converters: :symbol) do |row|
    row = row.to_h
    row = row.each_with_object({}){ |(k,v), h| h[k.to_sym] = v }
    rows << row
  end
  rows.map { |row| Call.new(row) }
end
filename = ARGV[0]
load_csv(filename)
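Two details are worth separating out. First, header_converters: :symbol already downcases and symbolizes every header, so the each_with_object step that converts keys to symbols does nothing extra. Second, the filename can come from the command line with a fallback to a prompt. A minimal sketch of both, where parse_rows and filename_from_args_or_stdin are hypothetical helper names and the input stream is a parameter so it can be exercised with a StringIO:

```ruby
require 'csv'
require 'stringio'

# Hypothetical helper: parse CSV text into an array of symbol-keyed hashes.
# header_converters: :symbol already turns "Name" into :name, so no extra
# pass is needed to convert the keys.
def parse_rows(text)
  # Turn backslash-escaped quotes (\") into CSV-style doubled quotes ("")
  cleaned = text.gsub(/\\"/, '""')
  CSV.parse(cleaned, headers: true, header_converters: :symbol).map(&:to_h)
end

# Hypothetical helper: prefer a filename given on the command line,
# otherwise prompt for one on the given input stream (STDIN by default).
def filename_from_args_or_stdin(argv = ARGV, input = $stdin)
  return argv.first unless argv.empty?
  print 'Name of a file? '
  input.gets.chomp
end

rows = parse_rows("Name,Age\nAnn,30\nBob,41\n")
p rows.first[:name]                          # "Ann"
p filename_from_args_or_stdin(['calls.csv']) # "calls.csv"
# Prints the prompt, then "other.csv"
p filename_from_args_or_stdin([], StringIO.new("other.csv\n"))
```

Reading the prompt with STDIN.gets (or $stdin.gets) rather than a bare gets matters here: Kernel#gets reads from ARGF, which treats filename.csv on the command line as a file to read from instead of the keyboard.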


Writing data into a CSV file by two different CSV files

So, I'm learning Ruby and I've been stuck on this for a long time, and I need some help.
I need to write to a CSV file from two different CSV files. I have the code to do it, but in two different functions, and I need the two files together in one.
So this is the code:
require 'csv'
class Plantas < Struct.new(:code)
end
class Especies < Struct.new(:id, :type, :code, :name_es, :name_ca, :name_en, :latin_name, :customer_id)
end
def ecode
  f_inECODE = File.open("pflname.csv", "r") #get EPPOCODE
  f_out = CSV.open("plantas.csv", "w+", :headers => true) #outputfile
  f_inECODE.each_line do |line|
    fields = line.split(',')
    newPlant = Plantas.new
    newPlant.code = fields[2].tr_s('"', '').strip #eppocode
    plant = [newPlant.code] #lines to print
    f_out << plant
  end
end
def data
  f_dataspices = File.open("spices.csv", "r")
  f_out = CSV.open("plantas.csv", "w+", :headers => true) #outputfile
  f_dataspices.each_line do |line|
    fields = line.split(',')
    newEspecies = Especies.new
    newEspecies.id = fields[0].tr_s('"', '').strip
    newEspecies.type = fields[1].tr_s('"', '').strip
    newEspecies.code = fields[2].tr_s('"', '').strip
    newEspecies.name_es = fields[3].tr_s('"', '').strip
    newEspecies.name_ca = fields[4].tr_s('"', '').strip
    newEspecies.name_en = fields[5].tr_s('"', '').strip
    newEspecies.latin_name = fields[6].tr_s('"', '').strip
    newEspecies.customer_id = fields[7].tr_s('"', '').strip
    especia = [newEspecies.id, newEspecies.type, newEspecies.code, newEspecies.name_es, newEspecies.name_ca, newEspecies.name_en, newEspecies.latin_name, newEspecies.customer_id]
    f_out << especia
  end
end
data
ecode
The desired output (species.csv + ecode.csv) would look like this:
"id","type","code","name_es","name_ca","name_en","latin_name","customer_id","ecode"
7205,"DunSpecies",NULL,"0","0","0","",11630,LEECO
7437,"DunSpecies",NULL,"0","Xicoira","0","",5273,LEE3O
7204,"DunSpecies",NULL,"0","0","0","",11630,L4ECO
And the actual output is this:
"id","type","code","name_es","name_ca","name_en","latin_name","customer_id"
7205,"DunSpecies",NULL,"0","0","0","",11630
7437,"DunSpecies",NULL,"0","Xicoira","0","",5273
7204,"DunSpecies",NULL,"0","0","0","",11630
(without ecode)
On one side I have the ecode and on the other all the data; I just need to put them together.
I'd like to put everything together in the same file (plantas.csv).
I used two different functions because I don't know how to put everything together in one foreach. I would like to put it all in the same function, but I don't know how.
If someone could help me get this code into one function and write the results to the same file, I would be so grateful.
An example of the input file ecode.csv (from which I just want the ecode field) is this:
"""identifier"",""datatype"",""code"",""lang"",""langno"",""preferred"",""status"",""creation"",""modification"",""country"",""fullname"",""authority"",""shortname"""
"""N1952"",""PFL"",""LEECO"",""la"",""1"",""0"",""N"",""06/06/2000"",""09/03/2010"","""",""Leea coccinea non"",""Planchon"",""Leea coccinea non"""
"""N2974"",""PFL"",""LEECO"",""en"",""1"",""0"",""N"",""06/06/2000"",""21/02/2011"","""",""west Indian holly"","""",""West Indian holly"""
An example of the input file data.csv (from which I want all the fields) is this:
"id","type","code","name_es","name_ca","name_en","latin_name","customer_id"
7205,"DunSpecies",NULL,"0","0","0","",11630
7437,"DunSpecies",NULL,"0","Xicoira","0","",5273
The way to link both files is to create a third file and write everything into it!
At least, this is my idea; I don't know if there is a simpler way to do it.
Thanks!
Cleaning up ecode.csv made it more challenging, but here is what I came up with:
In case data.csv and ecode.csv are matched by row number:
require 'csv'
data = CSV.read('data.csv', headers: true).to_a
headers = data.shift << 'eppocode'
double_quoted_ecode = CSV.read('ecode.csv')
ecodeIO = StringIO.new
ecodeIO.puts double_quoted_ecode.to_a
ecodeIO.rewind
ecode = CSV.parse(ecodeIO, headers: true)
CSV.open('plantas.csv', 'w+') do |plantas|
  plantas << headers
  data.each.with_index do |row, idx|
    planta = row + [ecode['code'][idx]]
    plantas << planta
  end
end
Using your example files, this gives you the following plantas.csv:
id,type,code,name_es,name_ca,name_en,latin_name,customer_id,eppocode
7205,DunSpecies,NULL,0,0,0,"",11630,LEECO
7437,DunSpecies,NULL,0,Xicoira,0,"",5273,LEECO
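The cleanup step is needed because each line of ecode.csv is a whole CSV record wrapped in one more layer of quotes, so a plain CSV.read sees a single field per line. A minimal sketch of the same unwrap-and-reparse idea without the StringIO, using the sample lines trimmed to three columns for brevity (ECODE is only the heredoc label):

```ruby
require 'csv'

# Sample ecode.csv lines, trimmed to three columns for brevity.
ecode_text = <<~'ECODE'
  """identifier"",""datatype"",""code"""
  """N1952"",""PFL"",""LEECO"""
  """N2974"",""PFL"",""LEECO"""
ECODE

# First pass: every line parses to a single field holding the inner record.
inner_lines = CSV.parse(ecode_text).map(&:first)
p inner_lines.first # "\"identifier\",\"datatype\",\"code\""

# Second pass: parse the unwrapped records as an ordinary CSV with headers.
ecode = CSV.parse(inner_lines.join("\n"), headers: true)
p ecode['code'] # ["LEECO", "LEECO"]
```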
In case entries are matched by data.csv's id and ecode.csv's identifier:
require 'csv'
data = CSV.read('data.csv', headers: true)
headers = data.headers << 'eppocode'
double_quoted_ecode = CSV.read('ecode.csv')
ecodeIO = StringIO.new
ecodeIO.puts double_quoted_ecode.to_a
ecodeIO.rewind
ecode = CSV.parse(ecodeIO, headers: true)
CSV.open('plantas.csv', 'w+') do |plantas|
  plantas << headers
  data.each do |row|
    id = row['id']
    ecode_row = ecode.find { |entry| entry['identifier'] == id } || {}
    planta = row << ecode_row['code']
    plantas << planta
  end
end
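Calling find inside the loop scans ecode once for every data row. A sketch of a swapped-in alternative, an identifier-to-code hash built once, assuming the data ids and ecode identifiers use the same format (in the question's files the ids are numbers, so the sample values below are made up for illustration):

```ruby
require 'csv'

# Made-up sample data whose ids match ecode identifiers.
data  = CSV.parse("id,type\nN1952,DunSpecies\nN9999,DunSpecies\n", headers: true)
ecode = CSV.parse("identifier,code\nN1952,LEECO\nN2974,LEEC1\n", headers: true)

# Build the identifier => code index once, keeping the first match like find does.
code_by_id = {}
ecode.each do |entry|
  key = entry['identifier']
  code_by_id[key] = entry['code'] unless code_by_id.key?(key)
end

plantas = CSV.generate do |csv|
  csv << (data.headers + ['eppocode'])
  # Rows with no matching identifier get an empty eppocode cell.
  data.each { |row| csv << (row.fields + [code_by_id[row['id']]]) }
end
puts plantas
```

The hash lookup keeps the per-row cost constant, which starts to matter once ecode.csv has thousands of entries.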
I hope you find this helpful.
Data
Let's begin by creating the two CSV files. To make the results easier to follow I have arbitrarily removed some of the fields in each file, and changed one field value.
ecode.csv
ecode = '"""identifier"",""datatype"",""code"",""lang"",""langno"",""preferred"",""status"",""creation"",""modification"",""country"",""fullname"",""authority"",""shortname"""
"""N1952"",""PFL"",""LEECO"",""la"",""1"",""0"",""N"",""06/06/2000"",""09/03/2010"","""",""Leea coccinea non"",""Planchon"",""Leea coccinea non"""
"""N2974"",""PFL"",""LEEC1"",""en"",""1"",""0"",""N"",""06/06/2000"",""21/02/2011"","""",""west Indian holly"","""",""West Indian holly"""'
File.write('ecode.csv', ecode)
#=> 452
data.csv
data = '"id","type","code","customer_id"
7205,"DunSpecies",NULL,11630
7437,"DunSpecies",NULL,,5273'
File.write('data.csv', data)
#=> 90
Code
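CSV.open('plantas.csv', 'w') do |csv_out|
  converter = ->(s) { s.delete('"') }
  # Every line of ecode.csv is wrapped in an extra layer of quotes, so with the
  # default quote character each line would parse as one single field. A quote
  # character that never appears in the file ('|', an arbitrary choice) makes CSV
  # split on every comma and leaves the '"' characters for the converter to strip.
  epposcode = CSV.foreach('ecode.csv',
                          headers: true,
                          quote_char: '|',
                          header_converters: [converter],
                          converters: [converter]
                         ).map { |csv| csv["code"] }
  headers = CSV.open('data.csv', &:readline) << 'epposcode'
  csv_out << headers
  CSV.foreach('data.csv', headers: true) do |row|
    csv_out << (row << epposcode.shift)
  end
end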
Result
Let's see what was written.
puts File.read('plantas.csv')
id,type,code,customer_id,epposcode
7205,DunSpecies,NULL,11630,LEECO
7437,DunSpecies,NULL,,5273,LEEC1
Explanation
The structure we want is the following.
CSV.open('plantas.csv', 'w') do |csv_out|
  epposcode = <array of 'code' field values from 'ecode.csv'>
  headers = <headers from 'data.csv' to which 'epposcode' is appended>
  csv_out << headers
  CSV.foreach('data.csv', headers:true) do |row|
    csv_out << <row of 'data.csv' to which an element of epposcode is appended>
  end
end
CSV::open is the main CSV method for writing files and CSV::foreach is generally my method-of-choice for reading CSV files. I could have instead written the following.
csv_out = CSV.open('plantas.csv', 'w')
epposcode = <array of 'code' field values from 'ecode.csv'>
headers = <headers from 'data.csv' to which 'epposcode' is appended>
csv_out << headers
CSV.foreach('data.csv', headers:true) do |row|
  csv_out << <row of 'data.csv' to which an element of epposcode is appended>
end
csv_out.close
but using a block is convenient because the file is closed before returning from the block.
It is convenient to use a converter for both the header fields and the row fields:
converter = ->(s) { s.delete('"') }
This is a proc (I've defined a lambda) that removes double quotes from a string. It is passed as the value of two of foreach's optional arguments:
epposcode = CSV.foreach('ecode.csv',
headers:true,
header_converters: [converter],
converters: [converter]
)
Search for "Data Converters" in the CSV doc.
We invoke foreach without a block to return an enumerator, so it can be chained to map:
epposcode = CSV.foreach('ecode.csv',
headers:true,
header_converters: [converter],
converters: [converter]
).map { |csv| csv["code"] }
For the example,
epposcode
#=> ["LEECO", "LEEC1"]
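The remaining piece, csv_out << (row << epposcode.shift), relies on CSV::Row#<<: given a plain value, it appends a field whose header is nil, which is why the header line is written separately beforehand. A small sketch with made-up one-row data, comparing that with assigning through a new header via CSV::Row#[]=:

```ruby
require 'csv'

row = CSV.parse("id,code\n7205,NULL\n", headers: true).first
row << 'LEECO'  # appended field has no header
p row.headers   # ["id", "code", nil]
p row.fields    # ["7205", "NULL", "LEECO"]

named = CSV.parse("id,code\n7205,NULL\n", headers: true).first
named['epposcode'] = 'LEECO' # an unknown header appends a named field
p named.headers # ["id", "code", "epposcode"]
p named['epposcode'] # "LEECO"
```

Either form writes the same line to plantas.csv; the named form is handy if the row is later accessed by header.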

How to call hash values outside class from defined hash map inside class methods?

Read a CSV file and dynamically construct a new class named after the file. So if the CSV is persons.csv, the Ruby class should be Person; if it's places.csv, the Ruby class should be Places.
Also create methods for reading and displaying each value in the CSV file; the values in the first row of the CSV file act as the method names.
Construct an array of objects and associate each object with a row of the CSV file. For example, the content of the CSV file could be:
name,age,city
abd,45,TUY
kjh,65,HJK
Previous code:
require 'csv'
class Feed
  def initialize(source_name, column_names = [])
    if column_names.empty?
      column_names = CSV.open(source_name, 'r', &:first)
    end
    columns = column_names.reduce({}) { |columns, col_name| columns[col_name] = []; columns }
    define_singleton_method(:columns) { column_names }
    column_names.each do |col_name|
      define_singleton_method(col_name.to_sym) { columns[col_name] }
    end
    CSV.foreach(source_name, headers: true) do |row|
      column_names.each do |col_name|
        columns[col_name] << row[col_name]
      end
    end
  end
end
feed = Feed.new('input.csv')
p feed.columns # ["name", "age", "city"]
p feed.name # ["abd", "kjh"]
p feed.age # ["45", "65"]
p feed.city # ["TUY", "HJK"]
I am trying to refine this solution using class methods and to split the code into smaller methods. When I call the values from outside the class by key name, I get errors like "undefined method `age' for Feed:Class". Is there a way I can access the values from outside the class?
My solution looks like this:
require 'csv'
class Feed
  attr_accessor :column_names
  def self.col_name(source_name, column_names = [])
    if column_names.empty?
      #column_names = CSV.open(source_name, :headers => true)
    end
    columns = #column_names.reduce({}) { |columns, col_name| columns[col_name] = []; columns }
  end
  def self.get_rows(source_name)
    col_name(source_name, column_names = [])
    define_singleton_method(:columns) { column_names }
    column_names.each do |col_name|
      define_singleton_method(col_name.to_sym) { columns[col_name] }
    end
    CSV.foreach(source_name, headers: true) do |row|
      #column_names.each do |col_name|
      columns[col_name] << row[col_name]
    end
  end
end
end
obj = Feed.new
Feed.get_rows('Input.csv')
puts obj.class.columns
puts obj.class.name
puts obj.class.age
puts obj.class.city
Expected result:
input = Input.new
p input.name # ["abd", "kjh"]
p input.age # ["45", "65"]
input.name ='XYZ' # Value must be appended to array
input.age = 25
p input.name # ["abd", "kjh", "XYZ"]
p input.age # ["45", "65", "25"]
Let's create the CSV file.
str =<<END
name,age,city
abd,45,TUY
kjh,65,HJK
END
FName = 'temp/persons.csv'
File.write(FName, str)
#=> 36
Now let's create a class:
klass = Class.new
#=> #<Class:0x000057d0519de8a0>
and name it:
class_name = File.basename(FName, ".csv").capitalize
#=> "Persons"
Object.const_set(class_name, klass)
#=> Persons
Persons.class
#=> Class
See File::basename, String#capitalize and Module#const_set.
Next read the CSV file with headers into a CSV::Table object:
require 'csv'
csv = CSV.read(FName, headers: true)
#=> #<CSV::Table mode:col_or_row row_count:3>
csv.class
#=> CSV::Table
See CSV::read. We may now create the methods name, age and city.
csv.headers.each { |header| klass.define_method(header) { csv[header] } }
See CSV::Table#headers, Module#define_method and CSV::Table#[].
We can now confirm they work as intended:
k = klass.new
k.name
#=> ["abd", "kjh"]
k.age
#=> ["45", "65"]
k.city
#=> ["TUY", "HJK"]
or
person = Persons.new
#=> #<Persons:0x0000598dc6b01640>
person.name
#=> ["abd", "kjh"]
and so on.
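The readers above cover the first half of the expected result. For the appending writers (input.name = 'XYZ'), a sketch along the same lines, assuming a hypothetical helper build_csv_class and a temporary input.csv so that the generated class is named Input; values are stored as strings to match what CSV returns:

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical helper: define a class named after the CSV file with one reader
# per header (returning that column) and one writer that appends to the column.
def build_csv_class(path)
  table = CSV.read(path, headers: true)
  columns = table.headers.to_h { |header| [header, table[header]] }
  klass = Class.new do
    # Each instance starts from its own copy of the columns.
    define_method(:initialize) do
      @columns = columns.transform_values(&:dup)
    end
    columns.each_key do |header|
      define_method(header) { @columns[header] }
      define_method("#{header}=") { |value| @columns[header] << value.to_s }
    end
  end
  Object.const_set(File.basename(path, '.csv').capitalize, klass)
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'input.csv')
  File.write(path, "name,age,city\nabd,45,TUY\nkjh,65,HJK\n")
  build_csv_class(path)
end

input = Input.new
input.name = 'XYZ'
input.age = 25
p input.name # ["abd", "kjh", "XYZ"]
p input.age  # ["45", "65", "25"]
```

Copying the columns in initialize keeps appends on one instance from leaking into another; if a single shared list per class is wanted, drop the dup.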

Counting every line and adding to the end of each line in a csv

What I want to do is count how many # are in each row and put this value into a total field at the end.
It sort of works, but it's only adding the count from the last line.
My CSV
Header,Header,Header
Info#,Info,Info
Info,Info##,Info
Info,Info,Info###
My Code
require "csv"
table = CSV.read("my_test.csv", headers: true, col_sep: ",")
File.readlines('my_test.csv').each do |line|
  table.each do |row|
    at_count = line.count('#')
    row["Total"] = at_count
  end
end
CSV.open("my_test.csv", "w") do |f|
  f << table.headers
  table.each { |row| f << row }
end
Current Result
Header,Header,Header,Total
Info#,Info,Info,3
Info,Info##,Info,3
Info,Info,Info###,3
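You don't need File.readlines; CSV has already read the file. Your nested loops assign Total to every row on each pass over the lines, so every row ends up with the count from the last line (3).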
require "csv"
table = CSV.read("my_test.csv", headers: true) # just shorter
table.each do |row| # no readlines
  at_count = row.to_s.count('#') # note the to_s
  row["Total"] = at_count
end
CSV.open("my_test.csv", "w") do |f|
  f << table.headers
  table.each { |row| f << row }
end
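A minimal sketch of the same fix that works on a string instead of my_test.csv, so it can be tried without overwriting the file. It counts '#' in the joined field values (row.fields) rather than in row.to_s, so delimiters and any quoting never enter the count; the helper name add_hash_totals is hypothetical:

```ruby
require 'csv'

# Hypothetical helper: append a Total column holding the number of '#'
# characters in each data row, and return the result as CSV text.
def add_hash_totals(csv_text)
  table = CSV.parse(csv_text, headers: true)
  table.each { |row| row['Total'] = row.fields.join.count('#') }
  table.to_csv
end

input = "Header,Header,Header\nInfo#,Info,Info\nInfo,Info##,Info\nInfo,Info,Info###\n"
puts add_hash_totals(input)
```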

Ruby how to merge two CSV files with slightly different headers

I have two CSV files with some common headers and others that only appear in one or in the other, for example:
# csv_1.csv
H1,H2,H3
V11,V22,V33
V14,V25,V35
# csv_2.csv
H1,H4
V1a,V4b
V1c,V4d
I would like to merge both and obtain a new CSV file that combines all the information from the previous CSV files, injecting new columns when needed and filling the new cells with null values.
Result example:
H1,H2,H3,H4
V11,V22,V33,
V14,V25,V35,
V1a,,,V4b
V1c,,,V4d
Challenge accepted :)
#!/usr/bin/env ruby
require "csv"
module MergeCsv
  class << self
    def run(csv_paths)
      csv_files = csv_paths.map { |p| CSV.read(p, headers: true) }
      merge(csv_files)
    end
    private
    def merge(csv_files)
      headers = csv_files.flat_map(&:headers).uniq.sort
      hash_array = csv_files.flat_map(&method(:csv_to_hash_array))
      CSV.generate do |merged_csv|
        merged_csv << headers
        hash_array.each do |row|
          merged_csv << row.values_at(*headers)
        end
      end
    end
    # Probably not the most performant way, but easy
    def csv_to_hash_array(csv)
      csv.to_a[1..-1].map { |row| csv.headers.zip(row).to_h }
    end
  end
end
if ARGV.empty?
  puts "Use: ruby merge_csv.rb <file_path_csv_1> <file_path_csv_2>"
  exit 1
end
puts MergeCsv.run(ARGV)
I have the answer; I just wanted to help people who are looking for the same solution:
require "csv"
module MergeCsv
  def self.run(csv_1_path, csv_2_path)
    merge(File.read(csv_1_path), File.read(csv_2_path))
  end
  def self.merge(csv_1, csv_2)
    csv_1_table = CSV.parse(csv_1, :headers => true)
    csv_2_table = CSV.parse(csv_2, :headers => true)
    return csv_2_table.to_csv if csv_1_table.headers.empty?
    return csv_1_table.to_csv if csv_2_table.headers.empty?
    headers_in_1_not_in_2 = csv_1_table.headers - csv_2_table.headers
    headers_in_1_not_in_2.each do |header_in_1_not_in_2|
      csv_2_table[header_in_1_not_in_2] = nil
    end
    headers_in_2_not_in_1 = csv_2_table.headers - csv_1_table.headers
    headers_in_2_not_in_1.each do |header_in_2_not_in_1|
      csv_1_table[header_in_2_not_in_1] = nil
    end
    csv_2_table.each do |csv_2_row|
      csv_1_table << csv_1_table.headers.map { |csv_1_header| csv_2_row[csv_1_header] }
    end
    csv_1_table.to_csv
  end
end
if ARGV.length != 2
  puts "Use: ruby merge_csv.rb <file_path_csv_1> <file_path_csv_2>"
  exit 1
end
puts MergeCsv.run(ARGV[0], ARGV[1])
And execute it from the console this way:
$ ruby merge_csv.rb csv_1.csv csv_2.csv
Any other, maybe cleaner, solution is welcome.
Simplified version of the first answer:
How to use it:
listPart_A = CSV.read(csv_path_A, headers:true)
listPart_B = CSV.read(csv_path_B, headers:true)
listPart_C = CSV.read(csv_path_C, headers:true)
list = merge(listPart_A,listPart_B,listPart_C)
Function:
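def merge(*csvs)
  headers = csvs.map { |csv| csv.headers }.flatten.compact.uniq.sort
  # Give every row a value for every header (nil where its file lacks the column)
  csvs.flat_map(&method(:csv_to_hash_array)).map do |hash|
    headers.map { |header| [header, hash[header]] }.to_h
  end
end
def csv_to_hash_array(csv)
  csv.to_a[1..-1].map do |row|
    Hash[csv.headers.zip(row)]
  end
end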
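To go from merged rows back to CSV text like the result example, a sketch of a hypothetical merge_tables helper that unions the headers in first-seen order (instead of sorting them) and writes each row's values in that order with CSV.generate; the inputs are the two example files parsed from strings:

```ruby
require 'csv'

# Hypothetical helper: merge any number of parsed CSV::Tables into one CSV string.
# Headers are unioned in first-seen order; cells a file lacks stay empty (nil).
def merge_tables(*tables)
  headers = tables.flat_map(&:headers).compact.uniq
  CSV.generate do |out|
    out << headers
    tables.each do |table|
      # CSV::Row#[] returns nil for a header the row does not have
      table.each { |row| out << headers.map { |h| row[h] } }
    end
  end
end

csv_1 = CSV.parse("H1,H2,H3\nV11,V22,V33\nV14,V25,V35\n", headers: true)
csv_2 = CSV.parse("H1,H4\nV1a,V4b\nV1c,V4d\n", headers: true)
puts merge_tables(csv_1, csv_2)
```

Keeping first-seen order means the columns of the first file stay where they were; swap in .sort if alphabetical headers are preferred.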
I had to do something very similar: merge n CSV files that might share some of their columns but not others.
If you want to keep the structure and do it easily,
I think the best way is to convert to a hash and then convert back to a CSV file.
My solution:
#!/usr/bin/env ruby
require "csv"
def join_multiple_csv(csv_path_array)
  return nil if csv_path_array.nil? || csv_path_array.empty?
  first_path, *other_paths = csv_path_array
  f = CSV.parse(File.read(first_path), headers: true)
  # Map each header to its column of values
  f_h = {}
  f.headers.each { |header| f_h[header] = f[header] }
  n_rows = f.size
  other_paths.each do |csv_file|
    curr_csv = CSV.parse(File.read(csv_file), headers: true)
    new_headers = curr_csv.headers - f_h.keys
    exist_headers = curr_csv.headers - new_headers
    missing_headers = f_h.keys - curr_csv.headers
    # Columns first seen in this file: pad with nils for all earlier rows
    new_headers.each do |new_header|
      f_h[new_header] = Array.new(n_rows) + curr_csv[new_header]
    end
    exist_headers.each do |exist_header|
      f_h[exist_header] += curr_csv[exist_header]
    end
    # Columns this file lacks: pad with nils so later files stay aligned
    missing_headers.each do |missing_header|
      f_h[missing_header] += Array.new(curr_csv.size)
    end
    n_rows += curr_csv.size
  end
  CSV.generate do |csv|
    csv << f_h.keys
    n_rows.times do |i|
      csv << f_h.keys.map { |header| f_h[header][i] }
    end
  end
end
if ARGV.length < 2
  puts "Use: ruby merge_csv.rb <file_path_csv_1> <file_path_csv_2> .. <file_path_csv_n>"
  exit 1
end
csv_str = join_multiple_csv(ARGV)
File.write("results.csv", csv_str)
puts "CSV merge is done"

Why does it output one document, when it used to output multiple?

This used to output a document for each person on the list. But since I added the code to determine the most popular date & time for a list of given dates, it now only outputs one document for the first person in the list.
def save_thank_you_letters(id,form_letter)
  Dir.mkdir("output") unless Dir.exist?("output")
  filename = "output/thanks_#{id}.html"
  File.open(filename,'w') do |file|
    file.puts form_letter
  end
end
puts "EventManager initialized."
contents = CSV.open 'event_attendees.csv', headers: true, header_converters: :symbol
template_letter = File.read "form_letter.erb"
erb_template = ERB.new template_letter
contents.each do |row|
  id = row[0]
  name = row[:first_name]
  zipcode = clean_zipcode(row[:zipcode])
  phone = clean_phonenumber(row[:homephone])
  legislators = legislators_by_zipcode(zipcode)
  form_letter = erb_template.result(binding)
  save_thank_you_letters(id,form_letter)
  # IT WORKS OK UNTIL I ADD THIS PART...
  times = contents.map { |row| row[:regdate] }
  target_times = Hash[times.group_by do |t|
    DateTime.strptime(t, '%m/%d/%y %H:%M').hour
  end.map do |k,v|
    [k, v.count]
  end.sort_by do |k,v|
    v
  end.reverse]
  target_days = Hash[times.group_by do |t|
    DateTime.strptime(t, '%m/%d/%y %H:%M').wday
  end.map do |k,v|
    [Date::ABBR_DAYNAMES[k], v.count]
  end.sort_by do |k,v|
    v
  end.reverse]
  puts target_times
  puts target_days
end
I think it has something to do with the way I am processing the date/time data. If I remove this part, I get an HTML document for each person on the list. If I include it, I get the date and time info I am looking for, but it only generates a document for the first person in the list.
Can someone please explain why what I am doing does not work? I would like it to print the times and the days of the week, but ALSO generate an html document for each person on the list.
Thanks!
When you read a CSV file, you read it line by line, moving an internal pointer. Once you reach the end of the file, the pointer stays there, so every time you try to fetch a new row you'll get nil unless you rewind the file. Your code started iterating on this line:
contents.each do |row|
This fetched the first row and moved the cursor to the next line. However, inside the loop you called contents.map {...}, which read the rest of the CSV file and left the cursor at the end of the file.
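A minimal sketch reproducing the effect with an in-memory StringIO in place of event_attendees.csv (the names Ann, Bob and Cid are made up): a map nested inside each drains the shared cursor, so the outer loop stops after one row, and rewind restores it.

```ruby
require 'csv'
require 'stringio'

data = "first_name,regdate\nAnn,11/12/08 10:47\nBob,11/12/08 13:23\nCid,11/13/08 09:05\n"
contents = CSV.new(StringIO.new(data), headers: true)

visited = []
contents.each do |row|
  visited << row['first_name']
  contents.map { |r| r['regdate'] } # consumes every remaining row
end
p visited # ["Ann"]

contents.rewind
p contents.map { |r| r['first_name'] } # ["Ann", "Bob", "Cid"]
```

An alternative to rewinding is to load the rows once with CSV.read (or contents.to_a) and iterate over the resulting array as many times as needed.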
So to fix it, move the statistics code outside the loop (before or after it) and rewind the file (reset the cursor) before the second iteration:
contents.each do |row|
  id = row[0]
  name = row[:first_name]
  zipcode = clean_zipcode(row[:zipcode])
  phone = clean_phonenumber(row[:homephone])
  legislators = legislators_by_zipcode(zipcode)
  form_letter = erb_template.result(binding)
  save_thank_you_letters(id,form_letter)
end
contents.rewind
times = contents.map { |row| row[:regdate] }
target_times = Hash[times.group_by do |t|
  DateTime.strptime(t, '%m/%d/%y %H:%M').hour
end.map do |k,v|
  [k, v.count]
end.sort_by do |k,v|
  v
end.reverse]
target_days = Hash[times.group_by do |t|
  DateTime.strptime(t, '%m/%d/%y %H:%M').wday
end.map do |k,v|
  [Date::ABBR_DAYNAMES[k], v.count]
end.sort_by do |k,v|
  v
end.reverse]
puts target_times
puts target_days
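The statistics can also be written more compactly. A sketch using Enumerable#tally (Ruby 2.7+) in place of the group_by/map/sort_by/reverse chain, with a few made-up regdate strings in the same format; entries with equal counts may come out in either order, because sort_by is not stable:

```ruby
require 'date'

times = ['11/12/08 10:47', '11/12/08 13:23', '11/12/08 13:30', '11/25/08 19:21']

parsed = times.map { |t| DateTime.strptime(t, '%m/%d/%y %H:%M') }
# Count per hour / weekday, then order by descending count.
target_times = parsed.map(&:hour).tally.sort_by { |_hour, count| -count }.to_h
target_days = parsed.map { |d| Date::ABBR_DAYNAMES[d.wday] }.tally.sort_by { |_day, count| -count }.to_h
p target_times
p target_days
```

Parsing each timestamp once and reusing the parsed values for both counts also avoids calling strptime twice per row.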
