How to change headers of a CSV file using Ruby

I am trying to change the headers of a CSV file, but I am only able to change the rows of data, not the headers.
I looked at other examples and at some of the Ruby documentation, but I still cannot get it to work.
Here is how I am trying to do it right now:
input = File.open TestFile, 'r' #read
output = File.open TestFile, 'w' #write
CSV.filter input, output, :headers => true, :write_headers => true, :return_headers => true do |csv|
csv << ["Test"] if row.header_row?
end

The general rule is: You cannot change what is already written to a file. You can erase a file and write anew to a file. You can append to a file. But you cannot make changes to what is already written.
require 'csv'
input = File.open 'input.txt', 'r' #read
output = File.open 'output.txt', 'w' #write
CSV.filter input, output, :headers => true, :write_headers => true, :return_headers => true do |row|
row << "Test" if row.header_row?
end
--output:--
$ cat input.txt
col1,col2
10,20
30,40
~/ruby_programs$ rm output.txt
remove output.txt? y
~/ruby_programs$ ruby my_prog.rb
~/ruby_programs$ cat output.txt
col1,col2,Test
10,20
30,40

A quick analysis of what you are doing:
You open a file for reading:
input = File.open TestFile, 'r' #read
You open the same file for writing, which deletes its content:
output = File.open TestFile, 'w' #write
You read the now empty file:
CSV.filter input, output, :headers => true, :write_headers => true, :return_headers => true do |csv|
puts "I'm here" ## <- My addition to show that you never enter the loop.
csv << ["Test"] if row.header_row?
end
A possible solution:
Use different input and output files and write the new header line yourself:
#Prepare some test data
TestFile = 'test.csv'
File.open(TestFile, 'w'){|f|
f << <<csv
a;b;c
1;2;3
1;2;3
csv
}
require 'csv'
input = File.open TestFile, 'r' #read
output = File.open('x.csv', 'w') #write
output.puts "new a;new b;new c" # the header line needs its own trailing newline
CSV.filter input, output, :headers => true, :write_headers => true do |row|
end
input.close
output.close
#You may delete the input file and rename output file
As an alternative, you may define your own header:
CSV.filter( input, output,
:headers => true, #remove header from input
:out_headers => [:xa,:xb, :xc], #define output header
:write_headers => true,
) do |row|
end

input = File.open TestFile, 'r' #read
output = File.open TestFile, 'w' #write
The line that opens the file for writing is truncating it so that when you read the file as input there is nothing there.
As 7stud shows in his answer you need to have separate files for input and output.
input = File.open 'input.txt', 'r' #read
output = File.open 'output.txt', 'w' #write
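The truncation is easy to observe directly. The following is a minimal sketch, assuming a scratch file in a temporary directory (the path and sample data are made up for illustration): it opens one file for reading and then for writing, as in the question, and shows that the reader finds nothing.

```ruby
require 'tmpdir'

# Hypothetical scratch file; any path would behave the same way.
path = File.join(Dir.mktmpdir, 'test.csv')
File.write(path, "col1,col2\n10,20\n")

input  = File.open(path, 'r')   # the file still has its content at this point
output = File.open(path, 'w')   # mode 'w' truncates the file to zero bytes

contents_seen_by_reader = input.read
input.close
output.close

puts contents_seen_by_reader.inspect   # prints ""
puts File.size(path)                   # prints 0
```

Because nothing had been read before the second open, the reader sees an empty file, which is why the block passed to CSV.filter never runs.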

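require 'csv'
new_headers = ["new_col1", "new_col2"] # placeholder: put your own header names here
rows = CSV.read("input.csv")
rows.shift # drop the old header row
rows.unshift(new_headers) # put the new header row first
CSV.open("output.csv", "w") { |csv| rows.each { |row| csv << row } }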
By the way, you don't have to write them to an output file if you don't want to, at least in this example.
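If no output file is wanted, the same header swap can be done entirely in memory. This is a rough sketch, assuming the data is small enough to hold in a string; the sample data and the replacement header names are invented for illustration.

```ruby
require 'csv'

# Hypothetical sample data standing in for the contents of input.csv.
source = "col1,col2\n10,20\n30,40\n"
new_headers = ["first", "second"]   # illustrative replacement headers

rows = CSV.parse(source)    # array of arrays, header row included
rows.shift                  # drop the old header row
rows.unshift(new_headers)   # put the new header row in front

# CSV.generate builds a String instead of writing to a file.
result = CSV.generate { |csv| rows.each { |row| csv << row } }
puts result
```

This reads everything at once, so it only suits small inputs.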

Related

Ruby CSV - Write on same row without overwriting?

I'm using
CSV.open(filename, "w") do |csv|
to create and write to a csv file in one ruby.rb file and now I need to open it and edit it in a second .rb file. Right now I'm using CSV.open(filename, "a") do |csv| but that creates new rows rather than adding the new content to the end of the existing rows.
If I use CSV.open(filename, "w") do |csv| the second time it overwrites the first rows.
edit:
# Create export CSV
final_export_csv = "filepath_final.csv"
# Create filename for CSV file
imported_csv_filename = "imported_file.csv"
CSV.open(final_export_csv, "w", headers: ["several", "headers"] + [:new_header], write_headers: true) do |final_csv|
# Read existing CSV file
CSV.foreach(imported_csv_filename) do |old_csv_row|
# Read a row, add the new column, write it to the new row
CSV.open(denominator_csv_filename, "r+") do |new_csv_col|
# gathering some data code
data = { passed.in }
# Write data
new_csv_col <<
[
passedin[:data]
]
old_csv_row[:new_header] = passedin[:data]
final_export_csv << old_csv_row
end
end
end
end
end
As tadman comments, you can't actually edit a file in place. Well, you can but all the lines have to remain the same length. You're not doing that.
Instead, read a row, modify it, and write it to a new CSV. Then replace the old file with the new one. Be careful to avoid slurping the entire CSV into memory; CSV files can get quite large.
require 'csv'
require 'tempfile'
require 'fileutils'
csv_file = "test.csv"
# Write the new file to a tempfile to avoid polluting the directory.
temp = Tempfile.new
# Read the header line.
old_csv = CSV.open(csv_file, "r", headers: true, return_headers: true)
old_csv.readline
# Open the new CSV with the existing headers plus a new one.
new_csv = CSV.open(
temp, "w",
headers: old_csv.headers + [:new],
write_headers: true
)
# Read a row, add the new column, write it to the new CSV.
old_csv.each do |row|
row[:new] = 42
new_csv << row
end
old_csv.close
new_csv.close
# Replace the old CSV with the new one.
FileUtils.move(temp.path, csv_file)

Generate CSV from Ruby results

I currently have this script that generates usernames from a given CSV. Rather than printing these results to the console, how can I write a new CSV with these results?
This is the script I currently have; it runs with no errors. I assume that if I write a new CSV inside the do |row| block, it will create a new file for every row, which I do not want.
require 'csv'
CSV.foreach('data.csv', :headers => true) do |row|
id = row['id']
fn = row['first_name']
ln = row['last_name']
p fn[0] + ln + id[3,8]
end
Just open the output CSV once, around the read loop:
CSV.open("path/to/file.csv", "wb") do |csv|
CSV.foreach('data.csv', :headers => true) do |row|
id = row['id']
fn = row['first_name']
ln = row['last_name']
csv << [fn[0], ln, id[3,8]]
# or, to output it as a single column:
# csv << ["#{fn[0]}#{ln}#{id[3,8]}"]
end
end
Writing CSV to a file.

Working with large CSV files in Ruby

I want to parse two CSV files of the MaxMind GeoIP2 database, do some joining based on a column and merge the result into one output file.
I used the standard Ruby CSV library, but it is very slow. I think it tries to load the whole file into memory.
block_file = File.read(block_path)
block_csv = CSV.parse(block_file, :headers => true)
location_file = File.read(location_path)
location_csv = CSV.parse(location_file, :headers => true)
CSV.open(output_path, "wb",
:write_headers=> true,
:headers => ["geoname_id","Y","Z"] ) do |csv|
block_csv.each do |block_row|
puts "#{block_row['geoname_id']}"
location_csv.each do |location_row|
if (block_row['geoname_id'] === location_row['geoname_id'])
puts " match :"
csv << [block_row['geoname_id'],block_row['Y'],block_row['Z']]
break location_row
end
end
end
end
Is there another Ruby library that supports processing in chunks?
block_csv is 800MB and location_csv is 100MB.
Just use CSV.open(block_path, 'r', :headers => true).each do |line| instead of File.read and CSV.parse. It will parse the file line by line.
In your current version, you explicitly tell it to read the whole file with File.read and then to parse that whole string with CSV.parse. So it does exactly what you told it to do.
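Streaming fixes the memory use, but the nested loop in the question still scans the location file once per block row. A possible sketch of a streaming join follows. It goes beyond the advice above by replacing the inner loop with a Set lookup, and it assumes the ids of the smaller file fit in memory. The helper name join_blocks is hypothetical; the column names geoname_id, Y and Z come from the question.

```ruby
require 'csv'
require 'set'

# Hypothetical helper: writes every block row whose geoname_id also appears
# in the location file. Column names Y and Z are taken from the question.
def join_blocks(block_path, location_path, output_path)
  # Pass 1: stream the smaller file and remember only its ids.
  location_ids = Set.new
  CSV.foreach(location_path, headers: true) do |row|
    location_ids << row['geoname_id']
  end

  # Pass 2: stream the large file row by row; nothing is read in full.
  CSV.open(output_path, 'wb', write_headers: true,
           headers: %w[geoname_id Y Z]) do |out|
    CSV.foreach(block_path, headers: true) do |row|
      next unless location_ids.include?(row['geoname_id'])
      out << [row['geoname_id'], row['Y'], row['Z']]
    end
  end
end
```

Calling join_blocks(block_path, location_path, output_path) writes each matching block row once, which matches the effect of the break in the original loop.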

Compressing using Bzip2 on-the-fly to a file?

There is a program that generates huge CSV files. For example:
arr = (0..10).to_a
CSV.open("foo.csv", "wb") do |csv|
(2**16).times { csv << arr }
end
It generates a big file, so I want it to be compressed on the fly: instead of writing an uncompressed CSV file (foo.csv), it should write a bzip-compressed CSV file (foo.csv.bzip).
I have an example from the "ruby-bzip2" gem:
writer = Bzip2::Writer.new File.open('file')
writer << 'data1'
writer.close
I am not sure how to combine the Bzip2 writer with the CSV one.
You can also construct a CSV object with an IO or something sufficiently like an IO, such as a Bzip2::Writer.
For example
File.open('file.bz2', 'wb') do |f|
writer = Bzip2::Writer.new f
CSV(writer) do |csv|
(2**16).times { csv << arr }
end
writer.close
end
Maybe it would be more flexible to write the CSV data to stdout:
# csv.rb
require 'csv'
$stdout.sync = true
arr = (0..10).to_a
(2**16).times do
puts arr.to_csv
end
... and pipe the output to bzip2:
$ ruby csv.rb | bzip2 > foo.csv.bz2
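The same composition should work with any writer that behaves enough like an IO. As a rough illustration that runs without extra gems, the sketch below uses Zlib::GzipWriter from the standard library as a stand-in for Bzip2::Writer. Note that this produces gzip output, not bzip2, and that the output path is hypothetical.

```ruby
require 'csv'
require 'zlib'
require 'tmpdir'

arr  = (0..10).to_a
path = File.join(Dir.mktmpdir, 'foo.csv.gz')   # hypothetical output path

# Zlib::GzipWriter responds to <<, which is what CSV needs from its target.
Zlib::GzipWriter.open(path) do |gz|
  csv = CSV.new(gz)
  (2**16).times { csv << arr }
end

# Read the first line back to check the round trip.
first_line = Zlib::GzipReader.open(path) { |gz| gz.gets }
puts first_line
```

Rows are compressed as they are written, so the uncompressed CSV never exists on disk.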

Removing whitespaces in a CSV file

I have a string with extra whitespace:
First,Last,Email ,Mobile Phone ,Company,Title ,Street,City,State,Zip,Country, Birthday,Gender ,Contact Type
I want to parse this line and remove the whitespaces.
My code looks like:
namespace :db do
task :populate_contacts_csv => :environment do
require 'csv'
csv_text = File.read('file_upload_example.csv')
csv = CSV.parse(csv_text, :headers => true)
csv.each do |row|
puts "First Name: #{row['First']} \nLast Name: #{row['Last']} \nEmail: #{row['Email']}"
end
end
end
@prices = CSV.parse(IO.read('prices.csv'), :headers=>true,
:header_converters=> lambda {|f| f.strip},
:converters=> lambda {|f| f ? f.strip : nil})
The nil test is in the field converter but not in the header converter, on the assumption that headers are never nil while data fields might be, and nil doesn't have a strip method. I'm really surprised that, AFAIK, :strip is not a predefined converter!
You can strip your hash first:
csv.each do |unstripped_row|
  row = {}
  # Empty fields come back as nil, so strip a value only when it is present.
  unstripped_row.each { |k, v| row[k.strip] = v ? v.strip : nil }
  puts "First Name: #{row['First']} \nLast Name: #{row['Last']} \nEmail: #{row['Email']}"
end
CSV supports "converters" for the headers and fields, which let you get inside the data before it's passed to your each loop.
Writing a sample CSV file:
csv = "First,Last,Email ,Mobile Phone ,Company,Title ,Street,City,State,Zip,Country, Birthday,Gender ,Contact Type
first,last,email ,mobile phone ,company,title ,street,city,state,zip,country, birthday,gender ,contact type
"
File.write('file_upload_example.csv', csv)
Here's how I'd do it:
require 'csv'
csv = CSV.open('file_upload_example.csv', :headers => true)
[:convert, :header_convert].each { |c| csv.send(c) { |f| f.strip } }
csv.each do |row|
puts "First Name: #{row['First']} \nLast Name: #{row['Last']} \nEmail: #{row['Email']}"
end
Which outputs:
First Name: first
Last Name: last
Email: email
The converters simply strip leading and trailing whitespace from each header and each field as they're read from the file.
Also, as a programming design choice, don't read your file into memory using:
csv_text = File.read('file_upload_example.csv')
Then parse it:
csv = CSV.parse(csv_text, :headers => true)
Then loop over it:
csv.each do |row|
Ruby's IO system supports enumerating over a file line by line. Once my code calls CSV.open, the file is readable and each reads one line at a time. The entire file never needs to be in memory at once; slurping it isn't scalable (though on new machines it's becoming a lot more reasonable). If you test, you'll find that reading a file using each is extremely fast, probably as fast as reading it, parsing it, and then iterating over the parsed result.
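Both ideas, converters and line-by-line reading, can be combined with CSV.foreach. Here is a minimal sketch, assuming a small sample file written to a temporary directory; the file name, the sample values and the strip lambda are illustrative only.

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical sample file with padded headers and fields.
path = File.join(Dir.mktmpdir, 'contacts.csv')
File.write(path, "First,Last,Email ,Mobile Phone \n ann ,lee, ann@example.com ,\n")

# Strip whitespace when a value is present; leave nil (empty) fields alone.
strip = ->(field) { field.respond_to?(:strip) ? field.strip : field }

rows = []
# CSV.foreach reads one row at a time instead of loading the whole file.
CSV.foreach(path, headers: true,
            header_converters: strip, converters: strip) do |row|
  rows << row.to_h
end

p rows
```

The respond_to? check keeps the converter safe for empty fields, which arrive as nil.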
