Ruby CSV headers not in the first line

I would like to read a CSV file with the headers: true option, but the first 5 lines of my file contain unwanted data. I want line 6 to be the header and to start reading the file at line 6.
But when I read the file with CSV.readlines("my_file.csv", headers: true).drop(5),
it still uses line 1 as the header. How can I set line 6 as the header?

drop(5) only runs after CSV has already parsed line 1 as the header, so read past the unwanted lines before CSV ever sees them:
require 'csv'
File.open("my_file.csv") do |f|
5.times { f.gets }
csv = CSV.new(f, headers: true)
puts csv.shift.inspect
end
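A minimal sketch generalizing this answer into a reusable helper, assuming the junk lines are ordinary newline-terminated lines. The method name `csv_after_junk` and its `skip` parameter are hypothetical; because it accepts any IO, a StringIO stands in for the file here.

```ruby
require 'csv'
require 'stringio'

# Hypothetical helper: discard `skip` raw lines, then let CSV treat the next line as the header.
def csv_after_junk(io, skip)
  skip.times { io.gets }       # consume the unwanted lines without parsing them as CSV
  CSV.new(io, headers: true)   # the first line CSV reads now becomes the header row
end

io = StringIO.new("junk 1\njunk 2\nname,age\nAlice,30\nBob,25\n")
rows = csv_after_junk(io, 2).to_a
rows.each { |row| puts "#{row['name']} is #{row['age']}" }
```

With a real file, pass the File object from File.open instead of the StringIO, exactly as in the answer above.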

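Here is my solution:
require 'csv'
# Line 6 (index 5) holds the real header.
my_header = CSV.readlines("my_file.csv").drop(5).first
# With an explicit header array every line is data, so skip the 5 junk lines plus the header line.
CSV.readlines("my_file.csv", headers: my_header).drop(6).each do |row|
  # do something with row
end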
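The solution above reads the file twice. A minimal single-read sketch under the same assumption (the header is on line 6); the helper name `rows_with_header_at` is hypothetical. It parses everything once, then wraps each data row in a CSV::Row keyed by the header line. It assumes the junk lines are themselves parseable as CSV (no stray quotes) and that the file fits in memory; for a file, pass File.read("my_file.csv").

```ruby
require 'csv'

# Hypothetical helper: use line `header_index` (0-based) as the header row.
def rows_with_header_at(text, header_index)
  lines  = CSV.parse(text)   # array of field arrays, no header handling yet
  header = lines[header_index]
  lines.drop(header_index + 1).map { |fields| CSV::Row.new(header, fields) }
end

text = "junk\njunk\njunk\njunk\njunk\nid,name\n1,Ann\n2,Ben\n"
rows_with_header_at(text, 5).each { |row| puts row['name'] }
```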

Related

Ruby/Rake: Why isn't the CSV file open for reading?

I want to drop the top two rows from a CSV file and add my own header. I have wrapped this in a rake task.
task :fix_csv do
# copy to temp file
cp ENV['source'], TMP_FILE
# drop header rows
table = CSV.table(TMP_FILE)
File.open(TMP_FILE, 'w') do |f|
f.write(table.drop(2).to_csv)
end
# add new header
CSV.open(TMP_FILE, 'w', force_quotes: true) do |csv|
csv << HEADERS if csv.count.eql? 0
end
puts 'Done!'
end
However, this fails with an error:
rake aborted!
IOError: not opened for reading
../rakefile.rb:54:in `count'
Line 54 is:
csv << HEADERS if csv.count.eql? 0
Why can't it read the file? Do I need to explicitly close the file after I've removed the first two rows?
The second time, you open the file for writing only ('w'), but then you try to read its content by querying the row count with csv.count:
# ⇓⇓⇓
CSV.open(TMP_FILE, 'w', force_quotes: true) do |csv|
# ⇓⇓⇓⇓⇓
csv << HEADERS if csv.count.eql? 0
end
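While that is easy to fix, is there anything wrong with dropping CSV altogether in favor of something like this?
old = File.readlines(FILE_NAME).drop(2)
old.unshift(HEADERS.join(',') + "\n")   # File.readlines keeps line endings, so give the header one too
File.write(FILE_NAME, old.join)         # join the lines back into a single String before writing
Note that, unlike force_quotes: true, this does not quote the header fields.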
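For comparison, a minimal sketch of the CSV-based fix, assuming the whole file fits in memory: read and drop the rows first, and only then reopen the file for writing, so nothing tries to read from a write-only handle. The helper name `replace_leading_rows` is hypothetical, and the question's HEADERS constant is passed in as `headers`.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper: drop the first `drop` rows of a CSV file and write `headers` in their place.
def replace_leading_rows(path, headers, drop: 2)
  rows = CSV.read(path).drop(drop)                   # read everything; this handle is closed afterwards
  CSV.open(path, 'w', force_quotes: true) do |csv|   # write-only handle: no count/each in here
    csv << headers
    rows.each { |row| csv << row }
  end
end

Tempfile.create(['demo', '.csv']) do |f|
  f.write("Report title\nGenerated today\n1,Ann\n2,Ben\n")
  f.close
  replace_leading_rows(f.path, %w[id name])
  puts File.read(f.path)
end
```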

Adding Headers to a created CSV file in Ruby - keep getting errors

I've been trying to use Ruby to create a CSV file from json data. I was able to create the file, but I need to add a few headers. I tried following suggestions and answers from similar questions posted here on Stack Overflow, but I keep getting errors. Can anyone give me some pointers?
Here's my code.
require 'csv'
require 'json'
CSV.open("your_csv.csv", "w") do |csv|
JSON.parse(File.open("tojson.txt").read).each do |hash|
csv << hash.values
#csv.each { |line| line['New_header'] = line[0].to_i + line[1].to_i }
end
end
And here is the error I'm getting:
Anyone have any suggestions?
This is not how you add headers to a CSV file. When you generate CSV content, a header row is just a regular row, and it should be written as such. Example:
require 'csv'
require 'json'
CSV.open("your_csv.csv", "w") do |csv|
  csv << ['new_header', 'value1', 'value2'] # the header row
  JSON.parse(File.read("tojson.txt")).each do |hash|
    # Assumes each JSON object has keys named like the headers above.
    row = hash.values_at('new_header', 'value1', 'value2')
    csv << row
  end
end
You don't have a #csv variable. You have a csv one.
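The "header row is just the first row" idea can be sketched end to end, assuming the JSON is an array of flat objects that all share the same keys; the helper name `json_to_csv` is hypothetical. Taking the headers from the first object's keys and using values_at keeps every row aligned with the header even if keys appear in a different order.

```ruby
require 'csv'
require 'json'

# Hypothetical helper. Assumes an array of flat JSON objects sharing the same keys.
def json_to_csv(json_text)
  records = JSON.parse(json_text)
  headers = records.empty? ? [] : records.first.keys
  CSV.generate do |csv|
    csv << headers                                           # the header row is written first
    records.each { |hash| csv << hash.values_at(*headers) }  # values in header order
  end
end

puts json_to_csv('[{"id": 1, "name": "Ann"}, {"name": "Ben", "id": 2}]')
```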

How to skip the first line of a CSV file and make the second line the header

Is there a way to skip the first line of a CSV file and make the second line act as the header?
I have a CSV file that has the date on the first row and the headers on the second row, so I need to be able to skip the first row when iterating over it. I tried using slice but that converts the CSV to an array and I really want to read it as CSV so I can take advantage of headers.
Depending on your data, you may use another approach with the skip_lines option.
This example skips all lines that start with #:
require 'csv'
CSV.parse(DATA.read,:col_sep=>';',:headers=>true,
:skip_lines=> /^#/ #Mark comments!
) do |row|
p row
end
#~
__END__
#~ Comment
#~ More comment
a;b;c;d
1;2;3;4
#~ More comment
1;2;3;4
#~ More comment
1;2;3;4
The result is:
#<CSV::Row "a":"1" "b":"2" "c":"3" "d":"4">
#<CSV::Row "a":"1" "b":"2" "c":"3" "d":"4">
#<CSV::Row "a":"1" "b":"2" "c":"3" "d":"4">
In your case the CSV contains a date, so you may use:
require 'csv'
CSV.parse(DATA.read,:col_sep=>';',:headers=>true,
:skip_lines=> /^\d\d\d\d-\d\d-\d\d$/ #Skip line with date only
) do |row|
p row
end
#~
__END__
2016-03-19
a;b;c;d
1;2;3;4
1;2;3;4
1;2;3;4
Or you could match a more distinctive start of line:
require 'csv'
CSV.parse(DATA.read,:col_sep=>';',:headers=>true,
:skip_lines=> /^Created by/ #Skip the line starting with "Created by"
) do |row|
p row
end
__END__
Created by test.rb on 2016-03-19
a;b;c;d
1;2;3;4
1;2;3;4
1;2;3;4
I don't think there's an elegant way of doing it, but it can be done:
require "csv"
# Create a stream using the original file.
# Don't use `textmode` since it generates a problem when using this approach.
file = File.open "file.csv"
# Consume the first CSV row.
# `\r` is my row separator character. Verify your file to see if it's the same one.
loop { break if file.readchar == "\r" }
# Create your CSV object using the remainder of the stream.
csv = CSV.new file, headers: true
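The character-by-character loop can be replaced with IO#gets, which accepts a separator and reads through its first occurrence. A minimal sketch, using a StringIO in place of the file and assuming "\r" line endings as in the answer; passing row_sep explicitly avoids relying on auto-detection.

```ruby
require 'csv'
require 'stringio'

# Assumption: "\r" is the row separator, as in the answer above; adjust for "\n" or "\r\n".
ROW_SEP = "\r"

io = StringIO.new("2016-03-19\ra,b\r1,2\r3,4\r")
io.gets(ROW_SEP)                                    # consume the first line, separator included
csv = CSV.new(io, headers: true, row_sep: ROW_SEP)  # "a,b" is now read as the header
csv.each { |row| puts row.to_h.inspect }
```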
You can also do this:
text = File.readlines("file.csv")[1..-1].join()
csv = CSV.parse(text, headers: true)
I had the same problem (except I wanted to skip more than 1 line at the beginning) and came across this question while looking for a nice solution. For my case, I went with the code described in this answer to a similar question, except that I am also utilizing the headers option as you mentioned you wanted to do.
CSV.parse(File.readlines(path).drop(1).join, headers: true) do |row|
# ... now I can use: row['column_name']
end
For posterity: sometimes the leading lines are present but contain only empty values (a row like ,,,,,,,,,, before the headers). You can skip those like this:
require 'csv'
CSV.parse(content, headers: true, skip_lines: /^(\s*,\s*)*$/)
This works no matter how many empty-valued rows precede the headers. However, it also skips any empty rows after the headers, so double-check that this is what you want.
P.S.: Replace the comma (,) in the regex if your file uses a different separator character.
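A small self-contained check of both effects described above (leading empty rows skipped, and empty rows after the header skipped too), using an inline string instead of a file; the sample data is made up for illustration.

```ruby
require 'csv'

content = ",,,\n , ,,\nname,qty\napple,3\n,,\npear,5\n"

# Skips every line made up only of commas and whitespace, before *and* after the header.
table = CSV.parse(content, headers: true, skip_lines: /^(\s*,\s*)*$/)
table.each { |row| puts "#{row['name']}: #{row['qty']}" }
```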
This simple code worked for me. It reads a CSV file and treats its first line as the header (field names) rather than as data:
CSV.foreach(File.join(File.dirname(__FILE__), filepath), headers: true) do |row|
puts row.inspect
end
You can do whatever you want with row. Don't forget to pass headers: true.

Ruby CSV parsing string with escaped quotes

I have a line in my CSV file that has some escaped quotes:
173,"Yukihiro \"The Ruby Guy\" Matsumoto","Japan"
When I try to parse it with the Ruby CSV parser:
require 'csv'
CSV.foreach('my.csv', headers: true, header_converters: :symbol) do |row|
puts row
end
I get this error:
.../1.9.3-p327/lib/ruby/1.9.1/csv.rb:1914:in `block (2 levels) in shift': Missing or stray quote in line 122 (CSV::MalformedCSVError)
How can I get around this error?
The \" escape is a common Unix convention, whereas Ruby's CSV parser expects a quote inside a quoted field to be doubled ("").
To parse it:
require 'csv'
text = File.read('test.csv').gsub(/\\"/,'""')
CSV.parse(text, headers: true, header_converters: :symbol) do |row|
puts row
end
Note: this reads the entire file into memory, which can use a lot of RAM if your CSV file is very large. Consider reading the file one line at a time instead.
Note: if your CSV file may contain escaped backslashes (a backslash in front of a backslash), use Andrew Grimm's suggestion:
gsub(/(?<!\\)\\"/,'""')
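Putting the two notes together, a minimal sketch that applies the lookbehind substitution to the sample line from the question before parsing; the helper name `parse_backslash_escaped` is hypothetical.

```ruby
require 'csv'

# Hypothetical helper: convert \" escapes to "" before handing the text to CSV.
def parse_backslash_escaped(text)
  # The lookbehind skips a \" whose backslash is itself preceded by a backslash.
  fixed = text.gsub(/(?<!\\)\\"/, '""')
  CSV.parse(fixed, headers: true, header_converters: :symbol)
end

text = <<~'TEXT'
  ID,Name,Country
  173,"Yukihiro \"The Ruby Guy\" Matsumoto","Japan"
TEXT

parse_backslash_escaped(text).each { |row| puts row[:name] }  # prints: Yukihiro "The Ruby Guy" Matsumoto
```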
CSV supports "converters", which we can normally use to massage the content of a field before it's passed back to our code. For instance, that can be used to strip extra spaces on all fields in a row.
Unfortunately, the converters fire off after the line is split into fields, and it's during that step that CSV is getting mad about the embedded quotes, so we have to get between the "line read" step, and the "parse the line into fields" step.
This is my sample CSV file:
ID,Name,Country
173,"Yukihiro \"The Ruby Guy\" Matsumoto","Japan"
Keeping the line-by-line foreach style of your code, here is my example for parsing it without CSV getting mad:
require 'csv'
require 'pp'
header = []
File.foreach('test.csv') do |csv_line|
row = CSV.parse(csv_line.gsub('\"', '""')).first
if header.empty?
header = row.map(&:to_sym)
next
end
row = Hash[header.zip(row)]
pp row
puts row[:Name]
end
And the resulting hash and name value:
{:ID=>"173", :Name=>"Yukihiro \"The Ruby Guy\" Matsumoto", :Country=>"Japan"}
Yukihiro "The Ruby Guy" Matsumoto
I assumed you wanted a hash back because you specified the :headers option:
CSV.foreach('my.csv', headers: true, header_converters: :symbol) do |row|
Open the file in Microsoft Excel and save it as MS-DOS Comma Separated (.csv).

Ruby - remove columns from csv file and convert to pipe delimited txt file

I'm trying to take a CSV file, strip a few columns, and then output a pipe delimited text file.
Here's my code, which almost works. The only problem is that the CSV.generate block adds double quotes around the whole thing, as well as a stray comma wrapped in double quotes where the line break is.
require 'csv'
original = CSV.read('original.csv', { headers: true, return_headers: true })
original.delete('Column header 1')
original.delete('Column header 2')
original.delete('Column header 3')
csv_string = CSV.generate do |csv|
csv << original
end
pipe_string = csv_string.tr(",","|")
File.open('output.txt', 'w+') do |f|
f.write(pipe_string)
end
Is there a better way to do this? Any help is appreciated.
Try this:
require 'csv'
original = CSV.read('original.csv', headers: true, return_headers: true)
original.delete('Column header 1')
original.delete('Column header 2')
original.delete('Column header 3')
CSV.open('output.txt', 'w', col_sep: '|') do |csv|
original.each do |row|
csv << row
end
end
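A minimal string-in, string-out sketch of the same approach that is easy to check without files; the helper name and sample data are hypothetical. Because the separator is set through col_sep instead of replacing commas afterwards with tr, commas inside field values stay intact rather than turning into pipes.

```ruby
require 'csv'

# Hypothetical helper: remove the named columns and emit pipe-delimited text.
def to_pipe_delimited(csv_text, columns_to_drop)
  table = CSV.parse(csv_text, headers: true)
  columns_to_drop.each { |name| table.delete(name) }  # a String argument deletes a whole column
  CSV.generate(col_sep: '|') do |out|
    out << table.headers
    table.each { |row| out << row.fields }
  end
end

puts to_pipe_delimited("id,secret,name\n1,x,\"Smith, Ann\"\n", ['secret'])
```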
