Open CSV without reading header rows in Ruby - ruby

I'm opening CSV using Ruby:
CSV.foreach(file_name, "r+") do |row|
next if row[0] == 'id'
update_row! row
end
and I don't really care about the header row.
I don't like next if row[0] == 'id' inside the loop. Is there any way to tell CSV to skip the header row and just iterate through the rows with data?
I assume provided CSVs always have a header row.

There are a few ways you could handle this. The simplest method would be to pass the {headers: true} option to your loop:
CSV.foreach(file_name, headers: true) do |row|
update_row! row
end
Notice that no mode is specified: according to the documentation, CSV::foreach takes only the file and an options hash as its arguments (as opposed to, say, CSV::open, which does allow one to specify a mode).
Alternatively, you could read the data into an array (rather than using foreach), and shift the array before iterating over it:
my_csv = CSV.read(file_name)
my_csv.shift
my_csv.each do |row|
update_row! row
end
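As a minimal sketch of the headers: true approach, the snippet below writes a small hypothetical file (the id and name columns are made up for illustration) and shows that the header line never reaches the block. Each row arrives as a CSV::Row that can be indexed by header name.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical sample data standing in for the asker's file.
sample = Tempfile.new(['sample', '.csv'])
sample.write("id,name\n1,apple\n2,pear\n")
sample.close

ids = []
CSV.foreach(sample.path, headers: true) do |row|
  # The header line is consumed by CSV, so only data rows arrive here.
  # Each row is a CSV::Row and can be indexed by header name.
  ids << row['id']
end

p ids  # => ["1", "2"]
```

With this option the check for row[0] == 'id' is no longer needed inside the loop.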

According to Ruby doc:
options = {:headers=>true}
CSV.foreach(file_name, **options) ...
should suffice.

A simple thing to do that works when reading files line-by-line is:
CSV.foreach(file_name, "r+") do |row|
next if $. == 1
update_row! row
end
$. is a global variable in Ruby that holds the current line number of the last file that was read.

Related

Ruby: How to iterate through a hash created from a csv file

I am trying to take an existing CSV file, add a fourth column to it, and then iterate through the second and third columns to create the fourth column's values. Using Ruby I've created hashes where the headers are the keys and the column values are the hash values (ex: "id"=>"1", "new_fruit" => "apple")
My practice CSV file looks like this: [image: practice CSV file]
My goal is to create a fourth column, "brand_new" (which I was able to do), and then add values to it by concatenating the values from the second and third columns (which I am stuck on). At the moment I just have a placeholder value (x) for the fourth column's values so I could see if adding the fourth column to the hash actually worked: [image: results with x = 1]
Here is my code:
require 'csv'
def self.import
table = []
CSV.foreach(File.path("practice.csv"), headers: true) do |row|
table.each do |row|
row["brand_new"] = full_name
end
table << row.to_h
end
table
end
def full_name
x = 1
return x
end
# Add another col, row by row:
import.each do |row|
row["brand_new"] = full_name
end
puts import
Any suggestions or guidance would be much appreciated. Thank you.
I simplified your code a bit. I read the file first, then iterate over the content that was read.
Note: Change col_sep to comma or delete it to use the default if needed.
require "csv"
def self.import
table = CSV.read("practice.csv", headers: true, col_sep: ";")
table.each do |row|
row["brand_new"] = "#{row["old_fruit"]} #{row["new_fruit"]}"
end
puts table
end
I use the read method to read the CSV file content. It allows you to directly access the column/cell values.
The line inside the each block shows how to concatenate the column values as a string:
"#{row["old_fruit"]} #{row["new_fruit"]}"
Refer to this old SO post and the CSV Ruby docs to learn more about working with CSV files.
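A self-contained sketch of the same idea, using an in-memory string instead of practice.csv. The data values are hypothetical; the column names old_fruit and new_fruit mirror the ones used in the answer above, and a comma separator is assumed.

```ruby
require 'csv'

# Hypothetical data; only the header names come from the answer above.
data = "id,old_fruit,new_fruit\n1,apple,pear\n2,kiwi,plum\n"

table = CSV.parse(data, headers: true)
table.each do |row|
  # Assigning to a header that does not exist yet appends a new field to the row.
  row['brand_new'] = "#{row['old_fruit']} #{row['new_fruit']}"
end

# to_csv serializes the table, including the added column, back to CSV text.
csv_text = table.to_csv
puts csv_text
```

The resulting text could be written back to disk with File.write if the updated file needs to be saved.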

Best way to parse large CSV files in ruby

What is the best way to parse a large CSV file in Ruby? My CSV file is almost 1 GB. I want to filter the data in the CSV according to some conditions.
You don't specifically say, but I think most people commenting feel this is likely to be a homework question. If so you should read "How do I ask and answer homework questions?". If not read "How do I ask a good question?".
As G4143 stated in the comment Ruby has an excellent CSV class which should fit your needs.
Here are a couple of quick examples using foreach, which the documentation describes as being intended as the primary method for reading CSV files. The method reads one line at a time from the file, so it should work well with large files. Here is a basic example of how you might filter out a subset of CSV records using it, but I would encourage you to read the CSV class documentation and follow up with more specific questions, showing what you have tried so far, if you have trouble.
The basic idea is to start with an empty array, use foreach to get each row, and if that row meets your filtering criteria, add it to the initially empty filtered results array.
test.csv:
a, b, c
1,2,3
4,5,6
require 'csv'
filtered = []
CSV.foreach("test.csv") do |row|
filtered << row if row[0] == "1"
end
filtered
=> [["1", "2", "3"]]
In the case where the first line of the file is a "header" you can pass in an option to treat it as such:
require 'csv'
filtered = []
CSV.foreach("test.csv", :headers => true) do |row|
filtered << row if row["a"] == "1"
end
filtered
=> [#<CSV::Row "a":"1" " b":"2" " c":"3">]
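For a file close to 1 GB, the filtered array itself may grow large. One possible variation, sketched here under the assumption that the matching rows should end up in another CSV file, is to write each match straight to the output instead of collecting it. The sample data and the output path are hypothetical.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical input standing in for a large file.
input = Tempfile.new(['input', '.csv'])
input.write("a,b,c\n1,2,3\n4,5,6\n1,8,9\n")
input.close
output_path = "#{input.path}.filtered"

CSV.open(output_path, 'w') do |out|
  out << %w[a b c]  # write the header row once
  CSV.foreach(input.path, headers: true) do |row|
    # Only one row is held in memory at a time; matches go directly to disk.
    out << row.fields if row['a'] == '1'
  end
end

puts File.read(output_path)
```

Memory use then depends on the size of a single row rather than on the number of matches.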

Fetching second row from csv file in Ruby [duplicate]

actual_row = File.open(file_name[0], 'r')
first_row_data = []
CSV.foreach(actual_row) do |row|
first_row_data << row[1]
end
puts first_row_data
With this I am trying to fetch the second row of CSV but it is printing the second column instead.
The foreach method returns an enumerator if no block is given, which allows you to use methods such as drop from Enumerable:
# outputs all rows after the first
CSV.foreach('test.csv').drop(1).each { |row| puts row.inspect }
To limit to just one row, we can then take:
# outputs only the second row
CSV.foreach('test.csv').drop(1).take(1).each { |row| puts row.inspect }
But, we're still parsing the entire file and just discarding most of it. Luckily, we can add lazy into the mix:
# outputs only the second row, parsing only the first 2 rows of the file
CSV.foreach('test.csv').lazy.drop(1).take(1).each { |row| puts row.inspect }
But, if the first row is a header row, don't forget you can tell CSV about it:
# outputs only the second row, as a CSV::Row, only parses 2 rows
CSV.foreach('test.csv', headers: true).take(1).each { |row| puts row.inspect }
As an aside (in case I did this wrong), it looks like the shift method is what CSV is using for parsing the rows, so I just added:
class CSV
alias :orig_shift :shift
def shift
$stdout.puts "shifting row"
orig_shift
end
end
and ran with a sample csv to see how many times "shifting row" was output for each of the examples.
If you'd like the entire row, you should change
row[1]
to just
row
row[1] is grabbing the second column's value of the entire row. Each column value is stored sequentially in the row variable. You can see this directly in your console if you print
puts row.inspect
If you want just the second row, you can try something like this:
actual_row = File.open(file_name[0], 'r')
first_row_data = []
CSV.foreach(actual_row) do |row|
if $. == 2
first_row_data << row
end
end
puts first_row_data
You can learn more about $. and similar variables here: https://docs.ruby-lang.org/en/2.4.0/globals_rdoc.html
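If relying on a global variable feels fragile, a possible alternative is to count rows explicitly with each_with_index. This is only a sketch with hypothetical sample data; it assumes the file has no header option set, so the first row counts as index 0.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical sample file.
sample = Tempfile.new(['rows', '.csv'])
sample.write("a,b\n1,2\n3,4\n")
sample.close

second_row = nil
CSV.foreach(sample.path).each_with_index do |row, index|
  if index == 1        # index is zero-based, so 1 is the second row
    second_row = row
    break              # stop reading once the row has been found
  end
end

p second_row  # => ["1", "2"]
```

The break keeps the rest of the file from being parsed once the wanted row has been seen.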

Best way of Parsing 2 CSV files and printing the common values in a third file

I am new to Ruby, and I have been struggling with a problem that I suspect has a simple answer. I have two CSV files, one with two columns, and one with a single column. The single column is a subset of values that exist in one column of my first file. Example:
file1.csv:
abc,123
def,456
ghi,789
jkl,012
file2.csv:
def
jkl
All I need to do is look up the column 2 value in file1 for each value in file2 and output the results to a separate file. So in this case, my output file should consist of:
456
012
I’ve got it working this way:
pairs=IO.readlines("file1.csv").map { |columns| columns.split(',') }
f1 =[]
pairs.each do |x| f1.push(x[0]) end
f2 = IO.readlines("file2.csv").map(&:chomp)
collection={}
pairs.each do |x| collection[x[0]]=x[1] end
f=File.open("outputfile.txt","w")
f2.each do |col1,col2| f.puts collection[col1] end
f.close
...but there has to be a better way. If anyone has a more elegant solution, I'd be very appreciative! (I should also note that I will eventually need to run this on files with millions of lines, so speed will be an issue.)
To be as memory efficient as possible, I'd suggest reading only file2 (which I gather is the smaller of the two input files) fully into memory. I'm using a hash for fast lookups and to store the resulting values, so as you read through file1 you only store the values for the keys you need. You could go one step further and write the output file while reading file1.
require 'csv'
# Read file 2, the smaller file, and store keys in result Hash
result = {}
CSV.foreach("file2.csv") do |row|
result[row[0]] = false
end
# Read file 1, the larger file, and look for keys in result Hash to set values
CSV.foreach("file1.csv") do |row|
result[row[0]] = row[1] if result.key? row[0]
end
# Write the results
File.open("outputfile.txt", "w") do |f|
result.each do |key, value|
f.puts value if value
end
end
Tested with Ruby 1.9.3
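The "one step further" idea, writing the output while streaming the larger file, might look like the sketch below. It uses a Set for the keys instead of a Hash, which is a substitution of mine rather than part of the answer, and it recreates the sample files from the question in a temporary directory.

```ruby
require 'csv'
require 'set'
require 'tmpdir'

# Recreate the sample files from the question in a temporary directory.
dir = Dir.mktmpdir
file1 = File.join(dir, 'file1.csv')
file2 = File.join(dir, 'file2.csv')
out_path = File.join(dir, 'outputfile.txt')
File.write(file1, "abc,123\ndef,456\nghi,789\njkl,012\n")
File.write(file2, "def\njkl\n")

# Only the keys from the smaller file are kept in memory.
wanted = Set.new
CSV.foreach(file2) { |row| wanted << row[0] }

# Stream the larger file and write each match as soon as it is seen.
File.open(out_path, 'w') do |f|
  CSV.foreach(file1) do |row|
    f.puts row[1] if wanted.include?(row[0])
  end
end

puts File.read(out_path)
```

Note that the output follows the order of file1, not file2; for the sample data the two orders happen to agree.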
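Parsing For File 1
require 'csv'
# :headers => true assumes each file starts with a header row;
# drop the option for headerless files like the samples in the question.
data_csv_file1 = File.read("file1.csv")
data_csv1 = CSV.parse(data_csv_file1, :headers => true)
Parsing For File 2
data_csv_file2 = File.read("file2.csv")
data_csv2 = CSV.parse(data_csv_file2, :headers => true)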
Collection of names
names_from_sheet1 = data_csv1.collect {|data| data[0]} #returns an array of names
names_from_sheet2 = data_csv2.collect {|data| data[0]} #returns an array of names
common_names = names_from_sheet1 & names_from_sheet2 #array with common names
Collecting results to be printed
results = [] #this will store the values to be printed
data_csv1.each {|data| results << data[1] if common_names.include?(data[0]) }
Final output
f = File.open("outputfile.txt","w")
results.each {|result| f.puts result }
f.close

Opening/using a table up in Ruby

I have a simple tab-separated text file. I want Ruby to read every value in the second column and write out a text file containing each of those values along with another number. How might I go about doing this (probably using some kind of loop)?
Thanks
File.open("output.txt", "w") do |output_file|
File.open("input.txt") do |input_file|
input_file.each_line do |line|
values = line.chomp.split("\t") # chomp so a trailing newline does not end up in the last column
output_file.puts "#{values[1]} anothervalue"
end
end
end
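As an alternative sketch, the CSV library can parse tab-separated input when given col_sep: "\t". The sample rows and the output path below are hypothetical, and "anothervalue" is the same placeholder used in the answer above.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical tab-separated input.
input = Tempfile.new(['input', '.txt'])
input.write("x\tfoo\ny\tbar\n")
input.close
output_path = "#{input.path}.out"

File.open(output_path, 'w') do |output_file|
  # col_sep: "\t" tells CSV to split fields on tabs instead of commas.
  CSV.foreach(input.path, col_sep: "\t") do |row|
    output_file.puts "#{row[1]} anothervalue"
  end
end

puts File.read(output_path)
```

CSV still treats double quotes as quoting characters, so data containing stray quotes may raise a parse error; in that case the plain split approach above may be the simpler choice.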

Resources