Count the number of elements in a CSV row in Ruby

I have the following piece of code to process a CSV:
require 'csv'
line_count = 0
CSV.foreach("./matrix1.csv") do |row|
  puts "Row is: "
  print row
  line_count += 1
end
This successfully counts the number of lines in the CSV. However, how can I find the number of CSV elements in one line (row)?
For example, I have the following CSV
1,2,3,4,5
How can I see that the number of elements is 5?

If each line contains the same number of elements, then:
CSV.open('test.csv', 'r') { |csv| puts csv.first.length }
If not, then count for each line:
CSV.foreach('test.csv', 'r') { |row| puts row.length }

row.size
worked, as suggested by @Stefan.
A more robust approach, when rows have different lengths, is to take the length of the longest row:
CSV.foreach('test.csv','r').max_by(&:length).length
Another way is to get the size of each line as it is read:
CSV.foreach('test.csv', 'r') { |row| puts row.length }
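To get the element count of every row in a single pass, alongside the line count the question already computes, a minimal sketch is to map each parsed row to its size. The method name row_lengths is hypothetical, and it assumes the default CSV options (comma separator, no header row).

```ruby
require 'csv'

# Hypothetical helper: returns an array holding the number of fields in each
# row of the file. Assumes default CSV options (comma separator, no header row).
def row_lengths(path)
  CSV.foreach(path).map(&:size)
end

# Example usage: print each row's element count, numbering rows from 1.
# row_lengths('./matrix1.csv').each.with_index(1) do |size, n|
#   puts "Row #{n} has #{size} elements"
# end
```

For a file containing only the row 1,2,3,4,5, row_lengths returns [5]; the length of the returned array is the number of lines, so it also replaces the manual line_count.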


Fetching second row from csv file in Ruby
actual_row = File.open(file_name[0], 'r')
first_row_data = []
CSV.foreach(actual_row) do |row|
  first_row_data << row[1]
end
puts first_row_data
With this I am trying to fetch the second row of the CSV, but it prints the second column instead.
The foreach method returns an enumerator if no block is given, which allows you to use methods such as drop from Enumerable:
# outputs all rows after the first
CSV.foreach('test.csv').drop(1).each { |row| puts row.inspect }
To limit it to just one row, we can then use take:
# outputs only the second row
CSV.foreach('test.csv').drop(1).take(1).each { |row| puts row.inspect }
But, we're still parsing the entire file and just discarding most of it. Luckily, we can add lazy into the mix:
# outputs only the second row, parsing only the first 2 rows of the file
CSV.foreach('test.csv').lazy.drop(1).take(1).each { |row| puts row.inspect }
But if the first row is a header row, don't forget you can tell CSV about it:
# outputs only the second row, as a CSV::Row, only parses 2 rows
CSV.foreach('test.csv', headers: true).take(1).each { |row| puts row.inspect }
As an aside (in case I did this wrong), it looks like the shift method is what CSV is using for parsing the rows, so I just added:
class CSV
  alias :orig_shift :shift
  def shift
    $stdout.puts "shifting row"
    orig_shift
  end
end
and ran with a sample csv to see how many times "shifting row" was output for each of the examples.
If you'd like the entire row, you should change
row[1]
to just
row
row[1] is grabbing the second column's value of the entire row. Each column value is stored sequentially in the row variable. You can see this directly in your console if you print
puts row.inspect
If you want just the second row, you can try something like this:
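actual_row = File.open(file_name[0], 'r')
first_row_data = []
CSV.foreach(actual_row) do |row|
  # $. is the number of the last line read, so it is 2 while on the second row
  # (this assumes each CSV row occupies exactly one physical line)
  if $. == 2
    first_row_data << row
  end
end
puts first_row_data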
You can learn more about $. and similar variables here: https://docs.ruby-lang.org/en/2.4.0/globals_rdoc.html
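Because $. depends on how the underlying file is read, an explicit row counter may be more predictable. A minimal sketch, using Enumerator#with_index on the enumerator that CSV.foreach returns without a block; the method name row_at is hypothetical.

```ruby
require 'csv'

# Hypothetical helper: returns the row at the given zero-based index
# (index 1 is the second row), or nil if the file has fewer rows.
# It returns as soon as the row is found, so the rest of the file is not parsed.
def row_at(path, index)
  CSV.foreach(path).with_index do |row, i|
    return row if i == index
  end
  nil
end
```

For the question's case, row_at(file_name[0], 1) would return the second row as an array of strings.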

How to process CSV data from two files using a matching column

I have two CSV files, one is 3.5GB and the other one is 100MB. The larger file contains two columns that I need to add to two other columns from the other file to make a third CSV file.
Both files contain a postcode column which is how I'm trying to match the rows from them. However, as the files are quite large, the operation is slow. I tried looking for matches in two ways but they were both slow:
CSV.foreach('ukpostcodes.csv') do |row|
  CSV.foreach('pricepaid.csv') do |item|
    if row[1] == item[3]
      puts "match"
    end
  end
end
and:
firstFile = CSV.read('pricepaid.csv')
secondFile = CSV.read('ukpostcodes.csv')
post_codes = Array.new
lat_longs = Array.new
firstFile.each do |row|
  post_codes << row[3]
end
secondFile.each do |row|
  lat_longs << row[1]
end
post_codes.each do |row|
  lat_longs.each do |item|
    if row == item
      puts "Match"
    end
  end
end
Is there a more efficient way of handling this task as the CSV files are large in size?
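One common way to speed this up is to replace the nested loops, which compare every row of one file against every row of the other, with a Hash lookup: load the smaller file once, keyed by postcode, then stream the larger file and look up each postcode directly. A minimal sketch follows. The method name join_on_postcode is hypothetical, and the latitude and longitude positions (columns 2 and 3 of ukpostcodes.csv) are assumptions, since the question only gives the postcode columns (row[1] and row[3]).

```ruby
require 'csv'

# Hypothetical helper: joins the large price file with the smaller postcode
# file on postcode and writes each matching row, extended with lat/long.
# Assumed layout: postcode in column 1 of postcodes_path, latitude and
# longitude in columns 2 and 3; postcode in column 3 of prices_path.
def join_on_postcode(postcodes_path, prices_path, out_path)
  # Load the smaller file into memory once: postcode => [lat, long].
  lat_longs = {}
  CSV.foreach(postcodes_path) do |row|
    lat_longs[row[1]] = [row[2], row[3]]
  end

  # Stream the larger file row by row, so it is never fully held in memory.
  CSV.open(out_path, 'w') do |out|
    CSV.foreach(prices_path) do |row|
      coords = lat_longs[row[3]]
      out << (row + coords) if coords
    end
  end
end
```

Usage might be join_on_postcode('ukpostcodes.csv', 'pricepaid.csv', 'joined.csv'). The work then grows roughly with the sum of the two file sizes rather than their product.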

How do I merge several CSV files horizontally?

I've got about 50 CSV files that need to be merged together horizontally into one CSV.
The headers can be ignored. Simplified a little, the files look like this:
File 1:
1,2,4,5,6
4,5,68,7,4,2
1,2
1,2,3
File 2:
1,2,4
4,5,6,4
3,4,5
3,4,5
The output should look like this:
1,2,4,5,6,1,2,4
4,5,68,7,4,2,4,5,6,4
1,2,3,4,5
3,4,5
1,2,3
The order in which the files are merged is also not important. I know how to merge them vertically, but I have no clue how to merge them horizontally.
I tried something like this with a nested array, but it does not work and I don't know why. It seems like the data array does not accept the line array.
#!/usr/bin/env ruby
require 'csv'
data = Array.new
filecount=1
linecount=1
CSV.open("output.csv", "wb") do |output|
  Dir.glob('*.csv').each do |each|
    next if each == 'output.csv'
    file = CSV.read(each)
    file.each do |line|
      data[filecount][linecount] = line
      linecount=linecount+1
    end
    filecount=filecount+1
  end
end
puts data
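The nested array fails because data = Array.new starts empty, so data[filecount] is nil and data[filecount][linecount] = line raises a NoMethodError on nil; linecount is also never reset between files. A minimal sketch of the same in-memory idea, reading each file into an array of rows and then joining the rows that share an index, could look like this. The method name merge_horizontally is hypothetical, and it loads every file fully into memory.

```ruby
require 'csv'

# Hypothetical helper: merges CSV files side by side; output row i is the
# concatenation of row i from each input file. Missing rows count as empty.
def merge_horizontally(paths, out_path)
  files = paths.map { |path| CSV.read(path) }  # one array of rows per file
  row_count = files.map(&:length).max || 0

  CSV.open(out_path, 'w') do |out|
    row_count.times do |i|
      # rows[i] is nil when a file has fewer rows; nil.to_a is [].
      out << files.flat_map { |rows| rows[i].to_a }
    end
  end
end
```

With the question's layout it might be called as merge_horizontally(Dir.glob('*.csv') - ['output.csv'], 'output.csv').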
I prepared a small script that solves your problem, and added some comments for better explanation.
The main idea is to read the input line by line so you do not need much memory.
#!/usr/bin/env ruby
require 'csv'
# map "treats" each element of the array with the block:
# here it opens every CSV file in csv/ for reading
files = Dir.glob('csv/*.csv').map { |file| CSV.open file, 'r' }
CSV.open("output.csv", "wb") do |out|
  loop do
    # shift returns the next row of each file, or nil once a file is exhausted
    # compact removes those nil entries
    line = files.map { |file| file.shift }.compact
    # drop empty rows (blank lines in a file)
    line.reject! { |e| e.empty? }
    # break the endless loop if there is no input left to handle
    break if line.empty?
    out << line.flatten
  end
end
# close the input files once everything has been written
files.each(&:close)

Best way of Parsing 2 CSV files and printing the common values in a third file

I am new to Ruby, and I have been struggling with a problem that I suspect has a simple answer. I have two CSV files, one with two columns, and one with a single column. The single column is a subset of values that exist in one column of my first file. Example:
file1.csv:
abc,123
def,456
ghi,789
jkl,012
file2.csv:
def
jkl
All I need to do is look up the column 2 value in file1 for each value in file2 and output the results to a separate file. So in this case, my output file should consist of:
456
012
I’ve got it working this way:
pairs=IO.readlines("file1.csv").map { |columns| columns.split(',') }
f1 =[]
pairs.each do |x| f1.push(x[0]) end
f2 = IO.readlines("file2.csv").map(&:chomp)
collection={}
pairs.each do |x| collection[x[0]]=x[1] end
f=File.open("outputfile.txt","w")
f2.each do |col1,col2| f.puts collection[col1] end
f.close
...but there has to be a better way. If anyone has a more elegant solution, I'd be very appreciative! (I should also note that I will eventually need to run this on files with millions of lines, so speed will be an issue.)
To be as memory efficient as possible, I'd suggest reading only file2 (which I gather is the smaller of the two input files) fully into memory. I'm using a hash for fast lookups and to store the resulting values, so as you read through file1 you only store the values for the keys you need. You could go one step further and write the output file while reading file1.
require 'csv'
# Read file 2, the smaller file, and store keys in result Hash
result = {}
CSV.foreach("file2.csv") do |row|
  result[row[0]] = false
end
# Read file 1, the larger file, and look for keys in result Hash to set values
CSV.foreach("file1.csv") do |row|
  result[row[0]] = row[1] if result.key? row[0]
end
# Write the results
File.open("outputfile.txt", "w") do |f|
  result.each do |key, value|
    f.puts value if value
  end
end
Tested with Ruby 1.9.3
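Writing the output in the same pass as reading the larger file can be sketched as follows. This minimal version keeps only file2's keys in memory, using a Set since no values need storing, and writes each match as soon as file1 yields it. The method name write_matches is hypothetical. Two differences from the Hash version: results follow file1's row order rather than file2's, and a key that appears twice in file1 is written twice.

```ruby
require 'csv'
require 'set'

# Hypothetical helper: writes column 2 of file1 for every row whose column 1
# appears in file2. Only file2's keys are held in memory; output is written
# while file1 is streamed, so results follow file1's order.
def write_matches(file1, file2, out_path)
  keys = Set.new
  CSV.foreach(file2) { |row| keys << row[0] }

  File.open(out_path, 'w') do |f|
    CSV.foreach(file1) do |row|
      f.puts row[1] if keys.include?(row[0])
    end
  end
end
```

With the example files, write_matches("file1.csv", "file2.csv", "outputfile.txt") writes 456 and 012 on separate lines.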
Parsing file 1 (the example files have no header row, so no :headers option is passed; with it, the first row would be treated as a header and skipped)
require 'csv'
data_csv_file1 = File.read("file1.csv")
data_csv1 = CSV.parse(data_csv_file1)
Parsing file 2
data_csv_file2 = File.read("file2.csv")
data_csv2 = CSV.parse(data_csv_file2)
Collection of names
names_from_sheet1 = data_csv1.collect {|data| data[0]} #returns an array of names
names_from_sheet2 = data_csv2.collect {|data| data[0]} #returns an array of names
common_names = names_from_sheet1 & names_from_sheet2 #array with common names
Collecting results to be printed
results = [] #this will store the values to be printed
data_csv1.each {|data| results << data[1] if common_names.include?(data[0]) }
Final output
f = File.open("outputfile.txt","w")
results.each {|result| f.puts result }
f.close

How to dump a 2D array directly into a CSV file?

I have this 2D array:
arr = [[1,2],[3,4]]
I usually do:
CSV.open(file, 'w') do |csv|
  arr.each do |row|
    csv << row
  end
end
Is there any easier or direct way of doing it other than adding row by row?
Assuming that your array is just numbers (no strings that potentially have commas in them) then:
File.open(file,'w'){ |f| f << arr.map{ |row| row.join(',') }.join("\n") }
One enormous string blatted to disk, without involving the CSV library.
Alternatively, using the CSV library to correctly escape each row:
require 'csv'
# #to_csv automatically appends '\n', so we don't need it in #join
File.open(file,'w'){ |f| f << arr.map(&:to_csv).join }
If you have to do this often and the code bothers you, you could monkeypatch it in:
class CSV
  def self.dump_array(array, path, mode = "wb", **opts)
    open(path, mode, **opts) { |csv| array.each { |row| csv << row } }
  end
end
CSV.dump_array(arr, file)
Extending the answer above by @Phrogz, using the CSV library with a different delimiter:
File.open(file,'w'){ |f| f << arr.map{|x| x.to_csv(col_sep: '|')}.join }
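The col_sep option can also be passed to CSV.open, which keeps the CSV library's escaping and writes row by row instead of building one large string first. A minimal sketch; the method name write_rows is hypothetical.

```ruby
require 'csv'

# Hypothetical helper: writes a 2D array to path, one CSV row per inner array.
# col_sep defaults to ','; pass e.g. '|' for pipe-delimited output.
def write_rows(arr, path, col_sep: ',')
  CSV.open(path, 'w', col_sep: col_sep) do |csv|
    arr.each { |row| csv << row }
  end
end
```

For arr = [[1,2],[3,4]], write_rows(arr, file, col_sep: '|') produces the lines 1|2 and 3|4.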
