Ruby: Write CSV line by line instead of whole file

I am using the following code to write to a CSV file. It writes the whole file at once. I would like to write the CSV file line by line by appending to the file. How can I adjust my code?
CSV.open("#{@app_path}/Data_#{@filename}", "w") do |csv|
  data_array.each do |r|
    csv << r
  end
end

As I understand it, the problem is not the CSV file but the size of the array (and the fact that after each failure you have to rebuild the array).
My attempt at solving that would be to process the array in chunks, as below:
def process_array_by_chunks(array, starting_index = 0, chunk_size)
  return if array.empty?
  current_index = starting_index
  size = array.size
  # Stop once the index reaches the end, so the block is never handed an empty or nil chunk
  while current_index < size
    puts "doing index: #{current_index}"
    yield(array[current_index, chunk_size])
    current_index = current_index + chunk_size
  end
rescue StandardError => e
  puts "failed at index: #{current_index}"
  puts "data left to process: "
  return array[current_index, size]
end
# call the function with a block in which we write the CSV file
process_array_by_chunks(array, start, chunk_size) do |chunk|
  # "a" appends each chunk; "w" would truncate the file on every chunk
  CSV.open(path, "a") do |csv|
    chunk.each do |r|
      csv << r
    end
  end
end
If that blows up for some reason, the function will return an array with all the items that were not yet processed.
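If the goal is simply to add rows to an existing file one at a time, opening the file in append mode ("a") rather than write mode ("w") may be enough. A minimal sketch, assuming a hypothetical helper named append_row and a scratch file in the temp directory (neither is from the question):

```ruby
require "csv"
require "tmpdir"

# Hypothetical helper: appends a single row to the CSV file at `path`.
# Mode "a" creates the file if it is missing and never truncates it.
def append_row(path, row)
  CSV.open(path, "a") { |csv| csv << row }
end

# Assumed scratch location, used only for this illustration
path = File.join(Dir.tmpdir, "append_demo_#{Process.pid}.csv")
File.delete(path) if File.exist?(path)

[["id", "name"], [1, "Ann"], [2, "Bob"]].each { |row| append_row(path, row) }

# Values come back as strings because CSV does not convert types by default
rows = CSV.read(path)
```

Reopening the file for every row is simple but can be slow for large arrays; keeping one handle open and flushing after each row is a possible alternative.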

Related

Stream-like processing of files in Ruby

I would like to understand how I can add filters/transformations in a stream starting from a source, going to a sink.
As an example, consider reading a CSV file. That would be
CSV.foreach(file) { |row| ... }
If I want to read from a zipped file, that would become
stream = Zlib::GzipReader.open('/tmp/foo.csv.gz')
csv = CSV.new(stream)
csv.each { |row| ... }
Now, suppose I would like to add transformations between gunzip and CSV. What is the best way to achieve this in Ruby?
Currently I'm doing this:
gzip = Zlib::GzipReader.open('/tmp/foo.psv.gz') # Pipe Separated Value (fake example).
trans = IOTransform.new(gzip) { |line| line&.tr('|', ',') } # A simple example.
csv = CSV.new(trans)
csv.each do |row|
puts row.inspect
end
using the following class
class IOTransform
  def initialize(io, &block)
    @io = io
    @block = block
    @inbuf = +''
  end
  def gets(...)
    if !@io.eof?
      @inbuf << @io.gets(...)
    end
    res = nil
    remainder = nil
    @inbuf&.each_line { |line|
      if line[-1] == "\n"
        res ||= +''
        res << @block.call(line)
      else
        remainder = line
      end
    }
    # Keep the buffer a String so the next call can append to it
    @inbuf = remainder || +''
    res
  end
end
but it feels like I really missed something: I very much doubt that there's no easy way to process streams in Ruby.
Thanks in advance!
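One possible approach, offered as a sketch rather than a definitive answer, is to treat the reader as a lazy enumerator of lines and chain the transformations with map. The sample file, its path, and its contents are made up for illustration. Parsing each line separately with CSV.parse_line assumes that no quoted field contains an embedded newline.

```ruby
require "csv"
require "zlib"
require "tmpdir"

# Build a small gzipped, pipe-separated sample file (made-up data).
path = File.join(Dir.tmpdir, "stream_demo_#{Process.pid}.psv.gz")
Zlib::GzipWriter.open(path) { |gz| gz.write("a|b|c\n1|2|3\n") }

rows = nil
Zlib::GzipReader.open(path) do |gz|
  rows = gz.each_line                    # source: one decompressed line at a time
           .lazy
           .map { |line| line.tr("|", ",") }    # transformation step
           .map { |line| CSV.parse_line(line) } # parse each transformed line
           .to_a                                # sink: force the lazy chain
end
```

Because the chain is lazy, each line flows through every step before the next line is read, so further filters can be added as extra map or select calls.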

Ruby - Delete rows in csv file using enumerator CSV.open

I know how to do it with CSV.read, but I'm not sure how with CSV.open and an enumerator. Alternatively, how do I omit those specific rows before loading them into new_csv?
Thanks!
new_csv = []
CSV.open(file, headers:true) do |unit|
  units = unit.each
  units.select do |row|
    #delete row [0][1][2][3]
    new_csv << row
  end
end
Code Example
If you want to skip the first four rows plus the header, these are some options.
Get pure array:
new_csv = CSV.read(filename)[5..]
or keep the csv object
new_csv = []
CSV.open(filename, headers:true) do |csv|
csv.each_with_index do |row, i|
new_csv << row if i > 3
end
end
or using Enumerator#with_object:
csv = CSV.open(filename, headers:true)
new_csv = csv.each_with_index.with_object([]) do |(row, i), ary|
ary << row if i > 3
end
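Another possibility, sketched here with a made-up one-column file, is to call CSV.foreach without a block and use Enumerable#drop on the resulting enumerator. The file name and contents are assumptions for illustration only.

```ruby
require "csv"
require "tmpdir"

# Scratch file with a header line and six data rows (made-up data)
filename = File.join(Dir.tmpdir, "skip_demo_#{Process.pid}.csv")
File.write(filename, "n\n1\n2\n3\n4\n5\n6\n")

# Without a block, CSV.foreach returns an enumerator of CSV::Row objects;
# drop(4) skips the first four data rows and returns the rest as an array.
new_csv = CSV.foreach(filename, headers: true).drop(4)
values = new_csv.map { |row| row["n"] }
```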
Let's begin by creating a CSV file:
contents =<<~END
name,nickname,age
Robert,Bobbie,23
Wilma,Stretch,45
William,Billy-Bob,72
Henrietta,Mama,53
END
FName = 'x.csv'
File.write(FName, contents)
#=> 91
We can use CSV::foreach without a block to return an enumerator.
csv = CSV.foreach(FName, headers:true)
#=> #<Enumerator: CSV:foreach("x.csv", "r", headers: true)>
The enumerator csv generates CSV::Row objects:
obj = csv.next
#=> #<CSV::Row "name":"Robert" "nickname":"Bobbie" "age":"23">
obj.class
#=> CSV::Row
Before continuing, let me rewind csv with Enumerator#rewind so that csv.next will once again generate its first element.
csv.rewind
Suppose we wish to skip the first two records. We can do that using Enumerator#next:
2.times { csv.next }
Now continue generating elements with the enumerator, mapping them to an array of hashes:
loop.map { csv.next.to_h }
#=> [{"name"=>"William", "nickname"=>"Billy-Bob", "age"=>"72"},
# {"name"=>"Henrietta", "nickname"=>"Mama", "age"=>"53"}]
See Kernel#loop and CSV::Row#to_h. The enumerator csv raises a StopIteration exception when next is invoked after the enumerator has generated its last element. As you see from its doc, loop handles that exception by breaking out of the loop.
loop is a very versatile method. I generally use it in place of while and until, as well as when I need it to handle a StopIteration exception.
If you just want the values, then:
csv.rewind
2.times { csv.next }
loop.with_object([]) { |_,arr| arr << csv.next.map(&:last) }
#=> [["William", "Billy-Bob", "72"],
# ["Henrietta", "Mama", "53"]]

Ruby how to merge two CSV files with slightly different headers

I have two CSV files with some common headers and others that only appear in one or in the other, for example:
# csv_1.csv
H1,H2,H3
V11,V22,V33
V14,V25,V35
# csv_2.csv
H1,H4
V1a,V4b
V1c,V4d
I would like to merge both and obtain a new CSV file that combines all the information from the previous CSV files, injecting new columns when needed and filling the new cells with null values.
Result example:
H1,H2,H3,H4
V11,V22,V33,
V14,V25,V35,
V1a,,,V4b
V1c,,,V4d
Challenge accepted :)
#!/usr/bin/env ruby
require "csv"
module MergeCsv
  class << self
    def run(csv_paths)
      csv_files = csv_paths.map { |p| CSV.read(p, headers: true) }
      merge(csv_files)
    end
    private
    def merge(csv_files)
      headers = csv_files.flat_map(&:headers).uniq.sort
      hash_array = csv_files.flat_map(&method(:csv_to_hash_array))
      CSV.generate do |merged_csv|
        merged_csv << headers
        hash_array.each do |row|
          merged_csv << row.values_at(*headers)
        end
      end
    end
    # Probably not the most performant way, but easy
    def csv_to_hash_array(csv)
      csv.to_a[1..-1].map { |row| csv.headers.zip(row).to_h }
    end
  end
end
if(ARGV.length == 0)
  puts "Use: ruby merge_csv.rb <file_path_csv_1> <file_path_csv_2>"
  exit 1
end
puts MergeCsv.run(ARGV)
I have the answer; I just wanted to help people who are looking for the same solution.
require "csv"
module MergeCsv
  def self.run(csv_1_path, csv_2_path)
    merge(File.read(csv_1_path), File.read(csv_2_path))
  end
  def self.merge(csv_1, csv_2)
    csv_1_table = CSV.parse(csv_1, :headers => true)
    csv_2_table = CSV.parse(csv_2, :headers => true)
    return csv_2_table.to_csv if csv_1_table.headers.empty?
    return csv_1_table.to_csv if csv_2_table.headers.empty?
    headers_in_1_not_in_2 = csv_1_table.headers - csv_2_table.headers
    headers_in_1_not_in_2.each do |header_in_1_not_in_2|
      csv_2_table[header_in_1_not_in_2] = nil
    end
    headers_in_2_not_in_1 = csv_2_table.headers - csv_1_table.headers
    headers_in_2_not_in_1.each do |header_in_2_not_in_1|
      csv_1_table[header_in_2_not_in_1] = nil
    end
    csv_2_table.each do |csv_2_row|
      csv_1_table << csv_1_table.headers.map { |csv_1_header| csv_2_row[csv_1_header] }
    end
    csv_1_table.to_csv
  end
end
if(ARGV.length != 2)
  puts "Use: ruby merge_csv.rb <file_path_csv_1> <file_path_csv_2>"
  exit 1
end
puts MergeCsv.run(ARGV[0], ARGV[1])
And execute it from the console this way:
$ ruby merge_csv.rb csv_1.csv csv_2.csv
Any other, maybe cleaner, solution is welcome.
Simplified first answer:
How to use it:
listPart_A = CSV.read(csv_path_A, headers:true)
listPart_B = CSV.read(csv_path_B, headers:true)
listPart_C = CSV.read(csv_path_C, headers:true)
list = merge(listPart_A,listPart_B,listPart_C)
Function:
def merge(*csvs)
headers = csvs.map {|csv| csv.headers }.flatten.compact.uniq.sort
csvs.flat_map(&method(:csv_to_hash_array))
end
def csv_to_hash_array(csv)
csv.to_a[1..-1].map do |row|
Hash[csv.headers.zip(row)]
end
end
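The merge function above returns an array of hashes rather than CSV text. A minimal sketch of turning such hashes back into CSV follows; hashes_to_csv is a hypothetical helper name, and the sample rows reuse values from the question.

```ruby
require "csv"

# Hypothetical helper: turns an array of row hashes into CSV text,
# leaving a cell empty wherever a row lacks a header.
def hashes_to_csv(rows)
  headers = rows.flat_map(&:keys).uniq
  CSV.generate do |csv|
    csv << headers
    # values_at returns nil for missing keys, which CSV writes as an empty cell
    rows.each { |row| csv << row.values_at(*headers) }
  end
end

rows = [
  { "H1" => "V11", "H2" => "V22", "H3" => "V33" },
  { "H1" => "V1a", "H4" => "V4b" }
]
merged = hashes_to_csv(rows)
```

Here the header order follows first appearance; sorting the headers, as the first answer does, is an equally reasonable choice.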
I had to do something very similar: merge n CSV files that might share some columns but not others.
If you want to keep the structure and do it easily, I think the best way is to convert to a hash and then convert back to a CSV file.
My solution:
#!/usr/bin/env ruby
require "csv"
def join_multiple_csv(csv_path_array)
  return nil if csv_path_array.nil? || csv_path_array.empty?
  f = CSV.parse(File.read(csv_path_array[0]), :headers => true)
  f_h = {}
  f.headers.each { |header| f_h[header] = f[header] }
  n_rows = f.size
  # drop(1) skips the first file without mutating the caller's array
  csv_path_array.drop(1).each do |csv_file|
    curr_csv = CSV.parse(File.read(csv_file), :headers => true)
    new_headers = curr_csv.headers - f_h.keys
    missing_headers = f_h.keys - curr_csv.headers
    exist_headers = curr_csv.headers - new_headers
    new_headers.each { |new_header|
      f_h[new_header] = Array.new(n_rows) + curr_csv[new_header]
    }
    exist_headers.each { |exist_header|
      f_h[exist_header] = f_h[exist_header] + curr_csv[exist_header]
    }
    # Pad columns that this file lacks so rows from later files stay aligned
    missing_headers.each { |missing_header|
      f_h[missing_header] = f_h[missing_header] + Array.new(curr_csv.size)
    }
    n_rows = n_rows + curr_csv.size
  end
  csv_string = CSV.generate do |csv|
    csv << f_h.keys
    (0..n_rows-1).each do |i|
      row = []
      f_h.each_key do |header|
        row << f_h[header][i]
      end
      csv << row
    end
  end
  return csv_string
end
if(ARGV.length < 2)
  puts "Use: ruby merge_csv.rb <file_path_csv_1> <file_path_csv_2> .. <file_path_csv_n>"
  exit 1
end
csv_str = join_multiple_csv(ARGV)
# File.write opens, writes, and closes the file in one call
File.write("results.csv", csv_str)
puts "CSV merge is done"

selective replacing of printf statements

I am trying to search for a bunch of print statements that I want to filter as follows:
I want to select all dbg_printfs.
Out of all of those I want to select those that have value.stringValue().
Out of those I only want those that do not have value.stringValue().value().
Finally, I want to replace those lines with value.stringValue() to value.stringValue().value().
I don't know why my current code isn't working?
fileObj = File.new(filepath, "r")
while (line = fileObj.gets)
  line.scan(/dbg_printf/) do
    line.scan(/value.stringValue()/) do
      if !line.scan(/\.value\(\)/)
        line.gsub!(/value.stringValue()/, 'value.stringValue().value()')
      end
    end
  end
end
fileObj.close
Primarily, your problem seems to be that you expect altering the string returned from gets to alter the contents of the file. There isn't actually that kind of relationship between strings and files. You need to explicitly write the modifications to the file. Personally, I would probably write that code like this:
modified_contents = IO.readlines(filepath).map do |line|
  if line =~ /dbg_printf/
    # This regex just checks for value.stringValue() when not followed by .value()
    line.gsub(/value\.stringValue\(\)(?!\.value\(\))/, 'value.stringValue().value()')
  else
    line
  end
end
File.open(filepath, 'w') {|file| file.puts modified_contents }
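To see what the negative lookahead does, here is a small self-contained sketch that applies the same pattern to a few sample lines. The helper name fix_line and the sample strings are made up for illustration.

```ruby
# Pattern from the answer above: value.stringValue() not already followed by .value()
PATTERN = /value\.stringValue\(\)(?!\.value\(\))/

# Hypothetical helper: rewrites only lines that contain dbg_printf
def fix_line(line)
  return line unless line.include?("dbg_printf")
  line.gsub(PATTERN, "value.stringValue().value()")
end

a = fix_line('dbg_printf("%s", value.stringValue());')         # gets .value() appended
b = fix_line('dbg_printf("%s", value.stringValue().value());') # already fixed, unchanged
c = fix_line('printf("%s", value.stringValue());')             # not a dbg_printf, unchanged
```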
The problem is that you are not writing the changed lines back to the same file or a new file. To do that, read the file into an array, change the array, and then write it back to the same or a different file (the latter being the more prudent). Here's one way to do that with a few lines of code.
Code
fin_name and fout_name are the names (with paths) of the input and output files, respectively.
def filter_array(fin_name, fout_name)
arr_in = File.readlines(fin_name)
arr_out = arr_in.map { |l| (l.include?('dbg_printfs') &&
l.include?('value.stringValue()') &&
!l.include?('value.stringValue().value()')) ?
'value.stringValue() to value.stringValue().value()' : l }
File.open(fout_name, 'w') { |f| f.puts arr_out }
end
Because you are reading code files, they will not be so large that reading them all at once into memory will be a problem.
Example
First, we'll construct an input file:
array = ["My dbg_printfs was a value.stringValue() as well.",
"Her dbg_printfs was a value.stringValue() but not " +
"a value.stringValue().value()",
"value.stringValue() is one of my favorites"]
fin_name = 'fin'
fout_name = 'fout'
File.open(fin_name, 'w') { |f| f.puts array }
We can confirm its contents with:
File.readlines(fin_name).map { |l| puts l }
Now try it:
filter_array(fin_name, fout_name)
Read the output file to see if it worked:
File.readlines(fout_name).map { |l| puts l }
#=> value.stringValue() to value.stringValue().value()
# Her dbg_printfs was a value.stringValue() but not a value.stringValue().value()
# value.stringValue() is one of my favorites
It looks OK.
Explanation
def filter_array(fin_name, fout_name)
arr_in = File.readlines(fin_name)
arr_out = arr_in.map { |l| (l.include?('dbg_printfs') &&
l.include?('value.stringValue()') &&
!l.include?('value.stringValue().value()')) ?
'value.stringValue() to value.stringValue().value()' : l }
File.open(fout_name, 'w') { |f| f.puts arr_out }
end
For the above example,
arr_in = File.readlines('fin')
#=> ["My dbg_printfs was a value.stringValue() as well.\n",
# "Her dbg_printfs was a value.stringValue() but not a value.stringValue().value()\n",
# "value.stringValue() is one of my favorites\n"]
The first element of arr_in passed to map is:
l = "My dbg_printfs was a value.stringValue() as well.\n"
We have
l.include?('dbg_printfs') #=> true
l.include?('value.stringValue()') #=> true
!l.include?('value.stringValue().value()') #=> true
so that element is mapped to:
"value.stringValue() to value.stringValue().value()"
Neither of the other two elements is replaced by this string, because
!l.include?('value.stringValue().value()') #=> false
and
l.include?('dbg_printfs') #=> false
respectively. Hence,
arr_out = arr_in.map { |l| (l.include?('dbg_printfs') &&
l.include?('value.stringValue()') &&
!l.include?('value.stringValue().value()')) ?
'value.stringValue() to value.stringValue().value()' : l }
#=> ["value.stringValue() to value.stringValue().value()",
# "Her dbg_printfs was a value.stringValue() but not a value.stringValue().value()\n",
# "value.stringValue() is one of my favorites\n"]
The final step is writing arr_out to the output file.

Converting Array of Strings to Array of Floats

I'm writing an app that revolves around getting sets of numerical data from a file. However, since the data is acquired in string form, I have to convert it to floats, which is where the fun starts. The relevant section of my code is as shown (lines 65-73):
ft = []
puts "File Name: #{ARGV[0]}"
File.open(ARGV[0], "r") do |file|
file.each_line do |line|
ft << line.scan(/\d+/)
end
end
ft.collect! {|i| i.to_f}
This works just fine in irb, that is, the last line changes the array to floats.
irb(main):001:0> ft = ["10", "23", "45"]
=> ["10", "23", "45"]
irb(main):002:0> ft.collect! {|i| i.to_f}
=> [10.0, 23.0, 45.0]
However when I run my application I get this error:
ruby-statistics.rb:73:in `block in <main>': undefined method `to_f' for #<Array:0x50832c> (NoMethodError)
from ruby-statistics.rb:73:in `collect!'
from ruby-statistics.rb:73:in `<main>'
Any help with this would be appreciated.
line.scan returns an array, so you are inserting an array into an array. The easiest thing to do would be to call flatten on the array before you convert the strings to floats.
ft = []
puts "File Name: #{ARGV[0]}"
File.open(ARGV[0], "r") do |file|
file.each_line do |line|
ft << line.scan(/\d+/)
end
end
ft = ft.flatten.collect { |i| i.to_f }
You should have a look at the format of "ft" after reading the file.
Each line gets stored in another array so in fact "ft" looks something like this:
[["1","2"],["3","4"]]
So you have to do something like this:
ft = []
puts "File Name: #{ARGV[0]}"
File.open(ARGV[0], "r") do |file|
file.each_line do |line|
ft << line.scan(/\d+/)
end
end
tmp = []
ft.each do |line|
line.each do |number|
tmp << number.to_f
end
end
puts tmp
This is just a guess since I don't know what your file format looks like.
Edit:
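Here as a one-liner (using flatten rather than flatten!, because flatten! returns nil when the array is already flat):
ft = ft.flatten.collect { |i| i.to_f }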
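Another way to avoid the nested array in the first place, sketched with made-up sample lines standing in for the file contents, is Enumerable#flat_map:

```ruby
# Sample lines standing in for the file contents (made-up data)
lines = ["10 23 45\n", "7 8\n"]

# flat_map concatenates the matches from every line into one flat array,
# so there is nothing to flatten before converting to floats.
ft = lines.flat_map { |line| line.scan(/\d+/) }.map(&:to_f)
```

With a real file, File.foreach(ARGV[0]) could take the place of the lines array. The pattern /\d+/ is kept from the question and matches runs of digits only, so decimal points and signs would need a different pattern.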

Resources