Output many arrays to CSV-files in Ruby

I have a question about Ruby. What I want to do is first to sort my items ascending and then write them out to a CSV-file. Now, the problem is further complicated by the fact that I want to iterate over a lot of CSV-files. I found this thread and the answer looks fine, but I am not able to get more than the last line written to my output file.
How can I get the whole data sorted and written to different CSV-files?
My code:
require 'date'
require 'csv'
class Daily <
# Daily has a open
Struct.new(:open)
# a method to print out a csv record for the current Daily.
def print_csv_record
printf("%s,", open)
printf("\n")
end
end
#------#
# MAIN #
#------#
# This is where I iterate over my csv-files:
foobar = ['foo', 'bar']
foobar.each do |foobar|
# get the input filename from the command line
input_file = "#{foobar}.csv"
# define an array to hold the Daily records
arr = Array.new
# loop through each record in the csv file, adding
# each record to my array while overlooking the header.
f = File.open(input_file, "r")
f.each_with_index { |row, i|
next if i == 0
words = row.split(',')
p = Daily.new
# do a little work here to convert my numbers
p.open = words[1].to_f
arr.push(p)
}
# sort the data by ascending opens
arr.sort! { |a,b| a.open <=> b.open }
# print out all the sorted records (just print to stdout)
arr.each { |p|
CSV.open("#{foobar}_new.csv", "w") do |csv|
csv << p.print_csv_record
end
}
end
My input CSV-file:
Open
52.23
52.45
52.36
52.07
52.69
52.38
51.2
50.99
51.41
51.89
51.38
50.94
49.55
50.21
50.13
50.14
49.49
48.5
47.92
My output CSV-file:
47.92

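You need to put the iteration inside the open CSV file; otherwise the file is reopened in "w" mode, and therefore truncated, for every record, so only the last one survives. Also note that print_csv_record prints to stdout and returns nil, so append the value itself as a one-element row:
CSV.open("#{foobar}_new.csv", "w") do |csv|
  arr.each { |p|
    csv << [p.open]
  }
end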
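Putting the pieces together, here is a minimal sketch of the whole loop. It assumes each input file has a single Open column with a header row, as in the question. The method name sort_opens is made up for illustration and is not part of the original code.

```ruby
require 'csv'

# Read the "Open" column from input_path, sort it ascending,
# and write one value per row to output_path.
# sort_opens is a hypothetical helper name used only in this sketch.
def sort_opens(input_path, output_path)
  opens = CSV.read(input_path, headers: true).map { |row| row['Open'].to_f }
  # Open the output file once, then write every sorted value inside the block.
  CSV.open(output_path, 'w') do |csv|
    opens.sort.each { |value| csv << [value] }
  end
end

# Process every input file that exists, mirroring the question's loop.
%w[foo bar].each do |name|
  input = "#{name}.csv"
  sort_opens(input, "#{name}_new.csv") if File.exist?(input)
end
```

Using CSV.read with headers: true skips the header row without manual index checks, and the block form of CSV.open closes the file automatically.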

Related

How to remove headers and second column in CSV in ruby?

I have a CSV that looks like this:
user_id,is_user_unsubscribed
131072,1
7077888,1
11010048,1
12386304,1
327936,1
2228480,1
6553856,1
9830656,1
10158336,1
10486016,1
10617088,1
11010304,1
11272448,1
393728,1
7012864,1
8782336,1
11338240,1
11928064,1
4326144,1
8127232,1
11862784,1
but I want the data to look like this:
131072
7077888
11010048
12386304
327936
...
any ideas on what to do? I have 330,000 rows...
You can read your file as an array and ignore the first row like this:
data = CSV.read("dataset.csv")[1 .. -1]
This way you can remove the header.
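Regarding the column, read the file with headers: true so that you get a CSV::Table, which lets you delete a column by name:
data = CSV.read("dataset.csv", headers: true)
data.delete("is_user_unsubscribed")
data.to_csv(write_headers: false) # => The new CSV in string format, without the header row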
Check this for more info: http://ruby-doc.org/stdlib-1.9.2/libdoc/csv/rdoc/CSV/Table.html
http://ruby-doc.org/stdlib-2.0.0/libdoc/csv/rdoc/CSV.html
My recommendation would be to read in a line from your file as a string, then split the String that you get by commas (there is a comma separating your columns).
Splitting a Ruby String:
https://code-maven.com/ruby-split
line_num = 0
text = File.open('myfile.csv').read
text.each_line do |line|
  textArray = line.split(',') # split on commas; a bare split would split on whitespace
  textIWant = textArray[0]
  line_num = line_num + 1
  puts textIWant
end
In this code we open a text file and read it line by line. We split each line on commas, take the first column (the zeroth item in the array), and print it.
If you do not want the header, add an if statement that skips the line when line_num is 0. Even better, use unless.
Just rewrite a new file with your new data.
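A sketch of that last step, assuming the input has a header row and no quoted fields that contain commas. The method name first_column_only and the file paths are placeholders, not names from the original answer.

```ruby
# Copy only the first column of src_path into dst_path, skipping the header.
# Plain string splitting is used, so this assumes no quoted fields with commas.
# first_column_only is a hypothetical helper name.
def first_column_only(src_path, dst_path)
  File.open(dst_path, 'w') do |out|
    File.foreach(src_path).with_index do |line, line_num|
      next if line_num.zero?            # skip the header row
      out.puts line.chomp.split(',')[0] # keep the text before the first comma
    end
  end
end
```

File.foreach reads one line at a time, so the 330,000 rows never have to sit in memory all at once.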
I wound up doing this. Is this kosher?
user_ids = []
CSV.foreach("eds_users_sept15.csv", headers: true) do |row|
  user_ids << row['user_id']
end
user_ids.count # => 322101
CSV.open('some_new_file.csv', 'w') do |c|
  user_ids.each do |id|
    c << [id]
  end
end
I have 330,000 rows...
So I guess speed matters, right?
I took your method and the other two that were proposed, tested them on a CSV file with 330,000 rows, and benchmarked them to show you something interesting.
require 'csv'
require 'benchmark'
Benchmark.bm(10) do |bm|
bm.report("Method 1:") {
data = Array.new
CSV.foreach("input.csv", headers:true) do |row|
data << row['user_id']
end
}
bm.report("Method 2:") {
data = CSV.read("input.csv")[1 .. -1]
data.delete("is_user_unsubscribed")
}
bm.report("Method 3:") {
data = Array.new
File.open('input.csv').read.each_line do |line|
data << line.split(',')[0]
end
data.shift # => remove headers
}
end
The output:
                 user     system      total        real
Method 1:    3.110000   0.010000   3.120000 (  3.129409)
Method 2:    1.990000   0.010000   2.000000 (  2.004016)
Method 3:    0.380000   0.010000   0.390000 (  0.383700)
As you can see, handling the CSV file as a simple text file, splitting the lines, and pushing the first field into the array is about 5 times faster than the CSV.read approach and about 8 times faster than CSV.foreach. Of course it has some disadvantages too: if you ever add columns to the input file you'll have to review the code, and plain splitting does not handle quoted fields that contain commas.
It's up to you if you prefer lightspeed code or easier scalability.
I'm guessing that you plan to convert each string that precedes a comma to an integer. If so, read the lines as plain strings:
File.readlines("dataset.csv").drop(1).map(&:to_i)
is all you need. (For example, "131072,1".to_i #=> 131072.)
If you want strings, you could write
File.readlines("dataset.csv").drop(1).map { |s| s[/\d+/] }

Write an array to multi column CSV format using Ruby

I have an array of arrays in Ruby that I'm trying to output to a CSV (or text) file, which I can then easily transfer to an XML file for graphing.
I can't seem to get the output (in text format) like so. Instead I get one line of data which is just a large array.
0,2
0,3
0,4
0,5
I originally tried something along the lines of this
File.open('02.3.gyro_trends.text' , 'w') { |file| trend_array.each { |x,y| file.puts(x,y)}}
And it outputs
0.2
46558
0
46560
0
....etc etc.
Can anyone point me in the "write" direction for getting either:
(i) .text file that can put my data like so.
trend_array[0][0], trend_array[0][1]
trend_array[1][0], trend_array[1][1]
trend_array[2][0], trend_array[2][1]
trend_array[3][0], trend_array[3][1]
(ii) .csv file that would put this data in separate columns.
edit I recently added more than two values into my array, check out my answer combining Cameck's solution.
This is currently what I have at the moment.
trend_array=[]
j=1
# cycle through array and find change in gyro data.
while j < gyro_array.length-2
if gyro_array[j+1][1] < 0.025 && gyro_array[j+1][1] > -0.025
trend_array << [0, gyro_array[j][0]]
j+=1
elsif gyro_array[j+1][1] > -0.025 # if the next value is increasing by x1.2 the value of the previous amount. Log it as +1
trend_array << [0.2, gyro_array[j][0]]
j+=1
elsif gyro_array[j+1][1] < 0.025 # if the next value is decreasing by x1.2 the value of the previous amount. Log it as -1
trend_array << [-0.2, gyro_array[j][0]]
j+=1
end
end
#for graphing and analysis purposes (wanted to print it all as a csv in two columns)
File.open('02.3test.gyro_trends.text' , 'w') { |file| trend_array.each { |x,y| file.puts(x,y)}}
File.open('02.3test.gyro_trends_count.text' , 'w') { |file| trend_array.each {|x,y| file.puts(y)}}
I know it's something really easy, but for some reason I'm missing it. It has something to do with concatenation, but I found that if I try to concatenate a \n in my last line of code, it doesn't output it to the file. It shows up in my console the way I want it, but not when I write it to a file.
Thanks for taking the time to read this all.
File.open('02.3test.gyro_trends.text' , 'w') { |file| trend_array.each { |a| file.puts(a.join(","))}}
Alternately using the CSV Class:
def write_to_csv(row)
  if csv_exists?
    CSV.open(@csv_name, 'a+') { |csv| csv << row }
  else
    # create and add headers if doesn't exist already
    CSV.open(@csv_name, 'wb') do |csv|
      csv << CSV_HEADER
      csv << row
    end
  end
end
def csv_exists?
  @exists ||= File.file?(@csv_name)
end
Call write_to_csv with an array [col_1, col_2, col_3]
Thank you both @cameck & @tkupari, both answers were what I was looking for. Went with Cameck's answer in the end, because it "cut out" cutting and pasting text => xml. Here's what I did to get an array of arrays into their proper places.
require 'csv'
CSV_HEADER = [
"Apples",
"Oranges",
"Pears"
]
@csv_name = "Test_file.csv"
def write_to_csv(row)
  if csv_exists?
    CSV.open(@csv_name, 'a+') { |csv| csv << row }
  else
    # create and add headers if doesn't exist already
    CSV.open(@csv_name, 'wb') do |csv|
      csv << CSV_HEADER
      csv << row
    end
  end
end
def csv_exists?
  @exists ||= File.file?(@csv_name)
end
array = [ [1,2,3] , ['a','b','c'] , ['dog', 'cat' , 'poop'] ]
array.each { |row| write_to_csv(row) }
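When all rows are already in memory, a single CSV.open call is an alternative that avoids reopening the file for every row. The following is only a sketch of that idea; write_rows is an illustrative name and the path and header are whatever the caller supplies.

```ruby
require 'csv'

# Write a header row followed by every row of an array of arrays.
# write_rows is a hypothetical helper name used only for this sketch.
def write_rows(path, header, rows)
  CSV.open(path, 'w') do |csv|
    csv << header
    rows.each { |row| csv << row } # each inner array becomes one CSV line
  end
end
```

For example, write_rows('Test_file.csv', CSV_HEADER, array) would write the header and all three rows in one pass. Unlike the append-based version, it overwrites any existing file.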

How to open a CSV file and increment a specific value by one

I'm trying to open a simple CSV file in Ruby, find a particular key, increment its value by one, then re-save it.
Example CSV file:
store1,0
store2,0
store3,0
...etc.
Ruby code:
require 'csv'
currentStore = store # store is passed as a parameter
if currentStore.nil? && currentStore.empty?
currentStore = "nil"
store_data = {}
File.open('store_count.csv').each_line {|line|
line_data = line.split(",")
if !line_data[1].nil? && !line_data[1].empty?
store[line_data[0]] = line_data[1].strip.to_i
else
next
end
}
if store_data.key?(currentStore)
store_data[currentStore] += 1
CSV.open("store_count.csv", "wb") {
|csv| store_data.to_a.each {
|elem| csv << elem
}
}
end
So for example, if I increment 'store3', I need my file to look like:
store1,0
store2,0
store3,1
etc...
After I increment the value, I need to re-save to CSV.
You cannot read and write at the same time.
See "What are the Ruby File.open modes and options?" for the available modes.
If you are dealing with a small file that fits entirely in memory, it is enough to read the original file and write the updated rows to a temporary file:
require 'csv'
File.open('temp.csv', 'w') do |temp_file|
  # read from the original file, not from the temp file being written
  CSV.readlines('store.csv').each do |line|
    temp_file << "#{line[0]},#{line[1].to_i + 1}\n"
  end
end
If you want to rename the file, you can do this
`rm store.csv`
`mv temp.csv store.csv`
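To increment only one store, as the question asks, a minimal sketch could read all rows first and then rewrite the same file. It assumes a two-column file (name,count) without a header row; increment_store is a made-up name.

```ruby
require 'csv'

# Increment the counter for store_name in a two-column CSV (name,count)
# and rewrite the file. The whole file is read into memory first, so the
# read has finished before the file is reopened for writing.
# increment_store is a hypothetical helper name.
def increment_store(path, store_name)
  rows = CSV.read(path)
  rows.each do |row|
    row[1] = row[1].to_i + 1 if row[0] == store_name
  end
  CSV.open(path, 'w') { |csv| rows.each { |row| csv << row } }
end
```

Calling increment_store('store_count.csv', 'store3') would leave the other rows untouched and bump only the store3 count.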

How to map and edit a CSV file with Ruby

Is there a way to edit a CSV file using the map method in Ruby? I know I can open a file using:
CSV.open("file.csv", "a+")
and add content to it, but I have to edit some specific lines.
The foreach method is only useful to read a file (correct me if I'm wrong).
I checked the Ruby CSV documentation but I can't find any useful info.
My CSV file has less than 1500 lines so I don't mind reading all the lines.
Another answer using each.with_index():
rows_array = CSV.read('sample.csv')
desired_indices = [3, 4, 5].sort # these are rows you would like to modify
rows_array.each.with_index(desired_indices[0]) do |row, index|
if desired_indices.include?(index)
# modify over here
rows_array[index][target_column] = 'modification'
end
end
# now update the file
CSV.open('sample3.csv', 'wb') { |csv| rows_array.each{|row| csv << row}}
You can also use each_with_index {} instead of each.with_index {}.
Is there a way to edit a CSV file using the map method in Ruby?
Yes:
rows = CSV.open('sample.csv')
rows_array = rows.to_a
or
rows_array = CSV.read('sample.csv')
desired_indices = [3, 4, 5] # these are rows you would like to modify
edited_rows = rows_array.each_with_index.map do |row, index|
if desired_indices.include?(index)
# simply return the row
# or modify over here
row[3] = 'shiva'
# store index in each edited rows to keep track of the rows
[index, row]
end
end.compact
# update the main row_array with updated data
edited_rows.each{|row| rows_array[row[0]] = row[1]}
# now update the file
CSV.open('sample2.csv', 'wb') { |csv| rows_array.each{|row| csv << row}}
This is a little messy, isn't it? I suggest using each_with_index without map instead; see my other answer.
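For comparison, a map-based edit can stay fairly compact if the block returns either the original row or an edited copy. This is a sketch under assumed names (edit_rows, edit_csv) and placeholder file paths, not the code from either answer.

```ruby
require 'csv'

# Return a new array in which the rows at the given indices have
# column `column` replaced by `value`; other rows pass through unchanged.
# edit_rows and edit_csv are hypothetical helper names.
def edit_rows(rows, indices, column, value)
  rows.each_with_index.map do |row, index|
    next row unless indices.include?(index)
    edited = row.dup # copy so the input rows are not mutated
    edited[column] = value
    edited
  end
end

# Read a CSV file, edit the selected rows, and write the result to a new file.
def edit_csv(src_path, dst_path, indices, column, value)
  rows = edit_rows(CSV.read(src_path), indices, column, value)
  CSV.open(dst_path, 'w') { |csv| rows.each { |row| csv << row } }
end
```

Because map already returns one element per input row, there is no need to track indices separately or call compact.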
Here is a little script I wrote as an example of how to read CSV data, do something to the data, and then write the edited text out to a new file:
read_write_csv.rb:
#!/usr/bin/env ruby
require 'csv'
src_dir = "/home/user/Desktop/csvfile/FL_insurance_sample.csv"
dst_dir = "/home/user/Desktop/csvfile/FL_insurance_sample_out.csv"
puts " Reading data from : #{src_dir}"
puts " Writing data to : #{dst_dir}"
#create a new file
csv_out = File.open(dst_dir, 'wb')
#read from existing file
CSV.foreach(src_dir , :headers => false) do |row|
  #then you can do this
  # newrow = row.each_with_index { |rowcontent , row_num| puts "#{rowcontent} #{row_num}" }
  # OR array to hash .. just saying .. maybe hash of arrays..
  #h = Hash[*row]
  #csv_out << h
  # OR use map
  #newrow = row.map(&:capitalize)
  #csv_out << newrow
  #OR use each ... and add an end
  #newrow.each do |k,v| puts "#{k} is #{v}"
  #Lastly, write back the edited , regexed data ..etc to an out file.
  #csv_out << newrow
end
# close the file
csv_out.close
The output file has the desired data:
USER@USER-SVE1411EGXB:~/Desktop/csvfile$ ls
FL_insurance_sample.csv FL_insurance_sample_out.csv read_write_csv.rb
The input file data looked like this:
policyID,statecode,county,eq_site_limit,hu_site_limit,fl_site_limit,fr_site_limit,tiv_2011,tiv_2012,eq_site_deductible,hu_site_deductible,fl_site_deductible,fr_site_deductible,point_latitude,point_longitude,line,construction,point_granularity
119736,FL,CLAY COUNTY,498960,498960,498960,498960,498960,792148.9,0,9979.2,0,0,30.102261,-81.711777,Residential,Masonry,1
448094,FL,CLAY COUNTY,1322376.3,1322376.3,1322376.3,1322376.3,1322376.3,1438163.57,0,0,0,0,30.063936,-81.707664,Residential,Masonry,3
206893,FL,CLAY COUNTY,190724.4,190724.4,190724.4,190724.4,190724.4,192476.78,0,0,0,0,30.089579,-81.700455,Residential,Wood,1
333743,FL,CLAY COUNTY,0,79520.76,0,0,79520.76,86854.48,0,0,0,0,30.063236,-81.707703,Residential,Wood,3
172534,FL,CLAY COUNTY,0,254281.5,0,254281.5,254281.5,246144.49,0,0,0,0,30.060614,-81.702675,Residential,Wood,1

Iterate over an Excel workbook and index everything?

This would be done in Ruby. I have provided what I have attempted so far.
I am curious whether it is possible to iterate over an Excel workbook (so multiple sheets) and index/record where everything is located. Let's say I have a workbook of 10 sheets. I want to grab the first sheet, record that sheet's name, then move to the first cell and begin indexing (not sure if that is the correct word) the data on that sheet. It would record the cell location, so (1,A) for the first cell, and the data that's in it. I am trying to output the data in a format such as a CSV file.
Here is some code I have written that iterates over every sheet and every cell in a workbook, skips whitespace-only cells, and puts the remaining data into a CSV, with no sheet names or cell locations. I am using the roo and csv gems:
require 'rubygems'
require 'roo'
require 'csv'
#Classes Used
class ArrayIterator
  def initialize(array)
    @array = array
    @index = 0
  end
  def has_next?
    @index < @array.length
  end
  def item
    @array[@index]
  end
  def next_item
    value = @array[@index]
    @index += 1
    value
  end
end
#Open up files to compare
w1 = Excelx.new ( "C:/Ruby/myworkbook.xlsx" )
$values = Array.new
i = 0.to_i
# Continue until no worksheets left
num_sheets = w1.sheets().size
while (i < num_sheets)
puts "i is currently : #{i}"
puts "length of sheet array is : #{num_sheets}"
#Grab first sheet of each workbook
w1.default_sheet = w1.sheets[i]
1.upto(w1.last_row) do | row |
1.upto(w1.last_column) do | column |
string = w1.cell(row, column).to_s
if (string.strip.empty?)
puts "Whitespace!"
else
$values << string
end
end
end
i = i + 1.to_i
end
count = 0.to_i
CSV.open('C:/Ruby/results.csv', "w") do |csv|
csv << ['String']
i = ArrayIterator.new($values)
while i.has_next?
csv << [i.next_item]
count += 1
end
end
I took the liberty of shortening your script and adding a check for empty sheets, which produced errors.
require 'roo'
require 'csv'
w1 = Excelx.new("C:/Ruby193/test/roo/book1.xlsx")
CSV.open("book1.csv", "w") do |csv|
  w1.sheets.each do |sheet|
    w1.default_sheet = sheet
    # first_row and first_column are nil for an empty sheet
    if w1.first_row && w1.first_column
      eval(w1.to_s).each do |index, value|
        csv << [sheet, index, value]
      end
    end
  end
end
which gives the following in book1.csv:
Sheet1,"[1, 1]",a1
Sheet1,"[1, 2]",b1
Sheet1,"[2, 1]",a2
Sheet1,"[2, 2]",b2
Sheet2,"[1, 1]",aa1
Sheet2,"[1, 2]",bb1
Sheet2,"[2, 1]",aa2
Sheet2,"[2, 2]",bb2
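The same indexing idea can be sketched without roo, using a plain hash of sheet name to rows as a stand-in for the workbook. The data shape and the name index_cells are assumptions made for illustration; with roo, the cell values would come from w1.cell(row, column) instead.

```ruby
# Turn { sheet_name => rows } into [sheet, row_number, column_number, value]
# records, skipping blank cells. Row and column numbers start at 1.
# index_cells and the hash-of-arrays input are hypothetical stand-ins
# for a real workbook object.
def index_cells(sheets)
  sheets.flat_map do |name, rows|
    rows.each_with_index.flat_map do |cells, r|
      cells.each_with_index.map do |value, c|
        next if value.to_s.strip.empty? # ignore empty or whitespace-only cells
        [name, r + 1, c + 1, value]
      end.compact
    end
  end
end
```

Each record can then be appended to a CSV with csv << record, giving one line per non-empty cell with its sheet name and position.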
