Lines skipped when writing to CSV in Ruby

I'm writing a small application that takes data from an uploaded CSV file, parses each line, and then rearranges the data in a new CSV file, which is then downloaded.
Here is the code that creates the new CSV file and downloads it:
CSV.open("template.csv", "w") do |csv|
  @formatted_lines.each do |line|
    csv << line
  end
  # Download CSV
  send_file("template.csv", :disposition => 'attachment', :filename => File.basename("template.csv"))
end
I end up with a CSV file with 270 lines, even though the @formatted_lines array has 280 lines/arrays in it. There is nothing wrong with the data in the original CSV file that would cause an error when it is parsed. Why would it cut off the last 10 lines?

You're not actually closing the file before trying to send it; it's possible that the last ten lines are still buffered and haven't been written to disk. Try calling send_file after the CSV.open block.
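A minimal sketch of that fix, assuming this runs in a Rails controller action (names taken from the question):

CSV.open("template.csv", "w") do |csv|
  @formatted_lines.each { |line| csv << line }
end
# The block form of CSV.open flushes and closes the file when the
# block exits, so every row is on disk before we try to send it.
send_file("template.csv", :disposition => 'attachment', :filename => File.basename("template.csv"))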

Related

Zlib::GzipReader doesn't read the whole file

I have this block of Ruby code. I need to read a big json.gz file that cannot be loaded into RAM at once (so simply reading it all with GzipReader in one go is not an option). To achieve this, I use GzipReader and then lazy reading with batch loading. Everything works perfectly, but for some reason not all the data from the JSON makes it into the block. Only 5500125 rows are processed by this code, but the file has about 6600000 rows. If I use File.open('authors.jsonl.gz') instead of Zlib, then all rows are processed, but they are not unzipped.
I looked through the documentation almost all day and haven't found anything :( I also tried to unzip each row as it is processed, but all of those attempts failed as well. Is there a way to unzip the file and then read it in chunks (all of its content, not just part), or at least read it line by line and unzip each line on its own?
Thank you guys :)
require 'zlib'
require 'json'

batch_size = 1000        # example value; not given in the question
array_of_authors = []

Zlib::GzipReader.wrap(File.open('authors.jsonl.gz')) do |file|
  file.lazy.each_slice(batch_size) do |lines|
    lines.each do |line|
      parsed_line = JSON.parse(line.gsub('\u0000', ''))
      array_of_authors << { id: parsed_line['id'],
                            name: parsed_line['name'],
                            username: parsed_line['username'],
                            description: parsed_line['description'],
                            followers_count: parsed_line.dig('public_metrics', 'followers_count'),
                            following_count: parsed_line.dig('public_metrics', 'following_count'),
                            tweet_count: parsed_line.dig('public_metrics', 'tweet_count'),
                            listed_count: parsed_line.dig('public_metrics', 'listed_count') }
    end
  end
end
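One possible cause of this symptom is a gzip file built from several concatenated gzip members: Zlib::GzipReader decodes only the first member and then reports EOF. A hedged sketch of a workaround that uses GzipReader#unused to walk the members in turn (file name from the question; the JSON handling from the loop above would go inside):

require 'zlib'

File.open('authors.jsonl.gz') do |io|
  loop do
    gz = Zlib::GzipReader.new(io)
    gz.each_line do |line|
      # parse each JSON line here, as in the original loop
    end
    unused = gz.unused          # raw bytes read past this member's end, or nil
    gz.finish                   # close this member without closing io
    break if unused.nil?        # nil means there are no further members
    io.pos -= unused.length     # rewind so the next member starts at io.pos
  end
end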

Adding an array to a CSV file in the second column using Ruby

I am trying to automate a search on Google because I have more than a thousand lines. I can read from the CSV and automate my search, but I cannot add the array to the file. Maybe I'm missing something?
For the test, the CSV file is made up of 1 column with no header and 3 rows.
Here is my code:
require 'watir'
require 'nokogiri'
require 'csv'
browser = Watir::Browser.new(:chrome)
browser.goto("http://www.google.com")
CSV.open('C:\Users\Market\Documents\Emailhunter_scraper\test-email.csv').map do |terms|
  browser.text_field(title: "Rechercher").set terms
  browser.send_keys :return
  sleep(rand(10))
  doc = Nokogiri::HTML.parse(browser.html)
  doc.css("div.f kv _SWb").each do |item|
    name = item.css('a').text
    link = item.css('a')[:href]
    csv << [name, link]
  end
  sleep(rand(10))
end
sleep(rand(10))
As shown in the documentation for CSV.open, the file mode defaults to "rb".
This means the file is being opened read-only. Instead, you need to use:
CSV.open('path/to/file/csv', 'wb')
The full documentation for the different modes can be seen here (a sketch applying this fix follows the list). They are:
"r"  Read-only, starts at beginning of file (default mode).
"r+" Read-write, starts at beginning of file.
"w"  Write-only, truncates existing file to zero length or creates a new file for writing.
"w+" Read-write, truncates existing file to zero length or creates a new file for reading and writing.
"a"  Write-only, each write call appends data at end of file. Creates a new file for writing if file does not exist.
"a+" Read-write, each write call appends data at end of file. Creates a new file for reading and writing if file does not exist.
"b"  Binary file mode. Suppresses EOL <-> CRLF conversion on Windows. Sets external encoding to ASCII-8BIT unless explicitly specified.
"t"  Text file mode.

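A sketch applying the fix, with the input read up front and an explicit, writable output handle ('results.csv' is a hypothetical output file; the Google selector is kept exactly as in the question):

require 'watir'
require 'nokogiri'
require 'csv'

browser = Watir::Browser.new(:chrome)
browser.goto("http://www.google.com")

# Read the search terms once, up front.
terms = CSV.read('C:\Users\Market\Documents\Emailhunter_scraper\test-email.csv')

CSV.open('results.csv', 'wb') do |csv|        # 'wb' makes csv writable
  terms.each do |row|
    browser.text_field(title: "Rechercher").set row.first
    browser.send_keys :return
    sleep(rand(10))
    doc = Nokogiri::HTML.parse(browser.html)
    doc.css("div.f kv _SWb").each do |item|   # selector as in the question
      link = item.at_css('a')
      csv << [link.text, link['href']] if link
    end
    sleep(rand(10))
  end
end
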
Changing information in a CSV file

I'm trying to write a Ruby script that will read through a CSV file and prepend information to certain cells (for instance, adding a path to a file). I am able to open and mutate the text just fine, but I am having trouble writing back to the CSV without overwriting everything. This is a sample of what I have so far:
CSV.foreach(path) { |row|
  text = row[0].to_s
  new_text = "test:#{text}"
}
I would like to add something within that block that would then write new_text back to the same cell (row) in the file. The only way I have found to write to a file is
CSV.open(path, "wb") { |row|
  row << new_text
}
But I think that is bad practice, since you are reopening the file inside the block that already has it open. Is there a better way I could do this?
EX: I have a CSV file that looks something like:
file,destination
test.txt,A101
and need it to be:
file,destination
path/test.txt,id:A101
Hope that makes sense. Thanks in advance!
Depending on the size of the file, you might consider loading the contents of the file into a local variable, manipulating that, and overwriting the original file.
lines = CSV.read(path)
File.open(path, "wb") do |file|
  lines.each do |line|
    text = line[0].to_s
    line[0] = "test:#{text}" # Replace this with your editing logic
    file.write CSV.generate_line(line)
  end
end
Alternatively, if the file is big, you could write each modified line to a new file along the way and then replace the old file with the new one at the end, as sketched below.
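A sketch of that second approach, assuming the header row should pass through unchanged (the prefixes come from the example above; the temporary file name is illustrative):

require 'csv'
require 'fileutils'

tmp = "#{path}.tmp"                      # hypothetical temp-file name
CSV.open(tmp, "wb") do |out|
  CSV.foreach(path).with_index do |row, i|
    if i > 0                             # leave the header row alone
      row[0] = "path/#{row[0]}"
      row[1] = "id:#{row[1]}"
    end
    out << row
  end
end
FileUtils.mv(tmp, path)                  # swap in the new file only at the end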
Given that you don't appear to be doing anything that draws on CSV capabilities, I'd recommend using Ruby's "in-place" option variable $-i.
Some of the stats software I use wants just the data, and can't deal with a header line. Here's a script I wrote a while back to (appear to) strip the first line out of one or more data files specified on the command-line.
#! /usr/bin/env ruby -w
#
# User supplies the name of one or more files to be "stripped"
# on the command line.
#
# This script ignores the first line of each file.
# Subsequent lines of the file are copied to the new version.
#
# The operation saves each original input file with a suffix of
# ".orig" and then operates in-place on the specified files.

$-i = ".orig" # specify backup suffix

oldfilename = ""
ARGF.each do |line|
  if ARGF.filename == oldfilename # If it's an old file,
    puts line                     # copy lines through.
  else                            # If it's a new file, remember it
    oldfilename = ARGF.filename   # but don't copy the first line.
  end
end
Obviously you'd want to change the puts line pass-through to whatever edit operations you want to perform.
I like this solution because even if you screw it up, you've preserved your original file as its original name with .orig (or whatever suffix you choose) appended.
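Run as, say, ruby strip_first_line.rb data1.csv data2.csv (the script name is illustrative), it rewrites both files in place and leaves data1.csv.orig and data2.csv.orig behind as backups.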

Append new lines to a csv from json.parse

More sysadmin (Chef) than Ruby guy, so this may be a five-minute fix.
I am working on a task where I write a Ruby script that pulls JSON data from multiple files, parses it, and writes the desired fields to a single .csv file. Basically, pulling metadata about AWS accounts and putting it in an accountant-friendly format.
Got a lot of help from another Stack Overflow question on how to solve the problem for a single file: json.parse help.
My issue is that I am trying to pull the same data from multiple JSON files in an array. I can get it to loop through each file with the code below.
require 'csv'
require 'json'

delim_file = CSV.open("delimited_test.csv", "w")
aws_account_list = %w(example example2)
aws_account_list.each do |account|
  json_file = File.read(account.to_s + "_aws.json")
  parsed_json = JSON.parse(json_file)
  delim_file = CSV.open("delimited_test.csv", "w")
  # This next line could be a problem if you ran this code multiple times
  delim_file << ["EbsOptimized", "PrivateDnsName", "KeyName", "AvailabilityZone", "OwnerId"]
  parsed_json['Reservations'].each do |inner_json|
    inner_json['Instances'].each do |instance_json|
      delim_file << [[instance_json['EbsOptimized'].to_s, instance_json['PrivateDnsName'], instance_json['KeyName'], instance_json['Placement']['AvailabilityZone'], inner_json['OwnerId']], []]
    end
    delim_file.close
  end
end
However, whenever I run it, it overwrites the same single row in the .csv file each time. I have tried adding a \n string to the end of the array, and converting the array to a string with hashes and a \n, but all that does is add a line to the same row that gets overwritten.
How would I go about making it read each JSON file and append each file's metadata as a new row? This looks like a simple case of writing the right loop, but I can't figure it out.
You declared your file like this:
delim_file = CSV.open("delimited_test.csv", "w")
To fix your issue, all you have to do is change "w" to "a":
delim_file = CSV.open("delimited_test.csv", "a")
See the docs for IO.new for a description of the available file modes. In short, w creates an empty file at the given filename, overwriting any existing one, and writes to that; a creates the file only if it doesn't exist, and otherwise appends. Because you currently open with w, the file is truncated each time it is opened. With a, new rows are appended to what's already there.
You need to open the file in append mode; use
delim_file = CSV.open("delimited_test.csv", "a")
'a'  Write-only, starts at end of file if file exists, otherwise creates a new file for writing.
'a+' Read-write, starts at end of file if file exists, otherwise creates a new file for reading and writing.
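Either answer fixes the overwriting, but since only the account loop needs repeating, a slightly cleaner structure is to open the file once, write the header once, and let the loop do nothing but append rows. A sketch along those lines (account names and JSON shape as in the question):

require 'csv'
require 'json'

aws_account_list = %w(example example2)

CSV.open("delimited_test.csv", "w") do |csv|
  # The header is written exactly once, before any account is processed.
  csv << ["EbsOptimized", "PrivateDnsName", "KeyName", "AvailabilityZone", "OwnerId"]
  aws_account_list.each do |account|
    parsed_json = JSON.parse(File.read("#{account}_aws.json"))
    parsed_json['Reservations'].each do |reservation|
      reservation['Instances'].each do |instance|
        csv << [instance['EbsOptimized'].to_s,
                instance['PrivateDnsName'],
                instance['KeyName'],
                instance['Placement']['AvailabilityZone'],
                reservation['OwnerId']]
      end
    end
  end
end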

How to assert a CSV file in Ruby

Is there a nice way to assert the contents of a CSV file in Ruby?
I understand how to use the CSV libraries and how to read in the CSV file, but that results in a long list of assertions such as:
assert_equal("0", @csv_array[0].field('impressions'))
assert_equal("7", @csv_array[0].field('clicks'))
assert_equal("330", @csv_array[0].field('currency.GBP.commissions'))
assert_equal("6", @csv_array[0].field('currency.GBP.conversions'))
assert_equal("3300", @csv_array[0].field('currency.GBP.ordervalue'))
Is there some sort of file comparator so I could write:
assert_equal(expected.csv, actual.csv)
or something along those lines?
How about this:
expected_csv = "impressions,clicks,currency.GBP.commissions,currency.GBP.conversions,currency.GBP.ordervalue
0,7,330,6,3300"
actual_csv = File.open('actual.csv').read
assert_equal(expected_csv, actual_csv)
That should work if the entire contents of the CSV file are only two lines. Otherwise you will have to manipulate actual_csv to get the parts you want to test. You could do that like so:
IO.readlines('actual.csv')[2]
That will get you the third line (readlines is zero-indexed). You can then concatenate it with a header line or compare it to a string without the header.
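Another option is to compare parsed rows instead of raw text, which sidesteps trailing-newline and line-ending differences; a short sketch, assuming a Test::Unit/Minitest-style assert_equal:

require 'csv'

expected = CSV.read('expected.csv')   # array of row arrays
actual   = CSV.read('actual.csv')
assert_equal(expected, actual)        # compares the files field by field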
If you have a lot of output to test, you might find approval testing an interesting approach. Basically, the output is saved the first time your test runs. You can then check the output manually and approve it if correct. On subsequent runs, there will be an error whenever the output differs.
I created a quick and dirty method for doing this which I may clean up and turn into a gem at some point. https://gist.github.com/bpardee/513b4a15e5ebdc596e0b
For instance, the following code:
file = 'test.csv'
File.open(file, 'w') do |fout|
  fout.puts "foo,bar,zulu\n1,2,3\n4,5,6"
end
assert_csv(file) do |csv|
  csv << %w(foo bar warrior)
  csv << [1, 3, 5]
  csv << [4, 5, 6]
end
Would result in:
Missing columns: ["zulu"]
Unexpected columns: ["warrior"]
The following mismatches were found in line 2:
bar actual=3 expected=2
I don't recommend this for big CSV files, since everything is loaded into memory.
