Adding an array to a CSV file in the second column using Ruby

I am trying to automate a search on Google because I have more than a thousand lines to look up. I can read the CSV and automate the search, but I cannot add the resulting array to the file. Maybe I'm missing something?
For the test, the CSV file is made up of 1 column with no header and 3 rows.
Here is my code:
require 'watir'
require 'nokogiri'
require 'csv'
browser = Watir::Browser.new(:chrome)
browser.goto("http://www.google.com")
CSV.open('C:\Users\Market\Documents\Emailhunter_scraper\test-email.csv').map do |terms|
browser.text_field(title: "Rechercher").set terms
browser.send_keys :return
sleep(rand(10))
doc = Nokogiri::HTML.parse(browser.html)
doc.css("div.f kv _SWb").each do |item|
name = item.css('a').text
link = item.css('a')[:href]
csv << [name, link]
end
sleep(rand(10))
end
sleep(rand(10))

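As shown in the documentation for CSV.open, the file mode defaults to "rb".
This means the file is opened read-only, so nothing can be written to it. To write, pass a write mode explicitly (be aware that "wb" truncates the file, so read the search terms you need before reopening it):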
CSV.open('path/to/file/csv', 'wb')
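Since a write mode empties the file as soon as it is opened, one way to add a second column is to read every row first and only then reopen the file for writing. The following is a minimal sketch under that assumption; the method name add_second_column and the block that supplies the new value are hypothetical stand-ins for the browser search in the question.

```ruby
require 'csv'

# Rewrites a one-column CSV so that each row gains a second column.
# The block receives the search term and returns the value to store next to it
# (in the question this would be whatever was scraped for that term).
def add_second_column(path)
  rows = CSV.read(path)            # read everything first, while the data still exists
  CSV.open(path, 'w') do |csv|     # 'w' truncates, so reopen only after reading
    rows.each do |row|
      term = row[0]
      csv << [term, yield(term)]   # original term plus the new value
    end
  end
end
```

It could be called as add_second_column('test-email.csv') { |term| lookup(term) }, where lookup is whatever code produces the result for one term.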
The full list of modes is given in the IO documentation. They are:
"r"  Read-only, starts at beginning of file (default mode).
"r+" Read-write, starts at beginning of file.
"w"  Write-only, truncates existing file to zero length or creates a new file for writing.
"w+" Read-write, truncates existing file to zero length or creates a new file for reading and writing.
"a"  Write-only, each write call appends data at end of file. Creates a new file for writing if file does not exist.
"a+" Read-write, each write call appends data at end of file. Creates a new file for reading and writing if file does not exist.
"b"  Binary file mode. Suppresses EOL <-> CRLF conversion on Windows and sets external encoding to ASCII-8BIT unless explicitly specified.
"t"  Text file mode.

Related

How to replace the first few bytes of a file in Ruby without opening the whole file?

I have a 30MB XML file that contains some gibberish in the beginning, and so typically I have to remove that in order for Nokogiri to be able to parse the XML document properly.
Here's what I currently have:
contents = File.open(file_path).read
if contents[0..123].include? 'authenticate_response'
fixed_contents = File.open(file_path).read[123..-1]
File.open(file_path, 'w') { |f| f.write(fixed_contents) }
end
However, this actually causes the ruby script to open up the large XML file twice. Once to read the first 123 characters, and another time to read everything but the first 123 characters.
To solve the first issue, I was able to accomplish this:
contents = File.open(file_path).read(123)
However, now I need to remove these characters from the file without reading the entire file. How can I "trim" the beginning of this file without having to open the entire thing in memory?
You can open the file once, read and check the "garbage", and then pass the opened file directly to Nokogiri for parsing. That way, you only need to read the file once and don't need to write it at all.
File.open(file_path) do |xml_file|
if xml_file.read(123).include? 'authenticate_response'
# header found, nothing to do
else
# no header found. We rewind and let nokogiri parse the whole file
xml_file.rewind
end
xml = Nokogiri::XML.parse(xml_file)
# Now do whatever you want with the parsed XML document
end
Please refer to the documentation of IO#read, IO#rewind and Nokogiri::XML::Document.parse for details about those methods.
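The read-then-rewind idea can be tried without Nokogiri. This is a small sketch using only the standard library; the helper name open_past_header and its three parameters are made up for illustration and are not part of any library.

```ruby
# Yields the open file positioned just past the header when the first
# header_size bytes contain marker; otherwise yields it rewound to byte 0.
def open_past_header(path, marker, header_size)
  File.open(path) do |io|
    head = io.read(header_size)                      # nil if the file is empty
    io.rewind unless head && head.include?(marker)   # no header: start over
    yield io                                         # hand the positioned IO to a parser
  end
end
```

With Nokogiri installed, the block could be { |io| Nokogiri::XML.parse(io) }; anything that reads from an IO will continue from the current position.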

Ruby CSV.open will create file if one does not exist?

I'm generating a CSV file in Ruby with data from my database. I was using CSV.open(filename, "w") do |csv|
Will that create a file with that filename if one does not exist?
That depends on the mode you specify when you call CSV.open().
If you call it with the "r" (read-only) mode argument and the file does not exist, you will get an error:
No such file or directory @ rb_sysopen - your_file_name.ext
If, however, you use the "w" (write) or "wb" (write binary) mode, Ruby will create the file for you.
CSV.open("my_new_file.csv", "r") # raises Errno::ENOENT if the file does not exist
CSV.open("my_new_file.csv", "w") # creates a new file
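The difference between the two modes is easy to check directly. Here is a short sketch; try_csv_open is a hypothetical helper written for this illustration, and the header row it writes is arbitrary.

```ruby
require 'csv'

# Tries to open path in the given mode and reports what happened:
# :ok when the open succeeds, :missing when Ruby raises Errno::ENOENT.
def try_csv_open(path, mode)
  CSV.open(path, mode) do |csv|
    csv << %w[id name] if mode.start_with?('w')   # write a row only in write mode
  end
  :ok
rescue Errno::ENOENT
  :missing
end
```

For a path that does not exist yet, calling it with "r" should report :missing and leave nothing on disk, while calling it with "w" should report :ok and create the file.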

Ruby: Why can't I create a new file?

I'm trying to create a json file and write to it.
My code looks like this:
def save_as_json(object)
f = File.new('file.json')
f.puts(object.to_json, 'w')
f.close
end
save_as_json({'name'=>'fred'})
The problem is, I get the following error when I run it:
:15:in `initialize': No such file or directory @ rb_sysopen - file.json (Errno::ENOENT)
I'm asking Ruby to create the file but it's complaining that it doesn't exist! What is the correct way to create and write to a file?
You just need to open the file using the 'w' mode like this:
f = File.new('file.json', 'w')
Choose the mode based on what you plan to do with the file. Here are your options:
"r"  Read-only, starts at beginning of file (default mode).
"r+" Read-write, starts at beginning of file.
"w"  Write-only, truncates existing file to zero length or creates a new file for writing.
"w+" Read-write, truncates existing file to zero length or creates a new file for reading and writing.
"a"  Write-only, each write call appends data at end of file. Creates a new file for writing if file does not exist.
"a+" Read-write, each write call appends data at end of file. Creates a new file for reading and writing if file does not exist.
IO Docs
File.new defaults to read mode, so passing it a path that does not exist results in an error:
2.3.0 :001 > f = File.new 'foo'
Errno::ENOENT: No such file or directory @ rb_sysopen - foo
You need to specify 'w':
2.3.0 :002 > f = File.new 'foo', 'w'
=> #<File:foo>
That said, there are easier ways to write to files than to get a file handle using File.new or File.open. The simplest way in Ruby is to call File.write:
File.write('file.json', object.to_json)
You can use the longer File.open approach if you want; if you do, the simplest approach is to pass a block to File.open:
File.open('file.json', 'w') { |f| f << object.to_json }
This eliminates the need for you to explicitly close the file; File.open, when passed a block, closes the file for you after the block has finished executing.
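Putting that together, the method from the question could be rewritten roughly as follows. This is a sketch rather than the only correct form; the optional path parameter is an addition for illustration.

```ruby
require 'json'

# Serialises object to JSON and writes it to path, creating the file if needed
# and replacing any previous contents. File.write opens, writes and closes in one call.
def save_as_json(object, path = 'file.json')
  File.write(path, object.to_json)
end
```

Calling save_as_json({'name' => 'fred'}) then creates file.json in the current directory without any explicit open or close.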

Changing information in a CSV file

I'm trying to write a Ruby script that will read through a CSV file and prepend information to certain cells (for instance, adding a path to a file name). I am able to open and mutate the text just fine, but am having issues writing back to the CSV without overwriting everything. This is a sample of what I have so far:
CSV.foreach(path) { |row|
text = row[0].to_s
new_text = "test:#{text}"
}
I would like to add something within that block that would then write new_text back to the same cell (row) in the file. The only way I have found to write to a file is
CSV.open(path, "wb") { |row|
row << new_text
}
But I think that is bad practice since you are reopening the file within the file block already. Is there a better way I could do this?
EX: I have a CSV file that looks something like:
file,destination
test.txt,A101
and need it to be:
file,destination
path/test.txt,id:A101
Hope that makes sense. Thanks in advance!
Depending on the size of the file, you might consider loading the contents of the file into a local variable, manipulating that, and then overwriting the original file.
lines = CSV.read(path)
File.open(path, "wb") do |file|
lines.each do |line|
text = line[0].to_s
line[0] = "test:#{text}" # Replace this with your editing logic
file.write CSV.generate_line(line)
end
end
Alternately, if the file is big, you could write each modified line to a new file along the way and then replace the old file with the new one at the end.
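For the large-file variant, a rough sketch might look like this. It assumes the first row is a header that should be copied unchanged, and it uses a hypothetical temporary name made by appending ".tmp" to the original path.

```ruby
require 'csv'

# Streams path row by row into a temporary file, yielding each data row for
# editing, then renames the temporary file over the original.
def rewrite_csv(path)
  tmp_path = "#{path}.tmp"                      # hypothetical temporary file name
  CSV.open(tmp_path, 'w') do |out|
    first = true
    CSV.foreach(path) do |row|
      out << (first ? row : yield(row))         # keep the header row as-is
      first = false
    end
  end
  File.rename(tmp_path, path)                   # swap the new file into place
end
```

For the example in the question, the call would be rewrite_csv(path) { |row| ["path/#{row[0]}", "id:#{row[1]}"] }. Only one row is held in memory at a time, at the cost of needing disk space for a second copy while the rewrite runs.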
Given that you don't appear to be doing anything that draws on CSV capabilities, I'd recommend using Ruby's "in-place" option variable $-i.
Some of the stats software I use wants just the data, and can't deal with a header line. Here's a script I wrote a while back to (appear to) strip the first line out of one or more data files specified on the command-line.
#! /usr/bin/env ruby -w
#
# User supplies the name of one or more files to be "stripped"
# on the command-line.
#
# This script ignores the first line of each file.
# Subsequent lines of the file are copied to the new version.
#
# The operation saves each original input file with a suffix of
# ".orig" and then operates in-place on the specified files.
$-i = ".orig" # specify backup suffix
oldfilename = ""
ARGF.each do |line|
if ARGF.filename == oldfilename # If it's an old file
puts line # copy lines through.
else # If it's a new file remember it
oldfilename = ARGF.filename # but don't copy the first line.
end
end
Obviously you'd want to change the puts line pass-through to whatever edit operations you want to perform.
I like this solution because even if you screw it up, you've preserved your original file as its original name with .orig (or whatever suffix you choose) appended.

Append new lines to a csv from json.parse

I'm more of a sysadmin (Chef) than a Ruby guy, so this may be a five-minute fix.
I am working on a task where I write a Ruby script that pulls JSON data from multiple files, parses it, and writes the desired fields to a single .csv file. Basically, it pulls metadata about AWS accounts and puts it in an accountant-friendly format.
I got a lot of help from another Stack Overflow question (json.parse help) on how to solve the problem for a single file.
My issue is that I am trying to pull the same data from multiple JSON files in an array. I can get it to loop through each file with the code below.
require 'csv'
require "json"
delim_file = CSV.open("delimited_test.csv", "w")
aws_account_list = %w(example example2)
aws_account_list.each do |account|
json_file = File.read(account.to_s + "_aws.json")
parsed_json = JSON.parse(json_file)
delim_file = CSV.open("delimited_test.csv", "w")
# This next line could be a problem if you ran this code multiple times
delim_file << ["EbsOptimized", "PrivateDnsName", "KeyName", "AvailabilityZone", "OwnerId"]
parsed_json['Reservations'].each do |inner_json|
inner_json['Instances'].each do |instance_json|
delim_file << [[instance_json['EbsOptimized'].to_s, instance_json['PrivateDnsName'], instance_json['KeyName'], instance_json['Placement']['AvailabilityZone'], inner_json['OwnerId']],[]]
end
delim_file.close
end
end
However, whenever I do it, it overwrites every time to the same single row in the .csv file. I have tried adding a \n string to the end of the array, converting the array to a string with hashes and doing a \n, but all that does is add a line to the same row that it overwrites.
How would I go about writing it so that it reads each JSON file, then appends each file's metadata to a new row? This looks like a simple case of writing the right loop, but I can't figure it out.
You declared your file like this:
delim_file = CSV.open("delimited_test.csv", "w")
To fix your issue, all you have to do is change "w" to "a":
delim_file = CSV.open("delimited_test.csv", "a")
See the docs for IO.new for a description of the available file modes. In short, "w" creates an empty file at the filename, overwriting any existing one, and writes to that. "a" only creates the file if it doesn't exist, and appends otherwise. Because you currently have "w", the file is emptied each time it is opened, which in your loop is once per account. With "a", each write is added to what's already there.
You need to open the file in append mode:
delim_file = CSV.open("delimited_test.csv", "a")
'a' Write-only, starts at end of file if file exists, otherwise creates a new file for writing.
'a+' Read-write, starts at end of file if file exists, otherwise creates a new file for reading and writing.
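A different way to avoid the overwrite, not taken from the answers above, is to open the output file once, outside the loop, so that the header is written a single time and every account's rows land in the same file. The sketch below assumes the JSON layout used in the question; the method names instance_rows and write_accounts_csv are invented for illustration.

```ruby
require 'csv'
require 'json'

# Turns one parsed JSON document into an array of CSV rows, one per instance.
def instance_rows(parsed_json)
  parsed_json['Reservations'].flat_map do |reservation|
    reservation['Instances'].map do |instance|
      [instance['EbsOptimized'].to_s, instance['PrivateDnsName'], instance['KeyName'],
       instance['Placement']['AvailabilityZone'], reservation['OwnerId']]
    end
  end
end

# Opens the output once, writes the header once, then adds rows for every file.
def write_accounts_csv(json_paths, csv_path)
  CSV.open(csv_path, 'w') do |csv|
    csv << %w[EbsOptimized PrivateDnsName KeyName AvailabilityZone OwnerId]
    json_paths.each do |json_path|
      instance_rows(JSON.parse(File.read(json_path))).each { |row| csv << row }
    end
  end
end
```

With this shape, "w" is safe because the file is truncated only once per run, and rerunning the script does not duplicate the header the way append mode would.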
