How not to save to csv when array is empty - ruby

I'm parsing through a website and I'm looking at potentially many millions of rows of content. However, CSV/Excel/ODS doesn't allow for more than a million rows.
That is why I'm trying to use a conditional to exclude saving empty content. However, it's not working: my code keeps creating empty rows in the CSV.
This is the code I have:
# create csv
CSV.open("neverending.csv", "w") do |csv|
csv << ["kuk","date","name"]
# loop through all urls
File.foreach("neverendingurls.txt") do |line|
begin
doorzoekbarefile = Nokogiri::HTML(open(line))
for k in 1..999 do
# PROVISIONARY / CONDITIONAL
unless doorzoekbarefile.at_xpath("(//td[contains(@style, '60px')])[#{k}]").nil?
# xpaths
kuk = doorzoekbarefile.at_xpath("(//td[contains(@style, '60px')])[#{k}]")
date = doorzoekbarefile.at_xpath("(//td[contains(@style, '60px')])[#{k}]/following-sibling::*[1]")
name = doorzoekbarefile.at_xpath("(//td[contains(@style, '60px')])[#{k}]/following-sibling::*[2]")
# save to csv
csv << [kuk,date,name]
end
end
end
rescue
puts "error bij url #{line}"
end
end
end
Anybody have a clue what's going wrong or how to solve the problem? Basically I simply need to change the code so that it doesn't create a new row of csv data when the xpaths are empty.

This really has nothing to do with XPath. It's a simple Array#empty? check: drop the nil values with compact and only write the row if something is left.
row = [kuk,date,name]
csv << row unless row.compact.empty?
BTW, your code is a mess. Learn how to indent at least before posting again.
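A minimal sketch of that check, assuming hypothetical sample rows in place of the scraped Nokogiri values. The helper blank_row? is illustrative and not part of the CSV library; it also treats empty strings as blank, which goes slightly beyond compact.

```ruby
require 'csv'

# Hypothetical sample rows standing in for the scraped values.
rows = [
  ["a1", "2015-01-01", "Alice"],
  [nil, nil, nil],
  ["b2", nil, "Bob"],
  [nil, "", nil]
]

# True when every cell is nil or blank once converted to a string.
def blank_row?(row)
  row.all? { |cell| cell.to_s.strip.empty? }
end

output = CSV.generate do |csv|
  csv << ["kuk", "date", "name"]
  rows.each do |row|
    # Skip rows that carry no content at all.
    csv << row unless blank_row?(row)
  end
end

puts output
```

Rows with at least one value are kept, so a partially filled row such as the third one still ends up in the file.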

Related

I have a conundrum involving blocks and passing them around, need help solving it

OK, so I've built a DSL, and part of it requires the user of the DSL to define what I call a 'writer block':
writer do |data_block|
CSV.open("data.csv", "wb") do |csv|
headers_written = false
data_block do |hash|
(csv << headers_written && headers_written = true) unless headers_written
csv << hash.values
end
end
end
The writer block gets called like this:
def pull_and_store
raise "No writer detected" unless @writer
@writer.call( -> (&block) {
pull(pull_initial,&block)
})
end
The problem is twofold. First, is this the best way to handle this kind of thing? Second, I'm getting a strange error:
undefined method `data_block' for Servo_City:Class (NoMethodError)
It's strange because I can see data_block right there, or at least it exists before the CSV block at any rate.
What I'm trying to create is a way for the user to write a wrapper block that both wraps around a block and yields a block to the block that is being wrapped. Wow, that's a mouthful.
Inner me does not want to write an answer before the question is clarified.
Other me wagers that code examples will help to clarify the problem.
I assume that the writer block has the task of persisting some data. Could you pass the data into the block in an enumerable form? That would allow the DSL user to write something like this:
writer do |data|
CSV.open("data.csv", "wb") do |csv|
csv << header_row
data.each do |hash|
data_row = hash.values
csv << data_row
end
end
end
No block passing required.
Note that you can pass in a lazy collection if dealing with hugely huge data sets.
Does this solve your problem?
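To make the enumerable idea concrete, here is a rough sketch. The records enumerator and its two hashes are made up for illustration and stand in for whatever the DSL's pull step produces; CSV.generate is used instead of a file so the result is easy to inspect.

```ruby
require 'csv'

# Hypothetical data source: yields one hash per record, on demand.
records = Enumerator.new do |yielder|
  yielder << { "id" => 1, "name" => "servo" }
  yielder << { "id" => 2, "name" => "gear" }
end

# The writer only needs an object that responds to each.
writer = lambda do |data|
  CSV.generate do |csv|
    headers_written = false
    data.each do |hash|
      unless headers_written
        csv << hash.keys          # header row, taken from the first record
        headers_written = true
      end
      csv << hash.values
    end
  end
end

# Passing a lazy enumerator means records are produced one at a time.
result = writer.call(records.lazy)
puts result
```

Because the writer only calls each, the same block works with a plain Array, an Enumerator, or a lazy one.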
Trying to open the CSV file every time you want to write a record seems overly complex and likely to cause bad performance (unless writing is intermittent). It will also overwrite the CSV file each time unless you change the file mode from wb to ab.
I think something simple like:
csv = CSV.open('data.csv', 'wb')
csv << headers
writer do |hash|
csv << hash.values
end
would be something more understandable.
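As for the NoMethodError in the question: writing a local variable name followed by do ... end makes Ruby parse it as a method call with a block, so it looks for a method named data_block instead of using the lambda that was passed in. Calling the lambda explicitly with .call avoids that. A small sketch, with a made-up source lambda in place of the DSL's pull:

```ruby
# Hypothetical lambda that accepts a block and feeds it records.
source = ->(&block) { [{ "id" => 1 }, { "id" => 2 }].each(&block) }

# Working version: invoke the lambda with .call and hand it a block.
writer = lambda do |data_block|
  seen = []
  data_block.call { |hash| seen << hash.values }
  seen
end

rows = writer.call(source)
p rows

# Broken version: `data_block do ... end` is parsed as a method call.
message = nil
begin
  broken = lambda do |data_block|
    data_block do |hash|
      hash
    end
  end
  broken.call(source)
rescue NoMethodError => e
  message = e.message
end
puts message
```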

Ruby equivalent to Python's DictWriter?

I have a Ruby script that goes through a CSV, determines some information, and then puts out a resulting CSV file. In Python, I'm able to open both my source file and my results file with DictReader and DictWriter respectively and write rows as dictionaries, where keys are the file header values. It doesn't appear that there is a manageable way to do this in Ruby, but I'm hoping somebody can point me to a better solution than storing all of my result hashes in an array and writing them after the fact.
The standard library "CSV" gives rows hash-like behavior when headers are enabled.
require 'csv'
CSV.open("file.csv", "wb") do |csv_out|
CSV.foreach("test.csv", headers: true) do |row|
row["header2"].upcase! # hashlike behaviour
row["new_header"] = 12 # add a new column
csv_out << row
end
end
(test.csv has a header1, a header2 and some random comma separated string lines.)
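For the DictWriter side specifically, CSV also accepts a Hash on << once headers are configured, and writes the values in header order. A short sketch with made-up field names and data:

```ruby
require 'csv'

# Hypothetical field names and result hashes.
fieldnames = ["name", "score", "grade"]
results = [
  { "name" => "ann", "score" => 91, "grade" => "A" },
  { "grade" => "B", "name" => "bob", "score" => 84 }  # key order does not matter
]

# headers: fixes the column order, write_headers: emits the header row.
output = CSV.generate(headers: fieldnames, write_headers: true) do |csv|
  results.each { |hash| csv << hash }
end

puts output
```

The same options work with CSV.open, so each result hash can be written as soon as it is computed instead of being collected in an array first.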

Ruby: Write a value to a specific location in CSV file

I'm still fairly new to coding and I'm trying to learn about manipulating CSV files.
The code below opens a specified CSV file, goes to each url in the CSV file in column B (header = url), and finds the price on the webpage.
Example data from CSV file:
Store,URL,Price
Walmart,http://www.walmart.com/ip/HP-11.6-Stream-Laptop-PC-with-Intel-Celeron-Processor-2GB-Memory-32GB-Hard-Drive-Windows-8.1-and-Microsoft-Office-365-Personal-1-yr-subscription/39073484
Walmart,http://www.walmart.com/ip/Nextbook-10.1-Intel-Quad-Core-2-In-1-Detachable-Windows-8.1-Tablet/39092206
Walmart,http://www.walmart.com/ip/Nextbook-10.1-Intel-Quad-Core-2-In-1-Detachable-Windows-8.1-Tablet/39092206
I'm having trouble writing that price to the adjacent column C (header = price) in the same CSV.
require 'nokogiri'
require 'open-uri'
require 'csv'
contents = CSV.open "mp_lookup.csv", headers: true, header_converters: :symbol
contents.each do |row|
row_url = row[:url]
goto_url = Nokogiri::HTML(open(row_url))
new_price = goto_url.css('meta[itemprop="price"]')[0]['content']
#----
#In this section, I'm looking to write the value of new_price to the 3rd column in the same CSV file
#----
end
In the past, I've been able to use:
in_file = open("mp_lookup.csv", 'w')
in_file.write(new_price)
But this doesn't seem to work in this situation.
Any help is appreciated!
The simple answer is that you can refer to the :price column in the CSV file, just like you refer to the :url column. Try this code to set the price in the CSV object in memory:
row[:price] = new_price
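After you've processed all of the records, you'll want to save the CSV file again. This only works if the rows are held in memory, so load the file with CSV.read (with the same headers: true, header_converters: :symbol options) rather than iterating a CSV.open handle, which does not keep the rows it has already yielded. You can save it to any filename, but we'll simply overwrite the previous file in this example:
CSV.open("mp_lookup.csv", "wb") do |csv|
csv << contents.headers # header row (symbolized: store, url, price)
contents.each do |row|
csv << row
end
end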
In a real production environment, you'd want to be more fault tolerant than this, and preserve the original file until the end of the process. However, this shows how to update the values in the price column for each row, and then save the changes to a file.
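Here is one way the whole round trip could look, as a sketch only. lookup_price is a hypothetical stand-in for the Nokogiri scraping step, the URLs are placeholders, and the file lives in a temporary directory so the example has no side effects. The updated table is written to a temporary file first and renamed over the original only at the end.

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical stand-in for the price lookup done with Nokogiri in the question.
def lookup_price(url)
  url.end_with?("/1") ? "199.00" : "179.00"
end

result = Dir.mktmpdir do |dir|
  path = File.join(dir, "mp_lookup.csv")
  File.write(path, "Store,URL,Price\nWalmart,http://example.com/ip/1\nWalmart,http://example.com/ip/2\n")

  # CSV.read with headers returns a CSV::Table that stays in memory.
  table = CSV.read(path, headers: true)
  table.each { |row| row["Price"] = lookup_price(row["URL"]) }

  # Write to a temporary file, then replace the original in one step.
  tmp_path = path + ".tmp"
  File.write(tmp_path, table.to_csv)
  File.rename(tmp_path, path)

  File.read(path)
end

puts result
```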

Reading every line in a CSV and using it to query an API

I have the following Ruby code:
require 'octokit.rb'
require 'csv.rb'
CSV.foreach("actors.csv") do |row|
CSV.open("node_attributes.csv", "wb") do |csv|
csv << [Octokit.user "userid"]
end
end
I have a csv called actors.csv where every row has one entry - a string with a userid.
I want to go through all the rows, and for each row do Octokit.user "userid", and then store the output from each query on a separate row in a CSV - node_attributes.csv.
My code does not seem to do this? How can I modify it to make this work?
require 'csv'
DOC = 'actors.csv'
DOD = 'new_output.csv'
holder = CSV.read(DOC)
CSV.read returns an array of rows, so you can navigate it by indexing:
holder[0][0]
# => data from the first row
holder[1][0]
# => more data, from the second row
Make sense?
# make this a loop
profile = []
profile[0] = holder[0][0]
profile[1] = holder[1][0]
profile[2] = 'whatever it is you want to store in the new cell'
CSV.open(DOD, "a") do |data|
data << profile
end
# end the loop here
That last bit of code appends whatever you put in profile as a new row in the output CSV file.
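Note that the code in the question opens node_attributes.csv in "wb" mode inside the loop, which truncates the file on every iteration, and it passes the literal string "userid" instead of the value from the row. A sketch of the loop with the output opened once, where fetch_user is a hypothetical stand-in for Octokit.user (no network call is made) and the input is an inline string rather than actors.csv:

```ruby
require 'csv'

# Hypothetical stand-in for Octokit.user; returns a small hash of attributes.
def fetch_user(login)
  { login: login, name: login.capitalize }
end

# Inline sample in place of actors.csv: one user id per row.
actors_csv = "alice\nbob\n"

# Open the output once, then add one row per input row.
output = CSV.generate do |out|
  CSV.parse(actors_csv).each do |row|
    user = fetch_user(row[0])
    out << [user[:login], user[:name]]
  end
end

puts output
```

With real files, CSV.open("node_attributes.csv", "wb") would wrap the CSV.foreach("actors.csv") loop in the same way.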

Ruby - Reading csv file and executing value in loop is skipping over lines in the csv file

I'm sure this is a completely ignorant question, but here goes. The following code's objective is to read a list of IDs from a standard CSV file, append each value to a URL, call the URL and extract a specific attribute via XPath. The problem I'm having is that the loop seems to be skipping some lines.
For example, here is a sample of 10 values:
777961
777972
781033
781044
781055
847066
744187
893908
369009
369010
The code is only reading every other line. The actual file has around 6000 lines, not huge but I'm only getting about 2500 values returned in the second file.
f = File.open('test.csv', 'r+')
url_f = File.open("url.csv", "w")
for line in f
f.each_line do |item|
item = f.gets
url = "http://test.com/testid=" + item
client = HTTPClient.new
resp = client.get_content(url)
doc = Nokogiri::HTML(resp)
doc.xpath("//link[@rel='canonical']/@href").each do |attr|
url_f.puts attr.value
puts attr.value
end
puts item
end
end
Never mind, I figured it out.
I had the line item = f.gets inside the loop, which reads the next line from the file every time the loop runs, thus skipping every other line. I knew it was a noob question. :P
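The effect is easy to reproduce without a file or any HTTP calls. This sketch uses a StringIO with four of the sample IDs; each_line and gets share the same read position, so calling gets inside the block consumes the line that each_line would have yielded next.

```ruby
require 'stringio'

ids = "777961\n777972\n781033\n781044\n"

# Buggy pattern: gets inside each_line advances the stream a second time.
skipping = []
io = StringIO.new(ids)
io.each_line do |item|
  item = io.gets                 # overwrites item with the following line
  skipping << item.to_s.chomp
end

# Fixed pattern: use the line that each_line already yields.
correct = StringIO.new(ids).each_line.map(&:chomp)

p skipping
p correct
```

Calling chomp also strips the trailing newline, which is worth doing before appending the ID to a URL.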
