I'm parsing a website and expect potentially many millions of rows of content. However, Excel/ODS spreadsheets can't hold more than about a million rows, even when the data comes from a CSV file.
That is why I'm trying to use a condition to skip saving empty content. However, it isn't working: my code keeps creating empty rows in the CSV.
This is the code I have:
```ruby
require 'csv'
require 'nokogiri'
require 'open-uri'

# create csv
CSV.open("neverending.csv", "w") do |csv|
  csv << ["kuk", "date", "name"]
  # loop through all urls
  File.foreach("neverendingurls.txt") do |line|
    begin
      doorzoekbarefile = Nokogiri::HTML(URI.open(line.strip))
      for k in 1..999 do
        unless doorzoekbarefile.at_xpath("(//td[contains(@style, '60px')])[#{k}]").nil?
          # xpaths
          kuk = doorzoekbarefile.at_xpath("(//td[contains(@style, '60px')])[#{k}]")
          date = doorzoekbarefile.at_xpath("(//td[contains(@style, '60px')])[#{k}]/following-sibling::*[1]")
          name = doorzoekbarefile.at_xpath("(//td[contains(@style, '60px')])[#{k}]/following-sibling::*[2]")
          # save to csv
          csv << [kuk, date, name]
        end
      end
    rescue StandardError
      puts "error at url #{line}"
    end
  end
end
```
Anybody have a clue what's going wrong or how to solve the problem? Basically I simply need to change the code so that it doesn't create a new row of csv data when the xpaths are empty.
This really doesn't have to do with XPath. It's a simple Array#empty? check:

```ruby
row = [kuk, date, name]
csv << row unless row.compact.empty?
```
BTW, your code is a mess. At least learn how to indent before posting again.
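A minimal sketch of that check, using made-up rows in place of the scraped nodes: `at_xpath` returns `nil` when nothing matches, so a row made only of `nil`s becomes empty after `#compact` and is skipped, while a partially filled row is still written. `CSV.generate` is used only so the result is visible as a string. In the real script the cells are Nokogiri nodes, so you would probably want to write `node&.text` rather than the node itself.

```ruby
require 'csv'

# Hypothetical sample data standing in for [kuk, date, name] for each index k.
rows = [
  ["K1", "2015-01-01", "Jansen"],
  [nil, nil, nil],          # no matching <td> for this index
  ["K2", nil, "de Vries"]   # partial match: still written
]

output = CSV.generate do |csv|
  csv << ["kuk", "date", "name"]
  rows.each do |row|
    # Skip the row only when every cell is nil.
    csv << row unless row.compact.empty?
  end
end

puts output
```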
OK, so I've built a DSL, and part of it requires the user of the DSL to define what I call a 'writer block':
```ruby
writer do |data_block|
  CSV.open("data.csv", "wb") do |csv|
    headers_written = false
    data_block do |hash|
      (csv << hash.keys && headers_written = true) unless headers_written
      csv << hash.values
    end
  end
end
```
The writer block gets called like this:
```ruby
def pull_and_store
  raise "No writer detected" unless @writer
  @writer.call( -> (&block) {
```
The problem is twofold: first, is this the best way to handle this kind of thing, and second, I'm getting a strange error:

```
undefined method `data_block' for Servo_City:Class (NoMethodError)
```

It's strange because I can see data_block right there, or at least it exists before the CSV block at any rate.
What I'm trying to create is a way for the user to write a wrapper block that both wraps around a block and yields a block to the block that is being wrapped. Wow, that's a mouthful.
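The NoMethodError above comes from how Ruby parses `data_block do |hash| ... end`: an identifier followed by a block is treated as a method call, even when a local variable of that name exists, so Ruby looks for a method called `data_block`. A local variable holding a lambda has to be invoked through `#call`, and a block can be handed to it that way. A minimal sketch, assuming the data source is a lambda that takes a block and yields hashes to it (the sample records are made up, and `CSV.generate` stands in for writing to a file):

```ruby
require 'csv'

# Hypothetical data source: a lambda that takes a block and yields each record to it.
data_block = ->(&each_record) do
  [{ "name" => "a", "price" => 1 }, { "name" => "b", "price" => 2 }].each do |hash|
    each_record.call(hash)
  end
end

output = CSV.generate do |csv|
  headers_written = false
  # `data_block do ... end` would be parsed as a method call and raise NoMethodError;
  # calling the lambda explicitly passes this block to it instead.
  data_block.call do |hash|
    unless headers_written
      csv << hash.keys
      headers_written = true
    end
    csv << hash.values
  end
end

puts output
```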
Inner me does not want to write an answer before the question is clarified.
Other me wagers that code examples will help to clarify the problem.
I assume that the writer block has the task of persisting some data. Could you pass the data into the block in an enumerable form? That would allow the DSL user to write something like this:
```ruby
writer do |data|
  CSV.open("data.csv", "wb") do |csv|
    csv << header_row
    data.each do |hash|
      data_row = hash.values
      csv << data_row
    end
  end
end
```
No block passing required.
Note that you can pass in a lazy collection if dealing with hugely huge data sets.
Does this solve your problem?
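A minimal sketch of that idea, assuming the DSL stores the user's block and later calls it with an enumerable of hashes. `writer_block` and the generated records are hypothetical, and `CSV.generate` stands in for `CSV.open("data.csv", "wb")` so the result is visible. The records come from a lazy enumerator, so they are produced one at a time as the writer iterates instead of being built up in memory first:

```ruby
require 'csv'

# Hypothetical writer block as a DSL user might define it: it receives an enumerable.
writer_block = lambda do |data|
  CSV.generate do |csv|
    header_written = false
    data.each do |hash|
      unless header_written
        csv << hash.keys   # header row taken from the first record's keys
        header_written = true
      end
      csv << hash.values
    end
  end
end

# Hypothetical lazy data set: nothing is computed until the writer iterates it.
records = (1..Float::INFINITY).lazy.map { |i| { "id" => i, "square" => i * i } }.take(3)

output = writer_block.call(records)
puts output
```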
Trying to open the CSV file every time you want to write a record seems overly complex and likely to cause bad performance (unless writing is intermittent). It will also overwrite the CSV file each time unless you change the file mode from wb to ab.
I think something simple like:
```ruby
csv = CSV.open('data.csv', 'wb')
csv << headers
writer do |hash|
  csv << hash.values
end
# call csv.close once all records have been written
```
would be easier to understand.
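A small sketch of the open-once approach, assuming the writer can simply be a closure over a handle opened before any records arrive. `write_record` and the sample hashes are hypothetical, and the file lives in a temporary directory so the sketch runs anywhere:

```ruby
require 'csv'
require 'tmpdir'

content = Dir.mktmpdir do |dir|
  path = File.join(dir, "data.csv")
  csv = CSV.open(path, "wb")   # opened once; "wb" truncates only here
  begin
    csv << ["name", "price"]
    # Hypothetical writer: each call appends one row to the already-open file.
    write_record = ->(hash) { csv << hash.values }
    write_record.call({ "name" => "a", "price" => 1 })
    write_record.call({ "name" => "b", "price" => 2 })
  ensure
    csv.close   # flush and release the file even if a write raises
  end
  File.read(path)
end

puts content
```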
I have a Ruby script that goes through a CSV, determines some information, and then puts out a resulting CSV file. In Python, I'm able to open both my source file and my results file with DictReader and DictWriter respectively and write rows as dictionaries, where keys are the file header values. It doesn't appear that there is a manageable way to do this in Ruby, but I'm hoping somebody can point me to a better solution than storing all of my result hashes in an array and writing them after the fact.
The standard library "CSV" gives rows hash-like behavior when headers are enabled.
```ruby
require 'csv'

CSV.open("file.csv", "wb") do |csv_out|
  CSV.foreach("test.csv", headers: true).with_index do |row, i|
    row["header2"].upcase!              # hash-like behaviour
    row["new_header"] = 12              # add a new column
    csv_out << row.headers if i.zero?   # write the header line once
    csv_out << row
  end
end
```
(test.csv has the headers header1 and header2, followed by some random comma-separated lines.)
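For the writing side, something close to Python's `DictWriter` is possible as well: when `headers:` is given as an Array, `CSV#<<` also accepts a Hash and writes its values in header order, and `write_headers: true` emits the header line first. A minimal sketch with made-up rows:

```ruby
require 'csv'

headers = ["name", "price"]

output = CSV.generate(headers: headers, write_headers: true) do |csv|
  csv << { "price" => 3, "name" => "widget" }   # keys may be in any order
  csv << { "name" => "gadget" }                 # a missing key becomes an empty field
end

# On the reading side, CSV::Row#to_h gives a DictReader-style Hash per row.
first = CSV.parse(output, headers: true).first.to_h

puts output
p first
```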
I'm still fairly new to coding and I'm trying to learn about manipulating CSV files.
The code below opens a specified CSV file, goes to each url in the CSV file in column B (header = url), and finds the price on the webpage.
Example data from CSV file:
I'm having trouble writing that price to the adjacent column C (header = price) in the same CSV.
```ruby
require 'nokogiri'
require 'open-uri'
require 'csv'

contents = CSV.open "mp_lookup.csv", headers: true, header_converters: :symbol
contents.each do |row|
  row_url = row[:url]
  goto_url = Nokogiri::HTML(URI.open(row_url))
  new_price = goto_url.css('meta[itemprop="price"]')[0]['content']
  # In this section, I'm looking to write the value of new_price to the 3rd column in the same CSV file
end
```
In the past, I've been able to use:
```ruby
in_file = open("mp_lookup.csv", 'w')
```
But this doesn't seem to work in this situation.
Any help is appreciated!
The simple answer is that you can refer to the :price column in the CSV file, just like you refer to the :url column. Try this code to set the price in the CSV object in memory:
```ruby
row[:price] = new_price
```
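After you've read through all of the records, you'll want to save the CSV file again. For the updated rows to still be available at that point, load the file with `CSV.read("mp_lookup.csv", headers: true, header_converters: :symbol)` instead of `CSV.open`: a `CSV` object opened on a file can only be iterated once and doesn't keep the rows it has already yielded. You can save it to any filename, but we'll simply overwrite the previous file in this example: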
```ruby
CSV.open("mp_lookup.csv", "wb") do |csv|
  csv << contents.headers   # write the header row first
  contents.each do |row|
    csv << row
  end
end
```
In a real production environment, you'd want to be more fault tolerant than this, and preserve the original file until the end of the process. However, this shows how to update the values in the price column for each row, and then save the changes to a file.
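A self-contained sketch of the whole read, update, and write cycle, including keeping the original file until the end. The price lookup is a hypothetical stub replacing the Nokogiri scrape so it runs offline, and the sample rows (including the `name` first column) are made up:

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical stand-in for the Nokogiri price scrape, so the sketch runs offline.
lookup_price = ->(url) { url.end_with?("/a") ? "9.99" : "19.99" }

result = Dir.mktmpdir do |dir|
  path = File.join(dir, "mp_lookup.csv")
  # Hypothetical sample input; the real file's first column is unknown here.
  File.write(path, "name,url,price\nwidget,http://example.com/a,\ngadget,http://example.com/b,\n")

  # CSV.read returns an in-memory CSV::Table, so the edits to each row are kept.
  table = CSV.read(path, headers: true)
  table.each { |row| row["price"] = lookup_price.call(row["url"]) }

  # Write to a temporary file and replace the original only once every row is written,
  # so a failure midway leaves the input untouched.
  tmp_path = "#{path}.tmp"
  File.write(tmp_path, table.to_csv)
  File.rename(tmp_path, path)
  File.read(path)
end

puts result
```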
I have the following Ruby code:
```ruby
require 'octokit.rb'
require 'csv.rb'

CSV.foreach("actors.csv") do |row|
  CSV.open("node_attributes.csv", "wb") do |csv|
    csv << [Octokit.user "userid"]
  end
end
```
I have a csv called actors.csv where every row has one entry - a string with a userid.
I want to go through all the rows, and for each row do Octokit.user "userid", and then store the output from each query on a separate row in a CSV - node_attributes.csv.
My code doesn't seem to do this. How can I modify it to make it work?
```ruby
require 'csv'

DOC = 'actors.csv'
DOD = 'new_output.csv'

holder = CSV.read(DOC)
```
You can navigate it by calling, for example:

```ruby
holder[0][0] # => data in the array
holder[1][0] # => moar data in array
```

Make sense?
```ruby
# loop over every row read from actors.csv
CSV.open(DOD, "a") do |data|
  holder.each do |row|
    profile = []
    profile[0] = row[0]   # the userid from this row
    profile[1] = 'whatever it is you want to store in the new cell'
    data << profile
  end
end
```
That last bit of code will write whatever you want into a new CSV file.
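A minimal sketch of the loop the question asks for, with `Octokit.user` replaced by a hypothetical offline stub (`fetch_user`) so it can run without network access or the gem; the sample userids are made up. The two fixes are to open the output file once, outside the loop (reopening it with `"wb"` for every row truncates it each time), and to pass `row[0]` instead of the literal string `"userid"`:

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical stub standing in for Octokit.user(login); in the real script you would
# call Octokit.user(login) and pick whichever fields you need from the result.
fetch_user = ->(login) { { "login" => login, "name" => login.capitalize } }

output = Dir.mktmpdir do |dir|
  actors = File.join(dir, "actors.csv")
  attributes = File.join(dir, "node_attributes.csv")
  File.write(actors, "alice\nbob\n")

  # Open the output once, then write one row per userid.
  CSV.open(attributes, "wb") do |csv|
    CSV.foreach(actors) do |row|
      user = fetch_user.call(row[0])   # row[0] is the userid on this line
      csv << [user["login"], user["name"]]
    end
  end
  File.read(attributes)
end

puts output
```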
I'm sure this is a completely ignorant question, but here goes. The following code's objective is to read a list of IDs from a standard CSV file, append each value to a URL, call the URL, and extract a specific attribute via XPath. The problem I'm having is that the loop seems to be skipping some lines.
For example, here is a sample of 10 values:
The code is only reading every other line. The actual file has around 6000 lines, not huge but I'm only getting about 2500 values returned in the second file.
```ruby
require 'httpclient'
require 'nokogiri'

f = File.open('test.csv', 'r+')
url_f = File.open("url.csv", "w")
for line in f
  f.each_line do |item|
    item = f.gets
    url = "http://test.com/testid=" + item
    client = HTTPClient.new
    resp = client.get_content(url)
    doc = Nokogiri::HTML(resp)
    doc.xpath("//link[@rel='canonical']/@href").each do |attr|
      url_f.puts attr.value
      puts attr.value
      puts item
    end
  end
end
```
Never mind, I figured it out.
I had the line `item = f.gets`, which read the next line every time the loop ran, thus skipping every other line. I knew it was a noob question. :P
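A minimal sketch of the corrected loop: `each_line` already yields every line, so no `f.gets` is needed inside the block. The sample IDs are made up, and the HTTP request and XPath lookup are left out (marked by a comment) so the sketch runs offline; it only collects the URLs that would be requested:

```ruby
require 'tmpdir'

seen = Dir.mktmpdir do |dir|
  path = File.join(dir, "test.csv")
  # Hypothetical sample input: ten IDs, one per line.
  File.write(path, (1..10).map { |i| "id#{i}" }.join("\n") + "\n")

  File.open(path, "r") do |f|
    f.each_line.map do |item|
      item = item.chomp   # drop the trailing newline before building the URL
      # The HTTPClient request and Nokogiri XPath lookup would go here.
      "http://test.com/testid=" + item
    end
  end
end

puts seen.length
```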