How to save the result of Nokogiri - ruby

This is part of a ruby script. I want to save the results to a text file. I only want the results specified in these two DIVS.
url = browser.html
doc = Nokogiri::HTML(open(url))
price = doc.css("#sectionPrice").text
ship = doc.css("#shippingCharges td").text
How do I save the scraped results? Mind you, the script that loads the page is working correctly. In the shell I can see the values of my scrape using XPath as follows.
page_html = Nokogiri::HTML.parse(browser.html)
shipping = puts page_html.xpath(".//*[@id='shippingCharges']").inner_text
price = puts page_html.xpath(".//*[@id='sectionPrice']").inner_text
How do I save this data to a CSV or XML?
Side question: is the data returned in the shell saved anywhere? How do I access it outside of the shell?
url = browser.html
doc = Nokogiri::HTML(open(url))
price = doc.css("#sectionPrice").text
ship = doc.css("#shippingCharges td").text
CSV.open("/users/fabio/desktop/ruby/gp.csv", "wb") do |csv|
csv << [price, ship]
end
This is not creating the CSV file. Nothing appears in the directory. What gives?

It is pretty simple to write this to a csv file.
Just add the following in:
require 'csv'
CSV.open("file.csv", "wb") do |csv|
csv << [price, ship]
end
If shipping and price are arrays then you will want to iterate through them but this is how you create a csv.
Hope this gets you on your way.
Cheers!
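If the file does not show up where you expect, keep in mind that a relative name such as "file.csv" is resolved against the directory the script is run from. Here is a minimal sketch, assuming price and ship each hold several values. The sample values and the temporary output directory are made up for illustration and are not from the original script:

```ruby
require 'csv'
require 'tmpdir'

# Stand-in values; in the real script these would come from Nokogiri.
prices = ["$19.99", "$24.50"]
ships  = ["$4.99", "Free"]

# Build an absolute path so there is no doubt where the file lands.
# Dir.mktmpdir is only used to keep this sketch self-contained.
out_path = File.join(Dir.mktmpdir, "gp.csv")

CSV.open(out_path, "wb") do |csv|
  csv << ["price", "ship"]                       # header row
  prices.zip(ships).each { |pair| csv << pair }  # one row per price/ship pair
end

# Read the file back to confirm what was written.
rows = CSV.read(out_path)
puts "wrote #{rows.size - 1} data rows to #{out_path}"
```

Printing the absolute path is a cheap way to find out where the file actually went.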

Related

Ruby: Write a value to a specific location in CSV file

I'm still fairly new to coding and I'm trying to learn about manipulating CSV files.
The code below opens a specified CSV file, goes to each url in the CSV file in column B (header = url), and finds the price on the webpage.
Example data from CSV file:
Store,URL,Price
Walmart,http://www.walmart.com/ip/HP-11.6-Stream-Laptop-PC-with-Intel-Celeron-Processor-2GB-Memory-32GB-Hard-Drive-Windows-8.1-and-Microsoft-Office-365-Personal-1-yr-subscription/39073484
Walmart,http://www.walmart.com/ip/Nextbook-10.1-Intel-Quad-Core-2-In-1-Detachable-Windows-8.1-Tablet/39092206
Walmart,http://www.walmart.com/ip/Nextbook-10.1-Intel-Quad-Core-2-In-1-Detachable-Windows-8.1-Tablet/39092206
I'm having trouble writing that price to the adjacent column C (header = price) in the same CSV.
require 'nokogiri'
require 'open-uri'
require 'csv'
contents = CSV.open "mp_lookup.csv", headers: true, header_converters: :symbol
contents.each do |row|
row_url = row[:url]
goto_url = Nokogiri::HTML(open(row_url))
new_price = goto_url.css('meta[itemprop="price"]')[0]['content']
#----
#In this section, I'm looking to write the value of new_price to the 3rd column in the same CSV file
#----
end
In the past, I've been able to use:
in_file = open("mp_lookup.csv", 'w')
in_file.write(new_price)
But this doesn't seem to work in this situation.
Any help is appreciated!
The simple answer is that you can refer to the :price column in the CSV file, just like you refer to the :url column. Try this code to set the price in the CSV object in memory:
row[:price] = new_price
After you've read through all of the records, you'll want to save the CSV file again. You can save it to any filename, but we'll simply overwrite the previous file in this example:
CSV.open("mp_lookup.csv", "wb") do |csv|
contents.each do |row|
csv << row
end
end
In a real production environment, you'd want to be more fault tolerant than this, and preserve the original file until the end of the process. However, this shows how to update the values in the price column for each row, and then save the changes to a file.
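One thing to watch for: a reader returned by CSV.open is consumed after one pass, so a second each over it yields nothing. Here is a rough sketch of the same idea that loads the rows with CSV.read and keeps the original file until the new one is fully written. The lookup_price lambda and the example.com URLs are hypothetical stand-ins for the Nokogiri lookup so that the sketch runs offline:

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical stand-in for the page lookup; not part of the original code.
lookup_price = ->(url) { url.end_with?("/1") ? "199.00" : "179.00" }

# Made-up sample file with an empty Price column.
path = File.join(Dir.mktmpdir, "mp_lookup.csv")
File.write(path, <<~DATA)
  Store,URL,Price
  Walmart,http://example.com/ip/1
  Walmart,http://example.com/ip/2
DATA

# CSV.read loads every row into a CSV::Table that can be iterated repeatedly.
table = CSV.read(path, headers: true, header_converters: :symbol)
table.each { |row| row[:price] = lookup_price.call(row[:url]) }

# Write to a temporary file first, then replace the original in one step.
tmp_path = "#{path}.tmp"
File.write(tmp_path, table.to_csv)
File.rename(tmp_path, path)

updated = CSV.read(path, headers: true, header_converters: :symbol)
puts updated.to_csv
```

Because of the header converter, the headers are written back in their symbolized, lowercase form.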

Reading every line in a CSV and using it to query an API

I have the following Ruby code:
require 'octokit.rb'
require 'csv.rb'
CSV.foreach("actors.csv") do |row|
CSV.open("node_attributes.csv", "wb") do |csv|
csv << [Octokit.user "userid"]
end
end
I have a csv called actors.csv where every row has one entry - a string with a userid.
I want to go through all the rows, and for each row do Octokit.user "userid", and then store the output from each query on a separate row in a CSV - node_attributes.csv.
My code does not seem to do this? How can I modify it to make this work?
require 'csv'
DOC = 'actors.csv'
DOD = 'new_output.csv'
holder = CSV.read(DOC)
You can navigate it by calling
holder[0][0]
=> data in the array
holder[1][0]
=> moar data in array
make sense?
#make this a loop
profile = []
profile[0] = holder[0][0]
profile[1] = holder[1][0]
profile[2] = 'whatever it is you want to store in the new cell'
CSV.open(DOD, "a") do |data|
data << profile
end
#end the loop here
That last bit of code will print whatever you want into a new csv file
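Put together as an actual loop, it could look roughly like this. This is a sketch under assumptions: fetch_user is a hypothetical stand-in for Octokit.user (so it runs without network access), and the two sample user ids are made up:

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical stand-in for Octokit.user; it returns a hash of attributes.
fetch_user = ->(userid) { { login: userid, name: userid.capitalize } }

dir = Dir.mktmpdir
actors_path = File.join(dir, "actors.csv")
output_path = File.join(dir, "node_attributes.csv")
File.write(actors_path, "alice\nbob\n") # one user id per row

# Open the output once, outside the loop, so each result is appended
# instead of the file being truncated on every iteration.
CSV.open(output_path, "wb") do |out|
  CSV.foreach(actors_path) do |row|
    user = fetch_user.call(row[0]) # row[0] is the user id in that row
    out << [user[:login], user[:name]]
  end
end

result = CSV.read(output_path)
puts result.inspect
```

Opening the output in "wb" mode inside the loop, as in the question, would overwrite the file for every input row and leave only the last one.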

Need help exporting results parsed via Nokogiri to CSV. Only the last parsed result is shown, why?

This is killing me and searching here and the big G is confusing me even more.
I followed the tutorial at Railscasts #190 on Nokogiri and was able to write myself a nice little parser:
require 'rubygems'
require 'nokogiri'
require 'open-uri'
url = "http://www.target.com/c/movies-entertainment/-/N-5xsx0/Ntk-All/Ntt-wwe/Ntx-matchallpartial+rel+E#navigation=true&facetedValue=/-/N-5xsx0&viewType=medium&sortBy=PriceLow&minPrice=0&maxPrice=10&isleaf=false&navigationPath=5xsx0&parentCategoryId=9975218&RatingFacet=0&customPrice=true"
doc = Nokogiri::HTML(open(url))
puts doc.at_css("title").text
doc.css(".standard").each do |item|
title = item.at_css("span.productTitle a")[:title]
format = item.at_css("span.description").text
price = item.at_css(".price-label").text[/\$[0-9\.]+/]
link = item.at_css("span.productTitle a")[:href]
puts "#{title}, #{format}, #{price}, #{link}"
end
I'm happy with the results and able to see it in the Windows console. However, I want to export the results to a CSV file and have tried numerous ways (with no luck) and I know I'm missing something. My latest updated code (after downloading the html files) is below:
require 'rubygems'
require 'nokogiri'
require 'csv'
@title = Array.new
@format = Array.new
@price = Array.new
@link = Array.new
doc = Nokogiri::HTML(open("index1.html"))
doc.css(".standard").each do |item|
@title << item.at_css("span.productTitle a")[:title]
@format << item.at_css("span.description").text
@price << item.at_css(".price-label").text[/\$[0-9\.]+/]
@link << item.at_css("span.productTitle a")[:href]
end
CSV.open("file.csv", "wb") do |csv|
csv << ["title", "format", "price", "link"]
csv << [@title, @format, @price, @link]
end
It works and spits a file out for me, but with just the last result. I followed the tutorial at Andrew!: Web Scraping... and trying to mix what I'm trying to achieve with someone else's process is confusing.
I assume it's looping through all of the results and only printing the last. Can someone give me pointers on how I should loop this (if that's the problem) so that all the results are in their respective columns?
Thanks in advance.
You're storing values in four arrays, but you're not enumerating the arrays when you generate your output.
Here is a possible fix:
CSV.open("file.csv", "wb") do |csv|
csv << ["title", "format", "price", "link"]
until @title.empty?
csv << [@title.shift, @format.shift, @price.shift, @link.shift]
end
end
Note that this is a destructive operation that shifts the values off of the arrays one at a time, so in the end they will all be empty.
There are more efficient ways to read and convert the data, but this will hopefully do what you want for now.
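As one non-destructive alternative, the four arrays can be combined row by row with Array#zip. A minimal sketch, with made-up sample values in place of the scraped data and CSV.generate used so the result can be inspected as a string:

```ruby
require 'csv'

# Stand-in data; in the real script these arrays are filled by the scraper.
titles  = ["Movie A", "Movie B"]
formats = ["DVD", "Blu-ray"]
prices  = ["$5.00", "$9.99"]
links   = ["/p/a", "/p/b"]

output = CSV.generate do |csv|
  csv << %w[title format price link]
  # zip pairs up the nth element of each array into one row.
  titles.zip(formats, prices, links).each { |row| csv << row }
end

puts output
```

Unlike shift, zip leaves the original arrays untouched.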
There are several things you could do to write this more in the "Ruby way":
require 'rubygems'
require 'nokogiri'
require 'csv'
doc = Nokogiri::HTML(open("index1.html"))
CSV.open('file.csv', 'wb') do |csv|
csv << %w[title format price link]
doc.css('.standard').each do |item|
csv << [
item.at_css('span.productTitle a')[:title],
item.at_css('span.description').text,
item.at_css('.price-label').text[/\$[0-9\.]+/],
item.at_css('span.productTitle a')[:href]
]
end
end
Without sample HTML it's not possible to test this, but, based on your code, it looks like it'd work.
Notice that in your code you're using instance variables. They're not necessary because you aren't defining a class to have an instance of. You can use local variables instead.

Xpath content not saved

It might just be an idiotic bug in the code that I haven't yet discovered, but it's been taking me quite some time: When parsing websites using nokogiri and xpath, and trying to save the content of the xpaths to a .csv file, the csv file has empty cells.
Basically, the content of the xpath returns empty OR my code doesn't properly read the websites.
This is what I'm doing:
require 'open-uri'
require 'nokogiri'
require 'csv'
CSV.open("neverend.csv", "w") do |csv|
csv << ["kuk","date","name"]
#first, open the urls from a document. The urls are correct.
File.foreach("neverendurls.txt") do |line|
#second, the loop for each url
searchablefile = Nokogiri::HTML(open(line))
#third, the xpaths. These work when I try them on the website.
kuk = searchablefile.at_xpath("(//tbody/tr/td[contains(@style,'60px')])[1]")
date = searchablefile.at_xpath("(//tbody/tr/td[contains(@style,'60px')])[1]/following-sibling::*[1]")
name = searchablefile.at_xpath("(//tbody/tr/td[contains(@style, '60px')])[1]/following-sibling::*[2]")
#fourth, saving the xpaths
csv << [kuk,date,name]
end
end
what am I missing here?
It's impossible to tell from what you posted, but let's clean that hot mess up with css:
kuk = searchablefile.at 'td[style*="60px"]'
date = searchablefile.at 'td[style*="60px"] + *'
name = searchablefile.at 'td[style*="60px"] + * + *'

How not to save to csv when array is empty

I'm parsing a website and expect potentially many millions of rows of content. However, CSV/Excel/ODS doesn't allow for more than a million rows.
That is why I'm trying to use a conditional to skip saving empty content. However, it's not working: my code keeps creating empty rows in the CSV.
This is the code I have:
# create csv
CSV.open("neverending.csv", "w") do |csv|
csv << ["kuk","date","name"]
# loop through all urls
File.foreach("neverendingurls.txt") do |line|
begin
doorzoekbarefile = Nokogiri::HTML(open(line))
for k in 1..999 do
# PROVISIONARY / CONDITIONAL
unless doorzoekbarefile.at_xpath("//td[contains(style, '60px')])[#{k}]").nil?
# xpaths
kuk = doorzoekbarefile.at_xpath("(//td[contains(@style,'60px')])[#{k}]")
date = doorzoekbarefile.at_xpath("(//td[contains(@style, '60px')])[#{k}]/following-sibling::*[1]")
name = doorzoekbarefile.at_xpath("(//td[contains(@style, '60px')])[#{k}]/following-sibling::*[2]")
# save to csv
csv << [kuk,date,name]
end
end
end
rescue
puts "error bij url #{line}"
end
end
end
Anybody have a clue what's going wrong or how to solve the problem? Basically I simply need to change the code so that it doesn't create a new row of csv data when the xpaths are empty.
This really doesn't have to do with xpath. It's simple Array#empty?
row = [kuk,date,name]
csv << row unless row.compact.empty?
BTW, your code is a mess. Learn how to indent at least before posting again.
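A small sketch of that check in isolation, assuming nil stands for an XPath that matched nothing. The sample rows are made up, and CSV.generate is used so the result can be inspected as a string:

```ruby
require 'csv'

# Stand-in for scraped values; nil marks a lookup that found nothing.
scraped = [
  ["k1", "2015-01-01", "Ann"],
  [nil, nil, nil],   # nothing matched at all: should be skipped
  ["k2", nil, "Bob"] # partially filled: should be kept
]

output = CSV.generate do |csv|
  csv << %w[kuk date name]
  scraped.each do |row|
    # compact drops the nils, so an all-nil row becomes an empty array.
    next if row.compact.empty?
    csv << row
  end
end

puts output
```

Only rows in which every field is nil are dropped. A row with at least one value is still written, with empty cells for the missing fields.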
