I'm learning Ruby and I need help with CSV files.
I can't find the correct way to print output to a CSV file; instead I get the same information repeated over and over, and I don't want that.
This is the code:
require "httparty"
require "csv"
class ConnexioEPPO
include HTTParty
base_uri 'https://data.eppo.int/api/rest/1.0'
##authtoken = "a9505d2ab257987580641d1a56de1f6c"
def pests(eppocode)
request = self.class.get("/taxon/#{eppocode}/pests?authtoken=#{##authtoken}")
result = request.parsed_response
CSV.open("data.csv", "w", headers: result["Host"].first.keys) do |csv|
result["Host"].each do |h|
requestTax = self.class.get("/taxon/#{h["eppocode"]}/taxonomy?authtoken=#{##authtoken}")
resultTax = requestTax.parsed_response
puts h["eppocode"]
resultTax.each do |tax|
puts "#{tax["eppocode"]} #{tax["prefname"]}"
planta = h.values << tax["eppocode"] +", "+ tax["prefname"]
csv << planta
end
end
end
end
end
connexioEPPO = ConnexioEPPO.new
puts connexioEPPO.pests('1ULMG')
As you can see, in the first part of the code I'm requesting (pests) information as a Hash that contains (these are just a few lines):
{"eppocode"=>"ANIDMA", "idclass"=>9, "labelclass"=>"Host", "fullname"=>"Anisandrus maiche"}
{"eppocode"=>"ANOLGL", "idclass"=>9, "labelclass"=>"Host", "fullname"=>"Anoplophora glabripennis"}
{"eppocode"=>"APRIGE", "idclass"=>9, "labelclass"=>"Host", "fullname"=>"Apriona germari"}
{"eppocode"=>"PHYPUL", "idclass"=>9, "labelclass"=>"Host", "fullname"=>"'Candidatus Phytoplasma ulmi'"}
Then I'm requesting (taxonomy) information again, keeping the records whose eppocode field matches the eppocode from the first request (the idea is just to get the taxonomy of each previously requested record by its eppocode).
The idea is to print each record from the first request to the CSV, for example
{"eppocode"=>"ANIDMA", "idclass"=>9, "labelclass"=>"Host", "fullname"=>"Anisandrus maiche"}
with the taxonomy of the corresponding record appended.
Desired output (example with one eppocode):
ANIDMA,9,Host,Anisandrus maiche,"1ANIMK, Animalia","1ARTHP, Arthropoda","1INSEC, Insecta","1COLEO, Coleoptera","1CURCF, Curculionidae","1SCOLS, Scolytinae","1ANIDG, Anisandrus","ANIDMA, Anisandrus maiche"
Actual output (example with one eppocode):
ANIDMA,9,Host,Anisandrus maiche,"1ANIMK, Animalia"
ANIDMA,9,Host,Anisandrus maiche,"1ARTHP, Arthropoda"
ANIDMA,9,Host,Anisandrus maiche,"1HEXAQ, Hexapoda"
ANIDMA,9,Host,Anisandrus maiche,"1INSEC, Insecta"
ANIDMA,9,Host,Anisandrus maiche,"1COLEO, Coleoptera"
ANIDMA,9,Host,Anisandrus maiche,"1CURCF, Curculionidae"
ANIDMA,9,Host,Anisandrus maiche,"1SCOLS, Scolytinae"
ANIDMA,9,Host,Anisandrus maiche,"1ANIDG, Anisandrus"
ANIDMA,9,Host,Anisandrus maiche,"ANIDMA, Anisandrus maiche"
What I want is to avoid that bunch of rows repeating the same information.
I hope you understand my question.
Thanks!
I believe that, to fix your code with minimal changes, you need to move the write to the CSV outside of your resultTax loop, so something like the following might work:
planta = h.values
resultTax.each do |tax|
  puts "#{tax["eppocode"]} #{tax["prefname"]}"
  planta << tax["eppocode"] + ", " + tax["prefname"]
end
csv << planta
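For context, the CSV block in the pests method would then look roughly like this (a sketch reusing the code from the question, including the @@authtoken class variable):
CSV.open("data.csv", "w", headers: result["Host"].first.keys) do |csv|
  result["Host"].each do |h|
    requestTax = self.class.get("/taxon/#{h["eppocode"]}/taxonomy?authtoken=#{@@authtoken}")
    resultTax = requestTax.parsed_response

    # start the row with the host fields, then append each taxonomy level
    planta = h.values
    resultTax.each do |tax|
      puts "#{tax["eppocode"]} #{tax["prefname"]}"
      planta << tax["eppocode"] + ", " + tax["prefname"]
    end

    # one row per host, written only after the full taxonomy has been appended
    csv << planta
  end
end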
OK, so I've built a DSL, and part of it requires the user of the DSL to define what I call a 'writer block':
writer do |data_block|
  CSV.open("data.csv", "wb") do |csv|
    headers_written = false
    data_block do |hash|
      (csv << headers_written && headers_written = true) unless headers_written
      csv << hash.values
    end
  end
end
The writer block gets called like this:
def pull_and_store
  raise "No writer detected" unless @writer

  @writer.call( -> (&block) {
    pull(pull_initial, &block)
  })
end
The problem is two fold, first, is this the best way to handle this kind of thing and second I'm getting a strange error:
undefined method `data_block' for Servo_City:Class (NoMethodError)
It's strange because I can see data_block right there, or at least it exists before the CSV block at any rate.
What I'm trying to create is a way for the user to write a wrapper block that both wraps around a block and yields a block to the block that is being wrapped, wow that's a mouthful.
Inner me does not want to write an answer before the question is clarified.
Other me wagers that code examples will help to clarify the problem.
I assume that the writer block has the task of persisting some data. Could you pass the data into the block in an enumerable form? That would allow the DSL user to write something like this:
writer do |data|
  CSV.open("data.csv", "wb") do |csv|
    csv << header_row
    data.each do |hash|
      data_row = hash.values
      csv << data_row
    end
  end
end
No block passing required.
Note that you can pass in a lazy collection if dealing with hugely huge data sets.
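For instance, the DSL side could build an Enumerator around the existing pull loop and hand its lazy view to the writer. A rough sketch, where pull, pull_initial and @writer come from the question and everything else is my assumption:
def pull_and_store
  raise "No writer detected" unless @writer

  # wrap the pull loop in an Enumerator so records are produced on demand
  records = Enumerator.new do |yielder|
    pull(pull_initial) { |hash| yielder << hash }
  end

  # the user's writer block receives a plain (lazy) enumerable, no block passing
  @writer.call(records.lazy)
end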
Does this solve your problem?
Trying to open the CSV file every time you want to write a record seems overly complex and likely to cause bad performance (unless writing is intermittent). It will also overwrite the CSV file each time unless you change the file mode from wb to ab.
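If you do keep reopening the file per record, append mode would look like this (a minimal sketch):
CSV.open("data.csv", "ab") do |csv|
  csv << hash.values # appends to the file instead of truncating it
end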
I think something simple like:
csv = CSV.open('data.csv', 'wb')
csv << headers

writer do |hash|
  csv << hash.values
end
would be something more understandable.
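One caveat with the block-less CSV.open above is that the file handle stays open, so you would want to close it when the run finishes, for example:
csv = CSV.open('data.csv', 'wb')
begin
  csv << headers
  writer do |hash|
    csv << hash.values
  end
ensure
  csv.close # flushes buffered output and releases the file handle
end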
I've been trying to use Ruby to create a CSV file from JSON data. I was able to create the file, but I need to add a few headers. I tried following suggestions and answers from similar questions posted here on Stack Overflow, but I keep getting errors. Can anyone give me some pointers?
Here's my code.
require 'csv'
require 'json'
CSV.open("your_csv.csv", "w") do |csv|
JSON.parse(File.open("tojson.txt").read).each do |hash|
csv << hash.values
#csv.each { |line| line['New_header'] = line[0].to_i + line[1].to_i }
end
end
And here is the error I'm getting:
Anyone have any suggestions?
This is not how you add headers to a CSV file. When you generate CSV content, a header row is just a regular row and should be generated as such. Example:
CSV.open("your_csv.csv", "w") do |csv|
csv << ['new_header', 'value1', 'value2'] # the headers
JSON.parse(File.open("tojson.txt").read).each do |hash|
row = [generate, values, for, headers, above]
csv << row
end
end
Also, you don't have an @csv variable. You have a csv one.
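As a side note, Ruby's CSV can also emit the header row for you via the headers: and write_headers: options; a small sketch:
require 'csv'
require 'json'

headers = ['new_header', 'value1', 'value2']

CSV.open("your_csv.csv", "w", write_headers: true, headers: headers) do |csv|
  JSON.parse(File.read("tojson.txt")).each do |hash|
    csv << hash.values
  end
end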
This is killing me, and searching here and on the big G is confusing me even more.
I followed the tutorial at Railscasts #190 on Nokogiri and was able to write myself a nice little parser:
require 'rubygems'
require 'nokogiri'
require 'open-uri'

url = "http://www.target.com/c/movies-entertainment/-/N-5xsx0/Ntk-All/Ntt-wwe/Ntx-matchallpartial+rel+E#navigation=true&facetedValue=/-/N-5xsx0&viewType=medium&sortBy=PriceLow&minPrice=0&maxPrice=10&isleaf=false&navigationPath=5xsx0&parentCategoryId=9975218&RatingFacet=0&customPrice=true"
doc = Nokogiri::HTML(open(url))

puts doc.at_css("title").text

doc.css(".standard").each do |item|
  title = item.at_css("span.productTitle a")[:title]
  format = item.at_css("span.description").text
  price = item.at_css(".price-label").text[/\$[0-9\.]+/]
  link = item.at_css("span.productTitle a")[:href]
  puts "#{title}, #{format}, #{price}, #{link}"
end
I'm happy with the results and able to see them in the Windows console. However, I want to export the results to a CSV file and have tried numerous ways with no luck, so I know I'm missing something. My latest updated code (after downloading the HTML files) is below:
require 'rubygems'
require 'nokogiri'
require 'csv'

@title = Array.new
@format = Array.new
@price = Array.new
@link = Array.new

doc = Nokogiri::HTML(open("index1.html"))

doc.css(".standard").each do |item|
  @title << item.at_css("span.productTitle a")[:title]
  @format << item.at_css("span.description").text
  @price << item.at_css(".price-label").text[/\$[0-9\.]+/]
  @link << item.at_css("span.productTitle a")[:href]
end

CSV.open("file.csv", "wb") do |csv|
  csv << ["title", "format", "price", "link"]
  csv << [@title, @format, @price, @link]
end
It works and spits a file out for me, but only with the last result. I followed the tutorial at Andrew!: Web Scraping..., and trying to mix what I'm trying to achieve with someone else's process is confusing.
I assume it's looping through all of the results and only printing the last. Can someone give me pointers on how I should loop this (if that's the problem) so that all the results are in their respective columns?
Thanks in advance.
You're storing values in four arrays, but you're not enumerating the arrays when you generate your output.
Here is a possible fix:
CSV.open("file.csv", "wb") do |csv|
csv << ["title", "format", "price", "link"]
until #title.empty?
csv << [#title.shift, #format.shift, #price.shift, #link.shift]
end
end
Note that this is a destructive operation that shifts the values off of the arrays one at a time, so in the end they will all be empty.
There are more efficient ways to read and convert the data, but this will hopefully do what you want for now.
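If you would rather keep the arrays intact, a non-destructive alternative (my sketch, using Array#zip) is:
CSV.open("file.csv", "wb") do |csv|
  csv << ["title", "format", "price", "link"]
  # zip lines up the n-th title, format, price and link into a single row
  @title.zip(@format, @price, @link).each do |row|
    csv << row
  end
end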
There are several things you could do to write this more in the "Ruby way":
require 'rubygems'
require 'nokogiri'
require 'csv'

doc = Nokogiri::HTML(open("index1.html"))

CSV.open('file.csv', 'wb') do |csv|
  csv << %w[title format price link]
  doc.css('.standard').each do |item|
    csv << [
      item.at_css('span.productTitle a')[:title],
      item.at_css('span.description').text,
      item.at_css('.price-label').text[/\$[0-9\.]+/],
      item.at_css('span.productTitle a')[:href]
    ]
  end
end
Without sample HTML it's not possible to test this, but, based on your code, it looks like it'd work.
Notice that in your code you're using instance variables. They're not necessary because you aren't defining a class to have an instance of. You can use local variables instead.
I'm parsing through a website and I'm looking at potentially many millions of rows of content. However, CSV/Excel/ODS doesn't allow more than about a million rows.
That is why I'm trying to use a conditional to avoid saving empty content. However, it's not working: my code keeps creating empty rows in the CSV.
This is the code I have:
# create csv
CSV.open("neverending.csv", "w") do |csv|
  csv << ["kuk", "date", "name"]
  # loop through all urls
  File.foreach("neverendingurls.txt") do |line|
    begin
      doorzoekbarefile = Nokogiri::HTML(open(line))
      for k in 1..999 do
        # PROVISIONARY / CONDITIONAL
        unless doorzoekbarefile.at_xpath("(//td[contains(@style, '60px')])[#{k}]").nil?
          # xpaths
          kuk = doorzoekbarefile.at_xpath("(//td[contains(@style,'60px')])[#{k}]")
          date = doorzoekbarefile.at_xpath("(//td[contains(@style, '60px')])[#{k}]/following-sibling::*[1]")
          name = doorzoekbarefile.at_xpath("(//td[contains(@style, '60px')])[#{k}]/following-sibling::*[2]")
          # save to csv
          csv << [kuk, date, name]
        end
      end
    rescue
      puts "error bij url #{line}"
    end
  end
end
Anybody have a clue what's going wrong or how to solve the problem? Basically I simply need to change the code so that it doesn't create a new row of csv data when the xpaths are empty.
This really doesn't have anything to do with XPath. It's simply Array#compact and Array#empty?:
row = [kuk, date, name]
csv << row unless row.compact.empty?
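In the context of the question's loop, that guard might look roughly like this (a sketch; the XPath expressions are taken from the question):
for k in 1..999 do
  base = "(//td[contains(@style, '60px')])[#{k}]"
  kuk  = doorzoekbarefile.at_xpath(base)
  date = doorzoekbarefile.at_xpath("#{base}/following-sibling::*[1]")
  name = doorzoekbarefile.at_xpath("#{base}/following-sibling::*[2]")

  row = [kuk, date, name]
  # skip the row entirely when none of the three cells was found
  csv << row unless row.compact.empty?
end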
BTW, your code is a mess. Learn how to indent at least before posting again.
This is part of a Ruby script. I want to save the results to a text file. I only want the results specified in these two divs.
url = browser.html
doc = Nokogiri::HTML(open(url))
price = doc.css("#sectionPrice").text
ship = doc.css("#shippingCharges td").text
How do I save the scraped results? Mind you, the script loading the page is working correctly. In the shell I can see the values of my scrape using XPath as follows.
page_html = Nokogiri::HTML.parse(browser.html)
shipping = puts page_html.xpath(".//*[@id='shippingCharges']").inner_text
price = puts page_html.xpath(".//*[@id='sectionPrice']").inner_text
How do I save this data to a CSV or XML file?
Side question: is the data returned in the shell saved anywhere? How do I access it outside of the shell?
url = browser.html
doc = Nokogiri::HTML(open(url))

price = doc.css("#sectionPrice").text
ship = doc.css("#shippingCharges td").text

CSV.open("/users/fabio/desktop/ruby/gp.csv", "wb") do |csv|
  csv << [price, ship]
end
It's not creating the CSV file; nothing appears in the directory. What gives?
It is pretty simple to write this to a CSV file.
Just add the following in:
require 'csv'

CSV.open("file.csv", "wb") do |csv|
  csv << [price, ship]
end
If shipping and price are arrays, then you will want to iterate through them, but this is how you create a CSV; see the sketch below.
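Something like this, assuming price and ship are parallel arrays of the same length (the header names are just placeholders):
require 'csv'

CSV.open("file.csv", "wb") do |csv|
  csv << ["price", "shipping"]
  # write one row per price/shipping pair
  price.zip(ship).each do |price_value, ship_value|
    csv << [price_value, ship_value]
  end
end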
Hope this gets you on your way.
Cheers!