String cant write to CSV file Ruby - ruby

I don't necessarily understand why this bit of code is incorrect. I understand the error in that string class doesn't have a method map. But I'm having a hard time wrapping my head around this error.
The error
`<<': undefined method `map' for #<String:0x000001020b8940> (NoMethodError)
The but of code
require 'nokogiri'
require 'open-uri'
require 'csv'
doc = Nokogiri::HTML(open("dent-file.html"))
new_array = doc.search("p").map do |para|
para.text
end
CSV.open("dent.csv", "w") do |csv|
new_array.each do |string|
csv << string
end
end
I want to write each element of the newdoc array to each line of the csv file dent.csv.

CSV#<< accepts an array or a CSV::Row. Convert the string to an array.
csv << [string]

Related

How do I get this Nokogiri output to write each object to a column in a csv?

I have this code here which outputs a CSV, but when I open the CSV file its just has a 0 in the first two columns.
require 'rubygems'
require 'nokogiri'
require 'open-uri'
require 'csv'
page = Nokogiri::HTML(open("https://www.drugs.com/pharmaceutical-
companies.html"))
puts page.class #=> Nokogiri::HTML::Document
pharma_links = page.css("div.col-list-az a")
link= pharma_links.each{|link| puts link['href'] }
company = pharma_links.each{|link| puts link.text}
CSV.open("/Users/file.csv", "wb") do |csv|
csv << [company, link]
end
The problem is that pharma_links.each{|link| ...} returns the ENTIRE enumerator, so if you do this once for company and once for link you now have two new arrays. You then have to re-map each company & link in a new array / hash (or by index if you are lazy AND you know for certain nothing went wrong in the either .each call)
To avoid this, simply construct the CSV while you are looping through the data. For each line of the CSV you expect one pharma_links 'line', so iterate through each at the same time:
require 'nokogiri'
require 'open-uri'
require 'csv'
page = Nokogiri::HTML(open("https://www.drugs.com/pharmaceutical-companies.html"))
# puts page.class #=> Nokogiri::HTML::Document
pharma_links = page.css("div.col-list-az a")
# Create the CSV and iterate through the links while creating it
# You can also add headers to the CSV on instantiation
CSV.open("file.csv", "wb", write_headers: true, headers: ['url','description']) do |csv|
pharma_links.each do |link|
puts "Adding #{link.text}" # prove that it works :)
csv << [link['href'], link.text]
end
end

JSON to CSV File Ruby

I am trying to convert the following JSON to CSV via Ruby, but am having trouble with my code. I am learning as I go, so any help is appreciated.
require 'json'
require 'net/http'
require 'uri'
require 'csv'
uri = 'https://www.mapquestapi.com/search/v2/radius?key=Imjtd%7Clu6t200zn0,bw=o5-layg1&radius=3000&callback=processPOIs&maxMatches=4000&origin=40.7686973%2C-73.9918181&hostedData=mqap.33882_stores_prod%7Copen_status%20=%20?%20OR%20open_status%20=%20?%20OR%20open_status%20=%20?%7CExisting,Coming%20Soon,New%7C'
response = Net::HTTP.get_response(URI.parse(uri))
struct = JSON.parse(response.body.scan(/processPOIs\((.*)\);/).first.first)
CSV.open("output.csv", "w") do |csv|
JSON.parse(struct).read.each do |hash|
csv << hash.values
end
end
The error I receive is:
from c:/RailsInstaller/Ruby2.2.0/lib/ruby/gems/2.2.0/gems/json-1.8.3/lib/json/common.rb:155:in `new'
from c:/RailsInstaller/Ruby2.2.0/lib/ruby/gems/2.2.0/gems/json-1.8.3/lib/json/common.rb:155:in `parse'
from test.rb:14:in `block in <main>'
from c:/RailsInstaller/Ruby2.2.0/lib/ruby/2.2.0/csv.rb:1273:in `open'
from test.rb:13:in `<main>'
I am trying to get all the data off of the following link and put it into a CSV file that I can analyse later. https://www.mapquestapi.com/search/v2/radius?key=Imjtd%7Clu6t200zn0,bw=o5-layg1&radius=3000&callback=processPOIs&maxMatches=4000&origin=40.7686973%2C-73.9918181&hostedData=mqap.33882_stores_prod%7Copen_status%20=%20?%20OR%20open_status%20=%20?%20OR%20open_status%20=%20?%7CExisting,Coming%20Soon,New%7C
You have several problems here, the most significant of which is that you're calling JSON.parse twice. The second time you call it on struct, which was the result of calling JSON.parse the first time. You're basically doing JSON.parse(JSON.parse(string)). Oops.
There's another problem on the line where you call JSON.parse a second time: You call read on the value it returns. As far as I know JSON.parse does not ordinarily return anything that responds to read.
Fixing those two errors, your code looks something like this:
struct = JSON.parse(response.body.scan(/processPOIs\((.*)\);/).first.first)
CSV.open("output.csv", "w") do |csv|
struct.each do |hash|
csv << hash.values
end
end
This ought to work iif struct is an object that responds to each (like an array) and the values yielded by each all respond to values (like a hash). In other words, this code assumes that JSON.parse will return an array of hashes, or something similar. If it doesn't—well, that's beyond the scope of this question.
As an aside, this is not great:
response.body.scan(/processPOIs\((.*)\);/).first.first
The purpose of String#scan is to find every substring in a string that matches a regular expression. But you're only concerned with the first match, so scan is the wrong choice.
An alternative is to use String#match:
matches = response.body.match(/processPOIs\((.*)\)/)
json = matches[1]
struct = JSON.parse(json)
However, that's overkill. Since this is a JSONP response, we know that it will look like this:
processPOIs(...);
...give or take a trailing semicolon or newline. We don't need a regular expression to find the parts inside the parentheses, because we already know where it is: It starts 13 characters from the start (i.e. index 12) and ends two characters before the end ("index" -3). That makes it easy work with String#slice, a.k.a. String#[]:
json = response.body[12..-3]
struct = JSON.parse(json)
Like I said, "give or take a trailing semicolon or newline," so you might need to tweak that ending index depending on what the API returns. And with that, no more ugly .first.first, and it's faster, too.
Thank you everybody for the help. I was able to get everything into a CSV and then just used some VBA to organize it the way I wanted.
require 'json'
require 'net/http'
require 'uri'
require 'csv'
uri = 'https://www.mapquestapi.com/search/v2/radius?key=Imjtd%7Clu6t200zn0,bw=o5-layg1&radius=3000&callback=processPOIs&maxMatches=4000&origin=40.7686973%2C-73.9918181&hostedData=mqap.33882_stores_prod%7Copen_status%20=%20?%20OR%20open_status%20=%20?%20OR%20open_status%20=%20?%7CExisting,Coming%20Soon,New%7C'
response = Net::HTTP.get_response(URI.parse(uri))
matches = response.body.match(/processPOIs\((.*)\)/)
json = response.body[12..-3]
struct = JSON.parse(json)
CSV.open("output.csv", "w") do |csv|
csv << struct['searchResults'].map { |result| result['fields']}
end

Need help exporting parsed results, via Nokogiri, and exporting to CSV,. Only last parsed result is shown, why?

This is killing me and searching here and the big G is confusing me even more.
I followed the tutorial at Railscasts #190 on Nokogiri and was able to write myself a nice little parser:
require 'rubygems'
require 'nokogiri'
require 'open-uri'
url = "http://www.target.com/c/movies-entertainment/-/N-5xsx0/Ntk-All/Ntt-wwe/Ntx-matchallpartial+rel+E#navigation=true&facetedValue=/-/N-5xsx0&viewType=medium&sortBy=PriceLow&minPrice=0&maxPrice=10&isleaf=false&navigationPath=5xsx0&parentCategoryId=9975218&RatingFacet=0&customPrice=true"
doc = Nokogiri::HTML(open(url))
puts doc.at_css("title").text
doc.css(".standard").each do |item|
title = item.at_css("span.productTitle a")[:title]
format = item.at_css("span.description").text
price = item.at_css(".price-label").text[/\$[0-9\.]+/]
link = item.at_css("span.productTitle a")[:href]
puts "#{title}, #{format}, #{price}, #{link}"
end
I'm happy with the results and able to see it in the Windows console. However, I want to export the results to a CSV file and have tried numerous ways (with no luck) and I know I'm missing something. My latest updated code (after downloading the html files) is below:
require 'rubygems'
require 'nokogiri'
require 'csv'
#title = Array.new
#format = Array.new
#price = Array.new
#link = Array.new
doc = Nokogiri::HTML(open("index1.html"))
doc.css(".standard").each do |item|
#title << item.at_css("span.productTitle a")[:title]
#format << item.at_css("span.description").text
#price << item.at_css(".price-label").text[/\$[0-9\.]+/]
#link << item.at_css("span.productTitle a")[:href]
end
CSV.open("file.csv", "wb") do |csv|
csv << ["title", "format", "price", "link"]
csv << [#title, #format, #price, #link]
end
It works and spits a file out for me, but just the last result. I followed the tutorial at Andrew!: WEb Scraping... and trying to mix what I'm trying to achieve with someone else's process is confusing.
I assume it's looping through all of the results and only printing the last. Can someone give me pointers on how I should loop this (if that's the problem) so that all the results are in their respective columns?
Thanks in advance.
You're storing values in four arrays, but you're not enumerating the arrays when you generate your output.
Here is a possible fix:
CSV.open("file.csv", "wb") do |csv|
csv << ["title", "format", "price", "link"]
until #title.empty?
csv << [#title.shift, #format.shift, #price.shift, #link.shift]
end
end
Note that this is a destructive operation that shifts the values off of the arrays one at a time, so in the end they will all be empty.
There are more efficient ways to read and convert the data, but this will hopefully do what you want for now.
There are several things you could do to write this more in the "Ruby way":
require 'rubygems'
require 'nokogiri'
require 'csv'
doc = Nokogiri::HTML(open("index1.html"))
CSV.open('file.csv', 'wb') do |csv|
csv << %w[title format price link]
doc.css('.standard').each do |item|
csv << [
item.at_css('span.productTitle a')[:title]
item.at_css('span.description').text
item.at_css('.price-label').text[/\$[0-9\.]+/]
item.at_css('span.productTitle a')[:href]
]
end
end
Without sample HTML it's not possible to test this, but, based on your code, it looks like it'd work.
Notice that in your code you're using instance variables. They're not necessary because you aren't defining a class to have an instance of. You can use local values instead.

Parse remote file with FasterCSV

I'm trying to parse the first 5 lines of a remote CSV file. However, when I do, it raises Errno::ENOENT exception, and says:
No such file or directory - [file contents] (with [file contents] being a dump of the CSV contents
Here's my code:
def preview
#csv = []
open('http://example.com/spreadsheet.csv') do |file|
CSV.foreach(file.read, :headers => true) do |row|
n += 1
#csv << row
if n == 5
return #csv
end
end
end
end
The above code is built from what I've seen others use on Stack Overflow, but I can't get it to work.
If I remove the read method from the file, it raises a TypeError exception, saying:
can't convert StringIO into String
Is there something I'm missing?
Foreach expects a filename. Try parse.each
You could manually pass each line to CSV for parsing:
require 'open-uri'
require 'csv'
def preview(file_url)
#csv = []
open(file_url).each_with_index do |line, i|
next if i == 0 #Ignore headers
#csv << CSV.parse(line)
if i == 5
return #csv
end
end
end
puts preview('http://www.ferc.gov/docs-filing/eqr/soft-tools/sample-csv/contract.txt')

Manipulating XML files in ruby with XmlSimple

I've got a complex XML file, and I want to extract a content of a specific tag from it.
I use a ruby script with XmlSimple gem. I retrieve an XML file with HTTP request, then strip all the unnecessary tags and pull out necessary info. That's the script itself:
data = XmlSimple.xml_in(response.body)
hash_1 = Hash[*data['results']]
def find_value(hash, value)
hash.each do |key, val|
if val[0].kind_of? Hash then
find_value(val[0], value)
else
if key.to_s.eql? value
puts val
end
end
end
end
hash_1['book'].each do |arg|
find_value(arg, "title")
puts("\n")
end
The problem is, that when I change replace puts val with return val, and then call find_value method with puts find_value (arg, "title"), i get the whole contents of hash_1[book] on the screen.
How to correct the find_value method?
A "complex XML file" and XmlSimple don't mix. Your task would be solved a lot easier with Nokogiri, and be faster as well:
require 'nokogiri'
doc = Nokogiri::XML(response.body)
puts doc.xpath('//book/title/text()')

Resources