JSON Parsing Google API Custom Search Error - ruby

I have the following code:
require 'rubygems'
require 'httparty'
require 'pp'
require 'json'
class Search
  include HTTParty
  format :json
end
x = Search.get('https://www.googleapis.com/customsearch/v1?key=AI...&cx=013...&q=flowers&alt=json')
x = x.to_s
result = JSON.parse(x)
And every time I run it on the google search results that come back I get the following:
FlowerPlaces.com Delivers Fresh <b>Flowers</b> To Your Place! <br>
Order <b>Flowers</b> Online or Call 800-411-9049 for Same Day <b>Flower</b>
Delivery.", "cacheId"=>"v94CIDza4gQJ"}]}' (JSON::ParserError)
from /usr/local/lib/ruby/1.9.1/json/common.rb:148:in `parse'
from gg.rb:15:in `<main>'
Here is that same line in the string version which JSON is trying to parse:
Order <b>Flowers</b> Online or Call 800-411-9049 for Same Day <b>Flower</b>
Delivery.\", \"cacheId\"=>\"v94CIDza4gQJ\"}]}"
Now, I've tried it with multiple search queries and I'm wondering: am I doing something wrong? I added the .to_s because the JSON parser tries to convert the HTTParty get result to a string, can't, and then throws an error. The JSON parser appears to get all the way to the end of the string (which is everything Google has returned to me) before it throws the error.
What am I missing?

You need to add .body onto the end of the x = Search.get('http://....') like so:
x = Search.get('http://....').body
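Putting it together, here is a minimal sketch of the corrected script; the key, cx and query values are placeholders for the ones elided in the question:

require 'httparty'
require 'json'

class Search
  include HTTParty
  format :json
end

# .body returns the raw JSON string, which JSON.parse can handle
x = Search.get('https://www.googleapis.com/customsearch/v1?key=YOUR_KEY&cx=YOUR_CX&q=flowers&alt=json').body
result = JSON.parse(x)

Note that with format :json HTTParty should already have parsed the response, so Search.get(...).parsed_response (skipping .body and JSON.parse entirely) ought to give you the same Hash.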

Related

Using parsed response in separate GET call

I'm new to Ruby and APIs, so my apologies if this is super simple...
I need a script that will first POST to initiate the creation of an export file, and then make a GET call to retrieve the file. The GET call needs to use part of the POST's JSON response.
I'm using the httparty gem.
I think I need to create a variable that equals the parsed json, and then make that variable part of the GET call, but I'm not clear on how to do that.
Help is appreciated.
require 'httparty'
url = 'https://api.somewhere.org'
response = HTTParty.post(url)
puts response.parse_response
JSON response:
{"export_files"=>
  {"id"=> #####,
   "export_id"=> #####,
   "status"=>"Queued"}}
In my GET call I need to use the export_id number in the url.
HTTParty.get('https://api.somewhere.org/export_id/####')
As described in the comments, but a bit more verbose and with a skeleton for error handling:
require 'httparty'
require 'json'

url = 'https://api.somewhere.org'
response = HTTParty.post(url)

if hash = JSON.parse(response.body)
  # JSON.parse returns string keys, so index with strings rather than symbols
  if export_id = hash["export_files"]["export_id"]
    export = HTTParty.get("https://api.somewhere.org/export_id/#{export_id}")
  end
else
  # handle error
end
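Alternatively, since HTTParty parses JSON bodies for you when the server sends a JSON content type, you can often skip JSON.parse entirely. A minimal sketch, assuming the same hypothetical URL and the response shape shown above:

require 'httparty'

url = 'https://api.somewhere.org'
response = HTTParty.post(url)

# parsed_response is already a Hash (with string keys) for JSON responses
export_id = response.parsed_response["export_files"]["export_id"]
export = HTTParty.get("https://api.somewhere.org/export_id/#{export_id}")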

I am getting (eval):1: invalid Unicode codepoint error while trying to scrape instagram

I am trying to scrape data from instagram. Here is my code
require 'open-uri'
require 'nokogiri'
require 'json'
require 'unicode/emoji'

def get_html
  url = 'https://www.instagram.com/muriithi_kabogo/'
  html = open(url)
end

def pass_data
  html = get_html
  doc = Nokogiri::HTML(html)
end

def get_data
  profiles = []
  body = pass_data.at('body')
  script = body.at('script').text
  myText = script
  json_object_data = eval(myText)
end

get_data()
When I try to change the text into json format, I get an error:
(eval):1: invalid Unicode codepoint (SyntaxError)
usinessmen #beautiful #smile\ud83d\ude0a #teambringit #shebr
How do I move past this error?
JSON, like JavaScript, escapes characters outside the Basic Multilingual Plane as UTF-16 surrogate pairs (here \ud83d\ude0a), which Ruby string literals choke on.
Do not use eval. For one thing, Ruby will reject \ud83d\ude0a as invalid codepoints, as it should; for another, it is a security hole; and lastly, it slows down your code.
Use JSON.parse, which is safer, faster, and knows how to decode those surrogate pairs:
require 'json'
json_str = '"usinessmen #beautiful #smile\ud83d\ude0a #teambringit #shebr"'
JSON.parse(json_str)
# => "usinessmen #beautiful #smile😊 #teambringit #shebr"
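Applied to the scraper above, a hedged sketch of get_data without eval might look like the following. It assumes the profile JSON is still embedded in a script tag of the form window._sharedData = {...};, which Instagram may well have changed:

require 'open-uri'
require 'nokogiri'
require 'json'

def get_data
  # URI.open comes from open-uri (older Rubies used plain open)
  doc = Nokogiri::HTML(URI.open('https://www.instagram.com/muriithi_kabogo/'))

  # Assumption: the profile data is embedded as "window._sharedData = {...};"
  script = doc.css('script').map(&:text).find { |t| t.include?('window._sharedData') }
  json_str = script.sub(/\A\s*window\._sharedData\s*=\s*/, '').sub(/;\s*\z/, '')

  # JSON.parse decodes the \ud83d\ude0a surrogate pairs that eval rejects
  JSON.parse(json_str)
end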

Nokogiri - Checking if the value of an xpath exists and is blank or not in Ruby

I have an XML file, and before I process it I need to make sure that a certain element exists and is not blank.
Here is the code I have:
CSV.open("#{csv_dir}/products.csv", "w", {:force_quotes => true}) do |out|
  out << headers
  Dir.glob("#{xml_dir}/*.xml").each do |xml_file|
    gdsn_doc = GDSNDoc.new(xml_file)
    logger.info("Processing xml file #{xml_file}")

    @desc_exists = @gdsn_doc.xpath("//productData/description")
    if !@desc_exists.empty?
      row = []
      headers.each do |col|
        row << product[col]
      end
      out << row
    end
  end
end
The following code is not working to find the "description" element and to check whether it is blank or not:
@desc_exists = @gdsn_doc.xpath("//productData/description")
if !@desc_exists.empty?
Here is a sample of the XML file:
<productData>
<description>Chocolate biscuits </description>
<productData>
This is how I have defined the class and Nokogiri:
class GDSNDoc
  def initialize(xml_file)
    @doc = File.open(xml_file) {|f| Nokogiri::XML(f)}
    @doc.remove_namespaces!
The code had to be moved up to an earlier stage, where Nokogiri was initialised. It doesn't get runtime errors, but it does let XML files with blank descriptions get through and it shouldn't.
class GDSNDoc
  def initialize(xml_file)
    @doc = File.open(xml_file) {|f| Nokogiri::XML(f)}
    @doc.remove_namespaces!
    desc_exists = @doc.xpath("//productData/descriptions")
    if !desc_exists.empty?
You are creating your instance like this:
gdsn_doc = GDSNDoc.new(xml_file)
then use it like this:
@desc_exists = @gdsn_doc.xpath("//productData/description")
@gdsn_doc and gdsn_doc are two different things in Ruby - try just using the version without the @:
@desc_exists = gdsn_doc.xpath("//productData/description")
The basic test is to use:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<productData>
<description>Chocolate biscuits </description>
<productData>
EOT
# using XPath selectors...
doc.xpath('//productData/description').to_html # => "<description>Chocolate biscuits </description>"
doc.xpath('//description').to_html # => "<description>Chocolate biscuits </description>"
xpath works fine when the document is parsed correctly.
I get an error "undefined method 'xpath' for nil:NilClass (NoMethodError)".
Usually this means you didn't parse the document correctly. In your case it's because you're not using the right variable:
gdsn_doc = GDSNDoc.new(xml_file)
...
@desc_exists = @gdsn_doc.xpath("//productData/description")
Note that gdsn_doc is not the same as @gdsn_doc. The latter doesn't appear to have been initialized.
@doc = File.open(xml_file) {|f| Nokogiri::XML(f)}
While that should work, it's idiomatic to write it as:
@doc = Nokogiri::XML(File.read(xml_file))
File.open(...) do ... end is preferred if you're processing inside the block and want Ruby to automatically close the file. That isn't necessary when you're simply reading the file and then passing the content to something else for processing, hence the use of File.read(...), which slurps the file. (Slurping isn't necessarily a good practice because it can have scalability problems, but for reasonably sized XML/HTML it's OK, because it's easier to use DOM-based parsing than SAX.)
If Nokogiri doesn't raise an exception it was able to parse the content, however that still doesn't mean the content was valid. It's a good idea to check
@doc.errors
to see whether Nokogiri/libXML had to do some fix-ups on the content just to be able to parse it. Fixing the markup can change the DOM from what you expect, making it impossible to find a tag based on your assumptions for the selector. You could use xmllint or one of the XML validators to check, but Nokogiri will still have to be happy.
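For instance, parsing the malformed sample from the question (note the unclosed <productData> tag) illustrates what to look for; the exact messages depend on your libxml2 version:

require 'nokogiri'

doc = Nokogiri::XML(<<EOT)
<productData>
  <description>Chocolate biscuits </description>
<productData>
EOT

doc.errors
# => an Array of Nokogiri::XML::SyntaxError objects complaining about the
#    unterminated productData tags; an empty Array means a clean parse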
Nokogiri includes a command-line version nokogiri that accepts a URL to the document you want to parse:
nokogiri http://example.com
It'll open IRB with the content loaded and ready for you to poke at it. It's very convenient when debugging and testing. It's also a decent way to make sure the content actually exists if you're dealing with HTML containing DHTML that loads parts of the page dynamically.
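Finally, to do what the question actually asks, i.e. confirm the element exists and is not blank, a minimal sketch might look like this (the description? helper is hypothetical; at_xpath returns nil when nothing matches, so both the missing and the blank cases are covered):

require 'nokogiri'

class GDSNDoc
  def initialize(xml_file)
    @doc = Nokogiri::XML(File.read(xml_file))
    @doc.remove_namespaces!
  end

  # true only when //productData/description exists and has non-blank text
  def description?
    desc = @doc.at_xpath('//productData/description')
    !desc.nil? && !desc.text.strip.empty?
  end
end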

JSON to CSV File Ruby

I am trying to convert the following JSON to CSV via Ruby, but am having trouble with my code. I am learning as I go, so any help is appreciated.
require 'json'
require 'net/http'
require 'uri'
require 'csv'
uri = 'https://www.mapquestapi.com/search/v2/radius?key=Imjtd%7Clu6t200zn0,bw=o5-layg1&radius=3000&callback=processPOIs&maxMatches=4000&origin=40.7686973%2C-73.9918181&hostedData=mqap.33882_stores_prod%7Copen_status%20=%20?%20OR%20open_status%20=%20?%20OR%20open_status%20=%20?%7CExisting,Coming%20Soon,New%7C'
response = Net::HTTP.get_response(URI.parse(uri))
struct = JSON.parse(response.body.scan(/processPOIs\((.*)\);/).first.first)
CSV.open("output.csv", "w") do |csv|
  JSON.parse(struct).read.each do |hash|
    csv << hash.values
  end
end
The error I receive is:
from c:/RailsInstaller/Ruby2.2.0/lib/ruby/gems/2.2.0/gems/json-1.8.3/lib/json/common.rb:155:in `new'
from c:/RailsInstaller/Ruby2.2.0/lib/ruby/gems/2.2.0/gems/json-1.8.3/lib/json/common.rb:155:in `parse'
from test.rb:14:in `block in <main>'
from c:/RailsInstaller/Ruby2.2.0/lib/ruby/2.2.0/csv.rb:1273:in `open'
from test.rb:13:in `<main>'
I am trying to get all the data off of the following link and put it into a CSV file that I can analyse later. https://www.mapquestapi.com/search/v2/radius?key=Imjtd%7Clu6t200zn0,bw=o5-layg1&radius=3000&callback=processPOIs&maxMatches=4000&origin=40.7686973%2C-73.9918181&hostedData=mqap.33882_stores_prod%7Copen_status%20=%20?%20OR%20open_status%20=%20?%20OR%20open_status%20=%20?%7CExisting,Coming%20Soon,New%7C
You have several problems here, the most significant of which is that you're calling JSON.parse twice. The second time you call it on struct, which was the result of calling JSON.parse the first time. You're basically doing JSON.parse(JSON.parse(string)). Oops.
There's another problem on the line where you call JSON.parse a second time: You call read on the value it returns. As far as I know JSON.parse does not ordinarily return anything that responds to read.
Fixing those two errors, your code looks something like this:
struct = JSON.parse(response.body.scan(/processPOIs\((.*)\);/).first.first)

CSV.open("output.csv", "w") do |csv|
  struct.each do |hash|
    csv << hash.values
  end
end
This ought to work if struct is an object that responds to each (like an array) and the values yielded by each all respond to values (like a hash). In other words, this code assumes that JSON.parse will return an array of hashes, or something similar. If it doesn't, well, that's beyond the scope of this question.
As an aside, this is not great:
response.body.scan(/processPOIs\((.*)\);/).first.first
The purpose of String#scan is to find every substring in a string that matches a regular expression. But you're only concerned with the first match, so scan is the wrong choice.
An alternative is to use String#match:
matches = response.body.match(/processPOIs\((.*)\)/)
json = matches[1]
struct = JSON.parse(json)
However, that's overkill. Since this is a JSONP response, we know that it will look like this:
processPOIs(...);
...give or take a trailing semicolon or newline. We don't need a regular expression to find the part inside the parentheses, because we already know where it is: it starts 13 characters from the start (i.e. index 12) and ends two characters before the end (index -3). That makes it easy to work with String#slice, a.k.a. String#[]:
json = response.body[12..-3]
struct = JSON.parse(json)
Like I said, "give or take a trailing semicolon or newline," so you might need to tweak that ending index depending on what the API returns. And with that, no more ugly .first.first, and it's faster, too.
Thank you everybody for the help. I was able to get everything into a CSV and then just used some VBA to organize it the way I wanted.
require 'json'
require 'net/http'
require 'uri'
require 'csv'
uri = 'https://www.mapquestapi.com/search/v2/radius?key=Imjtd%7Clu6t200zn0,bw=o5-layg1&radius=3000&callback=processPOIs&maxMatches=4000&origin=40.7686973%2C-73.9918181&hostedData=mqap.33882_stores_prod%7Copen_status%20=%20?%20OR%20open_status%20=%20?%20OR%20open_status%20=%20?%7CExisting,Coming%20Soon,New%7C'
response = Net::HTTP.get_response(URI.parse(uri))
matches = response.body.match(/processPOIs\((.*)\)/)
json = response.body[12..-3]
struct = JSON.parse(json)
CSV.open("output.csv", "w") do |csv|
  csv << struct['searchResults'].map { |result| result['fields'] }
end
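If you later want one CSV row per store instead of a single row of Hashes, a small variation on the block above (assuming every result's 'fields' value is a Hash with the same keys) would be:

fields = struct['searchResults'].map { |result| result['fields'] }

CSV.open("output.csv", "w") do |csv|
  csv << fields.first.keys               # header row from the first result
  fields.each { |f| csv << f.values }    # one row per search result
end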

Working with Ruby and APIs

I am pretty new to working with Ruby, especially with APIs but I've been trying to get the Darksky API to work, but I'm afraid I'm missing something obvious with how I'm using it.
Here is what I have
require 'darksky'
darksky = Darksky::API.new('my api key')
forecast = darksky.forecast('34.0500', '118.2500')
forecast
When I run this from the command line nothing happens. What am I doing wrong here?
Simply using forecast isn't going to do anything. You need to use puts at a minimum:
puts forecast
Or, see if Ruby's object pretty-printer can return something more interesting:
require 'pp'
pp forecast
Digging in further, I think their API doesn't work. Following their examples, with a valid key and their own location samples, plus locations taken from their source site Forecast.io, it also returns nil.
Using the REST interface directly from Forecast.io's site does return JSON. JSON is very easy to work with in Ruby, so it's a good way to go.
Here's some code to test the API, and Forecast.io's REST interface:
API_KEY = 'xxxxxxxxxxxxxxxxxxx'
LOCATION = %w[37.8267 -122.423]
require 'darksky'
darksky = Darksky::API.new(API_KEY)
forecast = darksky.forecast(*LOCATION)
forecast # => nil
brief_forecast = darksky.brief_forecast(*LOCATION)
brief_forecast # => nil
require 'json'
require 'httparty'
URL = "https://api.forecast.io/forecast/#{ API_KEY }/37.8267,-122.423"
puts URL
# >> https://api.forecast.io/forecast/xxxxxxxxxxxxxxxxxxx/37.8267,-122.423
puts HTTParty.get(URL).body[0, 80]
# >> {"latitude":37.8267,"longitude":-122.423,"timezone":"America/Los_Angeles","offse
Notice that LOCATION is 37.8267,-122.423 in both cases, which is Alcatraz according to the Forecast.io site. Also notice that the body output displayed is a JSON string.
Pass the returned JSON to Ruby's JSON class like:
JSON[returned_json]
to get it parsed back into a Ruby Hash. Using OpenURI (because it comes with Ruby) instead of HTTParty, and passing it to JSON for parsing looks like:
body = open(URL).read
puts JSON[body]
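For example (URL as defined above; only the keys visible in the truncated body are certain, and anything deeper is an assumption about the Forecast.io schema):

require 'open-uri'
require 'json'

weather = JSON[open(URL).read]   # JSON[...] is shorthand for JSON.parse

weather['timezone']   # => "America/Los_Angeles"
weather['latitude']   # => 37.8267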
