How can I use string interpolation to pass a url into Nokogiri? - ruby

I have a method that works fine when I pass in a literal URL (see code below). When I build the URL with #{} so that I can use different Craigslist locations, it throws the error shown at the bottom. My question is twofold:
Why doesn't Nokogiri allow me to open this?
Can I change this to accept the URL?
Code:
def get_post_date(listing_url)
  # This method takes in a page and returns a date hopefully in a date format
  # but right now text
  listing = Nokogiri::HTML(open(listing_url)).css("p")
  setter = ""
  for element in listing
    if element.css('time').text != "" && setter == ""
      post_time = "poop" # Time.parse(element.css('time').text)
      return "poop"
    end
  end
end
location = "sfbay"
# THIS throws an error
p get_post_date("#{location}.craigslist.org/sfc/vac/4248712420.html")
# THIS works
p get_post_date("sfbay.craigslist.org/sfc/vac/4248712420.html")
Error:
c:\>ruby cljobs.rb
C:/Ruby193/lib/ruby/1.9.1/open-uri.rb:35:in `initialize': No such file or directory - sfbay.craigslist.org/sfc/vac/4248712420.html (Errno::ENOENT)
from C:/Ruby193/lib/ruby/1.9.1/open-uri.rb:35:in `open'
from C:/Ruby193/lib/ruby/1.9.1/open-uri.rb:35:in `open'
from cljobs.rb:7:in `get_post_date'
from cljobs.rb:40:in `<main>'

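To open a URL with open you need to require open-uri; otherwise open (not Nokogiri) treats its argument as a local file path. The URL also needs a scheme such as http://. open-uri only handles strings that look like URLs and passes everything else on to the regular file open, which is where the Errno::ENOENT in the trace comes from.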
require 'open-uri'
listing = Nokogiri::HTML(open(listing_url))
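As a minimal sketch of the scheme issue, the snippet below builds the listing URL with an explicit http:// prefix and uses URI.parse to show how the two strings differ. The helper name listing_url is hypothetical (it is not from the question), and no network request is made.

```ruby
require 'uri'

# Hypothetical helper (not from the question): builds an absolute URL
# for a Craigslist location, including the scheme.
def listing_url(location, path)
  "http://#{location}.craigslist.org#{path}"
end

with_scheme    = listing_url("sfbay", "/sfc/vac/4248712420.html")
without_scheme = "sfbay.craigslist.org/sfc/vac/4248712420.html"

# A string without a scheme parses as a relative path, which is why
# open-uri hands it to the ordinary file-opening code.
puts URI.parse(with_scheme).scheme.inspect
puts URI.parse(without_scheme).scheme.inspect

# With open-uri loaded, the absolute URL can then be passed on, e.g.:
#   Nokogiri::HTML(URI.open(with_scheme))   # Ruby 2.5+; plain open(...) on 1.9
```

Interpolation itself is not the problem: both strings from the question are identical once built, so either one needs the scheme added.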

Related

Sinatra: params hash cannot be merged

I want to merge a hash with default parameters and the actual parameters given in a request. When I call this seemingly innocent script:
#!/usr/bin/env ruby
require 'sinatra'
get '/' do
  defaults = { 'p1' => 'default1', 'p2' => 'default2' }
  # params = request.params
  params = defaults.merge(params)
  params
end
with curl http://localhost:4567?p0=request, it crashes with:
Listening on localhost:4567, CTRL+C to stop
2016-06-17 11:10:34 - TypeError - no implicit conversion of nil into Hash:
sinatrabug:8:in `merge'
sinatrabug:8:in `block in <main>'
When I access the Rack request.params directly it works. I looked into the Sinatra sources but I couldn't figure it out.
So I have a solution for my actual problem, but I don't know why it works.
My question is: why can I assign to params, and why is its class Hash, when defaults.merge(params) throws an exception?
Any ideas?
This is caused by the way Ruby handles local variables and setter methods (i.e. methods that end in =) with the same name. When Ruby reaches the line
params = defaults.merge(params)
it assumes you want to create a new local variable named params, rather than use the method. The initial value of this variable will be nil, and this is the value that the merge method sees.
If you want to call the setter method instead of creating a local variable, you need to give it an explicit receiver and write self.params = ... . This applies to any object that has such a method, not just Sinatra.
A better solution, to avoid this confusion altogether, might be to use a different name. Something like:
get '/' do
  defaults = { 'p1' => 'default1', 'p2' => 'default2' }
  normalized_params = defaults.merge(params)
  normalized_params.inspect
end
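The shadowing described above can be reproduced outside Sinatra. The sketch below uses a hypothetical class, FakeApp, whose params method stands in for Sinatra's; the method names shadowed, renamed and explicit_call are made up for illustration.

```ruby
# Hypothetical stand-in for a Sinatra app: params is an ordinary method.
class FakeApp
  def params
    { 'p0' => 'request' }
  end

  def shadowed
    defaults = { 'p1' => 'default1' }
    # The assignment makes params a local variable for the whole statement,
    # so the right-hand side reads nil instead of calling the method.
    params = defaults.merge(params)
    params
  rescue TypeError => e
    e.message
  end

  def renamed
    defaults = { 'p1' => 'default1' }
    # No local variable named params exists here, so this calls the method.
    merged = defaults.merge(params)
    merged
  end

  def explicit_call
    defaults = { 'p1' => 'default1' }
    # Parentheses force a method call even though a local params exists.
    params = defaults.merge(params())
    params
  end
end

app = FakeApp.new
puts app.shadowed
puts app.renamed.inspect
puts app.explicit_call.inspect
```

Renaming the variable is usually the clearer choice; the parenthesized call works but is easy to misread.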
Your code is throwing an error because params is nil when you make this call defaults.merge(params). I assume you are trying to merge defaults with request.params, which should contain the parameters from your GET.
Change this line
params = defaults.merge(params)
to this
params = defaults.merge(request.params)
I found this in the rack gem:
http://www.rubydoc.info/gems/rack/Rack/Request#params-instance_method
It seems you can retrieve GET and POST data with the params method, but you can't write to it. You have to use update_param and delete_param instead.

How do I call a function in Ruby?

I'm trying to call a method but I keep getting an error. This is my code:
require 'rubygems'
require 'net/http'
require 'uri'
require 'json'
class AlchemyAPI
  #Setup the endpoints
  @@ENDPOINTS = {}
  @@ENDPOINTS['taxonomy'] = {}
  @@ENDPOINTS['taxonomy']['url'] = '/url/URLGetRankedTaxonomy'
  @@ENDPOINTS['taxonomy']['text'] = '/text/TextGetRankedTaxonomy'
  @@ENDPOINTS['taxonomy']['html'] = '/html/HTMLGetRankedTaxonomy'
  @@BASE_URL = 'http://access.alchemyapi.com/calls'
  def initialize()
    begin
      key = File.read('C:\Users\KVadher\Desktop\api_key.txt')
      key.strip!
      if key.empty?
        #The key file shouldn't be blank
        puts 'The api_key.txt file appears to be blank, please copy/paste your API key in the file: api_key.txt'
        puts 'If you do not have an API Key from AlchemyAPI please register for one at: http://www.alchemyapi.com/api/register.html'
        Process.exit(1)
      end
      if key.length != 40
        #Keys should be exactly 40 characters long
        puts 'It appears that the key in api_key.txt is invalid. Please make sure the file only includes the API key, and it is the correct one.'
        Process.exit(1)
      end
      @apiKey = key
    rescue => err
      #The file doesn't exist, so show the message and create the file.
      puts 'API Key not found! Please copy/paste your API key into the file: api_key.txt'
      puts 'If you do not have an API Key from AlchemyAPI please register for one at: http://www.alchemyapi.com/api/register.html'
      #create a blank file to hold the key
      File.open("api_key.txt", "w") {}
      Process.exit(1)
    end
  end
  # Categorizes the text for a URL, text or HTML.
  # For an overview, please refer to: http://www.alchemyapi.com/products/features/text-categorization/
  # For the docs, please refer to: http://www.alchemyapi.com/api/taxonomy/
  #
  # INPUT:
  # flavor -> which version of the call, i.e. url, text or html.
  # data -> the data to analyze, either the url, text or html code.
  # options -> various parameters that can be used to adjust how the API works, see below for more info on the available options.
  #
  # Available Options:
  # showSourceText -> 0: disabled (default), 1: enabled.
  #
  # OUTPUT:
  # The response, already converted from JSON to a Ruby object.
  #
  def taxonomy(flavor, data, options = {})
    unless @@ENDPOINTS['taxonomy'].key?(flavor)
      return { 'status'=>'ERROR', 'statusInfo'=>'Taxonomy info for ' + flavor + ' not available' }
    end
    #Add the URL encoded data to the options and analyze
    options[flavor] = data
    return analyze(@@ENDPOINTS['taxonomy'][flavor], options)
    print
  end
  **taxonomy(text,"trees",1)**
end
I have marked my call with ** **. Am I doing something incorrect? The error I receive is:
C:/Users/KVadher/Desktop/testrub:139:in `<class:AlchemyAPI>': undefined local variable or method `text' for AlchemyAPI:Class (NameError)
from C:/Users/KVadher/Desktop/testrub:6:in `<main>'
I feel as though I'm calling as normal and that there is something wrong with the api code itself? Although I may be wrong.
Yes, as jon snow says, the function (method) call must be outside of the class. The methods are defined along with the class.
Also, options should be a Hash, not a number: taxonomy calls options[flavor] = data, so passing 1 is going to cause you another error.
I believe maybe you meant to put text in quotes, as that is one of your flavors.
Furthermore, because you declared a class, this is called an instance method, and you must make an instance of the class to use this:
my_instance = AlchemyAPI.new
my_taxonomy = my_instance.taxonomy("text", "trees")
That's enough to get the call to work, though it seems like you have a ways to go to get this all working. Good luck!
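To illustrate the points above without touching the real API, here is a small sketch with a hypothetical class, FakeAPI, that has the same taxonomy signature but simply returns the options hash. It shows the call on an instance with a quoted flavor, and what happens when a number is passed for options.

```ruby
# Hypothetical stand-in for AlchemyAPI with the same method signature;
# it returns the options hash instead of calling the remote API.
class FakeAPI
  def taxonomy(flavor, data, options = {})
    options[flavor] = data
    options
  end
end

# Instance methods are called on an instance, outside the class body.
api = FakeAPI.new
result = api.taxonomy("text", "trees", { "showSourceText" => 1 })
puts result.inspect

# Passing a number for options fails, because Integer has no []= method.
begin
  api.taxonomy("text", "trees", 1)
rescue NoMethodError => e
  puts "NoMethodError: #{e.message}"
end
```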

How to parse XML with Mechanize and XMLSimple in ruby?

I'm trying to fetch a remote XML file with Mechanize to get icecast status information, but I'm having trouble converting the XML file from the Mechanize::File format to a string or some XML format that XMLSimple can work with.
The XML document looks like this:
<icestats>
  <admin>donschoe@stackoverflow.com</admin>
  <!-- ... -->
</icestats>
My code looks like this right now:
require 'mechanize'
require 'xmlsimple'
server = 'example.net'
port = 8000
user = 'stackoverflow'
password = 'hackme'
agent = Mechanize.new
agent.user_agent_alias = 'Linux Firefox'
agent.add_auth("http://#{server}:#{port}/admin/status.xml", user, password)
agent.get("http://#{server}:#{port}/admin/status.xml")
xml = agent.current_page
status = XmlSimple.xml_in(xml)
puts status['admin']
This should output: donschoe@stackoverflow.com
But it throws:
/home/user/.gem/ruby/1.9.1/gems/xml-simple-1.1.2/lib/xmlsimple.rb:191:in 'xml_in': Could not parse object of type: <Mechanize::File>. (ArgumentError)
I understand that XMLSimple needs a string, so I tried to convert the Mechanize::File to a string by replacing the second-to-last line with:
status = XmlSimple.xml_in(xml.to_s)
But this throws an even stranger exception:
/usr/lib64/ruby/1.9.1/rexml/parsers/baseparser.rb:406:in `block in pull_event': Undefined prefix Mechanize: found (REXML::UndefinedNamespaceException)
from /usr/lib64/ruby/1.9.1/set.rb:222:in `block in each'
from /usr/lib64/ruby/1.9.1/set.rb:222:in `each_key'
from /usr/lib64/ruby/1.9.1/set.rb:222:in `each'
from /usr/lib64/ruby/1.9.1/rexml/parsers/baseparser.rb:404:in `pull_event'
from /usr/lib64/ruby/1.9.1/rexml/parsers/baseparser.rb:183:in `pull'
from /usr/lib64/ruby/1.9.1/rexml/parsers/treeparser.rb:22:in `parse'
from /usr/lib64/ruby/1.9.1/rexml/document.rb:231:in `build'
from /usr/lib64/ruby/1.9.1/rexml/document.rb:43:in `initialize'
from /home/user/.gem/ruby/1.9.1/gems/xml-simple-1.1.2/lib/xmlsimple.rb:965:in `new'
from /home/user/.gem/ruby/1.9.1/gems/xml-simple-1.1.2/lib/xmlsimple.rb:965:in `parse'
from /home/user/.gem/ruby/1.9.1/gems/xml-simple-1.1.2/lib/xmlsimple.rb:164:in `xml_in'
from /home/user/.gem/ruby/1.9.1/gems/xml-simple-1.1.2/lib/xmlsimple.rb:203:in `xml_in'
from debugging.rb:16:in `<main>'
What's wrong with my approach? When I download the XML file and use the local XML file the code above works as desired.
I'm especially looking for solutions with Mechanize rather than Nokogiri.
Try changing:
xml = agent.current_page
to:
xml = agent.current_page.body
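A likely reason the to_s attempt failed is that the object's to_s returns an inspect-style string such as #<Mechanize::File:0x...> rather than the document, which would explain the "Undefined prefix Mechanize:" complaint from REXML. The sketch below shows the difference using a hypothetical stand-in class, FakeFile; that Mechanize::File behaves the same way is an assumption based on the error message.

```ruby
# Hypothetical stand-in for Mechanize::File: it keeps the raw response in
# #body and, as assumed here, does not override #to_s.
class FakeFile
  attr_reader :body

  def initialize(body)
    @body = body
  end
end

page = FakeFile.new("<icestats><admin>donschoe@stackoverflow.com</admin></icestats>")

puts page.to_s   # an inspect-style string such as #<FakeFile:0x...>, not XML
puts page.body   # the XML text a parser expects
```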

Ruby json parse error: unexpected token

I have a working method that opens and parses a json file. Now I'm trying to iterate through a directory of json files and display their contents.
Working method for a single file:
def aperson
File.open("people/Elvis Presley.json") do |f|
parse = JSON.parse(f.read)
end
end
Non-working method to iterate through a directory:
16. def list
17.   Dir.glob('people/*').each do |f|
18.     parse = JSON.parse(f)
19.   end
20. end
My error is:
/Users/ad/.rbenv/versions/1.9.3-p194/lib/ruby/1.9.1/json/common.rb:148:in `parse': 743: unexpected token at 'people/Elvis Presley.json' (JSON::ParserError)
from /Users/ad/.rbenv/versions/1.9.3-p194/lib/ruby/1.9.1/json/common.rb:148:in `parse'
from app.rb:18:in `block in list'
from app.rb:17:in `each'
from app.rb:17:in `list'
from app.rb:24:in `<main>'
All of the files in the directory have the same content and are valid as per JSONLint.
Any help would be greatly appreciated.
You tried to parse the filename as JSON, which won't work.
Instead, you need to read the file first:
parse = JSON.parse(File.read(f))
Not sure, but can you try parsing the content of the file instead of the file name:
parse = JSON.parse( File.read f )
In your non-working code, f is just a string holding the file's path, so you need to read the file after you've received the path in the block.
While I was writing this, @nneonneo already posted the solution, so I won't repeat it.
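Putting the answers together, a directory-wide version might look like the sketch below. The dir parameter, the '*.json' pattern and the sample file contents are assumptions added for illustration; the question's method takes no arguments and globs 'people/*'.

```ruby
require 'json'
require 'tmpdir'

# Parses every .json file in dir and returns a hash of file name => parsed data.
def list(dir)
  Dir.glob(File.join(dir, '*.json')).sort.each_with_object({}) do |path, people|
    # path is only a String such as "people/Elvis Presley.json";
    # File.read turns it into the file's contents.
    people[File.basename(path)] = JSON.parse(File.read(path))
  end
end

# Usage with a throwaway directory and made-up sample data.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'Elvis Presley.json'), '{"name": "Elvis Presley"}')
  puts list(dir).inspect
end
```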

JSON Parsing Google API Custom Search Error

I have the following code:
require 'rubygems'
require 'httparty'
require 'pp'
require 'json'
class Search
include HTTParty
format :json
end
x = Search.get('https://www.googleapis.com/customsearch/v1key=AI...&cx=013...=flowers&alt=json')
x = x.to_s
result = JSON.parse(x)
And every time I run it on the google search results that come back I get the following:
FlowerPlaces.com Delivers Fresh <b>Flowers</b> To Your Place! <br>
Order <b>Flowers</b> Online or Call 800-411-9049 for Same Day <b>Flower</b>
Delivery.", "cacheId"=>"v94CIDza4gQJ"}]}' (JSON::ParserError)
from /usr/local/lib/ruby/1.9.1/json/common.rb:148:in `parse'
from gg.rb:15:in `<main>'
Here is that same line in the string version which JSON is trying to parse:
Order <b>Flowers</b> Online or Call 800-411-9049 for Same Day <b>Flower</b>
Delivery.\", \"cacheId\"=>\"v94CIDza4gQJ\"}]}"
I've tried it with multiple search queries, and I'm wondering whether I'm doing something wrong. I added the .to_s because the JSON parser tries to convert the result of the HTTParty get call to a string, can't, and then throws an error. The JSON parser appears to get all the way to the end of the string (which is everything Google has returned to me) before it throws the error.
What am I missing?
You need to add .body onto the end of the x = Search.get('http://....') call, so that JSON.parse receives the raw response text, like so:
x = Search.get('http://....').body
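The "cacheId"=>"..." fragments in the error are in Ruby's Hash inspect format, which suggests the response had already been parsed (as format :json requests) and that to_s then produced text that is no longer JSON. The sketch below reproduces that with the standard library only; the body string is made-up sample data, not a real API response.

```ruby
require 'json'

# Made-up response body standing in for what the search API returns.
body = '{"items": [{"cacheId": "v94CIDza4gQJ"}]}'

parsed  = JSON.parse(body)   # works: body is JSON text
as_text = parsed.to_s        # Hash#to_s is inspect output containing "=>"

begin
  JSON.parse(as_text)
rescue JSON::ParserError
  puts "to_s output is not JSON"
end

puts parsed["items"].first["cacheId"]
```

If the response is already parsed, calling JSON.parse again should not be necessary at all; taking the raw body, as the answer does, is the other way around the problem.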
