Open-Uri Alternative - Getting a response from a website - ruby

The following code works in Ruby 1.9.3p-551
require "open-uri"
res = open("http://example.com/version").read
p res # => "{\"buildNumber\": 2496, \"buildDate\": \"2015-09-29 11:18:02 +0200\", \"timestamp\": 1443639212 }"
In any Ruby version higher than 1.9.3 I get the following error:
from /Users/imac/.rbenv/versions/2.1.0/lib/ruby/2.1.0/net/http/response.rb:357:in `finish': incorrect header check (Zlib::DataError)
I need to use a higher version as this will be used in a Rails 4 app.
Any ideas for alternatives?

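Turns out the compressed response is the problem, or at least that's what I'm guessing. From Ruby 2.0.0 on, Net::HTTP (which open-uri uses) asks for gzip/deflate by default and inflates the body itself, and this server's compressed response apparently fails that step.
Overriding the Accept-Encoding header works around it:
res = open("http://someurl.com/version", "Accept-Encoding" => "plain").read
Note that "plain" is not a standard content coding; "identity" is the standard token for asking for an uncompressed body.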
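To see what the Zlib error means in isolation, here is a minimal sketch using only the standard library. The `decode_body` helper is hypothetical (it is not part of open-uri or Net::HTTP); it only mimics the kind of decoding a client performs based on the Content-Encoding header, and the JSON string is made-up sample data.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical helper: decode an HTTP body according to its
# Content-Encoding value. Not part of open-uri or Net::HTTP.
def decode_body(body, content_encoding)
  case content_encoding.to_s.downcase
  when 'gzip', 'x-gzip'
    Zlib::GzipReader.new(StringIO.new(body)).read
  when 'deflate'
    Zlib::Inflate.inflate(body)
  else
    # identity (or no header): the body is already plain
    body
  end
end

json = '{"buildNumber": 2496}'
compressed = Zlib::Deflate.deflate(json)

puts decode_body(compressed, 'deflate') # round-trips to the JSON text
puts decode_body(json, nil)             # passed through untouched

begin
  # A body labelled "deflate" that is not zlib data cannot be inflated.
  decode_body(json, 'deflate')
rescue Zlib::DataError => e
  puts "Zlib::DataError: #{e.message}"
end
```

If the server labels its body with an encoding the body does not actually match, the client fails in the same way as the last call here, which is why asking for an uncompressed body sidesteps the problem.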

Another neat solution for this.
require 'rest-client'
url = "http://example.com/version"
def get_response(url)
  begin
    return RestClient.get(url, {:accept => :json})
  rescue RestClient::GatewayTimeout
    "GatewayTimeout"
  rescue RestClient::RequestTimeout
    "RequestTimeout"
  rescue SocketError
    "SocketError"
  end
end
p get_response(url)
# => "{\"buildNumber\": 2535, \"buildDate\": \"2015-09-30 17:41:42 +0200\", \"timestamp\": 1444085042 }"

Related

In RoR, how do I catch an exception if I get no response from a server?

I’m using Rails 4.2.3 and Nokogiri to get data from a web site. I want to perform an action when I don’t get any response from the server, so I have:
begin
  content = open(url).read
  if content.lstrip[0] == '<'
    doc = Nokogiri::HTML(content)
  else
    begin
      json = JSON.parse(content)
    rescue JSON::ParserError => e
      content
    end
  end
rescue Net::OpenTimeout => e
  attempts = attempts + 1
  if attempts <= max_attempts
    sleep(3)
    retry
  end
end
Note that this is different than getting a 500 from the server. I only want to retry when I get no response at all, either because I get no TCP connection or because the server fails to respond (or some other reason that causes me not to get any response). Is there a more generic way to take account of this situation other than how I have it? I feel like there are a lot of other exception types I’m not thinking of.
This is a generic sample of how you can define timeout durations for the HTTP connection and retry several times if an error occurs while fetching content:
require 'open-uri'
require 'nokogiri'
url = "http://localhost:3000/r503"
openuri_params = {
  # set timeout durations for the HTTP connection;
  # the default for both open_timeout and read_timeout is 60 seconds
  :open_timeout => 1,
  :read_timeout => 1,
}
attempt_count = 0
max_attempts = 3
begin
  attempt_count += 1
  puts "attempt ##{attempt_count}"
  content = open(url, openuri_params).read
rescue OpenURI::HTTPError => e
  # the server answered with an error status (404, 503, etc.): do nothing
rescue SocketError, Net::OpenTimeout, Net::ReadTimeout => e
  # server can't be reached or doesn't send any response
  puts "error: #{e}"
  sleep 3
  retry if attempt_count < max_attempts
else
  # connection was successful,
  # content is fetched,
  # so here we can parse content with Nokogiri,
  # or call a helper method, etc.
  doc = Nokogiri::HTML(content)
  p doc
end
When it comes to rescuing exceptions, you should aim to have a clear understanding of:
Which lines in your system can raise exceptions
What is going on under the hood when those lines of code run
What specific exceptions could be raised by the underlying code
In your code, the line that's fetching the content is also the one that could see network errors:
content = open(url).read
If you go to the documentation for the OpenURI module you'll see that it uses Net::HTTP & friends to get the content of arbitrary URIs.
Figuring out what Net::HTTP can raise is actually very complicated but, thankfully, others have already done this work for you. Thoughtbot's suspenders project has lists of common network errors that you can use. Notice that some of those errors have to do with different network conditions than what you had in mind, like the connection being reset. I think it's worth rescuing those as well, but feel free to trim the list down to your specific needs.
So here's what your code should look like (skipping the Nokogiri and JSON parts to simplify things a bit):
require 'net/http'
require 'open-uri'
HTTP_ERRORS = [
  EOFError,
  Errno::ECONNRESET,
  Errno::EINVAL,
  Net::HTTPBadResponse,
  Net::HTTPHeaderSyntaxError,
  Net::ProtocolError,
  Timeout::Error,
]
MAX_RETRIES = 3
attempts = 0
begin
  content = open(url).read
rescue *HTTP_ERRORS => e
  if attempts < MAX_RETRIES
    attempts += 1
    sleep(2)
    retry
  else
    raise e
  end
end
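If you need the same retry logic in more than one place, it can be pulled into a small helper. The following is only a sketch: `with_retries` and its keyword arguments are names made up for illustration, and the flaky call is simulated so the example runs without a network. It also shows why Timeout::Error in the list is enough to catch Net::OpenTimeout and Net::ReadTimeout.

```ruby
require 'net/http'
require 'timeout'

# Hypothetical helper: run the block, retrying on the listed errors.
# Re-raises the last error once max_retries is used up.
def with_retries(errors:, max_retries: 3, wait: 0)
  attempts = 0
  begin
    yield attempts
  rescue *errors
    raise if attempts >= max_retries
    attempts += 1
    sleep(wait)
    retry
  end
end

# Net::OpenTimeout and Net::ReadTimeout both inherit from Timeout::Error,
# so a rescue list containing Timeout::Error already covers them.
puts Net::OpenTimeout.ancestors.include?(Timeout::Error) # => true
puts Net::ReadTimeout.ancestors.include?(Timeout::Error) # => true

# Simulated flaky call: fails twice, then succeeds.
calls = 0
result = with_retries(errors: [Timeout::Error, Errno::ECONNRESET]) do
  calls += 1
  raise Net::ReadTimeout if calls < 3
  'response body'
end
puts "#{result} after #{calls} calls" # => response body after 3 calls
```

In real use the block would wrap the `open(url).read` call and `errors:` would be the HTTP_ERRORS list above.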
I would think about using a Timeout that raises an exception after a short period:
require 'timeout'
MAX_RESPONSE_TIME = 2 # seconds
max_attempts = 3
attempts = 0
begin
  content = nil # needs to be defined before the following block
  Timeout.timeout(MAX_RESPONSE_TIME) do
    content = open(url).read
  end
  # parsing `content`
rescue Timeout::Error => e
  attempts += 1
  if attempts <= max_attempts
    sleep(3)
    retry
  end
end

Ruby HTTP POST - Errors

Can someone explain to me why I am getting this error when doing this POST? I pulled the snippet from the Ruby-docs page.
undefined method `hostname' for #<URI::HTTP:0x10bd441d8 URL:http://ws.mittthetwitapp.com/ws.phpmywebservice> (NoMethodError)
Perhaps I am missing a require or something?
require 'net/http'
uri = URI('http://ws.mywebservice.com/ws.php')
req = Net::HTTP::Post.new(uri.path)
req.set_form_data('xmlPayload' => '<TestRequest><Message>Hi Test</Message></TestRequest>')
res = Net::HTTP.start(uri.hostname, uri.port) do |http|
  http.request(req)
end
case res
when Net::HTTPSuccess, Net::HTTPRedirection
  # OK
else
  res.value
end
If you're using a version of Ruby prior to 1.9.3, you should use uri.host.
URI#hostname was added in Ruby 1.9.3. It is different than URI#host in that it removes brackets from IPv6 hostnames. For non-IPv6 hostnames it should behave identically.
The implementation (from APIdock):
def hostname
  v = self.host
  /\A\[(.*)\]\z/ =~ v ? $1 : v
end
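A quick way to see the difference for yourself, using only the standard library. The URLs are made-up examples, and `portable_hostname` is a hypothetical helper for code that also has to run on Rubies without URI#hostname; it strips the brackets by hand in the same way as the implementation shown above.

```ruby
require 'uri'

ipv6 = URI('http://[::1]:8080/status')
puts ipv6.host     # => [::1]
puts ipv6.hostname # => ::1

plain = URI('http://example.com/status')
puts plain.host == plain.hostname # => true

# Hypothetical helper: use URI#hostname when it exists, otherwise
# fall back to stripping the IPv6 brackets from URI#host.
def portable_hostname(uri)
  if uri.respond_to?(:hostname)
    uri.hostname
  else
    uri.host.sub(/\A\[(.*)\]\z/, '\1')
  end
end

puts portable_hostname(ipv6) # => ::1
```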

EventMachine and Twitter streaming API

I am running an EventMachine process using the Twitter streaming API. I always run into a problem when content arrives on the stream infrequently.
Here is the minimal version of the script:
require 'rubygems'
require 'eventmachine'
require 'em-http'
require 'json'
usage = "#{$0} <user> <password> <track>"
abort usage unless user = ARGV.shift
abort usage unless password = ARGV.shift
abort usage unless keywords = ARGV.shift
def startIt(user, password, keywords)
  EventMachine.run do
    http = EventMachine::HttpRequest.new("https://stream.twitter.com/1/statuses/filter.json", {:port => 443}).post(
      :head => { 'Authorization' => [ user, password ] },
      :body => { "track" => keywords },
      :keepalive => true,
      :timeout => -1)
    buffer = ""
    http.stream do |chunk|
      buffer += chunk
      while line = buffer.slice!(/.+\r?\n/)
        if line.length > 5
          tweet = JSON.parse(line)
          puts Time.new.to_s + "#{tweet['user']['screen_name']}: #{tweet['text']}"
        end
      end
    end
    http.errback {
      puts Time.new.to_s + "Error: "
      puts http.error
    }
  end
rescue => error
  puts "error rescue " + error.to_s
end
while true
  startIt user, password, keywords
end
If I search for a keyword like "iphone", everything works well.
If I search for a less frequently used keyword, my stream gets closed very quickly, around 20 seconds after the last message.
Note that http.error is always empty, so it's very hard to understand why the stream is closed.
On the other hand, the nearly identical PHP version is not closed, so it is probably an issue with eventmachine or em-http, but I don't understand which one.
You should add settings to prevent your connection from timing out.
Try this:
http = EventMachine::HttpRequest.new(
  "https://stream.twitter.com/1/statuses/filter.json",
  :connection_timeout => 0,
  :inactivity_timeout => 0
).post(
  :head => { 'Authorization' => [ user, password ] },
  :body => { 'track' => keywords }
)
Good luck,
Christian

Using Open-URI to fetch XML and the best practice in case of problems with a remote url not returning/timing out?

Current code works as long as there is no remote error:
def get_name_from_remote_url
  cstr = "http://someurl.com"
  getresult = open(cstr, "UserAgent" => "Ruby-OpenURI").read
  doc = Nokogiri::XML(getresult)
  my_data = doc.xpath("/session/name").text
  # => 'Fred' or 'Sam' etc
  return my_data
end
But what if the remote URL times out or returns nothing? How do I detect that and return nil, for example?
And does Open-URI give a way to define how long to wait before giving up? This method is called while a user is waiting for a response, so how do we set a maximum timeout before we give up and tell the user "sorry, the remote server we tried to access is not available right now"?
Open-URI is convenient, but that ease of use comes at the cost of access to many of the configuration details that other HTTP clients, like Net::HTTP, expose.
It depends on what version of Ruby you're using. For 1.8.7 you can use the Timeout module. From the docs:
require 'timeout'
getresult = nil # defined outside the block so it is still visible afterwards
begin
  Timeout::timeout(5) {
    getresult = open(cstr, "User-Agent" => "Ruby-OpenURI").read
  }
rescue Timeout::Error => e
  puts e.to_s
end
Then check getresult to see if you got any content:
if getresult.nil? || getresult.empty?
  puts "got nothing from url"
end
If you are using Ruby 1.9.2 you can add a :read_timeout => 10 option to the open() method.
Also, your code could be tightened up and made a bit more flexible. This will let you pass in a URL or default to the currently used URL. Also read Nokogiri's NodeSet docs to understand the difference between xpath, /, css and at, %, at_css, at_xpath:
def get_name_from_remote_url(cstr = 'http://someurl.com')
  # the HTTP header name is "User-Agent", with a hyphen
  doc = Nokogiri::XML(open(cstr, 'User-Agent' => 'Ruby-OpenURI'))
  # xpath returns a nodeset which has to be iterated over
  # my_data = doc.xpath('/session/name').text # => 'Fred' or 'Sam' etc
  # at returns a single node
  doc.at('/session/name').text
end
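If you want the configuration details that Open-URI hides, Net::HTTP lets you set both timeouts directly. The following is a minimal sketch, not a drop-in replacement: `http_client_for` and `fetch_body` are hypothetical helper names, the timeout values are arbitrary, and the set of rescued errors is an assumption about what "not available" should cover.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build a Net::HTTP object with explicit timeouts.
# Nothing is sent over the network until a request is made.
def http_client_for(uri, open_timeout: 2, read_timeout: 5)
  http = Net::HTTP.new(uri.hostname, uri.port)
  http.use_ssl = (uri.scheme == 'https')
  http.open_timeout = open_timeout # seconds to wait for the connection
  http.read_timeout = read_timeout # seconds to wait for each read
  http
end

# Hypothetical helper: returns the body on a 2xx response, and nil on
# timeouts, connection failures, or any other status.
def fetch_body(url)
  uri = URI(url)
  response = http_client_for(uri).get(uri.request_uri)
  response.is_a?(Net::HTTPSuccess) ? response.body : nil
rescue Timeout::Error, SocketError, SystemCallError
  nil
end

client = http_client_for(URI('http://someurl.com/'))
puts client.open_timeout # => 2
puts client.read_timeout # => 5
```

The caller can then treat a nil result from `fetch_body` as "the remote server is not available right now" and only hand non-nil bodies to Nokogiri.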

Best practices for handling binary data in Ruby?

What are the best practices for reading and writing binary data in Ruby?
In the code sample below I needed to send a binary file over HTTP (as POST data):
class SimpleHandler < Mongrel::HttpHandler
  def process(request, response)
    response.start(200) do |head,out|
      head["Content-Type"] = "application/ocsp-responder"
      f = File.new("resp.der", "r")
      begin
        while true
          out.syswrite(f.sysread(1))
        end
      rescue EOFError => err
        puts "Sent response."
      end
    end
  end
end
While this code seems to do a good job, it probably isn't very idiomatic. How can I improve it?
FileUtils.copy_stream might be of use here. Open both files in binary mode so the bytes are copied unchanged:
require 'fileutils'
fin = File.new('svarttag.jpg', 'rb')
fout = File.new('blacktrain.jpg', 'wb')
FileUtils.copy_stream(fin, fout)
fin.close
fout.close
Maybe not exactly what you asked for but if it's the whole HTTP POST files issue you want to solve then HTTPClient can do it for you:
require 'httpclient'
HTTPClient.post 'http://nl.netlog.com/test', { :file => File.new('resp.der') }
Also, I've heard that Nick Sieger's multipart-post is good, but I haven't used it.
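For plain file-to-file copying, a block-based version closes both handles automatically. This is a minimal sketch under a few assumptions: `copy_binary` is a made-up helper name, and the demo uses throwaway temp files with a short byte string standing in for resp.der.

```ruby
require 'tempfile'

# Hypothetical helper: copy src to dst byte-for-byte.
# Returns the number of bytes copied.
def copy_binary(src_path, dst_path)
  File.open(src_path, 'rb') do |src|
    File.open(dst_path, 'wb') do |dst|
      IO.copy_stream(src, dst)
    end
  end
end

# Demo with throwaway files; the byte string stands in for resp.der.
payload = [0x30, 0x82, 0x00, 0x0A, 0x0D, 0xFF].pack('C*')
Tempfile.create('src') do |src|
  src.binmode
  src.write(payload)
  src.flush
  Tempfile.create('dst') do |dst|
    copied = copy_binary(src.path, dst.path)
    puts copied                             # => 6
    puts File.binread(dst.path) == payload  # => true
  end
end
```

IO.copy_stream reads and writes in chunks, so it avoids both the one-byte-at-a-time loop and the EOFError used for flow control in the question.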
