ssl `sysread_nonblock': end of file reached (EOFError) - Ruby

I have written some code which uses Ruby threads.
require 'rubygems'
require 'net/http'
require 'uri'

def get_response()
  uri = URI.parse('https://..........')
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = true
  -----
  -----
end

t1 = []
15.times do |i|
  t1[i] = Thread.new {
    hit_mdm(i)
    sleep(rand(0)/10.0)
  }
end
t1.each {|t| t.join}
The code works fine, but when the program reaches its end it throws the following error:
ruby/2.0.0/openssl/buffering.rb:174:in `sysread_nonblock': end of file reached (EOFError)
How can I overcome this problem?

def getHttp(uri)
  begin
    http = Net::HTTP.new(uri.host, uri.port)
  rescue
    p 'failed Net::HTTP.new', uri
    retry
  end
  http
end
Based on the downvoted answer, I attached some code to show an example of catching the exception.

You haven't specified what hit_mdm() is, but presumably it's something that calls get_response, given your Net::HTTP setup above.
There are many places on the web where you can find evidence that Net::HTTP is probably thread-safe, though nothing conclusive.
I've done lots of stress testing with Net::HTTP and threads, and my experience is that EOFErrors are a common problem with multiple HTTP connections. Whether it's happening because of the server, the client, the connection, or the Net::HTTP library is going to be very difficult to debug, especially in threaded code doing TCP communication, which is itself threaded, in a sense.
You could use Wireshark to figure out where the EOFError is coming from, or you could save yourself a lot of headache and just rescue the EOFError on the sysread (your backtrace can tell you where to put the rescue so that it only affects the Net::HTTP call, if that's where the EOFError is generated).
But without more info, we can't really tell you for sure why the EOFError is happening.
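If you go the rescue route, something along these lines may be enough. This is a minimal sketch, assuming the EOFError surfaces out of the Net::HTTP call; the URL, the method name, and the retry count are placeholders, not part of the question's hit_mdm code:

require 'net/http'
require 'uri'

# Sketch only: retry the HTTPS call a few times when the connection is closed early.
def get_response_with_retry(url, attempts = 3)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = true
  http.request_get(uri.request_uri)
rescue EOFError
  attempts -= 1
  retry if attempts > 0   # re-run the method body with the decremented counter
  raise                    # give up and re-raise after the last attempt
end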

Related

Setting an HTTP Timeout in Ruby 1.9.3

I'm using Ruby 1.9.3 and need to GET a URL. I have this working with Net::HTTP; however, if the site is down, Net::HTTP ends up hanging.
While searching the internet, I've seen that many people have faced similar problems, all with hacky solutions; however, many of those posts are quite old.
Requirements:
I'd prefer using Net::HTTP to installing a new gem.
I need both the Body and the Response Code. (e.g. 200)
I do not want to require open-uri, since that makes global changes and raises some security issues.
I need to GET a URL within X seconds, or return error.
Using Ruby 1.9.3, how can I GET a URL while setting a timeout?
To clarify, my existing code looks like:
Net::HTTP.get_response(URI.parse(url))
Trying to add:
Net::HTTP.open_timeout(1000)
Results in:
NoMethodError: undefined method `open_timeout' for Net::HTTP:Class
You can set the open_timeout attribute of the Net::HTTP object before making the connection. Note that Net::HTTP.new does not yield to a block, so assign the object and set the attribute on it:
uri = URI.parse(url)
http = Net::HTTP.new(uri.hostname, uri.port)
http.open_timeout = 1000
response = http.request_get(uri.request_uri)
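For completeness, here is a minimal sketch of the same approach that also returns the body and the response code, rescuing the Timeout::Error that Ruby 1.9.3 raises when either limit is exceeded (the URL and the 10-second limits are placeholder values):

require 'net/http'
require 'uri'

def get_with_timeout(url)
  uri  = URI.parse(url)
  http = Net::HTTP.new(uri.hostname, uri.port)
  http.open_timeout = 10   # seconds to wait for the connection to open
  http.read_timeout = 10   # seconds to wait for data on the socket
  response = http.request_get(uri.request_uri)
  [response.code, response.body]   # e.g. ["200", "<html>..."]
rescue Timeout::Error
  [nil, nil]                       # timed out
end

p get_with_timeout('http://example.com/')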
I tried all the solutions here and on the other questions about this problem, but I only got everything right with the following code. The open-uri library is a wrapper for Net::HTTP.
I needed a GET that had to wait longer than the default timeout and then read the response. The code is also simpler.
require 'open-uri'

open(url, :read_timeout => 5 * 60) do |response|
  if response.read[/Return: Ok/i]
    log "sending ok"
  else
    raise "error sending, no confirmation received"
  end
end
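If you do go the open-uri route but still need the response code from the question's requirements, the object yielded to the block also carries it. A small sketch (the URL is a placeholder):

require 'open-uri'

open('http://example.com/', :read_timeout => 5 * 60) do |response|
  code, message = response.status   # e.g. ["200", "OK"]
  body = response.read
  puts "#{code} #{message} (#{body.length} bytes)"
end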

Ruby - catch all network exceptions [duplicate]

What’s the best way to rescue exceptions from Net::HTTP?
Exceptions thrown are described in Ruby’s socket.c, like Errno::ETIMEDOUT, Errno::ECONNRESET, and Errno::ECONNREFUSED. The base class to all of these is SystemCallError, but it feels weird to write code like the following because SystemCallError seems so far removed from making an HTTP call:
begin
  response = Net::HTTP.get_response(uri)
  response.code == "200"
rescue SystemCallError
  false
end
Is it just me? Is there a better way to handle this beyond fixing Net::HTTP to handle the Errno exceptions that would likely pop up and encapsulate them in a parent HttpRequestException?
I agree it is an absolute pain to handle all the potential exceptions. Look at this to see an example:
Working with Net::HTTP can be a pain. It's got about 40 different ways to do any one task, and about 50 exceptions it can throw.
Just for the love of google, here's what I've got for the "right way" of catching any exception that Net::HTTP can throw at you:
begin
  response = Net::HTTP.post_form(...) # or any Net::HTTP call
rescue Timeout::Error, Errno::EINVAL, Errno::ECONNRESET, EOFError,
       Net::HTTPBadResponse, Net::HTTPHeaderSyntaxError, Net::ProtocolError => e
  ...
end
Why not just rescue Exception => e? That's a bad habit to get into, as it hides any problems in your actual code (like SyntaxErrors, whiny nils, etc). Of course, this would all be much easier if the possible errors had a common ancestor.
The issues I've been seeing in dealing with Net::HTTP have made me wonder if it wouldn't be worth it to write a new HTTP client library. One that was easier to mock out in tests, and didn't have all these ugly little facets.
What I've done, and what I've seen most people do, is move away from Net::HTTP to third-party HTTP libraries such as httparty and faraday.
I experienced the same problem, and after a lot of research I realized the best way to handle all the exceptions the Net::HTTP methods can throw is to rescue from StandardError.
As pointed out in Mike Lewis's answer, Tammer Saleh's blog post proposes rescuing from a lot of exceptions, but it is still flawed. There are some exceptions it does not rescue, like Errno::EHOSTUNREACH, Errno::ECONNREFUSED, and possibly some socket exceptions.
So, as I found out in tenderlove's translation of an old ruby-dev thread, the best solution is, unfortunately, rescuing from StandardError:
begin
  response = Net::HTTP.get_response(uri)
rescue StandardError
  false
end
It is awful, but if you do not want your system to break because of these other exceptions, use this approach.
Another approach is to aggregate all these exceptions in a constant and then reuse that constant, e.g.:
ALL_NET_HTTP_ERRORS = [
  Timeout::Error, Errno::EINVAL, Errno::ECONNRESET, EOFError,
  Net::HTTPBadResponse, Net::HTTPHeaderSyntaxError, Net::ProtocolError
]

begin
  your_http_logic()
rescue *ALL_NET_HTTP_ERRORS
  …
end
It is far more maintainable and cleaner.
However, a word of warning: I copied the list of possible exceptions from the aforementioned Tammer Saleh blog post, and I know that this list is incomplete. For instance, Net::HTTP.get(URI("wow")) raises Errno::ECONNREFUSED, which is not listed. Also, I wouldn't be surprised if the list needs to be modified for different Ruby versions.
For this reason, I recommend sticking to rescue StandardError in most cases. To avoid catching too much, move as much as possible outside the begin-rescue-end block, preferably leaving only a call to one of the Net::HTTP methods, as in the sketch below.
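For example, a narrow wrapper along these lines keeps only the Net::HTTP call inside the rescue (a sketch; the method name and return values are just one possible convention):

require 'net/http'
require 'uri'

# Only the Net::HTTP call sits inside the rescue, so bugs elsewhere in your
# own code are not silently swallowed.
def fetch_body_or_nil(uri)
  response = begin
    Net::HTTP.get_response(uri)
  rescue StandardError
    nil
  end
  return nil unless response

  # Everything below runs outside the rescue.
  response.code == '200' ? response.body : nil
end

fetch_body_or_nil(URI.parse('http://example.com/'))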
Your intuition on this is right: for the most robust solution, I'd probably rescue each one individually (or in small groups) and take the appropriate action, like trying the connection again or abandoning the request altogether. I like to avoid a very high-level/generic rescue because it might catch exceptions I'm not prepared for or didn't expect.

Ruby: expand shorten urls the hard way

Is there a way to open URLs in Ruby and output the redirected URL,
i.e. convert http://bit.ly/l223ue to http://paper.li/CoyDavidsonCRE/1309121465?
I find that there are more URL-shortener services than gems can keep up with, so I'm asking for the hard, but robust, way instead of using a gem that connects to some API.
Here is a lengthen method.
This has very little error handling, but it might help you get started.
You could wrap lengthen in a begin/rescue block that returns nil, or attempt to retry it later. Not sure what you are trying to build, but I hope it helps.
require 'uri'
require 'net/http'

def lengthen(url)
  uri = URI(url)
  Net::HTTP.new(uri.host, uri.port).get(uri.path).header['location']
end
irb(main):008:0> lengthen('http://bit.ly/l223ue')
=> "http://paper.li/CoyDavidsonCRE/1309121465"

EventMachine Proxy -- HTTP Proxy mixing up request/response pairs

I have the following code (just as a test), and I want to create an HTTP proxy using EventMachine. The code below is an example from the em-proxy GitHub page. However, when I run this and open a website that has a moderate number of images, the images load incorrectly. What I mean is that some images are loaded twice, or if I request the icon for the navigation bar, I instead get the profile picture. This is especially evident if I refresh the page a few times.
It seems that the responses do not correspond to their matching requests, causing everything to be jumbled. However, I'm not sure why this is. The code below seems simple enough for this not to be a problem.
require 'rubygems'
require 'em-proxy'
require 'http/parser' # gem install http_parser.rb
require 'uuid' # gem install uuid
# > ruby em-proxy-http.rb
# > curl --proxy localhost:9889 www.google.com
host = "0.0.0.0"
port = 9889
puts "listening on #{host}:#{port}..."
Proxy.start(:host => host, :port => port) do |conn|
  @p = Http::Parser.new
  @p.on_headers_complete = proc do |h|
    session = UUID.generate
    puts "New session: #{session} (#{h.inspect})"
    host, port = h['Host'].split(':')
    conn.server session, :host => host, :port => (port || 80)
    conn.relay_to_servers @buffer
    @buffer = ''
  end

  @buffer = ''

  conn.on_connect do |data, b|
    puts [:on_connect, data, b].inspect
  end

  conn.on_data do |data|
    @buffer << data
    @p << data
    data
  end

  conn.on_response do |backend, resp|
    # puts [:on_response, backend, resp].inspect
    resp
  end

  conn.on_finish do |backend, name|
    puts [:on_finish, name].inspect
  end
end
Update
I believe I have some insight into what is happening, but still no way of solving the problem. I am creating a server for each request, so when I relay my requests I have multiple servers. Then, in on_response, I should only return the response if it comes from the correct server. However, I don't have a way to correlate this as of yet.
Here is a proper response:
Try removing every puts in the example so the main loop can concentrate on doing the actual network I/O; it works for me like that.
I think there may be some kind of timeout at play here: maybe the client does not wait long enough for the full answer to come back while the server is stuck outputting text to the console.
That's the downside of using an event reactor: you have to make sure nothing blocks it.
The code doesn't seem to account for persistent HTTP connections. Maybe you could try an HTTP 1.0 browser.

Using Watir to check for bad links

I have an unordered list of links that I save off to the side, and I want to click each link and make sure it goes to a real page and doesn't 404, 500, etc.
The issue is that I do not know how to do it. Is there some object I can inspect which will give me the HTTP status code, or anything?
mylinks = Browser.ul(:id, 'my_ul_id').links

mylinks.each do |link|
  link.click
  # need to check for a 200 status or something here! how?
  Browser.back
end
My answer is a similar idea to the Tin Man's.
require 'net/http'
require 'uri'

mylinks = Browser.ul(:id, 'my_ul_id').links

mylinks.each do |link|
  u = URI.parse link.href
  status_code = Net::HTTP.start(u.host, u.port) {|http| http.head(u.request_uri).code }
  # testing with rspec
  status_code.should == '200'
end
If you use Test::Unit as your testing framework, you can test like the following, I think:
assert_equal '200', status_code
Another sample (including Chuck van der Linden's idea): check the status code and log the URLs if the status is not good.
require 'net/http'
require 'uri'

mylinks = Browser.ul(:id, 'my_ul_id').links

mylinks.each do |link|
  u = URI.parse link.href
  status_code = Net::HTTP.start(u.host, u.port) {|http| http.head(u.request_uri).code }
  unless status_code == '200'
    File.open('error_log.txt', 'a+') {|file| file.puts "#{link.href} is #{status_code}" }
  end
end
There's no need to use Watir for this. An HTTP HEAD request will give you an idea whether the URL resolves, and it will be faster.
Ruby's Net::HTTP can do it, or you can use Open::URI.
Using Open::URI you can request a URI and get a page back. Because you don't really care what the page contains, you can throw away that part and only return whether you got something:
require 'open-uri'

if !open('http://www.example.com').read.empty?
  puts "is"
else
  puts "isn't"
end
The upside is that Open::URI resolves HTTP redirects. The downside is that it returns full pages, so it can be slow.
Ruby's Net::HTTP can help somewhat, because it can use HTTP HEAD requests, which don't return the entire page, only the headers. That by itself isn't enough to know whether the actual page is reachable, because the HEAD response could redirect to a page that doesn't resolve, so you have to loop through the redirects until you either don't get a redirect or you get an error. The Net::HTTP docs have an example to get you started:
require 'net/http'
require 'uri'

def fetch(uri_str, limit = 10)
  # You should choose a better exception.
  raise ArgumentError, 'HTTP redirect too deep' if limit == 0

  response = Net::HTTP.get_response(URI.parse(uri_str))
  case response
  when Net::HTTPSuccess     then response
  when Net::HTTPRedirection then fetch(response['location'], limit - 1)
  else
    response.error!
  end
end

print fetch('http://www.ruby-lang.org')
Again, that example is returning full pages, which might slow you down. You can replace get_response with request_head, which returns a response just like get_response does; that should help.
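For instance, here's a sketch of what that swap might look like, assuming plain HTTP and reusing the same redirect-following structure (fetch_head is just a name chosen for the sketch, not something from the docs):

require 'net/http'
require 'uri'

def fetch_head(uri_str, limit = 10)
  raise ArgumentError, 'HTTP redirect too deep' if limit == 0

  uri = URI.parse(uri_str)
  # HEAD request: only the headers come back, not the body.
  response = Net::HTTP.start(uri.host, uri.port) do |http|
    http.request_head(uri.request_uri)
  end

  case response
  when Net::HTTPSuccess     then response
  when Net::HTTPRedirection then fetch_head(response['location'], limit - 1)
  else
    response.error!
  end
end

puts fetch_head('http://www.ruby-lang.org').code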
In either case, there's another thing you have to consider. A lot of sites use "meta refreshes", which cause the browser to refresh the page, using an alternate URL, after parsing the page. Handling these requires requesting the page and parsing it, looking for the <meta http-equiv="refresh" content="5" /> tags.
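If you need to handle those, one hedged approach is to parse the fetched page with an HTML parser such as Nokogiri (a library chosen for this sketch, not something the answer above requires) and pull the target out of the content attribute:

require 'nokogiri'
require 'open-uri'

# Sketch: the content attribute usually looks like "5; url=http://example.com/".
html = open('http://www.example.com').read
doc  = Nokogiri::HTML(html)
meta = doc.at('meta[http-equiv="refresh"]')

if meta && meta['content'] =~ /url\s*=\s*(\S+)/i
  puts "Meta refresh points to: #{$1}"
end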
Other HTTP gems like Typhoeus and Patron also can do HEAD requests easily, so take a look at them too. In particular, Typhoeus can handle some heavy loads via its companion Hydra, allowing you to easily use parallel requests.
EDIT:
require 'typhoeus'

response = Typhoeus::Request.head("http://www.example.com")
response.code # => 302

case response.code
when (200 .. 299)
  #
when (300 .. 399)
  headers = Hash[*response.headers.split(/[\r\n]+/).map{ |h| h.split(' ', 2) }.flatten]
  puts "Redirected to: #{ headers['Location:'] }"
when (400 .. 499)
  #
when (500 .. 599)
  #
end
# >> Redirected to: http://www.iana.org/domains/example/
Just in case you haven't played with one, here's what the response looks like. It's useful for exactly the sort of situation you're looking at:
(rdb:1) pp response
#<Typhoeus::Response:0x00000100ac3f68
 @app_connect_time=0.0,
 @body="",
 @code=302,
 @connect_time=0.055054,
 @curl_error_message="No error",
 @curl_return_code=0,
 @effective_url="http://www.example.com",
 @headers=
  "HTTP/1.0 302 Found\r\nLocation: http://www.iana.org/domains/example/\r\nServer: BigIP\r\nConnection: Keep-Alive\r\nContent-Length: 0\r\n\r\n",
 @http_version=nil,
 @mock=false,
 @name_lookup_time=0.001436,
 @pretransfer_time=0.055058,
 @request=
  :method => :head,
  :url => http://www.example.com,
  :headers => {"User-Agent"=>"Typhoeus - http://github.com/dbalatero/typhoeus/tree/master"},
 @requested_http_method=nil,
 @requested_url=nil,
 @start_time=nil,
 @start_transfer_time=0.109741,
 @status_message=nil,
 @time=0.109822>
If you have a lot of URLs to check, see the Hydra example that is part of Typhoeus.
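Here's a rough sketch of what that can look like, assuming the same Typhoeus API as the snippet above (the URL list and concurrency limit are placeholders):

require 'typhoeus'

urls = ['http://www.example.com', 'http://www.ruby-lang.org']

hydra = Typhoeus::Hydra.new(:max_concurrency => 20)

urls.each do |url|
  request = Typhoeus::Request.new(url, :method => :head)
  request.on_complete do |response|
    puts "#{url} => #{response.code}"
  end
  hydra.queue(request)
end

hydra.run   # runs all queued HEAD requests in parallel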
There's a bit of a philosophical debate on whether Watir or watir-webdriver should provide HTTP return-code information. The premise is that an ordinary 'user', which is what Watir simulates on the DOM, is ignorant of HTTP return codes. I don't necessarily agree with this, as my use case is perhaps slightly different from the main one (performance testing etc.), but it is what it is. This thread expresses some opinions about the distinction => http://groups.google.com/group/watir-general/browse_thread/thread/26486904e89340b7
At present there's no easy way to determine HTTP response codes from Watir without using supplementary tools like proxies/Fiddler/HTTPWatch/TCPdump, or downgrading to a net/http level of scripting mid-test. I personally like using Firebug with the NetExport plugin to keep a retrospective look at tests.
All the previous solutions are inefficient if you have a very large number of links, because for each one a new HTTP connection is established with the server hosting the link.
I have written a one-liner bash command that uses curl to fetch a list of links supplied on stdin and returns a list of status codes corresponding to each link. The key point here is that curl takes the whole batch of links in the same invocation and reuses HTTP connections, which dramatically improves speed.
However, curl divides the list into chunks of 256, which is still far more than one connection! To make sure connections are reused, sort the links first (simply using the sort command).
cat <YOUR_LINKS_FILE_ONE_PER_LINE> | xargs curl --head --location -w '---HTTP_STATUS_CODE:%{http_code}\n\n' -s --retry 10 --globoff | grep HTTP_STATUS_CODE | cut -d: -f2 > <RESULTS_FILE>
It is worth noting that the above command follows HTTP redirects, retries up to 10 times for temporary errors (timeouts or 5xx), and of course only fetches headers.
Update: added --globoff so that curl won't expand any URL that contains {} or [].
