Net::HTTP::Server fails when reading a file - ruby

I'm trying to write a very simple HTTP server that returns the next 500 lines of a file on each request. If I try to read a file, the server fails.
This is my program:
#!/usr/bin/env ruby
require 'rubygems'
require 'net/http/server'
require 'pp'
file = File.foreach("data/all.txt").each_slice(500)
headers = {'Content-Type' => 'text/plain'}
Net::HTTP::Server.run(:port => 2000) do |request, stream|
[200, headers, file.next]
end
If I make a request, I get the first 500 lines from the file, but I get this on the console:
/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/generator.rb:132:in`call'
/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/generator.rb:132:in`next'
/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/generator.rb:190:in`next' ./urlServer.rb:12
/Library/Ruby/Gems/1.8/gems/net-http-server-0.2.2/lib/net/http/server/daemon.rb:122:in`call'
/Library/Ruby/Gems/1.8/gems/net-http-server-0.2.2/lib/net/http/server/daemon.rb:122:in`serve'
/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/gserver.rb:211:in`start'
/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/gserver.rb:208:in`initialize'
/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/gserver.rb:208:in`new'
/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/gserver.rb:208:in`start'
/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/gserver.rb:198:in`initialize'
/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/gserver.rb:198:in`new'
/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/gserver.rb:198:in`start'
/Library/Ruby/Gems/1.8/gems/net-http-server-0.2.2/lib/net/http/server/server.rb:42:in`run' ./urlServer.rb:11
All further requests fail.
How do I fix this program?

When the Enumerator in the file variable reaches the end, the next method raises a StopIteration exception. You are not rescuing it, so that is likely the issue here.
You probably want to wrap it in a begin-rescue block, e.g.:
Net::HTTP::Server.run(:port => 2000) do |request, stream|
begin
body = file.next
rescue StopIteration
body = []
end
[200, headers, body]
end
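To see the exhaustion behaviour in isolation, here is a minimal sketch that uses an in-memory array instead of a file. The helper name next_batch is made up for illustration; only Enumerator#next and StopIteration come from Ruby itself.

```ruby
# Hypothetical helper: return the next batch, or [] once the enumerator is exhausted.
def next_batch(enum)
  enum.next
rescue StopIteration
  []
end

lines = %w[a b c d e]
# each_slice without a block returns an Enumerator,
# just like File.foreach(...).each_slice(500) in the question.
batches = lines.each_slice(2)

# Ask for one more batch than exists to trigger the StopIteration path.
results = Array.new(4) { next_batch(batches) }
p results
```

Once the enumerator is exhausted, every later call returns an empty body instead of raising inside the server's request handler.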

Related

In RoR, how do I catch an exception if I get no response from a server?

I’m using Rails 4.2.3 and Nokogiri to get data from a web site. I want to perform an action when I don’t get any response from the server, so I have:
begin
content = open(url).read
if content.lstrip[0] == '<'
doc = Nokogiri::HTML(content)
else
begin
json = JSON.parse(content)
rescue JSON::ParserError => e
content
end
end
rescue Net::OpenTimeout => e
attempts = attempts + 1
if attempts <= max_attempts
sleep(3)
retry
end
end
Note that this is different than getting a 500 from the server. I only want to retry when I get no response at all, either because I get no TCP connection or because the server fails to respond (or some other reason that causes me not to get any response). Is there a more generic way to take account of this situation other than how I have it? I feel like there are a lot of other exception types I’m not thinking of.
This is a generic sample showing how to define timeout durations for the HTTP connection and retry several times if an error occurs while fetching the content:
require 'open-uri'
require 'nokogiri'
url = "http://localhost:3000/r503"
openuri_params = {
# set timeout durations for HTTP connection
# default values for open_timeout and read_timeout are 60 seconds
:open_timeout => 1,
:read_timeout => 1,
}
attempt_count = 0
max_attempts = 3
begin
attempt_count += 1
puts "attempt ##{attempt_count}"
content = open(url, openuri_params).read
rescue OpenURI::HTTPError => e
# it's 404, etc. (do nothing)
rescue SocketError, Net::OpenTimeout, Net::ReadTimeout => e
# server can't be reached or doesn't send any response
puts "error: #{e}"
sleep 3
retry if attempt_count < max_attempts
else
# connection was successful,
# content is fetched,
# so here we can parse content with Nokogiri,
# or call a helper method, etc.
doc = Nokogiri::HTML(content)
p doc
end
When it comes to rescuing exceptions, you should aim to have a clear understanding of:
Which lines in your system can raise exceptions
What is going on under the hood when those lines of code run
What specific exceptions could be raised by the underlying code
In your code, the line that's fetching the content is also the one that could see network errors:
content = open(url).read
If you go to the documentation for the OpenURI module you'll see that it uses Net::HTTP & friends to get the content of arbitrary URIs.
Figuring out what Net::HTTP can raise is actually very complicated but, thankfully, others have already done this work for you. Thoughtbot's suspenders project has lists of common network errors that you can use. Notice that some of those errors have to do with different network conditions than what you had in mind, like the connection being reset. I think it's worth rescuing those as well, but feel free to trim the list down to your specific needs.
So here's what your code should look like (skipping the Nokogiri and JSON parts to simplify things a bit):
require 'net/http'
require 'open-uri'
HTTP_ERRORS = [
EOFError,
Errno::ECONNRESET,
Errno::EINVAL,
Net::HTTPBadResponse,
Net::HTTPHeaderSyntaxError,
Net::ProtocolError,
Timeout::Error,
]
MAX_RETRIES = 3
attempts = 0
begin
content = open(url).read
rescue *HTTP_ERRORS => e
if attempts < MAX_RETRIES
attempts += 1
sleep(2)
retry
else
raise e
end
end
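If you need the same retry logic in more than one place, it can be pulled into a small helper. This is only a sketch: with_retries and its keyword arguments are names invented here, not part of open-uri or Net::HTTP, and the failing block simulates a flaky network call so the example runs offline.

```ruby
require 'timeout'

# Hypothetical helper: run the block, retrying when one of `errors` is raised.
# After `max_retries` retries the last exception is re-raised to the caller.
def with_retries(errors, max_retries: 3, delay: 0)
  attempts = 0
  begin
    yield
  rescue *errors
    raise if attempts >= max_retries
    attempts += 1
    sleep(delay)
    retry
  end
end

# Simulated flaky call: fails twice with a timeout, then succeeds.
calls = 0
result = with_retries([Timeout::Error, EOFError], max_retries: 3) do
  calls += 1
  raise Timeout::Error, 'simulated timeout' if calls < 3
  'content'
end
```

In real code the block body would be the open(url).read call and the error list would be HTTP_ERRORS from above.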
I would think about using a Timeout that raises an exception after a short period:
require 'timeout'
MAX_RESPONSE_TIME = 2 # seconds
max_attempts = 3
attempts = 0
begin
content = nil # needs to be defined before the following block
Timeout.timeout(MAX_RESPONSE_TIME) do
content = open(url).read
end
# parsing `content`
rescue Timeout::Error => e
attempts += 1
if attempts <= max_attempts
sleep(3)
retry
end
end

In Ruby/Sinatra, how to halt with an ERB template and error message

In my Sinatra project, I'd like to be able to halt with both an error code and an error message:
halt 403, "Message!"
I want this, in turn, to be rendered in an error page template (using ERB). For example:
error 403 do
erb :"errors/error", :locals => {:message => env['sinatra.error'].message}
end
However, env['sinatra.error'].message (which is what the README and every website I found say to use) apparently does not expose the message I've provided. (When run, this code fails with the error undefined method `message' for nil:NilClass.)
I've searched for 4-5 hours and experimented with everything and I can't figure out where the message is exposed for me to render via ERB! Does anyone know where it is?
(It seems like the only alternative I can think of is writing this instead of the halt code above, every time I would like to halt:
halt 403, erb(:"errors/error", :locals => {m: "Message!"})
This code works. But this is a messy solution since it involves hardcoding the location of the error ERB file.)
(If you were wondering, this problem is not related to the show_exceptions configuration flag because both set :show_exceptions, false and set :show_exceptions, :after_handler make no difference.)
Why doesn't it work − use the source!
Let's look at the Sinatra source code to see why this doesn't work. The main Sinatra file (lib/sinatra/base.rb) is just 2043 lines long, and pretty readable code!
All halt does is:
def halt(*response)
response = response.first if response.length == 1
throw :halt, response
end
And exceptions are caught with:
# Dispatch a request with error handling.
def dispatch!
invoke do
static! if settings.static? && (request.get? || request.head?)
filter! :before
route!
end
rescue ::Exception => boom
invoke { handle_exception!(boom) }
[..]
end
def handle_exception!(boom)
@env['sinatra.error'] = boom
[..]
end
But for some reason this code is never run (as tested with basic "printf-debugging"). This is because in invoke the block is run like:
# Run the block with 'throw :halt' support and apply result to the response.
def invoke
res = catch(:halt) { yield }
res = [res] if Fixnum === res or String === res
if Array === res and Fixnum === res.first
res = res.dup
status(res.shift)
body(res.pop)
headers(*res)
elsif res.respond_to? :each
body res
end
nil # avoid double setting the same response tuple twice
end
Notice the catch(:halt) here. The if Array === res and Fixnum === res.first part is what halt sets and how the response body and status code are set.
The error 403 { .. } block is run in call!:
invoke { error_block!(response.status) } unless @env['sinatra.error']
So now we understand why this doesn't work, we can look for solutions ;-)
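The mechanism can be reproduced in plain Ruby without Sinatra. The sketch below is a simplified stand-in, not Sinatra's actual code: mini_dispatch is a made-up name that combines a catch(:halt) with a rescue, to show that a throw never reaches the rescue clause and so never produces an exception object to store.

```ruby
# Hypothetical, heavily simplified stand-in for Sinatra's invoke/dispatch! pair.
def mini_dispatch
  # throw :halt lands here and becomes the return value of catch.
  result = catch(:halt) { yield }
  { result: result, error: nil }
rescue StandardError => e
  # Only real exceptions reach this branch.
  { result: nil, error: e }
end

halted = mini_dispatch { throw :halt, [403, 'Message!'] }
raised = mini_dispatch { raise ArgumentError, 'Message!' }

p halted[:error]          # no exception object exists for a throw
p raised[:error].message  # the message is available only for a real exception
```

With throw, the message travels as part of the response tuple. With raise, it travels on the exception object, which is the only thing an error handler can read from env['sinatra.error'].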
So can I use halt some way?
Not as far as I can see. If you look at the body of the invoke method, you'll see that the body is always set when using halt. You don't want this, since you want to override the response body.
Solution
Use a "real" exception and not the halt "pseudo-exception". Sinatra doesn't seem to come with pre-defined exceptions, but handle_exception! does look at http_status to set the correct HTTP status:
if boom.respond_to? :http_status
status(boom.http_status)
elsif settings.use_code? and boom.respond_to? :code and boom.code.between? 400, 599
status(boom.code)
else
status(500)
end
So you could use something like this:
require 'sinatra'
class PermissionDenied < StandardError
def http_status; 403 end
end
get '/error' do
#halt 403, 'My special message to you!'
raise PermissionDenied, 'My special message to you!'
end
error 403 do
'Error message -> ' + @env['sinatra.error'].message
end
Which works as expected (the output is Error message -> My special message to you!). You can return an ERB template here.
In Sinatra v2.0.7+, messages passed to halt are stored in the body of the response. So a halt with an error code and an error message (eg: halt 403, "Message!") can be caught and rendered in an error page template with:
error 403 do
erb :"errors/error", locals: { message: body[0] }
end

Skip an HTTP request if the response is taking too long with ruby

I have an array of urls. I'm going through each one, sending a get request and printing the response code. Here is part of the code:
arr.each do |url|
res = Faraday.get(link.href)
p res.status
end
However, sometimes when I get to a URL, the request times out and the script crashes. Is there a way to tell Ruby "if I don't get a response in a certain amount of time, skip to the next URL"?
You could add a timeout like this:
require 'timeout'
arr.each do |url|
begin
Timeout.timeout(5) do # a timeout of five seconds
res = Faraday.get(link.href)
p res.status
end
rescue Timeout::Error
# handle error: show user a message?
end
end
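The skip-on-timeout pattern can be tried without any network access. In this sketch fake_fetch is a made-up stand-in for Faraday.get that just sleeps, and the delays and the 0.2 second limit are arbitrary values chosen so the example finishes quickly.

```ruby
require 'timeout'

# Hypothetical stand-in for an HTTP call: sleeps for `delay` seconds, then returns a status.
def fake_fetch(delay)
  sleep(delay)
  200
end

# The second "request" is slower than the limit and gets skipped.
statuses = [0, 1, 0].map do |delay|
  begin
    Timeout.timeout(0.2) { fake_fetch(delay) }
  rescue Timeout::Error
    :skipped
  end
end

p statuses
```

The loop keeps going after the slow entry, which is the behaviour the question asks for.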

ruby net/http `read_body': Net::HTTPOK#read_body called twice (IOError)

I'm getting read_body called twice (IOError) using the net/http library. I'm trying to download files and use http sessions efficiently. Looking for some help or advice to fix my issues. From my debug message it appears when I log the response code, readbody=true. Is that why read_body is read twice when I try to write the large file in chunks?
D, [2015-04-12T21:17:46.954928 #24741] DEBUG -- : #<Net::HTTPOK 200 OK readbody=true>
I, [2015-04-12T21:17:46.955060 #24741] INFO -- : file found at http://hidden:8080/job/project/1/maven-repository/repository/org/project/service/1/service-1.zip.md5
/usr/lib/ruby/2.2.0/net/http/response.rb:195:in `read_body': Net::HTTPOK#read_body called twice (IOError)
from ./deploy_application.rb:36:in `block in get_file'
from ./deploy_application.rb:35:in `open'
from ./deploy_application.rb:35:in `get_file'
from ./deploy_application.rb:59:in `block in <main>'
from ./deploy_application.rb:58:in `each'
from ./deploy_application.rb:58:in `<main>'
require 'net/http'
require 'logger'
STAMP = Time.now.utc.to_i
@log = Logger.new(STDOUT)
# project , build, service remove variables above
project = "project"
build = "1"
service = "service"
version = "1"
BASE_URI = URI("http://hidden:8080/job/#{project}/#{build}/maven-repository/repository/org/#{service}/#{version}/")
# file pattern for application is zip / jar. Hopefully the lib in the zipfile is acceptable.
# example for module download /#{service}/#{version}.zip /#{service}/#{version}.zip.md5 /#{service}/#{version}.jar /#{service}/#{version}.jar.md5
def clean_exit(code)
# remove temp files on exit
end
def get_file(file)
puts BASE_URI
uri = URI.join(BASE_URI,file)
@log.debug(uri)
request = Net::HTTP::Get.new uri #.request_uri
@log.debug(request)
response = @http.request request
@log.debug(response)
case response
when Net::HTTPOK
size = 0
progress = 0
total = response.header["Content-Length"].to_i
@log.info("file found at #{uri}")
# need to handle file open error
Dir.mkdir "/tmp/#{STAMP}"
File.open "/tmp/#{STAMP}/#{file}", 'wb' do |io|
response.read_body do |chunk|
size += chunk.size
new_progress = (size * 100) / total
unless new_progress == progress
@log.info("\rDownloading %s (%3d%%) " % [file, new_progress])
end
progress = new_progress
io.write chunk
end
end
when 404
@log.error("maven repository file #{uri} not found")
exit 4
when 500...600
@log.error("error getting #{uri}, server returned #{response.code}")
exit 5
else
@log.error("unknown http response code #{response.code}")
end
end
@http = Net::HTTP.new(BASE_URI.host, BASE_URI.port)
files = [ "#{service}-#{version}.zip.md5", "#{service}-#{version}.jar", "#{service}-#{version}.jar.md5" ].each do |file| #"#{service}-#{version}.zip",
get_file(file)
end
Edit: Revised answer!
Net::HTTP#request, when called without a block, will pre-emptively read the body. The documentation isn't clear about this, but it hints at it by suggesting that the body is not read if a block is passed.
If you want to make the request without reading the body, you'll need to pass a block to the request call, and then read the body from within that. That is, you want something like this:
@http.request request do |response|
# ...
response.read_body do |chunk|
# ...
end
end
This is made clear in the implementation; Response#reading_body will first yield the unread response to a block if given (from #transport_request, which is called from #request), then read the body unconditionally. The block parameter to #request gives you that chance to intercept the response before the body is read.
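Here is a self-contained sketch of the block form that runs offline. The throwaway TCPServer, the 50,000-byte payload and the variable names are assumptions made for the example. Only Net::HTTP#request with a block and Net::HTTPResponse#read_body with a block are the calls discussed above.

```ruby
require 'net/http'
require 'socket'

# Throwaway local server so the sketch does not need a real host.
payload = 'x' * 50_000
server = TCPServer.new('127.0.0.1', 0) # port 0 lets the OS pick a free port
port = server.addr[1]

server_thread = Thread.new do
  client = server.accept
  # Read and discard the request line and headers.
  loop do
    line = client.gets
    break if line.nil? || line == "\r\n"
  end
  client.write("HTTP/1.1 200 OK\r\nContent-Length: #{payload.bytesize}\r\nConnection: close\r\n\r\n")
  client.write(payload)
  client.close
end

received = 0
http = Net::HTTP.new('127.0.0.1', port, nil) # nil disables any proxy from the environment
http.request(Net::HTTP::Get.new('/')) do |response|
  # The body has not been read yet, so read_body may stream it in chunks.
  response.read_body { |chunk| received += chunk.bytesize }
end

server_thread.join
server.close
puts "received #{received} bytes"
```

In the download script, the case statement and the File.open call would move inside the block passed to request.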

Typhoeus Hydra run out of memory

I wrote a script that checks URLs from a file (using the Ruby gem Typhoeus). I don't know why, but memory usage keeps growing while the code runs. The script usually crashes after about 10000 URLs.
Is there any solution for this? Thanks in advance for your help.
My code:
require 'rubygems'
require 'logger'
require 'typhoeus'
def run file
log = Logger.new('log')
hydra = Typhoeus::Hydra.new(:max_concurrency => 30)
hydra.disable_memoization
File.open(file).each do |url|
begin
request = Typhoeus::Request.new(url.strip, :method => :get, :follow_location => true)
request.on_complete do |resp|
check_website(url, resp.body)
end
puts "queuing #{ url }"
hydra.queue(request)
request.destroy
rescue Exception => e
log.error e
end
end
hydra.run
end
One approach might be to adapt your file processing - instead of reading a line from the file and immediately creating the request object, try processing them in batches (say 5000 at a time) and throttle your request rate / memory consumption.
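A rough sketch of the batching idea, using only the standard library: process_batch is a hypothetical placeholder for "queue these URLs on a fresh hydra and run it", and a StringIO stands in for the input file so the example is runnable as-is. Note that each_slice also yields the final, smaller batch when the line count is not a multiple of the batch size.

```ruby
require 'stringio'

# Hypothetical placeholder: in the real script this would build the requests,
# queue them on a new hydra and call hydra.run.
def process_batch(urls)
  urls.map(&:strip)
end

# StringIO stands in for the file; like File.foreach it yields one line at a time,
# so only one batch is held in memory at once.
input = StringIO.new("http://a.example\nhttp://b.example\nhttp://c.example\n")

batches = input.each_line.each_slice(2).map { |batch| process_batch(batch) }
p batches
```

With a real file, File.foreach(file).each_slice(5000) gives the same shape without loading the whole file.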
I've improved my code as you suggested: I now feed URLs to hydra in batches.
Memory usage is now normal, but after about 1000 URLs it just stops fetching new ones. This is very strange: there are no errors and the script is still running, but it doesn't send or receive any new requests. My code:
def run file, concurrency
log = Logger.new('log')
log.info '*** Hydra started ***'
queue = []
File.open(file).each do |uri|
queue << uri
if queue.size == concurrency * 5
hydra = Typhoeus::Hydra.new(:max_concurrency => concurrency)
hydra.disable_memoization
queue.each do |url|
request = Typhoeus::Request.new(url.strip, :method => :get, :follow_location => true, :max_redirections => 2, :timeout => 5000)
request.on_complete do |resp|
check_website(url, resp.body)
puts "#{url} code: #{resp.code} curl_msg #{resp.curl_error_message}"
end
puts "queuing #{url}"
hydra.queue(request)
end
puts 'hydra run'
hydra.run
queue = []
end
end
log.info '*** Hydra finished work ***'
end
