Set a timeout for Resolv when getting the IP address of a domain - Ruby

BTW: I've already taken a look at other answers on SO, but none of them works as expected.
Take a look at the following code:
require 'resolv'

t = Time.now
Resolv::DNS.open do |dns|
  dns.getaddress('thisisaninvaliddomain.com')
end
p Time.now - t
This piece of code takes somewhere between 1.5 and 4.5 seconds to run.
If I add a timeout as other SO answers (like Set a timeout for Ruby Resolv.getaddress(ip)) suggest:
require 'resolv'

Resolv::DNS.open do |dns|
  dns.timeouts = 1
  dns.getaddress('thisisaninvaliddomain.com')
end
it just finishes without issues, but if I use a lower timeout like 0.0001 it fails with the following message:
resolv.rb:379:in 'getaddress': DNS result has no information for thisdomaindoesnotexists.com (Resolv::ResolvError)
If I try with a valid domain (like google.com) it returns the same error instead of a ResolvTimeout, so I have no way to tell whether the domain doesn't exist or the lookup timed out.
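The only workaround I can think of is a timing heuristic like the sketch below (the 1-second timeout is an arbitrary example value): since getaddress raises Resolv::ResolvError in both cases, I compare the elapsed time against the configured timeout to guess whether the lookup timed out or the domain simply doesn't exist.
require 'resolv'

TIMEOUT = 1 # seconds; arbitrary example value
started = Time.now
begin
  Resolv::DNS.open do |dns|
    dns.timeouts = TIMEOUT
    dns.getaddress('thisisaninvaliddomain.com')
  end
rescue Resolv::ResolvError => e
  # Heuristic only: if we waited at least as long as the configured timeout,
  # assume the lookup timed out rather than the name not existing.
  if Time.now - started >= TIMEOUT
    puts "likely a timeout: #{e.message}"
  else
    puts "likely a nonexistent domain: #{e.message}"
  end
end
But that is obviously fragile, so I'd like to understand the behaviour itself.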
Any ideas what's going on here?

Related

Unable to read cookie in Rspec 3 via last_response

I am trying to read, in RSpec 3.1, a cookie received after a get call.
I can see it is returned, but last_response.cookies doesn't exist.
How can I read the response's cookie?
it "doesn't signs in" do
get '/ui/pages/Home'
puts last_response.cookies
end
I know it has been a while, but I'm facing exactly this same issue now and, after some struggle, I found an article here that shares an interesting approach. As I also couldn't find any native method that parses the cookies, this has worked fine for me.
Basically, place this piece of code below on your spec/spec_helper.rb:
# Parses the response's Set-Cookie header into a hash of
# Rack::Test::Cookie objects keyed by cookie name.
def cookies_from_response(response = last_response)
  Hash[response["Set-Cookie"].lines.map { |line|
    cookie = Rack::Test::Cookie.new(line.chomp)
    [cookie.name, cookie]
  }]
end
and you could use this to see the parsed hash:
puts cookies_from_response
For a cookie's value check, you could then use something like:
# Given your cookie name is 'foo' and the content is 'bar'
expect(cookies_from_response['foo'].value).to eq 'bar'
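Putting it together in a single spec might look like this (the cookie name 'foo' and value 'bar' are placeholders, as above):
it "sets the expected cookie" do
  get '/ui/pages/Home'
  cookies = cookies_from_response
  expect(cookies['foo'].value).to eq 'bar'
end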
Hopefully this becomes helpful to others facing similar issues.

Setting an HTTP Timeout in Ruby 1.9.3

I'm using Ruby 1.9.3 and need to GET a URL. I have this working with Net::HTTP; however, if the site is down, Net::HTTP ends up hanging.
While searching the internet, I've seen many people face similar problems, all with hacky solutions. However, many of those posts are quite old.
Requirements:
I'd prefer using Net::HTTP to installing a new gem.
I need both the Body and the Response Code. (e.g. 200)
I do not want to require open-uri, since that makes global changes and raises some security issues.
I need to GET a URL within X seconds, or return error.
Using Ruby 1.9.3, how can I GET a URL while setting a timeout?
To clarify, my existing code looks like:
Net::HTTP.get_response(URI.parse(url))
Trying to add:
Net::HTTP.open_timeout(1000)
Results in:
NoMethodError: undefined method `open_timeout' for Net::HTTP:Class
You can set the open_timeout attribute of the Net::HTTP object before making the connection (note that Net::HTTP.new does not take a block, and the timeout value is in seconds):
uri = URI.parse(url)
http = Net::HTTP.new(uri.hostname, uri.port)
http.open_timeout = 10 # seconds
response = http.request_get(uri.request_uri)
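A slightly fuller sketch of the same idea, assuming Ruby 1.9.3 where both the connect and read timeouts surface as Timeout::Error (the fetch_with_timeout name is just for illustration):
require 'net/http'
require 'uri'

# Returns [status_code, body], or nil if the request timed out.
def fetch_with_timeout(url, seconds = 5)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.hostname, uri.port)
  http.open_timeout = seconds # time allowed for opening the TCP connection
  http.read_timeout = seconds # time allowed for each read of the response
  response = http.request_get(uri.request_uri)
  [response.code, response.body]
rescue Timeout::Error
  nil
end
This keeps both the body and the response code available, per the requirements above.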
I tried all the solutions here and on the other questions about this problem, but I only got everything right with the following code. The open-uri library is a wrapper around Net::HTTP.
I needed a GET that had to wait longer than the default timeout and then read the response. The code is also simpler:
require 'open-uri'

# :read_timeout is given in seconds; 'log' is assumed to be a logging
# helper defined elsewhere in the original code.
open(url, :read_timeout => 5 * 60) do |response|
  if response.read[/Return: Ok/i]
    log "sending ok"
  else
    raise "error sending, no confirmation received"
  end
end
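If you also need the response code with this open-uri approach, the object yielded to the block is extended with OpenURI::Meta, which exposes the status line:
open(url, :read_timeout => 5 * 60) do |response|
  code, message = response.status # e.g. ["200", "OK"]
  puts "#{code} #{message}"
end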

Ruby and Timeout.timeout performance issue

I'm not sure how to solve this big performance issue in my application. I'm using open-uri to request the most popular videos from YouTube, and when I ran perftools (https://github.com/tmm1/perftools.rb) it showed that the biggest performance issue is Timeout.timeout. Can anyone suggest how to solve the problem?
I'm using Ruby 1.8.7.
Edit:
This is the output from my profiler
https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0B4bANr--YcONZDRlMmFhZjQtYzIyOS00YjZjLWFlMGUtMTQyNzU5ZmYzZTU4&hl=en_US
Timeout is wrapping the function that is actually doing the work to ensure that if the server fails to respond within a certain time, the code will raise an error and stop execution.
I suspect that what you are seeing is that the server is taking some time to respond. You should look at caching the response in some way.
For instance, using memcached (pseudocode):
require 'dalli'
require 'open-uri'

DALLI = Dalli::Client.new

class PopularVideos
  def self.get
    result = []
    # only hit YouTube when today's result isn't cached yet
    unless result = DALLI.get("videos_#{Date.today.to_s}")
      doc = open("http://youtube/url")
      result = parse_videos(doc) # parse the doc somehow
      DALLI.set("videos_#{Date.today.to_s}", result)
    end
    result
  end
end

PopularVideos.get # calls your expensive parsing script once
PopularVideos.get # gets the result from memcached for the rest of the day
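Since the cache key embeds the date, a fresh entry is built once per day automatically. If you would rather have an explicit expiry, Dalli's set also accepts a TTL in seconds as a third argument; inside self.get that would look like (still a sketch):
DALLI.set("videos_#{Date.today.to_s}", result, 24 * 60 * 60) # third argument is the TTL in seconds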

EventMachine Proxy -- HTTP Proxy mixing up request/response pairs

I have the following code (just as a test) and I want to create an HTTP proxy using EventMachine. The code below is an example from the em-proxy GitHub page. However, when I run this and open up a website that has a moderate amount of images, the images start loading incorrectly. What I mean by this is that some images are loaded twice, or if I request the icon for the navigation bar, I instead get the profile picture. This is especially evident if I refresh the page a few times.
It seems that the responses do not correspond to the matching requests, causing everything to be jumbled. However, I'm not sure why this is. The code below seems simple enough for this not to be a problem.
require 'rubygems'
require 'em-proxy'
require 'http/parser' # gem install http_parser.rb
require 'uuid' # gem install uuid
# > ruby em-proxy-http.rb
# > curl --proxy localhost:9889 www.google.com
host = "0.0.0.0"
port = 9889
puts "listening on #{host}:#{port}..."
Proxy.start(:host => host, :port => port) do |conn|
  @p = Http::Parser.new
  @p.on_headers_complete = proc do |h|
    session = UUID.generate
    puts "New session: #{session} (#{h.inspect})"
    host, port = h['Host'].split(':')
    conn.server session, :host => host, :port => (port || 80)
    conn.relay_to_servers @buffer
    @buffer = ''
  end

  @buffer = ''

  conn.on_connect do |data, b|
    puts [:on_connect, data, b].inspect
  end

  conn.on_data do |data|
    @buffer << data
    @p << data
    data
  end

  conn.on_response do |backend, resp|
    # puts [:on_response, backend, resp].inspect
    resp
  end

  conn.on_finish do |backend, name|
    puts [:on_finish, name].inspect
  end
end
Update
I believe I have some insight into what is happening, but still no way of solving my problem. I am creating a server for each request, so when I relay my requests I have multiple servers. In on_response I should only return the response if it comes from the correct server; however, I don't have a way to correlate this yet.
Here's a proper response:
Try removing every puts in the example so the main loop can concentrate on doing the actual network I/O; it works for me like that.
I think there may be some kind of timeout at play behind this: maybe the client does not wait long enough for the full answer to come back while the server is stuck outputting text to the console.
That's the downside of using an event reactor: you have to make sure nothing blocks it.
The code doesn't seem to account for persistent HTTP connections. Maybe you could try an HTTP 1.0 browser.
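One low-effort way to act on the first suggestion without deleting the output entirely is to route it through a helper that stays silent unless a debug flag is set, so the reactor loop isn't blocked by console writes during normal operation (a sketch; PROXY_DEBUG and debug_log are made-up names):
DEBUG = ENV['PROXY_DEBUG'] == '1'

def debug_log(*args)
  puts args.inspect if DEBUG
end
Each puts inside the Proxy.start block can then be replaced with debug_log.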

Scrubyt gives 404 Error when clicking link using _details method

This might be a similar problem to my earlier two questions - see here and here - but I'm trying to use the _detail command to automatically click the link so I can scrape the details page for each individual event.
The code I'm using is:
require 'rubygems'
require 'scrubyt'

nuffield_data = Scrubyt::Extractor.define do
  fetch 'http://www.nuffieldtheatre.co.uk/cn/events/event_listings.php'
  event do
    title 'The Coast of Mayo'
    link_url
    event_detail do
      dates "1-4 October"
      times "7:30pm"
    end
  end
  next_page "Next Page", :limit => 20
end

nuffield_data.to_xml.write($stdout, 1)
Is there any way to print out the URL that event_detail is trying to access? The error doesn't seem to give me the URL that returned the 404.
Update: I think the link may be a relative link - could this be causing problems? Any ideas how to deal with that?
I had the same issue with relative links and fixed it like this: you have to set the :resolve param to the correct base URL.
event do
  title 'The Coast of Mayo'
  link_url
  event_detail :resolve => 'http://www.nuffieldtheatre.co.uk/cn/events' do
    dates "1-4 October"
    times "7:30pm"
  end
end
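To answer the earlier question about printing the resolved URL: URI.join shows roughly what :resolve does for you (the relative href below is hypothetical):
require 'uri'

base = 'http://www.nuffieldtheatre.co.uk/cn/events/event_listings.php'
relative = 'event_details.php?id=123' # hypothetical relative link from the listing page
puts URI.join(base, relative)
# => http://www.nuffieldtheatre.co.uk/cn/events/event_details.php?id=123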
sudo gem install ruby-debug
This will give you access to a nice Ruby debugger. Start the debugger by altering your script:
require 'rubygems'
require 'ruby-debug'
Debugger.start
Debugger.settings[:autoeval] = true if Debugger.respond_to?(:settings)
require 'scrubyt'

nuffield_data = Scrubyt::Extractor.define do
  fetch 'http://www.nuffieldtheatre.co.uk/cn/events/event_listings.php'
  event do
    title 'The Coast of Mayo'
    link_url
    event_detail do
      dates "1-4 October"
      times "7:30pm"
    end
  end
  next_page "Next Page", :limit => 2
end

nuffield_data.to_xml.write($stdout, 1)
Then find out where scrubyt is throwing an exception - in this case:
/Library/Ruby/Gems/1.8/gems/scrubyt-0.3.4/lib/scrubyt/core/navigation/fetch_action.rb:52:in `fetch'
Find the scrubyt gem on your system, and add a rescue clause to the method in question so that the end of the method looks like this:
  if @@current_doc_protocol == 'file'
    @@hpricot_doc = Hpricot(PreFilterDocument.br_to_newline(open(@@current_doc_url).read))
  else
    @@hpricot_doc = Hpricot(PreFilterDocument.br_to_newline(@@mechanize_doc.body))
    store_host_name(self.get_current_doc_url) # in case we're on a new host
  end
rescue
  debugger
  self # the self is here because debugger doesn't like being at the end of a method
end
Now run the script again and you should be dropped into the debugger when the exception is raised. Just try typing this at the debug prompt to see what the offending URL is:
@@current_doc_url
You can also add a debugger statement anywhere in that method if you want to check what is going on - for example, you may want to add one between lines 51 and 52 of this method to check how the URL being called changes and why.
This is basically how I figured out the answer to your previous questions.
Good luck.
Sorry, I have no idea why this would be nil - every time I have run this it returns a URL. The method self.fetch requires a URL, which you should be able to access as the local variable doc_url. If this also returns nil, maybe you should post the code where you have included the debugger call.
I've tried to access doc_url but that seems to also return nil. When I have access to my server (later in the day) I'll post the code with the debugging bit in it.
