How to properly close an HTTP connection in Ruby

I've been looking for a proper way to close an HTTP connection and have found nothing yet.
require 'net/http'
require 'uri'
uri = URI.parse("http://www.google.com")
http = Net::HTTP.new(uri.host, uri.port)
request = Net::HTTP::Get.new("/")
http.start
resp = http.request request
p resp.body
So far so good, but when I try to close the connection by guessing which method to use, both attempts fail:
http.shutdown # error
http.close # error
When I check ri Net::HTTP.start, the documentation explicitly says:
the caller is responsible for closing it (the connection) upon completion
I don't want to use the block form.

The method to close the connection is http.finish.
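A minimal sketch of the non-block form, assuming a hypothetical helper named fetch_body (the name and its arguments are illustrative, not from the question). It opens the connection with start and uses ensure so that finish runs even when the request raises:

```ruby
require 'net/http'

# Hypothetical helper: fetches +path+ from host:port without the block
# form of start, and always closes the connection afterwards.
def fetch_body(host, port, path = "/")
  http = Net::HTTP.new(host, port)
  http.start # opens the TCP connection
  begin
    http.request(Net::HTTP::Get.new(path)).body
  ensure
    # finish raises IOError if the session is not started,
    # so check started? before closing.
    http.finish if http.started?
  end
end

# Usage (needs network access):
# puts fetch_body("www.google.com", 80, "/")
```

The started? guard matters mostly when the connection may already have been closed elsewhere; calling finish on a connection that is not open raises IOError.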
The net/http API is particularly confusing to use. It is generally easier to use a higher-level library instead, such as HTTParty or REST Client, which will provide a more intuitive API and take care of the lower level details for you.

If the optional block is given, the newly created Net::HTTP object is passed to it and closed when the block finishes.
from ruby-doc.org

Related

Wait for selector to present

When doing web scraping with Nokogiri I occasionally get the following error message
undefined method `at_css' for nil:NilClass (NoMethodError)
I know that the selected element is present at some time, but the site is sometimes a bit slow to respond, and I guess this is the reason why I'm getting the error.
Is there some way to wait until a certain selector is present before proceeding with the script?
My current http request block looks like this
url = URL
body = BODY
uri = URI.parse(url)
http = Net::HTTP.new(uri.host, uri.port)
http.read_timeout = 200 # default 60 seconds
http.open_timeout = 200 # default nil
http.use_ssl = true
request = Net::HTTP::Post.new(uri.request_uri)
request.body = body
request["Content-Type"] = "application/x-www-form-urlencoded"
begin
  response = http.request(request)
  doc = Nokogiri::HTML(response.body)
rescue
  sleep 100
  retry
end
While you can use a streaming Net::HTTP like @Stefan says in his comment, with an associated handler that includes Nokogiri, you can't parse a partial document using a DOM model, which is Nokogiri's default, because a DOM parser expects the full document.
You could use Nokogiri's SAX parser, but that's an entirely different programming style.
If you're retrieving an entire page, then use OpenURI instead of the lower-level Net::HTTP. It automatically handles a number of things that Net::HTTP will not do by default, such as redirection, which makes it a lot easier to retrieve pages and will greatly simplify your code.
I suspect the problem is either that the site is timing out, or the tag you're trying to find is dynamically loaded after the real page loads.
If it's timing out you'll need to increase your wait time.
If it's dynamically loading that markup, you can request the main page, locate the appropriate URL for the dynamic content and load it separately. Once you have it, you can either insert it into the first page if you need everything, or just parse it separately.
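If the cause is a slow or flaky response, one option is to retry a bounded number of times rather than forever. The following is only a sketch: with_retries is a hypothetical helper, and the selector in the usage comment is made up. It treats a nil result (for example, at_css finding nothing) the same as an exception:

```ruby
# Hypothetical helper: runs the block, retrying up to +attempts+ times
# when it raises or returns nil, sleeping +delay+ seconds in between.
# The block receives the current attempt number.
def with_retries(attempts: 3, delay: 1)
  tries = 0
  begin
    tries += 1
    result = yield(tries)
    raise "block returned nil" if result.nil?
    result
  rescue StandardError
    raise if tries >= attempts # give up and re-raise the last error
    sleep delay
    retry
  end
end

# Usage with the request from the question (selector is an assumption):
# node = with_retries(attempts: 5, delay: 2) do
#   doc = Nokogiri::HTML(http.request(request).body)
#   doc.at_css("div.results") # nil when the element is missing
# end
```

Note that retrying will not help if the element is added by JavaScript after the page loads, since Net::HTTP never runs scripts.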

Can I reference a complete Ruby Net::HTTP request as a string before sending?

I'm using Net::HTTP in Ruby 1.9.2p290 to handle some, obviously, networking calls.
I now have a need to see the complete request that is sent to the server (as one long String conforming to HTTP 1.0/1.1).
In other words, I want Net::HTTP to handle the heavy lifting of generating the HTTP standard-compliant request+body, but I want to send the string with a custom delivery mechanism.
Net::HTTPRequest doesn't seem to have any helpful methods here -- do I need to go lower down the stack and hijack something?
Does anyone know of a good library, maybe other than Net::HTTP, that could help?
EDIT: I'd also like to do the same going the other way (turning a string response into Net::HTTP::*), although it seems I may be able to instantiate Net::HTTPResponse by myself?
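Request:
require 'net/http'
require 'stringio'
post = Net::HTTP::Post.new('http://google.com')
post.set_form_data :query => 'ruby http'
sio = StringIO.new
# exec is an internal (nodoc) method. Newer Ruby versions call
# continue_timeout on the socket, so wrap the StringIO in a Net::BufferedIO.
post.exec Net::BufferedIO.new(sio), Net::HTTP::HTTPVersion, post.path
puts sio.string
Response:
# The header section must end with a blank line, or read_new raises EOFError.
si = StringIO.new("HTTP/1.1 200 OK\r\n\r\n")
bio = Net::BufferedIO.new(si)
Net::HTTPResponse.read_new(bio)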

How do I make an uber-simple API wrapper in Ruby?

I'm trying to use a super simple API from is.gd:
http://is.gd/api.php?longurl=http://www.example.com
Which returns a response header "HTTP/1.1 200 OK" if the URL was shortened as expected, or "HTTP/1.1 500 Internal Server Error" if there was any problem that prevented this. Assuming the request was successful, the body of the response will contain only the new shortened URL.
I don't even know where to begin or if there are any available ruby methods to make sending and receiving of these API requests frictionless. I basically want to assign the response (the shortened url) to a ruby object.
How would you do this? Thanks in advance.
Super simple:
require 'open-uri'
def shorten(url)
  # On Ruby 3.0 and later, Kernel#open no longer opens URLs; URI.open does (Ruby 2.5+).
  URI.open("http://is.gd/api.php?longurl=#{url}").read
rescue
  nil
end
open-uri is part of the Ruby standard library and (among other things) makes it possible to do HTTP requests with an open method modeled on the one that opens files. It returns an IO-like object, and calling read on it returns the body. open-uri raises an exception (OpenURI::HTTPError) if the server returns a 500 error; here I'm catching the exception and returning nil, but if you want you can let the exception bubble up to the caller, or raise another exception.
Oh, and you would use it like this:
url = "http://www.example.com"
puts "The short version of #{url} is #{shorten(url)}"
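Since the API signals success or failure through the status code, a variant that checks the status explicitly may be easier to reason about than a bare rescue. This is a sketch, not a tested client: shortener_uri and shorten_url are hypothetical names, the endpoint is the one quoted in the question (it may no longer behave as described), and the long URL is escaped so that its own query string survives:

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds the request URI, escaping the long URL so
# characters such as & and ? do not break the query string.
def shortener_uri(long_url)
  uri = URI("http://is.gd/api.php")
  uri.query = URI.encode_www_form(longurl: long_url)
  uri
end

# Hypothetical helper: returns the shortened URL on a 2xx response,
# or nil for anything else (such as the 500 the API documents).
def shorten_url(long_url)
  response = Net::HTTP.get_response(shortener_uri(long_url))
  response.is_a?(Net::HTTPSuccess) ? response.body.strip : nil
end

# Usage (needs network access):
# puts shorten_url("http://www.example.com")
```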
I know you already got an answer you accepted, but I still want to mention httparty because I've had very good experiences wrapping APIs (Delicious and Github) with it.

Ruby's open-uri and cookies

I would like to store the cookies from one open-uri call and pass them to the next one. I can't seem to find the right docs for doing this. I'd appreciate it if you could tell me the right way to do this.
NOTES: w3.org is not the actual url, but it's shorter; pretend cookies matter here.
h1 = open("http://www.w3.org/")
h2 = open("http://www.w3.org/People/Berners-Lee/", "Cookie" => h1.FixThisSpot)
Update after 2 nays: While this wasn't intended as a rhetorical question, I guarantee that it's possible.
Update after tumbleweeds: See (the answer), it's possible. Took me a good while, but it works.
I thought someone would just know, but I guess it's not commonly done with open-uri.
Here's the ugly version that neither checks for privacy, expiration, the correct domain, nor the correct path:
h1 = open("http://www.w3.org/")
h2 = open("http://www.w3.org/People/Berners-Lee/",
"Cookie" => h1.meta['set-cookie'].split('; ',2)[0])
Yes, it works. No it's not pretty, nor fully compliant with recommendations, nor does it handle multiple cookies (as is).
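For the multiple-cookie case, here is a sketch that assumes the Set-Cookie values are available as an array, for example from get_fields('set-cookie') on a Net::HTTP response. The helper name cookie_header is hypothetical, and like the version above it ignores expiration, domain, and path:

```ruby
# Hypothetical helper: takes the raw Set-Cookie header values (one string
# per header) and builds a Cookie request header from the name=value part
# of each, dropping attributes such as Path or Expires.
def cookie_header(set_cookie_values)
  Array(set_cookie_values)
    .map { |value| value.split(';', 2).first.strip }
    .join('; ')
end

# Usage with a Net::HTTP response (resp is assumed to exist):
# headers = { "Cookie" => cookie_header(resp.get_fields('set-cookie')) }
```

Array() turns a nil result (no Set-Cookie header at all) into an empty list, so the helper returns an empty string in that case.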
Clearly, HTTP is a very straight-forward protocol, and open-uri lets you at most of it. I guess what I really needed to know was how to get the cookie from the h1 request so that it could be passed to the h2 request (that part I already knew and showed). The surprising thing here is how many people basically felt like answering by telling me not to use open-uri, and only one of those showed how to get a cookie set in one request passed to the next request.
You need to add a "Cookie" header.
I'm not sure if open-uri can do this or not, but it can be done using Net::HTTP.
# Create a new connection object.
conn = Net::HTTP.new(site, port)
# Get the response when we log in, to set the cookie.
# body is the encoded arguments to log in.
# post and get return the response object; the body is resp.body.
resp = conn.post(login_path, body, {})
cookie = resp['set-cookie']
# Headers need to be in a hash.
headers = { "Cookie" => cookie }
# On a get, we don't need a body.
resp = conn.get(path, headers)
Thanks Matthew Schinckel, your answer was really useful. Using Net::HTTP I was successful:
# Create a new connection object.
site = "google.com"
port = 80
conn = Net::HTTP.new(site, port)
# Get the response when we log in, to set the cookie.
# body is the encoded arguments to log in.
# post and get return the response object; the body is resp.body.
resp = conn.post(login_path, body, {})
cookie = resp['set-cookie']
# Headers need to be in a hash.
headers = { "Cookie" => cookie }
# On a get, we don't need a body.
resp = conn.get(path, headers)
puts resp.body
Depending on what you are trying to accomplish, check out webrat. I know it is usually used for testing, but it can also hit live sites, and it does a lot of the stuff that your web browser would do for you, like store cookies between requests and follow redirects.
You would have to roll your own cookie support by parsing the meta headers when reading and adding a Cookie header when submitting a request if you are using open-uri. Consider using httpclient (http://raa.ruby-lang.org/project/httpclient/) or something like mechanize (http://mechanize.rubyforge.org/) instead, as they have cookie support built in.
For those who want standards-compliant cookie handling, there is an RFC 2109 and RFC 2965 cookie jar implementation here:
https://github.com/dwaite/cookiejar

How do I read only x number of bytes of the body using Net::HTTP?

It seems like the methods of Ruby's Net::HTTP are all or nothing when it comes to reading the body of a web page. How can I read, say, just the first 100 bytes of the body?
I am trying to read from a content server that returns a short error message in the body of the response if the file requested isn't available. I need to read enough of the body to determine whether the file is there. The files are huge, so I don't want to get the whole body just to check if the file is available.
This is an old thread, but the question of how to read only a portion of a file via HTTP in Ruby is still a mostly unanswered one according to my research. Here's a solution I came up with by monkey-patching Net::HTTP a bit:
require 'net/http'
# provide access to the actual socket
class Net::HTTPResponse
  attr_reader :socket
end
uri = URI("http://www.example.com/path/to/file")
begin
  Net::HTTP.start(uri.host, uri.port) do |http|
    request = Net::HTTP::Get.new(uri.request_uri)
    # calling request with a block prevents body from being read
    http.request(request) do |response|
      # do whatever limited reading you want to do with the socket
      x = response.socket.read(100)
      # be sure to call finish before exiting the block
      http.finish
    end
  end
rescue IOError
  # ignore
end
The rescue catches the IOError that's thrown when you call HTTP.finish prematurely.
FYI, the socket within the HTTPResponse object isn't a true IO object (it's an internal class called BufferedIO), but it's pretty easy to monkey-patch that, too, to mimic the IO methods you need. For example, another library I was using (exifr) needed the readchar method, which was easy to add:
class Net::BufferedIO
  def readchar
    read(1)[0].ord
  end
end
Shouldn't you just use an HTTP HEAD request (Ruby's Net::HTTP::Head class, or Net::HTTP#head) to see if the resource is there, and only proceed if you get a 2xx or 3xx response? This presumes your server is configured to return a 4xx error code if the document is not available. I would argue this is the correct solution.
An alternative is to request the HTTP head and look at the Content-Length header value in the result: if your server is correctly configured, you should easily be able to tell the difference in length between a short message and a long document. Another alternative: set the Range header field in the request to ask for only the first bytes (which again assumes that the server is behaving correctly WRT the HTTP spec).
I don't think that solving the problem in the client after you've sent the GET request is the way to go: by that time, the network has done the heavy lifting, and you won't really save any wasted resources.
Reference: http header definitions
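The HEAD approach might look like the sketch below. head_status is a hypothetical helper; it assumes the server answers HEAD requests and reports a meaningful status and Content-Length, which the question does not confirm:

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: sends a HEAD request and returns
# [available, content_length], where available is true for 2xx/3xx
# responses and content_length is an Integer or nil.
def head_status(uri)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    response = http.head(uri.request_uri)
    available = response.is_a?(Net::HTTPSuccess) ||
                response.is_a?(Net::HTTPRedirection)
    [available, response.content_length]
  end
end

# Usage (needs network access):
# available, length = head_status(URI("http://www.example.com/path/to/file"))
```

No body is transferred for a HEAD request, so this costs one round trip regardless of the file size.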
I wanted to do this once, and the only thing that I could think of is monkey patching the Net::HTTPResponse#read_body and Net::HTTPResponse#read_body_0 methods to accept a length parameter, and then in the former just pass the length parameter to the read_body_0 method, where you can read only as many as length bytes.
To read the body of an HTTP response in chunks, you'll need to use Net::HTTPResponse#read_body like this:
http.request_get('/large_resource') do |response|
  response.read_body do |segment|
    print segment
  end
end
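Building on the chunked read, a sketch that stops after a given number of bytes without monkey-patching. read_first_bytes is a hypothetical helper and limit is assumed to be a positive integer. It sends a Range header as a hint and also returns early in case the server ignores it. Leaving the method from inside the block matters here: a plain break would only leave read_body, after which Net::HTTP reads the rest of the body anyway.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: returns at most +limit+ bytes of the body at +uri+.
def read_first_bytes(uri, limit)
  buffer = String.new # binary-encoded, mutable
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    # Ask for just the first bytes; servers are free to ignore this.
    request = Net::HTTP::Get.new(uri.request_uri,
                                 "Range" => "bytes=0-#{limit - 1}")
    http.request(request) do |response|
      response.read_body do |segment|
        buffer << segment
        # Returning from the method unwinds through Net::HTTP.start,
        # whose block form closes the connection on the way out.
        return buffer.byteslice(0, limit) if buffer.bytesize >= limit
      end
    end
  end
  buffer.byteslice(0, limit) # body was shorter than limit
end

# Usage (needs network access):
# head = read_first_bytes(URI("http://www.example.com/path/to/file"), 100)
```

Data already buffered by the socket layer may exceed the limit slightly, so this reduces the transfer rather than capping it exactly.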
Are you sure the content server only returns a short error page?
Doesn't it also set the response status to something appropriate like 404? In that case you can call Net::HTTPResponse#value, which raises an exception for any non-2xx response (for a 404 it is Net::HTTPServerException, named Net::HTTPClientException from Ruby 2.6 on).
If you get an error then your file wasn't there; if you get a 200, the file is starting to download and you can close the connection.
You can't. But why do you need to? Surely if the page just says that the file isn't available then it won't be a huge page (i.e. by definition, the file won't be there)?
