Use Ruby to get the content length of URLs

I am trying to write a ruby script that gets some details about files on a website using net/http. My code looks like this:
require 'open-uri'
require 'net/http'
url = URI.parse asset
res = Net::HTTP.start(url.host, url.port) {|http|
http.get(asset)
}
headers = res.to_hash
p headers
I would like to get two pieces of information from this request: the total length of the content inflated, and (as appropriate) the length of the content deflated.
Sometimes, the headers will include a content-length parameter, which appears to be the gzipped length of the content. I can also approximate the inflated size of the content using res.body.length, but this has not been foolproof by any stretch of the imagination. The documentation on net/http says that gzip headers are removed from the list automatically (to help me, gee thanks) so I cannot seem to get a reliable handle on this information.
Any help is appreciated (including other gems if they will do this more easily).

Got it! The "magic" behavior here only occurs if you don't specify your own accept-encoding header. Amended code as follows:
require 'open-uri'
require 'net/http'
require 'date'
require 'zlib'
require 'stringio'
request_headers = { "accept-encoding" => "gzip;q=1.0,deflate;q=0.6,identity;q=0.3" }
url = URI.parse asset
res = Net::HTTP.start(url.host, url.port) {|http|
  http.get(asset, request_headers)
}
headers = res.to_hash
gzipped = headers['content-encoding'] && headers['content-encoding'][0] == "gzip"
content = gzipped ? Zlib::GzipReader.new(StringIO.new(res.body)).read : res.body
# Both sizes are in bytes; content-length arrives as a string, so convert it.
full_length = content.bytesize
compressed_length = (headers["content-length"] && headers["content-length"][0] || res.body.bytesize).to_i
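The size calculation can be separated from the network call, which makes it easier to check in isolation. The following is a minimal sketch, assuming the raw (still compressed) body and the Content-Encoding value have already been fetched as in the code above; the method name content_sizes and the shape of the returned hash are illustrative, not part of any library.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical helper: given a raw response body and its Content-Encoding
# value (may be nil), return the on-the-wire and inflated sizes in bytes.
def content_sizes(raw_body, content_encoding)
  if content_encoding.to_s.downcase == 'gzip'
    inflated = Zlib::GzipReader.new(StringIO.new(raw_body)).read
    { compressed: raw_body.bytesize, inflated: inflated.bytesize }
  else
    # Not gzipped: the body is already the full content.
    { compressed: nil, inflated: raw_body.bytesize }
  end
end
```

With a real response this would be called as content_sizes(res.body, res['content-encoding']), provided the request set its own accept-encoding header so that Net::HTTP leaves the body compressed.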

You can try sending a HEAD request to the server (over a socket, for example), which is faster because no content is returned. If you don't send "Accept-Encoding: gzip", the response will not be gzipped.
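As a rough sketch of the HEAD idea, here is one way to do it with Net::HTTP rather than a raw socket (a substitution on my part, not what the answer literally describes). The helper names build_head_request and content_length_of are hypothetical. Whether the server reports a Content-Length at all for a HEAD request depends on the server.

```ruby
require 'net/http'
require 'uri'

# Build a HEAD request that explicitly asks for an uncompressed representation.
def build_head_request(uri)
  Net::HTTP::Head.new(uri, 'Accept-Encoding' => 'identity')
end

# Read Content-Length from anything that supports [] lookup by header name
# (a Net::HTTPResponse does). Returns an Integer, or nil if the header is absent.
def content_length_of(response)
  value = response['content-length']
  value && Integer(value, 10)
end

# Usage (performs a network call, so it is left commented out):
#   uri = URI('http://example.com/')
#   res = Net::HTTP.start(uri.host, uri.port) { |http| http.request(build_head_request(uri)) }
#   p content_length_of(res)
```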

Related

Using Ruby's Net/HTTP module, can I ever send raw JSON data?

I've been doing a lot of research on the topic of sending JSON data through Ruby HTTP requests, compared to sending data and requests through Fiddler. My primary goal is to find a way to send a nested hash of data in an HTTP request using Ruby.
In Fiddler, you can specify a JSON in the request body and add the header "Content-Type: application/json".
In Ruby, using Net/HTTP, I'd like to do the same thing if it's possible. I have a hunch that it isn't possible, because the only way to add JSON data to an http request in Ruby is by using set_form_data, which expects data in a hash. This is fine in most cases, but this function does not properly handle nested hashes (see the comments in this article).
Any suggestions?
Although using something like Faraday is often a lot more pleasant, it's still doable with the Net::HTTP library:
require 'uri'
require 'json'
require 'net/http'
url = URI.parse("http://example.com/endpoint")
http = Net::HTTP.new(url.host, url.port)
content = { test: 'content' }
http.post(
url.path,
JSON.dump(content),
'Content-type' => 'application/json',
'Accept' => 'text/json, application/json'
)
After reading tadman's answer above, I looked more closely at adding data directly to the body of the HTTP request. In the end, I did exactly that:
require 'uri'
require 'json'
require 'net/http'
jsonbody = '{
"id":50071,"name":"qatest123456","pricings":[
{"id":"dsb","name":"DSB","entity_type":"Other","price":6},
{"id":"tokens","name":"Tokens","entity_type":"All","price":500}
]
}'
# Prepare request
url = server + "/v1/entities"
uri = URI.parse(url)
http = Net::HTTP.new(uri.host, uri.port)
http.set_debug_output( $stdout )
request = Net::HTTP::Put.new(uri )
request.body = jsonbody
request.set_content_type("application/json")
# Send request
response = http.request(request)
If you ever want to debug the HTTP request being sent out, use this code, verbatim: http.set_debug_output( $stdout ). This is probably the easiest way to debug HTTP requests being sent through Ruby and it's very clear what is going on :)
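For the nested-hash case from the question, a small sketch of the same approach: serialize the hash with the json library and assign it to the request body. The method name build_json_post and the sample payload are made up for illustration.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper: build (but do not send) a POST request whose body is
# the JSON serialization of an arbitrarily nested hash.
def build_json_post(uri, payload)
  request = Net::HTTP::Post.new(uri)
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate(payload)
  request
end

# Sending it would look like this (network call, left commented out):
#   uri = URI('http://example.com/endpoint')
#   Net::HTTP.start(uri.host, uri.port) { |http| http.request(build_json_post(uri, payload)) }
```

Because the body is just a string, nesting depth is not an issue here, unlike with set_form_data.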

Ruby equivalent for setting HTTP GET headers

In C# it was fairly simple and didn't take more than a couple minutes to google:
HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(@"http://www.example.com?q=someValue");
request.Headers.Add("Authorization: OAuth realm=\"example.com\" oauth_consumer_key=\"BCqrstoO\" ... so on and so forth");
string resultString = "";
using (StreamReader read = new StreamReader(request.GetResponse().GetResponseStream(), true))
{
resultString = read.ReadToEnd();
}
Trying to do it in Ruby hasn't been quite as straightforward (or I'm just missing something stupid).
I have been looking and the closest things I've come to finding my answer are How to make an HTTP GET with modified headers? and Send Custom Headers in Ruby.
So my problem, I suppose, boils down to
How do I set the headers as just a straightforward string?
Why do these two examples show headers formatted the way they are?
Is what I'm asking for even good convention and if not, how do I format what I'm trying to do in the convention these Ruby methods are asking for?
So far I tried the two examples and here's my most recent non-working attempt:
headers = "Authorization: OAuth realm=\"example.com\" oauth_consumer_key=\"BCqrstoO\" ... so on and so forth"
uri = URI("www.example.com")
http = Net::HTTP.new(uri.host, uri.port)
http.get(uri.path, headers) do |chunk|
puts chunk
end
Use open-uri. On current Ruby versions, call URI.open; the bare open form stopped handling URLs in Ruby 3.0. Example:
require 'open-uri'
URI.open("http://www.ruby-lang.org/en/",
  "User-Agent" => "Ruby/#{RUBY_VERSION}",
  "From" => "foo@bar.invalid",
  "Referer" => "http://www.ruby-lang.org/") {|f|
  # ...
}
In case you are reading this later on: the Net::HTTPRequest object allows you to add headers easily.
Net::HTTP.start(uri.host, uri.port) do |http|
request = Net::HTTP::Get.new uri
request['my-header'] = '1'
http.request request do |response|
puts response
end
end
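On the question of passing a header as one plain string: Net::HTTP takes headers as name/value pairs, so a "Name: value" line has to be split first. A minimal sketch follows, with hypothetical helper names header_line_to_pair and build_get. Note also that URI("www.example.com") has no scheme, so its host is nil, which is one reason the attempt in the question cannot connect.

```ruby
require 'net/http'
require 'uri'

# Split a single "Name: value" line on the first colon only, so that values
# containing colons are preserved.
def header_line_to_pair(line)
  name, value = line.split(':', 2)
  raise ArgumentError, "not a header line: #{line.inspect}" if value.nil?
  [name.strip, value.strip]
end

# Build (but do not send) a GET request carrying the given header lines.
def build_get(uri, header_lines)
  request = Net::HTTP::Get.new(uri)
  header_lines.each do |line|
    name, value = header_line_to_pair(line)
    request[name] = value
  end
  request
end
```

The request would then be sent with Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }, using a URI that includes the http:// scheme.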

Verifying a remote image is actually an image file in ruby?

I'm trying to figure out how I can verify that what I'm feeding into carrierwave is actually an image. The source I'm getting my image URLs from isn't giving me back all live URLs; some of the images no longer exist. Unfortunately it doesn't return the right status codes either: I was using some code to check whether the remote file exists, and dead URLs were passing that check. So, just to be on the safe side, I'd like a way to verify I'm getting back a valid image file before I go ahead and download it.
Here is the remote file checking code I was using just for reference but I'd prefer something that actually can identify that the files are images.
require 'open-uri'
require 'net/http'
def remote_file_exists?(url)
url = URI.parse(url)
Net::HTTP.start(url.host, url.port) do |http|
return http.head(url.request_uri).code == "200"
end
end
I would check to see if the service returns the proper mime types in the Content-Type HTTP header. (here's a list of mime types)
For example, the Content-Type of the StackOverflow homepage is text/html; charset=utf-8, and the Content-Type of your gravatar image is image/png
To check the Content-Type header for image in ruby using Net::HTTP, you would use the following:
def remote_file_exists?(url)
url = URI.parse(url)
Net::HTTP.start(url.host, url.port) do |http|
return http.head(url.request_uri)['Content-Type'].start_with? 'image'
end
end
Rick Button's answer worked for me, but I needed to add SSL support:
def self.remote_image_exists?(url)
url = URI.parse(url)
http = Net::HTTP.new(url.host, url.port)
http.use_ssl = (url.scheme == "https")
http.start do |http|
return http.head(url.request_uri)['Content-Type'].start_with? 'image'
end
end
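I ended up using HTTParty for this. The Net::HTTP request from Rick Button's answer kept timing out.
require 'httparty'
def remote_file_exists?(url)
  response = HTTParty.get(url)
  # Parentheses are required here: a bare `start_with? 'image'` after && is a syntax error.
  response.code == 200 && response.headers['Content-Type'].to_s.start_with?('image')
end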
https://github.com/jnunemaker/httparty
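One caveat with the snippets above is that they raise NoMethodError if the server sends no Content-Type header at all. A small, hypothetical predicate (the name image_content_type? is mine) that tolerates a missing header and ignores parameters such as charset might look like this:

```ruby
# Returns true only when the media type portion of a Content-Type value
# starts with "image/". A nil or empty value counts as "not an image".
def image_content_type?(content_type)
  return false if content_type.nil?
  media_type = content_type.split(';').first.to_s.strip.downcase
  media_type.start_with?('image/')
end
```

It could be dropped into any of the versions above, for example image_content_type?(http.head(url.request_uri)['Content-Type']). This still trusts the header; it does not inspect the file contents.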

Using Ruby, what is the most efficient way to get the content type of a given URL?

What is the most efficient way to get the content-type of a given URL using Ruby?
This is what I'd do if I want simple code:
require 'open-uri'
str = URI.open('http://example.com')
str.content_type #=> "text/html"
The big advantage is it follows redirects.
If you're checking a bunch of URLs you might want to call close on the handles after you've found what you want.
Take a look at the Net::HTTP library.
require 'net/http'
response = nil
uri, path = 'google.com', '/'
Net::HTTP.start(uri, 80) { |http| response = http.head(path) }
p response['content-type']

How to make an HTTP GET with modified headers?

What is the best way to make an HTTP GET request in Ruby with modified headers?
I want to get a range of bytes from the end of a log file and have been toying with the following code, but the server is throwing back a response saying that "it is a request that the server could not understand" (the server is Apache).
require 'net/http'
require 'uri'
# with @address, @port, @path all defined elsewhere
httpcall = Net::HTTP.new(@address, @port)
headers = {
  'Range' => 'bytes=1000-'
}
resp, data = httpcall.get2(@path, headers)
Is there a better way to define headers in Ruby?
Does anyone know why this would be failing against Apache? If I do a get in a browser to http://[address]:[port]/[path] I get the data I am seeking without issue.
Created a solution that worked for me (worked very well) - this example getting a range offset:
require 'uri'
require 'net/http'
size = 1000 # the starting offset (for the Range header)
uri = URI("http://localhost:80/index.html")
http = Net::HTTP.new(uri.host, uri.port)
headers = {
  'Range' => "bytes=#{size}-"
}
path = uri.path.empty? ? "/" : uri.path
# test to ensure that the request will be valid - first get the head
code = http.head(path, headers).code.to_i
if code >= 200 && code < 300
  # the data is available...
  http.get(path, headers) do |chunk|
    # provided the data is good, print it...
    print chunk unless chunk =~ />416.+Range/
  end
end
If you have access to the server logs, try comparing the request from the browser with the one from Ruby and see if that tells you anything. If this isn't practical, fire up Webrick as a mock of the file server. Don't worry about the results, just compare the requests to see what they are doing differently.
As for Ruby style, you could move the headers inline, like so:
httpcall = Net::HTTP.new(@address, @port)
resp, data = httpcall.get2(@path, 'Range' => 'bytes=1000-')
Also, note that in Ruby 1.8+, which you are almost certainly running, Net::HTTP#get2 returns a single HTTPResponse object, not a resp, data pair.
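Since the goal in the question is the end of a log file, it may be worth noting that the Range header also has a suffix form, bytes=-N, meaning the last N bytes. The sketch below only builds the request; the helper name tail_request is hypothetical, and whether the server honors the range (a 206 Partial Content response) or ignores it (a 200 with the full body) is up to the server.

```ruby
require 'net/http'
require 'uri'

# Build (but do not send) a GET request for the last byte_count bytes of a resource.
def tail_request(uri, byte_count)
  unless byte_count.is_a?(Integer) && byte_count > 0
    raise ArgumentError, 'byte_count must be a positive Integer'
  end
  Net::HTTP::Get.new(uri, 'Range' => "bytes=-#{byte_count}")
end

# Usage (network call, left commented out):
#   uri = URI('http://localhost:80/index.html')
#   res = Net::HTTP.start(uri.host, uri.port) { |http| http.request(tail_request(uri, 1000)) }
#   puts res.body if res.code == '206'
```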
