Specify Character Encoding for Net::HTTP - ruby

When I make this HTTP request:
Net::HTTP.get_response('www.telize.com',"/geoip/190.88.39.27").body
=> "{\"timezone\":\"America\\/Curacao\",\"isp\":\"United Telecommunication Services (UTS)\",\"country\":\"Cura\xE7ao\",\"dma_code\":\"0\",\"region_code\":\"00\",\"area_code\":\"0\",\"ip\":\"190.88.39.27\",\"asn\":\"AS11081\",\"continent_code\":\"NA\",\"city\":\"Willemstad\",\"longitude\":-68.9167,\"latitude\":12.1,\"country_code\":\"CW\",\"country_code3\":\"CUW\"}\n"
It returns a JSON body, but notice the country: \"country\":\"Cura\xE7ao\". The response body should actually look like this: "country":"Curaçao". It looks like Net::HTTP is assuming this is ASCII-8BIT:
Net::HTTP.get_response('www.telize.com',"/geoip/190.88.39.27").body.encoding
=> #<Encoding:ASCII-8BIT>
but this can't be the case. How can I tell Net::HTTP which character encoding to use when making the request?

As the Tin Man determined, "\xE7" is the Latin-1 encoding for LATIN SMALL LETTER C WITH CEDILLA, which, as far as I can determine, isn't a valid JSON encoding.
But...once you know the encoding, you can change it from Ruby's ASCII-8BIT (which just means Ruby considers the data to be binary, i.e. unencoded) to UTF-8, like this:
require 'net/http'
server_encoding = "ISO-8859-1"
resp = Net::HTTP.get_response('www.telize.com',"/geoip/190.88.39.27")
json = resp.body.force_encoding(server_encoding).encode("UTF-8")
puts json
--output:--
{"timezone":"America\/Curacao","isp":"United Telecommunication Services
UTS)","country":"Curaçao","dma_code":"0","region_code":"00","area_code":"0",
"ip":"190.88.39.27","asn":"AS11081","continent_code":"NA","city":"Willemstad",
"longitude":-68.9167,"latitude":12.1,"country_code":"CW","country_code3":"CUW"}
"It looks like Net::HTTP is assuming this is ASCII-8BIT"
Net::HTTP tags the data as binary/ASCII-8BIT, i.e. the data has no encoding, and leaves it to you to figure out how to interpret the data.
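The distinction between relabeling and transcoding matters here. force_encoding only changes the tag Ruby attaches to the bytes, while encode actually rewrites them. A tiny illustration (the string literal stands in for what Net::HTTP returned):
s = "Cura\xE7ao".b                      # binary/ASCII-8BIT, as Net::HTTP tags it
s.encoding                              # => #<Encoding:ASCII-8BIT>
latin = s.force_encoding("ISO-8859-1")  # same bytes, new label
utf8 = latin.encode("UTF-8")            # bytes transcoded: \xE7 becomes \xC3\xA7
puts utf8                               # => Curaçao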

You can't tell a server what encoding to use, but you can ask it what it thinks the file's encoding is and then pass that to Net::HTTP.
Look at the head method:
response = nil
Net::HTTP.start('www.telize.com', 80) { |http|
  response = http.head('/geoip/190.88.39.27')
}
response.each_header { |h| p "#{ h } => #{ response[h] }" }
Running that tells you the contents of the various headers:
"server => nginx"
"date => Thu, 12 Jun 2014 23:42:16 GMT"
"content-type => application/json; charset=iso-8859-1"
"connection => close"
The content-type value is what you want:
response['content-type'].split('=').last
# => "iso-8859-1"
Note that the server rarely does a consistency check to see whether the encoding it's told to use actually matches the file it's serving. This means the content you receive could vary wildly from what the server said it is, and, at that point, you're totally on your own to figure out what it really is, especially when the file has mixed encodings. Welcome to the wild and wooly internet.
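Putting the two halves together, here's a sketch that reads the charset from the Content-Type header and uses it to re-encode the body (assuming the endpoint still responds as in the question; if the server reports no charset, it falls back to UTF-8):
require 'net/http'
require 'json'
resp = Net::HTTP.get_response('www.telize.com', '/geoip/190.88.39.27')
# e.g. "application/json; charset=iso-8859-1" => "iso-8859-1"
charset = resp['content-type'].to_s[/charset=([^;\s]+)/, 1] || 'UTF-8'
json = resp.body.force_encoding(charset).encode('UTF-8')
puts JSON.parse(json)['country']   # => Curaçao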

Related

Get answer from Google Dictionary API using ruby with Portuguese and accented characters

I'm trying to get results from the Google Dictionary API with ruby. It works well with non-accented characters, but does not work with accented characters (which do work if you type the URL directly into the address bar of the browser).
If you use the chrome browser you get good answers either with accents or no accents.
I already jumped over the problem of the URI parser not liking URLs with accents using the following code
require "addressable"
require "net/http"
begin
  uri = Addressable::URI.convert_path('https://api.dictionaryapi.dev/api/v2/entries/pt-BR/há')
  p uri
rescue => error
  p error
end
response = Net::HTTP.get(uri)
p response
I get an empty response, while using the browser I get the correct response.
Can somebody suggest some workaround? What am I doing wrong?
I didn't dig deep inside the addressable gem.
But here is a working example with URI and JSON:
require "net/http"
require "json"
begin
  uri = URI.parse(URI.encode('https://api.dictionaryapi.dev/api/v2/entries/pt-BR/há'))
  p uri
rescue => error
  p error
end
response = Net::HTTP.get(uri)
p JSON.parse(response)
=>
[
  {
    "word"=>"ha",
    "phonetics"=>[{}],
    "meanings"=>[
      {
        "partOfSpeech"=>"undefined",
        "definitions"=>[{"definition"=>"símb. de HECTARE.", "synonyms"=>[], "antonyms"=>[]}]
      }
    ]
  },
  {
    "word"=>"hã",
    "phonetics"=>[{}],
    "origin"=>"⊙ ETIM voc.onom.",
    "meanings"=>[
      {
        "partOfSpeech"=>"interjeição",
        "definitions"=>[
          {
            "definition"=>"expressa reflexão, esclarecimento, admiração.", "synonyms"=>[], "antonyms"=>[]
          }
        ]
      }
    ]
  }
]
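A caveat for newer Rubies: URI.encode (a.k.a. URI.escape) was deprecated in Ruby 2.7 and removed in 3.0, so the snippet above won't run there. A rough equivalent sketch that goes back through addressable, whose normalize percent-encodes the accented path segment:
require "addressable"
require "net/http"
require "json"
raw = "https://api.dictionaryapi.dev/api/v2/entries/pt-BR/há"
# normalize turns "há" into "h%C3%A1"; convert back to a stdlib URI for Net::HTTP
uri = URI.parse(Addressable::URI.parse(raw).normalize.to_s)
response = Net::HTTP.get(uri)
p JSON.parse(response)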
Thanks for all your answers so far, but it seems that the problem is in the API (which is no longer maintained by Google).
What is happening in your last example is that the word 'há' is transformed into 'ha', i.e. the accent is removed and the semantics are lost.
I will try another way.

Proper way to upload a doc to FSCrawler for indexing in Elasticsearch

I'm prototyping a Rails application to upload documents to FSCrawler (running the REST interface), to incorporate into an Elasticsearch index. Using their example, this works:
response = `curl -F "file=@#{params[:document][:upload].tempfile.path}" "http://127.0.0.1:8080/fscrawler/_upload?debug=true"`
The file gets uploaded, and the content gets indexed. This is an example of what I get:
"{\n \"ok\" : true,\n \"filename\" : \"RackMultipart20200130-91061-16swulg.pdf\",\n \"url\" : \"http://127.0.0.1:9200/local/_doc/d661edecf3e28572676e97a6f0d1d\",\n \"doc\" : {\n \"content\" : \"\\n \\n \\n\\nBasically, what you need to know is that Dante is all IP-based, and makes use of common IT standards. Each Dante device behaves \\n\\nmuch like any other network device you would already find on your network. \\n\\nIn order to make integration into an existing network easy, here are some of the things that Dante does: \\n\\n▪ Dante...
When I run curl at the command line, I get EVERYTHING, like the "filename" being properly set. If I use it as above, in the Rails controller, as you can see, the filename is set to the Tempfile's filename. That's not a workable solution. Trying to use params[:document][:upload].tempfile (without .path) or just params[:document][:upload] both fail entirely.
I'm trying to do this "the right way," but every incarnation of using a proper HTTP client to do this fails. I can't figure out how to invoke an HTTP POST that will submit a file to FSCrawler the way curl (on the command line) does it.
In this example, I'm just trying to send the file by using the Tempfile file object. For some reason, FSCrawler gives me the error in the comment, and I get a little metadata, but no content is indexed:
## Failed to extract [100000] characters of text for ...
## org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
uri = URI("http://127.0.0.1:8080/fscrawler/_upload?debug=true")
request = Net::HTTP::Post.new(uri)
form_data = [['file', params[:document][:upload].tempfile,
              { filename: params[:document][:upload].original_filename,
                content_type: params[:document][:upload].content_type }]]
request.set_form form_data, 'multipart/form-data'
response = Net::HTTP.start(uri.hostname, uri.port) do |http|
  http.request(request)
end
If I change the above to use params[:document][:upload].tempfile.path, then I don't get the error about the InputStream, but I also (still) do not get any content indexed. This is an example of what I get:
{"_index":"local","_type":"_doc","_id":"72c9ecf2a83440994eb87d28786e6","_version":3,"_seq_no":26,"_primary_term":1,"found":true,"_source":{"content":"/var/folders/bn/pcc1h8p16tl534pw__fdz2sw0000gn/T/RackMultipart20200130-91061-134tcxn.pdf\n","meta":{},"file":{"extension":"pdf","content_type":"text/plain; charset=ISO-8859-1","indexing_date":"2020-01-30T15:33:45.481+0000","filename":"Similarity in Postgres and Rails using Trigrams · pganalyze.pdf"},"path":{"virtual":"Similarity in Postgres and Rails using Trigrams · pganalyze.pdf","real":"Similarity in Postgres and Rails using Trigrams · pganalyze.pdf"}}}
If I try to use RestClient, and I try to send the file by referencing the actual path to the Tempfile, then I get this error message, and I get nothing:
## Unsupported media type
response = RestClient.post 'http://127.0.0.1:8080/fscrawler/_upload?debug=true',
  file: params[:document][:upload].tempfile.path,
  content_type: params[:document][:upload].content_type
If I try to .read() the file, and submit that, then I break the FSCrawler form:
## Internal server error
request = RestClient::Request.new(
  :method => :post,
  :url => 'http://127.0.0.1:8080/fscrawler/_upload?debug=true',
  :payload => {
    :multipart => true,
    :file => File.read(params[:document][:upload].tempfile),
    :content_type => params[:document][:upload].content_type
  })
response = request.execute
Obviously, I've been trying this every way I can, but I can't replicate whatever curl is doing with any known Ruby-based HTTP clients. I'm utterly lost as to how to get Ruby to submit data to FSCrawler in a way that will get the document contents indexed properly. I've been at this far longer than I care to admit. What am I missing here?
I finally tried Faraday, and, based on this answer, came up with the following:
connection = Faraday.new('http://127.0.0.1:8080') do |f|
  f.request :multipart
  f.request :url_encoded
  f.adapter :net_http
end
file = Faraday::UploadIO.new(
  params[:document][:upload].tempfile.path,
  params[:document][:upload].content_type,
  params[:document][:upload].original_filename
)
payload = { :file => file }
response = connection.post('/fscrawler/_upload', payload)
Using Fiddler helped me to see the results of my attempts, as I got closer and closer to the curl request. This snippet posts the request almost exactly as curl does. To route this call through the proxy, I just needed to add , proxy: 'http://localhost:8866' to the end of the connection setup.
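For what it's worth, RestClient can also produce a genuine multipart upload, but only if it's handed an open File object rather than a path string or raw bytes; a sketch under that assumption (not tested against FSCrawler):
require 'rest-client'
upload = params[:document][:upload]
# passing an IO makes rest-client build a multipart/form-data body itself
response = RestClient.post(
  'http://127.0.0.1:8080/fscrawler/_upload?debug=true',
  file: File.open(upload.tempfile.path, 'rb')
)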

troubles generating signature for alibaba cloud

Reading the HTTP API docs. My requests fail, though, with a bad signature error. From the error message I can see that my string to sign is correct, but it looks like I can't generate the correct HMAC-SHA1 (seriously, why still use SHA1??).
So I decided to try replicate the signature of the sample inside same document.
[47] pry(main)> to_sign = "GET&%2F&AccessKeyId%3Dtestid&Action%3DDescribeRegions&Format%3DXML&SignatureMethod%3DHMAC-SHA1&SignatureNonce%3D3ee8c1b8-83d3-44af-a94f-4e0ad82fd6cf&SignatureVersion%3D1.0&Timestamp%3D2016-02-23T12%253A46%253A24Z&Version%3D2014-05-26"
[48] pry(main)> Base64.encode64 OpenSSL::HMAC.digest("sha1", "testsecret", to_sign)
=> "MLAxpXej4jJ7TL0smgWpOgynR7s=\n"
[49] pry(main)> Base64.encode64 OpenSSL::HMAC.digest("sha1", "testsecret&", to_sign)
=> "VyBL52idtt+oImX0NZC+2ngk15Q=\n"
[50] pry(main)> Base64.encode64 OpenSSL::HMAC.hexdigest("sha1", "testsecret&", to_sign)
=> "NTcyMDRiZTc2ODlkYjZkZmE4MjI2NWY0MzU5MGJlZGE3ODI0ZDc5NA==\n"
[51] pry(main)> Base64.encode64 OpenSSL::HMAC.hexdigest("sha1", "testsecret", to_sign)
=> "MzBiMDMxYTU3N2EzZTIzMjdiNGNiZDJjOWEwNWE5M2EwY2E3NDdiYg==\n"
[52] pry(main)> OpenSSL::HMAC.hexdigest("sha1", "testsecret&", to_sign)
=> "57204be7689db6dfa82265f43590beda7824d794"
[53] pry(main)> OpenSSL::HMAC.hexdigest("sha1", "testsecret", to_sign)
=> "30b031a577a3e2327b4cbd2c9a05a93a0ca747bb"
As is evident, none of these matches the example signature of CT9X0VtwR86fNWSnsc6v8YGOjuE=. Any idea what is missing here?
Update: taking a tcpdump from the Golang client tool, I see that it does a POST request like:
POST /?AccessKeyId=**********&Action=DescribeRegions&Format=JSON&RegionId=cn-qingdao&Signature=aHZVpIMb0%2BFKdoWSIVaFJ7bd2LA%3D&SignatureMethod=HMAC-SHA1&SignatureNonce=c29a0e28964c470a8997aebca4848b57&SignatureType=&SignatureVersion=1.0&Timestamp=2018-07-16T19%3A46%3A33Z&Version=2014-05-26 HTTP/1.1
Host: ecs.aliyuncs.com
User-Agent: Aliyun-CLI-V3.0.3
Content-Length: 0
Content-Type: application/x-www-form-urlencoded
x-sdk-client: golang/1.0.0
x-sdk-core-version: 0.0.1
x-sdk-invoke-type: common
Accept-Encoding: gzip
When I take the parameters from the above request and generate a signature, it does match. So I tried all three: GET, POST with URL params, and POST with params in the body. Every time I am getting a signature error. If I redo the request with exactly the same params as the golang tool, I'm getting a nonce-already-used error (as expected).
Finally got this working. The main issue in my case was that I have been double-percent-encoding the signature parameter thus it turned out invalid. What helped me most was running the aliyun cli utility and capturing traffic, then running a query with exactly the same parameters to compare the exact query string.
But let me list some key points for me:
once hmac-sha1 sig is generated, do not percent-encode it, just add it to the query with normal form www encoding
order of parameters in the HTTP query is not significant; order of parameters in the signing string is significant though
I find all the following types of requests work: GET, POST with parameters in the URL query, and POST with parameters in the request body form-www-encoded; I'm using GET per the documentation, but I see aliyun using POST with query params and ordered params in the query
you must add & character to the end of the secret key when generating HMAC-SHA1
generate HMAC-SHA1 in binary form, then encode as Base64 (no hex values)
some parameters might be case insensitive, e.g. Format works both as json and JSON
I see aliyun, @wanghq and John using UUID 4 for SignatureNonce, but I deferred to plain random (according to docs) because it seems to be only a replay-attack protection, so a cryptographically secure random number should be unnecessary.
The special encoding rules for +, * and ~ seem to only apply to the string for signing, not to how the data is actually encoded in the HTTP query.
I decided to not use @wanghq's wrapper as it didn't work for me and also disables certificate validation, though maybe that's going to be fixed. I just thought that the queries are simple enough once the signature is figured out, and an additional layer of indirection is not worth it. +1 to his answer though, as it was helpful in getting my signature right.
Here's example ruby code to make a simple request:
require 'base64'
require 'cgi'
require 'openssl'
require 'time'
require 'rest-client'
# perform a request against the Alibaba Cloud API
# @see https://www.alibabacloud.com/help/doc-detail/25489.htm
def request(action:, params: {})
  api_url = "https://ecs.aliyuncs.com/"
  # method = "POST"
  method = "GET"
  process_params!(http: method, action: action, params: params)
  # note: .execute actually sends the request; without it you only build it
  RestClient::Request.new(method: method, url: api_url, headers: {params: params}).execute
  # RestClient::Request.new(method: method, url: api_url, payload: params).execute
  # RestClient::Request.new(method: method, url: api_url, payload: params.map{|k,v| "#{k}=#{CGI.escape(v)}"}.join("&")).execute
end
# generates the required common params for a request and adds them to params
# @return undefined
# @see https://www.alibabacloud.com/help/doc-detail/25490.htm
def process_params!(http:, action:, params:)
  params.merge!({
    "Action" => action,
    "AccessKeyId" => config[:auth][:key_id],
    "Format" => "JSON",
    "Version" => "2014-05-26",
    "Timestamp" => Time.now.utc.iso8601
  })
  sign!(http: http, action: action, params: params)
end
# generates the request signature and adds it to params
# @return undefined
# @see https://www.alibabacloud.com/help/doc-detail/25492.htm
def sign!(http:, action:, params:)
  params.delete "Signature"
  params["SignatureMethod"] = "HMAC-SHA1"
  params["SignatureVersion"] = "1.0"
  params["SignatureNonce"] = "#{rand(1_000_000_000_000)}"
  # params["SignatureNonce"] = SecureRandom.uuid.gsub("-", "")
  canonicalized_query_string = params.sort.map { |key, value|
    "#{key}=#{percent_encode value}"
  }.join("&")
  string_to_sign = %{#{http}&#{percent_encode("/")}&#{percent_encode(canonicalized_query_string)}}
  params["Signature"] = hmac_sha1(string_to_sign)
end
# @param data [String]
# @return [String]
def hmac_sha1(data, secret: config[:auth][:key_secret])
  Base64.encode64(OpenSSL::HMAC.digest('sha1', "#{secret}&", data)).strip
end
# encode strings per Alibaba Cloud rules for signing
# @return [String] encoded string
# @see https://www.alibabacloud.com/help/doc-detail/25492.htm
def percent_encode(str)
  CGI.escape(str).gsub(?+, "%20").gsub(?*, "%2A").gsub("%7E", ?~)
end
## example call
request(action: "DescribeRegions")
The code can be simplified a little, but I decided to keep it very close to the documentation's instructions.
P.S. Not sure why John deleted his answer, but I'm leaving a link above to his web page for any python folks looking for example code
Seems this aliyun ruby sdk (non-official, just for reference) works. You may want to check how it's implemented.
Check what its string_to_sign looks like. I did a run, and it seems it's slightly different from what you provided. The params are concatenated with & instead of %26.
GET&%2F&AccessKeyId%3Dtestid&Action%3DDescribeRegions&Format%3DXML&SignatureMethod%3DHMAC-SHA1&SignatureNonce%3D3ee8c1b8-83d3-44af-a94f-4e0ad82fd6cf&SignatureVersion%3D1.0&Timestamp%3D2016-02-23T12%253A46%253A24Z&Version%3D2014-05-26
require 'rubygems'
require 'aliyun'
$DEBUG = true
options = {
  :access_key_id => "k",
  :access_key_secret => "s",
  :service => :ecs
}
service = Aliyun::Service.new options
puts service.DescribeRegions({})
Wanted to share a library I found (Python) that does everything for me without the need to sign the request myself.
It can also help those who want to just copy its functions and still construct the signature on their own.
I'm using this:
from aliyunsdkcore.client import AcsClient
from aliyunsdkvpc.request.v20160428.DescribeEipAddressesRequest import DescribeEipAddressesRequest
client = AcsClient(access_key, secret_key, region)
request = DescribeEipAddressesRequest()
request.set_accept_format('json')
response = client.do_action_with_exception(request) # FYI returned as Bytes
print(response)
Each section in Alibaba Cloud has its own library (just like I used: aliyunsdkvpc for EIP addresses)
And they are all listed here:
https://develop.aliyun.com/tools/sdk?#/python

Trying to use open-uri in ruby, some HTML contents are coming in as "Loading..."

I am trying to create a program that compares a specific thing on a webpage at one time, and then compares it again later. I'm currently working on getting the piece of information that will change. The text that changes appears if I inspect the element in the page, but not if I use open-uri; it comes in as "Loading...". Is there a way to get all the HTML text?
This is the current code I have
contents = open('https://www.cargurus.com/Cars/l-Used-Mazda-MAZDASPEED6-d841', &:read)
File.open("testing.txt", "w") do |line|
line.puts "\r" + "#{contents}"
end
Any help to get the Loading... to change to the actual HTML code would be amazing.
Thanks
The problem
So, open-uri just makes an HTTP request and gives you access to the body. In this case, the body is html. That html has a placeholder for this data, which is what you're seeing. The html then says to load some javascript that will make another request to the server to get the data, and when the data comes in, it replaces the placeholder with the real data. So, to handle this, you ultimately need whatever is coming back from that request the javascript is making.
Three solutions
Ordered from my least favourite to my most favourite.
You can try to evaluate the JavaScript to have it operate on the html. This is going to be painful, so I don't recommend it, but if you wanted to go down that path, I think there's a gem called "the ruby racer" or something (IIRC, it wraps v8).
You can launch a web browser, let the browser handle all the cray cray, and then ask the browser for the html after it's been updated. This is what Rahul's solution does, and it's a really nice solution. It's not my favourite because it's pretty heavy and you're relegated to information displayed in the html. This is called "scraping", and it's pretty fragile (some designer moves something around the page and your script breaks), and the information is in human presentation format, which means you usually have to do a lot of little parsing things.
You can open your browser's devtools, go to the network tab, filter to the XHR requests, and reload the page. One of these made the request to get the data that was used to fill in the placeholder. Figure out which one it is and then you can make that request yourself. There are ways this can be fragile, too, e.g. sometimes you have to have the right cookies, and you often have to experiment with what the browser sent to figure out how much of it you need (usually it's way less than was sent, which is true for your case). Protip: When you do this, separate requesting the data from parsing and exploring it (i.e. save it to a file and then, while looking through the data, get it from the file rather than making a new request every time... this way it won't change on you and you won't get rate limited; see the small caching sketch after the code below).
Solution #3
So, I was curious and went ahead and tried solution number 3 myself, and it worked pretty admirably, check it out:
require 'uri'
require 'net/http'
# build a post request to the URL that the page got the data from
uri = URI 'https://www.cargurus.com/Cars/inventorylisting/ajaxFetchSubsetInventoryListing.action?sourceContext=untrackedExternal_true_0'
req = Net::HTTP::Post.new(uri)
# set some headers
req['origin'] = 'https://www.cargurus.com' # for cross origin requests
req['cache-control'] = 'no-cache' # no caching, just in case,
req['pragma'] = 'no-cache' # we prob don't want stale data
# looks like you can pass it an awful lot of filters to use
req.set_form_data(
"page"=>"1", "zip"=>"", "address"=>"", "latitude"=>"", "longitude"=>"",
"distance"=>"100", "selectedEntity"=>"d841", "transmission"=>"ANY",
"entitySelectingHelper.selectedEntity2"=>"", "minPrice"=>"", "maxPrice"=>"",
"minMileage"=>"", "maxMileage"=>"", "bodyTypeGroup"=>"", "serviceProvider"=>"",
"filterBySourcesString"=>"", "filterFeaturedBySourcesString"=>"",
"displayFeaturedListings"=>"true", "searchSeoPageType"=>"",
"inventorySearchWidgetType"=>"AUTO", "allYearsForTrimName"=>"false",
"daysOnMarketMin"=>"", "daysOnMarketMax"=>"", "vehicleDamageCategoriesRaw"=>"",
"minCo2Emission"=>"", "maxCo2Emission"=>"", "vatOnly"=>"false",
"minEngineDisplacement"=>"", "maxEngineDisplacement"=>"", "minMpg"=>"",
"maxMpg"=>"", "minEnginePower"=>"", "maxEnginePower"=>"", "isRecentSearchView"=>"false"
)
# make the request (200 means it worked)
res = Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) { |http| http.request req }
res.code # => "200"
# parse the response
require 'json'
json = JSON.parse res.body
# we're on page 1 of 1, and there are 48 results on this page
json['page'] # => 1
json['listings'].size # => 48
json['remainingResults'] # => false
# apparently we're looking at some sort of car or smth
json['modelId'] # => "d841"
json['modelName'] # => "Mazda MAZDASPEED6"
# a bunch of places sell this car
json['sellers'].size # => 47
json['sellers'][0]['location'] # => "Portland OR, 97217"
# the first of our 48 cars seems to be a deal
listing = json['listings'][0]
listing['mainPictureUrl'] # => "https://static.cargurus.com/images/forsale/2018/05/24/02/58/2006_mazda_mazdaspeed6-pic-61663369386257285-152x114.jpeg"
listing['expectedPriceString'] # => "$8,972"
listing['priceString'] # => "$6,890"
listing['daysOnMarket'] # => 61
listing['savingsRecommendation'] # => "Good Deal"
listing['carYear'] # => 2006
listing['mileageString'] # => "81,803"
# none of the 48 are salvaged or lemons
json['listings'].count { |l| l['lemon'] } # => 0
json['listings'].count { |l| l['salvage'] } # => 0
# the savings recommendations seem reasonably distributed
json['listings'].group_by { |l| l["savingsRecommendation"] }.map { |rec, ls| [rec, ls.size] }
# => [["Good Deal", 4],
# ["Fair Deal", 11],
# ["No Price Analysis", 23],
# ["High Price", 8],
# ["Overpriced", 2]]
Your web page makes an AJAX request, and open-uri only returns the server-side page; it does not wait for the AJAX request to finish.
You can use the code below, which waits for the page to load:
# load the libraries
require 'watir'
browser = Watir::Browser.new
browser.goto "https://www.cargurus.com/Cars/l-Used-Mazda-MAZDASPEED6-d841"
# give the website some time to load
sleep 2
puts browser.html
NOTE: you need chromedriver to use the script http://chromedriver.chromium.org/downloads
If you don't want to open the url in a visible browser window, you can run the browser headless.
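For example, a minimal headless variant (assuming a Watir version that accepts the headless: option, with chromedriver on your PATH):
require 'watir'
# run Chrome without a visible window
browser = Watir::Browser.new :chrome, headless: true
browser.goto "https://www.cargurus.com/Cars/l-Used-Mazda-MAZDASPEED6-d841"
sleep 2 # crude wait for the AJAX-driven content
puts browser.html
browser.close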

why is content_length in Net::HTTP.get_response sometimes nil even on good results?

I have the following ruby code (was trying to write a simple http-ping)
require 'net/http'
res1 = Net::HTTP.get_response 'www.google.com' , '/'
res2 = Net::HTTP.get_response 'www.google.com' , '/search?q=abc'
res1.code #200
res2.code #200
res1.content_length #5213
res2.content_length #nil **<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< WHY**
res2.body[0..60]
=> "<!doctype html><html itemscope=\"\" itemtype=\"http://schema.org"
Why does res2.content_length not show through? Is it in some other attribute of res2 (and how does one see those)?
I am a newcomer at ruby. Using irb 0.9.6 on AWS Linux
Thanks a lot.
It appears that the value returned is not necessarily the length of the body, but the fixed length of the content, when that fixed length is known in advance and stored in the content-length header.
See the source for the implementation of HTTPHeader#content_length (taken from http://ruby-doc.org/stdlib-2.3.1/libdoc/net/http/rdoc/Net/HTTPHeader.html):
# File net/http/header.rb, line 262
def content_length
return nil unless key?('Content-Length')
len = self['Content-Length'].slice(/\d+/) or
raise Net::HTTPHeaderSyntaxError, 'wrong Content-Length format'
len.to_i
end
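You can check which case you're in by inspecting the headers directly; a quick sketch (the exact headers Google sends can vary):
require 'net/http'
res = Net::HTTP.get_response('www.google.com', '/search?q=abc')
p res.key?('Content-Length')  # => false when the header is absent
p res['Transfer-Encoding']    # => "chunked", typically, for this endpoint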
What this probably means in this case is that the response was sent with chunked transfer encoding (Transfer-Encoding: chunked), in which case the Content-Length header is omitted.
What you most likely want in this case is body.length, since that's the only real way to tell the actual length of the response body for a chunked response.
Note that there may be performance implications in always using body.length to find the content length; you may choose to try the content_length approach first and, if it's nil, fall back to body.length.
Here's an example modification to your code:
require 'net/http'
res1 = Net::HTTP.get_response 'www.google.com' , '/'
res2 = Net::HTTP.get_response 'www.google.com' , '/search?q=abc'
res1.code #200
res2.code #200
res1.content_length #5213
res2.content_length.nil? ? res2.body.length : res2.content_length #57315 **<<<<<<<<<<<<<<< Works now **
res2.body[0..60]
=> "<!doctype html><html itemscope=\"\" itemtype=\"http://schema.org"
or, better yet, capture the content_length and use the captured value for comparison:
res2_content_length = res2.content_length
if res2_content_length.nil?
  res2_content_length = res2.body.length
end
Personally, I'd just stick with always checking body.length and deal with any potential performance issue if and when it arises.
This should reliably retrieve the actual length of the content for you, regardless of whether you received a response with a Content-Length header or a chunked one.
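If you need this in more than one place, a tiny helper keeps the fallback logic in one spot (a sketch; bytesize is used rather than length, since Content-Length counts bytes, not characters):
require 'net/http'
# prefer the Content-Length header when present, otherwise
# measure the body we actually received
def response_length(res)
  res.content_length || res.body.bytesize
end
res = Net::HTTP.get_response('www.google.com', '/search?q=abc')
puts response_length(res)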
