How can I fetch two URLs with the same request in Ruby?

I want to get the contents of 'a.com/a.html' and 'a.com/b.html' with the same request.
My code is:
require 'net/http'
uri = URI.parse("http://www.sample.com/sample1.html")
http = Net::HTTP.new(uri.host, uri.port)
request = Net::HTTP::Get.new(uri.request_uri)
# request.initialize_http_header({"User-Agent" => "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36"})
result = http.request(request).body
Should I change the path of the request, or is there a better approach?

You can't fetch multiple resources at once, but you can reuse an HTTP connection to fetch multiple resources from the same server (one after another):
require 'net/http'
Net::HTTP.start('a.com') do |http|
  result_a = http.get('/a.html').body
  result_b = http.get('/b.html').body
end
From the docs:
::start immediately creates a connection to an HTTP server which is kept open for the duration of the block. The connection will remain open for multiple requests in the block if the server indicates it supports persistent connections.
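As a minimal sketch of the same idea, the loop over paths can be pulled into a small helper that takes an already-open connection. The name fetch_paths and the decision to raise on any non-2xx status are my own assumptions, not part of Net::HTTP:

```ruby
require 'net/http'

# Hypothetical helper: fetch each path over an already-open connection
# and return a Hash of path => body. Raises on any non-2xx response.
def fetch_paths(http, paths)
  paths.each_with_object({}) do |path, bodies|
    response = http.get(path)
    unless response.code.to_i.between?(200, 299)
      raise "GET #{path} failed with status #{response.code}"
    end
    bodies[path] = response.body
  end
end

# Usage (performs real network I/O; 'a.com' is the placeholder host from the question):
#
#   pages = Net::HTTP.start('a.com') { |http| fetch_paths(http, ['/a.html', '/b.html']) }
#   pages['/a.html']  # body of the first page
```

Because Net::HTTP.start returns the value of its block, the bodies stay available after the connection is closed, which the block-local result_a and result_b above would not be.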

Related

Downloading an image from a URL with python-requests returns error 403: Forbidden

I'm trying to download an image, but the server responds with 403. I have tried other user agents, but that doesn't help. The same code works fine with other links. I don't know how to get around the server's restriction.
import requests
import shutil
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'
}
r = requests.get('https://s8.mkklcdnv6temp.com/mangakakalot/b1/bd926355/chapter_3/1.jpg',
                 headers=headers,
                 stream=True)
if r.status_code == 200:
    with open("img.png", 'wb') as f:
        r.raw.decode_content = True
        shutil.copyfileobj(r.raw, f)
else:
    print('failure')

requests-html: HTTPSConnectionPool Read timed out

I'm trying to send a request to the URL shown in the code below using requests-html.
Here is my code:
from requests_html import HTMLSession
headers = {"User-agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36"}
session = HTMLSession()
while True:
    try:
        r = session.get("https://www.size.co.uk/product/white-fila-v94m-low/119095/",headers=headers,timeout=40)
        r.html.render()
        print(r.html.text)
    except Exception as e:
        print(e)
Here is the error I am receiving:
HTTPSConnectionPool(host='www.size.co.uk', port=443): Read timed out. (read timeout=40)
I thought setting the user agent would fix the problem, but I am still receiving the error. Increasing the timeout hasn't done the trick either.
You can do this with the asynchronous session, AsyncHTMLSession:
from requests_html import AsyncHTMLSession
s = AsyncHTMLSession()
async def main():
    r = await s.get('https://www.size.co.uk/product/white-fila-v94m-low/119095/')
    await r.html.arender()
    print(r.content)
s.run(main)

HTTP GET of "www.shopyourway.com" fails

I tried to do an HTTP GET of http://www.shopyourway.com using Ruby's Net::HTTP.get, but I got an error with code 512. I also tried a GET over SSL for "https://www.shopyourway.com", but it just followed a redirect to the URL without SSL.
The code is below:
uri = URI('https://www.shopyourway.com')
body = Net::HTTP.get(uri)
I can open the URL in a browser, so why can't I do an HTTP GET for it?
Thanks
I finally got this to work; I needed to add a couple of headers to the GET request.
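require 'net/http'
uri = URI('http://www.shopyourway.com/today')
req = Net::HTTP::Get.new(uri)
# Headers a browser would normally send
req['Upgrade-Insecure-Requests'] = '1'
req['Connection'] = 'keep-alive'
req['User-Agent'] = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36'
# Send the request with the extra headers
res = Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(req) }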

ruby and net/http request without content-type

I'm trying to make a call to a Tika server using Net::HTTP::Put. The issue is that the call always sends a Content-Type header, which keeps Tika from running the detectors (which I want), and Tika then chokes on the default Content-Type of application/x-www-form-urlencoded. The Tika docs advise against using that type.
So, I have the following:
require 'net/http'
port = 9998
host = "localhost"
path = "/meta"
req = Net::HTTP::Put.new(path)
req.body_stream = File.open(file_name)
req['Transfer-Encoding'] = 'chunked'
req['Accept'] = 'application/json'
response = Net::HTTP.new(host, port).start { |http|
  http.request(req)
}
I tried adding req.delete('content-type') and setting initheaders = {} in various ways, but the default content-type keeps getting sent.
Any insights would be greatly appreciated, since I would rather avoid having to make multiple curl calls ... is there any way to suppress the sending of that default header?
If you set req['Content-Type'] = nil then Net::HTTP will set it to the default of 'application/x-www-form-urlencoded', but if you set it to a blank string Net::HTTP leaves it alone:
req['Content-Type'] = ''
Tika should see that as an invalid type and enable the detectors.
It seems that Tika will run the detectors if the Content-Type is application/octet-stream. Adding
req.content_type = "application/octet-stream"
is now allowing me to get results.
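For reference, here is a minimal sketch of the request from the question with the Content-Type set explicitly, so Net::HTTP has no reason to fall back to its form-encoded default. The helper name build_tika_put is made up for illustration; the path and headers are the ones from the question:

```ruby
require 'net/http'

# Hypothetical helper: build the PUT request from the question, with an
# explicit Content-Type so Net::HTTP does not add its own default.
def build_tika_put(path, io)
  req = Net::HTTP::Put.new(path)
  req.body_stream = io                               # stream the file instead of loading it
  req['Transfer-Encoding'] = 'chunked'               # needed because no Content-Length is set
  req['Accept'] = 'application/json'
  req.content_type = 'application/octet-stream'      # the type the answer above reports working
  req
end

# Usage (performs real network I/O; host, port and file_name as in the question):
#
#   File.open(file_name, 'rb') do |file|
#     response = Net::HTTP.start('localhost', 9998) do |http|
#       http.request(build_tika_put('/meta', file))
#     end
#   end
```

Opening the file in a block also closes the handle once the upload is done, which the bare File.open in the question does not.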

Unknown algorithm MD5 using net-http-digest_auth

I'm trying to do some digest authorization to a server, then parse the resulting HTML with nokogiri. I'm using the net-http-digest_auth gem (https://github.com/drbrain/net-http-digest_auth) to make the connection. All is fine up until I start the digest_auth code (line 20); it throws an 'unknown algorithm ""MD5""' error.
The full error message from the console:
~/.rvm/gems/ruby-1.9.3-p194#rails32/gems/net-http-digest_auth-1.2.1/lib/net/http/digest_auth.rb:105:in 'auth_header': unknown algorithm ""MD5"" (Net::HTTP::DigestAuth::Error)
from ./server_connection.rb:20:in '<main>'
Line 20 is the auth line:
auth = digest_auth.auth_header uri, res['www-authenticate'], 'GET'
Here's my complete code (almost completely verbatim from the sample code used at the github link):
#!/usr/bin/env ruby
require 'uri'
require 'net/http'
require 'net/http/digest_auth'
digest_auth = Net::HTTP::DigestAuth.new
uri = URI.parse 'http://url/controlpage?name=_internal_variables_&asList=1&useJS=True'
uri.user = 'username'
uri.password = 'password'
h = Net::HTTP.new uri.host, uri.port
req = Net::HTTP::Get.new uri.request_uri
res = h.request req
# res is a 401 response with a WWW-Authenticate header
auth = digest_auth.auth_header uri, res['www-authenticate'], 'GET'
# create a new request with the Authorization header
req = Net::HTTP::Get.new uri.request_uri
req.add_field 'Authorization', auth
# re-issue request with Authorization
res = h.request req
if res.code == "200"
  # Nokogiri parses the response body, not the response object itself
  page = Nokogiri::HTML(res.body)
  isDaylight = page.css('.controlTitle:contains("isDaylight") ~ .controlValue')
  puts isDaylight.text
end
Updated this question to include the request headers via Chrome's dev tools:
GET /_getupdatedcontrols?name=_internal_variables_&asList=True&folderFilter=0&changeCount=479&serverState=idle HTTP/1.1
Host: url
Connection: keep-alive
Cache-Control: no-cache
Authorization: Digest username="username", realm="Indigo Control Server", nonce="71079e9f29f7210325ae451d0f423f07", uri="/_getupdatedcontrols?name=_internal_variables_&asList=True&folderFilter=0&changeCount=479&serverState=idle", algorithm=MD5, response="bc056cc472d35f7967973cb51c5b1a65", qop=auth, nc=00005649, cnonce="18dfcf3e4a7b809d"
X-Indigo-Web-Server-Version: 1
X-Prototype-Version: 1.6.0.3
X-Requested-With: XMLHttpRequest
Pragma: no-cache
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.17 Safari/536.11
Accept: text/javascript, text/html, application/xml, text/xml, */*
Referer: http://url/controlpage?name=_internal_variables_&asList=1&useJS=True
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
I ended up using the httpclient gem to accomplish the same thing.
The final code to do exactly what I was after:
#!/usr/bin/env ruby
require 'httpclient'
require 'nokogiri'
c = HTTPClient.new
c.debug_dev = STDOUT
c.set_auth("http://domain.com", "username", "password")
doc = Nokogiri::HTML(c.get_content("http://domain.com"))
isDaylight = ""
doc.css('.controlTitle:contains("isDaylight") ~ .controlValue').each do |var|
  isDaylight = var.content
end
# Note: every String is truthy in Ruby (even ""), so this branch runs only when isDaylight is nil or false
if !isDaylight
  system("curl -X PUT --digest -u username:password -d isOn=1 http://domain.com")
else
  system("curl -X PUT --digest -u username:password -d isOn=0 http://domain.com")
end
I hope this helps others that may be working with a home automation server and needing to easily do digest-based authentication.
Seth,
I ran into this same issue while working on a script in Ruby. I am new to Ruby, but after a few Google searches and some Charles Proxy showing me what was going on, I see that it is common for HTTP implementations to include quotes in the algorithm="MD5" portion of the auth header, which is incorrect according to the spec (it should be algorithm=MD5, without quotes). Your updated header logs from Chrome devtools show that your server response is honoring the spec, but the Ruby library is NOT when it interprets that response string. This can be seen by comparing the two.
Your server's 401 response included:
note the algorithm=MD5
Authorization: Digest username="username", realm="Indigo Control Server", nonce="71079e9f29f7210325ae451d0f423f07", uri="/_getupdatedcontrols?name=_internal_variables_&asList=True&folderFilter=0&changeCount=479&serverState=idle", algorithm=MD5, response="bc056cc472d35f7967973cb51c5b1a65", qop=auth, nc=00005649, cnonce="18dfcf3e4a7b809d"
But the console output of the initial request using this Ruby library shows:
note the algorithm=\"MD5\"
<- "GET /some/request HTTP/1.1\r\nAccept: */*\r\nUser-Agent: Ruby\r\nConnection: close\r\nHost: 10.1.0.15\r\n\r\n"
-> "HTTP/1.1 401 Unauthorized\r\n"
-> "Content-Length: 530\r\n"
-> "Server: SomeServer/5.0\r\n"
-> "Allow: GET, HEAD, POST, PUT\r\n"
-> "Date: Sun, 27 Jan 2013 00:29:23 GMT\r\n"
-> "Content-Type: text/html;charset=utf-8\r\n"
-> "Www-Authenticate: Digest realm=\"Some Realm\", nonce=\"5a8b8b46cfb84466431baf454eb9ddb9\", algorithm=\"MD5\", qop=\"auth\"\r\n"
For the script example in the original post, I would insert the following two lines:
www_auth_response = res['www-authenticate']
www_auth_response["algorithm=\"MD5\""] = "algorithm=MD5"
and modify the auth line so it uses the cleaned-up string:
auth = digest_auth.auth_header uri, www_auth_response, 'GET'
As follows:
...
res = h.request req
# res is a 401 response with a WWW-Authenticate header
www_auth_response = res['www-authenticate']
www_auth_response["algorithm=\"MD5\""] = "algorithm=MD5"
auth = digest_auth.auth_header uri, www_auth_response, 'GET'
# create a new request with the Authorization header
req = Net::HTTP::Get.new uri.request_uri
req.add_field 'Authorization', auth
...
The important thing that is going on here is that we are modifying the www-authenticate string that is coming back from your initial unauthorized 401 request (as interpreted by this ruby library). Sending the modified header string (www_auth_response) to the digest_auth.auth_header method produces no errors. At least that worked for me in my script!
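One caveat with the in-place version: String#[]= with a string key raises IndexError when the quoted form is not present, and it also mutates the header string held by the response. A slightly more defensive sketch, assuming only that the header looks like the one logged above (the method name unquote_digest_algorithm is my own, not part of the gem):

```ruby
# Hypothetical helper: return a copy of a WWW-Authenticate header with the
# quotes around the algorithm value removed. A header that already uses the
# unquoted form comes back unchanged, and the input string is not mutated.
def unquote_digest_algorithm(www_authenticate)
  www_authenticate.sub(/\balgorithm="([^"]*)"/, 'algorithm=\1')
end

# Usage with the script from the original post:
#
#   www_auth_response = unquote_digest_algorithm(res['www-authenticate'])
#   auth = digest_auth.auth_header uri, www_auth_response, 'GET'
```

Using sub rather than []= means the same line works whether or not the server quotes the value, so the script does not break if the server behavior changes.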
I hope that helps!
Matt
