So Im trying to download image but server response is "403". I have tried using other user-agents if it has any sense but it doesn't work. With other links code works well. I don't know how i can circumvent server or smth.
import requests
import shutil
headers = {
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'
}
r = requests.get('https://s8.mkklcdnv6temp.com/mangakakalot/b1/bd926355/chapter_3/1.jpg',
headers=headers,
stream=True)
if r.status_code == 200:
with open("img.png", 'wb') as f:
r.raw.decode_content = True
shutil.copyfileobj(r.raw, f)
else:
print('failure')
Trying to send a request to here
using requests-html.
Here is my code:
headers = {"User-agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36"}
session = HTMLSession()
while True:
try:
r = session.get("https://www.size.co.uk/product/white-fila-v94m-low/119095/",headers=headers,timeout=40)
r.html.render()
print(r.html.text)
except Exception as e:
print(e)
Here is the error I am receiving:
HTTPSConnectionPool(host='www.size.co.uk', port=443): Read timed out. (read timeout=40)
I thought setting the user agent would fix the problem, but I am still receiving the error? Increasing the timeout hasn't done the trick either
You can do this with Async
from requests_html import AsyncHTMLSession
s = AsyncHTMLSession()
async def main():
r = await s.get('https://www.size.co.uk/product/white-fila-v94m-low/119095/')
await r.html.arender()
print(r.content)
s.run(main)
I tried to do a http get for the website http://www.shopyourway.com using ruby Net::HTTP.get, but I got an error with code 512. And I tried to do a get with ssl for url "https://www.shopyourway.com". It just followed a redirection to the url without ssl.
code is as below:
uri = URI('https ://www.shopyourway.com') #space between https and : does not exist
body = Net::HTTP.get(uri)
I can browse the url using browser. But why I can't do a http get for that url?
Thanks
finally get this to works, need to add couple headers to the Get request.
uri = URI('http://www.shopyourway.com/today')
req = Net::HTTP::Get.new(uri)
req['Upgrade-Insecure-Requests'] = '1'
req['Connection'] = 'keep-alive'
req['User-Agent'] = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36'
I'm trying to make a call to a Tika server using Net::HTTP::Put. The issue is that the call always passes the Content-Type, which keeps Tika from running the detectors (which I want) and then chokes due to the default Content-Type of application/x-www-form-urlencoded. Tika docs suggest to not use that.
So, I have the following:
require 'net/http'
port = 9998
host = "localhost"
path = "/meta"
req = Net::HTTP::Put.new(path)
req.body_stream = File.open(file_name)
req['Transfer-Encoding'] = 'chunked'
req['Accept'] = 'application/json'
response = Net::HTTP.new(host, port).start { |http|
http.request(req)
}
I tried adding req.delete('content-type') and setting initheaders = {} in various ways, but the default content-type keeps getting sent.
Any insights would be greatly appreciated, since I would rather avoid having to make multiple curl calls ... is there any way to suppress the sending of that default header?
If you set req['Content-Type'] = nil then Net::HTTP will set it to the default of 'application/x-www-form-urlencoded', but if you set it to a blank string Net::HTTP leaves it alone:
req['Content-Type'] = ''
Tika should see that as an invalid type and enable the detectors.
It seems that Tika will run the detectors if the Content-Type is application/octet-stream. Adding
req.content_type = "application/octet-stream"
is now allowing me to get results.
I'm trying to do some digest authorization to a server, then parse the resulting HTML with nokogiri. I'm using the net-http-digest_auth gem (https://github.com/drbrain/net-http-digest_auth) to do the url connection. All is fine up until I start the digest_auth code (line 20); it throws an 'unknown algorithm ""MD5"" error'..
The full error message from the console:
~/.rvm/gems/ruby-1.9.3-p194#rails32/gems/net-http-digest_auth-1.2.1/lib/net/http/digest_auth.rb:105:in 'auth_header': unknown algorithm ""MD5"" (Net::HTTP::DigestAuth::Error)
from ./server_connection.rb:20:in '<main>'
Line 20 is the auth line:
auth = digest_auth.auth_header uri, res['www-authenticate'], 'GET'
Here's my complete code (almost completely verbatim from the sample code used at the github link):
#!/usr/bin/env ruby
require 'uri'
require 'net/http'
require 'net/http/digest_auth'
digest_auth = Net::HTTP::DigestAuth.new
uri = URI.parse 'http://url/controlpage?name=_internal_variables_&asList=1&useJS=True'
uri.user = 'username'
uri.password = 'password'
h = Net::HTTP.new uri.host, uri.port
req = Net::HTTP::Get.new uri.request_uri
res = h.request req
# res is a 401 response with a WWW-Authenticate header
auth = digest_auth.auth_header uri, res['www-authenticate'], 'GET'
# create a new request with the Authorization header
req = Net::HTTP::Get.new uri.request_uri
req.add_field 'Authorization', auth
# re-issue request with Authorization
res = h.request req
if res.code == "200"
page = Nokogiri::HTML(res)
isDaylight = page.css('.controlTitle:contains("isDaylight") ~ .controlValue');
puts isDaylight.content
end
Updated this question to include the request headers via Chrome's dev tools:
GET /_getupdatedcontrols?name=_internal_variables_&asList=True&folderFilter=0&changeCount=479&serverState=idle HTTP/1.1
Host: url
Connection: keep-alive
Cache-Control: no-cache
Authorization: Digest username="username", realm="Indigo Control Server", nonce="71079e9f29f7210325ae451d0f423f07", uri="/_getupdatedcontrols?name=_internal_variables_&asList=True&folderFilter=0&changeCount=479&serverState=idle", algorithm=MD5, response="bc056cc472d35f7967973cb51c5b1a65", qop=auth, nc=00005649, cnonce="18dfcf3e4a7b809d"
X-Indigo-Web-Server-Version: 1
X-Prototype-Version: 1.6.0.3
X-Requested-With: XMLHttpRequest
Pragma: no-cache
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.17 Safari/536.11
Accept: text/javascript, text/html, application/xml, text/xml, */*
Referer: http://url/controlpage?name=_internal_variables_&asList=1&useJS=True
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
I ended up using the httpclient gem to accomplish the same thing.
The final code to do exactly what i was after:
#!/usr/bin/env ruby
require 'httpclient'
require 'nokogiri'
c = HTTPClient.new
c.debug_dev = STDOUT
c.set_auth("http://domain.com", "username", "password")
doc = Nokogiri::HTML(c.get_content("http://domain.com"))
isDaylight = "";
doc.css('.controlTitle:contains("isDaylight") ~ .controlValue').each do |var|
isDaylight = var.content
end
if (!isDaylight)
system("curl -X PUT --digest -u username:password -d isOn=1 http://domain.com")
else
system("curl -X PUT --digest -u username:password -d isOn=0 http://domain.com")
end
I hope this helps others that may be working with a home automation server and needing to easily do digest-based authentication.
Seth,
I ran into this same issue while working on a script in ruby. I am new to ruby but after a few google searches and some Charles Proxy showing me what was going on, I see that it is common for HTTP implementations to include quotes in the algorithm="MD5" portion of the Auth header, which is incorrect according to spec (it should be algorithm=MD5, with out quotes). Your updated header logs from Chrome devtools shows that your server response is honoring the spec, but the ruby library is NOT when it interprets that response string. This can be seen by
You server's 401 response included:
note the algorithm=MD5
Authorization: Digest username="username", realm="Indigo Control Server", nonce="71079e9f29f7210325ae451d0f423f07", uri="/_getupdatedcontrols?name=_internal_variables_&asList=True&folderFilter=0&changeCount=479&serverState=idle", algorithm=MD5, response="bc056cc472d35f7967973cb51c5b1a65", qop=auth, nc=00005649, cnonce="18dfcf3e4a7b809d"
But the console output of the initial request using this Ruby library shows:
note the algorithm=\"MD5\"
<- "GET /some/request HTTP/1.1\r\nAccept: */*\r\nUser-Agent: Ruby\r\nConnection: close\r\nHost: 10.1.0.15\r\n\r\n"
-> "HTTP/1.1 401 Unauthorized\r\n"
-> "Content-Length: 530\r\n"
-> "Server: SomeServer/5.0\r\n"
-> "Allow: GET, HEAD, POST, PUT\r\n"
-> "Date: Sun, 27 Jan 2013 00:29:23 GMT\r\n"
-> "Content-Type: text/html;charset=utf-8\r\n"
-> "Www-Authenticate: Digest realm=\"Some Realm\", nonce=\"5a8b8b46cfb84466431baf454eb9ddb9\", algorithm=\"MD5\", qop=\"auth\"\r\n"
For the script example in the original post, I would insert the following two lines:
www_auth_response = res['www-authenticate']
www_auth_response["algorithm=\"MD5\""] = "algorithm=MD5"
And Modify the third line:
auth = digest_auth.auth_header uri, www_auth_response, 'GET'
As follows:
...
res = h.request req
# res is a 401 response with a WWW-Authenticate header
www_auth_response = res['www-authenticate']
www_auth_response["algorithm=\"MD5\""] = "algorithm=MD5"
auth = digest_auth.auth_header uri, www_auth_response, 'GET'
# create a new request with the Authorization header
req = Net::HTTP::Get.new uri.request_uri
req.add_field 'Authorization', auth
...
The important thing that is going on here is that we are modifying the www-authenticate string that is coming back from your initial unauthorized 401 request (as interpreted by this ruby library). Sending the modified header string (www_auth_response) to the digest_auth.auth_header method produces no errors. At least that worked for me in my script!
I hope that helps!
Matt