Mechanize and NTLM Authentication - ruby

The following code generates a 401 => Net::HTTPUnauthorized error.
From the log:
response-header: x-powered-by => ASP.NET
response-header: content-type => text/html
response-header: www-authenticate => Negotiate, NTLM
response-header: date => Mon, 02 Aug 2010 19:48:17 GMT
response-header: server => Microsoft-IIS/6.0
response-header: content-length => 1539
status: 401
The script is as follows:
require 'rubygems'
require 'mechanize'
require 'logger'
agent = WWW::Mechanize.new { |a| a.log = Logger.new("mech.log") }
agent.user_agent_alias = 'Windows IE 7'
agent.basic_auth("username","password")
page = agent.get("http://server/loginPage.asp")
I believe the reason for the 401 is that I need to authenticate using NTLM, but I have been unable to find a good example of how to do this.

agent.add_auth('http://server', 'username', 'password', nil, 'domain.name')
Docs: http://mechanize.rubyforge.org/Mechanize.html
Tested on:
Windows Server 2012 R2 + IIS 8.5
Ruby 1.9.3

Mechanize 2 supports NTLM auth:
m = Mechanize.new
m.agent.username = 'user'
m.agent.password = 'password'
m.agent.domain = 'addomain'
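A minimal end-to-end sketch combining the two answers above (server URL, credentials, and AD domain are placeholders):
require 'mechanize'

agent = Mechanize.new
agent.user_agent_alias = 'Windows IE 7'
# Registering the credentials together with an Active Directory domain
# (fifth argument; realm stays nil) is what enables the NTLM handshake.
agent.add_auth('http://server', 'username', 'password', nil, 'domain.name')
page = agent.get('http://server/loginPage.asp')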


How to use Typhoeus::Request object using https

I'm trying to make an HTTPS request using the Typhoeus::Request object, but I can't get it to work.
The code I'm running is something like this:
url = "https://some.server.com/"
req_opts = {
  :method => :get,
  :headers => {
    "Content-Type" => "application/json",
    "Accept" => "application/json"
  },
  :params => {},
  :params_encoding => nil,
  :timeout => 0,
  :ssl_verifypeer => true,
  :ssl_verifyhost => 2,
  :sslcert => nil,
  :sslkey => nil,
  :verbose => true
}
request = Typhoeus::Request.new(url, req_opts)
response = request.run
The response I'm getting is this:
HTTP/1.1 302 Found
Location: https://some.server.com:443/
Date: Sat, 27 Apr 2019 02:25:05 GMT
Content-Length: 5
Content-Type: text/plain; charset=utf-8
Why is this happening?
It's hard to know for sure because your example is not a reachable URL, but two things stand out. You are setting SSL options without passing a cert or key, and you probably don't need to set SSL options at all; why are you? Also, a 302 indicates a redirect, which you can tell Typhoeus to follow.
Try the following options:
req_opts = {
  :method => :get,
  :headers => {
    "Content-Type" => "application/json",
    "Accept" => "application/json"
  },
  :params => {},
  :params_encoding => nil,
  :timeout => 0,
  :followlocation => true,
  :ssl_verifypeer => false,
  :ssl_verifyhost => 0,
  :verbose => true
}
See the following sections for more info
https://github.com/typhoeus/typhoeus#following-redirections
https://github.com/typhoeus/typhoeus#ssl
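A minimal sketch of running the request with redirect following enabled (the URL is a placeholder; note that :ssl_verifypeer => false disables certificate checking, so only use it for debugging):
require 'typhoeus'

request = Typhoeus::Request.new(
  "https://some.server.com/",
  :method => :get,
  :headers => { "Accept" => "application/json" },
  :followlocation => true  # follow the 302 instead of returning it
)
response = request.run
puts response.code           # final status after redirects
puts response.effective_url  # where the redirect chain ended up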

Set proper header for crawler to prevent cached html

Hello everyone, I am building a small web crawler that fetches news from some websites.
I am using Typhoeus.
My code is like this:
hydra = Typhoeus::Hydra.new
request = Typhoeus::Request.new(url, timeout: 60)
request.on_complete do |response|
  doc = Nokogiri::HTML(response.body)
  root_url = source.website.url           # app-specific
  links = doc.css(css_selectors).take(20)
end
hydra.queue(request)
hydra.run
The problem is that some websites return a cached, old version of the page. I tried setting the headers, including "Cache-Control" => 'no-cache', but that didn't help!
Any help will be appreciated.
The same thing happens when using open-uri.
One of the websites' response headers:
{"Server"=>"nginx/1.10.2", "Date"=>"Sat, 07 Jan 2017 12:43:54 GMT", "Content-Type"=>"text/html; charset=utf-8", "Transfer-Encoding"=>"chunked", "Connection"=>"keep-alive", "X-Drupal-Cache"=>"MISS", "X-Content-Type-Options"=>"nosniff", "Etag"=>"\"1483786108-1\"", "Content-Language"=>"ar", "Link"=>"</taxonomy/term/1>; rel=\"shortlink\",</Actualit%C3%A9s>; rel=\"canonical\"", "X-Generator"=>"Drupal 7 (http://drupal.org)", "Cache-Control"=>"public, max-age=0", "Expires"=>"Sun, 19 Nov 1978 05:00:00 GMT", "Vary"=>"Cookie,Accept-Encoding", "Last-Modified"=>"Sat, 07 Jan 2017 10:48:28 GMT", "X-Cacheable"=>"YES", "X-Served-From-Cache"=>"Yes"}
This should work:
"Cache-Control" => 'no-cache, no-store, must-revalidate'
"Pragma" => 'no-cache'
"Expires" => '0'

Booking.com login with Mechanize

I'm trying to log into Booking.com using Mechanize at this URL: https://admin.booking.com/hotel/hoteladmin/
So far I have been unable to get past the login process. I'm afraid they use a JavaScript function to set the csrf_token when the form is sent. Here is the code I use:
login_url = "https://admin.booking.com/hotel/hoteladmin"
agent = Mechanize.new
agent.user_agent_alias = 'Mac Safari'
agent.verify_mode= OpenSSL::SSL::VERIFY_NONE
# Get the login page
page = agent.get(login_url)
form = page.form_with(:name => 'myform')
form.loginname = my_username
form.password = my_password
form.add_field!("csrf_token", "empty-token")
# Submit the form
page = form.submit( form.button_with(:name => "Login") )
When I load the page in my browser I get:
var token = '..................EXTRA-LONG-TOKEN..................' || 'empty-token',
But when I check it using Mechanize I get:
var token = '' || 'empty-token',
Please find the complete page body using Mechanize here.
So they use JavaScript to set this variable in a new field created when we submit the form?
if (
  form &&
  form.method &&
  form.method.toLowerCase() === 'post' &&
  typeof form.elements.csrf_token === 'undefined'
) {
  input = doc.createElement( 'input' );
  input.name = 'csrf_token';
  input.type = 'hidden';
  input.value = token;
  form.appendChild( input );
}
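For reference, replicating that injection from Mechanize would look something like this sketch (scraping the inline token with a regex; in my case the scraped token is empty, as shown above, so it does not get me further):
# pull the token out of the inline script and mirror the JS injection
token = page.body[/var token = '([^']*)'/, 1]
form.add_field!('csrf_token', token) if token && !token.empty?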
I also took a look at the Network tab in Firebug, without success. When we submit the form there is this sequence:
302 - POST - login.html
302 - GET - https://admin.booking.com/hotel/hoteladmin/index-hotel.html?page=&lang=xu&ses=89abb0da735818bc6252d69ece255276&t=1429195712.93074
302 - GET - https://admin.booking.com/hotel/hoteladmin/extranet_ng/manage/index.html?lang=xu&ses=89abb0da735818bc6252d69ece255276&hotel_id=XXXXXX&t=1429195713.11779
200 - GET - /home.html
When I check the POST request I can see in "Data of the request":
Content-Type: application/x-www-form-urlencoded
Content-Length: 95
ses=e7541870781128880d7c61aa1e4cc357&loginname=my_login&password=my_password&lang=xu&login=Login+
So I don't know whether the csrf_token above is used or not, and if it is, I don't know where. I also don't know whether it is the csrf_token that blocks me from logging in.
Here are the request/response headers from my browser for a successful login:
---------- Request ----------
Host: admin.booking.com
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: fr,fr-FR;q=0.8,en-US;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate
Referer: https://admin.booking.com/hotel/hoteladmin/login.html
Cookie: cwd-extranet=1; ecid=RtSy3w%2Fk5BG5Z67OY8E2rQZz; slan=xu; auth_token=569054884; ut=e; _ga=GA1.2.357900853.1429171802
Connection: keep-alive
---------- Response ----------
Connection: keep-alive
Content-Type: text/html; charset=UTF-8
Date: Thu, 16 Apr 2015 14:57:24 GMT
Location: /hotel/hoteladmin/index-hotel.html?page=&lang=xu&ses=8df70f6f7699cf5c5d63271fbbb47bb1&t=1429196244.67621
Server: nginx
Set-Cookie: cwd-extranet=1; path=/; expires=Tue, 14-Apr-2020 14:57:24 GMT
slan=xu; path=/; expires=Wed, 18-May-2033 03:33:20 GMT; HttpOnly
Strict-Transport-Security: max-age=2592000
Transfer-Encoding: chunked
And here are the headers from Mechanize for the failed login (no Location response header?):
form encoding: utf-8
query: "ses=e1520f97a6e9056940b4cf4e90684836&loginname=my_login&password=my_password&lang=xu&csrf_token=empty-token"
Net::HTTP::Post: /hotel/hoteladmin/login.html
request-header: accept-encoding => gzip,deflate,identity
request-header: accept => */*
request-header: user-agent => Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_2) AppleWebKit/534.51.22 (KHTML, like Gecko) Version/5.1.1 Safari/534.51.22
request-header: accept-charset => ISO-8859-1,utf-8;q=0.7,*;q=0.7
request-header: accept-language => en-us,en;q=0.5
request-header: host => admin.booking.com
request-header: referer => https://admin.booking.com/hotel/hoteladmin/
request-header: content-type => application/x-www-form-urlencoded
request-header: content-length => 105
status: Net::HTTPOK 1.1 200 OK
response-header: server => nginx
response-header: date => Thu, 16 Apr 2015 14:39:22 GMT
response-header: content-type => text/html; charset=UTF-8
response-header: transfer-encoding => chunked
response-header: connection => keep-alive
response-header: vary => Accept-Encoding
response-header: x-powered-by => en105admapp-04
response-header: strict-transport-security => max-age=2592000
response-header: content-encoding => gzip
Thanks for your help
I managed to resolve the issue without dealing with the CSRF token.
What I did was follow the POST/GET sequence found with Firebug; only the ses token, which can be found in the login form (hidden), is important.
So for the login POST we have:
require 'net/http'
require 'uri'

uri = URI.parse("https://admin.booking.com/hotel/hoteladmin/login.html")
data = URI.encode("lang=en&login=Login&ses=#{token}&loginname=#{username}&password=#{password}")
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true
request = Net::HTTP::Post.new(uri.request_uri)
request.body = data
request['Cookie'] = cookie
response = http.request(request)
cookie = response['set-cookie']
location = response['location']
And then we follow the redirection with the previous cookie & location until we get a 200 response code, with something like:
uri = URI.parse("https://admin.booking.com#{location}")
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true
request = Net::HTTP::Get.new(uri.request_uri)
request['Cookie'] = cookie
response = http.request(request)
cookie = response['set-cookie']
location = response['location']
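Put together, the redirect chase is just a loop (a sketch; token, username, password, and the initial cookie come from the login POST above):
while location
  uri = URI.parse("https://admin.booking.com#{location}")
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = true
  request = Net::HTTP::Get.new(uri.request_uri)
  request['Cookie'] = cookie
  response = http.request(request)
  cookie = response['set-cookie'] || cookie  # keep the last cookie if none is set
  location = response['location']            # nil once we reach the 200 page
end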

Get image file using ruby & capybara

There is an image tag on a page accessed by Capybara via HTTPS:
<img src="path">
Is there any way to get the image file from the page using Capybara with any kind of driver?
I cannot use something like File.read('path') because the image is also accessible via HTTPS only. My latest research brought me to this kind of solution:
Visit the page
Save the page to PNG (the webkit driver has this useful ability)
Crop the image
But I do believe a prettier solution exists.
Edited 1:
I've tried out padde's solution, but here is the response body:
<html><head><title>Object moved</title></head>
<body>
<h2>Object moved to here.</h2>
</body>
</html>
Edited 2:
> curl -I image_path
5860cf30abf5d5480
HTTP/1.1 302 Found
Cache-Control: private
Content-Length: 168
Content-Type: text/html; charset=utf-8
Location: /Bledy/Blad404.aspx?aspxerrorpath=/CaptchaType.ashx
Server: Microsoft-IIS/7.5
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
Date: Sat, 03 Nov 2012 17:18:55 GMT
What you probably want is an HTTPS request from Ruby, if I understand this correctly. Try:
require 'net/https'

url = URI.parse('path')
Net::HTTP.start(url.host, url.port, :use_ssl => true, :verify_mode => OpenSSL::SSL::VERIFY_NONE) do |http|
  res = http.get(url.request_uri)
  open("image.png", "wb") do |f|
    f.write(res.body)
  end
end
For cropping, you can either use chunky_png (pure Ruby) or rmagick (requires ImageMagick).
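A minimal cropping sketch with chunky_png (coordinates and sizes are placeholders):
require 'chunky_png'

image = ChunkyPNG::Image.from_file('image.png')
cropped = image.crop(10, 10, 100, 50)  # x, y, width, height
cropped.save('cropped.png')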
Edit: If you want to follow redirects, you can do:
require 'net/https'

def process_image( content )
  # do your cropping here
  open("image.png", "wb") do |f|
    f.write(content)
  end
end

def fetch( url )
  Net::HTTP.start(url.host, url.port, :use_ssl => true, :verify_mode => OpenSSL::SSL::VERIFY_NONE) do |http|
    response = http.get(url.request_uri)
    case response
    when Net::HTTPRedirection
      # match on the response object, not response.code (a String),
      # and assume the Location header is an absolute URL
      fetch URI.parse(response['location'])
    else
      process_image response.body
    end
  end
end

fetch URI.parse('path')

OAuth gem not signing requests

I'm using the ruby gem for OAuth (http://oauth.rubyforge.org/) and I can't get it to create the authorization header for the provider I'm attempting to hit.
Here is my code:
consumer = OAuth::Consumer.new(auth[:consumer_key], auth[:consumer_secret], {
  :site => 'http://api.rdio.com',
  :scheme => :header
})
access_token = OAuth::AccessToken.new(consumer)
ap access_token.post('/1', :method => 'search', :query => 'Robert', :types => 'User')
When the request happens, the header is not present in the call.
#<Net::HTTP::Post:0x7fbf149e91e0
  @body_data = nil,
  @header = {
    "accept" => [
      [0] "*/*"
    ],
    "user-agent" => [
      [0] "Ruby"
    ],
    "content-length" => [
      [0] "0"
    ],
    "content-type" => [
      [0] "application/x-www-form-urlencoded"
    ]
  },
The header I'm referring to is the one that looks like this:
OAuth oauth_nonce=\"225579211881198842005988698334675835446\", oauth_signature_method=\"HMAC-SHA1\", oauth_token=\"token_411a7f\", oauth_timestamp=\"1199645624\", oauth_consumer_key=\"consumer_key_86cad9\", oauth_signature=\"1oO2izFav1GP4kEH2EskwXkCRFg%3D\", oauth_version=\"1.0\"
Looks like you are trying to do 2-legged OAuth. See if this code works for you.
Edit: Updated Code Sample
gem 'oauth'
require 'oauth'
require 'net/http'

consumer = OAuth::Consumer.new('ENTER_KEY', 'ENTER_SECRET', {
  :site => 'http://api.rdio.com',
  :scheme => :header
})
resp = consumer.request(:post, '/1/search', nil, {}, 'method=search&query=Robert&types=User', { 'Content-Type' => 'application/x-www-form-urlencoded' })
puts resp.code + "\r\n"
puts resp.body
Edit: Added captured http stream
POST /1/search HTTP/1.1
Content-Type: application/x-www-form-urlencoded
Accept: */*
User-Agent: OAuth gem v0.4.5
Content-Length: 37
Authorization: OAuth oauth_consumer_key="REDACTED_KEY", oauth_nonce="dwp8m2TGPHQNx3A7imLi7OkAULL7c0IWbTKefPXCsAY", oauth_signature="LxDZn6UNFLY%2FaXItu6MPK5a11js%3D", oauth_signature_method="HMAC-SHA1", oauth_timestamp="1330193449", oauth_version="1.0"
Connection: close
Host: api.rdio.com
method=search&query=Robert&types=UserHTTP/1.1 200 OK
X-Mashery-Responder: mashery-web1.LAX
Content-Type: application/json
Vary: Accept-Encoding
Vary: Accept-Language, Cookie
Content-Language: en
Cache-Control: no-cache
X-Version: 11.1
Accept-Ranges: bytes
Date: Sat, 25 Feb 2012 18:10:50 GMT
Server: Mashery Proxy
Content-Length: 2763
Connection: close
{"status": "ok", "result": {"person_count": 9603, "track_count": 93409, "number_results": 200, "playlist_count": 205, "results": ***TRUNCATED RESULTS FOR BREVITY***
