I have a script to scrape data with Mechanize, but I can't authenticate properly on some intranet sites because of NTLM authentication.
This is the code:
require 'mechanize'
url = 'http://intranet/somesite.asp'
agent = Mechanize.new
agent.auth(url, 'my_login', 'my_password')
agent.get(url) do |page|
  puts page.title
  puts page.body
end
This is the error returned:
/home/igallina/.rvm/gems/ruby-2.2.2/gems/mechanize-2.7.3/lib/mechanize/http/agent.rb:753:in `response_authenticate': 401 => Net::HTTPUnauthorized for http://sistemasnet/srd/Consultas/ConsultaGeral/TelaListagem.asp -- NTLM authentication failed -- available realms: (Mechanize::UnauthorizedError)
from /home/igallina/.rvm/gems/ruby-2.2.2/gems/mechanize-2.7.3/lib/mechanize/http/agent.rb:302:in `fetch'
from /home/igallina/.rvm/gems/ruby-2.2.2/gems/mechanize-2.7.3/lib/mechanize/http/agent.rb:788:in `response_authenticate'
from /home/igallina/.rvm/gems/ruby-2.2.2/gems/mechanize-2.7.3/lib/mechanize/http/agent.rb:302:in `fetch'
from /home/igallina/.rvm/gems/ruby-2.2.2/gems/mechanize-2.7.3/lib/mechanize/http/agent.rb:788:in `response_authenticate'
from /home/igallina/.rvm/gems/ruby-2.2.2/gems/mechanize-2.7.3/lib/mechanize/http/agent.rb:302:in `fetch'
from /home/igallina/.rvm/gems/ruby-2.2.2/gems/mechanize-2.7.3/lib/mechanize.rb:440:in `get'
from mechanize_scrape.rb:6:in `<main>'
I already tried all three methods with no success:
add_auth
auth
basic_auth
I also tried passing more parameters, such as realm and domain, although I don't really understand what a realm is.
I just went through the Mechanize issues and realized that NTLM support was dropped.
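On the realm question: a realm is the label that a Basic or Digest challenge attaches to a protected area, and an NTLM challenge carries none, which would fit the empty "available realms:" in the error. To see what the server actually advertises, the WWW-Authenticate headers of the 401 response can be inspected. A minimal sketch, assuming the server sends one challenge per header line; auth_challenges is a hypothetical helper, not part of Mechanize:

```ruby
# Hypothetical helper: summarize WWW-Authenticate header lines.
# Assumes one challenge per line, e.g. "NTLM" or 'Basic realm="intranet"'.
def auth_challenges(header_lines)
  header_lines.map do |line|
    scheme = line.strip.split(/\s+/, 2).first      # first token is the scheme
    realm  = line[/realm="([^"]*)"/i, 1]           # nil when no realm is given
    { scheme: scheme, realm: realm }
  end
end

# Usage against the intranet URL from the question (performs a real request):
# require 'net/http'
# res = Net::HTTP.get_response(URI('http://intranet/somesite.asp'))
# p auth_challenges(res.get_fields('www-authenticate') || [])
```

If only NTLM or Negotiate shows up, passing a realm will not help, because there is nothing for it to match.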
For a while I've been unable to run webhooks from my GitLab instance. At first I thought it was related to the GitLab upgrade around the 10.0 release, or to some iptables rules, but now I think it might be more of a Ruby thing, combined with how the Shippable endpoints are called (in Ruby?).
On the failed request's page I can see the following information:
reason for failure is execution expired
URL is https://[username:password]@api.shippable.com/projects/[project id]/newBuild - it's generated by Shippable automatically when the project is enabled
X-Gitlab-Event type is Push Hook
there is also JSON with request body
First, I tested whether I can actually connect to Shippable from the server:
curl --verbose -X POST -H "Content-Type: application/json" -H "X-Gitlab-Event: $event" --data "$json" $url
The request succeeded, which made me think that iptables is not the issue (I checked anyway, and no iptables rules were set).
Then I attempted to recreate that request inside /opt/gitlab/embedded/bin/irb:
require 'net/http'
require 'net/https'
uri = URI.parse(url)
username = uri.userinfo.split(':')[0]
password = uri.userinfo.split(':')[1]
req = Net::HTTP::Post.new(uri.path, {'Content-Type' =>'application/json', 'X-Gitlab-Event' => event})
req.basic_auth username, password
req.body = json
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true
response = http.start { |http| http.request(req) }
Then it failed just like in GitLab with:
Net::OpenTimeout: execution expired
from /opt/gitlab/embedded/lib/ruby/2.3.0/net/http.rb:880:in `initialize'
from /opt/gitlab/embedded/lib/ruby/2.3.0/net/http.rb:880:in `open'
from /opt/gitlab/embedded/lib/ruby/2.3.0/net/http.rb:880:in `block in connect'
from /opt/gitlab/embedded/lib/ruby/2.3.0/timeout.rb:101:in `timeout'
from /opt/gitlab/embedded/lib/ruby/2.3.0/net/http.rb:878:in `connect'
from /opt/gitlab/embedded/lib/ruby/2.3.0/net/http.rb:863:in `do_start'
from /opt/gitlab/embedded/lib/ruby/2.3.0/net/http.rb:852:in `start'
from (irb):142
from embedded/bin/irb:11:in `<main>'
Interestingly, a similar thing happens on my local machine: curl succeeds while Ruby throws.
Additionally, I checked that it shouldn't be a matter of basic auth, SSL or POST: I successfully POSTed a snippet to Bitbucket from my server's irb, the same way I tested the Shippable webhook endpoint. I even posted the Shippable request to my mock server with virtually the same request format.
At this point I am curious what might cause such behavior and how to debug it further. The only two factors that were constant in all failing cases are the target (Shippable URI) and the client (Ruby's Net::HTTP). What else would you suggest I check?
I cannot say exactly when, but the issue disappeared. I assume an update to either GitLab or Shippable changed something, as the hooks started working again without any action on my part.
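For the "how to debug it further" part: one thing that can be ruled out cheaply is whether every address the host name resolves to accepts a TCP connection, since Net::OpenTimeout is raised before any HTTP or SSL traffic happens. This is an assumption to test, not a diagnosis of the Shippable case. A minimal sketch using only the standard library; probe_addresses is a hypothetical helper:

```ruby
require 'socket'

# Hypothetical helper: try a plain TCP connect to every address the host
# resolves to and report which ones answer within the timeout.
def probe_addresses(host, port, timeout: 5)
  Addrinfo.getaddrinfo(host, port, nil, :STREAM).map do |addr|
    status =
      begin
        # The block's value (:ok) is returned when the connect succeeds.
        Socket.tcp(addr.ip_address, port, connect_timeout: timeout) { :ok }
      rescue StandardError => e
        e.class.name # e.g. a timeout or "connection refused" error class
      end
    [addr.ip_address, status]
  end
end

# Usage (performs real network connections):
# probe_addresses('api.shippable.com', 443).each { |ip, status| puts "#{ip}: #{status}" }
```

If some addresses answer and others time out, comparing that list with the address shown by curl --verbose would show whether the two clients are simply connecting to different addresses.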
I'm trying to use RestClient to retrieve a page that's secured using an SSL client certificate. My code is as follows:
require 'restclient'
p12 = OpenSSL::PKCS12.new(File.read('client.p12'), 'password')
client = RestClient::Resource.new('https://example.com/',
  :ssl_client_key => p12.key,
  :verify_ssl => OpenSSL::SSL::VERIFY_NONE)
client.get
When I run it, I see the following failure:
1.9.3-p374 :007 > client.get
RestClient::BadRequest: 400 Bad Request
from /home/duncan/.rvm/gems/ruby-1.9.3-p374/gems/rest-client-1.6.7/lib/restclient/abstract_response.rb:48:in `return!'
from /home/duncan/.rvm/gems/ruby-1.9.3-p374/gems/rest-client-1.6.7/lib/restclient/request.rb:230:in `process_result'
from /home/duncan/.rvm/gems/ruby-1.9.3-p374/gems/rest-client-1.6.7/lib/restclient/request.rb:178:in `block in transmit'
from /home/duncan/.rvm/rubies/ruby-1.9.3-p374/lib/ruby/1.9.1/net/http.rb:745:in `start'
from /home/duncan/.rvm/gems/ruby-1.9.3-p374/gems/rest-client-1.6.7/lib/restclient/request.rb:172:in `transmit'
from /home/duncan/.rvm/gems/ruby-1.9.3-p374/gems/rest-client-1.6.7/lib/restclient/request.rb:64:in `execute'
from /home/duncan/.rvm/gems/ruby-1.9.3-p374/gems/rest-client-1.6.7/lib/restclient/request.rb:33:in `execute'
from /home/duncan/.rvm/gems/ruby-1.9.3-p374/gems/rest-client-1.6.7/lib/restclient/resource.rb:51:in `get'
from (irb):7
from /home/duncan/.rvm/rubies/ruby-1.9.3-p374/bin/irb:13:in `<main>'
I'm fairly sure this is a failure to authenticate, as I get the same error in a browser if I don't install the client certificate.
I'm using OpenSSL::SSL::VERIFY_NONE because the server has a self-signed certificate, and I believe this is the correct value to pass to ignore that.
Any suggestions on how to get this working would be greatly appreciated - even a pointer to some detailed documentation, or a suggestion of a different Gem could work. I've not had much luck with either the Gem docs or Google :(
Your HTTPS request is going to need the client certificate as well as the key. Try:
client = RestClient::Resource.new('https://example.com/',
  :ssl_client_cert => p12.certificate,
  :ssl_client_key => p12.key,
  :verify_ssl => OpenSSL::SSL::VERIFY_NONE)
If that doesn't work, you can try capturing the handshake packets (e.g. with Wireshark) to verify that the certificate is actually being offered.
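Before reaching for a packet capture, it may be worth confirming that the PKCS#12 file really contains a certificate and a key that belong together. A minimal sketch, assuming the same option names as in the answer above; client_cert_options is a hypothetical helper, not part of rest-client:

```ruby
require 'openssl'

# Hypothetical helper: build the RestClient SSL options from a PKCS#12 bundle,
# failing early if the certificate and key do not belong together.
def client_cert_options(p12)
  cert = p12.certificate
  key  = p12.key
  unless cert && key && cert.check_private_key(key)
    raise ArgumentError, 'PKCS#12 bundle lacks a matching certificate/key pair'
  end
  {
    ssl_client_cert: cert,
    ssl_client_key:  key,
    verify_ssl:      OpenSSL::SSL::VERIFY_NONE # self-signed server, as in the question
  }
end

# Usage (assumes the rest-client gem and the question's client.p12):
# p12 = OpenSSL::PKCS12.new(File.binread('client.p12'), 'password')
# client = RestClient::Resource.new('https://example.com/', client_cert_options(p12))
# client.get
```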
I am adding functionality that scrapes an XML page from a source that requires the use of an HTTPS connection with authentication. I am trying to use Ryan Bates' Railscast #190 solution but I'm running into a 401 Authentication error.
Here is my test Ruby script:
require 'rubygems'
require 'nokogiri'
require 'open-uri'
url = "https://biblesearch.americanbible.org/passages.xml?q[]=john+3:1-5&version=KJV"
doc = Nokogiri::XML(open(url, :http_basic_authentication => ['username' ,'password']))
puts doc.xpath("//text_preview")
Here is the output of the console after I run my script:
/usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/net/http.rb:799:in `connect': SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed (OpenSSL::SSL::SSLError)
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/net/http.rb:799:in `block in connect'
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/timeout.rb:54:in `timeout'
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/timeout.rb:99:in `timeout'
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/net/http.rb:799:in `connect'
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/net/http.rb:755:in `do_start'
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/net/http.rb:744:in `start'
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/open-uri.rb:306:in `open_http'
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/open-uri.rb:775:in `buffer_open'
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/open-uri.rb:203:in `block in open_loop'
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/open-uri.rb:201:in `catch'
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/open-uri.rb:201:in `open_loop'
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/open-uri.rb:146:in `open_uri'
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/open-uri.rb:677:in `open'
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/open-uri.rb:33:in `open'
from scrape.rb:6:in `<main>'
In my research, I saw one post in which it was suggested that in 1.9.3 the following option could be used:
doc = Nokogiri::XML(open(url, :http_basic_authentication => ['username' ,'password'], :ssl_verify_mode => OpenSSL::SSL::VERIFY_NONE))
However, this did not work either. I would appreciate some insight into addressing this challenge.
The given URL is redirected to /v1/KJV/passages.xml?q[]=john+3%3A1-5 with HTTP status code 302 Found. OpenURI follows the redirect, but automatically drops the authentication header, probably for security reasons. (*)
If you access "http://biblesearch.americanbible.org/v1/KJV/passages.xml?q[]=john+3%3A1-5" directly, you will get the expected result. :-)
(*) You can find in open-uri.rb:
if redirect
  ### snip ###
  if options.include? :http_basic_authentication
    # send authentication only for the URI directly specified.
    options = options.dup
    options.delete :http_basic_authentication
  end
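If requesting the final URL directly is not an option, the redirect can also be followed by hand with Net::HTTP, re-sending the credentials at each hop. A minimal sketch, not what open-uri does: get_with_auth is a hypothetical helper, and it deliberately refuses to send the credentials to a different host, which is an assumption about what is safe here:

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: GET a URL with Basic auth and follow redirects,
# re-sending the credentials only while the redirect stays on the same host.
def get_with_auth(url, user, pass, limit = 5)
  raise ArgumentError, 'too many redirects' if limit.zero?

  uri = URI(url)
  req = Net::HTTP::Get.new(uri)
  req.basic_auth(user, pass)
  res = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(req)
  end
  return res unless res.is_a?(Net::HTTPRedirection)

  target = uri.merge(res['location']) # resolves relative Location values
  unless target.host == uri.host
    raise "refusing to send credentials to #{target.host}"
  end
  get_with_auth(target.to_s, user, pass, limit - 1)
end

# Usage (performs real requests; credentials are placeholders):
# res = get_with_auth(url, 'username', 'password')
# doc = Nokogiri::XML(res.body)
```

Only the host is compared, so a redirect that changes the scheme on the same host is still followed; tighten the check if that matters for your site.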
You can do this and it should work too:
open(url, :http_basic_authentication => [user, pass] )
doc = Nokogiri::HTML(open(url, :http_basic_authentication => [user, pass] ))
You can then parse the doc any way you want.
Passing :http_basic_authentication again on the second request makes up for the header that was dropped after the first one.
Hope this works for you.
http://http-basic-authentication-nokogiri.blogspot.com/2014/08/http-basic-authentication-using-nokogiri.html
You say you need to use HTTPS, but you're using the HTTP protocol:
url = "http://biblesearch...."
OpenURI understands both HTTP and HTTPS. If you want to connect using HTTPS, change the protocol in the URL to HTTPS, then make the connection:
url = "https://biblesearch...."
I'm trying to read Stanford ecorner XML:
open("http://ecorner.stanford.edu/RecentlyAdded.xml")
but am running into the following error message:
OpenURI::HTTPError: 500 Internal Server Error
from /usr/local/lib/ruby/1.8/open-uri.rb:277:in `open_http'
from /usr/local/lib/ruby/1.8/open-uri.rb:616:in `buffer_open'
from /usr/local/lib/ruby/1.8/open-uri.rb:164:in `open_loop'
from /usr/local/lib/ruby/1.8/open-uri.rb:162:in `catch'
from /usr/local/lib/ruby/1.8/open-uri.rb:162:in `open_loop'
from /usr/local/lib/ruby/1.8/open-uri.rb:132:in `open_uri'
from /usr/local/lib/ruby/1.8/open-uri.rb:518:in `open'
from /usr/local/lib/ruby/1.8/open-uri.rb:30:in `open'
from (irb):65
from :0
I believe, but I could be wrong, it's because I would need to be logged in to use the feed.
Any workaround I could use?
If you weren't logged in, you should get an HTTP 401 Unauthorized response, not a 500. I tried opening the site in a browser, which works. It turns out their web server doesn't like requests without a User-Agent header, so if you add one, open-uri works:
>> require 'open-uri'
#=> true
>> open("http://ecorner.stanford.edu/RecentlyAdded.xml", 'User-Agent' => 'ruby')
#=> #<File:/var/folders/H9/H9qnar1yGZqBrWFGuTE0RU+++TI/-Tmp-/open-uri20110505-25566-zsc3pd-0>
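On Ruby 3.0 and later, Kernel#open no longer accepts URLs, so the same fix is spelled with URI.open. A minimal sketch; fetch_with_user_agent is a hypothetical helper, and the default 'ruby' agent string is simply the one used in the session above:

```ruby
require 'open-uri'

# Hypothetical helper: fetch a URL with an explicit User-Agent header
# and return the response body as a string.
def fetch_with_user_agent(url, agent = 'ruby')
  # String keys in the options hash are sent as request headers.
  URI.open(url, 'User-Agent' => agent, &:read)
end

# Usage (performs a real request):
# xml = fetch_with_user_agent('http://ecorner.stanford.edu/RecentlyAdded.xml')
```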
This is working for me:
require 'open-uri'
require 'nokogiri'
doc = Nokogiri::XML(open('http://ecorner.stanford.edu/RecentlyAdded.xml'))
puts doc.search('title').map{ |n| n.text }
>> Recently Added STVP Entrepreneurship Corner Materials
>> STVP Entrepreneurship Corner
>> Podcast: Developing Products that Save Lives - Richard Scheller (Genentech)
>> Podcast: How to Build Instant Connections - Ori Brafman (Author)
>> Podcast: A New Vision for Capital Markets - Barry Silbert (SecondMarket)
>> Podcast: Effective Models for Sustainable Growth - Jennifer Morris (Conservation International)
Note that you got a 500-range error. That means their server is acting up, but is functional enough to admit the problem. If you got a 400-range error they'd be refusing you access to the content for some reason, so I doubt the problem is authentication or anything on your side.
I have a setup with Cucumber, Capybara and Selenium, but some scenarios only pass at random.
Running
ruby 1.8.6 on rvm
rails 2.3.8
selenium pops open firefox 3.6
I have tried to add this with no luck:
with_scope(selector) do
  click_button(button)
  selenium.wait_for_page_to_load
end
The error output is sometimes:
> Given I am logged in and have created newsletter and subscribers # features/step_definitions/newsletter_send_steps.rb:108
end of file reached (EOFError)
/Users/christianhager/.rvm/rubies/ruby-1.8.6-p399/lib/ruby/1.8/net/protocol.rb:133:in `sysread'
/Users/christianhager/.rvm/rubies/ruby-1.8.6-p399/lib/ruby/1.8/net/protocol.rb:133:in `rbuf_fill'
/Users/christianhager/.rvm/rubies/ruby-1.8.6-p399/lib/ruby/1.8/timeout.rb:62:in `timeout'
/Users/christianhager/.rvm/rubies/ruby-1.8.6-p399/lib/ruby/1.8/timeout.rb:93:in `timeout'
/Users/christianhager/.rvm/rubies/ruby-1.8.6-p399/lib/ruby/1.8/net/protocol.rb:132:in `rbuf_fill'
/Users/christianhager/.rvm/rubies/ruby-1.8.6-p399/lib/ruby/1.8/net/protocol.rb:116:in `readuntil'
/Users/christianhager/.rvm/rubies/ruby-1.8.6-p399/lib/ruby/1.8/net/protocol.rb:126:in `readline'
/Users/christianhager/.rvm/rubies/ruby-1.8.6-p399/lib/ruby/1.8/net/http.rb:2020:in `read_status_line'
/Users/christianhager/.rvm/rubies/ruby-1.8.6-p399/lib/ruby/1.8/net/http.rb:2009:in `read_new'
/Users/christianhager/.rvm/rubies/ruby-1.8.6-p399/lib/ruby/1.8/net/http.rb:1050:in `request_without_fakeweb'
/Users/christianhager/.rvm/rubies/ruby-1.8.6-p399/lib/ruby/1.8/net/http.rb:1037:in `request_without_fakeweb'
/Users/christianhager/.rvm/rubies/ruby-1.8.6-p399/lib/ruby/1.8/net/http.rb:543:in `start'
/Users/christianhager/.rvm/rubies/ruby-1.8.6-p399/lib/ruby/1.8/net/http.rb:1035:in `request_without_fakeweb'
./features/step_definitions/web_steps.rb:24:in `__instance_exec2'
./features/step_definitions/web_steps.rb:9:in `with_scope'
./features/step_definitions/web_steps.rb:9:in `with_scope'
./features/step_definitions/web_steps.rb:23:in `/^(?:|I )press "([^\"]*)"(?: within "([^\"]*)")?$/'
features/enhanced/newsletter_send1.feature:7:in `Given I am logged in and have created newsletter and subscribers'
And other times:
> no button with value or id or text 'create_user_button' found (Capybara::ElementNotFound)
./features/step_definitions/web_steps.rb:24:in `__instance_exec2'
./features/step_definitions/web_steps.rb:9:in `with_scope'
./features/step_definitions/web_steps.rb:9:in `with_scope'
./features/step_definitions/web_steps.rb:23:in `/^(?:|I )press "([^\"]*)"(?: within "([^\"]*)")?$/'
features/enhanced/newsletter_send1.feature:7:in `Given I am logged in and have created newsletter and subscribers'
And sometimes it just works....
This is what my env.rb looks like:
ENV["RAILS_ENV"] ||= "cucumber"
require File.expand_path(File.dirname(__FILE__) + '/../../config/environment')
require 'cucumber/formatter/unicode' # Remove this line if you don't want Cucumber Unicode support
require 'cucumber/rails/world'
require 'cucumber/rails/active_record'
require 'cucumber/web/tableish'
require 'capybara/rails'
require 'capybara/cucumber'
require 'capybara/session'
require 'cucumber/rails/capybara_javascript_emulation'
require "selenium-webdriver"
Capybara.default_driver = :selenium
Capybara.default_wait_time = 5
Capybara.ignore_hidden_elements = false
Capybara.default_selector = :css
ActionController::Base.allow_rescue = false
require 'database_cleaner'
DatabaseCleaner.strategy = :truncation
Before do
  Capybara.reset_sessions!
  DatabaseCleaner.clean
end
Cucumber::Rails::World.use_transactional_fixtures = false
Cucumber-steps:
Given I am on the signup page
And I fill in "user_login" with "jeppsipeppsi@arcticelvis.com" within "body"
And I fill in "user_password" with "secret" within "body"
And I fill in "user_password_confirmation" with "secret" within "body"
And I check "terms_of_use" within "body"
And I press "create_user_button" within "body"
Any insight would be great :)
It's the HTTP mocking. If you remove fakeweb or webmock from your environment (entirely), it should all work again.
The last comment by Adam Greene DOES WORK regarding setting up Curb with:
Selenium::WebDriver.for :firefox, :http_client => Selenium::WebDriver::Remote::Http::Curb
Read the thread on the Capybara group.
The problem we're having is playing back recorded HTTP traffic using fakeweb or webmock, since the web driver's HTTP client is now Curb. So if your goal was to fake out traffic over Capybara, you'll get browser testing to work again, but you won't be able to play the traffic back over the same browser. (We're using VCR to record.)
Adding Curb support is listed as a ticket on FakeWeb's GitHub issues page.
I bumped into this recently in a Rails 2.3 app with cucumber/capybara/akephalos/fakeweb, and ultimately resolved it by completely removing all gems in my bundle (they were kept in .bundle/) and reinstalling them.