OpenUri causing 401 Unauthorized error with HTTPS URL - ruby

I am adding functionality that scrapes an XML page from a source that requires the use of an HTTPS connection with authentication. I am trying to use Ryan Bates' Railscast #190 solution but I'm running into a 401 Authentication error.
Here is my test Ruby script:
require 'rubygems'
require 'nokogiri'
require 'open-uri'
url = "https://biblesearch.americanbible.org/passages.xml?q[]=john+3:1-5&version=KJV"
doc = Nokogiri::XML(open(url, :http_basic_authentication => ['username' ,'password']))
puts doc.xpath("//text_preview")
Here is the output of the console after I run my script:
/usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/net/http.rb:799:in `connect': SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed (OpenSSL::SSL::SSLError)
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/net/http.rb:799:in `block in connect'
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/timeout.rb:54:in `timeout'
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/timeout.rb:99:in `timeout'
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/net/http.rb:799:in `connect'
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/net/http.rb:755:in `do_start'
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/net/http.rb:744:in `start'
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/open-uri.rb:306:in `open_http'
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/open-uri.rb:775:in `buffer_open'
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/open-uri.rb:203:in `block in open_loop'
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/open-uri.rb:201:in `catch'
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/open-uri.rb:201:in `open_loop'
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/open-uri.rb:146:in `open_uri'
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/open-uri.rb:677:in `open'
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/open-uri.rb:33:in `open'
from scrape.rb:6:in `<main>'
In my research, I saw one post in which it was suggested that in 1.9.3 the following option could be used:
doc = Nokogiri::XML(open(url, :http_basic_authentication => ['username' ,'password'], :ssl_verify_mode => OpenSSL::SSL::VERIFY_NONE))
However, this did not work either. I would appreciate some insight into addressing this challenge.

The given URL is redirected to /v1/KJV/passages.xml?q[]=john+3%3A1-5 with HTTP status code 302 Found. OpenURI follows the redirect, but it drops the authentication header on the redirected request, presumably for security reasons. (*)
If you access "http://biblesearch.americanbible.org/v1/KJV/passages.xml?q[]=john+3%3A1-5" directly, you will get the expected result. :-)
(*) You can find in open-uri.rb:
if redirect
### snip ###
if options.include? :http_basic_authentication
# send authentication only for the URI directly specified.
options = options.dup
options.delete :http_basic_authentication
end
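One way around the redirect handling shown above is to follow the redirect yourself and attach the credentials on every hop. The following is a minimal sketch using Net::HTTP, not something OpenURI provides: `authed_get` and `fetch_with_auth` are hypothetical names, and re-sending credentials is only reasonable if you trust every host in the redirect chain.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build a GET request carrying basic-auth credentials.
def authed_get(uri, user, pass)
  request = Net::HTTP::Get.new(uri.request_uri)
  request.basic_auth(user, pass)
  request
end

# Hypothetical helper: fetch a URL, re-sending the credentials after each
# redirect (OpenURI drops them). Assumes every hop is a trusted host.
def fetch_with_auth(url, user, pass, limit = 5)
  raise ArgumentError, 'too many redirects' if limit <= 0

  uri = URI.parse(url)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(authed_get(uri, user, pass))
  end

  if response.is_a?(Net::HTTPRedirection)
    # Location may be relative, so resolve it against the current URL.
    next_url = uri.merge(response['location']).to_s
    fetch_with_auth(next_url, user, pass, limit - 1)
  else
    response
  end
end
```

With this in place, something like `Nokogiri::XML(fetch_with_auth(url, 'username', 'password').body)` could stand in for the `open` call in the question's script.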

You can do this and it should work too:
open(url, :http_basic_authentication => [user, pass] )
doc = Nokogiri::HTML(open(url, :http_basic_authentication => [user, pass] ))
You can then parse the doc any way you want.
By passing http_basic_authentication again in the second request, you make up for the header that was deleted in the first request.
Hope this works for you.
http://http-basic-authentication-nokogiri.blogspot.com/2014/08/http-basic-authentication-using-nokogiri.html

You say you need to use HTTPS, but you're using the HTTP protocol:
url = "http://biblesearch...."
OpenURI understands both HTTP and HTTPS. If you want to connect using HTTPS, change the protocol in the URL to HTTPS, then make the connection:
url = "https://biblesearch...."

Related

Mechanize with NTLM giving "401 Unauthorized" in Ruby

I have a script to scrape data with Mechanize, but I can't authenticate properly on some intranet sites because of NTLM authentication.
This is the code:
require 'mechanize'
url = 'http://intranet/somesite.asp'
agent = Mechanize.new
agent.auth(url, 'my_login', 'my_password')
agent.get(url) do |page|
puts page.title
puts page.body
end
This is the error returned:
/home/igallina/.rvm/gems/ruby-2.2.2/gems/mechanize-2.7.3/lib/mechanize/http/agent.rb:753:in `response_authenticate': 401 => Net::HTTPUnauthorized for http://sistemasnet/srd/Consultas/ConsultaGeral/TelaListagem.asp -- NTLM authentication failed -- available realms: (Mechanize::UnauthorizedError)
from /home/igallina/.rvm/gems/ruby-2.2.2/gems/mechanize-2.7.3/lib/mechanize/http/agent.rb:302:in `fetch'
from /home/igallina/.rvm/gems/ruby-2.2.2/gems/mechanize-2.7.3/lib/mechanize/http/agent.rb:788:in `response_authenticate'
from /home/igallina/.rvm/gems/ruby-2.2.2/gems/mechanize-2.7.3/lib/mechanize/http/agent.rb:302:in `fetch'
from /home/igallina/.rvm/gems/ruby-2.2.2/gems/mechanize-2.7.3/lib/mechanize/http/agent.rb:788:in `response_authenticate'
from /home/igallina/.rvm/gems/ruby-2.2.2/gems/mechanize-2.7.3/lib/mechanize/http/agent.rb:302:in `fetch'
from /home/igallina/.rvm/gems/ruby-2.2.2/gems/mechanize-2.7.3/lib/mechanize.rb:440:in `get'
from mechanize_scrape.rb:6:in `<main>'
I already tried all three methods with no success:
add_auth
auth
basic_auth
and also tried to give more parameters like realm and domain, although I don't really get what realm is.
I just went through the Mechanize issues and realized that NTLM support has been dropped.

SSL Verify Error Ruby

So I have this script:
#!/usr/bin/ruby
require 'net/https'
require 'open-uri'
puts "HTTPS Client for Ruby!"
puts "Enter the URL"
site = gets.chomp
url = URI.parse(site)
http = Net::HTTP.new(url.host,url.port)
http.use_ssl = true
http.verify_mode = OpenSSL::SSL::VERIFY_PEER
http.cert_store = OpenSSL::X509::Store.new
http.cert_store.set_default_paths
http.cert_store.add_file('/home/user/sec/certs/cacert.pem')
page = Net::HTTP.get(url)
puts page
It works fine. It's able to grab the html of the homepage of pretty much any http or https website. However, I have an HTTPS enabled webserver set up in a virtual machine which it doesn't work with. Before I enabled SSL on the webserver this script grabbed the html just fine. So my question is, why do I receive this error:
/usr/lib/ruby/2.1.0/net/http.rb:920:in `connect': SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed (OpenSSL::SSL::SSLError)
from /usr/lib/ruby/2.1.0/net/http.rb:920:in `block in connect'
from /usr/lib/ruby/2.1.0/timeout.rb:67:in `timeout'
from /usr/lib/ruby/2.1.0/net/http.rb:920:in `connect'
from /usr/lib/ruby/2.1.0/net/http.rb:863:in `do_start'
from /usr/lib/ruby/2.1.0/net/http.rb:852:in `start'
from /usr/lib/ruby/2.1.0/net/http.rb:583:in `start'
from /usr/lib/ruby/2.1.0/net/http.rb:478:in `get_response'
from /usr/lib/ruby/2.1.0/net/http.rb:455:in `get'
from https_client.rb:20:in `<main>'
I get this when running the script against my web server. The path I've specified does contain an actual certificate.
You get that error when the SSL certificate is self-signed or not issued by a trusted certificate authority. So, assuming the website you are pulling from is your own (or you trust it) and it doesn't have a verified certificate, change
http.verify_mode = OpenSSL::SSL::VERIFY_PEER
to
http.verify_mode = OpenSSL::SSL::VERIFY_NONE
same situation
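An alternative that keeps verification on is to point the client at the CA file and then make the request through that same client object. Note that the question's script ends with `Net::HTTP.get(url)`, a class method that opens its own connection, so the settings made on `http` do not appear to be applied to that request. The sketch below assumes the certificate at the given path is the one that signed the server's certificate; `build_https_client` is a hypothetical name and the URL in the usage comment is a placeholder.

```ruby
require 'net/http'
require 'openssl'
require 'uri'

# Hypothetical helper: build a Net::HTTP client that verifies the server
# against a specific CA file (for example, the VM's self-signed certificate).
def build_https_client(url, ca_file)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == 'https')
  http.verify_mode = OpenSSL::SSL::VERIFY_PEER
  http.ca_file = ca_file
  http
end

# Usage (placeholder URL; the path is the one from the question):
# http = build_https_client('https://my-vm.example/', '/home/user/sec/certs/cacert.pem')
# puts http.request_get('/').body
```

The request has to go through the configured instance (`http.request_get` here) for the CA file and verify mode to take effect.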

Ruby / Sinatra - Route any HTTP request to HTTPS (using Rack::SSL) - !! Invalid request

I'm playing with Ruby / Sinatra at present and attempting to get HTTPS working.
I've taken a look at the rack-ssl gem here: https://github.com/josh/rack-ssl
It seems to be working when I run the application (as in redirecting to HTTPS), but nothing is displayed in the browser and the log comes up with the following error:
!! Invalid request
#!/usr/bin/env ruby
require 'rack/ssl'
require 'sinatra'
use Rack::SSL
get '/' do
'Hello World'
end
I'm not sure what to do from here.
Update:
Turned Thin Logging on in the sinatra app and got the following in the log:
!! Invalid request
Invalid HTTP format, parsing fails.
/Users/ashleycox/.rvm/gems/ruby-2.0.0-p247/gems/thin-1.5.1/lib/thin/request.rb:82:in `execute'
/Users/ashleycox/.rvm/gems/ruby-2.0.0-p247/gems/thin-1.5.1/lib/thin/request.rb:82:in `parse'
/Users/ashleycox/.rvm/gems/ruby-2.0.0-p247/gems/thin-1.5.1/lib/thin/connection.rb:39:in `receive_data'
/Users/ashleycox/.rvm/gems/ruby-2.0.0-p247/gems/eventmachine-1.0.3/lib/eventmachine.rb:187:in `run_machine'
/Users/ashleycox/.rvm/gems/ruby-2.0.0-p247/gems/eventmachine-1.0.3/lib/eventmachine.rb:187:in `run'
/Users/ashleycox/.rvm/gems/ruby-2.0.0-p247/gems/thin-1.5.1/lib/thin/backends/base.rb:63:in `start'
/Users/ashleycox/.rvm/gems/ruby-2.0.0-p247/gems/thin-1.5.1/lib/thin/server.rb:159:in `start'
/Users/ashleycox/.rvm/gems/ruby-2.0.0-p247/gems/rack-1.5.2/lib/rack/handler/thin.rb:16:in `run'
/Users/ashleycox/.rvm/gems/ruby-2.0.0-p247/gems/sinatra-1.4.3/lib/sinatra/base.rb:1408:in `run!'
/Users/ashleycox/.rvm/gems/ruby-2.0.0-p247/gems/sinatra-1.4.3/lib/sinatra/main.rb:25:in `block in <module:Sinatra>'
Any help would be greatly appreciated, thank you.

open_http: 403 Forbidden (OpenURI::HTTPError)

I am trying to pull data from my Google+ API, using this script:
require 'open-uri'
require 'json'
google_api_key = 'put your google api key here'
page_id = '105672627985088123672'
data = open("https://www.googleapis.com/plus/v1/people/#{page_id}?key=#{google_api_key}").read
obj = JSON.parse(data)
puts obj['plusOneCount'].to_i
However, I keep getting this error:
/Users/xng/.rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/open-uri.rb:346:in `open_http': 403 Forbidden (OpenURI::HTTPError)
from /Users/xng/.rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/open-uri.rb:769:in `buffer_open'
from /Users/xng/.rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/open-uri.rb:203:in `block in open_loop'
from /Users/xng/.rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/open-uri.rb:201:in `catch'
from /Users/xng/.rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/open-uri.rb:201:in `open_loop'
from /Users/xng/.rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/open-uri.rb:146:in `open_uri'
from /Users/xng/.rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/open-uri.rb:671:in `open'
from /Users/xng/.rvm/rubies/ruby-1.9.2-p320/lib/ruby/1.9.1/open-uri.rb:33:in `open'
from gplus.rb:8:in `<main>'
I am not sure what is wrong here, any help would be great.
It looks like your Google API key doesn't match the one Google has on its servers, so make sure you are using the right key. Is it a private or a free service?
Have to regenerate the API key.
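To see why the server refused the request, it can help to read the body of the error response, since APIs usually put the reason there. Here is a small sketch, assuming Ruby 2.5 or later (where `URI.open` is available); `describe_http_error` and `fetch_json` are hypothetical helper names.

```ruby
require 'open-uri'
require 'json'

# Hypothetical helper: turn an OpenURI::HTTPError into a readable message
# that includes the response body, if there is one.
def describe_http_error(error)
  body = error.io.read.to_s
  body.empty? ? error.message : "#{error.message}: #{body}"
end

# Hypothetical wrapper: fetch and parse JSON, printing the error body to
# stderr and returning nil when the server answers with an HTTP error.
def fetch_json(url)
  JSON.parse(URI.open(url).read)
rescue OpenURI::HTTPError => e
  warn describe_http_error(e)
  nil
end
```

Calling `fetch_json` with the URL from the question would then print the 403 response body instead of only the status line.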

RestClient failing to GET resource using SSL client certificate

I'm trying to use RestClient to retrieve a page that's secured using an SSL client certificate. My code is as follows:
require 'restclient'
p12 = OpenSSL::PKCS12.new(File.read('client.p12'), 'password')
client = RestClient::Resource.new('https://example.com/',
:ssl_client_key => p12.key,
:verify_ssl => OpenSSL::SSL::VERIFY_NONE)
client.get
When I run it, I see the following failure:
1.9.3-p374 :007 > client.get
RestClient::BadRequest: 400 Bad Request
from /home/duncan/.rvm/gems/ruby-1.9.3-p374/gems/rest-client-1.6.7/lib/restclient/abstract_response.rb:48:in `return!'
from /home/duncan/.rvm/gems/ruby-1.9.3-p374/gems/rest-client-1.6.7/lib/restclient/request.rb:230:in `process_result'
from /home/duncan/.rvm/gems/ruby-1.9.3-p374/gems/rest-client-1.6.7/lib/restclient/request.rb:178:in `block in transmit'
from /home/duncan/.rvm/rubies/ruby-1.9.3-p374/lib/ruby/1.9.1/net/http.rb:745:in `start'
from /home/duncan/.rvm/gems/ruby-1.9.3-p374/gems/rest-client-1.6.7/lib/restclient/request.rb:172:in `transmit'
from /home/duncan/.rvm/gems/ruby-1.9.3-p374/gems/rest-client-1.6.7/lib/restclient/request.rb:64:in `execute'
from /home/duncan/.rvm/gems/ruby-1.9.3-p374/gems/rest-client-1.6.7/lib/restclient/request.rb:33:in `execute'
from /home/duncan/.rvm/gems/ruby-1.9.3-p374/gems/rest-client-1.6.7/lib/restclient/resource.rb:51:in `get'
from (irb):7
from /home/duncan/.rvm/rubies/ruby-1.9.3-p374/bin/irb:13:in `<main>'
I'm fairly sure this is a failure to authenticate, as I get the same error in a browser if I don't install the client certificate.
I'm using OpenSSL::SSL::VERIFY_NONE because the server has a self-signed certificate, and I believe this is the correct value to pass to ignore that.
Any suggestions on how to get this working would be greatly appreciated - even a pointer to some detailed documentation, or a suggestion of a different Gem could work. I've not had much luck with either the Gem docs or Google :(
Your HTTPS request is going to need the client certificate as well as the key. Try:
client = RestClient::Resource.new('https://example.com/',
:ssl_client_cert => p12.certificate,
:ssl_client_key => p12.key,
:verify_ssl => OpenSSL::SSL::VERIFY_NONE)
If that doesn't work, you can try capturing the handshake packets (e.g. with Wireshark) to verify that the client certificate is actually being offered.
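Before reaching for a packet capture, it may be worth checking that the certificate and key in the PKCS#12 bundle actually belong together. The sketch below builds the two RestClient options from the bundle and raises if they do not match; `client_cert_options` is a hypothetical helper, not part of RestClient.

```ruby
require 'openssl'

# Hypothetical helper: load a PKCS#12 bundle and return the option hash
# used for client-certificate authentication, after a sanity check.
def client_cert_options(p12_data, password)
  p12 = OpenSSL::PKCS12.new(p12_data, password)
  unless p12.certificate.check_private_key(p12.key)
    raise ArgumentError, 'certificate and private key do not match'
  end
  { :ssl_client_cert => p12.certificate, :ssl_client_key => p12.key }
end
```

The result could be merged with `:verify_ssl => OpenSSL::SSL::VERIFY_NONE` and passed to `RestClient::Resource.new`, reading the bundle with `File.binread('client.p12')`.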
