I'm having trouble getting a page's source with Mechanize:
require 'mechanize'
agent = Mechanize.new
page = agent.get("https://#{ip}/")
and I get this error:
/home/lord/.gem/ruby/1.9.1/gems/mechanize-2.4/lib/mechanize/http/agent.rb:682:in `response_authenticate': 401 => Net::HTTPUnauthorized for https://82.144.208.6/cgi-bin/welcome.cgi -- no credentials found, provide some with #add_auth -- available realms: r722 (Mechanize::UnauthorizedError)
from /home/lord/.gem/ruby/1.9.1/gems/mechanize-2.4/lib/mechanize/http/agent.rb:288:in `fetch'
from /home/lord/.gem/ruby/1.9.1/gems/mechanize-2.4/lib/mechanize.rb:407:in `get'
from /home/lord/ruby/ruby_backup/backup-done.ru:35:in `block (2 levels) in <main>'
How can I ignore the HTTP auth and get the source anyway? Thanks.
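Mechanize::UnauthorizedError is a subclass of Mechanize::ResponseCodeError, and that exception has a page accessor holding the page the server sent along with the error status. The documentation describes it: http://mechanize.rubyforge.org/Mechanize/ResponseCodeError.html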
Try:
begin
page = agent.get ...
rescue Mechanize::ResponseCodeError => e
page = e.page
end
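If Mechanize isn't a requirement, Ruby's standard Net::HTTP sidesteps the exception entirely: Net::HTTP.get_response returns 4xx and 5xx responses as ordinary response objects instead of raising, so the body of the 401 page is simply there. This is a minimal sketch, not Mechanize's API; fetch_source_ignoring_status is a hypothetical helper, and it assumes a plain GET is enough. Against an HTTPS host with a self-signed certificate (common when the URL is a bare IP address) you would also have to relax certificate verification.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: returns [status_code, body] without raising on 4xx/5xx.
# Net::HTTP.get_response just hands back the response object, so the page a
# server sends with a 401 (e.g. a login screen) is still readable via #body.
def fetch_source_ignoring_status(url)
  response = Net::HTTP.get_response(URI(url))
  [response.code.to_i, response.body]
end
```

For the question's case that would be code, html = fetch_source_ignoring_status("https://#{ip}/"); if the server answers as in the trace, code is 401 and html holds whatever page came with it.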
I can't seem to get this RSS feed to work properly. I've tried Nokogiri and now RSS::Parser, and neither works:
require 'open-uri'
require 'rss'
a = 'https://phys.org/rss-feed/biology-news/biology-other/'
URI.open(a) do |rss|
feed = RSS::Parser.parse(rss)
puts "Title: #{feed.channel.title}"
feed.items.each do |item|
puts "Item: #{item.title}"
end
end
The code is taken directly out of the docs: https://github.com/ruby/rss
The feed is valid, so I'm confused as to why there's a 400 error code.
What am I doing wrong? Does anybody have insight into how to get this RSS parsed?
Here is the error:
/Users/user3/.rbenv/versions/3.1.2/lib/ruby/3.1.0/open-uri.rb:364:in `open_http': 400 Bad request (OpenURI::HTTPError)
from /Users/user3/.rbenv/versions/3.1.2/lib/ruby/3.1.0/open-uri.rb:741:in `buffer_open'
from /Users/user3/.rbenv/versions/3.1.2/lib/ruby/3.1.0/open-uri.rb:212:in `block in open_loop'
from /Users/user3/.rbenv/versions/3.1.2/lib/ruby/3.1.0/open-uri.rb:210:in `catch'
from /Users/user3/.rbenv/versions/3.1.2/lib/ruby/3.1.0/open-uri.rb:210:in `open_loop'
from /Users/user3/.rbenv/versions/3.1.2/lib/ruby/3.1.0/open-uri.rb:151:in `open_uri'
from /Users/user3/.rbenv/versions/3.1.2/lib/ruby/gems/3.1.0/gems/open_uri_redirections-0.2.1/lib/open-uri/redirections_patch.rb:55:in `open_uri'
from /Users/user3/.rbenv/versions/3.1.2/lib/ruby/3.1.0/open-uri.rb:721:in `open'
from /Users/user3/.rbenv/versions/3.1.2/lib/ruby/3.1.0/open-uri.rb:29:in `open'
from /users/user3/app.rb:1856:in `<main>'
The web server checks the request's User-Agent header and responds with 400 Bad Request when it doesn't get one it accepts (by default, Net::HTTP, which open-uri uses underneath, sends just "Ruby"). Setting an explicit User-Agent fixes it:
require 'uri'
require 'open-uri'
require 'rss'
uri = URI.parse("https://phys.org/rss-feed/biology-news/biology-other/")
uri.open("User-Agent" => "Ruby/#{RUBY_VERSION}") do |rss|
feed = RSS::Parser.parse(rss)
puts "Title: #{feed.channel.title}"
feed.items.each do |item|
puts "Item: #{item.title}"
end
end
This code works for me.
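For comparison, the same header can be set with plain Net::HTTP if you'd rather not go through open-uri. A minimal sketch; get_with_user_agent is a hypothetical helper, and its default agent string simply mirrors the one in the answer above.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: GET a URL with an explicit User-Agent header using
# Net::HTTP directly. Returns the Net::HTTPResponse without raising on errors.
def get_with_user_agent(url, user_agent = "Ruby/#{RUBY_VERSION}")
  uri = URI(url)
  request = Net::HTTP::Get.new(uri.request_uri, 'User-Agent' => user_agent)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
end
```

With require 'rss' loaded, RSS::Parser.parse(get_with_user_agent(url).body) then feeds the XML string to the parser the same way the open-uri block does.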
Thanks for your time. I'm somewhat new to OOP and Ruby, and after synthesizing solutions from a few different Stack Overflow answers I've gotten myself turned around.
My goal is to write a script that parses a CSV of URLs using the Nokogiri library. After trying and failing to use open-uri and the open-uri-redirections plugin to follow redirects, I settled on Net::HTTP, and that got me moving... until I ran into URLs that respond with a 302 redirect specifically.
Here's the method I'm using to engage the URL:
require 'nokogiri'
require 'net/http'
require 'csv'
def fetch(uri_str, limit = 10)
# You should choose a better exception.
raise ArgumentError, 'HTTP redirect too deep' if limit == 0
url = URI.parse(uri_str)
#puts "The value of uri_str is: #{ uri_str}"
#puts "The value of URI.parse(uri_str) is #{ url }"
req = Net::HTTP::Get.new(url.path, { 'User-Agent' => 'Mozilla/5.0 (etc...)' })
# puts "THE URL IS #{url.scheme + ":" + url.host + url.path}" # just a reporter so I can see if it's mangled
response = Net::HTTP.start(url.host, url.port, :use_ssl => url.scheme == 'https') { |http| http.request(req) }
case response
when Net::HTTPSuccess then response
when Net::HTTPRedirection then fetch(response['location'], limit - 1)
else
#puts "Problem clause!"
response.error!
end
end
Further down in my script I take an ARGV with the URL csv filename, do CSV.read, encode the URL to a string, then use Nokogiri::HTML.parse to turn it all into something I can use xpath selectors to examine and then write to an output CSV.
It works beautifully... as long as I get a 200 response, which unfortunately isn't the case for every website. When I run into a 302, I get this:
C:/Ruby24-x64/lib/ruby/2.4.0/Net/http.rb:1570:in `addr_port': undefined method `+' for nil:NilClass (NoMethodError)
from C:/Ruby24-x64/lib/ruby/2.4.0/Net/http.rb:1503:in `begin_transport'
from C:/Ruby24-x64/lib/ruby/2.4.0/Net/http.rb:1442:in `transport_request'
from C:/Ruby24-x64/lib/ruby/2.4.0/Net/http.rb:1416:in `request'
from httpcsv.rb:14:in `block in fetch'
from C:/Ruby24-x64/lib/ruby/2.4.0/Net/http.rb:877:in `start'
from C:/Ruby24-x64/lib/ruby/2.4.0/Net/http.rb:608:in `start'
from httpcsv.rb:14:in `fetch'
from httpcsv.rb:17:in `fetch'
from httpcsv.rb:42:in `block in <main>'
from C:/Ruby24-x64/lib/ruby/2.4.0/csv.rb:866:in `each'
from C:/Ruby24-x64/lib/ruby/2.4.0/csv.rb:866:in `each'
from httpcsv.rb:38:in `<main>'
I know I'm missing something right in front of me, but I can't tell what I should print with puts to find out what is nil. Any help is appreciated; thanks in advance.
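The trace fails in addr_port with a nil receiver on the recursive call from the redirect line (httpcsv.rb:17), which is consistent with the 302's Location header being a relative path such as "/somewhere": URI.parse on a relative path yields a URI with no host or port. That is my reading of the trace, not something the post confirms. A sketch of the same fetch that resolves the location against the URL just requested with URI.join, and uses request_uri so an empty path becomes "/" and query strings survive:

```ruby
require 'net/http'
require 'uri'

# Revised sketch of the question's fetch: follows redirects and resolves
# relative Location headers (e.g. "/login") against the current URL.
def fetch(uri_str, limit = 10)
  raise ArgumentError, 'HTTP redirect too deep' if limit.zero?

  url = URI.parse(uri_str)
  # request_uri keeps the query string and turns an empty path into "/"
  req = Net::HTTP::Get.new(url.request_uri, 'User-Agent' => 'Mozilla/5.0 (etc...)')
  response = Net::HTTP.start(url.host, url.port, use_ssl: url.scheme == 'https') do |http|
    http.request(req)
  end

  case response
  when Net::HTTPSuccess then response
  when Net::HTTPRedirection
    # URI.join leaves an absolute Location unchanged and resolves a relative one
    fetch(URI.join(url.to_s, response['location']).to_s, limit - 1)
  else
    response.error!
  end
end
```

The CSV loop can keep calling it as before, e.g. Nokogiri::HTML.parse(fetch(url).body).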
Using Mechanize 2.6.0 on Ruby 1.9.3, I'm trying to fetch a web page over HTTPS from Windows 7 x64. When I attempt to get() the URL, the CPU usage goes to 100% and the method never returns:
require 'mechanize'
uri = "https://my.com/wiki/api.php?action=query&titles=US4&prop=info&format=xml"
agent = Mechanize.new
u,p = %w[myusername mypassword]
agent.add_auth( uri, u, p )
agent.agent.http.verify_mode = OpenSSL::SSL::VERIFY_NONE
info = agent.get( uri )
When I interrupt it, I get these stack traces (three different runs):
>> info = agent.get( page_api )
IRB::Abort: abort then interrupt!
from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.6.0/lib/mechanize/http/www_authenticate_parser.rb:27:in `call'
from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.6.0/lib/mechanize/http/www_authenticate_parser.rb:27:in `parse'
from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.6.0/lib/mechanize/http/agent.rb:716:in `response_authenticate'
from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.6.0/lib/mechanize/http/agent.rb:306:in `fetch'
from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.6.0/lib/mechanize.rb:431:in `get'
from (irb):10
from C:/Ruby193/bin/irb:12:in `<main>'
>> info = agent.get( page_api )
IRB::Abort: abort then interrupt!
from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.6.0/lib/mechanize/http/www_authenticate_parser.rb:29:in `call'
from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.6.0/lib/mechanize/http/www_authenticate_parser.rb:29:in `new'
from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.6.0/lib/mechanize/http/www_authenticate_parser.rb:29:in `parse'
from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.6.0/lib/mechanize/http/agent.rb:716:in `response_authenticate'
from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.6.0/lib/mechanize/http/agent.rb:306:in `fetch'
from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.6.0/lib/mechanize.rb:431:in `get'
from (irb):11
from C:/Ruby193/bin/irb:12:in `<main>'
>> info = agent.get( page_api )
IRB::Abort: abort then interrupt!
from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.6.0/lib/mechanize/http/www_authenticate_parser.rb:114:in `call'
from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.6.0/lib/mechanize/http/www_authenticate_parser.rb:114:in `token'
from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.6.0/lib/mechanize/http/www_authenticate_parser.rb:31:in `parse'
from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.6.0/lib/mechanize/http/agent.rb:716:in `response_authenticate'
from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.6.0/lib/mechanize/http/agent.rb:306:in `fetch'
from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.6.0/lib/mechanize.rb:431:in `get'
from (irb):12
from C:/Ruby193/bin/irb:12:in `<main>'
How can I work around this problem and properly fetch an HTTPS URL via Ruby on Windows? (If there's a better solution than Mechanize for this, since I only need the page source to feed to Nokogiri anyway, I'm open to not using Mechanize at all.)
Another datapoint: trying the same code on OS X produces the same result.
Here's the actual content of the page, using the alternative fetching method described in my workaround answer below:
p fetch_https_without_ssl_verification(uri, u, p)
#=> "\t\t <?xml version=\"1.0\"?><api><query><normalized><n from=\"Devtools/UI_Composer/DesignSpec/US7294\" to=\"Devtools/UI Composer/DesignSpec/US7294\" /></normalized><pages><page ns=\"0\" title=\"Devtools/UI Composer/DesignSpec/US7294\" missing=\"\" /></pages></query></api>"
If you simply need the contents of the URL (as I do), then shelling out to curl instead of using Mechanize is far easier, and it works:
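require 'open3'
def fetch_https_without_ssl_verification( uri, user=nil, pass=nil )
args = %w[curl -s -k] # -s: silent, -k: skip TLS certificate verification
args += ['-u', pass ? "#{user}:#{pass}" : user] if user # argv array, so no shell quoting is needed
Open3.capture2(*args, uri.to_s).first # curl's stdout
end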
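The same idea without shelling out can be sketched with Ruby's standard Net::HTTP. This is a swapped-in technique, not the poster's method, and it assumes the server accepts HTTP Basic auth: the credentials are sent up front with basic_auth, so no 401 challenge has to be parsed (the traces above all stop inside Mechanize's www_authenticate_parser.rb). fetch_without_verification is a hypothetical name, and like curl -k it turns off certificate checking, which removes protection against man-in-the-middle attacks.

```ruby
require 'net/http'
require 'openssl'
require 'uri'

# Hypothetical stdlib counterpart to the curl helper: GET with optional
# preemptive HTTP Basic auth, skipping TLS certificate verification.
def fetch_without_verification(url, user = nil, pass = nil)
  uri = URI(url)
  request = Net::HTTP::Get.new(uri.request_uri)
  request.basic_auth(user, pass.to_s) if user # sent on the first request
  Net::HTTP.start(uri.host, uri.port,
                  use_ssl: uri.scheme == 'https',
                  verify_mode: OpenSSL::SSL::VERIFY_NONE) do |http|
    http.request(request).body
  end
end
```

Usage would mirror the curl version: xml = fetch_without_verification(uri, u, p), then hand xml to Nokogiri.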
I want to set up watirgrid, but I can't get it working: every time, I get errors related to the controller and provider start-up process. Also, none of the examples I've found on the internet work.
Could anyone please help me implement this? A full working example with steps would be a great help.
I'm trying this code:
require 'rubygems'
require 'watirgrid'
require 'watir'
require 'watir-webdriver'
# setup a controller on port 12351 for your new grid
controller = Controller.new(:ring_server_port => 12351, :loglevel => Logger::ERROR)
controller.start
# add a provider to your grid
# :browser_type => 'webdriver' if using webdriver or
# :browser_type => 'ie' if using watir...
provider = Provider.new(:controller_uri => 'druby://127.0.0.1:11235',
:ring_server_port => 12351,
:loglevel => Logger::ERROR,
:browser_type => 'webdriver')
provider.start
# connect to the grid and take all providers from it (this time only one)
grid = Watir::Grid.new(:ring_server_port => 12351, :ring_server_host => '127.0.0.1')
grid.start(:take_all => true)
# for each provider on the grid, launch a new thread to start multiple browsers
threads = []
grid.browsers.each do |browser|
threads << Thread.new do
p browser[:hostname]
p browser[:architecture]
p browser[:browser_type]
# in this case we are starting a new IE browser
b = browser[:object].new_browser(:ie)
b.goto("http://www.google.com")
b.text_field(:name, 'q').set("watirgrid")
b.button(:name, "btnI").click
end
end
threads.each {|thread| thread.join}
And the error I'm getting is:
DRb::DRbConnError: druby://127.0.0.1:11235 - #<Errno::ECONNREFUSED: No connection could be made because the target machine actively refused it. - connect(2)>
from C:/Ruby192/lib/ruby/1.9.1/drb/drb.rb:736:in `rescue in block in open'
from C:/Ruby192/lib/ruby/1.9.1/drb/drb.rb:730:in `block in open'
from C:/Ruby192/lib/ruby/1.9.1/drb/drb.rb:729:in `each'
from C:/Ruby192/lib/ruby/1.9.1/drb/drb.rb:729:in `open'
from C:/Ruby192/lib/ruby/1.9.1/drb/drb.rb:1191:in `initialize'
from C:/Ruby192/lib/ruby/1.9.1/drb/drb.rb:1171:in `new'
from C:/Ruby192/lib/ruby/1.9.1/drb/drb.rb:1171:in `open'
from C:/Ruby192/lib/ruby/1.9.1/drb/drb.rb:1087:in `block in method_missing'
from C:/Ruby192/lib/ruby/1.9.1/drb/drb.rb:1105:in `with_friend'
from C:/Ruby192/lib/ruby/1.9.1/drb/drb.rb:1086:in `method_missing'
from C:/Ruby192/lib/ruby/gems/1.9.1/gems/watirgrid-1.1.5/lib/provider.rb:141:in `start'
from (irb):44
from C:/Ruby192/bin/irb:12:in `<main>'
I had the same problem. I got the provider to start successfully by changing the controller_uri to use the machine's IP address:
:controller_uri => 'druby://machineIPAddress:11235'
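ECONNREFUSED means nothing was accepting connections at the address and port the provider tried. Before starting the provider, checking whether the controller's port is reachable from the provider's machine can narrow down which address to use. A minimal sketch using Ruby's Socket.tcp; port_open? is a hypothetical helper, and 11235 is simply the port from the controller_uri above.

```ruby
require 'socket'

# Hypothetical helper: true if something accepts TCP connections on
# host:port within the timeout, false on refusal, timeout, or DNS failure.
def port_open?(host, port, timeout: 1)
  Socket.tcp(host, port, connect_timeout: timeout) { true } # socket closed after block
rescue SystemCallError, SocketError
  false
end
```

For example, port_open?('127.0.0.1', 11235) versus port_open?('machineIPAddress', 11235) shows which address the controller is actually listening on.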
I'm trying to learn REST in Ruby using the Twitter API.
According to https://dev.twitter.com/docs/api/1/get/trends, I have to send a GET request to http://api.twitter.com/1/trends.json.
My Ruby code is:
require 'rubygems'
require 'rest-client'
require 'json'
url = 'http://api.twitter.com/1/trends.json'
response = RestClient.get(url)
puts response.body
But I'm getting the following error:
/home/danik/.rvm/gems/ruby-1.9.3-p194/gems/rest-client-1.6.7/lib/restclient/abstract_response.rb:48:in `return!': 404 Resource Not Found (RestClient::ResourceNotFound)
from /home/danik/.rvm/gems/ruby-1.9.3-p194/gems/rest-client-1.6.7/lib/restclient/request.rb:230:in `process_result'
from /home/danik/.rvm/gems/ruby-1.9.3-p194/gems/rest-client-1.6.7/lib/restclient/request.rb:178:in `block in transmit'
from /home/danik/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/net/http.rb:745:in `start'
from /home/danik/.rvm/gems/ruby-1.9.3-p194/gems/rest-client-1.6.7/lib/restclient/request.rb:172:in `transmit'
from /home/danik/.rvm/gems/ruby-1.9.3-p194/gems/rest-client-1.6.7/lib/restclient/request.rb:64:in `execute'
from /home/danik/.rvm/gems/ruby-1.9.3-p194/gems/rest-client-1.6.7/lib/restclient/request.rb:33:in `execute'
from /home/danik/.rvm/gems/ruby-1.9.3-p194/gems/rest-client-1.6.7/lib/restclient.rb:68:in `get'
from TwitterTrends.rb:5:in `<main>'
What is wrong?
You are getting that error because the resource you are trying to fetch, http://api.twitter.com/1/trends.json, no longer exists, as explained in the trends docs:
This method is deprecated and has been replaced by GET trends/:woeid.
Please update your applications with the new endpoint.
You want to fetch a URL like https://api.twitter.com/1/trends/1.json instead (1 is the WOEID for worldwide trends). So, in your code, try this:
require 'rubygems'
require 'rest-client'
require 'json'
url = 'https://api.twitter.com/1/trends/1.json'
response = RestClient.get(url)
puts response.body
And you should get a response.
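Once the request succeeds, response.body is a JSON string, and the json library the code already requires can turn it into Ruby objects. A hedged sketch: the shape assumed here (an array whose first element has a "trends" list of objects with a "name" key) is my assumption about the trends/:woeid format, so check it against the docs before relying on it. trend_names is a hypothetical helper.

```ruby
require 'json'

# Hypothetical helper: extracts trend names from a trends/:woeid body.
# ASSUMPTION: the body is a JSON array whose first element holds a "trends"
# array of objects, each with a "name" key. Returns [] if that shape is absent.
def trend_names(body)
  data = JSON.parse(body)
  first = data.is_a?(Array) ? data.first : nil
  return [] unless first.is_a?(Hash)

  Array(first['trends']).map { |trend| trend['name'] }.compact
end
```

With the code above, puts trend_names(response.body) would print one trend per line.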