NoMethodError from Mechanize - ruby

Running this code with mechanize 2.7.3 and ruby 2.3.0dev:
require 'mechanize'
agent = Mechanize.new
agent.keep_alive = false
agent.open_timeout = 2
agent.read_timeout = 2
agent.ignore_bad_chunking = true
agent.gzip_enabled = false
url = 'http:%5C%5Cwww.scouts.org.uk'
agent.head(url)
Gives me this NoMethodError:
~/.rvm/gems/ruby-head/gems/mechanize-2.7.3/lib/mechanize/http/agent.rb:648:in `resolve': undefined method `length' for nil:NilClass (NoMethodError)
from ~/.rvm/gems/ruby-head/gems/mechanize-2.7.3/lib/mechanize/http/agent.rb:223:in `fetch'
from ~/.rvm/gems/ruby-head/gems/mechanize-2.7.3/lib/mechanize.rb:459:in `head'
Is this a bug in Mechanize, or am I doing something wrong? If so, how can it be fixed?
EDIT: The URL is obviously wrong, but I'm reading a lot of URLs from a file and some of them might be wrong.
EDIT2: Let's say I have a file like this: http://pastie.org/9934756
I need to get the head of all the URLs that are correct and ignore the others.

Your URL is wrong. Try this instead: url = 'http://scouts.org.uk'

Your target site redirects and uses meta refresh. Update your code to enable those settings, and rescue the error so that a bad URL does not stop the script:
require 'mechanize'
agent = Mechanize.new
agent.keep_alive = false
agent.follow_meta_refresh = true
agent.redirect_ok = true
agent.open_timeout = 10
agent.read_timeout = 10
agent.ignore_bad_chunking = true
agent.gzip_enabled = false
url = 'http:%5C%5Cwww.scouts.org.uk'
begin
  page_head = agent.head(url)
rescue StandardError => exception # NoMethodError is a StandardError
  puts "Caught exception: #{exception.message}"
end
Result:
=> #Caught exception: undefined method `length' for nil:NilClass

You can add this method to check whether a URL is valid:
require 'uri'
def valid?(url)
  uri = URI.parse(url)
  if uri.kind_of?(URI::HTTP)
    puts '+'
  else
    puts '-'
  end
rescue URI::InvalidURIError
  puts 'false'
end
['http://web.de',
 'http://web.de/',
 'http:%5c%5cweb.de',
 'http:web.de',
 'foo://web.de',
 'http://we b.de',
 'http://|web.de'].each { |i|
  valid?(i)
}
+
+
+
+
-
false
false
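Note that the third '+' in the output belongs to 'http:%5c%5cweb.de', which has the same malformed shape as the URL in the question: URI.parse appears to accept it as an http URI that simply has no host. A stricter sketch, assuming that "valid" here means an http or https URI with a non-empty host; the helper name usable_http_url? is made up for illustration:

```ruby
require 'uri'

# Hypothetical helper: true only for http/https URIs that carry a host.
# URI::HTTPS is a subclass of URI::HTTP, so both schemes pass the type check.
def usable_http_url?(url)
  uri = URI.parse(url)
  uri.is_a?(URI::HTTP) && !uri.host.nil? && !uri.host.empty?
rescue URI::Error
  # Unparseable input counts as unusable.
  false
end

urls = ['http://web.de', 'http:%5c%5cweb.de', 'foo://web.de', 'http://we b.de']

# Keep only the URLs that are worth passing on to agent.head.
good = urls.select { |u| usable_http_url?(u) }
puts good
```

Filtering the list read from the file this way skips the malformed entries before Mechanize ever sees them.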

Related

ZAP automation :undefined method `[]' for nil:NilClass (NoMethodError)

I am getting the above error while trying to get the response from ZAP using Ruby. Below is my code:
Then(/^I should be able to see security warnings$/) do
#Get response from via RestClient framework method.
begin
response = JSON.parse RestClient.get "http://#{$zap_proxy}:#{$zap_proxy_port}/json/core/view/alerts"
rescue RestClient::ServerBrokeConnection
#Classify the alerts
events = response['alerts']
high_risks = events.select{|x| x['risk'] == 'High'}
high_count = high_risks.size
medium_count = events.select{|x| x['risk'] == 'Medium'}.size
low_count = events.select{|x| x['risk'] == 'Low'}.size
informational_count = events.select{|x| x['risk'] == 'Informational'}.size
end
#Check high alert count and print them
if high_count > 0
high_risks.each { |x| p x['alert'] }
end
#Expect high alert count equal to 0
expect(high_count).to eq 0
#Print alerts with risk levels
site = Capybara.app_host
response = JSON.parse RestClient.get "http://#{$zap_proxy}:#{$zap_proxy_port}/json/core/view/alerts",
params: { zapapiformat: 'JSON', baseurl: site }
response['alerts'].each { |x| p "#{x['alert']} risk level: #{x['risk']}"}
end
Could someone please help me? My intention is to print the security alerts and display them on my command prompt.
I think events contains a nil value and you are trying to read x['...'] from nil.
More detail would help, including the line that raises the error.
Edit:
Try events = response['alerts'].reject { |x| x.nil? }
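Another possible cause, judging only from the code as posted: the classification sits inside the rescue branch, which runs only when RestClient.get has raised, so response is still nil at that point and response['alerts'] fails with exactly this error. A minimal sketch of the counting step as a separate method that tolerates a nil response or a missing 'alerts' key. The helper name count_alerts_by_risk and the sample JSON are invented for this example and are not real ZAP output:

```ruby
require 'json'

# Hypothetical helper: tally alerts by risk level.
# Returns a Hash whose missing keys default to 0.
def count_alerts_by_risk(response)
  alerts = (response && response['alerts']) || []
  counts = Hash.new(0)
  # compact drops nil entries, as the reject in the answer above does.
  alerts.compact.each { |alert| counts[alert['risk']] += 1 }
  counts
end

# Sample payload made up for illustration.
sample = JSON.parse('{"alerts":[{"alert":"A","risk":"High"},{"alert":"B","risk":"Low"},{"alert":"C","risk":"Low"}]}')
counts = count_alerts_by_risk(sample)
puts "High: #{counts['High']}, Low: #{counts['Low']}"
```

In the step definition, this would be called after the begin/rescue block rather than inside the rescue branch.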

Undefined method 'host' in rspec

I have the following methods in a Ruby script:
def parse_endpoint(endpoint)
  return URI.parse(endpoint)
end
def verify_url(endpoint, fname)
  url = "#{endpoint}#{fname}"
  req = Net::HTTP.new(url.host, url.port)
  res = req.request_head(url.path)
  if res.code == "200"
    true
  else
    puts "#{fname} is an invalid file"
    false
  end
end
Testing the url manually like so works fine (returns true since the url is indeed valid):
endpoint = parse_endpoint('http://mywebsite.com/mySubdirectory/')
verify_url(endpoint, "myFile.json")
However, when I try to do the following in rspec
describe 'my functionality' do
  let(:endpoint) { parse_endpoint("http://mywebsite.com/mySubdirectory/") }
  it 'should verify valid url' do
    expect(verify_url(endpoint, "myFile.json")).to eq(true)
  end
end
it gives me this error:
"NoMethodError:
undefined method `host' for "http://mywebsite.com/mySubdirectory/myFile.json":String"
What am I doing wrong?
url is a String object, and you are trying to access a method called host which does not exist in String:
url = "#{endpoint}#{fname}"
req = Net::HTTP.new(url.host, url.port)
EDIT: You probably need a URI object. I think this is what you want:
2.2.1 :004 > require 'uri'
=> true
2.2.1 :001 > url = 'http://mywebsite.com/mySubdirectory/'
=> "http://mywebsite.com/mySubdirectory/"
2.2.1 :005 > parsed_url = URI.parse url
=> #<URI::HTTP http://mywebsite.com/mySubdirectory/>
2.2.1 :006 > parsed_url.host
=> "mywebsite.com"
So just add url = URI.parse url before using url.host.
Testing the url manually like so works fine (returns true since the url is indeed valid):
endpoint = parse_endpoint('http://mywebsite.com/mySubdirectory/')
verify_url(endpoint, "myFile.json")
It seems you missed something when you tested the code above (maybe you tested an old version), because it can't work as written.
Look at these lines of code:
url = "#{endpoint}#{fname}"
req = Net::HTTP.new(url.host, url.port)
You're creating a String variable url from the two other variables, endpoint and fname. So far, so good.
But then you call the host method on url, and String has no such method (it exists on the endpoint variable, which is a URI). That's why you get this error.
You may want to use this code instead:
require 'net/http'

def verify_url(endpoint, fname)
  # endpoint is a URI, so merge returns a URI as well
  url = endpoint.merge(fname)
  res = Net::HTTP.start(url.host, url.port) do |http|
    http.head(url.path)
  end
  # printing text from a query method is a bad idea,
  # so just return the boolean instead
  res.code == "200"
end
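The difference between the interpolated String and a URI object can be checked without any network access. A small sketch using only the standard uri library and the URL from the question:

```ruby
require 'uri'

endpoint = URI.parse('http://mywebsite.com/mySubdirectory/')

# String interpolation calls to_s on the URI and yields a plain String,
# which has no host, port or path methods.
as_string = "#{endpoint}myFile.json"
puts as_string.class
puts as_string.respond_to?(:host)

# URI#merge resolves the file name against the endpoint and returns a URI.
as_uri = endpoint.merge('myFile.json')
puts as_uri.host
puts as_uri.path
```

Because merge follows relative-reference resolution, the trailing slash on the endpoint matters: without it, the last path segment would be replaced by the file name instead of being extended.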

Ruby Mechanize Stops Working while in Each Do Loop

I am using a Mechanize Ruby script to loop through about 1,000 records in a tab-delimited file. Everything works as expected until I reach about 300 records.
Once I get to about 300 records, my script keeps calling rescue on every attempt and eventually stops working. I thought it was because I had not properly set max_history, but that doesn't seem to be making a difference.
Here is the error message that I start getting:
getaddrinfo: nodename nor servname provided, or not known
Any ideas on what I might be doing wrong here?
require 'mechanize'
result_counter = 0
used_file = File.open(ARGV[0])
total_rows = used_file.readlines.size
mechanize = Mechanize.new { |agent|
agent.open_timeout = 10
agent.read_timeout = 10
agent.max_history = 0
}
File.open(ARGV[0]).each do |line|
item = line.split("\t").map {|item| item.strip}
website = item[16]
name = item[11]
if website
begin
tries ||= 3
page = mechanize.get(website)
primary1 = page.link_with(text: 'text')
secondary1 = page.link_with(text: 'other_text')
contains_primary = true
contains_secondary = true
unless contains_primary || contains_secondary
1.times do |count|
result_counter+=1
STDERR.puts "Generate (#{result_counter}/#{total_rows}) #{name} - No"
end
end
for i in [primary1]
if i
page_to_visit = i.click
page_found = page_to_visit.uri
1.times do |count|
result_counter+=1
STDERR.puts "Generate (#{result_counter}/#{total_rows}) #{name}"
end
break
end
end
rescue Timeout::Error
STDERR.puts "Generate (#{result_counter}/#{total_rows}) #{name} - Timeout"
rescue => e
STDERR.puts e.message
STDERR.puts "Generate (#{result_counter}/#{total_rows}) #{name} - Rescue"
end
end
end
You get this error because you don't close the connection after using it.
This should fix your problem:
mechanize = Mechanize.new { |agent|
agent.open_timeout = 10
agent.read_timeout = 10
agent.max_history = 0
agent.keep_alive = false
}

How to exit from async call when url timeout with ruby/curb

I am using Ruby curb to call multiple urls at once, e.g.
require 'rubygems'
require 'curb'
easy_options = {:follow_location => true}
multi_options = {:pipeline => true}
Curl::Multi.get(['http://www.example.com','http://www.trello.com','http://www.facebook.com','http://www.yahoo.com','http://www.msn.com'], easy_options, multi_options) do|easy|
# do something interesting with the easy response
puts easy.last_effective_url
end
The problem is that I want to stop the remaining async calls as soon as any URL times out. Is that possible?
As far as I know the current API doesn't expose the Curl::Multi instance, since otherwise you could do:
stop_everything = proc { multi.cancel! }
multi = Curl::Multi.get(array_of_urls, on_failure: stop_everything)
The easiest way might be to patch Curl::Multi.http so that it returns the m variable.
See https://github.com/taf2/curb/blob/master/lib/curl/multi.rb#L85
I think this will do exactly what you ask for:
require 'rubygems'
require 'curb'
responses = {}
requests = ['http://www.example.com','http://www.trello.com','http://www.facebook.com','http://www.yahoo.com','http://www.msn.com']
m = Curl::Multi.new
requests.each do |url|
  responses[url] = ""
  c = Curl::Easy.new(url) do |curl|
    curl.follow_location = true
    curl.on_body { |data| responses[url] << data; data.size }
    curl.on_success { |easy| puts easy.last_effective_url }
    # flag the failure so the perform loop below can cancel the rest
    curl.on_failure { |easy| puts "ERROR:#{easy.last_effective_url}"; @should_stop = true }
  end
  m.add(c)
end
m.perform { m.cancel! if @should_stop }

Using ruby to retrieve a document from a website

I have written a script in Ruby that navigates through a website and gets to a form page. Once the form is filled out, the script hits the submit button and a dialog box opens asking where to save the file. I am having trouble getting this file. I have searched the web and can't find anything. How would I go about retrieving the file name of the document?
I would really appreciate it if someone could help me.
My code is below:
browser = Mechanize.new
## CONSTANTS
LOGIN_URL = 'https://business.airtricity.com/ews/welcome.jsp'
HOME_PAGE_URL = 'https://business.airtricity.com/ews/welcome.jsp'
CONSUMPTION_REPORT_URL = 'https://business.airtricity.com/ews/touConsChart.jsp?custid=209495'
LOGIN = ""
PASS = ""
MPRN_GPRN_LCIS = "10000001534"
CONSUMPTION_DATE = "20/01/2013"
END_DATE = "27/01/2013"
DOWNLOAD = "DL"
### Login page
begin
login_page = browser.get(LOGIN_URL)
rescue Mechanize::ResponseCodeError => exception
login_page = exception.page
end
puts "+++++++++"
puts login_page.links
puts "+++++++++"
login_form = login_page.forms.first
login_form['userid'] = LOGIN
login_form['password'] = PASS
login_form['_login_form_'] = "yes"
login_form['ipAddress'] = "137.43.154.176"
login_form.submit
## home page
begin
home_page = browser.get(HOME_PAGE_URL)
rescue Mechanize::ResponseCodeError => exception
home_page = exception.page
end
puts "----------"
puts home_page.links
puts "----------"
# Consumption Report
begin
Report_Page = browser.get(CONSUMPTION_REPORT_URL)
rescue Mechanize::ResponseCodeError => exception
Report_Page = exception.page
end
puts "**********"
puts Report_Page.links
pp Report_Page
puts "**********"
Report_Form = Report_Page.forms.first
Report_Form['entity1'] = MPRN_GPRN_LCIS
Report_Form['start'] = CONSUMPTION_DATE
Report_Form['end'] = END_DATE
Report_Form['charttype'] = DOWNLOAD
Report_Form.submit
## Download Report
begin
browser.pluggable_parser.csv = Mechanize::Download
Download_Page = browser.get('https://business.airtricity.com/ews/touConsChart.jsp?custid=209495/meter_read_download_2013-1-20_2013-1-27.csv').save('Hello')
rescue Mechanize::ResponseCodeError => exception
Download_Page = exception.page
end
http://mechanize.rubyforge.org/Mechanize.html#method-i-get_file
Downloading a file from a URL is pretty straightforward with Mechanize:
browser = Mechanize.new
file_url = 'https://raw.github.com/ragsagar/ragsagar.github.com/c5caa502f8dec9d5e3738feb83d86e9f7561bd5e/.html'
downloaded_file = browser.get_file file_url
File.open('new_file.txt', 'w') { |file| file.write downloaded_file }
I've seen automation fail because of the browser's user agent. Perhaps you could try:
browser.user_agent_alias = "Windows Mozilla"
