Crawl data using Ruby Mechanize

I am crawling data from http://www.mca.gov.in/DCAPortalWeb/dca/MyMCALogin.do?method=setDefaultProperty&mode=53
Below is the code I have tried:
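require 'mechanize'

uri = "http://www.mca.gov.in/DCAPortalWeb/dca/MyMCALogin.do?method=setDefaultProperty&mode=53"
#html, html_content = #mobj.get_data(uri)
agent = Mechanize.new
html_page = agent.get uri
html_form = html_page.form
# check the radio button named 'search' whose value is '2'
html_form.radiobuttons_with(:name => 'search', :value => '2')[0].check
# submit returns the response page; html_page still holds the original form page
result_page = html_form.submit
puts result_page.content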
Error:
/var/lib/gems/1.9.1/gems/mechanize-2.7.3/lib/mechanize/http/agent.rb:308:in `fetch': 500 => Net::HTTPInternalServerError for http://www.mca.gov.in/DCAPortalWeb/dca/ProsecutionDetailsSRAction.do -- unhandled response (Mechanize::ResponseCodeError)
from /var/lib/gems/1.9.1/gems/mechanize-2.7.3/lib/mechanize.rb:1281:in `post_form'
from /var/lib/gems/1.9.1/gems/mechanize-2.7.3/lib/mechanize.rb:548:in `submit'
from /var/lib/gems/1.9.1/gems/mechanize-2.7.3/lib/mechanize/form.rb:223:in `submit'
from ministry_corp_aff.rb:32:in `start'
from ministry_corp_aff.rb:52:in `<main>'
If I manually click the 3rd radio button and then submit, I get a .zip file. I am trying to extract data from the .xls file inside that zip.

The radio button has an onclick event handler that triggers some JavaScript. In addition, clicking the Submit <a> tag also runs JavaScript. That JavaScript probably sets some values that are sent along with the form, which the server then examines.
Mechanize cannot execute JavaScript. You need a browser driver such as Selenium WebDriver for that.
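Mechanize can't run those handlers, but if they only copy fixed values into hidden fields, you can sometimes reproduce them by hand. A rough, stdlib-only sketch of that idea: it scans inline onclick attributes for literal `something.fieldName.value = '...'` assignments. The sample markup and the field names `searchMode` and `hdnAction` are made up for illustration; the real handlers on the MCA page would have to be read to know what they set, and anything computed at runtime will be missed.

```ruby
# Heuristic only: collects literal assignments such as
#   document.forms[0].fieldName.value = 'x'
# from inline onclick handlers. Values computed at runtime are not found.
ONCLICK = /onclick\s*=\s*"([^"]*)"/i
FIELD_ASSIGNMENT = /\.(\w+)\.value\s*=\s*['"]([^'"]*)['"]/

def js_field_assignments(html)
  fields = {}
  html.scan(ONCLICK).flatten.each do |handler|
    handler.scan(FIELD_ASSIGNMENT) { |name, value| fields[name] = value }
  end
  fields
end

# Hypothetical markup resembling a radio button plus a JavaScript "Submit" link
sample = <<~'HTML'
  <input type="radio" name="search" value="2" onclick="document.forms[0].searchMode.value='2'; toggle();">
  <a href="#" onclick="document.forms[0].hdnAction.value='download'; document.forms[0].submit();">Submit</a>
HTML

js_field_assignments(sample).each { |name, value| puts "#{name} = #{value}" }
# searchMode = 2
# hdnAction = download
```

Each pair found this way could then be set on the Mechanize form with `html_form[name] = value` before calling `submit`. If the handlers do more than copy constants (build URLs, call other functions), driving a real browser is the safer route.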

Related

How to set a new default download directory or folder after clicking a download button on a page? I am using Cucumber with Watir

Error message:
Then(/^the user clicks on "(.*)" download on "(.*)" page$/) do |field_name, page_name|
  # get the XPATH or CSS from page object file, Raises Error if not found
  begin
    selector, element_path = get_element_target(field_name, page_name).split('^^')
  rescue
    fail("Element Xpath is not found for #{field_name} in #{page_name} page objects File")
  end
  if selector.nil? || element_path.nil?
    fail("Element Xpath is not found for #{field_name} in #{page_name} page objects File")
  end
  selector = (selector.downcase.include? 'xpath') ? :xpath : :css
  # Create the Element object
  element_obj = @browser.element(selector, element_path)
  # Wait for element to be present
  wait_for_element(element_obj)
  # Focus on element to make it visible
  focus_on_element(element_obj)
  DOWNLOAD_DIR = "#{Dir.pwd}" + '/features/support/Downloads/'
  element_obj.click(DOWNLOAD_DIR)
end
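As far as I know, Watir's `click` only accepts modifier keys, so the download folder can't be passed at click time; with Chrome it is a browser preference that is set when the browser is launched. A minimal sketch, assuming Chrome and a Watir version that accepts `options: { prefs: ... }` (check the docs for your version). The helper name `chrome_download_prefs` is my own, and the Watir call is left in a comment because it needs a real browser.

```ruby
require 'fileutils'

# Hypothetical helper (not part of Watir): Chrome preferences that send
# downloads to +dir+ without showing a save dialog. Creates +dir+ if needed.
def chrome_download_prefs(dir)
  path = File.expand_path(dir)
  FileUtils.mkdir_p(path)
  {
    download: {
      prompt_for_download: false,
      default_directory: path
    }
  }
end

# Usage (needs the watir gem and chromedriver, so it is not executed here);
# the browser must be created with these prefs before the download click:
#   download_dir = File.join(Dir.pwd, 'features', 'support', 'Downloads')
#   @browser = Watir::Browser.new :chrome,
#                                 options: { prefs: chrome_download_prefs(download_dir) }
#   element_obj.click
```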

Using Mechanize to log into https://kindle.amazon.com/login

I am trying to use Mechanize to log into my Kindle account at Amazon.
The login page URL is https://kindle.amazon.com/login
I can manually log into this page without issue but if I try it using the following code it always fails with an error (see screenshot below).
require 'mechanize'
mechanize_agent = Mechanize.new
mechanize_agent.user_agent_alias = 'Windows Mozilla'
signin_page = mechanize_agent.get("https://kindle.amazon.com/login")
signin_form = signin_page.form("signIn")
signin_form.email = "email@example.com"
signin_form.password = "password"
post_signin_page = mechanize_agent.submit(signin_form)
This is always the resulting page (again, I'm certain my script is using valid values):
It looks like Mechanize is submitting the form without the proper action. Try finding the Continue button and sending the form with that button:
# ...
submit_button = signin_form.buttons.find { |b| b.value == "Continue" }
post_signin_page = mechanize_agent.submit signin_form, submit_button
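Why the button can matter: when a form is submitted by clicking a named submit button, the browser adds that button's own name=value pair to the request, and Mechanize only does the same when the button is passed to `submit`. A stdlib-only model of that difference (this is not Mechanize's code, and the button name `signIn` is hypothetical):

```ruby
require 'uri'

# Model of an application/x-www-form-urlencoded body: the form's fields,
# plus the clicked button's [name, value] pair when there is one.
def form_body(fields, clicked_button = nil)
  pairs = fields.to_a
  pairs << clicked_button if clicked_button
  URI.encode_www_form(pairs)
end

fields = { 'email' => 'email@example.com', 'password' => 'password' }

puts form_body(fields)
# email=email%40example.com&password=password
puts form_body(fields, ['signIn', 'Continue'])
# email=email%40example.com&password=password&signIn=Continue
```

If the server decides what to do based on that extra pair, a request without it may be handled differently, which would fit the symptom in the question.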

Mechanize form submission

I have a website that I am attempting to scrape using Mechanize.
When I submit the form, it is submitted with a URL of the following format:
https://www.website.com/Login/Options?returnURL=some_form_options
(If I enter that URL in the browser, it will send me to a nice error page saying that the requested page does not exist)
Whereas if I submit the form from the website, the resulting URL has the following format:
https://www.website.com/topic/country/list_of_form_options
The website has a login form that is not necessary to fill in to be able to submit a search query.
Any idea why I get a different URL when submitting the same form with Mechanize? And how can I counter that?
I cannot process the URL I get after "mechanizing" the form.
Thanks!
Find the exact form you want to submit and submit it. If you can't find the right path, you can even add form fields with Mechanize yourself and then submit the form. Here is the code I used in my project.
I created a rake task for it:
namespace :test_namespace do
  task :mytask => [:environment] do
    site = "http://www.website.com/search/search.aspx?term=search term"
    # prepare user agent
    ua = Mechanize.new
    page = ua.get(site)
    while true
      page.search("//div[@class='resultsNoBackground']").each do |res|
        puts res.at("table").at('tr').at('td').text
        link_text = res.at_css('strong').at('a').text
        link_href = res.at_css('strong').at('a')['href']
        link_href = "http://www.website.com" + link_href
        page_content = ''
        res.css('span').each do |ss|
          ss.css('strong').remove
          page_content = ss.text.gsub(/Vi.*s\)/, '')
        end
        # puts "HERE IS THE SUMMMER ......#{content_summery}"
      end
      # follow the "Next" pager link by emulating ASP.NET's __doPostBack
      if page.search("#ctl00_ContentPlaceHolder1_ctrlResults_gvResults_ctl01_lbNext").count > 0
        form = page.forms.first
        form.add_field! "__EVENTTARGET", "ctl00$ContentPlaceHolder1$ctrlResults$gvResults$ctl01$lbNext"
        form.add_field! "__EVENTARGUMENT", ""
        page = form.submit
      else
        break
      end
    end
  end
end
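The two `add_field!` calls imitate what an ASP.NET LinkButton's `javascript:__doPostBack(target, argument)` does: it fills the hidden `__EVENTTARGET` and `__EVENTARGUMENT` inputs and posts the whole form, including `__VIEWSTATE` and the other hidden state. One caveat, as I understand Mechanize: `add_field!` always appends a new field, so if the page already contains those hidden inputs the name is sent twice, whereas `form['__EVENTTARGET'] = ...` updates the existing field (or adds it if missing). A stdlib-only sketch of the intended request body; the field values are placeholders.

```ruby
require 'uri'

# Builds the body a __doPostBack(target, argument) call would post: all of
# the form's current fields (hidden __VIEWSTATE etc. included) with the two
# event fields overwritten rather than appended, so each name appears once.
def postback_body(form_fields, event_target, event_argument = '')
  fields = form_fields.merge('__EVENTTARGET' => event_target,
                             '__EVENTARGUMENT' => event_argument)
  URI.encode_www_form(fields)
end

current = { '__EVENTTARGET' => '', '__EVENTARGUMENT' => '', '__VIEWSTATE' => 'abc' }
puts postback_body(current, 'ctl00$ContentPlaceHolder1$ctrlResults$gvResults$ctl01$lbNext')
```

In the rake task above, that would mean setting `form['__EVENTTARGET']` and `form['__EVENTARGUMENT']` in place of the two `add_field!` lines when the page already has those inputs.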

Ruby form auto submission Mechanize::ResponseCodeError

submit_form = agent.get("http://sample.com/NewTask.aspx").form("aspnetForm") do |f|
  # no trailing commas: `a = 1, b = 2` would assign an array to the first field
  f["ctl00$ContentPlaceHolder1$txtNumber"] = "1234"
  f["ctl00$ContentPlaceHolder1$cmbText"] = "test"
  f["ctl00$ContentPlaceHolder1$FUpload$fu"] = ""
  f["ctl00$ContentPlaceHolder1$btn"] = "test"
  f.submit(f.button_with(:name => "ctl00$ContentPlaceHolder1$btnOK"))
end
This is the code I wrote to auto-submit the form using the Mechanize library for Ruby, but it fails with the Mechanize::ResponseCodeError below. I really don't see any error in my code. Could anyone let me know whether this is a code error or something on the server side (say, the server preventing automated form submission)?
C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.4/lib/mechanize/http/agent.rb:291:in `fetch': 500 => Net::HTTPInternalServerError for http://sample.com/NewTask.aspx -- unhandled response (Mechanize::ResponseCodeError)
from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.4/lib/mechanize.rb:1207:in `post_form'
from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.4/lib/mechanize.rb:515:in `submit'
from C:/Ruby193/lib/ruby/gems/1.9.1/gems/mechanize-2.4/lib/mechanize/form.rb:178:in `submit'
from auto_post.rb:27:in `block in <main>'
from (eval):23:in `form_with'
from auto_post.rb:13:in `<main>'
You need to route Mechanize through a debugging proxy such as Fiddler or Charles:
agent.set_proxy 'localhost', 8888
Then route your browser through the same proxy and compare the two requests.
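Once both requests are captured, comparing them field by field is usually the quickest check. A small stdlib helper for that, assuming both bodies are `application/x-www-form-urlencoded` and copied from the proxy's raw request view; the sample values are invented. If the form is posted as `multipart/form-data` (the `FUpload` file field suggests it may be), this parser won't apply and the parts have to be compared by eye.

```ruby
require 'uri'

# Field-by-field comparison of two urlencoded POST bodies. Repeated names
# collapse to their last value, which is fine for a quick first look.
def diff_form_bodies(browser_body, mechanize_body)
  browser = URI.decode_www_form(browser_body).to_h
  mech = URI.decode_www_form(mechanize_body).to_h
  {
    missing: browser.keys - mech.keys,   # sent by the browser only
    extra: mech.keys - browser.keys,     # sent by Mechanize only
    changed: (browser.keys & mech.keys).reject { |k| browser[k] == mech[k] }
  }
end

diff = diff_form_bodies('__VIEWSTATE=abc&txtNumber=1234&btnOK=OK',
                        '__VIEWSTATE=xyz&txtNumber=1234')
puts diff[:missing].inspect  # ["btnOK"]
puts diff[:changed].inspect  # ["__VIEWSTATE"]
```

A differing `__VIEWSTATE` alone is expected between two separate page loads; fields that are missing entirely tend to be the more telling part.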

Ruby Mechanize 405 Net::HTTPMethodNotAllowed Error While Scraping Fedex Billing

I have a script that goes into Fedex Billing each week when they mail me my invoice, digs out information and posts it to xpenser.com. After the recent Fedex Billing site redesign, when I run this code:
require 'mechanize'
require 'pp'
agent = Mechanize.new
page = agent.get 'http://fedex.com/us/fcl/pckgenvlp/online-billing/'
form = page.form_with(:name => 'logonForm')
form.username = FEDEX['username']
form.password = FEDEX['password']
page = agent.submit form
pp page
I receive this error:
Mechanize::ResponseCodeError: 405 => Net::HTTPMethodNotAllowed
I see there is a JavaScript auth function that seems to build a URL that sets hidden variables. I've tried passing in various combinations of variable strings without success.
While Mechanize doesn't support JavaScript, it will pass in variable strings, and if you hit the correct one, you can authenticate that way. I'm hoping to do that here.
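If the auth function really does assemble a URL out of hidden variables, one way to replay it is to build that URL yourself and request it directly. A stdlib sketch of just the URL-building step; the base URL and the field names are placeholders, since the real ones would have to be read out of the site's JavaScript. A 405 means the endpoint rejects the HTTP method Mechanize used, so the commented usage line tries a GET instead; that is a guess, not something the question confirms.

```ruby
require 'uri'

# Hypothetical: append the fields a JavaScript handler would add to a URL
# as a query string, keeping any query the URL already has.
def js_built_url(base_url, fields)
  uri = URI(base_url)
  query = URI.encode_www_form(fields)
  uri.query = uri.query ? "#{uri.query}&#{query}" : query
  uri.to_s
end

url = js_built_url('https://example.com/logon', { 'username' => 'me', 'hiddenToken' => 'abc' })
puts url  # https://example.com/logon?username=me&hiddenToken=abc

# With Mechanize (not run here), the GET would then be:
#   page = agent.get(url)
```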
Using mechanize-1.0.0 the following works:
require 'mechanize'
require 'pp'
agent = Mechanize.new
page = agent.get 'http://fedex.com/us/fcl/pckgenvlp/online-billing/'
form = page.form_with(:name => 'logonForm')
form.username = FEDEX['username']
form.password = FEDEX['password']
form.add_field!('field_name', 'Page$2')
page = agent.submit form
pp page
Try this; it may help you.
