Follow post form redirects using Ruby Mechanize - ruby

We're trying to follow POST forms that trigger a redirect before showing their content, using Ruby Mechanize/Nokogiri. One example is the search form on
http://www.chewtonrose.co.uk/
If you hit the "search" button in your browser, you are taken to
http://www.chewtonrose.co.uk/AdvancedSearch/tabid/4280/Default.aspx?view=tn
How can we set up Mechanize to return that second URL?
Is Mechanize even the right tool?

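Yes, Mechanize is a good fit. I checked: in this case you need to submit the form WITH its button.
require 'mechanize'
agent = Mechanize.new
page = agent.get('http://www.chewtonrose.co.uk/')
form = page.forms.first      # assumption: the search form is the first form on the page
button = form.buttons.first  # assumption: the search button is that form's first button
page2 = agent.submit(form, button)
page2.uri # the second URL, after the redirect has been followed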
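Why the button matters: in an HTML form submission, only the button that was actually clicked is sent along with the other fields, and a server can use that name/value pair to decide what to do. The sketch below only illustrates that idea with the standard library. The field name `txtKeyword`, the button name `btnSearch`, and the helper `post_body` are hypothetical and are not taken from the real page.

```ruby
require 'uri'

# Hypothetical field and button names, for illustration only.
FIELDS = { 'txtKeyword' => 'cottage' }.freeze
BUTTON = ['btnSearch', 'Search'].freeze

# Build the URL-encoded POST body for a form submission.
# The button's name/value pair is included only when a button is given.
def post_body(fields, clicked_button = nil)
  pairs = fields.to_a
  pairs += [clicked_button] if clicked_button
  URI.encode_www_form(pairs)
end

puts post_body(FIELDS)          # submitted without a button
puts post_body(FIELDS, BUTTON)  # submitted with the search button
```

If the second body is what the server expects, submitting without the button would plausibly return the original page instead of the redirect.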

Related

How to scrape website with search button

I have this website: codigos. If you look at it, it has a selection field on the left and a Go button on the right. I need to scrape some of the items on the left.
But how can I tell Mechanize in Ruby to access that selection field, run the search, and scrape the result?
I've seen examples with login forms, but I don't know whether they apply to this case.
The <select> tag is contained within a <form> tag, so you need to locate the form and then you can set the option by passing the name of the select list and specifying the appropriate option:
require 'mechanize'
mechanize = Mechanize.new
page = mechanize.get('http://comext.aduana.cl:7001/codigos/')
form = page.forms.first
form["wlw-select_key:{actionForm.opcion}"] = "Aduana"
result_page = form.submit
result_page.uri #=> http://comext.aduana.cl:7001/codigos/buscar.do;jsessionid=2hGwYfzD76WKGfFbXJvmS2yq4K19VnZycJfH8hJMTzRFhln4pTy2!1794372623!-1405983655!8080!-1

When I click a button that leads to another page, how do I get the contents of the new page?

I'm using selenium-webdriver to scrape a website. When the browser clicks the "Next" button, the next page loads, but when I try to find the elements I want, the driver prints contents from the previous page.
Here's my script:
require 'selenium-webdriver'
driver = Selenium::WebDriver.for :firefox
url = 'http://www.airforwarders.org/companies'
page = driver.navigate.to(url)
driver.find_elements(:css=>'.item_main').each{|div|
puts div.text
}
paginationToolbar = driver.find_element(:css=>'.pagination-toolbar')
paginationToolbar.find_elements(:css=>'.btn')[-2].click # Clicking the "next" button
driver.find_elements(:css=>'.item_main').each{|div|
puts div.text # This shows the same stuff from the previous loop
}
If I can get the contents from the new page, this would be no problem. How do I do this?
If you are sure that Selenium clicked the button and the next page loaded, then I think you should add sleep 1 after the click.
The AJAX request had possibly not finished at the moment right after the click.
Try waiting 1-3 seconds before doing anything else.
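A fixed sleep can be too short on a slow connection and wasteful on a fast one. One alternative is to poll for a condition until it holds or a timeout expires. The helper `wait_until` below is a hypothetical, plain-Ruby sketch of that idea, and the counter stands in for a page whose content changes shortly after the click. selenium-webdriver also ships `Selenium::WebDriver::Wait`, whose `until` method works along the same lines.

```ruby
# Poll a block until it returns a truthy value or the timeout elapses.
def wait_until(timeout: 10, interval: 0.2)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "condition not met within #{timeout} seconds"
    end
    sleep interval
  end
end

# Stand-in for a page that updates a moment after the click.
# With a real driver, the block would re-read the page instead, for example
# by comparing the first '.item_main' text with the text seen before the click.
attempts = 0
new_text = wait_until(timeout: 2, interval: 0.01) do
  attempts += 1
  attempts >= 3 ? 'page 2 content' : nil
end
puts new_text
```

In the script above, the block would typically return the new elements only once their text differs from what the first loop printed.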

How to use Mechanize on a page with no form?

I am trying to write a website crawler with Mechanize, and I found that my target website is written in a SPA fashion, and although there are a bunch of text fields and buttons, there is no form!
How can I use mechanize to fill text fields and click buttons outside forms?
I had the exact same problem you did. I ended up using 'capybara', 'launchy' and 'selenium-webdriver' to do what 'mechanize' would have done in a non-JavaScript environment.
Let's say agent is a Mechanize object and page is a Mechanize::Page.
You can do:
form = Mechanize::Form.new page.at('body'), agent
Now the form is initialized with all the fields and buttons on the page.
You will need to set the action and method yourself:
form.action = 'http://foo.com'
form.method = 'POST'
next_page = form.submit

Mechanize and invisible search form

I'm trying to perform a search on a website using Mechanize, but I can't submit the search form because Mechanize does not see any forms. page.form returns nil, and page = agent.get returns just {forms}> while I expect something like
<Mechanize::Form
{name "somename"}
{method "GET"}
{action "/search"}
Is it because the search form uses JavaScript? Is there any way to solve this? Or is the only option to give up on Mechanize and use something else?
It means there's no form on that page. The workaround is to get the next page, the one that's pretending to be a form submit.
In other words when I type 'foo' into the search box and click the button I get redirected to:
http://s.weibo.com/weibo/foo&Refer=index
So just get that page and do something with it.
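A minimal sketch of that workaround, building the results URL directly from the search term. The URL pattern is copied from the single redirect observed above, so it is an assumption that it holds for other terms. The helper name `weibo_search_url` and the choice of percent-encoding are also assumptions.

```ruby
require 'erb'

# Build the search-results URL for a term.
# Assumption: the pattern matches the one redirect observed for 'foo'.
def weibo_search_url(term)
  "http://s.weibo.com/weibo/#{ERB::Util.url_encode(term)}&Refer=index"
end

puts weibo_search_url('foo')
puts weibo_search_url('ruby mechanize')
```

The resulting string can then be passed to agent.get in place of a form submission.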

Mechanize breaks on ASP page

require 'mechanize'
agent = Mechanize.new
login = agent.get('http://www.schoolnet.ch/DE/HomeDE.htm')
agent.click login.link_with text: /Login/
And I get Mechanize::UnsupportedSchemeError.
Mechanize doesn't support JavaScript, but you can add a field to the form, assign a value to it, and submit the form with Mechanize:
form = page.forms.first
form.add_field! "Field_name here","BotM$ucUser$ucUser2Col$cmdLogin"
page = form.submit
The link in question runs a javascript function.
Login
Mechanize doesn't support javascript links. Someone else suggests using Harmony.
Check https://github.com/mynyml/harmony
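On ASP.NET Web Forms pages, links like this often call `__doPostBack(target, argument)`, which fills the hidden fields `__EVENTTARGET` and `__EVENTARGUMENT` and then submits the page's form. The sketch below assumes the Login link follows that convention. The href shown is hypothetical (the page's actual markup is not reproduced here), and `postback_fields` is an illustrative helper, not part of Mechanize.

```ruby
# Matches hrefs such as javascript:__doPostBack('target','argument').
DO_POSTBACK = /\Ajavascript:\s*__doPostBack\(\s*'([^']*)'\s*,\s*'([^']*)'\s*\)/

# Return the hidden-field values a browser would set before submitting,
# or nil when the href is not a __doPostBack link.
def postback_fields(href)
  match = DO_POSTBACK.match(href.to_s)
  return nil unless match
  { '__EVENTTARGET' => match[1], '__EVENTARGUMENT' => match[2] }
end

# Hypothetical href; the control name is the one quoted in the answer above.
href = %q{javascript:__doPostBack('BotM$ucUser$ucUser2Col$cmdLogin','')}
p postback_fields(href)
p postback_fields('http://www.schoolnet.ch/DE/HomeDE.htm')
```

If the assumption holds, setting those two fields on the page's form and submitting it would stand in for the click that Mechanize cannot perform.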
