How to use Mechanize on a page with no form? - ruby

I am trying to write a website crawler with Mechanize, and I found that my target website is written in a SPA fashion, and although there are a bunch of text fields and buttons, there is no form!
How can I use mechanize to fill text fields and click buttons outside forms?

I had the exact same problem. I ended up using 'capybara', 'launchy' and 'selenium-webdriver' to do what 'mechanize' would have done in a non-JavaScript environment.

Let's say agent is a Mechanize object and page is a Mechanize::Page.
You can do:
form = Mechanize::Form.new page.at('body'), agent
Now the form is initialized with all the fields and buttons on the page.
You will need to set the action and method yourself:
form.action = 'http://foo.com'
form.method = 'POST'
next_page = form.submit

Related

Follow post form redirects using Ruby Mechanize

We're trying to follow post forms that initialize redirects before showing their content using ruby Mechanize/Nokogiri. One example would be the search form on
http://www.chewtonrose.co.uk/
... if you hit the "search" button on your browser, you get taken to
http://www.chewtonrose.co.uk/AdvancedSearch/tabid/4280/Default.aspx?view=tn
How could we set up Mechanize to return that second URL?
Is Mechanize even the right tool?
Yes, Mechanize is fine for this. I checked: in this case you need to submit the form WITH its button.
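require 'mechanize'
agent = Mechanize.new
page = agent.get('http://www.chewtonrose.co.uk/')
form = page.forms.first      # assumption: the search form is the first form on the page
button = form.buttons.first  # assumption: the search button is that form's first button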
page2 = agent.submit(form, button)
page2.uri # will show your 2nd url

How to scrape website with search button

I have this website: codigos. If you look at it, it has a selection field on the left and a Go button on the right. I need to scrape some of the items listed on the left.
But how can I tell Mechanize in Ruby to access that selection field, run the search, and scrape the results?
I've seen examples with login forms but I don't know if it can really suit this case though.
The <select> tag is contained within a <form> tag, so you need to locate the form and then you can set the option by passing the name of the select list and specifying the appropriate option:
require 'mechanize'
mechanize = Mechanize.new
page = mechanize.get('http://comext.aduana.cl:7001/codigos/')
form = page.forms.first
form["wlw-select_key:{actionForm.opcion}"] = "Aduana"
result_page = form.submit
result_page.uri #=> http://comext.aduana.cl:7001/codigos/buscar.do;jsessionid=2hGwYfzD76WKGfFbXJvmS2yq4K19VnZycJfH8hJMTzRFhln4pTy2!1794372623!-1405983655!8080!-1

Ruby Mechanize: Programmatically Clicking a Link Without Knowing the Name of the Link

I am writing a ruby script to search the web. Here is the code:
require 'mechanize'
mechanize = Mechanize.new
page = mechanize.get('http://www.example.com/')
example_page = page.link_with(:text => 'example').click
puts example_page.body
The code above works all right. The text 'example' (:text => 'example') has to be a link on the page for the code to work correctly. The problem, however, is that when I do a web search (Bing, Yahoo, Google, etc.), hundreds of links show up. How can I programmatically click a link without knowing its exact name? I want to be able to click a link if its name partly (or fully) matches a text that I specify, or if it has a certain URL. Any help would be appreciated.
Mechanize matchers accept regular expressions:
page.link_with(text: /foo/).click
page.link_with(href: /foo/).click
Here are the Mechanize criteria that generally work for links and forms:
name: name_matcher
id: id_matcher
class: class_matcher
search: search_expression
xpath: xpath_expression
css: css_expression
action: action_matcher
...
If you're curious, here's the Mechanize ElementMatcher code

Mechanize and invisible search form

I'm trying to perform a search on a website using Mechanize, but I can't submit the search form because Mechanize does not see any forms. page.form returns nil, and page = agent.get returns just {forms}> while I expect something like
<Mechanize::Form
{name "somename"}
{method "GET"}
{action "/search"}
Is it because the search form uses JavaScript? Is there any way to solve this, or is the only option to give up on Mechanize and use something else?
It means there's no form on that page. The workaround is to get the next page, the one that's pretending to be a form submit.
In other words when I type 'foo' into the search box and click the button I get redirected to:
http://s.weibo.com/weibo/foo&Refer=index
So just get that page and do something with it.

Mechanize breaks on ASP page

require 'mechanize'
agent = Mechanize.new
login = agent.get('http://www.schoolnet.ch/DE/HomeDE.htm')
agent.click login.link_with text: /Login/
And I get Mechanize::UnsupportedSchemeError.
Mechanize doesn't support JavaScript, but you can add a field to the form, assign the value to it, and submit the form using Mechanize:
form = page.forms.first
form.add_field! "Field_name here","BotM$ucUser$ucUser2Col$cmdLogin"
page = form.submit
The link in question runs a javascript function.
Login
Mechanize doesn't support javascript links. Someone else suggests using Harmony.
Check https://github.com/mynyml/harmony
