Mechanize breaks on ASP page - ruby

require 'mechanize'
agent = Mechanize.new
login = agent.get('http://www.schoolnet.ch/DE/HomeDE.htm')
agent.click login.link_with text: /Login/
And I get Mechanize::UnsupportedSchemeError.

Mechanize doesn't support JavaScript, but you can add the missing field to the form yourself, assign the value to it, and submit the form with Mechanize:
form = login.forms.first   # `login` is the page fetched above
form.add_field! "field_name_here", "BotM$ucUser$ucUser2Col$cmdLogin"
page = form.submit
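On ASP.NET WebForms pages a JavaScript login link usually calls __doPostBack(), which writes the control name into the hidden __EVENTTARGET field before posting the form. Here is a sketch of that workaround, assuming the login form is the first form on the page; the field and control names may need adjusting for the actual markup:
require 'mechanize'

agent = Mechanize.new
page  = agent.get('http://www.schoolnet.ch/DE/HomeDE.htm')
form  = page.forms.first   # assumption: the login form is the first form on the page

# Emulate what __doPostBack() would do in the browser. add_field! appends a new field;
# if the page already renders __EVENTTARGET, set its value instead of adding it again.
form.add_field!('__EVENTTARGET', 'BotM$ucUser$ucUser2Col$cmdLogin')
form.add_field!('__EVENTARGUMENT', '')

logged_in = form.submit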

The link in question ("Login") runs a JavaScript function.
Mechanize doesn't support JavaScript links. Someone else suggests using Harmony.
Check https://github.com/mynyml/harmony

Related

Follow post form redirects using Ruby Mechanize

We're trying to follow POST forms that trigger redirects before showing their content, using Ruby Mechanize/Nokogiri. One example is the search form on
http://www.chewtonrose.co.uk/
... if you hit the "search" button in your browser, you get taken to
http://www.chewtonrose.co.uk/AdvancedSearch/tabid/4280/Default.aspx?view=tn
How could we set up Mechanize to return that second URL?
Is Mechanize even the right tool?
Yes, Mechanize is a good fit. I checked; in this case you need to submit the form with the button:
agent = Mechanize.new
page = agent.get('http://www.chewtonrose.co.uk/')
form = page.forms.first       # locate the search form (forms.first is a guess; adjust as needed)
button = form.buttons.first   # the "search" button (again, adjust if it isn't the first button)
page2 = agent.submit(form, button)
page2.uri # will show your 2nd url
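If the form has more than one button, you can pick the right one by its label instead of taking the first; the /search/i pattern below is only a guess at the button's value:
button = form.button_with(value: /search/i)   # find the button whose value matches "search"
page2 = agent.submit(form, button)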

How to scrape website with search button

I have this website: codigos (http://comext.aduana.cl:7001/codigos/). If you look at it, it has a selection field on the left and a "go" button on the right; I need to scrape some of the items on the left.
But how can I tell Mechanize in Ruby to access that selection field, run the search, and scrape the results?
I've seen examples with login forms, but I don't know whether they really suit this case.
The <select> tag is contained within a <form> tag, so you need to locate the form and then you can set the option by passing the name of the select list and specifying the appropriate option:
require 'mechanize'
mechanize = Mechanize.new
page = mechanize.get('http://comext.aduana.cl:7001/codigos/')
form = page.forms.first
form["wlw-select_key:{actionForm.opcion}"] = "Aduana"
result_page = form.submit
result_page.uri #=> http://comext.aduana.cl:7001/codigos/buscar.do;jsessionid=2hGwYfzD76WKGfFbXJvmS2yq4K19VnZycJfH8hJMTzRFhln4pTy2!1794372623!-1405983655!8080!-1
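Before submitting, you can also inspect which options the select list offers; the field name comes from the form above, and the printing is only for illustration:
select = form.field_with(name: "wlw-select_key:{actionForm.opcion}")   # the select list shown above
select.options.each { |option| puts "#{option.value} - #{option.text}" }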

How to use Mechanize on a page with no form?

I am trying to write a website crawler with Mechanize, and I found that my target website is written in a SPA fashion, and although there are a bunch of text fields and buttons, there is no form!
How can I use mechanize to fill text fields and click buttons outside forms?
I had the exact same problem you did. I ended up using 'capybara', 'launchy' and 'selenium-webdriver' to do what 'mechanize' would have done in a non-JavaScript environment.
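A minimal sketch of that Capybara + Selenium approach; the URL, field locators and button label below are placeholders, not taken from any real site:
require 'capybara'
require 'capybara/dsl'
require 'selenium-webdriver'

Capybara.run_server     = false      # we drive an external site, not a local Rack app
Capybara.default_driver = :selenium
include Capybara::DSL

visit 'http://example.com/login'            # hypothetical SPA URL
fill_in 'email', with: 'user@example.com'   # hypothetical field locators
fill_in 'password', with: 'secret'
click_button 'Sign in'                      # hypothetical button label
puts page.html                              # Capybara's page, not a Mechanize::Page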
Let's say agent is a Mechanize object and page is a Mechanize::Page.
You can do:
form = Mechanize::Form.new page.at('body'), agent
Now the form is initialized with all the fields and buttons on the page.
You will need to set the action and method yourself:
form.action = 'http://foo.com'
form.method = 'POST'
next_page = form.submit
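To fill in the collected inputs before submitting, you can set them by name, assuming those inputs carry name attributes; the names below are hypothetical:
form['email']    = 'user@example.com'   # hypothetical input names
form['password'] = 'secret'
next_page = form.submit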

Ruby Mechanize: Programmatically Clicking a Link Without Knowing the Name of the Link

I am writing a ruby script to search the web. Here is the code:
require 'mechanize'
mechanize = Mechanize.new
page = mechanize.get('http://www.example.com/')
example_page = page.link_with(:text => 'example').click
puts example_page.body
The code above works all right. The text 'example' (:text => 'example') has to be a link on the page for the code to work correctly. The problem, however, is that when I do a web search (Bing, Yahoo, Google, etc.), hundreds of links show up. How can I programmatically click a link without knowing its exact name? I want to be able to click a link if its text partly (or fully) matches a string I specify, or if it has a certain URL. Any help would be appreciated.
Mechanize's matchers accept regular expressions:
page.link_with(text: /foo/).click
page.link_with(href: /foo/).click
Here are the Mechanize criteria that generally work for links and forms:
name: name_matcher
id: id_matcher
class: class_matcher
search: search_expression
xpath: xpath_expression
css: css_expression
action: action_matcher
...
If you're curious, here's the Mechanize ElementMatcher code
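For example, to work with several matches at once, or to match on some of the other criteria (the patterns and names below are only illustrations):
page.links_with(href: /foo/).each { |link| puts link.href }  # every link whose href matches
page.link_with(id: 'next-page')                              # match on the DOM id
form = page.form_with(name: 'search')                        # hypothetical form name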

stumped on clicking a link with nokogiri and mechanize

Perhaps I'm doing it wrong, or there's a more efficient way. Here is my problem:
First, using Nokogiri, I open an HTML document and use CSS selectors to traverse it until I find the link I need to click.
Now that I have the link, how do I use Mechanize to click it? According to the documentation, the agent's click method accepts either a string or a Mechanize::Page::Link object.
I cannot use a string, since there could be hundreds of links with the same text; I only want Mechanize to click the link that Nokogiri found.
Any idea?
After you have found the link node you need, you can create the Mechanize::Page::Link object manually, and click it afterwards:
agent = Mechanize.new
page = agent.get "http://google.com"
node = page.at(".//a[@class='posted']")   # the single link node you located with Nokogiri
Mechanize::Page::Link.new(node, agent, page).click
An easier way than @binarycode's option:
agent = Mechanize.new
page = agent.get "http://google.com"
page.link_with(:class => 'posted').click
This is simple: you don't need to use Mechanize's link_with().click.
You can just get the link's href and update your page variable.
Mechanize keeps track of the current site internally, so it is smart enough to follow relative links.
Ex.:
agent = Mechanize.new
page = agent.get "http://somesite.com"
next_page_link = page.search('your exotic selectors here').first rescue nil # Nokogiri node
next_page_href = next_page_link['href'] rescue nil # e.g. '/local/link/file.html'
page = agent.get(next_page_href) if next_page_href # goes to 'http://somesite.com/local/link/file.html'
