Mechanize parsing error - ruby

I started to use mechanize in ruby recently and it was working perfectly.
Today I tried to get a page but for some reason the input fields are not taken, please refer to the code below:
agent = Mechanize.new
agent.add_auth(url, user, pass1, realm = nil, domain = nil)
agent.agent.http.verify_mode = OpenSSL::SSL::VERIFY_NONE
#agent.log = Logger.new(STDOUT)
page = agent.get(url)
page.forms.first.field_with(:name => 'Login[username]').value=user
page.forms.first.field_with(:name => 'Login[password]').value=pass2
page = agent.submit(page.forms.first)
page = page.link_with(:text => "Search").click
page = page.link_with(:text => "Spiral").click
pp page
The html page that Im trying to parse contains this line:
<input name="SpiralMatch_string" type="text" maxlength="128">
But for some reason there is nothing related to that when I dump the contents of the current "page"
There is one more thing that may be related, there is a java running below this field, every time I type something in it, the main contents of the page is dynamicaly changing. Has anyone encountered the same problem?

It sounds like the page may be getting populated through javascript or ajax calls.
Just because the browser shows you some html in 'view source' doesn't mean it actually in the response.
You should use a debugging proxy like charles or fiddler to see what the response(s) really looked like.

Related

How to scrape website with search button

I have this website: codigos if You look at it, it has a selection field at left, and a go button at right, I need to scrape some of the items on left.
But, how can I tell to mechanize in ruby how to access that selection field and then make the search and scrape it?
I've seen examples with login forms but I don't know if it can really suit this case though.
The <select> tag is contained within a <form> tag, so you need to locate the form and then you can set the option by passing the name of the select list and specifying the appropriate option:
require 'mechanize'
mechanize = Mechanize.new
page = mechanize.get('http://comext.aduana.cl:7001/codigos/')
form = page.forms.first
form["wlw-select_key:{actionForm.opcion}"] = "Aduana"
result_page = form.submit
result_page.uri #=> http://comext.aduana.cl:7001/codigos/buscar.do;jsessionid=2hGwYfzD76WKGfFbXJvmS2yq4K19VnZycJfH8hJMTzRFhln4pTy2!1794372623!-1405983655!8080!-1

Ruby Watir -- Trying to loop through links in cnn.com and click each one of them

I have created this method to loop through the links in a certain div in the web site. My porpose of the method Is to collect the links insert them in an array then click each one of them.
require 'watir-webdriver'
require 'watir-webdriver/wait'
site = Watir::Browser.new :chrome
url = "http://www.cnn.com/"
site.goto url
box = Array.new
container = site.div(class: "column zn__column--idx-1")
wanted_links = container.links
box << wanted_links
wanted_links.each do |link|
link.click
site.goto url
site.div(id: "nav__plain-header").wait_until_present
end
site.close
So far it seems like I am only able to click on the first link then I get an error message stating this:
unable to locate element, using {:element=>#<Selenium::WebDriver::Element:0x634e0a5400fdfade id="0.06177683611003881-3">} (Watir::Exception::UnknownObjectException)
I am very new to ruby. I appreciate any help. Thank you.
The problem is that once you navigate to another page, all of the element references (ie those in wanted_links) become stale. Even if you return to the same page, Watir/Selenium does not know it is the same page and does not know where the stored elements are.
If you are going to navigate away, you need to collect all of the data you need first. In this case, you just need the href values.
# Collect the href of each link
wanted_links = container.links.map(&:href)
# You have each page URL, so you can navigate directly without returning to the homepage
wanted_links.each do |link|
site.goto url
end
In the event that the links do not directly navigate to a page (eg they execute JavaScript when clicked), you will need to collect enough data to re-locate the elements later. What you use as the locator will depend on what is known to be static/unique. As an example, I will assume that the link text is a good locator.
# Collect the text of each link
wanted_links = container.links.map(&:text)
# Iterate through the links
wanted_links.each do |link_text|
container = site.div(class: "column zn__column--idx-1")
container.link(text: link_text).click
site.back
end

Ruby Mechanize: Programmatically Clicking a Link Without Knowing the Name of the Link

I am writing a ruby script to search the web. Here is the code:
require 'mechanize'
mechanize = Mechanize.new
page = mechanize.get('http://www.example.com/)
example_page = page.link_with(:text => 'example').click
puts example_page.body
The code above works alright. The text 'example' ((:text => 'example') has to be a link on the page for the code to work correctly. The problem, however, is that when I do a web search (bing, yahoo, google, etc), hundreds of links show up. How can I programmatically click a link without knowing the exact name of the link? I want to be able to click a link if the name of the link partly (or fully) matches a text that I specify or click a link if it has a certain url. Any help would be appreciated.
Mechanize has regular expressions:
page.link_with(text: /foo/).click
page.link_with(href: /foo/).click
Here are the Mechanize criteria that generally work for links and forms:
name: name_matcher
id: id_matcher
class: class_matcher
search: search_expression
xpath: xpath_expression
css: css_expression
action: action_matcher
...
If you're curious, here's the Mechanize ElementMatcher code

stumped on clicking a link with nokogiri and mechanize

perhaps im doing it wrong, or there's another more efficient way. Here is my problem:
I first, using nokogiri open an html document and use its css to traverse the document until i find the link which i need to click.
Now once i have the link, how do i use mechanize to click it? According to the documentation, the object returned by Mechanize.new either the string or a Mechanize::Page::Link object.
I cannot use string - since there could be 100's of the same link - i only want mechanize to click the link that was traversed by nokogiri.
Any idea?
After you have found the link node you need, you can create the Mechanize::Page::Link object manually, and click it afterwards:
agent = Mechanize.new
page = agent.get "http://google.com"
node = page.search ".//p[#class='posted']"
Mechanize::Page::Link.new(node, agent, page).click
Easier way than #binarycode option:
agent = Mechanize.new
page = agent.get "http://google.com"
page.link_with(:class => 'posted').click
That is simple, you don't need to use mechanize link_with().click
You can just getthe link and update your page variable
Mechanize saves current working site internally, so it is smart enough to follow local links
Ex.:
agent = Mechanize.new
page = agent.get "http://somesite.com"
next_page_link = page.search('your exotic selectors here').first rescue nil #nokogyri object
next_page_href = next_page_link['href'] rescue nil # '/local/link/file.html'
page = agent.get(next_page_href) if next_page_href # goes to 'http://somesite.com/local/link/file.html'

Mechanize breaks on ASP page

require 'mechanize'
agent = Mechanize.new
login = agent.get('http://www.schoolnet.ch/DE/HomeDE.htm')
agent.click login.link_with text: /Login/
And I get Mechanize::UnsupportedSchemeError.
Mechanize did'nt support javascript but you can add search field to the form assign search term to it and submit the form using mechanize
form = page.forms.first
form.add_field! "Field_name here","BotM$ucUser$ucUser2Col$cmdLogin"
page = form.submit
The link in question runs a javascript function.
Login
Mechanize doesn't support javascript links. Someone else suggests using Harmony.
Check https://github.com/mynyml/harmony

Resources