Mechanize links_with is not filtering the text properly - ruby

I am trying to click a list of links with the Mechanize gem, but apparently Mechanize's links_with(criteria) is not filtering properly based on the criteria. For debugging purposes, I am only printing out each link.
The following script is printing out most (all?) links on the page:
require 'mechanize'
agent = Mechanize.new
url = "http://www.fearlessphotographers.com/location/470/sul-do-brasil"
agent.get(url)
agent.page.links_with(:text => /[VIEW FULL PROFILE]/).each do |link|
puts link.text
end
And if I change (:text => /[VIEW FULL PROFILE]/) to (:text => "VIEW FULL PROFILE"), then no link gets printed at all.
I can't understand what I am doing wrong. Any thoughts?

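Brackets [] have special meaning in a regex: they define a character class, so /[VIEW FULL PROFILE]/ matches any text that contains at least one of those characters. To match literal brackets you would need to escape them with a backslash: /\[\]/.
On second thought, there are no brackets in those links, so leave them out:
page.links_with :text => /View Full Profile/
Also notice that the text seems to be uppercased with CSS, so the underlying link text is presumably "View Full Profile". That would explain why the exact string "VIEW FULL PROFILE" matches nothing.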
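To see the character-class effect in isolation, here is a minimal sketch in plain Ruby (no Mechanize needed). The sample link texts are made up for illustration and are not taken from the page in question.

```ruby
# Hypothetical link texts, standing in for what link.text might return.
texts = ["View Full Profile", "HOME", "About Us", "123"]

# Square brackets create a character class: this matches any string that
# contains at least ONE of the characters V, I, E, W, space, F, U, L, P, R, O.
char_class = /[VIEW FULL PROFILE]/

# Without brackets the whole phrase must appear; /i makes it case-insensitive.
phrase = /view full profile/i

class_hits  = texts.select { |t| t =~ char_class }
phrase_hits = texts.select { |t| t =~ phrase }

puts class_hits.inspect   # every text except "123"
puts phrase_hits.inspect  # only "View Full Profile"
```

The case-insensitive flag is one way to avoid guessing whether the text is stored in upper or mixed case.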

Related

I can not locate proper element - click on link Ruby

I tried to click on link (see screenshots)
http://imgur.com/q66g7z6
http://imgur.com/KNF1y7z
I tried a few examples, e.g.
@browser.button(:class=> '//*[@class="login"]//ul/li[0]/a').click
and
browser.button(:xpath=> "//a[@data-viewmodel='PagesAsync/RegisterPrivate/RegisterPrivateViewModel']").click
but neither is correct.
I get a message saying it is unable to locate the element.
Can somebody help?
The main problem is that you are telling Watir to look for a button when you actually want a link. While the UI may be styled to look like a button, you will notice that the HTML has an a tag instead.
The first example, which also has the wrong locator type (:class instead of :xpath), should be the following. Note that XPath positions start at 1, so li[0] never matches; li[1] selects the first item:
@browser.link(:xpath => '//*[@class="login"]//ul/li[1]/a').click
The second example should be:
browser.link(:xpath => "//a[@data-viewmodel='PagesAsync/RegisterPrivate/RegisterPrivateViewModel']").click
Note that the second example would be more Watir-like if you use the normal attribute locators:
browser.link(data_viewmodel: 'PagesAsync/RegisterPrivate/RegisterPrivateViewModel').click
There are two options. One is to get your developers to add better IDs.
If that is not possible, try this:
how does ruby webdriver get element with hyphen in <name, value> pair
It worked for me in several similar situations.
I wonder how you can find a button by using an XPath to a link. It is also not clear whether you use browser or @browser. You would need to look into how the browser instance is defined, which is likely one of these:
@browser = Watir::Browser.new :chrome
### or ###
browser = Watir::Browser.start 'example.com', :firefox
If you haven't created a browser instance, you need to do that before you can use Watir-Webdriver. ;)
As for your question, you could try searching by the text if it is unique, like this, though it may make for a brittle test:
@browser.div(:class => 'login').link(:text => /For priva/).click
I would recommend double-checking the number of elements found by the div and link locators, like this, to make sure you got the right element:
@browser.divs(:class => 'login').length
@browser.div(:class => 'login').links(:text => /For priva/).length

Mechanize scraping google urls

I have a program that searches google using either a key word or keywords that are taken as a parameter while running the program:
example: pull_sites.rb "testing"
returns these sites >>>
https://en.wikipedia.org/wiki/Software_testing
http://en.wikipedia.org/wiki/Test_automation
http://www.istqb.org/about-istqb.html
http://softwaretestingfundamentals.com/test-plan/
https://en.wikipedia.org/wiki/Software_testing
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:9qU2GDLzZzEJ:https://en.wikipedia.org/wiki/Software_testing%252Btesting%26gbv%3D1%26%26ct%3Dclnk
https://en.wikipedia.org/wiki/Test_strategy
https://en.wikipedia.org/wiki/Category:Software_testing
https://en.wikipedia.org/wiki/Test_automation
https://en.wikipedia.org/wiki/Portal:Software_testing
https://en.wikipedia.org/wiki/Test
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:R94CAo00wOYJ:https://en.wikipedia.org/wiki/Test%252Btesting%26gbv%3D1%26%26ct%3Dclnk
https://en.wikipedia.org/wiki/Unit_testing
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:G9V8uRLkPjIJ:https://en.wikipedia.org/wiki/Unit_testing%252Btesting%26gbv%3D1%26%26ct%3Dclnk
https://testing.byu.edu/
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:d9bGrCHr9fsJ:https://testing.byu.edu/%252Btesting%26gbv%3D1%26%26ct%3Dclnk
https://www.test.com/
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:S92tylTr1V8J:https://www.test.com/%252Btesting%26gbv%3D1%26%26ct%3Dclnk
http://ddce.utexas.edu/disability/using-testing-accommodations/
http://blogs.vmware.com/virtualblocks/2015/07/06/vsan-vs-nutanix-head-to-head-performance-testing-part-4-exchange/
http://www.networkforgood.com/nonprofitblog/testing-101-4-steps-optimizing-your-fundraising-approach/
http://www.auslea.com/software-testing-training.html
http://academy.littletonpublicschools.net/Default.aspx%3Ftabid%3D12807%26articleType%3DArticleView%26articleId%3D2400
https://golang.org/pkg/testing/
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:EALG7Jlm9eoJ:https://golang.org/pkg/testing/%252Btesting%26gbv%3D1%26%26ct%3Dclnk
http://www.speedtest.net/
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:M47_v0xF3m8J:http://www.speedtest.net/%252Btesting%26gbv%3D1%26%26ct%3Dclnk
https://www.act.org/content/act/en/products-and-services/the-act/taking-the-test.html
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:1sMSoJBXydoJ:https://www.act.org/content/act/en/products-and-services/the-act/taking-the-test.html%252Btesting%26gbv%3D1%26%26ct%3Dclnk
http://www.act.org/content/act/en/products-and-services/the-act/test-preparation.html
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:pAzlNJl3YY4J:http://www.act.org/content/act/en/products-and-services/the-act/test-preparation.html%252Btesting%26gbv%3D1%26%26ct%3Dclnk
It works as expected but only scrapes the first page of Google results. Is it possible to search, say, pages 1-5?
Here's the source of the scrape:
def get_urls
  puts "Searching...".green
  agent = Mechanize.new
  page = agent.get('http://www.google.com/')
  google_form = page.form('f')
  google_form.q = "#{SEARCH}" # SEARCH is the parameter given when the program is run
  page = agent.submit(google_form, google_form.buttons.first)
  page.links.each do |link|
    if link.href.to_s =~ /url.q/
      str = link.href.to_s
      strList = str.split(%r{=|&})
      url = strList[1]
      File.open("links.txt", "a+") { |s| s.puts(url) }
    end
  end
end
OK, if you are using Google Chrome or Firefox, open up the developer tools. They will help you identify the links you want to click automatically. When you do a Google search and scroll to the bottom, you will see the page links. Using the developer tools, identify which class or id Google assigns to these page-number links, then use Mechanize's click method to follow them. For example, if the link is labelled "next", you can use something simple like:
page2 = page1.link_with(:text => "next").click
I'm answering from my phone, so it may save you time to google "click a link" with Mechanize for more details.
That's a GET form, so it is much easier to just make the request yourself:
https://www.google.com/search?q=foo
https://www.google.com/search?q=foo&start=10
https://www.google.com/search?q=foo&start=20
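The URL pattern above can be generated for pages 1-5 with a small helper. This is only a sketch: google_search_url is a hypothetical name, and the step of 10 results per page is an assumption read off the start=10 / start=20 pattern.

```ruby
require 'uri'

# Build the search URL for a given result page (1-based).
# Assumes 10 results per page, as the start=10 / start=20 pattern suggests.
def google_search_url(query, page_number, per_page = 10)
  params = { q: query }
  start = (page_number - 1) * per_page
  params[:start] = start if start > 0
  "https://www.google.com/search?" + URI.encode_www_form(params)
end

urls = (1..5).map { |n| google_search_url("testing", n) }
puts urls

# With Mechanize, each URL could then be fetched in turn, for example:
#   urls.each { |u| page = agent.get(u) }
```

URI.encode_www_form also takes care of escaping spaces and other special characters in the search term.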

Ruby Mechanize: Programmatically Clicking a Link Without Knowing the Name of the Link

I am writing a ruby script to search the web. Here is the code:
require 'mechanize'
mechanize = Mechanize.new
page = mechanize.get('http://www.example.com/')
example_page = page.link_with(:text => 'example').click
puts example_page.body
The code above works all right. The text 'example' (:text => 'example') has to be a link on the page for the code to work correctly. The problem, however, is that when I do a web search (Bing, Yahoo, Google, etc.), hundreds of links show up. How can I programmatically click a link without knowing its exact name? I want to be able to click a link if its name partly (or fully) matches a text that I specify, or if it has a certain URL. Any help would be appreciated.
Mechanize accepts regular expressions as criteria:
page.link_with(text: /foo/).click
page.link_with(href: /foo/).click
Here are the Mechanize criteria that generally work for links and forms:
name: name_matcher
id: id_matcher
class: class_matcher
search: search_expression
xpath: xpath_expression
css: css_expression
action: action_matcher
...
If you're curious, here's the Mechanize ElementMatcher code
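A rough sketch of how such criteria behave, assuming Mechanize compares each criterion against the element's attribute with Ruby's === operator (that is my reading of the ElementMatcher code, not something checked against every version). FakeLink and links_matching are hypothetical stand-ins, not Mechanize classes or methods.

```ruby
# Hypothetical stand-in for Mechanize::Page::Link, for illustration only.
FakeLink = Struct.new(:text, :href)

links = [
  FakeLink.new("Ruby tutorial", "/tutorials/ruby"),
  FakeLink.new("Example", "http://www.example.com/"),
  FakeLink.new("More examples here", "/more")
]

# Keep the links for which every criterion matches via ===.
# String === is exact equality; Regexp === is a pattern match.
def links_matching(links, criteria)
  links.select do |link|
    criteria.all? { |attr, matcher| matcher === link.public_send(attr) }
  end
end

exact   = links_matching(links, text: "Example")    # exact text only
partial = links_matching(links, text: /example/i)   # partial, case-insensitive
by_href = links_matching(links, href: /tutorials/)  # match on the URL instead

puts exact.map(&:text).inspect
puts partial.map(&:text).inspect
puts by_href.map(&:text).inspect
```

Under that assumption, a String criterion only matches the full text, while a Regexp covers the "partly matches" case from the question.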

Mechanize parsing error

I started to use mechanize in ruby recently and it was working perfectly.
Today I tried to get a page, but for some reason the input fields are not picked up. Please refer to the code below:
agent = Mechanize.new
agent.add_auth(url, user, pass1, realm = nil, domain = nil)
agent.agent.http.verify_mode = OpenSSL::SSL::VERIFY_NONE
#agent.log = Logger.new(STDOUT)
page = agent.get(url)
page.forms.first.field_with(:name => 'Login[username]').value=user
page.forms.first.field_with(:name => 'Login[password]').value=pass2
page = agent.submit(page.forms.first)
page = page.link_with(:text => "Search").click
page = page.link_with(:text => "Spiral").click
pp page
The HTML page that I'm trying to parse contains this line:
<input name="SpiralMatch_string" type="text" maxlength="128">
But for some reason there is nothing related to it when I dump the contents of the current "page".
There is one more thing that may be related: there is some JavaScript running below this field, and every time I type something in it, the main content of the page changes dynamically. Has anyone encountered the same problem?
It sounds like the page may be getting populated through JavaScript or Ajax calls.
Just because the browser shows you some HTML in 'view source' doesn't mean it is actually in the response.
You should use a debugging proxy like Charles or Fiddler to see what the response(s) really looked like.
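A quick check you can run before reaching for a proxy, sketched in plain Ruby: search the raw response body for the field name. With Mechanize the body would come from page.body; here two made-up HTML strings stand in for it, and field_in_body? is a hypothetical helper.

```ruby
# Returns true if an input with the given name attribute appears in the raw HTML.
# This is a plain text search: fine for a sanity check, not a real HTML parse.
def field_in_body?(body, field_name)
  body.match?(/name=["']#{Regexp.escape(field_name)}["']/)
end

# Made-up response bodies standing in for page.body.
static_body  = '<form><input name="SpiralMatch_string" type="text" maxlength="128"></form>'
dynamic_body = '<div id="search"></div><script src="search.js"></script>'

in_static  = field_in_body?(static_body, "SpiralMatch_string")
in_dynamic = field_in_body?(dynamic_body, "SpiralMatch_string")

puts in_static   # field is part of the served HTML
puts in_dynamic  # field would have to be added later by a script
```

If the check comes back false for the real page.body, the field is most likely injected by a script after the page loads, which Mechanize does not execute.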

stumped on clicking a link with nokogiri and mechanize

Perhaps I'm doing it wrong, or there's a more efficient way. Here is my problem:
First, using Nokogiri, I open an HTML document and use CSS selectors to traverse it until I find the link I need to click.
Once I have the link, how do I use Mechanize to click it? According to the documentation, click accepts either a string or a Mechanize::Page::Link object.
I cannot use a string, since there could be hundreds of identical links; I only want Mechanize to click the link that Nokogiri found.
Any ideas?
After you have found the link node you need, you can create the Mechanize::Page::Link object manually and click it afterwards:
agent = Mechanize.new
page = agent.get "http://google.com"
# at returns a single node, whereas search returns a NodeSet; adjust the XPath so it selects the a element you want
node = page.at ".//p[@class='posted']"
Mechanize::Page::Link.new(node, agent, page).click
An easier way than @binarycode's option:
agent = Mechanize.new
page = agent.get "http://google.com"
page.link_with(:class => 'posted').click
That is simple; you don't need to use Mechanize's link_with().click.
You can just get the link's href and update your page variable.
Mechanize keeps track of the current page internally, so it is smart enough to follow local links.
Example:
agent = Mechanize.new
page = agent.get "http://somesite.com"
next_page_link = page.search('your exotic selectors here').first rescue nil # Nokogiri node
next_page_href = next_page_link['href'] rescue nil # '/local/link/file.html'
page = agent.get(next_page_href) if next_page_href # goes to 'http://somesite.com/local/link/file.html'
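To illustrate what "following a local link" amounts to, here is a sketch using Ruby's standard URI library. It shows how a relative href resolves against the current page's URL; it is not Mechanize's internal code, and the article paths are made up.

```ruby
require 'uri'

# URL of the page we are currently on (made up for the example).
current_page = "http://somesite.com/articles/index.html"

# A root-relative href replaces the whole path.
absolute = URI.join(current_page, "/local/link/file.html")

# A bare relative href is resolved against the current directory.
relative = URI.join(current_page, "page2.html")

puts absolute
puts relative
```

This is why passing the raw href from the Nokogiri node to agent.get can work even when the href is not a full URL.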
