How to implement Watir classes (e.g. PageContainer)? - ruby

I'm writing a sample test with Watir where I navigate around a site with the IE class, issue queries, etc.
That works perfectly.
I want to continue by using PageContainer's methods on the last page I landed on.
For instance, using its HTML method on that page.
Now I'm new to Ruby and just started learning it for Watir.
I tried asking this question on OpenQA, but for some reason the Watir section is restricted to normal members.
Thanks for looking at my question.
edit: here is a simple example
require "rubygems"
require "watir"
test_site = "http://wiki.openqa.org/"
browser = Watir::IE.new
browser.goto(test_site)
# now if I want to get the HTML source of this page, I can't use the IE class
# because it doesn't have a method which supports that
# the PageContainer class does have a method that supports that
# I'll continue what I want to do in pseudo code
Store HTML source in text file
# I know how to write to a file, so that's not a problem;
# retrieving the HTML is the problem.
# more specifically, using another Watir class is the problem.
Close browser
# end

Currently, the best place to get answers to your Watir questions is the Watir-General email list.
For this question, it would be nice to see more code. Is the application under test (AUT) opening a new window/tab that you were having trouble getting to and therefore wanted to try the PageContainer, or is it just navigating to a second page?
If it is the first one, you want to look at #attach; if it is the second, then I would recommend reading the quick start tutorial.
Edit after code added above:
What I think you missed is that Watir::IE includes the Watir::PageContainer module. So you can call browser.html to get the html displayed on the page to which you've navigated.

I agree. It seems to me that browser.html is what you want.
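Putting the two answers together, the asker's pseudocode could be sketched roughly as follows. The helper name save_page_source and the file name page_source.html are made up for illustration; the only Watir call relied on is browser.html, which the answers above describe. The browser lines are left as comments because they need Internet Explorer and the watir gem.

```ruby
# Hypothetical helper: writes the current page's HTML to a file.
# `browser` can be any object that responds to #html; Watir::IE does,
# because it includes the Watir::PageContainer module.
def save_page_source(browser, path)
  File.write(path, browser.html)
  path
end

# Usage with the browser from the question (needs the watir gem and IE):
#   browser = Watir::IE.new
#   browser.goto("http://wiki.openqa.org/")
#   save_page_source(browser, "page_source.html")
#   browser.close
```

Because the helper only calls #html, it can be exercised with a stand-in object before a real browser is involved.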

Related

Where is this text coming from with poltergeist?

I'm scraping my library's website with Poltergeist, in my first experience with that gem (or with Capybara, for that matter). It's working great. Super great.
def self.scrape_book_list(url)
session = Capybara::Session.new(:poltergeist)
session.visit(url)
books = session.all('.js-titleCard')
books_hash = books.map { |book|
# getting info from the session
}
books_hash
end
However, after the session.visit(url) line, before it even does anything else, it prints this:
Hi there! This site is powered by OverDrive and our vision is a world enlightened by reading. Maybe a curious cat like you can help https://company.overdrive.com/company/careers/open-positions/
I've tried inspecting the page in Chrome, and even peeking at a few js sources, but I can't seem to figure out where this text is coming from!
I imagine the question is "Why/how is poltergeist doing this?" and I figured that searching the html or js code would turn the text up in some tag from the header that poltergeist perhaps always prints when it visits a page or something (maybe there's a different method to pass the url to besides visit that won't do this). But no luck!
I'm so curious (like the cat they mention)! Any ideas?
That text will be coming from a console.log(...) statement somewhere in the site's JS. By default Poltergeist outputs all JS console logs to stdout.
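If the goal is to keep that output off the terminal, one possible approach is to point PhantomJS's stdout somewhere else when registering the driver. This is a sketch based on the :phantomjs_logger option described in the Poltergeist README; the helper name quiet_poltergeist_options and the driver name :quiet_poltergeist are invented for this example, and the registration is shown as comments because it needs the poltergeist gem and PhantomJS installed.

```ruby
# Hypothetical helper: builds driver options that send PhantomJS stdout
# (where console.log output ends up) to `sink` instead of the terminal.
# By default the sink is the null device, so the output is discarded.
def quiet_poltergeist_options(sink = File.open(File::NULL, "w"))
  { phantomjs_logger: sink }
end

# In the scraper (assumes the poltergeist gem is installed):
#   require 'capybara/poltergeist'
#   Capybara.register_driver(:quiet_poltergeist) do |app|
#     Capybara::Poltergeist::Driver.new(app, quiet_poltergeist_options)
#   end
#   session = Capybara::Session.new(:quiet_poltergeist)
```

Passing an open log file as the sink instead would keep the messages around for debugging rather than throwing them away.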

How to mark a certain part of the text in watir?

Hello, is there a way to mark (select) only a certain part of the text?
I cannot find the right solution anywhere.
I tried double_click, flash, and select_text, but none of them worked for me.
This works, but it marks everything: browser.send_keys [:control, 'a']
I added a picture showing what I want to do.
Thank you for your answers.
The red rectangle shows the markings
You can use the Element#select_text method. Note that prior to Watir 6.8, you will need to manually include the extension (method).
Here is a working example using the Wikipedia page:
require 'watir'
require 'watir/extensions/select_text' # only include this if using Watir 6.7 or prior
browser = Watir::Browser.new
browser.goto('https://en.wikipedia.org/wiki/XPath')
browser.body.select_text('XPath may be used')
sleep(5) # so that you can see the selection
Note that this will highlight the first match. You may want to restrict the search to a specific element rather than the entire body.
Here is another example using ckeditor.com:
require 'watir'
require 'watir/extensions/select_text' # only include this if using Watir 6.7 or prior
browser = Watir::Browser.new
browser.goto('ckeditor.com/')
frame = browser.iframe(class: 'cke_wysiwyg_frame')
frame.p.select_text('Bake the brownies')
browser.link(href: /Bold/).click
sleep(10)

Scrape website with Ruby based on embedded CSS styles

In the past, I have successfully used Nokogiri to scrape websites using a simple Ruby script. For a current project, I need to scrape a website that only uses inline CSS. As you can imagine, it is an old website.
What possibilities do I have to target specific elements on the page based on the inline CSS of the elements? It seems this is not possible with Nokogiri or have I overlooked something?
UPDATE: An example can be found here. I basically need the main content without the footnotes. The latter have a smaller font size and are grouped below each section.
I'm going to teach you how to fish. Instead of trying to find what I want, it's sometimes a lot easier to find what I don't want and remove it.
Start with this code:
require 'nokogiri'
require 'open-uri'
URL = 'http://www.eximsystems.com/LaVerdad/Antiguo/Gn/Genesis.htm'
FOOTNOTE_ACCESSORS = [
'span[style*="font-size: 8.0pt"]',
'span[style*="font-size:8.0pt"]',
'span[style*="font-size: 7.5pt"]',
'span[style*="font-size:7.5pt"]',
'font[size="1"]'
].join(',')
doc = Nokogiri.HTML(URI.open(URL)) # URI.open: plain open no longer fetches URLs as of Ruby 3.0
doc.search(FOOTNOTE_ACCESSORS).each do |footnote|
footnote.remove
end
File.write(File.basename(URI.parse(URL).path), doc.to_html)
Run it, then open the resulting HTML file in your browser. Scroll through the file looking for footnotes you want to remove. Select part of their text, then use "Inspect Element", or whatever tool you have that will find that selected text in the source of the page. Find something unique in that text that makes it possible to isolate it from the text you want to keep. For instance, I locate footnotes using the font-sizes in <span> and <font> tags.
Keep adding accessors to the FOOTNOTE_ACCESSORS array until you have all undesirable elements removed.
This code isn't complete, nor is it written as tightly as I'd normally do it for this sort of task, but it will give you an idea how to go about this particular task.
This is a version that is a bit more flexible:
require 'nokogiri'
require 'open-uri'
URL = 'http://www.eximsystems.com/LaVerdad/Antiguo/Gn/Genesis.htm'
FOOTNOTE_ACCESSORS = [
'span[style*="font-size: 8.0pt"]',
'span[style*="font-size:8.0pt"]',
'span[style*="font-size: 7.5pt"]',
'span[style*="font-size:7.5pt"]',
'font[size="1"]',
]
doc = Nokogiri.HTML(URI.open(URL)) # URI.open: plain open no longer fetches URLs as of Ruby 3.0
FOOTNOTE_ACCESSORS.each do |accessor|
doc.search(accessor).each do |footnote|
footnote.remove
end
end
File.write(File.basename(URI.parse(URL).path), doc.to_html)
The major difference is the previous version assumed all entries in FOOTNOTE_ACCESSORS were CSS. With this change XPath can also be used. The code will take a little bit longer to run as the entries are iterated over, but the ability to dig in with XPath might make it worthwhile for you.
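Since each font size appears twice in FOOTNOTE_ACCESSORS (with and without a space after the colon), the list could also be generated rather than typed out. The following is only a sketch; the helper name font_size_selectors is invented here, and it covers just the two spacing variants seen on this particular page.

```ruby
# Hypothetical helper: for each font size, build one attribute selector for
# "font-size: X" and one for "font-size:X", since the page uses both spellings.
def font_size_selectors(sizes, tag: 'span')
  sizes.flat_map do |size|
    ["font-size: #{size}", "font-size:#{size}"].map do |needle|
      %(#{tag}[style*="#{needle}"])
    end
  end
end

# Same entries as the hand-written list above, in the same order.
FOOTNOTE_ACCESSORS = font_size_selectors(%w[8.0pt 7.5pt]) + ['font[size="1"]']
```

Adding another footnote size then means adding one string to the %w[] list instead of two selector lines.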
You can do something like:
doc.css('*[style*="foo"]')
That will select any element with foo appearing anywhere in its style attribute.

How to select from frames with the Watir Ruby Gem

When trying to select a list element's option I attempted to do:
myvar=ie.select_list(:id, 'myid').option(:text, 'mytext').select
But for some reason while I'm using Watir in irb to access the website and attempting to manipulate any of the items I get this exception.
Watir::Exception::UnknownObjectException: Unable to locate element...etc
I'm looking at the page in the browser, but .html isn't showing the full page. It looks like the rest of the page is hidden and I'm not sure how to get into or around this.
irb(main):011:0> ie.html
=> "<HTML><HEAD><TITLE>My Title</TITLE>\r\n
<SCRIPT language=JavaScript type=text/javascript src=\"../../script.js\"></SCRIPT>\r\n</HEAD><FRAMESET id=mainFrameSet name=mainFrameSet rows=100%,0%><FRAME id=frmMain src=\"DefaultT.cfm?ID=2197024\" name=frmMain><FRAME id=frmHidden src=\"Dummy.html\" name=frmHidden scrolling=no></FRAMESET></HTML>"
EDIT:
Looking at this in retrospect, I have changed the title so it more accurately addresses the issue I was having. It was difficult for a new Watir user to find information on Watir and frames. The original title was something like "Using Watir On An Encrypted Site". I have severely edited the question to get to the essence of what I was asking. I can't thank those enough who attempted to answer the ramblings of a new Ruby user with minimal knowledge of the Web and programming in general. Please see previous revisions if necessary.
Based on the html you added, your webpage is using frames. Unlike other elements, you have to explicitly specify the frames you want to use.
You probably want the frame with id 'frmMain', so try:
myvar=ie.frame(:id, 'frmMain').select_list(:id, 'myid').option(:text, 'mytext').select
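To find out which frame ids are available in the first place, the string returned by ie.html can be scanned for FRAME tags. The snippet below is a rough diagnostic sketch using a regular expression, not a real HTML parser; the helper name frame_ids is invented for this example, and it assumes the id attribute comes before any other attribute containing "id=".

```ruby
# Hypothetical helper: lists the id attributes of <frame>/<iframe> tags found
# in an HTML string such as the one returned by ie.html.
# The \b after "frame" keeps <FRAMESET ...> from being counted as a frame.
def frame_ids(html)
  html.scan(/<i?frame\b[^>]*?\bid=["']?([^\s"'>]+)/i).flatten
end

# Usage in irb, with the session from the question:
#   frame_ids(ie.html)
```

Each id it reports can then be tried with ie.frame(:id, ...) as in the answer above.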
My guess is that the element is not on the page when you try to access it.
Try this (please notice when_present):
myvar=ie.select_list(:id, 'myid').when_present.option(:text, 'mytext').select
More information: http://watirwebdriver.com/waiting/

click on xpath link with Mechanize

I want to click a link with Mechanize that I select with xpath (nokogiri).
How is that possible?
next_page = page.search "//div[@class='grid-dataset-pager']/span[@class='currentPage']/following-sibling::a[starts-with(@class, 'page')][1]"
next_page.click
The problem is that nokogiri element doesn't have click function.
I can't read the href (URL) and send get request because the link has onclick function defined (no href attribute).
If that's not possible, what are the alternatives?
Use page.at instead of page.search when you're trying to find only one element.
You can make your selector simpler (shorter) by using CSS selector syntax:
next_page = page.at('div.grid-dataset-pager > span.currentPage + a[class^="page"]')
You can construct your own Link instance if you have the Nokogiri element, page, and mechanize object to feed the constructor:
next_link = Mechanize::Page::Link.new( next_page, mech, page )
next_link.click
However, you might not need that, because Mechanize#click lets you supply a string with the text of the anchor/button to click on.
# Assuming this link text is unique on the page, which I suspect it is
mech.click next_page.text
Edit after re-reading the question completely: However, none of this is going to help you, because Mechanize is not a web browser! It does not have a JavaScript engine, and thus won't (can't) execute your onclick for you. For this you'll need to use Ruby to control a real web browser, e.g. using Watir or Selenium or Celerity or the like.
In general you would do:
page.link_with(:node => next_link).click
However, as Phrogz says, this won't really do what you want.
Why don't you use a hpricot element instead? Mechanize can click on a hpricot element as long as the link has a 'src' or 'href' attribute. Try something along these lines:
page = agent.get("http://www.example.com")
next_page = agent.click((page/"//your/xpath/a"))
Edit: After reading Phrogz's answer I also realized that this won't really do it. Mechanize doesn't support JavaScript yet. With this in mind you have 3 options:
1. Use a library that controls a real web browser. See @Phrogz's answer.
2. Use Capybara, which is an integration testing library but can also be used as a standalone crawler. I've done this successfully with HTMLUnit, which is also an integration testing library, in Java. Capybara comes with Selenium support by default, though it also supports WebKit via an external gem. Capybara interprets JavaScript out of the box. This blog post might help.
3. Grok the page that you intend to crawl, use something like HTTPFox to monitor what the onclick JavaScript function does, and replicate this in your Mechanize script.
Good luck.

Resources