I am trying to click links on a page, but I can only click the first one. There are four more links with similar code, but I get an error saying the other four cannot be located.
This is the line of code that works:
@browser.div(class: 'ms-vb itx').link(:text => 'Rapid Alignment').click
This is one of the four that does not work:
@browser.div(class: 'ms-vb itx').link(:text => 'Design Develop Integrate and Test').click
HTML:
<div class="ms-vb itx" onmouseover="OnItem(this)" CTXName="ctx586" id="1" Field="LinkTitle" Perm="0xb008031061" EventType=""><a onfocus="OnLink(this)" href="asdm.nwie.net/_layouts/15/…" onclick="EditLink2(this,586);return false;" target="_self">Rapid Alignment</a></div>
<div class="ms-vb itx" onmouseover="OnItem(this)" CTXName="ctx586" id="3" Field="LinkTitle" Perm="0xb008031061" EventType=""><a onfocus="OnLink(this)" href="asdm.nwie.net/_layouts/15/…" onclick="EditLink2(this,586);return false;" target="_self">Design Develop Integrate and Test</a></div>
I think the issue is the use of #div which will return a single div
Try this instead
divs = @browser.divs(class: 'ms-vb itx')
Then
divs.each do |d|
  d.link.click
end
#divs returns a DivCollection which includes Enumerable so all Enumerable methods will work as well including things like select e.g.
divs.select { |d| d.link(:text => 'Rapid Alignment').exists? }
You'll have to specify which <div> you are targeting. There are two or possibly more <div> tags with the same class attribute.
Given this HTML snippet:
<div class="ms-vb itx" onmouseover="OnItem(this)" CTXName="ctx586" id="1" Field="LinkTitle" Perm="0xb008031061" EventType=""><a onfocus="OnLink(this)" href="asdm.nwie.net/_layouts/15/…" onclick="EditLink2(this,586);return false;" target="_self">Rapid Alignment</a></div>
<div class="ms-vb itx" onmouseover="OnItem(this)" CTXName="ctx586" id="3" Field="LinkTitle" Perm="0xb008031061" EventType=""><a onfocus="OnLink(this)" href="asdm.nwie.net/_layouts/15/…" onclick="EditLink2(this,586);return false;" target="_self">Design Develop Integrate and Test</a></div>
You need to target the appropriate <div> by supplying the index in the locator:
p b.div(:class => 'ms-vb itx').link(:text => 'Rapid Alignment').exists?
#=> true
p b.div(:class => 'ms-vb itx').link(:text => 'Design Develop Integrate and Test').exists?
#=> false
p b.div(:class => 'ms-vb itx', :index => 1).link(:text => 'Design Develop Integrate and Test').exists?
#=> true
But locating elements by index can be fragile if and when UI elements change. You should consider locating using the id attributes, which--according to spec--are unique.
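The first-match behaviour described above can be reproduced without a browser. The sketch below is only an illustration: it uses Ruby's REXML (a stand-in, not what Watir uses internally) on a trimmed, well-formed copy of the two rows, with a hypothetical `<root>` wrapper added so the snippet parses as XML.

```ruby
require 'rexml/document'

# Trimmed copy of the two rows; <root> is a hypothetical wrapper element.
xml = <<~XML
  <root>
    <div class="ms-vb itx" id="1"><a>Rapid Alignment</a></div>
    <div class="ms-vb itx" id="3"><a>Design Develop Integrate and Test</a></div>
  </root>
XML
doc = REXML::Document.new(xml)

# A class-only lookup stops at the first matching div ...
first_div = REXML::XPath.first(doc, "//div[@class='ms-vb itx']")
first_div_links = first_div.get_elements('a').map(&:text)

# ... whereas the id singles out the row that holds the second link.
target_link = REXML::XPath.first(doc, "//div[@id='3']/a")

puts first_div_links.inspect
puts target_link.text
```

In Watir itself the id-based version would presumably look like b.div(:id => '3').link.click, assuming the ids on the real page are stable.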
This fails because both divs match the same locator, so Watir finds the same (first) div every time and then searches it for the given link. The second lookup fails because that div does not contain the other link.
You don't actually need the div to locate the link. This code will work:
b.link(:text=>'Rapid Alignment',:visible=>true).click
b.link(:text=>'Design Develop Integrate and Test',:visible=>true).click
The link text is enough to identify the link, so you don't need to go through any div; calling b.link() directly is sufficient.
I'm working on a white-hat web-crawler that will periodically log into my account and check some information for me using Ruby with Watir and Nokogiri.
Here's the simplified HTML I'm trying to pull information from:
<div class="navbar navbar-default navbar-fixed-top hidden-lg hidden-md" style="z-index: 1002">
<div class="banner-g">
<div class="container">
<div id="user-info">
<div id="acct-value">
GAIN/LOSS <span class="SPShares">-$12.85</span>
</div>
<div id="committed">
INVESTED <span class="SPPortfolio">$152.11</span>
</div>
<div id="avail">
AVAILABLE <span class="SPBalance">$26.98</span>
</div>
I'm trying to pull the $26.98 at the bottom of the excerpt.
Here are three snippets of code I'm using. They're all pretty much identical except for the XPath. The first two return their values perfectly, but the third always returns a value of "0" even though it 'should' return "$26.98" or "26.98".
val_one = page_html.xpath(".//*[@id='openone']/div/div[2]/div[1]/div/div[2]/table/tbody/tr[2]/td[1]").text.gsub(/\D/,'').to_i
val_two = page_html.xpath(".//*[@id='opentwo']/div/div[2]/div[2]/div/div[2]/table/tbody/tr[2]/td[1]").text.gsub(/\D/,'').to_i
val_three = page_html.xpath(".//*[@id='avail']/a/span").text.gsub(/\D/,'').to_i
puts val_three
I assume it's a problem with the XPath, but I've gone through dozens of XPath troubleshooting questions here and none have worked. I checked the XPath with both FirePath and "XPath Checker". I also tried having the XPath search for the "SPBalance" class but that gave the same result.
When I remove .to_i from the end, it returns a blank line instead of a zero.
Elsewhere in the site when using Watir, I was able to fix problems recording a value by calling .focus, but for this piece of the code, which is more Nokogiri, using .focus causes the error message:
undefined method `focus' for []:Nokogiri::XML::NodeSet (NoMethodError)
I assume .focus doesn't work for Nokogiri.
Update: Replaced HTML with a cleaner/more complete version.
I've continued to play around with different ways of reaching that data cell, including xpath, css and a search method. Someone told me xpath wouldn't work for this page so I spent even more time trying to get css to work. Someone else told me the page had Javascript, which would prevent Watir from working. So I tried rewriting the app for Selenium instead. Selenium did not solve the problem, and created a whole host of other problems.
Update: After following advice from the Tin Man, I've found that the node is not actually visible in the HTML when it is downloaded using curl.
I'm now trying to access the node using Watir instead of Nokogiri (as he suggested).
Here's some of what I've tried so far:
avail_funds = browser.span :class => 'SPBalance'
avail_funds.exists?
avail_funds.text
avail_funds = browser.span(:css, 'span[customattribute]').text
avail_funds = browser.div(:id => "avail").a(:href => "/Profile/MyShares").span(:class => "SPBalance").text
avail_funds = browser.span(:xpath, ".//*[@id='avail']/a/span").text
avail_funds = browser.span(:css, 'span[class="SPBalance"]').text
avail_funds = browser.span.text
avail_funds = browser.div.text
browser.span(:class, "SPBalance").focus
avail_funds = browser.span(:class, "SPBalance").text
avail_funds = @browser.span(:class => 'SPBalance').inner_html
puts @browser.spans(:class => "SPBalance")
puts @browser.span(:class => "SPBalance")
texts = @browser.spans(:class => "SPBalance").map do |span|
  span.text
end
So far all of the above return either blank lines or an error message.
The div with the ID "user-info" is visible within the HTML as downloaded via curl. Everything beneath that, however, is not visible.
When I try:
avail_funds = browser.div(:id => "user-info").text
I get only blank lines.
When I try:
avail_funds = browser.div(:class => "navbar navbar-default navbar-fixed-top hidden-xs hidden-sm").text
I get actual text back! But unfortunately the string does not contain the value I want.
I also tried:
puts browser.html
Because I thought if the value were visible in that version of the HTML, as it is through my Firefox plug-in, I could parse down to the value I want. But unfortunately the value is not visible in that version of the HTML.
With the first two commands you fetch data directly from a table cell, starting from the root of the document; in the last one you start from the middle of it.
Try giving the span an id and fetching the data again, then increase the complexity step by step until you find the error in your XPath.
The first problem is you're trying to use a long, too-long, selector that is referencing tags that don't exist:
require 'nokogiri'
doc = Nokogiri::HTML(<<EOT)
<head>
<body class="cbp-spmenu-push">
<div id="FreshWidget" class="freshwidget-container responsive" data-html2canvas-ignore="true" style="display: none;">
<div id="freshwidget-button" class="freshwidget-button fd-btn-right" data-html2canvas-ignore="true" style="display: none; top: 235px;">
<link rel="stylesheet" href="/Content/css/NavPushComponent.css"/>
<script src="/Scripts/classie.js"/>
<script src="/Scripts/modernizr.custom.js"/>
<div class="navbar navbar-default navbar-fixed-top hidden-lg hidden-md" style="z-index: 1002">
<div class="banner-g">
<div class="container">
<div id="user-info">
<div id="acct-value">
<div id="committed">
<div id="avail">
<a href="/Profile/MyBalance">
AVAILABLE
<span class="SPBalance">$31.59</span>
EOT
doc.at('tbody') # => nil
".//*[@id='openone']/div/div[2]/div[1]/div/div[2]/table/tbody/tr[2]/td[1]"
".//*[@id='opentwo']/div/div[2]/div[2]/div/div[2]/table/tbody/tr[2]/td[1]"
There is no <tbody> tag in your sample, and there rarely is in HTML created in the wild, especially if people created it manually. We usually see <tbody> in HTML someone grabbed from a browser's "View Source" display, which is the resulting output after their engine has mangled the HTML in an attempt to make it readable. Don't use that output. Instead, ALWAYS go straight to the source and use wget or curl and download the page and inspect it with an editor, or even use nokogiri some_url on the command-line and look at it there.
A second problem is your HTML snippet is invalid because it's full of unterminated tags. Nokogiri will do fixups on bad HTML, which can actually move nodes around, making it difficult to find nodes, especially when debugging. In this particular case Nokogiri is able to terminate them, but it's important to honor tag closures.
Here's what I'd use:
value = doc.at('span.SPBalance').text # => "$31.59"
This is using CSS which is usually much more readable than XPath. at means "find the first occurrence" and is equivalent to search('span.SPBalance').first.
The XPath equivalent would be:
doc.at('//span[@class="SPBalance"]')
doc.at('//span[@class="SPBalance"]').text # => "$31.59"
Once I have the value then it's easy to manipulate it.
value[/[\d.]+/].to_f # => 31.59
Moving on...
the third always returns a value of "0" even though it should return "$31.59" or "31.59"
'$31.58'.to_i # => 0
'$'.to_i # => 0
'31.58'.to_i # => 31
'$31.58'.to_f # => 0.0
'31.58'.to_f # => 31.58
The documentation for to_f and to_i says, respectively:
Returns the result of interpreting leading characters in str as a floating point number.
and
Returns the result of interpreting leading characters in str as an integer base base (between 2 and 36).
In both cases "leading characters" is significant.
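The "leading characters" rule can be seen with plain strings, no Nokogiri needed. The sketch below uses a hypothetical helper, parse_money (not part of the original code), that extracts the first decimal number before converting. It also shows that the question's gsub(/\D/,'') approach strips the decimal point, and that an empty string turns into 0, which would be consistent with the blank line reported once .to_i was removed.

```ruby
# Hypothetical helper (not from the original code): pull the first decimal
# number out of a currency string, keeping a leading minus sign.
def parse_money(text)
  digits = text[/\d+(?:\.\d+)?/]
  return nil if digits.nil? # nothing numeric in the string

  value = digits.to_f
  text.strip.start_with?('-') ? -value : value
end

puts parse_money('$26.98')          # 26.98
puts parse_money('-$12.85')         # -12.85
puts '$26.98'.gsub(/\D/, '').to_i   # 2698: the decimal point is stripped too
puts ''.gsub(/\D/, '').to_i         # 0: what an empty match turns into
```

Returning nil for a string with no digits keeps "nothing was found" distinguishable from a genuine zero balance.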
using .focus causes the error message:
undefined method `focus' for []:Nokogiri::XML::NodeSet (NoMethodError)
I assume .focus doesn't work for Nokogiri.
You could always check the NodeSet documentation, which confirms that focus is not a method.
Let's say I have a simple page that has fewer IDs than I'd like for testing:
<div class="__panel_body">
  <div class="__panel_header">Real Estate Rating</div>
  <div class="__panel_body">
    <div class="__panel_header">Property Rating Info</div>
    <a class="icon.edit"></a>
    <a class="icon.edit"></a>
  </div>
  <div class="__panel_body">
    <div class="__panel_header">General Risks</div>
    <a class="icon.edit"></a>
    <a class="icon.edit"></a>
  </div>
  <div class="__panel_body">
    <div class="__panel_header">Amenities</div>
    <a class="icon.edit"></a>
    <a class="icon.edit"></a>
  </div>
</div>
I'm using Jeff Morgan's Page Object gem and I want to make accessors for the edit links in any given section.
The challenge is that the panel headers differentiate what body I want to choose. Then I need to access the parent and get all links with class "icon.edit". Assume I can't change the HTML to solve this.
Here's a start
module RealEstateRatingPageFields
  div(:general_risks_section, ....)
  def general_risks_edit_links
    general_risks_section_element.links(class: "icon.edit")
  end
end
How do I get the general_risks_section accessor to work, though?
I want that to represent the parent div to the panel header with text 'General Risks'...
There are a number of ways to get the general risk section.
Using a Block
The accessors can take a block where you can more programmatically describe how to locate the element. This allows you to locate a distinguishing element and then traverse the DOM to the element you actually want. In this case, you can locate the header with the matching text and navigate to its parent.
div(:general_risks_section) { div_element(class: '__panel_header', text: 'General Risks').parent }
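The "find the header, then step up to its parent" idea can be tried outside a browser. This is a rough sketch that uses REXML as a stand-in for the page-object element calls (an assumption made for illustration only), run against a shortened copy of the HTML from the question.

```ruby
require 'rexml/document'

# Shortened copy of the question's markup (the Amenities panel is omitted).
xml = <<~XML
  <div class="__panel_body">
    <div class="__panel_header">Real Estate Rating</div>
    <div class="__panel_body">
      <div class="__panel_header">Property Rating Info</div>
      <a class="icon.edit"></a>
      <a class="icon.edit"></a>
    </div>
    <div class="__panel_body">
      <div class="__panel_header">General Risks</div>
      <a class="icon.edit"></a>
      <a class="icon.edit"></a>
    </div>
  </div>
XML
doc = REXML::Document.new(xml)

# Step 1: locate the distinguishing element (the header with the right text).
headers = REXML::XPath.match(doc, "//div[@class='__panel_header']")
header = headers.find { |h| h.text == 'General Risks' }

# Step 2: walk up to the section, then collect only its own edit links.
section = header.parent
edit_links = section.get_elements("a[@class='icon.edit']")

puts section.attributes['class']
puts edit_links.size
```

Because the links are looked up relative to the section, the edit links of the other panels are not counted.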
Using XPath
While harder to read and write, you could also use an XPath locator. The concept and thought process is the same as using the block. The only benefit is that it reduces the number of element calls, which slightly improves performance.
div(:general_risks_section, xpath: './/div[@class="__panel_body"][./div[@class="__panel_header" and text() = "General Risks"]]')
The XPath is saying:
.//div                            # Find a div element that
[@class="__panel_body"]           # Has the class "__panel_body" and
[./div[                           # Contains a div element that
  @class="__panel_header" and     # Has the class "__panel_header" and
  text() = "General Risks"        # Has the text "General Risks"
]]
Using the Body Text
Given the HTML, you could also just locate the section directly based on its text.
div(:general_risks_section, class: '__panel_body', text: 'General Risks')
Note that this assumes that the HTML given was not simplified. If there are actually other text nodes, this probably would not be the best option.
We have page object elements like
link(:test_link, xpath: ".//a[@id = '3']")
unordered_list(:list, id: 'test')
And the code:
def method(elementcontainer, elementlink)
  elementcontainer = elementcontainer.downcase.gsub(' ', '_')
  elementlink = elementlink.downcase.gsub(' ', '_')
  object = send("#{elementcontainer}_element")
  object2 = send("#{elementlink}_element")
  total_results_1 = object.element.links(id: '3').length
  total_results_2 = object.element.links(object2).length
end
The last 2 lines contain the mystery.
The total_results_1 is able to get the number of links contained in the unordered list that have id = '3'.
total_results_2 does not work (of course). I don't want to repeat the identification of the links in the middle of the code, because that is already done in the page object.
How is it possible to write something like the total_results_2 line, but in a working version?
I might be misunderstanding the question, but I do not believe you need to create a method for what you want. It can all be done using the page object accessors.
Say we have the following page (I matched this to your accessors, though it seems unlikely that all links would have the same id):
<html>
<body>
<a id="3" href="#">1</a>
<ul id="test">
<li><a id="3" href="#">2</a></li>
<li><a id="3" href="#">3</a></li>
<li><a id="3" href="#">4</a></li>
</ul>
<a id="3" href="#">5</a>
</body>
</html>
As you did, you could define the list with the accessor:
unordered_list(:list, id: 'test')
To get only the links with id 3 that are within the list, you could:
Define the links as a collection - ie use links instead of link.
Use a block to locate the elements. This would allow you to consider the element nesting - ie locate links within the list element.
This would be done with:
links(:test_link){ list_element.link_elements(:id => '3') }
All together, your page object would be:
class MyPage
  include PageObject
  unordered_list(:list, id: 'test')
  links(:test_link){ list_element.link_elements(:id => '3') }
end
To find the number of links, you would access the element collection and check its length.
browser = Watir::Browser.new
browser.goto('your_test_page.htm')
page = MyPage.new(browser)
puts page.test_link_elements.length
#=> 3
The context is I'm using watir-webdriver and I need to locate if an image appears prior to a particular item in a list.
More specifically, there is a section of the site that has articles uploaded to them. Those articles appear in a list. The structure looks like this:
<div id="article-resources">
<ul class="components">
...
<li>
<div class="component">
<img src="some/path/article.png">
<div class="replies">
<label>Replies</label>
</div>
<div class="subject">
Saving the Day
</div>
</div>
</li>
...
</ul>
</div>
Each article appears as a separate li item. (The ellipses above are just meant to indicate I can have lots of list items.)
What I want our automation to do is find out if the article has been appropriately given the image article.png. The trick is I need to make sure the actual article -- in the above case, "Saving the Day" -- has the image next to it. I can't just check for the image because there will be multiples.
So I figured I had to use xpath to solve this. Using Firefox to help look at the xpath gave me this:
id("article-resources")/x:ul/x:li[2]/x:div/x:img
That does me no good, though, because the key discriminator seems to be the li[2], but I can't count on this article always being the second in the list.
So I tried this:
article_image = '//div[@class="component"]/a[contains(.,"Saving the Day")]/../img'
@browser.image(:xpath => article_image).exist?.should be_true
The output I get is:
expected: true value
got: false (RSpec::Expectations::ExpectationNotMetError)
So it's not finding the image which likely means I'm doing something wrong since I'm certain the test is on the correct page.
My thinking was I could use the above to get any link (a) tags in the div area referenced as class "component". Check if the link has the text and then "back up" one level to see if an image is there.
I'm not even checking the exact image, which I probably should be. I'm just checking if there's an image at all.
So I guess my questions are:
What am I doing wrong with my XPath?
Is this even the best way to solve this problem?
Using Watir
There are a couple of approaches possible.
One way would be find the link, go up to the component div and then check for the image:
browser.link(:text => 'Saving the Day').parent.parent.image.present?
or
browser.div(:class => 'subject', :text => 'Saving the Day').parent.image.present?
Another approach, which is a little more robust to changes, is to find the component div that contains the link:
browser.divs(:class => 'component').find { |component|
  component.div(:class => 'subject', :text => 'Saving the Day').exists?
}.image.present?
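The top-down check (pick the component whose subject matches, then look for an image inside it) can be exercised offline. This is a sketch only: REXML stands in for Watir, the markup is a trimmed XML-safe copy of the question's list, and the second article ("Another Article", with no image) is a hypothetical addition so that the negative case is visible.

```ruby
require 'rexml/document'

# Trimmed copy of the list; "Another Article" is a made-up second entry.
xml = <<~XML
  <ul class="components">
    <li><div class="component">
      <div class="subject">Another Article</div>
    </div></li>
    <li><div class="component">
      <img src="some/path/article.png"/>
      <div class="subject">Saving the Day</div>
    </div></li>
  </ul>
XML
doc = REXML::Document.new(xml)

# True only when the component holding the given title also holds an image.
def article_has_image?(doc, title)
  components = REXML::XPath.match(doc, "//div[@class='component']")
  component = components.find do |c|
    c.get_elements("div[@class='subject']").any? { |s| s.text.to_s.strip == title }
  end
  !component.nil? && !component.get_elements('img').empty?
end

puts article_has_image?(doc, 'Saving the Day')
puts article_has_image?(doc, 'Another Article')
```

The position of the article in the list never enters the check, which is what makes this approach less fragile than an li[2]-style index.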
Using XPath
The above could of course be done through xpath as well.
Here is your corrected xpath:
article_image = '//div[@class="component"]//a[contains(.,"Saving the Day")]/../../img'
puts browser.image(:xpath => article_image).present?
Or alternatively:
article_image = '//a[contains(.,"Saving the Day")]/../../img'
browser.image(:xpath => article_image).present?
Again, there is also the top down approach:
article_image = '//div[@class="component"][.//a[contains(.,"Saving the Day")]]/img'
browser.image(:xpath => article_image).present?
You can read more about these approaches and other options in the book Watirways.
I have a document A and want to build a new one, B, using A's node values.
Given A looks like this...
<html>
<head></head>
<body>
  <div id="section0">
    <h1>Section 0</h1>
    <div>
      <p>Some <b>important</b> info here</p>
      <div>Some unimportant info here</div>
    </div>
  </div>
  <div id="section1">
    <h1>Section 1</h1>
    <div>
      <p>Some <i>important</i> info here</p>
      <div>Some unimportant info here</div>
    </div>
  </div>
</body>
</html>
When building a B document, I'm using method a.at_css("#section#{n} h1").text to grab the data from A's h1 tags like this:
require 'nokogiri'
a = Nokogiri::HTML(html)
Nokogiri::HTML::Builder.new do |doc|
  ...
  doc.h1 a.at_css("#section#{n} h1").text
  ...
end
So there are three questions:
How do I grab the content of <p> tags while preserving the tags inside <p>?
Currently, a.at_css("#section#{n} p").text returns plain text, which is not what I need. If I use .to_html or .inner_html instead of .text, the HTML comes out escaped, so I get, for example, &lt;p&gt; instead of <p>.
Is there any known true way of assigning nodes at the document building stage? So that I wouldn't dance with text method at all? I.e. how do I assign doc.h1 node with value of a.at_css("#section#{n} h1") node at building stage?
What's the profit of Nokogiri::Builder.with(...) method? I wonder if I can get use of it...
How do I grab the content of <p> tags preserving tags inside <p>?
Use .inner_html. The entities are not escaped when accessing them. They will be escaped if you do something like builder.node_name raw_html. Instead:
require 'nokogiri'
para = Nokogiri.HTML( '<p id="foo">Hello <b>World</b>!</p>' ).at('#foo')
doc = Nokogiri::HTML::Builder.new do |d|
  d.body do
    d.div(id:'content') do
      d.parent << para.inner_html
    end
  end
end
puts doc.to_html
#=> <body><div id="content">Hello <b>World</b>!</div></body>
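The escaped output described in the question is what generally happens when markup is handed over as plain text rather than as nodes. The sketch below reproduces the effect with ERB::Util.html_escape from Ruby's standard library; this is a stand-in chosen for illustration, since Nokogiri performs its own escaping internally.

```ruby
require 'erb'

# Markup that is treated as a plain string, as in builder.node_name raw_html.
fragment = '<p>Some <b>important</b> info here</p>'

# Escaping turns every angle bracket into an entity, so the tags become text.
escaped = ERB::Util.html_escape(fragment)

puts escaped
```

Appending to d.parent, as in the code above, avoids this because the fragment is parsed into nodes instead of being inserted as a text value.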
Is there any known true way of assigning nodes at the document building stage?
Similar to the above, one way is:
puts Nokogiri::HTML::Builder.new{ |d| d.body{ d.parent << para } }.to_html
#=> <body><p id="foo">Hello <b>World</b>!</p></body>
Voila! The node has moved from one document to the other.
What's the profit of Nokogiri::Builder.with(...) method?
That's rather unrelated to the rest of your question. As the documentation says:
Create a builder with an existing root object. This is for use when you have an existing document that you would like to augment with builder methods. The builder context created will start with the given root node.
I don't think it would be useful to you here.
In general, I find the Builder to be convenient when writing a large number of custom nodes from scratch with a known hierarchy. When not doing that, you may find it simpler to just create a new document and use DOM methods to add nodes as appropriate. It's hard to tell how many hard-coded nodes and how much fixed hierarchy your document will have versus procedurally created content.
One other, alternative suggestion: perhaps you should create a template XML document and then augment that with details from the other, scraped HTML?