I've looked around and can't seem to find the answer for this.
Very simplified:
<a>
<b>
<div class=label>
<label="here"/>
</div>
</b>
<div id="something">
<b>
<div class=label>
<label="here"/>
</div>
</b>
</div>
</a>
So I'm trying to grab the second "here" label. What I want to do is use the id to get to the "something" element:
//.[@id="something"]
and then from that point search for the label with something like
//.[@class="label" and label="here"]
But from reading a few other replies it doesn't appear that something like
//.[@id="something"][@class="label" and label="here"]
works, and I was wondering whether I'm missing something about how it works. I know I can get the above very simply with another method; it's just an example to ask how to apply two predicates one after the other (if that is indeed possible).
Thanks!
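I think you need something like this instead:
//.[@id="something"]//.[@class="label" and label="here"]
The point is that // selects matching nodes at any depth below the node it follows (at the start of an expression, anywhere in the document). Two predicates written directly after each other are both tested against the same node, whereas the second // here continues the search among the descendants of the element with id="something".
Ref: http://www.w3schools.com/xpath/xpath_syntax.asp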
The syntax //*[@x='y'] is more idiomatic than //.[@x='y'], probably because it's valid in both XPath 1.0 and XPath 2.0, whereas the latter is only allowed in XPath 2.0. Disallowing predicates after "." was probably an accidental restriction in XPath 1.0, and I think some implementations may have relaxed the restriction, but it's there in the spec: a predicate can only follow either a NodeTest or a PrimaryExpr, and "." is neither.
In XPath 2.0, //* selects all element nodes in the tree, while //. selects all nodes of all kinds (including the document root node), but in this example the effect is the same because the predicate [@x='y'] can only be matched by an element node (for all other node kinds, @x selects nothing and therefore cannot be equal to anything).
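For anyone who wants to try the chained form, here is a minimal sketch using Python's standard xml.etree.ElementTree, which only supports a subset of XPath. Some assumptions: the simplified markup from the question is not well-formed XML, so the sample below rewrites the label as <label>here</label> and quotes the class attribute; ElementTree also has no "and" inside a predicate, so the two conditions are written as two predicates on the same step, which test the same node.

```python
import xml.etree.ElementTree as ET

# Assumed well-formed version of the question's simplified markup.
DOC = """
<a>
  <b>
    <div class="label"><label>here</label></div>
  </b>
  <div id="something">
    <b>
      <div class="label"><label>here</label></div>
    </b>
  </div>
</a>
"""

root = ET.fromstring(DOC)

# Both predicates apply to the same element: class="label" AND a <label> child with text "here".
all_labels = root.findall(".//*[@class='label'][label='here']")

# First narrow to the element with id="something", then search its descendants.
scoped = root.findall(".//*[@id='something']//*[@class='label'][label='here']")

print(len(all_labels), len(scoped))
```

The unscoped query finds both label containers, while the scoped one only finds the one under id="something".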
Related
This is for XPath 1.0.
Here is an example of the markup that I am matching against. The actual number of elements is not known ahead of time and thus varies, but it follows this sort of pattern:
<div class="entry">
<p><iframe /></p>
<p>Text 1</p>
<p>Text 2</p>
<p>Test 3</p>
<p><iframe /></p>
<p>
<a>Test 4</a>
<br />
<a>Test 5</a>
</p>
</div>
I am trying to match every <p> that does not contain an <iframe>, up until the next <p> that does contain an <iframe> or until the end of the enclosing <div> element.
To make things slightly more complicated, for specific reasons I need to use each <iframe> as the base, a la //div[@class='entry']//iframe, so that each nodeset is based from
(//div[@class='entry']//iframe)[1]
(//div[@class='entry']//iframe)[2]
...
and thus, in this case, matching
<p>Text 1</p>
<p>Text 2</p>
<p>Test 3</p>
and
<p>
<a>Test 4</a>
<br />
<a>Test 5</a>
</p>
respectively.
I tried some of the following for testing to no avail:
(//div[@class='entry']//iframe)/ancestor::p/following-sibling::p[preceding-sibling::p[iframe]]
(or for testing):
(//div[@class='entry']//iframe)[1]/ancestor::p/following-sibling::p[preceding-sibling::p[iframe]]
(//div[@class='entry']//iframe)[2]/ancestor::p/following-sibling::p[preceding-sibling::p[iframe]]
and some variations thereof, but what happens for the first set is that it gets all <iframe>-less <p> elements all the way to the end instead of stopping at the next <p> that contains an <iframe>.
I've been at this for a while and even though I'm usually quite handy with this sort of thing, I can't quite work my way through this one, and none of the search results from Google and such have helped.
Thanks. Any help is always appreciated.
Edit: It can be assumed that there is only one occurrence of <div class="entry"> in the document.
What you are asking for can't be done in one single XPath 1.0 expression without help. The problem is that the question you want to ask is
Starting from an element X (the p-containing-an-iframe), find the other p elements for which that element's nearest preceding p-with-an-iframe is the original node X
If we had a variable $x holding a reference to the top-level context node (the p[iframe] we're starting from) then you could say something like the following (in XPath 2.0)
following-sibling::p[not(iframe)][preceding-sibling::p[iframe][1] is $x]
XPath 1.0 doesn't have an is operator to compare node identity but there are other proxies you can use for this, for example
following-sibling::p[not(iframe)][count(preceding-sibling::p[iframe])
= (count($x/preceding-sibling::p[iframe]) + 1)]
i.e. those following p elements that have one more preceding-sibling::p[iframe] than $x has.
The nub of the problem then is how to get at the outer context node from inside the inner predicate - pure XPath 1.0 has no way to do this. In XSLT you have the current() function, but otherwise you have two basic choices:
If your XPath library allows you to provide variable bindings to your expressions, then inject a variable $x containing the context node and use the expression I've given above.
If you can't inject variables then use two separate XPath queries in sequence.
First execute the expression
count(preceding-sibling::p[iframe]) + 1
with the relevant p[iframe] as context node, and take the result as a number. Or alternatively, if you're already iterating over these p[iframe] elements in your host language then just take the iteration number from there directly, you don't need to count it up using XPath. Either way, you can then build a second expression dynamically:
following-sibling::p[not(iframe)][count(preceding-sibling::p[iframe]) = N]
(where N is the result of the first expression/iteration counter) and evaluate that with the same context node, taking the final result as a node set.
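The two-step approach can be sketched roughly as follows. This is an illustration under assumptions, not a drop-in solution: Python's standard xml.etree.ElementTree supports neither following-sibling nor count(), so the hypothetical helper second_expression only builds the XPath string for an engine with full XPath 1.0 support, and group_by_preceding_iframe mirrors the same counting logic in plain Python so you can see which nodes each expression is meant to select.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<div class="entry">
  <p><iframe /></p>
  <p>Text 1</p>
  <p>Text 2</p>
  <p>Test 3</p>
  <p><iframe /></p>
  <p>
    <a>Test 4</a>
    <br />
    <a>Test 5</a>
  </p>
</div>
"""

def second_expression(n):
    """Build the second query for the n-th p[iframe] (1-based counter)."""
    return ("following-sibling::p[not(iframe)]"
            "[count(preceding-sibling::p[iframe]) = %d]" % n)

def group_by_preceding_iframe(div):
    """Plain-Python mirror: group each iframe-less <p> under the number of
    p[iframe] siblings that precede it."""
    groups = {}
    seen = 0  # how many p[iframe] elements we have passed so far
    for p in div.findall("p"):
        if p.find("iframe") is not None:
            seen += 1
            groups[seen] = []
        elif seen:
            groups[seen].append(p)
    return groups

div = ET.fromstring(SAMPLE)
groups = group_by_preceding_iframe(div)
# Collapse whitespace so the nested <a> texts are easy to compare.
texts = {n: [" ".join("".join(p.itertext()).split()) for p in ps]
         for n, ps in groups.items()}
print(second_expression(1))
print(texts)
```

Group 1 holds the three text paragraphs and group 2 holds the paragraph with the two links, which matches what the question asks for.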
I'm not sure I understood completely, but sometimes it helps to comment on an attempted solution rather than trying to explain.
Please try the following XPath expression:
//div[@class='entry']//iframe//p[not(descendant::iframe)]
And let me know if this yields the correct result.
If not,
explain how the result differs from what you need
show a more complete HTML sample: a reasonable document with multiple div elements, more than one of them matching div[@class = 'entry'], and otherwise covering all the complexity you describe
explain why you added [1] and [2] to your expressions
give more details about the platform you're using XPath with, perhaps post code
This seems like it should be easy, but I can never figure it out.
Presume I have the following document:
<data>
<a>
<b val="1"/>
</a>
<c val="1"/>
</data>
And assume that I am executing an XPath from the context of <b>. I need to check if there is an element c that has the same value as b.
Obviously, this doesn't work:
../a/c[@val=@val]
How do I get an XPath to remember its "current" context when traversing the tree?
Try the expression below. You'll notice that the current node is not lost since a predicate is used for finding the c node.
.[../../c/@val=@val]
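If your XPath engine makes this awkward, another option is to remember the context value in the host language and do the comparison there. A minimal sketch with Python's standard xml.etree.ElementTree (which cannot compare two attributes inside a predicate); has_matching_c is a hypothetical helper name, and the sample closes the <c> element so that it parses:

```python
import xml.etree.ElementTree as ET

data = ET.fromstring('<data><a><b val="1"/></a><c val="1"/></data>')

def has_matching_c(data, b):
    """True if some <c> child of <data> carries the same val as the given <b>."""
    wanted = b.get("val")  # the "current" value, remembered outside the query
    return any(c.get("val") == wanted for c in data.findall("c"))

b = data.find("a/b")
print(has_matching_c(data, b))
```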
I played around with nokogiri in ruby and the XML searching feature, e.g.:
a = Nokogiri.XML(open 'a.xml')
x = a.search('//div[@class="foo"]').text
which works quite nicely.
But how can I match the next sibling element on the same level (and only the next one)?
For example for this input:
<div>
<div>...</div>
<div>...</div>
<div class="foo"></div>
<div>EXTRACT ME</div>
...
</div>
The actual input is some non-XHTML html, but so far Nokogiri.XML does not complain.
By the way, what filter syntax does f.search actually expect? XPath?
Taking the hint from Brian Agnew and DevNull, I guess that f.search actually expects XPath syntax, and using the following-sibling axis the following expression matches what was asked:
x = a.search('//div[@class="foo"]/following-sibling::div[1]')
I think you want XPath's following-sibling axis.
How can I ignore first element and get rest of the elements?
<ul>
<li>some link</li>
<li>some link 2</li>
<li>link i want to find</li>
</ul>
Thanks
If you want to ignore the "first" element only, then:
//li[position()>1]
or
(//a)[position()>1]
if you want the last only (like your example seems to suggest):
//li[last()]
or
(//a)[last()]
You can use position() to skip over the "first" one, but depending on which element you are interested in and what the context is, you may need a slight variation on your XPath.
For instance, if you wanted to address all of the li elements and get all except the first, you could use:
//li[position()>1]
and it would work as expected, returning all of the li elements except for the first.
However, if you wanted to address all of the a elements, you need to modify the XPath slightly. In the expression //a[position()>1] the position is counted among the siblings under each parent, so (assuming each a is the only a inside its li) every a element has a position() of 1, and a [last()] predicate is true for every one of them as well. As a result, //a[position()>1] would select nothing, while //a[last()] would return every a instead of only the last one.
You need to wrap the expression that selects the a elements in parentheses to group them into a single node-set, then apply the position predicate to that node-set.
(//a)[position()>1]
Alternatively, you could also use an expression like this:
//a[preceding::a]
That will find all a elements except the first one (since there is no a preceding the first one).
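The difference between a per-parent position and a position in the grouped node-set can be seen in a small sketch with Python's standard xml.etree.ElementTree. Assumptions: each li wraps its text in an <a> (the markup in the question does not show this), and because ElementTree does not support position()>1, list slicing stands in for (//a)[position()>1].

```python
import xml.etree.ElementTree as ET

# Assumed markup: one <a> inside each <li>.
ul = ET.fromstring(
    "<ul>"
    "<li><a>some link</a></li>"
    "<li><a>some link 2</a></li>"
    "<li><a>link i want to find</a></li>"
    "</ul>"
)

# a[1] is evaluated per parent: every <a> is the first <a> of its own <li>.
first_per_parent = [a.text for a in ul.findall(".//a[1]")]

# Grouping first and indexing afterwards, like (//a)[position()>1].
all_links = ul.findall(".//a")
rest = [a.text for a in all_links[1:]]

print(first_per_parent)
print(rest)
```

The first query returns all three links, while slicing the grouped list drops only the first one.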
When there is more than a single element with the same locator in a page, how should the next elements be referenced?
Using XPath locators it's possible to add array notation, e.g. xpath=(//span/div)[1]
But with simple locators?
For example, if there are 3 links identified by "link=Click Here", simply appending [3] won't get the 3rd element.
And where is the authoritative reference for addressing array of elements? I couldn't find any.
Selenium doesn't handle arrays of locators by itself. It just returns the first element that meets your query, so if you want to do that, you have to use xpath, dom or even better, css.
So for the link example you should use:
selenium.click("css=a:contains('Click Here'):nth-child(3)")
Santi is correct that Selenium returns the first element matching your specified locator and you have to apply the appropriate expression of the locator type you use. I thought it would be useful to give the details here, though, for in this case they do border on being "gory details":
CSS
The :nth-child pseudo-class is tricky to use; it has subtleties that are little-known and not clearly documented, even on the W3C pages. Consider a list such as this:
<ul>
<li class="bird">petrel</li>
<li class="mammal">platypus</li>
<li class="bird">albatross</li>
<li class="bird">shearwater</li>
</ul>
Then the selector css=li.bird:nth-child(3) returns the albatross element, not the shearwater! The reason for this is that it uses your index (3) into the list of all sibling elements under the parent--unfiltered by the .bird class! Once it has the element at that index, in this example the third one, it then applies the bird class filter: if the element in hand matches, it returns it. If it does not, it fails to match.
Now consider the selector css=li.bird:nth-child(2). This starts with the second element--platypus--sees it is not a bird and comes up empty. This manifests as your code throwing a "not found" exception!
What might seem to fit the typical mental model of finding an indexed entry is the CSS :nth-of-type pseudo-class, but note that it counts only siblings with the same element name; it does not filter by class before indexing either. In any case, it is not supported by Selenium, according to the official documentation on locators.
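To make the :nth-child behaviour concrete, here is a small emulation in plain Python (not Selenium and not a real CSS engine). Both function names are hypothetical: nth_child_with_class indexes first and filters afterwards, the way li.bird:nth-child(n) behaves, and nth_after_filter implements the "filter by class first, then index" mental model for comparison.

```python
import xml.etree.ElementTree as ET

ul = ET.fromstring("""<ul>
<li class="bird">petrel</li>
<li class="mammal">platypus</li>
<li class="bird">albatross</li>
<li class="bird">shearwater</li>
</ul>""")

def has_class(elem, css_class):
    return css_class in elem.get("class", "").split()

def nth_child_with_class(parent, tag, css_class, n):
    """Emulates tag.css_class:nth-child(n): pick the n-th child, then test it."""
    children = list(parent)
    if not 1 <= n <= len(children):
        return None
    candidate = children[n - 1]
    if candidate.tag == tag and has_class(candidate, css_class):
        return candidate
    return None

def nth_after_filter(parent, tag, css_class, n):
    """The intuitive model: keep only matching children, then pick the n-th."""
    matches = [c for c in parent if c.tag == tag and has_class(c, css_class)]
    return matches[n - 1] if 1 <= n <= len(matches) else None

print(nth_child_with_class(ul, "li", "bird", 3).text)
print(nth_child_with_class(ul, "li", "bird", 2))
print(nth_after_filter(ul, "li", "bird", 3).text)
```

Index 3 yields the albatross under the :nth-child rule, index 2 yields nothing because the platypus is not a bird, and only the filter-first version reaches the shearwater.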
XPath
Your question already showed that you know how to do this in XPath. Add an array reference at any point in the expression with square brackets. You could, for example, use something like this: //*[@id='abc']/div[3]/p[2]/span to find a span in the second paragraph under the 3rd div under the specified id.
DOM
DOM uses the same square bracket notation as XPath except that DOM indexes from zero while XPath indexes from 1: document.getElementsByTagName("div")[1] returns the second div, not the first div! DOM offers an alternate syntax as well: document.getElementsByTagName("div").item(0) is exactly equivalent. And note that with getElementsByTagName you always have to use an index since it returns a node set, not a single node.