Unable to find dynamic XPath

My XPath is:
//*[@id='form_MenuBar:j_id24']/span
and the number 24 changes, for example:
//*[@id='form_MenuBar:j_id48']/span
I tried the following, but it doesn't work:
driver.findElement(By.xpath("//a[contains(@id,'form_MenuBar:j_id$')]/span"));
Source XML:
<li class="ui-menuitem ui-widget ui-corner-all ui-menuitem-active" role="menuitem">
<a id="form_MenuBar:j_id24" class="ui-menuitem-link ui-corner-all ui-state-hover" href="/Demand/j_spring_security_logout">
<span class="ui-menuitem-text">Log off</span>
</a>
</li>

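Just try
driver.findElement(By.xpath("//a[contains(@id,'form_MenuBar:j_id')]/span"));
When you use contains() in XPath, there is no need for '$'. contains() does a plain substring match, not a regular-expression match, so the '$' is looked for literally and nothing matches.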
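One way to see the difference without a browser is to evaluate both expressions with the JDK's built-in javax.xml.xpath engine against the menu markup from the question. This is a sketch that assumes the markup parses as well-formed XML; WebDriver evaluates XPath in the browser instead, but contains() is the same XPath 1.0 function there.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ContainsDemo {
    // The menu item from the question, written as a standalone XML document.
    static final String MENU =
        "<li class='ui-menuitem ui-widget ui-corner-all ui-menuitem-active' role='menuitem'>"
        + "<a id='form_MenuBar:j_id24' class='ui-menuitem-link ui-corner-all ui-state-hover'"
        + " href='/Demand/j_spring_security_logout'>"
        + "<span class='ui-menuitem-text'>Log off</span></a></li>";

    // Number of nodes the XPath expression selects in MENU.
    static int count(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(MENU)));
            XPath xp = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xp.evaluate(expr, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // '$' is an ordinary character for contains(): no id contains "j_id$".
        System.out.println(count("//a[contains(@id,'form_MenuBar:j_id$')]/span")); // 0
        // Plain substring test: matches j_id24, j_id48, and so on.
        System.out.println(count("//a[contains(@id,'form_MenuBar:j_id')]/span")); // 1
    }
}
```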

It appears that you are using Java, so I'll answer based on that. I'm not a Java developer, so I apologize if it's not syntactically correct.
If only the number within the ID changes, and you know that number, you could do:
driver.findElement(By.id(String.format("form_MenuBar:j_id%d", the_id)));
Also, I'm not sure about the application you are testing, but if multiple elements have an id beginning with "form_MenuBar:j_id", findElement will only return the first one, which might not be the link you are trying to find.
You could use findElements, which returns a list of all matching elements, and then iterate through it until you find the one you actually want.
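The findElements-and-iterate approach can be sketched with the JDK's javax.xml.xpath API standing in for WebDriver, so it runs without a browser. The second menu item (j_id48, "Settings") is invented for illustration, and matching on the link text is only one possible way to decide which element is the one you want.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FindAllDemo {
    // Hypothetical menu: "Settings"/j_id48 is invented; only "Log off"/j_id24 is from the question.
    static final String MENU = "<ul>"
        + "<li><a id='form_MenuBar:j_id48'><span>Settings</span></a></li>"
        + "<li><a id='form_MenuBar:j_id24'><span>Log off</span></a></li>"
        + "</ul>";

    // Every <a> whose id contains the prefix: the analogue of findElements(...).
    static NodeList links() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(MENU)));
            XPath xp = XPathFactory.newInstance().newXPath();
            return (NodeList) xp.evaluate("//a[contains(@id,'form_MenuBar:j_id')]",
                doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Walk all matches and return the id of the link with the wanted text, or null if none.
    static String idForText(String text) {
        NodeList all = links();
        for (int i = 0; i < all.getLength(); i++) {
            Element a = (Element) all.item(i);
            if (a.getTextContent().trim().equals(text)) {
                return a.getAttribute("id");
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // findElement would stop at the first match in document order.
        System.out.println(((Element) links().item(0)).getAttribute("id")); // form_MenuBar:j_id48
        System.out.println(idForText("Log off")); // form_MenuBar:j_id24
    }
}
```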

Avoid parentheses in path using XPath 1.0

The following XML structure represents a website with many articles. Every article contains, among many other things, date of its creation and possibly arbitrarily many dates of its modification. I want to get the date of the last access (either creation or last modification) to every article using XPath 1.0.
<website>
<article>
<date><strong>22.11.2017</strong></date>
<edits>
<edit><strong>17.12.2017</strong></edit>
</edits>
</article>
<article>
<date><strong>17.4.2016</strong></date>
<edits></edits>
</article>
<article>
<date><strong>3.5.2011</strong></date>
<edits>
<edit><strong>4.5.2011</strong></edit>
<edit><strong>12.8.2012</strong></edit>
</edits>
</article>
<article>
<date><strong>12.2.2009</strong></date>
<edits></edits>
</article>
<article>
<date><strong>23.11.1987</strong></date>
<edits>
<edit><strong>3.4.2001</strong></edit>
<edit><strong>11.5.2006</strong></edit>
<edit><strong>13.9.2012</strong></edit>
</edits>
</article>
</website>
In other words, the expected output is:
<strong>17.12.2017</strong>
<strong>17.4.2016</strong>
<strong>12.8.2012</strong>
<strong>12.2.2009</strong>
<strong>13.9.2012</strong>
So far I've only created this path:
//article/*[self::date or self::edits/edit][last()]
that looks for date and nonempty edits nodes in every article and selects the latter one. But I don't know how to access the latest strong of every such selection and the naive //strong[last()] appended to the end of the path doesn't work.
I found a solution in XPath 2.0. Either of these paths should work, if I'm not mistaken:
//article/(*[self::date or self::edits/edit][last()]//strong)[last()]
//article/(*//strong)[last()]
Such use of parentheses within a path is invalid in XPath 1.0, though.
This XPath 1.0 expression
/website/article/descendant::strong[parent::date|parent::edit][last()]
Selects the nodes:
<strong>17.12.2017</strong>
<strong>17.4.2016</strong>
<strong>12.8.2012</strong>
<strong>12.2.2009</strong>
<strong>13.9.2012</strong>
Tested in http://www.xpathtester.com/xpath/56d8f7bc4b9c8c064fdad16f22469026
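Do note: a positional predicate acts on the node list produced by its own location step for each context node, so [last()] here picks the last qualifying strong within each article. The naive //strong[last()] instead selects the last strong child of each parent element, and since every strong is the only child of its date or edit, it returns every strong.
Here is a simpler XPath that gives your output: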
//article/descendant-or-self::strong[last()]

Thymeleaf URL Syntax is is confusing

So I'm trying to generate (dynamic) links through Thymeleaf.
Now the first url I want to link to is /resource/id so I have the following:
<a th:href="|/resource/${resource.id}|" href="#" class="btn btn-default">Edit...</a>
Which works great, and does exactly what you would expect.
So the next url I want to link to is /resource/id/child/id so I try the following:
<a th:href="|/resource/${resource.id}/child/${child.id}|" href="#" class="btn btn-default">Edit...</a>
Which I thought would give me what I wanted. Instead I get:
/resource/3/child%7D?id=4
Which is weird and confusing. I'm not really sure why this happens or how to get what I want which is:
/resource/3/child/4
Any help greatly appreciated. I'm using spring-boot/spring-mvc if it makes any difference.
Update:
Okay, so after re-reading (and comprehending) the documentation, it turns out the the more correct syntax is:
<a th:href="#{/resource/{rid}/child/{cid}(rid=${resource.id} cid=${child.id})}" href="#" class="btn btn-default">Edit...</a>
Now I get the url:
/resource/3/child?id=4
Which while very close is still slightly wrong and confusing.
Does Thymeleaf not support more than one path variable?
I think a comma is missing. Try like this :
th:href="#{/resource/{rid}/child/{cid}(rid=${resource.id},cid=${child.id})}"
Each key/value should be separate with a comma in order to correctly build an URL with multiple path variables.
When working with urls you should better use the url syntax #{...} instead of literal substitution.
For example in your case:
<a th:href="#{/resource/{resource.id}/child/{child.id}}"
href="#" class="btn btn-default">Edit...</a>
(note the absence of the dollar sign in rest resource placeholders)
Have you check the value of child.id? I used it very often like #{|/resource/${resource.id}/child/${child.id}|}.
You can evaluate the syntax wit
<a th:with="child_id='4'" th:href="#{|/resource/${resource.id}/child/${child_id}|}"
and put a <span th:text="${child.id}"></span> to your html to see what value child.id has.

XPath (1.0) Match consecutive elements until specific child or end

This is for XPath 1.0.
Here is an example of the mark up that I am matching against. The actual number of elements is not known ahead of time and thus varies, but following this sort of of pattern:
<div class="entry">
<p><iframe /></p>
<p>Text 1</p>
<p>Text 2</p>
<p>Test 3</p>
<p><iframe /></p>
<p>
<a>Test 4</a>
<br />
<a>Test 5</a>
</p>
</div>
I am trying to to match every <p> that does not contain an <iframe>, up until the next <p> that does contain an <iframe> or until the end of the enclosing <div> element.
To make things slightly more complicated, for specific reasons I need to use each <iframe> as the base, a la //div[#class='entry']//iframe, so that each nodeset is based from
(//div[#class='entry']//iframe)[1]
(//div[#class='entry']//iframe)[2]
...
and thus, in this case, matching
<p>Text 1</p>
<p>Text 2</p>
<p>Test 3</p>
and
<p>
<a>Test 4</a>
<br />
<a>Test 5</a>
</p>
respectively.
I tried some of the following for testing to no avail:
(//div[#class='entry']//iframe)/ancestor::p/following-sibling::p[preceding-sibling::p[iframe]]
(or for testing):
(//div[#class='entry']//iframe)[1]/ancestor::p/following-sibling::p[preceding-sibling::p[iframe]]
(//div[#class='entry']//iframe)[2]/ancestor::p/following-sibling::p[preceding-sibling::p[iframe]]
and some variations thereof but what happens for the first set is it gets all <iframe>-less <p> elements all the way to the end instead of stopping at the next <p> that contains a <iframe>.
I've been at this for a while and even though I'm usually quite handy with this sort of thing, I can't quite work my way thorigh this one and none of the search results from Google and such have helped.
Thanks. Any help is always appreciated.
Edit: It can be assumed that there is only one occurrence of <div class="entry"> in the document.
What you are asking for can't be done in one single XPath 1.0 expression without help. The problem is that the question you want to ask is
Starting from an element X (the p-containing-an-iframe), find the other p elements for which that element's nearest preceding p-with-an-iframe is the original node X
If we had a variable $x holding a reference to the top-level context node (the p[iframe] we're starting from) then you could say something like the following (in XPath 2.0)
following-sibling::p[not(iframe)][preceding-sibling::p[iframe][1] is $x]
XPath 1.0 doesn't have an is operator to compare node identity but there are other proxies you can use for this, for example
following-sibling::p[not(iframe)][count(preceding-sibling::p[iframe])
= (count($x/preceding-sibling::p[iframe]) + 1)]
i.e. those following p elements that have one more preceding-sibling::p[iframe] than $x has.
The nub of the problem then is how to get at the outer context node from inside the inner predicate - pure XPath 1.0 has no way to do this. In XSLT you have the current() function, but otherwise you have two basic choices:
If your XPath library allows you to provide variable bindings to your expressions, then inject a variable $x containing the context node and use the expression I've given above.
If you can't inject variables then use two separate XPath queries in sequence.
First execute the expression
count(preceding-sibling::p[iframe]) + 1
with the relevant p[iframe] as context node, and take the result as a number. Or alternatively, if you're already iterating over these p[iframe] elements in your host language then just take the iteration number from there directly, you don't need to count it up using XPath. Either way, you can then build a second expression dynamically:
following-sibling::p[not(iframe)][count(preceding-sibling::p[iframe]) = N]
(where N is the result of the first expression/iteration counter) and evaluate that with the same context node, taking the final result as a node set.
I'm not sure I understood completely, but sometimes it helps to comment on an attempted solution rather than trying to explain.
Please try the following XPath expression:
//div[#class='entry']//iframe//p[not(descendant::iframe)]
And let me know if this yields the correct result.
If not,
explain how the result differs from what you need
please show a more complete HTML sample: a reasonable document with multiple div elements, and more than one where div[#class = 'entry'] - and otherwise covering all the complexity you describe.
explain why you added [1] and [2] to your expressions
give more details about the platform you're using XPath with, perhaps post code

Select either A or B with Or Operator in XPath

I'm trying to crawl some websites, and the data I want can be found either of these places depending on the site:
Page 1:
<div>
<ul>
<li class="asd"> SomeText1 </li>
</ul>
</div>
Page 2:
<div>
<ul>
<li class="dsa"> SomeText2 </li>
</ul>
</div>
I would like an XPath expression which tries to select SomeText1 first, and if it doesn't exist, tries to get SomeText2.
I've tried //li[#class="asd"]/text() or //li[#class="dsa"]/text(), but this doesn't seem to cut it.
Am I using the or operator wrong? If so, how is it supposed to be used?
EDIT
I'm trying to feed a crawler an XPath in order to find information to store in a DB. On a given webpage, can the information I'm trying to get be two different places?
Which means webpage 1 could be:
<AA>
<BB>
<CC> Test </CC>
</BB>
</AA>
and on another there could be
<DD>
<EE>
<FF> Test </FF>
</EE>
</DD>
How can I construct an XPath expression which can say either do
AA/BB/CC or (if it fails/doesn't exist) DD/EE/FF?
You can shorten it to:
//li[#class = 'asd' or #class = 'dsa']/text()
Having said that, "not working" is never an accurate description of what went wrong. A potential source of error is double quotes instead of single quotes. If there are double quotes arround the expression, any quotes inside must be single.
Am I using the or operator wrong ?
No, your usage of the or operator is fine. Something else went wrong. (To really diagnose your problem, we'd need more context).
Try...
//li[#class="asd" or #class="dsa"]/text()

How to match elements after an element with certain content or attribute?

Simplified example
<td>caption</a>
<a id="tt-1">text1</a>
<a id="tt-2">text2</a>
<td>topics</td>
<a id="tt-3">text3</a>
<a id="tt-4">text4</a>
<a id="tt-5">text5</a>
What I need is to match all a elements below <td>topics</td>.
Note that there are plenty of elements between those elements in example. Also <td> may be enclosed into other elements.
My current real-world XPath expression looks like this
//a[contains(#id,'tt-')]
Updated to be closer to real-world
Another update to clarify.
Based on your statement "What I need is to match all a elements below <td>topics</td>"
//td[.='topics']/a
I'm sure that's not the whole story, though.
Based on your updated example:
//a[starts-with(#id, 'tt-') and preceding-sibling::td[1] = 'topics']

Resources