I'm trying to crawl some websites, and the data I want can be found either of these places depending on the site:
Page 1:
<div>
<ul>
<li class="asd"> SomeText1 </li>
</ul>
</div>
Page 2:
<div>
<ul>
<li class="dsa"> SomeText2 </li>
</ul>
</div>
I would like an XPath expression which tries to select SomeText1 first, and if it doesn't exist, tries to get SomeText2.
I've tried //li[@class="asd"]/text() or //li[@class="dsa"]/text(), but this doesn't seem to cut it.
Am I using the or operator wrong? If so, how is it supposed to be used?
EDIT
I'm trying to feed a crawler an XPath in order to find information to store in a DB. On a given webpage, the information I'm trying to get can be in one of two different places.
Which means webpage 1 could be:
<AA>
<BB>
<CC> Test </CC>
</BB>
</AA>
and on another there could be
<DD>
<EE>
<FF> Test </FF>
</EE>
</DD>
How can I construct an XPath expression which can say either do
AA/BB/CC or (if it fails/doesn't exist) DD/EE/FF?
You can shorten it to:
//li[@class = 'asd' or @class = 'dsa']/text()
Having said that, "not working" is never an accurate description of what went wrong. A potential source of error is double quotes instead of single quotes. If there are double quotes around the expression, any quotes inside must be single.
Am I using the or operator wrong ?
No, your usage of the or operator is fine. Something else went wrong. (To really diagnose your problem, we'd need more context).
Try...
//li[@class="asd" or @class="dsa"]/text()
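To make the "first, else second" behaviour explicit: `or` inside a predicate matches either class without preferring one, and `or` between two complete paths yields a boolean rather than nodes. Below is a minimal sketch in Python, assuming the pages are well-formed XML. It uses only the standard library's `xml.etree.ElementTree`, which supports just a subset of XPath, so the fallback is done in Python. The helper name `first_text` is made up for illustration.

```python
import xml.etree.ElementTree as ET

PAGE_1 = '<div><ul><li class="asd"> SomeText1 </li></ul></div>'
PAGE_2 = '<div><ul><li class="dsa"> SomeText2 </li></ul></div>'


def first_text(xml_source, paths):
    """Return the stripped text of the first element matched by the
    first path that matches anything, or None if no path matches."""
    root = ET.fromstring(xml_source)
    for path in paths:
        node = root.find(path)
        if node is not None:
            return (node.text or "").strip()
    return None


# Paths are tried in order of preference.
LI_PATHS = [".//li[@class='asd']", ".//li[@class='dsa']"]

print(first_text(PAGE_1, LI_PATHS))
print(first_text(PAGE_2, LI_PATHS))
```

If the crawler only accepts a single XPath 1.0 string, the same preference can probably be expressed as a union: //li[@class='asd']/text() | //li[@class='dsa'][not(//li[@class='asd'])]/text()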
I have 100+ files in my eXist-db instance.
I transform them using a function. (https://pastebin.com/7Q2g4TPM)
I need several things:
I need them to be transformed in order from the first to the last number (which will be 162). They are named 00001.xml, the tens begin with 00010.xml, the hundreds with 00100.xml (do you see what I mean?)
I tried adding one file a time (up to 15 files) and I tried adding batches of files. All files are in the directory edition, with the first file at the moment being 00029.xml, which you find hardcoded as starting point for my Carousel (Bootstrap). (https://pastebin.com/WNKAgihw this pastebin is where I want them to be displayed for now. The structure etc. will probably change a little, but the general idea is this.)
Most of the time it seems to work fine, HOWEVER, with file 36 I get the case that this is displayed not at the needed position but two elements later. Later on, following 38, there is 142 inserted, then several mid-hundreds and then it goes back to "intended" order. I did not check for all files, but I saw this quite some times ...
Another question I have is this one:
Can I somehow get a
<ol class="carousel-indicators">
<li data-target="#carouselIndicators" data-slide-to="0" class="active">File 1</li>
<li data-target="#carouselIndicators" data-slide-to="1">File 2</li>
<li data-target="#carouselIndicators" data-slide-to="2"> File 3</li>
</ol>
where the data-slide-to="" is 1,2,3, etc. without hardcoding it for every file?
I guess the function (first pastebin) can serve as a starting point, but how to make the numbers go up ?
I hope I am clear with these questions and that someone knows how to help :-)
best wishes and many thanks in advance,
K
You likely need to use an order by clause in the FLWOR expression of your XQuery.
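The ordering and numbering logic, sketched in Python rather than XQuery just to show the idea: sort by the numeric value of the file name, then let the loop counter supply data-slide-to. The file names and the `indicators` helper are assumptions for illustration; in XQuery the sort itself would be the order by clause.

```python
from html import escape


def indicators(file_names):
    """Build the carousel indicator list for the given file names,
    ordered by the number in the name (00029.xml -> 29)."""
    ordered = sorted(file_names, key=lambda name: int(name.split(".")[0]))
    items = []
    for index, name in enumerate(ordered):
        # Only the first slide is marked active.
        active = ' class="active"' if index == 0 else ""
        items.append(
            f'<li data-target="#carouselIndicators" '
            f'data-slide-to="{index}"{active}>{escape(name)}</li>'
        )
    return '<ol class="carousel-indicators">\n' + "\n".join(items) + "\n</ol>"


print(indicators(["00142.xml", "00036.xml", "00029.xml", "00038.xml"]))
```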
The following XML structure represents a website with many articles. Every article contains, among many other things, date of its creation and possibly arbitrarily many dates of its modification. I want to get the date of the last access (either creation or last modification) to every article using XPath 1.0.
<website>
<article>
<date><strong>22.11.2017</strong></date>
<edits>
<edit><strong>17.12.2017</strong></edit>
</edits>
</article>
<article>
<date><strong>17.4.2016</strong></date>
<edits></edits>
</article>
<article>
<date><strong>3.5.2011</strong></date>
<edits>
<edit><strong>4.5.2011</strong></edit>
<edit><strong>12.8.2012</strong></edit>
</edits>
</article>
<article>
<date><strong>12.2.2009</strong></date>
<edits></edits>
</article>
<article>
<date><strong>23.11.1987</strong></date>
<edits>
<edit><strong>3.4.2001</strong></edit>
<edit><strong>11.5.2006</strong></edit>
<edit><strong>13.9.2012</strong></edit>
</edits>
</article>
</website>
In other words, the expected output is:
<strong>17.12.2017</strong>
<strong>17.4.2016</strong>
<strong>12.8.2012</strong>
<strong>12.2.2009</strong>
<strong>13.9.2012</strong>
So far I've only created this path:
//article/*[self::date or self::edits/edit][last()]
that looks for date and nonempty edits nodes in every article and selects the latter one. But I don't know how to access the latest strong of every such selection and the naive //strong[last()] appended to the end of the path doesn't work.
I found a solution in XPath 2.0. Either of these paths should work, if I'm not mistaken:
//article/(*[self::date or self::edits/edit][last()]//strong)[last()]
//article/(*//strong)[last()]
Such use of parentheses within path is invalid in XPath 1.0 though.
This XPath 1.0 expression
/website/article/descendant::strong[parent::date|parent::edit][last()]
Selects the nodes:
<strong>17.12.2017</strong>
<strong>17.4.2016</strong>
<strong>12.8.2012</strong>
<strong>12.2.2009</strong>
<strong>13.9.2012</strong>
Tested in http://www.xpathtester.com/xpath/56d8f7bc4b9c8c064fdad16f22469026
Do note: positional predicates act on the context node list, so [last()] here is evaluated separately for each article.
Here is the simple xpath to get your output.
//article/descendant-or-self::strong[last()]
So I'm trying to generate (dynamic) links through Thymeleaf.
Now the first url I want to link to is /resource/id so I have the following:
<a th:href="|/resource/${resource.id}|" href="#" class="btn btn-default">Edit...</a>
Which works great, and does exactly what you would expect.
So the next url I want to link to is /resource/id/child/id so I try the following:
<a th:href="|/resource/${resource.id}/child/${child.id}|" href="#" class="btn btn-default">Edit...</a>
Which I thought would give me what I wanted. Instead I get:
/resource/3/child%7D?id=4
Which is weird and confusing. I'm not really sure why this happens or how to get what I want which is:
/resource/3/child/4
Any help greatly appreciated. I'm using spring-boot/spring-mvc if it makes any difference.
Update:
Okay, so after re-reading (and comprehending) the documentation, it turns out the the more correct syntax is:
<a th:href="#{/resource/{rid}/child/{cid}(rid=${resource.id} cid=${child.id})}" href="#" class="btn btn-default">Edit...</a>
Now I get the url:
/resource/3/child?id=4
Which while very close is still slightly wrong and confusing.
Does Thymeleaf not support more than one path variable?
I think a comma is missing. Try like this :
th:href="#{/resource/{rid}/child/{cid}(rid=${resource.id},cid=${child.id})}"
Each key/value should be separate with a comma in order to correctly build an URL with multiple path variables.
When working with urls you should better use the url syntax #{...} instead of literal substitution.
For example in your case:
<a th:href="#{/resource/{resource.id}/child/{child.id}}"
href="#" class="btn btn-default">Edit...</a>
(note the absence of the dollar sign in rest resource placeholders)
Have you check the value of child.id? I used it very often like #{|/resource/${resource.id}/child/${child.id}|}.
You can evaluate the syntax wit
<a th:with="child_id='4'" th:href="#{|/resource/${resource.id}/child/${child_id}|}"
and put a <span th:text="${child.id}"></span> to your html to see what value child.id has.
My xpath is :
//*[#id='form_MenuBar:j_id24']/span
and the value # 24 changes.
//*[#id='form_MenuBar:j_id48']/span
I tried but doesn't works.
driver.findElement(By.xpath("//a[contains(#id,'form_MenuBar:j_id$')]/span"));
Source XML:
<li class="ui-menuitem ui-widget ui-corner-all ui-menuitem-active" role="menuitem">
<a id="form_MenuBar:j_id24" class="ui-menuitem-link ui-corner-all ui-state-hover" href="/Demand/j_spring_security_logout">
<span class="ui-menuitem-text">Log off</span>
</a>
</li>
Just try
driver.findElement(By.xpath("//a[contains(#id,'form_MenuBar:j_id')]/span"));
If you are using contains in xpath no need to use '$'.
It appears that you are using java so I'll try to answer it based on that. I'm not a java developer so I apologize if it's not syntactically correct.
If all that is changing is the number within the ID, and you know the ID, you could do:
driver.findElement(By.id(String.format("form_MenuBar:j_id%d", the_id));
Also, I'm not sure about your application that you are testing, but if there are multiple elements that have an id beginning with "form_MenuBar:j_id", then findElement will only find the first one, which might not be the link you are attempting to find.
you could use findElements which will return a list of all elements that match that and then iterate through those until you find the one you really want.
Simplified example
<td>caption</a>
<a id="tt-1">text1</a>
<a id="tt-2">text2</a>
<td>topics</td>
<a id="tt-3">text3</a>
<a id="tt-4">text4</a>
<a id="tt-5">text5</a>
What I need is to match all a elements below <td>topics</td>.
Note that there are plenty of elements between those elements in example. Also <td> may be enclosed into other elements.
My current real-world XPath expression looks like this
//a[contains(#id,'tt-')]
Updated to be closer to real-world
Another update to clarify.
Based on your statement "What I need is to match all a elements below <td>topics</td>"
//td[.='topics']/a
I'm sure that's not the whole story, though.
Based on your updated example:
//a[starts-with(#id, 'tt-') and preceding-sibling::td[1] = 'topics']