xPath help > Find img with alt='My Keyword' - xpath

How can I alter this xpath to find if the content contains an img tag with an alt containing my keyword phrase?
$xPath->evaluate('count(/html/body//'.$tag.'[contains(.,"'.$keyword.'")])');

Use:
boolean(//img[contains(@alt, 'yourKeywordHere')])
This returns true() if the XML document contains an img element whose alt attribute contains 'yourKeywordHere', and false() otherwise.
Use:
boolean(//yourTag//img[contains(@alt, 'yourKeywordHere')])
to test whether the document contains an element named yourTag that has a descendant img whose alt attribute contains 'yourKeywordHere'.

I don't understand exactly which elements you are looking for, but here is an example that returns every h1 element containing at least one image with your_keyword in its alt attribute:
//h1[.//img[contains(@alt, 'your_keyword')]]
You should also decide whether the match is case sensitive. The following XPath ignores case, but be careful: lower-case() is an XPath 2.0 function, so XPath 1.0 evaluators do not support it (there, translate() with the upper- and lower-case alphabets is the usual substitute).
//h1[.//img[contains(lower-case(@alt), lower-case('your_keyword'))]]
Here is an example:
//h1[.//img[contains(@alt, 'key')]]
<html>
<h1> <!-- found -->
<img alt='here is my key' />
</h1>
<h1><!-- not found -->
<img alt='here is not' />
</h1>
<h1> <!-- found -->
<h2>
<img alt='the key is also here' />
</h2>
</h1>
<h1></h1> <!-- not found -->
</html>
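To make the matching rule above concrete, here is a minimal Python sketch of the same check. It is an emulation, not an XPath evaluation: the standard-library ElementTree module does not implement contains() or lower-case(), so the predicate is written as ordinary Python. The helper name headings_with_keyword and its ignore_case flag are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Sample document from the answer above (comments left out).
DOC = """
<html>
  <h1><img alt='here is my key' /></h1>
  <h1><img alt='here is not' /></h1>
  <h1><h2><img alt='the key is also here' /></h2></h1>
  <h1></h1>
</html>
"""

def headings_with_keyword(root, keyword, tag="h1", ignore_case=False):
    """Emulate //tag[.//img[contains(@alt, keyword)]] in plain Python."""
    def norm(s):
        # Lower-casing both sides mirrors the lower-case() variant.
        return s.lower() if ignore_case else s

    needle = norm(keyword)
    matches = []
    for heading in root.iter(tag):
        # heading.iter("img") visits every img descendant, like .//img
        if any(needle in norm(img.get("alt", "")) for img in heading.iter("img")):
            matches.append(heading)
    return matches

root = ET.fromstring(DOC)
found = headings_with_keyword(root, "key")
found_upper = headings_with_keyword(root, "KEY")
found_ci = headings_with_keyword(root, "KEY", ignore_case=True)
print(len(found), len(found_upper), len(found_ci))
```

The case-sensitive search for "KEY" finds nothing, while the case-insensitive one finds the same two h1 elements as the search for "key".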

Related

XPath selector for tag with specific descendant tags selects other tags

Given a document:
<html>
<body>
<div>
<div>No span</div>
<span>Target</span>
</div>
</body>
</html>
I would like to select the <div> containing the <span>. However, when I use this selector:
//div[//span]
It matches both <div>s:
<div><div>No span</div><span>Target</span></div> <-- what I wanted
<div>No span</div> <-- this is also matched
I tested this on Google Chrome's Devtools, as well as several online XPath evaluators, so I assume this is the correct behavior.
Why is this happening, and how can I fix my selector?
To "select the <div> containing the <span>", use relative paths.
//div[.//span]
// starts from the document root. .// starts from the context element.
Predicates evaluate to true when the contained expression selects nodes. This means that //div[//span] is always true when there is a <span> anywhere in the document, in which case all <div>s in the document will be selected. //div[.//span] is only true when there is a <span> anywhere in the respective <div>.
If you mean "has a <span> child" (as opposed to "has a <span> descendant") this will work:
//div[span]
which is a shorthand for this (to underline the difference between / and //):
//div[./span]
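The difference between the absolute and the relative test can be sketched in Python. This assumes the standard-library ElementTree module, which supports only a subset of XPath: it accepts the [span] child predicate directly, but not nested path predicates such as [//span] or [.//span], so those two are emulated with a per-element check.

```python
import xml.etree.ElementTree as ET

DOC = """
<html>
  <body>
    <div>
      <div>No span</div>
      <span>Target</span>
    </div>
  </body>
</html>
"""

root = ET.fromstring(DOC)
divs = list(root.iter("div"))  # every <div>, in document order

# Like //div[//span]: the test ignores the current div and asks the whole document.
absolute = [d for d in divs if root.find(".//span") is not None]

# Like //div[.//span]: the test looks only below the current div.
relative = [d for d in divs if d.find(".//span") is not None]

# //div[span] maps directly onto ElementTree's [tag] predicate (direct children only).
child_only = root.findall(".//div[span]")

print(len(absolute), len(relative), len(child_only))
```

The absolute test gives the same answer for every div, which is why both were selected; the relative and child-only tests keep only the outer div.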

Xpath: select div that contains class AND whose specific child element contains text

With the help of this SO question I have an almost working xpath:
//div[contains(@class, 'measure-tab') and contains(., 'someText')]
However, this matches two divs: in one, someText is in a child td; in the other, it is in a child span.
How do I narrow it down to the one with the span?
<div class="measure-tab">
<!-- table html omitted -->
<td> someText</td>
</div>
<div class="measure-tab"> <-- I want to select this div (and use contains @class)
<div>
<span> someText</span> <-- that contains a deeply nested span with this text
</div>
</div>
To find a div of a certain class that contains a span at any depth containing certain text, try:
//div[contains(@class, 'measure-tab') and contains(.//span, 'someText')]
That said, this solution looks extremely fragile. If the table happens to contain a span with the text you're looking for, the div containing the table will be matched, too. I'd suggest finding a more robust way of filtering the elements, for example by using IDs or top-level document structure.
You can use the ancestor axis. I find that this is easier to read because the element you are actually selecting is at the end of the path.
//span[contains(text(),'someText')]/ancestor::div[contains(@class, 'measure-tab')]
You could use this XPath:
//div[@class="measure-tab" and .//span[contains(., "someText")]]
Input :
<root>
<div class="measure-tab">
<td> someText</td>
</div>
<div class="measure-tab">
<div>
<div2>
<span>someText2</span>
</div2>
</div>
</div>
</root>
Output :
Element='<div class="measure-tab">
<div>
<div2>
<span>someText2</span>
</div2>
</div>
</div>'
You can change your second condition to check only the span element:
...and contains(div/span, 'someText')]
If the span isn't always inside another div you can also use
...and contains(.//span, 'someText')]
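This searches for the span anywhere inside the div. Be aware that in XPath 1.0, contains(.//span, 'someText') converts the node-set to a string using only its first node, so only the first span inside the div is tested; .//span[contains(., 'someText')] tests every span.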
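The "class contains X and some descendant span contains Y" rule from the answers above can be sketched in Python as follows. This is an emulation under the assumption that only the standard-library ElementTree module is available; it has no contains() function, so the test is written with Python's in operator. The function name tabs_with_span_text is hypothetical.

```python
import xml.etree.ElementTree as ET

# Input document from the answer above.
DOC = """
<root>
  <div class="measure-tab">
    <td> someText</td>
  </div>
  <div class="measure-tab">
    <div>
      <div2>
        <span>someText2</span>
      </div2>
    </div>
  </div>
</root>
"""

def tabs_with_span_text(root, css_class, text):
    """Emulate //div[contains(@class, css_class) and .//span[contains(., text)]]."""
    matches = []
    for div in root.iter("div"):
        if css_class not in div.get("class", ""):
            continue
        # "".join(span.itertext()) is the span's full text, i.e. what "." means in contains(., text)
        if any(text in "".join(span.itertext()) for span in div.iter("span")):
            matches.append(div)
    return matches

root = ET.fromstring(DOC)
matches = tabs_with_span_text(root, "measure-tab", "someText")
print(len(matches))
```

Because every span descendant is checked, this corresponds to the .//span[contains(., ...)] form rather than to contains(.//span, ...).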

Xpath first occurrence of a tree

I want to find the first occurrence of a tree. Example:
<div id='post>
<p>text1</p>
<p>text2</p>
<img src="a.jpg">
<img src="b.jpg">
<p>text3</p>
<p>text4</p>
<img src="c.jpg">
<p>text5</p>
</div>
I want to find the first occurrence of "p/img/@src".
When I run the XPath search .//div/p/img[1]/@src
it gives 2 hits, a.jpg and c.jpg.
What is the XPath for only the first occurrence (a.jpg)?
I would have guessed .//div/(p/img)[1]/@src, but of course that does not work.
The best option would be:
(//img[@src])[1]/@src
or
(//p//img[@src])[1]/@src
which also ensures that the img is inside a p element.
As Martin says, img is not a child of p. In addition, your example is missing the closing single quote of the div's id attribute and the closing of the img tags.
Here is your XML, corrected:
<div id='post'>
<p>text1</p>
<p>text2</p>
<img src="a.jpg"/>
<img src="b.jpg"/>
<p>text3</p>
<p>text4</p>
<img src="c.jpg"/>
<p>text5</p>
</div>
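Now, to select the first image you can simply use //img[1]/@src or //img[@src="a.jpg"]. Note that //img[1] selects every img that is the first img child of its parent; it returns a single node here only because all the img elements share one parent.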
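The difference between p/img[1] and (p/img)[1] is easier to see on a document where the images really are children of p elements. The markup below is a hypothetical variant of the question's example, not the original, and the sketch assumes Python's standard-library ElementTree module, which supports the [1] position predicate but not parenthesized expressions; find() is used as the stand-in for "first node of the whole result".

```python
import xml.etree.ElementTree as ET

# Hypothetical variant of the question's markup: each <img> is a child of a <p>.
DOC = """
<div id="post">
  <p>text1 <img src="a.jpg"/> <img src="b.jpg"/></p>
  <p>text3</p>
  <p>text4 <img src="c.jpg"/></p>
</div>
"""

root = ET.fromstring(DOC)  # root is the <div id="post"> element

# p/img[1]: the position test is applied per parent,
# so every <p> contributes its own first <img>.
per_parent = [img.get("src") for img in root.findall("./p/img[1]")]

# (p/img)[1]: first node of the whole result set;
# find() returns the first match in document order.
first_src = root.find("./p/img").get("src")

print(per_parent, first_src)
```

The per-parent form reproduces the two hits reported in the question, while taking the first node of the combined result yields a.jpg only.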

xpath: count preceding elements

I have an xml structure that looks like this:
<document>
<body>
<section>
<title>something</title>
<subtitle>Something again</subtitle>
<section>
<p xml:id="1234">Some text</p>
<figure id="2121"></figure>
<section>
<p xml:id="somethingagain">Some text</p>
<figure id="939393"></figure>
<p xml:id="countelement"></p>
</section>
</section>
</section>
<section>
<title>something2</title>
<subtitle>Something again2</subtitle>
<section>
<p xml:id="12345678">Some text2</p>
<figure id="939394"></figure>
<p xml:id="countelement2"></p>
</section>
</section>
</body>
</document>
How can I count the figure elements that come before the <p xml:id="countelement"></p> element using XPath?
Edit:
I only want to count figure elements within the parent section; in the next section the count should start from 0 again.
Assuming you're using an XPath 2.0 compatible engine, find the count elements and call fn:count() for each of them, passing the preceding figure elements as input.
This will return the number of figures preceding each "countelement" on the same level (I guess this is what you actually want):
//p[@xml:id="countelement"]/count(preceding-sibling::figure)
This will return the number of figures preceding each "countelement" on the same level and the level above:
//p[@xml:id="countelement"]/count(preceding-sibling::figure | parent::*/preceding-sibling::figure)
This will return the number of all figures preceding each "countelement", at any level:
//p[@xml:id="countelement"]/count(preceding::figure)
If you're bound to XPath 1.0, you won't be able to get multiple results. If @id really is an ID (and thus unique), you will be able to use this query:
count(//p[@xml:id="countelement"]/preceding::figure)
If there are "countelements" which are not <p/> elements, replace p with *.
count(id("countelement")/preceding-sibling::figure)
Please note that the xml:id attributes of two different elements cannot have the same value, such as "countelement". If you want two different elements to have a same-named attribute with the same value "countelement", it must be some other attribute, perhaps "kind", that is not of DTD attribute type ID. In that case, in place of id("countelement") you would use *[@kind="countelement"].
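To show what the preceding-sibling and preceding axes count on the question's data, here is a small Python sketch. It does not run XPath: the standard-library ElementTree module has neither axis, so both are emulated by walking the tree. The helper names are invented for this illustration, and the document is a shortened copy of the first top-level section from the question.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

DOC = """
<document>
  <body>
    <section>
      <title>something</title>
      <section>
        <p xml:id="1234">Some text</p>
        <figure id="2121"></figure>
        <section>
          <p xml:id="somethingagain">Some text</p>
          <figure id="939393"></figure>
          <p xml:id="countelement"></p>
        </section>
      </section>
    </section>
  </body>
</document>
"""

def count_preceding_sibling(root, target, tag):
    """Emulate count(preceding-sibling::tag) for the target element."""
    parent = next(p for p in root.iter() if any(c is target for c in p))
    siblings = list(parent)
    position = next(i for i, c in enumerate(siblings) if c is target)
    return sum(1 for el in siblings[:position] if el.tag == tag)

def count_preceding(root, target, tag):
    """Emulate count(preceding::tag): earlier elements that are not ancestors."""
    count = 0
    for el in root.iter():  # document order
        if el is target:
            break
        is_ancestor = any(d is target for d in el.iter())
        if el.tag == tag and not is_ancestor:
            count += 1
    return count

root = ET.fromstring(DOC)
target = next(p for p in root.iter("p") if p.get(XML_ID) == "countelement")
same_level = count_preceding_sibling(root, target, "figure")
any_level = count_preceding(root, target, "figure")
print(same_level, any_level)
```

Only the sibling count restarts in each section, which matches the requirement stated in the question's edit.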

xquery/xpath- how to get number of descendant nodes of a particular type

Take a look at the sample XML below--
<div id="main">
<div id="1">
Some random text
</div>
<div id="2">
Some random text
</div>
<div id="3">
Some random text
</div>
<p> Some more random text</p>
<div id="4">
Some random text
</div>
</div>
Now, how do I find out the number of divs within the main div using Xquery? And how to do this in XPath?
You can use the following XPath:
count(div[@id="main"]/div)
The function count does the counting, the main div is selected by its id.
The XPath expressions below can be used both in XPath and XQuery. This is so, because XPath (2.0) is a proper subset of XQuery.
Use:
count(/*//div)
If "the main div" isn't the top element of the XML document, and this is the only div whose id attribute has the string value "main", use:
count((//div[@id='main'])[1]//div)
If it is guaranteed that the div children of the "main div" don't have div descendants, use:
count((//div[@id='main'])[1]/div)
Do note: the XPath pseudo-operator // can be very inefficient, so try to avoid it whenever the structure of the XML document is statically known and specific paths can be used.
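For comparison, the child and descendant counts can be checked with a short Python sketch. It assumes the standard-library ElementTree module, which has no count() function, so len() over findall() stands in for it, and it assumes the "main" div is the root element, as in the sample.

```python
import xml.etree.ElementTree as ET

DOC = """
<div id="main">
  <div id="1">Some random text</div>
  <div id="2">Some random text</div>
  <div id="3">Some random text</div>
  <p> Some more random text</p>
  <div id="4">Some random text</div>
</div>
"""

main = ET.fromstring(DOC)  # the root element is the "main" div here

# Direct div children only, like .../div
child_divs = len(main.findall("div"))

# div descendants at any depth, like count(/*//div)
descendant_divs = len(main.findall(".//div"))

print(child_divs, descendant_divs)
```

The two counts agree for this sample because none of the inner divs contains another div; they would differ as soon as one did.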

Resources