How to get a list of concatenated text nodes - xpath

My goal is to query an XML structure with a single XPath evaluation and get a list of strings, each containing the concatenation of text3 and text5 for one "my_class" div.
The structure example is given below:
<div>
<div>
<div class="my_class">
<div class="my_class_1"></div>
<div class="my_class_2">text2</div>
<div class="my_class_3">
text3
<div class="my_class_4">text4</div>
<div class="my_class_5">text5</div>
</div>
</div>
<div class="my_class_6"></div>
</div>
<div>
<div class="my_class">
<div class="my_class_1"></div>
<div class="my_class_2">text12</div>
<div class="my_class_3">
text13
<div class="my_class_4">text14</div>
<div class="my_class_5">text15</div>
</div>
</div>
</div>
</div>
This means I want to get this list of results:
- in index 0 => text3 text5
- in index 1 => text13 text15
Currently I can only get either the my_class nodes, which still include text12 (which I want to exclude), or a list of the individual strings, not concatenated.
How can I proceed?
Thanks in advance for your help.
EDIT: I removed text4 and text14 from the expected results to make my example exact.

EDIT: Now the question has changed...
XPath 1.0: There is no "list of strings" data type. You can use this expression to select all the container elements of the text nodes you want:
/div/div/div[@class='my_class']/div[@class='my_class_3']
Then, for each selected element, get the descendant text nodes you want with the proper relative XPath or DOM method of your host language, and concatenate their string values:
text()[1]|div[@class='my_class_5']
XPath 2.0: There is a sequence data type.
/div/div/div[@class='my_class']
/div[@class='my_class_3']
/concat(text()[1],div[@class='my_class_5'])
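The XPath 1.0 route above (select the my_class_3 containers, then finish in the host language) can be sketched with Python's standard-library xml.etree.ElementTree. ElementTree supports only a limited XPath subset, so the text()[1] and my_class_5 parts use the element API instead: an element's .text holds the text before its first child. This is a sketch assuming the input is the question's sample parsed as well-formed XML; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# The question's sample document (indentation added for readability).
XML = """<div>
  <div>
    <div class="my_class">
      <div class="my_class_1"></div>
      <div class="my_class_2">text2</div>
      <div class="my_class_3">
        text3
        <div class="my_class_4">text4</div>
        <div class="my_class_5">text5</div>
      </div>
    </div>
    <div class="my_class_6"></div>
  </div>
  <div>
    <div class="my_class">
      <div class="my_class_1"></div>
      <div class="my_class_2">text12</div>
      <div class="my_class_3">
        text13
        <div class="my_class_4">text14</div>
        <div class="my_class_5">text15</div>
      </div>
    </div>
  </div>
</div>"""


def concatenated_texts(root):
    results = []
    # Same selection as /div/div/div[@class='my_class']/div[@class='my_class_3'];
    # the path is relative because `root` is already the outermost <div>.
    for container in root.findall("div/div[@class='my_class']/div[@class='my_class_3']"):
        first_text = container.text or ""  # text()[1]: the text before the first child
        fifth = container.find("div[@class='my_class_5']")
        fifth_text = "".join(fifth.itertext()) if fifth is not None else ""
        # Strip the layout whitespace and join the two parts with a single space.
        results.append(f"{first_text.strip()} {fifth_text.strip()}")
    return results


root = ET.fromstring(XML)
print(concatenated_texts(root))  # ['text3 text5', 'text13 text15']
```

text2 and text12 never appear because the path only descends into my_class_3, and text4/text14 are skipped because only my_class_5 is read.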

Could you not just use:
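//div[@class='my_class']/div[@class='my_class_3']
And then get the .innerText from that? There might be a bit of whitespace cleanup to do, but it should contain all the inner text (including that of my_class_4 and my_class_5) without the tags.
Edit: After clarification
concat(/div/div/div[@class='my_class']/div[@class='my_class_3']/text(), ' ', /div/div/div[@class='my_class']/div[@class='my_class_3']/div[@class='my_class_5'])
That might work, but note that in XPath 1.0 concat() converts each node-set argument to the string value of its first node only, so this returns a single string for the first my_class, not one string per my_class.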

Related

Getting a single element with similar XPaths but a different same-level ("neighboring") node

I'm trying to get the XPath of an element whose path is similar to other elements' paths but which has a different "neighbor" element. Please see the example below.
<div>
<div id='a'> </div>
<span> Text here </span> #this is what i'm trying to get
</div>
<div>
<div id='b'> </div>
<span> Text here </span>
</div>
I tried using //div//span, but this gives me both spans. So I tried using //div//child::div[@id='a']//ancestor::div//child::span, but it doesn't look pleasant and is repetitive. Is there a better way to write this?
Try
//div[div[@id='a']]/span
It selects the span children of every div that has a child div whose @id equals 'a'.
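As a rough check of what //div[div[@id='a']]/span selects, here is a sketch with Python's xml.etree.ElementTree, which does not support nested predicates like div[div[...]], so the predicate is written as a Python condition. The `<root>` wrapper is an assumption (the two sibling divs need a single parent to parse), and spans_beside is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical <root> wrapper around the question's two sibling <div>s.
XML = """<root>
<div>
  <div id='a'> </div>
  <span> Text here </span>
</div>
<div>
  <div id='b'> </div>
  <span> Text here </span>
</div>
</root>"""


def spans_beside(root, div_id):
    # Mirrors //div[div[@id=div_id]]/span: keep the divs that have a child div
    # with that id, then return their span children.
    return [span
            for div in root.iter("div")
            if div.find(f"div[@id='{div_id}']") is not None
            for span in div.findall("span")]


root = ET.fromstring(XML)
print([s.text for s in spans_beside(root, "a")])  # [' Text here ']

# ElementTree's own subset can also express it by stepping to the parent with "..":
print(len(root.findall(".//div[@id='a']/../span")))  # 1
```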

XPath: how to find a node that does not contain text?

I have HTML like this:
...
<div class="grid">
"abc"
<span class="searchMatch">def</span>
</div>
<div class="grid">
<span class="searchMatch">def</span>
</div>
...
I want to get the div that contains no text, but the XPath
//div[@class='grid' and text()='']
doesn't seem to work. And if I don't know the text that the other divs have, how can I find the node?
Let's suppose I have inferred the requirement correctly as:
Find all <div> elements with @class='grid' that have no directly contained non-whitespace text content, i.e. no non-whitespace text content unless it's within a child element like a <span>.
Then the answer is
//div[@class='grid' and not(text()[normalize-space(.)])]
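The predicate not(text()[normalize-space(.)]) asks whether any of the div's own text nodes is non-blank. In ElementTree terms an element's own text nodes are its .text plus the .tail of each child, so the same test can be sketched in Python as below. The `<root>` wrapper and helper names are assumptions; the whitespace set matches the four characters XPath 1.0 treats as whitespace.

```python
import xml.etree.ElementTree as ET

# The question's snippet, wrapped in a hypothetical <root> so it parses.
XML = """<root>
<div class="grid">
"abc"
<span class="searchMatch">def</span>
</div>
<div class="grid">
<span class="searchMatch">def</span>
</div>
</root>"""

XML_WS = " \t\r\n"  # the characters XPath 1.0 counts as whitespace


def own_text_nodes(elem):
    # An element's direct text() children: .text plus every child's .tail.
    if elem.text is not None:
        yield elem.text
    for child in elem:
        if child.tail is not None:
            yield child.tail


def grids_without_own_text(root):
    # Mirrors //div[@class='grid' and not(text()[normalize-space(.)])]
    return [div for div in root.iter("div")
            if div.get("class") == "grid"
            and not any(t.strip(XML_WS) for t in own_text_nodes(div))]


root = ET.fromstring(XML)
print(len(grids_without_own_text(root)))  # 1
```

This also shows why text()='' fails: the second div's own text nodes are line breaks, never the empty string.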
You need a not() statement plus normalize-space():
//div[@class='grid' and not(normalize-space(text()))]
or
//div[@class='grid' and normalize-space(text())='']

CSS / xpath selector to find h3 tag with text in a given class?

I need a selector to find an <h3> element with some text that is a descendant of an element with a given class.
I tried xpath="//*[@class='body']//descendant::h3[contains(text(), sampletext]
This doesn't work. Is there a way to find it?
<div class="body">
<h3> text1 </h3>
<p>....</p>
<h3> text2 </h3>
<p>... </p>
<h3> text3 </h3>
</div>
What selector finds the <h3> tag containing text3 inside the element with class="body"?
Try this simple XPath and let me know if you face any issues:
//div[@class='body']/h3[text()='text3']
Or, to trim the spaces before and after your text:
//div[@class='body']/h3[normalize-space()='text3']
To get the element based on a partial text match:
//div[@class='body']/h3[contains(.,'text3')]
You missed the single quotes in contains(text(), sampletext)]
It should be 'sampletext':
xpath="//div[@class='body']//descendant::h3[contains(text(), 'sampletext')]"
If you want to find the h3 tag containing text3:
xpath="//div[@class='body']/h3[contains(text(), 'text3')]"
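The difference between the exact, normalize-space() and contains() variants comes down to the spaces around " text3 ". Here is a sketch of the normalize-space() and contains(.) matches with Python's xml.etree.ElementTree, whose own [.='...'] predicate only does exact comparison. find_h3 is a hypothetical helper, and " ".join(s.split()) only approximates normalize-space(), since str.split() also treats Unicode spaces as whitespace.

```python
import xml.etree.ElementTree as ET

# The question's markup.
XML = """<div class="body">
<h3> text1 </h3>
<p>....</p>
<h3> text2 </h3>
<p>... </p>
<h3> text3 </h3>
</div>"""


def normalize_space(s):
    # Approximates XPath normalize-space(): trim, then collapse whitespace runs.
    return " ".join(s.split())


def find_h3(root, text, partial=False):
    # //div[@class='body']/h3[normalize-space()=text], or contains(., text) if partial.
    matches = []
    for body in root.iter("div"):  # iter() includes root itself
        if body.get("class") != "body":
            continue
        for h3 in body.findall("h3"):
            value = "".join(h3.itertext())  # string value of the <h3>
            if (text in value) if partial else (normalize_space(value) == text):
                matches.append(h3)
    return matches


root = ET.fromstring(XML)
print([h.text for h in find_h3(root, "text3")])  # [' text3 ']
print(root.findall("h3[.='text3']"))  # [] because the text is ' text3 '
```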

Select all nodes between two elements excluding unnecessary element from the intersection using XPath

There’s a document structured as follows:
<div class="document">
<div class="title">
<AAA/>
</div>
<div class="lead">
<BBB/>
</div>
<div class="photo">
<CCC/>
</div>
<div class="text">
<!-- Tags in text sections can vary. They can be `div` or `p` or anything. -->
<DDD>
<EEE/>
<DDD/>
<CCC/>
<FFF/>
<FFF>
<GGG/>
</FFF>
</DDD>
</div>
<div class="more_text">
<DDD>
<EEE/>
<DDD/>
<CCC/>
<FFF/>
<FFF>
<GGG/>
</FFF>
</DDD>
</div>
<div class="other_stuff">
<DDD/>
</div>
</div>
The task is to grab all the elements between <div class="lead"> and <div class="other_stuff"> except the <div class="photo"> element.
The Kayessian method for node-set intersection, $ns1[count(.|$ns2) = count($ns2)], works perfectly. After substituting $ns1 with //*[@class="lead"]/following::* and $ns2 with //*[@class="other_stuff"]/preceding::*,
the working code looks like this:
//*[@class="lead"]/following::*[count(. | //*[@class="other_stuff"]/preceding::*)
= count(//*[@class="other_stuff"]/preceding::*)]/text()
It selects everything between <div class="lead"> and <div class="other_stuff">, including the <div class="photo"> element. I tried several ways to insert a not() predicate into the expression itself:
//*[@class="lead" and not(@class="photo ")]/following::*
//*[@class="lead"]/following::*[not(@class="photo ")]
//*[@class="lead"]/following::*[not(self::class="photo ")]
(the same things with /preceding::* part) but they don't work. It looks like this not() method is ignored – the <div class="photo"> element remains in the selection.
Question 1: How to exclude the unnecessary element from this intersection?
It’s not an option to select from <div class="photo"> element excluding it automatically because in other documents it can appear in any position or doesn't appear at all.
Question 2 (additional): Is it OK to use * after following:: and preceding:: in this case?
It initially selects everything up to the end and back to the beginning of the whole document. Would it be better to specify the exact end point for the following:: and preceding:: axes? I tried //*[@class="lead"]/following::[@class="other_stuff"] but it doesn’t seem to work.
Question 1: How to exclude the unnecessary element from this intersection?
Adding another predicate, [not(self::div[@class='photo'])] in this case, to your working XPath should do it. For this particular case, the entire XPath would look like this (formatted for readability):
//*[@class="lead"]
/following::*[
count(. | //*[@class="other_stuff"]/preceding::*)
=
count(//*[@class="other_stuff"]/preceding::*)
][not(self::div[@class='photo'])]
/text()
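To see which elements the intersection and the extra predicate actually keep, here is a sketch that emulates following::*, preceding::* and the Kayessian intersection with Python's xml.etree.ElementTree, which has no such axes. It assumes the sample document with well-formed closing tags, returns elements rather than the trailing /text() nodes, and all helper names are hypothetical. The skip_descendants flag illustrates a variant not in the answer: [not(self::div[@class='photo'])] removes only the photo div itself, while its <CCC/> child is still a following:: node; [not(ancestor-or-self::div[@class='photo'])] would drop both.

```python
import xml.etree.ElementTree as ET

DOC = """<div class="document">
  <div class="title"><AAA/></div>
  <div class="lead"><BBB/></div>
  <div class="photo"><CCC/></div>
  <div class="text"><DDD><EEE/><DDD/><CCC/><FFF/><FFF><GGG/></FFF></DDD></div>
  <div class="more_text"><DDD><EEE/><DDD/><CCC/><FFF/><FFF><GGG/></FFF></DDD></div>
  <div class="other_stuff"><DDD/></div>
</div>"""


def by_class(root, cls):
    # First element in document order with @class == cls.
    return next(e for e in root.iter() if e.get("class") == cls)


def following(root, node):
    # following::* = elements after `node` in document order, minus its descendants.
    order = list(root.iter())
    descendants = set(node.iter())
    return [e for e in order[order.index(node) + 1:] if e not in descendants]


def preceding(root, node, parent):
    # preceding::* = elements before `node` in document order, minus its ancestors.
    ancestors, cur = set(), node
    while cur in parent:
        cur = parent[cur]
        ancestors.add(cur)
    order = list(root.iter())
    return [e for e in order[:order.index(node)] if e not in ancestors]


def between(root, start_cls, end_cls, skip_cls, skip_descendants=False):
    parent = {child: p for p in root.iter() for child in p}
    ns1 = following(root, by_class(root, start_cls))
    ns2 = set(preceding(root, by_class(root, end_cls), parent))
    # Kayessian intersection: keep the members of ns1 that are also in ns2.
    result = [e for e in ns1 if e in ns2]

    def skipped(e):
        # self::div[@class=skip_cls], or ancestor-or-self:: when skip_descendants is set
        while e is not None:
            if e.tag == "div" and e.get("class") == skip_cls:
                return True
            e = parent.get(e) if skip_descendants else None
        return False

    return [e for e in result if not skipped(e)]


root = ET.fromstring(DOC)
print(len(between(root, "lead", "other_stuff", "photo")))  # 17
print(len(between(root, "lead", "other_stuff", "photo", skip_descendants=True)))  # 16
```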
Question 2 (additional): Is it OK to use * after following:: and preceding:: in this case?
I'm not sure whether it would be 'better'; what I can tell you is that following::[@class="other_stuff"] is an invalid expression. You need to specify the node test the predicate applies to, for example 'any element', following::*[@class="other_stuff"], or just 'div', following::div[@class="other_stuff"].

Xpath: select div that contains class AND whose specific child element contains text

With the help of this SO question I have an almost working xpath:
//div[contains(@class, 'measure-tab') and contains(., 'someText')]
However, this matches two divs: in one it's the child td that contains someText, in the other it's the child span.
How do I narrow it down to the one with the span?
<div class="measure-tab">
<!-- table html omitted -->
<td> someText</td>
</div>
<div class="measure-tab"> <-- I want to select this div (and use contains @class)
<div>
<span> someText</span> <-- that contains a deeply nested span with this text
</div>
</div>
To find a div of a certain class that contains a span at any depth containing certain text, try:
//div[contains(@class, 'measure-tab') and contains(.//span, 'someText')]
That said, this solution looks extremely fragile. If the table happens to contain a span with the text you're looking for, the div containing the table will be matched, too. I'd suggest finding a more robust way of filtering the elements, for example by using IDs or the top-level document structure.
You can use ancestor. I find that this is easier to read because the element you are actually selecting is at the end of the path.
//span[contains(text(),'someText')]/ancestor::div[contains(@class, 'measure-tab')]
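The ancestor-based expression starts from the span and walks upward. ElementTree has no ancestor:: axis, so this Python sketch builds a child-to-parent map and walks it by hand. The markup is the question's sample with the comments removed and wrapped in a hypothetical `<root>`; tabs_via_ancestor is a hypothetical name.

```python
import xml.etree.ElementTree as ET

XML = """<root>
<div class="measure-tab">
<td> someText</td>
</div>
<div class="measure-tab">
<div>
<span> someText</span>
</div>
</div>
</root>"""


def tabs_via_ancestor(root, needle, cls="measure-tab"):
    # ElementTree has no ancestor:: axis, so build a child -> parent map first.
    parent = {child: p for p in root.iter() for child in p}
    hits = set()
    for span in root.iter("span"):
        # contains(text(), needle): XPath 1.0 uses only the first text node.
        if needle not in (span.text or ""):
            continue
        node = parent.get(span)
        while node is not None:  # walk the ancestor:: axis
            if node.tag == "div" and cls in (node.get("class") or ""):
                hits.add(node)
            node = parent.get(node)
    # Like an XPath node-set: no duplicates, document order.
    return [e for e in root.iter() if e in hits]


root = ET.fromstring(XML)
print(len(tabs_via_ancestor(root, "someText")))  # 1
```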
You could use the xpath :
//div[@class="measure-tab" and .//span[contains(., "someText")]]
Input :
<root>
<div class="measure-tab">
<td> someText</td>
</div>
<div class="measure-tab">
<div>
<div2>
<span>someText2</span>
</div2>
</div>
</div>
</root>
Output :
Element='<div class="measure-tab">
<div>
<div2>
<span>someText2</span>
</div2>
</div>
</div>'
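The predicate .//span[contains(., "someText")] tests every descendant span, whereas contains(.//span, 'someText') in XPath 1.0 converts the node-set to the string value of its first span only. Here is a sketch of the per-span test in Python with xml.etree.ElementTree, using the Input above; tabs_with_span_text is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# The Input document from the answer above.
XML = """<root>
<div class="measure-tab">
<td> someText</td>
</div>
<div class="measure-tab">
<div>
<div2>
<span>someText2</span>
</div2>
</div>
</div>
</root>"""


def tabs_with_span_text(root, needle, cls="measure-tab"):
    # Mirrors //div[@class=cls and .//span[contains(., needle)]]:
    # any descendant span whose string value contains `needle` qualifies.
    return [div for div in root.iter("div")
            if div.get("class") == cls
            and any(needle in "".join(span.itertext()) for span in div.iter("span"))]


root = ET.fromstring(XML)
print(len(tabs_with_span_text(root, "someText")))  # 1
```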
You can change your second condition to check only the span element:
...and contains(div/span, 'someText')]
If the span isn't always inside another div you can also use
...and contains(.//span, 'someText')]
This searches for the span anywhere inside the div.
