Use Xpath To Retrieve Elements - xpath

HTML Portion:
<div class="abc">
<div style="text-align:left;" itemscope itemtype="xyz">
<h1 itemtype="mno"> I want this text </h1>
</div>
</div>
I am using
$text = $xpath->query('//div[class="abc"]/div/h1');
but I am getting no value. Please help me as I am new to it.

You should try
//div[@class="abc"]/div/h1
The difference is the @ sign before class, which is how the attribute axis is accessed. When you omit the @ sign, the name is matched against child elements (tag names) instead.
This returns the whole h1 node (or, rather, a node-set containing all the matching h1 nodes).
If you only want the text from the element, try the evaluate function instead:
$text = $xpath->evaluate("//div[@class='abc']/div/h1/text()");
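The effect of the @ sign can be reproduced outside PHP as well. The following is a minimal sketch in Python using the standard library's xml.etree.ElementTree, which supports only a subset of XPath. The wrapping body element and the cleaned-up attributes are assumptions made so that the snippet parses as XML.

```python
import xml.etree.ElementTree as ET

# Well-formed version of the snippet from the question, wrapped in <body>
# (an assumption) so that ".//div" can reach the outer div.
doc = ET.fromstring(
    '<body><div class="abc">'
    '<div style="text-align:left;" itemtype="xyz">'
    '<h1 itemtype="mno"> I want this text </h1>'
    '</div></div></body>'
)

# With "@": class is read as an attribute of the div.
with_at = doc.findall(".//div[@class='abc']/div/h1")

# Without "@": class is read as a child element named "class".
without_at = doc.findall(".//div[class='abc']/div/h1")

# The text of the matched h1, with surrounding whitespace removed.
text = with_at[0].text.strip()
print(len(with_at), len(without_at), text)
```

Only the expression with @class finds the h1; the other one looks for a child element called class and matches nothing.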

Related

xPath - Why is this exact text selector not working with the data test id?

I have a block of code like so:
<ul class="open-menu">
<span>
<li data-testid="menu-item" class="menu-item option">
<svg>...</svg>
<div>
<strong>Text Here</strong>
<small>...</small>
</div>
</li>
<li data-testid="menu-item" class="menu-item option">
<svg>...</svg>
<div>
<strong>Text</strong>
<small>...</small>
</div>
</li>
</span>
</ul>
I'm trying to select a menu item based on exact text like so in the dev tools:
$x('.//*[contains(@data-testid, "menu-item") and normalize-space() = "Text"]');
But this doesn't seem to be selecting the element. However, when I do:
$x('.//*[contains(@data-testid, "menu-item")]');
I can see both of the menu items.
UPDATE:
It seems that this works:
$x('.//*[contains(@class, "menu-item") and normalize-space() = "Text"]');
Not sure why using a class in this context works and not a data-testid. How can I get my xpath selector to work with my data-testid?
Why is this exact text selector not working
The fact that both li elements are matched by the XPath expression
if omitting the condition normalize-space() = "Text" is a clue.
normalize-space() returns ... Text Here ... for the first li
in the posted XML and ... Text ... for the second (or some other
content in place of ... from div/svg or div/small) causing
normalize-space() = "Text" to fail.
In an update you say the same condition succeeds. This has nothing to
do with using @class instead of @data-testid; it must be triggered
by some content change.
How can I get my xpath selector to work with my data-testid?
By testing for an exact text match in the li's descendant strong
element,
.//*[@data-testid = "menu-item" and div/strong = "Text"]
which matches the second li. Making the test more robust is usually
in order, e.g.
.//*[contains(@data-testid,"menu-item") and normalize-space(div/strong) = "Text"]
Append /div/small or /descendant::small, for example, to the XPath
expression to extract just the small text.
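To see why comparing the whole li against "Text" fails while comparing div/strong succeeds, here is a rough sketch in Python with the standard library's xml.etree.ElementTree. ElementTree has no normalize-space(), so the helper normalize_space below is a hypothetical approximation written for this example, and the "..." placeholders from the question are kept as literal text.

```python
import xml.etree.ElementTree as ET

menu = ET.fromstring("""
<ul class="open-menu">
  <span>
    <li data-testid="menu-item" class="menu-item option">
      <svg>...</svg>
      <div>
        <strong>Text Here</strong>
        <small>...</small>
      </div>
    </li>
    <li data-testid="menu-item" class="menu-item option">
      <svg>...</svg>
      <div>
        <strong>Text</strong>
        <small>...</small>
      </div>
    </li>
  </span>
</ul>
""")


def normalize_space(element):
    """Approximate XPath normalize-space() applied to an element's string value."""
    return " ".join("".join(element.itertext()).split())


# Both li elements carry the data-testid attribute.
items = menu.findall(".//*[@data-testid='menu-item']")

# String value of each whole li: it includes the svg and small content too.
whole_item_text = [normalize_space(li) for li in items]

# Comparing only the strong descendant gives the intended exact match.
matches = [li for li in items if normalize_space(li.find("div/strong")) == "Text"]

print(whole_item_text)
print(len(matches))
```

Neither whole-item string equals "Text", which is why the original predicate selects nothing, whereas the test on div/strong keeps only the second li.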
data-testid="menu-item" matches both of the outer li elements, while the text you are looking for is inside the inner strong element.
So, to locate the outer li element based on its data-testid attribute value and the text of its inner strong element, you can use an XPath expression like this:
//*[contains(@data-testid, "menu-item") and .//normalize-space() = "Text"]
Or
.//*[contains(@data-testid, "menu-item") and .//*[normalize-space() = "Text"]]
Both expressions worked in my tests. Note that the first one uses a function call as a path step, which is XPath 2.0 syntax, so an XPath 1.0 engine (such as the one behind $x in browser dev tools) will reject it; the second one works in both.

xpath:how to find a node that not contains text?

I have a html like:
...
<div class="grid">
"abc"
<span class="searchMatch">def</span>
</div>
<div class="grid">
<span class="searchMatch">def</span>
</div>
...
I want to get the div that does not contain text, but the XPath
//div[@class='grid' and text()='']
does not seem to work. If I don't know the text that the other divs have, how can I find the node?
Let's suppose I have inferred the requirement correctly as:
Find all <div> elements with @class='grid' that have no directly-contained non-whitespace text content, i.e. no non-whitespace text content unless it's within a child element like a <span>.
Then the answer to this is
//div[@class='grid' and not(text()[normalize-space(.)])]
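You need a not() statement plus normalize-space():
//div[@class='grid' and not(normalize-space(text()))]
or
//div[@class='grid' and normalize-space(text())='']
Note that in XPath 1.0, normalize-space(text()) only looks at the first text node child of the div.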
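The idea of "no directly contained non-whitespace text" can be sketched in Python with the standard library's xml.etree.ElementTree. ElementTree cannot select text nodes with XPath, so this example walks them by hand; the body wrapper and the helper name direct_text_nodes are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

page = ET.fromstring("""
<body>
  <div class="grid">
    "abc"
    <span class="searchMatch">def</span>
  </div>
  <div class="grid">
    <span class="searchMatch">def</span>
  </div>
</body>
""")


def direct_text_nodes(element):
    """Yield the text that sits directly inside element, not inside its children."""
    yield element.text or ""
    for child in element:
        yield child.tail or ""


grids = page.findall(".//div[@class='grid']")

# Keep the divs whose direct text nodes are all empty or whitespace-only,
# mirroring not(text()[normalize-space(.)]).
without_text = [
    div for div in grids
    if not any(chunk.strip() for chunk in direct_text_nodes(div))
]

print(len(grids), len(without_text))
```

The second div is kept even though its span contains "def", because that text belongs to the child element and not to the div itself.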

Extract text and ignore next node

From this:
<span class="postbody">
<span style="color: #8e2fb6">
<span style="font-weight: bold">nickname</span>
</span>
<br>
Example text
<br>
Example text
<br>
<p class="signature">THIS IS WHAT I DO NOT WANT</p>
</span>
I want to extract:
<br>
Example text
<br>
Example text
<br>
I tried: span/text()[1] but it seems not to work. I always get unwanted p class. Is it even possible to do?
First you need to load your HTML string into an HtmlDocument or HtmlNode (using the .load() function).
The ChildNodes collection contains every child of your current node (basically every node under span.postbody).
After that, what you need to do is straightforward: just grab the #text and br nodes. Keep in mind that you will also receive some #text nodes that contain only whitespace characters; you may want to filter them out of the result.
//load html to HtmlNode
node.ChildNodes.Where(n => n.Name.Equals("#text") || n.Name.Equals("br")) //It will return collection of HtmlNode
You can use the jQuery selector for postbody, then the .text method, which should ignore the HTML. This will also ignore the <br> tags.
$('.postbody').text();
An alternative would be to iterate through the children of $('.postbody').
'//text()[preceding-sibling::br and normalize-space()]'
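What that last expression selects (non-blank text nodes that have a br somewhere before them among their siblings) can be approximated in Python with the standard library's xml.etree.ElementTree. This is only a sketch: the markup is rewritten as well-formed XML with self-closing br tags, the body wrapper is an assumption, and text_after_br is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

post = ET.fromstring("""
<body>
<span class="postbody">
  <span style="color: #8e2fb6">
    <span style="font-weight: bold">nickname</span>
  </span>
  <br/>
  Example text
  <br/>
  Example text
  <br/>
  <p class="signature">THIS IS WHAT I DO NOT WANT</p>
</span>
</body>
""")


def text_after_br(parent):
    """Collect non-blank text directly inside parent that follows a br child."""
    seen_br = False
    found = []
    for child in parent:
        if child.tag == "br":
            seen_br = True
        # In ElementTree, text that follows a child is stored in child.tail.
        if seen_br and child.tail and child.tail.strip():
            found.append(child.tail.strip())
    return found


body = post.find(".//span[@class='postbody']")
lines = text_after_br(body)
print(lines)
```

The nickname and the signature are skipped because their text lives inside child elements, not directly inside span.postbody.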

How can I select nodes that don't contain links but which do contain specific text using xpath

Given the following HTML:
$content =
'<html>
<body>
<div>
<p>During the interim there shall be nourishment supplied</p>
</div>
<div>
<p>During the <a href="...">interim</a> there shall be interim nourishment supplied</p>
</div>
<div>
<ul><li>During the interim there shall be nourishment supplied</li></ul>
</div>
</body>
</html>';
I want all the nodes containing the word "interim" but not if the word "interim" is part of a link element.
The nodes I would expect back are the first P node and the LI node only.
I've tried the following:
'//*/text()[not(a) and contains(.,"interim")]'
... but this still returns the A and also returns part of its parent P node (the part after the A), neither of which is desired. You can see my attempt here: https://glot.io/snippets/ehp7hmmglm
If you use the XPath expression //*[not(self::a) and not(a) and text()[contains(.,"interim")]] then you get all elements that do not contain an a element, are not a elements and contain a text node child containing that word.
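The three conditions in that expression can be spelled out step by step. Below is a rough Python sketch using the standard library's xml.etree.ElementTree, which cannot evaluate this expression directly, so each condition is checked by hand. The position of the link inside the second paragraph and its href value are assumptions made for this example, and has_direct_text is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumption: the link wraps the first "interim" of the second paragraph.
content = ET.fromstring("""
<html>
<body>
<div>
<p>During the interim there shall be nourishment supplied</p>
</div>
<div>
<p>During the <a href="#">interim</a> there shall be interim nourishment supplied</p>
</div>
<div>
<ul><li>During the interim there shall be nourishment supplied</li></ul>
</div>
</body>
</html>
""")


def has_direct_text(element, word):
    """True if a text node directly inside element contains word."""
    texts = [element.text or ""] + [child.tail or "" for child in element]
    return any(word in text for text in texts)


hits = [
    element for element in content.iter()
    if element.tag != "a"                    # not(self::a)
    and element.find("a") is None            # not(a)
    and has_direct_text(element, "interim")  # text()[contains(., "interim")]
]
tags = [element.tag for element in hits]
print(tags)
```

The second paragraph is dropped because it has an a child, and the a element itself is dropped by the first condition.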

How to write the single xpath when the text is in two lines

How to write the single xpath for this
<div class="col-lg-4 col-md-4 col-sm-4 profilesky"> <div class="career_icon">
<span> Boost </span> <br/>
Your Profile </div>
I am able to write it in two lines using the "contains" method.
.//*[contains(text(),'Boost')]
.//*[contains(text(),'Your Profile')]
But I want to write the XPath for this in a single line.
You can try this way :
.//*[@class='career_icon' and contains(., 'Boost') and contains(., 'Your Profile')]
The XPath above checks for an element whose class attribute equals career_icon and whose body contains both the Boost and Your Profile texts.
Note that text() only checks direct child text nodes. To check the entire text content of an element, simply use a dot (.).
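The difference between text() and the dot can be illustrated with a small Python sketch based on the standard library's xml.etree.ElementTree. It relies on the XPath 1.0 rule that a node-set passed to contains() is converted using its first node only; the markup is a well-formed copy of the inner div from the question.

```python
import xml.etree.ElementTree as ET

icon = ET.fromstring("""
<div class="career_icon">
<span> Boost </span> <br/>
Your Profile </div>
""")

# What contains(text(), ...) inspects in XPath 1.0: the first direct text node,
# which here is just the whitespace before the span.
first_text_node = icon.text or ""

# What contains(., ...) inspects: the text of the element and all its descendants.
string_value = "".join(icon.itertext())

print(repr(first_text_node))
print("Boost" in string_value, "Your Profile" in string_value)
```

Because "Boost" lives inside the span, only the full string value of the div contains both phrases.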
You can combine several rules just by writing them one after another since they refer to the same element:
.//*[contains(text(),'Boost')][contains(text(),'Your Profile')]
