Selecting specific text using XPath while disregarding certain nodes - xpath

I have some HTML that looks roughly like this:
<p>
<a img src="img src">
<strong>foo</strong>
<strong>bar</strong>
<strong>baz</strong>
<strong>eek</strong>
This is the text I want to select using xpath.
</p>
How can I select only the text node indicated above using XPath?
Use:
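/p/text()[last()]

The XPath expression /p/text() selects every text node that is a direct child of the p element in the markup posted in the question, including the whitespace-only text nodes between the child elements.
/p/text()[normalize-space()]
The predicate keeps only the text nodes that contain something other than whitespace, which here leaves exactly the sentence you want. It does not trim the selected node; to get the trimmed string, wrap the expression: normalize-space(/p/text()[normalize-space()]).
There is a very good tutorial at http://www.w3schools.com/xpath/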
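To make the difference between these expressions concrete, here is a minimal sketch in Python. It assumes a simplified, well-formed version of the markup from the question (the malformed a tag is left out). Python's standard xml.etree.ElementTree does not support the text() node test, so the hypothetical helper child_text_nodes emulates it by walking the text and tail attributes.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed stand-in for the markup in the question (assumption).
SNIPPET = """<p>
<strong>foo</strong>
<strong>bar</strong>
<strong>baz</strong>
<strong>eek</strong>
This is the text I want to select using xpath.
</p>"""


def child_text_nodes(elem):
    """Return the direct text children of elem, roughly like elem/text() in XPath."""
    nodes = []
    if elem.text is not None:
        nodes.append(elem.text)          # text before the first child element
    for child in elem:
        if child.tail is not None:
            nodes.append(child.tail)     # text that follows each child element
    return nodes


p = ET.fromstring(SNIPPET)
all_nodes = child_text_nodes(p)                   # like /p/text()
non_blank = [t for t in all_nodes if t.strip()]   # like /p/text()[normalize-space()]
last_trimmed = " ".join(all_nodes[-1].split())    # like normalize-space(/p/text()[last()])

print(len(all_nodes), len(non_blank))
print(repr(last_trimmed))
```

In this sketch most of the text nodes are whitespace-only line breaks between the strong elements, which is why the predicate (or last()) is needed at all.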

Related

Xpath not contains

Is there a way to write an XPath that selects only elements that do not have an ng-show attribute?
The element in question is shown below.
I'm familiar with not(contains()), but contains() takes two parameters.
I want to exclude elements on the presence of ng-show itself, because several other elements also have ng-show and I don't want to select any of them.
<span ng-show="displayValue" class="ng-binding">0 km</span>
Thanks in Advance
Try the XPath below:
//span[not(@ng-show)][not(@class='percent ng-binding')][@class='ng-binding']
Explanation: not(@ng-show) is true only for <span> elements that have no ng-show attribute, so only those elements are returned.
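As a rough illustration of what the attribute tests do, the sketch below applies the same conditions in Python. Only the first span comes from the question; the other two spans and the wrapping div are hypothetical. The filtering is written out by hand because Python's standard xml.etree.ElementTree does not support not() in its XPath subset.

```python
import xml.etree.ElementTree as ET

# Only the first span is copied from the question; the rest is hypothetical.
SNIPPET = """<div>
<span ng-show="displayValue" class="ng-binding">0 km</span>
<span class="ng-binding">12 km</span>
<span class="percent ng-binding">40%</span>
</div>"""

root = ET.fromstring(SNIPPET)

# Same idea as //span[not(@ng-show)][@class='ng-binding']
selected = [
    span for span in root.iter("span")
    if "ng-show" not in span.attrib          # not(@ng-show): attribute is absent
    and span.get("class") == "ng-binding"    # @class='ng-binding': exact match
]

print([span.text for span in selected])
```

Because @class='ng-binding' is an exact string comparison, it already excludes class="percent ng-binding", so the middle predicate in the XPath above is not strictly needed.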

How to find xpath expression to select this text

I have this HTML and have tried many times to write a plain XPath for the quoted "sample article" text, and a separate XPath for the "Author" text, but I can't find any criteria to select them by.
<div class="Text">
“sample article here with quotation marks .”
<br/>
―
Author
Please help, this is driving me mad!
Thanks.
You can get the first part by selecting the div by class, stepping to the br inside it, and taking the text nodes among its preceding siblings:
//div[@class="Text"]/br/preceding-sibling::text()
The second part is easier: take the text of the a tag inside the div:
//div[@class="Text"]/a/text()

Xpath of a text containing Bold text

I am trying to click the link for the site www.qualtrapharma.com after searching Google for
"qualtra", but I have trouble writing the XPath because the <cite> tag contains a <b> tag. Can anyone suggest how to do this?
<div class="f kv" style="white-space:nowrap">
<cite class="vurls">
www.
<b>qualtra</b>
pharma.com/
</cite>
<div>
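You can overcome this by using '.' in the XPath. It stands for the current node, and when it is compared with a string, the node's string-value is used: the concatenation of all text inside the node, including the text of the nested <b>.
The XPath would look like the following:
//cite[.='www.qualtrapharma.com/']
The comparison is exact, so this assumes the real page has no whitespace between the pieces of the URL.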
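The following sketch shows how the string-value of the cite element is built, and why whitespace in the markup matters for an exact comparison. It uses Python's standard xml.etree.ElementTree, where joining itertext() gives the same concatenation as the XPath string-value. Both cite fragments are assumptions based on the snippet in the question, and string_value is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def string_value(elem):
    """Concatenate all descendant text, like the XPath string-value of a node."""
    return "".join(elem.itertext())


# Assumed variants of the markup: without and with line breaks between the pieces.
compact = ET.fromstring('<cite class="vurls">www.<b>qualtra</b>pharma.com/</cite>')
spaced = ET.fromstring('<cite class="vurls">\nwww.\n<b>qualtra</b>\npharma.com/\n</cite>')

TARGET = "www.qualtrapharma.com/"

compact_matches = string_value(compact) == TARGET   # exact match succeeds
spaced_matches = string_value(spaced) == TARGET     # line breaks make it fail
spaced_stripped = "".join(string_value(spaced).split())  # drop all whitespace

print(compact_matches, spaced_matches, spaced_stripped == TARGET)
```

If the real page does contain whitespace inside the cite element, the whitespace has to be removed before comparing, as the last line of the sketch does.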

What XPath selects CDATA content when some child elements exist

Let's say I have an XML that looks like this:
<a>
<b>
<![CDATA[some text]]>
<c>xxx</c>
<d>yyy</d>
</b>
</a>
I can't find a way to get "some text". Any idea?
If I'm using "a/b" it returns also xxx and yyy
If I'm using "a/b/text()" it returns nothing
You can't actually select a CDATA section: CDATA is just a way of telling the parser to avoid unescaping special characters, and your input document looks to XPath exactly the same as:
<a>
<b>
some text
<c>xxx</c>
<d>yyy</d>
</b>
</a>
(Having said that, if you're using DOM, then some DOM XPath engines fail to implement the spec correctly, and treat the CDATA content as a separate text node from the text outside the CDATA section).
The XPath expression a/b/text() should select three text nodes, of which the first contains "some text" along with surrounding whitespace.
With the XPath data model the path /a/b/text()[1] should select a text node with the string value
some text
that is a line break, some spaces, the text some text followed by a line break and some spaces.
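A quick way to check this is to parse the document and look at the text nodes directly. The sketch below uses Python's standard xml.etree.ElementTree, which, like the XPath data model, does not distinguish CDATA from ordinary text. The document is the one from the question without indentation, so the surrounding whitespace here is line breaks only.

```python
import xml.etree.ElementTree as ET

DOC = """<a>
<b>
<![CDATA[some text]]>
<c>xxx</c>
<d>yyy</d>
</b>
</a>"""

b = ET.fromstring(DOC).find("b")

# The CDATA content is merged into the ordinary text before the first child.
first_text = b.text                                  # like /a/b/text()[1]

# The remaining text nodes are the whitespace following <c> and <d>.
text_nodes = [b.text] + [child.tail for child in b]  # like /a/b/text()

print(repr(first_text))
print(len(text_nodes))
```

If a/b/text() returns nothing in your environment, this kind of check can help tell whether the parser or the XPath engine is at fault.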

How to get node text without children?

I use Nokogiri to parse an HTML page with content like this:
<p class="parent">
Useful text
<br>
<span class="child">Useless text</span>
</p>
When I call the method page.css('p.parent').text Nokogiri returns 'Useful text Useless text'. But I need only 'Useful text'.
How to get node text without children?
XPath includes the text() node test for selecting text nodes, so you could do:
page.xpath('//p[@class="parent"]/text()')
Using XPath to select HTML classes can become quite tricky if the element in question could belong to more than one class, so this might not be ideal.
Fortunately Nokogiri adds the text() selector to CSS, so you can use:
page.css('p.parent > text()')
to get the text nodes that are direct children of p.parent. This will also return some nodes that are whitespace only, so you may have to filter them out.
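You should be able to remove the child elements with page.css('p.parent > *').remove. (Calling children.remove instead would also remove the text nodes, because children includes them.)
Then page.css('p.parent').text will return the text without the child elements.
Note: remove modifies the parsed page.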
