XPath: counting elements while excluding other elements from the count

I need to count all the elements that include the text 'Automation', but exclude the span[@class = "hot"] elements from the count.
The page with the HTML is: https://www.epam.com/careers/job-listings?sort=best_match&query=&department=Software+Test+Engineering&city=all&country=all
I can count every search result with:
count(//li[contains(@class, 'search-result-item')])
and it works as needed, but how should I include only the elements matching
//a[contains(text(), 'Automation')]
and exclude the search result elements that have
span[@class = "hot"]

Select the li elements that do not have a span with class="hot", and from there select your a link.
It should be something like:
//li[not(.//span[@class="hot"])]//a[contains(text(), 'Automation')]
Wrap the expression in count(...) to get the number of matches.
You could also add other constraints on li, such as requiring a certain class and/or a certain section.
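To make the filtering concrete, here is a minimal Python sketch of the same logic. The sample markup is hypothetical (it is not the real page), and because the standard library's ElementTree only supports a subset of XPath without not() or contains(), the two predicates are emulated in plain Python rather than evaluated as XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup loosely modelled on a search-result list; not the real page.
SAMPLE = """
<ul>
  <li class="search-result-item"><a href="#1">Test Automation Engineer</a></li>
  <li class="search-result-item"><span class="hot">Hot</span><a href="#2">Senior Automation Tester</a></li>
  <li class="search-result-item"><a href="#3">Manual QA Engineer</a></li>
</ul>
"""

root = ET.fromstring(SAMPLE)

# Roughly: //li[not(.//span[@class="hot"])]//a[contains(text(), 'Automation')]
links = [
    a
    for li in root.iter("li")
    if li.find(".//span[@class='hot']") is None  # skip li elements with a "hot" span
    for a in li.iter("a")
    if "Automation" in (a.text or "")            # leading text of the link
]

print(len(links))  # plays the role of count(...)
```

Only the first link survives: the second is inside a "hot" item and the third does not mention Automation.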

Related

Xpath for selecting inner text of an element

I need to select the following element on a webpage
<td colspan="1">
<span class="text-left">
<strong>QTY Total
</strong></span>
<span style="float: right;">
<strong>20.99</strong>
</span>
</td>
The element I need to extract is the 20.99 text, which is in a child element.
I have tried the following to locate the element, and it returns nothing:
$x("//*[@style='float: right;' and contains(text(), '20.99')]")
I also tried to locate the element using the following:
$x("//*[@style='float: right;']")
But this locates two elements with the same style.
Does anybody know an XPath I can try that will locate the 20.99?
In your first XPath, the "20.99" text belongs to the child strong element, not to the styled element itself. So you have to reference strong in the contains(...) clause, like
//*[@style='float: right;' and contains(strong/text(), '20.99')]/strong
Probably an even better XPath would search explicitly for span elements:
//span[@style='float: right;' and contains(strong/text(), '20.99')]/strong
which also works.
Both select the strong element child of a span element with the text "20.99".
The element I need to extract is the 20.99 text
To get the text() node of the strong tag, append /text() to the expressions, for example:
//span[@style='float: right;' and contains(strong/text(), '20.99')]/strong/text()
Using text() is usually wrong: the predicate [contains(text(), 'xyz')] is almost always better written as [contains(., 'xyz')].
That's because . (which here is equivalent to string(.)) selects the content of the context element as a string regardless of any nested elements, comments etc, whereas text() only looks at immediate text node children.
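The difference between text() and . can be seen with a small Python sketch. This is only an analogy, not an XPath evaluation: in ElementTree, an element's .text is the text before its first child (close to what contains(text(), ...) looks at here), while joining itertext() gives all descendant text, which corresponds to the string value of the element. The cell markup is the one from the question, tidied so that it parses as XML.

```python
import xml.etree.ElementTree as ET

# The table cell from the question, cleaned up so it parses as XML.
cell = ET.fromstring(
    '<td colspan="1">'
    '<span class="text-left"><strong>QTY Total</strong></span>'
    '<span style="float: right;">\n<strong>20.99</strong>\n</span>'
    '</td>'
)

# Pick the right-floated span by comparing the attribute value in Python.
span = next(s for s in cell.iter("span") if s.get("style") == "float: right;")

# Rough analogue of text(): only the span's own leading text (whitespace here).
own_text = span.text or ""
# Rough analogue of "." / string(.): all descendant text concatenated.
string_value = "".join(span.itertext())

print("20.99" in own_text)      # False
print("20.99" in string_value)  # True
```

This is why the original predicate on the span found nothing, while a predicate using . or strong/text() matches.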

How to find an element that has a specific text without other inside elements

I have an HTML string like this.
html = '<div>outer<div>inner</div></div>'
I want to get the text only from the inner div element.
doc = Nokogiri::HTML(html)
doc.xpath('//div[contains(.,"inner")]')
But this code returns not only the inner element but also the outer element, because the outer element also contains the text inner.
How can I find an element that contains a specific text itself, rather than through a nested HTML tag?
I can easily get the inner element in this case with doc.css('div > div'), but in the real case I am not sure how many div tags exist. The inner text may also include more text besides inner, like:
html = '<div>outer<div>inner text</div></div>'

How to match an element with no following sibling

Here's my XML:
<w:tc>
<w:p>
<w:pPr></w:pPr>
<w:r></w:r>
</w:p>
</w:tc>
<w:tc>
<w:p>
<w:pPr></w:pPr>
</w:p>
</w:tc>
I want to match the w:p inside a w:tc whose w:pPr has no following-sibling w:r; specifically, I want the one in the second w:tc. Here is what I have tried:
<xsl:template match="w:pPr[ancestor::w:p[ancestor::w:tc] and not(following-sibling::w:r)]">
I need an XPath for a w:pPr that has no following sibling.
The problem is when w:pPr is followed by w:hyperlink; that case should be excluded too.
If you want to match a w:pPr that has no following sibling elements at all (regardless of name), then just use a match pattern of
w:pPr[ancestor::w:p[ancestor::w:tc] and not(following-sibling::*)]
or equivalently (and slightly shorter)
w:tc//w:p//w:pPr[not(following-sibling::*)]
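As a rough illustration of what not(following-sibling::*) means, here is a Python sketch that checks whether w:pPr is the last child element of its parent. ElementTree has no sibling axes, so this is an emulation; the namespace URI bound to w: and the w:tbl wrapper element are assumptions, since the question shows neither, and the sketch only looks at w:pPr as a direct child of w:p.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the w: prefix (WordprocessingML); not shown in the question.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# The w:tbl wrapper is hypothetical; it only gives the fragment a single root.
doc = ET.fromstring(f"""
<w:tbl xmlns:w="{W}">
  <w:tc><w:p><w:pPr/><w:r/></w:p></w:tc>
  <w:tc><w:p><w:pPr/></w:p></w:tc>
</w:tbl>
""")

def ppr_without_following_sibling(root):
    """Yield w:pPr children of w:tc//w:p that have no element after them."""
    for p in root.findall(".//w:tc//w:p", NS):
        children = list(p)
        # "No following sibling element" means: it is the last child element.
        if children and children[-1].tag == f"{{{W}}}pPr":
            yield children[-1]

results = list(ppr_without_following_sibling(doc))
print(len(results))
```

Only the w:pPr in the second w:tc qualifies, because the first one is followed by w:r.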
Using XPath is simple and straightforward: you only have to filter elements. The filtering can be based on the content of the element (using [] with a path inside the brackets). You can then work with the filtered elements just as with the XML tree (filter again or select the final elements).
In your case, first choose the correct tc element (filter the elements as needed):
Based on the count of elements: //tc[count(./p/*) = 1], or
Based on a non-existing r element: //tc[not(./p/r)], or
Based on non-existing r and hyperlink elements: //tc[not(./p/r) and not(./p/hyperlink)], or
Based on an existing pPr and a non-existing r (not strictly necessary, because pPr is filtered in the second step): //tc[./p/pPr and not(./p/r)]
It returns the following XML.
<tc>
<p>
<pPr>pPr</pPr>
</p>
</tc>
Then simply say what you want from the new XML:
Do you want the pPr element? Use: /p/pPr
All together:
//tc[count(./p/*) = 1]/p/pPr
or
//tc[not(./p/r)]/p/pPr
Note: // means find the element anywhere in the document.
Update 1: Hyperlink condition added.
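The two-step idea (filter the tc elements, then select from them) can be sketched in Python as follows. The sample is a simplified, prefix-free version of the question's XML with a hypothetical tbl root and placeholder text, and the not(...) conditions are written as Python checks because ElementTree's XPath subset does not include not().

```python
import xml.etree.ElementTree as ET

# Simplified, prefix-free sample; the tbl root and the text values are placeholders.
root = ET.fromstring("""
<tbl>
  <tc><p><pPr>first</pPr><r/></p></tc>
  <tc><p><pPr>pPr</pPr></p></tc>
</tbl>
""")

# Step 1: filter the tc elements, as in //tc[not(./p/r) and not(./p/hyperlink)].
filtered = [
    tc for tc in root.iter("tc")
    if tc.find("p/r") is None and tc.find("p/hyperlink") is None
]

# Step 2: select from the filtered elements, as in /p/pPr.
result = [ppr for tc in filtered for ppr in tc.findall("p/pPr")]

print([ppr.text for ppr in result])  # ['pPr']
```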

XPATH filter tag-less children

Is there any way to specify that I want to select only tag-less child elements (in the following example - "text")?
<div>
<p>...</p>
"text"
</div>
The text() node test matches text nodes. Example: //div/text() matches all text-node children of every div element.
Use:
/*/text()[normalize-space()]
This selects all text nodes that are children of the top element of the document and that do not consist only of white-space characters.
In the concrete example this will select only the text node with string value:
'
"text"
'
The XPath expressions:
/*/text()
or
/div/text()
both select two text nodes, the first of which contains only white-space and the second is the same text node as above:
'
"text"
'
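For comparison, here is how the same two results look in Python's ElementTree, as a sketch only. ElementTree has no text-node objects, so the text children of an element are its own .text plus the .tail of each child element; filtering with strip() stands in for the [normalize-space()] predicate.

```python
import xml.etree.ElementTree as ET

div = ET.fromstring('<div>\n<p>...</p>\n"text"\n</div>')

# Text children of div: its leading text plus the tail after each child element.
text_nodes = [div.text] + [child.tail for child in div]
text_nodes = [t for t in text_nodes if t is not None]

# Analogue of [normalize-space()]: drop whitespace-only text nodes.
non_blank = [t for t in text_nodes if t.strip()]

print(len(text_nodes))  # 2
print(non_blank)        # ['\n"text"\n']
```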
select only tag-less child elements
To me this sounds like selecting all elements that don't have other elements as children. But then again, "text" in your example is not an element but a text node, so I'm not really sure what you want to select...
Anyway, here is a solution for selecting such elements.
//*[not(*)]
Selects all elements that don't have an element as a child. Replace the first * with an element name if you only want to select certain elements that don't have child elements. Also note that using // is generally slow, since it runs through the whole document. Consider using a more specific path when possible (like /div/*[not(*)] in this case).
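A small Python sketch of the same selection, assuming a hypothetical sample document: ElementTree cannot evaluate not(*), but an element with no child elements has length zero, so the filter is easy to write by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with one nested element to show that parents are excluded.
root = ET.fromstring('<div><p>...</p><section><em>x</em></section>"text"</div>')

# Analogue of //*[not(*)]: every element that has no child elements.
leaves = [el for el in root.iter() if len(el) == 0]

print([el.tag for el in leaves])  # ['p', 'em']
```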

Use Nokogiri to get all nodes in an element that contain a specific attribute name

I'd like to use Nokogiri to extract all nodes in an element that contain a specific attribute name.
e.g., I'd like to find the 2 nodes that contain the attribute "blah" in the document below.
@doc = Nokogiri::HTML::DocumentFragment.parse <<-EOHTML
<body>
<h1 blah="afadf">Three's Company</h1>
<div>A love triangle.</div>
<b blah="adfadf">test test test</b>
</body>
EOHTML
I found this suggestion (below) at this website: http://snippets.dzone.com/posts/show/7994, but it doesn't return the 2 nodes in the example above. It returns an empty array.
# get elements with attribute:
elements = @doc.xpath("//*[@*[blah]]")
Thoughts on how to do this?
Thanks!
I found this here
elements = @doc.xpath("//*[@*[blah]]")
This is not a useful XPath expression. It says to give you all elements that have attributes that have child elements named 'blah'. And since attributes can't have child elements, this XPath will never return anything.
The DZone snippet is confusing in that when they say
elements = @doc.xpath("//*[@*[attribute_name]]")
the inner square brackets are not literal... they're there to indicate that you put in the attribute name. Whereas the outer square brackets are literal. :-p
They also have an extra * in there, after the @.
What you want is
elements = @doc.xpath("//*[@blah]")
This will give you all the elements that have an attribute named 'blah'.
You can use CSS selectors:
elements = @doc.css "[blah]"
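The attribute-existence predicate is part of the XPath subset that Python's ElementTree supports, so it can be checked outside Nokogiri with a short sketch. The markup is the sample from the question, parsed here as XML rather than as an HTML fragment.

```python
import xml.etree.ElementTree as ET

body = ET.fromstring("""
<body>
  <h1 blah="afadf">Three's Company</h1>
  <div>A love triangle.</div>
  <b blah="adfadf">test test test</b>
</body>
""")

# [@blah] keeps only elements that carry an attribute named blah.
elements = body.findall(".//*[@blah]")

print([el.tag for el in elements])  # ['h1', 'b']
```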
