XPath and negation searches

I have the following code sample in an xmlns root:
<ol class="stan">
<li>Item one.</li>
<li>
<p>Paragraph one.</p>
<p>Paragraph two.</p>
</li>
<li>
<pre>Preformated one.</pre>
<p>Paragraph one.</p>
</li>
</ol>
I would like to perform a different operation on each <li> depending on the type of tag its first item sits in, or on whether it sits in no tag at all, as in the first <li> of the sample.
EDIT:
My logic in pursuing the task turns out to be incorrect.
How do I query a <li> that has no descendants as in the first list item?
I tried negation:
@doc.xpath("//xmlns:ol[@class='stan']//xmlns:li/xmlns:*[1][not(p|pre)]")
That gives me the exact opposite of what I think I am asking for.
I think I am making the expression more complicated since I can't find the right solution.
UPDATE:
Navin Rawat has answered this one in the comments. The correct code would be:
@doc.xpath("//xmlns:ol[@class='stan']/xmlns:li[not(xmlns:*)]")
CORRECTION:
The correct question involves both an XPath search and a Nokogiri method.
Given the above XHTML code, how do I search for the first descendant using XPath? And how do I use XPath in a conditional statement, e.g.:
@doc.xpath("//xmlns:ol[@class='stan']/xmlns:li").each do |e|
  if e.xpath("e has no descendants")
    # perform task
  elsif e.xpath("e first descendant is <p>")
    # perform second task
  elsif e.xpath("e first descendant is <pre>")
    # perform third task
  end
end
I am not asking for complete code, just the part in parentheses in the above Nokogiri code.

Pure XPath answer...
Given the following XML:
<ol class="stan">
<li>Item one.</li>
<li>
<p>Paragraph one.</p>
<p>Paragraph two.</p>
</li>
<li>
<pre>Preformated one.</pre>
<p>Paragraph one.</p>
</li>
</ol>
If you want to select the <li> that has no child elements, as in the first list item, use:
//ol/li[count(*)=0]
If you have a namespace problem, please post the whole XML (with the root element and its namespace declarations) so that we can help you deal with it.
EDIT: after our discussion, here is your final tested code:
@doc.xpath("//xmlns:ol[@class='footnotes']/xmlns:li").each do |e|
  # A boolean XPath expression evaluates to true or false in Nokogiri.
  if e.xpath("count(*)=0")
    puts "No children"
  elsif e.xpath("count(*[1]/self::xmlns:p)=1")
    puts "First child is <p>"
  elsif e.xpath("count(*[1]/self::xmlns:pre)=1")
    puts "First child is <pre>"
  end
end
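For readers who want to try the same idea without installing Nokogiri, here is a minimal sketch using Ruby's bundled REXML library instead. It is not the code from the discussion above. The sample drops the default namespace (an assumption, so no xmlns: prefix is needed), and the helper names describe_item and describe_list are made up for illustration.

```ruby
require "rexml/document"

# Sample list from the question. The default namespace is left out here
# (an assumption), so the XPath needs no xmlns: prefix.
SAMPLE_LIST = <<~XML
  <ol class="stan">
    <li>Item one.</li>
    <li>
      <p>Paragraph one.</p>
      <p>Paragraph two.</p>
    </li>
    <li>
      <pre>Preformatted one.</pre>
      <p>Paragraph one.</p>
    </li>
  </ol>
XML

# Hypothetical helper: describe an <li> by its first child element.
def describe_item(li)
  first = REXML::XPath.first(li, "*[1]") # nil when the <li> has no child elements
  first.nil? ? "no children" : "first child is <#{first.name}>"
end

# Hypothetical helper: describe every <li> of the list.
def describe_list(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//ol[@class='stan']/li").map { |li| describe_item(li) }
end

puts describe_list(SAMPLE_LIST)
```

Fetching the first child element once and branching on its name avoids running a separate XPath query per condition.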

Related

XPath valid in Firefox but not in Chrome

I am trying to find a menu element via XPath in the JupyterLab UI; The following is an extract of the list of elements in the menu I am interested in, and should be a good minimal example of my problem:
<li tabindex="0" aria-disabled="true" role="menuitem" class="lm-Menu-item p-Menu-item lm-mod-disabled p-mod-disabled lm-mod-hidden p-mod-hidden" data-type="command" data-command="filemenu:logout">
<div class="f1vya9e0 lm-Menu-itemIcon p-Menu-itemIcon jp-Icon"></div>
<div class="lm-Menu-itemLabel p-Menu-itemLabel">Log Out</div>
<div class="lm-Menu-itemShortcut p-Menu-itemShortcut"></div>
<div class="lm-Menu-itemSubmenuIcon p-Menu-itemSubmenuIcon"></div>
</li>
<li tabindex="0" role="menuitem" class="lm-Menu-item p-Menu-item" data-type="command" data-command="hub:logout"><div class="f1vya9e0 lm-Menu-itemIcon p-Menu-itemIcon jp-Icon">
<div class="f1vya9e0 lm-Menu-itemIcon p-Menu-itemIcon jp-Icon"></div>
<div class="lm-Menu-itemLabel p-Menu-itemLabel">Log Out</div>
<div class="lm-Menu-itemShortcut p-Menu-itemShortcut"></div>
<div class="lm-Menu-itemSubmenuIcon p-Menu-itemSubmenuIcon"></div>
</li>
As you can see, both <li> items contain a <div> with the text Log Out, which is my main problem, as I am trying to write a general XPath expression that can work for any menu item. What I am currently trying to use is:
//div[contains(@class, 'p-Menu-itemLabel')][text() = '${item}']
Where ${item} can be any menu item, as all <li> items will have a similar div with text in them. The problem arises with the Log Out item, which is the only one that appears twice. In order to handle this special case, I have thought of using
//div[contains(@class, 'p-Menu-itemLabel')][text() = 'Log Out']/..[not(contains(@class,'p-mod-hidden'))]
Since either one of the two <li> items will not contain that specific class (i.e., the currently active Log Out element).
This XPath works fine in Firefox and finds the element I am looking for every time; however, Chrome complains that it is not a valid XPath expression. Somehow this reduced version:
//div[contains(@class, 'p-Menu-itemLabel')][text() = 'Log Out']/..
works in Chrome, but any time I try to use an attribute selector on the parent element (i.e. /..[something]) it fails to recognize it as a valid XPath.
Does anyone have any idea of why? And what can I do to make Chrome recognize it as a valid XPath?

Chrome rejects a predicate applied directly to the abbreviated parent step (..). Strictly speaking it is right to do so: the XPath 1.0 grammar does not allow predicates on the abbreviated steps . and .., so Firefox is simply being lenient.
But you can modify it to use the long form, parent::*
//div[contains(@class, 'p-Menu-itemLabel')][text() = 'Log Out']/parent::*[not(contains(@class,'p-mod-hidden'))]
Or apply the self::* axis and then apply the predicate:
//div[contains(@class, 'p-Menu-itemLabel')][text() = 'Log Out']/../self::*[not(contains(@class,'p-mod-hidden'))]
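The long-form expression can also be checked outside the browser. The following is a rough sketch using Ruby's bundled REXML library. The menu markup is a trimmed-down copy of the question's HTML wrapped in a <ul> so that it is well-formed (an assumption), and the names MENU, VISIBLE_LOG_OUT and visible_log_out_commands are invented for this example.

```ruby
require "rexml/document"

# Trimmed-down menu from the question, wrapped in <ul> to make it well-formed.
MENU = <<~XML
  <ul>
    <li class="lm-Menu-item p-Menu-item lm-mod-hidden p-mod-hidden" data-command="filemenu:logout">
      <div class="lm-Menu-itemLabel p-Menu-itemLabel">Log Out</div>
    </li>
    <li class="lm-Menu-item p-Menu-item" data-command="hub:logout">
      <div class="lm-Menu-itemLabel p-Menu-itemLabel">Log Out</div>
    </li>
  </ul>
XML

# Long-form parent step, so the predicate is attached to a full axis step.
VISIBLE_LOG_OUT = "//div[contains(@class, 'p-Menu-itemLabel')][text() = 'Log Out']" \
                  "/parent::*[not(contains(@class, 'p-mod-hidden'))]"

# Hypothetical helper: data-command values of the Log Out items that are not hidden.
def visible_log_out_commands(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, VISIBLE_LOG_OUT).map { |li| li.attributes["data-command"] }
end

puts visible_log_out_commands(MENU)
```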

How to select by non-direct child condition in Xpath?

I would like to show an example.
This is how the page looks:
<a class="aclass">
<div class="divclass"></div>
<div id="innerclass">
<span class="spanclass">Hello</span>
</div>
</a>
<a class="aclass">
<div class="divclass"></div>
<div id="innerclass">
<span class="spanclass">Pick Delivery Location</span>
</div>
</a>
I want to select anchor tags that have a child (direct or non-direct) span that has the text 'Hello'.
Right now, I do something like this:
//a[@class='aclass'][div/span[text() = 'Hello']]
I want to be able to select without having to select direct children (div in this case), like this:
//a[@class='aclass'][//span[text() = 'Hello']]
However, the second one finds all the anchor tags with the class 'aclass' rather than the one with the span with 'Hello' text.
I hope I worded my question clearly. Please feel free to edit if necessary.

In your attempt, // goes back to the root of the document. Effectively you are saying "give me the <a> elements for which there is a matching span anywhere in the document", which is why you get them all.
What you need is the descendant axis:
//a[@class='aclass' and descendant::span[text() = 'Hello']]
Note I have joined the conditions with and, but two separate conditions would also work.
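As a quick check of the descendant axis, here is a small sketch using Ruby's bundled REXML library. It uses the variant with two separate predicates. The <root> wrapper and the names PAGE and anchors_with_hello are assumptions made for the example.

```ruby
require "rexml/document"

# The question's markup, wrapped in a single root element so it parses as XML.
PAGE = <<~XML
  <root>
    <a class="aclass">
      <div class="divclass"></div>
      <div id="innerclass"><span class="spanclass">Hello</span></div>
    </a>
    <a class="aclass">
      <div class="divclass"></div>
      <div id="innerclass"><span class="spanclass">Pick Delivery Location</span></div>
    </a>
  </root>
XML

# Hypothetical helper: anchors that contain, at any depth, a span with the text Hello.
# descendant:: is evaluated relative to each <a>, not to the document root.
def anchors_with_hello(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//a[@class='aclass'][descendant::span[text() = 'Hello']]")
end

puts anchors_with_hello(PAGE).size
```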

XPath Exclude Text From Child Element

I'm looking to get the output:
50ml milk
From the following code:
<ul class="ingredients-list__group">
<li>50ml <a href="/glossary/milk" class="tooltip-processed">milk
<div class="tooltip">
<h2
class="node-title">Milk</h2> <span class="fonetic">mill-k</span>
<p>One of the most widely used ingredients, milk is often referred to as a complete food. While cow…</p>
</div>
</a>
</li>
</ul>
Currently I'm using the XPATH:
//ul[@class="ingredients-list__group"]/li
But getting:
50ml milk Milk mill-kOne of the most widely used ingredients, milk is often referred to as a complete food. While cow…
How do I exclude the stuff within the div/tooltip?

With XPath 2.0:
//ul[@class="ingredients-list__group"]/li/concat(./text()[1], ./a/text()[1])
With XPath 1.0:
concat(//ul[@class="ingredients-list__group"]/li/text()[1], //ul[@class="ingredients-list__group"]/li/a/text()[1])

You can select the relevant text nodes using
//ul[@class="ingredients-list__group"]//text()[not(ancestor::div[@class='tooltip'])]
If you're in XPath 2.0 you can then put this in a call of string-join() to join these into a single string. If you're stuck with 1.0, you'll have to return multiple text nodes to the calling application and concatenate them together in the host language code.
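The XPath 1.0 route, with the concatenation done in the host language, could look roughly like this in Ruby with the bundled REXML library. The sample is a shortened version of the question's markup, and the names INGREDIENTS, TEXT_OUTSIDE_TOOLTIP and ingredient_text are invented for this sketch.

```ruby
require "rexml/document"

# Shortened version of the question's markup.
INGREDIENTS = <<~XML
  <ul class="ingredients-list__group">
    <li>50ml <a href="/glossary/milk" class="tooltip-processed">milk
      <div class="tooltip">
        <h2 class="node-title">Milk</h2> <span class="fonetic">mill-k</span>
        <p>One of the most widely used ingredients.</p>
      </div>
    </a>
    </li>
  </ul>
XML

# Every text node under the list that is not inside the tooltip <div>.
TEXT_OUTSIDE_TOOLTIP = "//ul[@class='ingredients-list__group']" \
                       "//text()[not(ancestor::div[@class='tooltip'])]"

# Hypothetical helper: join the selected text nodes and collapse whitespace.
def ingredient_text(xml)
  doc = REXML::Document.new(xml)
  nodes = REXML::XPath.match(doc, TEXT_OUTSIDE_TOOLTIP)
  nodes.map(&:value).join.split.join(" ")
end

puts ingredient_text(INGREDIENTS)
```

The final split and join only tidies the line breaks and indentation that come along with the text nodes.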

XPath in RSelenium for indexing list of values

Here is an example of html:
<li class="index i1"
<ol id="rem">
<div class="bare">
<h3>
<a class="tlt mhead" href="https://www.myexample.com">
<li class="index i2"
<ol id="rem">
<div class="bare">
<h3>
<a class="tlt mhead" href="https://www.myexample2.com">
I would like to get the value of every href in the a elements. The list items are distinguished by the class of the enclosing li, whose name changes from one item to the next: i1, i2, and so on.
So I keep a counter and change it each time I go to fetch a value.
i <- 1
stablestr <- "index "
myVal <- paste(stablestr , i, sep="")
So even when I just try to access the li itself through myVal, using this
profile<-remDr$findElement(using = 'xpath', "//*/input[@li = myVal]")
profile$highlightElement()
or the href using this
profile<-remDr$findElement(using = 'xpath', "/li[@class=myVal]/ol[@id='rem']/div[@id='bare']/h3/a[@class='tlt']")
profile$highlightElement()
Is there anything wrong with my XPath?

Your HTML structure is invalid. Your <li> tags are not closed properly, and it seems you are confusing <ol> with <li>. But for the sake of the question, I assume the structure is as you write, with properly closed <li> tags.
Then, constructing myVal is not right. It will yield "index 1" while you want "index i1". Use "index i" for stablestr.
Now for the XPath:
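//*/input[@li = myVal]
This is wrong since there is no input element in your HTML, and li is an element, not an attribute. Also, you didn't prefix the variable with $. And finally, the * seems to be unnecessary. Try this:
//li[@class = $myVal]
Note that RSelenium sends the query as a plain string, so if $myVal is not resolved for you, paste the value into the string in R, e.g. paste0("//li[@class = '", myVal, "']").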
In your second XPath, there are also some errors:
/li[@class=myVal]/ol[@id='rem']/div[@id='bare']/h3/a[@class='tlt']
Three things are off here: myVal is missing the $ prefix, div[@id='bare'] should test @class, and the class of the a element is actually 'tlt mhead'.
The first two issues are easy to fix. The third one is not. You could use contains(@class, 'tlt'), but that would also match if the class is, e.g., tltt, which is probably not what you want. Anyway, it might suffice for your use case. Fixed XPath:
/li[@class=$myVal]/ol[@id='rem']/div[@class='bare']/h3/a[contains(@class, 'tlt')]

XPath: Select first element in each row which matches a specific class

Is it possible to select the first element in each row which matches a specific class? This is the HTML structure at the moment.
<ul>
<li>
<article>
<time class="published-date"></time>
<p>Text</p>
</article>
</li>
<li>
<article>
<time class="published-date"></time>
<p>Text</p>
</article>
</li>
</ul>
I was wondering what would be the best and most specific query string in terms of getting the time element with the class published-date in each row?

If there is more than one time element with class="published-date" in a row, select the first one by index (XPath indexes are 1-based):
//ul/li/article/time[@class = "published-date"][1]
If there is only a single time element in every row, simply do:
//ul/li/article/time[@class = "published-date"]

Using the XPath selector
//time[@class="published-date"]
will select all time nodes with the class published-date.
