XPath Y if XPath X contains string Z - xpath

Is it possible to apply XPath query X if some XPath query Y contains some string Z?
HTML
<div class="pluto">
<ul>
<li>apples</li>
<li>oranges</li>
</ul>
Success
</div>
Something like XPath:
//div[@class='pluto']/text() IF contains(//div/ul/li[1], "apples")
Output
Success

Just move the condition into the expression as a predicate:
//div[@class='pluto'][contains(//div/ul/li[1], "apples")]/text()
This returns Success only if the condition is met; otherwise the result is empty.
On a side note, what you really seem to be looking for can be simplified by making the condition relative to the div:
//div[@class='pluto'][contains(ul/li[1], "apples")]/text()
Once again, it only returns results if the contains condition is met.
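As a rough illustration of what that predicate does, here is a minimal Python sketch using only the standard library. ElementTree's XPath subset has no contains(), so the condition is emulated in plain Python; the function name pluto_text_if is hypothetical, and whitespace-only text nodes are dropped as a simplification that XPath itself would not make.

```python
import xml.etree.ElementTree as ET

HTML = """<div class="pluto">
<ul>
<li>apples</li>
<li>oranges</li>
</ul>
Success
</div>"""


def pluto_text_if(markup, needle):
    """Emulate //div[@class='pluto'][contains(ul/li[1], needle)]/text()."""
    root = ET.fromstring(markup)
    found = []
    for div in root.iter("div"):  # iter() also visits the root element
        if div.get("class") != "pluto":
            continue
        first_li = div.find("ul/li")  # first li under a ul child of this div
        if first_li is None or needle not in "".join(first_li.itertext()):
            continue  # predicate failed: this div contributes nothing
        # Direct text children of the div: its own text plus each child's tail.
        pieces = [div.text] + [child.tail for child in div]
        # Assumption: strip whitespace and skip blank text nodes.
        found.extend(p.strip() for p in pieces if p and p.strip())
    return found


print(pluto_text_if(HTML, "apples"))
print(pluto_text_if(HTML, "pears"))
```

With the sample markup, the first call yields the "Success" text and the second yields an empty list, mirroring the empty result of the XPath when the condition fails.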
In XPath 2.0, on the other hand, if you need a value that depends on your condition rather than the node value itself, you can use an if expression:
if (//div[@class='pluto'][contains(ul/li[1], "apples")]/text()) then 'OK!' else 'NOT OK!'
This returns the then part ('OK!') if it finds a match and the else part ('NOT OK!') if it finds nothing.
Update:
What if I would like to execute another XPath query if the condition is NOT met (kind of like an else-if)?
You can put any XPath expression in the then or else part:
if (//div[@class='pluto'][contains(ul/li[1], "apples")]/text())
then
concat('OK! :',//div/ul/li[1])
else
concat('NOT OK! :',//div/ul/li[2])
If-then-else Using XPath 1.0:
concat(
substring(concat('OK! :', //div/ul/li[1]) , 1, number( boolean(count( //div[@class='pluto'][contains(ul/li[1], "apples")]/text() ))) * string-length( concat('OK! :', //div/ul/li[1]) )),
substring(concat('NOT OK! :',//div/ul/li[2]) , 1, number(not(boolean(count( //div[@class='pluto'][contains(ul/li[1], "apples")]/text() )))) * string-length( concat('NOT OK! :',//div/ul/li[2]) ))
)
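The XPath 1.0 expression works because number(boolean(...)) is 1 or 0: multiplying it by the string length makes substring() return either the whole string or an empty one, and concat() joins the two halves. The arithmetic can be sketched in Python as follows; xpath_substring is a hypothetical helper that only covers integer arguments, not the full rounding rules of XPath's substring().

```python
def xpath_substring(s, start, length):
    """Simplified XPath 1.0 substring(): 1-based start, integer arguments only."""
    begin = max(start, 1)
    end = start + length  # first position that is NOT included
    return s[begin - 1:max(end - 1, begin - 1)]


def if_then_else(condition, then_value, else_value):
    """Emulate the concat(substring(...), substring(...)) trick."""
    flag = int(bool(condition))  # number(boolean(...)) -> 1 or 0
    return (
        xpath_substring(then_value, 1, flag * len(then_value))
        + xpath_substring(else_value, 1, (1 - flag) * len(else_value))
    )


print(if_then_else(True, "OK! :apples", "NOT OK! :oranges"))
print(if_then_else(False, "OK! :apples", "NOT OK! :oranges"))
```

Exactly one of the two substrings is non-empty for any condition, so the concatenation always equals the selected branch.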

Related

How to scrape data using xpath contains?

How can I exclude elements from being scraped using contains with OR? The XPath I currently use is not working:
//div/li[contains(text(), 'Night') OR contains(text(), 'Big')
To complete @Sergii Dmytrenko's answer: also use a lowercase or operator, since XPath operator names are case-sensitive.
//div/li[contains(text(), 'Night') or contains(text(), 'Big')]
The preceding XPath will output li elements containing the text "Night" or "Big" (case sensitive).
In order to exclude elements, you can use the not function as previously described.
Side note: using != (not equal) with the and operator is also a possible way to exclude elements:
//div/li[text()!='Night' and text()!='Big']
This excludes only elements whose text is exactly "Night" or "Big", with nothing else.
EDIT : Assuming you have :
<div>
<h2>Night of the living dead</h2>
<h2>Big fish</h2>
<h2>Save the last dance</h2>
<h2>Tomorrow never die</h2>
<h2>Australia nuclear war</h2>
</div>
To select elements which don't contain "Night", "Big", or "Australia", you have two options:
Using or operators inside a not condition :
//div/h2[not(contains(text(),'Night') or contains(text(),'Big') or contains(text(),'Australia'))]
Using multiple not with and operators :
//div/h2[not(contains(text(),'Night')) and not(contains(text(),'Big')) and not(contains(text(),'Australia'))]
Output : 2 nodes :
Save the last dance
Tomorrow never die
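The two options are equivalent by De Morgan's law. A small Python sketch, using only the standard library, makes that concrete on the same sample data; the function names not_any and all_not are hypothetical, and the substring tests stand in for contains(text(), ...), which ElementTree's XPath subset does not support.

```python
import xml.etree.ElementTree as ET

DOC = """<div>
<h2>Night of the living dead</h2>
<h2>Big fish</h2>
<h2>Save the last dance</h2>
<h2>Tomorrow never die</h2>
<h2>Australia nuclear war</h2>
</div>"""

BANNED = ("Night", "Big", "Australia")


def h2_titles(markup):
    """Collect the first text node of every h2, like text() in the predicate."""
    root = ET.fromstring(markup)
    return [h2.text or "" for h2 in root.iter("h2")]


def not_any(titles, banned):
    """Mirror not(contains(A) or contains(B) or contains(C))."""
    return [t for t in titles if not any(word in t for word in banned)]


def all_not(titles, banned):
    """Mirror not(contains(A)) and not(contains(B)) and not(contains(C))."""
    return [t for t in titles if all(word not in t for word in banned)]


titles = h2_titles(DOC)
print(not_any(titles, BANNED))
print(all_not(titles, BANNED))
```

Both calls keep the same two titles, matching the output listed above. As in XPath, the match is case-sensitive.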
Your XPath expression (once the typos are corrected: li[contains(text(), 'Night') or contains(text(), 'Big')]) will return li elements having the text "Night" or "Big".
To exclude these, the correct expression should be
//div/li[not(contains(text(), 'Night') or contains(text(), 'Big'))]
or you may try
//div/li[not(contains(text(), 'Night')) and not(contains(text(), 'Big'))]
Your XPath should end with ']'; as written, it is invalid.
If you would like to exclude 'Night' and 'Big', you may try this:
//div/li[not(contains(text(), 'Night') or contains(text(), 'Big'))]

Can't use the right XPath expression for a certain item

I tried a lot but can't locate the item in this element using XPath.
<div class="info-list-text"><b>Contact</b>: James Crisp</div>
I tried this XPath expression, but without luck:
//div[@class="info-list-text"]/text()
Thanks in advance for any help with this problem.
Btw, I want to get "James Crisp".
Try this :
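normalize-space( translate( //div[@class="info-list-text"]/text() , ':', '' ) )
It works as follows:
Get the text from the <div>
Translate : into an empty string (that is, delete it)
Then strip leading and trailing whitespace (normalize-space also collapses internal runs of whitespace into single spaces)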
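The three steps above can be sketched in Python with the standard library. This is an emulation under stated assumptions rather than an XPath evaluation: contact_name is a hypothetical helper, str.replace stands in for translate(), and the split/join idiom stands in for normalize-space().

```python
import xml.etree.ElementTree as ET

SNIPPET = '<div class="info-list-text"><b>Contact</b>: James Crisp</div>'


def contact_name(markup):
    """Emulate normalize-space(translate(//div[...]/text(), ':', ''))."""
    div = ET.fromstring(markup)
    # Step 1: the div's direct text nodes are its own text plus each child's tail.
    text_nodes = [t for t in [div.text] + [child.tail for child in div] if t]
    first = text_nodes[0] if text_nodes else ""  # string value of the first node
    # Step 2: translate(., ':', '') deletes every colon.
    without_colon = first.replace(":", "")
    # Step 3: normalize-space() trims and collapses whitespace.
    return " ".join(without_colon.split())


print(contact_name(SNIPPET))
```

Here the only text node is the tail of the b element, ": James Crisp", which becomes "James Crisp" after the colon is removed and the whitespace is normalized.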

Does xpath support "or" function

Suppose the two elements below never appear at the same time:
<a title='a' />
<b title='b' />
I want to check whether either one of them is present.
Does XPath support an 'or' function? I just want to write it in one line:
//a[@title='a'] or .. @title='b' ??
XPath Operators
Select either matching nodes (your case here):
//a[@title='a'] | //b[@title='b']
Select one element with either matching attributes
//a[@title='a' or @title='b']
If you want to match either <a/> elements with @title='a' attribute or <b/> elements with @title='b' attribute, you can also match all elements and perform a test on their name:
//*[local-name(.) = 'a' and @title='a' or local-name(.) = 'b' and @title='b']
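For comparison, here is a hedged Python sketch of the union and the name-test variants using the standard library's ElementTree, which supports [@attr='value'] predicates but not the | operator. The wrapping root element and the extra <a title='x'/> are assumptions added so the sample is well-formed and has a non-matching element; either_match is a hypothetical name.

```python
import xml.etree.ElementTree as ET

DOC = "<root><a title='a'/><b title='b'/><a title='x'/></root>"


def either_match(markup):
    root = ET.fromstring(markup)
    # Union emulated by concatenating two queries. Unlike XPath's | operator,
    # this does not re-sort the combined result into document order.
    union = root.findall(".//a[@title='a']") + root.findall(".//b[@title='b']")
    # Name test: one pass over all elements, checking (name, title) pairs.
    wanted = {("a", "a"), ("b", "b")}
    by_name = [el for el in root.iter() if (el.tag, el.get("title")) in wanted]
    return [el.tag for el in union], [el.tag for el in by_name]


print(either_match(DOC))
```

Checking whether either element is present then reduces to testing whether the returned list is non-empty.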

XPath: How to grab multiple strings when doing a string, substring, or another function on text() nodes

I want to use XPath to grab a list of modified strings via the text() function
Example code:
<div>
<p>
Monday 2/4/13
</p>
<p>
Tuesday 2/5/13
</p>
</div>
Now in this example, if I wanted to grab an array of the text between the markups, I'd write an expression such as .//div/p/text(). However, if I wanted to grab only the dates, I could use a substring-after function, but the code substring-after(.//div/p/text(), ' ') only grabs one element. How do I write this expression to grab all the text elements?
In XPath 2.0, you can call the function as the last step of the path, so that it is applied to each p in turn:
//div/p/substring-after(text(), ' ')
In XPath 1.0, that cannot be achieved with only one expression because:
the substring-after() function takes a string as first parameter, not a node-set
a function cannot be specified as a location step (as the 2.0 example above does).
So, in 1.0, your best bet is something like the following, where the substring-after() call has to be repeated for each node (note also that it returns just a single string):
concat(substring-after(//div/p[1]/text(), ' '),
' ',
substring-after(//div/p[2]/text(), ' '))
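In practice, with XPath 1.0 the per-node work is usually done in the host language: select the nodes, then apply the string function to each one. A minimal Python sketch with the standard library follows; the function name dates is hypothetical, and stripping the surrounding newlines is an added assumption (substring-after itself would keep the trailing newline).

```python
import xml.etree.ElementTree as ET

DOC = """<div>
<p>
Monday 2/4/13
</p>
<p>
Tuesday 2/5/13
</p>
</div>"""


def dates(markup):
    """Apply substring-after(text, ' ') to every p, one node at a time."""
    root = ET.fromstring(markup)
    results = []
    for p in root.iter("p"):
        text = (p.text or "").strip()  # assumption: trim surrounding newlines
        _, _, after = text.partition(" ")  # everything after the first space
        results.append(after)
    return results


print(dates(DOC))
```

str.partition returns an empty third element when no space is found, which matches substring-after returning an empty string in that case.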

use YQL with substring-before in xpath

I am trying to get the string before '--' within a paragraph of an HTML page using XPath, and send the query to YQL.
For example, I want to get the date from the following article:
<div>
<p>Date --- the body of the article</p>
</div>
I tried this query in YQL:
select * from html where url="article url" and xpath="//div/p/text()/[substring-before(.,'--')]"
but it does not work.
How can I get the date of the article, which comes before the '--'?
You can simply use:
substring-before(//div/p,'--')
Use:
substring-before(/div/p/text(), '--')
This XPath expression evaluates to the string immediately preceding '--' in the first text node in the XML document, that is a child of a p that is a child of the div top element.
In case you want to get this value for every such text node, you have to use an expression like:
substring-before((//div/p/text())[$k], '--')
and evaluate this expression $N times, for $k = 1,2, ..., $N
where $N is count(//div/p/text())
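The "evaluate once per text node" loop can be sketched in Python with the standard library. The second paragraph in the sample is an assumption added to show the no-separator case, and substring_before and leading_parts are hypothetical names. Note that XPath's substring-before returns an empty string when the separator is absent, whereas str.partition would return the whole string as the head, so the helper handles that case explicitly.

```python
import xml.etree.ElementTree as ET

DOC = "<div><p>Date --- the body of the article</p><p>no separator here</p></div>"


def substring_before(text, sep):
    """Emulate XPath substring-before(): empty string if sep is not found."""
    head, found, _ = text.partition(sep)
    return head if found else ""


def leading_parts(markup, sep="--"):
    """Run substring-before on the first text node of each p, for k = 1..N."""
    root = ET.fromstring(markup)
    return [substring_before(p.text or "", sep) for p in root.iter("p")]


print(leading_parts(DOC))
```

For the first paragraph the result keeps the space before the dashes ("Date "), just as the XPath expression would; wrap the call in normalize-space, or strip the string in Python, if that is unwanted.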
Do note: avoid the // XPath pseudo-operator whenever the structure of the XML document is statically known. Using // usually results in significant inefficiency (O(N^2)), which is felt especially painfully on big XML documents.
