How to find duplicate / non distinct values with xpath - xpath

Given the following example XML:
<products>
<product>
<sku>10021</sku>
</product>
<product>
<sku>10021</sku>
</product>
<product>
<sku>10022</sku>
</product>
<product>
<sku>10023</sku>
</product>
<product>
<sku>10023</sku>
</product>
</products>
I know how to find distinct sku values with xpath: distinct-values(//sku), which will output:
10021
10022
10023
But how would I get the ones that are not distinct, so:
10021
10023
I am using xidel, so XPath 3 is fine. But if it can be done somehow with XPath 1, preferably without XSLT, I would very much like to read about that as well.

You can try this to get the values of the sku nodes whose text equals the text of at least one other sku:
distinct-values(//sku[.=following::sku])
You can also do it in XPath 1 like this:
//sku[.=preceding::sku and not(.=following::sku)]
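If you want to cross-check what those expressions should return outside of xidel, here is a minimal sketch in plain Python. It does not evaluate XPath at all; it just counts the sku values with the standard library. The function name duplicate_skus is made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XML = """<products>
  <product><sku>10021</sku></product>
  <product><sku>10021</sku></product>
  <product><sku>10022</sku></product>
  <product><sku>10023</sku></product>
  <product><sku>10023</sku></product>
</products>"""


def duplicate_skus(xml_text):
    """Return each sku value that occurs more than once (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # Counter keeps keys in first-seen order, i.e. document order here.
    counts = Counter(sku.text for sku in root.iter("sku"))
    return [value for value, n in counts.items() if n > 1]


print(duplicate_skus(XML))  # ['10021', '10023']
```

The result should match the list produced by both XPath expressions above.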

Related

xpath for price with currency

<category>
<Movi Name="Test">
<Price>$3.95</Price>
</Movi>
<Movi Name="test d">
<Price>$13.95</Price>
</Movi>
</category>
Can anyone help me find the movies priced above $11 in this XML with XPath?
Given all prices are in the same currency and format, this bit of XPath does the job:
/category/Movi[number(substring(./Price/text(), 2)) > 11]
Just for the sake of completeness, another option is:
//Price[number(translate(text(), '$','')) > 11]
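For comparison, the same filter can be sketched in plain Python with the standard library. This assumes, like the XPath above, that every price is a dollar sign followed by a number; movies_over is a name invented for this example.

```python
import xml.etree.ElementTree as ET

XML = """<category>
  <Movi Name="Test"><Price>$3.95</Price></Movi>
  <Movi Name="test d"><Price>$13.95</Price></Movi>
</category>"""


def movies_over(xml_text, threshold):
    """Return the Name of every Movi whose Price exceeds threshold (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    names = []
    for movie in root.findall("Movi"):
        # Same idea as translate(text(), '$', ''): drop the currency sign first.
        price = float(movie.findtext("Price").replace("$", ""))
        if price > threshold:
            names.append(movie.get("Name"))
    return names


print(movies_over(XML, 11))  # ['test d']
```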

How to compare element position in xpath

I am trying to compare customer account values in XPath so that only the distinct values are displayed and duplicates are ignored:
XML code:
<info>
<Customer CustAccount="1"/>
<Customer CustAccount="2"/>
<Customer CustAccount="2"/>
<Customer CustAccount="3"/>
</info>
The result should compare customer 1/2/3 and display:
customer 1
customer 2
customer 3
You can achieve this with the XPath-2.0 expression
for $c in distinct-values(/info/Customer/@CustAccount) return concat('customer ',$c,'&#xa;')
Output is:
customer 1
customer 2
customer 3
If you do not like the newlines, remove the &#xa; from the expression.
There is no pure XPath-1.0 expression achieving this; you could only do this with XSLT-1.0 if XPath-2.0 is unavailable.
Here is the pure xpath 1.0 solution.
Sample xml:
<root >
<info>
<Customer CustAccount="1"/>
<Customer CustAccount="2"/>
<Customer CustAccount="2"/>
<Customer CustAccount="3"/>
</info>
</root>
xpath 1.0:
/root/info/Customer[not(./@CustAccount=preceding::Customer/@CustAccount)]
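The predicate keeps a Customer only if no earlier Customer carries the same CustAccount. A minimal Python sketch of that "first occurrence wins" logic, using only the standard library (it reimplements the idea rather than evaluating the XPath; first_customers is a made-up name):

```python
import xml.etree.ElementTree as ET

XML = """<root>
  <info>
    <Customer CustAccount="1"/>
    <Customer CustAccount="2"/>
    <Customer CustAccount="2"/>
    <Customer CustAccount="3"/>
  </info>
</root>"""


def first_customers(xml_text):
    """Return CustAccount values, keeping only the first occurrence of each."""
    root = ET.fromstring(xml_text)
    seen = set()
    kept = []
    for customer in root.findall("info/Customer"):
        account = customer.get("CustAccount")
        # Equivalent of not(@CustAccount = preceding::Customer/@CustAccount).
        if account not in seen:
            seen.add(account)
            kept.append(account)
    return kept


print(first_customers(XML))  # ['1', '2', '3']
```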

Talend / XPath: Get text of CDATA element in mixed context

I've the following XML data:
<?xml version="1.0"?>
<products>
<product>
<ingredient id="1" weighting="1">
<![CDATA[Name of ingredient 1]]>
<blocked_search_terms><![CDATA[Term A, Term B, Term C]]></blocked_search_terms>
</ingredient>
<ingredient id="2" weighting="2">
<![CDATA[Name of ingredient 2]]>
<blocked_search_terms><![CDATA[Term E, Term F]]></blocked_search_terms>
</ingredient>
</product>
</products>
I am trying to get a list of all ingredient names via a tXmlMap component in Talend. The problem is that I get either null or a concatenated string of the ingredient names and blocked search terms, e.g.
Expression 1:
[xml.products:/products/product/ingredient]
Result 1:
"Name of ingredient 1Term A, Term B, Term C, Name of ingredient 2 Term E, Term F"
Expression 2:
[xml.products:/products/product/ingredient/text()]
Result 2:
"null, null"
The result that I want to achieve is:
"Name of ingredient 1, Name of ingredient 2"
What do I need to use to get it?
Since you cannot dive directly into the CDATA (see "what xpath to select CDATA content when some childs exist"), you should be able to achieve what you want by indicating which element you want:
[xml.products:/products/product/ingredient[1]]
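Outside Talend, the distinction between an element's own leading text and the text of its children can be sketched with Python's standard library. This says nothing about how tXmlMap behaves; it only illustrates that the CDATA content sits in the ingredient's first text node, ahead of blocked_search_terms. The helper name ingredient_names is invented here.

```python
import xml.etree.ElementTree as ET

XML = """<?xml version="1.0"?>
<products>
  <product>
    <ingredient id="1" weighting="1">
      <![CDATA[Name of ingredient 1]]>
      <blocked_search_terms><![CDATA[Term A, Term B, Term C]]></blocked_search_terms>
    </ingredient>
    <ingredient id="2" weighting="2">
      <![CDATA[Name of ingredient 2]]>
      <blocked_search_terms><![CDATA[Term E, Term F]]></blocked_search_terms>
    </ingredient>
  </product>
</products>"""


def ingredient_names(xml_text):
    """Join the leading text of each ingredient, ignoring child elements."""
    root = ET.fromstring(xml_text)
    # Element.text is only the text before the first child element, so the
    # blocked_search_terms content is left out; strip() drops the indentation.
    return ", ".join(ing.text.strip() for ing in root.iter("ingredient"))


print(ingredient_names(XML))  # Name of ingredient 1, Name of ingredient 2
```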

Is there a way to get the nodes whose child value is not equal to another child axis?

I have this XML and I'm trying to get all of the /root/ecommerce/cart/product whose branduid child is not present in /root/ecommerce/promo/promobrands/promobrand/branduid values.
<root>
<ecommerce>
<cart>
<product>
<branduid>value1</branduid>
</product>
<product>
<branduid>value2</branduid>
</product>
<product>
<branduid>value3</branduid>
</product>
<product>
<branduid>value4</branduid>
</product>
</cart>
<promo>
<promobrands>
<promobrand>
<branduid>value1</branduid>
</promobrand>
<promobrand>
<branduid>value3</branduid>
</promobrand>
</promobrands>
</promo>
</ecommerce>
</root>
So I only need the /root/ecommerce/cart/product nodes whose branduid values are value2 and value4.
Is there a way in XPath to get this result?
Try the XPath expression below and let me know the result:
//cart/product[branduid[not(text()=//promobrand/branduid/text())]]
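If you are using XPath 2.0, keep in mind that the except operator compares nodes by identity, not by value, so //branduid except //promobrand/branduid would return all four cart branduid nodes. A value-based XPath 2.0 expression is:
//cart/product[every $b in //promobrand/branduid satisfies branduid != $b]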
otherwise Andersson's answer is good.
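To sanity-check the expected result, the comparison by value can be sketched in plain Python with the standard library: collect the promo brand values into a set, then keep the cart products whose branduid is not in it. The sample is shortened to the elements that matter, and the function name products_not_in_promo is an assumption for this example.

```python
import xml.etree.ElementTree as ET

XML = """<root><ecommerce>
  <cart>
    <product><branduid>value1</branduid></product>
    <product><branduid>value2</branduid></product>
    <product><branduid>value3</branduid></product>
    <product><branduid>value4</branduid></product>
  </cart>
  <promo><promobrands>
    <promobrand><branduid>value1</branduid></promobrand>
    <promobrand><branduid>value3</branduid></promobrand>
  </promobrands></promo>
</ecommerce></root>"""


def products_not_in_promo(xml_text):
    """Return branduid values of cart products that have no promo entry."""
    root = ET.fromstring(xml_text)
    promo_path = "ecommerce/promo/promobrands/promobrand/branduid"
    promo_values = {b.text for b in root.findall(promo_path)}
    result = []
    for product in root.findall("ecommerce/cart/product"):
        brand = product.findtext("branduid")
        # Compare by string value, not by node identity.
        if brand not in promo_values:
            result.append(brand)
    return result


print(products_not_in_promo(XML))  # ['value2', 'value4']
```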

Double node on Xpath for different values

How do I write an XPath that tests two attributes? E.g. I need to select items where the discount is greater than 20% and the discount amount is also greater than 200 (without any link to the base value).
You can combine constraints in predicates. E.g.:
from lxml import etree
doc = etree.XML("""<xml>
<items>
<item discount_perc="25" discount_value="250">Something</item>
</items>
</xml>
""")
# Both attribute tests must hold for an item to be selected.
doc.xpath('items/item[@discount_perc > 20 and @discount_value > 200]')
I will try to answer with a simple example. Imagine you have the following XML:
<?xml version="1.0"?>
<data>
<node value="10" weight="1">foo</node>
<node value="10" weight="2">bar</node>
</data>
Then use this query to select the first <node>'s text:
//node[@value="10" and @weight="1"]/text()
and this for the second:
//node[@value="10" and @weight="2"]/text()
Hope this helps.
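If lxml is not at hand, a rough equivalent can be sketched with Python's built-in ElementTree. Its XPath subset has no "and" operator, so this sketch chains two attribute predicates instead, which selects the same nodes in this case. The helper name node_text is made up for illustration.

```python
import xml.etree.ElementTree as ET

XML = """<data>
  <node value="10" weight="1">foo</node>
  <node value="10" weight="2">bar</node>
</data>"""


def node_text(xml_text, value, weight):
    """Return the text of the first node matching both attributes, or None."""
    root = ET.fromstring(xml_text)
    # Chained predicates stand in for [@value=... and @weight=...].
    match = root.find(f".//node[@value='{value}'][@weight='{weight}']")
    return None if match is None else match.text


print(node_text(XML, "10", "1"))  # foo
print(node_text(XML, "10", "2"))  # bar
```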
