Talend / XPath: Get text of CDATA element in mixed context - xpath

I've the following XML data:
<?xml version="1.0"?>
<products>
<product>
<ingredient id="1" weighting="1">
<![CDATA[Name of ingredient 1]]>
<blocked_search_terms><![CDATA[Term A, Term B, Term C]]></blocked_search_terms>
</ingredient>
<ingredient id="2" weighting="2">
<![CDATA[Name of ingredient 2]]>
<blocked_search_terms><![CDATA[Term E, Term F]]></blocked_search_terms>
</ingredient>
</product>
</products>
I am trying to get a list of all ingredient names via a tXmlMap component in Talend. The problem is, that I either get null a concatenated string of the ingredient names and blocked search terms, e.g.
Expression 1:
[xml.products:/products/product/ingredient]
Result 1:
"Name of ingredient 1Term A, Term B, Term C, Name of ingredient 2 Term E, Term F"
Expression 2:
[xml.products:/products/product/ingredient/text()]
Result 2:
"null, null"
The result that I want to achieve is:
"Name of ingredient 1, Name of ingredient 2"
What do I need to use to get it?

Since you cannot dive directly into the CDATA (see what xpath to select CDATA content when some childs exist) you should be able to achieve what you want with a indication of which element you want:
[xml.products:/products/product/ingredient[1]]

Related

Is it possible to select the properties of a node a XPATH?

I have an XML of the form:
<articleslist>
<articles>
<originalId>507948</originalId>
<title>Hogan Lovells Training Contract</title>
<slug>hogan-lovells-training-contract</slug>
<metaTitle>Hogan Lovells Training Contract</metaTitle>
<metaDescription>Find out about the Hogan Lovells Training Contract and Application Process</metaDescription>
<language>en</language>
<disableAds>false</disableAds>
<shortUrl>false</shortUrl>
<category_slug>law</category_slug>
<subcategory_slug>industry</subcategory_slug>
<updatedAt>2021-03-15T18:38:51.058+00:00</updatedAt>
<createdAt>2018-11-29T06:42:51.665+00:00</createdAt>
</articles>
</articlelist>
I'm able to select the row values with the XPATH //articles.
How can I select the child properties of articles (i.e. the column headings), so I get back a list of the form:
originalId
title
slug
etc...
Depends on your XPath version.
In XPath 2.0 it's simply //articles/*/name()
In 1.0 it's not possible because there's no such data type as a "sequence of strings". You would have to return the set of elements as //articles/*, and then extract their names in the calling program.

Xpath: return all nodes that match any one of the conditions

I am trying to fetch two nodes from XML as combined result using OR condition.
Nodes in XML where name = John or name="jim",both should be returned . So basically I expect following result:
<person name="John"></person>
<person name="Jim"></person>
I have tried XPath function * ///person[#name="John"] or ///person[#name="Jim"]*
but it gives me only one node.
How to construct Xpath function in this case ?
regards,
Venky
I would use a predicate person[#name = ('John', 'Jim')] if we assume Saxon means a Saxon 9 version where XPath 2 or 3 is supported. Of course the right place for your or expression would be inside the square brackets person[#name = 'Jim' or #name = 'John'].

How to get parent element with attribute using xpath

I have posted sample XML and expected output kindly help to get the result.
Sample XML
<root>
<A id="1">
<B id="2"/>
<C id="2"/>
</A>
</root>
Expected output:
<A id="1"/>
You can formulate this query in several ways:
Find elements that have a matching attribute, only ascending all the time:
//*[#id=1]
Find the attribute, then ascend a step:
//#id[.=1]/..
Use the fn:id($id) function, given the document is validated and the ID-attribute is defined as such:
/id('1')
I think it's not possible what you're after. There's no way of selecting a node without its children using XPATH (meaning that it'd always return the nodes B and C in your case)
You could achieve this using XQuery, I'm not sure if this is what you want but here's an example where you create a new node based on an existing node that's stored in the $doc variable.
declare variable $doc := <root><A id="1"><B id="2"/><C id="2"/></A></root>;
element {fn:node-name($doc/*)} {$doc/*/#*}
The above returns <A id="1"></A>.
is that what you are looking for?
//*[#id='1']/parent::* , similar to //*[#id='1']/../
if you want to verify that parent is root :
//*[#id='1']/parent::root
https://en.wikipedia.org/wiki/XPath
if you need not just parent - but previous element with some attribute: Read about Axis specifiers and use Axis "ancestor::" =)

XPath Expression referencing a node

I am trying to reference a node in an expression. Take this simple example:
<?xml version="1.0" encoding="UTF-8" ?>
<homelist>
<homes>
<home>
<hname>house</hname>
<location>hell</location>
<url>wee</url>
<cID>1234</cID>
</home>
</homes>
<contacts>
<contactdetails cID="1234">
<cname>John Smith</cname>
<phone>0123234</phone>
<email>test#gmail.com</email>
</contactdetails>
</contacts>
</homelist>
I basically want to select nodes if it's value is somewhere else in the tree.
For example, I want to display the url of homes that have cID of John Smith. I tried this but it doesn't work, what is wrong with it:
homelist/homes/home[ancestor::homelist/contacts/contactdetails[cname="John Smith"]/url
"/homelist/homes/home[cID = /homelist/contacts/contactdetails[cname='John Smith']/#cID]/url"
You want to find the <home> whose <cID> child's text content equals that of the cID= attribute of the <contactdetails> whose <cname> contains 'John Smith', then return its <url> child.
Note that I've written this as an absolute path, from the root, since you didn't tell us what the context node was going to be for this XPath.
There are certainly other ways of writing the same concept; this is just the first one that occurred to me offhand.
If you preferred to use ancestor or parent, you could say
"/homelist/homes/home[cID = ancestor::homelist/contacts/contactdetails[cname='John Smith']/#cID]/url"

Selecting a XML node with LINQ, and modifying

I've got the following XML:
<Config>
<Book>
<Name> Book Name #1 </Name>
<Available In>
<Country>US</Country>
<Country>Canada</Country>
</Available In>
</Book>
</Config>
I need to find all instances of Book which are available in a specific country, and then introduce a node underneath "Available In". My selection statement fails anytime I add the where statement:
XElement xmlFile = XElement.Load(xmlFileLocation);
var q = (from c in xmlFile.Elements(“Book”)
where c.Elements(Country).Value == "Canada"
select c;
.Value can't be resolved, and toString give me the entire subnode in stringform. I need to select all books in a particular country so that I can then update them all to include a new locale node, ex:
<Config>
<Book>
<Name> Book Name #1 </Name>
<Available In>
<Country>US</Country>
<Country>Canada</Country>
</Available In>
<LocaleIDs>
<LocalID> 3066 </LocaleID>
<LocaleIDs>
</Book>
</Config>
Thanks for your help!
You're trying to use Value on the result of calling Elements which returns a sequence of elements. That's not going to work - it doesn't make any sense. You want to call it on a single element at a time.
Additionally, you're trying to look for direct children of Book, which ignores the Available In element, which isn't even a valid element name...
I suspect you want something like:
var query = xmlFile.Elements("Book")
.Where(x => x.Descendants("Country")
.Any(x => (string) x == "Canada"));
In other words, find Book elements where any of the descendant Country elements has a text value of "Canada".
You'll still need to fix your XML to use valid element names though...

Resources