There seems to be a difference in how XPath and XQuery select attributes.
Here is a toy example stolen from W3Schools:
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book category="cooking">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="children">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
</bookstore>
I want to retrieve the values of the lang attributes.
So, naively I do:
//title/@lang which works perfectly ... on an XPath evaluator but not on an XQuery evaluator.
What I need to know: How should I write my XPath expression to work on an XQuery evaluator?
What I want to know: What is going on?!
Here's the TL part:
I'm on a legacy platform without these capabilities so I send the XML and my query expression to an external service, which I believe is a Saxon-based XQuery evaluator. My syntax works as expected on CodeBeautify's XPath Tester.
I've also verified this difference on xpathtester.com: it works as expected in XPath mode but not in XQuery mode. (Note: link is not encrypted.)
xpathtester.com returns the following error message: ERROR - Cannot create an attribute node (lang) whose parent is a document node
The expression //title/@lang is valid under both XPath and XQuery, and returns a sequence of two attribute nodes.
Where you are seeing differences is in how different XPath and XQuery clients handle a result consisting of two attribute nodes.
If the tool tries to serialize the result as XML, it's going to fail, because XML serialization tries to construct a document node and attach the attributes to the document.
So you need to look at what options your XPath or XQuery tool provides for displaying the results.
After really having understood what @Martin Honnen and @Aaron were talking about, and after some deep dives into tutorials, I think I've come up with a solution that I'm happy with:
string-join((for $l in //title/@lang return string($l)), ',')
https://xqueryfiddle.liberty-development.net/3Nzd8bR/2
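The same idea (return plain strings instead of bare attribute nodes, so nothing has to be serialized as XML) can be sketched outside XQuery as well. The snippet below is only an illustration in Python's standard xml.etree.ElementTree, not what the external Saxon service does; the trimmed bookstore sample and the helper name lang_values are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the bookstore sample (assumption: only the titles matter here).
BOOKSTORE_XML = """<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title>
  </book>
</bookstore>"""


def lang_values(xml_text):
    """Return the lang attribute of every title element as plain strings."""
    root = ET.fromstring(xml_text)
    # ElementTree's XPath subset cannot return attribute nodes, so select the
    # title elements that carry a lang attribute and read the value from each.
    return [title.get("lang") for title in root.findall(".//title[@lang]")]


# Rough counterpart of string-join(..., ',') in the XQuery above.
joined = ",".join(lang_values(BOOKSTORE_XML))
print(joined)
```

Because the result is a string rather than a sequence of attribute nodes, there is no parentless attribute for a serializer to complain about.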
I'm trying to write an XPath query to pull data from an XML document. Unfortunately the document has an XML fragment embedded in it that has been escaped (< has become &lt;, > has become &gt;, etc).
An example of the xml doc is:
<OrderData xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Id>1</Id>
<RawData>&lt;?xml version="1.0" encoding="UTF-16"?&gt;
&lt;Data xmlns="nnn-mmm-com"&gt;
&lt;Order Action="Remove" &gt;
&lt;Instrument InstID="1"&gt;&lt;/Order&gt;&lt;
/Data&gt;
</RawData>
</OrderData>
I'm trying to extract the following values:
Id
Action
InstID
Getting the Id is no problem, but drilling into the fragment inside RawData is proving beyond me. Any pointers gratefully received
(I'm planning to execute the xpath query in Hive using Hive-XML-SerDe which is xpath 1.0)
Thanks
With XPath 3.1 you can parse the embedded XML document and turn it into a node tree, which you can then process using path expressions. So:
/OrderData/RawData/parse-xml(.)/*:Data/*:Instrument/@InstID
should get what you want.
You didn't say what version of XPath your library supports, which usually means that it only supports 1.0, so you may need to find a different library.
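Where parse-xml() is not available, the same two-step idea (read the escaped text, then parse that text as a second document) can be done in a host language. The sketch below uses Python's standard library purely as an illustration; it is not Hive-XML-SerDe code. The sample is a simplified, well-formed stand-in for the document in the question (no inner XML declaration, Instrument nested inside Order), and extract_order_fields is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the question's document (assumption): the inner
# fragment is stored as escaped text inside RawData and is well-formed.
ORDER_XML = """<OrderData>
  <Id>1</Id>
  <RawData>&lt;Data xmlns="nnn-mmm-com"&gt;
    &lt;Order Action="Remove"&gt;
      &lt;Instrument InstID="1"/&gt;
    &lt;/Order&gt;
  &lt;/Data&gt;</RawData>
</OrderData>"""


def extract_order_fields(xml_text):
    """Return (Id, Action, InstID) from the outer and the embedded document."""
    outer = ET.fromstring(xml_text)
    order_id = outer.findtext("Id")
    # The first parse has already unescaped the entities, so the text of
    # RawData is the markup of the inner document; parse it a second time.
    inner = ET.fromstring(outer.findtext("RawData"))
    ns = {"d": "nnn-mmm-com"}  # the inner document's default namespace
    order = inner.find("d:Order", ns)
    instrument = order.find("d:Instrument", ns)
    return order_id, order.get("Action"), instrument.get("InstID")


print(extract_order_fields(ORDER_XML))
```

The prefix d is arbitrary; it plays the role that the *: wildcard plays in the XPath 3.1 expression above.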
I need some help. I am using the XML task in SSIS.
In the example below, I am trying to find the IDs of all books whose price is greater than 20.
If I use //book[price > '20']/self::*/attribute::id I get the values run together, like
bk101bk108bk109. How can I get the result as separate values, like
bk101
bk108
bk109
What is the solution for this? Is there a better way to get the result than what I am trying?
The XPath operation in the XML task is set to "Values".
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-12-16</publish_date>
<description>A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.</description>
</book>
</catalog>
You can specify the index:
//book[price > '20'][1]/@id
//book[price > '20'][2]/@id
//book[price > '20'][3]/@id
Instead of using the XML task, we can use a Foreach NodeList Enumerator with a script task inside it
to enter the values. I just did it and it worked.
1) If you need it, watch this video on how to handle XML in SSIS: http://www.youtube.com/watch?v=PXDexFNj44M
2) This XPath will return the ids of books with a price greater than 20:
//book/price[text() > 20]/../@id
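The "one id per line" result comes from iterating over the matched nodes and emitting each value separately, rather than asking for the concatenated string value of the whole node set. A minimal sketch of that loop, written in Python's standard library rather than in an SSIS script task; the trimmed catalog and the helper name ids_above are assumptions for the example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the catalog sample (assumption: only id and price matter).
CATALOG_XML = """<catalog>
  <book id="bk101"><price>44.95</price></book>
  <book id="bk102"><price>5.95</price></book>
</catalog>"""


def ids_above(xml_text, threshold):
    """Return the id of every book whose price is greater than threshold."""
    root = ET.fromstring(xml_text)
    return [
        book.get("id")
        for book in root.findall(".//book")
        # Compare as numbers, which is what XPath 1.0 does for price > '20'.
        if float(book.findtext("price")) > threshold
    ]


# One id per line instead of one concatenated string.
print("\n".join(ids_above(CATALOG_XML, 20)))
```

With the two books shown in the question, only bk101 qualifies.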
I am reading about and testing XQuery, and as test tools I use BaseX (www.basex.org) and Saxon-HE 9.4.0.6N.
For the following simple XML file - no schema attached to the sample.xml:
<rootab>
<l1>
<items p="a">
<itema x1="10" id="abc">testa</itema>
<itemb x1="10" id="dfe">testb</itemb>
<itemc x1="10" id="jgh">testc</itemc>
</items>
</l1>
<l2>
<items p="b">
<itema x1="10" xidref="abc">testa</itema>
<itemc x1="10" xidref="jgh">testc</itemc>
<itemd x1="10" xidref="abc">testA101</itemd>
<iteme x1="10" xidref="jgh">testB202</iteme>
</items>
</l2>
</rootab>
In Basex_GUI if I enter the following XPath expression: //idref("abc")/..
the result is: <itema x1="10" xidref="abc">testa</itema>
In BaseX_GUI if I add the simple XQuery expression:
for $x in doc('sample.xml')//idref("abc")/..
return <aaa>{$x}</aaa>
the result is:
<aaa>
<itema x1="10" xidref="abc">testa</itema>
</aaa>
<aaa>
<itemd x1="10" xidref="abc">testA101</itemd>
</aaa>
q1) Why did the XPath expression return only one node? I expected two...
In Saxon, by using the below xql file:
<test>
{
doc('sample.xml')//idref("abc")/..
}
</test>
or the XQuery expression , I receive the same result by running the command query sample.xql:
<?xml version="1.0" encoding="UTF-8"?><test/>
q2) What is wrong in my Saxon test?
thank you in advance for your help!
Basically, idref() is sensitive to DTD validation - it recognizes attributes declared as type IDREF in your DTD.
You haven't shown us your DTD, and more importantly, you haven't shown how the input to the queries is supplied. There are many ways of constructing input in which the "IDREF-ness" of an attribute is lost - for example, going via a DOM. Even when you use the doc() function in Saxon, the way the input tree is built depends on many factors including configuration options and your URIResolver.
I see you are using .NET. When Saxon uses the Microsoft XML parser on .NET, it doesn't know which attributes are IDs and IDREFs, so the id() and idref() functions don't work (the MS parser simply doesn't supply this information). You therefore need to use the JAXP parser (Xerces) that comes with the Saxon product. I think this is the default these days.
So not really an answer, but hopefully some background that explains some of the things that can go wrong.
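When the IDREF typing is not available to the processor, one possible workaround (not part of the answer above, so treat it as an assumption) is to stop relying on idref() and compare the attribute by name and value instead, which in XPath would be //*[@xidref='abc']. The sketch below shows that selection with Python's standard library on a trimmed copy of the sample; referrers is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of sample.xml from the question.
SAMPLE_XML = """<rootab>
  <l1>
    <items p="a">
      <itema x1="10" id="abc">testa</itema>
      <itemb x1="10" id="dfe">testb</itemb>
    </items>
  </l1>
  <l2>
    <items p="b">
      <itema x1="10" xidref="abc">testa</itema>
      <itemc x1="10" xidref="jgh">testc</itemc>
      <itemd x1="10" xidref="abc">testA101</itemd>
    </items>
  </l2>
</rootab>"""


def referrers(xml_text, target_id):
    """Return the elements whose xidref attribute equals target_id."""
    root = ET.fromstring(xml_text)
    # Match on the attribute name and value only; no DTD typing is consulted,
    # so this works even when the parser reports no IDREF information.
    return root.findall(f".//*[@xidref='{target_id}']")


for element in referrers(SAMPLE_XML, "abc"):
    print(element.tag, element.text)
```

This trades the DTD-driven lookup for a hard-coded attribute name, so it only fits documents where the referencing attribute is known in advance.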
I want to query nodes from a XOM document which contain a certain value, but case-insensitively. Something like this:
doc.query('/root/book[contains(.,"case-insentive-string")]')
But contains is case sensitive.
1. I tried to use regexes, but they are XPath 2.0 only and XOM does not seem to support it.
2. I tried contains(translate(."ABCEDF...","abcdef..."),"case-insentive-string")]' which failed too.
3. I tried to match subnodes and read the parent's attributes using getParent, but there is no method to read a parent's attributes.
Any suggestions ?
If you are using XOM, then you can use Saxon to run XPath or XQuery against it. That gives you the ability to use the greatly increased function library in XPath 2.0, which includes functions lower-case() and upper-case(), and also the ability (though in a somewhat product-specific way) to choose your own collations for use with functions such as contains() - which means you can do matching that ignores accents as well as case, for example.
2. I tried contains(translate(."ABCEDF...","abcdef..."),"case-insentive-string")]'
failed too.
The proper way to write this is:
/root/book[contains(translate(., $vUpper, $vLower),
translate($vCaseInsentiveString, $vUpper, $vLower)
)
]
where $vUpper and $vLower are defined as (should be substituted by) the strings:
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
and
'abcdefghijklmnopqrstuvwxyz'
and $vCaseInsentiveString is defined as (should be substituted by) the specific case-insensitive string.
For example, given the following XML document:
<authors>
<author>
<name>Victor Hugo &amp; Co.</name>
<nationality>French</nationality>
</author>
<author period="classical" category="children">
<name>J.K.Rollings</name>
<nationality>British</nationality>
</author>
<author period="classical">
<name>Sophocles</name>
<nationality>Greek</nationality>
</author>
<author>
<name>Leo Tolstoy</name>
<nationality>Russian</nationality>
</author>
<author>
<name>Alexander Pushkin</name>
<nationality>Russian</nationality>
</author>
<author period="classical">
<name>Plato</name>
<nationality>Greek</nationality>
</author>
</authors>
the following XPath expression (substitute the variables by the corresponding strings):
/*/author/name
[contains(translate(., $vUpper, $vLower),
translate('lEo', $vUpper, $vLower)
)
]
selects this element:
<name>Leo Tolstoy</name>
Explanation: Both arguments of the contains() function are converted to lower-case, and then the comparison is performed.
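The logic of that expression (map both strings through the same upper-to-lower table, then test containment) can be checked outside XPath. A minimal sketch in Python's standard library, using str.translate as a stand-in for the XPath translate() function; the trimmed authors sample and the helper name names_containing are assumptions for the example.

```python
import xml.etree.ElementTree as ET

UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWER = "abcdefghijklmnopqrstuvwxyz"
# Same character-for-character mapping that translate(., $vUpper, $vLower) applies.
TO_LOWER = str.maketrans(UPPER, LOWER)

# Trimmed copy of the authors sample.
AUTHORS_XML = """<authors>
  <author><name>Victor Hugo &amp; Co.</name></author>
  <author><name>Leo Tolstoy</name></author>
  <author><name>Plato</name></author>
</authors>"""


def names_containing(xml_text, needle):
    """Return the names that contain needle, ignoring ASCII letter case."""
    root = ET.fromstring(xml_text)
    wanted = needle.translate(TO_LOWER)
    return [
        name.text
        for name in root.findall("./author/name")
        if wanted in name.text.translate(TO_LOWER)
    ]


print(names_containing(AUTHORS_XML, "lEo"))
```

As with the XPath version, only the 26 ASCII letters are folded; accented characters are left untouched.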
Admittedly, I'm a Nokogiri newbie and I must be missing something...
I'm simply trying to print the author > name node out of this XML:
<?xml version="1.0" encoding="UTF-8"?>
<entry xmlns:gd="http://schemas.google.com/g/2005" xmlns:docs="http://schemas.google.com/docs/2007" xmlns="http://www.w3.org/2005/Atom" gd:etag="">
<category term="http://schemas.google.com/docs/2007#document" scheme="http://schemas.google.com/g/2005#kind"/>
<author>
<name>Matt</name>
<email>Darby</email>
</author>
<title>Title</title>
</entry>
I'm trying to use this, but it prints nothing. Seemingly no path (even '*') returns anything.
Nokogiri::XML(@xml_string).xpath("//author/name").each do |node|
puts node
end
Alejandro already answered this in his comment (+1) but I'm adding this answer too because he left out the Nokogiri code.
Selecting elements in some namespace using Nokogiri with XPath
The elements you are trying to select are in the default namespace, which in this case is http://www.w3.org/2005/Atom. Note the xmlns="http://www.w3.org/2005/Atom" attribute on the entry element. Your XPath expression instead matches elements that are not in any namespace. This is why your code worked without namespaces.
You need to define a namespace context for your XPath expression and point your XPath steps at elements in that namespace. AFAIK there are a few different ways to accomplish this with Nokogiri; one of them is shown below.
xml.xpath("//a:author/a:name", {"a" => "http://www.w3.org/2005/Atom"})
Note that here we define a namespace-to-prefix mapping and use this prefix (a) in the XPath expression.
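The same behaviour is not specific to Nokogiri. As a rough cross-check, the sketch below runs both the unprefixed and the prefixed path with Python's standard xml.etree.ElementTree on a trimmed copy of the entry; the prefix a and the variable names are arbitrary choices for the example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the Atom entry from the question.
ENTRY_XML = """<entry xmlns="http://www.w3.org/2005/Atom">
  <author>
    <name>Matt</name>
    <email>Darby</email>
  </author>
  <title>Title</title>
</entry>"""

# Prefix-to-namespace mapping; the prefix only has to match the one used in
# the path, not anything in the document.
ATOM = {"a": "http://www.w3.org/2005/Atom"}

root = ET.fromstring(ENTRY_XML)

# Unprefixed steps look for elements in no namespace, so nothing matches.
unqualified = root.findall(".//author/name")

# Prefixed steps are resolved through the mapping and match the Atom elements.
qualified = root.findall(".//a:author/a:name", ATOM)

print(len(unqualified))
print([name.text for name in qualified])
```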
For some reason, using remove_namespaces! makes the above bit work as expected.
xml = Nokogiri::XML(@xml_string)
xml.remove_namespaces!
xml.xpath("//author/name").each do |node|
puts node.text
end
=> "Matt"