I am reading about and testing XQuery, and as test tools I use BaseX (www.basex.org) and Saxon-HE 9.4.0.6N.
For the following simple XML file (no schema is attached to sample.xml):
<rootab>
<l1>
<items p="a">
<itema x1="10" id="abc">testa</itema>
<itemb x1="10" id="dfe">testb</itemb>
<itemc x1="10" id="jgh">testc</itemc>
</items>
</l1>
<l2>
<items p="b">
<itema x1="10" xidref="abc">testa</itema>
<itemc x1="10" xidref="jgh">testc</itemc>
<itemd x1="10" xidref="abc">testA101</itemd>
<iteme x1="10" xidref="jgh">testB202</iteme>
</items>
</l2>
</rootab>
In Basex_GUI if I enter the following XPath expression: //idref("abc")/..
the result is: <itema x1="10" xidref="abc">testa</itema>
In BaseX_GUI if I add the simple XQuery expression:
for $x in doc('sample.xml')//idref("abc")/..
return <aaa>{$x}</aaa>
the result is:
<aaa>
<itema x1="10" xidref="abc">testa</itema>
</aaa>
<aaa>
<itemd x1="10" xidref="abc">testA101</itemd>
</aaa>
q1) Why did the XPath expression return only one node? I expected two...
In Saxon, by using the below xql file:
<test>
{
doc('sample.xml')//idref("abc")/..
}
</test>
or the XQuery expression from above, I get the same result when running the command query sample.xql:
<?xml version="1.0" encoding="UTF-8"?><test/>
q2) What is wrong in my Saxon test?
Thank you in advance for your help!
Basically, idref() is sensitive to DTD validation - it recognizes attributes declared as type IDREF in your DTD.
You haven't shown us your DTD, and more importantly, you haven't shown how the input to the queries is supplied. There are many ways of constructing input in which the "IDREF-ness" of an attribute is lost - for example, going via a DOM. Even when you use the doc() function in Saxon, the way the input tree is built depends on many factors including configuration options and your URIResolver.
I see you are using .NET. When Saxon uses the Microsoft XML parser on .NET, it doesn't know which attributes are IDs and IDREFs, so the id() and idref() functions don't work (the MS parser simply doesn't supply this information). You therefore need to use the JAXP parser (Xerces) that comes with the Saxon product. I think this is the default these days.
So not really an answer, but hopefully some background that explains some of the things that can go wrong.
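If the IDREF typing is not available (no DTD, or a parser that drops it), one possible fallback is to match on the attribute value directly instead of calling idref(). The sketch below is only an illustration using Python's standard xml.etree.ElementTree on a trimmed copy of the sample; it is not what BaseX or Saxon do internally, and the variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of sample.xml from the question (no DTD attached).
SAMPLE = """<rootab>
  <l1>
    <items p="a">
      <itema x1="10" id="abc">testa</itema>
    </items>
  </l1>
  <l2>
    <items p="b">
      <itema x1="10" xidref="abc">testa</itema>
      <itemc x1="10" xidref="jgh">testc</itemc>
      <itemd x1="10" xidref="abc">testA101</itemd>
    </items>
  </l2>
</rootab>"""

root = ET.fromstring(SAMPLE)

# Plain attribute-value match. This is NOT idref(): it ignores attribute
# types entirely and only compares the string value of xidref.
referrers = root.findall(".//*[@xidref='abc']")
referrer_names = [e.tag for e in referrers]
print(referrer_names)  # ['itema', 'itemd']
```

The equivalent path in XPath or XQuery would be //*[@xidref = 'abc'], which works whether or not the attribute is declared as IDREF.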
There seems to be a difference in how XPath and XQuery select attributes.
Here is a toy example stolen from W3Schools:
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book category="cooking">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="children">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
</bookstore>
I want to retrieve the values of the lang attributes.
So, naively I do:
//title/@lang which works perfectly ... on an XPath evaluator but not on an XQuery evaluator.
What I need to know: How should I write my XPath expression to work on an XQuery evaluator?
What I want to know: What is going on?!
Here's the TL part:
I'm on a legacy platform without these capabilities so I send the XML and my query expression to an external service, which I believe is a Saxon-based XQuery evaluator. My syntax works as expected on CodeBeautify's XPath Tester.
I've also verified this difference on xpathtester.com: It works as expected in XPath mode but not in XQuery mode. (Note: link is not encrypted.).
xpathtester.com returns the following error message: ERROR - Cannot create an attribute node (lang) whose parent is a document node
The expression //title/@lang is valid under both XPath and XQuery, and returns a sequence of two attribute nodes.
Where you are seeing differences is in how different XPath and XQuery clients handle a result consisting of two attribute nodes.
If the tool tries to serialize the result as XML, it's going to fail, because XML serialization tries to construct a document node and attach the attributes to the document.
So you need to look at what options your XPath or XQuery tool provides for displaying the results.
After really having understood what @Martin Honnen and @Aaron were talking about and some deep dives in tutorials, I think I've come up with a solution that I'm happy with:
string-join((for $l in //title/@lang return string($l)) , ',')
https://xqueryfiddle.liberty-development.net/3Nzd8bR/2
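The same idea (turn the attribute nodes into strings before they are output) can be tried outside an XQuery tool. This is a rough sketch with Python's standard xml.etree.ElementTree on a shortened copy of the bookstore document; it only illustrates why strings are easier to output than bare attribute nodes and is not a Saxon-based evaluator.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the bookstore example from the question.
BOOKSTORE = """<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title>
  </book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE)

# Read each attribute as a plain string instead of asking for attribute nodes,
# so there is nothing that needs to be attached to a parent when printing.
langs = [title.get("lang") for title in root.findall(".//title")]

# Same shape as string-join(..., ',') in the XQuery solution.
joined = ",".join(langs)
print(joined)  # en,en
```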
I'm trying to write an XPath query to pull data from an XML document. Unfortunately the document has an XML fragment embedded in it that seems to have been escaped (< has become &lt;, > has become &gt;, etc.).
An example of the xml doc is:
<OrderData xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Id>1</Id>
<RawData>&lt;?xml version="1.0" encoding="UTF-16"?&gt;
&lt;Data xmlns="nnn-mmm-com"&gt;
&lt;Order Action="Remove" &gt;
&lt;Instrument InstID="1"/&gt;&lt;/Order&gt;
&lt;/Data&gt;
</RawData>
</OrderData>
I'm trying to extract the following values:
Id
Action
InstID
Getting the Id is no problem, but drilling into the fragment inside RawData is proving beyond me. Any pointers gratefully received
(I'm planning to execute the xpath query in Hive using Hive-XML-SerDe which is xpath 1.0)
Thanks
With XPath 3.1 you can parse the embedded XML document and turn it into a node tree, which you can then process using path expressions. So:
/OrderData/RawData/parse-xml(.)/*:Data/*:Order/*:Instrument/@InstID
should get what you want.
You didn't say what version of XPath your library supports, which usually means that it only supports 1.0, so you may need to find a different library.
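For comparison, the two-step idea (read the escaped text, then parse it as a second document) can be sketched outside XPath. The example below uses Python's standard library and assumes the fragment inside RawData is escaped and well-formed, with Instrument nested inside Order; it is not something Hive-XML-SerDe can run, and the variable names are invented for the illustration.

```python
import re
import xml.etree.ElementTree as ET

# Outer document with the inner fragment escaped (an assumed, well-formed
# version of the example in the question).
OUTER = """<OrderData>
  <Id>1</Id>
  <RawData>&lt;?xml version="1.0" encoding="UTF-16"?&gt;
&lt;Data xmlns="nnn-mmm-com"&gt;
&lt;Order Action="Remove"&gt;
&lt;Instrument InstID="1"/&gt;&lt;/Order&gt;
&lt;/Data&gt;
</RawData>
</OrderData>"""

outer = ET.fromstring(OUTER)
order_id = outer.findtext("Id")

# Step 1: the parser un-escapes &lt; and &gt;, so the text is raw XML source.
raw = outer.findtext("RawData").strip()
# Drop the inner XML declaration; its encoding no longer applies to a str.
fragment = re.sub(r"^<\?xml.*?\?>\s*", "", raw)

# Step 2: parse the fragment as its own document and query it.
inner = ET.fromstring(fragment)
ns = {"d": "nnn-mmm-com"}  # the default namespace of the fragment
order = inner.find("d:Order", ns)
action = order.get("Action")
inst_id = order.find("d:Instrument", ns).get("InstID")

print(order_id, action, inst_id)  # 1 Remove 1
```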
I want to use XPath to select the sub tree containing the <name>-tag with "ABC" and not the other one from the following xml. Is this possible? And as a minor question, which keywords would I use to find something like that over Google (e.g. for selecting the sub tree by an attribute I would have the terminology for)?
<root>
<operation>
<name>ABC</name>
<description>Description 1</description>
</operation>
<operation>
<name>DEF</name>
<description>Description 2</description>
</operation>
</root>
Use:
/*/operation[name='ABC']
For your second question: I strongly recommend not to rely on online sources (there are some that aren't so good) but to read a good book on XPath.
See some resources listed here:
https://stackoverflow.com/questions/339930/any-good-xslt-tutorial-book-blog-site-online/341589#341589
For your first question, I think a more accurate way to do it would be: //operation[./name[text()='ABC']]. And according to this, we can also make it: //operation[./name[text()[.='ABC']]]
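To try the child-value predicate quickly, here is a small sketch using Python's standard xml.etree.ElementTree, which supports a limited XPath subset that includes the [name='ABC'] form. It is only meant as a way to experiment with the expression, not as a full XPath engine.

```python
import xml.etree.ElementTree as ET

# The document from the question.
OPS = """<root>
  <operation>
    <name>ABC</name>
    <description>Description 1</description>
  </operation>
  <operation>
    <name>DEF</name>
    <description>Description 2</description>
  </operation>
</root>"""

root = ET.fromstring(OPS)

# Select <operation> elements that have a <name> child whose text is 'ABC'.
matches = root.findall("operation[name='ABC']")
descriptions = [op.findtext("description") for op in matches]
print(descriptions)  # ['Description 1']
```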
This is my first post here. I have just started working with Ruby and am using REXML for some XML handling. I present a small sample of my xml file here:
<record>
<header>
<identifier>oai:lcoa1.loc.gov:loc.gmd/g3195.ct000379</identifier>
<datestamp>2004-08-13T15:32:50Z</datestamp>
<setSpec>gmd</setSpec>
</header>
<metadata>
<titleInfo>
<title>Meet-konstige vertoning van de grote en merk-waardige zons-verduistering</title>
</titleInfo>
</metadata>
</record>
My objective is to match the last numerical value in the identifier tag with a list of values that I have from an array. I have achieved this with the following code snippet:
ids = XPath.match(xmldoc, "//identifier[text()='oai:lcoa1.loc.gov:loc.gmd/"+mapid+"']")
Having got a particular identifier that I wish to investigate, I now want to go back up to its header, select the sibling metadata, and then select titleInfo to get the value in the title node for that particular identifier.
I have looked at the XPath tutorials and expressions and many of the related questions on this website as well and learnt about axes and the different concepts such as ancestor/following sibling etc. However, I am really confused and cannot figure this out easily.
I was wondering if I could get any help or if someone could point me towards an online resource "easy" to read.
Thank you.
UPDATE:
I have been trying various combinations of code such as:
idss = XPath.match(xmldoc, "//identifier[text()='oai:lcoa1.loc.gov:loc.gmd/"+mapid+"']/parent::header/following-sibling::metadata/child::mods/child::titleInfo/child::title")
The code compiles but does not output anything. I am wondering what I am doing so wrong.
Here's a way to accomplish it using XPath, then going up to the record, then XPath to get the title:
require 'rexml/document'
include REXML
xml=<<END
<record>
<header>
<identifier>oai:lcoa1.loc.gov:loc.gmd/g3195.ct000379</identifier>
<datestamp>2004-08-13T15:32:50Z</datestamp>
<setSpec>gmd</setSpec>
</header>
<metadata>
<titleInfo>
<title>Meet-konstige</title>
</titleInfo>
</metadata>
</record>
END
doc=Document.new(xml)
mapid = "ct000379"
text = "oai:lcoa1.loc.gov:loc.gmd/g3195.#{mapid}"
identifier_nodes = XPath.match(doc, "//identifier[text()='#{text}']")
record_node = identifier_nodes.first.parent.parent
record_node.elements['metadata/titleInfo/title'].text
=> "Meet-konstige"
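The same "find the identifier, go up to the record, come back down to the title" walk can be written without parent navigation by testing each record in turn. The sketch below is a Python translation of that idea using the standard xml.etree.ElementTree, not REXML; the wrapping records element and the variable names are assumptions added for the example.

```python
import xml.etree.ElementTree as ET

# One record from the question, wrapped in a hypothetical <records> root.
XML = """<records>
  <record>
    <header>
      <identifier>oai:lcoa1.loc.gov:loc.gmd/g3195.ct000379</identifier>
      <datestamp>2004-08-13T15:32:50Z</datestamp>
      <setSpec>gmd</setSpec>
    </header>
    <metadata>
      <titleInfo>
        <title>Meet-konstige</title>
      </titleInfo>
    </metadata>
  </record>
</records>"""

root = ET.fromstring(XML)
mapid = "ct000379"
wanted = f"oai:lcoa1.loc.gov:loc.gmd/g3195.{mapid}"

# Check each record's identifier, then read the title from the same record.
titles = [
    record.findtext("metadata/titleInfo/title")
    for record in root.iter("record")
    if record.findtext("header/identifier") == wanted
]
print(titles)  # ['Meet-konstige']
```

Note that the path metadata/titleInfo/title has no mods step, which matches the sample shown in the question.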
I want to query nodes from a XOM document which contains certain value but case insensitive. Something like this:
doc.query('/root/book[contains(.,"case-insentive-string")]')
But contains is case sensitive.
1. I tried to use regexes, but that is only XPath 2.0 and XOM does not seem to support it.
2. I tried contains(translate(."ABCEDF...","abcdef..."),"case-insentive-string")]' failed too.
3. I tried to match subnodes and read parent attributes using getParent, but there is no method to read the parent's attributes.
Any suggestions ?
If you are using XOM, then you can use Saxon to run XPath or XQuery against it. That gives you the ability to use the greatly increased function library in XPath 2.0, which includes functions lower-case() and upper-case(), and also the ability (though in a somewhat product-specific way) to choose your own collations for use with functions such as contains() - which means you can do matching that ignores accents as well as case, for example.
2.I tried contains(translate(."ABCEDF...","abcdef..."),"case-insentive-string")]'
failed too.
The proper way to write this is:
/root/book[contains(translate(., $vUpper, $vLower),
translate($vCaseInsentiveString, $vUpper, $vLower)
)
]
where $vUpper and $vLower are defined as (should be substituted by) the strings:
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
and
'abcdefghijklmnopqrstuvwxyz'
and $vCaseInsentiveString is defined as (should be substituted by) the specific case-insensitive string.
For example, given the following XML document:
<authors>
<author>
<name>Victor Hugo &amp; Co.</name>
<nationality>French</nationality>
</author>
<author period="classical" category="children">
<name>J.K.Rollings</name>
<nationality>British</nationality>
</author>
<author period="classical">
<name>Sophocles</name>
<nationality>Greek</nationality>
</author>
<author>
<name>Leo Tolstoy</name>
<nationality>Russian</nationality>
</author>
<author>
<name>Alexander Pushkin</name>
<nationality>Russian</nationality>
</author>
<author period="classical">
<name>Plato</name>
<nationality>Greek</nationality>
</author>
</authors>
the following XPath expression (substitute the variables by the corresponding strings):
/*/author/name
[contains(translate(., $vUpper, $vLower),
translate('lEo', $vUpper, $vLower)
)
]
selects this element:
<name>Leo Tolstoy</name>
Explanation: Both arguments of the contains() function are converted to lower-case, and then the comparison is performed.
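The lower-casing trick can be checked outside XPath as well. The sketch below mirrors translate(., $vUpper, $vLower) with Python's str.translate and filters the name elements in Python code, because the standard xml.etree.ElementTree does not implement the XPath translate() function. The helper name contains_ignore_case is made up for this example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the authors document from the answer.
AUTHORS = """<authors>
  <author><name>Victor Hugo &amp; Co.</name></author>
  <author><name>J.K.Rollings</name></author>
  <author><name>Sophocles</name></author>
  <author><name>Leo Tolstoy</name></author>
  <author><name>Alexander Pushkin</name></author>
  <author><name>Plato</name></author>
</authors>"""

UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWER = "abcdefghijklmnopqrstuvwxyz"
# Character-by-character mapping, like XPath translate(s, UPPER, LOWER).
TO_LOWER = str.maketrans(UPPER, LOWER)


def contains_ignore_case(haystack: str, needle: str) -> bool:
    """Lower-case both sides first, then do a plain substring test."""
    return needle.translate(TO_LOWER) in haystack.translate(TO_LOWER)


root = ET.fromstring(AUTHORS)
matches = [n.text for n in root.iter("name") if contains_ignore_case(n.text, "lEo")]
print(matches)  # ['Leo Tolstoy']
```

Like the XPath version, this only folds the 26 ASCII letters; accented characters are left untouched.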