Nested XPath Query

Yay. Accounting :|
I've got a set of accounting entries; they come in pairs -- 1 debit & 1 credit. The two entries share the same <SequenceID>. I want both entries if either of the entries references account 1111.
The (non-working) query I'm using (which I loosely based on "XPath. Select nodes based on an other, related node") is:
GLPostings/GLTransaction[GLPostings/GLTransaction[AccountCode = '1111']/SequenceID = SequenceID]
but I'm getting "empty sequence returned".
If I test part of the query: GLPostings/GLTransaction[AccountCode = '1111']/SequenceID I get multiple SequenceIDs as expected. So... how do I turn those multiple SequenceIDs into the set of nodes I'm after?
Here's some test data:
<?xml version="1.0" encoding="UTF-8"?>
<GLPostings>
<GLTransaction RowNumber="1">
<CRDR>Dr</CRDR>
<SequenceID>616</SequenceID>
<AccountCode>5531</AccountCode>
</GLTransaction>
<GLTransaction RowNumber="2">
<CRDR>Cr</CRDR>
<SequenceID>616</SequenceID>
<AccountCode>2118</AccountCode>
</GLTransaction>
<GLTransaction RowNumber="3">
<CRDR>Dr</CRDR>
<SequenceID>617</SequenceID>
<AccountCode>1111</AccountCode>
</GLTransaction>
<GLTransaction RowNumber="4">
<CRDR>Cr</CRDR>
<SequenceID>617</SequenceID>
<AccountCode>1234</AccountCode>
</GLTransaction>
<GLTransaction RowNumber="5">
<CRDR>Dr</CRDR>
<SequenceID>618</SequenceID>
<AccountCode>1231</AccountCode>
</GLTransaction>
<GLTransaction RowNumber="6">
<CRDR>Cr</CRDR>
<SequenceID>618</SequenceID>
<AccountCode>1231</AccountCode>
</GLTransaction>
<GLTransaction RowNumber="7">
<CRDR>Dr</CRDR>
<SequenceID>619</SequenceID>
<AccountCode>2341</AccountCode>
</GLTransaction>
<GLTransaction RowNumber="8">
<CRDR>Cr</CRDR>
<SequenceID>619</SequenceID>
<AccountCode>1111</AccountCode>
</GLTransaction>
</GLPostings>
What I'd like to get back is:
<GLTransaction RowNumber="3">
<CRDR>Dr</CRDR>
<SequenceID>617</SequenceID>
<AccountCode>1111</AccountCode>
</GLTransaction>
<GLTransaction RowNumber="4">
<CRDR>Cr</CRDR>
<SequenceID>617</SequenceID>
<AccountCode>1234</AccountCode>
</GLTransaction>
<GLTransaction RowNumber="7">
<CRDR>Dr</CRDR>
<SequenceID>619</SequenceID>
<AccountCode>2341</AccountCode>
</GLTransaction>
<GLTransaction RowNumber="8">
<CRDR>Cr</CRDR>
<SequenceID>619</SequenceID>
<AccountCode>1111</AccountCode>
</GLTransaction>
Any hints greatly appreciated.
EDIT:
I can solve the problem so:
<xsl:for-each select="/GLPostings/GLTransaction[AccountCode = '1111']/SequenceID">
<xsl:variable name="Seq" select="."/>
<xsl:for-each select="/GLPostings/GLTransaction[SequenceID = $Seq]">
<xsl:call-template name="output-row">
</xsl:call-template>
</xsl:for-each>
</xsl:for-each>
But it seems kind of... dirty.
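For comparison, here is a minimal Python sketch of the same two-step idea (collect the SequenceIDs of the 1111 entries, then keep every entry that carries one of them), using only the standard library's xml.etree.ElementTree. The helper name `pair_entries_for_account` is hypothetical, and the sample is a compacted copy of the test data above. In XPath itself, the original query comes back empty because the inner path GLPostings/GLTransaction[...] is evaluated relative to the GLTransaction being tested. Making it absolute, as in /GLPostings/GLTransaction[SequenceID = /GLPostings/GLTransaction[AccountCode = '1111']/SequenceID], should work, since in XPath 1.0 an = comparison between two node-sets is true if any pair of nodes has equal string values.

```python
import xml.etree.ElementTree as ET

# Compacted copy of the test data from the question.
XML = """<GLPostings>
<GLTransaction RowNumber="1"><CRDR>Dr</CRDR><SequenceID>616</SequenceID><AccountCode>5531</AccountCode></GLTransaction>
<GLTransaction RowNumber="2"><CRDR>Cr</CRDR><SequenceID>616</SequenceID><AccountCode>2118</AccountCode></GLTransaction>
<GLTransaction RowNumber="3"><CRDR>Dr</CRDR><SequenceID>617</SequenceID><AccountCode>1111</AccountCode></GLTransaction>
<GLTransaction RowNumber="4"><CRDR>Cr</CRDR><SequenceID>617</SequenceID><AccountCode>1234</AccountCode></GLTransaction>
<GLTransaction RowNumber="5"><CRDR>Dr</CRDR><SequenceID>618</SequenceID><AccountCode>1231</AccountCode></GLTransaction>
<GLTransaction RowNumber="6"><CRDR>Cr</CRDR><SequenceID>618</SequenceID><AccountCode>1231</AccountCode></GLTransaction>
<GLTransaction RowNumber="7"><CRDR>Dr</CRDR><SequenceID>619</SequenceID><AccountCode>2341</AccountCode></GLTransaction>
<GLTransaction RowNumber="8"><CRDR>Cr</CRDR><SequenceID>619</SequenceID><AccountCode>1111</AccountCode></GLTransaction>
</GLPostings>"""


def pair_entries_for_account(root, account):
    """Return every GLTransaction whose SequenceID is shared with an entry for `account`."""
    # Step 1 (like the outer xsl:for-each): SequenceIDs of entries posted to the account.
    wanted = {seq.text for seq in root.findall(f"GLTransaction[AccountCode='{account}']/SequenceID")}
    # Step 2 (like the inner xsl:for-each): entries carrying one of those IDs, in document order.
    return [t for t in root.findall("GLTransaction") if t.findtext("SequenceID") in wanted]


root = ET.fromstring(XML)
print([t.get("RowNumber") for t in pair_entries_for_account(root, "1111")])  # ['3', '4', '7', '8']
```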

/GLPostings/GLTransaction[AccountCode=1111][SequenceID[.=following::SequenceID or .=preceding::SequenceID]]
will get GLTransaction nodes whose AccountCode child equals 1111 and whose SequenceID child is equal to a preceding or following SequenceID node.
/GLPostings/GLTransaction[SequenceID[.=following::SequenceID[./following-sibling::AccountCode=1111] or .=preceding::SequenceID[./following-sibling::AccountCode=1111]]]
will get GLTransaction nodes whose SequenceID child is equal to a preceding or following SequenceID node that has a following-sibling AccountCode equal to 1111.
Combining these XPaths with a union gives:
/GLPostings/GLTransaction[AccountCode=1111][SequenceID[.=following::SequenceID or .=preceding::SequenceID]]|/GLPostings/GLTransaction[SequenceID[.=following::SequenceID[./following-sibling::AccountCode=1111] or .=preceding::SequenceID[./following-sibling::AccountCode=1111]]]
will get you your 4 nodes (tested on xpathtester.com)

EDIT: Revisited XPath
//AccountCode[.="1111"]/parent::*|//AccountCode[following::AccountCode[1]="1111" and following::SequenceID[1]=preceding::SequenceID[1]]/parent::*|//AccountCode[preceding::AccountCode[1]="1111" and preceding::SequenceID[2]=preceding::SequenceID[1]]/parent::*
A safer option (in case a SequenceID does not appear in two consecutive entries):
//AccountCode[.="1111"][following::SequenceID[1]=preceding::SequenceID[1]]/parent::*|//AccountCode[.="1111"][preceding::SequenceID[2]=preceding::SequenceID[1]]/parent::*|//AccountCode[following::AccountCode[1]="1111" and following::SequenceID[1]=preceding::SequenceID[1]]/parent::*|//AccountCode[preceding::AccountCode[1]="1111" and preceding::SequenceID[2]=preceding::SequenceID[1]]/parent::*
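The revisited expressions rely on the two halves of a pair being adjacent GLTransaction siblings. A minimal Python sketch of that neighbour check, again with the standard library's ElementTree, might look like this. The helper name and the small sample (which adds an unpaired 1111 entry as row 5) are hypothetical; like the safer variant is meant to, it skips a 1111 entry that has no adjacent partner.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: rows 1-2 and 3-4 are pairs; row 5 is an unpaired 1111 entry.
XML = """<GLPostings>
<GLTransaction RowNumber="1"><SequenceID>617</SequenceID><AccountCode>1111</AccountCode></GLTransaction>
<GLTransaction RowNumber="2"><SequenceID>617</SequenceID><AccountCode>1234</AccountCode></GLTransaction>
<GLTransaction RowNumber="3"><SequenceID>618</SequenceID><AccountCode>1231</AccountCode></GLTransaction>
<GLTransaction RowNumber="4"><SequenceID>618</SequenceID><AccountCode>1231</AccountCode></GLTransaction>
<GLTransaction RowNumber="5"><SequenceID>620</SequenceID><AccountCode>1111</AccountCode></GLTransaction>
</GLPostings>"""


def adjacent_pairs_for_account(root, account):
    """Select entries that form an adjacent pair (same SequenceID) with an entry for `account`."""
    rows = root.findall("GLTransaction")
    selected = []
    for i, row in enumerate(rows):
        seq = row.findtext("SequenceID")
        # Previous and next siblings that share this entry's SequenceID.
        partners = [rows[j] for j in (i - 1, i + 1)
                    if 0 <= j < len(rows) and rows[j].findtext("SequenceID") == seq]
        # Keep the entry only if it has a partner and either half posts to the account.
        if partners and any(e.findtext("AccountCode") == account for e in [row, *partners]):
            selected.append(row)
    return selected


root = ET.fromstring(XML)
print([r.get("RowNumber") for r in adjacent_pairs_for_account(root, "1111")])  # ['1', '2']
```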


SchemaTron rule to find invalid records

I am trying to validate the following XML using the Schematron rule.
XML:
<?xml version="1.0" encoding="utf-8"?>
<Biotic><Maul><Number>1</Number>
<Record><Code IDREF="a1"/>
<Detail><ItemID>1</ItemID></Detail>
<Detail><ItemID>3</ItemID></Detail>
</Record>
<Record><Code IDREF="b1"/>
<Detail><ItemID>3</ItemID></Detail>
<Detail><ItemID>4</ItemID></Detail>
</Record>
<Record><Code IDREF="b1"/>
<Detail><ItemID>4</ItemID></Detail>
<Detail><ItemID>6</ItemID></Detail>
</Record>
<Record><Code IDREF="c1"/>
<Detail><ItemID>5</ItemID></Detail>
<Detail><ItemID>5</ItemID></Detail>
</Record>
</Maul></Biotic>
And the check is "ItemID should be unique for the given Code within the given Maul."
So, as per the requirement, the Records with Code b1 are not valid, because ItemID 4 exists in both records.
Similarly, record c1 is also not valid, because c1 has two nodes with ItemID 5.
Record a1 is valid: ItemID 3 also exists in the next record, but that record's code is different.
Schematron rule I tried:
<?xml version="1.0" encoding="utf-8" ?><schema xmlns="http://purl.oclc.org/dsdl/schematron" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<title>Schematron validation rule</title>
<pattern id="P1">
<rule context="Maul/Record" id="R1">
<let name="a" value="//Detail/[./ItemID, ../Code/@IDREF]"/>
<let name="b" value="current()/Detail/[./ItemID, ../Code/@IDREF]"/>
<assert test="count($a[. = $b]) = count($b)">
ItemID should be unique for the given Code within the given Maul.
</assert>
</rule>
</pattern>
</schema>
The two let values seem problematic. They will each return Detail elements (with all of their content, including attributes, child elements, and text nodes). I'm not sure what the code inside the predicates [./ItemID, ../Code/@IDREF] is going to do, but I think it will return all Detail elements that have either a child ItemID element or a sibling Code element with an @IDREF attribute, regardless of the values of ItemID or @IDREF.
I would change the rule/@context to ItemID, so that the assert fails once for each ItemID that violates the constraint.
Here are a rule and assert that work correctly:
<?xml version="1.0" encoding="utf-8" ?><schema xmlns="http://purl.oclc.org/dsdl/schematron" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<title>Schematron validation rule</title>
<pattern id="P1">
<rule context="Maul/Record/Detail/ItemID" id="R1">
<assert test="count(ancestor::Maul/Record[Code/@IDREF = current()/ancestor::Record/Code/@IDREF]/Detail/ItemID[. = current()]) = 1">
ItemID should be unique for the given Code within the given Maul.
</assert>
</rule>
</pattern>
</schema>
The assert test finds, within the ancestor Maul, any Record that has a Code/#IDREF that equals the Code/#IDREF of the Record that the current ItemID is in. At minimum, it will find one Record (the one that the current ItemID is in). Then it looks for any Detail/ItemID within those Records that is equal to the current ItemID. It will find at least one (the current ItemID). The count function counts how many ItemIDs are found. If more than one is found, the assert fails.
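To sanity-check which records should fail, here is a minimal Python sketch (standard-library ElementTree only) that counts each (Code/@IDREF, ItemID) pair within a Maul and reports the pairs seen more than once. The helper name `duplicate_item_ids` is hypothetical. Note that it reports each duplicated pair once, whereas the Schematron assert above fails once per offending ItemID element.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The sample document from the question (XML declaration omitted).
XML = """<Biotic><Maul><Number>1</Number>
<Record><Code IDREF="a1"/>
<Detail><ItemID>1</ItemID></Detail>
<Detail><ItemID>3</ItemID></Detail>
</Record>
<Record><Code IDREF="b1"/>
<Detail><ItemID>3</ItemID></Detail>
<Detail><ItemID>4</ItemID></Detail>
</Record>
<Record><Code IDREF="b1"/>
<Detail><ItemID>4</ItemID></Detail>
<Detail><ItemID>6</ItemID></Detail>
</Record>
<Record><Code IDREF="c1"/>
<Detail><ItemID>5</ItemID></Detail>
<Detail><ItemID>5</ItemID></Detail>
</Record>
</Maul></Biotic>"""


def duplicate_item_ids(root):
    """Return (Maul Number, Code IDREF, ItemID) triples that occur more than once."""
    problems = []
    for maul in root.iter("Maul"):
        # Count (Code/@IDREF, ItemID) pairs within this Maul only.
        counts = Counter(
            (record.find("Code").get("IDREF"), item.text)
            for record in maul.findall("Record")
            for item in record.findall("Detail/ItemID")
        )
        number = maul.findtext("Number")
        problems.extend((number, code, item) for (code, item), n in counts.items() if n > 1)
    return problems


root = ET.fromstring(XML)
print(duplicate_item_ids(root))  # [('1', 'b1', '4'), ('1', 'c1', '5')]
```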
Thanks for the reference to https://www.liquid-technologies.com/online-schematron-validator! I wasn't aware of that tool.

XSLT Function Return Type

Originally: How to apply an XPath query to an XML variable typed as element()*
I wish to apply XPath queries to a variable passed to a function in XSLT 2.0.
Saxon returns this error:
Type error at char 6 in xsl:value-of/@select on line 13 column 50 of stackoverflow_test.xslt:
XTTE0780: Required item type of result of call to f:test is element(); supplied value has item type text()
This skeleton of a program is simplified but, by the end of its development, it is meant to pass an element tree to multiple XSLT functions. Each function will extract certain statistics and create reports from the tree.
When I say apply XPath queries, I mean I wish to have the query consider the base element in the variable... if you please... as if I could write {count(doc("My XSLT tree/element variable")/a[1])}.
Using Saxon HE 9.7.0.5.
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:f="f:f">
<xsl:template match="/root">
<xsl:variable name="first" as="element()*">
<xsl:copy-of select="(./a[1])" />
</xsl:variable>
<html>
<xsl:copy-of select="f:test($first)" />
</html>
</xsl:template>
<xsl:function name="f:test" as="element()*">
<xsl:param name="frstElem" as="element()*" />
<xsl:value-of select="count($frstElem/a)" />
<!-- or any XPath expression -->
</xsl:function>
</xsl:stylesheet>
Some example data
<root>
<a>
<b>
<c>hi</c>
</b>
</a>
<a>
<b>
<c>hi</c>
</b>
</a>
</root>
Possibly related question: How to apply xpath in xsl:param on xml passed as input to xml
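Applying XPath to the variable works the way you intended, except that you have passed an a element to the function, and the function is looking for an a child of that element; with your sample data, that returns an empty sequence.
If you want f:test() to return the number of a elements in the sequence that is the value of $frstElem, you can use something like
<xsl:value-of select="count($frstElem/self::a)" />
instead of using the (implicit) child:: axis.
Separately, xsl:value-of always constructs a text node, which does not match the function's declared return type element()*; that mismatch is what Saxon reports as XTTE0780. To return the count itself, declare the function with as="xs:integer" (binding xmlns:xs="http://www.w3.org/2001/XMLSchema" on the stylesheet) and write <xsl:sequence select="count($frstElem/self::a)"/>.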
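The difference between the implicit child:: axis and self:: can be mimicked in Python with the standard library's ElementTree, where `findall("a")` looks only at child elements. This is an analogy under that assumption, not how Saxon evaluates the stylesheet: the hypothetical list `frst_elem` plays the role of $frstElem holding one a element.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<root><a><b><c>hi</c></b></a><a><b><c>hi</c></b></a></root>")

# Like $first / $frstElem: a sequence holding the first <a> element itself.
frst_elem = [root.find("a")]

# $frstElem/a: the implicit child:: axis looks for <a> children of each item; there are none.
child_a = [child for elem in frst_elem for child in elem.findall("a")]

# $frstElem/self::a: keeps each item that is itself an <a> element.
self_a = [elem for elem in frst_elem if elem.tag == "a"]

print(len(child_a), len(self_a))  # 0 1
```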

xpath with multiple predicates equivalence

I was told that the following are not the same:
a[1][@attr="foo"]
a[@attr="foo"][1]
Can someone explain why that is the case?
Think of XPath expressions as defining a result set¹, a set of nodes that fulfil all the requirements stated in the XPath expression. The predicates of XPath expressions (the parts inside []) either have no effect on the result set or incrementally narrow it.
Put another way, in the following expression:
//xyz[@abc="yes"]
[@abc="yes"] reduces the result set defined to the left of it, by //xyz.
Note that, as Michael Kay has suggested, everything said below applies only to XPath expressions with at least one positional predicate. A positional predicate is either a number, such as [1], an expression that evaluates to a number, or an expression that contains position() or last().
If no positional predicate is present, the order of predicates in XPath expressions is not significant.
Consider the following simple input document:
<root>
<a attr="other"/>
<a attr="foo"/>
<a attr="other"/>
<a attr="foo"/>
</root>
As you can see, a[@attr = 'foo'] is not the first child element of root. If we apply
//a[1]
to this document, this will of course result in
<a attr="other"/>
Now, crucially, if we add another predicate to the expression, like so:
//a[1][@attr="foo"]
then [@attr="foo"] can only narrow the result set already defined by //a[1]. That result set contains no a[@attr="foo"], so the final result is empty.
On the other hand, if we start out with
//a[@attr="foo"]
the result will be
<a attr="foo"/>
<a attr="foo"/>
and in this case, if we add a second predicate:
//a[@attr="foo"][1]
the second predicate [1] can narrow down the result set of //a[@attr="foo"] to only contain the first of those nodes.
If you know XSLT, you might find an XSLT (and XPath 2.0) proof of this helpful:
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output method="xml" omit-xml-declaration="yes" encoding="UTF-8" indent="yes" />
<xsl:template match="/">
<result1>
<xsl:copy-of select="//a[1][@attr='foo']"/>
</result1>
<result2>
<xsl:copy-of select="//a[@attr='foo'][1]"/>
</result2>
</xsl:template>
</xsl:transform>
And the result will be
<result1/>
<result2>
<a attr="foo"/>
</result2>
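The same left-to-right narrowing can be modelled in Python. ElementTree implements only a subset of XPath, so this minimal sketch applies the predicates by hand: each predicate sees the node list produced by the one before it, and positions are renumbered after every filter. The helper `apply_predicates` and the two predicate functions are hypothetical; since every a in the sample document is a child of root, the child step from root stands in for //a here.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<root><a attr="other"/><a attr="foo"/><a attr="other"/><a attr="foo"/></root>')


def apply_predicates(nodes, *predicates):
    """Filter `nodes` through each predicate in turn, left to right.

    Each predicate is called as predicate(node, position), where position is the
    node's 1-based index in the list left by the previous predicate.
    """
    for predicate in predicates:
        nodes = [node for position, node in enumerate(nodes, start=1) if predicate(node, position)]
    return nodes


def first(node, position):  # [1]
    return position == 1


def attr_is_foo(node, position):  # [@attr='foo']
    return node.get("attr") == "foo"


step = root.findall("a")  # child::a of root, in document order
result1 = apply_predicates(step, first, attr_is_foo)  # a[1][@attr='foo']
result2 = apply_predicates(step, attr_is_foo, first)  # a[@attr='foo'][1]
print(len(result1), [node.get("attr") for node in result2])  # 0 ['foo']
```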
¹ Technically speaking, only XPath 1.0 calls this result a node-set. In XPath 2.0, the result is a sequence of nodes.

xpath with node(), how to express `node()[.//x]` condition?

I have an XPath expression that must match text and tags, except the tag <aa>, so
./node()[name()!='aa']
is the correct xpath.
But it is insufficient for cases where an aa tag is inside the node; I need something like
./node()[name()!='aa' and not(.//aa)]
but this XPath does not work (!).
NOTE
I used
./*[not(self::aa or .//aa)] | ./text()
but it loses the original order of the nodes. The problem is more evident when working with XSLT, for example:
<xsl:for-each select="./*[not(self::aa or .//aa)] | ./text()">
<xsl:copy-of select="."/>
</xsl:for-each>
does not work as expected (the order of the nodes is not preserved). When using ./node(), the order is always correct.
PS: in XSLT there is a solution that uses all of the XPaths explained above:
<xsl:for-each select="./node()[name()!='aa']">
<xsl:if test="not(.//aa)"><xsl:copy-of select="."/></xsl:if>
</xsl:for-each>
but the ideal, simplest one does not give the same result (when processing big and complex inputs):
<xsl:copy-of select="*[not(self::aa or .//aa)] | ./text()"/>
I'm imagining your file looks like:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<aa/>
<b>
<aa/>
</b>
<c>
<b>
<aa/>
</b>
</c>
<d/>
<e>
<b/>
</e>
</root>
Then the expression
//node()[not(descendant-or-self::aa)]
returns all nodes (including whitespace-only text nodes) that are not themselves an <aa> element and have no <aa> descendant. Children of <aa> are matched as well.
You'll probably want to do something like
<xsl:copy-of select="node()[not(descendant-or-self::aa)]"/>
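ElementTree does not expose text nodes as separate nodes, so a Python sketch of the same filter can only cover element children. Under that assumption, iterating over the children keeps document order, and `find(".//aa")` tests for an aa descendant. The helper name `children_without_aa` is hypothetical; the sample is the file imagined above.

```python
import xml.etree.ElementTree as ET

XML = """<root>
  <aa/>
  <b><aa/></b>
  <c><b><aa/></b></c>
  <d/>
  <e><b/></e>
</root>"""


def children_without_aa(parent):
    """Element children that are not <aa> and contain no <aa> descendant.

    Roughly *[not(descendant-or-self::aa)]; text nodes are not covered.
    """
    return [child for child in parent  # iteration follows document order
            if child.tag != "aa" and child.find(".//aa") is None]


root = ET.fromstring(XML)
print([child.tag for child in children_without_aa(root)])  # ['d', 'e']
```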

how to for every parent node select every not first child node in a tree with multiple parent nodes

Hi,
I think I've got a tricky question for XPath experts. There is a node structure like this:
A(1)-|
|-B(1)
|-B(2)
|-B(3)
A(2)-|
|-B(2.1)
|-B(2.2)
|-B(2.3)
...
How to, with a single XPath-expression, extract only the following nodes
A(1)-|
|-B(2)
|-B(3)
A(2)-|
|-B(2.2)
|-B(2.3)
...
That is for every parent node its first child element should be excluded.
I tried A/B[position() != 1] but this would filter out only B(1.1) and select B(2.1).
Thanks
This XPath expression (no preceding-sibling:: axis used):
/*/a/*[not(position()=1)]
when applied on this XML document:
<t>
<a>
<b11/>
<b12/>
<b13/>
</a>
<a>
<b21/>
<b22/>
<b23/>
</a>
</t>
selects the wanted nodes:
<b12 />
<b13 />
<b22 />
<b23 />
This can be verified with this XSLT transformation, producing the above result:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:copy-of select="/*/a/*[not(position()=1)]"/>
</xsl:template>
</xsl:stylesheet>
Tricky. You could select nodes that have preceding siblings:
A/B[preceding-sibling::*]
This will fail for the first element and succeed for the rest.
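Both answers depend on position (or preceding siblings) being counted per parent. In Python, using the standard library's ElementTree on the sample document from the first answer, the same selection is a slice of each a element's children. The helper name `all_but_first_child` is hypothetical.

```python
import xml.etree.ElementTree as ET

XML = """<t>
  <a><b11/><b12/><b13/></a>
  <a><b21/><b22/><b23/></a>
</t>"""


def all_but_first_child(root):
    """For every <a> under the root, return its element children except the first."""
    # list(parent)[1:] drops the first child of each parent separately, which is
    # what /*/a/*[not(position()=1)] and A/B[preceding-sibling::*] select.
    return [child for parent in root.findall("a") for child in list(parent)[1:]]


root = ET.fromstring(XML)
print([child.tag for child in all_but_first_child(root)])  # ['b12', 'b13', 'b22', 'b23']
```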
