SchemaTron rule to find invalid records - xpath

I am trying to validate the following XML using the Schematron rule.
XML:
<?xml version="1.0" encoding="utf-8"?>
<Biotic><Maul><Number>1</Number>
<Record><Code IDREF="a1"/>
<Detail><ItemID>1</ItemID></Detail>
<Detail><ItemID>3</ItemID></Detail>
</Record>
<Record><Code IDREF="b1"/>
<Detail><ItemID>3</ItemID></Detail>
<Detail><ItemID>4</ItemID></Detail>
</Record>
<Record><Code IDREF="b1"/>
<Detail><ItemID>4</ItemID></Detail>
<Detail><ItemID>6</ItemID></Detail>
</Record>
<Record><Code IDREF="c1"/>
<Detail><ItemID>5</ItemID></Detail>
<Detail><ItemID>5</ItemID></Detail>
</Record>
</Maul></Biotic>
And the check is "ItemID should be unique for the given Code within the given Maul."
So, per the requirement, the Records with Code b1 are not valid because ItemID 4 exists in both records.
Similarly, record c1 is not valid because it has two nodes with ItemID 5.
Record a1 is valid: ItemID 3 also exists in the next record, but the code is different.
Schematron rule I tried:
<?xml version="1.0" encoding="utf-8" ?><schema xmlns="http://purl.oclc.org/dsdl/schematron" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<title>Schematron validation rule</title>
<pattern id="P1">
<rule context="Maul/Record" id="R1">
<let name="a" value="//Detail/[./ItemID, ../Code/@IDREF]"/>
<let name="b" value="current()/Detail/[./ItemID, ../Code/@IDREF]"/>
<assert test="count($a[. = $b]) = count($b)">
ItemID should be unique for the given Code within the given Maul.
</assert>
</rule>
</pattern>
</schema>

The two let values seem problematic. They will each return a Detail element (and all of its content, including attributes, child elements, and text nodes). I'm not sure what the code inside the predicates [./ItemID, ../Code/@IDREF] is meant to do, but I think it will return all Detail elements that have either a child ItemID element or a sibling Code element with an @IDREF attribute, regardless of what the values of ItemID or @IDREF are.
I think I would change the rule/@context to ItemID, so the assert would fail once for each ItemID that violates the constraint.
Here are a rule and assert that work correctly:
<?xml version="1.0" encoding="utf-8" ?><schema xmlns="http://purl.oclc.org/dsdl/schematron" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<title>Schematron validation rule</title>
<pattern id="P1">
<rule context="Maul/Record/Detail/ItemID" id="R1">
<assert test="count(ancestor::Maul/Record[Code/@IDREF = current()/ancestor::Record/Code/@IDREF]/Detail/ItemID[. = current()]) = 1">
ItemID should be unique for the given Code within the given Maul.
</assert>
</rule>
</pattern>
</schema>
The assert test finds, within the ancestor Maul, any Record that has a Code/@IDREF that equals the Code/@IDREF of the Record that the current ItemID is in. At minimum, it will find one Record (the one that the current ItemID is in). Then it looks for any Detail/ItemID within those Records that is equal to the current ItemID. It will find at least one (the current ItemID). The count function counts how many ItemIDs are found. If more than one is found, the assert fails.
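As a cross-check of what the assert is counting, here is a minimal Python sketch of the same uniqueness rule using only the standard library. It is not Schematron and not part of the original answer; the function name duplicate_item_ids is made up for illustration, and the sample document is the one from the question with whitespace condensed.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XML = """<Biotic><Maul><Number>1</Number>
<Record><Code IDREF="a1"/><Detail><ItemID>1</ItemID></Detail><Detail><ItemID>3</ItemID></Detail></Record>
<Record><Code IDREF="b1"/><Detail><ItemID>3</ItemID></Detail><Detail><ItemID>4</ItemID></Detail></Record>
<Record><Code IDREF="b1"/><Detail><ItemID>4</ItemID></Detail><Detail><ItemID>6</ItemID></Detail></Record>
<Record><Code IDREF="c1"/><Detail><ItemID>5</ItemID></Detail><Detail><ItemID>5</ItemID></Detail></Record>
</Maul></Biotic>"""


def duplicate_item_ids(xml_text):
    """Return sorted (code, item_id) pairs that occur more than once within one Maul."""
    root = ET.fromstring(xml_text)
    duplicates = set()
    for maul in root.iter("Maul"):
        # Count every (Code/@IDREF, ItemID) combination inside this Maul.
        counts = Counter(
            (record.find("Code").get("IDREF"), item.text)
            for record in maul.findall("Record")
            for item in record.findall("Detail/ItemID")
        )
        # A count above 1 corresponds to the assert's count(...) = 1 failing.
        duplicates.update(pair for pair, n in counts.items() if n > 1)
    return sorted(duplicates)


violations = duplicate_item_ids(XML)
print(violations)
```

For the sample document this reports the pairs ('b1', '4') and ('c1', '5'), which matches the records the question identifies as invalid.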
Thanks for the reference to https://www.liquid-technologies.com/online-schematron-validator! I wasn't aware of that tool.

Related

xpath 1.0 set the value of non existent attribute

I have a xml document
<?xml version="1.0" encoding="UTF-16"?>
<APIDATA xmlns="api-com">
<ORDER EngineID="1" OrderID="66" OtherInfo="100"><INSTSPECIFIER InstID="27" SeqID="17"/>
</ORDER>
<ORDER EngineID="2" OrderID="67" OtherInfo="200"><INSTSPECIFIER InstID="28" SeqID="18"/>
</ORDER>
<ORDER EngineID="3" OrderID="68"><INSTSPECIFIER InstID="29" SeqID="19"/></ORDER>
</APIDATA>
How do I get the value of the OtherInfo attribute using XPath,
so that Null is returned when the attribute does not exist?
If I use the XPath /APIDATA/ORDER/@OtherInfo I get the output
100
200
But since OtherInfo is missing for OrderID 68, I want the output to be
100
200
0
There is a post here that is close to what I need, but I somehow can't get it to work:
Can I create a value for a missing tag in XPath?
Unfortunately, the approach in the answer to the linked question only works if a single value is to be returned from a given XML document (see below for a demo of this point). So, given the XML sample posted in this question, the short answer is that this can't be done using pure XPath 1.0.
If the XPath is used within a programming language, one possible approach is to use an XPath expression that always returns a node, for example /APIDATA/ORDER. Then, for each <ORDER> element returned, there are usually plenty of options for reading the OtherInfo attribute and providing a default value when the attribute is not found.
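A minimal sketch of that host-language approach, assuming Python and its standard xml.etree.ElementTree module (the question does not say which language is in use). The function name other_info_values and the prefix d are illustrative choices; the XML declaration is left out of the sample string.

```python
import xml.etree.ElementTree as ET

XML = """<APIDATA xmlns="api-com">
<ORDER EngineID="1" OrderID="66" OtherInfo="100"><INSTSPECIFIER InstID="27" SeqID="17"/></ORDER>
<ORDER EngineID="2" OrderID="67" OtherInfo="200"><INSTSPECIFIER InstID="28" SeqID="18"/></ORDER>
<ORDER EngineID="3" OrderID="68"><INSTSPECIFIER InstID="29" SeqID="19"/></ORDER>
</APIDATA>"""

# Map an arbitrary prefix to the document's default namespace.
NS = {"d": "api-com"}


def other_info_values(xml_text, default="0"):
    """Return OtherInfo for every ORDER, substituting `default` when it is absent."""
    root = ET.fromstring(xml_text)
    # Select the ORDER elements (always present), then read the attribute per element.
    return [order.get("OtherInfo", default) for order in root.findall("d:ORDER", NS)]


values = other_info_values(XML)
print(values)
```

The path selects elements rather than attributes, so every ORDER contributes exactly one entry and the default is applied in the host language rather than in XPath.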
Applying the approach from the linked post to your case results in the following XPath expression:
substring(concat("0", //ORDER[3]/@OtherInfo), 2 * number(boolean(//ORDER[3]/@OtherInfo)))
This XPath successfully returns 0 when applied to the 3rd <ORDER> element, which doesn't have an OtherInfo attribute; see the demo*: xpathtester, xpatheval
*The default namespace has been removed in the demo to simplify the XPath.
Implementation of the same approach in XSLT 1.0, as requested :
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:d="api-com">
<xsl:output method="text" indent="yes"/>
<xsl:template match="d:ORDER">
<xsl:value-of select="substring(concat('0', @OtherInfo), 2 * number(boolean(@OtherInfo)))"/>
</xsl:template>
</xsl:stylesheet>
output :
100
200
0
Demo : xpathtester

XPath Expression referencing a node

I am trying to reference a node in an expression. Take this simple example:
<?xml version="1.0" encoding="UTF-8" ?>
<homelist>
<homes>
<home>
<hname>house</hname>
<location>hell</location>
<url>wee</url>
<cID>1234</cID>
</home>
</homes>
<contacts>
<contactdetails cID="1234">
<cname>John Smith</cname>
<phone>0123234</phone>
<email>test@gmail.com</email>
</contactdetails>
</contacts>
</homelist>
I basically want to select a node if its value appears somewhere else in the tree.
For example, I want to display the url of homes that have the cID of John Smith. I tried this but it doesn't work. What is wrong with it?
homelist/homes/home[ancestor::homelist/contacts/contactdetails[cname="John Smith"]/url
"/homelist/homes/home[cID = /homelist/contacts/contactdetails[cname='John Smith']/@cID]/url"
You want to find the <home> whose <cID> child's text content equals that of the cID= attribute of the <contactdetails> whose <cname> contains 'John Smith', then return its <url> child.
Note that I've written this as an absolute path, from the root, since you didn't tell us what the context node was going to be for this XPath.
There are certainly other ways of writing the same concept; this is just the first one that occurred to me offhand.
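One of those other ways, sketched outside XPath: the same lookup done as a two-step join in Python with the standard xml.etree.ElementTree module. This is only an illustration of the logic; the function name urls_for_contact is hypothetical and the choice of Python is an assumption, since the question does not name a host language.

```python
import xml.etree.ElementTree as ET

XML = """<homelist>
  <homes>
    <home><hname>house</hname><location>hell</location><url>wee</url><cID>1234</cID></home>
  </homes>
  <contacts>
    <contactdetails cID="1234">
      <cname>John Smith</cname><phone>0123234</phone><email>test@gmail.com</email>
    </contactdetails>
  </contacts>
</homelist>"""


def urls_for_contact(xml_text, contact_name):
    """Return the url of every home whose cID matches a contact with the given cname."""
    root = ET.fromstring(xml_text)
    # Step 1: collect the cID attribute of each matching contactdetails element.
    contact_ids = {
        contact.get("cID")
        for contact in root.findall("contacts/contactdetails")
        if contact.findtext("cname") == contact_name
    }
    # Step 2: keep the homes whose cID child text is one of those ids.
    return [
        home.findtext("url")
        for home in root.findall("homes/home")
        if home.findtext("cID") in contact_ids
    ]


urls = urls_for_contact(XML, "John Smith")
print(urls)
```

With the sample document this yields the single url wee; the two steps correspond to the inner and outer predicates of the XPath.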
If you preferred to use ancestor or parent, you could say
"/homelist/homes/home[cID = ancestor::homelist/contacts/contactdetails[cname='John Smith']/@cID]/url"

Selecting a XML node with LINQ, and modifying

I've got the following XML:
<Config>
<Book>
<Name> Book Name #1 </Name>
<Available In>
<Country>US</Country>
<Country>Canada</Country>
</Available In>
</Book>
</Config>
I need to find all instances of Book which are available in a specific country, and then introduce a node underneath "Available In". My selection statement fails anytime I add the where statement:
XElement xmlFile = XElement.Load(xmlFileLocation);
var q = (from c in xmlFile.Elements("Book")
where c.Elements(Country).Value == "Canada"
select c;
.Value can't be resolved, and ToString() gives me the entire subnode in string form. I need to select all books in a particular country so that I can then update them all to include a new locale node, e.g.:
<Config>
<Book>
<Name> Book Name #1 </Name>
<Available In>
<Country>US</Country>
<Country>Canada</Country>
</Available In>
<LocaleIDs>
<LocaleID> 3066 </LocaleID>
</LocaleIDs>
</Book>
</Config>
Thanks for your help!
You're trying to use Value on the result of calling Elements which returns a sequence of elements. That's not going to work - it doesn't make any sense. You want to call it on a single element at a time.
Additionally, you're trying to look for direct children of Book, which ignores the Available In element, which isn't even a valid element name...
I suspect you want something like:
var query = xmlFile.Elements("Book")
    .Where(book => book.Descendants("Country")
        .Any(country => (string) country == "Canada"));
In other words, find Book elements where any of the descendant Country elements has a text value of "Canada".
You'll still need to fix your XML to use valid element names though...

Using XQuery/XPath to get the attribute value of an element's parent node

Given this xml document:
<?xml version="1.0" encoding="UTF-8"?>
<mydoc>
<foo f="fooattr">
<bar r="barattr1">
<baz z="bazattr1">this is the first baz</baz>
</bar>
<bar r="barattr2">
<baz z="bazattr2">this is the second baz</baz>
</bar>
</foo>
</mydoc>
that is being processed by this xquery:
let $d := doc('file:///Users/mark/foo.xml')
let $barnode := $d/mydoc/foo/bar/baz[contains(@z, '2')]
let $foonode := $barnode/../../@f
return $foonode
I get the following error:
"Cannot create an attribute node (f) whose parent is a document node".
It seems that the ../ operation is sort of removing the matching nodes from the rest of the document such that it thinks it's the document node.
I'm open to other approaches but the selection of the parent depends on the child attribute containing a certain sub-string.
Cheers!
The query you have written selects the attribute f. However, a standalone attribute node cannot be serialized as the result of an XQuery. The error refers to the output document, which here would contain just an attribute (the message is misleading, since technically there is no output document here, only an attribute node being returned).
You probably wanted to return the value of the attribute rather than the attribute itself
return data($foonode)

XPath 1 query

I posted a similar question and I got a very useful reply.
Now the question is a little different, so I post it.
I specify it is an XPath 1 related question.
This is the content of my XML file:
<?xml version="1.0" encoding="ISO-8859-1"?>
<mainNode>
<subNode>
<h hm="08:45">
<store id="1563">Open</store>
</h>
<h hm="13:00">
<store id="1045">Open</store>
<store id="763">Open</store>
<store id="1047">Open</store>
</h>
<h hm="16:30">
<store id="1045">Open</store>
<store id="763">Open</store>
<store id="1047">Open</store>
</h>
<h hm="20:00">
<store id="1045">Open</store>
<store id="1043">Open</store>
<store id="1052">Open</store>
</h>
<h hm="22:00">
<store id="763">Open</store>
<store id="1052">Open</store>
</h>
</subNode>
</mainNode>
My program gets the current time: if it is 12.40, I must retrieve all the store ids of the next h hm (13.00). This issue has been solved.
After retrieving the data, with a second XPath query, I must find out until when, during the current day (which the XML file represents), a single store will be open.
So, imagine the store is the one identified by id=1045 and it's now 12.40. This store will close at 20.00 because it is found in the h hm=13.00 subnode, the h hm=16.30 subnode, and the h hm=20.00 subnode. So, I must get that 20.00.
Case 2: it's 12.40 and I must know when 763 will close. It will close at 16.30, even though it is also included in the last node (h hm=22.00). So, I must get that 16.30.
Is this possible?
Thanks in advance.
Here is how such an XPath expression can be constructed.
The following XPath expression selects the wanted result:
($vOpen[not(count(following-sibling::h[1] | $vOpen)
=
count($vOpen))
][1]/@hm
|
$vOpen[last()]/@hm
)
[1]
where $vOpen
is defined as:
$vge1240[store/@id=$vId]
and $vge1240 is defined as:
/*/*/h[translate(@hm,':','') >= 1240]
and $vId is defined as:
1045
or
763
The above variables may be defined and referenced within an XSLT stylesheet or, if XPath is embedded in another host, then each variable reference must be substituted with the right-hand-side of the variable definition. In this case the complete XPath expression will be:
(/*/*/h[translate(@hm,':','') >= 1240][store/@id=763]
[not(count(following-sibling::h[1]
|
/*/*/h[translate(@hm,':','') >= 1240][store/@id=763]
)
=
count(/*/*/h[translate(@hm,':','') >= 1240][store/@id=763]))
]
[1]
/@hm
|
/*/*/h[translate(@hm,':','') >= 1240][store/@id=763]
[last()]
/@hm
)
[1]
Explanation:
($vOpen[not(count(following-sibling::h[1] | $vOpen)
=
count($vOpen))
][1]/@hm
|
$vOpen[last()]/@hm
)
[1]
means the following:
1. From all entries in the "Open" hours that contain the id (763)
2. Take those, whose immediate following sibling does not belong to that set (closed or not containing 763)
3. From those take the first one.
4. Take the first one (in document order) from the node selected in step 3. above and the last element in $vOpen. This will select the last entry in the "Open" hours if all entries in it contain the given Id.
Here we essentially use the Kayessian method for the intersection of two node-sets $ns1 and $ns2:
$ns1[count(. | $ns2) = count($ns2)]
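For readers who find the set-based expression hard to follow, here is a procedural sketch of the same selection in Python, using only the standard xml.etree.ElementTree module. It is an illustration of the four steps above, not the XPath itself; the function name closing_time is made up, and times are assumed to use the hh:mm form shown in the question.

```python
import xml.etree.ElementTree as ET

XML = """<mainNode><subNode>
<h hm="08:45"><store id="1563">Open</store></h>
<h hm="13:00"><store id="1045">Open</store><store id="763">Open</store><store id="1047">Open</store></h>
<h hm="16:30"><store id="1045">Open</store><store id="763">Open</store><store id="1047">Open</store></h>
<h hm="20:00"><store id="1045">Open</store><store id="1043">Open</store><store id="1052">Open</store></h>
<h hm="22:00"><store id="763">Open</store><store id="1052">Open</store></h>
</subNode></mainNode>"""


def closing_time(xml_text, store_id, now):
    """Return the hm that ends the first unbroken run of slots (at or after `now`) listing store_id."""
    root = ET.fromstring(xml_text)
    now_key = int(now.replace(":", ""))
    # Slots at or after the current time, in document order (the $vge1240 set).
    later = [h for h in root.iter("h") if int(h.get("hm").replace(":", "")) >= now_key]
    result = None
    for h in later:
        has_store = any(s.get("id") == store_id for s in h.findall("store"))
        if has_store:
            result = h.get("hm")  # still open in this slot
        elif result is not None:
            break  # the run of consecutive slots has ended
    return result


print(closing_time(XML, "1045", "12:40"))
print(closing_time(XML, "763", "12:40"))
```

For the sample data and a current time of 12:40 this gives 20:00 for store 1045 and 16:30 for store 763, the two results the question asks for. A store that never appears after the given time yields None.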
I'll just repeat the last part of my answer to you in that last question you refer to.
It would be more pragmatic to load the XML into data structures that are more conducive to your requirements. This secondary question just reinforces the sensibleness of that advice.
I think this works. It finds the last consecutive occurrence (after the start time) of store/@id where @id = $id, which is what I believe you were looking for.
<xsl:variable name="id" select="'1045'"/>
<xsl:variable name="st" select="translate('12:40',':','')"/>
...
((//h[translate(@hm,':','') >= $st])[position() = count(preceding-sibling::*[store/@id=$id])+1 and store/@id=$id])[last()]/@hm
Explanation:
First, select all elements that have a start time greater than or equal to the provided start. Then, within those results, select the @hm attribute of the last element which has a position equal to the number of preceding elements that have the requested store/@id within them.
The one limitation of this short version is that it will fail if the store id does not occur within the first element after the start time. The version below fixes that limitation by starting with the first element after the start time that contains the proper store/@id, but it's a bit messier:
<xsl:variable name="id" select="'1045'"/>
<xsl:variable name="st" select="translate('12:40',':','')"/>
...
((//h[translate(@hm,':','') >= $st and position() >= count(//h[not(preceding-sibling::*[store/@id=$id])])])[position() = count(preceding-sibling::*[store/@id=$id])+1 and store/@id=$id])[last()]/@hm
If the store ID you're looking for is in $store, this will get the last h/@hm which contains that ID and is followed by an h which does not contain it, or the last h/@hm which contains that store ID (for stores which close in the last h/@hm).
//h[not(store/@id=$store)]/preceding-sibling::h[store/@id = $store][1]/@hm | //h[store/@id=$store][last()]/@hm
To test in XSLT with your example XML:
<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:variable name="store" select="763"/>
<xsl:template match="/">
<closes>
<xsl:value-of select="//h[not(store/@id=$store)]/preceding-sibling::h[store/@id = $store][1]/@hm | //h[store/@id=$store][last()]/@hm"/>
</closes>
</xsl:template>
</xsl:stylesheet>
Prints <closes>16:30</closes>. If you change $store to 1052, it prints <closes>22:00</closes> instead.

Resources