getting XmlSearch to return siblings only, not children - xpath

I'm getting a SOAP response that looks like this:
<Activity>
<Id>A</Id>
<Subject>foo</Subject>
<Activity>Task</Activity>
</Activity>
<Activity>
<Id>B</Id>
<Subject>bar</Subject>
<Activity>Appointment</Activity>
</Activity>
<Activity>
<Id>C</Id>
<Subject>snafu</Subject>
<Activity>Task</Activity>
</Activity>
In ColdFusion, I was trying to parse out the Activity nodes with this:
<cfset arrMainNodes = XmlSearch(soapResponse, "//*[name()='Activity']") />
The problem is, instead of getting an array with three elements, I get an array with six: three parents and three children.
I can't for the life of me figure out the XPath statement that will find siblings only, and not children.
Please help.

Use:
//*[name()='Activity' and not(ancestor::*[name()='Activity' ])]
This selects all elements in the document, whose name is "Activity" and that do not have an ancestor with name "Activity".
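To sanity-check what that expression should return, here is a minimal Python sketch using only the standard library. ElementTree does not support the ancestor axis, so the same rule (an Activity with no Activity ancestor) is applied by hand. The <Response> wrapper element and the outermost() helper are assumptions added for illustration; they are not part of the original SOAP response.

```python
import xml.etree.ElementTree as ET

# The fragment from the question, wrapped in a hypothetical <Response> root
# so that it parses as a single document.
SOAP = """
<Response>
  <Activity><Id>A</Id><Subject>foo</Subject><Activity>Task</Activity></Activity>
  <Activity><Id>B</Id><Subject>bar</Subject><Activity>Appointment</Activity></Activity>
  <Activity><Id>C</Id><Subject>snafu</Subject><Activity>Task</Activity></Activity>
</Response>
"""

def outermost(element, name):
    """Collect descendants called `name` that have no ancestor called `name`."""
    found = []
    for child in element:
        if child.tag == name:
            # Stop descending: anything below already has a `name` ancestor.
            found.append(child)
        else:
            found.extend(outermost(child, name))
    return found

root = ET.fromstring(SOAP)
every_activity = root.findall(".//Activity")   # nested ones included
activities = outermost(root, "Activity")

print(len(every_activity))                     # 6, the count the question complains about
print([a.findtext("Id") for a in activities])  # ['A', 'B', 'C']
```

The first count reproduces the six-element result from the question; the second list contains only the three outer Activity elements.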

Related

SchemaTron rule to find invalid records

I am trying to validate the following XML using the Schematron rule.
XML:
<?xml version="1.0" encoding="utf-8"?>
<Biotic><Maul><Number>1</Number>
<Record><Code IDREF="a1"/>
<Detail><ItemID>1</ItemID></Detail>
<Detail><ItemID>3</ItemID></Detail>
</Record>
<Record><Code IDREF="b1"/>
<Detail><ItemID>3</ItemID></Detail>
<Detail><ItemID>4</ItemID></Detail>
</Record>
<Record><Code IDREF="b1"/>
<Detail><ItemID>4</ItemID></Detail>
<Detail><ItemID>6</ItemID></Detail>
</Record>
<Record><Code IDREF="c1"/>
<Detail><ItemID>5</ItemID></Detail>
<Detail><ItemID>5</ItemID></Detail>
</Record>
</Maul></Biotic>
And the check is "ItemID should be unique for the given Code within the given Maul."
So, per the requirement, the Records with Code b1 are not valid because ItemID 4 exists in both records.
Similarly, Record c1 is not valid because it has two nodes with ItemID 5.
Record a1 is valid: ItemID 3 also exists in the next record, but the Code is different.
Schematron rule I tried:
<?xml version="1.0" encoding="utf-8" ?><schema xmlns="http://purl.oclc.org/dsdl/schematron" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<title>Schematron validation rule</title>
<pattern id="P1">
<rule context="Maul/Record" id="R1">
<let name="a" value="//Detail/[./ItemID, ../Code/@IDREF]"/>
<let name="b" value="current()/Detail/[./ItemID, ../Code/@IDREF]"/>
<assert test="count($a[. = $b]) = count($b)">
ItemID should be unique for the given Code within the given Maul.
</assert>
</rule>
</pattern>
</schema>
The two let values seem problematic. They will each return a Detail element (and all of its content including attributes, child elements, and text nodes). I'm not sure what the code inside the predicates [./ItemID, ../Code/@IDREF] is going to do, but I think it will return all Detail elements that have either a child ItemID element or a sibling Code element with an @IDREF attribute, regardless of what the values of ItemID or @IDREF are.
I think I would change the rule/@context to ItemID, so the assert would fail once for each ItemID that violates the constraint.
Here are a rule and assert that work correctly:
<?xml version="1.0" encoding="utf-8" ?><schema xmlns="http://purl.oclc.org/dsdl/schematron" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<title>Schematron validation rule</title>
<pattern id="P1">
<rule context="Maul/Record/Detail/ItemID" id="R1">
<assert test="count(ancestor::Maul/Record[Code/@IDREF = current()/ancestor::Record/Code/@IDREF]/Detail/ItemID[. = current()]) = 1">
ItemID should be unique for the given Code within the given Maul.
</assert>
</rule>
</pattern>
</schema>
The assert test finds, within the ancestor Maul, any Record that has a Code/@IDREF that equals the Code/@IDREF of the Record that the current ItemID is in. At minimum, it will find one Record (the one that the current ItemID is in). Then it looks for any Detail/ItemID within those Records that is equal to the current ItemID. It will find at least one (the current ItemID). The count function counts how many ItemIDs are found. If more than one is found, the assert fails.
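The counting logic described above can be cross-checked outside Schematron. The following is only a rough Python sketch using the standard library, not a Schematron implementation; the violations() helper is a hypothetical name. It pairs each ItemID with the Code IDREF of its Record and reports every pair that occurs more than once within the Maul, which is when the assert would fail.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Sample document from the question (XML declaration omitted).
XML = """<Biotic><Maul><Number>1</Number>
<Record><Code IDREF="a1"/>
<Detail><ItemID>1</ItemID></Detail>
<Detail><ItemID>3</ItemID></Detail>
</Record>
<Record><Code IDREF="b1"/>
<Detail><ItemID>3</ItemID></Detail>
<Detail><ItemID>4</ItemID></Detail>
</Record>
<Record><Code IDREF="b1"/>
<Detail><ItemID>4</ItemID></Detail>
<Detail><ItemID>6</ItemID></Detail>
</Record>
<Record><Code IDREF="c1"/>
<Detail><ItemID>5</ItemID></Detail>
<Detail><ItemID>5</ItemID></Detail>
</Record>
</Maul></Biotic>"""

def violations(maul):
    """Return one (IDREF, ItemID) pair per ItemID node that is not unique for its Code."""
    pairs = [
        (record.find("Code").get("IDREF"), item.text)
        for record in maul.findall("Record")
        for item in record.findall("Detail/ItemID")
    ]
    counts = Counter(pairs)
    # Same test as the assert: a count other than 1 means a failure.
    return [pair for pair in pairs if counts[pair] > 1]

root = ET.fromstring(XML)
failed = violations(root.find("Maul"))
print(failed)  # [('b1', '4'), ('b1', '4'), ('c1', '5'), ('c1', '5')]
```

As with the rule whose context is ItemID, each offending ItemID node is reported once, so the b1 and c1 problems each show up twice. ItemID 3 is not reported because it appears under different Codes.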
Thanks for the reference to https://www.liquid-technologies.com/online-schematron-validator! I wasn't aware of that tool.

XPath to get parents with multiple children but only one type of child

I need an XPath (1.0) to get all parent nodes with multiple children but only one type of child (e.g., either <div> or <li> but not <div> and <li>). Any help? Thank you!
<doc>
<tom>
<janet />
</tom>
<dick>
<janet />
<jane />
</dick>
<harry>
<jane />
</harry>
</doc>
So for the above we should get tom and harry but not dick
Using the example as a reference, the following XPath 1.0 expression:
/doc/*[count(./*) = count(./*[name(.) = name(../*[1])])]
Will return all children of doc where the total number of children of that element equals the number of children with the same name as the first child of that element. Or, more simply put, all children have the same name aka 'type'.
However, the above will return nodes that have 0 or 1 children, so to restrict it to only those where there are multiple child nodes, we can use:
/doc/*[count(./*) = count(./*[name(.) = name(../*[1])]) and count(./*) > 1]
If you want to further restrict it so that all children have to be a certain element, for example jane, you could use: /doc/*[count(./*) = count(./*[name(.) = name(../*[1])]) and count(./*) > 1 and ./*[1] = ./jane[1]]
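To see how the two conditions behave, here is a small Python sketch of the same logic using the standard library (ElementTree has no count() or name(), so the comparison is done in plain Python). The single_type_parents() helper and the extra <sally> element are assumptions added for illustration; <sally> is there because the sample document has no parent with several children of one type.

```python
import xml.etree.ElementTree as ET

# Sample from the question plus a hypothetical <sally> with two <jane/> children.
DOC = """<doc>
  <tom><janet/></tom>
  <dick><janet/><jane/></dick>
  <harry><jane/></harry>
  <sally><jane/><jane/></sally>
</doc>"""

def single_type_parents(doc, require_multiple=False):
    """Names of children of `doc` whose own children all share one name."""
    result = []
    for parent in doc:
        tags = [child.tag for child in parent]
        # Mirrors count(*) = count(*[name(.) = name(../*[1])]);
        # true for a parent with no children, just like the XPath.
        same_type = all(tag == tags[0] for tag in tags)
        if same_type and (len(tags) > 1 or not require_multiple):
            result.append(parent.tag)
    return result

doc = ET.fromstring(DOC)
any_count = single_type_parents(doc)
multiple_only = single_type_parents(doc, require_multiple=True)
print(any_count)      # ['tom', 'harry', 'sally']
print(multiple_only)  # ['sally']
```

Note that with the added count(./*) > 1 condition, tom and harry drop out because each has only one child.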

Find string in NodeSet with XPath (Nokogiri)

I have this XML:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE pdf2xml SYSTEM "pdf2xml.dtd">
<pdf2xml>
<page number="1">
<text top="91">Rapport</text>
<text top="102">foo</text>
</page>
<page number="2">
<text top="91">Rapport</text>
<text top="102">bar</text>
</page>
<page number="3">
<text top="91">Rapport</text>
<text top="102">asdf</text>
</page>
</pdf2xml>
which I'm processing with:
require 'nokogiri'
doc = Nokogiri::XML(File.read("file.xml"))
pages = doc.xpath("//page")
nodeset = pages[0].xpath("./text") + pages[1].xpath("./text")
I want to find a node by string in nodeset, like this
irb(main):011:0> nodeset.at_xpath("//text[text()[contains(., 'bar')]]")
=> #<Nokogiri::XML::Element:0x3fea6a4821d4 name="text" attributes=[#<Nokogiri::XML::Attr:0x3fea6a482170 name="top" value="102">] children=[#<Nokogiri::XML::Text:0x3fea6a481cac "bar">]>
but I don't want to use //
I have managed to do this
irb(main):018:0> nodeset.at_xpath("text()[contains(., 'bar')]")
=> #<Nokogiri::XML::Text:0x3fea6a481cac "bar">
but I want the whole <text> node.
What should my xpath query on nodeset look like?
To select the parent of the current node you can use "..". For example,
/pdf2xml/page[1]
points to the first <page> node. If you want to select its parent again you can write
/pdf2xml/page[1]/..
This will select the <pdf2xml> node, which is the parent of <page>.
Along similar lines, you can use .. to select the parent node in your example.
For more information you can refer to this
Hope this helps.
Simpler than selecting the text() node and then selecting the parent node is to just select the node you want in the first place:
pages = doc.xpath("//page")
puts pages.xpath("text[contains(.,'bar')]")
#=> <text top="102">bar</text>
If it makes you feel better, you could alternatively explicitly test the text() child node of the text element instead of using the text equivalent for the element:
pages.xpath("text[contains(text(),'bar')]")
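For comparison, the same "select the element itself, not its text node" idea can be sketched outside Nokogiri. This is a Python standard-library version, not Nokogiri code; ElementTree has no contains(), so the substring test is done in Python. The DOCTYPE and XML declaration from the question are omitted here.

```python
import xml.etree.ElementTree as ET

XML = """<pdf2xml>
  <page number="1"><text top="91">Rapport</text><text top="102">foo</text></page>
  <page number="2"><text top="91">Rapport</text><text top="102">bar</text></page>
  <page number="3"><text top="91">Rapport</text><text top="102">asdf</text></page>
</pdf2xml>"""

root = ET.fromstring(XML)
pages = root.findall("page")

# Search only the <text> children of the first two pages, as in the question,
# and keep the whole element rather than its text node.
matches = [
    text
    for page in pages[:2]
    for text in page.findall("text")
    if "bar" in (text.text or "")
]

for text in matches:
    print(text.tag, text.attrib, text.text)
```

Each match is the full <text> element, so its attributes (for example top) are still available.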
I just discovered that
nodeset.at_xpath("../text[text()[contains(., 'bar')]]")
works too.
Edit: But I think this is slower than /...

How to select node which has a parent with some attributes

How to select node which has a parent with some attributes.
Eg: what is Xpath to select all expiration_time elements.
In the following XML, I get an error if the states elements have attributes; otherwise there are no problems.
Thanks
<lifecycle>
<states elem="0">
<expiration_time at="rib" zing="chack">08</expiration_time>
</states>
<states elem="1">
<expiration_time at="but">4:52</expiration_time>
</states>
<states elem="2">
<expiration_time at="ute">05:40:15</expiration_time>
</states>
<states elem="3">
<expiration_time>00:00:00</expiration_time>
</states>
</lifecycle>
states/expiration_time[../@elem = "0"]?
Use:
/*/*/expiration_time
This selects all expiration_time elements that are grand-children of the top-element of the XML document.
/*/*[@*]/expiration_time
This selects any expiration_time element whose parent has at least one attribute and is a child of the top element of the XML document.
/*/*[not(@*)]/expiration_time
This selects any expiration_time element whose parent has no attributes and is a child of the top element of the XML document.
/*/*[@elem = '2']/expiration_time
This selects any expiration_time element whose parent has an elem attribute with string value '2', where that parent is a child of the top element of the XML document.
This will give you all nodes having at least one attribute:
//*[count(./@*) > 0]
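A few of these selections can be tried with Python's standard library as a rough check. ElementTree supports only a subset of XPath: [@elem='2'] works, but @* does not, so the "has any attribute" test is written in plain Python here. Paths are relative to the root element, so /*/*/expiration_time becomes */expiration_time.

```python
import xml.etree.ElementTree as ET

XML = """<lifecycle>
  <states elem="0"><expiration_time at="rib" zing="chack">08</expiration_time></states>
  <states elem="1"><expiration_time at="but">4:52</expiration_time></states>
  <states elem="2"><expiration_time at="ute">05:40:15</expiration_time></states>
  <states elem="3"><expiration_time>00:00:00</expiration_time></states>
</lifecycle>"""

root = ET.fromstring(XML)  # root is <lifecycle>

# Equivalent of /*/*/expiration_time
all_times = [e.text for e in root.findall("*/expiration_time")]

# Equivalent of /*/*[@elem = '2']/expiration_time
elem_two = [e.text for e in root.findall("*[@elem='2']/expiration_time")]

# Equivalent of /*/*[@*]/expiration_time, with the attribute test done in Python
parent_has_attrs = [
    e.text
    for states in root
    if states.attrib
    for e in states.findall("expiration_time")
]

print(all_times)         # ['08', '4:52', '05:40:15', '00:00:00']
print(elem_two)          # ['05:40:15']
print(parent_has_attrs)  # all four, since every <states> has an elem attribute
```

With this sample, attributes on the states elements do not get in the way of selecting expiration_time; the predicate only narrows which parents are considered.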

Using XQuery/XPath to get the attribute value of an element's parent node

Given this xml document:
<?xml version="1.0" encoding="UTF-8"?>
<mydoc>
<foo f="fooattr">
<bar r="barattr1">
<baz z="bazattr1">this is the first baz</baz>
</bar>
<bar r="barattr2">
<baz z="bazattr2">this is the second baz</baz>
</bar>
</foo>
</mydoc>
that is being processed by this xquery:
let $d := doc('file:///Users/mark/foo.xml')
let $barnode := $d/mydoc/foo/bar/baz[contains(@z, '2')]
let $foonode := $barnode/../../@f
return $foonode
I get the following error:
"Cannot create an attribute node (f) whose parent is a document node".
It seems that the ../ operation is sort of removing the matching nodes from the rest of the document such that it thinks it's the document node.
I'm open to other approaches but the selection of the parent depends on the child attribute containing a certain sub-string.
Cheers!
The query you have written is selecting the attribute f. However, it is not legal to return an attribute node from an XQuery. The error is referring to the output document, which here contains just an attribute (although this error message is misleading, as technically there is no output document here; there is just an attribute node that is returned).
You probably wanted to return the value of the attribute rather than the attribute itself:
return data($foonode)
