How to specify namespace when querying nodes with XPath?

Short Version
You do it in .NET with:
XmlNode.SelectNodes(query, selectionNamespaces);
Can you do it in javascript?
https://jsfiddle.net/9ctoxbh0/
Can you do it in msxml?
Attempt A:
IXMLDOMNode.selectNodes(query); //no namespaces option
Attempt B:
IXMLDOMNode.ownerDocument.setProperty("SelectionNamespaces", selectionNamespaces);
IXMLDOMNode.selectNodes(query); //doesn't work
Attempt C:
IXMLDOMDocument3 doc;
doc.setProperty("SelectionNamespaces", selectionNamespaces);
IXMLDOMNodeList list = doc.selectNodes(...)[0].selectNodes(query); //doesn't work
Long Version
Given an IXMLDOMNode containing a fragment of xml:
<row>
<cell>a</cell>
<cell>b</cell>
<cell>c</cell>
</row>
We can use the IXMLDOMNode.selectNodes method to select child elements:
IXMLDOMNode row = //...xml above
IXMLDOMNodeList cells = row.selectNodes("/row/cell");
and that will return an IXMLDOMNodeList:
<cell>a</cell>
<cell>b</cell>
<cell>c</cell>
And that's fine.
But namespaces break it
If the XML fragment originated from a document with a namespace, e.g.:
<row xmlns:ss="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<cell>a</cell>
<cell>b</cell>
<cell>c</cell>
</row>
The same XPath query will return nothing, because the elements row and cell do not exist; they are in another namespace.
Querying documents with default namespace
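If you had a full IXMLDOMDocument, you would use the setProperty method to set a selection namespace:
<row xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<cell>a</cell>
<cell>b</cell>
<cell>c</cell>
</row>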
You would query the default namespace by giving it a name, e.g.:
Before: xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
After: xmlns:peanut="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
and then you can query it:
IXMLDOMDocument3 doc = //...document xml above
doc.setProperty("SelectionNamespaces", "xmlns:peanut='http://schemas.openxmlformats.org/spreadsheetml/2006/main'");
IXMLDOMNodeList cells = doc.selectNodes("/peanut:row/peanut:cell");
and you get your cells:
<cell>a</cell>
<cell>b</cell>
<cell>c</cell>
But that doesn't work for a node
An IXMLDOMNode has a method to perform XPath queries:
selectNodes Method
Applies the specified pattern-matching operation to this node's context and returns the list of matching nodes as IXMLDOMNodeList.
HRESULT selectNodes(
BSTR expression,
IXMLDOMNodeList **resultList);
Remarks
For more information about using the selectNodes method with namespaces, see the setProperty Method topic.
But there's no way to specify Selection Namespaces when issuing an XPath query against a DOM Node.
How can I specify a namespace when querying nodes with XPath?
.NET Solution
.NET's XmlNode provides a SelectNodes method that accepts an XmlNamespaceManager parameter:
XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
ns.AddNamespace("peanut", "http://schemas.openxmlformats.org/spreadsheetml/2006/main");
cells = row.SelectNodes("/peanut:row/peanut:cell", ns);
But I'm not in C# (nor am I in JavaScript). What's the native msxml6 equivalent?
Edit: Me not so much with the Javascript (jsFiddle)
Complete Minimal Example
program Project3;
{$APPTYPE CONSOLE}
{$R *.res}
uses
  System.SysUtils, msxml, ActiveX;
procedure Main;
var
  s: string;
  doc: DOMDocument60;
  rows: IXMLDOMNodeList;
  row: IXMLDOMElement;
  cells: IXMLDOMNodeList;
begin
  s :=
    '<?xml version="1.0" encoding="UTF-16" standalone="yes"?>'+#13#10+
    '<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'+#13#10+
    '<row>'+#13#10+
    ' <cell>a</cell>'+#13#10+
    ' <cell>b</cell>'+#13#10+
    ' <cell>c</cell>'+#13#10+
    '</row>'+#13#10+
    '</worksheet>';
  doc := CoDOMDocument60.Create;
  doc.loadXML(s);
  if doc.parseError.errorCode <> 0 then
    raise Exception.CreateFmt('Parse error: %s', [doc.parseError.reason]);
  doc.setProperty('SelectionNamespaces', 'xmlns:ss="http://schemas.openxmlformats.org/spreadsheetml/2006/main"');
  //Query for all the rows
  rows := doc.selectNodes('/ss:worksheet/ss:row');
  if rows.length = 0 then
    raise Exception.Create('Could not find any rows');
  //Do stuff with the first row
  row := rows[0] as IXMLDOMElement;
  //Get the cells in the row
  (row.ownerDocument as IXMLDOMDocument3).setProperty('SelectionNamespaces', 'xmlns:ss="http://schemas.openxmlformats.org/spreadsheetml/2006/main"');
  cells := row.selectNodes('/ss:row/ss:cell');
  if cells.length <> 3 then
    raise Exception.CreateFmt('Did not find 3 cells in the first row (%d)', [cells.length]);
end;
begin
  try
    CoInitialize(nil);
    Main;
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.

This is answered on MSDN:
How To Specify Namespace when Querying the DOM with XPath
Update:
Note, however, that in your second example XML, the <row> and <cell> elements are NOT in the namespace being queried by the XPath when adding xmlns:peanut to the SelectionNamespaces property. That is why the <cell> elements are not being found.
To put them into the namespace properly, you would have to either:
change the namespace declaration to use xmlns= instead of xmlns:ss=:
<row xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<cell>a</cell>
<cell>b</cell>
<cell>c</cell>
</row>
use <ss:row> and <ss:cell> instead of <row> and <cell>:
<ss:row xmlns:ss="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<ss:cell>a</ss:cell>
<ss:cell>b</ss:cell>
<ss:cell>c</ss:cell>
</ss:row>
The SelectionNamespaces property does not magically put elements into a namespace for you; it only specifies which namespaces are available for the XPath query to use. The XML itself has to put elements into the proper namespaces as needed.
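The difference between the three declarations can be checked outside MSXML. The following is only a rough sketch in Python's standard xml.etree.ElementTree (not the MSXML API from the question); the constant and function names are made up for illustration. ElementTree reports a namespaced element's tag as {uri}localname, so it shows which elements actually ended up in the namespace:

```python
import xml.etree.ElementTree as ET

URI = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"

# Prefix declared but never used: the elements stay in no namespace.
UNUSED_PREFIX = f'<row xmlns:ss="{URI}"><cell>a</cell></row>'
# Default namespace declaration: unprefixed elements are in the namespace.
DEFAULT_NS = f'<row xmlns="{URI}"><cell>a</cell></row>'
# Prefix declared and used on every element.
USED_PREFIX = f'<ss:row xmlns:ss="{URI}"><ss:cell>a</ss:cell></ss:row>'


def expanded_names(xml_text):
    """Return the expanded name of every element, in document order."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


if __name__ == "__main__":
    for label, text in [("unused prefix", UNUSED_PREFIX),
                        ("default namespace", DEFAULT_NS),
                        ("used prefix", USED_PREFIX)]:
        print(label, expanded_names(text))
```

Only the second and third documents produce names that carry the namespace URI, which is why a prefixed XPath query can match them but not the first.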
Update:
In your new example, cells := row.selectNodes('/ss:row/ss:cell'); does not work because the XPath query is using an absolute path, where the leading / starts at the document root, and there are no <row> elements at the top of the XML document, only a <worksheet> element. That is why rows := doc.selectNodes('/ss:worksheet/ss:row'); works.
If you want to perform an XPath query that begins at the node being queried, don't use an absolute path, use a relative path instead:
cells := row.selectNodes('./ss:cell');
Or simply:
cells := row.selectNodes('ss:cell');
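For comparison, here is the same two-step lookup (rows first, then cells relative to one row) as a minimal sketch in Python's standard xml.etree.ElementTree. This is not MSXML: the prefix-to-URI mapping is passed on each call rather than set as a document property, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XML = """<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<row>
 <cell>a</cell>
 <cell>b</cell>
 <cell>c</cell>
</row>
</worksheet>"""

# The prefix "ss" is arbitrary; only the URI has to match the document.
NS = {"ss": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def cells_of_first_row(xml_text):
    """Select the first row, then its cells with a path relative to that row."""
    worksheet = ET.fromstring(xml_text)
    row = worksheet.find("ss:row", NS)
    return [cell.text for cell in row.findall("ss:cell", NS)]


def unprefixed_cells(xml_text):
    """The same relative query without a prefix finds nothing."""
    worksheet = ET.fromstring(xml_text)
    row = worksheet.find("ss:row", NS)
    return row.findall("cell")


if __name__ == "__main__":
    print(cells_of_first_row(XML))
    print(unprefixed_cells(XML))
```

The point carried over from the answer is that the second query is relative to the row element, so it names only the cell step.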

Related

Xpath sibling filter based on value of element in current node

Is there an Xpath to find a cousin node that has an element that matches the value of an element in the current node?
Please see below - I am iterating over each "Order" node and want to return the value of LocationID from the Collection node that has the same OrderLoadRef value as the order. For the first order it should return "AAA", for the second it should return "BBB".
The XPath works if I change the value of the OrderLoadRef manually, but how do I set it to be the value of the OrderLoadRef in the current Order element? I've tried using the self axis, but I think that by the time we get to the condition, "self" is the Collection node, not the Order?
I can't hard code relative collection / order node positions as there could be a variable number of these nodes in the XML that my parser receives.
XDocument xDoc = XDocument.Parse(@"<DocRoot>
<Load>
<Collections>
<Collection>
<OrderLoadRef>1</OrderLoadRef>
<LocationID>AAA</LocationID>
</Collection>
<Collection>
<OrderLoadRef>2</OrderLoadRef>
<LocationID>BBB</LocationID>
</Collection>
</Collections>
<Orders>
<Order>
<OrderRef>1521505</OrderRef>
<OrderLoadRef>1</OrderLoadRef>
</Order>
<Order>
<OrderRef>1521505_2</OrderRef>
<OrderLoadRef>2</OrderLoadRef>
</Order>
</Orders>
</Load>
</DocRoot>");
List<XElement> orders = xDoc.XPathSelectElements("//Order").ToList();
foreach(XElement order in orders)
{
string locationId = order.XPathSelectElement("parent::Orders/parent::Load/Collections/Collection[OrderLoadRef = {OrderLoadRef from current order element}]/LocationID").Value;
}
Edited to add: I need this to be a purely XPath solution as I'm not able to alter the C# code in the parser. More than happy to be told it's not possible, but wanted to make sure before I relayed the message!
As Mads said, XPath 3 and later (i.e. the current version 3.1) allows you to use a let expression so e.g.
for $order in /DocRoot/Load/Orders/Order
return
let $col := /DocRoot/Load/Collections/Collection[OrderLoadRef = $order/OrderLoadRef]/LocationID
return $col
is pure XPath 3 and returns (for your sample) the two LocationID elements:
<LocationID>AAA</LocationID>
<LocationID>BBB</LocationID>
In the .NET world, XmlPrime and Saxon.NET support XPath 3.1 and XQuery 3.1. As far as I know, only XmlPrime has C# extension methods that work against XDocument; Saxon.NET does allow XPath 3.1 against its own XDM tree model or against System.Xml.XmlDocument.
XPath 3.0 (and later) supports let expressions, which would allow you to do what you want. You could bind a variable to the OrderLoadRef of the context node and use it within a predicate that selects the desired Collection by its OrderLoadRef.
For a static XPath 1.0 expression, I don't think you can achieve what you want. You would need to construct the XPath using the context node information.
Inside your for loop, create a variable for the Order's OrderLoadRef value. Use that value to construct the XPath that you want to evaluate, and then select the locationId:
foreach(XElement order in orders)
{
string orderLoadRef = order.XPathSelectElement("OrderLoadRef").Value;
string locationId = order.XPathSelectElement("ancestor::Load/Collections/Collection[OrderLoadRef = " + orderLoadRef + "]/LocationID").Value;
//do something with the locationId
}
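The same "read the value, then build the path" idea can be sketched in Python with the standard xml.etree.ElementTree, whose limited XPath subset supports a [child='text'] predicate. This is an illustration only, not the C# parser from the question; the function name is hypothetical, and it assumes OrderLoadRef values contain no quote characters.

```python
import xml.etree.ElementTree as ET

XML = """<DocRoot>
<Load>
<Collections>
<Collection><OrderLoadRef>1</OrderLoadRef><LocationID>AAA</LocationID></Collection>
<Collection><OrderLoadRef>2</OrderLoadRef><LocationID>BBB</LocationID></Collection>
</Collections>
<Orders>
<Order><OrderRef>1521505</OrderRef><OrderLoadRef>1</OrderLoadRef></Order>
<Order><OrderRef>1521505_2</OrderRef><OrderLoadRef>2</OrderLoadRef></Order>
</Orders>
</Load>
</DocRoot>"""


def location_ids_by_order(xml_text):
    """Map each OrderRef to the LocationID of the matching Collection."""
    root = ET.fromstring(xml_text)
    result = {}
    for order in root.iterfind("Load/Orders/Order"):
        # Step 1: read the key from the current Order.
        load_ref = order.findtext("OrderLoadRef")
        # Step 2: splice it into the path (assumes no quotes in the value).
        path = f"Load/Collections/Collection[OrderLoadRef='{load_ref}']/LocationID"
        result[order.findtext("OrderRef")] = root.findtext(path)
    return result


if __name__ == "__main__":
    print(location_ids_by_order(XML))
```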

XQuery/Xpath referring to xml elements with no namespace, in a namespace environment

In Xquery 3.1 (under eXist-DB 4.7) I receive xml data like this, with no namespace:
<edit-request id="TC9999">
<title-collection>foocolltitle</title-collection>
<title-exempla>fooextitle</title-exempla>
<title-short>fooshorttitle</title-short>
</edit-request>
This is assigned to a variable $content and this statement:
let $collid := $content/edit-request/@id
...correctly returns: TC9999
Now, I need to actually transform all the data in $content into a TEI xml document.
I first need to get some info from an existing TEI file, so I assigned another variable:
let $oldcontent := doc(concat($globalvar:URIdata,$collid,"/",$collid,".xml"))
And then I create the new TEI document, referring to both $content and $oldcontent:
let $xml := <listBibl xmlns="http://www.tei-c.org/ns/1.0"
type="collection"
xml:id="{$collid}">
<bibl>
<idno type="old_sql_id">{$oldcontent//tei:idno[@type="old_sql_id"]/text()}</idno>
<title type="collection">{$content//title-exempla/text()}</title>
</bibl>
</listBibl>
The references to the TEI namespace in $oldcontent come through, but to my surprise the references to $content (no namespace) don't show up:
<listBibl xmlns="http://www.tei-c.org/ns/1.0"
type="collection"
xml:id="TC9999">
<bibl>
<idno type="old_sql_id">1</idno>
<title type="collection"/>
</bibl>
</listBibl>
The question is: how do I refer to the non-namespace elements in $content in the context of let $xml=...?
NB: the XQuery document has a declaration at the top (as it is the principal namespace of virtually all the documents):
declare namespace tei = "http://www.tei-c.org/ns/1.0";
In essence you are asking how to write an XPath expression to select nodes in an empty namespace in a context where the default element namespace is non-empty. One of the most direct solutions is to use the "URI plus local-name syntax" for writing QNames. Here is an example:
xquery version "3.1";
let $x := <x><y>Jbrehr</y></x>
return
<p xmlns="foo">Hey there,
{ $x/Q{}y => string() }!</p>
If instead of $x/Q{}y the example had used the more common form of the path expression, $x/y, its result would have been an empty sequence, since the local name y used to select the <y> element specifies no namespace and thus inherits the foo element namespace from its context. By using the "URI plus local-name syntax", though, we are able to specify the empty namespace we are looking for.
For more information on this, see the XPath 3.1 specification's discussion of expanded QNames: https://www.w3.org/TR/xpath-31/#doc-xpath31-EQName.
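A loosely analogous situation exists outside XQuery. As a hedged illustration only (Python's standard xml.etree.ElementTree, assuming Python 3.8 or later, and not eXist-DB), mapping the empty prefix to a URI moves unprefixed names in a path into that namespace, and writing {}name asks for the name in no namespace, much as Q{}y does above. The function names are invented for the example.

```python
import xml.etree.ElementTree as ET

X = ET.fromstring("<x><y>Jbrehr</y></x>")

# An empty prefix plays the role of a default element namespace for paths.
DEFAULT_FOO = {"": "foo"}


def find_with_default_namespace():
    """'y' is resolved to {foo}y, which does not exist in the document."""
    return X.find("y", DEFAULT_FOO)


def find_in_no_namespace():
    """'{}y' explicitly asks for y in no namespace."""
    return X.find("{}y", DEFAULT_FOO)


if __name__ == "__main__":
    print(find_with_default_namespace())
    print(find_in_no_namespace().text)
```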

Syntax error about XPath in Nokogiri, when combining namespace and node()

I'm learning XPath with Nokogiri. The XPath is like this:
xml_doc = Nokogiri::XML(open("test.xml"))
result = xml_doc.xpath("//x:foo", 'x' => 'www.example.com')
I could get the results. But when I perform this call:
result = xml_doc.xpath("//x:node()", 'x' => 'www.example.com')
I get an error:
Nokogiri::XML::XPath::SyntaxError: Invalid expression: //x:node()
Am I doing something wrong?
Unlike with elements, you don't need a namespace prefix to match by node(). The following will return all nodes in any namespace just fine:
result = xml_doc.xpath("//node()")
There are several types of nodes in XPath, namely text nodes, comment nodes, element nodes, and so on. node() is a node test which simply returns true for any node type whatsoever. Compare it to text(), another node test, which returns true only for text nodes. (See "w3.org > Xpath > Node Tests")
As I understand it, the notions of local name and namespace only exist for named nodes such as elements (and attributes), so using a namespace prefix with the node() test simply doesn't make sense.
If you meant to select all elements in a specific namespace use * instead of node():
result = xml_doc.xpath("//x:*", 'x' => 'www.example.com')
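The "all elements in one namespace" query has a rough counterpart in Python's standard xml.etree.ElementTree (a sketch assuming Python 3.8 or later; this is not Nokogiri, and the sample document and function name are made up). ElementTree only exposes element nodes, so there is no direct equivalent of node() there, but a {uri}* wildcard behaves like x:*:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; only foo and baz are in the namespace.
XML = """<root xmlns:x="www.example.com">
<x:foo/>
<bar/>
<x:baz>text</x:baz>
</root>"""


def local_names_in_namespace(xml_text, uri):
    """Return local names of all descendant elements in the given namespace."""
    root = ET.fromstring(xml_text)
    matches = root.findall(".//{" + uri + "}*")
    # Tags look like "{uri}local"; keep only the part after the brace.
    return [element.tag.split("}", 1)[1] for element in matches]


if __name__ == "__main__":
    print(local_names_in_namespace(XML, "www.example.com"))
```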

Inserting a child node when list is empty (XForms)

My problem is the following :
I usually have those data:
<structures>
<structure id="10">
<code>XXX</code>
</structure>
</structures>
so the table I display (a single column: code) is OK.
But in some cases, the data is the result of a query with no content, so the data is:
<structures/>
resulting in my table not displaying + error.
I am trying to insert, in the case of an empty instance, a single node so that the data would look like:
<structures>
<structure id="0"/>
</structures>
I am trying something like that :
<xforms:action ev:event="xforms-submit-done">
<xforms:insert if="0 = count(instance('{./instance-name}')/root/node())" context="instance('{./instance-name}')/root/node()" origin="xforms:element('structure', '')" />
</xforms:action>
but no node is inserted when I look at the data in the inspector in the page.
Any obvious thing I am doing wrong?
There seem to be errors in your XPath if and context expressions:
if="0 = count(instance('{./instance-name}')/root/node())"
context="instance('{./instance-name}')/root/node()"
You are using curly brackets { and }, I assume to get the behavior of attribute value templates (AVTs). But the if and context expressions are already XPath expressions, so you cannot use AVTs in them. Try instead:
if="0 = count(instance(instance-name)/root/node())"
context="instance(instance-name)/root/node()"
Also, the instance-name path is relative to something which might not be clear when reading or writing the expression. I would suggest using an absolute path for example instance('foo')/instance-name to make things clearer.
You don't provide the structure of the other instances, so I can't tell for sure, but your expressions above suppose that they have the form:
<xf:instance id="foo">
<some-root-element>
<root>
<structure/>
</root>
</some-root-element>
</xf:instance>
I don't know if that's what you intend.
Finally, you could replace count(something) = 0 with empty(something).

xerces-c 3.1 XPath evaluation

I could not find many examples of evaluating XPath using xerces-c 3.1.
Given the following sample XML input:
<abc>
<def>AAA BBB CCC</def>
</abc>
I need to retrieve the "AAA BBB CCC" string by the XPath "/abc/def/text()[0]".
The following code works:
XMLPlatformUtils::Initialize();
// create the DOM parser
XercesDOMParser *parser = new XercesDOMParser;
parser->setValidationScheme(XercesDOMParser::Val_Never);
parser->parse("test.xml");
// get the DOM representation
DOMDocument *doc = parser->getDocument();
// get the root element
DOMElement* root = doc->getDocumentElement();
// evaluate the xpath
DOMXPathResult* result=doc->evaluate(
XMLString::transcode("/abc/def"), // "/abc/def/text()[0]"
root,
NULL,
DOMXPathResult::ORDERED_NODE_SNAPSHOT_TYPE, //DOMXPathResult::ANY_UNORDERED_NODE_TYPE, //DOMXPathResult::STRING_TYPE,
NULL);
// look into the XPath evaluation result
result->snapshotItem(0);
std::cout<<StrX(result->getNodeValue()->getFirstChild()->getNodeValue())<<std::endl;
XMLPlatformUtils::Terminate();
return 0;
But I really hate that:
result->getNodeValue()->getFirstChild()->getNodeValue()
Does it have to be a node set instead of the exact node I want?
I tried other forms of the XPath, such as "/abc/def/text()[0]", and other result types, such as "DOMXPathResult::STRING_TYPE". Xerces always threw an exception.
What did I do wrong?
I don't code with Xerces C++ but it seems to implement the W3C DOM Level 3 so based on that I would suggest to select an element node with a path like /abc/def and then simply to access result->getNodeValue()->getTextContent() to get the contents of the element (e.g. AAA BBB CCC).
As far as I understand the DOM APIs, if you want a string value then you need to use a path like string(/abc/def) and then result->getStringValue() should do (if the evaluate method requests any type or STRING_TYPE as the result type).
Another approach: if you know you are only interested in the first node in document order, you could evaluate /abc/def with FIRST_ORDERED_NODE_TYPE and then access result->getNodeValue()->getTextContent().
