How to get namespace names in XPath? - xpath

I have this .xml file
<root xmlns:h="http://www.w3.org/TR/html4/"
xmlns:f="http://www.w3.org/TR/html4/">
<h:table>
<h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr>
</h:table>
<f:table>
<f:tr>
<f:td>Red</f:td>
<f:td>Yellow</f:td>
</f:tr>
</f:table>
</root>
How can I get only the elements in a specific namespace?
For example, I want to retrieve only the elements in the 'h' namespace.
How can I do that? In eXist-db the 'namespace::' axis no longer works.

Try using the in-scope-prefixes() function in a predicate:
//*[in-scope-prefixes(.)='h']

In the comments you show a solution that parses the return value from name():
//*[substring-before(name(), ":")='h']
There is a far simpler way to get all elements in the namespace that's mapped to the h prefix:
//h:*
Note: when I first tested this, I was getting back all elements in the document. That's because both of your prefixes are mapped to the same namespace:
xmlns:h="http://www.w3.org/TR/html4/"
xmlns:f="http://www.w3.org/TR/html4/"
You should also fix this.
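To see why the namespace URI, rather than the prefix, decides which elements match, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree (not eXist-db). It assumes the f prefix has been remapped to a made-up URI, http://example.com/furniture, which is not from the question; the helper name local_names_in_namespace is also just illustrative.

```python
import xml.etree.ElementTree as ET

H_URI = "http://www.w3.org/TR/html4/"
F_URI = "http://example.com/furniture"  # hypothetical URI, not from the question

XML = f"""<root xmlns:h="{H_URI}" xmlns:f="{F_URI}">
  <h:table><h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr></h:table>
  <f:table><f:tr><f:td>Red</f:td><f:td>Yellow</f:td></f:tr></f:table>
</root>"""


def local_names_in_namespace(xml_text, uri):
    """Return the local names of all elements whose namespace URI is `uri`."""
    root = ET.fromstring(xml_text)
    marker = "{" + uri + "}"
    # ElementTree stores qualified names as "{uri}local", so the prefix
    # used in the document plays no part in the comparison.
    return [el.tag[len(marker):] for el in root.iter() if el.tag.startswith(marker)]


h_names = local_names_in_namespace(XML, H_URI)
print(h_names)
```

With distinct URIs only the four h elements come back. If both prefixes point at the same URI, as in the question, all eight elements match, which is the behaviour described in the note above.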

Related

Access deep nested node from document.xml using nokogiri

I am using nokogiri to access a docx's document xml file.
here is a sample of it:
<w:document>
<w:body>
<w:p w:rsidR="00454EDC" w:rsidRDefault="00454EDC" w:rsidP="00454EDC">
<w:drawing>
<wp:inline distT="0" distB="0" distL="0" distR="0">
<wp:extent cx="1926590" cy="1088571"/>
<wp:effectExtent l="0" t="0" r="0" b="0"/>
<wp:docPr id="1" name="Picture 1"/>
<wp:cNvGraphicFramePr>
<a:graphicFrameLocks xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" noChangeAspect="1"/>
</wp:cNvGraphicFramePr>
<a:graphic xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
<a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/picture">
<pic:pic xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture">
<pic:nvPicPr>
<pic:cNvPr id="0" name="Picture 1"/>
<pic:cNvPicPr>
<a:picLocks noChangeAspect="1" noChangeArrowheads="1"/>
</pic:cNvPicPr>
</pic:nvPicPr>
<pic:blipFill>
<a:blip r:embed="rId5" cstate="print">
<a:extLst>
<a:ext uri="{28A0092B-C50C-407E-A947-70E740481C1C}">
<a14:useLocalDpi xmlns:a14="http://schemas.microsoft.com/office/drawing/2010/main" val="0"/>
</a:ext>
</a:extLst>
</a:blip>
<a:srcRect/>
<a:stretch>
<a:fillRect/>
</a:stretch>
</pic:blipFill>
<pic:spPr bwMode="auto">
<a:xfrm>
<a:off x="0" y="0"/>
<a:ext cx="1951299" cy="1102532"/>
</a:xfrm>
<a:prstGeom prst="rect">
<a:avLst/>
</a:prstGeom>
<a:noFill/>
<a:ln>
<a:noFill/>
</a:ln>
</pic:spPr>
</pic:pic>
</a:graphicData>
</a:graphic>
</wp:inline>
</w:drawing>
</w:p>
</w:body>
</w:document>
Now I want to access all <w:drawing> tags, and from each of them I want to access the <a:blip> tag and extract the value of its r:embed attribute.
In this case, as you can see, it is rId5.
I am able to access the <w:drawing> tags using xml.xpath('//w:drawing'), but when I call xml.xpath('//w:drawing').xpath('//a:blip'), it throws an error:
Nokogiri::XML::XPath::SyntaxError: Undefined namespace prefix: //a:blip
What am I doing wrong? Can anyone point me in the right direction?
The error is telling you that in your XPath query, //a:blip, Nokogiri doesn’t know what namespace a refers to. You need to specify the namespaces that you are targeting in your query, not just the prefix. The fact that the prefix a is defined in the document doesn’t really matter, it is the actual namespace URI that is important. It is possible to use completely different prefixes in the query than those used in the document, as long as the namespace URIs match.
You may be wondering why the query //w:drawing works. You don’t include the full XML, but I suspect that the w prefix is defined on the root node (something like xmlns:w="http://some.uri.here"). If you don’t specify any namespaces, Nokogiri will automatically register any defined in the root node so they will be available in your query. The namespace corresponding to the a prefix isn’t defined on the root, so it is unavailable, and so you get the error you see.
To specify namespaces in Nokogiri, you pass a hash that maps each prefix (as used in the query) to its namespace URI to the xpath method (or whichever query method you’re using). Since you are providing your own namespace mappings, you also need to include any you use from the root node; Nokogiri doesn’t add them automatically in this case.
In your case, the code would look something like this:
namespaces = {
'w' => 'http://some.uri', # whatever the URI is for this namespace
'a' => 'http://schemas.openxmlformats.org/drawingml/2006/main'
}
# You can combine this into a single query.
# Note the double slash before `a:blip`: in the sample it is a
# descendant of `w:drawing`, not a direct child.
xml.xpath('//w:drawing//a:blip', namespaces)
Have a look at the Nokogiri tutorial section on namespaces for more info.
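The same idea, that the query's prefixes are resolved through a prefix-to-URI map you supply, can be sketched in Python with the standard library's xml.etree.ElementTree (this is not Nokogiri). The document below is a cut-down stand-in for the one in the question. The URIs bound to w and r are assumptions, since the question does not show the root element's declarations, and the query deliberately uses the prefix d instead of a to show that only the URI has to match.

```python
import xml.etree.ElementTree as ET

# Assumed URIs for the w and r prefixes (not shown in the question).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
# URI for the a prefix, taken from the question's sample.
A = "http://schemas.openxmlformats.org/drawingml/2006/main"

XML = f"""<w:document xmlns:w="{W}" xmlns:r="{R}">
  <w:body><w:p><w:drawing>
    <a:graphic xmlns:a="{A}">
      <a:blip r:embed="rId5"/>
    </a:graphic>
  </w:drawing></w:p></w:body>
</w:document>"""

root = ET.fromstring(XML)

# Prefix-to-URI map for the query; "d" stands in for the document's "a".
ns = {"w": W, "d": A}

# Namespaced attributes are keyed as "{uri}local" in ElementTree.
embed_key = "{" + R + "}embed"
embed_ids = [blip.get(embed_key) for blip in root.findall(".//w:drawing//d:blip", ns)]
print(embed_ids)
```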
I would say that this is a bug in the XML parser that you are using:
Indeed, the error says that the namespace prefix a is undefined; however, it has been defined in <a:graphic xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">, which is an ancestor of the <a:blip> element.
See here if you want to know more about XML namespaces.
It seems that there are a few other questions about problems with namespace prefixes in Nokogiri, for example: Undefined namespace prefix in Nokogiri and XPath

Java XPath with default Namespace issue

I am not able to read the node for my expression.
<ns:Msg xmlns:ns="http://www.noventus.se/epix1" xmlns="http:www.defaultnamespace.com">
<ns:Header>
<SubsysId>1</SubsysId>
<SubsysType>30003</SubsysType>
<SendDateTime>2009-08-13T14:28:15</SendDateTime>
</ns:Header>
</ns:Msg>
My XML contains two namespaces: one bound to the ns prefix and one declared as the default namespace.
I am trying to get value for SubsysId using org.dom4j.XPath and adding namespace with
Map namespaces = new HashMap();
namespaces.put("ns", "http://www.noventus.se/epix1");
namespaces.put("main", "http:www.defaultnamespace.com");
Adding these namespaces like this
xpath.setNamespaceContext(new SimpleNamespaceContext(namespaces));
This is my expression
String expression = "/ns:Msg/ns:Header/SubsysId";
I tried multiple options but not able to get the value.
NOTE: If I remove default namespace and run then I am getting the value.
Your help is highly appreciated.
Since you defined namespaces.put("main", "http:www.defaultnamespace.com");
then you would need to specify it in your xpath.
So your xpath becomes:
String expression = "/ns:Msg/ns:Header/main:SubsysId";
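As a quick check of that reasoning, here is a small sketch in Python with the standard library's xml.etree.ElementTree (not dom4j). It uses the XML and the two prefix mappings from the question and shows that the unprefixed step finds nothing, while the main: step finds the element.

```python
import xml.etree.ElementTree as ET

XML = """<ns:Msg xmlns:ns="http://www.noventus.se/epix1" xmlns="http:www.defaultnamespace.com">
  <ns:Header>
    <SubsysId>1</SubsysId>
    <SubsysType>30003</SubsysType>
    <SendDateTime>2009-08-13T14:28:15</SendDateTime>
  </ns:Header>
</ns:Msg>"""

# Same prefix-to-URI mappings as in the question.
namespaces = {
    "ns": "http://www.noventus.se/epix1",
    "main": "http:www.defaultnamespace.com",
}

root = ET.fromstring(XML)  # root is the ns:Msg element

# An unprefixed name in the query means "no namespace", so this finds nothing.
unprefixed = root.find("ns:Header/SubsysId", namespaces)

# SubsysId inherits the default namespace, so the query needs a prefix for it.
prefixed = root.find("ns:Header/main:SubsysId", namespaces)

print(unprefixed, prefixed.text)
```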

simplexml_load_file with xPath returns empty array

Getting XML from this URL:
$xml = simplexml_load_file('http://geocode-maps.yandex.ru/1.x/?geocode=37.71677,55.75208&kind=metro&spn=1,1&rspn=1');
print_r($xml) shows that XML loaded, but xpath always returns empty array. I tried:
$xml->xpath('/');
$xml->xpath('/ymaps');
$xml->xpath('/GeoObjectCollection');
$xml->xpath('/ymaps/GeoObjectCollection');
$xml->xpath('//GeoObjectCollection');
$xml->xpath('precision');
Why do I get an empty array? I hope I'm just missing something easy.
It might be rather easy, but I guess it is also the most common mistake in the history of XML: You are forgetting namespaces!
A lot of elements in the given XML are changing the default namespace and you have to consider that in your XPath.
You can first register your namespace like so:
$xml->registerXPathNamespace('y', 'http://maps.yandex.ru/ymaps/1.x');
$xml->registerXPathNamespace('a', 'http://maps.yandex.ru/attribution/1.x');
and then you can query your data:
$xml->xpath('//y:ymaps/y:GeoObjectCollection');

XPath format required on namespace node

Can someone please show me the XPath I should use to retrieve the billAmount of the 2nd txnDetail node?
I am expecting the value 10.00, but I have issues with the namespace and the "a:" prefix, and my XPath fails to retrieve the correct value.
<TransactionRsp xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<avlBal>818.00</avlBal>
<blkAmt>0.00</blkAmt>
<cardID>2561683577196298</cardID>
<currBill>GBP</currBill>
<endBal>390.00</endBal>
<logDateTime>2013-04-30T12:17:20.4249292Z</logDateTime>
<msgID>121719721</msgID>
<rspCode>000</rspCode>
<startBal>400.00</startBal>
<txnDetail xmlns:a="http://schemas.datacontract.org/2004/07/CoreModels">
<a:txnDetail>
<a:billAmount>400.00</a:billAmount>
<a:billConvRate>0.00</a:billConvRate>
<a:blkAmount>0.00</a:blkAmount>
<a:debOrCred>1</a:debOrCred>
<a:itemID>2278</a:itemID>
<a:itemType>6</a:itemType>
<a:txnAmount>0.00</a:txnAmount>
<a:txnCurrency/>
<a:txnDateTime>2012-02-23T14:35:45</a:txnDateTime>
<a:txnDescription></a:txnDescription>
</a:txnDetail>
<a:txnDetail>
<a:billAmount>10.00</a:billAmount>
<a:billConvRate>0.00</a:billConvRate>
<a:blkAmount>0.00</a:blkAmount>
<a:debOrCred>0</a:debOrCred>
<a:itemID>3058</a:itemID>
<a:itemType>5</a:itemType>
<a:txnAmount>0.00</a:txnAmount>
<a:txnCurrency/>
<a:txnDateTime>2012-07-30T12:22:14</a:txnDateTime>
<a:txnDescription>Fee: Card Issue</a:txnDescription>
</a:txnDetail>
</txnDetail>
</TransactionRsp>
It's:
//TransactionRsp/txnDetail/a:txnDetail[2]/a:billAmount
However, depending on your programming language you might have to register the a namespace. The document might have a default namespace as well. (Don't expect that the xml you've posted is the whole document)
I have managed to pull the relevant data using the following XPath:
/TransactionRsp/txnDetail/*[local-name()='txnDetail'][2]/*[local-name()='billAmount']
Now I need to know how to select only the txnDetail elements with itemType = 6.
Any thoughts?
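One possible way to handle the itemType filter is a child-value predicate, which in XPath 1.0 would read /TransactionRsp/txnDetail/a:txnDetail[a:itemType='6']. The sketch below tries both the positional lookup and that filter in Python with the standard library's xml.etree.ElementTree, on a trimmed copy of the posted XML. It assumes, as the posted sample suggests, that the outer elements are in no namespace.

```python
import xml.etree.ElementTree as ET

XML = """<TransactionRsp xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
  <txnDetail xmlns:a="http://schemas.datacontract.org/2004/07/CoreModels">
    <a:txnDetail>
      <a:billAmount>400.00</a:billAmount>
      <a:itemType>6</a:itemType>
    </a:txnDetail>
    <a:txnDetail>
      <a:billAmount>10.00</a:billAmount>
      <a:itemType>5</a:itemType>
    </a:txnDetail>
  </txnDetail>
</TransactionRsp>"""

ns = {"a": "http://schemas.datacontract.org/2004/07/CoreModels"}
root = ET.fromstring(XML)  # root is TransactionRsp

# Positional predicate: billAmount of the second a:txnDetail.
second_amount = root.find("txnDetail/a:txnDetail[2]/a:billAmount", ns).text

# Child-value predicate: only the a:txnDetail elements whose a:itemType is 6.
type6 = root.findall("txnDetail/a:txnDetail[a:itemType='6']", ns)
type6_amounts = [detail.find("a:billAmount", ns).text for detail in type6]

print(second_amount, type6_amounts)
```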

XPath using string functions in the middle of the path

I'm trying to use Web Deploy 3.0 to make changes to my web.config before deployment. Let's say I have the following xml:
<node>
<subnode>
<connectInfo httpURL="http://LookImAUrl.com" />
</subnode>
</node>
And I'd like to match just the "http" in "http://..." so that I can potentially replace it with https.
I looked into XPath string functions and understand them -- I just don't know how to put them in the middle of an expression, for example:
"//node/subnode/connectInfo/@httpURL/substring-before(../@httpURL,':')"
That's basically what I want to do, but it doesn't look right.
"//node/subnode/connectInfo/@httpURL/substring-before(../@httpURL,':')"
That's basically what I want to do, but it doesn't look right.
But it is right (in XPath 2.0) and will match the http.
(By the way, you could write it more briefly without ..:
//node/subnode/connectInfo/@httpURL/substring-before(.,':')
)
However, it will return the string "http", not some kind of pointer into the value of @httpURL; that is not possible, since there are no partial nodes within an attribute value.
In XPath 2.0, you can return the attribute and a new value, and then perhaps change it in the calling language:
//node/subnode/connectInfo/@httpURL/(., concat("https:", substring-after(.,':')))
Using XPath 1.0, if you want to return the initial part of the URL, use:
substring-before(//node/subnode/connectInfo/@httpURL,':')
Note, though, that this will return the value for ONLY the first connectInfo element.
If you want to get the connectInfo nodes that use HTTP:
//node/subnode/connectInfo[starts-with(@httpURL,'http:')]
If you want to get all httpURL attributes that use HTTP:
//node/subnode/connectInfo/@httpURL[starts-with(.,'http:')]
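Because XPath can only hand back whole attribute nodes or computed strings, the replacement itself has to happen in the calling code. A minimal sketch of that division of labour, in Python with the standard library's xml.etree.ElementTree rather than Web Deploy: the path selects the connectInfo elements that carry the attribute, and ordinary string handling swaps the scheme. ElementTree has no substring-before or starts-with, so those checks are done in Python here.

```python
import xml.etree.ElementTree as ET

XML = """<node>
  <subnode>
    <connectInfo httpURL="http://LookImAUrl.com" />
  </subnode>
</node>"""

root = ET.fromstring(XML)  # root is the outer <node> element

# Select every connectInfo element that has an httpURL attribute.
for info in root.findall("subnode/connectInfo[@httpURL]"):
    url = info.get("httpURL")
    # Python-side equivalent of starts-with(@httpURL, 'http:').
    if url.startswith("http:"):
        # Equivalent of concat('https:', substring-after(., ':')).
        info.set("httpURL", "https:" + url.partition(":")[2])

updated_url = root.find("subnode/connectInfo").get("httpURL")
print(updated_url)
```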
