Problem with namespaces and libxml when I use XPath - xpath

I've got a problem when using libxml with XPath. I want to parse a YouTube playlist:
<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns='http://www.w3.org/2005/Atom'
xmlns:openSearch='http://a9.com/-/spec/opensearch/1.1/'
xmlns:media='http://search.yahoo.com/mrss/'
xmlns:batch='http://schemas.google.com/gdata/batch'
xmlns:yt='http://gdata.youtube.com/schemas/2007'
xmlns:gd='http://schemas.google.com/g/2005'
gd:etag='W/"Dk8DRn47eCp7ImA9WxRQGEk."'>
<id>tag:youtube,2008:user:andyland74:playlists</id>
<updated>2008-07-21T16:43:25.232Z</updated>
<category scheme='http://schemas.google.com/g/2005#kind'
term='http://gdata.youtube.com/schemas/2007#playlistLink'/>
<title>Playlists of andyland74</title>
<logo>http://www.youtube.com/img/pic_youtubelogo_123x63.gif</logo>
<link rel='related' type='application/atom+xml'
href='http://gdata.youtube.com/feeds/api/users/andyland74?v=2'/>
<link rel='alternate' type='text/html'
href='http://www.youtube.com/profile_play_list?user=andyland74'/>
<link rel='http://schemas.google.com/g/2005#feed'
type='application/atom+xml'
href='http://gdata.youtube.com/feeds/api/users/andyland74/playlists?v=2'/>
<link rel='http://schemas.google.com/g/2005#post'
type='application/atom+xml'
href='http://gdata.youtube.com/feeds/api/users/andyland74/playlists?v=2'/>
<link rel='http://schemas.google.com/g/2005#batch'
type='application/atom+xml'
href='http://gdata.youtube.com/feeds/api/users/andyland74/playlists/batch?v=2'/>
<link rel='self' type='application/atom+xml'
href='http://gdata.youtube.com/feeds/api/users/andyland74/playlists?...'/>
<link rel='service' type='application/atomsvc+xml'
href='http://gdata.youtube.com/feeds/api/users/andyland74/playlists?alt=...'/>
<author>
<name>andyland74</name>
<uri>http://gdata.youtube.com/feeds/api/users/andyland74</uri>
</author>
<generator version='2.0'
uri='http://gdata.youtube.com/'>YouTube data API</generator>
<openSearch:totalResults>3</openSearch:totalResults>
<openSearch:startIndex>1</openSearch:startIndex>
<openSearch:itemsPerPage>25</openSearch:itemsPerPage>
<entry gd:etag='W/"Dk8DRn47eCp7ImA9WxRQGEk."'>
<id>tag:youtube,2008:user:andyland74:playlist:8BCDD04DE8F771B2</id>
<published>2007-11-04T17:30:27.000-08:00</published>
<updated>2008-07-15T12:33:20.000-07:00</updated>
<app:edited xmlns:app='http://www.w3.org/2007/app'>2008-07-15T12:33:20.000-07:00</app:edited>
<category scheme='http://schemas.google.com/g/2005#kind'
term='http://gdata.youtube.com/schemas/2007#playlistLink'/>
<title>My New Playlist Title</title>
<summary>My new playlist Description</summary>
<content type='application/atom+xml;type=feed'
src='http://gdata.youtube.com/feeds/api/playlists/8BCDD04DE8F771B2?v=2'/>
<link rel='related' type='application/atom+xml'
href='http://gdata.youtube.com/feeds/api/users/andyland74?v=2'/>
<link rel='alternate' type='text/html'
href='http://www.youtube.com/view_play_list?p=8BCDD04DE8F771B2'/>
<link rel='self' type='application/atom+xml'
href='http://gdata.youtube.com/feeds/api/users/andyland74/playlists/8BCDD04DE8F771B2?v=2'/>
<link rel='edit' type='application/atom+xml'
href='http://gdata.youtube.com/feeds/api/users/andyland74/playlists/8BCDD04DE8F771B2?v=2'/>
<author>
<name>andyland74</name>
<uri>http://gdata.youtube.com/feeds/api/users/andyland74</uri>
</author>
<yt:countHint>9</yt:countHint>
</entry>
</feed>
When I use the XPath expression "/feed", xmlXPathEvalExpression reports that it finds nothing.
If I remove all the xmlns attributes from feed, it works. How can I make it work even with the xmlns attributes?
I use libxml with Objective-C.

I ran into a similar issue when trying to use libxml-ruby to parse XML. From http://libxml.rubyforge.org/rdoc/classes/LibXML/XML/XPath.html:
To find nodes you must define the atom namespace for libxml. One way to do this is:
node = doc.find('atom:title', 'atom:http://www.w3.org/2005/Atom')
Alternatively, you can register the default namespace like this:
doc.root.namespaces.default_prefix = 'atom'
node = doc.find('atom:title')
Either way works, but registering makes sense if you're going to be using the methods a lot. Then you can just reference items like 'atom:title'.

I am using the XPathQuery wrapper around xmlXPathEvalExpression, which makes it harder to go the xmlXPathRegisterNs route.
If you are querying for the fields directly, you probably do not care about the namespaces; they don't matter for my app. So I just modify the XML before I process it:
NSString *xmlString = [[NSString alloc] initWithData:originalXMLData encoding:NSUTF8StringEncoding];
NSString *modifiedXMLString = [xmlString stringByReplacingOccurrencesOfString:@"xmlns=" withString:@"foobar="];
NSData *modifiedXMLData = [modifiedXMLString dataUsingEncoding:NSUTF8StringEncoding];
Now you can use modifiedXMLData in xmlXPathEvalExpression, or in PerformXMLXPathQuery if you use XPathQuery.
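The same trick can be sketched in Python with the standard library's xml.etree.ElementTree, purely as an illustration of what the replacement does to the parsed tree. The FEED string is a trimmed-down stand-in for the playlist in the question, not the real feed.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the feed in the question (an assumption).
FEED = """<feed xmlns='http://www.w3.org/2005/Atom'
      xmlns:yt='http://gdata.youtube.com/schemas/2007'>
  <entry>
    <title>My New Playlist Title</title>
    <yt:countHint>9</yt:countHint>
  </entry>
</feed>"""

# Rename only the default declaration; 'xmlns:yt=' does not contain 'xmlns='.
modified = FEED.replace("xmlns=", "foobar=")
root = ET.fromstring(modified)

# Unprefixed elements are now in no namespace, so plain paths match.
root_tag = root.tag
titles = [t.text for t in root.findall("entry/title")]

# Prefixed elements keep their namespace: their declaration was not touched.
hint_tag = root.find("entry")[1].tag
```

Note the limits of this approach: the replacement also rewrites any text content or attribute value that happens to contain "xmlns=", and prefixed elements such as yt:countHint still need a registered prefix.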

You didn't post your query code, but it sounds like you aren't registering the namespaces with your XPath context. See the API docs for xmlXPathRegisterNs; I believe it will do what you're looking for. It won't let you register a default namespace, so you'll need to change your XPath expression to /feed:feed or the like.
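Why the unprefixed path fails, and why a registered prefix fixes it, can be seen in a small Python sketch using xml.etree.ElementTree. This is not libxml's C API; it only illustrates the same namespace rule. The FEED string is a trimmed-down stand-in for the playlist, and the prefix "atom" is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the feed in the question (an assumption).
FEED = """<feed xmlns='http://www.w3.org/2005/Atom'
      xmlns:yt='http://gdata.youtube.com/schemas/2007'>
  <title>Playlists of andyland74</title>
  <entry>
    <title>My New Playlist Title</title>
    <yt:countHint>9</yt:countHint>
  </entry>
</feed>"""

root = ET.fromstring(FEED)

# The default namespace becomes part of every unprefixed element's name.
root_tag = root.tag

# An unprefixed step looks for 'entry' in no namespace and finds nothing.
unprefixed = root.findall("entry")

# Binding prefixes of our own choosing to the URIs makes the steps match.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "yt": "http://gdata.youtube.com/schemas/2007",
}
entry_titles = [t.text for t in root.findall("atom:entry/atom:title", NS)]
count_hint = root.findtext("atom:entry/yt:countHint", namespaces=NS)
```

The prefix used in the query does not have to match anything in the document; only the namespace URI it is bound to matters.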

To use a default namespace, just register the namespace xmlns= and then use /xmlns:feed in your query.

After some research, I found the following solution, which works just like NSXMLDocument path queries.
When an XML document declares a default namespace without a prefix, like
xmlns="..."
simple XPath queries fail, like
xpath: /node
That's because xmlXPathEvalExpression expects some kind of default namespace prefix, but there is none.
One approach is to fix the missing prefix (as GDataXML does), but that requires all XPaths to use this prefix, like
xpath: /__def_ns:node
But this is not how XPath and NSXMLDocument work.
The following solution (based on a DDXMLNode) goes to the root node and scans for a namespace without a prefix.
Then all nodes below it are traversed, and the namespace is removed from every node that belongs to it.
The result is just as if there had been no namespace in the first place.
- (void)fixNameSpace
{
    xmlNodePtr nodePtr = (xmlNodePtr)self->genericPtr;

    // Scan the namespace declarations on this node for the one without a prefix.
    xmlNsPtr ns = nodePtr->nsDef;
    xmlNsPtr defaultNs = NULL;
    while (ns != NULL)
    {
        if (ns->prefix == NULL)
        {
            defaultNs = ns;
            break;
        }
        ns = ns->next;
    }

    if (defaultNs)
        [self resetDefaultNs:defaultNs];
}

- (void)resetDefaultNs:(xmlNsPtr)defaultNs
{
    xmlNodePtr nodePtr = (xmlNodePtr)self->genericPtr;

    // Detach the default namespace from this node if it uses it.
    xmlNsPtr ns = nodePtr->ns;
    if (ns && ns == defaultNs)
        xmlSetNs(nodePtr, NULL);

    // Recurse into the children.
    for (NSXMLNode *child in self.children)
        [child resetDefaultNs:defaultNs];
}
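For comparison, the traversal idea can be sketched in Python with xml.etree.ElementTree. This is an analogue, not a port: ElementTree does not expose which declaration was the default one, so the hypothetical helper strip_namespace takes the namespace URI as an explicit argument, and DOC is a trimmed-down stand-in for the feed.

```python
import xml.etree.ElementTree as ET

def strip_namespace(root, uri):
    """Remove the namespace `uri` from every element name under `root`, in place."""
    prefix = "{" + uri + "}"
    for elem in root.iter():
        # Comments and processing instructions have a non-string tag; skip them.
        if isinstance(elem.tag, str) and elem.tag.startswith(prefix):
            elem.tag = elem.tag[len(prefix):]

# Trimmed-down stand-in for the feed in the question (an assumption).
DOC = """<feed xmlns='http://www.w3.org/2005/Atom'
      xmlns:yt='http://gdata.youtube.com/schemas/2007'>
  <entry><title>My New Playlist Title</title><yt:countHint>9</yt:countHint></entry>
</feed>"""

root = ET.fromstring(DOC)
strip_namespace(root, "http://www.w3.org/2005/Atom")

# Plain paths now match the formerly default-namespaced elements.
titles = [t.text for t in root.findall("entry/title")]

# Elements in other namespaces are left alone.
still_namespaced = [e.tag for e in root.iter() if e.tag.startswith("{")]
```

As in the Objective-C version, only elements in the given namespace are changed, so prefixed elements such as yt:countHint still need a prefix mapping in queries.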

Related

name of node when you know an attribute using path?

I have some XML where I know an attribute (in my case an ID#). I can get the node I'm looking for using //*[@id='v6969482']. But isn't there a way to get the name of the element with this id? (I'm trying to have it return 'title' in my case.) I know it has to do with using name(), but I can't seem to get the right syntax for returning the name when I have the id attribute.
<?xml version="1.0" encoding="UTF-8"?>
<topic id="v6969481">
<title id="v6969482">CR - ASE | AXX2500>Engines>EIOA>EIOAn>GMACn>Ingress</title>
<body id="v6969483">
<p id="v6969484">
<table id="v6153057" frame="all" colsep="1" rowsep="1">
<desc id="v6049915">Global ingress attributes for EIOA engine GMAC ports.</desc>
You need the name of the parent node of the attribute, that is, its parent element:
name(//*[@id='v6969482'])
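The same lookup can be sketched in Python with xml.etree.ElementTree, which has no name() function but exposes the element name as the tag attribute. The TOPIC string is a shortened, well-formed stand-in for the excerpt in the question.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed stand-in for the topic XML (an assumption).
TOPIC = """<topic id="v6969481">
  <title id="v6969482">Ingress</title>
  <body id="v6969483">
    <p id="v6969484">Global ingress attributes.</p>
  </body>
</topic>"""

root = ET.fromstring(TOPIC)

# './/*' searches the descendants of root, not root itself.
match = root.find(".//*[@id='v6969482']")

# The element's name is its tag; guard against a missing id.
element_name = match.tag if match is not None else None
```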

How to remove namespace from xml

I have XML in the following format:
<Body xmlns:soap-env="http://schemas.xmlsoap.org/soap/envelope/" xmlns="http://schemas.xmlsoap.org/soap/envelope/">
<TransactionAcknowledgement xmlns="">
<TransactionId>HELLO </TransactionId>
<UserId>MC</UserId>
<SendingPartyType>SE</SendingPartyType>
</TransactionAcknowledgement>
</Body>
I want to use an XQuery or XPath expression for this.
I want to remove only the
xmlns:soap-env="http://schemas.xmlsoap.org/soap/envelope/"
namespace from the XML.
Is there any way to achieve this?
Thanks
Try to use functx:change-element-ns-deep:
let $xml := <Body xmlns:soap-env="http://schemas.xmlsoap.org/soap/envelope/" xmlns="http://schemas.xmlsoap.org/soap/envelope/">
<TransactionAcknowledgement xmlns="">
<TransactionId>HELLO </TransactionId>
<UserId>MC</UserId>
<SendingPartyType>SE</SendingPartyType>
</TransactionAcknowledgement>
</Body>
return functx:change-element-ns-deep($xml, "http://schemas.xmlsoap.org/soap/envelope/", "")
But as Dimitre Novatchev said, this function doesn't change the namespace of the source XML; it creates a new XML document.

Parsing an XML file with Nokogiri?

<DataSet xmlns="http://www.atcomp.cz/webservices">
<xs:schema xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata" id="file_mame">...</xs:schema>
<diffgr:diffgram xmlns:msdata="urn:schemas-microsoft-com:xml-msdata" xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1">
<alldata xmlns="">
<category diffgr:id="category1" msdata:rowOrder="0">
<category_code>P...</category_code>
<category_name>...</category_name>
<subcategory diffgr:id="subcategory1" msdata:rowOrder="0">
<category_code>...</category_code>
<subcategory_code>...</subcategory_code>
<subcategory_name>...</subcategory_name>
</subcategory>
....
How can I obtain all categories and subcategories data?
I am trying something like:
reader.xpath('//DataSet/diffgr:diffgram/alldata').each do |node|
But this gives me:
undefined method `xpath' for #<Nokogiri::XML::Reader:0x000001021d1750>
Nokogiri's Reader parser does not support XPath. Try using Nokogiri's in-memory Document parser instead.
On another note, to query namespaced elements with XPath, you need to provide a namespace mapping, like this:
doc = Nokogiri::XML(my_document_string_or_io)
namespaces = {
'default' => 'http://www.atcomp.cz/webservices',
'diffgr' => 'urn:schemas-microsoft-com:xml-diffgram-v1'
}
doc.xpath('//default:DataSet/diffgr:diffgram/alldata', namespaces).each do |node|
# ...
end
Or you can remove the namespaces:
doc.remove_namespaces!
doc.xpath('//DataSet/diffgram/alldata').each { |node| }
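One detail of this document is worth spelling out: the xmlns="" on alldata undeclares the default namespace, so alldata and everything below it are in no namespace and need no prefix, while DataSet and diffgram do. A minimal Python sketch with xml.etree.ElementTree shows this. The DATASET string is a cut-down stand-in, the values "first" and "first-sub" are made-up placeholders, and the prefix "ws" is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for the document; element values are placeholders.
DATASET = """<DataSet xmlns="http://www.atcomp.cz/webservices">
  <diffgr:diffgram xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1">
    <alldata xmlns="">
      <category diffgr:id="category1">
        <category_name>first</category_name>
        <subcategory diffgr:id="subcategory1">
          <subcategory_name>first-sub</subcategory_name>
        </subcategory>
      </category>
    </alldata>
  </diffgr:diffgram>
</DataSet>"""

NS = {
    "ws": "http://www.atcomp.cz/webservices",
    "diffgr": "urn:schemas-microsoft-com:xml-diffgram-v1",
}

root = ET.fromstring(DATASET)  # root is the DataSet element

# diffgram needs its prefix; alldata and below are in no namespace.
categories = root.findall("diffgr:diffgram/alldata/category", NS)
names = [c.findtext("category_name") for c in categories]
sub_names = [s.findtext("subcategory_name")
             for c in categories for s in c.findall("subcategory")]

# Namespaced attributes are addressed by their expanded name.
ids = [c.get("{urn:schemas-microsoft-com:xml-diffgram-v1}id") for c in categories]
```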

Use of text() function when using xPath in dom4j

I have inherited an application that parses xml using dom4j and xPath:
The xml being parsed is similar to the following:
<cache>
<content>
<transaction>
<page>
<widget name="PAGE_ID">WRK_REGISTRATION</widget>
<widget name="TRANS_DETAIL_ID">77145</widget>
<widget name="GRD_ERRORS" />
</page>
<page>
<widget name="PAGE_ID">WRK_REGISTRATION</widget>
<widget name="TRANS_DETAIL_ID">77147</widget>
<widget name="GRD_ERRORS" />
</page>
<page>
<widget name="PAGE_ID">WRK_PROCESSING</widget>
<widget name="TRANS_DETAIL_ID">77152</widget>
<widget name="GRD_ERRORS" />
</page>
</transaction>
</content>
</cache>
Individual Nodes are being searched using the following:
String xPathToGridErrorNode = "//cache/content/transaction/page/widget[@name='PAGE_ID'][text()='WRK_DNA_REGISTRATION']/../widget[@name='TRANS_DETAIL_ID'][text()='77147']/../widget[@name='GRD_ERRORS_TEMP']";
org.dom4j.Element root = null;
SAXReader reader = new SAXReader();
Document document = reader.read(new BufferedInputStream(new ByteArrayInputStream(xmlToParse.getBytes())));
root = document.getRootElement();
Node gridNode = root.selectSingleNode(xPathToGridErrorNode);
where xmlToParse is a String of xml similar to the excerpt provided above.
The code is trying to obtain the GRD_ERRORS node for the page with the PAGE_ID and TRANS_DETAIL_ID provided in the XPath.
I am seeing an intermittent (~1-2%) failure (the returned node is null) of this selectSingleNode request even though the requested node is in the XML being searched.
I know there are some gotchas associated with using text()= in XPath and was wondering if there is a better way to format the XPath string for this type of search.
From your snippets, there is a mismatch between GRD_ERRORS and GRD_ERRORS_TEMP, and between WRK_REGISTRATION and WRK_DNA_REGISTRATION.
Ignoring that, I would suggest rewriting
//cache/content/transaction/page
/widget[@name='PAGE_ID'][text()='WRK_DNA_REGISTRATION']
/../widget[@name='TRANS_DETAIL_ID'][text()='77147']
/../widget[@name='GRD_ERRORS_TEMP']
as
//cache/content/transaction/page
[widget[@name='PAGE_ID'][text()='WRK_REGISTRATION']]
[widget[@name='TRANS_DETAIL_ID'][text()='77147']]
/widget[@name='GRD_ERRORS']
This is just because it makes the code, in my eyes, easier to read, and expresses what you seem to mean more clearly: “the page element that has children with these conditions, and then take the widget with this @name.” Or, if that is closer to how you think about it,
//cache/content/transaction/page/widget[@name='GRD_ERRORS']
[preceding-sibling::widget[@name='PAGE_ID'][text()='WRK_REGISTRATION']]
[preceding-sibling::widget[@name='TRANS_DETAIL_ID'][text()='77147']]
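The "page that has children with these conditions" reading can also be written out as plain code, which makes each condition explicit. The sketch below is in Python with xml.etree.ElementTree rather than dom4j; ElementTree's XPath subset does not handle nested predicates like the ones above, so the conditions are checked in a loop instead. The helper name find_widget is hypothetical, and CACHE is a shortened copy of the XML in the question.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the XML in the question.
CACHE = """<cache><content><transaction>
  <page>
    <widget name="PAGE_ID">WRK_REGISTRATION</widget>
    <widget name="TRANS_DETAIL_ID">77145</widget>
    <widget name="GRD_ERRORS" />
  </page>
  <page>
    <widget name="PAGE_ID">WRK_REGISTRATION</widget>
    <widget name="TRANS_DETAIL_ID">77147</widget>
    <widget name="GRD_ERRORS" />
  </page>
</transaction></content></cache>"""

def find_widget(root, page_id, trans_id, wanted):
    """Return the `wanted` widget of the first page matching both ids, or None."""
    for page in root.findall("content/transaction/page"):
        # Index this page's widgets by their name attribute.
        widgets = {w.get("name"): w for w in page.findall("widget")}
        page_widget = widgets.get("PAGE_ID")
        trans_widget = widgets.get("TRANS_DETAIL_ID")
        if (page_widget is not None and page_widget.text == page_id
                and trans_widget is not None and trans_widget.text == trans_id):
            return widgets.get(wanted)
    return None

root = ET.fromstring(CACHE)
found = find_widget(root, "WRK_REGISTRATION", "77147", "GRD_ERRORS")
missing = find_widget(root, "WRK_DNA_REGISTRATION", "77147", "GRD_ERRORS")
```

With the mismatched names from the original XPath (WRK_DNA_REGISTRATION), the lookup returns None, which is the same symptom as a null result from selectSingleNode.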

Can't get XPathSelectElements to work with XElement

I am creating an in-memory Xml tree using XElement. Here is a sample of my xml:
<Curve>
<Function>createCurve</Function>
<Parameters>
<Input>
<BaseCurve>
<CurveType Type="String">16fSmoothCurve</CurveType>
<Ccy Type="String">USD</Ccy>
<Tenors>
<Item Type="String">1M</Item>
<Item Type="String">3M</Item>
<Item Type="String">1U</Item>
<Item Type="String">Z1</Item>
</Tenors>
<Rates>
<Item Type="String">.02123</Item>
<Item Type="String">.02214</Item>
<Item Type="String">.021234</Item>
<Item Type="String">.02674</Item>
</Rates>
</BaseCurve>
</Input>
</Parameters>
</Curve>
I am creating the xml by chaining together XElements. For example,
var root = new XElement("Curve",
new XElement("Function", "createCurve"),
new XElement("Parameters"), etc);
I would then like to query the XElement via XPath. For example,
var tenors = root.XPathSelectElements("//Tenors/Item");
var rates = root.XPathSelectElements("//Rates/Item");
I can successfully select a single element, for example,
var firstTenor = root.XPathSelectElement("//Tenors/Item");
var firstRate = root.XPathSelectElement("//Rates/Item");
However, trying to select multiple elements gives me 0 results.
I've tried creating an XDocument and querying off of that, but I get the same results. I've also tried adding an XDeclaration to the beginning of the tree, but no luck.
Why can I not query multiple elements from my XElement tree?
Thanks!
Drew
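Use XmlNodeList (note that SelectNodes is a method of XmlNode, so root must be an XmlDocument or XmlElement rather than an XElement):
XmlNodeList nodesXml = root.SelectNodes("//Tenors/Item");
foreach (XmlNode item in nodesXml)
{
    var tenor = item.InnerText;
}
That's what I do, and it works perfectly.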

Resources