NSXMLDocument, nodesForXPath with namespaces - cocoa

I want to get a set of elements from an XML file, but as soon as the elements involve namespaces, the query fails.
This is a fragment of the XML file:
<gpx xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
version="1.0" creator="Groundspeak Pocket Query"
xsi:schemaLocation="http://www.topografix.com/GPX/1/0 http://www.topografix.com/GPX/1/0/gpx.xsd http://www.groundspeak.com/cache/1/0 http://www.groundspeak.com/cache/1/0/cache.xsd"
xmlns="http://www.topografix.com/GPX/1/0">
<name>My Finds Pocket Query</name>
<desc>Geocache file generated by Groundspeak</desc>
<author>Groundspeak</author>
<email>contact@groundspeak.com</email>
<time>2010-09-15T16:18:55.9846906Z</time>
<keywords>cache, geocache, groundspeak</keywords>
<bounds minlat="41.89687" minlon="5.561883" maxlat="70.669967" maxlon="25.74735" />
<wpt lat="62.244933" lon="25.74735">
<time>2010-01-11T08:00:00Z</time>
<name>GC22W1T</name>
<desc>Kadonneet ja karanneet by ooti, Traditional Cache (1.5/2)</desc>
<url>http://www.geocaching.com/seek/cache_details.aspx?guid=4af28fe9-401b-44df-b058-5fd5399fc083</url>
<urlname>Kadonneet ja karanneet</urlname>
<sym>Geocache Found</sym>
<type>Geocache|Traditional Cache</type>
<groundspeak:cache id="1521507" available="True" archived="False" xmlns:groundspeak="http://www.groundspeak.com/cache/1/0">
<groundspeak:name>Kadonneet ja karanneet</groundspeak:name>
<groundspeak:placed_by>ooti</groundspeak:placed_by>
<groundspeak:owner id="816431">ooti</groundspeak:owner>
<groundspeak:type>Traditional Cache</groundspeak:type>
<groundspeak:container>Small</groundspeak:container>
<groundspeak:difficulty>1.5</groundspeak:difficulty>
<groundspeak:terrain>2</groundspeak:terrain>
<groundspeak:country>Finland</groundspeak:country>
<groundspeak:state>
</groundspeak:state>
<groundspeak:short_description html="True">
</groundspeak:short_description>
<groundspeak:encoded_hints>
</groundspeak:encoded_hints>
<groundspeak:travelbugs />
</groundspeak:cache>
</wpt>
</gpx>
I want to get all the groundspeak:cache elements, but neither //groundspeak:cache nor //cache seems to return anything.
NSArray *caches = [self.xml nodesForXPath:@"//cache" error:&error];
Any clue?
Edit: Is there any Cocoa-based software out there where I can load my XML and test different XPath expressions? I'm quite new to Objective-C and Cocoa, so it would be nice to confirm that it really is my XPath that is wrong.

//cache means: a descendant element named cache that is in no namespace (the empty namespace).
Your groundspeak:cache element is in the namespace with URI http://www.groundspeak.com/cache/1/0.
So, if you can't declare a namespace-prefix binding (I think you can't with Cocoa...), you could use this XPath expression:
//*[namespace-uri()='http://www.groundspeak.com/cache/1/0' and
local-name()='cache']
If you don't want to be so strict about the namespace:
//*[local-name()='cache']
But this last one is bad practice, because you could end up selecting the wrong nodes, and because any tool that handles XML should support namespaces.
As proof, this stylesheet:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:copy-of select="//*[namespace-uri() =
'http://www.groundspeak.com/cache/1/0' and
local-name() = 'cache']"/>
</xsl:template>
</xsl:stylesheet>
Output:
<groundspeak:cache id="1521507" available="True" archived="False"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns="http://www.topografix.com/GPX/1/0"
xmlns:groundspeak="http://www.groundspeak.com/cache/1/0">
<groundspeak:name>Kadonneet ja karanneet</groundspeak:name>
<groundspeak:placed_by>ooti</groundspeak:placed_by>
<groundspeak:owner id="816431">ooti</groundspeak:owner>
<groundspeak:type>Traditional Cache</groundspeak:type>
<groundspeak:container>Small</groundspeak:container>
<groundspeak:difficulty>1.5</groundspeak:difficulty>
<groundspeak:terrain>2</groundspeak:terrain>
<groundspeak:country>Finland</groundspeak:country>
<groundspeak:state></groundspeak:state>
<groundspeak:short_description html="True"></groundspeak:short_description>
<groundspeak:encoded_hints></groundspeak:encoded_hints>
<groundspeak:travelbugs />
</groundspeak:cache>
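For comparison, the same three kinds of query can be tried outside Cocoa. This is a minimal sketch using Python's standard xml.etree.ElementTree (a stand-in, not NSXMLDocument) on a trimmed copy of the question's GPX file. ElementTree has no namespace-uri() or local-name() functions, so Clark notation {uri}local plays the role of the strict predicate and the {*} wildcard (Python 3.8+) plays the role of the local-name-only one.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's GPX document: the default namespace is
# GPX 1.0 and <cache> lives in the Groundspeak namespace.
GPX = """<gpx xmlns="http://www.topografix.com/GPX/1/0">
  <wpt lat="62.244933" lon="25.74735">
    <name>GC22W1T</name>
    <groundspeak:cache id="1521507"
        xmlns:groundspeak="http://www.groundspeak.com/cache/1/0">
      <groundspeak:name>Kadonneet ja karanneet</groundspeak:name>
    </groundspeak:cache>
  </wpt>
</gpx>"""

GS_URI = "http://www.groundspeak.com/cache/1/0"
root = ET.fromstring(GPX)

# Like //cache: only matches <cache> in no namespace, so nothing is found.
no_ns = root.findall(".//cache")

# Like //*[namespace-uri()=GS_URI and local-name()='cache']:
# {uri}local pins both the namespace URI and the local name.
strict = root.findall(f".//{{{GS_URI}}}cache")

# Like //*[local-name()='cache']: {*} ignores the namespace entirely,
# so it could also pick up unrelated <cache> elements.
loose = root.findall(".//{*}cache")

print(len(no_ns), len(strict), len(loose))  # 0 1 1
```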

You need to add a new namespace declaration to the root element of your document, defining a prefix that you can use when querying the children:
NSXMLDocument *xmldoc = ...
NSError *error = nil;
// Bind the prefix "mns" to the Groundspeak namespace URI (the prefix name itself is arbitrary)
NSXMLNode *namespaceNode = [NSXMLNode namespaceWithName:@"mns" stringValue:@"http://www.groundspeak.com/cache/1/0"];
[xmldoc.rootElement addNamespace:namespaceNode];
Then, when you query later, you can use that prefix to refer to the namespace:
NSArray *caches = [xmldoc.rootElement nodesForXPath:@"//mns:cache" error:&error];
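The point is that the prefix used in the query is only a local binding: it does not have to match the document's own groundspeak prefix, but the URI must be identical. Here is a minimal sketch of the same idea in Python's xml.etree.ElementTree (not the Cocoa API), where a prefix-to-URI dict is passed along with the query. The prefixes gpx and mns are arbitrary choices. Elements in the default GPX namespace, such as wpt, need a prefix in the query as well.

```python
import xml.etree.ElementTree as ET

DOC = """<gpx xmlns="http://www.topografix.com/GPX/1/0">
  <wpt>
    <groundspeak:cache id="1521507"
        xmlns:groundspeak="http://www.groundspeak.com/cache/1/0">
      <groundspeak:owner id="816431">ooti</groundspeak:owner>
    </groundspeak:cache>
  </wpt>
</gpx>"""

# Query-side prefixes: "mns" deliberately differs from the document's
# "groundspeak" prefix; only the URIs have to match.
NS = {
    "gpx": "http://www.topografix.com/GPX/1/0",
    "mns": "http://www.groundspeak.com/cache/1/0",
}

root = ET.fromstring(DOC)
caches = root.findall("gpx:wpt/mns:cache", NS)
owners = [c.findtext("mns:owner", namespaces=NS) for c in caches]
print(owners)  # ['ooti']
```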

//groundspeak:cache should work, but only if the groundspeak prefix is bound to its namespace URI (http://www.groundspeak.com/cache/1/0) for the query.

Related

Ruby nokogiri attribute selector in XML file

This is the XML file:
<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"
xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/">
<SOAP-ENV:Body>
<ns1:putResponse
xmlns:ns1="urn:DmsManagerClient">
<result xsi:type="xsd:string">
<?xml version="1.0" encoding="ISO-8859-1"?>
<MESSAGE ID="11c73b9e-687c-4300-baba-b743c26f7c83" TYPE="CUSDMS">
<DELIVERY>
<FROM>
<SENDER>0072000</SENDER>
<SERVICE>eService</SERVICE>
<DATE>2019-03-08T12:27:25</DATE>
</FROM>
<TO>
<DEALER DEALERCODE="0072000" MARKETCODE="1000"/>
</TO>
</DELIVERY>
<CONTENT>
<dms:ComplexResponse ErrorCode="430" ErrorDescription="null : PrivacyUE Mancante" Return="false"
xmlns:dms="http://dmsmanagerservice">
<dms:Element Name="DMSVERSION">2.7</dms:Element>
</dms:ComplexResponse>
</CONTENT>
</MESSAGE>
</result>
</ns1:putResponse>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
I am coding in Ruby, and I used Nokogiri and its xpath method to extract the CONTENT element of the file.
This is the code:
require 'nokogiri'
def extrapolate_error(xml)
  doc = Nokogiri::XML(File.read(xml))
  doc.xpath('//CONTENT')
end
And this is the result:
[#<Nokogiri::XML::Element:0x1c5ba78 name="CONTENT" children=[
#<Nokogiri::XML::Text:0x1c5b940 "\n">,
#<Nokogiri::XML::Element:0x1c5b8bc name="ComplexResponse" namespace=#<Nokogiri::XML::Namespace:0x1c5b88c prefix="dms" href="http://dmsmanagerservice">
attributes=[
#<Nokogiri::XML::Attr:0x1c5b874 name="ErrorCode" value="430">,
#<Nokogiri::XML::Attr:0x1c5b868 name="ErrorDescription" value="null : PrivacyUE Mancante">,
#<Nokogiri::XML::Attr:0x1c5b85c name="Return" value="false">]
children=[#<Nokogiri::XML::Text:0x1c5b118 "\n">,
#<Nokogiri::XML::Element:0x1c5b094 name="Element" namespace=#<Nokogiri::XML::Namespace:0x1c5b88c prefix="dms" href="http://dmsmanagerservice">
attributes=[#<Nokogiri::XML::Attr:0x1c5b058 name="Name" value="DMSVERSION">]
children=[#<Nokogiri::XML::Text:0x1c5abe4 "2.7">]>,
#<Nokogiri::XML::Text:0x1c5aaac "\n">]>,
#<Nokogiri::XML::Text:0x1c5a974 "\n">]>]
Now I need to go inside it and select some attributes.
Specifically, I need these:
name="ErrorCode" value="430"
name="ErrorDescription" value="null : PrivacyUE Mancante"
I do not know how to proceed. Can you help me?
The following should work for you, assuming the dms namespace is always the same:
doc.xpath('//CONTENT/dms:ComplexResponse', dms: 'http://dmsmanagerservice')
  .xpath('@ErrorCode | @ErrorDescription')
  .each_with_object({}) do |e,obj|
    obj[e.name] = e.text
  end
#=> {"ErrorCode"=>"430", "ErrorDescription"=>"null : PrivacyUE Mancante"}
You already understand how you got to //CONTENT, so from there we use dms:ComplexResponse to navigate deeper. Because this element is namespaced, we have to provide the namespace binding, e.g. dms: 'http://dmsmanagerservice'.
Then we select the attributes we are interested in, @ErrorCode and @ErrorDescription.
In XPath the pipe | is the union operator, so the expression selects both attributes.
Then we just build a Hash using each attribute's name as the key and its text as the value.
XPath Cheatsheet - Useful resource if you need additional reference
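The same selection can be sketched with Python's xml.etree.ElementTree in place of Nokogiri. This sketch assumes that only the MESSAGE part of the response is parsed. The pasted SOAP envelope contains a second XML declaration inside result, and ElementTree's parser rejects that. ElementTree exposes attributes as a dict instead of through an @attr XPath step, so the union becomes a filter over attribute names.

```python
import xml.etree.ElementTree as ET

# Assumption: only the inner MESSAGE document is parsed (trimmed).
MESSAGE = """<MESSAGE TYPE="CUSDMS">
  <CONTENT>
    <dms:ComplexResponse ErrorCode="430"
        ErrorDescription="null : PrivacyUE Mancante" Return="false"
        xmlns:dms="http://dmsmanagerservice">
      <dms:Element Name="DMSVERSION">2.7</dms:Element>
    </dms:ComplexResponse>
  </CONTENT>
</MESSAGE>"""

NS = {"dms": "http://dmsmanagerservice"}
WANTED = ("ErrorCode", "ErrorDescription")

root = ET.fromstring(MESSAGE)
response = root.find("CONTENT/dms:ComplexResponse", NS)

# Counterpart of @ErrorCode | @ErrorDescription plus the name => value Hash.
errors = {name: response.get(name) for name in WANTED if name in response.attrib}
print(errors)
```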
Update
You asked about conditionals, so this is what I would propose:
ndoc = Nokogiri::XML(doc)
namespaces = ndoc.collect_namespaces
response = ndoc.xpath("//CONTENT/dms:ComplexResponse", namespaces)
if response.xpath("self::node()[@ErrorCode != '' and @ErrorDescription != '']").any?
  response.xpath("@ErrorCode | @ErrorDescription")
    .each_with_object({}) do |e,obj|
      obj[e.name] = e.text
    end
else
  response.xpath('dms:Element/@Name | dms:Element/text()', namespaces)
    .each_slice(2)
    .map {|s| s.map(&:text)}.to_h
end
This checks whether there is a non-empty ErrorCode and ErrorDescription. If so, it returns the Hash as originally proposed. If not, it returns all the dms:Element nodes as a Hash, which is {"DMSVERSION"=>"2.7"} in this case.
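Here is a Python sketch of the same two-branch logic, again with xml.etree.ElementTree standing in for Nokogiri. The summarize function and the NO_ERROR payload are hypothetical. They were made up only to exercise the fallback branch, and the response is assumed to be wrapped in a CONTENT element as in the question.

```python
import xml.etree.ElementTree as ET

NS = {"dms": "http://dmsmanagerservice"}

def summarize(content_xml: str) -> dict:
    """Return the two error attributes when both are non-empty; otherwise
    map each dms:Element's Name attribute to its text (hypothetical helper)."""
    response = ET.fromstring(content_xml).find("dms:ComplexResponse", NS)
    code = response.get("ErrorCode", "")
    desc = response.get("ErrorDescription", "")
    if code and desc:
        return {"ErrorCode": code, "ErrorDescription": desc}
    return {el.get("Name"): el.text for el in response.findall("dms:Element", NS)}

WITH_ERROR = """<CONTENT><dms:ComplexResponse xmlns:dms="http://dmsmanagerservice"
  ErrorCode="430" ErrorDescription="null : PrivacyUE Mancante" Return="false">
  <dms:Element Name="DMSVERSION">2.7</dms:Element>
</dms:ComplexResponse></CONTENT>"""

# Hypothetical success payload: same shape, but no error attributes.
NO_ERROR = """<CONTENT><dms:ComplexResponse xmlns:dms="http://dmsmanagerservice"
  Return="true">
  <dms:Element Name="DMSVERSION">2.7</dms:Element>
</dms:ComplexResponse></CONTENT>"""

print(summarize(WITH_ERROR))
print(summarize(NO_ERROR))  # {'DMSVERSION': '2.7'}
```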

How to use following in XPath to get siblings in a tag

I have the following structure. I am trying to build a robust method to extract the FT1_19_0 elements of the FT1_19 tags in the order they appear. However, in my results the elements are rearranged. How can I get my result in the correct order?
//*/FT1_19/FT1_19_0[contains(../FT1_19_2,'I10') and
not(.=../following::FT1_19/FT1_19_0)]
The result (rearranged):
X50.0XXA
M76.891
M17.11
M23.303
<?xml version="1.0" encoding="UTF-8"?>
<root>
<FT1>
<FT1_1>1</FT1_1>
<FT1_4>20180920130000</FT1_4>
<FT1_5>20180924110101</FT1_5>
<FT1_6>CG</FT1_6>
<FT1_7>99203</FT1_7>
<FT1_9/>
<FT1_10>1.00</FT1_10>
<FT1_13>NPI</FT1_13>
<FT1_16>
<FT1_16_1>Gavin, Matthew, MD</FT1_16_1>
<FT1_16_3>22</FT1_16_3>
</FT1_16>
<FT1_19 NO="1">
<FT1_19_0>M76.891</FT1_19_0>
<FT1_19_2>I10</FT1_19_2>
</FT1_19>
<FT1_19 NO="2">
<FT1_19_0>M17.11</FT1_19_0>
<FT1_19_2>I10</FT1_19_2>
</FT1_19>
<FT1_19 NO="3">
<FT1_19_0>M23.303</FT1_19_0>
<FT1_19_2>I10</FT1_19_2>
</FT1_19>
<FT1_19 NO="4">
<FT1_19_0>X50.0XXA</FT1_19_0>
<FT1_19_2>I10</FT1_19_2>
</FT1_19>
</FT1>
</root>
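Use this if you are using Java with Selenium WebDriver (driver is an existing WebDriver instance):
import java.util.List;
import org.openqa.selenium.By;
import org.openqa.selenium.WebElement;
// XPath is case-sensitive, so the names must match the XML exactly.
List<WebElement> list = driver.findElements(By.xpath("//FT1_19/FT1_19_0"));
for (WebElement we : list) {
    System.out.println(we.getText());
}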
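XPath APIs typically return the selected nodes in document order, so any reordering usually comes from later processing, not from the expression itself. Here is a minimal Python sketch (xml.etree.ElementTree, not Selenium) that collects the FT1_19_0 values in the order the FT1_19 elements appear and keeps the question's I10 filter. One design difference: the question's not(. = ../following::...) test keeps the last occurrence of a repeated code, while this sketch keeps the first.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's <FT1> sample (other FT1_* fields omitted).
XML = """<root><FT1>
  <FT1_19 NO="1"><FT1_19_0>M76.891</FT1_19_0><FT1_19_2>I10</FT1_19_2></FT1_19>
  <FT1_19 NO="2"><FT1_19_0>M17.11</FT1_19_0><FT1_19_2>I10</FT1_19_2></FT1_19>
  <FT1_19 NO="3"><FT1_19_0>M23.303</FT1_19_0><FT1_19_2>I10</FT1_19_2></FT1_19>
  <FT1_19 NO="4"><FT1_19_0>X50.0XXA</FT1_19_0><FT1_19_2>I10</FT1_19_2></FT1_19>
</FT1></root>"""

root = ET.fromstring(XML)

codes = []
# iter() walks the tree in document order, so codes come out in the
# order the FT1_19 elements appear in the file.
for ft1_19 in root.iter("FT1_19"):
    if "I10" in (ft1_19.findtext("FT1_19_2") or ""):
        code = ft1_19.findtext("FT1_19_0")
        if code not in codes:  # skip repeats, keeping the first occurrence
            codes.append(code)

print(codes)  # ['M76.891', 'M17.11', 'M23.303', 'X50.0XXA']
```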

Separate XML content from a single XML file using XQuery

I have an XML file which contains multiple XML nodes. I would like to separate two of those nodes and store them in separate variables. How would I write this with XQuery? I have added my XML file below. The root element is division, and Dive and top-song are its two child elements. I want to read the Dive content into one variable and the top-song content into another. Can anyone please help me sort out this issue?
<?xml version="1.0" encoding="UTF-8"?>
<division>
<Dive ID="2"><!-- I want this node in one variable -->
<DiverFName>Joe</DiverFName>
<DiverLName>Diver</DiverLName>
<Number>2</Number>
<Divedate>1998-03-30</Divedate>
<Country ID="1">Bahamas</Country>
<City ID="2">Freeport</City>
<Place ID="2">
<Site>South Pass</Site>
<Lat>24.865062</Lat>
<Lon>-77.871094</Lon>
</Place>
<Divetime>36.00</Divetime>
<Depth Scale="METRIC">5.48</Depth>
<Buddy IDs="2" Names="Tim Diver" />
<Comments>Great dive, saw 5 Caribbean Reef Sharks. Performed compass navigation skills for Scuba Diver certification.</Comments>
<Water>Salt</Water>
<Entry>Boat</Entry>
<Divetype>Research</Divetype>
<Tanktype>Alu</Tanktype>
<Tanksize>11.43</Tanksize>
<PresS>179.26</PresS>
<PresE>82.73</PresE>
<Gas>Air</Gas>
<Weather>Clear</Weather>
<UWCurrent>Medium Current</UWCurrent>
<MarineLife>
<Animal>
<Type>Nurse Shark</Type>
<Abundance>1</Abundance>
<Size>3 ft</Size>
<Description>Dormant on the bottom, not swimming.</Description>
<Image>
<Filename></Filename>
<Path></Path>
<Caption></Caption>
</Image>
</Animal>
<Animal>
<Type>Blue Tang Surgeonfish</Type>
<Abundance>25+</Abundance>
<Size>4 in</Size>
<Description>Blue with white "scalpel" near base</Description>
<Image>
<Filename></Filename>
<Path></Path>
<Caption></Caption>
</Image>
</Animal>
</MarineLife>
</Dive>
<top-song><!-- I want this node in another variable -->
<title >Try Again</title>
<artist >Aaliyah</artist>
<weeks last="2008-06-17">
<week>2008-06-17</week>
</weeks>
<album> The
Album</album>
<released>February 29, 20008</released>
<formats>
<format>CD</format>
<format>12 single</format>
</formats>
<recorded>january2012</recorded>
<genres>
<genre>R&amp;B</genre>
</genres>
<lengths>
<length>4:04</length>
</lengths>
<label>Blackground</label>
<writers>
<writer></writer>
<writer></writer>
</writers>
<producers>
<producer></producer>
</producers>
<descr>
<p>hai hello</p>
</descr>
</top-song>
</division>
It's not clear what you're trying to accomplish at a high level, but you can select those elements with some simple XQuery/XPath:
let $dive := doc('mydoc.xml')/division/Dive
let $top-song := doc('mydoc.xml')/division/top-song
return ($dive, $top-song)
However, just looking at the document it's clear that these two elements are in totally unrelated schemas, and as a general recommendation for MarkLogic, they should probably each be separated before ingestion and inserted as separate documents.
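Outside MarkLogic, the same split can be sketched in Python with xml.etree.ElementTree. The document below is a heavily trimmed copy of the question's division file, and the variable names simply mirror the XQuery ones.

```python
import xml.etree.ElementTree as ET

# Heavily trimmed copy of the question's <division> document.
XML = """<division>
  <Dive ID="2"><DiverFName>Joe</DiverFName><DiverLName>Diver</DiverLName></Dive>
  <top-song><title>Try Again</title><artist>Aaliyah</artist></top-song>
</division>"""

division = ET.fromstring(XML)

# Same paths as the XQuery answer: /division/Dive and /division/top-song.
dive = division.find("Dive")
top_song = division.find("top-song")

print(dive.findtext("DiverFName"))   # Joe
print(top_song.findtext("artist"))   # Aaliyah
# Each element can be serialized on its own, e.g. to store it separately.
print(ET.tostring(top_song, encoding="unicode"))
```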

Unable to findnodes() restricted just to current parent

I'm parsing a simple XML file to create a flat text file from it. The desired outcome is shown below the sample XML. The XML has a sort of header-detail structure (Assembly_Info and Part respectively), with a unique header node followed by any number of detail record nodes, all of which are siblings. After digging into the elements under the header, I can't find a way back 'up' to pick up all the sibling detail nodes.
XML file looks like this:
<?xml version="1.0" standalone="yes" ?>
<Wrapper>
<Record>
<Product>
<prodid>4094</prodid>
</Product>
<Assembly>
<Assembly_Info>
<id>DF-7A</id>
<interface>C</interface>
</Assembly_Info>
<Part>
<status>N/A</status>
<dev_name>0000</dev_name>
</Part>
<Part>
<status>Ready</status>
<dev_name>0455</dev_name>
</Part>
<Part>
<status>Ready</status>
<dev_name>045A</dev_name>
</Part>
</Assembly>
<Assembly>
<Assembly_Info>
<id>DF-7A</id>
<interface>C</interface>
</Assembly_Info>
<Part>
<status>N/A</status>
<dev_name>0002</dev_name>
</Part>
<Part>
<status>Ready</status>
<dev_name>0457</dev_name>
</Part>
</Assembly>
</Record>
</Wrapper>
For each Assembly I need to read the values of the two elements in Assembly_Info, which I do successfully. But I then want to read each of the Part records that belong to that Assembly. The objective is to 'flatten' the file into this:
prodid id interface status dev_name
4094 DF-7A C N/A 0000
4094 DF-7A C Ready 0455
4094 DF-7A C Ready 045A
4094 DF-7A C N/A 0002
4094 DF-7A C Ready 0457
I'm attempting to use findnodes() to do this, as that's about the only tool I thought I understood. Unfortunately, my code reads all of the Part records from the entire file for each Assembly, since the only way I've been able to find the Part nodes is to start at the root. I don't know how to change 'where I am', so to speak, and tell findnodes to begin at the current parent. The code looks like this:
use XML::LibXML;
my $parser = XML::LibXML->new();
my $tree = $parser->parse_file('DEMO.XML');
for my $product ($tree->findnodes('/Wrapper/Record/Product/prodid')) {
    $prodid = $product->textContent();
}
foreach my $assembly ($tree->findnodes('/Wrapper/Record/Assembly')) {
    $assemblies++;
    $parts = 0;
    for my $assembly ($tree->findnodes('/Wrapper/Record/Assembly/Assembly_Info')) {
        $id = $assembly->findvalue('id');
        $interface = $assembly->findvalue('interface');
    }
    foreach my $part ($tree->findnodes('/Wrapper/Record/Assembly/Part')) {
        $parts++;
        $status = $part->findvalue('status');
        $dev_name = $part->findvalue('dev_name');
    }
    print "Assembly No: ", $assemblies, " Parts: ", $parts, "\n";
}
How do I get just the Part nodes for a given Assembly, after I've gone down into Assembly_Info? There is quite a bit I'm not getting, and I think part of the problem may be that I'm thinking of this as 'navigating', or moving a cursor. Examples of XPath path expressions have not helped me.
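Instead of always using $tree as the starting point for the findnodes method, you can use any other node, in particular child nodes, and then use a relative XPath expression. For example:
for my $record ($tree->findnodes('/Wrapper/Record')) {
    my $prodid = $record->findvalue('./Product/prodid');
    for my $assembly ($record->findnodes('./Assembly')) {
        # Relative paths: only this assembly's Assembly_Info and Part children
        my $id        = $assembly->findvalue('./Assembly_Info/id');
        my $interface = $assembly->findvalue('./Assembly_Info/interface');
        for my $part ($assembly->findnodes('./Part')) {
            print join(' ', $prodid, $id, $interface,
                $part->findvalue('./status'), $part->findvalue('./dev_name')), "\n";
        }
    }
}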
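The same relative-context idea can be sketched with Python's xml.etree.ElementTree in place of XML::LibXML. Every find/findall call below starts at the current record, assembly or part, so each loop only sees its own children. The XML is a trimmed copy of the question's DEMO.XML, and the row format follows the desired output.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's DEMO.XML.
XML = """<Wrapper><Record>
  <Product><prodid>4094</prodid></Product>
  <Assembly>
    <Assembly_Info><id>DF-7A</id><interface>C</interface></Assembly_Info>
    <Part><status>N/A</status><dev_name>0000</dev_name></Part>
    <Part><status>Ready</status><dev_name>0455</dev_name></Part>
    <Part><status>Ready</status><dev_name>045A</dev_name></Part>
  </Assembly>
  <Assembly>
    <Assembly_Info><id>DF-7A</id><interface>C</interface></Assembly_Info>
    <Part><status>N/A</status><dev_name>0002</dev_name></Part>
    <Part><status>Ready</status><dev_name>0457</dev_name></Part>
  </Assembly>
</Record></Wrapper>"""

def flatten(xml_text: str) -> list:
    rows = []
    wrapper = ET.fromstring(xml_text)
    for record in wrapper.findall("Record"):
        prodid = record.findtext("Product/prodid")
        # Each path below is relative to the node it is called on, so
        # only this record's assemblies and this assembly's parts are seen.
        for assembly in record.findall("Assembly"):
            ident = assembly.findtext("Assembly_Info/id")
            interface = assembly.findtext("Assembly_Info/interface")
            for part in assembly.findall("Part"):
                rows.append((prodid, ident, interface,
                             part.findtext("status"), part.findtext("dev_name")))
    return rows

print("prodid id interface status dev_name")
for row in flatten(XML):
    print(" ".join(row))
```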

XPath: get distinct nodes using preceding-sibling

I need to get the distinct values of //*/name() without using distinct-values(//*/name()).
I tried this, but it doesn't work:
//*/name()[.!=//preceding-sibling::*]
How can I fix it?
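To get the distinct values:
For the name attribute (XPath 1.0):
/*/*[not(@name = preceding::*/@name)]
For the node name (XPath 2.0 or later, because a function such as name() can only be used as a path step from 2.0 on):
/*/*[not(name() = preceding::*/name())]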
My Sample XML:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<friend1 name="abc"/>
<friend2 name="def"/>
<friend3 name="abc"/>
<friend1 name="abcd"/>
<friend5 name="abcd"/>
<friend6 name="xyz"/>
<friend8 name="789"/>
<friend0 name="pqr"/>
<friend9 name="lmn"/>
<friend2 name="lmn"/>
<friend5 name="123"/>
<friend7 name="456"/>
<friend12 name="789"/>
</root>
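Both expressions keep an element only when no earlier element has the same value. ElementTree does not support the preceding axis, so this Python sketch expresses the same rule with a set of values seen so far. The helper name distinct_in_order is hypothetical, and the sketch runs over the sample XML above.

```python
import xml.etree.ElementTree as ET

XML = """<root>
  <friend1 name="abc"/><friend2 name="def"/><friend3 name="abc"/>
  <friend1 name="abcd"/><friend5 name="abcd"/><friend6 name="xyz"/>
  <friend8 name="789"/><friend0 name="pqr"/><friend9 name="lmn"/>
  <friend2 name="lmn"/><friend5 name="123"/><friend7 name="456"/>
  <friend12 name="789"/>
</root>"""

def distinct_in_order(elements, key):
    """Return key(el) for each element whose key no earlier element had,
    i.e. the not(key = preceding::*/key) test, keeping first occurrences."""
    seen, kept = set(), []
    for el in elements:
        k = key(el)
        if k not in seen:
            seen.add(k)
            kept.append(k)
    return kept

children = list(ET.fromstring(XML))
names = distinct_in_order(children, lambda el: el.get("name"))
tags = distinct_in_order(children, lambda el: el.tag)
print(names)  # ['abc', 'def', 'abcd', 'xyz', '789', 'pqr', 'lmn', '123', '456']
print(tags)
```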
