Issue when selecting a node with a specific name - XPath

I am trying to select the node Prp[@name='node name'] which has a parent named item20, using the XPath expression //Prp[@name='node name' and ../../../*[@name='item20']], but this works only if my file contains only this part of the XML:
<Node name="item20">
<Node name="config">
<Node name="runmodeparams">
<Node name="simple">
<Prp name="filename" type="S" value="p"/>
<Prp name="filepath" type="S" value="r"/>
</Node>
<Prp name="activerunmode" type="S" value="Simple"/>
</Node>
<Prp name="node name" type="S" value="lastversion"/>
</Node>
</Node>
If it also contains another part of the XML file like the following one, then XPath returns an empty result.
<Node name="item20">
<Node name="config">
<Node name="runmodeparams">
<Node name="simple">
<Prp name="filename" type="S" value="p"/>
<Prp name="filepath" type="S" value="r"/>
</Node>
<Prp name="activerunmode" type="S" value="Simple"/>
</Node>
<Prp name="node name" type="S" value="lastversion"/>
</Node>
</Node>
<Node name="item21">
<Node name="config">
<Node name="runmodeparams">
<Node name="simple">
<Prp name="filename" type="S" value="p"/>
<Prp name="filepath" type="S" value="r"/>
</Node>
<Prp name="activerunmode" type="S" value="Simple"/>
</Node>
<Prp name="node name" type="S" value="lastversion"/>
</Node>
</Node>
How can I properly select the node?

The second XML snippet you gave is not well-formed XML, as it contains two root elements. If this really is your full XML input, you should fix it if possible, or at least wrap it in a single root element, and try to get an error message out of your XPath engine.
I wrapped it in another element and your XPath expression did run, but it probably didn't return the expected result: the "node name" elements of both item20 and item21 are returned, because you're stepping out too far. From the Prp element, ../../.. reaches the wrapper element, and the wrapper has a child named item20 no matter which item you started in.
Instead, check for "item20" in a predicate while stepping down the XML tree:
//Node[@name='item20']//Prp[@name='node name']
This not only restricts the result to the node you're looking for, but should also be faster in most cases.
If performance really matters and the <Prp/> element you're looking for is always at the same position, try to avoid the descendant-or-self steps (//) and provide a full, distinct path. In the sample above the element sits inside the config node, so with a single wrapper element as root it would be
/*/Node[@name='item20']/Node[@name='config']/Prp[@name='node name']
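As a rough illustration of filtering on item20 while stepping down, here is a minimal Python sketch using the standard library's ElementTree, which supports only a subset of XPath. The <root> wrapper and the two distinct value attributes are assumptions added so the matches can be told apart; they are not part of the original file.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper <root>; the values "lastversion-20" and
# "lastversion-21" are made up so the two Prp elements differ.
WRAPPED = """
<root>
  <Node name="item20">
    <Node name="config">
      <Prp name="node name" type="S" value="lastversion-20"/>
    </Node>
  </Node>
  <Node name="item21">
    <Node name="config">
      <Prp name="node name" type="S" value="lastversion-21"/>
    </Node>
  </Node>
</root>
"""

root = ET.fromstring(WRAPPED)

# Restrict to item20 first, then search only its descendants.
matches = root.findall(".//Node[@name='item20']//Prp[@name='node name']")
values = [prp.get("value") for prp in matches]
print(values)
```

Only the Prp below item20 should be selected here, whereas an expression that climbs up to the wrapper and tests its children would match under both items.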

Related

Grouping in XSLT 2.0 (grouping by text)

I have a problem figuring out this grouping in xslt:
The initial information:
<Application>
<ApplicationItem LayoutPath="Attachments.Package.Attachment[bfd0b74d-2888-49d9-a986-df807f08ad8a].UniqueID" Value="bfd0b74d-2888-49d9-a986-df807f08ad8a" />
<ApplicationItem LayoutPath="Attachments.Package.Attachment[bfd0b74d-2888-49d9-a986-df807f08ad8a].Filename" Value="Document 1 Test" />
<ApplicationItem LayoutPath="Attachments.Package.Attachment[bfd0b74d-2888-49d9-a986-df807f08ad8a].URI" Value="https/.test.pdf" />
<ApplicationItem LayoutPath="Attachments.Package.Attachment[bfd0b74d-2888-49d9-a986-df807f08ad8b].UniqueID" Value="bfd0b74d-2888-49d9-a986-df807f08ad8b" />
<ApplicationItem LayoutPath="Attachments.Package.Attachment[bfd0b74d-2888-49d9-a986-df807f08ad8b].Filename" Value="Document 2 Test" />
<ApplicationItem LayoutPath="Attachments.Package.Attachment[bfd0b74d-2888-49d9-a986-df807f08ad8b].URI" Value="google.com" />
</Application>
The expected result:
<Package>
<Attachment UniqueID="bfd0b74d-2888-49d9-a986-df807f08ad8a"
Filename="Document 1 Test"
URI="https/.test.pdf"/>
<Attachment UniqueID="bfd0b74d-2888-49d9-a986-df807f08ad8b"
Filename="Document 2 Test"
URI="google.com"/>
</Package>
My code:
I've done the grouping by using the id from the square brackets.
<xsl:for-each-group select="ApplicationItem[contains(@LayoutPath,'Attachments.Package.Attachment')]" group-by="substring-before(substring-after(@LayoutPath, 'Attachments.Package.Attachment['), ']')">
<Attachment>
<xsl:for-each select="current-group()">
<xsl:attribute name="UniqueID" select="current-grouping-key()"/>
<xsl:attribute name="Filename" select=".[contains(@LayoutPath,'Filename')]/@Value"/>
<xsl:attribute name="URI" select=".[contains(@LayoutPath,'URI')]/@Value"/>
</xsl:for-each>
</Attachment>
</xsl:for-each-group>
My results:
<Package>
<Attachment UniqueID="bfd0b74d-2888-49d9-a986-df807f08ad8a"
Filename=""
URI="https/.test.pdf"/>
<Attachment UniqueID="bfd0b74d-2888-49d9-a986-df807f08ad8b"
Filename=""
URI="google.com"/>
</Package>
What do I need to change in the code to make the grouping work? For now it takes only the last ApplicationItem of each group of @LayoutPath values.
I think the problem is with the grouping, but I don't know how to fix it.
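Inside the xsl:for-each, every item of the group writes all three attributes again, and a later attribute with the same name replaces the earlier one, so only the values computed for the last item survive. Remove the <xsl:for-each select="current-group()"> (and its closing tag) and change
<xsl:attribute name="Filename" select=".[contains(@LayoutPath,'Filename')]/@Value"/>
<xsl:attribute name="URI" select=".[contains(@LayoutPath,'URI')]/@Value"/>
to
<xsl:attribute name="Filename" select="current-group()[contains(@LayoutPath,'Filename')]/@Value"/>
<xsl:attribute name="URI" select="current-group()[contains(@LayoutPath,'URI')]/@Value"/>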
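For readers who want to see the grouping logic outside XSLT, the following is a minimal Python sketch of the same idea, not the XSLT solution itself. It groups by the id between the square brackets and picks each attribute from the whole group. The names LAYOUT and build_package are hypothetical helpers introduced for this illustration.

```python
import re
import xml.etree.ElementTree as ET

SOURCE = """
<Application>
  <ApplicationItem LayoutPath="Attachments.Package.Attachment[bfd0b74d-2888-49d9-a986-df807f08ad8a].UniqueID" Value="bfd0b74d-2888-49d9-a986-df807f08ad8a"/>
  <ApplicationItem LayoutPath="Attachments.Package.Attachment[bfd0b74d-2888-49d9-a986-df807f08ad8a].Filename" Value="Document 1 Test"/>
  <ApplicationItem LayoutPath="Attachments.Package.Attachment[bfd0b74d-2888-49d9-a986-df807f08ad8a].URI" Value="https/.test.pdf"/>
  <ApplicationItem LayoutPath="Attachments.Package.Attachment[bfd0b74d-2888-49d9-a986-df807f08ad8b].UniqueID" Value="bfd0b74d-2888-49d9-a986-df807f08ad8b"/>
  <ApplicationItem LayoutPath="Attachments.Package.Attachment[bfd0b74d-2888-49d9-a986-df807f08ad8b].Filename" Value="Document 2 Test"/>
  <ApplicationItem LayoutPath="Attachments.Package.Attachment[bfd0b74d-2888-49d9-a986-df807f08ad8b].URI" Value="google.com"/>
</Application>
"""

# Captures the id between the square brackets and the trailing field name.
LAYOUT = re.compile(r"^Attachments\.Package\.Attachment\[([^\]]+)\]\.(\w+)$")


def build_package(application):
    """Group ApplicationItem elements by bracketed id (hypothetical helper)."""
    groups = {}  # id -> {field name: value}, kept in first-seen order
    for item in application.findall("ApplicationItem"):
        match = LAYOUT.match(item.get("LayoutPath", ""))
        if match:
            key, field = match.groups()
            groups.setdefault(key, {})[field] = item.get("Value")

    package = ET.Element("Package")
    for key, fields in groups.items():
        attachment = ET.SubElement(package, "Attachment", UniqueID=key)
        # Each attribute is looked up in the whole group, not in a single item.
        for name in ("Filename", "URI"):
            if name in fields:
                attachment.set(name, fields[name])
    return package


package = build_package(ET.fromstring(SOURCE))
print(ET.tostring(package, encoding="unicode"))
```

The point mirrored from the answer is that each attribute value is chosen from all items of the group at once, instead of being rewritten once per item.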

How to navigate XML with Augeas

I have a very strange XML file that I need to update using Augeas.
<root>
<node name="Client">
<node name="Attributes">
<info>
<test>
<entry><key>colour</key><value type="string">blue</value></entry>
</test>
</info>
</node>
</node>
<node name="Network">
<node name="Server">
<info>
<test>
<entry><key>transport</key><value type="string">internet</value></entry>
<entry><key>ipAddr</key><value type="string">125.125.125.142</value></entry>
<entry><key>portNo</key><value type="string">1234</value></entry>
<entry><key>protocolType</key><value type="string">tcp</value></entry>
</test>
</info>
</node>
</node>
</root>
I need to update the element "value" which is just after the element "key" which contains the text ipAddr.
Based on your description of the node you want to update, here's a suggestion:
set /files/path/to/your/file.xml//entry[key/#text="ipAddr"]/value/#text "255.255.255.0"
This selects the entry node at any level in the file, which has a key/#text subnode with value ipAddr and then it updates its value/#text subnode to have value 255.255.255.0.
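To make the selection logic concrete without an Augeas installation, here is a minimal Python sketch with the standard library's ElementTree. It is an analogue, not Augeas: the predicate [key='ipAddr'] plays the role of [key/#text="ipAddr"], and the document is a shortened copy of the sample above.

```python
import xml.etree.ElementTree as ET

DOC = """
<root>
  <node name="Network">
    <node name="Server">
      <info>
        <test>
          <entry><key>transport</key><value type="string">internet</value></entry>
          <entry><key>ipAddr</key><value type="string">125.125.125.142</value></entry>
        </test>
      </info>
    </node>
  </node>
</root>
"""

root = ET.fromstring(DOC)

# [key='ipAddr'] keeps only <entry> elements whose <key> child has that text;
# the trailing /value step then selects the sibling <value> element.
for value in root.findall(".//entry[key='ipAddr']/value"):
    value.text = "255.255.255.0"

updated = {e.findtext("key"): e.findtext("value") for e in root.iter("entry")}
print(updated)
```

Only the entry whose key is ipAddr is changed; the other entries keep their values.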

Trouble fetching nodes

I'm new to XML/Nokogiri. I'm trying to fetch all the nodes with a certain name from an XML document someone else generated. The document looks like:
<taxonomy>
<taxonomy_name>World</taxonomy_name>
<node atlas_node_id = "val">
<node_name></node_name>
<node atlas_node_id = "val">
<node_name></node_name>
<node atlas_node_id = "val">
<node_name></node_name>
</node>
<node atlas_node_id = "val">
<node_name></node_name>
</node>
</node>
<node atlas_node_id = "val">
<node_name></node_name>
</node>
<node atlas_node_id = "val">
<node_name></node_name>
</node>
</node>
</taxonomy>
I want to pull ALL the nodes with the attribute atlas_node_id. In my build_files method I have the following line:
destinations = tax_file.xpath("//node")
where tax_file is previously set to point to the XML file.
The above returns what seems like ALL the nodes in the file and if I try to set destinations to tax_file.xpath("//node_name/node") then I get an empty NodeSet. Is there some way I can pull all the nodes with the attribute atlas_node_id?
I glanced through "Searching a XML/HTML Document" but didn't really see anything that could help. Am I missing something really obvious?
Update
After trying the solutions suggested by haradwaith and Alexey Shein - both solutions seem to fetch all the nodes as one large node? Testing in irb:
destinations = tax_file.xpath("//node[@atlas_node_id]") (OR)
destinations = tax_file.css('[atlas_node_id]')
d = destinations[0]
d.content
>> \n Africa\n \n South Africa\n \n Cape Town\n \n Table Mountain National Park\n \n \n \n Free State\n \n Bloemfontein\n \n \n \n Gauteng\n \n Johannesburg\n \n \n Pretoria\n \n \n \n KwaZulu-Natal\n \n Durban\n \n \n Pietermaritzburg\n \n \n \n Mpumalanga\n \n Kruger National Park\n \n \n \n The Drakensberg\n \n Royal Natal National Park\n \n \n \n The Garden Route\n \n Oudtshoorn\n \n \n Tsitsikamma Coastal National Park\n \n \n \n\nSudan\n\nEastern Sudan\n\nPort Sudan\n\n\n\nKhartoum\n\n\n\nSwaziland\n\n
Where I would have expected to see just 'Africa'. Any ideas as to why this is happening?
Just use the [] CSS selector:
xml = <<EOD
<taxonomy>
<taxonomy_name>World</taxonomy_name>
<node atlas_node_id = "val">
<node_name>Africa</node_name>
<node atlas_node_id = "val">
<node_name>Capetown</node_name>
</node>
</node>
</taxonomy>
EOD
tax_file = Nokogiri::XML(xml)
nodes = tax_file.css('[atlas_node_id] > node_name')
p nodes.first.text # => "Africa"
You can read a short introduction to CSS selectors on the MDN page.
Oh, it seems you didn't need the nodes with the attribute atlas_node_id themselves, but their <node_name> children.
What the code above actually says is: find all tags that have an attribute named "atlas_node_id" and get all their immediate (i.e. one level deep) children with the tag "node_name".
You can find an explanation of the XPath 1.0 syntax in the documentation.
To get all the nodes with an attribute atlas_node_id, you can do:
tax_file.xpath("//node[@atlas_node_id]")
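The behaviour described in the question's update can be reproduced outside Nokogiri. The following minimal Python sketch, using the standard library's ElementTree as a stand-in, shows that the attribute filter does return separate nodes, and that the long string comes from reading the text of a node together with all of its descendants. The shortened taxonomy is an assumption based on the answer's sample.

```python
import xml.etree.ElementTree as ET

TAXONOMY = """
<taxonomy>
  <taxonomy_name>World</taxonomy_name>
  <node atlas_node_id="val">
    <node_name>Africa</node_name>
    <node atlas_node_id="val">
      <node_name>Capetown</node_name>
    </node>
  </node>
</taxonomy>
"""

root = ET.fromstring(TAXONOMY)

# Every <node> that carries an atlas_node_id attribute, in document order.
nodes = root.findall(".//node[@atlas_node_id]")
first = nodes[0]

# Text of the node and all of its descendants, comparable to d.content.
all_text = "".join(first.itertext()).split()

# Text of the node's own <node_name> child only.
own_name = first.findtext("node_name")

print(len(nodes), all_text, own_name)
```

The first match is a single node, but it contains the nested nodes, so its full text content includes their names as well; asking for the direct node_name child yields just "Africa".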

XSLT - Sort by a custom string set

I have an xml as follows
<feed>
<entry>
<id>4</id>
<updated>2012-11-18T16:55:54Z</updated>
<title>ASSIGNED</title>
</entry>
<entry>
<id>3</id>
<updated>2011-01-16T16:55:54Z</updated>
<title>ASSIGNED</title>
</entry>
<entry>
<id>2</id>
<updated>2014-12-01T16:55:54Z</updated>
<title>EXPIRED</title>
</entry>
<entry>
<id>1</id>
<updated>2013-01-12T16:55:54Z</updated>
<title>COMPLETED</title>
</entry>
<entry>
<id>1</id>
<updated>2012-01-09T16:55:54Z</updated>
<title>ASSIGNED</title>
</entry>
<entry>
<id>1</id>
<updated>2011-04-18T16:55:54Z</updated>
<title>COMPLETED</title>
</entry>
</feed>
I want to sort with ASSIGNED first, followed by EXPIRED, and then COMPLETED.
If there is more than one entry in any of these categories, I would like to sort those by the updated value, descending.
I can sort by updated descending using xsl:sort, but how do I sort based on a set of strings {ASSIGNED, EXPIRED, COMPLETED} in a given order?
I appreciate your response!
You can use translate() in the xsl:sort select to map the initial letters of "ASSIGNED", "EXPIRED", and "COMPLETED" to "1", "2", and "3". translate() replaces every occurrence of those letters, not only the first one, but since the first characters of your strings are unique, the first character alone decides the order; it would be harder if two strings started with an "A".
The following example writes a hardcoded <feed> element (as the template match itself removes it) and uses an identity transform for all other elements.
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/feed">
<feed>
<xsl:apply-templates select="entry">
<xsl:sort select="translate(title, 'AaEeCc', '112233')" />
<xsl:sort select="updated" order="descending" />
</xsl:apply-templates>
</feed>
</xsl:template>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
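The two-level ordering can also be checked outside XSLT. Below is a minimal Python sketch, not the XSLT solution, that ranks the titles through an explicit lookup table instead of translate() and sorts ties by updated, newest first. The RANK table is an assumption introduced for this illustration; the entries are copied from the question.

```python
import xml.etree.ElementTree as ET

FEED = """
<feed>
  <entry><id>4</id><updated>2012-11-18T16:55:54Z</updated><title>ASSIGNED</title></entry>
  <entry><id>3</id><updated>2011-01-16T16:55:54Z</updated><title>ASSIGNED</title></entry>
  <entry><id>2</id><updated>2014-12-01T16:55:54Z</updated><title>EXPIRED</title></entry>
  <entry><id>1</id><updated>2013-01-12T16:55:54Z</updated><title>COMPLETED</title></entry>
  <entry><id>1</id><updated>2012-01-09T16:55:54Z</updated><title>ASSIGNED</title></entry>
  <entry><id>1</id><updated>2011-04-18T16:55:54Z</updated><title>COMPLETED</title></entry>
</feed>
"""

# Hypothetical rank table: lower numbers sort first.
RANK = {"ASSIGNED": 0, "EXPIRED": 1, "COMPLETED": 2}

feed = ET.fromstring(FEED)
entries = list(feed)

# Python's sort is stable, so sort by the secondary key first
# (ISO timestamps compare correctly as strings), then by the primary key.
entries.sort(key=lambda e: e.findtext("updated"), reverse=True)
entries.sort(key=lambda e: RANK[e.findtext("title")])

ordered = [(e.findtext("title"), e.findtext("updated")[:10]) for e in entries]
for row in ordered:
    print(row)
```

An explicit rank table keeps working when two status names share a first letter, which is the case where the translate() trick becomes awkward.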

Searching an XML and getting a subset of the nodes as an XML

Given a search term, how to search the attributes of nodes in an XML and return the XML which contains only those nodes that match the term along with their parents all the way tracing to the root node.
Here is an example of the input XML:
<root>
<node name = "Amaths">
<node name = "Bangles"/>
</node>
<node name = "C">
<node name = "Dangles">
<node name = "E">
<node name = "Fangles"/>
</node>
</node>
<node name = "Gdecimals" />
</node>
<node name = "Hnumbers"/>
<node name = "Iangles"/>
</root>
The output I'm looking for, given the search term "angles":
<root>
<node name = "Amaths">
<node name = "Bangles"/>
</node>
<node name = "C">
<node name = "Dangles">
<node name = "E">
<node name = "Fangles"/>
</node>
</node>
</node>
<node name = "Iangles"/>
</root>
The XPath that I use to search the XML is "//*[contains(@name,'angles')]"
I'm using Nokogiri in Ruby to search the XML, which gives me a NodeSet of all nodes that match the term. I cannot figure out how to reconstruct the XML from that set of nodes.
Thanks!
EDIT: Fixed the example should have been . Thanks Dimitre.
EDIT 2: Fixed the xml again for well-formedness.
First, do note that the wanted output as presented is not well-formed: the following element has no end tag later in the document:
<node name = "C">
The result of evaluating an XPath expression can be a set of nodes from the XML document, but these nodes can't be altered by XPath.
This XPath expression selects the "nodes that match the term along with their parents all the way tracing to the root node":
//*[contains(@name,'angles') and not(node())]/ancestor::*
However, the nodes are not changed and they still contain all their children, meaning that the complete subtree rooted in root is still the subtree of root in the returned result.
In case you want to obtain a new document (set of nodes) with different structure than the original XML document, you have to use another language that is hosting XPath. There are many such languages, such as XSLT, XQuery and any language with an XML DOM implementation.
Here is an XSLT transformation, producing the wanted result:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="*[not(descendant-or-self::*[contains(@name, 'angles')])]"/>
</xsl:stylesheet>
When this transformation is applied to the provided XML document (corrected to be well-formed):
<root>
<node name = "Amaths">
<node name = "Bangles"/>
</node>
<node name = "C">
<node name = "Dangles">
<node name = "E">
<node name = "Fangles"/>
</node>
<node name = "Gdecimals" />
</node>
</node>
<node name = "Hnumbers"/>
<node name = "Iangles"/>
</root>
the wanted (correct) result is produced:
<root>
<node name="Amaths">
<node name="Bangles"/>
</node>
<node name="C">
<node name="Dangles">
<node name="E">
<node name="Fangles"/>
</node>
</node>
</node>
<node name="Iangles"/>
</root>
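The same pruning idea can be sketched with any DOM-style API. Here is a minimal Python version using the standard library's ElementTree; it is an illustration of the technique, not the XSLT above. The function name prune is hypothetical; it removes every element that neither matches the term nor has a matching descendant.

```python
import xml.etree.ElementTree as ET

DOC = """
<root>
  <node name="Amaths">
    <node name="Bangles"/>
  </node>
  <node name="C">
    <node name="Dangles">
      <node name="E">
        <node name="Fangles"/>
      </node>
      <node name="Gdecimals"/>
    </node>
  </node>
  <node name="Hnumbers"/>
  <node name="Iangles"/>
</root>
"""


def prune(element, term):
    """Drop children without a match in their subtree; report whether to keep."""
    for child in list(element):  # copy, because children are removed while looping
        if not prune(child, term):
            element.remove(child)
    # Keep the element if it matches itself or still has surviving children.
    return term in element.get("name", "") or len(element) > 0


root = ET.fromstring(DOC)
prune(root, "angles")
kept = [node.get("name") for node in root.iter("node")]
print(kept)
```

Working bottom-up means each element is judged after its children have been pruned, which corresponds to the descendant-or-self test in the empty template.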
