I have a very strange XML file that I need to update using Augeas.
<root>
<node name="Client">
<node name="Attributes">
<info>
<test>
<entry><key>colour</key><value type="string">blue</value></entry>
</test>
</info>
</node>
</node>
<node name="Network">
<node name="Server">
<info>
<test>
<entry><key>transport</key><value type="string">internet</value></entry>
<entry><key>ipAddr</key><value type="string">125.125.125.142</value></entry>
<entry><key>portNo</key><value type="string">1234</value></entry>
<entry><key>protocolType</key><value type="string">tcp</value></entry>
</test>
</info>
</node>
</node>
</root>
I need to update the "value" element that comes right after the "key" element containing the text ipAddr.
Based on your description of the node you want to update, here's a suggestion:
set /files/path/to/your/file.xml//entry[key/#text="ipAddr"]/value/#text "255.255.255.0"
This selects every entry node, at any depth in the file, that has a key/#text subnode with the value ipAddr, and sets that entry's value/#text subnode to 255.255.255.0.
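For readers who want to check the selection logic outside Augeas, here is a minimal Python sketch of the same idea using only the standard library. It is not how Augeas works internally; the trimmed SAMPLE string and the helper name set_entry_value are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's file; only part of the Network/Server branch is kept.
SAMPLE = """<root>
  <node name="Network">
    <node name="Server">
      <info>
        <test>
          <entry><key>transport</key><value type="string">internet</value></entry>
          <entry><key>ipAddr</key><value type="string">125.125.125.142</value></entry>
        </test>
      </info>
    </node>
  </node>
</root>"""


def set_entry_value(root, key, new_value):
    """Set the <value> text of every <entry> whose <key> text equals `key`.

    Hypothetical helper; returns the number of entries that were updated.
    """
    updated = 0
    for entry in root.iter("entry"):  # any depth, like // in the path expression
        value = entry.find("value")
        if entry.findtext("key") == key and value is not None:
            value.text = new_value
            updated += 1
    return updated


root = ET.fromstring(SAMPLE)
count = set_entry_value(root, "ipAddr", "255.255.255.0")
print(count, root.find(".//entry[key='ipAddr']/value").text)
```

Only the entry whose key is ipAddr is touched; the transport entry keeps its original value.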
I'm new to XML/Nokogiri. I'm trying to fetch all the nodes with a certain name from an XML document someone else generated. The document looks like:
<taxonomy>
<taxonomy_name>World</taxonomy_name>
<node atlas_node_id = "val">
<node_name></node_name>
<node atlas_node_id = "val">
<node_name></node_name>
<node atlas_node_id = "val">
<node_name></node_name>
</node>
<node atlas_node_id = "val">
<node_name></node_name>
</node>
</node>
<node atlas_node_id = "val">
<node_name></node_name>
</node>
<node atlas_node_id = "val">
<node_name></node_name>
</node>
</node>
</taxonomy>
I want to pull ALL the nodes with the attribute atlas_node_id. In my build_files method I have the following line:
destinations = tax_file.xpath("//node")
where tax_file is previously set to point to the XML file.
The above returns what seems like ALL the nodes in the file and if I try to set destinations to tax_file.xpath("//node_name/node") then I get an empty NodeSet. Is there some way I can pull all the nodes with the attribute atlas_node_id?
I glanced through "Searching a XML/HTML Document" but didn't really see anything that could help. Am I missing something really obvious?
Update
After trying the solutions suggested by haradwaith and Alexey Shein - both solutions seem to fetch all the nodes as one large node? Testing in irb:
destinations = tax_file.xpath("//node[@atlas_node_id]") (OR)
destinations = tax_file.css('[atlas_node_id]')
d = destinations[0]
d.content
>> \n Africa\n \n South Africa\n \n Cape Town\n \n Table Mountain National Park\n \n \n \n Free State\n \n Bloemfontein\n \n \n \n Gauteng\n \n Johannesburg\n \n \n Pretoria\n \n \n \n KwaZulu-Natal\n \n Durban\n \n \n Pietermaritzburg\n \n \n \n Mpumalanga\n \n Kruger National Park\n \n \n \n The Drakensberg\n \n Royal Natal National Park\n \n \n \n The Garden Route\n \n Oudtshoorn\n \n \n Tsitsikamma Coastal National Park\n \n \n \n\nSudan\n\nEastern Sudan\n\nPort Sudan\n\n\n\nKhartoum\n\n\n\nSwaziland\n\n
Where I would have expected to see just 'Africa'. Any ideas as to why this is happening?
Just use the CSS attribute selector, []:
require 'nokogiri'
xml = <<EOD
<taxonomy>
<taxonomy_name>World</taxonomy_name>
<node atlas_node_id = "val">
<node_name>Africa</node_name>
<node atlas_node_id = "val">
<node_name>Capetown</node_name>
</node>
</node>
</taxonomy>
EOD
tax_file = Nokogiri::XML(xml)
nodes = tax_file.css('[atlas_node_id] > node_name')
p nodes.first.text # => "Africa"
You can read a short introduction to CSS selectors on MDN.
Oh, it seems you didn't want the nodes with the attribute atlas_node_id themselves, but their <node_name> children.
What the code above actually says is: find all tags that have an attribute named "atlas_node_id" and get all of their immediate (i.e. one level deep) children with the tag "node_name".
You can find an explanation of the XPath 1.0 syntax in the documentation.
To get all the nodes with an attribute atlas_node_id, you can do:
tax_file.xpath("//node[@atlas_node_id]")
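The surprise in the question's update (content returning every nested name) comes from asking an element for the text of its whole subtree. The following Python sketch, using the standard library rather than Nokogiri, shows the difference between subtree text and the text of the direct node_name child. The sample data is a shortened, made-up version of the taxonomy file.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the taxonomy document (the names are illustrative).
SAMPLE = """<taxonomy>
  <taxonomy_name>World</taxonomy_name>
  <node atlas_node_id="val">
    <node_name>Africa</node_name>
    <node atlas_node_id="val">
      <node_name>South Africa</node_name>
    </node>
  </node>
</taxonomy>"""

root = ET.fromstring(SAMPLE)

# Every <node> that carries the attribute, at any depth, in document order.
nodes = root.findall(".//node[@atlas_node_id]")

# Text of the whole subtree of the first match: nested names are included.
subtree_text = [t.strip() for t in nodes[0].itertext() if t.strip()]

# Text of each match's direct <node_name> child only.
own_names = [n.findtext("node_name") for n in nodes]

print(subtree_text)
print(own_names[0])
```

The outer node's subtree text contains both names, while its own node_name child holds just the one the asker expected.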
I have an XML document as follows:
<feed>
<entry>
<id>4</id>
<updated>2012-11-18T16:55:54Z</updated>
<title>ASSIGNED</title>
</entry>
<entry>
<id>3</id>
<updated>2011-01-16T16:55:54Z</updated>
<title>ASSIGNED</title>
</entry>
<entry>
<id>2</id>
<updated>2014-12-01T16:55:54Z</updated>
<title>EXPIRED</title>
</entry>
<entry>
<id>1</id>
<updated>2013-01-12T16:55:54Z</updated>
<title>COMPLETED</title>
</entry>
<entry>
<id>1</id>
<updated>2012-01-09T16:55:54Z</updated>
<title>ASSIGNED</title>
</entry>
<entry>
<id>1</id>
<updated>2011-04-18T16:55:54Z</updated>
<title>COMPLETED</title>
</entry>
</feed>
I want to sort with ASSIGNED first, followed by EXPIRED, and then COMPLETED.
If there is more than one entry in any of these categories, I would like to sort those by updated value, descending.
I can sort by updated descending using xsl:sort, but how do I sort based on a set of strings {ASSIGNED, EXPIRED, COMPLETED} in a given order?
Appreciate your response!
You can use translate in the first xsl:sort to map the letters A, E, and C to the digits 1, 2, and 3. translate replaces every occurrence of those letters, not just the first one, but since the first characters of "ASSIGNED", "EXPIRED", and "COMPLETED" are unique, the first character alone decides the order. It would be harder if there were two strings starting with an "A".
The following example writes out a hardcoded <feed> element (because the template that matches it would otherwise drop it) and uses an identity transform for all other elements.
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/feed">
<feed>
<xsl:apply-templates select="entry">
<xsl:sort select="translate (title, 'AaEeCc', '112233')" />
<xsl:sort select="updated" order="descending" />
</xsl:apply-templates>
</feed>
</xsl:template>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
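As a cross-check of the intended ordering, here is a small Python sketch that sorts the same kind of entries with an explicit rank table instead of translate. It is an alternative illustration, not part of the XSLT answer; the RANK mapping and the shortened sample feed are assumptions. It relies on the timestamps all using the same UTC format, so that comparing them as strings matches chronological order.

```python
import xml.etree.ElementTree as ET

# Shortened version of the feed from the question.
SAMPLE = """<feed>
  <entry><id>4</id><updated>2012-11-18T16:55:54Z</updated><title>ASSIGNED</title></entry>
  <entry><id>2</id><updated>2014-12-01T16:55:54Z</updated><title>EXPIRED</title></entry>
  <entry><id>1</id><updated>2013-01-12T16:55:54Z</updated><title>COMPLETED</title></entry>
  <entry><id>3</id><updated>2011-01-16T16:55:54Z</updated><title>ASSIGNED</title></entry>
</feed>"""

# Hypothetical rank table: lower numbers sort first.
RANK = {"ASSIGNED": 0, "EXPIRED": 1, "COMPLETED": 2}

feed = ET.fromstring(SAMPLE)
entries = list(feed)

# Python's sort is stable, so sort by the secondary key first
# (updated, newest first) and then by the primary key (status rank).
entries.sort(key=lambda e: e.findtext("updated"), reverse=True)
entries.sort(key=lambda e: RANK[e.findtext("title")])

feed[:] = entries  # put the children back in sorted order
order = [(e.findtext("title"), e.findtext("id")) for e in feed]
print(order)
```

A rank table avoids the "unique first letter" restriction, at the cost of listing every status explicitly.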
I am trying to select the node Prp[@name='node name'] that has an ancestor named item20, using the XPath expression //Prp[@name='node name' and ../../../*[@name='item20']], but this works only if my file contains only this part of the XML:
<Node name="item20">
<Node name="config">
<Node name="runmodeparams">
<Node name="simple">
<Prp name="filename" type="S" value="p"/>
<Prp name="filepath" type="S" value="r"/>
</Node>
<Prp name="activerunmode" type="S" value="Simple"/>
</Node>
<Prp name="node name" type="S" value="lastversion"/>
</Node>
If it also contains another part of the XML file like the following one, then XPath returns an empty result.
<Node name="item20">
<Node name="config">
<Node name="runmodeparams">
<Node name="simple">
<Prp name="filename" type="S" value="p"/>
<Prp name="filepath" type="S" value="r"/>
</Node>
<Prp name="activerunmode" type="S" value="Simple"/>
</Node>
<Prp name="node name" type="S" value="lastversion"/>
</Node>
</Node>
<Node name="item21">
<Node name="config">
<Node name="runmodeparams">
<Node name="simple">
<Prp name="filename" type="S" value="p"/>
<Prp name="filepath" type="S" value="r"/>
</Node>
<Prp name="activerunmode" type="S" value="Simple"/>
</Node>
<Prp name="node name" type="S" value="lastversion"/>
</Node>
</Node>
How can I properly select the node?
The second XML snippet you gave is not well-formed XML, as it contains two root elements. If this really is your full XML input, you should
fix it if possible, or at least wrap it in a single root element, and
try to get an error message out of your XPath engine.
I wrapped it in another element and your XPath expression more or less worked, but it probably didn't return the result you expected: the node name elements of both item20 and item21 are returned, because you're stepping out too far.
Anyway, you'd better check for "item20" in a predicate when stepping down the XML tree:
//Node[@name='item20']//Prp[@name='node name']
This not only restricts the match to the node you're looking for, but should also be faster in most cases.
If performance really matters and the <Prp/> element you're looking for is always at the same position, try to avoid descendant-or-self steps (//) inside the path and spell out the full path instead. Here that would be:
//Node[@name='item20']/Node[@name='config']/Prp[@name='node name']
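To see the effect of filtering on item20 while stepping down the tree, here is a short Python sketch using the limited XPath subset of the standard library. The wrapping root element and the value "otherversion" are assumptions, added so that the input is well-formed and the two Prp elements can be told apart.

```python
import xml.etree.ElementTree as ET

# Two items wrapped in a single (hypothetical) root element.
SAMPLE = """<root>
  <Node name="item20">
    <Node name="config">
      <Prp name="node name" type="S" value="lastversion"/>
    </Node>
  </Node>
  <Node name="item21">
    <Node name="config">
      <Prp name="node name" type="S" value="otherversion"/>
    </Node>
  </Node>
</root>"""

root = ET.fromstring(SAMPLE)

# Filter on the item first, then search below it.
hits = root.findall(".//Node[@name='item20']//Prp[@name='node name']")
values = [p.get("value") for p in hits]

# Same result with every step spelled out (no descendant search).
exact = root.findall(
    "./Node[@name='item20']/Node[@name='config']/Prp[@name='node name']"
)
exact_values = [p.get("value") for p in exact]

print(values, exact_values)
```

Both queries return only the Prp that sits under item20.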
I have a large collection of XML documents with a wide array of different tags in them. I need to change all tags of the form <foo> into tags of the form <field name="foo">, ignoring any attributes of the original tag. That is, a tag of the form <foo id="bar"> should also become <field name="foo">.
For this transformation to work, I also need to distinguish between <foo> and </foo>, as </foo> must become </field>.
I have played around with sed in a bash script, but to no avail.
Although sed is not ideal for this task (see comments; further reading: regular, context-free grammar and xml), it can be pressed into service. Try this one-liner:
sed -e 's/<\([^>\/\ ]*\)[^>]*>/<field name=\"\1\">/g' -e 's/<field name=\"\">/<\/field>/g' file
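The first expression rewrites every tag as <field name="firstWord">, where firstWord is the tag name up to the first space, slash, or >. A closing tag starts with a slash, so its name comes out empty and it becomes <field name="">, which the second expression then turns into </field>.
This prints the result to standard output. If you want to edit the file in place instead, try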
sed -i -e 's/<\([^>\/\ ]*\)[^>]*>/<field name=\"\1\">/g' -e 's/<field name=\"\">/<\/field>/g' file
That turns this:
<html>
<person>
but <person name="bob"> and <person name="tom"> would both become
</person>
into this:
<field name="html">
<field name="person">
but <field name="person"> and <field name="person"> would both become
</field>
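The two passes of the sed one-liner can be easier to follow when written out as separate steps. Below is a rough Python equivalent, offered only as a sketch of the same regular-expression idea; the function name tags_to_fields is made up, and the approach shares the one-liner's limits (it knows nothing about self-closing tags, comments, CDATA, or the XML declaration).

```python
import re


def tags_to_fields(text):
    """Hypothetical helper mirroring the two sed expressions."""
    # Pass 1: rewrite every tag. The captured name is whatever precedes the
    # first '/', backslash, space or '>'. A closing tag starts with '/', so
    # its captured name is empty.
    text = re.sub(r"<([^>/\\ ]*)[^>]*>", r'<field name="\1">', text)
    # Pass 2: the empty-name placeholders were closing tags.
    return text.replace('<field name="">', "</field>")


result = tags_to_fields('<person name="bob">x</person>')
print(result)
```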
Sed is the wrong tool for the job - a simple XSL Transform can do this much more reliably:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="foo">
<field name="foo">
<xsl:apply-templates/>
</field>
</xsl:template>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()" />
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Note that unlike sed, it can handle short empty elements, newlines within tags (e.g. as produced by some tools), and just about anything that's well-formed XML. Here's my test file:
<?xml version="1.0"?>
<doc>
<section>
<foo>Plain foo, simple content</foo>
</section>
<foo attr="0">Foo with attr, with content
<bar/>
<foo attr="shorttag"/>
</foo>
<foo
attr="1"
>multiline</foo
>
<![CDATA[We mustn't transform <foo> in here!]]>
</doc>
which is transformed by the above (using xsltproc 16970175.xslt 16970175.xml) to:
<?xml version="1.0"?>
<doc>
<section>
<field name="foo">Plain foo, simple content</field>
</section>
<field name="foo">Foo with attr, with content
<bar/>
<field name="foo"/>
</field>
<field name="foo">multiline</field>
We mustn't transform <foo> in here!
</doc>
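The stylesheet above renames only foo elements. If every tag needs to be converted, as the question asks, a parser-based loop is one option. Here is a minimal Python sketch under that assumption; the helper name to_fields is hypothetical, and keeping the root element unchanged is a choice made for this example.

```python
import xml.etree.ElementTree as ET


def to_fields(elem):
    """Rename elem and all its descendants to <field name="old tag">.

    Hypothetical helper; the original attributes are dropped, and text
    content and nesting are left as they are.
    """
    for node in list(elem.iter()):
        old_tag = node.tag
        node.attrib.clear()
        node.set("name", old_tag)
        node.tag = "field"


doc = ET.fromstring('<doc><foo attr="0">text<bar/></foo></doc>')
for child in doc:  # keep the root element, convert everything under it
    to_fields(child)

print(ET.tostring(doc, encoding="unicode"))
```

Because the document is parsed first, empty elements and tags split across lines are handled like any other element.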
Given a search term, how can I search the attributes of nodes in an XML document and return XML that contains only the nodes that match the term, along with their ancestors all the way up to the root node?
Here is an example of the input XML:
<root>
<node name = "Amaths">
<node name = "Bangles"/>
</node>
<node name = "C">
<node name = "Dangles">
<node name = "E">
<node name = "Fangles"/>
</node>
</node>
<node name = "Gdecimals" />
</node>
<node name = "Hnumbers"/>
<node name = "Iangles"/>
</root>
The output I'm looking for, for the search term "angles":
<root>
<node name = "Amaths">
<node name = "Bangles"/>
</node>
<node name = "C">
<node name = "Dangles">
<node name = "E">
<node name = "Fangles"/>
</node>
</node>
</node>
<node name = "Iangles"/>
</root>
The XPath that I use to search the XML is "//*[contains(@name,'angles')]"
I'm using Nokogiri in Ruby to search the XML, which gives me a NodeSet of all nodes that match the term. I cannot figure out how to reconstruct the XML from that set of nodes.
Thanks!
EDIT: Fixed the example should have been . Thanks Dimitre.
EDIT 2: Fixed the xml again for well-formedness.
First, note that the wanted output as presented is not well-formed: the following element has no end tag later in the document:
<node name = "C">
The result of evaluating an XPath expression can be a set of nodes from the XML document, but these nodes can't be altered by XPath.
This XPath expression selects the "nodes that match the term along with their parents all the way tracing to the root node":
//*[contains(@name,'angles') and not(node())]/ancestor-or-self::*
However, the nodes are not changed and they still contain all their children, meaning that the complete subtree rooted in root is still the subtree of root in the returned result.
In case you want to obtain a new document (set of nodes) with different structure than the original XML document, you have to use another language that is hosting XPath. There are many such languages, such as XSLT, XQuery and any language with an XML DOM implementation.
Here is an XSLT transformation, producing the wanted result:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="*[not(descendant-or-self::*[contains(@name, 'angles')])]"/>
</xsl:stylesheet>
When this transformation is applied to the provided XML document (corrected to be well-formed):
<root>
<node name = "Amaths">
<node name = "Bangles"/>
</node>
<node name = "C">
<node name = "Dangles">
<node name = "E">
<node name = "Fangles"/>
</node>
<node name = "Gdecimals" />
</node>
</node>
<node name = "Hnumbers"/>
<node name = "Iangles"/>
</root>
the wanted (correct) result is produced:
<root>
<node name="Amaths">
<node name="Bangles"/>
</node>
<node name="C">
<node name="Dangles">
<node name="E">
<node name="Fangles"/>
</node>
</node>
</node>
<node name="Iangles"/>
</root>
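The same pruning idea (drop every element that has no match in itself or its descendants) can also be written as a short recursive function in a language with a DOM-like API. Below is a Python sketch using the standard library; the function name prune is made up, and this is only one possible way to express the logic of the second template above.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<root>
  <node name="Amaths"><node name="Bangles"/></node>
  <node name="C">
    <node name="Dangles"><node name="E"><node name="Fangles"/></node></node>
    <node name="Gdecimals"/>
  </node>
  <node name="Hnumbers"/>
  <node name="Iangles"/>
</root>"""


def prune(elem, term):
    """Remove descendants whose subtree contains no match (hypothetical helper).

    Returns True if elem itself, or anything kept below it, has a name
    attribute containing `term`.
    """
    keep = term in elem.get("name", "")
    for child in list(elem):  # copy the list, since children are removed in the loop
        if prune(child, term):
            keep = True
        else:
            elem.remove(child)
    return keep


root = ET.fromstring(SAMPLE)
prune(root, "angles")
kept = [e.get("name") for e in root.iter("node")]
print(kept)
```

The root element is never removed by this function, even when nothing matches; the caller can check the return value to detect that case.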