Use Groovy to Sort XML File

Is there a way I can sort an XML file based on certain attributes with Groovy?
Here's my xml
<List>
<Person name="fff"/>
<Person name="ggg">
<PhoneNum>
<AreaCode>555</AreaCode>
<Number>1234567</Number>
</PhoneNum>
</Person>
<Person name="eee"/>
<Person name="ccc"/>
<Person name="jjj"/>
<Person name="ddd">
<PhoneNum>
<AreaCode>555</AreaCode>
<Number>7654321</Number>
</PhoneNum>
</Person>
<Person name="aaa"/>
<Person name="bbb"/>
<Person name="ttt"/>
</List>
and I want the output to be
<List>
<Person name="aaa"/>
<Person name="bbb"/>
<Person name="ccc"/>
<Person name="ddd">
<PhoneNum>
<AreaCode>555</AreaCode>
<Number>7654321</Number>
</PhoneNum>
</Person>
<Person name="eee"/>
<Person name="fff"/>
<Person name="ggg">
<PhoneNum>
<AreaCode>555</AreaCode>
<Number>1234567</Number>
</PhoneNum>
</Person>
<Person name="jjj"/>
<Person name="ttt"/>
</List>
I've looked into XmlSlurper, but I can't quite figure out how to do this.

Here's a modification to @dmahapatro's answer that preserves the nested node structure.
import groovy.xml.MarkupBuilder
String xml = '''
<List>
<Person name="fff"/>
<Person name="ggg">
<PhoneNum>
<AreaCode>555</AreaCode>
<Number>1234567</Number>
</PhoneNum>
</Person>
<Person name="eee"/>
<Person name="ccc"/>
<Person name="jjj"/>
<Person name="ddd">
<PhoneNum>
<AreaCode>555</AreaCode>
<Number>7654321</Number>
</PhoneNum>
</Person>
<Person name="aaa"/>
<Person name="bbb"/>
<Person name="ttt"/>
</List>
'''
def rootNode = new XmlParser().parseText(xml)
rootNode.children().sort(true) {it.attribute('name')}
new XmlNodePrinter().print(rootNode)
Here's what's going on:
Using XmlParser instead of XmlSlurper generates nodes that can be printed using XmlNodePrinter.
The children of the root node are sorted by their name attribute using sort {it.attribute('name')}.
The true argument to sort mutates the underlying list, which reorders the child nodes in place.
The XmlNodePrinter prints the re-sorted xml document to standard out.
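For comparison, the same sort-the-children-in-place idea can be sketched outside Groovy. This is only an illustration in Python's standard xml.etree.ElementTree; the helper name sort_children_by_attribute and the shortened sample document are my own, not part of the answer above.

```python
import xml.etree.ElementTree as ET

# Shortened sample, assumed for illustration only.
XML = """<List>
  <Person name="fff"/>
  <Person name="ggg"><PhoneNum><AreaCode>555</AreaCode><Number>1234567</Number></PhoneNum></Person>
  <Person name="aaa"/>
</List>"""


def sort_children_by_attribute(root, attribute):
    """Reorder root's direct children in place, keyed on the given attribute."""
    # Slice assignment swaps in the sorted child list; each child keeps its own
    # subtree, so nested elements such as PhoneNum move with their parent.
    root[:] = sorted(root, key=lambda child: child.get(attribute, ""))
    return root


root = sort_children_by_attribute(ET.fromstring(XML), "name")
print(ET.tostring(root, encoding="unicode"))
```

As with sort(true) in the Groovy version, only the order of the direct children changes; nothing below them is touched.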

I think there can be a groovier way than this. But this should work on a Friday. :-)
import groovy.xml.MarkupBuilder
def xml = '''<List>
<Person name="fff"/>
<Person name="eee"/>
<Person name="ccc"/>
<Person name="jjj"/>
<Person name="aaa"/>
<Person name="bbb"/>
<Person name="ttt"/>
</List>'''
def rootNode = new XmlSlurper().parseText(xml)
def writer = new StringWriter()
def mkp = new MarkupBuilder(writer)
mkp.List{
rootNode.Person.@name.list()*.toString().sort().each{
Person(name: it)
}
}
println writer

Related

XQUERY return result in format <!ELEMENT dynosaur ($dynosaurName) +>

I am using an XQUERY FLWOR expression into an .xml file:
for $dynosaur in doc("document.xml")//species
let $dynosaurName := $dynosaur/text() // keeping the dynosaurName as a variable
return $dynosaur
The above return results like:
<species age="84">Velociraptor</species>
I need to format the result to be like:
<!ELEMENT dynosaur (Velociraptor) +>
So I am trying the following, but it does not work:
return <!ELEMENT dynosaur ({$dynosaurName}) +> //here i want that format but it return error
And the xml file:
<?xml version="1.0"?>
<dinosauria>
<group>
<name>saurishia</name>
<subgroups>
<group>
<name>theropoda</name>
<subgroups>
<group>
<name>carnosaurs</name>
<speciesList>
<species age="74">Dyptosaurus</species>
<species age="170">Megalosaurus</species>
<species age="67">Tyrannosaurus</species>
</speciesList>
</group>
<group>
<name>coelurosauria</name>
<speciesList>
<species age="84">Velociraptor</species>
<species age="110">Deinonychus</species>
<species age="228">Eoraptor</species>
</speciesList>
</group>
</subgroups>
</group>
<group>
<name>sauropodomorpha</name>
<subgroups>
<group>
<name>sauropods</name>
<speciesList>
<species age="155">Brachiosaurus</species>
<species age="155">Camarasaurus</species>
</speciesList>
</group>
</subgroups>
</group>
</subgroups>
</group>
<group>
<name>ornithishia</name>
<subgroups></subgroups>
</group>
</dinosauria>
Finally:
I can't find any way to return that type of result. I have checked a lot of links, including this book: http://www.datypic.com/books/xquery/chapter09.html
Could you try outputting text?
return concat("<!ELEMENT dynosaur (",$dynosaurName, ") +>")

awk : parse and write to another file

I have records in an XML file like the ones below. I need to search for <keyword>SEARCH</keyword> and, if it is present,
take the entire record (from <record> to </record>) and write it to another file.
Below is my awk code, which runs inside a loop. $1 holds each line of the record in turn.
if(index($1,"SEARCH")>0)
{
print $1>> "output.txt"
}
This logic has two problems:
It writes only the <keyword>SEARCH</keyword> element to output.txt, not the whole record (from <record> to </record>).
SEARCH can also appear in a <detail> tag, and this code writes that line to output.txt as well.
XML File:
<record category="xyz">
<person ssn="" e-i="E">
<title xsi:nil="true"/>
<position xsi:nil="true"/>
<names>
<first_name/>
<last_name></last_name>
<aliases>
<alias>CDP</alias>
</aliases>
<keywords>
<keyword xsi:nil="true"/>
<keyword>SEARCH</keyword>
</keywords>
<external_sources>
<uri>http://www.google.com</uri>
<detail>SEARCH is present in abc for xyz reason</detail>
</external_sources>
</details>
</record>
<record category="abc">
<person ssn="" e-i="F">
<title xsi:nil="true"/>
<position xsi:nil="true"/>
<names>
<first_name/>
<last_name></last_name>
<aliases>
<alias>CDP</alias>
</aliases>
<keywords>
<keyword xsi:nil="true"/>
<keyword>DONTSEARCH</keyword>
</keywords>
<external_sources>
<uri>http://www.google.com</uri>
<detail>SEARCH is not present in abc for xyz reason</detail>
</external_sources>
</details>
</record>
Use GNU awk for multi-char RS:
$ awk -v RS='</record>\n' '{ORS=RT} /<keyword>SEARCH<\/keyword>/' file
<record category="xyz">
<person ssn="" e-i="E">
<title xsi:nil="true"/>
<position xsi:nil="true"/>
<names>
<first_name/>
<last_name></last_name>
<aliases>
<alias>CDP</alias>
</aliases>
<keywords>
<keyword xsi:nil="true"/>
<keyword>SEARCH</keyword>
</keywords>
<external_sources>
<uri>http://www.google.com</uri>
<detail>SEARCH is present in abc for xyz reason</detail>
</external_sources>
</details>
</record>
If you need to search for any of multiple keywords then simply list them as such:
$ awk -v RS='</record>\n' '{ORS=RT} /<keyword>(SEARCH1|SEARCH2|SEARCH3)<\/keyword>/' file
$ cat x.awk
/<record / { i=1 }
i { a[i++]=$0 }
/<\/record>/ {
if (found) {
# a[1..i-1] holds the current record; entries past i-1 are left over from longer earlier records
for (j=1; j<i; ++j) print a[j] > "output.txt"
}
i=0
found=0
}
/<keyword>SEARCH<\/keyword>/ { found=1 }
$ awk -f x.awk x.xml
$ cat output.txt
<record category="xyz">
<person ssn="" e-i="E">
<title xsi:nil="true"/>
<position xsi:nil="true"/>
<names>
<first_name/>
<last_name></last_name>
<aliases>
<alias>CDP</alias>
</aliases>
<keywords>
<keyword xsi:nil="true"/>
<keyword>SEARCH</keyword>
</keywords>
<external_sources>
<uri>http://www.google.com</uri>
<detail>SEARCH is present in abc for xyz reason</detail>
</external_sources>
</details>
</record>
You seem to have cross posted this question from Unix & Linux - I give the same answer here as I did there:
I'm going to assume that what you've posted is a sample, because it isn't valid XML. If this assumption isn't valid, my answer doesn't hold... but if that is the case, you really need to hit the person who gave you the XML with a rolled up copy of the XML spec, and demand they 'fix it'.
But really - awk and regular expressions are not the right tool for the job. An XML parser is. And with a parser, it's absurdly simple to do what you want:
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;
#parse your file - this will error if it's invalid.
my $twig = XML::Twig -> new -> parsefile ( 'your_xml' );
#set output format. Optional.
$twig -> set_pretty_print('indented_a');
#iterate all the 'record' nodes off the root.
foreach my $record ( $twig -> get_xpath ( './record' ) ) {
#if - beneath this record - we have a node anywhere (that's what // means)
#with a tag of 'keyword' and content of 'SEARCH'
#print the whole record.
if ( $record -> get_xpath ( './/keyword[string()="SEARCH"]' ) ) {
$record -> print;
}
}
xpath is quite a lot like regular expressions - in some ways - but it's more like a directory path. That means it's context aware, and can handle XML structures.
In the above: ./ means 'below current node' so:
$twig -> get_xpath ( './record' )
Means any 'top level' <record> tags.
But .// means "at any level, below current node" so it'll do it recursively.
$twig -> get_xpath ( './/search' )
Would get any <search> nodes at any level.
And the square brackets denote a condition - that's either a function (e.g. text() to get the text of the node) or you can use an attribute. e.g. //category[@name] would find any category with a name attribute, and //category[@name="xyz"] would filter those further.
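To make the path forms above concrete, here is a small sketch in Python's xml.etree.ElementTree, which implements a limited XPath subset (not XML::Twig's dialect). The trimmed sample document is an assumption for illustration, and the [.='text'] predicate needs Python 3.7 or later.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample, assumed for illustration only.
XML = """<XML>
  <record category="xyz">
    <keywords><keyword>SEARCH</keyword></keywords>
    <detail>SEARCH is present</detail>
  </record>
  <record category="abc">
    <keywords><keyword>DONTSEARCH</keyword></keywords>
    <detail>SEARCH is not present</detail>
  </record>
</XML>"""

root = ET.fromstring(XML)

# "./record" matches only direct children of the current node.
top_level = root.findall("./record")

# ".//keyword" matches keyword elements at any depth below the current node.
all_keywords = root.findall(".//keyword")

# "[@category='xyz']" is a condition on an attribute value.
xyz_records = root.findall("./record[@category='xyz']")

# "[.='SEARCH']" is a condition on the element's own text, so a <detail>
# that merely mentions SEARCH does not count.
matching = [r for r in top_level if r.find(".//keyword[.='SEARCH']") is not None]

print([r.get("category") for r in matching])
```

The point is the same as in the Perl version: the condition is evaluated against a specific element in a specific place, rather than against any line that happens to contain the string.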
XML used for testing:
<XML>
<record category="xyz">
<person ssn="" e-i="E">
<title xsi:nil="true"/>
<position xsi:nil="true"/>
<details>
<names>
<first_name/>
<last_name></last_name>
</names>
<aliases>
<alias>CDP</alias>
</aliases>
<keywords>
<keyword xsi:nil="true"/>
<keyword>SEARCH</keyword>
</keywords>
<external_sources>
<uri>http://www.google.com</uri>
<detail>SEARCH is present in abc for xyz reason</detail>
</external_sources>
</details>
</person>
</record>
<record category="abc">
<person ssn="" e-i="F">
<title xsi:nil="true"/>
<position xsi:nil="true"/>
<details>
<names>
<first_name/>
<last_name></last_name>
</names>
<aliases>
<alias>CDP</alias>
</aliases>
<keywords>
<keyword xsi:nil="true"/>
<keyword>DONTSEARCH</keyword>
</keywords>
<external_sources>
<uri>http://www.google.com</uri>
<detail>SEARCH is not present in abc for xyz reason</detail>
</external_sources>
</details>
</person>
</record>
</XML>
Output:
<record category="xyz">
<person
e-i="E"
ssn="">
<title xsi:nil="true" />
<position xsi:nil="true" />
<details>
<names>
<first_name/>
<last_name></last_name>
</names>
<aliases>
<alias>CDP</alias>
</aliases>
<keywords>
<keyword xsi:nil="true" />
<keyword>SEARCH</keyword>
</keywords>
<external_sources>
<uri>http://www.google.com</uri>
<detail>SEARCH is present in abc for xyz reason</detail>
</external_sources>
</details>
</person>
</record>
Note: the above just prints each matching record to STDOUT. In my opinion that's not such a great idea, not least because it doesn't print the enclosing XML structure, so the output isn't well-formed XML if more than one record matches (there's no "root" node).
So, to accomplish exactly what you're asking, I would instead do this:
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;
my $twig = XML::Twig -> new -> parsefile ('your_file.xml');
$twig -> set_pretty_print('indented_a');
foreach my $record ( $twig -> get_xpath ( './record' ) ) {
if ( not $record -> findnodes ( './/keyword[string()="SEARCH"]' ) ) {
$record -> delete;
}
}
open ( my $output, '>', "output.txt" ) or die $!;
print {$output} $twig -> sprint;
close ( $output );
This inverts the logic: it deletes the records you don't want from the parsed data structure in memory, then prints the whole new structure (including XML headers) to a new file called "output.txt".
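The prune-then-serialize approach is not specific to XML::Twig. As a rough sketch only, the same inversion could look like this with Python's xml.etree.ElementTree; keep_matching_records and the trimmed sample are hypothetical names and data of my own.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample, assumed for illustration only.
SAMPLE = """<XML>
  <record category="xyz"><keywords><keyword>SEARCH</keyword></keywords></record>
  <record category="abc"><keywords><keyword>DONTSEARCH</keyword></keywords></record>
</XML>"""


def keep_matching_records(root, keyword):
    """Remove every top-level record with no keyword element equal to `keyword`."""
    # findall returns a new list, so removing children of root inside the loop is safe.
    for record in root.findall("./record"):
        keywords = [k.text for k in record.iter("keyword")]
        if keyword not in keywords:
            root.remove(record)
    return root


root = keep_matching_records(ET.fromstring(SAMPLE), "SEARCH")
# Serializing from the root keeps the enclosing element, so the result stays well-formed.
output = ET.tostring(root, encoding="unicode")
print(output)
```

Because the root element is kept, the result remains a single well-formed document however many records match.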

XPath: Select element which has specific attribute with special character

<document>
<element>
<attribut a:name="my-name">My Name</attribut>
<attribut a:parent="parent1">Parent 1</attribut>
</element>
</document>
In this XML document, how do I select the node that has the a:name attribute?
$xmlTest = <<<XML
<?xml version="1.0" encoding="UTF-8" ?>
<document xmlns:a="http://example.org/a">
<element>
<attribut a:name="my-name">My Name</attribut>
<attribut a:parent="parent1">Parent 1</attribut>
</element>
</document>
XML;
$xml = new SimpleXMLElement($xmlTest);
echo current($xml->xpath('//element/attribut[@a:name]'));

How to find sibling group with Nokogiri

For example I have this XML:
<root>
<group>
<person gender="male" name="Daniel" />
</group>
<group>
<person gender="male" name="Peter" />
<person gender="female" name="Claudia" />
</group>
<group>
<person gender="female" name="Andrea" />
</group>
</root>
I want to find only groups that have a male and a female person. I just want to find:
<group>
<person gender="male" name="Peter" />
<person gender="female" name="Claudia" />
</group>
Because inside that group there is a male and a female.
I don't want to find:
<group>
<person gender="female" name="Andrea" />
</group>
<group>
<person gender="male" name="Daniel" />
</group>
I'm not entirely familiar with Nokogiri, but I do know XPath. To select only the groups that contain both a male and a female person, you can use:
//group[person/@gender='male' and person/@gender='female']
It should return
<group>
<person gender="male" name="Peter"/>
<person gender="female" name="Claudia"/>
</group>
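If your XPath engine is more limited, the same test can be split into two lookups. The sketch below uses Python's xml.etree.ElementTree, whose XPath subset (as far as I know) does not accept "and" inside a predicate, so the two conditions are combined in ordinary code. The helper name mixed_groups is hypothetical.

```python
import xml.etree.ElementTree as ET

XML = """<root>
  <group><person gender="male" name="Daniel"/></group>
  <group>
    <person gender="male" name="Peter"/>
    <person gender="female" name="Claudia"/>
  </group>
  <group><person gender="female" name="Andrea"/></group>
</root>"""


def mixed_groups(root):
    """Return the group elements that contain both a male and a female person."""
    return [
        group
        for group in root.iter("group")
        # Each find() is one half of the XPath predicate; both must succeed.
        if group.find("person[@gender='male']") is not None
        and group.find("person[@gender='female']") is not None
    ]


groups = mixed_groups(ET.fromstring(XML))
print([[person.get("name") for person in group] for group in groups])
```

With Nokogiri itself the single XPath expression should be enough; this is only a fallback pattern.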

Referencing specific element(s) in a RelaxNG schema with externalRef

So I have one RelaxNG schema that references another:
<define name="review">
<element name="review">
<externalRef href="other.rng"/>
</element>
</define>
other.rng:
<start>
<choice>
<ref name="good"/>
<ref name="bad"/>
</choice>
</start>
<define name="good">
<element name="good"/>
</define>
<define name="bad">
<element name="bad"/>
</define>
Is there any way I can import only <good>, but not allow <bad>? The goal being:
<review><good/></review>: valid
<review><bad/></review>: invalid
The grammar you import with externalRef can't be modified. To get the kind of validation you're after, I would use this approach:
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
<include href="other.rng">
<start combine="choice">
<ref name="review"/>
</start>
</include>
<define name="review">
<element name="review">
<ref name="good"/>
</element>
</define>
</grammar>
You include the other schema.
You override the start element in the include (so the good and bad elements can no longer be root elements).
The specification says:
If the include element has a start component, then all start
components are removed from the grammar element.
You make a reference to the good element in your review definition.
