Sorting filesets numerically in Ant - sorting

I have a folder structure like this:
wsdl/v1,----,v11
I need to copy all of its files to a new folder called "latestVersion" and I need to maintain the copying order from v1 to v11. So to do that, I need to sort the directories by name while copying. My code is like this:
<copy todir="${srcdist.layout.dir}/etc/wsdl/latestVersion" flatten="true" overwrite="true" verbose="true">
<sort>
<fileset dir="../../sdk/etc/wsdl">
<include name="**/*.wsdl"/>
</fileset>
</sort>
</copy>
I would like the copying to start from v1 and end in v11. However, it copies like this:
v1,v10,v11,v2,v3,v4,v5,v6,v7,v8,v9
How, instead, do I get Ant to copy like:
v1,v2,v3,v4,v5,v6,v7,v8,v9,v10,v11

Ant is sorting correctly, since v10 comes before v2 lexicographically (the sorting comparator compares characters one by one).
In order to have v2 before v11 you have to write a custom comparator (the list of built-in comparators in the documentation is not enough). In other words, you have to write a class that implements the org.apache.tools.ant.types.resources.comparators.ResourceComparator class, add your class to the classpath and declare it as a typedef in your Ant script:
<typedef name="my_custom_sort" classname="com.example.MyCustomResourceComparator" />

You can use JavaScript embedded in an Ant script to numerically sort the directory names.
Then, you can use <for> task from the third-party Ant-Contrib library to copy from the sorted directories one at a time:
<dirset id="wsdl.dirs" dir="../../sdk/etc/wsdl" includes="v*"/>
<script language="javascript">
<![CDATA[
var dirSet = project.getReference( "wsdl.dirs" );
var ds = dirSet.getDirectoryScanner( project );
var includes = ds.getIncludedDirectories();
var versions = [];
for ( var i = 0; i < includes.length; i++ ) {
var dirname = includes[i]
// chop off the "v" from the front
var dirVersion = dirname.substr(1);
versions.push( dirVersion );
}
versionsSorted = versions.sort(function (a, b) {
return a - b;
});
// the "list" of <for> takes a comma-delimited string
project.setProperty( "versions", versionsSorted.join( ',' ) );
]]>
</script>
<echo>sorted versions: ${versions}</echo>
<for list="${versions}" param="version">
<sequential>
<copy todir="${srcdist.layout.dir}/etc/wsdl/latestVersion">
<fileset dir="../../sdk/etc/wsdl/v#{version}" includes="**/*.wsdl"/>
</copy>
</sequential>
</for>

Related

Seperate XML content from a single XML file using XQuery

I have a XML file which contains multiple XML nodes. I would like to separate two XML notes and store them in separate variables. How would I write this functionality with XQuery? I have added my XML file below. Inside the XML file I have a division root element, Dive and top-song are two child elements. Now I want to read the Dive XML content in one variable and top-song content in another variable. Can any one please help me to sort out this issue?
<?xml version="1.0" encoding="UTF-8"?>
<division>
<Dive ID="2"><!-- I want this node in one variable -->
<DiverFName>Joe</DiverFName>
<DiverLName>Diver</DiverLName>
<Number>2</Number>
<Divedate>1998-03-30</Divedate>
<Country ID="1">Bahamas</Country>
<City ID="2">Freeport</City>
<Place ID="2">
<Site>South Pass</Site>
<Lat>24.865062</Lat>
<Lon>-77.871094</Lon>
</Place>
<Divetime>36.00</Divetime>
<Depth Scale="METRIC">5.48</Depth>
<Buddy IDs="2" Names="Tim Diver" />
<Comments>Great dive, saw 5 Caribbean Reef Sharks. Performed compass navigation skills for Scuba Diver certification.</Comments>
<Water>Salt</Water>
<Entry>Boat</Entry>
<Divetype>Research</Divetype>
<Tanktype>Alu</Tanktype>
<Tanksize>11.43</Tanksize>
<PresS>179.26</PresS>
<PresE>82.73</PresE>
<Gas>Air</Gas>
<Weather>Clear</Weather>
<UWCurrent>Medium Current</UWCurrent>
<MarineLife>
<Animal>
<Type>Nurse Shark</Type>
<Abundance>1</Abundance>
<Size>3 ft</Size>
<Description>Dormant on the bottom, not swimming.</Description>
<Image>
<Filename></Filename>
<Path></Path>
<Caption></Caption>
</Image>
</Animal>
<Animal>
<Type>Blue Tang Surgeonfish</Type>
<Abundance>25+</Abundance>
<Size>4 in</Size>
<Description>Blue with white "scalpel" near base </descreption>
<Image>
<Filename></Filename>
<Path></Path>
<Caption></Caption>
</Image>
</Animal>
</MarineLife>
</Dive>
<top-song><!-- I want this node in another variable -->
<title >Try Again</title>
<artist >Aaliyah</artist>
<weeks last="2008-06-17">
<week>2008-06-17</week>
</weeks>
<album> The
Album</album>
<released>February 29, 20008</released>
<formats>
<format>CD</format>
<format>12 single</format>
</formats>
<recorded>january2012</recorded>
<genres>
<genre>R&B</genre>
</genres>
<lengths>
<length>4:04</length>
</lengths>
<label>Blackground</label>
<writers>
<writer></writer>
<writer></writer>
</writers>
<producers>
<producer></producer>
</producers>
<descr>
<p>hai hello</p>
</descr>
</top-song>
</division>
It's not clear what you're trying to accomplish on a high level, but you can select those elements with some simple XQuery/Xpath:
let $dive := doc('mydoc.xml')/division/Dive
let $top-song := doc('mydoc.xml')/division/top-song
However, just looking at the document it's clear that these two elements are in totally unrelated schemas, and as a general recommendation for MarkLogic, they should probably each be separated before ingestion and inserted as separate documents.

trying to parse specific data using xpath

I have a small xml file that I'm trying to grab the away_team first and then the home_team second.
/game/team/statistics/#goals gives me the data I want but I need to reverse the order. So I'm trying to understand how to get the away_team goals first, followed by the home_team.
Below is the file
<game id="f24275a9-4f30-4a81-abdf-d16a9aeda087" status="closed" coverage="full" home_team="4416d559-0f24-11e2-8525-18a905767e44" away_team="44167db4-0f24-11e2-8525-18a905767e44" scheduled="2013-10-10T23:00:00+00:00" attendance="18210" start_time="2013-10-10T23:08:00+00:00" end_time="2013-10-11T01:32:00+00:00" clock="00:00" period="3" xmlns="http://feed.elasticstats.com/schema/hockey/game-v2.0.xsd">
<venue id="bd7b42fa-19bb-4b91-8615-214ccc3ff987" name="First Niagara Center" capacity="18690" address="One Seymour H. Knox III Plaza" city="Buffalo" state="NY" zip="14203" country="USA"/>
<team name="Sabres" market="Buffalo" id="4416d559-0f24-11e2-8525-18a905767e44" points="1">
<scoring>
<period number="1" sequence="1" points="1"/>
<period number="2" sequence="2" points="0"/>
<period number="3" sequence="3" points="0"/>
</scoring>
<statistics goals="1" assists="2" penalties="7" penalty_minutes="23" team_penalties="0" team_penalty_minutes="0" shots="27" blocked_att="14" missed_shots="8" hits="25" giveaways="5" takeaways="10" blocked_shots="7" faceoffs_won="22" faceoffs_lost="28" powerplays="1" faceoffs="50" faceoff_win_pct="44.0" shooting_pct="3.7" points="3">
<powerplay faceoffs_won="2" faceoffs_lost="0" shots="0" goals="0" missed_shots="1" assists="0" faceoff_win_pct="100.0" faceoffs="2"/>
<shorthanded faceoffs_won="3" faceoffs_lost="3" shots="1" goals="0" missed_shots="0" assists="0" faceoffs="6" faceoff_win_pct="50.0"/>
<evenstrength faceoff_win_pct="40.5" missed_shots="7" goals="1" faceoffs_won="17" shots="26" faceoffs="42" faceoffs_lost="25" assists="2"/>
<penalty shots="0" goals="0" missed_shots="0"/>
</statistics>
<shootout shots="0" missed_shots="0" goals="0" shots_against="0" goals_against="0" saves="0" saves_pct="0"/>
<goaltending shots_against="33" goals_against="4" saves="29" saves_pct="0.879" total_shots_against="33" total_goals_against="4">
<powerplay shots_against="0" goals_against="0" saves="0" saves_pct="0"/>
<shorthanded shots_against="7" goals_against="0" saves="7" saves_pct="1.0"/>
<evenstrength goals_against="4" saves_pct="0.846" shots_against="26" saves="22"/>
<penalty shots_against="0" goals_against="0" saves="0" saves_pct="0"/>
<emptynet goals_against="0" shots_against="0">
<powerplay goals_against="0"/>
<shorthanded goals_against="0"/>
<evenstrength goals_against="0"/>
</emptynet>
</goaltending>
Here's an XPath 2.0 expression that should do what you asked, yielding a sequence of two elements:
(/game/team[#id = /game/#home_team]/statistics/#goals,
/game/team[#id = /game/#away_team]/statistics/#goals)
Credit to #Ian for sleuthing out the details of the question.
In XPath 1.0, you could concatenate string data from the two teams in whatever order you want:
concat(/game/team[#id = /game/#home_team]/statistics/#goals, ' ',
/game/team[#id = /game/#away_team]/statistics/#goals)
But as Ian said, you can't produce a nodeset with an order different from document order. (I don't think a nodeset has any intrinsic order at all... it's how it's processed that imposes an order.)
Update:
As Ian pointed out, your XML data is in a namespace, thanks to the default namespace declaration on <game>. Since you said that "/game/team/statistics/#goals gives me the data", I'm assuming that you've already taken care of this aspect of the problem, perhaps by declaring the default namespace in your XPath execution environment.

Unable to findnodes() restricted just to current parent

I'm parsing a simple XML file to create a flat text file from it. The desired outcome is shown below the sample XML. The XML has sort of a header-detail structure (Assembly_Info and Part respectively), with a unique header node followed by any number of detail record nodes, all of which are siblings. After digging into the elements under the header, I can't then find a way back 'up' to then pick up all the sibling detail nodes.
XML file looks like this:
<?xml version="1.0" standalone="yes" ?>
<Wrapper>
<Record>
<Product>
<prodid>4094</prodid>
</Product>
<Assembly>
<Assembly_Info>
<id>DF-7A</id>
<interface>C</interface>
</Assembly_Info>
<Part>
<status>N/A</status>
<dev_name>0000</dev_name>
</Part>
<Part>
<status>Ready</status>
<dev_name>0455</dev_name>
</Part>
<Part>
<status>Ready</status>
<dev_name>045A</dev_name>
</Part>
</Assembly>
<Assembly>
<Assembly_Info>
<id>DF-7A</id>
<interface>C</interface>
</Assembly_Info>
<Part>
<status>N/A</status>
<dev_name>0002</dev_name>
</Part>
<Part>
<status>Ready</status>
<dev_name>0457</dev_name>
</Part>
</Assembly>
</Record>
</Wrapper>
For each Assembly I need to read the values of the two elemenmets in Assembly_Info which I do successfully. But, I then want to read each of the Part records that are associated with the Assembly. The objective is to 'flatten' the file into this:
prodid id interface status dev_name
4094 DF-7A C N/A 0000
4094 DF-7A C Ready 0455
4094 DF-7A C Ready 045A
4094 DF-7A C N/A 0002
4094 DF-7A C Ready 0457
I'm attempting to use findnodes() to do this, as that's about the only tool I thought I understood. My code unfortunately reads all of the Part records from the entire file foreach Assembly--since the only way I've been able to find the Part nodes is to start at the root. I don't know how to change 'where I am', if you will; to tell findnodes to begin at current parent. Code looks like this:
my $parser = XML::LibXML -> new();
my $tree = $parser -> parse_file ('DEMO.XML');
for my $product ($tree->findnodes ('/Wrapper/Record/Product/prodid')) {
$prodid = $product->textContent();
}
foreach my $assembly ($tree->findnodes ('/Wrapper/Record/Assembly')){
$assemblies++;
$parts = 0;
for my $assembly ($tree->findnodes ('/Wrapper/Record/Assembly/Assembly_Info')) {
$id = $assembly->findvalue('id');
$interface = $assembly->findvalue('interface');
}
foreach my $part ($tree->findnodes ('/Wrapper/Record/Assembly/Part')) {
$parts++;
$status = $part->findvalue('status');
$dev_name = $part->findvalue('dev_name');
}
print "Assembly No: ", $assemblies, " Parts: ",$parts, "\n";
}
How do I get just the Part nodes for a given Assembly, after I've gone down to the Assembly_Info depths? There is quite a bit I'm not getting, and I think a problem may be that I'm thinking of this as 'navigating' or moving a cursor, if you will. Examples of XPath path expressions have not helped me.
Instead of always using $tree as the starting point for the findnodes method, you can use any other node, especially also child nodes. Then you could use a relative XPath expression. For example:
for my $record ($tree->findnodes('/Wrapper/Record')) {
for my $assembly ($record->findnodes('./Assembly')) {
for my $part ($assembly->findnodes('./Part')) {
}
}
}

Distinct Result via xQuery

I'm trying to get reviewers who review one or more books published after 2010.
for $r in doc("review.xml")//Reviews//Review,
$b in doc("book.xml")//Books//Book
where $b/Title = $r/BookTitle
and $b/Year > 2010
return {$r/Reviewer}
The following are both XML files.
review.xml:
<Reviews>
<Review>
<ReviewID>R1</ReviewID>
<BookTitle>B1</BookTitle>
<Reviewer>AAA</Reviewer>
</Review>
<Review>
<ReviewID>R2</ReviewID>
<BookTitle>B1</BookTitle>
<Reviewer>BBB</Reviewer>
</Review>
<Review>
<ReviewID>R3</ReviewID>
<BookTitle>B2</BookTitle>
<Reviewer>AAA</Reviewer>
</Review>
<Review>
<ReviewID>R4</ReviewID>
<BookTitle>B3</BookTitle>
<Reviewer>AAA</Reviewer>
</Review>
<Reviews>
book.xml:
<Books>
<Book>
<Title>B1</Title>
<Year>2005</Year>
</Book>
<Book>
<Title>B2</Title>
<Year>2011</Year>
</Book>
<Book>
<Title>B3</Title>
<Year>2012</Year>
</Book>
</Books>
I'll get two AAA by my xQuery code. I was wondering if I can get the distinct result, which means only one AAA. I've tried distinct-value() but don't know how to use it probably. Thanks for your reply!
----My Updated Solution with XML format for xQuery 1.0----
<root>
{
for $x in distinct-values
(
for $r in doc("review.xml")//Reviews//Review,
$b in doc("book.xml")//Books//Book
where $b/Title = $r/BookTitle
and $b/Year > 2010
return {$r/Reviewer}
)
return <reviewer>{$x}</reviewer>
}
</root>
To preserve nodes, you can use the "group by" clause and select the first item of a group sequence:
for $r in doc("review.xml")//Review,
$b in doc("book.xml")//Book
let $n := $r/Reviewer
where $b/Title = $r/BookTitle
and $b/Year > 2010
group by $n
return $r[1]/Reviewer
The following query will give you all distint reviewer names (note that the values are atomized, which means the element nodes are removed):
distinct-values(
for $r in doc("review.xml")//Reviews//Review,
$b in doc("book.xml")//Books//Book
where $b/Title = $r/BookTitle
and $b/Year > 2010
return $r/Reviewer
)

XPath Query to select hyperlink

The following is a subset of xml from a twitter atom feed:
<entry>
<id>tag:search.twitter.com,2005:18232030105964545</id>
<published>2010-12-24T09:10:29Z</published>
<link type="text/html" rel="alternate" href="http://twitter.com/KTNKenya/statuses/18232030105964545"/>
<title>Synovate Poll: PM Raila Odinga remains the preffered presidential candidate at 42% while Uhuru Kenyatta is at 14%... http://fb.me/yjmMbmBx</title>
<content type="html">Synovate Poll: PM <b>Raila</b> Odinga remains the preffered presidential candidate at 42% while Uhuru Kenyatta is at 14%... <a href="http://fb.me/yjmMbmBx">http://fb.me/yjmMbmBx</a></content>
<updated>2010-12-24T09:10:29Z</updated>
<link type="image/png" rel="image" href="http://a3.twimg.com/profile_images/701825859/NEW_KTN_normal.png"/>
<google:location>nairobi, kenya</google:location>
<twitter:geo>
</twitter:geo>
<twitter:metadata>
<twitter:result_type>recent</twitter:result_type>
</twitter:metadata>
<twitter:source><a href="http://www.facebook.com/twitter" rel="nofollow">Facebook</a></twitter:source>
<twitter:lang>en</twitter:lang>
<author>
<name>KTNKenya (KTN Kenya)</name>
<uri>http://twitter.com/KTNKenya</uri>
</author>
</entry>
From the <title>...</title> element, i need to select the hyperlink http://fb.me/yjmMbmBx via an XPath query. How do I do it? Is it possible?
*I'm an XPath newbie.
Thanks.
You have two options:
Use <title> (xpath: "/entry/title/text()") and get the URL yourself (e.g. using regex or finding the last instance of "http://" in the string.
Get the data first:
/entry/content[#type="html"]/text()
Then you need to parse this as HTML and extract any tags, and use the href attribute of those tags. How you do this last part depends on the language/environment you are doing this in.
Update: Added basic example code for option 1 above, as requested:
xmlpp::Element *node = parser.get_document()->get_root_node();
xmlpp::NodeSet results = node->find("/entry/title/text()");
xmlpp::ContentNode* content = dynamic_cast<xmlpp::ContentNode*>(results.front());
std::string text = content->get_content();
std::string link = "";
int res = text.rfind("http://");
if(res == text.npos)
res = text.rfind("https://");
if(res != text.npos)
link = text.substr(res);
With atom prefix bound to http://www.w3.org/2005/Atom namespace URI, use:
/atom:feed/atom:entry/atom:title[contains(.,'http://')]
This selects every atom:title element child of atom:entry, having the string "http://" contained in its string value.

Resources