I need to return the top 5 <Concelho> elements with the highest <Habitante> value for Ano = 2001, but I'm having problems.
My code:
for $x in doc("Camaras.xml")/Portugal/Concelho
order by xs:integer($x/Habitantes/Habitante[@Ano = "2001"]) descending
return data($x[position() <= 5])
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE Portugal SYSTEM "CamarasDTD.dtd">
<Portugal>
<Concelho Nome="Arganil " id="0">
<Contactos>
<Email>geral@cm-arganil.pt</Email>
<Telefone> +351 235 200 150</Telefone>
<Fax> +351 235 200 158</Fax>
</Contactos>
<Localização>
<Codigo-Postal>3304-954 Arganil</Codigo-Postal>
</Localização>
<Mapa src="http://cim-regiaodecoimbra.pt/wp-content/uploads/2014/04/3D_arganil.png" />
<Habitantes>
<Habitante Ano="2001">2001</Habitante>
<Habitante Ano="2011">12145</Habitante>
</Habitantes>
</Concelho>
<Concelho Nome="Cantanhede " id="1">
<Contactos>
<Email>geral@cm-cantanhede.pt</Email>
<Telefone> +351 231 410 100</Telefone>
<Fax> +351 231 410 199</Fax>
</Contactos>
<Localização>
<Codigo-Postal>3060-133 Cantanhede</Codigo-Postal>
</Localização>
<Mapa src="http://cim-regiaodecoimbra.pt/wp-content/uploads/2014/04/3D_cantanhede1.png" />
<Habitantes>
<Habitante Ano="2001">37910</Habitante>
<Habitante Ano="2011">36595</Habitante>
</Habitantes>
</Concelho>
<Concelho Nome="Coimbra " id="2">
<Contactos>
<Email>geral@cm-coimbra.pt</Email>
<Telefone> +351 239 857 500</Telefone>
<Fax> +351 239 820 114</Fax>
</Contactos>
<Localização></Portugal>
The ordering works correctly, but the query returns all <Concelho> elements.
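Inside the FLWOR expression, $x is bound to a single <Concelho> on each iteration. As a result, $x[position() <= 5] filters a one-item sequence and is always true. You need to do the sort first, then filter: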
let $foo :=
for $x in doc("Camaras.xml")/Portugal/Concelho
order by xs:integer($x/Habitantes/Habitante[@Ano = "2001"]) descending
return $x
return $foo[ position() <= 5 ]
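The following minimal Ruby sketch shows why the order of the two steps matters. It uses Ruby's bundled REXML library, which is not part of the question, and a hypothetical seven-municipality document with made-up names and figures. Only the Concelho/Habitantes/Habitante[@Ano] structure comes from the question.

```ruby
require 'rexml/document'

# Hypothetical stand-in for Camaras.xml: made-up names and 2001 figures,
# only the element/attribute structure follows the question.
xml = <<~XML
  <Portugal>
    <Concelho Nome="A"><Habitantes><Habitante Ano="2001">120</Habitante></Habitantes></Concelho>
    <Concelho Nome="B"><Habitantes><Habitante Ano="2001">950</Habitante></Habitantes></Concelho>
    <Concelho Nome="C"><Habitantes><Habitante Ano="2001">40</Habitante></Habitantes></Concelho>
    <Concelho Nome="D"><Habitantes><Habitante Ano="2001">600</Habitante></Habitantes></Concelho>
    <Concelho Nome="E"><Habitantes><Habitante Ano="2001">300</Habitante></Habitantes></Concelho>
    <Concelho Nome="F"><Habitantes><Habitante Ano="2001">800</Habitante></Habitantes></Concelho>
    <Concelho Nome="G"><Habitantes><Habitante Ano="2001">75</Habitante></Habitantes></Concelho>
  </Portugal>
XML

doc = REXML::Document.new(xml)
concelhos = REXML::XPath.match(doc, '/Portugal/Concelho')

# 2001 population of one <Concelho>, read with the same relative path as the XQuery.
def pop2001(concelho)
  REXML::XPath.first(concelho, 'Habitantes/Habitante[@Ano="2001"]').text.to_i
end

# What the original query did: filter each one-item sequence separately.
# Every item sits at position 1, so nothing is ever dropped.
per_item = concelhos.flat_map { |c| [c].first(5) }

# What the answer does: order the whole sequence, then keep the first five.
top5 = concelhos.sort_by { |c| -pop2001(c) }.first(5)

puts per_item.size                                 # 7
puts top5.map { |c| c.attributes['Nome'] }.inspect # ["B", "F", "D", "E", "A"]
```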
How can I get the value of len as 4 rather than 4.0? Or is there another way to find the length of the array within the Jelly itself and use it from there?
<?xml version="1.0" encoding="utf-8" ?>
<j:jelly trim="false" xmlns:j="jelly:core" xmlns:g="glide" xmlns:j2="null" xmlns:g2="null">
<g2:evaluate>
var transport = ["cars", "trains", "buses", "bikes"];
var len = transport.length; <!-- len = 4 -->
</g2:evaluate>
<p>len = $[len]</p> <!-- len = 4.0 -->
<j2:forEach begin="0" end="$[len]" step ="1">
<p>Hello</p>
</j2:forEach>
</j:jelly>
I get only one "Hello" in the output instead of four.
I am currently constructing an XPath condition in SAP PI (receiver determination object) which should either route the message to receiver 1 or receiver 2.
The given documentID values that the business sends are as follows.
Receiver 1 receives messages within the following documentID range:
Range: "F00" to "F99"
Receiver 2 receives messages within the following documentID range:
Range: "FA0" to "FZ9"
Here is a sample condition I came up with, but I am not sure whether it will work or whether the logic is correct. A follow-up question: do the greater-than/less-than signs accept non-numeric characters?
Condition for Receiver 1
(/p1:Upload/ContainerEvent[WorkAssignmentID >= F00] EX AND /p1:Upload/ContainerEvent[WorkAssignmentID <= F99] EX )
Condition for Receiver 2
(/p1:Upload/ContainerEvent[WorkAssignmentID >= FA0] EX AND /p1:Upload/ContainerEvent[WorkAssignmentID <= FZ9] EX )
I am also wondering whether substring() can be used in XPath. Feel free to provide your input. Thanks
Regards,
Charles Tan
Pure XPath 1.0 solution:
Receiver 1 receives messages within the documentID range F00 to F99:
/*/Upload/ContainerEvent
[WorkAssignmentId
[string-length() = 3]
[starts-with(., 'F')][substring(.,2,2) >= 0][99 >= substring(.,2,2)]
]/EX
Receiver 2 receives messages within the documentID range FA0 to FZ9:
/*/Upload/ContainerEvent
[WorkAssignmentId
[string-length() = 3]
[starts-with(., 'F')]
[26 > string-length(
translate('ABCDEFGHIJKLMNOPQRSTUVWXYZ',substring(.,2,1), ''))]
[substring(.,3,1) >= 0][9 >= substring(.,3,1)]
]/EX
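The same routing rule can also be cross-checked outside the integration tool. The following minimal Ruby sketch assumes the IDs are always exactly three ASCII characters. Its two regular expressions state the intended ranges directly: F plus two digits, and F plus an uppercase letter plus a digit. They are not a literal translation of the XPath predicates above. For example, XPath's number() also accepts a string such as "9.". They do, however, agree with the XPath on the sample IDs used in the verification below.

```ruby
# Receiver 1: "F" followed by exactly two digits (F00..F99).
RECEIVER1 = /\AF\d{2}\z/
# Receiver 2: "F", one uppercase letter A-Z, then one digit (FA0..FZ9).
RECEIVER2 = /\AF[A-Z]\d\z/

# The WorkAssignmentId values from the sample document used in the verification.
ids = %w[F13 F99 E15 FA7 FZ9 FAB]

r1 = ids.grep(RECEIVER1)
r2 = ids.grep(RECEIVER2)

puts r1.inspect # ["F13", "F99"]
puts r2.inspect # ["FA7", "FZ9"]
```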
Here is an XSLT-based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:copy-of select=
"/*/Upload/ContainerEvent
[WorkAssignmentId
[string-length() = 3]
[starts-with(., 'F')][substring(.,2,2) >= 0][99 >= substring(.,2,2)]
]/EX"/>
==============================
<xsl:copy-of select=
"/*/Upload/ContainerEvent
[WorkAssignmentId
[string-length() = 3]
[starts-with(., 'F')]
[26 > string-length(
translate('ABCDEFGHIJKLMNOPQRSTUVWXYZ',substring(.,2,1), ''))]
[substring(.,3,1) >= 0][9 >= substring(.,3,1)]
]/EX"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied to the following XML source document (the question did not include one, so this one is constructed):
<p1>
<Upload>
<ContainerEvent>
<WorkAssignmentId>F13</WorkAssignmentId>
<EX>F13</EX>
</ContainerEvent>
<ContainerEvent>
<WorkAssignmentId>F99</WorkAssignmentId>
<EX>F99</EX>
</ContainerEvent>
<ContainerEvent>
<WorkAssignmentId>E15</WorkAssignmentId>
<EX>E15</EX>
</ContainerEvent>
<ContainerEvent>
<WorkAssignmentId>FA7</WorkAssignmentId>
<EX>FA7</EX>
</ContainerEvent>
<ContainerEvent>
<WorkAssignmentId>FZ9</WorkAssignmentId>
<EX>FZ9</EX>
</ContainerEvent>
<ContainerEvent>
<WorkAssignmentId>FAB</WorkAssignmentId>
<EX>FAB</EX>
</ContainerEvent>
</Upload>
</p1>
The wanted result is produced:
<EX>F13</EX>
<EX>F99</EX>
==============================
<EX>FA7</EX>
<EX>FZ9</EX>
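Answer to your follow-up question ("do greater-than/less-than signs accept non-numeric characters?"): in XPath 1.0, no. The <, <=, > and >= operators convert both operands to numbers. A string such as "F00" converts to NaN, so every such comparison is false. Also note that an unquoted F00, as written in the question's condition, is parsed as a child element name, not as a string. This changes in XPath 2.0, where these operators can compare strings.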
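The following minimal Ruby sketch illustrates the XPath 1.0 rule just described. The helper names xpath_number and xpath_ge are hypothetical. xpath_number follows the XPath 1.0 string-to-number conversion: optional whitespace, an optional minus sign and a decimal number, with anything else becoming NaN. Because every comparison with NaN is false, bounds such as "F00" can never match.

```ruby
# XPath 1.0 string-to-number conversion: optional whitespace, optional minus sign,
# a decimal Number (digits with optional fraction, or "." followed by digits);
# anything else becomes NaN.
def xpath_number(str)
  s = str.strip
  s.match?(/\A-?(\d+(\.\d*)?|\.\d+)\z/) ? s.to_f : Float::NAN
end

# In XPath 1.0, "a >= b" on two strings compares number(a) with number(b).
def xpath_ge(a, b)
  xpath_number(a) >= xpath_number(b)
end

p xpath_ge('F50', 'F00') # false: both sides are NaN
p xpath_ge('F00', 'F00') # false, even for identical strings
p xpath_ge('50', '00')   # true: both sides are plain numbers
```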
When parsing an XML document in Ruby with libxml, I receive too much data from an XPath find call.
My test data is:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<MAIN>
<EPS>
<EP ID="EDM01">EP 1
<BP ID="EDM01_BP1">BP1 for EP1
<Activities>
<Activity ID="1">Activity 1 for EDM01_BP1</Activity>
<Activity ID="2">Activity 2 for EDM01_BP1</Activity>
<Activity ID="3">Activity 3 for EDM01_BP1</Activity>
</Activities>
</BP>
<BP ID="EDM01_BP2">BP2 for EP1
<Activities>
<Activity ID="1">Activity 1 for EDM01_BP2</Activity>
<Activity ID="2">Activity 2 for EDM01_BP2</Activity>
<Activity ID="3">Activity 3 for EDM01_BP2</Activity>
</Activities>
</BP>
</EP>
<EP ID="APO01">EP 2
<BP ID="APO01_BP1">BP 1 for EP2
<Activities>
<Activity ID="1">Activity 1 for APO01_BP1</Activity>
<Activity ID="2">Activity 2 for APO01_BP1</Activity>
<Activity ID="3">Activity 3 for APO01_BP1</Activity>
<Activity ID="4">Activity 4 for APO01_BP1</Activity>
<Activity ID="5">Activity 5 for APO01_BP1</Activity>
</Activities>
</BP>
</EP>
</EPS>
</MAIN>
And I parse it with:
  xmlparser = XML::Parser.string(@strXML, :encoding => XML::Encoding::UTF_8)
  @xmlDoc = xmlparser.parse
  @projects = nil
  project = nil
  cl = @xmlDoc.find('/MAIN')
  unless cl.empty?
    puts ""
    @projects = @xmlDoc.find('//EP [@ID]')
    @projects.each do |p|
      puts('<----------1--------->')
      puts(p.inner_xml)
      bps = p.find('//BP [@ID]')
      bps.each do |bp|
        puts('<----------2--------->')
        puts(bp.inner_xml)
        puts('<---- Activities ---->')
        acts = bp.find('//Activity [@ID]')
        acts.each do |act|
          puts('ActID> ' + act['ID'].to_s)
          puts(act.first.content.to_s)
        end
      end
    end
  end
  assert true
end
When I look at the displayed results, the fetched XML::Node (p.inner_xml) is correct:
<----------1--------->
EP 1 <BP ID="EDM01_BP1">BP1 for EP1<Activities><Activity ID="1">Activity 1 for EDM01_BP1</Activity><Activity ID="2">Activity 2 for EDM01_BP1</Activity><Activity ID="3">Activity 3 for EDM01_BP1</Activity> </Activities></BP><BP ID="EDM01_BP2">BP2 for EP1<Activities><Activity ID="1">Activity 1 for EDM01_BP2</Activity><Activity ID="2">Activity 2 for EDM01_BP2</Activity><Activity ID="3">Activity 3 for EDM01_BP2</Activity> </Activities></BP>
<----------2--------->
BP1 for EP1<Activities><Activity ID="1">Activity 1 for EDM01_BP1</Activity> <Activity ID="2">Activity 2 for EDM01_BP1</Activity><Activity ID="3">Activity 3 for EDM01_BP1</Activity></Activities>
<---- Activities ---->
ActID> 1
Activity 1 for EDM01_BP1
ActID> 2
Activity 2 for EDM01_BP1
ActID> 3
Activity 3 for EDM01_BP1
ActID> 1
Activity 1 for EDM01_BP2
ActID> 2
Activity 2 for EDM01_BP2
ActID> 3
Activity 3 for EDM01_BP2
ActID> 1
Activity 1 for APO01_BP1
ActID> 2
Activity 2 for APO01_BP1
ActID> 3
Activity 3 for APO01_BP1
ActID> 4
Activity 4 for APO01_BP1
ActID> 5
Activity 5 for APO01_BP1
As you can see, the first BP node inspected has only 3 activities.
But the program displays all activities from the complete XML document, not just those from the fetched node.
Is it a wrong assumption that, when calling xmldoc.find() and traversing the result with
nodes.each do |n|
the n variable is a LibXML::XML::Node that is a subset of the XML document?
Otherwise, how is it possible to reference data (the APOxxx activities) that is not part of the fetched node?
Replace the lines
bps = p.find('//BP [@ID]')
acts = bp.find('//Activity [@ID]')
with
bps = p.find('BP [@ID]')
acts = bp.find('Activities/Activity [@ID]')
XPath expressions starting with / or // are absolute location paths and always return all matching nodes below the root node regardless of the context node p or bp. This is similar to an absolute filesystem path that doesn't care about the current directory.
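The following minimal sketch shows the difference. It uses Ruby's bundled REXML library instead of libxml-ruby, a choice made so the example needs no extra gem, and a trimmed copy of the question's data. It also shows a third option: a path that begins with a dot, such as .//Activity, is relative to the context node and searches all of that node's descendants.

```ruby
require 'rexml/document'

# Trimmed copy of the question's test data (fewer activities, no mixed text).
xml = <<~XML
  <MAIN>
    <EPS>
      <EP ID="EDM01">
        <BP ID="EDM01_BP1">
          <Activities>
            <Activity ID="1">Activity 1 for EDM01_BP1</Activity>
            <Activity ID="2">Activity 2 for EDM01_BP1</Activity>
          </Activities>
        </BP>
      </EP>
      <EP ID="APO01">
        <BP ID="APO01_BP1">
          <Activities>
            <Activity ID="1">Activity 1 for APO01_BP1</Activity>
          </Activities>
        </BP>
      </EP>
    </EPS>
  </MAIN>
XML

doc = REXML::Document.new(xml)
bp  = REXML::XPath.first(doc, '//BP[@ID="EDM01_BP1"]')

# Absolute: starts at the document root, so the context node bp is ignored.
absolute   = REXML::XPath.match(bp, '//Activity[@ID]')
# Relative: child path starting at bp.
relative   = REXML::XPath.match(bp, 'Activities/Activity[@ID]')
# Relative descendant search: any depth, but only below bp.
descendant = REXML::XPath.match(bp, './/Activity[@ID]')

puts absolute.size   # 3
puts relative.size   # 2
puts descendant.size # 2
```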
This is my sample.xml:
<?xml version="1.0" encoding="utf-8"?>
<ShipmentRequest>
<Message>
<Header>
<MemberId>MID-0000001</MemberId>
<MemberName>Bruce</MemberName>
<DeliveryId>0000001</DeliveryId>
<OrderNumber>ON-000000001</OrderNumber>
<ShipToName>Alan</ShipToName>
<ShipToZip>123-4567</ShipToZip>
<ShipToStreet>West</ShipToStreet>
<ShipToCity>Seatle</ShipToCity>
<Payments>
<PayType>Credit Card</PayType>
<Amount>20</Amount>
</Payments>
<Payments>
<PayType>Points</PayType>
<Amount>22</Amount>
</Payments>
<PayType />
</Header>
<Line>
<LineNumber>3.1</LineNumber>
<ItemId>A-0000001</ItemId>
<Description>Apple</Description>
<Quantity>2</Quantity>
<UnitCost>5</UnitCost>
</Line>
<Line>
<LineNumber>4.1</LineNumber>
<ItemId>P-0000001</ItemId>
<Description>Peach</Description>
<Quantity>4</Quantity>
<UnitCost>6</UnitCost>
</Line>
<Line>
<LineNumber>5.1</LineNumber>
<ItemId>O-0000001</ItemId>
<Description>Orange</Description>
<Quantity>2</Quantity>
<UnitCost>4</UnitCost>
</Line>
</Message>
</ShipmentRequest>
And my sample.rb:
#!/usr/bin/ruby -w
require 'nokogiri'
doc = Nokogiri::XML(open("sample.xml"))
doc.xpath("//ShipmentRequest").each {
|node| puts node.text
}
And the results I get:
MID-0000001
Bruce
0000001
ON-000000001
Alan
123-4567
West
Seatle
Credit Card
20
Points
22
3.1
A-0000001
Apple
2
5
4.1
P-0000001
Peach
4
6
5.1
O-0000001
Orange
2
4
I'd also like to print the tag names and skip tags/nodes with blank values:
MemberId: MID-0000001
MemberName: Bruce
DeliveryId: 0000001
OrderNumber: ON-000000001
ShipToName: Alan
ShipToZip: 123-4567
ShipToStreet: West
etc...
You basically want all the leaf elements. You can capture all of them in a single XPath expression:
leaves = doc.xpath('//*[not(*)]')
leaves.each do |node|
puts "#{node.name}: #{node.text}" unless node.text.empty?
end
Output:
MemberId: MID-0000001
MemberName: Bruce
DeliveryId: 0000001
OrderNumber: ON-000000001
ShipToName: Alan
ShipToZip: 123-4567
ShipToStreet: West
ShipToCity: Seatle
PayType: Credit Card
Amount: 20
PayType: Points
Amount: 22
LineNumber: 3.1
ItemId: A-0000001
Description: Apple
Quantity: 2
UnitCost: 5
LineNumber: 4.1
ItemId: P-0000001
Description: Peach
Quantity: 4
UnitCost: 6
LineNumber: 5.1
ItemId: O-0000001
Description: Orange
Quantity: 2
UnitCost: 4
Explanation of XPath
The XPath //*[not(*)] finds all the leaf elements. How does it do that? Let's break it down:
The // means scan through the entire document.
The * means any element, so //* matches all elements in the document.
The part in [] is called a predicate, and it constrains the previous expression. I read it as "such that". It is evaluated with each matched element as the context node, so a name inside it refers to that element's children. For example, a[b] means all a elements such that they have a b child.
The not() function is simply boolean negation. Inside the predicate, * selects the element's child elements, so not(*) means "has no child elements".
Putting it all together, you have "all elements in the document such that they do not have any child elements" == leaf elements.
Another version
In the comments, @Phrogz made a nice addition: move the check for empty elements into the XPath expression by adding another predicate. This has two benefits:
It may perform better, because it doesn't return all leaves and then check them in Ruby. This might be noticeable in a large document or one with many empty leaves.
It becomes a one-liner!
puts doc.xpath('//*[not(*)][text()]').map{ |n| "#{n.name}: #{n.text}" }
Meaning "Every element that has no child elements, but that does have at least one child text node."
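For readers without Nokogiri, both expressions can be tried with Ruby's bundled REXML library. This is a minimal sketch on a cut-down version of sample.xml. It assumes that REXML's XPath support evaluates not() and text() in predicates as XPath 1.0 specifies. The empty <PayType /> is a leaf, but it has no text node, so only the second expression drops it.

```ruby
require 'rexml/document'

# Cut-down version of sample.xml, keeping the empty <PayType /> leaf.
xml = <<~XML
  <ShipmentRequest>
    <Message>
      <Header>
        <MemberId>MID-0000001</MemberId>
        <MemberName>Bruce</MemberName>
        <Payments>
          <PayType>Credit Card</PayType>
          <Amount>20</Amount>
        </Payments>
        <PayType />
      </Header>
    </Message>
  </ShipmentRequest>
XML

doc = REXML::Document.new(xml)

# Every element with no child elements (includes the empty <PayType />).
leaves = REXML::XPath.match(doc, '//*[not(*)]')

# Leaves that also have at least one child text node.
non_empty = REXML::XPath.match(doc, '//*[not(*)][text()]')

lines = non_empty.map { |n| "#{n.name}: #{n.text}" }
puts lines
```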
Another approach walks the Header and Line elements explicitly with Nokogiri:
require 'nokogiri'
doc = Nokogiri::XML(File.open("sample.xml"))
doc.xpath("//ShipmentRequest/Message/Header").each do |row|
row.elements.each do |e|
next if e.text.to_s.empty?
if e.name.match(/Payments/)
e.elements.each do |ie|
puts "#{ie.name} : #{ie.text}"
end
else
puts "#{e.name} : #{e.text}"
end
end
end
doc.xpath("//ShipmentRequest/Message/Line").each do |row|
row.elements.each do |e|
next if e.text.to_s.empty?
puts "#{e.name} : #{e.text}"
end
end
Output
MemberId : MID-0000001
MemberName : Bruce
DeliveryId : 0000001
OrderNumber : ON-000000001
ShipToName : Alan
ShipToZip : 123-4567
ShipToStreet : West
ShipToCity : Seatle
PayType : Credit Card
Amount : 20
PayType : Points
Amount : 22
LineNumber : 3.1
ItemId : A-0000001
Description : Apple
Quantity : 2
UnitCost : 5
LineNumber : 4.1
ItemId : P-0000001
Description : Peach
Quantity : 4
UnitCost : 6
LineNumber : 5.1
ItemId : O-0000001
Description : Orange
Quantity : 2
UnitCost : 4
I'm trying to get the reviewers who reviewed one or more books published after 2010.
for $r in doc("review.xml")//Reviews//Review,
$b in doc("book.xml")//Books//Book
where $b/Title = $r/BookTitle
and $b/Year > 2010
return $r/Reviewer
The following are both XML files.
review.xml:
<Reviews>
<Review>
<ReviewID>R1</ReviewID>
<BookTitle>B1</BookTitle>
<Reviewer>AAA</Reviewer>
</Review>
<Review>
<ReviewID>R2</ReviewID>
<BookTitle>B1</BookTitle>
<Reviewer>BBB</Reviewer>
</Review>
<Review>
<ReviewID>R3</ReviewID>
<BookTitle>B2</BookTitle>
<Reviewer>AAA</Reviewer>
</Review>
<Review>
<ReviewID>R4</ReviewID>
<BookTitle>B3</BookTitle>
<Reviewer>AAA</Reviewer>
</Review>
</Reviews>
book.xml:
<Books>
<Book>
<Title>B1</Title>
<Year>2005</Year>
</Book>
<Book>
<Title>B2</Title>
<Year>2011</Year>
</Book>
<Book>
<Title>B3</Title>
<Year>2012</Year>
</Book>
</Books>
My XQuery code returns AAA twice. I was wondering whether I can get distinct results, i.e. only one AAA. I've tried distinct-values() but don't know how to use it properly. Thanks for your reply!
My updated solution, with XML output, for XQuery 1.0:
<root>
{
for $x in distinct-values
(
for $r in doc("review.xml")//Reviews//Review,
$b in doc("book.xml")//Books//Book
where $b/Title = $r/BookTitle
and $b/Year > 2010
return $r/Reviewer
)
return <reviewer>{$x}</reviewer>
}
</root>
To preserve the nodes, you can use the group by clause (XQuery 3.0 and later) and select the first item of each group sequence:
for $r in doc("review.xml")//Review,
$b in doc("book.xml")//Book
let $n := $r/Reviewer
where $b/Title = $r/BookTitle
and $b/Year > 2010
group by $n
return $r[1]/Reviewer
The following query returns all distinct reviewer names. Note that the values are atomized, so the element nodes are replaced by their string values:
distinct-values(
for $r in doc("review.xml")//Reviews//Review,
$b in doc("book.xml")//Books//Book
where $b/Title = $r/BookTitle
and $b/Year > 2010
return $r/Reviewer
)
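The following minimal Ruby analogue makes the difference between the two answers concrete. It uses plain arrays in place of the two XML files, with the same data as in the question. It is not XQuery: Array#uniq stands in for distinct-values(), and Enumerable#group_by stands in for the group by clause.

```ruby
# Data copied from review.xml and book.xml in the question.
reviews = [
  { id: 'R1', title: 'B1', reviewer: 'AAA' },
  { id: 'R2', title: 'B1', reviewer: 'BBB' },
  { id: 'R3', title: 'B2', reviewer: 'AAA' },
  { id: 'R4', title: 'B3', reviewer: 'AAA' }
]
books = [
  { title: 'B1', year: 2005 },
  { title: 'B2', year: 2011 },
  { title: 'B3', year: 2012 }
]

# The two "for" clauses plus "where": one result per matching (review, book) pair.
pairs  = reviews.product(books).select { |r, b| b[:title] == r[:title] && b[:year] > 2010 }
joined = pairs.map(&:first)

names    = joined.map { |r| r[:reviewer] } # ["AAA", "AAA"]: the duplicate from the question
distinct = names.uniq                      # ["AAA"]: like distinct-values()

# Like "group by $n ... return $r[1]/Reviewer": keep the first whole review per reviewer.
first_per_reviewer = joined.group_by { |r| r[:reviewer] }.map { |_, rs| rs.first }

puts distinct.inspect
puts first_per_reviewer.map { |r| r[:id] }.inspect # ["R3"]
```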