XQuery 1.0 - average over elements with subelements - xpath

So I have an XML file that has elements with numbers in them, and also some sub elements with numbers in them. Something like:
<data>
<points>
<score>80</score>
<score>90</score>
<score>10</score>
<score>13</score>
</points>
<favor>50</favor>
<ranked>
<rank>50</rank>
<rank>10</rank>
</ranked>
</data>
I want to compute the average across all these elements that have numbers including sub elements. So, I want a query that could produce:
(80+90+10+13+50+50+10) / 7 = 43.285714286

You can use the expression //*[not(*)] to select leaf elements (elements that have no child elements) anywhere in the XML document:
let $elements := //*[not(*)]
return sum($elements) div count($elements)
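As a cross-check of the expected result, here is a minimal Python sketch of the same leaf-element idea. It assumes the standard library's xml.etree.ElementTree, which does not evaluate not(*), so the leaf test is written by hand; the function name leaf_average is made up for this illustration.

```python
import xml.etree.ElementTree as ET

XML = """<data>
  <points>
    <score>80</score>
    <score>90</score>
    <score>10</score>
    <score>13</score>
  </points>
  <favor>50</favor>
  <ranked>
    <rank>50</rank>
    <rank>10</rank>
  </ranked>
</data>"""


def leaf_average(xml_text):
    """Average the numeric text of every element that has no child elements."""
    root = ET.fromstring(xml_text)
    # len(el) is the number of child elements, so len(el) == 0 plays the
    # role of the XPath predicate [not(*)].
    leaves = [el for el in root.iter() if len(el) == 0]
    values = [float(el.text) for el in leaves]
    return sum(values) / len(values)


average = leaf_average(XML)
print(round(average, 9))
```

The sketch assumes every leaf holds a number; a leaf with empty or non-numeric text would raise an error here.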

Related

Xpath-filtering items

I have a short question. How can I display only the elements whose value is '.'?
I have no idea how to do that; I'm a newbie in XPath.
<SalesTransaction>
<TransactionHeader>
<TransactionHeaderFields>
<WrntyID>a</WrntyID>
<ExternalID/>
<Type>.</Type>
<Status>
Submited
</Status>
<CreationDate>
2015-01-12
</CreationDate>
<Date>
2015-01-12T11:41:29Z
</Date>
<DeliveryDate>
2015-01-12
</DeliveryDate>
<Remark/>
</TransactionHeaderFields>
<CatalogFields>
<CatalogID>
saf
</CatalogID>
</CatalogFields>
</TransactionHeader>
</SalesTransaction>
Ignoring the structure and just looking for any element whose text() is equal to ".", you could use:
//*[text()='.']
//* searches the entire tree structure, matching any element at any level.
[text()='.'] is a predicate filter (kind of like a WHERE clause in SQL) that performs a test on each of those matched elements. Only the elements that have a text() node whose value is equal to . evaluate to true() and remain in the result.
It's not the most efficient XPath expression, but it may be good enough for what you need.
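The same filter can be sketched in Python as a sanity check. This is an approximation that assumes xml.etree.ElementTree from the standard library: el.text only covers the text before the first child element, which matches text()='.' for simple elements like the ones in the question. The sample is abbreviated and the helper name elements_with_text is hypothetical.

```python
import xml.etree.ElementTree as ET

# Abbreviated version of the document from the question.
XML = """<SalesTransaction>
  <TransactionHeader>
    <TransactionHeaderFields>
      <WrntyID>a</WrntyID>
      <ExternalID/>
      <Type>.</Type>
      <Status>
        Submited
      </Status>
      <Remark/>
    </TransactionHeaderFields>
  </TransactionHeader>
</SalesTransaction>"""


def elements_with_text(xml_text, wanted):
    """Return every element whose leading text is exactly `wanted`."""
    root = ET.fromstring(xml_text)
    # root.iter() walks the whole tree, like //* in the XPath expression.
    return [el for el in root.iter() if el.text == wanted]


matches = elements_with_text(XML, ".")
print([el.tag for el in matches])
```

As in the XPath version, the comparison is exact, so an element whose text is a dot surrounded by whitespace would not match.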

Select all nodes until a specific given node/tag

Given the following markup:
<div id="about">
<dl>
<dt>Date</dt>
<dd>1872</dd>
<dt>Names</dt>
<dd>A</dd>
<dd>B</dd>
<dd>C</dd>
<dt>Status</dt>
<dd>on</dd>
<dt>Another Field</dt>
<dd>X</dd>
<dd>Y</dd>
</dl>
</div>
I'm trying to extract all the <dd> nodes following <dt>Names</dt> but only until another <dt> starts. In this case, I'm after the following nodes:
<dd>A</dd>
<dd>B</dd>
<dd>C</dd>
I'm trying the following XPath code, but it's not working as intended.
xpath("//div[@id='about']/dl/dt[contains(text(),'Names')]/following-sibling::dd[not(following-sibling::dt)]/text()")
Any thoughts on how to fix it?
Many thanks.
Update: much simpler solution
Your situation has one useful property: the anchor item is always the first preceding sibling dt of the nodes you want. Because of that, the complex expression below can be written much more simply:
/div/dl/dd[preceding-sibling::dt[1][. = 'Names']]
In other words:
select any dd
that has a first preceding sibling dt (the preceding sibling axis counts backwards)
that itself has a value of "Names"
Tested in oXygen, it selects the nodes you wanted to select (and if you change "Names" to "Status" or "Another Field", it will likewise select only the dd elements that follow that dt, up to the next dt).
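The "first preceding sibling dt" rule can be sketched in Python to confirm which nodes it picks. This assumes xml.etree.ElementTree from the standard library, which does not support the preceding-sibling axis, so the nearest dt is tracked by hand; the function name dd_after is made up for this illustration.

```python
import xml.etree.ElementTree as ET

HTML = """<div id="about">
<dl>
<dt>Date</dt><dd>1872</dd>
<dt>Names</dt><dd>A</dd><dd>B</dd><dd>C</dd>
<dt>Status</dt><dd>on</dd>
<dt>Another Field</dt><dd>X</dd><dd>Y</dd>
</dl>
</div>"""


def dd_after(xml_text, label):
    """Collect the text of each dd whose nearest preceding dt equals `label`."""
    root = ET.fromstring(xml_text)
    selected = []
    current_dt = None
    for child in root.find("dl"):
        if child.tag == "dt":
            # Remember the most recent dt: the equivalent of
            # preceding-sibling::dt[1] for the dd elements that follow.
            current_dt = child.text
        elif child.tag == "dd" and current_dt == label:
            selected.append(child.text)
    return selected


names = dd_after(HTML, "Names")
print(names)
```

Calling dd_after(HTML, "Status") or dd_after(HTML, "Another Field") returns only the dd elements of that group, in line with the behaviour described for the XPath expression.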
Original complex solution (leaving in for reference)
This is far easier in XPath 2.0, but let's assume you can only use XPath 1.0. The trick is to count the number of preceding siblings from your anchor element (the one with "Names" in it), and disregard any that have the wrong count (i.e., when we cross over <dt>Status</dt>, the number of preceding siblings has increased).
For XPath 1.0, remove the comments between (: and :). Whitespace is insignificant in XPath, so you can spread the expression over multiple lines for readability, but comments are not available in 1.0.
/div/dl/dd
(: any dd having a dt before it with "Names" :)
[preceding-sibling::dt[. = 'Names']]
(: count the preceding siblings up to dt with "Names", add one to include 'self' :)
[count(preceding-sibling::dt[. = 'Names']/preceding-sibling::dt) + 1
=
(: compare with count of all preceding siblings :)
count(preceding-sibling::dt)]
As a one-liner:
/div/dl/dd[preceding-sibling::dt[. = 'Names']][count(preceding-sibling::dt[. = 'Names']/preceding-sibling::dt) + 1 = count(preceding-sibling::dt)]
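How about this (note that on the sample markup it also returns <dd>on</dd>, because that element likewise has a preceding "Names" dt and a following dt):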
//dd[preceding-sibling::dt[contains(., 'Names')]][following-sibling::dt]

XPath combine node sets

Let's say I have two expressions: //div[@class="foo"] and //span[@class="foo"]. Is it possible to "combine" them, like so:
//(div | span)[@class="foo"]
Or can I only take the union of the two complete expressions?
//div[@class="foo"] | //span[@class="foo"]
A more idiomatic (and dare I say readable) way to get all of the div and span elements having class="foo" is this:
//*[(self::div or self::span) and @class="foo"]
In English:
Select all elements that are themselves a div or a span and that have a class attribute whose value is 'foo'
As for your original question, the following expressions return equivalent results:
(//div | //span)[@class="foo"]
//div[@class="foo"] | //span[@class="foo"]
The first gives you the set that is the union of all the div and span elements in the document, further filtered to include only those having class="foo" while the latter gives you the union of 1) the set of all div elements having class="foo" and 2) the set of all span elements having class="foo".
It should be fairly obvious that those two sets contain the same thing.
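To make the equivalence concrete, here is a small Python sketch comparing the two approaches. It assumes xml.etree.ElementTree from the standard library, whose XPath subset has no union operator, so the union is built with plain Python; the sample document is hypothetical and not taken from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, made up for this comparison.
XML = """<body>
  <div class="foo">d1</div>
  <div class="bar">d2</div>
  <span class="foo">s1</span>
  <p class="foo">p1</p>
  <section><span class="foo">s2</span><div class="foo">d3</div></section>
</body>"""

root = ET.fromstring(XML)

# Filter applied once to the union of all div and span elements,
# like //*[(self::div or self::span) and @class="foo"].
filtered_union = [
    el for el in root.iter()
    if el.tag in ("div", "span") and el.get("class") == "foo"
]

# Union of the two separately filtered sets,
# like //div[@class="foo"] | //span[@class="foo"].
union_of_filtered = (
    root.findall(".//div[@class='foo']")
    + root.findall(".//span[@class='foo']")
)

# Compare as sets: list concatenation does not give document order,
# whereas an XPath union would.
same = set(filtered_union) == set(union_of_filtered)
print(same, sorted(el.text for el in filtered_union))
```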
This construct works:
(//golfer | //batter)[@ID="2" or @ID="3"]
...much to my astonishment.

How to select all nodes such that their group size is higher than a given value, in XPath

I would like to select all <mynode> elements that have a value that appears a certain number of times (say, x) in all the elements.
Example:
<root>
<mynode>
<attr1>value_1</attr1>
<attr2>value_2</attr2>
</mynode>
<mynode>
<attr1>value_3</attr1>
<attr2>value_3</attr2>
</mynode>
<mynode>
<attr1>value_4</attr1>
<attr2>value_5</attr2>
</mynode>
<mynode>
<attr1>value_6</attr1>
<attr2>value_5</attr2>
</mynode>
</root>
In this case, I want all the <mynode> elements whose attr2 value occurs more than once (x = 1), that is, the last two <mynode>s.
Which query do I have to run to achieve this?
If you're using XPath 2.0 or greater, then the following will work:
for $value in distinct-values(/root/mynode/attr2)
return
if (count(/root/mynode[attr2 = $value]) > 1) then
/root/mynode[attr2 = $value]
else ()
For a more detailed discussion see: XPath/XSLT nested predicates: how to get the context of outer predicate?
This is also possible in plain XPath 1.0 (it works in newer versions of XPath too), and is probably easier to read. Think of the problem this way: you are looking for all <mynode/>s that have an <attr2/> node whose value also occurs before or after that <mynode/>:
//mynode[attr2 = preceding::attr2 or attr2 = following::attr2]
If <attr2/> nodes can also occur inside other elements and you do not want to test against those:
//mynode[attr2 = preceding::mynode/attr2 or attr2 = following::mynode/attr2]
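For comparison, the grouping can be sketched in Python with a counter over the attr2 values. This assumes xml.etree.ElementTree and collections.Counter from the standard library; the function name and its threshold parameter x are introduced here only to mirror the x from the question.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XML = """<root>
  <mynode><attr1>value_1</attr1><attr2>value_2</attr2></mynode>
  <mynode><attr1>value_3</attr1><attr2>value_3</attr2></mynode>
  <mynode><attr1>value_4</attr1><attr2>value_5</attr2></mynode>
  <mynode><attr1>value_6</attr1><attr2>value_5</attr2></mynode>
</root>"""


def nodes_with_repeated_attr2(xml_text, x=1):
    """Return the mynode elements whose attr2 value occurs more than x times."""
    root = ET.fromstring(xml_text)
    nodes = root.findall("mynode")
    # Count how often each attr2 value appears across all mynode elements.
    counts = Counter(node.findtext("attr2") for node in nodes)
    return [node for node in nodes if counts[node.findtext("attr2")] > x]


repeated = nodes_with_repeated_attr2(XML)
print([node.findtext("attr1") for node in repeated])
```

Unlike the XPath 1.0 expression, which only tests whether a value is repeated at all, the counter handles any threshold x directly.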

XPath 2.0: Finding number of distinct elements before first element with current node's value

Setup: I am using XPath 2.0, but inside Altova StyleVision (see my comment further down).
I have got the following XML structure:
<?xml version="1.0" encoding="UTF-8"?>
<entries>
<bla>
<blub>222</blub>
</bla>
<bla>
<blub>222</blub>
</bla>
<bla>
<blub>123</blub>
</bla>
<bla>
<blub>234</blub>
</bla>
<bla>
<blub>123</blub>
<!--I want to find the number of distinct elements before the first occurrence of a blub element with the same value as the current node - so for this node the result should be one (two times 222 before the first appearance of 123)-->
</bla>
</entries>
When parsing that file, I would like to know at each occurrence of a blub: how many distinct blub values are there before the first occurrence of a blub with the same value as the current node?
So basically, first determine where the first occurrence of a blub with the same value as the current node is, and then work out the number of distinct blubs before it.
One of my problems is that Altova doesn't support the current() function. Quote: "Note that the current() function is an XSLT function, not an XPath function, and cannot therefore be used in StyleVision's Auto-Calculations and Conditional Templates. To select the current node in an expression use the for expression of XPath 2.0."
So any solution that could do without the current() function would be great ;)
Thanks all!
Stevo
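If you need the first node with the same value, you can always start at the beginning and search for it with /entries/bla[blub=string()][1]. (The intent is for string() without a parameter to return the value of the current node; be aware that inside a predicate the context node is the bla being tested, so binding the value to a variable with for, as in the loop further down, is the reliable form.)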
And then you can insert it in your expression and get
count(distinct-values( /entries/bla[blub=string()][1]/preceding-sibling::bla/blub ))
And if you need it for all blubs you can count it for all of them:
for $x in /entries/bla/blub return count(distinct-values( /entries/bla[blub=string($x)][1]/preceding-sibling::bla/blub ))
Edit: this might be slow, however, with so many loops. If distinct-values in StyleVision preserves the order of the elements, the number of distinct values before a value is the index of that value in the distinct-value sequence, minus one.
So you can get the count for one node with index-of(distinct-values(/entries/bla/blub), string()) - 1 and the count for all nodes with
for $x in /entries/bla/blub return index-of(distinct-values(/entries/bla/blub), $x) - 1
And if it is possible to define new variables you could set $s to distinct-values(/entries/bla/blub) and simplify it to
for $x in /entries/bla/blub return index-of($s, $x) - 1
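The index-based idea can be checked with a short Python sketch. It assumes xml.etree.ElementTree from the standard library and relies on dict.fromkeys keeping first-occurrence order, which stands in for an order-preserving distinct-values; the function name distinct_before_first is hypothetical.

```python
import xml.etree.ElementTree as ET

XML = """<entries>
  <bla><blub>222</blub></bla>
  <bla><blub>222</blub></bla>
  <bla><blub>123</blub></bla>
  <bla><blub>234</blub></bla>
  <bla><blub>123</blub></bla>
</entries>"""


def distinct_before_first(xml_text):
    """For each blub, count the distinct values seen before its first occurrence."""
    root = ET.fromstring(xml_text)
    values = [blub.text for blub in root.findall("bla/blub")]
    # Distinct values in order of first appearance.
    distinct = list(dict.fromkeys(values))
    # list.index is 0-based, so it already equals index-of(...) - 1.
    return [distinct.index(value) for value in values]


counts = distinct_before_first(XML)
print(counts)
```

The last entry corresponds to the commented node in the question, for which the expected result is one.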
