XPath 2.0: Finding number of distinct elements before first element with current node's value - distinct

Setup: I am using XPath 2.0, but inside Altova StyleVision (see my comment further down).
I have got the following XML structure:
<?xml version="1.0" encoding="UTF-8"?>
<entries>
<bla>
<blub>222</blub>
</bla>
<bla>
<blub>222</blub>
</bla>
<bla>
<blub>123</blub>
</bla>
<bla>
<blub>234</blub>
</bla>
<bla>
<blub>123</blub>
<!--I want to find the number of distinct elements before the first occurrence of a blub element with the same value as the current node - so for this node the result should be one (two times 222 before the first appearance of 123)-->
</bla>
</entries>
When processing that file I would like to know, at each occurrence of a blub, how many distinct blub values there are before the first occurrence of a blub with the same value as the current node.
So basically: first determine where the first occurrence of a blub with the same value as the current node is, then count the number of distinct blubs before it.
One of my problems is that Altova doesn't support the current() function. Quote: "Note that the current() function is an XSLT function, not an XPath function, and cannot therefore be used in StyleVision's Auto-Calculations and Conditional Templates. To select the current node in an expression use the for expression of XPath 2.0."
So any solution that could do without the current() function would be great ;)
Thanks all!
Stevo

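If you need the first node with the same value, you can always start at the beginning and search for it with /entries/bla[blub=$v][1], where $v holds the value of the current node. (Inside the predicate, string() without an argument would refer to the bla being tested rather than to the node you started from, so the value has to be bound outside the path, for example with a for expression.)
You can then insert it into your expression and get
for $v in string() return count(distinct-values( /entries/bla[blub=$v][1]/preceding-sibling::bla/blub ))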
And if you need it for all blubs you can count it for all of them:
for $x in /entries/bla/blub return count(distinct-values( /entries/bla[blub=string($x)][1]/preceding-sibling::bla/blub ))
edit: this might be slow, however, because of the nested loops. If distinct-values in StyleVision preserves the order of the elements (the specification leaves that order implementation-dependent), the number of distinct values before a given value is its index in the distinct-values sequence minus one.
So you can get the count for one node with index-of(distinct-values(/entries/bla/blub), string()) - 1 and the count for all nodes with
for $x in /entries/bla/blub return index-of(distinct-values(/entries/bla/blub), $x) - 1
And if it is possible to define new variables you could set $s to distinct-values(/entries/bla/blub) and simplify it to
for $x in /entries/bla/blub return index-of($s, $x) - 1
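The index-of idea above can be checked outside StyleVision with a small Java sketch. It is only an illustration under assumptions: it uses the XPath 1.0 engine bundled with the JDK to fetch the blub values and then mirrors index-of(distinct-values(...)) - 1 in plain Java, keeping the distinct values in document order. The class and method names (DistinctBefore, distinctBefore) are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DistinctBefore {
    // Hypothetical helper: for each blub, the number of distinct values that
    // appear before the first occurrence of that blub's value.
    static List<Integer> distinctBefore(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList blubs = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("/entries/bla/blub", doc, XPathConstants.NODESET);

            List<String> distinct = new ArrayList<>(); // distinct values in document order
            List<Integer> result = new ArrayList<>();
            for (int i = 0; i < blubs.getLength(); i++) {
                String value = blubs.item(i).getTextContent();
                if (!distinct.contains(value)) {
                    distinct.add(value);
                }
                // Zero-based position in the ordered distinct list,
                // i.e. the equivalent of index-of(...) - 1.
                result.add(distinct.indexOf(value));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<entries><bla><blub>222</blub></bla><bla><blub>222</blub></bla>"
                + "<bla><blub>123</blub></bla><bla><blub>234</blub></bla>"
                + "<bla><blub>123</blub></bla></entries>";
        System.out.println(distinctBefore(xml)); // prints [0, 0, 1, 2, 1]
    }
}
```

The last entry is 1, matching the expectation stated in the question for the second 123.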

Related

How to grab Index instead of relative position using xpath

Given the following xml:
<randomName>
<otherName>
<a>item1</a>
<a>item2</a>
<a>item3</a>
</otherName>
<lastName>
<a>item4</a>
<a>item5</a>
</lastName>
</randomName>
Running '//a' gives me an array of all 5 "a" elements; however, '//a[1]' does not give me the first of those five elements (item1). It instead gives me an array containing item1 and item4.
I believe this is because they are both at position 1 relative to their parent. How can I grab any a element by its overall index?
I would like to be able to use a variable "x" to get itemX.
You can wrap it in parentheses so that the index is applied to the entire result set:
(//a)[1]
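A short Java sketch of the difference, assuming the XPath 1.0 engine that ships with the JDK. The class name GlobalIndex and the helper select are hypothetical, and the variable x is simply concatenated into the expression, which is acceptable here only because it is an int.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class GlobalIndex {
    static final String XML =
            "<randomName><otherName><a>item1</a><a>item2</a><a>item3</a></otherName>"
            + "<lastName><a>item4</a><a>item5</a></lastName></randomName>";

    // Hypothetical helper: evaluates an expression against XML and returns
    // the text content of every selected node.
    static List<String> select(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // [1] is applied per parent: the first a under each parent element.
        System.out.println(select("//a[1]"));
        // [1] is applied to the whole node set: the first a in the document.
        System.out.println(select("(//a)[1]"));
        int x = 4;
        System.out.println(select("(//a)[" + x + "]"));
    }
}
```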

Select all nodes until a specific given node/tag

Given the following markup:
<div id="about">
<dl>
<dt>Date</dt>
<dd>1872</dd>
<dt>Names</dt>
<dd>A</dd>
<dd>B</dd>
<dd>C</dd>
<dt>Status</dt>
<dd>on</dd>
<dt>Another Field</dt>
<dd>X</dd>
<dd>Y</dd>
</dl>
</div>
I'm trying to extract all the <dd> nodes following <dt>Names</dt> but only until another <dt> starts. In this case, I'm after the following nodes:
<dd>A</dd>
<dd>B</dd>
<dd>C</dd>
I'm trying the following XPath code, but it's not working as intended.
xpath("//div[@id='about']/dl/dt[contains(text(),'Names')]/following-sibling::dd[not(following-sibling::dt)]/text()")
Any thoughts on how to fix it?
Many thanks.
Update: much simpler solution
Your situation has one useful property: the anchor item is always the first preceding sibling dt of the nodes you want. Because of that, the complex expression below can be written much more simply:
/div/dl/dd[preceding-sibling::dt[1][. = 'Names']]
In other words:
select any dd
that has a first preceding sibling dt (the preceding sibling axis counts backwards)
that itself has a value of "Names"
Tested in oXygen, it selects the nodes you wanted (and if you change "Names" to "Status" or "Another Field", it selects only the dd elements that follow that dt, up to the next dt).
Original complex solution (leaving in for reference)
This is far easier in XPath 2.0, but let's assume you can only use XPath 1.0. The trick is to count the number of preceding siblings from your anchor element (the one with "Names" in it), and disregard any that have the wrong count (i.e., when we cross over <dt>Status</dt>, the number of preceding siblings has increased).
For XPath 1.0, remove the comments between (: and :). Whitespace is insignificant in XPath, so you can spread the expression over several lines for readability, but comments are not available in 1.0.
/div/dl/dd
(: any dd having a dt before it with "Names" :)
[preceding-sibling::dt[. = 'Names']]
(: count the preceding siblings up to dt with "Names", add one to include 'self' :)
[count(preceding-sibling::dt[. = 'Names']/preceding-sibling::dt) + 1
=
(: compare with count of all preceding siblings :)
count(preceding-sibling::dt)]
As a one-liner:
/div/dl/dd[preceding-sibling::dt[. = 'Names']][count(preceding-sibling::dt[. = 'Names']/preceding-sibling::dt) + 1 = count(preceding-sibling::dt)]
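Both expressions above are XPath 1.0, so they can be compared with the engine bundled with the JDK. The following is a sketch under that assumption; the class name DdAfterNames, the constants SIMPLE and COUNTING, and the helper select are hypothetical names introduced for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DdAfterNames {
    static final String XML = "<div id='about'><dl>"
            + "<dt>Date</dt><dd>1872</dd>"
            + "<dt>Names</dt><dd>A</dd><dd>B</dd><dd>C</dd>"
            + "<dt>Status</dt><dd>on</dd>"
            + "<dt>Another Field</dt><dd>X</dd><dd>Y</dd>"
            + "</dl></div>";

    // The simple form: the nearest preceding dt must be "Names".
    static final String SIMPLE = "/div/dl/dd[preceding-sibling::dt[1][. = 'Names']]";

    // The counting form, with the XPath 2.0 comments removed.
    static final String COUNTING = "/div/dl/dd[preceding-sibling::dt[. = 'Names']]"
            + "[count(preceding-sibling::dt[. = 'Names']/preceding-sibling::dt) + 1"
            + " = count(preceding-sibling::dt)]";

    // Hypothetical helper: text content of every node selected by the expression.
    static List<String> select(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(SIMPLE));   // prints [A, B, C]
        System.out.println(select(COUNTING)); // prints [A, B, C]
    }
}
```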
How about this:
//dd[preceding-sibling::dt[contains(., 'Names')]][following-sibling::dt]

Xquery not returning desired values

I am trying to return a certain set of values, but the query is not quite returning what I would like. I would like to return records by the author "Hennie J. Steenhagen", grouped by year. What it returns instead is every record from a year in which Hennie has a record, not only Hennie's records.
For example, if we have the records <www><author>Hennie*</author><year>1990</year></www> and <www><author>Derpie</author><year>1990</year></www>, the query returns both records grouped under the year 1990; I would like only Hennie's to be returned.
for $y in /*/*/year where $y/../author ="Hennie J. Steenhagen" return <year-Pub>{$y}{/*/*[year = $y]}</year-Pub>
Your question is quite difficult to understand because your XPath addresses a larger XML node tree than the example XML you have provided. For the example, however, I will assume that your records are wrapped in an element named record. The output of your XPath also does not make a lot of sense to me, but I will assume that you know what you want!
Given the XML:
<record>
<www>
<author>Hennie J. Steenhagen</author>
<year>1990</year>
</www>
<www>
<author>Derpie</author>
<year>1990</year>
</www>
</record>
If you have an XQuery 3.0 processor, you could use the following:
/record/www[author = "Hennie J. Steenhagen"] ! <year-Pub>{year}{.}</year-Pub>
If you only have access to an XQuery 1.0 processor, then you could fall-back to the following:
for $w in /record/www[author = "Hennie J. Steenhagen"]
return
<year-Pub>{$w/year}{$w}</year-Pub>
Both of my examples use only a single predicate, which filters the data once, whereas your self-found solution uses both a predicate and a where clause and so has to filter the data twice.
Fixed it,
for $y in /*/*/year where $y/../author ="Hennie J. Steenhagen" and /*/*[year=$y] return <year-Pub>{$y/../*}</year-Pub>
Thanks to anyone who spent time looking.

XPath :: running counter two levels

Using the count(preceding-sibling::*) XPath expression, one can obtain incrementing counters. However, can the same also be accomplished across a two-level-deep sequence?
example XML instance
<grandfather>
<father>
<child>a</child>
</father>
<father>
<child>b</child>
<child>c</child>
</father>
</grandfather>
code (with Saxon HE 9.4 jar on the CLASSPATH for XPath 2.0 features)
Trying to get a counter sequence of 1, 2 and 3 for the three child nodes with different kinds of XPath expressions:
XPathExpression expr = xpath.compile("/grandfather/father/child");
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
for (int i = 0 ; i < nodes.getLength() ; i++) {
Node node = nodes.item(i);
System.out.printf("child's index is: %s %s %s, name is: %s\n"
,xpath.compile("count(preceding-sibling::*)").evaluate(node)
,xpath.compile("count(preceding-sibling::child)").evaluate(node)
,xpath.compile("//child/position()").evaluate(doc)
,xpath.compile(".").evaluate(node));
}
The above code prints:
child's index is: 0 0 1, name is: a
child's index is: 0 0 1, name is: b
child's index is: 1 1 1, name is: c
None of the three XPaths I tried produced the correct sequence 1, 2, 3. Clearly it can trivially be done using the i loop variable, but I want to accomplish it with XPath if possible. I also need to keep the basic framework of evaluating an XPath expression to get all the nodes to visit and then iterating over that set, since that is how the real application I work on is structured. Basically, I visit each node and then need to evaluate a number of XPath expressions on it (node) or on the document (doc); one of these XPath expressions is supposed to produce this incrementing sequence.
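Use the preceding axis with a name test instead; unlike preceding-sibling, it also counts child elements under earlier father elements. The count is zero-based, so add 1 to get 1, 2, 3:
count(preceding::child) + 1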
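A sketch of the preceding-axis counter evaluated per node, keeping the "select nodes, then evaluate per node" structure from the question. It assumes the XPath 1.0 engine bundled with the JDK rather than Saxon, adds 1 so the counter starts at 1, and uses hypothetical names (PrecedingCounter, counters).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PrecedingCounter {
    static final String XML = "<grandfather><father><child>a</child></father>"
            + "<father><child>b</child><child>c</child></father></grandfather>";

    // Hypothetical helper: one-based counter for every child element,
    // computed with the preceding axis from each node in turn.
    static List<Integer> counters() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    "/grandfather/father/child", doc, XPathConstants.NODESET);
            List<Integer> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // preceding::child sees child elements under earlier fathers too.
                Double n = (Double) xpath.evaluate(
                        "count(preceding::child) + 1", nodes.item(i), XPathConstants.NUMBER);
                result.add(n.intValue());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(counters()); // prints [1, 2, 3]
    }
}
```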
Using XPath 2.0, there is a much better way to do this. Fetch all <child/> nodes and use the position() function to get the index:
//child/concat("child's index is: ", position(), ", name is: ", text())
You don't say efficiency is important, but I really hate to see this done with O(n^2) code! Jens' solution shows how to do that if you can use the result in the form of a sequence of (position, name) pairs. You could also return an alternating sequence of strings and numbers using //child/(string(.), position()): though you would then want to use the s9api API rather than JAXP, because JAXP can only really handle the data types that arise in XPath 1.0.
If you need to compute the index of each node as part of other processing, it might still be worth computing the index for every node in a single initial pass, and then looking it up in a table. But if you're doing that, the simplest way is surely to iterate over the result of //child and build a map from nodes to the sequence number in the iteration.
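The single-pass lookup table described above might look like the following sketch. It assumes the JDK's built-in DOM and XPath 1.0 support instead of s9api, and the names ChildIndexMap, children and indexByNode are hypothetical. An IdentityHashMap is used so that nodes are keyed by object identity rather than by any notion of equality.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ChildIndexMap {
    static final String XML = "<grandfather><father><child>a</child></father>"
            + "<father><child>b</child><child>c</child></father></grandfather>";

    // Hypothetical helper: all child elements, materialized once as a list.
    static List<Node> children() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//child", doc, XPathConstants.NODESET);
            List<Node> list = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                list.add(nodes.item(i));
            }
            return list;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // One pass over the nodes: map each node to its one-based sequence number.
    static Map<Node, Integer> indexByNode(List<Node> nodes) {
        Map<Node, Integer> index = new IdentityHashMap<>();
        for (int i = 0; i < nodes.size(); i++) {
            index.put(nodes.get(i), i + 1);
        }
        return index;
    }

    public static void main(String[] args) {
        List<Node> nodes = children();
        Map<Node, Integer> index = indexByNode(nodes);
        for (Node node : nodes) {
            // Later processing can look the index up instead of recounting.
            System.out.println(node.getTextContent() + " -> " + index.get(node));
        }
    }
}
```

Building the table is linear in the number of nodes, whereas evaluating count(preceding::child) for every node recounts the earlier nodes each time.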

How to select all nodes such that their group size is higher than a given value, in XPath

I would like to select all <mynode> elements that have a value that appears a certain number of times (say, x) in all the elements.
Example:
<root>
<mynode>
<attr1>value_1</attr1>
<attr2>value_2</attr2>
</mynode>
<mynode>
<attr1>value_3</attr1>
<attr2>value_3</attr2>
</mynode>
<mynode>
<attr1>value_4</attr1>
<attr2>value_5</attr2>
</mynode>
<mynode>
<attr1>value_6</attr1>
<attr2>value_5</attr2>
</mynode>
</root>
In this case, I want all the <mynode> elements whose attr2 value occurs more than once (x = 1), so the last two <mynode>s.
Which query do I have to perform to achieve this?
If you're using XPath 2.0 or greater, then the following will work:
for $value in distinct-values(/root/mynode/attr2)
return
if (count(/root/mynode[attr2 = $value]) > 1) then
/root/mynode[attr2 = $value]
else ()
For a more detailed discussion see: XPath/XSLT nested predicates: how to get the context of outer predicate?
This is also possible in plain XPath 1.0 (which also works in newer versions of XPath) and is probably easier to read. Think of the problem as looking for all <mynode/>s that have an <attr2/> value that also occurs before or after that <mynode/>:
//mynode[attr2 = preceding::attr2 or attr2 = following::attr2]
If <attr2/> nodes can also occur inside other elements and you do not want to test those:
//mynode[attr2 = preceding::mynode/attr2 or attr2 = following::mynode/attr2]
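Because the expression is XPath 1.0, it can be tried with the engine bundled with the JDK. This sketch assumes the sample document from the question and reports the attr1 value of each selected mynode so the result is easy to read; the class name RepeatedAttr2 and the helper attr1OfRepeated are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RepeatedAttr2 {
    static final String XML = "<root>"
            + "<mynode><attr1>value_1</attr1><attr2>value_2</attr2></mynode>"
            + "<mynode><attr1>value_3</attr1><attr2>value_3</attr2></mynode>"
            + "<mynode><attr1>value_4</attr1><attr2>value_5</attr2></mynode>"
            + "<mynode><attr1>value_6</attr1><attr2>value_5</attr2></mynode>"
            + "</root>";

    // Hypothetical helper: attr1 of every mynode whose attr2 value also occurs
    // in an attr2 element before or after that mynode.
    static List<String> attr1OfRepeated() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                    "//mynode[attr2 = preceding::attr2 or attr2 = following::attr2]/attr1",
                    doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(attr1OfRepeated()); // prints [value_4, value_6]
    }
}
```

The second mynode is not selected even though value_3 appears twice in it, because only attr2 elements are compared.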

Resources