XQuery: How to know if there are doublets?

I have an XML file whose root contains <element> elements; each <element> has two children, <a> and <b>.
I want to write an XQuery query that returns true or false.
Return false:
if any <a> has the same value as the <a> of another element and their <b> values are different.
Otherwise return true:
the <a> values are different in each element,
or some <a> values are equal but their <b> values are equal as well.
For example:
<root>
<element>
<a>ttt</a>
<b>tttsame</b>
</element>
<element>
<a>ttt</a>
<b>tttsame</b>
</element>
<element>
<a/>
<b>value</b>
</element>
<element>
<a>rrr</a>
<b>rrrvalue</b>
</element>
<element>
<a>mmm</a>
<b>rrrvalue</b>
</element>
<element>
<a>mmm</a>
<b>rrrvalue</b>
</element>
</root>
The document above is okay: it should return true.
<root>
<element>
<a>ttt</a>
<b>ttt value</b>
</element>
<element>
<a>ttt</a>
<b>ttrdiff</b>
</element>
<element>
<a/>
<b>value</b>
</element>
<element>
<a>mmm</a>
<b>rrrvalue</b>
</element>
</root>
The document above shouldn't be accepted, because ttt has two different <b> values: it should return false.

Simple XPath 2.0:
empty(
(for $parentA-Dubled in /*/*[a = following-sibling::*/a]
return
empty($parentA-Dubled/following-sibling::*
[$parentA-Dubled/a eq a and $parentA-Dubled/b ne b])
)
[not(.)]
)
XSLT 2.0 - based verification:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:value-of select=
"empty(
(for $parentA-Dubled in /*/*[a = following-sibling::*/a]
return
empty($parentA-Dubled/following-sibling::*
[$parentA-Dubled/a eq a and $parentA-Dubled/b ne b])
)
[not(.)]
)
"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on any XML document, it evaluates the XPath expression and outputs the result of this evaluation.
When applied on the first provided XML document, the wanted, correct result is produced:
true
When applied on the second provided XML document, again the wanted, correct result is produced:
false
Explanation:
This sub-expression:
(for $parentA-Dubled in /*/*[a = following-sibling::*/a]
return
empty($parentA-Dubled/following-sibling::*
[$parentA-Dubled/a eq a and $parentA-Dubled/b ne b])
)
evaluates to a sequence of boolean values: true() / false()
true() is returned when this is true:
empty($parentA-Dubled/following-sibling::*
[$parentA-Dubled/a eq a and $parentA-Dubled/b ne b])
This means that true() is returned for each $parentA-Dubled whose a has no counterpart a (a child of a following sibling of $parentA-Dubled) with the same value as $parentA-Dubled/a but with a b sibling whose value differs from $parentA-Dubled/b.
To summarize: true() is returned when, for all a elements with the same value, their b siblings all have the same value as well.
So when is false() returned?
Returning false() means that the inner empty() returned false(), that is, there exists at least one pair of a elements that have the same value but whose b siblings have different values.
Thus, the sub-expression above returns a sequence such as:
true(), true(), true(), ..., true() -- all values are true()
or
true(), true(), true(), ..., false(), ..., true() -- at least one of the values is false()
The original problem requires us to return true() in the first case and to return false() in the second case.
This is easy to express as:
empty($booleanSequence[. eq false()]) -- and this is equivalent to the shorter:
empty($booleanSequence[not(.)])
Now, we just need to substitute in the above expression $booleanSequence with the first sub-expression that we analyzed above:
(for $parentA-Dubled in /*/*[a = following-sibling::*/a]
return
empty($parentA-Dubled/following-sibling::*
[$parentA-Dubled/a eq a and $parentA-Dubled/b ne b])
)
Thus we obtain the complete XPath expression that solves the original problem:
empty(
(for $parentA-Dubled in /*/*[a = following-sibling::*/a]
return
empty($parentA-Dubled/following-sibling::*
[$parentA-Dubled/a eq a and $parentA-Dubled/b ne b])
)
[not(.)]
)

You could group on a and then check if there is more than one distinct b in any group, for instance with
not
(
for $a-group in root/element
group by $a := $a-group/a
where tail(distinct-values($a-group/b))
return $a-group
)
https://xqueryfiddle.liberty-development.net/6qM2e2r/0 and https://xqueryfiddle.liberty-development.net/6qM2e2r/1 have your two input samples.
As for how it works, the question asks to return false "if there is any <a> has the same value as another <a> in another element && there <b>'s value are different".
To find element elements with the same a child element we can group by $a := $a-group/a in a for $a-group in root/element expression. The distinct or different b values in each group of as with the same value are computed by distinct-values($a-group/b), if there are at least two different b values then tail(distinct-values($a-group/b)) contains at least one value, otherwise it is an empty sequence. This works as through XQuery 3's group by clause "In the post-grouping tuple generated for a given group, each non-grouping variable is bound to a sequence containing the concatenated values of that variable in all the pre-grouping tuples that were assigned to that group" (https://www.w3.org/TR/xquery-31/#id-group-by) so that after the group by $a := $a-group/a clause the variable $a-group is bound to a sequence of element elements with the same grouping key based on the a child element.
So the complete for .. group by .. where .. return selects the groups of element elements with the same a value where there are at least two different/distinct b values.
As the requirement is to "return false" if any such groups exist the not() function is applied to implement that condition as the boolean value of a non-empty sequence is true and the not(..) then gives false if there are any elements meeting the condition expressed in the for selection.
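For readers who want to cross-check the grouping idea outside an XQuery processor, here is a minimal Python sketch of the same logic using only the standard library. The function name no_conflicting_doublets and the constants FIRST and SECOND are made up for this illustration; the two documents are the samples from the question, written compactly.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def no_conflicting_doublets(xml_text):
    """Return True unless two <element>s share an <a> value but differ in <b>."""
    root = ET.fromstring(xml_text)
    b_values_by_a = defaultdict(set)
    for element in root.findall("element"):
        # findtext() yields "" for an empty element such as <a/>
        a_value = element.findtext("a", default="")
        b_value = element.findtext("b", default="")
        b_values_by_a[a_value].add(b_value)
    # A group is fine when it holds exactly one distinct <b> value.
    return all(len(b_values) == 1 for b_values in b_values_by_a.values())


FIRST = (
    "<root>"
    "<element><a>ttt</a><b>tttsame</b></element>"
    "<element><a>ttt</a><b>tttsame</b></element>"
    "<element><a/><b>value</b></element>"
    "<element><a>rrr</a><b>rrrvalue</b></element>"
    "<element><a>mmm</a><b>rrrvalue</b></element>"
    "<element><a>mmm</a><b>rrrvalue</b></element>"
    "</root>"
)

SECOND = (
    "<root>"
    "<element><a>ttt</a><b>ttt value</b></element>"
    "<element><a>ttt</a><b>ttrdiff</b></element>"
    "<element><a/><b>value</b></element>"
    "<element><a>mmm</a><b>rrrvalue</b></element>"
    "</root>"
)

print(no_conflicting_doublets(FIRST))   # True
print(no_conflicting_doublets(SECOND))  # False
```

The set per <a> value plays the role of distinct-values($a-group/b): a second entry in any set corresponds to a non-empty tail(...).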

Try this XQuery code to get only one distinct item of <a> (The corresponding <b> value is not specified; here, the first element is chosen):
let $file := doc("input.xml")/root,
$vals := distinct-values($file/element/a) return
<root>
{for $i in $vals return $file/element[a=$i][1]}
</root>
Its result is:
<root>
<element>
<a>ttt</a>
<b>ttt value</b>
</element>
<element>
<a/>
<b>value</b>
</element>
<element>
<a>rrr</a>
<b>rrrvalue</b>
</element>
<element>
<a>mmm</a>
<b>rrrvalue</b>
</element>
</root>

Related

Xpath 1.0 select first node back to ancestor

My XML as below :
<Query>
<Comp>
<Pers>
<Emp>
<Job>
<Code>Not selected</Code>
</Job>
</Emp>
<Emp>
<Job>
<Code>selected</Code>
</Job>
</Emp>
</Pers>
</Comp>
</Query>
I have an XPath : /Query/Comp/Pers/Emp/Job[Code='selected']/../../../..
The result should only have the one <Emp> that meets the condition:
<Query>
<Comp>
<Pers>
<Emp>
<Job>
<Code>selected</Code>
</Job>
</Emp>
</Pers>
</Comp>
</Query>
How could I get the result?
The system doesn't work with ancestor::*. I have to use '/..' to populate the ancestor.

You shouldn't have to use ancestor here to get the <Emp> element; the following XPath should select any <Emp> element that meets your criteria:
/Query/Comp/Pers/Emp[Job[Code='selected']]
Note: You say your result should be one element, which is correct in this case, but this expression returns all nodes that match your criteria.
Edit:
You've stated you're using XSLT and you've given me a bit of a snippet below, but I'm still not 100% sure of your actual structure. You can use the XPath to identify all the nodes that are not equal to selected, and then use XSLT to copy everything except those.
<!-- Copies all nodes in the input to the output -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()" />
</xsl:copy>
</xsl:template>
<!-- Matches specifically the Emp records that are not equal to selected and
applies no action to them, so they do not appear in the output -->
<xsl:template match="/Query/Comp/Pers/Emp[Job[Code!='selected']]" />
The two templates above would transform your input to your desired output!
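If no XSLT processor is at hand, the effect of the two templates can be approximated in Python with the standard library. This is only a sketch under the assumption that the input looks like the XML in the question; keep_selected and QUERY_XML are names invented for the example.

```python
import xml.etree.ElementTree as ET

QUERY_XML = (
    "<Query><Comp><Pers>"
    "<Emp><Job><Code>Not selected</Code></Job></Emp>"
    "<Emp><Job><Code>selected</Code></Job></Emp>"
    "</Pers></Comp></Query>"
)


def keep_selected(xml_text):
    """Copy the tree but drop every Emp that has a Job/Code other than 'selected'."""
    root = ET.fromstring(xml_text)
    # Materialise the lists first, because children are removed while looping.
    for pers in list(root.iter("Pers")):
        for emp in list(pers.findall("Emp")):
            codes = [code.text for code in emp.findall("Job/Code")]
            if any(code != "selected" for code in codes):
                pers.remove(emp)
    return root


result = keep_selected(QUERY_XML)
remaining_codes = [emp.findtext("Job/Code") for emp in result.iter("Emp")]
print(remaining_codes)
print(ET.tostring(result, encoding="unicode"))
```

Everything that is not explicitly removed stays in place, which mirrors what the identity template does.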

Self axis in xslt

<element>
<bye>do not delete me</bye>
<hello>do not delete me</hello>
<hello>delete me</hello>
<hello>delete me</hello>
</element>
Applied to the above xml, this deletes all the nodes except the first hello child of /element:
<xsl:template match="hello[not(current() = parent::element/hello[1])]" />
Why don't these work? (assuming the first node is not a text node)
<xsl:template match="hello[not(self::hello/position() = 1)]" />
<xsl:template match="hello[not(./position() = 1)]" />
Or this one?
<xsl:template match="hello[not(self::hello[1])]" />
What is the self axis selecting? Why isn't this last example equivalent to not(hello[1])?

First, you are wrong when you say that:
This deletes all the nodes except the first hello child of /element
The truth is that it deletes (if that's the correct word) any hello child of /element whose value is not the same as the value of the first one of these. For example, given:
XML
<element>
<hello>a</hello>
<hello>b</hello>
<hello>c</hello>
<hello>a</hello>
</element>
the template:
<xsl:template match="hello[not(current() = parent::element/hello[1])]" />
will match the second and the third hello nodes - but not the first or the fourth.
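A quick way to convince yourself of this is to model the predicate in Python: the pattern compares string values, not positions. The sketch below assumes the four-element example above; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

element = ET.fromstring(
    "<element>"
    "<hello>a</hello><hello>b</hello><hello>c</hello><hello>a</hello>"
    "</element>"
)

hellos = element.findall("hello")
first_value = hellos[0].text  # string value of parent::element/hello[1]

# 1-based positions of the hello elements whose value differs from the first one,
# i.e. the ones the empty template would match (and therefore suppress).
matched_positions = [
    position
    for position, hello in enumerate(hellos, start=1)
    if hello.text != first_value
]
print(matched_positions)  # [2, 3]
```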
Now, with regard to your question: in XSLT 1.0, position() is not a valid location step - so this:
<xsl:template match="hello[not(self::hello/position() = 1)]" />
should return an error.
In XSLT 2.0, the pattern hello[not(self::hello/position() = 1)] will not match any hello element - because there is only one node on the self axis, and therefore its position is always 1.
Similarly:
<xsl:template match="hello[not(./position() = 1)]" />
is invalid in XSLT 1.0.
In XSLT 2.0, ./position() will always return 1 for the same reason as before: . is short for self::node() and there is only one such node.
Finally, this template:
<xsl:template match="hello[not(self::hello[1])]" />
is looking for a node that doesn't have (the first instance of) itself. Of course, no such node can exist.

Using position() on the RHS of the "/" operator is never useful -- and in XSLT 1.0, which is the tag on your question, it's not actually permitted.
In XSLT 2.0, the result of the expression X/position() is a sequence of integers 1..count(X). If the LHS is a singleton, like self::E, then count(X) is one so the result is a single integer 1.

Nested conditional if else statements in xpath

I have this XML:
<property id="1011">
<leasehold>No</leasehold>
<freehold>Yes</freehold>
<propertyTypes>
<propertyType>RESIDENTIAL</propertyType>
</propertyTypes>
</property>
and I want to create an xpath statement that is same as the following nested if-else pseudocode block.
if( propertyTypes/propertyType == 'RESIDENTIAL') {
if( leasehold == 'Yes' ){
return 'Rent'
} else {
return 'Buy'
}
} else {
if( leasehold == 'Yes' ){
return 'Leasehold'
} else {
return 'Freehold'
}
}
I've seen something about Becker's method but I couldn't really follow it. XPath isn't my strong point really.

I. In XPath 2.0 one simply translates this to:
if(/*/propertyTypes/propertyType = 'RESIDENTIAL')
then
(if(/*/leasehold='Yes')
then 'Rent'
else 'Buy'
)
else
if(/*/leasehold='Yes')
then 'Leasehold'
else 'Freehold'
XSLT 2.0 - based verification:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:sequence select=
"if(/*/propertyTypes/propertyType = 'RESIDENTIAL')
then
(if(/*/leasehold='Yes')
then 'Rent'
else 'Buy'
)
else
if(/*/leasehold='Yes')
then 'Leasehold'
else 'Freehold'
"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document:
<property id="1011">
<leasehold>No</leasehold>
<freehold>Yes</freehold>
<propertyTypes>
<propertyType>RESIDENTIAL</propertyType>
</propertyTypes>
</property>
the XPath expression is evaluated and the result of this evaluation is copied to the output:
Buy
II. XPath 1.0 solution
In XPath 1.0 there isn't an if operator.
A conditional can still be implemented with a single XPath 1.0 expression, but this is trickier, and the resulting expression may not be very readable.
Here is a generic way (first proposed by Jeni Tennison) to produce $stringA when a condition $cond is true() and otherwise produce $stringB:
concat(substring($stringA, 1 div $cond), substring($stringB, 1 div not($cond)))
One of the main achievements of this formula is that it works for strings of any length, and no lengths need to be specified.
Explanation:
Here we use the fact that by definition:
number(true()) = 1
and
number(false()) = 0
and that
1 div 0 = Infinity
So, if $cond is false, the first argument of concat() above is:
substring($stringA, Infinity)
and this is the empty string, because $stringA has a finite length.
On the other side, if $cond is true() then the first argument of concat() above is:
substring($stringA, 1)
that is just $stringA.
So, depending on the value of $cond only one of the two arguments of concat() above is a nonempty string (respectively $stringA or $stringB).
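To see the arithmetic at work without an XPath engine, the behaviour can be modelled in a few lines of Python. This is a sketch, not an XPath implementation: xpath_div, xpath_substring and choose are hypothetical helpers that imitate only the two-argument substring() and the div operator as they are used in the formula above.

```python
import math


def xpath_div(numerator, denominator):
    """Imitate XPath 1.0 'div' for a positive numerator: dividing by 0 gives Infinity."""
    if denominator == 0:
        return math.inf
    return numerator / denominator


def xpath_substring(text, start):
    """Imitate two-argument substring(): keep 1-based positions p with p >= round(start)."""
    if math.isnan(start):
        return ""
    if math.isinf(start):
        # No position is >= +Infinity; every position is >= -Infinity.
        return "" if start > 0 else text
    first = max(math.floor(start + 0.5), 1)  # XPath round() rounds halves upwards
    return text[first - 1:]


def choose(cond, string_a, string_b):
    """concat(substring($stringA, 1 div $cond), substring($stringB, 1 div not($cond)))"""
    return (
        xpath_substring(string_a, xpath_div(1, int(cond)))
        + xpath_substring(string_b, xpath_div(1, int(not cond)))
    )


print(choose(True, "Rent", "Buy"))   # Rent
print(choose(False, "Rent", "Buy"))  # Buy
```

Here int(cond) stands in for number($cond), which is 1 for true() and 0 for false().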
Applying this generic formula to the specific question, we can translate the first half of the big conditional expression into:
concat(
substring('rent',
1 div boolean(/*[leasehold='Yes'
and
propertyTypes/propertyType = 'RESIDENTIAL'
]
)
),
substring('buy',
1 div not(/*[leasehold='Yes'
and
propertyTypes/propertyType = 'RESIDENTIAL'
]
)
)
)
This should give you an idea how to translate the whole conditional expression into a single XPath 1.0 expression.
XSLT 1.0 - based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:copy-of select=
"concat(
substring('rent',
1 div boolean(/*[leasehold='Yes'
and
propertyTypes/propertyType = 'RESIDENTIAL'
]
)
),
substring('buy',
1 div not(/*[leasehold='Yes'
and
propertyTypes/propertyType = 'RESIDENTIAL'
]
)
)
)
"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document (above), the XPath expression is evaluated and the result of this evaluation is copied to the output:
buy
Do note:
If you decide to replace the specific strings with other strings that have different lengths than the original, you simply replace these strings in the above XPath 1.0 expression and you don't have to worry about specifying any lengths.

Becker's method for your data is the following:
concat(substring('Rent', 1 div boolean(propertyTypes/propertyType ="RESIDENTIAL" and leasehold="Yes")),
substring('Buy', 1 div boolean(propertyTypes/propertyType ="RESIDENTIAL" and leasehold="No")),
substring('Leasehold', 1 div boolean(propertyTypes/propertyType!="RESIDENTIAL" and leasehold="Yes")),
substring('Freehold', 1 div boolean(propertyTypes/propertyType!="RESIDENTIAL" and leasehold="No")))
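As a sanity check on the four branches, the pseudocode from the question can also be written directly in a host language. The following Python sketch follows the question's if/else structure rather than the XPath comparison rules (for example, it treats any leasehold value other than 'Yes' as "not leasehold"); classify and PROPERTY_XML are names chosen for this example.

```python
import xml.etree.ElementTree as ET

PROPERTY_XML = (
    '<property id="1011">'
    "<leasehold>No</leasehold>"
    "<freehold>Yes</freehold>"
    "<propertyTypes><propertyType>RESIDENTIAL</propertyType></propertyTypes>"
    "</property>"
)


def classify(prop):
    """Mirror the nested if/else from the question for one <property> element."""
    residential = any(
        property_type.text == "RESIDENTIAL"
        for property_type in prop.findall("propertyTypes/propertyType")
    )
    leasehold = prop.findtext("leasehold") == "Yes"
    if residential:
        return "Rent" if leasehold else "Buy"
    return "Leasehold" if leasehold else "Freehold"


label = classify(ET.fromstring(PROPERTY_XML))
print(label)  # Buy
```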

I spent all day on this, but the following works for me in XPath 1.0:
concat(
substring(properties/property[@name="Headline"], 1, string-length(properties/property[@name="Headline"]) * 1),
substring(properties/property[@name="Name"], 1, not(number(string-length(properties/property[@name="Headline"]))) * string-length(properties/property[@name="Name"]))
)

Try this
if (condition)
then
if (condition) stmnt
else stmnt
else
if (condition) stmnt
else stmnt

Return a string value based on XPATH condition

If I have the XML below, how do I specify an XPath that returns a string based on a condition? For example, here: if //b[@id=23] then "Profit" else "Loss".
<a>
<b id="23"/>
<c></c>
<d></d>
<e>
<f id="23">
<i>123</i>
<j>234</j>
<f>
<f id="24">
<i>345</i>
<j>456</j>
<f>
<f id="25">
<i>678</i>
<j>567</j>
<f>
</e>
</a>

I. XPath 2.0 solution (recommended if you have access to an XPath 2.0 engine)
(: XPath 2.0 has if ... then ... else ... :)
if(//b[@id=23])
then 'Profit'
else 'Loss'
II. XPath 1.0 solution:
Use:
concat(substring('Profit', 1 div boolean(//b[@id=23])),
substring('Loss', 1 div not(//b[@id=23]))
)
Verification using XSLT 1.0:
This transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:value-of select=
"concat(substring('Profit', 1 div boolean(//b[@id=23])),
substring('Loss', 1 div not(//b[@id=23]))
)"/>
</xsl:template>
</xsl:stylesheet>
when applied on the provided XML document (corrected to make it well-formed):
<a>
<b id="23"/>
<c></c>
<d></d>
<e>
<f id="23">
<i>123</i>
<j>234</j>
</f>
<f id="24">
<i>345</i>
<j>456</j>
</f>
<f id="25">
<i>678</i>
<j>567</j>
</f>
</e>
</a>
produces the wanted, correct result:
Profit
When we replace in the XML document:
<b id="23"/>
with:
<b id="24"/>
again the correct result is produced:
Loss
Explanation:
We use the fact that:
substring($someString, $N)
is the empty string for all $N > string-length($someString).
Also, the number Infinity is the only number greater than the string-length of any string.
Finally:
number(true()) is 1 by definition,
number(false()) is 0 by definition.
Therefore:
1 div $someCondition
is 1 exactly when the $someCondition is true()
and is Infinity exactly when $someCondition is false()
Thus it follows that if we want to produce $stringX when $cond is true() and $stringY when $cond is false(), one way to express this is:
concat(substring($stringX, 1 div $cond),
substring($stringY, 1 div not($cond))
)
In the above expression exactly one of the two arguments of the concat() function is non-empty.

You can't; you'd have to use XQuery for this. see e.g. XQuery Conditional Expressions
Or, if the resulting string is only used within Java, you can just process the value returned by XPath within your Java code:
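// Requires: import javax.xml.xpath.*;  (doc is an already parsed org.w3c.dom.Document)
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
XPathExpression expr = xpath.compile("//b[@id=23]");
// evaluate() returns Object, so the BOOLEAN result has to be cast
boolean result = (Boolean) expr.evaluate(doc, XPathConstants.BOOLEAN);
if (result) return "Profit";
else return "Loss";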
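The same "decide in the host language" idea carries over to other environments. As a rough Python sketch with the standard library (profit_or_loss is a made-up name, and ElementTree's limited XPath compares the attribute as the string '23' rather than as a number):

```python
import xml.etree.ElementTree as ET


def profit_or_loss(xml_text):
    """Return 'Profit' if any <b id="23"> exists in the document, otherwise 'Loss'."""
    root = ET.fromstring(xml_text)
    # find() returns None when nothing matches the path.
    has_match = root.find(".//b[@id='23']") is not None
    return "Profit" if has_match else "Loss"


print(profit_or_loss('<a><b id="23"/><c/></a>'))  # Profit
print(profit_or_loss('<a><b id="24"/><c/></a>'))  # Loss
```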

XPath : select all following siblings until another sibling

Here is an excerpt of my xml :
<node/>
<node/>
<node id="1">content</node>
<node/>
<node/>
<node/>
<node id="2">content</node>
<node/>
<node/>
I am positioned on node[@id='1']. I need an XPath to match all the <node/> elements until the next non-empty node (here node[@id='2']).
Edit:
the @id attributes are only there to explain my problem more clearly; they are not in my original XML. I need a solution which does not use the @id attributes.
I do not want to match the empty siblings after node[@id='2'], so I can't use a naive following-sibling::node[text()=''].
How can I achieve this ?

You could do it this way:
../node[not(text()) and preceding-sibling::node[@id][1][@id='1']]
where '1' is the id of the current node (generate the expression dynamically).
The expression says:
from the current context go to the parent
select those child nodes that
have no text and
from all "preceding sibling nodes that have an id" the first one must have an id of 1
If you are in XSLT you can select from the following-sibling axis because you can use the current() function:
<!-- the for-each is merely to switch the current node -->
<xsl:for-each select="node[@id='1']">
<xsl:copy-of select="
following-sibling::node[
not(text()) and
generate-id(preceding-sibling::node[@id][1])
=
generate-id(current())
]
" />
</xsl:for-each>
or simpler (and more efficient) with a key:
<xsl:key
name="kNode"
match="node[not(text())]"
use="generate-id(preceding-sibling::node[@id][1])"
/>
<xsl:copy-of select="key('kNode', generate-id(node[@id='1']))" />

Simpler than the accepted answer:
//node[@id='1']/following-sibling::node[following::node[@id='2']]
Find a node anywhere whose id is '1'
Now find all the following sibling node elements
...but only if those elements also have a node with id="2" somewhere after them.
Shown in action with a more clear test document (and legal id values):
xml = '<root>
<node id="a"/><node id="b"/>
<node id="c">content</node>
<node id="d"/><node id="e"/><node id="f"/>
<node id="g">content</node>
<node id="h"/><node id="i"/>
</root>'
# A Ruby library that uses libxml2; http://nokogiri.org
require 'nokogiri'; doc = Nokogiri::XML(xml)
expression = "//node[@id='c']/following-sibling::node[following::node[@id='g']]"
puts doc.xpath(expression)
#=> <node id="d"/>
#=> <node id="e"/>
#=> <node id="f"/>

XPath 2.0 has the operators '<<' and '>>' where node1 << node2 is true if node1 precedes node2 in document order.
So, based on that, with XPath 2.0 in an XSLT 2.0 stylesheet where the current node is the node[@id = '1'], you could use
following-sibling::node[not(text()) and . << current()/following-sibling::node[@id][1]]
That also needs the current() function from XSLT, which is why I said "with XPath 2.0 in an XSLT 2.0 stylesheet". The syntax above is pure XPath; in an XSLT stylesheet you would need to escape '<<' as '&lt;&lt;'.
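Since the question asks for a solution that does not rely on the id attributes, it may help to see the intended selection spelled out procedurally. The Python sketch below is an assumption-laden illustration: empty_siblings_after is a hypothetical helper, "empty" is taken to mean "has no leading text", and the ids in the test document are used only to locate the start node and to print the result.

```python
import xml.etree.ElementTree as ET
from itertools import takewhile


def empty_siblings_after(parent, current):
    """Return the run of text-less siblings that directly follows `current`."""
    children = list(parent)
    following = children[children.index(current) + 1:]
    # Stop at the first sibling that has text, roughly like not(text()) failing.
    return list(takewhile(lambda node: not node.text, following))


root = ET.fromstring(
    "<root>"
    '<node id="a"/><node id="b"/>'
    '<node id="c">content</node>'
    '<node id="d"/><node id="e"/><node id="f"/>'
    '<node id="g">content</node>'
    '<node id="h"/><node id="i"/>'
    "</root>"
)

current = root.find("node[@id='c']")
selected_ids = [node.get("id") for node in empty_siblings_after(root, current)]
print(selected_ids)  # ['d', 'e', 'f']
```

The takewhile call is what expresses "until the next non-empty node": the empty siblings after that node are never reached.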
