How to calculate a weighted average in XPath or XQuery? - xpath

I need to calculate a simple weighted average in an XForms form. How can I do that in an elegant and declarative way using XPath and/or XQuery?
[EDITED] This is the source XML document:
<Examens>
<Examen>
<ExamenId>1</ExamenId>
<Coef>1</Coef>
<Notes>
<Note>
<EleveId>100</EleveId>
<Valeur>4</Valeur>
</Note>
<Note>
<EleveId>101</EleveId>
<Valeur>4.2</Valeur>
</Note>
<Note>
<EleveId>102</EleveId>
<Valeur>3.8</Valeur>
</Note>
</Notes>
</Examen>
<Examen>
<ExamenId>2</ExamenId>
<Coef>2</Coef>
<Notes>
<Note>
<EleveId>100</EleveId>
<Valeur>5</Valeur>
</Note>
<Note>
<EleveId>101</EleveId>
<Valeur/>
</Note>
<Note>
<EleveId>102</EleveId>
<Valeur>3.5</Valeur>
</Note>
</Notes>
</Examen>
<Examen>
<ExamenId>3</ExamenId>
<Coef>3</Coef>
<Notes>
<Note>
<EleveId>100</EleveId>
<Valeur>6</Valeur>
</Note>
<Note>
<EleveId>101</EleveId>
<Valeur>5.4</Valeur>
</Note>
<Note>
<EleveId>102</EleveId>
<Valeur>2</Valeur>
</Note>
</Notes>
</Examen>
</Examens>
The following snippet correctly displays the values (./Valeur) and the weights (./../../Coef):
<xforms:repeat nodeset="$currentBranche//Note[EleveId=$currentEleveId]">
<xforms:output ref="./Valeur"/>
<xforms:output ref="./../../Coef"/>
</xforms:repeat>
BTW, I also need to exclude the nodes whose Valeur is an empty string. For example, the following simple average calculation with the XPath avg() function fails with the error "Cannot convert '' to double" if any node's content is an empty string. This is a problem, because the node exists (it is part of a model instance) and its value is an empty string until the user enters a value.
<xforms:output ref="round(avg($currentBranche//Note[EleveId=$currentEleveId]/Valeur)*100) div 100"/>
[EDITED]
The correct calculations are:
If EleveId=100 : weighted average = (1*4+2*5+3*6) / (1+2+3) = 5.333
If EleveId=101 : weighted average = (1*4.2+3*5.4) / (1+3) = 5.1
If EleveId=102 : weighted average = (1*3.8+2*3.5+3*2) / (1+2+3) = 2.8
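All three expected values follow one rule: an exam whose Valeur is empty drops out of both the numerator and the denominator. Written as a formula, with notation introduced here only for illustration ($c_i$ is the Coef of exam $i$, $v_{i,e}$ the Valeur of student $e$ in exam $i$, and $S_e$ the set of exams where that Valeur is non-empty):

```latex
\bar{v}_e = \frac{\sum_{i \in S_e} c_i \, v_{i,e}}{\sum_{i \in S_e} c_i},
\qquad
\bar{v}_{101} = \frac{1 \cdot 4.2 + 3 \cdot 5.4}{1 + 3} = \frac{20.4}{4} = 5.1
```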

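I. Producing the simple average, ignoring empty values:
In XPath 1.0 use:
sum($currentBranche//Note[EleveId=$currentEleveId]/Valeur[number(.)=number(.)])
div
count($currentBranche//Note[EleveId=$currentEleveId]/Valeur[number(.)=number(.)])
The same filter must be applied in the count; otherwise the empty Valeur nodes still increase the divisor.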
In XPath 2.0 (or XQuery) use:
round(avg($currentBranche//Note[EleveId=$currentEleveId]/Valeur
[number(.)=number(.)])*100
) div 100
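The number(.)=number(.) predicate in these expressions relies on NaN being the only value that is not equal to itself: number('') is NaN, so empty Valeur nodes are dropped. Here is a minimal Python sketch of the same filtering, assuming a hypothetical helper xpath_number() as a rough stand-in for XPath number() (it is not an exact reimplementation), with the average taken over the kept values only:

```python
import math
import xml.etree.ElementTree as ET

# Marks of one student (EleveId 101) from the question; the empty
# <Valeur/> stands for a mark the user has not entered yet.
XML = """<Notes>
  <Valeur>4.2</Valeur>
  <Valeur/>
  <Valeur>5.4</Valeur>
</Notes>"""


def xpath_number(text):
    """Rough stand-in for XPath number(): NaN for non-numeric text."""
    try:
        return float((text or "").strip())
    except ValueError:
        return math.nan


def average_ignoring_empty(texts):
    numbers = [xpath_number(t) for t in texts]
    # x == x is False only for NaN: the same test as number(.)=number(.)
    kept = [x for x in numbers if x == x]
    # Divide by the count of kept values, not by the count of all nodes.
    return sum(kept) / len(kept) if kept else math.nan


root = ET.fromstring(XML)
avg = average_ignoring_empty(v.text for v in root.iter("Valeur"))
print(round(avg * 100) / 100)  # mirrors round(... * 100) div 100
```

For EleveId 101 this gives 4.8, the mean of 4.2 and 5.4; dividing by all three nodes instead would give 3.2.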
If all non-empty Valeur values are guaranteed to be castable as xs:decimal, then use:
avg($currentBranche//Note[EleveId=$currentEleveId]/Valeur
[. castable as xs:decimal]
/xs:decimal(.)
)
In this case there is no (noticeable) loss of precision, and you can later use the format-number() function to get the desired number of digits after the decimal point.
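To illustrate why computing in decimal avoids the rounding noise of binary floating point (xs:double), here is a minimal sketch that uses Python's decimal module as an analogue. It is not XPath, and to_decimal() and decimal_average() are hypothetical helpers:

```python
from decimal import Decimal, InvalidOperation

# Marks per EleveId from the question; "" is an empty <Valeur/>.
MARKS = {
    "100": ["4", "5", "6"],
    "101": ["4.2", "", "5.4"],
    "102": ["3.8", "3.5", "2"],
}


def to_decimal(text):
    """Decimal for valid input, None otherwise: a loose stand-in for the
    `. castable as xs:decimal` filter (Python also accepts forms such as
    "1e3" or "NaN" that xs:decimal rejects)."""
    try:
        return Decimal(text)
    except InvalidOperation:
        return None


def decimal_average(texts):
    values = [d for d in map(to_decimal, texts) if d is not None]
    return sum(values) / len(values)


for eleve, texts in MARKS.items():
    # Round only for display, the role format-number() plays afterwards.
    print(eleve, decimal_average(texts).quantize(Decimal("0.01")))
```

Each average comes out exact (5, 4.8 and 3.1), so rounding can be left to the display step.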
II. Producing the weighted average:
Given the XML document provided in the question,
this XPath 2.0 expression produces the weighted average:
for $elevId in distinct-values(/*/*/*/*/EleveId)
return
round(100*
(sum(/*/*/*/Note
[EleveId eq $elevId
and number(Valeur) eq number(Valeur)
]
/(Valeur * ../../Coef)
)
div
sum(/*/*/*/Note
[EleveId eq $elevId
and number(Valeur) eq number(Valeur)
]
/../../Coef
)
)
)
div 100
and the expected, correct result is produced:
5.33 5.1 2.8
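As a cross-check of the expression above, here is a minimal Python sketch that computes the same per-student weighted averages with the standard library's xml.etree.ElementTree. The helper weighted_averages() is hypothetical, and the document is the one from the question, compacted onto fewer lines:

```python
import xml.etree.ElementTree as ET

XML = """<Examens>
<Examen><ExamenId>1</ExamenId><Coef>1</Coef><Notes>
<Note><EleveId>100</EleveId><Valeur>4</Valeur></Note>
<Note><EleveId>101</EleveId><Valeur>4.2</Valeur></Note>
<Note><EleveId>102</EleveId><Valeur>3.8</Valeur></Note>
</Notes></Examen>
<Examen><ExamenId>2</ExamenId><Coef>2</Coef><Notes>
<Note><EleveId>100</EleveId><Valeur>5</Valeur></Note>
<Note><EleveId>101</EleveId><Valeur/></Note>
<Note><EleveId>102</EleveId><Valeur>3.5</Valeur></Note>
</Notes></Examen>
<Examen><ExamenId>3</ExamenId><Coef>3</Coef><Notes>
<Note><EleveId>100</EleveId><Valeur>6</Valeur></Note>
<Note><EleveId>101</EleveId><Valeur>5.4</Valeur></Note>
<Note><EleveId>102</EleveId><Valeur>2</Valeur></Note>
</Notes></Examen>
</Examens>"""


def weighted_averages(root):
    """EleveId -> sum(Valeur * Coef) div sum(Coef), skipping empty Valeur."""
    totals = {}  # EleveId -> [weighted sum, sum of coefficients]
    for examen in root.findall("Examen"):
        coef = float(examen.findtext("Coef"))
        for note in examen.findall("Notes/Note"):
            valeur = (note.findtext("Valeur") or "").strip()
            if not valeur:
                continue  # where number(Valeur) eq number(Valeur) is false
            acc = totals.setdefault(note.findtext("EleveId"), [0.0, 0.0])
            acc[0] += float(valeur) * coef
            acc[1] += coef
    # Python's round() resolves exact ties differently from XPath round();
    # that does not affect these values.
    return {e: round(s / c, 2) for e, (s, c) in totals.items()}


print(weighted_averages(ET.fromstring(XML)))
```

It prints 5.33, 5.1 and 2.8 for EleveId 100, 101 and 102, matching the result above.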

Related

Can't index data in Spanish alphabetical order before selecting it in a query

I have a set of assets that have a "name" property.
I want to get a dynamic number of those assets, sorted alphabetically by that "name" property.
I query them with this query:
type=dam:Asset
path=/content/dam/en/foobar/contacts/
orderby=@jcr:content/data/master/@name
orderby.sort=asc
p.limit=3
This works. For example, for the set of names:
[Paloma, Abel, José, Eduardo]
it retrieves:
Abel, Eduardo, José.
The problem is the Spanish alphabet, in which Á is the same letter as A.
So in a set of:
[Paloma, Abel, José, Álvaro, Eduardo]
it retrieves:
Abel, Eduardo, José.
Álvaro is excluded because it is not among the first 3 elements after ordering, although it should be the second one. The query should retrieve:
Abel, Álvaro, Eduardo.
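The ordering expected here is accent-insensitive. As an illustration only (this is Python, not an Oak index setting, and it says nothing about how Oak sorts ordered properties), a hypothetical sort key that strips accents reproduces both results described above:

```python
import unicodedata


def accent_insensitive_key(name):
    """Sort key that ignores accents and case, so 'Álvaro' sorts as 'alvaro'.

    Simplification: NFD decomposition also removes the tilde from 'ñ',
    which Spanish collation treats as its own letter after 'n'.
    """
    decomposed = unicodedata.normalize("NFD", name)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()


names = ["Paloma", "Abel", "José", "Álvaro", "Eduardo"]
print(sorted(names)[:3])                              # ['Abel', 'Eduardo', 'José']
print(sorted(names, key=accent_insensitive_key)[:3])  # ['Abel', 'Álvaro', 'Eduardo']
```

Plain code-point order puts "Á" (U+00C1) after every ASCII letter, which is why Álvaro falls outside the first three.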
To fix that, I created a custom Oak Lucene index like the one below:
<?xml version="1.0" encoding="UTF-8"?>
<jcr:root xmlns:oak="http://jackrabbit.apache.org/oak/ns/1.0" xmlns:jcr="http://www.jcp.org/jcr/1.0" xmlns:nt="http://www.jcp.org/jcr/nt/1.0" xmlns:rep="internal"
jcr:mixinTypes="[rep:AccessControllable]"
jcr:primaryType="nt:unstructured">
<socialLucene/>
<workflowDataLucene/>
<slingeventJob/>
<jcrLanguage/>
<versionStoreIndex/>
<repMembers/>
<cqReportsLucene/>
<commerceLucene/>
<counter/>
<authorizables/>
<enablementResourceName/>
<externalPrincipalNames/>
<cmLucene/>
<foobarCFIndexFilter
jcr:primaryType="oak:QueryIndexDefinition"
async="[async,nrt]"
evaluatePathRestrictions="{Boolean}true"
includedPaths="[/content/dam/es/foobar,/content/dam/en/foobar]"
queryPaths="[/content/dam/es/foobar,/content/dam/en/foobar]"
reindex="{Boolean}false"
reindexCount="{Long}24"
seed="{Long}3850652403740003290"
type="lucene">
<analyzers jcr:primaryType="nt:unstructured">
<default jcr:primaryType="nt:unstructured">
<filters jcr:primaryType="nt:unstructured">
<Synonym
jcr:primaryType="nt:unstructured"
format="solr"
synonyms="synonyms.txt">
<synonyms.txt/>
</Synonym>
</filters>
<tokenizer
jcr:primaryType="nt:unstructured"
name="Classic"/>
</default>
</analyzers>
<indexRules jcr:primaryType="nt:unstructured">
<nt:base jcr:primaryType="nt:unstructured">
<properties jcr:primaryType="nt:unstructured">
<title
jcr:primaryType="nt:unstructured"
analyzed="{Boolean}true"
isRegexp="{Boolean}false"
name="jcr:content/data/master/title"
nodeScopeIndex="{Boolean}true"
ordered="{Boolean}true"
propertyIndex="{Boolean}true"
type="String"/>
<date
jcr:primaryType="nt:unstructured"
name="jcr:content/data/master/date"
ordered="{Boolean}true"
propertyIndex="{Boolean}true"/>
<sectors
jcr:primaryType="nt:unstructured"
name="jcr:content/data/master/sectors"
propertyIndex="{Boolean}true"/>
<contentFragment
jcr:primaryType="nt:unstructured"
name="jcr:content/contentFragment"
propertyIndex="{Boolean}true"/>
<model
jcr:primaryType="nt:unstructured"
name="cq:model"
propertyIndex="{Boolean}true"/>
<name
jcr:primaryType="nt:unstructured"
analyzed="{Boolean}true"
isRegexp="{Boolean}false"
name="jcr:content/data/master/name"
nodeScopeIndex="{Boolean}true"
ordered="{Boolean}true"
propertyIndex="{Boolean}true"
type="String"/>
</properties>
</nt:base>
</indexRules>
</foobarCFIndexFilter>
<cqProjectLucene/>
<ntFolderDamLucene/>
<acPrincipalName/>
<uuid/>
<damAssetLucene/>
<rep:policy/>
<cqPayloadPath/>
<nodetypeLucene/>
<nodetype/>
<ntBaseLucene/>
<reference/>
<principalName/>
<cqTagLucene/>
<lucene/>
<repTokenIndex/>
<externalId/>
<authorizableId/>
<cqPageLucene/>
</jcr:root>
where synonyms.txt contained:
á, a
Á, A
and so on.
I also tried a charFilter with a Mapping of equivalent characters.
Using the Query Performance Diagnosis tool, I have made sure that my query uses my custom Oak index.
But nothing works: after reindexing, the query results are the same.
How can I solve this?

XPath: Get element with matching attribute value

I'm trying to get content from an element whose @id attribute matches the context node's @idref. For example, given the following XML (just a contrived sample)...
<doc>
<toc>
<entry idref="ch1"/>
<entry idref="ch2"/>
</toc>
<body>
<chapter id="ch1">
<title>Chapter 1</title>
<para/>
</chapter>
<chapter id="ch2">
<title>Chapter 2</title>
<para/>
</chapter>
<chapter id="ch3">
<title>Chapter 3</title>
<para/>
</chapter>
</body>
</doc>
From the entry element, how can I get the content of the title within the chapter whose @id matches the current @idref?
So, basically: find chapter[where chapter @id = current entry @idref]/title
I've tried
string(//chapter[@id = @idref]/title)
string(//chapter[@id = ./@idref]/title)
string(//chapter[@id = current()/@idref]/title)
all with no luck.
Try this XPath 2.0 expression on your XML:
//chapter[@id=//toc/entry/@idref]/string-join((title,@id),' ')
Output:
Chapter 1 ch1
Chapter 2 ch2
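Inside a predicate, @idref is evaluated against each candidate chapter, not against the entry, which is why the first two attempts select nothing. current() is an XSLT function, so it only works when the expression is evaluated inside XSLT. In XPath 2.0 a range variable can hold the entry instead, e.g. for $e in //toc/entry return //chapter[@id = $e/@idref]/title. The minimal Python sketch below does the same lookup from the host language with the standard library's ElementTree. title_for_entry() is a hypothetical helper, and it assumes idref values contain no quote characters:

```python
import xml.etree.ElementTree as ET

DOC = """<doc>
<toc><entry idref="ch1"/><entry idref="ch2"/></toc>
<body>
<chapter id="ch1"><title>Chapter 1</title><para/></chapter>
<chapter id="ch2"><title>Chapter 2</title><para/></chapter>
<chapter id="ch3"><title>Chapter 3</title><para/></chapter>
</body>
</doc>"""


def title_for_entry(root, entry):
    """Title of the chapter whose @id equals this entry's @idref, or None."""
    idref = entry.get("idref")
    # ElementTree only accepts a literal in [@attr='value'], so the entry's
    # idref is substituted from Python (assumes it contains no quote).
    chapter = root.find(f".//chapter[@id='{idref}']")
    return None if chapter is None else chapter.findtext("title")


root = ET.fromstring(DOC)
for entry in root.iter("entry"):
    print(entry.get("idref"), title_for_entry(root, entry))
```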

XPath with multiple child node conditions

I'm trying to find an XPath expression to get an element with multiple conditions on child nodes.
Which XPath can I use to get the ball element with ART_NR = 146334 and FABRICATOR = SPALDING?
The corresponding XML:
<xml>
<ball sellCode="ABC7001" type="basket ball">
<detail>
<type>INFO</type>
<values>
<type>NUMERIC</type>
<value>146334</value>
<id>ART_NR</id>
</values>
<values>
<type>NUMERIC</type>
<value>39.99</value>
<id>PRICE</id>
</values>
<values>
<type>STRING</type>
<value>SPALDING</value>
<id>FABRICATOR</id>
</values>
<values>
</detail>
<detail>
<type>MOD</type>
...
</detail>
</ball>
<ball sellCode="ABC34564" type="golf ball">
...
</xml>
Both of the following XPath expressions should work:
/xml/ball[detail[values[id='ART_NR'][value=146334]]
[values[id='FABRICATOR'][value='SPALDING']]]
/xml/ball[detail[values[id='ART_NR' and value=146334]
and values[id='FABRICATOR' and value='SPALDING']]]
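The same two conditions can be checked with Python's standard xml.etree.ElementTree, whose limited XPath subset supports [tag='text'] predicates and allows stacking them. A minimal sketch; the second ball and the helper detail_matches() are hypothetical:

```python
import xml.etree.ElementTree as ET

# First ball from the question (elided parts dropped) plus a hypothetical
# second ball that has the article number but a different fabricator.
XML = """<xml>
<ball sellCode="ABC7001" type="basket ball">
  <detail>
    <type>INFO</type>
    <values><type>NUMERIC</type><value>146334</value><id>ART_NR</id></values>
    <values><type>NUMERIC</type><value>39.99</value><id>PRICE</id></values>
    <values><type>STRING</type><value>SPALDING</value><id>FABRICATOR</id></values>
  </detail>
</ball>
<ball sellCode="HYPOTHETICAL-1" type="golf ball">
  <detail>
    <values><type>NUMERIC</type><value>146334</value><id>ART_NR</id></values>
    <values><type>STRING</type><value>OTHER</value><id>FABRICATOR</id></values>
  </detail>
</ball>
</xml>"""


def detail_matches(detail):
    # Stacked predicates, as in values[id='ART_NR'][value=146334]. ElementTree
    # compares text here, whereas the XPath value=146334 compares numbers.
    return (detail.find("values[id='ART_NR'][value='146334']") is not None
            and detail.find("values[id='FABRICATOR'][value='SPALDING']") is not None)


root = ET.fromstring(XML)
matches = [ball for ball in root.findall("ball")
           if any(detail_matches(d) for d in ball.findall("detail"))]
print([ball.get("sellCode") for ball in matches])
```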

xpath selection of a title node within a resultset

I have an XML document like the one below. I am trying to select a Title node with a particular value, say "![CDATA[ 1234 ]]". That Title node may be in any Type node. I was using this XPath query:
/Results/ResultSet/Type[Title="![CDATA[ 1234 ]]"]
but nothing was selected. Can someone please help?
<Results>
<Info>...</Info>
<ResultSet num="4">
<Type type="A">
<Title>
<![CDATA[ 1234 ]]>
</Title>
<Description>
<![CDATA[ 1234 ]]>
</Description>
<Domain>
<![CDATA[1234 ]]>
</Domain>
<Target>
<![CDATA[]]>
</Target>
</Type>
<Type type="A">
<Title>
<![CDATA[ abcdef ]]>
</Title>
<Description>
<![CDATA[abcdef]]>
</Description>
<Domain>
<![CDATA[abcdef]]>
</Domain>
<Target>
<![CDATA[abcdef]]>
</Target>
</Type>
EDIT: here is the Ruby code I am using:
# Parse as XML: Nokogiri::HTML would lowercase element names such as <Results>
doc = Nokogiri::XML(html)
elements = doc.xpath('/Results/ResultSet/Type/Title[text()=" 1234 "]')
if elements.empty?
  puts "not there"
else
  elements.each do |node|
    puts "Found Title: #{node.text}"
  end
end
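The XPath is wrong: the <![CDATA[ ... ]]> markers are not part of the element's text, so they must not appear in the string you compare against. Use this: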
/Results/ResultSet/Type/Title[text()=" 1234 "]
Based on the link OP posted for the XML, here is the working XPath:
/QuigoResults/ResultSet/Listing/Title[text()=" location in DYNAMICREGION "]
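CDATA markers only delimit text in the source file; after parsing, just the characters remain, together with the surrounding newlines of a pretty-printed file. A more whitespace-tolerant XPath 1.0 variant (not from the original answers) would be /Results/ResultSet/Type[normalize-space(Title)='1234']. The minimal sketch below uses Python's standard ElementTree rather than Nokogiri, with a hypothetical normalize_space() helper as a rough analogue of the XPath function:

```python
import xml.etree.ElementTree as ET

XML = """<Results><ResultSet>
<Type type="A">
<Title>
<![CDATA[ 1234 ]]>
</Title>
</Type>
<Type type="A">
<Title>
<![CDATA[ abcdef ]]>
</Title>
</Type>
</ResultSet></Results>"""


def normalize_space(text):
    """Rough analogue of XPath normalize-space(): trim, collapse whitespace."""
    return " ".join((text or "").split())


root = ET.fromstring(XML)
# The CDATA markers are gone; the newlines around them are still there.
print(repr(root.find("ResultSet/Type/Title").text))
matches = [t for t in root.iter("Type")
           if normalize_space(t.findtext("Title")) == "1234"]
print(len(matches))
```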

XPath in Nokogiri returning empty array [] whereas I am expecting to have results

I am trying to parse XML files using Nokogiri, Ruby and XPath. I usually don't encounter any problems, but with the following file none of my XPath queries return results:
doc = Nokogiri::XML(File.read("myfile.xml"))
doc.xpath("//Meta").count
# result ==> 0
doc.xpath("//Meta")
# result ==> []
doc.xpath(".").count
# result ==> 1
Here is a simplified version of my XML file:
<Answer xmlns="test:com.test.search" context="hf%3D10%26target%3Dst0" last="0" estimated="false" nmatches="1" nslices="0" nhits="1" start="0">
<time>
...
</time>
<promoted>
...
</promoted>
<hits>
<Hit url="http://www.test.com/" source="test" collapsed="false" preferred="false" score="1254772" sort="0" mask="272" contentFp="4294967295" did="1287" slice="1">
<groups>
...
</groups>
<metas>
<Meta name="enligne">
<MetaString name="value">
</MetaString>
</Meta>
<Meta name="language">
<MetaString name="value">
fr
</MetaString>
</Meta>
<Meta name="text">
<MetaText name="value">
<TextSeg highlighted="false" highlightClass="0">
La
</TextSeg>
</MetaText>
</Meta>
</metas>
</Hit>
</hits>
<keywords>
...
</keywords>
<groups>
...
</groups>
How can I get all children of <Hit> from this XML?
Include the namespace information when calling xpath:
doc.xpath("//x:Meta", "x" => "test:com.test.search")
You can also call doc.remove_namespaces! first; after that, a plain //Meta matches.
This is one of the most frequently asked XPath questions; search for "XPath default namespace".
If there is no way to register a namespace for the default namespace and use the registered prefix (say "x" in //x:Meta), then use:
//*[name() = 'Meta' and namespace-uri()='test:com.test.search']
If it is known that Meta can only belong to the default namespace, then the above can be shortened to:
//*[name() = 'Meta']
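The same three behaviors can be seen with Python's standard xml.etree.ElementTree (shown here instead of Nokogiri, on a trimmed copy of the question's document): an unprefixed name means "no namespace", a prefix bound to the default namespace URI matches, and so does a comparison on the local name alone:

```python
import xml.etree.ElementTree as ET

NS = "test:com.test.search"
# Trimmed copy of the question's document: Meta elements inherit the
# default namespace declared on <Answer>.
XML = f"""<Answer xmlns="{NS}">
<hits><Hit url="http://www.test.com/">
<metas>
<Meta name="enligne"/>
<Meta name="language"/>
<Meta name="text"/>
</metas>
</Hit></hits>
</Answer>"""

root = ET.fromstring(XML)

# An unprefixed name means "no namespace", so nothing matches.
plain = root.findall(".//Meta")
# Bind any prefix to the default namespace URI, like "x" => NS in Nokogiri.
prefixed = root.findall(".//x:Meta", {"x": NS})
# Compare the local name only, similar to the name()-based predicate.
by_local_name = [el for el in root.iter() if el.tag.rpartition("}")[2] == "Meta"]

print(len(plain), len(prefixed), len(by_local_name))
```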
