Faster XPath expressions to execute queries from multiple XMLs - xpath

I have the following two XML documents, and the problem is as follows.
Parse XML 1. If a subnode of any node_x contains 'a' in its name (as in value_a_0) and that subnode contains a specific number, parse XML 2, go to node_x-1 for every abc_x in XML 2, and compare the content of value_x-1_0/1/2/3 with certain entities.
If a subnode of any node_x contains 'b' in its name (as in value_b_0) and that subnode contains a specific number (say 'm'), parse XML 2, go to node_x+1 for every abc_x in XML 2, and compare the content of value_x-1_0/1/2/3 with 'm'.
Example: for every value_a_0 in record1, check whether it contains 5. Where it does, which is the case for node_1 and node_9, go to record2/node_0 and record2/node_8 and check whether value_0_0/1/2/3 contain 5. The remaining cases work similarly.
What would be the best practice for solving this? Is there a hash-table approach in XPath 3.0?
First XML
<record1>
<node_1>
<value_a_0>5</value_a_0>
<value_b_1>0</value_b_1>
<value_c_2>10</value_c_2>
<value_d_3>8</value_d_3>
</node_1>
.................................
.................................
<node_9>
<value_a_0>5</value_a_0>
<value_b_1>99</value_b_1>
<value_c_2>53</value_c_2>
<value_d_3>5</value_d_3>
</node_9>
</record1>
Second XML
<record2>
<abc_0>
<node_0>
<value_0_0>5</value_0_0>
<value_0_1>0</value_0_1>
<value_0_2>150</value_0_2>
<value_0_3>81</value_0_3>
</node_0>
<node_1>
<value_1_0>55</value_1_0>
<value_1_1>30</value_1_1>
<value_1_2>150</value_1_2>
<value_1_3>81</value_1_3>
</node_1>
.................................
.................................
<node_63>
<value_63_0>1</value_63_0>
<value_63_1>99</value_63_1>
<value_63_2>53</value_63_2>
<value_63_3>5</value_63_3>
</node_63>
</abc_0>
================================================
<abc_99>
<node_0>
<value_0_0>555</value_0_0>
<value_0_1>1810</value_0_1>
<value_0_2>140</value_0_2>
<value_0_3>80</value_0_3>
</node_0>
<node_1>
<value_1_0>555</value_1_0>
<value_1_1>1810</value_1_1>
<value_1_2>140</value_1_2>
<value_1_3>80</value_1_3>
</node_1>
<node_2>
<value_2_0>5</value_2_0>
<value_2_1>60</value_2_1>
<value_2_2>10</value_2_2>
<value_2_3>83</value_2_3>
</node_2>
.................................
.................................
<node_63>
<value_63_0>1</value_63_0>
<value_63_1>49</value_63_1>
<value_63_2>23</value_63_2>
<value_63_3>35</value_63_3>
</node_63>
</abc_99>
</record2>

First I would say that using structured element names like this is pretty poor XML design. That's relevant because when you do a join query in XPath or XQuery you're very dependent on the optimizer to find a fast execution path (e.g. a hash join), and the "weirder" your query is, the less likely the optimizer is to find a fast execution strategy.
I often start by converting "weird" XML into something more sanitary. For example, in this case I would transform <value_a_0>5</value_a_0> into <value cat="a" seq="0">5</value>. That makes the query easier to write and easier for the optimizer to recognize, and the transformation phase is reusable, so you can apply it before any operation on the XML, not just this one.
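A minimal sketch of that clean-up step, written in Python with the standard-library xml.etree.ElementTree rather than XSLT or XQuery. The helper name normalize and the regular expression are assumptions made for illustration; the pattern only covers names shaped like value_<cat>_<seq>.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: every element to clean up is named value_<cat>_<seq>.
NAME_PATTERN = re.compile(r"^value_([A-Za-z0-9]+)_(\d+)$")

def normalize(root):
    """Rename value_<cat>_<seq> elements to <value cat="..." seq="...">, in place."""
    for element in root.iter():
        match = NAME_PATTERN.match(element.tag)
        if match:
            element.tag = "value"
            element.set("cat", match.group(1))
            element.set("seq", match.group(2))
    return root

# Small sample in the shape of the first XML from the question.
record1 = ET.fromstring(
    "<record1><node_1>"
    "<value_a_0>5</value_a_0><value_b_1>0</value_b_1>"
    "</node_1></record1>"
)
normalize(record1)

# After normalization the category is an attribute, so the query is a plain predicate.
first = record1.find("node_1/value[@cat='a']")
print(first.get("seq"), first.text)  # prints: 0 5
```

Once the category and sequence number are attributes, later queries can filter on them instead of taking element names apart with string functions.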
If you're looking for better than O(n*m) performance on a join query, you need to look at the capabilities of your chosen XPath engine. Saxon-EE for example will do such optimizations, Saxon-HE won't. You're generally more likely to find advanced optimization in an XQuery engine than an XPath engine.
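If the engine at hand does not optimize the join, one option is to build the hash table yourself outside XPath. The following is a rough Python sketch, not a statement of what any XPath engine does internally. It assumes one reading of the question (for each node_N in record1 whose value_a_0 equals the target, look at node_N-1 under every abc_x in record2), and the sample data, including node_8, is made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Made-up sample data in the shape of the question's documents.
RECORD1 = """<record1>
  <node_1><value_a_0>5</value_a_0><value_b_1>0</value_b_1></node_1>
  <node_9><value_a_0>5</value_a_0><value_b_1>99</value_b_1></node_9>
</record1>"""

RECORD2 = """<record2>
  <abc_0>
    <node_0><value_0_0>5</value_0_0><value_0_1>0</value_0_1></node_0>
    <node_8><value_8_0>7</value_8_0><value_8_1>3</value_8_1></node_8>
  </abc_0>
  <abc_99>
    <node_0><value_0_0>555</value_0_0><value_0_1>1810</value_0_1></node_0>
  </abc_99>
</record2>"""

def node_number(element):
    """Extract N from an element named node_N."""
    return int(element.tag.rsplit("_", 1)[1])

def build_index(record2):
    """Hash table: node number -> list of (abc name, node element)."""
    index = defaultdict(list)
    for abc in record2:
        for node in abc:
            index[node_number(node)].append((abc.tag, node))
    return index

def find_matches(record1, index, target):
    """One pass over record1; each lookup in record2 is a dictionary access."""
    hits = []
    for node in record1:
        value_a = node.find("value_a_0")
        if value_a is None or value_a.text != target:
            continue
        for abc_tag, candidate in index.get(node_number(node) - 1, []):
            if any(child.text == target for child in candidate):
                hits.append((node.tag, abc_tag, candidate.tag))
    return hits

index = build_index(ET.fromstring(RECORD2))
result = find_matches(ET.fromstring(RECORD1), index, "5")
print(result)  # [('node_1', 'abc_0', 'node_0')]
```

The index is built in a single pass over record2, so the total work is roughly proportional to the size of both documents rather than to their product.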
As for the detail of your query, I got lost with the requirement statement when you start talking about abc_x. I'm not sure what that refers to.

It seems like a task that can be partially solved by grouping. As in your previous examples, though, the element names all differ by index values that should be part of an element or attribute value rather than part of the name, and that poor naming makes it harder to write succinct code:
let $abc-elements := $doc2/record2/*
for $node-element in record1/*
for $index in (1 to count($node-element[1]/*))
for $index-element in $node-element/*[position() = $index]
group by $index, $group-value := $index-element
where tail($index-element)
return
    <group index="{$index}" value="{$group-value}">
    {
        let $suffixes := $index-element/../string((xs:integer(substring-after(local-name(), '_')) - 1)),
            $relevant-abc-node-elements := $abc-elements/*[substring-after(local-name(), '_') = $suffixes]
        return $relevant-abc-node-elements[* = $group-value]
    }
    </group>
https://xqueryfiddle.liberty-development.net/nbUY4kA

Related

Can't select XML attributes with Oxygen XQuery implementation; Oxygen XPath emits result

I learned that every XPath expression is also a valid XQuery expression. I'm using Oxygen 16.1 with this sample XML:
<actors>
<actor filmcount="4" sex="m" id="15">Anderson, Jeff</actor>
<actor filmcount="9" sex="m" id="38">Bishop, Kevin</actor>
</actors>
My expression is:
//actor/@id
When I evaluate this expression in Oxygen with XPath 3.0, I get exactly what I expect:
15
38
However, when I evaluate this expression with XQuery 3.0 (also 1.0), I get the message: "Your query returned an empty sequence."
Can anyone provide any insight as to why this is, and how I can write the equivalent XQuery statement to get what the XPath statement did above?
Other XQuery implementations do support this query
If you want to validate that your query (as corrected per discussion in comments) does in fact work with other XQuery implementations when entered exactly as given in the question, you can run it as follows (tested in BaseX):
declare context item := document { <actors>
<actor filmcount="4" sex="m" id="15">Anderson, Jeff</actor>
<actor filmcount="9" sex="m" id="38">Bishop, Kevin</actor>
</actors> };
//actor/@id
Oxygen XQuery needs some extra help
Oxygen XML doesn't support serializing attributes, and consequently discards them from a result sequence when that sequence would otherwise be provided to the user.
Thus, you can work around this with either of the following queries, which return the attribute values as strings rather than attribute nodes:
//actor/@id/string(.)
data(//actor/@id)
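The workaround comes down to returning plain string values instead of attribute nodes. The same idea can be sketched in Python with the standard-library xml.etree.ElementTree, which is an assumption here since the thread itself uses Oxygen and BaseX. ElementTree has no attribute nodes at all, so reading an attribute always yields a string.

```python
import xml.etree.ElementTree as ET

ACTORS = """<actors>
  <actor filmcount="4" sex="m" id="15">Anderson, Jeff</actor>
  <actor filmcount="9" sex="m" id="38">Bishop, Kevin</actor>
</actors>"""

root = ET.fromstring(ACTORS)

# Rough counterpart of data(//actor/@id): one string per actor element.
actor_ids = [actor.get("id") for actor in root.iter("actor")]
print(actor_ids)  # ['15', '38']
```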
Below applies to a historical version of the question.
Frankly, I would not expect //actors/@id to return anything against that data with any valid XPath or XQuery engine, ever.
The reason is that there's only one place you're recursing -- one // -- and that's looking for actors. The single / between the actors and the @id means that they need to be directly connected, but that's not the case in the data you give here -- there's an actor element between them.
Thus, you need to fix your query. There are numerous queries you could write that would find the data you wanted in this document -- knowing which one is appropriate would require more information than you've provided:
//actor/@id - Find actor elements anywhere, and take their id attribute values.
//actors/actor/@id - Find actors elements anywhere; look for actor elements directly under them, and take the id attribute of such actor elements.
//actors//@id - Find all id attributes in subtrees of actors elements.
//@id - Find id attributes anywhere in the document.
...etc.

Selecting multiple results from XQUERY query

I am trying to select multiple columns from a query, but so far, I can only manage to select one. So I'm basically stuck with either selecting one, or all of them.
Here's the expression I have so far, which selects only one column:
let $y := doc("http://en.wikipedia.org/wiki/List_of_deaths_on_eight-thousanders")//table[preceding-sibling::h2//span[string() = "K2"]][1]
return $y/tr/td[2]/string()
I would love some explanation of how one would go about doing this, since there's almost no documentation of this lovely language.
How would you like the result to be returned? You could construct new elements, or concatenate strings. There are many ways that this could be accomplished.
Here's one way to get comma-separated values:
return $y/tr/fn:string-join( (td[2] | td[4]), ", " )
You can try it on zorb.io.
Update
(td[2] | td[4]) selects both elements and passes them, as a sequence, to fn:string-join(). | is the XQuery union operator (and is interchangeable with the keyword union).
As for documentation, the functx site documents the standard library (all fn-prefixed functions) and has useful examples. And the specs are surprisingly readable.
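The per-row join can also be sketched outside XQuery. The Python version below uses the standard-library xml.etree.ElementTree on a small made-up table (the cell values are placeholders, not data from the Wikipedia page) and picks the second and fourth cell of each row, as the string-join expression does.

```python
import xml.etree.ElementTree as ET

# Made-up table; only the column positions matter for the example.
TABLE = """<table>
  <tr><td>1</td><td>Name A</td><td>x</td><td>1986</td></tr>
  <tr><td>2</td><td>Name B</td><td>y</td><td>1995</td></tr>
</table>"""

table = ET.fromstring(TABLE)

# For each row, join the text of td[2] and td[4] with ", ".
rows = [
    ", ".join(cell.text for cell in (row.find("td[2]"), row.find("td[4]")))
    for row in table.findall("tr")
]
print(rows)  # ['Name A, 1986', 'Name B, 1995']
```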

xpath - matching value of child in current node with value of element in parent

Edit: I think I found the answer, but I'll leave the question open for a bit to see if someone has a correction or improvement.
I'm using xpath in Talend's etl tool. I have xml like this:
<root>
<employee>
<benefits>
<benefit>
<benefitname>CDE</benefitname>
<benefit_start>2/3/2004</benefit_start>
</benefit>
<benefit>
<benefitname>ABC</benefitname>
<benefit_start>1/1/2001</benefit_start>
</benefit>
</benefits>
<dependent>
<benefits>
<benefit>
<benefitname>ABC</benefitname>
</benefit>
</dependent>
When parsing benefits for dependents, I want to get elements present in the employee's benefit element. So in the example above, I want to get 1/1/2001 for the dependent's start date. I want 1/1/2001, not 2/3/2004, because the dependent's benefit has benefitname ABC, matching the employee's benefit with the same benefitname.
What XPath, relative to /root/employee/dependent/benefits/benefit, will yield the value of benefit_start for the benefit under the parent employee that has the same benefit name as the dependent's benefit? (Note that I don't know ahead of time what the literal value will be. I can't just look for 'ABC'; I have to match whatever value is in the dependent's benefitname element.)
I'm trying:
../../../benefits/benefit[benefitname=??what??]/benefit_start
I don't know how to refer to the current node's ancestor in the middle of the XPath, since I think "." at the point where I have ??what?? will refer to the benefit node of the employee/benefits.
EDIT: I think what I want is "current()/benefitname" where the ??what?? is. It seems to work with Saxon; I haven't tried it in the ETL tool yet.
Your XML is malformed, and I don't think you've described your situation very well: the XPath you're trying has a bunch of ../../s at the beginning, but you haven't said what the context node is, whether you're iterating through certain nodes, or what.
Supposing the current context node were an employee element, you could select benefit_starts that match dependent benefits with
benefits/benefit[benefitname = ../../dependent/benefits/benefit/benefitname]
/benefit_start
If the current context node is a benefit element in a dependents section, and you want to get the corresponding benefit_start for just the current benefit element, you can do:
../../../benefits/benefit[benefitname = current()/benefitname]/benefit_start
Which is what I think you've already discovered.
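For comparison, the same lookup can be sketched in Python with the standard-library xml.etree.ElementTree. This is only an illustration under assumptions: the sample is a repaired, well-formed version of the XML from the question, and because ElementTree has no current() function or parent axis, the match goes through a dictionary keyed by benefit name.

```python
import xml.etree.ElementTree as ET

# Repaired, well-formed version of the question's sample (an assumption).
EMPLOYEE = """<employee>
  <benefits>
    <benefit><benefitname>CDE</benefitname><benefit_start>2/3/2004</benefit_start></benefit>
    <benefit><benefitname>ABC</benefitname><benefit_start>1/1/2001</benefit_start></benefit>
  </benefits>
  <dependent>
    <benefits>
      <benefit><benefitname>ABC</benefitname></benefit>
    </benefits>
  </dependent>
</employee>"""

employee = ET.fromstring(EMPLOYEE)

# Employee benefits keyed by name: benefitname -> benefit_start.
start_by_name = {
    benefit.findtext("benefitname"): benefit.findtext("benefit_start")
    for benefit in employee.findall("benefits/benefit")
}

# For each dependent benefit, look up the employee's start date by name.
dependent_starts = [
    start_by_name.get(benefit.findtext("benefitname"))
    for benefit in employee.findall("dependent/benefits/benefit")
]
print(dependent_starts)  # ['1/1/2001']
```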

Prefix the result of a XPATH query

I use libxmljs to parse some html.
I have a xpath query which has an "or" conjunction to retrieve basically the information of two queries
Example
doc.find("//div[contains(@class,'important') or contains(@class,'overdue')]")
this returns all the divs with either important or overdue...
Can I prefix my results, or otherwise see within the result set which result comes from which condition?
The result could be an array with an index for the match: 0 for the first condition and 1 for the second. Is this possible?
Or how can I find out which result comes from which query condition?
Thanks for any help...
P.S.: this is a simplified example of a sequence of elements which each have an important item, an overdue item, both, or neither. So I cannot go by looking at every second entry, etc.
This is the result I want to get...
message:{},
message:{
.....
important: "some important text",
overdue: "overdue date",
.....
}
There is no way to know which clause of an or XPath query caused a particular result to be included. It's simply not information that's kept around.
You'll either need to do entirely separate queries for important and overdue, or do one large query to get the entire result set (as you are now) and then further test each result's class to find out which one it is.
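The second option, one query followed by a per-result test, could look roughly like this. The sketch is in Python with the standard-library xml.etree.ElementTree rather than libxmljs, and the markup and the helper name classify are made up for illustration. The substring test mirrors what contains(@class, ...) does in the XPath.

```python
import xml.etree.ElementTree as ET

# Made-up markup standing in for the parsed HTML.
HTML = """<body>
  <div class="msg important">some important text</div>
  <div class="msg overdue">overdue date</div>
  <div class="msg">plain</div>
  <div class="important overdue">both</div>
</body>"""

KEYWORDS = ("important", "overdue")

def classify(div):
    """Return the keywords found in the div's class attribute (substring test)."""
    css_class = div.get("class", "")
    return [keyword for keyword in KEYWORDS if keyword in css_class]

body = ET.fromstring(HTML)

# Keep only divs that match at least one condition, tagged with the conditions they match.
tagged = [(div.text, classify(div)) for div in body.iter("div") if classify(div)]
for text, labels in tagged:
    print(labels, text)
```

Each result carries the list of conditions it satisfied, so an element with both classes is reported under both.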

Explain xpath and xquery in simple terms

I am new to programming. I know what XML is. Can anyone please explain in simple terms what XPath and XQuery do, and where they are used?
XPath is a way of locating specific elements in an XML tree.
For instance, given the following structure:
<myfarm>
<animal type="dog">
<name>Fido</name>
<color>Black</color>
</animal>
<animal type="cat">
<name>Mitsy</name>
<color>Orange</color>
</animal>
</myfarm>
XPath allows you to traverse the structure, such as:
/myfarm/animal[@type="dog"]/name/text()
which would give you "Fido"
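To try this from a program, here is a small Python sketch using the standard-library xml.etree.ElementTree, which supports a limited subset of XPath. The choice of Python is an assumption; any XPath-capable library would do. Because the parsed root is already the myfarm element, the path is written relative to it.

```python
import xml.etree.ElementTree as ET

FARM = """<myfarm>
  <animal type="dog">
    <name>Fido</name>
    <color>Black</color>
  </animal>
  <animal type="cat">
    <name>Mitsy</name>
    <color>Orange</color>
  </animal>
</myfarm>"""

root = ET.fromstring(FARM)

# Select the name of the animal whose type attribute is "dog".
dog_name = root.findtext("animal[@type='dog']/name")
print(dog_name)  # Fido
```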
XQuery is an XML query language that makes use of XPath to query XML structures. However it also allows for functions to be defined and called, as well as complex querying of data structures using FLWOR expressions. FLWOR allows for join functionality between data sets defined in XML.
FLWOR article from wikipedia
Sample XQuery (using some XPath) is:
declare function local:toggle-boolean($b as xs:string)
as xs:string
{
    if ($b = "Yes") then "true"
    else if ($b = "No") then "false"
    else if ($b = "true") then "Yes"
    else if ($b = "false") then "No"
    else "[ERROR] # local:toggle-boolean"
};
<ResultXML>
    <ChangeTrue>{ local:toggle-boolean(doc("file.xml")/article[@id="1"]/text()) }</ChangeTrue>
    <ChangeNo>{ local:toggle-boolean(doc("file.xml")/article[@id="2"]/text()) }</ChangeNo>
</ResultXML>
XPath is a simple query language for searching an XML DOM. I think it can be compared to SQL SELECT statements for databases. Many programs that work with XML can evaluate XPath, and it is widely used. I recommend that you learn it.
XQuery is much more powerful and more complicated. It also offers many options for transforming the result, it offers loops, and so on. But it is still a query language, and it is also used as the query language for XML databases. I think this language has more specialized uses and is probably not necessary to know at the beginning; it is enough to know that it exists and what it can do.
That is a simple explanation. I hope it is sufficient and understandable.

Resources