Explain xpath and xquery in simple terms - xpath

I am new to programming. I know what XML is. Can anyone please explain in simple terms what XPath and XQuery do? Where are they used?

XPath is a way of locating specific elements in an XML tree.
For instance, given the following structure:
<myfarm>
<animal type="dog">
<name>Fido</name>
<color>Black</color>
</animal>
<animal type="cat">
<name>Mitsy</name>
<color>Orange</color>
</animal>
</myfarm>
XPath allows you to traverse the structure, such as:
/myfarm/animal[@type="dog"]/name/text()
which would give you "Fido"
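To try the same lookup outside an XML tool, here is a minimal sketch using Python's standard library. It assumes `xml.etree.ElementTree`, which supports only a subset of XPath: paths are relative to the element you start from and there is no `text()` step, so the text is read from the `.text` attribute instead. The helper name `name_of` is made up for this illustration.

```python
import xml.etree.ElementTree as ET

FARM_XML = """
<myfarm>
  <animal type="dog">
    <name>Fido</name>
    <color>Black</color>
  </animal>
  <animal type="cat">
    <name>Mitsy</name>
    <color>Orange</color>
  </animal>
</myfarm>
"""


def name_of(root, animal_type):
    # Rough equivalent of /myfarm/animal[@type="..."]/name/text().
    # The search starts at the root element, so "/myfarm" is implied.
    node = root.find(f"./animal[@type='{animal_type}']/name")
    return None if node is None else node.text


root = ET.fromstring(FARM_XML)
print(name_of(root, "dog"))
```

A full XPath engine would accept the expression exactly as written above; the subset used here is only enough for simple child and attribute tests.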
XQuery is an XML query language that makes use of XPath to query XML structures. However it also allows for functions to be defined and called, as well as complex querying of data structures using FLWOR expressions. FLWOR allows for join functionality between data sets defined in XML.
FLWOR article from wikipedia
Sample XQuery (using some XPath) is:
declare function local:toggle-boolean($b as xs:string)
as xs:string
{
if ($b = "Yes") then "true"
else if ($b = "No") then "false"
else if ($b = "true") then "Yes"
else if ($b = "false") then "No"
else "[ERROR] # local:toggle-boolean"
};
<ResultXML>
<ChangeTrue>{ local:toggle-boolean(doc("file.xml")/article[@id="1"]/text()) }</ChangeTrue>
<ChangeNo>{ local:toggle-boolean(doc("file.xml")/article[@id="2"]/text()) }</ChangeNo>
</ResultXML>
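If FLWOR is unfamiliar, the join idea can be pictured with a loose analogy in Python. This is not XQuery, only a sketch of how the for / where / order by / return clauses line up with a list comprehension. The two small data sets and the `owner` attribute are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical data sets, not taken from the example above.
ANIMALS = ET.fromstring(
    '<animals>'
    '<animal owner="2"><name>Mitsy</name></animal>'
    '<animal owner="1"><name>Fido</name></animal>'
    '</animals>'
)
OWNERS = ET.fromstring(
    '<owners>'
    '<owner id="1">Ann</owner>'
    '<owner id="2">Bob</owner>'
    '</owners>'
)


def animals_with_owners(animals, owners):
    rows = [
        (animal.findtext("name"), owner.text)      # return
        for animal in animals.findall("animal")    # for (first data set)
        for owner in owners.findall("owner")       # for (second data set)
        if animal.get("owner") == owner.get("id")  # where (the join condition)
    ]
    return sorted(rows)                            # order by


pairs = animals_with_owners(ANIMALS, OWNERS)
print(pairs)
```

In real XQuery the same shape would be written with `for`, `where`, `order by` and `return` keywords, and the engine is free to pick a faster join strategy than the nested loops shown here.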

XPath is a simple query language for searching an XML DOM. It can be compared to SQL SELECT statements for databases. Many programs that work with XML can evaluate XPath, and it is very widely used. I recommend learning it.
XQuery is much more powerful and more complicated. It offers many options for transforming the result, it offers loops, and so on, but it is still a query language. It is also used as the query language for XML databases. I think this language has only specialized uses and is probably not necessary to know at first; in the beginning it is enough to know that it exists and what it can do.
That is a simple explanation; I hope it is sufficient and understandable.

Related

Faster XPath expressions to execute queries from multiple XMLs

I have the two following XMLs and the problem statement is as follows.
Parse XML 1 and, if a subnode of any node_x contains 'a' in its name (as in value_a_0) and value_a_0 contains a specific number, parse XML 2, go to node_x-1 for all abc_x, and compare the content of value_x-1_0/1/2/3 with certain entities.
If a subnode of any node_x contains 'b' in its name (as in value_b_0) and value_b_0 contains a specific number (say 'm'), parse XML 2, go to node_x+1 for all abc_x, and compare the content of value_x-1_0/1/2/3 with 'm'.
Example: For all the value_a_0 in record1, check whether the value_a_0 node contains 5. If so, which is the case for node_1 and node_9, go to record2/node_0 and record2/node_8 and check whether the contents of value_0_0/1/2/3 contain 5 or not. Similarly for the rest of the cases.
I was wondering what would be the best practice to solve it? Is there any hash-table approach in XPath 3.0?
First XML
<record1>
<node_1>
<value_a_0>5</value_a_0>
<value_b_1>0</value_b_1>
<value_c_2>10</value_c_2>
<value_d_3>8</value_d_3>
</node_1>
.................................
.................................
<node_9>
<value_a_0>5</value_a_0>
<value_b_1>99</value_b_1>
<value_c_2>53</value_c_2>
<value_d_3>5</value_d_3>
</node_9>
</record1>
Second XML
<record2>
<abc_0>
<node_0>
<value_0_0>5</value_0_0>
<value_0_1>0</value_0_1>
<value_0_2>150</value_0_2>
<value_0_3>81</value_0_3>
</node_0>
<node_1>
<value_1_0>55</value_1_0>
<value_1_1>30</value_1_1>
<value_1_2>150</value_1_2>
<value_1_3>81</value_1_3>
</node_1>
.................................
.................................
<node_63>
<value_63_0>1</value_63_0>
<value_63_1>99</value_63_1>
<value_63_2>53</value_63_2>
<value_63_3>5</value_63_3>
</node_63>
</abc_0>
================================================
<abc_99>
<node_0>
<value_0_0>555</value_0_0>
<value_0_1>1810</value_0_1>
<value_0_2>140</value_0_2>
<value_0_3>80</value_0_3>
</node_0>
<node_1>
<value_1_0>555</value_1_0>
<value_1_1>1810</value_1_1>
<value_1_2>140</value_1_2>
<value_1_3>80</value_1_3>
</node_1>
<node_2>
<value_2_0>5</value_2_0>
<value_2_1>60</value_2_1>
<value_2_2>10</value_2_2>
<value_2_3>83</value_2_3>
</node_2>
.................................
.................................
<node_63>
<value_63_0>1</value_63_0>
<value_63_1>49</value_63_1>
<value_63_2>23</value_63_2>
<value_63_3>35</value_63_3>
</node_63>
</abc_99>
</record2>
First I would say that using structured element names like this is pretty poor XML design. That's relevant because when you do a join query in XPath or XQuery you're very dependent on the optimizer to find a fast execution path (e.g. a hash join), and the "weirder" your query is, the less likely the optimizer is to find a fast execution strategy.
I often start by converting "weird" XML into something more sanitary. For example in this case I would transform <value_a_0>5</value_a_0> into <value cat="a" seq="0">5</value>. That makes it easier to write your query and easier for the optimizer to recognize it, and the transformation phase is re-usable so you can apply it before any operations on the XML, not just this one.
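As a rough illustration of that clean-up step, here is a minimal sketch in Python rather than XSLT or XQuery. It assumes every structured name follows the pattern value_<cat>_<seq>; the function name `normalize` and the regular expression are choices made for this example only.

```python
import re
import xml.etree.ElementTree as ET

# Assumed naming scheme: value_<cat>_<seq>, e.g. value_a_0.
NAME_PATTERN = re.compile(r"^value_([^_]+)_(\d+)$")


def normalize(node):
    """Rename value_<cat>_<seq> children to <value cat=".." seq="..">, in place."""
    for child in node:
        match = NAME_PATTERN.match(child.tag)
        if match:
            child.tag = "value"
            child.set("cat", match.group(1))
            child.set("seq", match.group(2))
    return node


node = normalize(ET.fromstring(
    "<node_1><value_a_0>5</value_a_0><value_b_1>0</value_b_1></node_1>"
))
print(ET.tostring(node, encoding="unicode"))
```

After this pass a query can test `value[@cat = 'a']` instead of taking element names apart with string functions.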
If you're looking for better than O(n*m) performance on a join query, you need to look at the capabilities of your chosen XPath engine. Saxon-EE for example will do such optimizations, Saxon-HE won't. You're generally more likely to find advanced optimization in an XQuery engine than an XPath engine.
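When the engine will not do the hash join for you, one option is to build the hash table yourself in a host language. The following is only a sketch under several assumptions: it uses Python's `xml.etree.ElementTree`, it reads "contains 5" as "the element text equals 5", it handles only the value_a_0 / node_x-1 case, and the sample data is cut down from the question. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RECORD1 = ET.fromstring(
    "<record1>"
    "<node_1><value_a_0>5</value_a_0><value_b_1>0</value_b_1></node_1>"
    "<node_9><value_a_0>5</value_a_0><value_b_1>99</value_b_1></node_9>"
    "</record1>"
)
RECORD2 = ET.fromstring(
    "<record2>"
    "<abc_0><node_0><value_0_0>5</value_0_0><value_0_1>0</value_0_1></node_0>"
    "<node_1><value_1_0>55</value_1_0><value_1_1>30</value_1_1></node_1></abc_0>"
    "<abc_99><node_0><value_0_0>555</value_0_0><value_0_1>1810</value_0_1></node_0>"
    "<node_2><value_2_0>5</value_2_0><value_2_1>60</value_2_1></node_2></abc_99>"
    "</record2>"
)


def suffix(tag):
    # "node_9" -> 9
    return int(tag.rsplit("_", 1)[1])


def index_by_node_number(record2):
    # Hash table: node number -> list of (abc name, node element).
    index = defaultdict(list)
    for abc in record2:
        for node in abc:
            index[suffix(node.tag)].append((abc.tag, node))
    return index


def matches(record1, record2, target):
    index = index_by_node_number(record2)  # built once, one pass over record2
    hits = []
    for node in record1:
        if node.findtext("value_a_0") != target:
            continue
        # Constant-time lookup of node_(x-1) across all abc_* groups.
        for abc_tag, other in index.get(suffix(node.tag) - 1, []):
            if any(value.text == target for value in other):
                hits.append((node.tag, abc_tag, other.tag))
    return hits


hits = matches(RECORD1, RECORD2, "5")
print(hits)
```

The index is built in a single pass over the second document, so each record1 node costs one dictionary lookup instead of a scan of everything in record2.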
As for the detail of your query, I got lost with the requirement statement when you start talking about abc_x. I'm not sure what that refers to.
It seems like a task that can be partially solved by grouping, but, as in your previous examples, the poor choice of XML element names, which differ only by index values that should be part of an element or attribute value rather than the element name, makes it harder to write succinct code:
let $abc-elements := $doc2/record2/*
for $node-element in record1/*
for $index in (1 to count($node-element[1]/*))
for $index-element in $node-element/*[position() = $index]
group by $index, $group-value := $index-element
where tail($index-element)
return
<group index="{$index}" value="{$group-value}">
{
let $suffixes := $index-element/../string((xs:integer(substring-after(local-name(), '_')) - 1)),
$relevant-abc-node-elements := $abc-elements/*[substring-after(local-name(), '_') = $suffixes]
return $relevant-abc-node-elements[* = $group-value]
}
</group>
https://xqueryfiddle.liberty-development.net/nbUY4kA

How to grab a piece of data which has a different xpath on different webpages?

I am trying to grab a piece of data that is displayed at a different XPath on different pages.
If you look at the XPath of the IPA pronunciation on Wiktionary (https://en.wiktionary.org/wiki/foo), you will see that it is
//*[@id="mw-content-text"]/ul[1]/li[1]/span[4]
but if I go to another word, like https://en.wiktionary.org/wiki/bar, then the XPath is
//*[@id="mw-content-text"]/ul[1]/li[2]/span[5]
I cannot think of any way to reconcile these. Is there something that I am missing?
The answer is simple. Never let a tool write any XPath for you. All tools get it wrong.
Look at the document's HTML source and write the appropriate XPath yourself.
var result = document.evaluate("//*[@class = 'IPA']", document),
elem;
while (elem = result.iterateNext()) {
console.log(elem);
}
The above shows the simplest variant. It selects two occurrences of <span class="IPA"> on https://en.wiktionary.org/wiki/foo and quite a few more on https://en.wiktionary.org/wiki/bar.
Use a more specific expression to narrow down the results.
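The same idea, selecting by what the element is rather than where it sits, can be sketched outside the browser. The two page fragments below are invented stand-ins, not real Wiktionary markup, and the IPA strings are placeholders. The sketch assumes Python's `xml.etree.ElementTree`, which needs well-formed markup and matches `@class` only when the whole attribute value is exactly "IPA".

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments: the IPA span sits at a different position on each page.
PAGE_A = ET.fromstring(
    '<div id="mw-content-text"><ul>'
    '<li><span class="qualifier">UK</span><span class="IPA">IPA-for-foo</span></li>'
    '</ul></div>'
)
PAGE_B = ET.fromstring(
    '<div id="mw-content-text"><ul>'
    '<li>Etymology</li>'
    '<li><span>a</span><span>b</span><span class="IPA">IPA-for-bar</span></li>'
    '</ul></div>'
)


def ipa_texts(page):
    # One expression for both pages: match on the class, not on li[n]/span[m].
    return [span.text for span in page.findall(".//span[@class='IPA']")]


print(ipa_texts(PAGE_A))
print(ipa_texts(PAGE_B))
```

Real pages are rarely well-formed XML, so in practice an HTML-aware parser would replace `ET.fromstring` here.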

Can't select XML attributes with Oxygen XQuery implementation; Oxygen XPath emits result

I learned that every XPath expression is also a valid XQuery expression. I'm using Oxygen 16.1 with this sample XML:
<actors>
<actor filmcount="4" sex="m" id="15">Anderson, Jeff</actor>
<actor filmcount="9" sex="m" id="38">Bishop, Kevin</actor>
</actors>
My expression is:
//actor/@id
When I evaluate this expression in Oxygen with XPath 3.0, I get exactly what I expect:
15
38
However, when I evaluate this expression with XQuery 3.0 (also 1.0), I get the message: "Your query returned an empty sequence."
Can anyone provide any insight as to why this is, and how I can write the equivalent XQuery statement to get what the XPath statement did above?
Other XQuery implementations do support this query
If you want to validate that your query (as corrected per discussion in comments) does in fact work with other XQuery implementations when entered exactly as given in the question, you can run it as follows (tested in BaseX):
declare context item := document { <actors>
<actor filmcount="4" sex="m" id="15">Anderson, Jeff</actor>
<actor filmcount="9" sex="m" id="38">Bishop, Kevin</actor>
</actors> };
//actor/@id
Oxygen XQuery needs some extra help
Oxygen XML doesn't support serializing attributes, and consequently discards them from a result sequence when that sequence would otherwise be provided to the user.
Thus, you can work around this with a query such as the following:
//actor/@id/string(.)
data(//actor/@id)
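The distinction between an attribute node and its string value shows up in other tools as well. As a small sketch, assuming Python's `xml.etree.ElementTree`: its XPath subset cannot return attribute nodes at all, so you select the elements that carry the attribute and then read the value as a string, which is roughly what `string(.)` and `data()` achieve above. The helper name `actor_ids` is made up.

```python
import xml.etree.ElementTree as ET

ACTORS = ET.fromstring(
    '<actors>'
    '<actor filmcount="4" sex="m" id="15">Anderson, Jeff</actor>'
    '<actor filmcount="9" sex="m" id="38">Bishop, Kevin</actor>'
    '</actors>'
)


def actor_ids(root):
    # Select actor elements that have an id, then take the attribute's string value.
    return [actor.get("id") for actor in root.findall(".//actor[@id]")]


ids = actor_ids(ACTORS)
print(ids)
```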
Below applies to a historical version of the question.
Frankly, I would not expect //actors/@id to return anything against that data with any valid XPath or XQuery engine, ever.
The reason is that there's only one place you're recursing -- one // -- and that's looking for actors. The single / between the actors and the @id means that they need to be directly connected, but that's not the case in the data you give here -- there's an actor element between them.
Thus, you need to fix your query. There are numerous queries you could write that would find the data you wanted in this document -- knowing which one is appropriate would require more information than you've provided:
//actor/@id - Find actor elements anywhere, and take their id attribute values.
//actors/actor/@id - Find actors elements anywhere; look for actor elements directly under them, and take the id attribute of such actor elements.
//actors//@id - Find all id attributes in subtrees of actors elements.
//@id - Find id attributes anywhere in the document.
...etc.

RethinkDB text search?

I am trying to study RethinkDB for my next project. My backend is in Haskell, and the RethinkDB Haskell driver looks a bit better than MongoDB's. So I want to try it.
My question is: how do you do simple text search with RethinkDB?
Nothing too complex. Just find the fields whose value contains these words.
I assume this should be built in, as even the smallest blog app needs a search facility of some kind, right?
So I am looking for a mongodb equivalent of:
var search = { "$text": { "$search": "some text" } };
Thank you.
EDIT
I am not looking for regular expressions and the match function.
It is extremely slow for more or less large sets.
It does not have any notion of indexes.
It does not have any notion of stemming.
With the rethinkdb driver documented here
run h $ table "table" # R.filter (\row -> match "some text" (row ! "field"))

Easy way to transform XPath with contains to equals check?

Is there an easy way to transform an XPath query (as string), like:
my/x/path[contains(sub/path, 'text')]
to an XPath query which uses equals instead of contains? Such that I can easily use the same query, one time with contains and another time with equals? Unfortunately there is no "equals" function in XPath...
You might differentiate between the two options via a flag:
my/x/path[ ($wantContains and contains(sub/path, 'text'))
or
(not($wantContains) and sub/path = 'text')
]
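If the query is assembled as a string in a host language anyway, another option is to generate the predicate from a flag before handing the query to the XPath engine. A minimal sketch in Python follows; the function names are hypothetical, and it refuses text containing a single quote instead of trying to escape it.

```python
def text_predicate(path, text, want_contains):
    """Build either a contains() test or an equality test for the same path."""
    if "'" in text:
        # XPath 1.0 has no escape for quotes inside a literal; keep the sketch simple.
        raise ValueError("text must not contain a single quote")
    if want_contains:
        return f"contains({path}, '{text}')"
    return f"{path} = '{text}'"


def build_query(base, path, text, want_contains):
    return f"{base}[{text_predicate(path, text, want_contains)}]"


print(build_query("my/x/path", "sub/path", "text", True))
print(build_query("my/x/path", "sub/path", "text", False))
```

This only produces the query string. Evaluating it still needs an engine that supports `contains()`, and the `$wantContains` variant above remains the better fit when the engine lets you bind variables.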
