Understanding for-loop in XQuery to count occurrences in BaseX

I am trying to count the occurrences of an XML structure in BaseX.
declare variable $a := 0;
for $node in db:open("My_DB")/my/xml//path
$a += 1
return $a
When running this, BaseX returns the error: Incomplete FLWOR expression: expecting 'return'.
I know that I can count with this simple function:
count(db:open("My_DB")/my/xml//path)
But there are two reasons why I am trying to do this with a for loop:
I have been told by my supervisor that a for loop is faster
In the future I may want to execute more operations per hit (in the for loop)
So the question is: how can I count elements in a for loop with XQuery using BaseX?

As XQuery is a functional language, it’s not possible to reassign a new value to a variable. However, you can use fold-left to increment a value while iterating over a sequence:
fold-left(db:open("My_DB")/my/xml//path, 0, function($result, $curr) {
$result + 1
})
The execution time for count() depends on the implementation of XQuery. In BaseX, count() is usually much faster than a loop, because it can in many cases be accelerated by lookups in the database statistics.
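For readers who don't know XQuery, here is a rough Python analogue of the fold pattern, using functools.reduce. It is only a sketch: the `nodes` list is a hypothetical stand-in for the result of the path expression, and nothing here says anything about BaseX performance.

```python
from functools import reduce

# Hypothetical stand-in for the sequence returned by the path expression.
nodes = ["path-1", "path-2", "path-3"]


def step(result, _curr):
    """Fold step: ignore the current item and add one to the running total."""
    return result + 1


# Left fold starting at 0, mirroring fold-left(seq, 0, function(...)).
folded = reduce(step, nodes, 0)

# The built-in length, playing the role of count().
builtin = len(nodes)

print(folded, builtin)
```

The accumulator is never reassigned. Each step returns a new value that is passed to the next step, which is why this pattern works in a functional language.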

Related

Performance of local variable vs. array access

I was doing some benchmarking of Perl performance, and ran into a case that I thought was somewhat odd. Suppose you have a function which uses a value from an array multiple times. In this case, you often see some code as:
sub foo {
my $value = $array[17];
do_something_with($value);
do_something_else_with($value);
}
The alternative is not to create a local variable at all:
sub foo {
do_something_with($array[17]);
do_something_else_with($array[17]);
}
For readability, the first is clearer. I assumed that performance would be at least equal (or better) for the first case too - array lookup requires a multiply-and-add, after all.
Imagine my surprise when this test program showed the opposite. On my machine, re-doing the array lookup is actually faster than storing the result, until I increase ITERATIONS to 7; in other words, for me, creating a local variable is only worthwhile if it's used at least 7 times!
use Benchmark qw(:all);
use constant { ITERATIONS => 4, TIME => -5 };
# sample array
my @array = (1 .. 100);
cmpthese(TIME, {
    # local variable version
    'local_variable' => sub {
        my $index = int(rand(scalar @array));
        my $val = $array[$index];
        my $ret = '';
        for (my $i = 0; $i < ITERATIONS; $i ++) {
            $ret .= $val;
        }
        return $ret;
    },
    # multiple array access version
    'multi_access' => sub {
        my $index = int(rand(scalar @array));
        my $ret = '';
        for (my $i = 0; $i < ITERATIONS; $i ++) {
            $ret .= $array[$index];
        }
        return $ret;
    }
});
Result:
Rate local_variable multi_access
local_variable 245647/s -- -5%
multi_access 257907/s 5% --
It's not a HUGE difference, but it brings up my question: why is it slower to create a local variable and cache the array lookup, than to do the lookup again? Reading other S.O. posts, I've seen that other languages / compilers do have the expected outcome, and sometimes even transform these into the same code. What is Perl doing?
I've done more poking around at this today, and what I've determined is that scalar assignment of any sort is an expensive operation, relative to the overhead of one-deep array lookup.
This seems like it's just restating the initial question, but I feel I have found more clarity. If, for example, I modify my local_variable subroutine to do another assignment like so:
my $index = int(rand(scalar @array));
my $val = 0; # <- this is new
$val = $array[$index];
my $ret = '';
...the code suffers an additional 5% speed penalty beyond the single-assignment version - even though it does nothing but a dummy assignment to the variable.
I also tested whether scope-related setup/teardown of $val was impeding performance, by switching it from a locally scoped variable to a global one. The difference is negligible (see comments to @zdim above), pointing away from construction/destruction as the performance bottleneck.
In the end, my confusion was based on faulty assumptions that scalar assignment should be fast. I am used to working in C, where copying a value to a local variable is an extremely quick operation (1-2 asm instructions).
As it turns out, this is not the case in Perl (though I don't know exactly why, it's ok). Scalar assignment is a relatively "slow" operation... Whatever Perl internals are doing to get at the nth element of an Array object is actually quite fast by comparison. The "multiply and add" I mentioned in the initial post is still far less work than the code for scalar assignment.
That is why it takes so many lookups to match the performance of caching the result: simply assigning to the "cache" variable is ~7 times slower (for my setup).
Let's first turn the statement around: caching the lookup is expected to be faster because it avoids the repeated lookups, even though it has a cost of its own, and it starts being faster once 7 or more lookups are done. Now that's not so shocking, I think.
As to why it's slower for fewer than seven iterations ... I'll guess that the cost of the scalar creation is still greater than those few lookups. It is surely greater than one lookup, yes? How about two, then? I'd say that "a few" may well be a good measure.

Compare 43 variables in all possible ways

I am trying to figure out which method is the best way to cross compare 43 variables (data sets, data)
I need to compare variable 1 with variable 2,3,4,5,6,7....43 and then compare variable 2 with variable 1,3,4,5,6,7....43 and so on, to variable no. 43.
I think I should use some kind of loop, but I am clueless about how to perform this operation efficiently.
I think I just need some kind of pseudo code. Either way I want to do this in a do-file in Stata.
Assuming e.g. variables var1-var43 and that the "comparison" between the first and the second differs from that between the second and the first, which is what your question implies, then
forval i = 1/43 {
forval j = 1/43 {
if `i' != `j' {
<code for comparison between var`i' and var`j'>
}
}
}
With other variable names, foreach might be better.
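As language-neutral pseudocode, the same double loop can be sketched in Python. This assumes the variables are named var1 to var43 as above. The comparison itself is left out, and the names `ordered_pairs` and `unordered_pairs` are only illustrative.

```python
from itertools import combinations, permutations

# Assumed variable names var1 .. var43, as in the Stata loop above.
names = [f"var{i}" for i in range(1, 44)]

# Every (i, j) with i != j, where (var1, var2) and (var2, var1) both appear.
ordered_pairs = list(permutations(names, 2))

# If the comparison is symmetric, each pair only needs to be visited once.
unordered_pairs = list(combinations(names, 2))

print(len(ordered_pairs), len(unordered_pairs))
```

With 43 variables the ordered version visits 43 * 42 = 1806 pairs and the symmetric version half of that, so a plain nested loop is cheap at this size.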
As @NickCox suggested, you could use an O(NxN) nested loop. If that takes too long, which it could if your "43" is actually 1000, then there's a better way. Sort each list (indirectly), which is O(N logN), and run a merge-order loop, which is O(N), so altogether it is O(N logN).

how to optimize contains in oracle sql query

I need to use oracle Contains function in a query like this:
select *
from iindustrialcasehistory B
where CONTAINS(B.ItemTitle, '%t1%') > 0 OR
CONTAINS(B.ItemTitle, '%t2%') > 0
I've defined a context index for the ItemTitle column, but execution time is about a minute, whereas I need it to execute in less than a second!
Thanks in advance for any guidance on reducing the execution time!
Contains searches for all occurrences of the substring in a string, so it is better to use instr() instead: it only searches for the first occurrence of the substring, which should be faster.
Then you can build an index on the function Instr(B.ItemTitle,'t1') + Instr(B.ItemTitle,'t2').
After that, test whether this function value is > 0 in the query.
You can see more details about using index with instr function here.

XPath :: running counter two levels

Using the count(preceding-sibling::*) XPath expression one can obtain incrementing counters. However, can the same also be accomplished in a two-level-deep sequence?
example XML instance
<grandfather>
<father>
<child>a</child>
</father>
<father>
<child>b</child>
<child>c</child>
</father>
</grandfather>
code (with Saxon HE 9.4 jar on the CLASSPATH for XPath 2.0 features)
Trying to get a counter sequence of 1, 2 and 3 for the three child nodes with different kinds of XPath expressions:
XPathExpression expr = xpath.compile("/grandfather/father/child");
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
for (int i = 0 ; i < nodes.getLength() ; i++) {
Node node = nodes.item(i);
System.out.printf("child's index is: %s %s %s, name is: %s\n"
,xpath.compile("count(preceding-sibling::*)").evaluate(node)
,xpath.compile("count(preceding-sibling::child)").evaluate(node)
,xpath.compile("//child/position()").evaluate(doc)
,xpath.compile(".").evaluate(node));
}
The above code prints:
child's index is: 0 0 1, name is: a
child's index is: 0 0 1, name is: b
child's index is: 1 1 1, name is: c
None of the three XPaths I tried managed to produce the correct sequence: 1,2,3. Clearly it can trivially be done using the i loop variable, but I want to accomplish it with XPath if possible. I also need to keep the basic framework of evaluating an XPath expression to get all the nodes to visit and then iterating over that set, since that's how the real application I work on is structured. Basically I visit each node and then need to evaluate a number of XPath expressions on it (node) or on the document (doc); one of these XPath expressions is supposed to produce this incrementing sequence.
Use the preceding axis with a name test instead.
count(preceding::child)
Using XPath 2.0, there is a much better way to do this. Fetch all <child/> nodes and use the position() function to get the index:
//child/concat("child's index is: ", position(), ", name is: ", text())
You don't say efficiency is important, but I really hate to see this done with O(n^2) code! Jens' solution shows how to do that if you can use the result in the form of a sequence of (position, name) pairs. You could also return an alternating sequence of strings and numbers using //child/(string(.), position()): though you would then want to use the s9api API rather than JAXP, because JAXP can only really handle the data types that arise in XPath 1.0.
If you need to compute the index of each node as part of other processing, it might still be worth computing the index for every node in a single initial pass, and then looking it up in a table. But if you're doing that, the simplest way is surely to iterate over the result of //child and build a map from nodes to the sequence number in the iteration.
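The "single initial pass, then look it up in a table" idea can be sketched as follows. This is an illustration in Python with the standard xml.etree.ElementTree module rather than the Saxon/JAXP setup from the question, and the name `index_of` is made up for the example.

```python
import xml.etree.ElementTree as ET

XML = """
<grandfather>
  <father>
    <child>a</child>
  </father>
  <father>
    <child>b</child>
    <child>c</child>
  </father>
</grandfather>
"""

root = ET.fromstring(XML)

# One pass in document order: map each <child> element to its running number.
index_of = {child: n for n, child in enumerate(root.iter("child"), start=1)}

# Later processing visits the nodes and looks the index up instead of
# re-counting preceding nodes for every node.
results = [(index_of[child], child.text) for child in root.findall("./father/child")]

for number, name in results:
    print(f"child's index is: {number}, name is: {name}")
```

Building the map is linear in the number of nodes, whereas evaluating count(preceding::child) for every node re-scans the earlier nodes each time.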

Lua - why for loop limit is not calculated dynamically?

Ok here's a basic for loop
local a = {"first","second","third","fourth"}
for i=1,#a do
print(i.."th iteration")
a = {"first"}
end
As it is now, the loop executes all 4 iterations.
Shouldn't the for-loop-limit be calculated on the go? If it is calculated dynamically, #a would be 1 at the end of the first iteration and the for loop would break....
Surely that would make more sense?
Or is there any particular reason as to why that is not the case?
The main reason why the limits of a numeric for loop are computed only once is most certainly performance.
With the current behavior, you can place arbitrarily complex expressions in for loop limits without a performance penalty, including function calls. For example:
local prod = 1
for i = computeStartLoop(), computeEndLoop(), computeStep() do
prod = prod * i
end
The above code would be really slow if computeEndLoop and computeStep had to be called on each iteration.
If the standard Lua interpreter and most notably LuaJIT are so fast compared to other scripting languages, it is because a number of Lua features have been designed with performance in mind.
In the rare cases where the single evaluation behavior is undesirable, it is easy to replace the for loop with a generic loop using while end or repeat until.
local prod = 1
local i = computeStartLoop()
while i <= computeEndLoop() do
prod = prod * i
i = i + computeStep()
end
The length is computed once, at the time the for loop is initialized. It is not re-computed each time through the loop - a for loop is for iterating from a starting value to an ending value. If you want the 'loop' to terminate early if the array is re-assigned to, you could write your own looping code:
local a = {"first", "second", "third", "fourth"}
function process_array (fn)
  local inner_fn
  inner_fn =
    function (ii)
      if ii <= #a then
        fn(ii,a)
        inner_fn(1 + ii)
      end
    end
  inner_fn(1, a)
end
process_array(function (ii)
  print(ii.."th iteration: "..a[ii])
  a = {"first"}
end)
Performance is a good answer but I think it also makes the code easier to understand and less error-prone. Also, that way you can (almost) be sure that a for loop always terminates.
Think about what would happen if you wrote that instead:
local a = {"first","second","third","fourth"}
for i=1,#a do
print(i.."th iteration")
if i > 1 then a = {"first"} end
end
How do you understand for i=1,#a? Is it an equality comparison (stop when i==#a) or an inequality comparison (stop when i>=#a)? What would be the result in each case?
You should see the Lua for loop as iteration over a sequence, like the Python idiom using (x)range:
a = ["first", "second", "third", "fourth"]
for i in range(1,len(a)+1):
    print(str(i) + "th iteration")
    a = ["first"]
If you want to evaluate the condition every time you just use while:
local a = {"first","second","third","fourth"}
local i = 1
while i <= #a do
print(i.."th iteration")
a = {"first"}
i = i + 1
end
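To make the difference concrete, here is a small Python sketch that counts the iterations in both styles. The function names are made up for this example. It illustrates the evaluation order in Python, which mirrors the Lua behavior described above, but it is not Lua itself.

```python
def count_for_iterations():
    """Loop limit evaluated once: range() is built before the first iteration."""
    a = ["first", "second", "third", "fourth"]
    count = 0
    for i in range(1, len(a) + 1):
        count += 1
        a = ["first"]  # rebinding a does not affect the already-built range
    return count


def count_while_iterations():
    """Condition re-evaluated on every pass, so the new length is seen."""
    a = ["first", "second", "third", "fourth"]
    count = 0
    i = 1
    while i <= len(a):
        count += 1
        a = ["first"]
        i += 1
    return count


for_count = count_for_iterations()
while_count = count_while_iterations()
print(for_count, while_count)
```

The for version runs four times because its limit was fixed up front. The while version stops after one pass because its condition sees the shortened list.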
