Optimizing an XQuery interrogation with an XPath predicate

So, I was optimizing a query I carried over from SQL, and I ran into a bit of a performance issue compared to how it used to work in SQL.
Basically, my PHP script sends between 2 and 5 pairs of (numeric) values.
These have to be compared against id and doc in my collection's elements. Of course, the fewer conditions in the predicate, the faster the query.
My for clause with the predicate looks like this right now:
for $p in collection("/db/col1")//set1/page[(id eq val1 and doc eq altval1) or (id eq val2 and doc eq altval2) or (id eq val3 and doc eq altval3) or (id eq val4 and doc eq altval4) or (id eq val5 and doc eq altval5)]
I need to somehow write a predicate that changes depending on the number of values. I tried writing a function that writes the conditions and calling it in the predicate, depending on how many values are passed, but that didn't seem to work.
I would really appreciate if someone knows a workaround for this.
Edit: Removed a typo in the code.

If $val and $altval are two sequences of values, then you can write the generic predicate
SOMETHING[some $i in 1 to count($val) satisfies (id = $val[$i] and doc = $altval[$i])]
But I've no idea how well it will perform.
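For instance, with the pairs bound from PHP as two external variables, a complete query along those lines might look like this (just a sketch, not tested; element and variable names follow the question):
xquery version "3.1";
(: assumes the PHP side binds the pairs to two parallel external variables :)
declare variable $val as xs:integer* external;
declare variable $altval as xs:integer* external;

for $p in collection("/db/col1")//set1/page
    [some $i in 1 to count($val) satisfies (id = $val[$i] and doc = $altval[$i])]
return $p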

If you wanted to use a function in the predicate, then something like the following could possibly work for you:
xquery version "3.1";
(: the square-bracket constructor keeps each pair as one two-item member,
   so the positional lookups $x[1] and $x[2] below work as intended :)
declare variable $local:criteria := [
    ("val1", "altval1"),
    ("val2", "altval2"),
    ("val3", "altval3"),
    ("val4", "altval4"),
    ("val5", "altval5")
];
declare function local:match($id, $doc) as xs:boolean {
    array:size(
        array:filter($local:criteria, function($x) {
            $id eq $x[1] and $doc eq $x[2]
        })
    ) eq 1
};
collection("/db/col1")//set1//page[local:match(id, doc)]
Note - I have not tested the performance of the above.
Also maybe worth mentioning that ancestor lookup in eXist-db is very fast due to its DLN node numbering. So it may be worth testing if //set1//page is slower than say //page[ancestor::set1].
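In other words, the variant to benchmark against the //set1//page form would be something like:
collection("/db/col1")//page[ancestor::set1][local:match(id, doc)]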

I upvoted both answers since they both get the job done and I could clearly see an improvement. I don't want to select one over the other as I really think it's more a matter of taste at this point.
For my part, I found a third one which, specifically for this case, is even faster. On the downside, it's horribly tedious, inelegant and very context-specific. Also, while your answers can be adapted to several similar problems, this one only works when you are 'triggering' the XQuery scripts externally. So here it goes:
I actually made FIVE different xql scripts, one that deals with 1 pair of values, one that deals with the first two pairs, one with the first three pairs, and so on.
So script one would contain:
for $p in collection("/db/col1")//set1/page[id eq val1 and doc eq altval1]
while in script five you would find something like the original:
for $p in collection("/db/col1")//set1/page[(id eq val1 and doc eq altval1) or (id eq val2 and doc eq altval2) or (id eq val3 and doc eq altval3) or (id eq val4 and doc eq altval4) or (id eq val5 and doc eq altval5)]
I then call them from my PHP script, depending on the number of parameters I need to send. I wouldn't attempt to scale this to more than five pairs, but for the moment it gets the job done.

Related

Power Query - Multiple OR statement with values

I've been doing research on this and I find a plethora of articles related to Text, but they don't seem to be working for me.
To be clear, this formula works; I'm just looking to make it more efficient. My formula looks like:
if [organization_id] = 1 or [organization_id] = 2 or [organization_id] = 3 then "North" else if …
where organization_id is of type "WholeNumber".
I'd like to simplify this by doing something like:
if [organization_id] in {1, 2, 3} then "North" else if …
I've tried wrapping it in parentheses, braces, and brackets. Nothing seems to work. Most articles use some form of the Text.Replace function and mine is just a custom column.
Does M code within Power Query have any shorthand like this, or do I have to write out each individual comparison like the first line?
I've had success with a List.Contains formulation:
List.Contains({1,2,3}, [organization_id])
The above checks if [organization_id] is in the list supplied in the first argument.
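For a custom column that maps ids to regions, the full formula might look something like this (a sketch; the groupings and region names beyond "North" are made up for illustration):
if List.Contains({1, 2, 3}, [organization_id]) then "North"
else if List.Contains({4, 5, 6}, [organization_id]) then "South"
else "Other"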
In some cases, you may not want to hardcode a list as shown above but reference a table column instead. For example,
List.Contains(TableWithDesiredIds[id_column], [organization_id])

Filter a collection of tuples

I'm playing with iterables and comprehensions in Julia and tried to code a simple problem: find all pairs of numbers less than 10 whose product is less than 10. This was my first try:
solution = filter((a,b)->a*b<10, product(1:10, 1:10))
collect(solution)
but I got a "wrong number of arguments" error. This is kind of expected because the anonymous function inside filter expects two arguments but it gets one tuple.
I know I can do
solution = filter(p->p[1]*p[2]<10, product(1:10, 1:10))
but it doesn't look as nice as the one above. Is there a way to say that (a,b) is an argument of tuple type and use something similar to the syntax in the first example?
I don't think there's a way to do exactly as you'd like, but here are some alternatives you could consider for the anonymous function:
x->let (a,b)=x; a*b<10 end
x->((a,b)=x; a*b<10)
These can of course be made into macros if you like:
macro tup(ex)
    @assert ex.head == :(->)
    @assert ex.args[1].head == :tuple
    arg = gensym()
    quote
        $arg -> ( $(ex.args[1]) = $arg; $(ex.args[2]) )
    end
end
Then @tup (a, b) -> a * b < 10 will do as you like.
Metaprogramming in Julia is pretty useful and common for situations where you are doing something over and over and would like specialized syntax for it. But I would avoid this kind of metaprogramming if this were a one-off thing, because adding new syntax means learning new syntax and makes code harder to read.
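For reference, the second one-liner in a full call might look like this (a sketch, assuming a recent Julia where product comes from Iterators and the result is collected so Base.filter applies):
using Base.Iterators: product

# keep only the pairs whose product is below 10
solution = filter(x -> ((a, b) = x; a * b < 10), collect(product(1:10, 1:10)))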

XPath : Find following siblings that don't follow an order pattern

This is for C code detection. I'm trying to flag case statements that don't have a break. The hierarchy of the tree looks like this when there are multiple lines before the break statement. This is an example in C:
switch (x) {
    case 1:
        if (...) {...}
        int y = 0;
        for (...) {...}
        break;
    case 2:
It is somehow represented as this:
<switch>
    <case>...</case>
    <if>...</if>
    <expression>...</expression>
    <for>...</for>
    <break>...</break>
    <case>...</case>
</switch>
I need to find <case>s where a <break> exists after any number of lines, but before the next <case>.
This code only helps me find those where the break doesn't immediately follow the case:
//case [name(following-sibling::*[1]) != 'break']
..but when I try to use following-sibling::* it will find a break, but not necessarily before the next case.
How can I do this?
Select any case that has a following break and either no following case at all, or one where the position of the next break is less than the position of the next case, with the positions determined by running count() on the preceding siblings.
//case
[
following-sibling::break and
(
not(following-sibling::case) or
(
count(following-sibling::break[1]/preceding-sibling::*) <
count(following-sibling::case[1]/preceding-sibling::*)
)
)
]
To grab the other cases, those without breaks, just throw a big old not() in there like so:
//case
[not(
following-sibling::break and
(
not(following-sibling::case) or
(
count(following-sibling::break[1]/preceding-sibling::*) <
count(following-sibling::case[1]/preceding-sibling::*)
)
)
)]
I agree with @PeterHall: it would be better to restructure the XML into something more closely representing the abstract syntax tree of the C grammar. You can do this easily enough (for this case) with XSLT grouping:
<xsl:for-each-group select="*" group-starting-with="case">
    <case>
        <xsl:copy-of select="current-group()[not(self::case)]"/>
    </case>
</xsl:for-each-group>
You can then find cases with no break as switch/case[not(break)].
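Applied to the sample <switch> above, that grouping produces roughly the following (a sketch; the surrounding template is assumed to recreate the <switch> wrapper):
<switch>
    <case>
        <if>...</if>
        <expression>...</expression>
        <for>...</for>
        <break>...</break>
    </case>
    <case/>
</switch>
so switch/case[not(break)] then flags the second, empty case.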
I think you are struggling because your XML format does not really model the problem very well. It would be much easier if the other statements were nested inside the <case> elements, instead of being siblings, then you could just use switch/case[break].
With your current structure, it's easiest to start by finding the <break> and then work backwards to find the matching <case>. As @LarsH pointed out, my original expression would find some additional clauses. It can't really be modified to fix that, unless you restrict it to find just the first case:
switch/break/preceding-sibling::case[1]
@derp's answer is better, and can find both cases with and without breaks.
Derp's answer is correct. But I'll just add another. This selects case elements that do have a break:
//case[generate-id(.) =
generate-id(following-sibling::break[1]/preceding-sibling::case[1])]
In other words, this selects case elements for which this is true:
The context element is identical to the first case element preceding the next break element (considering siblings only).
If you have a lot of case statements, this variant could be faster than using count(). But you never know for sure unless you test it with the relevant data using the relevant XPath processor.
BTW, the . in generate-id(.) is not required, as the argument defaults to . anyway. But I prefer to make it explicit, for readability.

Recursively (?) compose LINQ predicates into a single predicate

(EDIT: I have asked the wrong question. The real problem I'm having is over at Compose LINQ-to-SQL predicates into a single predicate - but this one got some good answers so I've left it up!)
Given the following search text:
"keyword1 keyword2 keyword3 ... keywordN"
I want to end up with the following SQL:
SELECT [columns] FROM Customer
WHERE
(Customer.Forenames LIKE '%keyword1%' OR Customer.Surname LIKE '%keyword1%')
AND
(Customer.Forenames LIKE '%keyword2%' OR Customer.Surname LIKE '%keyword2%')
AND
(Customer.Forenames LIKE '%keyword3%' OR Customer.Surname LIKE '%keyword3%')
AND
...
AND
(Customer.Forenames LIKE '%keywordN%' OR Customer.Surname LIKE '%keywordN%')
Effectively, we're splitting the search text on spaces, trimming each token, constructing a multi-part OR clause based on each token, and then AND'ing the clauses together.
I'm doing this in LINQ-to-SQL, and I have no idea how to dynamically compose a predicate based on an arbitrarily long list of subpredicates. For a known number of clauses, it's easy to compose the predicates manually:
dataContext.Customers.Where(c =>
    (c.Forenames.Contains("keyword1") || c.Surname.Contains("keyword1"))
    &&
    (c.Forenames.Contains("keyword2") || c.Surname.Contains("keyword2"))
    &&
    (c.Forenames.Contains("keyword3") || c.Surname.Contains("keyword3"))
);
but I want to handle an arbitrary list of search terms. I got as far as
Func<Customer, bool> predicate = /* predicate */;
foreach (var token in tokens) {
    predicate = customer =>
        predicate(customer)
        &&
        (customer.Forenames.Contains(token) || customer.Surname.Contains(token));
}
That produces a StackOverflowException - presumably because the predicate() on the RHS of the assignment isn't actually evaluated until runtime, at which point it ends up calling itself... or something.
In short, I need a technique that, given two predicates, will return a single predicate composing the two source predicates with a supplied operator, but restricted to the operators explicitly supported by Linq-to-SQL. Any ideas?
I would suggest another technique. You can do:
IQueryable<Customer> query = dataContext.Customers;
and then, inside a loop, do:
foreach (string keyword in keywordlist)
{
    query = query.Where(c => c.Forenames.Contains(keyword) || c.Surname.Contains(keyword));
}
If you want a more succinct and declarative way of writing this, you could also use the Aggregate extension method instead of a foreach loop and a mutable variable:
var query = keywordlist.Aggregate((IQueryable<Customer>)dataContext.Customers, (q, keyword) =>
    q.Where(c => c.Forenames.Contains(keyword) ||
                 c.Surname.Contains(keyword)));
This takes dataContext.Customers as the initial state and then updates this state (query) for every keyword in the list using the given aggregation function (which just calls Where, as Gnomo suggests).
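Putting it together with the original search-text splitting, the whole thing might look roughly like this (a sketch; Customer, Forenames, and Surname come from the question, the rest is illustrative):
IQueryable<Customer> query = dataContext.Customers;
// split the search text into tokens, dropping empty entries
foreach (var token in searchText.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
{
    var t = token;  // copy to a local so the lambda captures this iteration's value
    // each Where call ANDs another (Forenames OR Surname) clause onto the query
    query = query.Where(c => c.Forenames.Contains(t) || c.Surname.Contains(t));
}
var results = query.ToList();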

Is it better to use NOT or <> when comparing values?

Is it better to use NOT or to use <> when comparing values in VBScript?
is this:
If NOT value1 = value2 Then
or this:
If value1 <> value2 Then
better?
EDIT:
Here is my counterargument.
When looking to logically negate a Boolean value you would use the NOT operator, so this is correct:
If NOT boolValue1 Then
and when a comparison is made, as in the first example, a Boolean value is returned: either the values are equal (True) or they are not (False). So using the NOT operator would be appropriate, because you are logically negating a Boolean value.
For readability, placing the comparison in parentheses would probably help.
The latter (<>), because the meaning of the former isn't clear unless you have a perfect understanding of the order of operations as it applies to the Not and = operators: a subtlety which is easy to miss.
Because "not ... =" is two operations and "<>" is only one, it is faster to use "<>".Here is a quick experiment to prove it:
StartTime = Timer
For x = 1 To 100000000
    If 4 <> 3 Then
    End If
Next
WScript.Echo Timer - StartTime

StartTime = Timer
For x = 1 To 100000000
    If Not (4 = 3) Then
    End If
Next
WScript.Echo Timer - StartTime
The results I get on my machine:
4.783203
5.552734
Agreed, code readability is very important for others, but more importantly for yourself. Imagine how much harder it would be to understand the first example compared to the second.
If code takes more than a few seconds to read (understand), perhaps there is a better way to write it. In this case, the second way.
The second example would be the one to go with, not just for readability, but because the first form relies on knowing that in VBScript comparison operators are evaluated before NOT, so If NOT value1 = value2 really means
If NOT (value1 = value2)
and writing the parentheses out to make that obvious just makes the use of the NOT keyword pointless.
