I have a relatively simple DAX query which becomes very slow if I change my SUM(...) into SUMX(...).
The main measure has three versions:
Version A: CALCULATE(SUM(...))
Version B: SUMX(..., IF(some_column = constant, other_column))
Version C: SUMX(..., IF(some_column = variable, other_column))
Finally, I have an optional FILTER.
Version A is always quick, with or without the filter, with all work done in the storage engine.
Version B is pretty quick too, also mostly using the storage engine.
Version C is slow if I add the filter, and I don't get why it behaves so differently from B, or why the filter has such a bad effect on it.
Version C uses a variable instead of a constant in the SUMX(), and that condition seems to get resolved by the formula engine, hence the big delays. In my query context, the variable always evaluates to "Monthly".
Can someone explain why B and C behave so differently?
How could I get B's performance with a variable instead of the constant?
Could I use a filter somehow to avoid the formula engine kicking in?
I attach screenshots of the Version C queries, which spend a lot of time in the formula engine once I add the filter:
Without filter -> storage engine only
With filter -> formula engine kicks in
I also add the table relationships as a diagram.
In this particular case, it looks like I can solve my performance problem by "rephrasing" the measure using CALCULATE() and a simple filter.
I have no explanation for the major speed difference between Versions B and C, but Version D performs very well with or without the filter.
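For concreteness, a minimal sketch of the rephrasing, with made-up names (Fact, Fact[Frequency], Fact[Amount]) standing in for my real model:

-- Version C (slow once the filter is added): the condition is
-- evaluated row by row inside the iteration.
SUMX ( Fact, IF ( Fact[Frequency] = _freq, Fact[Amount] ) )

-- Version D (fast with or without the filter): the condition moves
-- into a CALCULATE filter argument.
CALCULATE (
    SUM ( Fact[Amount] ),
    FILTER ( ALL ( Fact[Frequency] ), Fact[Frequency] = _freq )
)

Moving the condition out of the row-by-row SUMX and into a filter argument seems to let the storage engine do the work.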
I'm building a pure Lucene-based implementation for a legacy system that should locally (in-process) run (more or less) arbitrary queries. I receive queries in textual form, like term1:A OR term2:B. My default operator for queries is AND. For compatibility reasons I'm using Lucene 5.5. I'm basing the solution on the single-document MemoryIndex implementation.
My main problem is with how Lucene deals with the NOT clause: for a query of the form
termA:(NOT 0) term2:(xxx), Lucene produces the following query:
+(-termA:0) +term2:xxx. While logically sound, in practice this query always produces nothing, since a negate-only clause always returns a no-docs response, and ANDing with it therefore always produces an empty result. The only workaround I found (here on Stack Overflow) is to inject a MatchAllDocs clause together with the negative clause, like termA:(*:* NOT 0), but this is an error-prone approach since it involves intricate clause parsing: the MatchAllDocs clause should only be injected if there are no positive clauses inside the parentheses.
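For illustration, here is roughly what that workaround looks like as a post-processing step over the parsed query. I sketch it with Lucene.Net's API, whose shapes mirror the Java API; FixNegativeOnly is my own hypothetical helper, and it ignores boosts and minimum-should-match for brevity:

using Lucene.Net.Search;

static Query FixNegativeOnly(Query q)
{
    var bq = q as BooleanQuery;
    if (bq == null)
        return q;

    // Rebuild the boolean query, recursing into nested clauses.
    var rewritten = new BooleanQuery();
    bool hasPositive = false;
    foreach (BooleanClause clause in bq.Clauses)
    {
        rewritten.Add(FixNegativeOnly(clause.Query), clause.Occur);
        if (clause.Occur != Occur.MUST_NOT)
            hasPositive = true;
    }

    // A purely negative BooleanQuery matches nothing, so anchor it with
    // a MatchAllDocsQuery that the negative clauses can subtract from.
    if (!hasPositive)
        rewritten.Add(new MatchAllDocsQuery(), Occur.MUST);

    return rewritten;
}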
I'm looking for a more robust/generic approach to this problem, preferably some library that can parse/handle it for me, maybe something in contrib or similar.
I'd like to escape the queries that NHibernate generates when I use Contains().
I mean that, supposing I'm using SQL Server and call Contains("%"), NHibernate should generate a LIKE with the wildcard escaped as '[%]'.
I'd like to obtain this without using a different extension method (i.e., implementing and using my own MyOwnContain).
In this case the answer could be this:
Linq to NHibernate extensibility for custom string query operations?
Not possible with currently available versions of NHibernate, though the upcoming NH 4.1 will fix at least one of these relevant issues:
https://nhibernate.jira.com/browse/NH-3726 (escape for Like())
https://nhibernate.jira.com/browse/NH-3829 (escape for Contains(), etc.)
However, since the % wildcard can be disabled in your case by enclosing it in brackets, you should be able to do this escaping yourself before calling Contains(). Just beware, and be prepared to handle the situation when the above is eventually implemented in NH.
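A minimal sketch of that manual escaping, assuming SQL Server's bracket syntax (EscapeLikeValue, Product and Description are illustrative names, not NHibernate API):

using System.Linq;
using NHibernate.Linq;

static string EscapeLikeValue(string value)
{
    // Escape the bracket first so the brackets added below are not re-escaped.
    return value.Replace("[", "[[]")
                .Replace("%", "[%]")
                .Replace("_", "[_]");
}

// Usage: escape the user input, then hand it to Contains() as usual.
var escaped = EscapeLikeValue(userInput);
var results = session.Query<Product>()
                     .Where(p => p.Description.Contains(escaped))
                     .ToList();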
I'm looking for confirmation/clarification with these LINQ expressions:
var context = new SomeCustomDbContext();
// LINQ to Entities?
var items = context.CustomItems.OrderBy(i => i.Property).ToList();
// LINQ to Objects?
var items2 = context.CustomItems.ToList().OrderBy(i => i.Property);
Am I correct in thinking the first method is LINQ to Entities, where EF builds a more specific SQL statement to pass on, putting the ordering effort on the database?
Is the second method LINQ to Objects, where LINQ drags the whole collection into memory (the ToList() enumeration?) before ordering, thus leaving the burden on the server side (the web server in this case)?
If this is the case, I can quickly see situations where L2E would be advantageous (e.g. filtering/trimming collections before pulling them into memory).
But are there any other details/trade-offs I should be aware of, or times when "method 2" might be advantageous over the first method?
UPDATE:
Let's say we are not using Entity Framework: is this still true so long as the underlying repository/data source implements IQueryable<T>? And if it doesn't, do both these statements result in LINQ to Objects operations in memory?
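To make the distinction concrete, here is the dividing line as I understand it (CustomItem and Property as above; the only difference between the two queries is the static type of the source):

IQueryable<CustomItem> queryable = context.CustomItems;
// Binds to Queryable.OrderBy: the provider translates the ordering (e.g. to SQL).
var ordered1 = queryable.OrderBy(i => i.Property);

IEnumerable<CustomItem> enumerable = context.CustomItems;
// Binds to Enumerable.OrderBy: LINQ to Objects sorts in memory.
var ordered2 = enumerable.OrderBy(i => i.Property);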
Yes.
Yes.
Yes.
You are correct that calling ToList() forces linq-to-entities to evaluate and return the results as a list. As you suspect, this can have huge performance implications.
There are cases where linq-to-entities cannot figure out how to parse what looks like a perfectly simple query (like Where(x => SomeFunction(x))). In these cases you often have no choice but to call ToList() and operate on the collection in memory.
In response to your update:
ToList() always forces everything ahead of it to evaluate immediately, as opposed to deferred execution. Take this example:
someEnumerable.Take(10).ToList();
vs
someEnumerable.ToList().Take(10);
In the second example, any deferred work on someEnumerable must be executed before taking the first 10 elements. If someEnumerable is doing something labor intensive (like reading files from the disk using Directory.EnumerateFiles()), this could have very real performance implications.
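A sketch of that file example (the path is made up; requires System.IO and System.Linq):

// Directory.EnumerateFiles is lazy: it walks the tree only as it is enumerated.
var files = Directory.EnumerateFiles(@"C:\logs", "*.log",
                                     SearchOption.AllDirectories);

// Stops walking as soon as 10 paths have been found.
var firstTen = files.Take(10).ToList();

// Walks the entire tree into a list first, then takes 10.
var firstTenEager = files.ToList().Take(10).ToList();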
Am I correct in thinking the first method is LINQ to Entities, where EF builds a more specific SQL statement to pass on, putting the ordering effort on the database?
Yes
Is the second method LINQ to Objects where LINQ drags the whole collection into memory ... before ordering thus leaving the burden on the server side ...?
Yes
But are there any other details/trade-offs I should be aware of, or times when "method 2" might be advantageous over the first method?
There will be many times when Method 1 is not possible: usually when you have a complex filter or sort order that can't be directly translated to SQL (or, more precisely, where EF does not support a direct SQL translation). Also, since you can't transmit lazy-loaded IQueryables over the wire, any time you have to serialize a result you're going to have to materialize it first with ToList() or something similar.
The other thing to be aware of is that IQueryable makes no guarantees about either (a) the semantics of the underlying provider, or (b) how much of the set of IQueryable methods is implemented by the provider.
For example:
EF does not support Last().
Nor does it support translating time-part comparisons of DateTimes into valid T-SQL.
It doesn't support FirstOrDefault() in subqueries.
In such circumstances you need to bring data back to the client and then perform further evaluation client-side.
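A common shape for that fallback, sketched against the question's model (SomeFunction stands in for anything the provider can't translate, and the first Where is illustrative):

var results = context.CustomItems
    .Where(i => i.Property != null)   // translated and run on the database
    .AsEnumerable()                   // switch to LINQ to Objects from here on
    .Where(i => SomeFunction(i))      // evaluated client-side, in memory
    .ToList();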
You also need to have an understanding of "how" it parses the LINQ pipeline to generate (in the case of EF) T-SQL. So you sometimes have to think carefully about how you construct your LINQ queries in order to generate effective T-SQL.
Having said all that, IQueryable<> is an extremely powerful tool in the .NET framework and well worth getting more familiar with.
I am aware of a few efforts at constructing LINQ queries dynamically, such as this and this.
None seems ideal, as I would like to avoid putting expressions in a string and to omit a Where clause when it is not needed.
My main concern is that the query is optimized for the database, and dynamically omits unnecessary clauses whenever possible.
Are there any new developments in EF 4.0 for such scenarios?
UPDATE
Here is one link I found very helpful:
http://www.albahari.com/nutshell/predicatebuilder.aspx
Indeed, adding "And" filters dynamically is trivial, and adding "Or" filters can be done easily using PredicateBuilder:
var predicate = PredicateBuilder.False<Product>();
predicate = predicate.Or(p => p.Description.Contains(temp));
and according to LINQPad, the SQL gets emitted according to which filters were applied.
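For reference, the full pattern from that article looks roughly like this (Product, Description and the keyword search are the article's illustrative schema, not my real model):

IQueryable<Product> SearchProducts(IQueryable<Product> products,
                                   params string[] keywords)
{
    // Start from False and OR in one condition per keyword.
    var predicate = PredicateBuilder.False<Product>();
    foreach (string keyword in keywords)
    {
        string temp = keyword;   // capture a fresh variable per iteration
        predicate = predicate.Or(p => p.Description.Contains(temp));
    }
    return products.Where(predicate);
}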
For omitting the Where clause (pseudocode, hope I understood your question correctly):
IQueryable<Foo> query = context.Foos;   // any IQueryable<Foo> source
if (someCondition)
    query = query.Where(/* ... */);
var result = query.Select(/* ... */);
For dynamic queries, I haven't heard of anything new. IMHO we will have to stay with strings. Can you come up with some better approach?
Now that LINQ is such an integral part of .NET, are there optimizations at the compiler level that would use the optimal path to get results?
For example, imagine you had an array of integers and wanted to get the lowest value. You could do this without LINQ using a foreach, but it's certainly easier to use the Min function in LINQ. Once this goes to the compiler with LINQ would you have been better off to skip LINQ altogether or does it essentially just convert it to something similar to a foreach?
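To make the comparison concrete (illustrative data; requires System.Linq):

int[] values = { 5, 3, 8, 1 };

// With LINQ:
int min1 = values.Min();

// Without LINQ, roughly what LINQ to Objects does internally:
int min2 = values[0];   // assumes a non-empty array
foreach (int v in values)
    if (v < min2)
        min2 = v;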
The C# compiler doesn't do much at all - it just calls the methods you tell it to, basically.
You could argue that removing unnecessary Select calls is an optimization:
from x in collection
where x.Condition
select x
is compiled to collection.Where(x => x.Condition) instead of collection.Where(x => x.Condition).Select(x => x) as the compiler recognises the identity transformation as being redundant. (A degenerate query of the form from x in collection select x is immune to this optimization, however, to allow LINQ providers to make sure that any query goes through at least one of their methods.)
The LINQ to Objects Min method just does a foreach, yes. Various LINQ to Objects methods do perform optimization. For example, Count() will check whether the data source implements ICollection or ICollection<T> and use the Count property if so. As madgnome points out in a comment, I wrote a bit more about this in a blog post a while ago.
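For illustration, the shape of that Count() optimization is roughly this (a simplified sketch, not the actual framework source):

using System;
using System.Collections;
using System.Collections.Generic;

static int CountSketch<TSource>(IEnumerable<TSource> source)
{
    if (source == null) throw new ArgumentNullException(nameof(source));

    // Fast path: collections already know their size.
    if (source is ICollection<TSource> genericCollection)
        return genericCollection.Count;
    if (source is ICollection nonGenericCollection)
        return nonGenericCollection.Count;

    // Slow path: enumerate and count.
    int count = 0;
    using (IEnumerator<TSource> e = source.GetEnumerator())
        while (e.MoveNext())
            count++;
    return count;
}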
Of course, other LINQ providers can perform their own optimizations.