Should I use .ToList().Distinct() or .Distinct().ToList()? - linq

When it comes to performance, should I use .ToList().Distinct() or .Distinct().ToList()?
Do both generate the same SQL query?
It seems that the second approach should perform better, but is that true?
Are there any advantages or disadvantages of using one over the other?

Short answer: .Distinct().ToList()
Explanation:
ToList() converts an IEnumerable<T> to a List<T> and triggers immediate execution. So you should let the database server remove the duplicates first, instead of fetching all the data and then calling Distinct() on the client side.

It depends. If the query is executed against a List<T> or a Dictionary<K,V>, then the latter (.Distinct().ToList()) is preferable.
The reason is that with .ToList().Distinct(), Distinct() returns an IEnumerable that has to be enumerated again to get a real collection. In essence, you create two collections, but you never use the first one.
There is, however, a situation where .ToList().Distinct() can be preferable: you are working with an object-relational mapper (for example Entity Framework) and you want to fetch all rows from a database table (maybe to populate a cache in the background, or to use less CPU on the database) and then do the .Distinct() operation locally.
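To make the "two collections" point concrete, here is a minimal in-memory sketch. The class and method names are invented for illustration and the source is a plain sequence of integers, not a database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctOrderDemo
{
    // Materializes everything first, then de-duplicates: two lists are built.
    public static (int IntermediateCount, List<int> Result) ToListThenDistinct(IEnumerable<int> source)
    {
        List<int> all = source.ToList();            // full copy, duplicates included
        List<int> unique = all.Distinct().ToList(); // second list holding the unique values
        return (all.Count, unique);
    }

    // De-duplicates while streaming the source: only the final list is built.
    public static List<int> DistinctThenToList(IEnumerable<int> source)
    {
        return source.Distinct().ToList();
    }
}
```

Both methods return the same values; the first one just allocates an intermediate list that is thrown away.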

Your mention of SQL suggests that your data source is a DbContext of some kind.
In that situation, by definition, once you have called .ToList() all available data has been converted to objects in .NET memory. A .Distinct() after that can only run in .NET memory - it will run as if there is no database.
The SQL query for the above is definitely not the same as for .Distinct().ToList(), which will let the database do the DISTINCT operation.
To achieve the best performance, the best thing to do is .Distinct().ToList().
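The difference can be observed without a database. The sketch below is an assumption-laden illustration: it uses an in-memory IQueryable (created with AsQueryable()) as a stand-in for a DbSet, and the class and method names are hypothetical. It checks whether Distinct is still part of the expression tree that a provider would translate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class WhereDistinctRuns
{
    // Distinct() called on IQueryable<T> is only recorded in the expression tree,
    // so a database provider gets the chance to translate it (for example to SELECT DISTINCT).
    public static bool DistinctIsInExpression(IQueryable<string> table)
    {
        IQueryable<string> query = table.Distinct(); // nothing has been executed yet
        return query.Expression is MethodCallExpression call
            && call.Method.Name == nameof(Queryable.Distinct);
    }

    // After ToList() the static type is List<T>, so Distinct() binds to
    // Enumerable.Distinct and runs in memory; the provider never sees it.
    public static bool DistinctAfterToListIsQueryable(IQueryable<string> table)
    {
        IEnumerable<string> local = table.ToList().Distinct();
        return local is IQueryable<string>;
    }
}
```

With a real ORM the first form lets the provider decide how to run the DISTINCT, while the second always fetches every row first.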

Related

Is better Linq or SQL query for complex calculations and aggregations?

We must create and show at runtime (ASP.NET MVC) some complex reports from Oracle tables with millions of records. The report data must be obtained from groupings and somewhat complex calculations.
So is it better, for performance and maintainability of the code, to do these groupings and calculations via SQL queries (PL/SQL) or via LINQ?
Thanks for your kind reply.
"So is it better, for performance and maintainability of the code, to do these groupings and calculations via SQL queries (PL/SQL) or via LINQ?"
It depends on what you mean by "via LINQ". If you mean that you fetch the complete table into local memory and then use LINQ statements to extract the result that you want, then of course SQL statements are faster.
However, if you mean that you use Entity Framework, or something similar, then the answer is not as easy to give.
If you use Entity Framework (or some clone), your tables will be represented by IQueryable<...> instead of IEnumerable<...>. An IQueryable has an Expression and a Provider. The Expression represents the query that must be performed. The Provider knows which system must execute the query (usually a Database Management System) and how to communicate with this system. When the query must be executed, it is the task of the Provider to translate the Expression into the language that the system knows (usually something SQL-like) and to execute the SQL-query.
There are two kinds of IQueryable LINQ statements: those that return an IQueryable<...> of something, and those that return a TResult. The ones that return IQueryable only change the Expression. They are functions that use deferred execution.
Functions that do not return an IQueryable are ToList(), FirstOrDefault(), Any(), Max(), etc. Internally they call GetEnumerator() (usually via a foreach), which orders the Provider to translate the Expression and execute the query.
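Deferred execution can be seen with plain LINQ to Objects. The following sketch is not Entity Framework code: Rows() is a hypothetical stand-in for a table that counts how often it is really enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    public static int EnumerationCount;

    // Stand-in for a table: the counter increases only when enumeration actually starts.
    public static IEnumerable<int> Rows()
    {
        EnumerationCount++;
        yield return 1;
        yield return 2;
        yield return 3;
    }

    public static (int AfterCompose, int AfterToList, int AfterMax) Run()
    {
        EnumerationCount = 0;

        // Where and Select only describe the query; the source is not touched yet.
        IEnumerable<int> query = Rows().Where(x => x > 1).Select(x => x * 10);
        int afterCompose = EnumerationCount;

        List<int> list = query.ToList(); // first real enumeration
        int afterToList = EnumerationCount;

        int max = query.Max();           // enumerates the source a second time
        int afterMax = EnumerationCount;

        return (afterCompose, afterToList, afterMax);
    }
}
```

Run() returns (0, 1, 2): composing the query costs nothing, and every non-deferred operator triggers its own enumeration. Against a database, each of those enumerations would be a separate round trip.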
Back to your question
So which one is more efficient, entity framework or SQL? Efficiency is not only the time to perform the queries, it is also the development/testing time, for the first version and for future changes in the software.
If you use Entity Framework (or a clone), the SQL queries created from the Expressions are pretty efficient, depending on the framework manufacturer. If you look at the generated SQL, the query is sometimes not the optimal one, although you'll have to be a pretty good SQL programmer to improve most queries.
The big advantage of using Entity Framework and LINQ queries over SQL statements is that development time will be shorter. The syntax of LINQ statements is checked at compile time, while SQL statements are only checked at run time. Development and test periods will be shorter.
It is easy to reuse LINQ statements, while SQL statements almost always have to be written especially for the query you want to execute. LINQ statements can be tested without a database on any sequence of items that represent your tables.
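As a sketch of the reuse and testing argument: the Order class and the two extension methods below are invented for this example. They only extend the Expression, so the same code can run against an ORM-backed IQueryable or against an in-memory list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity, invented for this example.
public sealed class Order
{
    public int CustomerId { get; set; }
    public decimal Total { get; set; }
}

public static class OrderQueries
{
    // Reusable query fragment: composes with further Where/Select calls
    // and is executed by whatever provider backs the source.
    public static IQueryable<Order> LargeOrders(this IQueryable<Order> orders, decimal minimum)
    {
        return orders.Where(o => o.Total >= minimum);
    }

    // Second fragment, chained onto the first one by the caller.
    public static IQueryable<int> CustomerIds(this IQueryable<Order> orders)
    {
        return orders.Select(o => o.CustomerId).Distinct();
    }
}
```

In a unit test the "table" can be a List<Order> wrapped with AsQueryable(), for example orders.AsQueryable().LargeOrders(100m).CustomerIds().ToList(). Keep in mind that an in-memory provider and a database provider can differ in details such as string comparison, so such tests check the query logic, not the generated SQL.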
My Advice
For most queries you won't notice any difference in execution time between the Entity Framework query and the SQL query.
If you expect complicated queries and future changes, I'd go for Entity Framework. The main arguments are the shorter development time, the better testing possibilities, and the better maintainability.
If you detect some queries where you notice that the execution time is too long, you can always decide to bypass entity framework by executing a SQL query instead of using LINQ.
If you've wrapped your DbContext in a proper repository, where you hide the use cases from their implementations, the users of your repository won't notice the difference.

How performance can change using greedy LINQ operators?

Is it wise, performance-wise, to use greedy LINQ operators such as ToList, ToLookup, Distinct, etc.?
What would be a best practice(s) for LINQ query execution?
Do you usually use List<> for your objects, or do you expose all your collections as IEnumerable<>? I know the latter gives more flexibility.
When working in memory (LINQ to Objects) it's OK to always use deferred loading, because you can access the data whenever you need it without fear that it has been changed, added to, or removed: the query executes as soon as you access the reference. But this changes with database LINQ queries such as LINQ to EF.
I would like StackOverflow users' opinions.
Thank you!
What would be a best practice(s) for LINQ query execution?
A List may be accessed by index, a Lookup may be accessed by Key. These types are obviously serializable across a WCF boundary. A deferred IEnumerable doesn't do these things well.
For EF or LinqToSql, one must run their queries before the DataContext or whatever holds the SqlConnection gets disposed.
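The lifetime problem can be sketched without a real database. FakeContext below is a hypothetical stand-in for a DataContext or DbContext: it refuses to deliver rows once it has been disposed, which is roughly how a closed connection behaves.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a DataContext/DbContext.
public sealed class FakeContext : IDisposable
{
    private bool _disposed;
    private readonly int[] _rows = { 1, 2, 3 };

    public IEnumerable<int> Rows
    {
        get
        {
            // Runs when enumeration starts, not when the property is read.
            if (_disposed) throw new ObjectDisposedException(nameof(FakeContext));
            foreach (int row in _rows)
            {
                yield return row;
            }
        }
    }

    public void Dispose() => _disposed = true;
}

public static class ContextLifetimeDemo
{
    // Returns a deferred query: it will be enumerated after the context is gone.
    public static IEnumerable<int> Deferred()
    {
        using (var db = new FakeContext())
        {
            return db.Rows.Where(r => r > 1);
        }
    }

    // Runs the query while the context is still alive.
    public static List<int> Eager()
    {
        using (var db = new FakeContext())
        {
            return db.Rows.Where(r => r > 1).ToList();
        }
    }
}
```

Enumerating the result of Deferred() throws ObjectDisposedException, while Eager() returns a list that stays usable after the context has been disposed.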
In my code, I use deferred IEnumerables only for method scoped variables when convenient. I use List for properties (sometimes the property constructs the List, but usually it's just backed by an instance) and method return types. Since I'm doing comparatively expensive things (like accessing the database or using WCF), the performance of eagerly executing in-memory Linq queries has never been an issue.
The final authority on any performance question is: how does it measure?

What are the best practices for determining when (or not) to pre-compile a Linq Query? [duplicate]

I have a table:
-- Tag
ID | Name
---|-----------------
1  | c#
2  | linq
3  | entity-framework
I have a class that will have the following methods:
IEnumerable<Tag> GetAll();
IEnumerable<Tag> GetByName();
Should I use a compiled query in this case?
static readonly Func<Entities, IEnumerable<Tag>> AllTags =
    CompiledQuery.Compile<Entities, IEnumerable<Tag>>
    (
        e => e.Tags
    );
Then my GetByName method would be:
IEnumerable<Tag> GetByName(string name)
{
    using (var db = new Entities())
    {
        return AllTags(db).Where(t => t.Name.Contains(name)).ToList();
    }
}
This generates SELECT ID, Name FROM Tag and executes the Where in code. Or should I avoid CompiledQuery in this case?
Basically, I want to know when I should use compiled queries. Also, on a website, are they compiled only once for the entire application?
You should use a CompiledQuery when all of the following are true:
The query will be executed more than once, varying only by parameter values.
The query is complex enough that the cost of expression evaluation and view generation is "significant" (trial and error)
You are not using a LINQ feature like IEnumerable<T>.Contains() which won't work with CompiledQuery.
You have already simplified the query, which gives a bigger performance benefit, when possible.
You do not intend to further compose the query results (e.g., restrict or project), which has the effect of "decompiling" it.
CompiledQuery does its work the first time a query is executed. It gives no benefit for the first execution. Like any performance tuning, generally avoid it until you're sure you're fixing an actual performance hotspot.
2012 Update: EF 5 will do this automatically (see "Entity Framework 5: Controlling automatic query compilation"). So add "You're not using EF 5" to the above list.
Compiled queries save you time, which would be spent generating expression trees. If the query is used often and you'll save the compiled query, you should definitely use it. I had many cases when the query parsing took more time than the actual round trip to the database.
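The "pay once, reuse many times" idea can be illustrated with a plain expression tree. Note that this is only an analogy: the sketch uses Expression<T>.Compile() from System.Linq.Expressions, not Entity Framework's CompiledQuery, and the class and member names are invented.

```csharp
using System;
using System.Linq.Expressions;

public static class CompileOnceDemo
{
    // The expression tree describing the check (built once).
    private static readonly Expression<Func<string, string, bool>> NameContains =
        (tagName, fragment) => tagName.Contains(fragment);

    // The expensive step happens once, when the type is initialized;
    // every later call reuses the resulting delegate with new parameter values.
    private static readonly Func<string, string, bool> NameContainsCompiled =
        NameContains.Compile();

    public static bool Matches(string tagName, string fragment)
    {
        return NameContainsCompiled(tagName, fragment);
    }
}
```

CompiledQuery follows the same shape: a static field holds the prepared query, and callers vary only the parameters.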
In your case, if you are sure that it generates SELECT ID, Name FROM Tag without the WHERE clause (which I doubt, as your AllTags function should return IQueryable and the actual query should be made only after calling ToList), you shouldn't use it.
As someone already mentioned, on bigger tables SELECT * FROM [someBigTable] would take very long and you'll spend even more time filtering that on the client side. So you should make sure that your filtering is made on the database side, no matter if you are using compiled queries or not.
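Whether a Where ends up on the database side depends on the static type it is called on. The sketch below uses an in-memory IQueryable as a stand-in for an entity set (an assumption, as are the names) and shows that the same object filtered through an IEnumerable<T> reference no longer yields an IQueryable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WhereBindingDemo
{
    // Typed as IEnumerable<T>: the call binds to Enumerable.Where,
    // which filters in memory while enumerating the source.
    public static bool FilteredViaEnumerableIsQueryable(IQueryable<string> tags)
    {
        IEnumerable<string> asEnumerable = tags;
        IEnumerable<string> filtered = asEnumerable.Where(t => t.Contains("q"));
        return filtered is IQueryable<string>;
    }

    // Typed as IQueryable<T>: the call binds to Queryable.Where,
    // which only extends the expression tree for the provider.
    public static bool FilteredViaQueryableIsQueryable(IQueryable<string> tags)
    {
        IQueryable<string> filtered = tags.Where(t => t.Contains("q"));
        return filtered is IQueryable<string>;
    }
}
```

This suggests that a query delegate declared to return IEnumerable<Tag> will have any later Where applied on the client; declaring IQueryable<Tag> keeps the filter composable.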
Compiled queries are most helpful for LINQ queries with large expression trees, say complex queries, where you gain performance by not building the expression tree again and again when reusing the query. In your case I guess it will save very little time.
Compiled queries are translated once, the first time they are executed, and the result is reused afterwards. If you reuse a query often, or it is complex, you should definitely try compiled queries to make execution faster.
But I would not go for it on all queries as it is a little more code to write and for simple queries it might not be worthwhile.
But for maximum performance you should also evaluate stored procedures, where you do all the processing on the database server. Even though LINQ tries to push as much of the work to the database as possible, you will have situations where a stored procedure is faster.
Compiled queries offer a performance improvement, but it's not huge. If you have complex queries, I'd rather go with a stored procedure or a view, if possible; letting the database do its thing might be a better approach.

NHibernate Criteria query on in-memory collection of entities

I would like to apply a Criteria query to an in-memory collection of entities, instead of to the database. Is this possible?
Can the Criteria API work like LINQ? Or, alternatively, can a Criteria query be converted to a LINQ query?
Thanks!
I don't believe you can use Criteria to query against an in-memory collection and, come to think of it, it doesn't seem to make much sense. If I'm understanding everything correctly, you've already queried your database. I'd suggest either tuning your original query (whichever method you choose) to include all of your filters, or using LINQ (as you suggested) to refine your results.
Also, what's your reasoning for wanting to query from memory?
It sounds like you're rolling your own caching mechanism. I would highly recommend checking out NHibernate's 2nd level cache. It handles many complex scenarios gracefully such as invalidating query results on updates to the underlying tables.
http://ayende.com/Blog/archive/2009/04/24/nhibernate-2nd-level-cache.aspx

Repository taking linq expression for filtering

I am considering refactoring a repository I have to improve its flexibility and reduce the method count.
Where there are the following methods:
Collection GetAllUsersByRole(Role role)
User GetUserByuserName(string userName)
...I would like to have a single method taking a Linq expression:
ICollection GetUsers(Expression e)
{
    //retrieve user collection from store
    //apply expression to collection and return
}
Is this a reasonable approach? Presumably I'd lose some efficiency because the full users collection would need to be retrieved and filtered every time, rather than retrieving a subset of users according to some hard-coded criteria?
Edit: NHibernate provides ORM in my implementation.
You really want to take an Expression as the argument to that method.
As far as performance, it really comes down to how far you want to go with it. The simplest method is bringing all the objects into memory and then filtering with the predicate expression.
On the other hand, you mention some sort of criteria. I have no idea what your back-end data system is, but you can take these passed filters and transform them into your criteria. This is essentially what LINQ to SQL and LINQ to Entities do, but hopefully the range of possibilities you need to support is significantly smaller. If not, it might make sense to switch to one of the ORM tools if you want to take this approach.
This is not a very reasonable approach, because it will typically cost you a lot of performance. If you use a data access technology that accepts LINQ queries, then you can just pass this query (expression) to it. This can be IQueryable for LINQ to SQL or ObjectQuery for Entity Framework. It can also be ICriteria (without LINQ support) for NHibernate. All modern ORM tools have their own expression API, so you just need to use it. If you have a custom data access layer, you will need to write your own API for creating criteria, for example a Query Object.
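A minimal sketch of a repository method that takes an expression and hands it to an IQueryable source. Everything here is an assumption made for illustration: the User class, the string Role property, and the constructor taking an IQueryable<User> (with an ORM this would be the provider-backed query, in a test it can be a list wrapped with AsQueryable()).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical entity, invented for this example.
public sealed class User
{
    public string UserName { get; set; } = "";
    public string Role { get; set; } = "";
}

public sealed class UserRepository
{
    private readonly IQueryable<User> _users;

    public UserRepository(IQueryable<User> users)
    {
        _users = users;
    }

    // The predicate is passed on as an expression tree, so a LINQ provider
    // can translate it instead of loading every user and filtering in memory.
    public IList<User> GetUsers(Expression<Func<User, bool>> predicate)
    {
        return _users.Where(predicate).ToList();
    }
}
```

Callers write repository.GetUsers(u => u.Role == "admin"). Taking Expression<Func<User, bool>> rather than Func<User, bool> is what keeps the filter translatable; with a plain delegate the filtering would always happen after the full collection has been retrieved.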
