I've seen a lot of people talking about IQueryable and I haven't quite picked up on what all the buzz is about. I always work with generic Lists and find they are very rich in the way you can "query" them and work with them, even run LINQ queries against them.
I'm wondering if there is a good reason to start considering a different default collection in my projects.
The IQueryable interface allows you to define parts of a query against a remote LINQ provider (typically against a database, but doesn't have to be) in multiple steps, and with deferred execution.
E.g. your database layer could define some restriction (e.g. based on permissions, security - whatever) by adding a .Where(x => x.......) clause to your query. But this doesn't get executed just yet - e.g. you're not retrieving 150'000 rows that match that criteria.
Instead, you pass up the IQueryable interface to the next level, the business layer, where you might be adding additional requirements and where clauses to your query - again, nothing gets executed just yet, you're also not tossing out 80'000 of your 150'000 rows you retrieved - you're just defining additional query criteria.
And the UI layer might do the same thing, e.g. based on user input in a form or something.
The magic is that you're passing the IQueryable interface through all the layers, adding additional criteria to it - but it doesn't get executed / evaluated until you actually force it. This also means you're not needlessly selecting and retrieving tons of data which you end up discarding afterwards.
You can't really do that with a classic static list - you have to pick the data, possibly discarding a lot of it again later on in the process - you have a static list, after all.
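As a minimal sketch of the layering described above: the Order type, the three method names, and the filter criteria are all made up for illustration, and an in-memory list wrapped with AsQueryable() stands in for a real provider such as a database context. Each layer only adds a Where clause; nothing is evaluated until ToList() is called.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity, used only for this illustration.
public record Order(int Id, string Owner, decimal Total, bool Archived);

public static class LayeredQuery
{
    // Data layer: restrict to rows the caller may see. Nothing executes here.
    public static IQueryable<Order> VisibleTo(IQueryable<Order> orders, string user) =>
        orders.Where(o => o.Owner == user);

    // Business layer: add another criterion to the same, still unexecuted, query.
    public static IQueryable<Order> Active(IQueryable<Order> orders) =>
        orders.Where(o => !o.Archived);

    // UI layer: add a user-supplied filter and only then force execution.
    public static List<Order> Search(IQueryable<Order> orders, string user, decimal minTotal) =>
        Active(VisibleTo(orders, user))
            .Where(o => o.Total >= minTotal)
            .ToList(); // the combined query is evaluated here, once
}
```

With a database-backed provider, the three Where clauses would be translated into a single query, so only the matching rows travel to the application.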
IQueryable allows you to make queries using LINQ, just like LINQ to Objects queries, except that the queries are actually "compiled" and run elsewhere.
The most common implementations work against databases. If you use List<T> and LINQ to Objects, you load the entire "table" of data into memory, then run your query against it.
By using IQueryable<T>, the LINQ provider can "translate" your LINQ statement into actual SQL code, and run it on the database. The results can be returned to you and enumerated.
This is much, much more efficient, especially if you're working in N-Tiered systems.
LINQ queries against IEnumerable<T> produce delegates (methods) which, when invoked, perform the described query.
LINQ queries against IQueryable<T> produce expression trees, a data structure which represents the code that produced the query. LINQ providers such as LINQ to SQL interpret these data structures, generating the same query on the target platform (T-SQL in this case).
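The difference between the two can be seen with a single lambda assigned to two different types. This is only an illustrative sketch: the class and member names are invented, and Describe() reads just the top-level node, whereas a real provider walks the whole tree.

```csharp
using System;
using System.Linq.Expressions;

public static class DelegateVsExpression
{
    // Compiled to executable code: it can be invoked, but not inspected.
    public static readonly Func<int, bool> AsDelegate = x => x > 5;

    // Compiled to a data structure that describes the lambda.
    public static readonly Expression<Func<int, bool>> AsTree = x => x > 5;

    // Reads the parts of the comparison out of the tree: the left operand,
    // the kind of node, and the right operand.
    public static string Describe()
    {
        var body = (BinaryExpression)AsTree.Body;
        return $"{body.Left} {body.NodeType} {body.Right}";
    }
}
```

The tree can still be turned into a delegate with AsTree.Compile(), which is what LINQ to Objects effectively works with; a provider such as LINQ to SQL instead translates the nodes into its target language.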
For an example of how the compiler interprets the query syntax against IQueryable<T>, see my answer to this question:
Building Dynamic LINQ Queries based on Combobox Value
We must create and show at runtime (asp.net mvc) some complex reports from Oracle table data with millions of records. The report data must be obtained from groupings and somewhat complex calculations.
So is it better for performance and maintainability of code that do these groupings and calculations via sql query (pl/sql) or via linq?
Thanks for your kind reply
So is it better for performance and maintainability of code that do
these groupings and calculations via sql query (pl/sql) or via linq?
It depends on what you mean by via linq. If you mean that you fetch the complete table to local memory and then use linq statements to extract the result that you want, then of course SQL statements are faster.
However, if you mean that you use Entity Framework, or something similar, then the answer is not as easy to give.
If you use Entity Framework (or some clone), your tables will be represented by IQueryable<...> instead of IEnumerable<...>. An IQueryable has an Expression and a Provider. The Expression represents the query that must be performed. The Provider knows which system must execute the query (usually a Database Management System) and how to communicate with this system. When the query must be executed, it is the task of the Provider to translate the Expression into the language that the system knows (usually something SQL-like) and to execute the SQL-query.
There are two kinds of IQueryable LINQ statements: those that return an IQueryable<...> of something, and those that return a TResult. The ones that return IQueryable only change the Expression. They are functions that use deferred execution.
Functions that do not return an IQueryable are ToList(), FirstOrDefault(), Any(), Max(), etc. Internally they either call GetEnumerator() (usually via a foreach) or ask the Provider to execute the Expression directly; in both cases the Provider is ordered to translate the Expression and execute the query.
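A small sketch of the two kinds of statements, using an in-memory list with AsQueryable() as a stand-in for a real provider (the class and method names are invented for the example). Where only extends the Expression; Count returns a value, so it executes the query each time it is called.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    public static (int Before, int After) Run()
    {
        var source = new List<int> { 1, 2, 3 };

        // Returns IQueryable<int>: only the Expression changes, nothing runs.
        IQueryable<int> big = source.AsQueryable().Where(n => n > 1);

        int before = big.Count();   // returns a value, so the query executes now

        source.Add(4);              // change the data after the query was defined

        int after = big.Count();    // executes again and sees the new element
        return (before, after);
    }
}
```

Because the query is re-evaluated on every executing call, the second count differs from the first even though the query object itself was never changed.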
Back to your question
So which one is more efficient, entity framework or SQL? Efficiency is not only the time to perform the queries, it is also the development/testing time, for the first version and for future changes in the software.
If you use an entity-framework (-clone), the SQL-queries created from the Expressions are pretty efficient, depending on the framework manufacturer. If you look at the code, then sometimes the SQL query is not the optimal one, although you'll have to be a pretty good SQL-programmer to improve most queries.
The big advantage of using Entity Framework and LINQ queries over SQL statements is that development times will be shorter. The syntax of the LINQ statements is checked at compile time, SQL statements at run-time. Development and test periods will be shorter.
It is easy to reuse LINQ statements, while SQL statements almost always have to be written especially for the query you want to execute. LINQ statements can be tested without a database on any sequence of items that represent your tables.
My Advice
For most queries you won't notice any difference in execution time between the entity framework query or the SQL query.
If you expect complicated queries and future changes, I'd go for entity framework. The main arguments are the shorter development time, the better testing possibilities, and the better maintainability.
If you detect some queries where you notice that the execution time is too long, you can always decide to bypass entity framework by executing a SQL query instead of using LINQ.
If you've wrapped your DbContext in a proper repository, where you hide the use cases from their implementations, the users of your repository won't notice the difference.
To minimize the data transferred from SQL server to app server (IIS), I wonder whether people do sorting, filtering, and paging using pure LINQ to Entities.
By "pure LINQ" I mean no hand-written SQL statements or views/stored procedures.
I found some articles helpful, for example:
Entity Framework: How to Increase Performance with Paging
Sorting, Filtering, and Paging with the Entity Framework in an ASP.NET MVC Application
But they didn't cover all features I need, so I'm writing my own helper classes for the following requirements:
multiple columns sorting
multiple columns filtering (support various joint operators such as AND/OR, and comparing operators such as equals, contains, larger than, smaller than, etc.)
paging
As mentioned, I'd like them to be done with "pure" LINQ to Entities and executed at database side. Is that a good idea or do you suggest using stored procedures?
Any suggestion or code sample would be much appreciated.
You can use Entity Framework without stored procedures and, with a little attention, you can be compatible with the most important databases.
For multiple column sorting you can use
OrderBy(x => x.a).ThenBy(x => x.b)...
For joins you can navigate the tree model (in query, otherwise you activate a lazy load).
Paging is done with
Skip/Take
You can use a boolean expression to test multiple conditions in the same Where statement
context.Persons.Where(p => p.Name == "me" && p.Age == 15)
and you can also chain several Where statements.
tbl.Where(...).Where(...)
The two conditions are combined with AND in the generated query.
I do not use stored procedures because I want compatibility with different DBMSs, but several people do.
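Putting these pieces together, a paged query might look like the sketch below. The Person type, the Page method, and its parameters are assumptions made for the example, and AsQueryable() over a list would stand in for a real context when trying it out. The method returns an IQueryable, so with a database provider the filtering, sorting, and paging would all be part of the translated query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity, used only for this illustration.
public record Person(string Name, int Age, string City);

public static class PersonQueries
{
    // page is 1-based; pageSize is the number of rows per page.
    public static IQueryable<Person> Page(
        IQueryable<Person> people, string city, int minAge, int page, int pageSize) =>
        people
            .Where(p => p.City == city)     // chained Where calls are combined with AND
            .Where(p => p.Age >= minAge)
            .OrderBy(p => p.Age)            // first sort column
            .ThenBy(p => p.Name)            // second sort column
            .Skip((page - 1) * pageSize)    // skip the rows of the earlier pages
            .Take(pageSize);                // keep one page
}
```

An OR condition would go inside a single Where, for example p => p.City == city || p.Age >= minAge. Sorting before Skip/Take matters: without a defined order the contents of a page are not stable.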
Is it wise, performance-wise, to use greedy LINQ operators such as ToList, ToLookup, Distinct etc.?
What would be a best practice(s) for LINQ query execution?
People often use List<> for their objects, or expose all their object lists as IEnumerable<>. I know the latter gives more flexibility.
When working in memory (LINQ to Objects) it's OK to always use deferred loading, because you can access the data whenever you need it without fearing that it has been changed, added to, or removed: the query executes as soon as you access it. But this changes with database LINQ queries such as LINQ to EF.
I would like the opinion of StackOverflow users.
Thank you!
What would be a best practice(s) for LINQ query execution?
A List may be accessed by index, a Lookup may be accessed by Key. These types are obviously serializable across a WCF boundary. A deferred IEnumerable doesn't do these things well.
For EF or LinqToSql, one must run their queries before the DataContext or whatever holds the SqlConnection gets disposed.
In my code, I use deferred IEnumerables only for method scoped variables when convenient. I use List for properties (sometimes the property constructs the List, but usually it's just backed by an instance) and method return types. Since I'm doing comparatively expensive things (like accessing the database or using WCF), the performance of eagerly executing in-memory Linq queries has never been an issue.
The final authority on any performance question is: how does it measure?
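The point about running queries before the context is disposed can be shown without a database. In this sketch FakeContext is a made-up stand-in for a DataContext or DbContext that refuses to yield rows once disposed; it is not how any real ORM is implemented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a context that owns a connection.
public sealed class FakeContext : IDisposable
{
    private readonly List<string> _rows = new List<string> { "a", "b" };
    private bool _disposed;

    // Lazily yields rows; fails if enumerated after Dispose.
    public IEnumerable<string> Rows
    {
        get
        {
            foreach (var row in _rows)
            {
                if (_disposed) throw new ObjectDisposedException(nameof(FakeContext));
                yield return row;
            }
        }
    }

    public void Dispose() => _disposed = true;
}

public static class ContextLoader
{
    // Eager: the query runs inside the using scope, so the result is safe to return.
    public static List<string> LoadEager()
    {
        using var ctx = new FakeContext();
        return ctx.Rows.Where(r => r.Length > 0).ToList();
    }

    // Deferred: the caller enumerates after ctx is disposed, which throws.
    public static IEnumerable<string> LoadDeferred()
    {
        using var ctx = new FakeContext();
        return ctx.Rows.Where(r => r.Length > 0);
    }
}
```

This is one reason to prefer List for return types and properties, as described above, and to keep deferred sequences local to the method that created them.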
I'm writing a (yet another) generic repository of entities, which is not necessarily backed by a relational database. I'd like one of the IEnumerable<T> Load<T>(...) methods to take as argument a generic Expression<Func<T, bool>> predicate that specifies user-defined criteria for the entities to retrieve. Please note that I don't want to expose a full IQueryable<T> to the user as I want to limit the exposure of the underlying storage to my users.
In those cases in which the repository is backed by NHibernate (3.1, by the way), simple predicates - such as x => x.Name == "Mike" - can be "pushed down" to the relational database by LINQ for NHibernate (using the Where(Expression<Func<T, bool>>) method), with obvious performance gains when the underlying set of entities is large and the predicate only selects one entity. Nice.
However, my users do not necessarily know that the repository is backed by a relational database, so the predicates can be at times so complex (e.g. x => MyFunction(x.Name) == 0) that LINQ to NHibernate fails to generate HQL for them. In these cases I'd like to detect LINQ's failure to generate HQL and transparently "failover" to loading all entities and explicitly applying the predicate to each.
The problem is that I cannot find a way to reliably detect that LINQ to NHibernate fails to translate the predicate expression. Executing the query straight away throws a System.NotSupportedException which could be caused by anything, even by an underlying ConnectionProvider.
I entertained for a moment the possibility of breaking up the query execution in two - first translate, then execute - and then catch the System.NotSupportedException during translation. To this end, I tried the solution proposed in Does anyone know how to translate LINQ Expression to NHibernate HQL statement? in order to translate the query before execution, and I have to say that I got it to work, but it uses Reflection to access undocumented, non-public methods of internal NHibernate objects, and thus it smells like an unsupported hack.
Is there a more reliable and "official" way to either detect that LINQ to NHibernate fails to translate an expression, or to translate the expression without executing the query?
I don't think there is, but since NHibernate is an open source project you can easily do it like this:
Grab NHibernate's source code.
Replace all NotSupportedException in the LINQ provider with a more specific exception (which would inherit NotSupportedException to avoid unnecessary breaking)
Compile your modified NHibernate and use it in your code.
Handle the new exception wherever/however you need it in your code.
Submit your modifications as a patch to the NHibernate JIRA (don't forget the tests, otherwise it probably won't be considered)
Profit!
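Once the provider throws a more specific exception, the failover described in the question could be sketched roughly as follows. QueryTranslationException is a hypothetical name for the new exception type; it does not exist in NHibernate, and the Load signature here is simplified for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical: the specific exception a patched LINQ provider would throw.
// It derives from NotSupportedException so existing handlers keep working.
public class QueryTranslationException : NotSupportedException
{
    public QueryTranslationException(string message) : base(message) { }
}

public static class FallbackLoader
{
    public static IEnumerable<T> Load<T>(
        IQueryable<T> source, Expression<Func<T, bool>> predicate)
    {
        try
        {
            // Let the provider translate the predicate and run it remotely.
            return source.Where(predicate).ToList();
        }
        catch (QueryTranslationException)
        {
            // Only translation failures end up here: load all entities
            // and apply the compiled predicate to each one in memory.
            return source.ToList().Where(predicate.Compile()).ToList();
        }
    }
}
```

Catching the derived type rather than NotSupportedException keeps unrelated failures, such as one raised by a connection provider, from silently triggering a full load.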
I am considering refactoring a repository I have to improve its flexibility and reduce the method count.
Where there are the following methods:
Collection GetAllUsersByRole(Role role)
User GetUserByuserName(string userName)
...I would like to have a single method taking a Linq expression:
ICollection GetUsers(Expression e)
{
//retrieve user collection from store
//apply expression to collection and return
}
Is this a reasonable approach? Presumably I'd lose some efficiency because the full users collection would need to be retrieved and filtered every time, rather than retrieving a subset of users according to some hard-coded criteria?
Edit: NHibernate provides ORM in my implementation.
You really want to take a typed predicate, Expression<Func<User, bool>>, as the argument to that method.
As far as performance, it really comes down to how far you want to go with it. The simplest method is bringing all the objects into memory and then filtering with the predicate expression.
On the other hand, you mention some sort of criteria. I have no idea what your back end data system is, but you can take these passed filters and transform them into your criteria. This is essentially what Linq to SQL and Linq to Entities does, but hopefully the range of possibilities you need to support is significantly smaller. If not, it might make sense to switch to one of the ORM tools if you want to take this approach.
This is not a very reasonable approach, because it will typically cause a lot of performance issues. If you use a data access technology that accepts LINQ queries, then you can simply pass this query (expression) to it. This can be IQueryable for LINQ to SQL or ObjectQuery for Entity Framework. It can also be ICriteria (without LINQ support) for NHibernate. All modern ORM tools have their own expression API, so you just need to use it. If you have a custom data access layer, you will need to write your own API for creating criteria, for example a Query Object.
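As a rough sketch of the single-method repository discussed in this thread: the User shape and the constructor are assumptions for the example, and the IQueryable<User> passed in would come from whichever LINQ-capable data access technology is in use. Because the predicate is an expression tree applied to an IQueryable, a provider can translate it instead of loading every user first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical entity, used only for this illustration.
public record User(string UserName, string Role);

public class UserRepository
{
    private readonly IQueryable<User> _users;

    // The queryable is supplied by the underlying store (an ORM session, a context, ...).
    public UserRepository(IQueryable<User> users) => _users = users;

    // One method replaces GetAllUsersByRole, GetUserByuserName, and similar.
    public ICollection<User> GetUsers(Expression<Func<User, bool>> predicate) =>
        _users.Where(predicate).ToList();
}
```

A caller would write, for example, repository.GetUsers(u => u.Role == "admin"). The same class can be exercised in tests by passing an in-memory list wrapped with AsQueryable().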