LINQ to Entities -- OrderBy().ToList() vs. ToList().OrderBy() - performance

I'm looking for confirmation/clarification with these LINQ expressions:
var context = new SomeCustomDbContext();
// LINQ to Entities?
var items = context.CustomItems.OrderBy(i => i.Property).ToList();
// LINQ to Objects?
var items2 = context.CustomItems.ToList().OrderBy(i => i.Property);
Am I correct in thinking the first method is LINQ to Entities, where EF builds a more specific SQL statement to pass on, putting the ordering effort on the database?
Is the second method LINQ to Objects, where LINQ drags the whole collection into memory (the ToList() enumeration?) before ordering, thus leaving the burden on the server side (the web server in this case)?
If this is the case, I can quickly see situations where L2E would be advantageous (ex. filtering/trimming collections before pulling them into memory).
But are there any other details/trade-offs I should be aware of, or times when "method 2" might be advantageous over the first method?
UPDATE:
Let's say we are not using EntityFramework, this is still true so long as the underlying repository/data source implements IQueryable<T> right? And if it doesn't both these statements result in LINQ to Objects operations in memory?

Yes.
Yes.
Yes.
You are correct that calling ToList() forces linq-to-entities to evaluate and return the results as a list. As you suspect, this can have huge performance implications.
There are cases where LINQ to Entities cannot figure out how to translate what looks like a perfectly simple query (like Where(x => SomeFunction(x))). In these cases you often have no choice but to call ToList() and operate on the collection in memory.
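A rough sketch of that fallback, reusing the question's context and the placeholder SomeFunction (standing in for any arbitrary C# method EF can't translate):

// EF can't translate an arbitrary C# method into SQL, so this would throw at runtime:
// var items = context.CustomItems.Where(i => SomeFunction(i)).ToList();

// Pulling the data into memory first turns the Where into a LINQ to Objects call
// (at the cost of fetching the whole table):
var items = context.CustomItems
                   .ToList()                     // executes the SQL, returns a List<CustomItem>
                   .Where(i => SomeFunction(i))  // now evaluated in memory
                   .ToList();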
In response to your update:
ToList() always forces everything ahead of it to evaluate immediately, as opposed to deferred execution. Take this example:
someEnumerable.Take(10).ToList();
vs
someEnumerable.ToList().Take(10);
In the second example, any deferred work on someEnumerable must be executed before taking the first 10 elements. If someEnumerable is doing something labor intensive (like reading files from the disk using Directory.EnumerateFiles()), this could have very real performance implications.
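A rough sketch of that difference, assuming a hypothetical log directory (requires using System.IO; and using System.Linq;):

// Streams lazily: the directory enumeration stops once 10 files have been read.
var firstTen = Directory.EnumerateFiles(@"C:\logs").Take(10).ToList();

// Enumerates the entire directory into a list first, then keeps only 10 of the results.
var firstTenSlow = Directory.EnumerateFiles(@"C:\logs").ToList().Take(10).ToList();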

Am I correct in thinking the first method is LINQ to Entities where EF builds a more specific SQL statement to pass on, putting the ordering effort on the database?
Yes
Is the second method LINQ to Objects where LINQ drags the whole collection into memory ... before ordering thus leaving the burden on the server side ...?
Yes
But are there any other details/trade-offs I should be aware of, or times when "method 2" might be advantageous over the first method?
There will be many times when Method 1 is not possible - usually when you have a complex filter or sort order that can't be directly translated to SQL (or, more precisely, where EF does not support a direct SQL translation). Also, since you can't transmit lazy-loaded IQueryables over the wire, any time you have to serialize a result you're going to have to materialize it first with ToList() or something similar.
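A sketch of that pattern, reusing the names from the question (the IsActive filter is just an assumption for illustration):

// IQueryable<T> can't be serialized, so materialize before returning across a service boundary.
public List<CustomItem> GetActiveItems()
{
    using (var context = new SomeCustomDbContext())
    {
        return context.CustomItems
                      .Where(i => i.IsActive)    // hypothetical filter, translated to SQL
                      .OrderBy(i => i.Property)  // ordering done by the database
                      .ToList();                 // executes the query and materializes the results
    }
}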

The other thing to be aware of is that IQueryable makes no guarantees on either (a) the semantic reasoning of the underlying provider, or (b) how much of the set of IQueryable methods are implemented by the provider.
For example:
EF does not support Last().
Nor can it translate time-part comparisons of DateTimes into valid T-SQL.
It doesn't support FirstOrDefault() in subqueries.
In such circumstances you need to bring data back to the client and then perform further evaluation client-side.
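A common way to do that is to run as much of the query as possible on the server and then switch to LINQ to Objects with AsEnumerable(); a sketch, reusing the question's CustomItems and a hypothetical SomeClientSideCheck method:

var results = context.CustomItems
                     .Where(i => i.Property != null)      // translated to SQL, filtered on the server
                     .AsEnumerable()                       // everything after this runs in memory
                     .Where(i => SomeClientSideCheck(i))   // arbitrary C# the provider can't translate
                     .ToList();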
You also need to have an understanding of "how" it parses the LINQ pipeline to generate (in the case of EF) T-SQL. So you sometimes have to think carefully about how you construct your LINQ queries in order to generate effective T-SQL.
Having said all that, IQueryable<> is an extremely powerful tool in the .NET framework and well worth getting more familiar with.

Related

Is LINQ faster or just more convenient?

Which of these scenarios would be faster?
Scenario 1:
foreach (var file in directory.GetFiles())
{
    if (file.Extension.ToLower() != ".txt" &&
        file.Extension.ToLower() != ".bin")
        continue;

    // Do something cool.
}
Scenario 2:
var files = from file in directory.GetFiles()
            where file.Extension.ToLower() == ".txt" ||
                  file.Extension.ToLower() == ".bin"
            select file;

foreach (var file in files)
{
    // Do something cool.
}
I know that they are logically the same because of delayed execution, but which would be the faster? And why?
Faster isn't usually the issue per se, especially in a scenario like this where there is not going to be a meaningful performance difference (and in general, if the code is not a bottleneck it just doesn't matter). The issue is which is more readable and more clearly expresses the intent of the code.
I think the second block of code more clearly expresses the intent of the code. It reads as "query a collection of file names for some file names with a certain property" and then "for each of those file names with that property, do something." It declares what is happening, rather than how it is going to happen. Separating the what from the mechanism is what makes the second block of code clearer, and it is where LINQ really shines: use LINQ to declare the what, and let LINQ implement the mechanism, instead of muddling the what up with the how as we used to.
Is LINQ faster or just more convenient?
So, to answer the question in your title, LINQ usually does not materially hinder performance but it makes code more clear by allowing the coder to declare what they want done instead of having to focus on how they want something done. At the end of the day, we don't care about the how, we care about the what.
I know that they are logically the same because of delayed execution, but which would be the faster?
Probably the imperative version because there is a tiny amount of overhead in using LINQ. But if you really must know which is faster be sure to use a profiler, and be sure to test on real-world data.
And why?
Because LINQ adds a little bit of overhead. But the trade off is significantly clearer and more maintainable code. That is a huge win compared to the usually irrelevant performance loss.
It would be faster to do a GetFiles("*.txt") and GetFiles("*.bin") if the directory contains lots of files or is on a network drive.
Compared to that the extra overhead for LINQ is just noise.
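A sketch of that idea, assuming directory is a DirectoryInfo (Concat needs using System.Linq;):

// Let the file system do the filtering instead of pulling every file back and testing it in C#.
var interestingFiles = directory.GetFiles("*.txt")
                                .Concat(directory.GetFiles("*.bin"));

foreach (var file in interestingFiles)
{
    // Do something cool.
}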
Linq isn't faster and it's not really about convenience. Rather, Linq pulls the higher-order functions Fold, Map, and Filter into .NET (with different names). These functions are valuable because they allow us to DRY-up our code. Every time you set up an iteration with a secondary collection or result, you open yourself up to a bug. Linq allows you to focus on what happens inside the iteration and feel fairly confident that the iteration mechanics are bug-free.
This doesn't mean that Linq is strictly slower than manual iteration. As others have mentioned, you'll have to benchmark case-by-case.
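For reference, a small LINQ to Objects sketch of how those higher-order functions map onto the .NET names:

var numbers = new[] { 1, 2, 3, 4, 5 };

var doubled = numbers.Select(n => n * 2);                 // Map
var evens   = numbers.Where(n => n % 2 == 0);             // Filter
var sum     = numbers.Aggregate(0, (acc, n) => acc + n);  // Fold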
I wrote an article on Code Project that benchmarked LINQ and stored procedures, as well as compiled LINQ queries.
Please take a look.
http://www.codeproject.com/KB/cs/linqsql2.aspx
I understand you are looking at local file parsing, but the article will give you some idea of what is involved and what LINQ is doing behind the scenes.

Tradeoffs using NHibernate 3.0 QueryOver or LINQ provider

I have not found a clear comparison of what is supported with the NHibernate 3.0 LINQ Provider compared to using the QueryOver syntax. On the surface, they seem like two large efforts toward two very similar things.
What are the key trade offs to using each?
LINQ and QueryOver are completely different query methods, which are added to the ones that existed in NHibernate 2 (Criteria, HQL, SQL)
QueryOver is meant as a strongly-typed version of Criteria, and supports mostly the same constructs, which are NHibernate-specific.
LINQ is a "standard" query method, which means the client code can work on IQueryable without explicit references to NHibernate. It supports a different set of constructs; it would be hard to say if there are more or less than with QueryOver.
My suggestion is to learn all the supported query methods, as each use case is different and some work better with one, some work better with other.
I have used both NH LINQ providers (the old NHContrib one for version 2.1, and the new one for NH 3.0) and have also used QueryOver. Based on the experience gained while developing quite complex data-driven applications, I would strongly suggest NOT using the existing LINQ provider with NHibernate if you plan to go beyond basic CRUD operations!
The current implementation (LINQ) sometimes produces really unreadable and inefficient SQL. In particular, joining a few tables quickly becomes a nightmare if you want to optimize database performance.
Despite all these drawbacks, I never encountered incorrect queries.
So if you don't care about performance and are already familiar with LINQ, then go for NH-Linq. Otherwise QueryOver is your reliable and typesafe friend.
LINQ to NHibernate (as of version 3.0) does not support the .HasValue property on Nullable types. One must compare to null in queries.
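For example, a sketch against the NHibernate 3.0 LINQ provider (Order and its nullable ShippedDate are hypothetical; Query<T>() comes from NHibernate.Linq):

// Supported: compare the nullable property to null.
var shipped = session.Query<Order>()
                     .Where(o => o.ShippedDate != null)
                     .ToList();

// Not supported by the provider:
// var shipped = session.Query<Order>().Where(o => o.ShippedDate.HasValue).ToList();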
I started to use NH-Linq because I had already worked with LinqToSql and Entity Framework. But for more complex queries I have always ended up with QueryOver. Reasons:
It happens that a query written with NH-Linq doesn't work as expected. I can't remember exactly, but it didn't behave correctly with some complex queries. The provider still seems too young. And as dlang stated in the previous answer, it produces inefficient SQL.
When you learn QueryOver, it's easy to call functions and do projections and subqueries; to me it seems easier than with NH-Linq.
A good thing about NH-Linq is that it can be extended, as Fabio Maulo explained here. Something similar is quite possible with QueryOver, but not as fancy as with NH-Linq :)

What type of optimizations does LINQ perform at the compiler level?

Now that LINQ is such an integral part of .NET, are there optimizations at the compiler level that would use the optimal path to get results?
For example, imagine you had an array of integers and wanted to get the lowest value. You could do this without LINQ using a foreach, but it's certainly easier to use the Min function in LINQ. Once this goes through the compiler, would you have been better off skipping LINQ altogether, or does it essentially just convert it to something similar to a foreach?
The C# compiler doesn't do much at all - it just calls the methods you tell it to, basically.
You could argue that removing unnecessary Select calls is an optimization:
from x in collection
where x.Condition
select x
is compiled to collection.Where(x => x.Condition) instead of collection.Where(x => x.Condition).Select(x => x) as the compiler recognises the identity transformation as being redundant. (A degenerate query of the form from x in collection select x is immune to this optimization, however, to allow LINQ providers to make sure that any query goes through at least one of their methods.)
The LINQ to Objects Min method just does a foreach, yes. Various LINQ to Objects methods do perform optimization. For example, Count() will check whether the data source implements ICollection or ICollection<T> and use the Count property if so. As madgnome points out in a comment, I wrote a bit more about this in a blog post a while ago.
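A simplified sketch (not the actual framework source) of that shortcut, assuming the System.Collections and System.Collections.Generic namespaces:

public static int Count<TSource>(this IEnumerable<TSource> source)
{
    // If the source already knows its size, just ask it.
    var genericCollection = source as ICollection<TSource>;
    if (genericCollection != null)
        return genericCollection.Count;

    var collection = source as ICollection;
    if (collection != null)
        return collection.Count;

    // Otherwise fall back to walking the whole sequence.
    int count = 0;
    using (var e = source.GetEnumerator())
    {
        while (e.MoveNext())
            count++;
    }
    return count;
}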
Of course, other LINQ providers can perform their own optimizations.

drawbacks of linq

What are the drawbacks of linq in general?
Can be hard to understand when you first start out with it
Deferred execution can separate errors from their causes (in terms of time)
Out-of-process LINQ (e.g. LINQ to SQL) will always be a somewhat leaky abstraction - you need to know what works and what doesn't, essentially
I still love LINQ massively though :)
EDIT: Having written this short list, I remembered that I've got an answer to a very similar question...
The biggest pain with LINQ is that (with database backends) you can't use it over a repository interface without it being a leaky abstraction.
LINQ is fantastic within a layer (especially the DAL etc), but since different providers support different things, you can't rely on Expression<Func<...>> or IQueryable<T> features working the same for different implementations.
As examples, between LINQ-to-SQL and Entity Framework:
EF doesn't support Single()
EF will error if you Skip/Take/First without an explicit OrderBy (sketched after this list)
EF doesn't support UDFs
etc. The LINQ provider for ADO.NET Data Services supports different combinations. This makes mocking and other abstractions unsafe.
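Paging is a simple illustration of the Skip/Take point above (Items and Id are hypothetical names):

// Entity Framework throws a NotSupportedException here, because it requires
// an explicit ordering before Skip/Take:
// var page = context.Items.Skip(20).Take(10).ToList();

// Adding the OrderBy keeps the provider happy:
var page = context.Items.OrderBy(i => i.Id).Skip(20).Take(10).ToList();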
But: for in-memory (LINQ-to-Objects), or in a single layer/implementation... fantastic.
Some more thoughts here: Pragmatic LINQ.
Like any abstraction in programming, it is vulnerable to a misunderstanding: "If I just understand this abstraction, I don't need to understand what's happening under the covers."
The truth is, if you do understand what's happening under the covers, you'll get much better value out of the abstraction, because you'll understand where it ceases to be applicable, so you'll be able to apply it with greater confidence of success where it is appropriate.
This is true of all abstractions, and applies to Linq in bucketfuls. To understand Linq to Objects, the best thing to do is to learn how to write Select, Where, Aggregate, etc. in C# with yield return. And then figure out how yield return replaces a lot of hand-written code by writing it all with classes. Then you'll be able to use it with an appreciation of the effort it is saving you, and it will no longer seem like magic, so you'll understand the limitations.
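A minimal sketch of that exercise, writing Where as an extension method with an iterator block:

public static IEnumerable<T> Where<T>(this IEnumerable<T> source, Func<T, bool> predicate)
{
    // yield return turns this method into a lazily-evaluated state machine:
    // nothing below runs until the caller starts enumerating the result.
    foreach (T item in source)
    {
        if (predicate(item))
            yield return item;
    }
}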
Same for the variants of Linq where the predicates are captured as expressions and transported off to another environment to be executed. You have to understand how it works in order to safely use it.
So the number 1 drawback of Linq is: the simple examples look deceptively short and simple. The problem is, how did the author of the sample know what to write? Because they knew how to write it all out in long form, and they knew how pieces of Linq could be used as abbreviations, and so they arrived at the nice short version.
As I say, not really specific to Linq, but highly relevant to it anyway.
Anonymous types. A proper ORM should always return objects of 'your' type (a partial class, with the possibility of adding your own methods, overriding, etc.). There are dozens of tutorials and examples of different complex queries using LINQ, but none of them care to explain the advantage of returning a 'bag of properties' (return new { ......... }). How am I supposed to work with an anonymous type, wrap it in another class again?
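To illustrate the complaint (customers is some queried collection and CustomerSummary is a hypothetical type you would define yourself):

// Anonymous type: a throwaway "bag of properties" that is awkward to pass around or extend.
var rows = from c in customers
           select new { c.Name, c.City };

// Projecting into your own type keeps the result usable outside the query:
var summaries = from c in customers
                select new CustomerSummary { Name = c.Name, City = c.City };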
Actually I can't think of any drawbacks. It makes programming life a lot simpler because a lot of things can be written in a more compact but still more readable way.
But having said this, I must also agree with Jon that you should have some idea what you're doing (but that holds for all technological advances).
The only drawback it has is its performance; see this article.

How does Linq work (behind the scenes)?

I was thinking about making something like Linq for Lua, and I have a general idea how Linq works, but was wondering if there was a good article or if someone could explain how C# makes Linq possible
Note: I mean behind the scenes, like how it generates code bindings and all that, not end user syntax.
It's hard to answer the question because LINQ is so many different things. For instance, sticking to C#, the following things are involved:
Query expressions are "pre-processed" into "C# without query expressions" which is then compiled normally. The query expression part of the spec is really short - it's basically a mechanical translation which doesn't assume anything about the real meaning of the query, beyond "order by is translated into OrderBy/ThenBy/etc" (a sketch of this translation appears after this list).
Delegates are used to represent arbitrary actions with a particular signature, as executable code.
Expression trees are used to represent the same thing, but as data (which can be examined and translated into a different form, e.g. SQL)
Lambda expressions are used to convert source code into either delegates or expression trees.
Extension methods are used by most LINQ providers to chain together static method calls. This allows a simple interface (e.g. IEnumerable<T>) to effectively gain a lot more power.
Anonymous types are used for projections - where you have some disparate collection of data, and you want bits of each of the aspects of that data, an anonymous type allows you to gather them together.
Implicitly typed local variables (var) are used primarily when working with anonymous types, to maintain a statically typed language where you may not be able to "speak" the name of the type explicitly.
Iterator blocks are usually used to implement in-process querying, e.g. for LINQ to Objects.
Type inference is used to make the whole thing a lot smoother - there are a lot of generic methods in LINQ, and without type inference it would be really painful.
Code generation is used to turn a model (e.g. DBML) into code
Partial types are used to provide extensibility to generated code
Attributes are used to provide metadata to LINQ providers
Obviously a lot of these aren't only used by LINQ, but different LINQ technologies will depend on them.
If you can give more indication of what aspects you're interested in, we may be able to provide more detail.
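To illustrate the first point, here is roughly what the pre-processing step does (customers, City, and Name are hypothetical):

// What you write:
var query = from c in customers
            where c.City == "London"
            orderby c.Name
            select c.Name;

// What the compiler mechanically turns it into before normal compilation continues:
var sameQuery = customers.Where(c => c.City == "London")
                         .OrderBy(c => c.Name)
                         .Select(c => c.Name);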
If you're interested in effectively implementing LINQ to Objects, you might be interested in a talk I gave at DDD in Reading a couple of weeks ago - basically implementing as much of LINQ to Objects as possible in an hour. We were far from complete by the end of it, but it should give a pretty good idea of the kind of thing you need to do (and buffering/streaming, iterator blocks, query expression translation etc). The videos aren't up yet (and I haven't put the code up for download yet) but if you're interested, drop me a mail at skeet#pobox.com and I'll let you know when they're up. (I'll probably blog about it too.)
Mono (partially?) implements LINQ, and is open source. Maybe you could look into their implementation?
Read this article:
Learn how to create custom LINQ providers
Perhaps my LINQ for R6RS Scheme will provide some insights.
It is semantically 100%, and syntactically almost 100%, the same as LINQ, with the noted exception of additional sort parameters using 'then' instead of ','.
Some rules/assumptions:
Only dealing with lists, no query providers.
Not lazy, but eager comprehension.
No static types, as Scheme does not use them.
My implementation depends on a few core procedures:
map - used for 'Select'
filter - used for 'Where'
flatten - used for 'SelectMany'
sort - a multi-key sorting procedure
groupby - for grouping constructs
The rest of the structure is all built up using a macro.
Bindings are stored in a list that is tagged with bound identifiers to ensure hygiene. The bindings are extracted and rebound locally wherever an expression occurs.
I did track the progress on my blog, that may provide some insight to possible issues.
For design ideas, take a look at Cω (C-omega), the research project that birthed LINQ. LINQ is a more pragmatic or watered-down version of Cω, depending on your perspective.
Matt Warren's blog has all the answers (and a sample IQueryable provider implementation to give you a headstart):
http://blogs.msdn.com/mattwar/

Resources