LINQ Where(predicate) vs. FirstOrDefault(predicate)

Is there any appreciable performance difference between these two forms?
something.Where(predicate).FirstOrDefault();
something.FirstOrDefault(predicate);
I tend to use both, but am wondering if there's a clear winner when it comes to performance.

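It depends on whether this Where is against an IQueryable or an IEnumerable.
In the case of IQueryable, any difference depends on the provider's implementation, but most likely there will be none: both forms will usually be translated into the same query.
In the case of IEnumerable, the difference should be negligible. Both stop at the first matching element; the Where form only adds an intermediate iterator and an extra layer of calls per element.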
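A minimal sketch for the IEnumerable case (FirstMatchDemo and its method names are hypothetical) that counts how often the predicate runs in each form. Both forms stop at the first match, so the predicate runs the same number of times; what differs is only the iterator plumbing around it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FirstMatchDemo
{
    // Where(...).FirstOrDefault(): Where wraps the source in a lazy iterator,
    // and FirstOrDefault pulls from it until the first element comes through.
    public static (int Found, int Calls) WhereThenFirst(IEnumerable<int> source, Func<int, bool> predicate)
    {
        int calls = 0;
        int found = source.Where(x => { calls++; return predicate(x); }).FirstOrDefault();
        return (found, calls);
    }

    // FirstOrDefault(predicate): tests the elements directly, without a Where iterator in between.
    public static (int Found, int Calls) FirstWithPredicate(IEnumerable<int> source, Func<int, bool> predicate)
    {
        int calls = 0;
        int found = source.FirstOrDefault(x => { calls++; return predicate(x); });
        return (found, calls);
    }
}
```

For example, with Enumerable.Range(1, 1000) and x => x % 7 == 0, both methods return (7, 7): the element 7 is found after seven predicate calls, and the remaining 993 items are never touched.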

Related

Are there any cases where LINQ's .Where() will be faster than O(N)?

I think the title describes my thoughts pretty well :)
I've seen a lot of people lately who swear by LINQ, and while I also believe it's awesome, I think you shouldn't lose sight of the fact that on most (all?) IEnumerable types, its performance is not that great. Am I wrong in thinking this? Especially for queries where you nest Where() calls on large datasets?
Sorry if this question is a bit vague; I just want to confirm my suspicion that you should be "cautious" when using LINQ.
[EDIT] Just to clarify: I'm thinking in terms of LINQ to Objects here :)
It depends on the provider. For LINQ to Objects, it's going to be O(n), but with LINQ to SQL or LINQ to Entities the database might ultimately use indexes to beat that. For LINQ to Objects, if you need the functionality of Where, you're probably going to need O(n) anyway. LINQ will almost certainly have a bigger constant factor, largely due to the extra delegate calls.
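To make the O(n) point concrete, here is a simplified stand-in for Where next to the equivalent hand-written loop. SimpleWhere is a hypothetical name: the real Enumerable.Where also validates arguments eagerly and has specialized iterators for arrays and lists. Both versions visit every element once. The LINQ version pays an extra delegate invocation and iterator state machine step per element, and that is where the larger constant comes from.

```csharp
using System;
using System.Collections.Generic;

public static class ScanSketch
{
    // Hypothetical, simplified stand-in for Enumerable.Where:
    // one delegate call per element, so a full enumeration is O(n).
    public static IEnumerable<T> SimpleWhere<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (T item in source)
        {
            if (predicate(item))      // per-element delegate call: part of LINQ's constant factor
                yield return item;
        }
    }

    // The equivalent hand-written loop: the same O(n) scan, with the test inlined.
    public static List<int> HandWrittenEvens(IEnumerable<int> source)
    {
        var result = new List<int>();
        foreach (int x in source)
        {
            if (x % 2 == 0)
                result.Add(x);
        }
        return result;
    }
}
```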
It depends on how you are using it and to what you compare.
I've seen many implementations using foreach loops that would have been much faster with LINQ, e.g. because they forget to break or because they return too many items. The trick is that the lambda expressions are executed only when an item is actually requested. When you have First at the end, it could end up being just a single call.
So when you chain Where calls, an item that does not pass the first condition is never tested against the second one. It's similar to the && operator, which does not evaluate its right-hand side if the left-hand side is false.
You could say it's always O(N), but N is not the number of items in the source; it is the minimal number of items that have to be examined to produce the requested result. That's a pretty good optimization IMHO.
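A small sketch of that deferred, short-circuiting behaviour (LazyChainDemo is a hypothetical name). Even with a million items in the source, only the first six are pulled through the first predicate, and only the three even ones among them reach the second.

```csharp
using System;
using System.Linq;

public static class LazyChainDemo
{
    // Chains two Where calls and takes the first result,
    // recording how many times each predicate ran.
    public static (int Result, int FirstCalls, int SecondCalls) FirstEvenMultipleOfThree(int count)
    {
        int firstCalls = 0, secondCalls = 0;
        int result = Enumerable.Range(1, count)
            .Where(x => { firstCalls++; return x % 2 == 0; })   // runs for every item pulled
            .Where(x => { secondCalls++; return x % 3 == 0; })  // runs only for items that passed the first Where
            .First();                                           // stops pulling after the first match
        return (result, firstCalls, secondCalls);
    }
}
```

FirstEvenMultipleOfThree(1000000) returns (6, 6, 3). If count is below 6, First throws InvalidOperationException because nothing matches.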
Here's a project that promises to introduce indexing for LINQ to Objects, which should deliver better asymptotic behavior: http://i4o.codeplex.com/
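i4o has its own API, which is not reproduced here. As a rough illustration of the same idea using only the standard library, you can build a hash-based index yourself with ToLookup and reuse it across queries. Customer and IndexSketch are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type used only for this illustration.
public sealed class Customer
{
    public Customer(string name, string city) { Name = name; City = city; }
    public string Name { get; }
    public string City { get; }
}

public static class IndexSketch
{
    // Linear scan: every query walks the whole sequence, O(n) per lookup.
    public static List<Customer> ByCityScan(IEnumerable<Customer> customers, string city)
        => customers.Where(c => c.City == city).ToList();

    // Build a hash-based index once (O(n)); each later lookup is then roughly
    // O(1) plus the size of the matching group. A missing key yields an empty sequence.
    public static ILookup<string, Customer> BuildCityIndex(IEnumerable<Customer> customers)
        => customers.ToLookup(c => c.City);
}
```

This only pays off when the same collection is queried many times by the same key; building the index is itself a full O(n) pass.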

NHibernate Criteria Query vs. LINQ to NHibernate

I understand that there are queries you can express with NHibernate Criteria but not with LINQ to NHibernate. But as far as performance goes, is it safe to assume that NHibernate Criteria is generally better than LINQ to NHibernate?
Even though NHibernate Criteria is better in terms of performance, NHibernate LINQ provides compile-time checking. I have rarely seen a project run into issues with either of the two, since the performance gains are minuscule.
I usually use LINQ, except when a query cannot be expressed in it, in which case I use ICriteria. The benefit of compile-time checking outweighs the minor performance gains.

LINQ to Objects Optimization Techniques?

What LINQ to Objects optimization techniques do you use or have you seen in the wild?
While waiting for "yield foreach" and other language/compiler optimizations to arrive in C# in 201x, I'm interested in doing everything possible to make using LINQ everywhere less of a performance pain.
One pattern I've seen so far is creating custom IEnumerable implementations for specific combinators so that the source is not re-enumerated several times.
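Here is one simple instance of that pattern, sketched as a hypothetical MinMax extension that is not part of System.Linq. Calling source.Min() and source.Max() separately enumerates the source twice, which hurts if it is an expensive deferred query. A custom combinator can compute both values in a single pass.

```csharp
using System;
using System.Collections.Generic;

public static class SinglePassExtensions
{
    // Hypothetical combinator: computes Min and Max in one enumeration
    // instead of two full passes (source.Min() followed by source.Max()).
    public static (int Min, int Max) MinMax(this IEnumerable<int> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        using (IEnumerator<int> e = source.GetEnumerator())
        {
            if (!e.MoveNext())
                throw new InvalidOperationException("Sequence contains no elements.");
            int min = e.Current, max = e.Current;
            while (e.MoveNext())
            {
                int x = e.Current;
                if (x < min) min = x;
                if (x > max) max = x;
            }
            return (min, max);
        }
    }
}
```

Usage: var (min, max) = readings.MinMax(); mirrors Enumerable.Min/Max by throwing on an empty sequence.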
One that I've spotted a few times - don't use:
if (query.Count() > 0)
... use this instead:
if (query.Any())
That way it only needs to find the first element instead of counting all of them.
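A quick way to see the difference is to count how many elements each approach pulls from a lazily generated sequence. AnyVsCountDemo and its helper are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnyVsCountDemo
{
    // Lazily yields 0..n-1 and invokes onPull for every element it produces.
    private static IEnumerable<int> Counted(int n, Action onPull)
    {
        for (int i = 0; i < n; i++)
        {
            onPull();
            yield return i;
        }
    }

    // Count() > 0 has to walk the whole sequence before it can compare.
    public static int PulledByCount(int n)
    {
        int pulled = 0;
        _ = Counted(n, () => pulled++).Count() > 0;
        return pulled;
    }

    // Any() returns as soon as the first element is produced.
    public static int PulledByAny(int n)
    {
        int pulled = 0;
        _ = Counted(n, () => pulled++).Any();
        return pulled;
    }
}
```

For n = 1000, PulledByCount returns 1000 while PulledByAny returns 1.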
EDIT: You may also be interested in a blog post I recently wrote about optimisations which could be in LINQ to Objects but aren't (or weren't in .NET 3.5).
Additionally, if you're going to do a lot of x.Contains(y) operations and x is the result of an existing query (i.e. it's not already going to be some optimised collection), you should probably consider building a HashSet<T> from x to avoid a linear scan (performing the query to produce x's results) on each iteration.
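Here is a sketch of that advice; ContainsDemo and its method names are hypothetical. In the naive version, allowedQuery is a deferred query, so every Contains call re-runs it and scans it linearly. The second version materializes it once into a HashSet<int>, whose Contains is O(1) on average.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ContainsDemo
{
    // Naive: each Contains call re-enumerates allowedQuery (re-running it if it is
    // a deferred query), so the total cost is roughly candidates x allowed.
    public static List<int> FilterNaive(IEnumerable<int> candidates, IEnumerable<int> allowedQuery)
        => candidates.Where(c => allowedQuery.Contains(c)).ToList();

    // Better: run the query once into a HashSet<int>; each lookup is then O(1) on average.
    public static List<int> FilterWithHashSet(IEnumerable<int> candidates, IEnumerable<int> allowedQuery)
    {
        var allowed = new HashSet<int>(allowedQuery);
        return candidates.Where(allowed.Contains).ToList();
    }
}
```

Both methods return the same elements in the same order; only the cost of each membership test changes.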

Is LINQ to Everything a good abstraction?

There is a proliferation of new LINQ providers. It is really quite astonishing: an elegant combination of lambda expressions, anonymous types and generics, with some syntactic sugar on top to make it easy to read. Everything is LINQed now, from SQL to web services like Amazon to streaming sensor data to parallel processing. It seems like someone is creating an IQueryable<T> for everything, but these data sources can have radically different performance, latency, availability and reliability characteristics.
It gives me a little pause that LINQ makes those performance details transparent (that is, invisible) to the developer. Is LINQ a solid general-purpose abstraction, a RAD tool, or both?
To me, LINQ is just a way to make code more readable, and hence more maintainable. LINQ does nothing more than take standard methods and integrate them into the language (hence the name: language-integrated query).
It's nothing but a syntax element around normal interfaces and methods - there is no "magic" here, and LINQ-to-something really should (IMO) be treated as any other 3rd party API - you need to understand the cost/benefits of using it just like any other technology.
That being said, it's a very nice syntax helper: it does a lot to make code cleaner, simpler, and more maintainable, and I believe that's where its true strengths lie.
I see its design as similar to the model of multiple storage engines in an RDBMS accepting a common(-ish) language, SQL, but with the added benefit of integration into the application language's semantics. Of course it is good!
I have not used it that much, but it looks sensible and clear as long as performance and the extra layers of abstraction are not in a position to hurt the development process (and provided you trust that standards and models won't change wildly).
It is just an interface and an implementation that may or may not fit your needs. As with all interfaces, abstractions, libraries and implementations, the question is the same: does it fit?
I suppose not.
LINQ is just a convenient syntax, not a common RAD tool. In big projects with complex logic, I noticed that developers make more errors in LINQ than they would writing the same logic the .NET 2.0 way. The code is produced faster and is smaller, but bugs are harder to find. Sometimes it is not obvious at first glance at what point the queried collection turns from IQueryable into IEnumerable... I would say that LINQ requires more skilled and disciplined developers.
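Here is a minimal sketch of how that switch can happen silently. QueryableBoundaryDemo is a hypothetical name, and AsQueryable over an in-memory range stands in for a real provider such as LINQ to SQL. The static type of the variable, not the runtime object, decides which Where is called. Typed as IEnumerable<int>, the same query binds to Enumerable.Where. With a database provider, that filter would then run in memory over every row the database returns.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QueryableBoundaryDemo
{
    public static bool StaysQueryable()
    {
        IQueryable<int> source = Enumerable.Range(1, 10).AsQueryable();
        IEnumerable<int> filtered = source.Where(x => x > 2);   // binds to Queryable.Where: extends the expression tree
        return filtered is IQueryable<int>;
    }

    public static bool SilentlyBecomesEnumerable()
    {
        IQueryable<int> source = Enumerable.Range(1, 10).AsQueryable();
        IEnumerable<int> typedAsEnumerable = source;            // same object, different static type
        var filtered = typedAsEnumerable.Where(x => x > 2);     // binds to Enumerable.Where: in-memory filter
        return filtered is IQueryable<int>;
    }
}
```

StaysQueryable returns true and SilentlyBecomesEnumerable returns false, even though both queries produce the same eight numbers.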
Also, the SQL-like syntax is fine for functional programming, but it is a sidestep from object-oriented thinking. Sometimes two very similar LINQ queries look like copy-pasted code, but refactoring them is not always possible (or is possible only by sacrificing some performance).
I heard that Microsoft is not going to develop LINQ to SQL further and will give more priority to LINQ to Entities (see "Is the ADO.NET Team Abandoning LINQ to SQL?"). Isn't this a signal that LINQ is not a panacea for everybody?
If you are thinking about building a connector to "something", you can build it without LINQ and, if you like, provide LINQ as an additional, optional wrapper around it, like LINQ to Entities. That way your customers can decide whether or not to use LINQ, depending on their needs, required performance, etc.
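Here is a rough sketch of that layering, with hypothetical names throughout (SensorClient, GetPage, Readings). The core connector exposes a plain paged API. A separate, optional extension turns it into a lazy IEnumerable<T> that LINQ to Objects can query. A real connector to a remote source might instead implement IQueryable<T> so that filters can be pushed to the server, which is considerably more work.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical connector: the core API knows nothing about LINQ.
public sealed class SensorClient
{
    private readonly double[] _readings; // stands in for a real remote data source

    public SensorClient(double[] readings) => _readings = readings ?? throw new ArgumentNullException(nameof(readings));

    public int ReadingCount => _readings.Length;

    // Plain, non-LINQ access: callers fetch readings one page at a time.
    public double[] GetPage(int offset, int pageSize)
    {
        int count = Math.Max(0, Math.Min(pageSize, _readings.Length - offset));
        var page = new double[count];
        Array.Copy(_readings, offset, page, 0, count);
        return page;
    }
}

// Optional LINQ layer, kept separate so callers who don't want LINQ can ignore it.
public static class SensorClientLinqExtensions
{
    // Exposes the paged API as a lazy IEnumerable<double>; pages are fetched only
    // as the caller enumerates, so LINQ to Objects operators compose on top of it.
    public static IEnumerable<double> Readings(this SensorClient client, int pageSize = 100)
    {
        if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));
        for (int offset = 0; offset < client.ReadingCount; offset += pageSize)
        {
            foreach (double reading in client.GetPage(offset, pageSize))
                yield return reading;
        }
    }
}
```

A LINQ user would write client.Readings().Where(r => r > 2.5), while everyone else keeps calling GetPage directly.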
P.S.
.NET 4.0 will come with dynamic typing (the dynamic keyword), and I expect that everybody will start using it the way they use LINQ, without taking into consideration that code simplicity, quality and performance may suffer.

Examples on when not to use LINQ

This is sort of a follow-up to this question I read:
What is the biggest mistake people make when starting to use LINQ?
The top answer is "That it should be used for everything." That made me wonder what exactly that means.
What are some common examples where someone used LINQ when they should not have?
You shouldn't use LINQ when the alternative is simpler or significantly more efficient.
I would suggest avoiding LINQ whenever it makes the code less obvious.
However, in general, I think LINQ makes things easier to follow, not more difficult, so I rarely avoid it.
It is possible for LINQ to be significantly slower than the alternatives, especially if you create a lot of intermediate Lists. However, you would need some pretty large datasets for that to matter, so large that I haven't encountered them.
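Here is a small illustration of what an intermediate list costs. IntermediateListDemo is a hypothetical name. An unnecessary ToList() in the middle of a query forces every element through the filter and allocates a list. The streaming version stops after the first match.

```csharp
using System;
using System.Linq;

public static class IntermediateListDemo
{
    // Streaming: First() stops pulling as soon as it sees a match.
    public static int CallsWhenStreaming(int count)
    {
        int calls = 0;
        _ = Enumerable.Range(1, count).Where(x => { calls++; return x % 2 == 0; }).First();
        return calls;
    }

    // An intermediate ToList() runs the whole filter (and builds a list)
    // before First() ever looks at the result.
    public static int CallsWithIntermediateList(int count)
    {
        int calls = 0;
        _ = Enumerable.Range(1, count).Where(x => { calls++; return x % 2 == 0; }).ToList().First();
        return calls;
    }
}
```

For count = 1000, the streaming version calls the predicate twice and the ToList() version calls it 1000 times.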
However, one thing to keep in mind is that a well-written LINQ query can also be much faster than the alternative because of the way IEnumerable works: items are pulled lazily, one at a time, so no more work is done than the result requires.
Finally, using LINQ now will allow you to switch to Parallel LINQ when it is released, with little or no code change.
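PLINQ did ship with .NET Framework 4.0, as the ParallelEnumerable operators. Here is a minimal sketch of the switch (PlinqDemo is a hypothetical name). The query body stays the same, and AsParallel() is added to the source. Whether the parallel version is actually faster depends on the work per element. For cheap operations like this one, the partitioning overhead may dominate.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PlinqDemo
{
    // Sequential LINQ to Objects query.
    public static long SumOfEvenSquares(IEnumerable<int> source)
        => source.Where(x => x % 2 == 0).Select(x => (long)x * x).Sum();

    // Same query body; AsParallel() switches it to PLINQ (ParallelEnumerable operators).
    public static long SumOfEvenSquaresParallel(IEnumerable<int> source)
        => source.AsParallel().Where(x => x % 2 == 0).Select(x => (long)x * x).Sum();
}
```

For the input 1..4, both return 20 (2² + 4²). Order-sensitive operators need AsOrdered() under PLINQ, but Sum does not depend on order.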
It's still okay to use foreach. :)

Resources