LINQ - Is Where(Predicate).FirstOrDefault() the same as FirstOrDefault(Predicate)?

I have always written my LINQ queries with the predicate in the Where clause followed by the FirstOrDefault clause. I started seeing examples with the predicate in the FirstOrDefault clause.
Is one better than the other? Would the answer be different with EF (SQL)?
A. Using Where Clause
List<Product> products = GetProductList();
Product productWhere = products.Where(p => p.ProductID == 789).FirstOrDefault();
B. No Where Clause
List<Product> products = GetProductList();
Product productNoWhere = products.FirstOrDefault(p => p.ProductID == 789);
https://code.msdn.microsoft.com/LINQ-Element-Operators-0f3f12ce

Because method chains in LINQ are lazily evaluated, there shouldn't be any material difference between the two. Where(Predicate).FirstOrDefault() will stop executing as soon as it obtains a value, just as FirstOrDefault(Predicate) will.
To put it another way, FirstOrDefault (or any other LINQ operator downstream, for that matter) accepts items one at a time from Where for evaluation, not the entire list at once. (The result of a LINQ operator that returns an IEnumerable is essentially a yield return under the hood.)
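One way to check this for LINQ to Objects is to count how often the predicate runs in each form. The following is only a sketch: a list of integers stands in for the product list, and the class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FirstOrDefaultDemo
{
    // Returns how many times the predicate ran for each form of the query.
    public static (int whereCalls, int directCalls) CountPredicateCalls()
    {
        List<int> ids = Enumerable.Range(1, 1000).ToList();

        int whereCalls = 0;
        _ = ids.Where(id => { whereCalls++; return id == 5; }).FirstOrDefault();

        int directCalls = 0;
        _ = ids.FirstOrDefault(id => { directCalls++; return id == 5; });

        return (whereCalls, directCalls);
    }

    public static void Main()
    {
        var (whereCalls, directCalls) = CountPredicateCalls();
        Console.WriteLine($"Where(...).FirstOrDefault(): {whereCalls} predicate calls"); // 5
        Console.WriteLine($"FirstOrDefault(...):         {directCalls} predicate calls"); // 5
    }
}
```

Both forms stop at the fifth element; neither walks the remaining 995 items.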
See Also
Where.FirstOrDefault vs FirstOrDefault

Related

Why is Entity Framework's AsEnumerable() downloading all data from the server?

What is the explanation for EF downloading all result rows when AsEnumerable() is used?
What I mean is that this code:
context.Logs.AsEnumerable().Where(x => x.Id % 2 == 0).Take(100).ToList();
will download all the rows from the table before passing any row to the Where() method and there could be millions of rows in the table.
What I would like it to do, is to download only enough to gather 100 rows that would satisfy the Id % 2 == 0 condition (most likely just around 200 rows).
Couldn't EF do on demand loading of rows like you can with plain ADO.NET using Read() method of SqlDataReader and save time and bandwidth?
I suppose that it does not work like that for a reason and I'd like to hear a good argument supporting that design decision.
NOTE: This is a completely contrived example and I know normally you should not use EF this way, but I found this in some existing code and was just surprised my assumptions turned out to be incorrect.
The short answer: The reason for the different behaviors is that, when you use IQueryable directly, a single SQL query can be formed for your entire LINQ query; but when you use IEnumerable, the entire table of data must be loaded.
The long answer: Consider the following code.
context.Logs.Where(x => x.Id % 2 == 0)
context.Logs is of type IQueryable<Log>. IQueryable<Log>.Where is taking an Expression<Func<Log, bool>> as the predicate. The Expression represents an abstract syntax tree; that is, it's more than just code you can run. Think of it as being represented in memory, at runtime, like this:
Lambda (=>)
    Parameters
        Variable: x
    Body
        Equals (==)
            Modulo (%)
                PropertyAccess (.)
                    Variable: x
                    Property: Id
                Constant: 2
            Constant: 0
The LINQ-to-Entities engine can take context.Logs.Where(x => x.Id % 2 == 0) and mechanically convert it into a SQL query that looks something like this:
SELECT *
FROM "Logs"
WHERE "Logs"."Id" % 2 = 0;
If you change your code to context.Logs.Where(x => x.Id % 2 == 0).Take(100), the SQL query becomes something like this:
SELECT *
FROM "Logs"
WHERE "Logs"."Id" % 2 = 0
LIMIT 100;
This is entirely because the LINQ extension methods on IQueryable use Expression instead of just Func.
Now consider context.Logs.AsEnumerable().Where(x => x.Id % 2 == 0). The IEnumerable<Log>.Where extension method is taking a Func<Log, bool> as a predicate. That is only runnable code. It cannot be analyzed to determine its structure; it cannot be used to form a SQL query.
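The difference between the two predicate types can be seen without any database. This is a minimal sketch that uses a plain int parameter in place of Log; the class and method names are hypothetical.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionVsFuncDemo
{
    // Walks the top of the tree, which is what a query provider does on a larger scale.
    public static string Describe(Expression<Func<int, bool>> predicate)
    {
        var body = (BinaryExpression)predicate.Body;
        return body.NodeType + " / " + body.Left.NodeType + " / " + body.Right;
    }

    public static void Main()
    {
        // An expression tree: data describing the lambda, which can be inspected.
        Expression<Func<int, bool>> asTree = id => id % 2 == 0;
        Console.WriteLine(Describe(asTree));     // Equal / Modulo / 0

        // A delegate: compiled code that can only be invoked.
        Func<int, bool> asCode = id => id % 2 == 0;
        Console.WriteLine(asCode(4));            // True

        // A tree can still be turned into runnable code when needed.
        Console.WriteLine(asTree.Compile()(3));  // False
    }
}
```

The delegate offers nothing comparable to Body, which is why a provider cannot turn it into SQL.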
Entity Framework and LINQ use deferred execution. This means (among other things) that they will not run the query until they need to enumerate the results: for instance when you call ToList() or AsEnumerable(), or when the result is used as an enumerator (in a foreach, for instance).
Until then, Entity Framework builds up a query from the predicates and returns IQueryable objects that let you further "pre-filter" the results before they are actually returned. You can find more information here, for instance. Entity Framework will actually build a SQL query from the predicates you have passed to it.
In your example:
context.Logs.AsEnumerable().Where(x => x.Id % 2 == 0).Take(100).ToList();
From the Logs table in the context, it fetches everything, returns an IEnumerable with the results, then filters the results, takes the first 100, and finally puts them in a List.
On the other hand, just removing the AsEnumerable solves your problem:
context.Logs.Where(x => x.Id % 2 == 0).Take(100).ToList();
Here it will build a query/filter on the result and query the database only once ToList() is executed.
It also means that you can dynamically build a complex query without actually running it on the DB until the end, for instance:
var logs = context.Logs.Where(a); // first filter
if (something) {
    logs = logs.Where(b); // second filter
}
var results = logs.Take(100).ToList(); // only here is the query actually executed
Update
As mentioned in your comment, you seem to already know what I just wrote, and are just asking for a reason.
It's even simpler: since AsEnumerable casts the results to another type (an IQueryable<T> to an IEnumerable<T> in this case), it has to convert all the result rows first, so it has to fetch the data first. It's basically a ToList in this case.
Clearly, you understand why it's better to avoid using AsEnumerable() the way you do in your question.
Also, some of the other answers have made it very clear why calling AsEnumerable() changes the way the query is performed and read. In short, it's because you are then invoking the IEnumerable<T> extension methods rather than the IQueryable<T> extension methods, the latter allowing you to combine predicates before executing the query in the database.
However, I still feel that this doesn't answer your actual question, which is a legitimate question. You said (emphasis mine):
What I mean is that this code:
context.Logs.AsEnumerable().Where(x => x.Id % 2 == 0).Take(100).ToList();
will download all the rows from the table before passing any row to the Where() method and there could be millions of rows in the table.
My question to you is: what made you conclude that this is true?
I would argue that, because you are using IEnumerable<T> instead of IQueryable<T>, it's true that the query being performed in the database will be a simple:
select * from logs
... without any predicates, unlike what would have happened if you had used IQueryable<T> to invoke Where and Take.
However, the AsEnumerable() method call does not fetch all the rows at that moment, as other answers have implied. In fact, this is the implementation of the AsEnumerable() call:
public static IEnumerable<TSource> AsEnumerable<TSource>(this IEnumerable<TSource> source)
{
    return source;
}
There is no fetching going on there. In fact, even the calls to IEnumerable<T>.Where() and IEnumerable<T>.Take() don't actually start fetching any rows at that moment. They simply set up wrapping IEnumerables that will filter results as they are iterated on. The fetching and iterating of the results really only begins when ToList() is called.
So when you say:
Couldn't EF do on demand loading of rows like you can with plain ADO.NET using Read() method of SqlDataReader and save time and bandwidth?
... again, my question to you would be: doesn't it do that already?
If your table had 1,000,000 rows, I would still expect your code snippet to only fetch up to 100 rows that satisfy your Where condition, and then stop fetching rows.
To prove the point, try running the following little program:
static void Main(string[] args)
{
    var list = PretendImAOneMillionRecordTable().Where(i => i < 500).Take(10).ToList();
}

private static IEnumerable<int> PretendImAOneMillionRecordTable()
{
    for (int i = 0; i < 1000000; i++)
    {
        Console.WriteLine("fetching {0}", i);
        yield return i;
    }
}
... when I run it, I only get the following 10 lines of output:
fetching 0
fetching 1
fetching 2
fetching 3
fetching 4
fetching 5
fetching 6
fetching 7
fetching 8
fetching 9
It doesn't iterate through the whole set of 1,000,000 "rows" even though I am chaining Where() and Take() calls on IEnumerable<T>.
Now, you do have to keep in mind that, for your little EF code snippet, if you test it using a very small table, it may actually fetch all the rows at once, if all the rows fit within the value of SqlConnection.PacketSize. This is normal. SqlDataReader.Read() never fetches just a single row at a time. To reduce the number of network round trips, it will always try to fetch a batch of rows at a time. I wonder if this is what you observed, and whether this misled you into thinking that AsEnumerable() was causing all rows to be fetched from the table.
Even though you will find that your example doesn't perform nearly as badly as you thought, this would not be a reason not to use IQueryable. Using IQueryable to construct more complex database queries will almost always provide better performance, because you can then benefit from database indexes, etc. to fetch results more efficiently.
AsEnumerable() eagerly loads the DbSet<T> Logs
You probably want something like
context.Logs.Where(x => x.Id % 2 == 0).AsEnumerable();
The idea here is that you're applying a predicate filter to the collection before actually loading it from the database.
An impressive subset of the world of LINQ is supported by EF. It will translate your beautiful LINQ queries into SQL expressions behind the scenes.
I have come across this before.
The context command is not executed until a LINQ function is called. Because you have written
context.Logs.AsEnumerable()
it has assumed you have finished with the query and therefore compiled it and returns all rows.
If you changed this to:
context.Logs.Where(x => x.Id % 2 == 0).AsEnumerable()
It would compile a SQL statement that gets only the rows where the Id is divisible by 2.
Similarly if you did
context.Logs.Where(x => x.Id % 2 == 0).Take(100).ToList();
that would create a statement that would get the top 100...
I hope that helps.
LINQ to Entities has a store expression formed by all the LINQ methods called before the query is enumerated.
When you use Where() and then AsEnumerable() like this:
context.Logs.Where(...).AsEnumerable()
The Where() knows that the previous call in the chain has a store expression, so it appends its predicate to that expression and the query is still evaluated lazily.
The overload of Where that is being called is different if you call this:
context.Logs.AsEnumerable().Where(...)
Here the Where() only knows that the previous method returns an enumeration (it could be any kind of "enumerable" collection), and the only way it can apply its condition is by iterating over the collection through the IEnumerable implementation of the DbSet class, which must retrieve the records from the database first.
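Which overload gets picked depends purely on the static type of the expression before the dot. Here is a small sketch of that, assuming a List wrapped with AsQueryable() as a stand-in for the DbSet (no database is involved, and the class name is made up).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OverloadDemo
{
    public static (bool first, bool second) WhichOverloads()
    {
        // Stand-in for context.Logs: an IQueryable<int> over an in-memory list.
        IQueryable<int> source = new List<int> { 1, 2, 3, 4 }.AsQueryable();

        // Static type IQueryable<int>: binds to Queryable.Where, which takes an
        // Expression<Func<int, bool>> and returns another IQueryable<int>.
        object stillQueryable = source.Where(x => x % 2 == 0);

        // Static type IEnumerable<int>: binds to Enumerable.Where, which takes a
        // Func<int, bool> and filters in memory.
        object inMemory = source.AsEnumerable().Where(x => x % 2 == 0);

        return (stillQueryable is IQueryable<int>, inMemory is IQueryable<int>);
    }

    public static void Main()
    {
        var (first, second) = WhichOverloads();
        Console.WriteLine(first);  // True
        Console.WriteLine(second); // False
    }
}
```

Both queries yield the same elements; only the first one is still in a form a provider could translate.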
I don't think you should ever use this:
context.Logs.AsEnumerable().Where(x => x.Id % 2 == 0).Take(100).ToList();
The correct way of doing things would be:
context.Logs.AsQueryable().Where(x => x.Id % 2 == 0).Take(100).ToList();
Answer with explanations here:
What's the difference(s) between .ToList(), .AsEnumerable(), AsQueryable()?
Why use AsQueryable() instead of List()?

Linq to NHibernate Expression Tree with nested relation condition

In a .NET project in which I'm using NHibernate, I have a piece of code that builds a list of expression trees depending on the values set in the filter by the user in the UI.
The expression is built against a specific object of my domain model, let's say Customer.
When I want to create a filter criterion for a property of Customer, everything's fine, as in the following example:
Expression<Func<Model.Customer, bool>> expr = c =>
    c.Name == "My Company";
But now I need to create an expression that lets me filter the customers based on a condition involving a one-to-many relation, let's say Order. A customer can have many orders, so the relationship is one-to-many. I need to build an expression that I can apply to a Customer query in order to extract only the customers that have at least one order placed in 2010. I'd write something like this:
Expression<Func<Model.Customer, bool>> expr = c =>
    c.Orders.Where(o => o.year == 2010).Count() > 0;
Too bad this won't work. It seems that NHibernate is not able to parse this Expression.
Any idea on how to write an expression tree that implements that search criterion and is parsable by LINQ to NHibernate?
Because you use Count() > 0, you can use Any instead:
Expression<Func<Model.Customer, bool>> expr = c =>
    c.Orders.Any(o => o.year == 2010);
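To show what the Any-based filter selects, here is an in-memory sketch. The Customer and Order classes are hypothetical stand-ins for the domain model, and running it against LINQ to Objects says nothing about how NHibernate translates the expression.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the domain model in the question.
public class Order { public int Year { get; set; } }

public class Customer
{
    public string Name { get; set; } = "";
    public List<Order> Orders { get; } = new List<Order>();
}

public static class AnyDemo
{
    public static List<Customer> SampleCustomers()
    {
        var a = new Customer { Name = "My Company" };
        a.Orders.Add(new Order { Year = 2009 });
        a.Orders.Add(new Order { Year = 2010 });
        var b = new Customer { Name = "Other Company" };
        b.Orders.Add(new Order { Year = 2008 });
        var c = new Customer { Name = "No Orders Ltd" };
        return new List<Customer> { a, b, c };
    }

    // Keeps the customers that have at least one order in the given year.
    public static List<string> NamesWithOrderIn(IEnumerable<Customer> customers, int year)
    {
        return customers
            .Where(cust => cust.Orders.Any(o => o.Year == year))
            .Select(cust => cust.Name)
            .ToList();
    }

    public static void Main()
    {
        foreach (var name in NamesWithOrderIn(SampleCustomers(), 2010))
            Console.WriteLine(name); // My Company
    }
}
```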

LINQ - Deferred Execution in Subqueries

My understanding is that the use of scalar or conversion functions causes immediate execution of a LINQ query. It is also my understanding that subqueries are executed upon demand of the outer query which would typically be once per element. For the following example would I be right in saying that the inner query is executed immediately? If so, as this would produce a scalar value how would this affect how the outer query operates?
IEnumerable<string> outerQuery = names.Where(n => n.Length == names
    .OrderBy(n2 => n2.Length).Select(n2 => n2.Length).First());
I would expect the above query to operate in a similar way as below, ie as if there wasn't a subquery.
int val = names.OrderBy(n2 => n2.Length).Select(n2 => n2.Length).First();
IEnumerable<string> outerQuery = names.Where ( n => n.Length == val );
This example was taken from Joseph and Ben Albahari's C# 4.0 in a Nutshell (Chp 8 P331/332) and my confusion stems from the accompanying diagram which appears to show that the subquery is being evaluated each time the outer query iterates through the elements of names.
Could someone clarify how LINQ works in this setup? Any help would be appreciated!
For the following example would I be right in saying that the inner query is executed immediately?
No, the inner query will be executed for each item in names when the outer query is enumerated. If you want it to be executed only once, use the second code sample.
EDIT: as LukeH pointed out, this is only true of LINQ to Objects. Other LINQ providers (LINQ to SQL, Entity Framework...) might be able to optimize this automatically.
What is names? If it's a collection (and you use LINQ to Objects), then the "subquery" will be executed for each item of the outer query. If it's actually a query object, then the result depends on the actual IQueryable.Provider. For example, with LINQ to SQL you will get a SQL query with a scalar subquery, and in most cases the subquery will actually be executed only once.
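For LINQ to Objects, the per-element evaluation can be made visible by counting how often the inner query runs. This is only a sketch: the names array and the counting helpers are made up for the demonstration.

```csharp
using System;
using System.Linq;

public static class SubqueryDemo
{
    public static readonly string[] Names = { "Tom", "Dick", "Harry", "Mary", "Jay" };

    // Inner query inside the predicate: evaluated once per outer element.
    public static int InnerRunsWhenNested()
    {
        int innerRuns = 0;
        Func<int> shortest = () =>
        {
            innerRuns++;
            return Names.OrderBy(n2 => n2.Length).Select(n2 => n2.Length).First();
        };
        _ = Names.Where(n => n.Length == shortest()).ToList();
        return innerRuns;
    }

    // Inner query hoisted into a local: evaluated once, before the outer query.
    public static int InnerRunsWhenHoisted()
    {
        int innerRuns = 0;
        Func<int> shortest = () =>
        {
            innerRuns++;
            return Names.OrderBy(n2 => n2.Length).Select(n2 => n2.Length).First();
        };
        int val = shortest();
        _ = Names.Where(n => n.Length == val).ToList();
        return innerRuns;
    }

    public static void Main()
    {
        Console.WriteLine(InnerRunsWhenNested());  // 5
        Console.WriteLine(InnerRunsWhenHoisted()); // 1
    }
}
```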

When to prefer joins expressed with SelectMany() over joins expressed with the join keyword in Linq

LINQ lets you express inner joins either with the join keyword or with SelectMany() (i.e. a couple of from keywords) plus a where keyword:
var personsToState = from person in persons
                     join state in statesOfUS
                         on person.State equals state.USPS
                     select new { person, State = state.Name };
foreach (var item in personsToState)
{
    System.Diagnostics.Debug.WriteLine(item);
}
// The same query can be expressed with the query operator SelectMany(), which is
// expressed as two from clauses and a single where clause connecting the sequences.
var personsToState2 = from person in persons
                      from state in statesOfUS
                      where person.State == state.USPS
                      select new { person, State = state.Name };
foreach (var item in personsToState2)
{
    System.Diagnostics.Debug.WriteLine(item);
}
My question: when does it make sense to use the join style and when the where style? Does one style have performance advantages over the other?
For local queries Join is more efficient due to its keyed lookup as Athari mentioned, however for LINQ to SQL (L2S) you'll get more mileage out of SelectMany. In L2S a SelectMany ultimately uses some type of SQL join in the generated SQL depending on your query.
Take a look at questions 11 & 12 of the LINQ Quiz by Joseph/Ben Albahari, authors of C# 4.0 In a Nutshell. They show samples of different types of joins and they state:
With LINQ to SQL, SelectMany-based joins are the most flexible, and can perform both equi and non-equi joins. Throw in DefaultIfEmpty, and you can do left outer joins as well!
In addition, Matt Warren has a detailed blog post on this topic as it pertains to IQueryable / SQL here: LINQ: Building an IQueryable provider - Part VII.
Back to your question of which to use, you should use whichever query is more readable and allows you to easily express yourself and construct your end goal clearly. Performance shouldn't be an initial concern unless you are dealing with large collections and have profiled both approaches. In L2S you have to consider the flexibility SelectMany offers you depending on the way you need to pair up your data.
Join is more efficient: it uses the Lookup class (a variation of Dictionary with multiple values for a single key) to find matching values.
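For local (LINQ to Objects) queries, the difference in work can be sketched by counting how often the where-style predicate runs. The sample data and all names below are made up for illustration; for small collections the difference is unlikely to matter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class JoinDemo
{
    public static readonly (string Name, string State)[] Persons =
        { ("Ann", "NY"), ("Bob", "CA"), ("Cid", "NY") };

    public static readonly (string USPS, string Name)[] States =
        { ("NY", "New York"), ("CA", "California"), ("TX", "Texas"), ("WA", "Washington") };

    // where style: every person is paired with every state, then filtered.
    public static (List<string> rows, int comparisons) WhereStyle()
    {
        int comparisons = 0;
        var rows = Persons
            .SelectMany(p => States, (p, s) => new { p, s })
            .Where(x => { comparisons++; return x.p.State == x.s.USPS; })
            .Select(x => x.p.Name + " -> " + x.s.Name)
            .ToList();
        return (rows, comparisons);
    }

    // join style: Enumerable.Join looks up the matching states by key.
    public static List<string> JoinStyle()
    {
        return Persons
            .Join(States, p => p.State, s => s.USPS, (p, s) => p.Name + " -> " + s.Name)
            .ToList();
    }

    public static void Main()
    {
        var (rows, comparisons) = WhereStyle();
        Console.WriteLine(comparisons);                          // 12 (3 persons x 4 states)
        Console.WriteLine(rows.SequenceEqual(JoinStyle()));      // True
    }
}
```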

How to merge a collection of collections in Linq

I would like to be able to flatten an IEnumerable<IEnumerable<T>> into an IEnumerable<T> (i.e. merge all the individual collections into one). The Union operator only applies to two collections. Any ideas?
Try
var it = GetTheNestedCase();
return it.SelectMany(x => x);
SelectMany is a LINQ transformation which essentially says "For Each Item in a collection return the elements of a collection". It will turn one element into many (hence SelectMany). It's great for breaking down collections of collections into a flat list.
var lists = GetTheNestedCase();
return
    from list in lists
    from element in list
    select element;
is another way of doing this using C# 3.0 query expression syntax.
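A small runnable sketch of both forms, using a made-up nested list of integers in place of GetTheNestedCase():

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FlattenDemo
{
    // Method syntax: each inner collection is expanded into its elements.
    public static IEnumerable<T> Flatten<T>(IEnumerable<IEnumerable<T>> nested)
    {
        return nested.SelectMany(inner => inner);
    }

    // Query syntax: two from clauses compile down to the same SelectMany call.
    public static IEnumerable<T> FlattenQuery<T>(IEnumerable<IEnumerable<T>> nested)
    {
        return from list in nested
               from element in list
               select element;
    }

    public static void Main()
    {
        var nested = new List<IEnumerable<int>>
        {
            new[] { 1, 2 },
            new List<int> { 3 },
            Enumerable.Empty<int>(),
            new[] { 4, 5 }
        };

        Console.WriteLine(string.Join(", ", Flatten(nested)));      // 1, 2, 3, 4, 5
        Console.WriteLine(string.Join(", ", FlattenQuery(nested))); // 1, 2, 3, 4, 5
    }
}
```

Unlike Union, SelectMany keeps duplicates and preserves the order of the inner collections.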
