LINQ - Deferred Execution in Subqueries - linq

My understanding is that the use of scalar or conversion functions causes immediate execution of a LINQ query. It is also my understanding that subqueries are executed upon demand of the outer query which would typically be once per element. For the following example would I be right in saying that the inner query is executed immediately? If so, as this would produce a scalar value how would this affect how the outer query operates?
IEnumerable<string> outerQuery = names.Where ( n => n.Length == names
.OrderBy(n2 => n2.Length).Select(n2 => n2.Length).First());
I would expect the above query to operate in a similar way as below, ie as if there wasn't a subquery.
int val = names.OrderBy(n2 => n2.Length).Select(n2 => n2.Length).First();
IEnumerable<string> outerQuery = names.Where ( n => n.Length == val );
This example was taken from Joseph and Ben Albahari's C# 4.0 in a Nutshell (Chp 8 P331/332) and my confusion stems from the accompanying diagram which appears to show that the subquery is being evaluated each time the outer query iterates through the elements of names.
Could someone clarify how LINQ works in this setup? Any help would be appreciated!

For the following example would I be right in saying that the inner query is executed immediately?
No, the inner query will be executed for each item in names when the outer query is enumerated. If you want it to be executed only once, use the second code sample.
EDIT: as LukeH pointed out, this is only true of Linq to Objects. Other Linq providers (Linq to SQL, Entity Framework...) might be able to optimize this automatically

What is names? If it's collection (and you use LINQ to Objects) then "subquery" will be executed for each outer query item. If it's actually query object then result depends on actual IQueryable.Provider. For example, for LINQ to SQL you will give SQL query with scalar subquery. And in the most cases subquery actually will be executed only once.

Related

Why is Entity Framework's AsEnumerable() downloading all data from the server?

What is the explanation for EF downloading all result rows when AsEnumerable() is used?
What I mean is that this code:
context.Logs.AsEnumerable().Where(x => x.Id % 2 == 0).Take(100).ToList();
will download all the rows from the table before passing any row to the Where() method and there could be millions of rows in the table.
What I would like it to do, is to download only enough to gather 100 rows that would satisfy the Id % 2 == 0 condition (most likely just around 200 rows).
Couldn't EF do on demand loading of rows like you can with plain ADO.NET using Read() method of SqlDataReader and save time and bandwidth?
I suppose that it does not work like that for a reason and I'd like to hear a good argument supporting that design decision.
NOTE: This is a completely contrived example and I know normally you should not use EF this way, but I found this in some existing code and was just surprised my assumptions turned out to be incorrect.
The short answer: The reason for the different behaviors is that, when you use IQueryable directly, a single SQL query can be formed for your entire LINQ query; but when you use IEnumerable, the entire table of data must be loaded.
The long answer: Consider the following code.
context.Logs.Where(x => x.Id % 2 == 0)
context.Logs is of type IQueryable<Log>. IQueryable<Log>.Where is taking an Expression<Func<Log, bool>> as the predicate. The Expression represents an abstract syntax tree; that is, it's more than just code you can run. Think of it as being represented in memory, at runtime, like this:
Lambda (=>)
Parameters
Variable: x
Body
Equals (==)
Modulo (%)
PropertyAccess (.)
Variable: x
Property: Id
Constant: 2
Constant: 0
The LINQ-to-Entities engine can take context.Logs.Where(x => x.Id % 2 == 0) and mechanically convert it into a SQL query that looks something like this:
SELECT *
FROM "Logs"
WHERE "Logs"."Id" % 2 = 0;
If you change your code to context.Logs.Where(x => x.Id % 2 == 0).Take(100), the SQL query becomes something like this:
SELECT *
FROM "Logs"
WHERE "Logs"."Id" % 2 = 0
LIMIT 100;
This is entirely because the LINQ extension methods on IQueryable use Expression instead of just Func.
Now consider context.Logs.AsEnumerable().Where(x => x.Id % 2 == 0). The IEnumerable<Log>.Where extension method is taking a Func<Log, bool> as a predicate. That is only runnable code. It cannot be analyzed to determine its structure; it cannot be used to form a SQL query.
Entity Framework and Linq use lazy loading. It means (among other things) that they will not run the query until they need to enumerate the results: for instance using ToList() or AsEnumerable(), or if the result is used as an enumerator (in a foreach for instance).
Instead, it builds a query using predicates, and returns IQueryable objects to further "pre-filter" the results before actually returning them. You can find more infos here for instance. Entity framework will actually build a SQL query depending on the predicates you have passed it.
In your example:
context.Logs.AsEnumerable().Where(x => x.Id % 2 == 0).Take(100).ToList();
From the Logs table in the context, it fetches all, returns a IEnumerable with the results, then filters the result, takes the first 100, then lists the results as a List.
On the other hand, just removing the AsEnumerable solves your problem:
context.Logs.Where(x => x.Id % 2 == 0).Take(100).ToList();
Here it will build a query/filter on the result, then only once the ToList() is executed, query the database.
It also means that you can dynamically build a complex query without actually running it on the DB it until the end, for instance:
var logs = context.Logs.Where(a); // first filter
if (something) {
logs = logs.Where(b); // second filter
}
var results = logs.Take(100).ToList(); // only here is the query actually executed
Update
As mentionned in your comment, you seem to already know what I just wrote, and are just asking for a reason.
It's even simpler: since AsEnumerable casts the results to another type (a IQueryable<T> to IEnumerable<T> in this case), it has to convert all the results rows first, so it has to fetch the data first. It's basically a ToList in this case.
Clearly, you understand why it's better to avoid using AsEnumerable() the way you do in your question.
Also, some of the other answers have made it very clear why calling AsEnumerable() changes the way the query is performed and read. In short, it's because you are then invoking IEnumrable<T> extension methods rather than the IQueryable<T> extension methods, the latter allowing you to combine predicates before executing the query in the database.
However, I still feel that this doesn't answer your actual question, which is a legitimate question. You said (emphasis mine):
What I mean is that this code:
context.Logs.AsEnumerable().Where(x => x.Id % 2 == 0).Take(100).ToList();
will download all the rows from the table before passing any row to the Where() method and there could be millions of rows in the table.
My question to you is: what made you conclude that this is true?
I would argue that, because you are using IEnumrable<T> instead of IQueryable<T>, it's true that the query being performed in the database will be a simple:
select * from logs
... without any predicates, unlike what would have happened if you had used IQueryable<T> to invoke Where and Take.
However, the AsEnumerable() method call does not fetch all the rows at that moment, as other answers have implied. In fact, this is the implementation of the AsEnumerable() call:
public static IEnumerable<TSource> AsEnumerable<TSource>(this IEnumerable<TSource> source)
{
return source;
}
There is no fetching going on there. In fact, even the calls to IEnumerable<T>.Where() and IEnumerable<T>.Take() don't actually start fetching any rows at that moment. They simply setup wrapping IEnumerables that will filter results as they are iterated on. The fetching and iterating of the results really only begins when ToList() is called.
So when you say:
Couldn't EF do on demand loading of rows like you can with plain ADO.NET using Read() method of SqlDataReader and save time and bandwidth?
... again, my question to you would be: doesn't it do that already?
If your table had 1,000,000 rows, I would still expect your code snippet to only fetch up to 100 rows that satisfy your Where condition, and then stop fetching rows.
To prove the point, try running the following little program:
static void Main(string[] args)
{
var list = PretendImAOneMillionRecordTable().Where(i => i < 500).Take(10).ToList();
}
private static IEnumerable<int> PretendImAOneMillionRecordTable()
{
for (int i = 0; i < 1000000; i++)
{
Console.WriteLine("fetching {0}", i);
yield return i;
}
}
... when I run it, I only get the following 10 lines of output:
fetching 0
fetching 1
fetching 2
fetching 3
fetching 4
fetching 5
fetching 6
fetching 7
fetching 8
fetching 9
It doesn't iterate through the whole set of 1,000,000 "rows" even though I am chaining Where() and Take() calls on IEnumerable<T>.
Now, you do have to keep in mind that, for your little EF code snippet, if you test it using a very small table, it may actually fetch all the rows at once, if all the rows fit within the value for SqlConnection.PacketSize. This is normal. Every time SqlDataReader.Read() is called, it never only fetches a single row at a time. To reduce the amount of network call roundtrips, it will always try to fetch a batch of rows at a time. I wonder if this is what you observed, and this mislead you into thinking that AsEnumerable() was causing all rows to be fetched from the table.
Even though you will find that your example doesn't perform nearly as bad as you thought, this would not be a reason not to use IQueryable. Using IQueryable to construct more complex database queries will almost always provide better performance, because you can then benefit from database indexes, etc to fetch results more efficiently.
AsEnumerable() eagerly loads the DbSet<T> Logs
You probably want something like
context.Logs.Where(x => x.Id % 2 == 0).AsEnumerable();
The idea here is that you're applying a predicate filter to the collection before actually loading it from the database.
An impressive subset of the world of LINQ is supported by EF. It will translate your beautiful LINQ queries into SQL expressions behind the scenes.
I have come across this before.
The context command is not executed until a linq function is called, because you have done
context.Logs.AsEnumerable()
it has assumed you have finished with the query and therefore compiled it and returns all rows.
If you changed this to:
context.Logs.Where(x => x.Id % 2 == 0).AsEnumerable()
It would compile a SQL statement that would get only the rows where the id is modular 2.
Similarly if you did
context.Logs.Where(x => x.Id % 2 == 0).Take(100).ToList();
that would create a statement that would get the top 100...
I hope that helps.
LinQ to Entities has a store expression formed by all the Linq methods before It goes to an enumeration.
When you use AsEnumerable() and then Where() like this:
context.Logs.Where(...).AsEnumerable()
The Where() knows that the previous chain call has a store expression so he appends his predicate to It for lazy loading.
The overload of Where that is being called is different if you call this:
context.Logs.AsEnumerable().Where(...)
Here the Where() only knows that his previous method is an enumeration (it could be any kind of "enumerable" collection) and the only way that he can apply his condition is iterating over the collection with the IEnumerable implementation of the DbSet class, which must to retrieve the records from the database first.
I don't think you should ever use this:
context.Logs.AsEnumerable().Where(x => x.Id % 2 == 0).Take(100).ToList();
The correct way of doing things would be:
context.Logs.AsQueryable().Where(x => x.Id % 2 == 0).Take(100).ToList();
Answer with explanations here:
What's the difference(s) between .ToList(), .AsEnumerable(), AsQueryable()?
Why use AsQueryable() instead of List()?

How can I use Math.X functions with LINQ?

I have a simple table (SQL server and EF6) Myvalues, with columns Id & Value (double)
I'm trying to get the sum of the natural log of all values in this table. My LINQ statement is:
var sum = db.Myvalues.Select(x => Math.Log(x.Value)).Sum();
It compiles fine, but I'm getting a RTE:
LINQ to Entities does not recognize the method 'Double Log(Double)' method, and this method cannot be translated into a store expression.
What am I doing wrong/how can I fix this?
FWIW, I can execute the following SQL query directly against the database which gives me the correct answer:
select exp(sum(LogCol)) from
(select log(Myvalues.Value) as LogCol From Myvalues
) results
LINQ tries to translate Math.Log into a SQL command so it is executed against the DB.
This is not supported.
The first solution (for SQL Server) is to use one of the existing SqlFunctions. More specifically, SqlFunctions.Log.
The other solution is to retrieve all your items from your DB using .ToList(), and execute Math.Log with LINQ to Objects (not LINQ to Entities).
As EF cannot translate Math.Log() you could get your data in memory and execute the function form your client:
var sum = db.Myvalues.ToList().Select(x => Math.Log(x.Value)).Sum();

no supported translation to SQL Help Linq -> SQL

I have a query that has a where clause that checks if the data element is contained within a list.
This query executes fine:
results = awardedStats.Where(r => guidReq.Contains(r.RequirementId) || orgAcr.Contains(r.Division))
.Select(r => r);
however this does not:
results = awardedStats.Where(r => guidReq.Contains(r.RequirementId) || orgAcrId.Contains(r.guidDivisionId))
.Select(r => r);
r.division is a string and orgAcr is a List
r.guidDivisionId is a Guid and orgAcrId is a List
I know that each list get the correct values, I can check the list in the debugger, but if I run the first query, everything runs through fine, if I run the second query I get an error stating that the member "guidDivisionId" has no supported translation to SQL
If orgAcrId is a List<Guid> and r.guidDivisionId is a uniqueidentifier column in SQL Server, this should be fine. Are you sure the column name isn't r.DivisionId?
Get all data from sql and call AsEnumarable() method on it and then apply the where. That way comparison would be done in memory and it won't complain about sql translation.
Another thing is that when you use contains, it's converted to Sql IN clause. All elements in the list are included in the IN clause. If your list has more than 2100 elements, then you would get sql exception saying that sql cannot accept more than 2100 parameters. Doing this kind of comparison in memory is safer.

When to prefer joins expressed with SelectMany() over joins expressed with the join keyword in Linq

Linq allows to express inner joins by using the join keyword or by using
SelectMany() (i.e. a couple of from keywords) with a where keyword:
var personsToState = from person in persons
join state in statesOfUS
on person.State equals state.USPS
select new { person, State = state.Name };
foreach (var item in personsToState)
{
System.Diagnostics.Debug.WriteLine(item);
}
// The same query can be expressed with the query operator SelectMany(), which is
// expressed as two from clauses and a single where clause connecting the sequences.
var personsToState2 = from person in persons
from state in statesOfUS
where person.State == state.USPS
select new { person, State = state.Name };
foreach (var item in personsToState2)
{
System.Diagnostics.Debug.WriteLine(item);
}
My question: when is it purposeful to use the join-style and when to use the where-style,
has one style performance advantages over the other style?
For local queries Join is more efficient due to its keyed lookup as Athari mentioned, however for LINQ to SQL (L2S) you'll get more mileage out of SelectMany. In L2S a SelectMany ultimately uses some type of SQL join in the generated SQL depending on your query.
Take a look at questions 11 & 12 of the LINQ Quiz by Joseph/Ben Albahari, authors of C# 4.0 In a Nutshell. They show samples of different types of joins and they state:
With LINQ to SQL, SelectMany-based
joins are the most flexible, and can
perform both equi and non-equi joins.
Throw in DefaultIfEmpty, and you can
do left outer joins as well!
In addition, Matt Warren has a detailed blog post on this topic as it pertains to IQueryable / SQL here: LINQ: Building an IQueryable provider - Part VII.
Back to your question of which to use, you should use whichever query is more readable and allows you to easily express yourself and construct your end goal clearly. Performance shouldn't be an initial concern unless you are dealing with large collections and have profiled both approaches. In L2S you have to consider the flexibility SelectMany offers you depending on the way you need to pair up your data.
Join is more efficient, it uses Lookup class (a variation of Dictionary with multiple values for a single key) to find matching values.

Linq to NHibernate generating 3,000+ SQL statements in one request!

I've been developing a webapp using Linq to NHibernate for the past few months, but haven't profiled the SQL it generates until now. Using NH Profiler, it now seems that the following chunk of code hits the DB more than 3,000 times when the Linq expression is executed.
var activeCaseList = from c in UserRepository.GetCasesByProjectManagerID(consultantId)
where c.CompletionDate == null
select new { c.PropertyID, c.Reference, c.Property.Address, DaysOld = DateTime.Now.Subtract(c.CreationDate).Days, JobValue = String.Format("£{0:0,0}", c.JobValue), c.CurrentStatus };
Where the Repository method looks like:
public IEnumerable<Case> GetCasesByProjectManagerID(int projectManagerId)
{
return from c in Session.Linq<Case>()
where c.ProjectManagerID == projectManagerId
select c;
}
It appears to run the initial Repository query first, then iterates through all of the results checking to see if the CompletionDate is null, but issuing a query to get c.Property.Address first.
So if the initial query returns 2,000 records, even if only five of them have no CompletionDate, it still fires off an SQL query to bring back the address details for the 2,000 records.
The way I had imagined this would work, is that it would evaluate all of the WHERE and SELECT clauses and simply amalgamate them, so the inital query would be like:
SELECT ... WHERE ProjectManager = #p1 AND CompleteDate IS NOT NULL
Which would yield 5 records, and then it could fire the further 5 queries to obtain the addresses. Am I expecting too much here, or am I simply doing something wrong?
Anthony
Change the declaration of GetCasesByProjectManagerID:
public IQueryable<Case> GetCasesByProjectManagerID(int projectManagerId)
You can't compose queries with IEnumerable<T> - they're just sequences. IQueryable<T> is specifically designed for composition like this.
Since I can't add a comment yet. Jon Skeet is right you'll want to use IQueryable, this is allows the Linq provider to Lazily construct the SQL. IEnumerable is the eager version.

Resources