Is it possible to use a method in GroupBy in LINQ?

I am trying to group by the result of a custom method. For example, depending on the group ID, GetClientGroup returns 1 or 0, and then I want to group by that value. But I am getting the following error:
Error
could not be translated. Either rewrite the query in a form that can be translated, or switch to client evaluation explicitly by inserting a call to either AsEnumerable(), AsAsyncEnumerable(), ToList(), or ToListAsync(). See https://go.microsoft.com/fwlink/?linkid=2101038 for more information.
await (from o in _cdsContext.Order
       where o.ClienteleId == clienteleId
             && o.DeliveryDate >= new DateTime(2020, 06, 29).Date
             && o.DeliveryDate != null
       group o by new
       {
           o.ClienteleId,
           o.DeliveryDate,
           ClientGroup = o.OrderTypeId == 22 ? 259 : GetClientGroup(clienteleId, (int)o.GroupId),
       }
       into g
       select new { ClienteleId = g.Key.ClienteleId }).ToListAsync();

I think you get this error at run time, not at compile time. Am I right?
IEnumerable and IQueryable
You should be aware of the difference between IEnumerable<...> and IQueryable<...>.
An object that implements IEnumerable<...> or IQueryable<...> represents the potential to give you an enumerable sequence. Once you've got the sequence, you can ask for the first element, and once you've got that, you can ask for the next element as long as there is one.
Iterating over the elements is usually done with foreach (var element in sequence) {...}, which translates into the following:
IEnumerable<MyType> sequence = ...                          // the potential to get an iterator
IEnumerator<MyType> enumerator = sequence.GetEnumerator();  // get the iterator
while (enumerator.MoveNext())                               // iterate
{                                                           // as long as there are items
    MyType item = enumerator.Current;                       // fetch the item
    ProcessItem(item);                                      // and process it.
}
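A small self-contained sketch (class and method names are illustrative) that collects the same sequence once with foreach and once with the explicit enumerator loop; both produce the same items. The `using` block reflects that the compiler-generated loop also disposes the enumerator, a detail the snippet above leaves out.

```csharp
using System.Collections.Generic;

static class EnumerationDemo
{
    // Collects the items with an ordinary foreach loop.
    public static List<string> WithForeach(IEnumerable<string> sequence)
    {
        var result = new List<string>();
        foreach (var item in sequence)
        {
            result.Add(item);
        }
        return result;
    }

    // Collects the items with the loop that foreach is roughly lowered to.
    public static List<string> WithEnumerator(IEnumerable<string> sequence)
    {
        var result = new List<string>();
        using (IEnumerator<string> enumerator = sequence.GetEnumerator())
        {
            while (enumerator.MoveNext())        // as long as there are items
            {
                result.Add(enumerator.Current);  // fetch the item and process it
            }
        }
        return result;
    }
}
```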
The LINQ methods that don't return IEnumerable<...> or IQueryable<...>, like ToList, ToDictionary, Count, Any and FirstOrDefault, all use foreach or GetEnumerator internally.
An object that implements IEnumerable<...> is meant to be processed by your local process. The object holds everything needed to iterate, including calls to local methods.
On the other hand, an object that implements IQueryable<...>, like your _cdsContext.Order, is meant to be processed by another process, usually a database management system.
This object holds an Expression and a Provider. The Expression is a generic form of the data that you want to query. The Provider knows who has to execute the query and what language is used (usually SQL).
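To see the two parts without a database, a minimal sketch can call AsQueryable() on an in-memory array. This wraps the array in an EnumerableQuery<T>, an in-memory IQueryable<T>, which stands in here for something like _cdsContext.Order; the class name is hypothetical.

```csharp
using System.Linq;

static class QueryableDemo
{
    // AsQueryable() wraps the array in an EnumerableQuery<int>, an in-memory IQueryable.
    public static IQueryable<int> BuildQuery()
    {
        IQueryable<int> numbers = new[] { 1, 2, 3, 4, 5, 6 }.AsQueryable();
        // Where does not filter anything yet: it returns a new IQueryable whose
        // Expression is a method-call node Where(<source>, n => n % 2 == 0).
        return numbers.Where(n => n % 2 == 0);
    }
}
```

Printing `query.Expression` shows the method-call tree, and `query.Provider` is the object that would execute it. With Entity Framework, the provider is the part that produces SQL.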
Chaining LINQ methods won't execute the query; it only changes the Expression. When GetEnumerator() is called (deep inside ToList, foreach, etc.), the Expression is sent to the Provider, which translates it into SQL and executes the query at the DBMS. The fetched data is presented to your process as an iterator, and your process repeatedly calls MoveNext() and Current.
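A minimal sketch of that deferral, again using AsQueryable() as a stand-in for a database: the hypothetical Source() method counts how many elements it has handed out. Composing Where and Select changes nothing; only enumerating (here with ToList()) makes the provider run the expression and pull the elements.

```csharp
using System.Collections.Generic;
using System.Linq;

static class DeferredDemo
{
    // How many elements the "data source" has produced so far.
    public static int Produced;

    // Stand-in for a table: records every element it hands out.
    static IEnumerable<int> Source()
    {
        for (int i = 1; i <= 5; i++)
        {
            Produced++;
            yield return i;
        }
    }

    public static IQueryable<int> Compose()
    {
        // Each call only wraps the Expression in another method-call node;
        // Source() is not iterated here.
        return Source().AsQueryable()
                       .Where(n => n > 2)
                       .Select(n => n * 10);
    }
}
```

After `Compose()` the counter is still 0; after `Compose().ToList()` it is 5 and the list holds 30, 40 and 50.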
Back to your question
Your GroupBy contains a call to a local method. The GroupBy won't execute the query; it only changes the Expression. In the end you call ToListAsync(), which, just like ToList(), enumerates the query. The Expression is sent to the Provider, which tries to translate it into SQL.
Alas, your provider doesn't know your local method GetClientGroup, and thus can't convert it into SQL. In fact, apart from all your local methods, there are also several LINQ methods that can't be translated into SQL. See Supported and Unsupported LINQ Methods (LINQ to Entities).
Your compiler doesn't know which methods the provider can translate, so the compiler won't complain. Only at run time, when you call ToList (or ToListAsync), is the problem detected.
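The following sketch walks the Expression of a GroupBy query with an ExpressionVisitor and lists the method calls it finds. All names are hypothetical, and GetClientGroup here is an invented stand-in, since the real body is not shown. The local method shows up as an ordinary node in the tree. The in-memory EnumerableQuery provider can still run it because it compiles the tree into delegates, but a SQL provider has to map every such call to SQL, and it has no mapping for a method it doesn't know.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

static class LocalMethodDemo
{
    // Hypothetical stand-in for GetClientGroup; the real formula is unknown.
    public static int GetClientGroup(int clienteleId, int groupId) => groupId == 7 ? 1 : 0;

    // Records "DeclaringType.Method" for every method call in an expression tree.
    sealed class CallCollector : ExpressionVisitor
    {
        public readonly List<string> Calls = new List<string>();

        protected override Expression VisitMethodCall(MethodCallExpression node)
        {
            Calls.Add(node.Method.DeclaringType.Name + "." + node.Method.Name);
            return base.VisitMethodCall(node);
        }
    }

    public static List<string> CallsIn(IQueryable query)
    {
        var collector = new CallCollector();
        collector.Visit(query.Expression);
        return collector.Calls;
    }

    public static IQueryable<int> GroupedKeys()
    {
        var groupIds = new[] { 7, 8, 7 }.AsQueryable();
        // Compiles fine: the call is stored as a node in the tree, not executed.
        return groupIds.GroupBy(g => GetClientGroup(1, g)).Select(grp => grp.Key);
    }
}
```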
How to solve the problem
The problem is in the parameter KeySelector of Queryable.GroupBy:
Expression<Func<TSource,TKey>> keySelector
Alas, you didn't show what GetClientGroup does. It seems that it takes the ClienteleId and the GroupId of an Order, and returns an integer that acts as a ClientGroup.
The easiest solution would be to replace the call to GetClientGroup with the code inside that method, without calling any other methods:
DateTime deliveryLimitDate = new DateTime(2020, 06, 29).Date;
var result = dbContext.Orders
    .Where(order => order.ClienteleId == clienteleId
                 && order.DeliveryDate != null
                 && order.DeliveryDate >= deliveryLimitDate)
    .GroupBy(
        // parameter KeySelector
        order => new
        {
            ClienteleId = order.ClienteleId,
            DeliveryDate = order.DeliveryDate,
            ClientGroup = order.OrderTypeId == 22 ? 259 :
                // formula in GetClientGroup(...)
                // for example (note: << binds more loosely than +)
                ((int)order.GroupId << 16) + order.ClienteleId,
        },
        // parameter ResultSelector: receives the key and the orders in the group
        (key, ordersInGroup) => new { ClienteleId = key.ClienteleId });
Instead of a separate Select, I used the GroupBy overload with a parameter ResultSelector. Your result is a sequence of objects with only one property, ClienteleId. Consider returning just a sequence of ClienteleIds:
// parameter ResultSelector
(key, ordersInGroup) => key.ClienteleId);
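Here is a runnable in-memory version of the corrected query. It assumes a hypothetical Order class with the property names from the question (the property types are guesses) and uses the example formula in place of the real GetClientGroup. It calls the GroupBy overload whose ResultSelector takes two parameters, the key and the elements of the group. Note the parentheses around the shift: `<<` binds more loosely than `+` in C#.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity; names follow the question, types are assumptions.
class Order
{
    public int ClienteleId { get; set; }
    public DateTime? DeliveryDate { get; set; }
    public int OrderTypeId { get; set; }
    public int? GroupId { get; set; }
}

static class GroupByDemo
{
    public static List<int> ClienteleIds(IQueryable<Order> orders, int clienteleId, DateTime limit)
    {
        return orders
            .Where(order => order.ClienteleId == clienteleId
                         && order.DeliveryDate != null
                         && order.DeliveryDate >= limit)
            .GroupBy(
                // KeySelector: the example formula is written inline, no method call
                order => new
                {
                    order.ClienteleId,
                    order.DeliveryDate,
                    ClientGroup = order.OrderTypeId == 22
                        ? 259
                        : ((int)order.GroupId << 16) + order.ClienteleId,
                },
                // ResultSelector: replaces a separate Select
                (key, ordersInGroup) => key.ClienteleId)
            .ToList();
    }
}
```

With two matching orders on the same date, one of type 22 and one with GroupId 2, the keys differ (259 versus 131077), so the result contains the ClienteleId twice.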
Alas, since I don't know your GetClientGroup, I can't give you the exact KeySelector.
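If the logic of GetClientGroup cannot be written inline, the error message itself names the other option: switch to client evaluation with AsEnumerable(). A sketch under the same assumptions (hypothetical Order class, invented GetClientGroup body): filter and narrow the rows while the query is still an IQueryable, then group locally. The trade-off is that every row that passes the Where is transferred to your process.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity, mirroring the question's property names.
class Order
{
    public int ClienteleId { get; set; }
    public DateTime? DeliveryDate { get; set; }
    public int OrderTypeId { get; set; }
    public int? GroupId { get; set; }
}

static class ClientSideGrouping
{
    // Invented stand-in for the unknown GetClientGroup.
    static int GetClientGroup(int clienteleId, int groupId) => groupId > 100 ? 1 : 0;

    public static List<int> ClienteleIds(IQueryable<Order> orders, int clienteleId, DateTime limit)
    {
        return orders
            // Still part of the IQueryable: a real provider would translate this filter.
            .Where(o => o.ClienteleId == clienteleId
                     && o.DeliveryDate != null
                     && o.DeliveryDate >= limit)
            // Only the columns the local code needs.
            .Select(o => new { o.ClienteleId, o.DeliveryDate, o.OrderTypeId, o.GroupId })
            // From here on, Enumerable operators run in the local process.
            .AsEnumerable()
            .GroupBy(o => new
            {
                o.ClienteleId,
                o.DeliveryDate,
                ClientGroup = o.OrderTypeId == 22 ? 259 : GetClientGroup(clienteleId, (int)o.GroupId),
            })
            .Select(g => g.Key.ClienteleId)
            .ToList();
    }
}
```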

Related

LINQ to Entities seemingly odd behavior

I have this LINQ query that translates very oddly to SQL. I get the correct results, but there must be a better way. So question 1 is:
Why is it that in SQL I get no GROUP BY and no COUNT, and all of the columns are returned instead of just 2, while the results in C# are correct? (I checked with the profiler.)
and question 2 is:
I would like to modify the query slightly so that I also get the results where the count is 0. At the moment I only get results where the count is > 0, because of the GROUP BY.
LINQ:
List<Tuple<string, int>> countPerType = db1.Audits
    .OrderBy(p => p.CreatedBy)
    .GroupBy(o => new { o.Type, o.CreatedBy })
    .ToList()
    .Select(g => new Tuple<string, int>(
        g.Select(f => f.CreatedBy + ',' + f.Type).FirstOrDefault(),
        (int?)g.Count() ?? 0))
    .ToList();
Note that if I remove the .ToList() in the middle, I get the exception "only parameterless constructors and initializers are supported in linq to entities".
Thanks for your input
You run into several problems. I think the cause of this is that you aren't aware of the difference between queries that are AsEnumerable and queries that are AsQueryable.
AsEnumerable queries contain all information to enumerate over the elements in the query. The query will be executed by your process.
An AsQueryable query contains an Expression and a Provider. The Provider knows who will execute the query and how to communicate with this executor. Quite often the executor will be a database, but it can be other things, like internet queries, JSON files, etc.
In your case the executer will be a database, the language will be SQL.
When the GetEnumerator() function of your IQueryable is called, the Provider is ordered to translate the Expression into the language that the executor knows. The translated query is sent to the executor and the returned data is put into an Enumerator (not an IEnumerable!).
Of course SQL does not know what a System.Tuple is, nor does it know functions like String.operator+.
Therefore your Provider can't translate your expression into SQL. That is the reason you have to do your first ToList().
In an IQueryable query you can't use any of your own functions, and only a limited set of .NET functions.
See this list of supported and unsupported LINQ methods.
It is not advisable to use ToList() at this stage of your query, because it enumerates all elements of your sequence into a list, while in fact you only need an enumerator. It could be that during the rest of your query you only need a few elements. In that case it would be a waste to enumerate over all of them to create a list, and then to enumerate again to do the rest of your LINQ.
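Instead of ToList(), use Enumerable.AsEnumerable(). This does not fetch anything by itself: it only makes the rest of your query use the IEnumerable methods, so the rows are fetched from the database (with the SQL built so far) only when the result is enumerated, and they are not copied into a list first. This allows you to call local functions in the rest of your query.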
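A small sketch of the switch, with hypothetical names and an in-memory AsQueryable() source: on an IQueryable<int>, Where binds to Queryable.Where and the result is still a query, while after AsEnumerable() the same call binds to Enumerable.Where, so the predicate may call a local method.

```csharp
using System.Collections.Generic;
using System.Linq;

static class AsEnumerableDemo
{
    public static IQueryable<int> Source() => new[] { 1, 2, 3, 4 }.AsQueryable();

    // Where on IQueryable<int> binds to Queryable.Where: the result is still a query.
    public static IQueryable<int> StaysQueryable() => Source().Where(n => n > 1);

    // After AsEnumerable() the static type is IEnumerable<int>, so Enumerable.Where
    // is chosen: the predicate is a compiled delegate that runs locally.
    public static IEnumerable<int> RunsLocally() => Source().AsEnumerable().Where(n => IsInteresting(n));

    // A local method: fine for LINQ to Objects, unknown to a SQL provider.
    static bool IsInteresting(int n) => n % 2 == 0;
}
```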
Another problem is that you transport way more data to local memory than you plan to use. One of the slower parts of database queries is the transport of data to your process. You should minimize the amount of data.
You took all Audits, and created groups of Audits that have the same values for (Type, CreatedBy). In other words: all Audits in the same group have the same values for (Type, CreatedBy). This value is also the Key of the group.
You don't want all Audits locally; you only want the Key of the group and the number of elements in the group (= the number of audits whose (Type, CreatedBy) equals the key).
This is the only data you need to transport to local memory: Type, CreatedBy and the number of audits in the group:
var result = db1.Audits
    .GroupBy(o => new { o.Type, o.CreatedBy })
    .Select(group => new
    {
        Type = group.Key.Type,
        CreatedBy = group.Key.CreatedBy,
        AuditCount = group.Count(),
    })
    .OrderBy(item => item.CreatedBy)
    // the data that is left is the data you need locally
    // bring to local memory:
    .AsEnumerable()
    // if you want you can put Type and CreatedBy into one string
    .Select(item => new
    {
        AuditType = item.Type + item.CreatedBy,
        AuditCount = item.AuditCount,
    });
I chose not to put the result in a Tuple, because you would lose the help of the compiler if you mixed up the fields (Item1, Item2). But if you really want to, suit yourself.
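An in-memory check of this approach, assuming a hypothetical Audit class with just Type and CreatedBy: only the key and the count are selected before AsEnumerable(), and the string is built locally afterwards (here in the question's "CreatedBy,Type" format).

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity with the two columns the question groups by.
class Audit
{
    public string Type { get; set; }
    public string CreatedBy { get; set; }
}

static class AuditCounts
{
    public static List<string> CountPerType(IQueryable<Audit> audits)
    {
        return audits
            .GroupBy(o => new { o.Type, o.CreatedBy })
            // Only the key and the count leave the "database".
            .Select(group => new
            {
                group.Key.Type,
                group.Key.CreatedBy,
                AuditCount = group.Count(),
            })
            .OrderBy(item => item.CreatedBy)
            .AsEnumerable()
            // String formatting happens locally, after the transfer.
            .Select(item => item.CreatedBy + "," + item.Type + ": " + item.AuditCount)
            .ToList();
    }
}
```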

Why is Entity Framework's AsEnumerable() downloading all data from the server?

What is the explanation for EF downloading all result rows when AsEnumerable() is used?
What I mean is that this code:
context.Logs.AsEnumerable().Where(x => x.Id % 2 == 0).Take(100).ToList();
will download all the rows from the table before passing any row to the Where() method and there could be millions of rows in the table.
What I would like it to do, is to download only enough to gather 100 rows that would satisfy the Id % 2 == 0 condition (most likely just around 200 rows).
Couldn't EF do on demand loading of rows like you can with plain ADO.NET using Read() method of SqlDataReader and save time and bandwidth?
I suppose that it does not work like that for a reason and I'd like to hear a good argument supporting that design decision.
NOTE: This is a completely contrived example and I know normally you should not use EF this way, but I found this in some existing code and was just surprised my assumptions turned out to be incorrect.
The short answer: The reason for the different behaviors is that, when you use IQueryable directly, a single SQL query can be formed for your entire LINQ query; but when you use IEnumerable, the entire table of data must be loaded.
The long answer: Consider the following code.
context.Logs.Where(x => x.Id % 2 == 0)
context.Logs is of type IQueryable<Log>. IQueryable<Log>.Where is taking an Expression<Func<Log, bool>> as the predicate. The Expression represents an abstract syntax tree; that is, it's more than just code you can run. Think of it as being represented in memory, at runtime, like this:
Lambda (=>)
  Parameters
    Variable: x
  Body
    Equals (==)
      Modulo (%)
        PropertyAccess (.)
          Variable: x
          Property: Id
        Constant: 2
      Constant: 0
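The tree is real data that you can inspect. A minimal sketch, assuming a hypothetical Log class with an int Id, that casts its way down the same nodes as the diagram:

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical entity with an Id column.
class Log
{
    public int Id { get; set; }
}

static class ExpressionTreeDemo
{
    // The compiler turns this lambda into a tree of Expression nodes, not IL.
    public static readonly Expression<Func<Log, bool>> IsEven = x => x.Id % 2 == 0;

    // Walks the tree and names each node: ==, %, x.Id, 2, 0
    public static string[] Describe()
    {
        var equal = (BinaryExpression)IsEven.Body;     // Equals (==)
        var modulo = (BinaryExpression)equal.Left;     // Modulo (%)
        var property = (MemberExpression)modulo.Left;  // PropertyAccess: x.Id
        var two = (ConstantExpression)modulo.Right;    // Constant: 2
        var zero = (ConstantExpression)equal.Right;    // Constant: 0
        return new[]
        {
            equal.NodeType.ToString(),
            modulo.NodeType.ToString(),
            property.Member.Name,
            two.Value.ToString(),
            zero.Value.ToString(),
        };
    }
}
```

`Describe()` returns Equal, Modulo, Id, 2, 0: exactly the structure a provider walks when it emits `"Logs"."Id" % 2 = 0`.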
The LINQ-to-Entities engine can take context.Logs.Where(x => x.Id % 2 == 0) and mechanically convert it into a SQL query that looks something like this:
SELECT *
FROM "Logs"
WHERE "Logs"."Id" % 2 = 0;
If you change your code to context.Logs.Where(x => x.Id % 2 == 0).Take(100), the SQL query becomes something like this:
SELECT *
FROM "Logs"
WHERE "Logs"."Id" % 2 = 0
LIMIT 100;
This is entirely because the LINQ extension methods on IQueryable use Expression instead of just Func.
Now consider context.Logs.AsEnumerable().Where(x => x.Id % 2 == 0). The IEnumerable<Log>.Where extension method is taking a Func<Log, bool> as a predicate. That is only runnable code. It cannot be analyzed to determine its structure; it cannot be used to form a SQL query.
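A short sketch of the asymmetry (names hypothetical): an Expression<Func<...>> can be compiled into a Func or rendered as text, which is what a provider relies on, but a plain Func can only be invoked; there is no built-in way to turn it back into a tree.

```csharp
using System;
using System.Linq.Expressions;

static class FuncVersusExpression
{
    // A delegate: executable code only. Its body cannot be inspected as a tree.
    public static readonly Func<int, bool> IsEvenFunc = id => id % 2 == 0;

    // An expression tree: data describing the same lambda.
    public static readonly Expression<Func<int, bool>> IsEvenExpr = id => id % 2 == 0;

    // A tree can always be compiled into a delegate when local execution is wanted...
    public static Func<int, bool> Compiled() => IsEvenExpr.Compile();

    // ...and it can be rendered or translated, which is what a LINQ provider does.
    public static string Describe() => IsEvenExpr.Body.ToString();
}
```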
Entity Framework and LINQ use deferred execution. It means (among other things) that they will not run the query until they need to enumerate the results: for instance using ToList() or AsEnumerable(), or if the result is used as an enumerator (in a foreach, for instance).
Instead, they build a query from the predicates and return IQueryable objects to further "pre-filter" the results before actually returning them. You can find more information here, for instance. Entity Framework will actually build a SQL query depending on the predicates you have passed it.
In your example:
context.Logs.AsEnumerable().Where(x => x.Id % 2 == 0).Take(100).ToList();
From the Logs table in the context, it fetches all rows and returns an IEnumerable with the results, then filters them, takes the first 100, and finally puts the results in a List.
On the other hand, just removing the AsEnumerable solves your problem:
context.Logs.Where(x => x.Id % 2 == 0).Take(100).ToList();
Here it will build a query/filter, and only when ToList() is executed does it query the database.
It also means that you can dynamically build a complex query without actually running it on the DB until the end, for instance:
var logs = context.Logs.Where(a);      // first filter
if (something)
{
    logs = logs.Where(b);              // second filter
}
var results = logs.Take(100).ToList(); // only here is the query actually executed
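The same pattern can be tried in memory. In this sketch (a hypothetical Log class with Id and Level, and AsQueryable() standing in for context.Logs), the conditional Where is appended to the same expression tree, which is then walked to count the Where calls; with a real provider, all of them would end up in one SQL statement.

```csharp
using System.Linq;
using System.Linq.Expressions;

// Hypothetical entity.
class Log
{
    public int Id { get; set; }
    public string Level { get; set; }
}

static class DynamicQueryDemo
{
    public static IQueryable<Log> Build(IQueryable<Log> logs, bool onlyErrors)
    {
        var query = logs.Where(l => l.Id > 0);             // first filter
        if (onlyErrors)
        {
            query = query.Where(l => l.Level == "Error");  // second filter, same tree
        }
        return query.Take(100);                            // still nothing executed
    }

    // Follows the chain of method calls back to the source and counts the Where calls.
    public static int CountWhereCalls(IQueryable query)
    {
        int count = 0;
        Expression current = query.Expression;
        while (current is MethodCallExpression call)
        {
            if (call.Method.Name == "Where") count++;
            current = call.Arguments[0];
        }
        return count;
    }
}
```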
Update
As mentioned in your comment, you seem to already know what I just wrote, and are just asking for a reason.
It's even simpler: since AsEnumerable casts the results to another type (an IQueryable<T> to IEnumerable<T> in this case), it has to convert all the result rows first, so it has to fetch the data first. It's basically a ToList in this case.
Clearly, you understand why it's better to avoid using AsEnumerable() the way you do in your question.
Also, some of the other answers have made it very clear why calling AsEnumerable() changes the way the query is performed and read. In short, it's because you are then invoking IEnumerable<T> extension methods rather than the IQueryable<T> extension methods, the latter allowing you to combine predicates before executing the query in the database.
However, I still feel that this doesn't answer your actual question, which is a legitimate question. You said (emphasis mine):
What I mean is that this code:
context.Logs.AsEnumerable().Where(x => x.Id % 2 == 0).Take(100).ToList();
will download all the rows from the table before passing any row to the Where() method and there could be millions of rows in the table.
My question to you is: what made you conclude that this is true?
I would argue that, because you are using IEnumerable<T> instead of IQueryable<T>, it's true that the query being performed in the database will be a simple:
select * from logs
... without any predicates, unlike what would have happened if you had used IQueryable<T> to invoke Where and Take.
However, the AsEnumerable() method call does not fetch all the rows at that moment, as other answers have implied. In fact, this is the implementation of the AsEnumerable() call:
public static IEnumerable<TSource> AsEnumerable<TSource>(this IEnumerable<TSource> source)
{
    return source;
}
There is no fetching going on there. In fact, even the calls to IEnumerable<T>.Where() and IEnumerable<T>.Take() don't actually start fetching any rows at that moment. They simply set up wrapping IEnumerables that will filter results as they are iterated over. The fetching and iterating of the results really only begins when ToList() is called.
So when you say:
Couldn't EF do on demand loading of rows like you can with plain ADO.NET using Read() method of SqlDataReader and save time and bandwidth?
... again, my question to you would be: doesn't it do that already?
If your table had 1,000,000 rows, I would still expect your code snippet to only fetch up to 100 rows that satisfy your Where condition, and then stop fetching rows.
To prove the point, try running the following little program:
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        var list = PretendImAOneMillionRecordTable().Where(i => i < 500).Take(10).ToList();
    }

    private static IEnumerable<int> PretendImAOneMillionRecordTable()
    {
        for (int i = 0; i < 1000000; i++)
        {
            Console.WriteLine("fetching {0}", i);
            yield return i;
        }
    }
}
... when I run it, I only get the following 10 lines of output:
fetching 0
fetching 1
fetching 2
fetching 3
fetching 4
fetching 5
fetching 6
fetching 7
fetching 8
fetching 9
It doesn't iterate through the whole set of 1,000,000 "rows" even though I am chaining Where() and Take() calls on IEnumerable<T>.
Now, keep in mind that, for your little EF code snippet, if you test it using a very small table, it may actually fetch all the rows at once if they all fit within the value of SqlConnection.PacketSize. This is normal. SqlDataReader.Read() does not go back to the server for every single row: to reduce the number of network round trips, it will try to fetch a batch of rows at a time. I wonder if this is what you observed, and if it misled you into thinking that AsEnumerable() was causing all rows to be fetched from the table.
Even though you will find that your example doesn't perform nearly as badly as you thought, that is no reason not to use IQueryable. Using IQueryable to construct more complex database queries will almost always provide better performance, because you can then benefit from database indexes, etc., to fetch results more efficiently.
AsEnumerable() eagerly loads the DbSet<T> Logs
You probably want something like
context.Logs.Where(x => x.Id % 2 == 0).AsEnumerable();
The idea here is that you're applying a predicate filter to the collection before actually loading it from the database.
An impressive subset of the world of LINQ is supported by EF. It will translate your beautiful LINQ queries into SQL expressions behind the scenes.
I have come across this before.
The context command is not executed until a LINQ function is called. Because you have done
context.Logs.AsEnumerable()
it has assumed you have finished with the query and therefore compiled it and returns all rows.
If you changed this to:
context.Logs.Where(x => x.Id % 2 == 0).AsEnumerable()
It would compile a SQL statement that would get only the rows where Id % 2 == 0 (the even ids).
Similarly if you did
context.Logs.Where(x => x.Id % 2 == 0).Take(100).ToList();
that would create a statement that would get the top 100...
I hope that helps.
LINQ to Entities builds a store expression from all the LINQ methods before it goes to an enumeration.
When you use Where() and then AsEnumerable() like this:
context.Logs.Where(...).AsEnumerable()
The Where() knows that the previous call in the chain has a store expression, so it appends its predicate to that expression, and the filter is executed in the database.
The overload of Where that is being called is different if you call this:
context.Logs.AsEnumerable().Where(...)
Here the Where() only knows that the previous method returned an enumeration (it could be any kind of "enumerable" collection), and the only way it can apply its condition is by iterating over the collection with the IEnumerable implementation of the DbSet class, which must retrieve the records from the database first.
I don't think you should ever use this:
context.Logs.AsEnumerable().Where(x => x.Id % 2 == 0).Take(100).ToList();
The correct way of doing things would be:
context.Logs.AsQueryable().Where(x => x.Id % 2 == 0).Take(100).ToList();
Answer with explanations here:
What's the difference(s) between .ToList(), .AsEnumerable(), AsQueryable()?
Why use AsQueryable() instead of List()?

Select one unique instance from LINQ query

I'm using LINQ to SQL to obtain data from a set of database tables. The database design is such that given a unique ID from one table (Table A) one and only one instance should be returned from an associated table (Table B).
Is there a more concise way to compose this query and ensure that only one item was returned without using the .Count() extension method like below:
var set = from itemFromA in this.dataContext.TableA
          where itemFromA.ID == inputID
          select itemFromA.ItemFromB;
if (set.Count() != 1)
{
    // Exception!
}
// Have to get individual instance using FirstOrDefault or Take(1)
FirstOrDefault helps somewhat but I want to ensure that the returned set contains only one instance and not more.
It sounds like you want Single:
var set = from itemFromA in this.dataContext.TableA
          where itemFromA.ID == inputID
          select itemFromA.ItemFromB;
var onlyValue = set.Single();
Documentation states:
Returns the only element of a sequence, and throws an exception if there is not exactly one element in the sequence.
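A quick way to see that behavior on in-memory data (the helper name is hypothetical): Single() returns the element when there is exactly one and throws InvalidOperationException for zero or several.

```csharp
using System;
using System.Linq;

static class SingleDemo
{
    // Returns the only element, or the name of the exception Single() throws.
    public static string Try(int[] items)
    {
        try
        {
            return items.AsQueryable().Single().ToString();
        }
        catch (InvalidOperationException)
        {
            // Thrown both for an empty sequence and for more than one element.
            return "InvalidOperationException";
        }
    }
}
```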
Of course that means you don't get to customize the message of the exception... if you need to do that, I'd use something like:
// Make sure that even if something is hideously wrong, we only transfer data
// for two elements...
var list = set.Take(2).ToList();
if (list.Count != 1)
{
    // Throw an exception
}
var item = list[0];
The benefit of this over your current code is that it will avoid evaluating the query more than once.
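If this check is needed in several places, the Take(2) pattern can be wrapped in a small extension method. SingleOrThrow is a hypothetical name, not part of LINQ; it only uses Take and ToList, so with a LINQ provider the Take(2) should still become part of the query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class QueryExtensions
{
    // Hypothetical helper: like Single(), but with a caller-supplied message.
    // Take(2) keeps the transfer to at most two rows even if the data is wrong.
    public static T SingleOrThrow<T>(this IQueryable<T> source, string message)
    {
        List<T> list = source.Take(2).ToList();
        if (list.Count != 1)
        {
            throw new InvalidOperationException(
                message + " (found " + (list.Count == 0 ? "none" : "more than one") + ")");
        }
        return list[0];
    }
}
```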

NHibernate IQueryable doesn't seem to delay execution

I'm using NHibernate 3.2 and I have a repository method that looks like:
public IEnumerable<MyModel> GetActiveMyModel()
{
    return from m in Session.Query<MyModel>()
           where m.Active == true
           select m;
}
Which works as expected. However, sometimes when I use this method I want to filter it further:
var models = MyRepository.GetActiveMyModel();
var filtered = from m in models
               where m.ID < 100
               select new { m.Name };
This produces the same SQL as the first one, and the second filter and select are done after the fact. I thought the whole point of LINQ is that it forms an expression tree that is unravelled when it's needed, so that the correct SQL for the job can be created, saving database requests.
If not, it means all of my repository methods have to return exactly what is needed and I can't make use of LINQ further down the chain without taking a penalty.
Have I got this wrong?
Updated
In response to the comment below: I omitted the line where I iterate over the results, which causes the initial SQL to be run (WHERE Active = 1) and the second filter (ID < 100) is obviously done in .NET.
Also, if I replace the second chunk of code with
var models = MyRepository.GetActiveMyModel();
var filtered = from m in models
               where m.Items.Count > 0
               select new { m.Name };
It generates the initial SQL to retrieve the active records and then runs a separate SQL statement for each record to find out how many Items it has, rather than writing something like I'd expect:
SELECT Name
FROM MyModel m
WHERE Active = 1
AND (SELECT COUNT(*) FROM Items WHERE MyModelID = m.ID) > 0
You are returning IEnumerable<MyModel> from the method, which will cause in-memory evaluation from that point on, even if the underlying sequence is IQueryable<MyModel>.
If you want to allow code after GetActiveMyModel to add to the SQL query, return IQueryable<MyModel> instead.
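A sketch of the difference, with a hypothetical MyModel class and an in-memory array standing in for Session.Query<MyModel>(): both repository methods return the same kind of runtime object, but the declared return type decides which Where the caller's code binds to.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical model; the in-memory array stands in for Session.Query<MyModel>().
class MyModel
{
    public int ID { get; set; }
    public string Name { get; set; }
    public bool Active { get; set; }
}

class MyRepository
{
    private readonly IQueryable<MyModel> models;

    public MyRepository(IEnumerable<MyModel> data)
    {
        models = data.AsQueryable();
    }

    // Declared as IEnumerable: callers get Enumerable.Where (in memory).
    public IEnumerable<MyModel> GetActiveAsEnumerable() => models.Where(m => m.Active);

    // Declared as IQueryable: callers keep extending the same expression tree.
    public IQueryable<MyModel> GetActiveAsQueryable() => models.Where(m => m.Active);
}
```

`repo.GetActiveAsEnumerable().Where(m => m.ID < 100)` produces an in-memory iterator, while `repo.GetActiveAsQueryable().Where(m => m.ID < 100)` is still an IQueryable whose Expression contains both conditions.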
You're running IEnumerable's extension method "Where" instead of IQueryable's. It will still evaluate lazily and give the same output; however, it evaluates the IQueryable on entry and you're filtering the collection in memory instead of against the database.
When you later add an extra condition on another table (the count), it has to lazily fetch each and every one of the Items collections from the database since it has already evaluated the IQueryable before it knew about the condition.
(Yes, I would also like the extension methods on IEnumerable to be virtual members instead, but, alas, they're not.)

Entity Framework LINQ Query using Custom C# Class Method - Once yes, once no - because executing on the client or in SQL?

I have two Entity Framework 4 LINQ queries that make use of a custom class method; one works and one does not:
The custom method is:
public static DateTime GetLastReadToDate(string fbaUsername, Discussion discussion)
{
    return (discussion.DiscussionUserReads
                .Where(dur => dur.User.aspnet_User.UserName == fbaUsername)
                .FirstOrDefault()
            ?? new DiscussionUserRead { ReadToDate = DateTime.Now.AddYears(-99) }).ReadToDate;
}
The linq query that works calls a from after a from, the equivalent of SelectMany():
from g in oc.Users.Where(u => u.aspnet_User.UserName == fbaUsername).First().Groups
from d in g.Discussions
select new
{
    UnReadPostCount = d.Posts.Where(p => p.CreatedDate > DiscussionRepository.GetLastReadToDate(fbaUsername, p.Discussion)).Count()
};
The query that does not work is more like a regular select:
from d in oc.Discussions
where d.Group.Name == "Student"
select new
{
    UnReadPostCount = d.Posts.Where(p => p.CreatedDate > DiscussionRepository.GetLastReadToDate(fbaUsername, p.Discussion)).Count(),
};
The error I get is:
LINQ to Entities does not recognize the method 'System.DateTime GetLastReadToDate(System.String, Discussion)' method, and this method cannot be translated into a store expression.
My question is: why am I able to use my custom GetLastReadToDate() method in the first query and not the second? I suppose this has something to do with what gets executed on the db server and what gets executed on the client? These queries seem to use the GetLastReadToDate() method so similarly, though, that I'm wondering why it would work for the first and not the second, and, most importantly, if there's a way to factor common query syntax like what's in the GetLastReadToDate() method into a separate location to be reused in several different other LINQ queries.
Please note all these queries are sharing the same object context.
I think you're better off using a Model Defined Function here.
Define a scalar function in your database which returns a DateTime, pass through whatever you need, map it on your model, then use it in your LINQ query:
from g in oc.Users.Where(u => u.aspnet_User.UserName == fbaUsername).First().Groups
from d in g.Discussions
select new
{
    UnReadPostCount = d.Posts.Where(p => p.CreatedDate > myFunkyModelFunction(fbaUsername, p.Discussion)).Count()
};
and most importantly if there's a way to factor common query syntax like what's in the GetLastReadToDate() method into a separate location to be reused in several different places LINQ queries.
A stored procedure would probably be one way to store that "common query syntax"... EF, at least 4.0, works very nicely with SPs.
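The difference between the two queries in the question can be reproduced in memory. In the first query, First() executes a query and returns a materialized User, so everything after it (g.Discussions, d.Posts) is LINQ to Objects over ordinary collections, where any C# method may be called. The second query stays an IQueryable all the way down, so the call has to be translated. The sketch below uses hypothetical, heavily simplified classes, and LastReadDay is an invented stand-in for GetLastReadToDate.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, heavily simplified entities.
class Discussion
{
    public List<int> PostDays { get; set; } = new List<int>();
}

class User
{
    public string UserName { get; set; }
    public List<Discussion> Discussions { get; set; } = new List<Discussion>();
}

static class FirstMaterializes
{
    // Invented stand-in for GetLastReadToDate: an ordinary C# method.
    static int LastReadDay(string userName) => 10;

    public static int UnreadCount(IQueryable<User> users, string userName)
    {
        // First() runs the query and returns a plain User object.
        User user = users.Where(u => u.UserName == userName).First();

        // From here on the static types are List<...>, so Enumerable methods are used
        // and the local method is simply invoked for each element.
        return user.Discussions.Sum(d => d.PostDays.Count(day => day > LastReadDay(userName)));
    }
}
```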

Resources