Optimizing orderby sum in LINQ and Entity Framework - linq

I wrote a LINQ query that performs an orderby on an Entity Framework Core (.NET Core 2.0.7) database context using the Sum extension method. It works fine on a small sample database, but when running against a larger database ~100,000 entries, it becomes significantly slower and uses more CPU. I have pasted the relevant code below. Is there a way to perform the Sum faster? (it's essentially a weighted average on an arbitrary number of tuples).
var iqClientIds = (from stat in context.Set<EFClientStatistics>()
join client in context.Clients
on stat.ClientId equals client.ClientId
group stat by stat.ClientId into s
orderby s.Sum(cs => (cs.Performance * cs.TimePlayed)) / s.Sum(cs => cs.TimePlayed) descending
select new
{
s.First().ClientId,
})
.Skip(start)
.Take(count);
Thanks!

EF Core 2 handles GroupJoin with translation to SQL and your query can be converted to use this:
var iqClientIds = (from client in context.Clients
join stat in context.Set<EFClientStatistics>() on client.ClientId equals stat.ClientId into sj
orderby sj.Sum(s => (s.Performance * s.TimePlayed)) / sj.Sum(s => s.TimePlayed) descending
select sj.First().ClientId
)
.Skip(start)
.Take(count);
NOTE: I simplified the result (select) to not create an anonymous object for a single value.

Related

Load only some elements of a nested collection efficiently with LINQ

I have the following LINQ query (using EF Core 6 and MS SQL Server):
var resultSet = dbContext.Systems
.Include(system => system.Project)
.Include(system => system.Template.Type)
.Select(system => new
{
System = system,
TemplateText = system.Template.TemplateTexts.FirstOrDefault(templateText => templateText.Language == locale.LanguageIdentifier),
TypeText = system.Template.Type.TypeTexts.FirstOrDefault(typeText => typeText.Language == locale.LanguageIdentifier)
})
.FirstOrDefault(x => x.System.Id == request.Id);
The requirement is to retrieve the system matching the requested ID and load its project, template and template's type info. The template has multiple TemplateTexts (one for each translated language) but I only want to load the one matching the requested locale, same deal with the TypeTexts elements of the template's type.
The LINQ query above does that in one query and it gets converted to the following SQL query (I edited the SELECT statements to use * instead of the long list of columns generated):
SELECT [t1].*, [t2].*, [t5].*
FROM (
SELECT TOP(1) [p].*, [t].*, [t0].*
FROM [ParkerSystems] AS [p]
LEFT JOIN [Templates] AS [t] ON [p].[TemplateId] = [t].[Id]
LEFT JOIN [Types] AS [t0] ON [t].[TypeId] = [t0].[Id]
LEFT JOIN [Projects] AS [p0] ON [p].[Project_ProjectId] = [p0].[ProjectId]
WHERE [p].[SystemId] = #__request_Id_1
) AS [t1]
LEFT JOIN (
SELECT [t3].*
FROM (
SELECT [t4].*, ROW_NUMBER() OVER(PARTITION BY [t4].[ReferenceId] ORDER BY [t4].[Id]) AS [row]
FROM [TemplateTexts] AS [t4]
WHERE [t4].[Language] = #__locale_LanguageIdentifier_0
) AS [t3]
WHERE [t3].[row] <= 1
) AS [t2] ON [t1].[Id] = [t2].[ReferenceId]
LEFT JOIN (
SELECT [t6].*
FROM (
SELECT [t7].*, ROW_NUMBER() OVER(PARTITION BY [t7].[ReferenceId] ORDER BY [t7].[Id]) AS [row]
FROM [TypeTexts] AS [t7]
WHERE [t7].[Language] = #__locale_LanguageIdentifier_0
) AS [t6]
WHERE [t6].[row] <= 1
) AS [t5] ON [t1].[Id0] = [t5].[ReferenceId]
which is not bad, it's not a super complicated query, but I feel like my requirement can be solved with a much simpler SQL query:
SELECT *
FROM [Systems] AS [p]
JOIN [Templates] AS [t] ON [p].[TemplateId] = [t].[Id]
JOIN [TemplateTexts] AS [tt] ON [p].[TemplateId] = [tt].[ReferenceId]
JOIN [Types] AS [ty] ON [t].[TypeId] = [ty].[Id]
JOIN [TemplateTexts] AS [tyt] ON [ty].[Id] = [tyt].[ReferenceId]
WHERE [p].[SystemId] = #systemId and tt.[Language] = 2 and tyt.[Language] = 2
My question is: is there a different/simpler LINQ expression (either in Method syntax or Query syntax) that produces the same result (get all info in one go) because ideally I'd like to not have to have an anonymous object where the filtered sub-collections are aggregated. For even more brownie points, it'd be great if the generated SQL would be simpler/closer to what I think would be a simple query.
Is there a different/simpler LINQ expression (...) that produces the same result
Yes (maybe) and no.
No, because you're querying dbContext.Systems, therefore EF will return all systems that match your filter, also when they don't have TemplateTexts etc. That's why it has to generate outer joins. EF is not aware of your apparent intention to skip systems without these nested data or of any guarantee that these systems don't occur in the database. (Which you seem to assume, seeing the second query).
That accounts for the left joins to subqueries.
These subqueries are generated because of FirstOrDefault. In SQL it always requires some sort of subquery to get "first" records of one-to-many relationships. This ROW_NUMBER() OVER construction is actually quite efficient. Your second query doesn't have any notion of "first" records. It'll probably return different data.
Yes (maybe) because you also Include data. I'm not sure why. Some people seem to think Include is necessary to make subsequent projections (.Select) work, but it isn't. If that's your reason to use Includes then you can remove them and thus remove the first couple of joins.
OTOH you also Include system.Project which is not in the projection, so you seem to have added the Includes deliberately. And in this case they have effect, because the entire entity system is in the projection, otherwise EF would ignore them.
If you need the Includes then again, EF has to generate outer joins for the reason mentioned above.
EF decides to handle the Includes and projections separately, while hand-crafted SQL, aided by prior knowledge of the data could do that more efficiently. There's no way to affect that behavior though.
This LINQ query is close to your SQL, but I'm afraid of correctness of the result:
var resultSet =
(from system in dbContext.Systems
from templateText in system.Template.TemplateTexts
where templateText.Language == locale.LanguageIdentifier
from typeText in system.Template.Type.TypeTexts
where typeText.Language == locale.LanguageIdentifier
select new
{
System = system,
TemplateText = templateText
TypeText = typeText
})
.FirstOrDefault(x => x.System.Id == request.Id);

Entity Framework Core + Count with Group By

I have a table which contains ~600k records and 33 columns. In my project I am using EF Core (2.0.1) to retrieve data from database. I am having issues with below code:
var theCounter = (from f in _context.tblData.Take(100000)
group f by f.TypeId into data
select new DataDto { ID = data.Key, Count = data.Count() }).ToList();
This code is a part of REST API and when I am testing it from SOAP UI, I am gettin timeout error. When I tested the code for
Take(1000)
There are around 300 unique TypeIds.
it works fine. Any ideas how I can make it work?
-- EDIT 1:
Here is what I see when debugging the code:
Microsoft.EntityFrameworkCore.Query:Warning: Query: '(from TblData <generated>_1 in DbSet<TblData> select [<generated>_1]).Take(__p_0)' uses a row limiting operation (Skip/Take) without OrderBy which may lead to unpredictable results.
Microsoft.EntityFrameworkCore.Query:Warning: Query: '(from TblData <generated>_1 in DbSet<TblData> select [<generated>_1]).Take(__p_0)' uses a row limiting operation (Skip/Take) without OrderBy which may lead to unpredictable results.
Microsoft.EntityFrameworkCore.Query:Warning: The LINQ expression 'GroupBy([f].TypeId, [f])' could not be translated and will be evaluated locally.
Microsoft.EntityFrameworkCore.Query:Warning: The LINQ expression 'GroupBy([f].TypeId, [f])' could not be translated and will be evaluated locally.
Microsoft.EntityFrameworkCore.Query:Warning: The LINQ expression 'Count()' could not be translated and will be evaluated locally.
Microsoft.EntityFrameworkCore.Database.Command:Information: Executed DbCommand (131ms) [Parameters=[#__p_0='?'], CommandType='Text', CommandTimeout='30']
SELECT [t2].[Id], [t2].[at], [t2].[add], [t2].[AddDate], [t2].[aftc], [t2].[aftcd], [t2].[aid], [t2].[afl], [t2].[prdid], [t2].[cid], [t2].[TypeId], [t2].[env], [t2].[ext], [t2].[extddcode], [t2].[fn], [t2].[fn], [t2].[fic], [t2].[gid], [t2].[grp], [t2].[hnm], [t2].[IP], [t2].[icid], [t2].[ln], [t2].[lg], [t2].[pcid], [t2].[ret], [t2].[rts], [t2].[rnam], [t2].[sled], [t2].[seq], [t2].[sid], [t2].[styp]
FROM (
SELECT TOP(#__p_0) [t1].[Id], [t1].[at], [t1].[add], [t1].[AddDate], [t1].[aftc], [t1].[aftcd], [t1].[aid], [t1].[afl], [t1].[prdid], [t1].[cid], [t1].[TypeId], [t1].[env], [t1].[ext], [t1].[extddcode], [t1].[fn], [t1].[fn], [t1].[fic], [t1].[gid], [t1].[grp], [t1].[hnm], [t1].[IP], [t1].[icid], [t1].[ln], [t1].[lg], [t1].[pcid], [t1].[ret], [t1].[rts], [t1].[rnam], [t1].[sled], [t1].[seq], [t1].[sid], [t1].[styp]
FROM [TblData] AS [t1]
) AS [t2]
WHERE [t2].[TypeId] IS NOT NULL
ORDER BY [t2].[TypeId]
I think it is not translated properly. Any ideas why?
-- EDIT 2:
I have changed my queries to:
var query = _context.TblData
.Select(a => new {ID = a.Id, TypeId= a.TypeId})
.Distinct();
var q1 = query.GroupBy(p => p.TypeId)
.Select(g => new DataDto {TypeId= g.Key, Count = g.Count()});
return await q1.ToListAsync();
But it was translated to:
SELECT DISTINCT [a0].[Id], [a0].[TypeId] AS [TypeId]
FROM [tblData] AS [a0]
ORDER BY [a0].[TypeId]
When I checked directly in the database this query takes 14 seconds to execute. Any idea why it was not translated to something like:
SELECT DISTINCT [a0].[Id], COUNT([TypeId]) AS [TypeId]
FROM [tblData] AS [a0]
GROUP BY COUNT([a0].[Id])
ORDER BY [a0].[TypeId]
I had to upgrade EF Core version to 2.1 and LINQ is now translated properly into SQL.

EF - Linq Expression and using a List of Ints to get best performance

So I have a list(table) of about 100k items and I want to retrieve all values that match a given list.
I have something like this.
the Table Sections key is NOT a primary key, so I'm expecting each value in listOfKeys to return a few rows.
List<int> listOfKeys = new List<int>(){1,3,44};
var allSections = Sections.Where(s => listOfKeys.Contains(s.id));
I don't know if it makes a difference but generally listOfKeys will only have between 1 to 3 items.
I'm using the Entity Framework.
So my question is, is this the best / fastest way to include a list in a linq expression?
I'm assuming that it isn't better to use another .NETICollection data object. Should I be using a Union or something?
Thanks
Suppose the listOfKeys will contain only small about of items and it's local list (not from database), like <50, then it's OK. The query generated will be basically WHERE id in (...) or WHERE id = ... OR id = ... ... and that's OK for database engine to handle it.
A Join would probably be more efficient:
var allSections =
from s in Sections
join k in listOfKeys on s.id equals k
select s;
Or, if you prefer the extension method syntax:
var allSections = Sections.Join(listOfKeys, s => s.id, k => k, (s, k) => s);

(Fluent)NHibernate SELECT IN

Given the following tables:
tool
*toolid
*n other fields
process
*processid
*n other fields
toolprocess
*toolprocessid
*toolid
*processid
*n other fields
When trying to select all tools for a specific process I get up to a few thousand selects on toolprocess where my Linq looks like this:
from tool in tools
where toolprocesses.Any(t=>t.Tool.Id==tool.Id)
select tool
where toolprocesses contains the list of toolprocesses with the same processid
In SQL I would just write
SELECT * FROM TOOL WHERE toolid IN
(SELECT TOOLID FROM TOOLPROCESS WHERE processid = 'someid');
It takes almost no time and works as expected
How can I get NHibernate to create this query (or something similar)?
I don't know if you can do it in Query, but you can do it in QueryOver / Criteria.
In QueryOver it would look like:
var subQuery = QueryOver.Of<Toolprocess>()
.Where(x => x.Process.Id == id)
.Select(x => x.Tool.Id);
var result = session.QueryOver<Tool>()
.WithSubquery.WhereProperty(x => x.Id).In(subQuery)
.List();
http://www.philliphaydon.com/2010/09/28/queryover-with-nhibernate-3-lovin-it/
Alternatively, if you want to do Exists rather than In, I've blogged about it here:
http://www.philliphaydon.com/2011/01/19/revisiting-exists-in-nhibernate-3-0-and-queryover/
Try
from t in Session.Query<Tool>()
join tp in Session.Query<Toolprocess>() on t equals tp.Tool
where tp.Process.Id == 'someid'
select t;
I assume that you're using NH 3.X. This should be even faster than the Select...Where...In query.

Linq filter collection with EF

I'm trying to get Entity Framework to select an object and filter its collection at the same time. I have a JobSeries object which has a collection of jobs, what I need to do is select a jobseries by ID and filter all the jobs by SendDate but I can't believe how difficult this simple query is!
This is the basic query which works:
var q = from c in KnowledgeStoreEntities.JobSeries
.Include("Jobs.Company")
.Include("Jobs.Status")
.Include("Category")
.Include("Category1")
where c.Id == jobSeriesId
select c;
Any help would be appreciated, I've been trying to find something in google and what I want to do is here:http://blogs.msdn.com/bethmassi/archive/2009/07/16/filtering-entity-framework-collections-in-master-detail-forms.aspx
It's in VB.NET though and I couldn't convert it to C#.
EDIT: I've tried this now and it doesn't work!:
var q = from c in KnowledgeStoreEntities.JobSeries
.Include("Jobs")
.Include("Jobs.Company")
.Include("Jobs.Status")
.Include("Category")
.Include("Category1")
where (c.Id == jobSeriesId & c.Jobs.Any(J => J.ArtworkId == "13"))
select c;
Thanks
Dan
Include can introduce performance problems. Lazy loading is guaranteed to introduce performance problems. Projection is cheap and easy:
var q = from c in KnowledgeStoreEntities.JobSeries
where c.Id == jobSeriesId
select new
{
SeriesName = c.Name,
Jobs = from j in c.Jobs
where j.SendDate == sendDate
select new
{
Name = j.Name
}
CategoryName = c.Category.Name
};
Obviously, I'm guessing at the names. But note:
Filtering works.
SQL is much simpler.
No untyped strings anywhere.
You always get the data you need, without having to specify it in two places (Include and elsewhere).
No bandwith penalties for retrieving columns you don't need.
Free performance boost in EF 4.
The key is to think in LINQ, rather than in SQL or in materializing entire entities for no good reason as you would with older ORMs.
I've long given up on .Include() and implemented Lazy loading for Entity Framework

Resources