Optimizing LINQ Any() call in Entity Framework - linq

After profiling my Entity Framework 4.0 based database layer I have found the major performance sinner to be a simple LINQ Any() I use to check if an entity is already existing in the database. The Any() check performs orders of magnitude slower than saving the entity.
There are relatively few rows in the database and the columns being checked are indexed.
I use the following LINQ to check for the existence of a setting group:
from sg in context.SettingGroups
where sg.Group.Equals(settingGroup) && sg.Category.Equals(settingCategory)
select sg).Any()
This generates the following SQL (additionally my SQL profiler claims the query is executed twice):
exec sp_executesql N'SELECT
CASE WHEN ( EXISTS (SELECT
1 AS [C1]
FROM [dbo].[SettingGroups] AS [Extent1]
WHERE ([Extent1].[Group] = #p__linq__0) AND ([Extent1].[Category] = #p__linq__1)
)) THEN cast(1 as bit) WHEN ( NOT EXISTS (SELECT
1 AS [C1]
FROM [dbo].[SettingGroups] AS [Extent2]
WHERE ([Extent2].[Group] = #p__linq__0) AND ([Extent2].[Category] = #p__linq__1)
)) THEN cast(0 as bit) END AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable1]',N'#p__linq__0 nvarchar(4000),#p__linq__1 nvarchar(4000)',#p__linq__0=N'Cleanup',#p__linq__1=N'Mediator'
Right now I can only think of creating stored procedures to solve this problem, but I would of course prefer to keep the code in LINQ.
Is there a way to make such an "Exist" check run faster with EF?
I should probably mention that I also use self-tracking-entities in an n-tier architecture. In some scenarios the ChangeTracker state for some entities is set to "Added" even though they already exist in the database. This is why I use a check to change the ChangeTracker state accordingly if updating the database caused an insert failure exception.

Try adding index to the database table "SettingGroups", by Group & Category.
BTW, does this produce similar sql?
var ok = context.SettingGroups.Any(sg => sg.Group==settingGroup && sg.Category==settingCategory);

The problem is Entity Framework (at least EF4) is generating stupid SQL. The following code seems to generate decent SQL with minimal pain.
public static class LinqExt
{
public static bool BetterAny<T>( this IQueryable<T> queryable, Expression<Func<T, bool>> predicate)
{
return queryable.Where(predicate).Select(x => (int?)1).FirstOrDefault().HasValue;
}
public static bool BetterAny<T>( this IQueryable<T> queryable)
{
return queryable.Select(x => (int?)1).FirstOrDefault().HasValue;
}
}
Then you can do:
(from sg in context.SettingGroups
where sg.Group.Equals(settingGroup) && sg.Category.Equals(settingCategory)
select sg).BetterAny()
or even:
context.SettingGroups.BetterAny(sg => sg.Group.Equals(settingGroup) && sg.Category.Equals(settingCategory));

I know it sounds a miserable solution, but what happens if you use Count instead of Any?

Have you profiled the time to execute the generated select statement against the time to execute the select what you would expect/like to be produced ? It is possible that it is not as bad as it looks.
The section
SELECT
1 AS [C1]
FROM [dbo].[SettingGroups] AS [Extent1]
WHERE ([Extent1].[Group] = #p__linq__0) AND ([Extent1].[Category] = #p__linq__1)
is probably close to what you would expect to be produced. It is quite possible that the query optimiser will realise the second query is the same as the first and hence it may add very little time to the overall query.

Related

Load only some elements of a nested collection efficiently with LINQ

I have the following LINQ query (using EF Core 6 and MS SQL Server):
var resultSet = dbContext.Systems
.Include(system => system.Project)
.Include(system => system.Template.Type)
.Select(system => new
{
System = system,
TemplateText = system.Template.TemplateTexts.FirstOrDefault(templateText => templateText.Language == locale.LanguageIdentifier),
TypeText = system.Template.Type.TypeTexts.FirstOrDefault(typeText => typeText.Language == locale.LanguageIdentifier)
})
.FirstOrDefault(x => x.System.Id == request.Id);
The requirement is to retrieve the system matching the requested ID and load its project, template and template's type info. The template has multiple TemplateTexts (one for each translated language) but I only want to load the one matching the requested locale, same deal with the TypeTexts elements of the template's type.
The LINQ query above does that in one query and it gets converted to the following SQL query (I edited the SELECT statements to use * instead of the long list of columns generated):
SELECT [t1].*, [t2].*, [t5].*
FROM (
SELECT TOP(1) [p].*, [t].*, [t0].*
FROM [ParkerSystems] AS [p]
LEFT JOIN [Templates] AS [t] ON [p].[TemplateId] = [t].[Id]
LEFT JOIN [Types] AS [t0] ON [t].[TypeId] = [t0].[Id]
LEFT JOIN [Projects] AS [p0] ON [p].[Project_ProjectId] = [p0].[ProjectId]
WHERE [p].[SystemId] = #__request_Id_1
) AS [t1]
LEFT JOIN (
SELECT [t3].*
FROM (
SELECT [t4].*, ROW_NUMBER() OVER(PARTITION BY [t4].[ReferenceId] ORDER BY [t4].[Id]) AS [row]
FROM [TemplateTexts] AS [t4]
WHERE [t4].[Language] = #__locale_LanguageIdentifier_0
) AS [t3]
WHERE [t3].[row] <= 1
) AS [t2] ON [t1].[Id] = [t2].[ReferenceId]
LEFT JOIN (
SELECT [t6].*
FROM (
SELECT [t7].*, ROW_NUMBER() OVER(PARTITION BY [t7].[ReferenceId] ORDER BY [t7].[Id]) AS [row]
FROM [TypeTexts] AS [t7]
WHERE [t7].[Language] = #__locale_LanguageIdentifier_0
) AS [t6]
WHERE [t6].[row] <= 1
) AS [t5] ON [t1].[Id0] = [t5].[ReferenceId]
which is not bad, it's not a super complicated query, but I feel like my requirement can be solved with a much simpler SQL query:
SELECT *
FROM [Systems] AS [p]
JOIN [Templates] AS [t] ON [p].[TemplateId] = [t].[Id]
JOIN [TemplateTexts] AS [tt] ON [p].[TemplateId] = [tt].[ReferenceId]
JOIN [Types] AS [ty] ON [t].[TypeId] = [ty].[Id]
JOIN [TemplateTexts] AS [tyt] ON [ty].[Id] = [tyt].[ReferenceId]
WHERE [p].[SystemId] = #systemId and tt.[Language] = 2 and tyt.[Language] = 2
My question is: is there a different/simpler LINQ expression (either in Method syntax or Query syntax) that produces the same result (get all info in one go) because ideally I'd like to not have to have an anonymous object where the filtered sub-collections are aggregated. For even more brownie points, it'd be great if the generated SQL would be simpler/closer to what I think would be a simple query.
Is there a different/simpler LINQ expression (...) that produces the same result
Yes (maybe) and no.
No, because you're querying dbContext.Systems, therefore EF will return all systems that match your filter, also when they don't have TemplateTexts etc. That's why it has to generate outer joins. EF is not aware of your apparent intention to skip systems without these nested data or of any guarantee that these systems don't occur in the database. (Which you seem to assume, seeing the second query).
That accounts for the left joins to subqueries.
These subqueries are generated because of FirstOrDefault. In SQL it always requires some sort of subquery to get "first" records of one-to-many relationships. This ROW_NUMBER() OVER construction is actually quite efficient. Your second query doesn't have any notion of "first" records. It'll probably return different data.
Yes (maybe) because you also Include data. I'm not sure why. Some people seem to think Include is necessary to make subsequent projections (.Select) work, but it isn't. If that's your reason to use Includes then you can remove them and thus remove the first couple of joins.
OTOH you also Include system.Project which is not in the projection, so you seem to have added the Includes deliberately. And in this case they have effect, because the entire entity system is in the projection, otherwise EF would ignore them.
If you need the Includes then again, EF has to generate outer joins for the reason mentioned above.
EF decides to handle the Includes and projections separately, while hand-crafted SQL, aided by prior knowledge of the data could do that more efficiently. There's no way to affect that behavior though.
This LINQ query is close to your SQL, but I'm afraid of correctness of the result:
var resultSet =
(from system in dbContext.Systems
from templateText in system.Template.TemplateTexts
where templateText.Language == locale.LanguageIdentifier
from typeText in system.Template.Type.TypeTexts
where typeText.Language == locale.LanguageIdentifier
select new
{
System = system,
TemplateText = templateText
TypeText = typeText
})
.FirstOrDefault(x => x.System.Id == request.Id);

Entity Framework Core + Count with Group By

I have a table which contains ~600k records and 33 columns. In my project I am using EF Core (2.0.1) to retrieve data from database. I am having issues with below code:
var theCounter = (from f in _context.tblData.Take(100000)
group f by f.TypeId into data
select new DataDto { ID = data.Key, Count = data.Count() }).ToList();
This code is a part of REST API and when I am testing it from SOAP UI, I am gettin timeout error. When I tested the code for
Take(1000)
There are around 300 unique TypeIds.
it works fine. Any ideas how I can make it work?
-- EDIT 1:
Here is what I see when debugging the code:
Microsoft.EntityFrameworkCore.Query:Warning: Query: '(from TblData <generated>_1 in DbSet<TblData> select [<generated>_1]).Take(__p_0)' uses a row limiting operation (Skip/Take) without OrderBy which may lead to unpredictable results.
Microsoft.EntityFrameworkCore.Query:Warning: Query: '(from TblData <generated>_1 in DbSet<TblData> select [<generated>_1]).Take(__p_0)' uses a row limiting operation (Skip/Take) without OrderBy which may lead to unpredictable results.
Microsoft.EntityFrameworkCore.Query:Warning: The LINQ expression 'GroupBy([f].TypeId, [f])' could not be translated and will be evaluated locally.
Microsoft.EntityFrameworkCore.Query:Warning: The LINQ expression 'GroupBy([f].TypeId, [f])' could not be translated and will be evaluated locally.
Microsoft.EntityFrameworkCore.Query:Warning: The LINQ expression 'Count()' could not be translated and will be evaluated locally.
Microsoft.EntityFrameworkCore.Database.Command:Information: Executed DbCommand (131ms) [Parameters=[#__p_0='?'], CommandType='Text', CommandTimeout='30']
SELECT [t2].[Id], [t2].[at], [t2].[add], [t2].[AddDate], [t2].[aftc], [t2].[aftcd], [t2].[aid], [t2].[afl], [t2].[prdid], [t2].[cid], [t2].[TypeId], [t2].[env], [t2].[ext], [t2].[extddcode], [t2].[fn], [t2].[fn], [t2].[fic], [t2].[gid], [t2].[grp], [t2].[hnm], [t2].[IP], [t2].[icid], [t2].[ln], [t2].[lg], [t2].[pcid], [t2].[ret], [t2].[rts], [t2].[rnam], [t2].[sled], [t2].[seq], [t2].[sid], [t2].[styp]
FROM (
SELECT TOP(#__p_0) [t1].[Id], [t1].[at], [t1].[add], [t1].[AddDate], [t1].[aftc], [t1].[aftcd], [t1].[aid], [t1].[afl], [t1].[prdid], [t1].[cid], [t1].[TypeId], [t1].[env], [t1].[ext], [t1].[extddcode], [t1].[fn], [t1].[fn], [t1].[fic], [t1].[gid], [t1].[grp], [t1].[hnm], [t1].[IP], [t1].[icid], [t1].[ln], [t1].[lg], [t1].[pcid], [t1].[ret], [t1].[rts], [t1].[rnam], [t1].[sled], [t1].[seq], [t1].[sid], [t1].[styp]
FROM [TblData] AS [t1]
) AS [t2]
WHERE [t2].[TypeId] IS NOT NULL
ORDER BY [t2].[TypeId]
I think it is not translated properly. Any ideas why?
-- EDIT 2:
I have changed my queries to:
var query = _context.TblData
.Select(a => new {ID = a.Id, TypeId= a.TypeId})
.Distinct();
var q1 = query.GroupBy(p => p.TypeId)
.Select(g => new DataDto {TypeId= g.Key, Count = g.Count()});
return await q1.ToListAsync();
But it was translated to:
SELECT DISTINCT [a0].[Id], [a0].[TypeId] AS [TypeId]
FROM [tblData] AS [a0]
ORDER BY [a0].[TypeId]
When I checked directly in the database this query takes 14 seconds to execute. Any idea why it was not translated to something like:
SELECT DISTINCT [a0].[Id], COUNT([TypeId]) AS [TypeId]
FROM [tblData] AS [a0]
GROUP BY COUNT([a0].[Id])
ORDER BY [a0].[TypeId]
I had to upgrade EF Core version to 2.1 and LINQ is now translated properly into SQL.

.NET LINQ Expression<Func<T, bool>> Performance Issues

I have two functions as follow -
public IQueryable<RequestSummaryDTO> GetProgramOfficerUSA(Guid officerId)
{
List<string> officerCountries = UnityProvider.Instance.Get<IProgramOfficerService>().GetPOCountries(officerId).Select(c => c.CNTR_ID.ToUpper()).ToList();
Expression<Func<TaskRequest, bool>> countriesFilter = (a) => officerCountries.Contains(a.tblTaskDetail.FirstOrDefault().tblOrganization.ORG_Country.ToUpper());
Expression<Func<TaskRequest, bool>> USAAndNotDelegated = LinqUtils.And(this.USAFilter(), this.NotDelegatedFilter(officerId));
Expression<Func<TaskRequest, bool>> countriesOrOwned = LinqUtils.Or(countriesFilter, this.OwnedFilter(officerId));
Expression<Func<TaskRequest, bool>> filter = LinqUtils.And(USAAndNotDelegated, countriesOrOwned);
return this.Get(filter, TaskRequestState.USA);
}
private IQueryable<RequestSummaryDTO> Get(Expression<Func<TaskRequest, bool>> additionalFilter, TaskRequestState? TaskRequestState = null)
{
var Tasks = this.TaskRequestRepository.List(additionalFilter).Where(x => x.tblTaskDetail.FirstOrDefault().PD_TaskRequestID == x.Id &&
(x.tblRequestDetail.AD_Status == (int)RequestStatus.Paid || x.tblRequestDetail.AD_Status == (int)RequestStatus.NotConfirmed));
if (TaskRequestState == TaskRequestState.USA)
{
Tasks = Tasks.Where(w => (w.tblTaskDetail.FirstOrDefault().PD_PStatus == null || w.tblTaskDetail.FirstOrDefault().PD_PStatus == 125));
}
else
{
Tasks = Tasks.Where(w => w.State == null);
}
return Tasks.ToList().Select(TaskSummaryFactory.CreateDto).AsQueryable();
}
I used Expression> and IQueryable in the LINQ. It is supposed to use LINQ To SQL instead of LINQ To Objects. The performance should be good.
But I do not see that. I believe the LINQ pulls every table's data into memory to process using LINQ To Objects. I do not see a join sql query sent to SQL Server tracking by SQL profile but a bunch of individual table selection statement.
It takes over 10 minutes to get something return from
return Tasks.ToList().Select(TaskSummaryFactory.CreateDto).AsQueryable();
I know the problem is in
Expression<Func<TaskRequest, bool>> countriesFilter = (a) => officerCountries.Contains(a.tblTaskDetail.FirstOrDefault().tblOrganization.ORG_Country.ToUpper());
tblTaskDetail is a big table. If I switch to a smaller one, the performance is noticeably improved.
Anyone can help find out why is wrong there.
Thanks,
Update 1 -
The statement from Entity Framework logged in SQL Profile are all like this -
exec sp_executesql N'SELECT [Extent1].[ORG_ID] AS [ORG_ID], [Extent1].[ORG_CreatedBy] AS [ORG_CreatedBy], [Extent1].[ORG_CreatedOn] AS [ORG_CreatedOn] FROM [dbo].[tblOrganization] AS [Extent1] WHERE [Extent1].[ORG_ID] = #EntityKeyValue1',N'#EntityKeyValue1 uniqueidentifier',#EntityKeyValue1='E8C3F120-AA40-445E-A8A0-2937F330D347'
They are all just having individual table select statement, not joined sql statement.
Update 2 -
I was wrong in Update 1. I missed the join SQL statement. The problem is that the generated SQL is too poor. There are 6 nested select statements, 11 LEFT OUTER JOINs, and 10 OUTER APPLYs. The query is too long and can not post here. Executing the generated SQL takes 9 minutes.
I have upgraded the EF STE 5 to EF6. Now the performance is better. The page loading time is cut down to 1.5 minutes from 12 minutes.

converting sql to linq woes

At my job our main application was written long ago before n-tier was really a thing, ergo - it has tons and tons of business logic begin handled in stored procs and such.
So we have finally decided to bite the bullet and make it not suck so bad. I have been tasked with converting a 900+ line sql script to a .NET exe, which I am doing in C#/Linq. Problem is...for the last 5-6 years at another job, I had been doing Linq exclusively, so my SQL has gotten somewhat rusty, and some of thing I am converting I have never tried to do before in Linq, so I'm hitting some roadblocks.
Anyway, enough whining.
I'm having trouble with the following sql statement, I think due to the fact that he is joining on a temp table and a derived table. Here's the SQL:
insert into #processedBatchesPurgeList
select d.pricebatchdetailid
from pricebatchheader h (nolock)
join pricebatchstatus pbs (nolock) on h.pricebatchstatusid = pbs.pricebatchstatusid
join pricebatchdetail d (nolock) on h.pricebatchheaderid = d.pricebatchheaderid
join
( -- Grab most recent REG.
select
item_key
,store_no
,pricebatchdetailid = max(pricebatchdetailid)
from pricebatchdetail _pbd (nolock)
join pricechgtype pct (nolock) on _pbd.pricechgtypeid = pct.pricechgtypeid
where
lower(rtrim(ltrim(pct.pricechgtypedesc))) = 'reg'
and expired = 0
group by item_key, store_no
) dreg
on d.item_key = dreg.item_key
and d.store_no = dreg.store_no
where
d.pricebatchdetailid < dreg.pricebatchdetailid -- Make sure PBD is not most recent REG.
and h.processeddate < #processedBatchesPurgeDateLimit
and lower(rtrim(ltrim(pbs.pricebatchstatusdesc))) = 'processed' -- Pushed/processed batches only.
So that's raising an overall question first: how to handle temp tables in Linq? This script uses about 10 of them. I currently have them as List. The problem is, if I try to .Join() on one in a query, I get the "Local sequence cannot be used in LINQ to SQL implementations of query operators except the Contains operator." error.
I was able to get the join to the derived table to work using 2 queries, just so a single one wouldn't get nightmarishly long:
var dreg = (from _pbd in db.PriceBatchDetails.Where(pbd => pbd.Expired == false && pbd.PriceChgType.PriceChgTypeDesc.ToLower().Trim() == "reg")
group _pbd by new { _pbd.Item_Key, _pbd.Store_No } into _pbds
select new
{
Item_Key = _pbds.Key.Item_Key,
Store_No = _pbds.Key.Store_No,
PriceBatchDetailID = _pbds.Max(pbdet => pbdet.PriceBatchDetailID)
});
var query = (from h in db.PriceBatchHeaders.Where(pbh => pbh.ProcessedDate < processedBatchesPurgeDateLimit)
join pbs in db.PriceBatchStatus on h.PriceBatchStatusID equals pbs.PriceBatchStatusID
join d in db.PriceBatchDetails on h.PriceBatchHeaderID equals d.PriceBatchHeaderID
join dr in dreg on new { d.Item_Key, d.Store_No } equals new { dr.Item_Key, dr.Store_No }
where d.PriceBatchDetailID < dr.PriceBatchDetailID
&& pbs.PriceBatchStatusDesc.ToLower().Trim() == "processed"
select d.PriceBatchDetailID);
So that query gives the expected results, which I am holding in a List, but then I need to join the results of that query to another one selected from the database, which is leading me back to the aforementioned "Local sequence cannot be used..." error.
That query is this:
insert into #pbhArchiveFullListSaved
select h.pricebatchheaderid
from pricebatchheader h (nolock)
join pricebatchdetail d (nolock)
on h.pricebatchheaderid = d.pricebatchheaderid
join #processedBatchesPurgeList dlist
on d.pricebatchdetailid = dlist.pricebatchdetailid -- PBH list is restricted to PBD purge list rows that have PBH references.
group by h.pricebatchheaderid
The join there on #processedBatchesPurgeList is the problem I am running into.
So uh...help? I have never written SQL like this, and certainly never tried to convert it to Linq.
As pointed out by the comments above, this is no longer being rewritten as Linq.
Was hoping to get a performance improvement along with achieving better SOX compliance, which was the whole reason for the rewrite in the first place.
I'm happy with just satisfying the SOX compliance issues.
Thanks, everyone.

LINQ to SQL Queries odd Materialization

I ran across an interesting Linq to SQL, uh, feature, the other day. Perhaps someone can give me a logical explanation for the reasoning behind the results. Take the code below as my example which utilizes the AdventureWorks database setup in a Linq to SQL DataContext. This is a clip from my unit test. The resulting customer returned from a call to both CustomerQuery_Test_01() and CustomerQuery_Test_02() is the same. However, the query executed on the SQLServer are different is a major way. The method CustomerQuery_Test_01 us causing the entire Customer table to be materialized, which the call to CustomerQuery_Test_02 is only causing the single customer to be materialized. The resulting SQL Queries are at the bottom of this post. Anyone have a good reason for this? To me, it was highly non-intuitive.
protected virtual Customer GetByPrimaryKey(Func<Customer, bool> keySelection)
{
AdventureWorksDataContext context = new AdventureWorksDataContext();
return (from r in context.Customers select r).SingleOrDefault(keySelection);
}
[TestMethod]
public void CustomerQuery_Test_01()
{
Customer customer = GetByPrimaryKey(c => c.CustomerID == 2);
}
[TestMethod]
public void CustomerQuery_Test_02()
{
AdventureWorksDataContext context = new AdventureWorksDataContext();
Customer customer = (from r in context.Customers select r).SingleOrDefault(c => c.CustomerID == 2);
}
Query for CustomerQuery_Test_01 (notice the lack of a where clause)
SELECT [t0].[CustomerID], [t0].[NameStyle], [t0].[Title], [t0].[FirstName], [t0].[MiddleName], [t0].[LastName], [t0].[Suffix], [t0].[CompanyName], [t0].[SalesPerson], [t0].[EmailAddress], [t0].[Phone], [t0].[PasswordHash], [t0].[PasswordSalt], [t0].[rowguid], [t0].[ModifiedDate]
FROM [SalesLT].[Customer] AS [t0]
Query for CustomerQuery_Test_02 (notice the where clause)
SELECT [t0].[CustomerID], [t0].[NameStyle], [t0].[Title], [t0].[FirstName], [t0].[MiddleName], [t0].[LastName], [t0].[Suffix], [t0].[CompanyName], [t0].[SalesPerson], [t0].[EmailAddress], [t0].[Phone], [t0].[PasswordHash], [t0].[PasswordSalt], [t0].[rowguid], [t0].[ModifiedDate]
FROM [SalesLT].[Customer] AS [t0]
WHERE [t0].[CustomerID] = #p0
Func<Customer, bool> keySelection
That's not an Expression<Func<Customer, bool>>... the compiler resolved Enumerable.Single instead of Queryable.Single

Resources