Performance issue due to Include statements in Entity Framework

I have a performance issue with the API. It is caused by retrieving data from multiple tables, as below.
Example:
Users.Include(x => x.UsersAdditionInfo)
.Include(x => x.UserRoles)
.Include(x => x.Location)
Note: each of these tables except Location contains nearly 150,000 records.
I have also tried joins instead of .Include, but I still face the same performance issues.
Example:
from ub in users
join ua in UserAdditionalInfo on ub.Id equals ua.UserId
join ur in UserRoles on ub.Id equals ur.UserId
join urs in userRoles on ur.RoleId equals urs.Id
join l in Location on ub.LocationId equals l.Id
into leftLocation
from location in leftLocation.DefaultIfEmpty()
select new { User = ub, AdditionalInfo = ua, UserRole = ur, Role = urs, Location = location }
Please suggest better alternative ways to query across multiple tables.

If your EDMX is properly configured, you don't have to write the joins yourself.
Can't you just do
Users.Select(x=> new {
UserRoles = x.UserRoles,
UserAdditionInfo = x.UserAdditionInfo,
Location = x.Location })
and so on?
(I've fudged your schema a bit but hopefully you get my point)
Also, if this query is common, you could always create a view in SQL Server and add it to the EDMX.
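A minimal LINQ-to-Objects sketch of the projection idea, assuming hypothetical in-memory arrays in place of the Users and Location tables (names and data are illustrative only). It selects just the columns the caller needs and keeps users without a location, using the method-syntax form of the "into leftLocation ... DefaultIfEmpty()" left join from the question. Against a real DbContext, a narrow projection like this may reduce how much data each row carries compared with Including whole entities; measure before relying on it.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-ins for the Users and Location tables.
var users = new[]
{
    (Id: 1, Name: "Asha", LocationId: (int?)10),
    (Id: 2, Name: "Ravi", LocationId: (int?)null), // user without a location
};
var locations = new[]
{
    (Id: 10, City: "Pune"),
};

// Left join projected straight into a flat shape: only the needed columns
// are selected, and users without a location are kept with City = null.
var rows = users
    .GroupJoin(
        locations,
        u => u.LocationId,
        l => (int?)l.Id,
        (u, ls) => new { User = u, Cities = ls.Select(l => l.City) })
    .SelectMany(
        x => x.Cities.DefaultIfEmpty(),
        (x, city) => new { x.User.Id, x.User.Name, City = city })
    .ToList();

foreach (var r in rows)
    Console.WriteLine($"{r.Id} {r.Name} {r.City ?? "(no location)"}");
// 1 Asha Pune
// 2 Ravi (no location)
```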

Related

Criteria API query with a join on a string value

I currently have a query with a bunch of filters, so it makes sense to use the Criteria API. Unfortunately, this query uses a join on a string value instead of a relationship. This is an example of the query:
SELECT ua.id,
COALESCE(uf.status, f.status) AS status,
r.name,
ua.companyname,
ua.firstname,
ua.lastname,
ua.usergroup,
ua.email,
ua.country,
ua.continent
FROM useraccount ua
JOIN userrole ur on ua.id = ur.userid
JOIN role r on ur.roleid = r.id and r.eventgroupid = 1
JOIN feature f on f.name = 'Locked'
LEFT JOIN userfeature uf on uf.featureid = f.id AND uf.userid = ua.id;
As you can see, the problem is that I want to use COALESCE to get the UserFeature status if one is present and otherwise fall back to the default status from the Feature table.
The Feature table is a simple one with id, name and status columns. It is related only to UserFeature, and UserFeature in turn is related to UserAccount.
As you might guess, the Criteria API doesn't allow a Join<> on a plain string value. I have tried to work out how to change the SELECT statement to align better with what the Criteria API offers, but I haven't found anything on this.
I'm using PostgreSQL and Hibernate 5.4.32 (through the Spring Boot JPA starter).

Load only some elements of a nested collection efficiently with LINQ

I have the following LINQ query (using EF Core 6 and MS SQL Server):
var resultSet = dbContext.Systems
.Include(system => system.Project)
.Include(system => system.Template.Type)
.Select(system => new
{
System = system,
TemplateText = system.Template.TemplateTexts.FirstOrDefault(templateText => templateText.Language == locale.LanguageIdentifier),
TypeText = system.Template.Type.TypeTexts.FirstOrDefault(typeText => typeText.Language == locale.LanguageIdentifier)
})
.FirstOrDefault(x => x.System.Id == request.Id);
The requirement is to retrieve the system matching the requested ID and load its project, template and template's type info. The template has multiple TemplateTexts (one for each translated language) but I only want to load the one matching the requested locale, same deal with the TypeTexts elements of the template's type.
The LINQ query above does that in one query and it gets converted to the following SQL query (I edited the SELECT statements to use * instead of the long list of columns generated):
SELECT [t1].*, [t2].*, [t5].*
FROM (
SELECT TOP(1) [p].*, [t].*, [t0].*
FROM [ParkerSystems] AS [p]
LEFT JOIN [Templates] AS [t] ON [p].[TemplateId] = [t].[Id]
LEFT JOIN [Types] AS [t0] ON [t].[TypeId] = [t0].[Id]
LEFT JOIN [Projects] AS [p0] ON [p].[Project_ProjectId] = [p0].[ProjectId]
WHERE [p].[SystemId] = @__request_Id_1
) AS [t1]
LEFT JOIN (
SELECT [t3].*
FROM (
SELECT [t4].*, ROW_NUMBER() OVER(PARTITION BY [t4].[ReferenceId] ORDER BY [t4].[Id]) AS [row]
FROM [TemplateTexts] AS [t4]
WHERE [t4].[Language] = @__locale_LanguageIdentifier_0
) AS [t3]
WHERE [t3].[row] <= 1
) AS [t2] ON [t1].[Id] = [t2].[ReferenceId]
LEFT JOIN (
SELECT [t6].*
FROM (
SELECT [t7].*, ROW_NUMBER() OVER(PARTITION BY [t7].[ReferenceId] ORDER BY [t7].[Id]) AS [row]
FROM [TypeTexts] AS [t7]
WHERE [t7].[Language] = @__locale_LanguageIdentifier_0
) AS [t6]
WHERE [t6].[row] <= 1
) AS [t5] ON [t1].[Id0] = [t5].[ReferenceId]
That is not bad, and it's not a very complicated query, but I feel my requirement could be solved with a much simpler SQL query:
SELECT *
FROM [Systems] AS [p]
JOIN [Templates] AS [t] ON [p].[TemplateId] = [t].[Id]
JOIN [TemplateTexts] AS [tt] ON [p].[TemplateId] = [tt].[ReferenceId]
JOIN [Types] AS [ty] ON [t].[TypeId] = [ty].[Id]
JOIN [TypeTexts] AS [tyt] ON [ty].[Id] = [tyt].[ReferenceId]
WHERE [p].[SystemId] = @systemId and tt.[Language] = 2 and tyt.[Language] = 2
My question is: is there a different or simpler LINQ expression (in either method syntax or query syntax) that produces the same result (getting all the information in one go)? Ideally I'd like to avoid an anonymous object that aggregates the filtered sub-collections. For bonus points, it would be great if the generated SQL were simpler and closer to what I think of as a simple query.
Is there a different/simpler LINQ expression (...) that produces the same result
Yes (maybe) and no.
No, because you're querying dbContext.Systems, so EF will return every system that matches your filter, even one without TemplateTexts etc. That's why it has to generate outer joins. EF doesn't know about your apparent intention to skip systems without this nested data, nor about any guarantee that such systems don't occur in the database (which, judging by your second query, you seem to assume).
That accounts for the left joins to subqueries.
These subqueries are generated because of FirstOrDefault. In SQL, getting the "first" record of a one-to-many relationship always requires some sort of subquery, and this ROW_NUMBER() OVER construction is actually quite efficient. Your second query has no notion of "first" records, so it will probably return different data.
Yes (maybe) because you also Include data. I'm not sure why. Some people seem to think Include is necessary to make subsequent projections (.Select) work, but it isn't. If that's your reason to use Includes then you can remove them and thus remove the first couple of joins.
On the other hand, you also Include system.Project, which is not in the projection, so you seem to have added the Includes deliberately. In this case they do take effect, because the entire system entity is in the projection; otherwise EF would ignore them.
If you need the Includes, then again EF has to generate outer joins for the reason mentioned above.
EF handles the Includes and the projections separately, whereas hand-crafted SQL, aided by prior knowledge of the data, could do that more efficiently. There's no way to influence that behavior, though.
This LINQ query is close to your SQL, but I have doubts about the correctness of the result:
var resultSet =
(from system in dbContext.Systems
from templateText in system.Template.TemplateTexts
where templateText.Language == locale.LanguageIdentifier
from typeText in system.Template.Type.TypeTexts
where typeText.Language == locale.LanguageIdentifier
select new
{
System = system,
TemplateText = templateText,
TypeText = typeText
})
.FirstOrDefault(x => x.System.Id == request.Id);
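The correctness concern can be seen without a database. The following LINQ-to-Objects sketch uses hypothetical data (texts are linked directly to a system by an assumed SystemId column, for brevity) to contrast the two shapes: the FirstOrDefault projection keeps every system and picks at most one text, while the SelectMany version behaves like an inner join, dropping systems without a matching text and repeating a system when several texts match.

```csharp
using System;
using System.Linq;

const int language = 2; // hypothetical requested language

// Hypothetical data: TemplateTexts linked directly to a system for brevity.
var systems = new[] { (Id: 1, Name: "S1"), (Id: 2, Name: "S2") };
var templateTexts = new[]
{
    (Id: 100, SystemId: 1, Language: 2, Text: "first"),
    (Id: 101, SystemId: 1, Language: 2, Text: "second"),         // another match for S1
    (Id: 102, SystemId: 2, Language: 1, Text: "other language"), // no match for S2
};

// Question's shape: one row per system, with the first matching text or null.
var perSystem = systems
    .Select(s => new
    {
        System = s,
        Text = templateTexts
            .Where(t => t.SystemId == s.Id && t.Language == language)
            .OrderBy(t => t.Id) // mirrors ORDER BY [Id] inside ROW_NUMBER()
            .Select(t => t.Text)
            .FirstOrDefault()
    })
    .ToList();

// Answer's SelectMany shape: behaves like an inner join on the filtered texts.
var innerJoined = (
    from s in systems
    from t in templateTexts
    where t.SystemId == s.Id && t.Language == language
    select new { System = s, t.Text })
    .ToList();

Console.WriteLine(string.Join(", ", perSystem.Select(r => $"{r.System.Name}:{r.Text ?? "null"}")));
// S1:first, S2:null
Console.WriteLine(string.Join(", ", innerJoined.Select(r => $"{r.System.Name}:{r.Text}")));
// S1:first, S1:second
```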

Converting SQL to LINQ woes

At my job, our main application was written long ago, before n-tier was really a thing; ergo, it has tons and tons of business logic being handled in stored procs and such.
So we have finally decided to bite the bullet and make it not suck so bad. I have been tasked with converting a 900+ line SQL script to a .NET exe, which I am doing in C#/LINQ. Problem is, for the last 5-6 years at another job I had been doing LINQ exclusively, so my SQL has gotten somewhat rusty, and some of the things I am converting I have never tried to do in LINQ before, so I'm hitting some roadblocks.
Anyway, enough whining.
I'm having trouble with the following SQL statement, I think because it joins on a temp table and a derived table. Here's the SQL:
insert into #processedBatchesPurgeList
select d.pricebatchdetailid
from pricebatchheader h (nolock)
join pricebatchstatus pbs (nolock) on h.pricebatchstatusid = pbs.pricebatchstatusid
join pricebatchdetail d (nolock) on h.pricebatchheaderid = d.pricebatchheaderid
join
( -- Grab most recent REG.
select
item_key
,store_no
,pricebatchdetailid = max(pricebatchdetailid)
from pricebatchdetail _pbd (nolock)
join pricechgtype pct (nolock) on _pbd.pricechgtypeid = pct.pricechgtypeid
where
lower(rtrim(ltrim(pct.pricechgtypedesc))) = 'reg'
and expired = 0
group by item_key, store_no
) dreg
on d.item_key = dreg.item_key
and d.store_no = dreg.store_no
where
d.pricebatchdetailid < dreg.pricebatchdetailid -- Make sure PBD is not most recent REG.
and h.processeddate < @processedBatchesPurgeDateLimit
and lower(rtrim(ltrim(pbs.pricebatchstatusdesc))) = 'processed' -- Pushed/processed batches only.
That raises an overall question first: how do I handle temp tables in LINQ? This script uses about 10 of them, and I currently hold them as Lists. The problem is that if I try to .Join() on one in a query, I get the "Local sequence cannot be used in LINQ to SQL implementations of query operators except the Contains operator." error.
I was able to get the join to the derived table to work using 2 queries, just so a single one wouldn't get nightmarishly long:
var dreg = (from _pbd in db.PriceBatchDetails.Where(pbd => pbd.Expired == false && pbd.PriceChgType.PriceChgTypeDesc.ToLower().Trim() == "reg")
group _pbd by new { _pbd.Item_Key, _pbd.Store_No } into _pbds
select new
{
Item_Key = _pbds.Key.Item_Key,
Store_No = _pbds.Key.Store_No,
PriceBatchDetailID = _pbds.Max(pbdet => pbdet.PriceBatchDetailID)
});
var query = (from h in db.PriceBatchHeaders.Where(pbh => pbh.ProcessedDate < processedBatchesPurgeDateLimit)
join pbs in db.PriceBatchStatus on h.PriceBatchStatusID equals pbs.PriceBatchStatusID
join d in db.PriceBatchDetails on h.PriceBatchHeaderID equals d.PriceBatchHeaderID
join dr in dreg on new { d.Item_Key, d.Store_No } equals new { dr.Item_Key, dr.Store_No }
where d.PriceBatchDetailID < dr.PriceBatchDetailID
&& pbs.PriceBatchStatusDesc.ToLower().Trim() == "processed"
select d.PriceBatchDetailID);
So that query gives the expected results, which I am holding in a List, but then I need to join the results of that query to another one selected from the database, which is leading me back to the aforementioned "Local sequence cannot be used..." error.
That query is this:
insert into #pbhArchiveFullListSaved
select h.pricebatchheaderid
from pricebatchheader h (nolock)
join pricebatchdetail d (nolock)
on h.pricebatchheaderid = d.pricebatchheaderid
join #processedBatchesPurgeList dlist
on d.pricebatchdetailid = dlist.pricebatchdetailid -- PBH list is restricted to PBD purge list rows that have PBH references.
group by h.pricebatchheaderid
The join there on #processedBatchesPurgeList is the problem I am running into.
So uh...help? I have never written SQL like this, and certainly never tried to convert it to Linq.
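The error message names the one operator LINQ to SQL accepts with a local sequence: Contains, which it translates into a SQL IN (...) list. Below is a minimal LINQ-to-Objects sketch of the second statement written that way, with hypothetical arrays standing in for the PriceBatchHeader and PriceBatchDetail tables and for the purge list. Two caveats to verify against your setup: each list element becomes a separate SQL parameter, and SQL Server accepts at most 2,100 parameters per request, so a very large list will fail. Also, because the first query is still an IQueryable until it is materialized, using it directly in the second query instead of the List should let the whole thing run as one SQL statement.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins: header IDs and (detail ID, header ID) pairs.
var priceBatchHeaderIds = new[] { 1, 2, 3 };
var priceBatchDetails = new[]
{
    (PriceBatchDetailID: 10, PriceBatchHeaderID: 1),
    (PriceBatchDetailID: 11, PriceBatchHeaderID: 1),
    (PriceBatchDetailID: 20, PriceBatchHeaderID: 2),
    (PriceBatchDetailID: 30, PriceBatchHeaderID: 3),
};

// Plays the role of #processedBatchesPurgeList: detail IDs from the first query.
var processedBatchesPurgeList = new List<int> { 10, 11, 30 };

// Filter with Contains instead of joining to the local list.
// Distinct stands in for GROUP BY h.pricebatchheaderid.
var pbhArchiveFullListSaved = (
    from h in priceBatchHeaderIds
    join d in priceBatchDetails on h equals d.PriceBatchHeaderID
    where processedBatchesPurgeList.Contains(d.PriceBatchDetailID)
    select h)
    .Distinct()
    .ToList();

Console.WriteLine(string.Join(", ", pbhArchiveFullListSaved)); // 1, 3
```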
As pointed out by the comments above, this is no longer being rewritten as Linq.
Was hoping to get a performance improvement along with achieving better SOX compliance, which was the whole reason for the rewrite in the first place.
I'm happy with just satisfying the SOX compliance issues.
Thanks, everyone.

EF and LINQ extension methods

I have this SQL that I want to write with LINQ extension methods, returning an entity from my EDM:
SELECT p.[Id],p.[Firstname],p.[Lastname],prt.[AddressId],prt.[Street],prt.[City]
FROM [Person] p
CROSS APPLY (
SELECT TOP(1) pa.[AddressId],a.[ValidFrom],a.[Street],a.[City]
FROM [Person_Addresses] pa
LEFT OUTER JOIN [Addresses] AS a
ON a.[Id] = pa.[AddressId]
WHERE p.[Id] = pa.[PersonId]
ORDER BY a.[ValidFrom] DESC ) prt
Also, could this be rewritten with LINQ extension methods using 3 joins?
Assuming you have set the Person_Addresses table up as a pure relation table (i.e., with no data besides the foreign keys) this should do the trick:
var persons = model.People
.Select(p => new { p = p, a = p.Addresses.OrderByDescending(a=>a.ValidFrom).First() })
.Select(p => new { p.p.Id, p.p.Firstname, p.p.Lastname, AddressId = p.a.Id, p.a.Street, p.a.City });
The first Select() orders the addresses and picks the latest one, and the second one returns an anonymous type with the properties specified in your query.
If your relation table holds more data, you're going to have to use joins, but otherwise this approach frees you from them. In my opinion, this is easier to read.
NOTE: You might get an exception if any person has no addresses connected to them, although I haven't tried it out.
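The CROSS APPLY ... TOP(1) pattern maps onto SelectMany over a per-person subquery that ends in Take(1). A minimal LINQ-to-Objects sketch follows, with hypothetical arrays standing in for Person, Person_Addresses and Addresses. Because Take(1) simply yields nothing for a person with no addresses, such persons are left out, just as CROSS APPLY leaves them out, rather than triggering the exception the note above warns about.

```csharp
using System;
using System.Linq;

// Hypothetical stand-ins for Person, Person_Addresses and Addresses.
var persons = new[]
{
    (Id: 1, Firstname: "Ann", Lastname: "Lee"),
    (Id: 2, Firstname: "Bob", Lastname: "Roe"), // has no addresses
};
var personAddresses = new[] { (PersonId: 1, AddressId: 5), (PersonId: 1, AddressId: 6) };
var addresses = new[]
{
    (Id: 5, ValidFrom: new DateTime(2019, 1, 1), Street: "Old St", City: "Oslo"),
    (Id: 6, ValidFrom: new DateTime(2022, 6, 1), Street: "New St", City: "Bergen"),
};

// SelectMany over a per-person "TOP(1) ... ORDER BY ValidFrom DESC" subquery,
// the LINQ counterpart of CROSS APPLY. Take(1) yields nothing for a person
// without addresses, so that person is dropped instead of throwing.
var rows = persons
    .SelectMany(
        p => personAddresses
            .Where(pa => pa.PersonId == p.Id)
            .Join(addresses, pa => pa.AddressId, a => a.Id, (pa, a) => a)
            .OrderByDescending(a => a.ValidFrom)
            .Take(1),
        (p, a) => new { p.Id, p.Firstname, p.Lastname, AddressId = a.Id, a.Street, a.City })
    .ToList();

foreach (var r in rows)
    Console.WriteLine($"{r.Id} {r.Firstname} {r.Lastname} {r.AddressId} {r.Street} {r.City}");
// 1 Ann Lee 6 New St Bergen
```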

How to query (LINQ) multiple table association link?

I have table associations like these (CaseClient is a bridge table):
Cases has many CaseClients
Client has many CaseClients
ClientType has many CaseClients
The easiest way would be to just use a view in the database, but I heard that with LINQ you can join these somehow. Or should I just create a view in the database and run a LINQ query against that view?
I'd appreciate your comments.
I think you want to use the Join method, starting from your bridging table and resolving each of your relationships. E.g.
// Where CaseId and TypeId are members of CaseClient
var x = caseClients.Join(cases, cc => cc.CaseId, c => c.Id, (cc, c) => new { cc, Case = c })
.Join(types, j => j.cc.TypeId, t => t.Id, (j, t) => new { j.Case, Type = t });
The code above is untested and written from memory. The last argument of each Join is a result selector that carries the bridge row forward, so the second join can still read cc.TypeId.
Here's an adaptation of what I did for a very similar situation. Only the names have been changed to protect the innocent.
IEnumerable<Case> getCaseByClient(int client_id)
{
var ret = from c in Cases
join cc in CasesClients
on c.Id equals cc.CaseId
join cl in Clients
on cc.ClientId equals cl.Id
where cl.Id == client_id
select c;
return ret;
}
Of course, this assumes your client_id field is an int, but that's easy enough to modify.
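Following the first answer's advice to start from the bridge table, all three relationships from the question (Case, Client, ClientType) can be resolved in a single query, so a database view is not strictly needed. A minimal LINQ-to-Objects sketch; the CaseClient column names CaseId, ClientId and ClientTypeId and all sample rows are assumptions.

```csharp
using System;
using System.Linq;

// Hypothetical tables; the CaseClient bridge carries all three foreign keys.
var cases = new[] { (Id: 1, Title: "Case A"), (Id: 2, Title: "Case B") };
var clients = new[] { (Id: 10, Name: "Smith"), (Id: 11, Name: "Doe") };
var clientTypes = new[] { (Id: 100, Name: "Plaintiff"), (Id: 101, Name: "Executor") };
var caseClients = new[]
{
    (CaseId: 1, ClientId: 10, ClientTypeId: 100),
    (CaseId: 2, ClientId: 11, ClientTypeId: 101),
    (CaseId: 2, ClientId: 10, ClientTypeId: 100),
};

// Start from the bridge table and resolve each relationship with a join.
var rows = (
    from cc in caseClients
    join c in cases on cc.CaseId equals c.Id
    join cl in clients on cc.ClientId equals cl.Id
    join ct in clientTypes on cc.ClientTypeId equals ct.Id
    select new { Case = c.Title, ClientId = cl.Id, Client = cl.Name, ClientType = ct.Name })
    .ToList();

// All cases for one client, as in the getCaseByClient answer (client 10).
var casesForClient10 = rows.Where(r => r.ClientId == 10).Select(r => r.Case).ToList();

Console.WriteLine(string.Join(", ", casesForClient10)); // Case A, Case B
```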
