optimize linq query with multiple include statements - performance

The LINQ query takes around 20 seconds to execute on some of the data. When the LINQ is converted to SQL there are 3 nested joins, which might be what is taking most of the execution time. Can the query below be optimized?
var query = (from s in this.Items
             where demoIds.Contains(s.Id)
             select s)
            .Include("demo1")
            .Include("demo2")
            .Include("demo3")
            .Include("demo4");
return query;
The expectation is for the query to execute in 3-4 seconds; right now it takes around 20 seconds for 100 demoIds.

As far as your code is concerned, it looks like the best way to get what you want (assuming that including "demo3" twice in the original was just a typo for this example).
However, the database you use will have ways to optimize the query, or rather the underlying data structures. Use whatever tool your database provider offers to get an execution plan of the query and see where it spends so much time. You might be missing an index or two.
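If it helps to see the SQL that Entity Framework actually sends (so you can feed it to your plan viewer), you can log it from code. This is only a sketch, assuming EF 6 and that Items is the DbSet behind this.Items:
// EF6: write every command the context sends to the console (or a logger).
context.Database.Log = Console.Write;
// EF 4.1+: calling ToString() on a DbContext LINQ query returns the generated SELECT.
var sql = context.Items.Where(s => demoIds.Contains(s.Id)).ToString();
Console.WriteLine(sql);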

I advise either lazy loading or an explicit join query.
The generated SQL is probably something like this:
SELECT ...
FROM (SELECT ... FROM table1 WHERE ID IN (...)) AS T1
(INNER/FULL) JOIN (SELECT ... FROM table2) AS T2 ON T1.PK = T2.FOREIGNKEY
(INNER/FULL) JOIN (SELECT ... FROM table3) AS T3 ON T1.PK = T3.FOREIGNKEY
(INNER/FULL) JOIN (SELECT ... FROM table4) AS T4 ON T1.PK = T4.FOREIGNKEY
But if you can use lazy loading, there is no need for the Include() calls at all, and lazy loading will solve your problem.
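For lazy loading to kick in (EF 5/6), the navigation properties must be declared virtual and lazy loading must be enabled, which it is by default. A minimal sketch with placeholder types:
public class Item
{
    public int Id { get; set; }
    // Virtual navigation properties let EF create lazy-loading proxies.
    public virtual Demo1 Demo1 { get; set; }
    public virtual ICollection<Demo2> Demo2s { get; set; }
}
// context.Configuration.LazyLoadingEnabled = true;  // true by default in EF 5/6
// Note: each first access of a navigation property runs its own query,
// so this trades one large join for several smaller queries.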
Otherwise, you can write it as an explicit join query:
var query = from i in this.Items.Where(w => demoIds.Contains(w.Id))
            join d1 in demo1 on i.Id equals d1.FK
            join d2 in demo2 on i.Id equals d2.FK
            join d3 in demo3 on i.Id equals d3.FK
            select new ... { };
These two solutions should solve your problem.
If the problem persists, I strongly recommend a stored procedure.

I had a similar issue with a query that had 15+ Include statements and generated a 2M+ row result in 7 minutes.
The solution that worked for me was:
Disabled lazy loading
Disabled automatic change detection
Split the big query into small chunks
A sample can be found below:
public IQueryable<CustomObject> PerformQuery(int id)
{
    ctx.Configuration.LazyLoadingEnabled = false;
    ctx.Configuration.AutoDetectChangesEnabled = false;
    IQueryable<CustomObject> customObjectQueryable = ctx.CustomObjects.Where(x => x.Id == id);
    // First chunk: the related object with its collections eagerly loaded.
    var selectQuery = customObjectQueryable.Select(x => x.YourObject)
                                           .Include(c => c.YourFirstCollection)
                                           .Include(c => c.YourFirstCollection.OtherCollection)
                                           .Include(c => c.YourSecondCollection);
    // Second chunk: the remaining related objects.
    var otherObjects = customObjectQueryable.SelectMany(x => x.OtherObjects);
    // Execute both chunks; Entity Framework fixes up the associations in memory.
    selectQuery.FirstOrDefault();
    otherObjects.ToList();
    return customObjectQueryable;
}
IQueryable is needed in order to do all the filtering on the server side. IEnumerable would perform the filtering in memory, which is a very time-consuming process. Entity Framework will fix up any associations in memory.
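As a small illustration of that point (same placeholder names as in the sample above): the IQueryable filter becomes part of the SQL WHERE clause, while switching to IEnumerable first pulls the rows back and filters them in memory:
IQueryable<CustomObject> onServer = ctx.CustomObjects.Where(x => x.Id == id);                  // translated to SQL
IEnumerable<CustomObject> inMemory = ctx.CustomObjects.AsEnumerable().Where(x => x.Id == id);  // filtered client-side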

Related

How to improve LINQ to EF performance

I have two classes: Property and PropertyValue. A property has several values where each value is a new revision.
When retrieving a set of properties I want to include the latest revision of the value for each property.
In T-SQL this can be done very efficiently like this:
SELECT
p.Id,
pv1.StringValue,
pv1.Revision
FROM dbo.PropertyValues pv1
LEFT JOIN dbo.PropertyValues pv2 ON pv1.Property_Id = pv2.Property_Id AND pv1.Revision < pv2.Revision
JOIN dbo.Properties p ON p.Id = pv1.Property_Id
WHERE pv2.Id IS NULL
ORDER BY p.Id
The "magic" in this query is to join on the lesser than condition and look for rows without a result forced by the LEFT JOIN.
How can I accomplish something similar using LINQ to EF?
The best thing I could come up with was:
from pv in context.PropertyValues
group pv by pv.Property into g
select g.OrderByDescending(p => p.Revision).FirstOrDefault()
It does produce the correct result but is about 10 times slower than the other.
Maybe this can help. Where db is the database context:
(
from pv1 in db.PropertyValues
from pv2 in db.PropertyValues.Where(a => a.Property_Id == pv1.Property_Id && pv1.Revision < a.Revision).DefaultIfEmpty()
join p in db.Properties
on pv1.Property_Id equals p.Id
where pv2.Id == null
orderby p.Id
select new
{
    p.Id,
    pv1.StringValue,
    pv1.Revision
}
);
Next to optimizing a query in Linq To Entities, you also have to be aware of the work it takes for the Entity Framework to translate your query to SQL and then map the results back to your objects.
Comparing a Linq To Entities query directly to a SQL query will always result in lower performance because the Entity Framework does a lot more work for you.
So it's also important to look at optimizing the steps the Entity Framework takes.
Things that could help:
Precompile your query
Pre-generate views
Decide for yourself when to open the database connection
Disable tracking (if appropriate)
Here you can find some documentation with performance strategies.
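As an example of the "disable tracking" point, here is a sketch using AsNoTracking (EF 5/6), reusing db and PropertyValues from the question; the filter itself is just a placeholder:
using System.Data.Entity;
// Read-only query: AsNoTracking() skips the change tracker entirely.
var latestValues = db.PropertyValues
                     .AsNoTracking()
                     .Where(pv => pv.Revision > 0)
                     .ToList();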
If you want to attach an extra condition (such as the less-than expression) to the join, note that the join keys themselves only support equality, so group-join on the equality key and push the inequality into a Where on the joined group:
from pv1 in db.PropertyValues
join pv2 in db.PropertyValues on pv1.Property_Id equals pv2.Property_Id into temp
from t in temp.Where(x => pv1.Revision < x.Revision).DefaultIfEmpty()
join p in db.Properties
on pv1.Property_Id equals p.Id
where t.Id == null
orderby p.Id
select new
{
    p.Id,
    pv1.StringValue,
    pv1.Revision
}

Writing a LINQ query to traverse many tables

I wanted to write a LINQ query based on the SQL below.
Basically this strategy seems really confusing - why start from MerchantGroupMerchant and do 2 'from' statements?
Problem: Is there a simpler way to write this LINQ query?
var listOfCampaignsMerchantIsInvolvedIn =
(from merchantgroupactivity in uow.MerchantGroupActivities
from merchantgroupmerchant in uow.MerchantGroupMerchants
where merchantgroupmerchant.MerchantU.Id == merchantUIDGuid
select new
{
merchantgroupactivity.ActivityU.CampaignU.Id
}).Distinct();
Here is the table structure (diagram omitted) and the SQL:
SELECT DISTINCT Campaign.ID
FROM Campaign
INNER JOIN Activity
ON ( Campaign.CampaignUID = Activity.CampaignUID )
INNER JOIN MerchantGroupActivity
ON ( Activity.ActivityUID = MerchantGroupActivity.ActivityUID )
INNER JOIN MerchantGroup
ON ( MerchantGroup.MerchantGroupUID = MerchantGroupActivity.MerchantGroupUID )
INNER JOIN MerchantGroupMerchant
ON ( MerchantGroupMerchant.MerchantGroupUID = MerchantGroup.MerchantGroupUID )
INNER JOIN Merchant
ON ( Merchant.MerchantUID = MerchantGroupMerchant.MerchantUID )
WHERE Merchant.ID = 'M1'
No, not really. Even if you use views to partially or completely reduce the query size, your execution plan will still look the same in the end (and execute just as fast/slow). If you have to traverse 5 joins then you have to traverse 5 joins; the only cure is "shortening" the path through the model by introducing direct links between, say, Merchant and Activity or Merchant and Campaign. You could do that by introducing a many-to-many table between them (at the cost of maintaining it manually), but I would not recommend it unless retrieval really is an issue. If this query is too slow, you should first check that indexes exist on all the FK fields used in the joins.
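For what it's worth, if your model exposes navigation properties along those relationships (the Merchants set and the MerchantGroups/MerchantGroupActivities property names below are assumptions, not taken from your model), the query can at least be written more compactly; EF still emits the same chain of joins underneath:
var listOfCampaignsMerchantIsInvolvedIn =
    (from m in uow.Merchants
     where m.Id == merchantUIDGuid
     from mg in m.MerchantGroups                 // assumed navigation property
     from mga in mg.MerchantGroupActivities      // assumed navigation property
     select mga.ActivityU.CampaignU.Id)
    .Distinct();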

Linq to entities Left Join

I want to achieve the following in Linq to Entities:
Get all Enquiries that have no Application, or whose Application has a status != 4 (Completed)
select enq.*
from Enquiry enq
left outer join Application app
on enq.enquiryid = app.enquiryid
where app.Status <> 4 or app.enquiryid is null
Has anyone done this before without using DefaultIfEmpty(), which is not supported by Linq to Entities?
I'm trying to add a filter to an IQueryable query like this:
IQueryable<Enquiry> query = Context.EnquirySet;
query = (from e in query
where e.Applications.DefaultIfEmpty()
.Where(app=>app.Status != 4).Count() >= 1
select e);
Thanks
Mark
In EF 4.0+, LEFT JOIN syntax is a little different and presents a crazy quirk:
var query = from c1 in db.Category
join c2 in db.Category on c1.CategoryID equals c2.ParentCategoryID
into ChildCategory
from cc in ChildCategory.DefaultIfEmpty()
select new CategoryObject
{
CategoryID = c1.CategoryID,
ChildName = cc.CategoryName
}
If you capture the execution of this query in SQL Server Profiler, you will see that it does indeed perform a LEFT OUTER JOIN. HOWEVER, if you have multiple LEFT JOIN ("Group Join") clauses in your Linq-to-Entity query, I have found that the self-join clause MAY actually execute as an INNER JOIN - EVEN IF THE ABOVE SYNTAX IS USED!
The resolution to that? As crazy and, according to MS, wrong as it sounds, I resolved this by changing the order of the join clauses. If the self-referencing LEFT JOIN clause was the 1st Linq Group Join, SQL Profiler reported an INNER JOIN. If the self-referencing LEFT JOIN clause was the LAST Linq Group Join, SQL Profiler reported a LEFT JOIN.
Do this:
IQueryable<Enquiry> query = Context.EnquirySet;
query = (from e in query
where (!e.Applications.Any())
|| e.Applications.Any(app => app.Status != 4)
select e);
I don't find LINQ's handling of the problem of what would be an "outer join" in SQL "goofy" at all. The key to understanding it is to think in terms of an object graph with nullable properties rather than a tabular result set.
Any() maps to EXISTS in SQL, so it's far more efficient than Count() in some cases.
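For illustration, inside a query like the one above the two forms below translate to an EXISTS and a COUNT(*) respectively; prefer the first when you only need to know whether a match exists:
bool hasOtherApps  = e.Applications.Any(app => app.Status != 4);        // EXISTS (SELECT ...)
bool hasOtherApps2 = e.Applications.Count(app => app.Status != 4) > 0;  // SELECT COUNT(*) ... > 0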
Thanks guys for your help. I went for this option in the end but your solutions have helped broaden my knowledge.
IQueryable<Enquiry> query = Context.EnquirySet;
query = query.Except(from e in query
from a in e.Applications
where a.Status == 4
select e);
Because of Linq's goofy (read non-standard) way of handling outers, you have to use DefaultIfEmpty().
What you'll do is run your Linq-To-Entities query into two IEnumerables, then LEFT Join them using DefaultIfEmpty(). It may look something like:
IQueryable enq = Enquiry.Select();
IQueryable app = Application.Select();
var x = from e in enq
join a in app on e.enquiryid equals a.enquiryid
into ae
where e.Status != 4
from appEnq in ae.DefaultIfEmpty()
select e;
Just because you can't do it with Linq-To-Entities doesn't mean you can't do it with raw Linq.
(Note: before anyone downvotes me ... yes, I know there are more elegant ways to do this. I'm just trying to make it understandable. It's the concept that's important, right?)
Another thing to consider: if you directly reference any properties from a left-joined group (using the into syntax) in your where clause without checking for null, Entity Framework will still convert your LEFT JOIN into an INNER JOIN.
To avoid this, filter on the "from x in leftJoinedExtent" part of your query like so:
var y = from parent in thing
join child in subthing on parent.ID equals child.ParentID into childTemp
from childLJ in childTemp.Where(c => c.Visible == true).DefaultIfEmpty()
where parent.ID == 123
select new {
ParentID = parent.ID,
ChildID = childLJ.ID
};
ChildID in the anonymous type will be a nullable type and the query this generates will be a LEFT JOIN.

What is the difference between using Join in Linq and "Olde Style" pre ANSI join syntax?

This question follows on from a question I asked yesterday about why using the join query on my Entities produced horrendously complicated SQL. It seemed that performing a query like this:
var query = from ev in genesisContext.Events
join pe in genesisContext.People_Event_Link
on ev equals pe.Event
where pe.P_ID == key
select ev;
Produced the horrible SQL that took 18 seconds to run on the database, whereas joining the entities through a where clause (sort of like pre-ANSI SQL join syntax) took less than a second to run and produced the same result:
var query = from pe in genesisContext.People_Event_Link
from ev in genesisContext.Events
where pe.P_ID == key && pe.Event == ev
select ev;
I've googled all over but still don't understand why the second produces different SQL from the first. Can someone please explain the difference to me? When should I use the join keyword?
This is the SQL that was produced when I used Join in my query and took 18 seconds to run:
SELECT
1 AS [C1],
[Extent1].[E_ID] AS [E_ID],
[Extent1].[E_START_DATE] AS [E_START_DATE],
[Extent1].[E_END_DATE] AS [E_END_DATE],
[Extent1].[E_COMMENTS] AS [E_COMMENTS],
[Extent1].[E_DATE_ADDED] AS [E_DATE_ADDED],
[Extent1].[E_RECORDED_BY] AS [E_RECORDED_BY],
[Extent1].[E_DATE_UPDATED] AS [E_DATE_UPDATED],
[Extent1].[E_UPDATED_BY] AS [E_UPDATED_BY],
[Extent1].[ET_ID] AS [ET_ID],
[Extent1].[L_ID] AS [L_ID]
FROM [dbo].[Events] AS [Extent1]
INNER JOIN [dbo].[People_Event_Link] AS [Extent2] ON EXISTS (SELECT
1 AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable1]
LEFT OUTER JOIN (SELECT
[Extent3].[E_ID] AS [E_ID]
FROM [dbo].[Events] AS [Extent3]
WHERE [Extent2].[E_ID] = [Extent3].[E_ID] ) AS [Project1] ON 1 = 1
LEFT OUTER JOIN (SELECT
[Extent4].[E_ID] AS [E_ID]
FROM [dbo].[Events] AS [Extent4]
WHERE [Extent2].[E_ID] = [Extent4].[E_ID] ) AS [Project2] ON 1 = 1
WHERE ([Extent1].[E_ID] = [Project1].[E_ID]) OR (([Extent1].[E_ID] IS NULL) AND ([Project2].[E_ID] IS NULL))
)
WHERE [Extent2].[P_ID] = 291
This is the SQL that was produced using the ANSI-style syntax (and is fairly close to what I would write if I were writing the SQL myself):
SELECT * FROM Events AS E
INNER JOIN People_Event_Link AS PE ON E.E_ID = PE.E_ID
INNER JOIN PEOPLE AS P ON P.P_ID = PE.P_ID
WHERE P.P_ID = 291
Neither of the above queries is entirely "correct." In EF, it is generally correct to use the relationship properties in lieu of either of the above. For example, if you had a Person object with a one-to-many relationship to PhoneNumbers in a property called Person.PhoneNumbers, you could write:
var q = from p in Context.Person
from pn in p.PhoneNumbers
select pn;
The EF will build the join for you.
In terms of the question above, the reason the generated SQL is different is because the expression trees are different, even though they produce equivalent results. Expression trees are mapped to SQL, and you of course know that you can write different SQL which produces the same results but with different performance. The mapping is designed to produce decent SQL when you write a farily "conventional" EF query.
But the mapping is not so smart as to take a very unconventional query and optimize it. In your first query, you state that the objects must be equivalent. In the second, you state that the ID property must be equivalent. My sample query above says "just get the details for this one record." The EF is designed principally to work the way I show, but it also handles scalar equivalence well.

Optimizing a LINQ to SQL query

I have a query that looks like this:
public IList<Order> FetchLatestOrders(int pageIndex, int recordCount)
{
DatabaseDataContext db = new DatabaseDataContext();
return (from o in db.Orders
orderby o.CreatedDate descending
select o)
.Skip(pageIndex * recordCount)
.Take(recordCount)
.ToList();
}
I need to print the information of the order and the user who created it:
foreach (var o in FetchLatestOrders(0, 10))
{
Console.WriteLine("{0} {1}", o.Code, o.Customer.Name);
}
This produces one SQL query to bring the orders and then one query per order to bring its customer. Is it possible to optimize the query so that it brings the orders and their customers in one SQL query?
Thanks
UPDATE: Following sirrocco's suggestion I changed the query like this and it works. Only one select query is generated:
public IList<Order> FetchLatestOrders(int pageIndex, int recordCount)
{
var options = new DataLoadOptions();
options.LoadWith<Order>(o => o.Customer);
using (var db = new DatabaseDataContext())
{
db.LoadOptions = options;
return (from o in db.Orders
orderby o.CreatedDate descending
select o)
.Skip(pageIndex * recordCount)
.Take(recordCount)
.ToList();
}
}
Thanks sirrocco.
Something else you can do is eager loading. In LINQ to SQL you can use LoadOptions (More on LoadOptions).
One VERY weird thing about LINQ to SQL is that you can set LoadOptions only before the first query is sent to the database.
You might want to look into using compiled queries.
Have a look at http://www.3devs.com/?p=3
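As a sketch of what a compiled query looks like in LINQ to SQL (System.Data.Linq.CompiledQuery; DatabaseDataContext and Orders come from the question, the rest is illustrative):
static class OrderQueries
{
    // Compiled once and reused, so the expression tree is not re-translated on every call.
    public static readonly Func<DatabaseDataContext, DateTime, IQueryable<Order>> CreatedSince =
        CompiledQuery.Compile((DatabaseDataContext db, DateTime since) =>
            from o in db.Orders
            where o.CreatedDate >= since
            orderby o.CreatedDate descending
            select o);
}
// Usage: var recent = OrderQueries.CreatedSince(db, DateTime.Today.AddDays(-7)).ToList();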
Given a LINQ statement like:
context.Cars
.OrderBy(x => x.Id)
.Skip(50000)
.Take(1000)
.ToList();
This roughly gets translated into:
select * from [Cars] order by [Cars].[Id] asc offset 50000 rows fetch next 1000 rows only
Because OFFSET and FETCH are extensions of ORDER BY, they are not executed until after the select portion runs. This means an expensive select with lots of join statements is executed on the whole dataset ([Cars]) before the fetched results are taken.
Optimize the statement
All that is needed is taking the OrderBy, Skip, and Take statements and putting them into a Where-clause:
context.Cars
.Where(x => context.Cars.OrderBy(y => y.Id).Select(y => y.Id).Skip(50000).Take(1000).Contains(x.Id))
.ToList();
This roughly gets translated into:
exec sp_executesql N'
select * from [Cars]
where exists
(select 1 from
(select [Cars].[Id] from [Cars] order by [Cars].[Id] asc offset @p__linq__0 rows fetch next @p__linq__1 rows only
) as [Limit1]
where [Limit1].[Id] = [Cars].[Id]
)
order by [Cars].[Id] asc',N'@p__linq__0 int,@p__linq__1 int',@p__linq__0=50000,@p__linq__1=1000
So now, the outer select-statement only executes on the filtered dataset based on the where exists-clause!
Again, your mileage may vary on how much query time is saved by making the change. General rule of thumb is the more complex your select-statement and the deeper into the dataset you want to go, the more this optimization will help.
