Optimizing a LINQ to SQL query - linq

I have a query that looks like this:
public IList<Post> FetchLatestOrders(int pageIndex, int recordCount)
{
DatabaseDataContext db = new DatabaseDataContext();
return (from o in db.Orders
orderby o.CreatedDate descending
select o)
.Skip(pageIndex * recordCount)
.Take(recordCount)
.ToList();
}
I need to print the information of the order and the user who created it:
foreach (var o in FetchLatestOrders(0, 10))
{
Console.WriteLine("{0} {1}", o.Code, o.Customer.Name);
}
This produces a SQL query to bring the orders and one query for each order to bring the customer. Is it possible to optimize the query so that it brings the orders and it's customer in one SQL query?
Thanks
UDPATE: By suggestion of sirrocco I changed the query like this and it works. Only one select query is generated:
public IList<Post> FetchLatestOrders(int pageIndex, int recordCount)
{
var options = new DataLoadOptions();
options.LoadWith<Post>(o => o.Customer);
using (var db = new DatabaseDataContext())
{
db.LoadOptions = options;
return (from o in db.Orders
orderby o.CreatedDate descending
select o)
.Skip(pageIndex * recordCount)
.Take(recordCount)
.ToList();
}
}
Thanks sirrocco.

Something else you can do is EagerLoading. In Linq2SQL you can use LoadOptions : More on LoadOptions
One VERY weird thing about L2S is that you can set LoadOptions only before the first query is sent to the Database.

you might want to look into using compiled queries
have a look at http://www.3devs.com/?p=3

Given a LINQ statement like:
context.Cars
.OrderBy(x => x.Id)
.Skip(50000)
.Take(1000)
.ToList();
This roughly gets translated into:
select * from [Cars] order by [Cars].[Id] asc offset 50000 rows fetch next 1000 rows
Because offset and fetch are extensions of order by, they are not executed until after the select-portion runs (google). This means an expensive select with lots of join-statements are executed on the whole dataset ([Cars]) prior to getting the fetched-results.
Optimize the statement
All that is needed is taking the OrderBy, Skip, and Take statements and putting them into a Where-clause:
context.Cars
.Where(x => context.Cars.OrderBy(y => y.Id).Select(y => y.Id).Skip(50000).Take(1000).Contains(x.Id))
.ToList();
This roughly gets translated into:
exec sp_executesql N'
select * from [Cars]
where exists
(select 1 from
(select [Cars].[Id] from [Cars] order by [Cars].[Id] asc offset #p__linq__0 rows fetch next #p__linq__1 rows only
) as [Limit1]
where [Limit1].[Id] = [Cars].[Id]
)
order by [Cars].[Id] asc',N'#p__linq__0 int,#p__linq__1 int',#p__linq__0=50000,#p__linq__1=1000
So now, the outer select-statement only executes on the filtered dataset based on the where exists-clause!
Again, your mileage may vary on how much query time is saved by making the change. General rule of thumb is the more complex your select-statement and the deeper into the dataset you want to go, the more this optimization will help.

Related

optimize linq query with multiple include statements

The linq query takes around 20 seconds for executing on some of the data . When converted the linq to sql there are 3 nested joins that might be taking more time for execution . Can we optimize the below query .
var query = (from s in this.Items
where demoIds.Contains(s.Id)
select)
.Include("demo1")
.Include("demo2")
.Include("demo3")
.Include("demo4");
return query;
The expectation is to execute the query in 3-4 seconds which is now taking around 20 secs for 100 demoIds .
As far as your code is concerned it looks like it's the best way to get what you want (Assuming Includeing "demo3" twice is a typo for this example.).
However, the database you use will have a way to optimize your queries or rather the underlying data structure. Use whatever tool your database provider has to get an execution plan of the query and see where it spends so much time. You might be missing an index or two.
I advice lazy loading or join query.
Probably SQL output is this query;
(SELECT .. FROM table1 WHERE ID in (...)) AS T1
(INNER, FULL) JOIN (SELECT .. FROM table2) AS T2 ON T1.PK = T2.FOREIGNKEY
(INNER, FULL) JOIN (SELECT .. FROM table3) AS T3 ON T1.PK = T3.FOREIGNKEY
(INNER, FULL) JOIN (SELECT .. FROM table4) AS T4 ON T1.PK = T4.FOREIGNKEY
But if you can use lazy loading, no need use the Include() func. And lazy loading will solve your problem.
Other else, you can write with join query,
var query = from i in this.Items.Where(w=>demoIds.Contains(w.Id))
join d1 in demo1 on i.Id equals d1.FK
join d2 in demo2 on i.Id equals d2.FK
join d3 in demo3 on i.Id equals d3.FK
select new ... { };
This two solutions solve your all problems.
If it continues your problem, I strongly recommended store procedure.
I've got a similar issue with a query that had 15+ "Include" statements and generated a 2M+ rows result in 7 minutes.
The solution that worked for me was:
Disabled lazy loading
Disabled auto detect changes
Split the big query in small chunks
A sample can be found below:
public IQueryable<CustomObject> PerformQuery(int id)
{
ctx.Configuration.LazyLoadingEnabled = false;
ctx.Configuration.AutoDetectChangesEnabled = false;
IQueryable<CustomObject> customObjectQueryable = ctx.CustomObjects.Where(x => x.Id == id);
var selectQuery = customObjectQueryable.Select(x => x.YourObject)
.Include(c => c.YourFirstCollection)
.Include(c => c.YourFirstCollection.OtherCollection)
.Include(c => c.YourSecondCollection);
var otherObjects = customObjectQueryable.SelectMany(x => x.OtherObjects);
selectQuery.FirstOrDefault();
otherObjects.ToList();
return customObjectQueryable;
}
IQueryable is needed in order to do all the filtering at the server side. IEnumerable would perform the filtering in memory and this is a very time consuming process. Entity Framework will fix up any associations in memory.

How this linq execute?

Data = _db.ALLOCATION_D.OrderBy(a => a.ALLO_ID)
.Skip(10)
.Take(10)
.ToList();
Let say I have 100000 rows in ALLOCATION_D table. I want to select first 10 row. Now I want to know how the above statement executes. I don't know but I think it executes in the following way...
first it select the 100000 rows
then ordered by ALLO_ID
then Skip 10
finally select the 10 rows.
Is it right? I want to know more details.
This Linq produce a SQL query via Entity Framework. Then it depends on your DBMS, but for SQL Server 2008, here is the query produces:
SELECT TOP (10) [Extent1].[ALLO_ID] AS [ALLO_ID],
FROM (
SELECT [Extent1].[ALLO_ID] AS [ALLO_ID]
, row_number() OVER (ORDER BY [Extent1].[ALLO_ID] ASC) AS [row_number]
FROM [dbo].[ALLOCATION_D] AS [Extent1]
) AS [Extent1]
WHERE [Extent1].[row_number] > 10
ORDER BY [Extent1].[ALLO_ID] ASC
You can run this in your C# for retrieve the query:
var linqQuery = _db.ALLOCATION_D
.OrderBy(a => a.ALLO_ID)
.Skip(10)
.Take(10);
var sqlQuery = ((System.Data.Objects.ObjectQuery)linqQuery).ToTraceString();
Data = linqQuery.ToList();
Second option with Linq To SQL
var linqQuery = _db.ALLOCATION_D
.OrderBy(a => a.ALLO_ID)
.Skip(10)
.Take(10);
var sqlQuery = _db.GetCommand(linqQuery).CommandText;
Data = linqQuery.ToList();
References:
How do I view the SQL generated by the entity framework?
How to: Display Generated SQL
How to view LINQ Generated SQL statements?
Your statement reads as follows:
Select all rows (overwritten by skip/take)
Order by Allo_ID
Order by Allo_ID again
Skip first 10 rows
Take next 10 rows
If you want it to select the first ten rows, you simply do this:
Data = _db.ALLOCATION_D // You don't need to order twice
.OrderBy(a => a.ALLO_ID)
.Take(10)
.ToList()
Up to the ToList call, the calls only generates expressions. That means that the OrderBy, Skip and Take calls are bundled up as an expression that is then sent to the entity framework to be executed in the database.
Entity framework will make an SQL query from that expression, which returns the ten rows from the table, which the ToList methods reads and places in a List<T> where T is the type of the items in the ALLOCATION_D collection.

Linq query, return distinct on single field & returning subset of data

I have a Linq query that returns three data elements.
var billingDateResults = from s in Subscriptions
.Where(s => (s.ProductCode.Contains("myCode")))
select { g.ID, BillingDate =s.BILL_THRU, s.ProductCode};
I would like to do distinct type of query on this to limit the results to one record per ID.
var billingDateResults = from s in Subscriptions
.Where(s => (s.ProductCode.Contains("myCode")))
group s by s.ID into g
select g.FirstOrDefault();
This works but now returns all of the fields in the records and I would like to minimize the amount of data by limiting the results to only the 3 fields in the first example.
What is a good way to do this?
TIA
Group by those three fields then.
var billingDateResults =
from s in Subscriptions
where s.ProductCode.Contains("myCode")
group new
{
g.ID,
BillingDate = s.BILL_THRU,
s.ProductCode
} by s.ID into g
select g.First(); // FirstOrDefault is not necessary, the groups will be non-empty

How to improve LINQ to EF performance

I have two classes: Property and PropertyValue. A property has several values where each value is a new revision.
When retrieving a set of properties I want to include the latest revision of the value for each property.
in T-SQL this can very efficiently be done like this:
SELECT
p.Id,
pv1.StringValue,
pv1.Revision
FROM dbo.PropertyValues pv1
LEFT JOIN dbo.PropertyValues pv2 ON pv1.Property_Id = pv2.Property_Id AND pv1.Revision < pv2.Revision
JOIN dbo.Properties p ON p.Id = pv1.Property_Id
WHERE pv2.Id IS NULL
ORDER BY p.Id
The "magic" in this query is to join on the lesser than condition and look for rows without a result forced by the LEFT JOIN.
How can I accomplish something similar using LINQ to EF?
The best thing I could come up with was:
from pv in context.PropertyValues
group pv by pv.Property into g
select g.OrderByDescending(p => p.Revision).FirstOrDefault()
It does produce the correct result but is about 10 times slower than the other.
Maybe this can help. Where db is the database context:
(
from pv1 in db.PropertyValues
from pv2 in db.PropertyValues.Where(a=>a.Property_Id==pv1.Property_Id && pv1.Revision<pv2.Revision).DefaultIfEmpty()
join p in db.Properties
on pv1.Property_Id equals p.Id
where pv2.Id==null
orderby p.Id
select new
{
p.Id,
pv1.StringValue,
pv1.Revision
}
);
Next to optimizing a query in Linq To Entities, you also have to be aware of the work it takes for the Entity Framework to translate your query to SQL and then map the results back to your objects.
Comparing a Linq To Entities query directly to a SQL query will always result in lower performance because the Entity Framework does a lot more work for you.
So it's also important to look at optimizing the steps the Entity Framework takes.
Things that could help:
Precompile your query
Pre-generate views
Decide for yourself when to open the database connection
Disable tracking (if appropriate)
Here you can find some documentation with performance strategies.
if you want to use multiple conditions (less than expression) in join you can do this like
from pv1 in db.PropertyValues
join pv2 in db.PropertyValues on new{pv1.Property_ID, Condition = pv1.Revision < pv2.Revision} equals new {pv2.Property_ID , Condition = true} into temp
from t in temp.DefaultIfEmpty()
join p in db.Properties
on pv1.Property_Id equals p.Id
where t.Id==null
orderby p.Id
select new
{
p.Id,
pv1.StringValue,
pv1.Revision
}

Linq: Orderby when including multiple tables

Currently learning Linq to Entity. I been successful, but came stumped with the orderby clause and its use with multiple tables.
var query = from k in contxt.pages.Include("keywords")
where k.ID == vals.pageId select k;
My understanding with the code above is it creates an inner join where ID is equal to pageId.
So what I am having a difficult time visualizing is how I would perform an orderby on both tables?
I would like to sort on both tables.
I have tried:
var query = from k in contxt.pages.Include("keywords") where k.ID == vals.pageId orderby k.keywords.**?** select k;
The question mark is not supposed to be there. I am showing that the column that I would like to sort by isn't there. Trying this k.Kegwords. doesn't show the column.
I would write a SQL query as follows:
string query = "SELECT pages.page, pages.title, pages.descp, keywords.keyword
FROM pages INNER JOIN keywords ON pages.ID = keywords.pageID
ORDER BY keywords.sort, pages.page";
pages and keywords have a 1 to many relationship, which FK keywords.
Thank you,
deDogs
Here you go.
var result = (from x in pages
join y in keywords on x.ID equals y.pageID
orderby y.sort, x.page
select new
{
x.Page,
x.title,
x.descp,
y.keyword
});

Resources