Why do these queries generate different SQL? - linq

I have a problem with a LINQ query in Entity Framework. I am filtering on a field of a navigation property, and the generated SQL is less than optimal. The example below is simplified; in my real code I pass the criteria in as an expression tree, which is why the second query with the let binding is not a sufficient solution, even though it produces the SQL I want.
So, to summarize, I have two questions:
Why is the generated SQL different? And is there any way, using expression trees, to produce a SQL query that does not create a join for every criterion?
Update: I realize that I had Include("Securities") on the first query and not on the second when I first posted the question, but that does not change the way the criteria are applied, only which columns are selected.
var qry = db.Positions
.Where(criteria)
.ToList();
var qry1 = (from p in db.Positions
where p.Security.Country == "NO" || p.Security.Country == "US" || p.Security.Country == "GB"
select p).ToList();
var qry2 = (from p in db.Positions
let s = p.Security
where s.Country == "NO" || s.Country == "US" || s.Country == "GB"
select p).ToList();
--qry1
SELECT
[Extent1].* --All columns from tblPositions
FROM [dbo].[tblPositions] AS [Extent1]
LEFT OUTER JOIN [dbo].[tblSecurities] AS [Extent2] ON ([Extent2].[SecurityType] IN (1,2..)) AND ([Extent1].[Security] = [Extent2].[SecuritySeq])
LEFT OUTER JOIN [dbo].[tblSecurities] AS [Extent3] ON ([Extent3].[SecurityType] IN (1,2..)) AND ([Extent1].[Security] = [Extent3].[SecuritySeq])
LEFT OUTER JOIN [dbo].[tblSecurities] AS [Extent4] ON ([Extent4].[SecurityType] IN (1,2..)) AND ([Extent1].[Security] = [Extent4].[SecuritySeq])
LEFT OUTER JOIN [dbo].[tblSecurities] AS [Extent5] ON ([Extent5].[SecurityType] IN (1,2..)) AND ([Extent1].[Security] = [Extent5].[SecuritySeq])
WHERE [Extent2].[Country] = 'NO' OR [Extent3].[Country] = 'US' OR [Extent4].[Country] = 'GB'
--qry2
SELECT
[Extent1].*
FROM [dbo].[tblPositions] AS [Extent1]
LEFT OUTER JOIN [dbo].[tblSecurities] AS [Extent2] ON ([Extent2].[SecurityType] IN (1,2..)) AND ([Extent1].[Security] = [Extent2].[SecuritySeq])
WHERE [Extent2].[Country] IN ('NO','US','GB')

In your first query
var qry1 = (from p in db.Positions.Include("Security")
where p.Security.Country == "NO"
|| p.Security.Country == "US"
|| p.Security.Country == "GB"
select p).ToList();
It seems you assume that Entity Framework can deduce that every operand of the || chain in the generated expression tree refers to the same object, and that it can therefore combine them into a single Contains. It is, by far, not that smart.
Additionally, the two queries do not return the same data. The second one would also need an Include to load the related Security rows (if you need them); otherwise you can remove the Include from the first query. Include only controls which related data is returned; it has nothing to do with how the where clause filters rows.
You could also make it more object-oriented and readable:
var allowedCountries = new List<string>() { "NO", "US", "GB" };
var qry1 = (from p in db.Positions
// C# query syntax has no "in" operator for this, so use Contains
where allowedCountries.Contains(p.Security.Country)
select p).ToList();
Or with lambda syntax (which I'm much more familiar with):
var qry1 = db.Positions
.Where(p => allowedCountries.Contains(p.Security.Country))
.ToList();
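Since the question asks for a solution that works with expression trees, here is a minimal sketch of building the same Contains predicate by hand. The Position and Security classes and the CriteriaBuilder helper are hypothetical stand-ins, not the asker's real model, and the demo runs against an in-memory list. Whether a given Entity Framework version translates this tree into a single join with an IN clause is an assumption you should check against the generated SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical stand-ins for the entities in the question.
public class Security { public string Country { get; set; } }
public class Position { public int Id { get; set; } public Security Security { get; set; } }

public static class CriteriaBuilder
{
    // Builds the tree for: p => allowed.Contains(p.Security.Country)
    public static Expression<Func<Position, bool>> CountryIn(List<string> allowed)
    {
        ParameterExpression p = Expression.Parameter(typeof(Position), "p");
        // The navigation property is referenced once, not once per country.
        MemberExpression country = Expression.Property(
            Expression.Property(p, nameof(Position.Security)),
            nameof(Security.Country));
        MethodCallExpression contains = Expression.Call(
            Expression.Constant(allowed),
            typeof(List<string>).GetMethod("Contains", new[] { typeof(string) }),
            country);
        return Expression.Lambda<Func<Position, bool>>(contains, p);
    }

    // In-memory demo; with EF the source would be db.Positions instead.
    public static List<int> MatchingIds(IEnumerable<Position> positions, List<string> allowed)
    {
        return positions.AsQueryable()
                        .Where(CountryIn(allowed))
                        .Select(x => x.Id)
                        .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var positions = new List<Position>
        {
            new Position { Id = 1, Security = new Security { Country = "NO" } },
            new Position { Id = 2, Security = new Security { Country = "SE" } },
            new Position { Id = 3, Security = new Security { Country = "GB" } },
        };
        var ids = CriteriaBuilder.MatchingIds(positions, new List<string> { "NO", "US", "GB" });
        Console.WriteLine(string.Join(",", ids)); // prints "1,3"
    }
}
```

The point of the sketch is that the tree mentions p.Security.Country a single time, so there is only one navigation for the provider to turn into a join.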

Related

SQL to LINQ with JOIN and SubQuery

I have a query that I'm struggling to convert to LINQ. I just can't get my head around the required nesting. Here's the query in SQL (just freehand typed):
SELECT V.* FROM V
INNER JOIN VE ON V.ID = VE.V_ID
WHERE VE.USER_ID != @USER_ID
AND V.MAX > (SELECT COUNT(ID) FROM VE
WHERE VE.V_ID = V.ID AND VE.STATUS = 'SELECTED')
The closest I've come is this:
var query = from vac in _database.Vacancies
join e in _database.VacancyEngagements
on vac.Id equals e.VacancyId into va
from v in va.DefaultIfEmpty()
where vac.MaxRecruiters > (from ve in _database.VacancyEngagements
where ve.VacancyId == v.Id && ve.Status == Enums.VacanyEngagementStatus.ENGAGED
select ve).Count()
...which correctly resolves the subquery from my SQL statement. But I want to further restrict the returned V rows to only those where the current user does not have a related VE row.
I've realised that the SQL in the question was misleading and, whilst it led to technically correct answers, they weren't what I was after. That's my fault for not reviewing the SQL properly, so I apologise to @Andy B and @Ivan Stoev for the misleading post. Here's the LINQ that solved the problem for me. As stated in the post, I needed to show vacancy rows where no linked vacancyEngagement row existed for the current user. Negating Any() with the ! operator expresses that "no matching row" condition as a subquery.
var query = from vac in _database.Vacancies
where !_database.VacancyEngagements.Any(ve => (ve.VacancyId == vac.Id && ve.UserId == user.Id))
&& vac.MaxRecruiters > (from ve in _database.VacancyEngagements
where ve.VacancyId == vac.Id && ve.Status == Enums.VacanyEngagementStatus.ENGAGED
select ve).Count()
select vac;
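To see the !Any(...) filter in isolation, here is a small in-memory sketch of the same shape. The Vacancy, VacancyEngagement and EngagementStatus types and the VacancyQueries helper are simplified assumptions rather than the asker's real model, and the query runs with LINQ to Objects instead of against a database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified, hypothetical versions of the types in the question.
public enum EngagementStatus { Pending, Engaged }
public class Vacancy { public int Id { get; set; } public int MaxRecruiters { get; set; } }
public class VacancyEngagement
{
    public int VacancyId { get; set; }
    public int UserId { get; set; }
    public EngagementStatus Status { get; set; }
}

public static class VacancyQueries
{
    // Ids of vacancies the user has no engagement with and that still have room.
    public static List<int> OpenVacancyIds(
        IEnumerable<Vacancy> vacancies,
        IEnumerable<VacancyEngagement> engagements,
        int userId)
    {
        var query = from vac in vacancies
                    // anti-join: no engagement row links this vacancy to the user
                    where !engagements.Any(ve => ve.VacancyId == vac.Id && ve.UserId == userId)
                    // capacity check: fewer engaged recruiters than the maximum
                       && vac.MaxRecruiters > engagements.Count(
                              ve => ve.VacancyId == vac.Id && ve.Status == EngagementStatus.Engaged)
                    select vac.Id;
        return query.ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var vacancies = new List<Vacancy>
        {
            new Vacancy { Id = 1, MaxRecruiters = 2 },
            new Vacancy { Id = 2, MaxRecruiters = 1 },
            new Vacancy { Id = 3, MaxRecruiters = 1 },
        };
        var engagements = new List<VacancyEngagement>
        {
            new VacancyEngagement { VacancyId = 1, UserId = 7, Status = EngagementStatus.Engaged },
            new VacancyEngagement { VacancyId = 2, UserId = 9, Status = EngagementStatus.Engaged },
            new VacancyEngagement { VacancyId = 3, UserId = 9, Status = EngagementStatus.Pending },
        };
        // Vacancy 1 is excluded (user 7 is linked), vacancy 2 is full, vacancy 3 qualifies.
        Console.WriteLine(string.Join(",", VacancyQueries.OpenVacancyIds(vacancies, engagements, 7))); // prints "3"
    }
}
```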
This should work:
var filterOutUser = <userId you want to filter out>;
var query = from vac in _database.Vacancies
join e in _database.VacancyEngagements
on vac.Id equals e.VacancyId
where (e.UserId != filterOutUser) && vac.MaxRecruiters > (from ve in _database.VacancyEngagements
where ve.VacancyId == vac.Id && ve.Status == Enums.VacanyEngagementStatus.ENGAGED
select ve).Count()
select vac;
I removed the join to VacancyEngagements but if you need columns from that table you can add it back in.

Linq left outer join with multiple condition

I am new to Linq. I am trying to query some data in MS SQL.
Here is my statement:
select * from booking
left outer join carpark
on booking.bookingId = carpark.bookingId
where userID = 5 and status = 'CL'
When I run this in MS SQL, I get the expected result. How can I do this in Linq?
Thank you for your help.
You need this:
var query = (from t1 in tb1
join t2 in tb2 on t1.pKey equals t2.tb1pKey into JoinedList
from t2 in JoinedList.DefaultIfEmpty()
where t1.userID == 5 && t1.status == "CL"
select new
{
t1,
t2
})
.ToList();
You can try doing the left join this way:
from t1 in tb1
from t2 in tb2.Where(o => o.tb1pKey == t1.pKey).DefaultIfEmpty()
where t1.userID == 5 && t1.status == "CL"
select t1;
Usually when people say they want a "left outer join," that's just because they've already converted what they really want into SQL in their head. Usually what they really want is all of the items from table A, and the ability to get the related items from table B if there are any.
Assuming you have your navigation properties set up correctly, this could be as easy as:
var tb1sWithTb2s = context.tb1
.Include(t => t.tb2s) // Include all the tb2 items for each of these.
.Where(t => t.userID == 5 && t.status == "CL");
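For reference, the join ... into ... DefaultIfEmpty() pattern from the answers above can be tried out in memory. This is only a sketch: the Booking and Carpark classes, their property names and the LeftJoinDemo helper are assumptions based on the SQL in the question, and it assumes userID and status live on the booking table.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes for the two tables in the question.
public class Booking
{
    public int BookingId { get; set; }
    public int UserId { get; set; }
    public string Status { get; set; }
}
public class Carpark
{
    public int BookingId { get; set; }
    public string Space { get; set; }
}

public static class LeftJoinDemo
{
    // Left outer join: every matching booking is kept, with or without a carpark.
    public static List<string> Describe(IEnumerable<Booking> bookings, IEnumerable<Carpark> carparks)
    {
        var query = from b in bookings
                    join c in carparks on b.BookingId equals c.BookingId into matches
                    from c in matches.DefaultIfEmpty()   // c is null when nothing matched
                    where b.UserId == 5 && b.Status == "CL"
                    select b.BookingId + ":" + (c == null ? "none" : c.Space);
        return query.ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var bookings = new List<Booking>
        {
            new Booking { BookingId = 1, UserId = 5, Status = "CL" },
            new Booking { BookingId = 2, UserId = 5, Status = "CL" },
            new Booking { BookingId = 3, UserId = 6, Status = "CL" },
            new Booking { BookingId = 4, UserId = 5, Status = "OP" },
        };
        var carparks = new List<Carpark> { new Carpark { BookingId = 1, Space = "A1" } };
        Console.WriteLine(string.Join(" ", LeftJoinDemo.Describe(bookings, carparks))); // prints "1:A1 2:none"
    }
}
```

In LINQ to Objects the unmatched side is a real null, hence the explicit null check; a database provider would return NULL columns instead.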

LINQ - count from select with join with no group by

LINQ is brand new to me, so I apologize if this is really stupid.
I am trying to get the count from a multi-table join with a where clause, without a group by. I've seen examples using group by and will resort to that if need be, but I am wondering if there is a way to avoid it. In SQL my query would look something like this:
SELECT Count(*)
FROM plans p
JOIN organizations o
ON p.org_id = o.org_id
AND o.deleted IS NULL
JOIN orgdata od
ON od.org_id = o.org_id
AND od.active = 1
JOIN orgsys os
ON os.sys_id = od.sys_id
AND os.deleted IS NULL
WHERE p.deleted IS NULL
AND os.name NOT IN ( 'xxxx', 'yyyy', 'zzzz' )
What's the best way to get this?
All you need is to call Count(). You're only counting the number of results. So something like:
var names = new[] { "xxxx", "yyyy", "zzzz" };
var query = from plan in db.Plans
where plan.Deleted == null
join organization in db.Organizations
on plan.OrganizationId equals organization.OrganizationId
where organization.Deleted == null
join orgData in db.OrganizationData
on organization.OrganizationId equals orgData.OrganizationId
where orgData.Active == 1
join os in db.OrganizationSystems
on orgData.SystemId equals os.SystemId
where os.Deleted == null &&
!names.Contains(os.Name)
select 1; // It doesn't matter what you select here
var count = query.Count();

How to write this LINQ Query in a better way

I have a LINQ query. When I run it, it takes 13 seconds to load just 10 records into the model. I need to know whether the query I wrote is good for performance or not. Please tell me what I am doing wrong.
Code
var stocktakelist = (from a in Db.Stocktakes
select new ExportStock
{
Id = a.Id,
ItemNo = a.ItemNo,
AdminId = (from admin in Db.AdminAccounts where admin.Id == a.Id select admin.Name).FirstOrDefault(),
CreatedOn = a.CreatedOn,
Status = (from items in Db.Items where items.ItemNo == a.ItemNo select items.ItemStatu.Description).FirstOrDefault(),
Title = (from tit in Db.BibContents where tit.BibId == (from bibs in Db.Items where bibs.ItemNo == a.ItemNo select bibs.BibId).FirstOrDefault() && tit.TagNo == "245" && tit.Sfld == "a" select tit.Value).FirstOrDefault() // This line of Query only makes the performance Issue
}
).ToList();
Thanks
The reason this is so slow is because it is running the 3 inner LINQ statements for every item in the outer LINQ statement.
Using LINQ joins will run only 4 queries and then link them together, which is faster.
To find out how to join, there are plenty of resources on the Internet depending on the type of LINQ you are using.
If you're retrieving this data from a SQL server, perhaps consider doing this intensive work in SQL - this is what SQL was designed for and it's much quicker than .NET. EDIT: As highlighted below, the work is done in SQL if using LINQ to SQL/Entities and using the correct join syntax.
I tried to write the corresponding query with some joins for practice. I cannot test it, and I'm not 100% sure that it will give you the result you are hoping for, but maybe it will at least give you a hint on how to write joins with LINQ.
from a in Db.Stocktakes
join admin in Db.AdminAccounts
on a.Id equals admin.Id
into adminJoinData
from adminJoinRecord in adminJoinData.DefaultIfEmpty( )
join items in Db.Items
on a.ItemNo equals items.ItemNo
into itemsJoinData
from itemsJoinRecord in itemsJoinData.DefaultIfEmpty( )
join title in
(
// only the title rows: tag 245, subfield a
from subQuery in Db.BibContents
where subQuery.TagNo == "245"
where subQuery.Sfld == "a"
select subQuery
)
// the outer key goes on the left of "equals", the joined key on the right
on itemsJoinRecord.BibId equals title.BibId
into titleJoinData
from titleJoinRecord in titleJoinData.DefaultIfEmpty( )
select new ExportStock( )
{
Id = a.Id,
ItemNo = a.ItemNo,
AdminId = adminJoinRecord.Name,
CreatedOn = a.CreatedOn,
Status = itemsJoinRecord.ItemStatu.Description,
Title = titleJoinRecord.Value
}
As others have said, you should use Left Outer Joins in your LINQ just as you would if writing it in SQL.
Your query above will end up looking roughly like this once converted (this is untested, but gives the basic idea):
var stocktakeList = from a in Db.Stocktakes
join admin in Db.AdminAccounts on a.Id equals admin.Id into tmpAdmin
from ad in tmpAdmin.DefaultIfEmpty()
join item in Db.Items on a.ItemNo equals item.ItemNo into tmpItem
from it in tmpItem.DefaultIfEmpty()
join title in Db.BibContents on it.BibId equals title.BibId into tmpTitle
from ti in tmpTitle.DefaultIfEmpty()
// note: filtering on ti here drops stocktakes that have no matching title row
where ti.TagNo == "245"
&& ti.Sfld == "a"
select new ExportStock
{
Id = a.Id,
ItemNo = a.ItemNo,
// the question assigns the admin's Name to AdminId
AdminId = ad == null ? default(string) : ad.Name,
CreatedOn = a.CreatedOn,
Status = it == null ? default(string) : it.ItemStatu.Description,
Title = ti == null ? default(string) : ti.Value
};
Using lambda expressions your query will look like this:
// Note: Join produces inner joins, so stocktakes without a matching admin, item or title are dropped.
Db.Stocktakes
.Join(Db.AdminAccounts, s => s.Id, ad => ad.Id, (s, ad) => new { s, AdminId = ad.Name })
.Join(Db.Items, x => x.s.ItemNo, i => i.ItemNo, (x, i) => new { x.s, x.AdminId, Status = i.ItemStatu.Description, i.BibId })
.Join(Db.BibContents, x => x.BibId, b => b.BibId, (x, b) => new { x.s, x.AdminId, x.Status, b.Value, b.TagNo, b.Sfld })
.Where(x => x.TagNo == "245" && x.Sfld == "a")
.Select(x =>
new ExportStock { Id = x.s.Id,
ItemNo = x.s.ItemNo,
AdminId = x.AdminId,
CreatedOn = x.s.CreatedOn,
Status = x.Status,
Title = x.Value
}
).ToList();

Adding multiple Contains IQueryables to a base IQueryable changes every previous IQueryable

I am spending a good amount of time trying to figure out why Linq2SQL is changing my SQL query. It is rather difficult to explain, and I can't find any reason why this is happening. In short, adding multiple Contains clauses to an IQueryable seems to overwrite each previously added IQueryable expression. Let me try to explain:
Say you have a Linq2SQL query that provides you the basic framework for query. (something that is the base of all your queries)
I am dynamically adding in parts of the where query (shown as "partQuery" in the examples below). The Expression generated from the where query is correct, and when I add it to the finalQuery -- it still is correct. The problem comes when I add another partQuery to the final query, it seems to overwrite the first query in the final query, but adds a second query into it. (or as shown below, when adding a third query, overwrites the first 2 queries)
Here is some source example:
foreach (var partQuery in whereStatements)
{
finalQuery = finalQuery.Where(
dataEvent => partQuery.Contains(dataEvent.DataEventID)
);
}
partQuery is of type IQueryable<long>
finalQuery is the query that will eventually be executed at the SQL server
// the list of the wheres that are sent
var whereStatements = new List<IQueryable<long>>();
var query1 = DataEvent.GetQueryBase(context);
query1 = query1.Where(
dataEvent =>
dataEvent.DataEventKeyID == (short)DataEventTypesEnum.TotalDollarAmount && dataEvent.ValueDouble < -50);
whereStatements.Add(query1.Select(x => x.DataEventID));
var query2 = DataEvent.GetQueryBase(context);
query2 = query2.Where(
dataEvent =>
dataEvent.DataEventKeyID == (short)DataEventTypesEnum.ObjectNumber && dataEvent.ValueDouble == 6);
whereStatements.Add(query2.Select(x => x.DataEventID));
The first where Query (query1) has an expression that turns out like this:
{SELECT [t0].[DataEventID]
FROM [dbo].[DataEvents] AS [t0]
INNER JOIN [dbo].[DataEventAttributes] AS [t1] ON [t0].[DataEventID] = [t1].[DataEventID]
WHERE ([t1].[DataEventKeyID] = @p0) AND ([t1].[ValueDouble] < @p1)
}
Notice that the where line has ValueDouble "<" @p1 (less than),
and then when added into the final query, it looks like this:
{SELECT [t0].[DataEventID], [t0].[DataOwnerID], [t0].[DataTimeStamp]
FROM [dbo].[DataEvents] AS [t0]
WHERE (EXISTS(
SELECT NULL AS [EMPTY]
FROM [dbo].[DataEvents] AS [t1]
INNER JOIN [dbo].[DataEventAttributes] AS [t2] ON [t1].[DataEventID] = [t2].[DataEventID]
WHERE ([t1].[DataEventID] = [t0].[DataEventID]) AND ([t2].[DataEventKeyID] = @p0) AND ([t2].[ValueDouble] < @p1)
)) AND ([t0].[DataOwnerID] = @p2)
}
At this point, the query is still correct. Notice how ValueDouble still has a "<" sign. The problem occurs when I add 2 or more to the query. Here is the expression of the second query in this example:
{SELECT [t0].[DataEventID]
FROM [dbo].[DataEvents] AS [t0]
INNER JOIN [dbo].[DataEventAttributes] AS [t1] ON [t0].[DataEventID] = [t1].[DataEventID]
WHERE ([t1].[DataEventKeyID] = @p0) AND ([t1].[ValueDouble] = @p1)
}
and when added to the final query.. you will notice that the first query is no longer correct.... (And more to come after)
{SELECT [t0].[DataEventID], [t0].[DataOwnerID], [t0].[DataTimeStamp]
FROM [dbo].[DataEvents] AS [t0]
WHERE (EXISTS(
SELECT NULL AS [EMPTY]
FROM [dbo].[DataEvents] AS [t1]
INNER JOIN [dbo].[DataEventAttributes] AS [t2] ON [t1].[DataEventID] = [t2].[DataEventID]
WHERE ([t1].[DataEventID] = [t0].[DataEventID]) AND ([t2].[DataEventKeyID] = @p0) AND ([t2].[ValueDouble] = @p1)
)) AND (EXISTS(
SELECT NULL AS [EMPTY]
FROM [dbo].[DataEvents] AS [t3]
INNER JOIN [dbo].[DataEventAttributes] AS [t4] ON [t3].[DataEventID] = [t4].[DataEventID]
WHERE ([t3].[DataEventID] = [t0].[DataEventID]) AND ([t4].[DataEventKeyID] = @p2) AND ([t4].[ValueDouble] = @p3)
)) AND ([t0].[DataOwnerID] = @p4)
}
And a bonus: after looking at this via the SQL profiler, it appears that it totally dropped the first query, and the two EXISTS clauses in the final SQL are actually the same query (query2). None of the parameters for the first query are actually passed to the SQL server.
So, from my research, it appears that it is adding queries to the SQL, but replacing every existing WHERE EXISTS clause with the last one that was added. To confirm this, I used the exact same code as above but added a third query, and look how it changed.
var query3 = DataEvent.GetQueryBase(context);
query3 = query3.Where(
dataEvent =>
dataEvent.DataEventKeyID != (short)DataEventTypesEnum.Quantity && dataEvent.ValueDouble != 5);
whereStatements.Add(query3.Select(x => x.DataEventID));
I threw in some "!=" to the last part of the query
{SELECT [t0].[DataEventID], [t0].[DataOwnerID], [t0].[DataTimeStamp]
FROM [dbo].[DataEvents] AS [t0]
WHERE (EXISTS(
SELECT NULL AS [EMPTY]
FROM [dbo].[DataEvents] AS [t1]
INNER JOIN [dbo].[DataEventAttributes] AS [t2] ON [t1].[DataEventID] = [t2].[DataEventID]
WHERE ([t1].[DataEventID] = [t0].[DataEventID]) AND ([t2].[DataEventKeyID] <> @p0) AND ([t2].[ValueDouble] <> @p1)
)) AND (EXISTS(
SELECT NULL AS [EMPTY]
FROM [dbo].[DataEvents] AS [t3]
INNER JOIN [dbo].[DataEventAttributes] AS [t4] ON [t3].[DataEventID] = [t4].[DataEventID]
WHERE ([t3].[DataEventID] = [t0].[DataEventID]) AND ([t4].[DataEventKeyID] <> @p2) AND ([t4].[ValueDouble] <> @p3)
)) AND (EXISTS(
SELECT NULL AS [EMPTY]
FROM [dbo].[DataEvents] AS [t5]
INNER JOIN [dbo].[DataEventAttributes] AS [t6] ON [t5].[DataEventID] = [t6].[DataEventID]
WHERE ([t5].[DataEventID] = [t0].[DataEventID]) AND ([t6].[DataEventKeyID] <> @p4) AND ([t6].[ValueDouble] <> @p5)
)) AND ([t0].[DataOwnerID] = @p6)
}
Notice how all three inner queries now use "<>", even though the first two queries were not written that way.
Am I totally off my rocker here? Am I missing something so simple that when you tell me I am going to want to pull my fingernails off? I am actually hoping you tell me that, instead of telling me that it looks like a bug in the MS framework (well, we know that happens sometimes).
Any help is greatly appreciated. Maybe I should be sending dynamic query parts to the base query a different way. I am open to ideas.
Without having fully evaluated your examples, the one thing that stands out is this:
foreach (var partQuery in whereStatements)
{
finalQuery = finalQuery.Where(
dataEvent => partQuery.Contains(dataEvent.DataEventID)
);
}
Because of the way this loop is structured, every expression generated in each iteration will eventually use the final value of partQuery, that is, the value that is present when the loop terminates. You probably want this instead:
foreach (var partQuery in whereStatements)
{
var part = partQuery;
finalQuery = finalQuery.Where(
dataEvent => part.Contains(dataEvent.DataEventID)
);
}
Now, part is the captured variable, and is unique per iteration, and therefore unique per expression. This odd-at-first behavior is by design: see a related question.
Edited: It looks like this is exactly what's causing your problem; the subqueries in the final query are all of the form x <> y, which is the form of the last query added to your whereStatements collection.
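The capture behavior described above can be reproduced without a database. The sketch below uses a hypothetical ClosureDemo class and an explicitly shared variable to stand in for the loop variable as it behaved when this was written. Note that from C# 5 onward the foreach variable is a fresh variable per iteration, so the original loop no longer shows the problem there, while variables declared outside the loop (and for loop variables) are still shared.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ClosureDemo
{
    // All lambdas capture the same variable, so they all see its last value.
    public static List<int> SharedCapture(int[] values)
    {
        var readers = new List<Func<int>>();
        int part = 0;                      // one variable shared by every lambda
        foreach (int v in values)
        {
            part = v;
            readers.Add(() => part);
        }
        return readers.Select(r => r()).ToList();
    }

    // Each lambda captures its own copy, as in the fixed loop above.
    public static List<int> PerIterationCapture(int[] values)
    {
        var readers = new List<Func<int>>();
        foreach (int v in values)
        {
            int part = v;                  // fresh variable per iteration
            readers.Add(() => part);
        }
        return readers.Select(r => r()).ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        int[] values = { 1, 2, 3 };
        Console.WriteLine(string.Join(",", ClosureDemo.SharedCapture(values)));       // prints "3,3,3"
        Console.WriteLine(string.Join(",", ClosureDemo.PerIterationCapture(values))); // prints "1,2,3"
    }
}
```

Because LINQ queries are evaluated lazily, the lambdas run only after the loop has finished, which is why the shared variable has already reached its final value by then.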
