Why does LINQ-to-EF render certain "Contains" as two separate JOINs? [duplicate]

I know there are already some questions about this, but most relate either to old issues that have since been resolved or to queries over multiple tables. None of the other 'left outer join' questions I saw covers my case: I get an INNER JOIN and a LEFT OUTER JOIN to the same table in the same query.
The table outline:
Users: id (PK)
Name (VARCHAR)
ProfileImageUri (VARCHAR)
Locations: id (PK)
LocationBPNTips: id (PK)
TipText (VARCHAR)
CreatedAt (Datetime)
UserId (int) (FK to User.id, navigation property is called User)
LocationId (int) (FK to Location.id)
(there is more, but it is not relevant :) )
In my scenario I query a referenced table via a projection and get an extra LEFT OUTER JOIN. This is what I run (the commented-out parts are irrelevant to the problem and are disabled only to give cleaner SQL; EF handles the sorting correctly, even better than I imagined :) ):
LocationBPNTips
.Where(t => t.LocationId == 33)
//.OrderByDescending(t => intList.Contains(t.UserId))
//.ThenByDescending(t => t.CreatedAt)
.Select(tip => new LocationTipOutput
{
CreatedAt = tip.CreatedAt,
Text = tip.TipText,
LocationId = tip.LocationId,
OwnerName = tip.User.Name,
OwnerPhoto = tip.User.ProfileImageUri
}).ToList();
And this is the generated SQL:
SELECT
[Extent1].[LocationId] AS [LocationId],
[Extent1].[CreatedAt] AS [CreatedAt],
[Extent1].[TipText] AS [TipText],
[Extent2].[Name] AS [Name],
[Extent3].[ProfileImageUri] AS [ProfileImageUri]
FROM [dbo].[LocationBPNTips] AS [Extent1]
INNER JOIN [dbo].[Users] AS [Extent2] ON [Extent1].[UserId] = [Extent2].[Id]
LEFT OUTER JOIN [dbo].[Users] AS [Extent3] ON [Extent1].[UserId] = [Extent3].[Id]
WHERE 33 = [Extent1].[LocationId]
As you can see, the LEFT OUTER JOIN is done on the same table as the INNER JOIN.
I think the optimal SQL would be the following (note: I manually renamed Extent3 to Extent2 and commented out the extra join; this was not generated by EF!). With my current data it runs about 22% faster because the extra join is not needed. That figure was measured with the sorting enabled, so the gain should be higher without the sort.
SELECT
[Extent1].[LocationId] AS [LocationId],
[Extent1].[CreatedAt] AS [CreatedAt],
[Extent1].[TipText] AS [TipText],
[Extent2].[Name] AS [Name],
[Extent2].[ProfileImageUri] AS [ProfileImageUri]
FROM [dbo].[LocationBPNTips] AS [Extent1]
INNER JOIN [dbo].[Users] AS [Extent2] ON [Extent1].[UserId] = [Extent2].[Id]
--LEFT OUTER JOIN [dbo].[Users] AS [Extent3] ON [Extent1].[UserId] = [Extent3].[Id]
WHERE 33 = [Extent1].[LocationId]
The different queries I have tried (the projection is into an anonymous type in these):
LocationBPNTips
.Where(t => t.LocationId == 33)
//.OrderByDescending(t => intList.Contains(t.UserId))
//.ThenByDescending(t => t.CreatedAt)
.Select(tip => new
{
CreatedAt = tip.CreatedAt,
Text = tip.TipText,
LocationId = tip.LocationId,
OwnerName = tip.User,
OwnerPhoto = tip.User
}).ToList()
The SQL output was messed up: it selected the entire Users table twice, in the same pattern as above (inner join, then left outer join). I think I can see in theory why this happens in this case, because I asked for the data twice (although that could have been handled in memory rather than with an extra join in the SQL). In my real case, however, I did not ask for the data twice; I asked for different columns, each only once. I ran this test to see whether the double join is consistent.
I also tried running:
LocationBPNTips
.Where(t => t.LocationId == 33)
.Select(tip => new
{
CreatedAt = tip.CreatedAt,
Text = tip.TipText,
LocationId = tip.LocationId,
OwnerName = tip.User.Name
}).ToList()
This one returned a clean, single inner join as expected, but it is not what I am trying to do.
So the question is: is this a bug, or am I using EF incorrectly?

I've seen a similar problem before. We can call it a bug or a feature; EF query generation is simply far from ideal. The ADO.NET team fixes some of these problems with every major release. I don't currently have June CTP 2011 installed, so I can't verify whether it also happens in the first preview of the next version.
Edit:
According to this answer, a similar issue was fixed in June CTP 2011.
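To check that the extra LEFT OUTER JOIN in the question is redundant rather than result-changing, the two SQL statements can be compared on a small in-memory database. This is only a sketch: it uses Python's sqlite3 module instead of SQL Server, the sample rows are invented, and the schema is a cut-down version of the tables described in the question (CreatedAt is left out). It relies on UserId being a non-null foreign key, so every tip matches exactly one user.

```python
import sqlite3

# Hypothetical miniature of the question's schema; the data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (Id INTEGER PRIMARY KEY, Name TEXT, ProfileImageUri TEXT);
    CREATE TABLE LocationBPNTips (
        Id INTEGER PRIMARY KEY,
        TipText TEXT,
        UserId INTEGER NOT NULL REFERENCES Users(Id),
        LocationId INTEGER
    );
    INSERT INTO Users VALUES (1, 'Ann', 'ann.png'), (2, 'Bob', 'bob.png');
    INSERT INTO LocationBPNTips VALUES
        (10, 'Try the soup', 1, 33),
        (11, 'Closed Mondays', 2, 33),
        (12, 'Other place', 1, 7);
""")

# Shape of the SQL that EF generated: INNER JOIN plus LEFT OUTER JOIN on Users.
double_join = """
    SELECT e1.TipText, e2.Name, e3.ProfileImageUri
    FROM LocationBPNTips AS e1
    INNER JOIN Users AS e2 ON e1.UserId = e2.Id
    LEFT OUTER JOIN Users AS e3 ON e1.UserId = e3.Id
    WHERE 33 = e1.LocationId
    ORDER BY e1.Id
"""

# Hand-edited version from the question: a single INNER JOIN.
single_join = """
    SELECT e1.TipText, e2.Name, e2.ProfileImageUri
    FROM LocationBPNTips AS e1
    INNER JOIN Users AS e2 ON e1.UserId = e2.Id
    WHERE 33 = e1.LocationId
    ORDER BY e1.Id
"""

rows_double = conn.execute(double_join).fetchall()
rows_single = conn.execute(single_join).fetchall()
print(rows_double == rows_single)
```

Both statements return the same rows here, which supports reading the second join as wasted work rather than a correctness problem. It says nothing about how a particular SQL Server version optimizes the plan.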

Related

How to simplify this postgres query

How can I simplify this query?
What I am trying to do is derive the column S9_Unlock via a subquery in which I only look for user_ids which are returned from the main query but this looks very awkward to me, especially as this query here is just an excerpt. In reality I am doing multiple of these subqueries to derive different columns...
SELECT userid, CAST(to_char(S9_unlock,'YYYY/MM/DD') AS timestamp) AS "S9_Unlock"
FROM (
SELECT ca.user_id AS userid
FROM shop_db.invoices AS inv
LEFT JOIN shop_db.carts AS ca ON inv.id = ca.invoice_id
LEFT JOIN shop_db.cart_items AS ci ON ca.id = ci.cart_id
WHERE (inv.created BETWEEN '2014-11-13' AND '2014-11-14' OR inv.created BETWEEN '2013-11-14' AND '2013-11-15')
AND inv.status <> 'do_not_book'
AND inv.id IS NOT NULL
GROUP BY user_id) AS master
LEFT JOIN (
SELECT MIN(s3.unl) AS "S9_Unlock", s3.user_id
FROM (
SELECT user_id, challenge_codes.created AS unl,
MAX /* Check if license contains Suite9 */
(CASE WHEN substring(bundle_article_code,1,6) = 'BuSu90' THEN 1 ELSE 0 END) AS "S9_Unlock"
FROM licensing_db.serial_numbers
LEFT JOIN licensing_db.licenses ON licenses.id = serial_numbers.license_id
LEFT JOIN user_db.users ON users.id = licenses.user_id
LEFT JOIN licensing_db.challenge_codes ON challenge_codes.serial_number_id = serial_numbers.id
WHERE user_id IN (
SELECT ca.user_id AS userid
FROM shop_db.invoices AS inv
LEFT JOIN shop_db.carts AS ca ON inv.id = ca.invoice_id
LEFT JOIN shop_db.cart_items AS ci ON ca.id = ci.cart_id
WHERE (inv.created BETWEEN '2014-11-13' AND '2014-11-14' OR inv.created BETWEEN '2013-11-14' AND '2013-11-15')
AND inv.status <> 'do_not_book'
AND inv.id IS NOT NULL
GROUP BY user_id
)
GROUP BY user_id, challenge_codes.created) AS s3
)
WHERE "S9_Unlock" = 1
AND s3.unl IS NOT NULL
GROUP BY s3.user_id) AS "S9_Unlock" ON "S9_Unlock".user_id = master.userid
In your query you have two sub-queries that are identical; this screams for a CTE.
In the sub-query on licensing issues you can filter out the valid licenses after the GROUP BY clause using a HAVING clause. Make that a WITH QUERY too and you end up with the rather more readable:
WITH inv AS (
SELECT ca.user_id AS userid
FROM shop_db.invoices AS inv
LEFT JOIN shop_db.carts AS ca ON ca.invoice_id = inv.id
LEFT JOIN shop_db.cart_items AS ci ON ci.cart_id = ca.id
WHERE (inv.created BETWEEN '2014-11-13' AND '2014-11-14' OR inv.created BETWEEN '2013-11-14' AND '2013-11-15')
AND inv.status <> 'do_not_book'
AND inv.id IS NOT NULL
GROUP BY ca.user_id
), s3 AS (
SELECT u.user_id, min(cc.created) AS first_unlocked, bundle_article_code
FROM licensing_db.serial_numbers AS sn
LEFT JOIN licensing_db.licenses AS lic ON lic.id = sn.license_id
LEFT JOIN user_db.users AS u ON u.id = lic.user_id
LEFT JOIN licensing_db.challenge_codes AS cc ON cc.serial_number_id = sn.id
WHERE u.user_id IN (SELECT userid FROM inv)
GROUP BY u.user_id, bundle_article_code
HAVING bundle_article_code LIKE 'BuSu90%'
AND min(cc.created) IS NOT NULL
)
SELECT userid, date_trunc('day', first_unlocked) AS "S9_Unlock"
FROM inv
LEFT JOIN s3 ON s3.user_id = inv.userid;
So the main query is now reduced to 3 lines, and each of the WITH queries performs a logically self-contained query of the database. The other sub-queries you refer to can similarly become WITH queries, which you then assemble in the main query. Remember that you can refer to earlier named queries in the list of WITH queries, as shown above with inv being referred to by s3. While such CTEs do not provide new functionality syntactically (except for the RECURSIVE variant), they do make complex queries much more readable and therefore easier to maintain.
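The pattern of defining a named query once and referring to it from both a later named query and the main query can be tried on a toy schema. The following is a minimal sketch using Python's sqlite3 module rather than PostgreSQL; the tables invoices and unlocks, their columns, and the sample rows are all hypothetical stand-ins and are much simpler than the real shop and licensing tables.

```python
import sqlite3

# Hypothetical, heavily simplified stand-ins for the shop and licensing tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoices (id INTEGER PRIMARY KEY, user_id INTEGER, status TEXT);
    CREATE TABLE unlocks (user_id INTEGER, created TEXT);
    INSERT INTO invoices VALUES
        (1, 100, 'paid'), (2, 101, 'do_not_book'), (3, 102, 'paid');
    INSERT INTO unlocks VALUES
        (100, '2014-11-13'), (100, '2014-11-10'), (101, '2014-11-12');
""")

# "buyers" is written once and used twice: inside s3 and in the main query.
cte_sql = """
    WITH buyers AS (
        SELECT user_id AS userid
        FROM invoices
        WHERE status <> 'do_not_book'
    ), s3 AS (
        SELECT user_id, MIN(created) AS first_unlocked
        FROM unlocks
        WHERE user_id IN (SELECT userid FROM buyers)
        GROUP BY user_id
    )
    SELECT buyers.userid, s3.first_unlocked
    FROM buyers
    LEFT JOIN s3 ON s3.user_id = buyers.userid
    ORDER BY buyers.userid
"""

rows = conn.execute(cte_sql).fetchall()
print(rows)
```

User 102 has an invoice but no unlock, so the LEFT JOIN keeps that row with a NULL date, which is the behaviour the original query gets from its outer join to the unlock sub-query.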
Another approach would be to factor out logical sub-components (such as the inv sub-query) and make a VIEW out of those. Then you can simply reference the view in the main query. Making the whole thing a view is probably also a good idea if you want to make the query more flexible. What if you want to query for Suite9.1 ('BuSu91%') on 27 March 2014? Taking those literals out and then supplying them in WHERE clauses on the view makes your query more versatile; this works with either the sub-queries or the complete CTE.
(Please check whether the semantics are still right in the s3 WITH query, because without your table structures and sample data I cannot test my code above.)
Instead of solving your problem as one big monolithic relational SQL query, I would seriously consider going the "procedural" way, by using PostgreSQL's built-in "plpgsql" language. This could bring a lot of clarity to your application.

Linq left outer group by, then left outer the group

I have this query that I'm trying to express in LINQ:
select *
from stuff
inner join stuffowner so on so.stuffID = stuff.stuffID
left outer join (select min(loanId) as loanId, stuffownerId from loan
where userid = 1 and status <> 2 group by stuffownerId) t on t.stuffownerid = so.stuffownerid
left outer join loan on t.LoanId = loan.LoanId
When this is done, I would like to do a LINQ group by with Stuff as the key and the stuff owners plus loan as the value.
I can't seem to get a nice query without a subquery (hence the double left outer join).
So basically, for each stuff in my database, my query brings the owners, and then I want to bring the first loan a user has made on that stuff.
I've tried various LINQ queries:
from stuff in Stuffs
join so in StuffOwners on stuff.StuffId equals so.StuffId
join tLoan in Loans on so.StuffOwnerId equals tLoan.StuffOwnerId into tmpJoin
from tTmpJoin in tmpJoin.DefaultIfEmpty()
group tTmpJoin by new {stuff} into grouped
select new {grouped, fluk = (int?)grouped.Max(w=> w.Status )}
This is not good because I don't get the stuff owner, and on top of that it seems to generate a lot of queries (LinqPad).
from stuff in Stuffs
join so in StuffOwners on stuff.StuffId equals so.StuffId
join tmpLoan in
(from tLoan in Loans group tLoan by tLoan.StuffOwnerId into g
select new {StuffOwnerId = g.Key, loanid = (from t2 in g select t2.LoanId).Max()})
on so.StuffOwnerId equals tmpLoan.StuffOwnerId
into tmptmp from tMaxLoan in tmptmp.DefaultIfEmpty()
select new {stuff, so, tmptmp}
Seems to generate a lot of subqueries as well.
I've tried the let keyword with:
from tstuffOwner in StuffOwners
let tloan = Loans.Where(p2 => tstuffOwner.StuffOwnerId == p2.StuffOwnerId).FirstOrDefault()
select new { qsdq = tstuffOwner, qsdsq= (int?) tloan.Status, kwk= (int?) tloan.UserId, kiwk= tloan.ReturnDate }
but the more info I get from tLoan, the longer the query gets, with more subqueries.
What would be the best way to achieve this?
Thanks

Oracle SQL developer how to display NULL value of the foreign key

I am trying to use a left outer join between Ticket and Membership.
However, the result does not show the NULL foreign key values on Ticket. Could you tell me what's wrong with this query?
Thanks.
FROM Ticket t, Production pro, Performance per, Price, Price_level, Booking, Customer, Customer_Concession ccons, Membership, Member_concession mcons
WHERE t.performanceid = per.performanceid AND
t.PRODUCTIONID = Price.PRODUCTIONID AND
t.levelId = Price.levelId AND
Price.PRODUCTIONID = pro.PRODUCTIONID AND
Price.levelId = Price_level.levelId AND
Booking.bookingId (+) = t.bookingId AND
Customer.customerId = Booking.customerId AND
ccons.cConcessionId (+) = Customer.cConcessionId AND
Membership.membershipId (+) = t.membershipId AND
Membership.mConcessionId = mcons.mConcessionId
ORDER BY t.ticketId
One potential problem you have is these two conditions:
Booking.bookingId (+) = t.bookingId AND
Customer.customerId = Booking.customerId AND
Since you're doing an outer join to Booking, its columns will appear as NULL when no match is found; but then you're doing a normal join to Customer, so those rows will be eliminated since NULL cannot be equal to anything. You may want to change the second line to an outer join as well.
But, I don't know if that's your primary problem, since I don't actually understand exactly what you're asking. What do you mean by "NULL value of the foreign key"? You haven't specified what your foreign keys are.
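The outer-join-then-inner-join pitfall described above is easy to reproduce. The sketch below uses Python's sqlite3 module with ANSI join syntax instead of Oracle's (+) notation; the three tables are reduced to the columns needed for the joins, and the sample rows are invented.

```python
import sqlite3

# Cut-down, hypothetical versions of Ticket, Booking and Customer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Ticket (ticketId INTEGER PRIMARY KEY, bookingId INTEGER);
    CREATE TABLE Booking (bookingId INTEGER PRIMARY KEY, customerId INTEGER);
    CREATE TABLE Customer (customerId INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO Customer VALUES (1, 'Ann');
    INSERT INTO Booking VALUES (50, 1);
    INSERT INTO Ticket VALUES (1, 50), (2, NULL);  -- ticket 2 has no booking
""")

# Outer join to Booking, then a plain join to Customer: the NULL-extended
# Booking row for ticket 2 cannot match any customer, so ticket 2 is dropped.
inner_after_outer = conn.execute("""
    SELECT t.ticketId, c.name
    FROM Ticket t
    LEFT OUTER JOIN Booking b ON b.bookingId = t.bookingId
    JOIN Customer c ON c.customerId = b.customerId
    ORDER BY t.ticketId
""").fetchall()

# Outer joins all the way down the chain: ticket 2 survives with a NULL name.
outer_all_the_way = conn.execute("""
    SELECT t.ticketId, c.name
    FROM Ticket t
    LEFT OUTER JOIN Booking b ON b.bookingId = t.bookingId
    LEFT OUTER JOIN Customer c ON c.customerId = b.customerId
    ORDER BY t.ticketId
""").fetchall()

print(inner_after_outer)
print(outer_all_the_way)
```

The first query returns only ticket 1; the second returns both tickets. The same reasoning applies to the Membership and Member_concession pair in the question.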
To expand on Dave's observation, here is an example in SQL-92 syntax. Please, please, please learn it and get away from Oracle's own outer join syntax.
FROM Ticket t
JOIN Performance per
ON per.performanceId = t.performanceId
JOIN Production pro
ON pro.productionId = t.productionId
JOIN Price pr
ON pr.productionId = pro.productionId
AND pr.levelId = t.levelId
JOIN Price_level pl
ON pl.levelId = pr.levelId
LEFT OUTER JOIN Booking b
ON b.bookingId = t.bookingId
LEFT OUTER JOIN Customer cus
ON cus.customerId = b.customerId
LEFT OUTER JOIN Customer_Concession ccons
ON ccons.cConcessionId = cus.cConcessionId
LEFT OUTER JOIN Membership m
ON m.membershipId = t.membershipId
LEFT OUTER JOIN Member_concession mcons
ON mcons.mConcessionId = m.mConcessionId
ORDER BY t.ticketId

Linq-to-entities - Include() method not loading

If I use a join, the Include() method is no longer working, eg:
from e in dc.Entities.Include("Properties")
join i in dc.Items on e.ID equals i.Member.ID
where (i.Collection.ID == collectionID)
select e
e.Properties is not loaded
Without the join, the Include() works
Lee
UPDATE: Actually I recently added another Tip that covers this, and provides an alternate probably better solution. The idea is to delay the use of Include() until the end of the query, see this for more information: Tip 22 - How to make include really include
There is a known limitation in the Entity Framework when using Include().
Certain operations are just not supported with Include.
It looks like you may have run into one of those limitations. To work around it, you should try something like this:
var results =
from e in dc.Entities //Notice no include
join i in dc.Items on e.ID equals i.Member.ID
where (i.Collection.ID == collectionID)
select new {Entity = e, Properties = e.Properties};
This will bring back the Properties, and if the relationship between entity and Properties is a one to many (but not a many to many) you will find that each resulting anonymous type has the same values in:
anonType.Entity.Properties
anonType.Properties
This is a side-effect of a feature in the Entity Framework called relationship fixup.
See this Tip 1 in my EF Tips series for more information.
Try this:
var query = (ObjectQuery<Entities>)(from e in dc.Entities
join i in dc.Items on e.ID equals i.Member.ID
where (i.Collection.ID == collectionID)
select e);
return query.Include("Properties");
So what is the name of the navigation property on "Entity" which relates to "Item.Member" (i.e., the other end of the navigation)? You should be using this instead of the join. For example, if "Entity" had a property called Member with a cardinality of 1, and Member had a property called Items with a cardinality of many, you could do this:
from e in dc.Entities.Include("Properties")
where e.Member.Items.Any(i => i.Collection.ID == collectionID)
select e
I'm guessing at the properties of your model here, but this should give you the general idea. In most cases, using join in LINQ to Entities is wrong, because it suggests that either your navigational properties are not set up correctly, or you are not using them.
So, I realise I am late to the party here, however I thought I'd add my findings. This should really be a comment on Alex James's post, but as I don't have the reputation it'll have to go here.
So my answer is: it doesn't seem to work at all as you would intend. Alex James gives two interesting solutions, however if you try them and check the SQL, it's horrible.
The example I was working on is:
var theRelease = from release in context.Releases
where release.Name == "Hello World"
select release;
var allProductionVersions = from prodVer in context.ProductionVersions
where prodVer.Status == 1
select prodVer;
var combined = (from release in theRelease
join p in allProductionVersions on release.Id equals p.ReleaseID
select release).Include(release => release.ProductionVersions);
var allProductionsForChosenRelease = combined.ToList();
This follows the simpler of the two examples. Without the include it produces the perfectly respectable sql:
SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Name] AS [Name]
FROM [dbo].[Releases] AS [Extent1]
INNER JOIN [dbo].[ProductionVersions] AS [Extent2] ON [Extent1].[Id] = [Extent2].[ReleaseID]
WHERE ('Hello World' = [Extent1].[Name]) AND (1 = [Extent2].[Status])
But with, OMG:
SELECT
[Project1].[Id1] AS [Id],
[Project1].[Id] AS [Id1],
[Project1].[Name] AS [Name],
[Project1].[C1] AS [C1],
[Project1].[Id2] AS [Id2],
[Project1].[Status] AS [Status],
[Project1].[ReleaseID] AS [ReleaseID]
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Name] AS [Name],
[Extent2].[Id] AS [Id1],
[Extent3].[Id] AS [Id2],
[Extent3].[Status] AS [Status],
[Extent3].[ReleaseID] AS [ReleaseID],
CASE WHEN ([Extent3].[Id] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C1]
FROM [dbo].[Releases] AS [Extent1]
INNER JOIN [dbo].[ProductionVersions] AS [Extent2] ON [Extent1].[Id] = [Extent2].[ReleaseID]
LEFT OUTER JOIN [dbo].[ProductionVersions] AS [Extent3] ON [Extent1].[Id] = [Extent3].[ReleaseID]
WHERE ('Hello World' = [Extent1].[Name]) AND (1 = [Extent2].[Status])
) AS [Project1]
ORDER BY [Project1].[Id1] ASC, [Project1].[Id] ASC, [Project1].[C1] ASC
Total garbage. The key point to note here is the fact that it returns the outer joined version of the table which has not been limited by status=1.
This results in the WRONG data being returned:
Id Id1 Name C1 Id2 Status ReleaseID
2 1 Hello World 1 1 2 1
2 1 Hello World 1 2 1 1
Note that the status of 2 is being returned there, despite our restriction. It simply does not work.
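The effect can be reproduced outside EF by running the same join shape by hand. This is a sketch in Python's sqlite3 module, not the EF-generated statement itself: the projection is trimmed to two columns, and the two ProductionVersions rows are an assumption inferred from the result table shown above.

```python
import sqlite3

# Sample rows inferred from the result table in the post (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Releases (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE ProductionVersions (
        Id INTEGER PRIMARY KEY, Status INTEGER, ReleaseID INTEGER
    );
    INSERT INTO Releases VALUES (1, 'Hello World');
    INSERT INTO ProductionVersions VALUES (1, 2, 1), (2, 1, 1);
""")

# e2 carries the Status = 1 restriction; e3 is the unrestricted copy that
# the Include adds, and e3 is what ends up populating the collection.
rows = conn.execute("""
    SELECT e3.Id, e3.Status
    FROM Releases AS e1
    INNER JOIN ProductionVersions AS e2 ON e1.Id = e2.ReleaseID
    LEFT OUTER JOIN ProductionVersions AS e3 ON e1.Id = e3.ReleaseID
    WHERE e1.Name = 'Hello World' AND e2.Status = 1
    ORDER BY e3.Id
""").fetchall()

statuses = [status for _, status in rows]
print(rows)
```

The filter only decides which releases qualify. Once a release qualifies, the second join brings back every one of its production versions, including the one with status 2.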
If I have gone wrong somewhere, I would be delighted to find out, as this is making a mockery of Linq. I love the idea, but the execution doesn't seem to be usable at the moment.
Out of curiosity, I tried the LinqToSQL dbml rather than the LinqToEntities edmx that produced the mess above:
SELECT [t0].[Id], [t0].[Name], [t2].[Id] AS [Id2], [t2].[Status], [t2].[ReleaseID], (
SELECT COUNT(*)
FROM [dbo].[ProductionVersions] AS [t3]
WHERE [t3].[ReleaseID] = [t0].[Id]
) AS [value]
FROM [dbo].[Releases] AS [t0]
INNER JOIN [dbo].[ProductionVersions] AS [t1] ON [t0].[Id] = [t1].[ReleaseID]
LEFT OUTER JOIN [dbo].[ProductionVersions] AS [t2] ON [t2].[ReleaseID] = [t0].[Id]
WHERE ([t0].[Name] = @p0) AND ([t1].[Status] = @p1)
ORDER BY [t0].[Id], [t1].[Id], [t2].[Id]
Slightly more compact - weird count clause, but overall same total FAIL.
Has anybody actually ever used this stuff in a real business application? I'm really starting to wonder...
Please tell me I've missed something obvious, as I really want to like Linq!
Try this more verbose way to obtain the same results, at the cost of more data calls:
var mydata = from e in dc.Entities
join i in dc.Items
on e.ID equals i.Member.ID
where (i.Collection.ID == collectionID)
select e;
foreach (Entity ent in mydata) {
if(!ent.Properties.IsLoaded) { ent.Properties.Load(); }
}
Do you still get the same (unexpected) result?
EDIT: Changed the first sentence, as it was incorrect. Thanks for the pointer comment!

Linq To Entity Framework selecting whole tables

I have the following Linq statement:
(from order in Orders.AsEnumerable()
join component in Components.AsEnumerable()
on order.ORDER_ID equals component.ORDER_ID
join detail in Detailss.AsEnumerable()
on component.RESULT_ID equals detail.RESULT_ID
where orderRestrict.ORDER_MNEMONIC == "MyOrderText"
select new
{
Mnemonic = detail.TEST_MNEMONIC,
OrderID = component.ORDER_ID,
SeqNumber = component.SEQ_NUM
}).ToList()
I expect this to put out the following query:
select *
from Orders ord (NoLock)
join Component comp (NoLock)
on ord .ORDER_ID = comp.ORDER_ID
join Details detail (NoLock)
on comp.RESULT_TEST_NUM = detail .RESULT_TEST_NUM
where res.ORDER_MNEMONIC = 'MyOrderText'
but instead I get 3 separate queries that select all rows from the tables. I am guessing that LINQ is then filtering the values, because I do get the correct values in the end.
The problem is that it takes far too long because it is pulling down all the rows from all three tables.
Any ideas how I can fix that?
Remove the .AsEnumerable() calls from the query, as they prevent the whole query from being evaluated on the server.
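The cost difference between filtering on the client and filtering on the server can be illustrated without EF. The sketch below is only an analogy in Python's sqlite3 module: fetching every row and filtering in a list comprehension stands in for the AsEnumerable() version, and a WHERE clause stands in for the query that stays on the server. The Orders table is reduced to two columns and the rows are invented.

```python
import sqlite3

# Hypothetical two-column Orders table with invented rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (ORDER_ID INTEGER PRIMARY KEY, ORDER_MNEMONIC TEXT);
    INSERT INTO Orders VALUES
        (1, 'MyOrderText'), (2, 'SomethingElse'), (3, 'MyOrderText');
""")

# Analogue of AsEnumerable(): every row crosses the wire, then the filter
# runs in application memory.
all_rows = conn.execute(
    "SELECT ORDER_ID, ORDER_MNEMONIC FROM Orders ORDER BY ORDER_ID"
).fetchall()
client_side = [row for row in all_rows if row[1] == "MyOrderText"]

# Analogue of leaving the query composable: the database applies the filter
# and only matching rows are transferred.
server_side = conn.execute(
    "SELECT ORDER_ID, ORDER_MNEMONIC FROM Orders "
    "WHERE ORDER_MNEMONIC = ? ORDER BY ORDER_ID",
    ("MyOrderText",),
).fetchall()

print(len(all_rows), len(server_side))
```

Both approaches give the same answer, but the first one transfers the whole table to get it, which matches the slow behaviour described in the question.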
