LINQ - Subquery or LEFT OUTER JOIN? - linq

I am trying to optimize my LINQ query performance, and I've noticed a lot of LEFT OUTER JOINs being generated. I know that in SQL, there are some cases in which a single row subquery works better than the equivalent LEFT OUTER JOIN.
For example:
Query 1:
select f.FacilityName, p.Id PatientId, u.DOB, (select u2.NameComputed from adm.Staffs s inner join dbo.AspNetUsers u2 on s.UserId = u2.Id where s.Id = p.StaffId) AssignedTo
from ptn.Patients p
inner join dbo.AspNetUsers u on p.UserId = u.Id
inner join hco.Hcos f on p.HcoId = f.Id
where p.IsRemoved = 0
Query 2:
select f.FacilityName, p.Id PatientId, u.DOB, u2.NameComputed
from ptn.Patients p
inner join dbo.AspNetUsers u on p.UserId = u.Id
inner join hco.Hcos f on p.HcoId = f.Id
left outer join adm.Staffs s on p.StaffId = s.Id
left outer join dbo.AspNetUsers u2 on s.UserId = u2.Id
where p.IsRemoved = 0
Query 1 takes less than a second. Query 2 takes about 45 seconds. If I wanted to make sure LINQ is structured in such a way as to take advantage of this in some cases, how would I go about doing that? That is, is there a way to write a LINQ statement that produces a subquery instead of a LEFT OUTER JOIN?
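As a small sanity check on the premise, the sketch below confirms that the scalar-subquery form and the double LEFT OUTER JOIN form return the same rows when each patient maps to at most one staff user. It uses SQLite through Python as a stand-in engine, and the tables (users, staffs, patients) and their data are simplified, hypothetical versions of the ones in the question. It says nothing about timing or about what SQL a LINQ provider will generate.

```python
import sqlite3

# Hypothetical, simplified schema modeled loosely on the question's tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE staffs (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE patients (id INTEGER PRIMARY KEY, staff_id INTEGER);
    INSERT INTO users VALUES (1, 'Dr. Adams'), (2, 'Dr. Baker');
    INSERT INTO staffs VALUES (10, 1), (11, 2);
    INSERT INTO patients VALUES (100, 10), (101, 11), (102, NULL);
""")

# Form 1: correlated single-row subquery in the select list.
subquery_rows = conn.execute("""
    SELECT p.id,
           (SELECT u.name
              FROM staffs s
              JOIN users u ON s.user_id = u.id
             WHERE s.id = p.staff_id) AS assigned_to
      FROM patients p
     ORDER BY p.id
""").fetchall()

# Form 2: the same lookup expressed as two LEFT OUTER JOINs.
join_rows = conn.execute("""
    SELECT p.id, u.name AS assigned_to
      FROM patients p
      LEFT OUTER JOIN staffs s ON p.staff_id = s.id
      LEFT OUTER JOIN users u ON s.user_id = u.id
     ORDER BY p.id
""").fetchall()

print(subquery_rows)
print(join_rows)
```

The two forms stop being interchangeable if the lookup can match more than one row: the join then duplicates patient rows, while many engines reject a scalar subquery that returns several rows.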

Related

What is the purpose of (+) operator in a where clause, other than outer joins, in Oracle SQL?

I have some very old Oracle SQL code that I need to review (see below), and I am trying to understand what the (+) operator is doing in the where clause after its first use:
select *
from table_a a,
table_b b
where
a.id = b.id (+)
and b.seq_nb (+) = 1
and b.type_cd (+) = 'DOLLR'
I thought (+) was an outer join equivalent, so
from table_a a,
table_b b
where
a.id = b.id (+)
would be the same as
from table_a a left outer join table_b b on a.id=b.id
So how can you have outer joins to hard-coded values, as below?
b.seq_nb (+) = 1
and b.type_cd (+) = 'DOLLR'
Any help would be greatly appreciated, thank you!
It's the same as:
select *
from table_a a
left outer join table_b b
on a.id = b.id
and b.type_cd = 'DOLLR'
and b.seq_nb = 1
Sometimes also referred to as a "filtered outer join".
It is equivalent to an outer join with a derived table:
select *
from table_a a
left outer join (
select *
from table_b b
where b.type_cd = 'DOLLR'
and b.seq_nb = 1
) b on a.id = b.id
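To see why the constant predicates need to sit in the ON clause (which is what the (+) marker on them expresses), the following sketch compares a filtered outer join with the same filter placed in WHERE. It runs on SQLite via Python rather than Oracle, and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER);
    CREATE TABLE table_b (id INTEGER, seq_nb INTEGER, type_cd TEXT);
    INSERT INTO table_a VALUES (1), (2), (3);
    -- id 1 satisfies both filters, id 2 fails seq_nb = 1, id 3 has no row in table_b
    INSERT INTO table_b VALUES (1, 1, 'DOLLR'), (2, 2, 'DOLLR');
""")

# Filters inside ON: every row of table_a survives, unmatched rows get NULLs.
filtered_in_on = conn.execute("""
    SELECT a.id, b.seq_nb
      FROM table_a a
      LEFT OUTER JOIN table_b b
        ON a.id = b.id AND b.seq_nb = 1 AND b.type_cd = 'DOLLR'
     ORDER BY a.id
""").fetchall()

# Filters in WHERE: NULL-extended rows fail the test, so the join acts like an inner join.
filtered_in_where = conn.execute("""
    SELECT a.id, b.seq_nb
      FROM table_a a
      LEFT OUTER JOIN table_b b ON a.id = b.id
     WHERE b.seq_nb = 1 AND b.type_cd = 'DOLLR'
     ORDER BY a.id
""").fetchall()

print(filtered_in_on)
print(filtered_in_where)
```

In the old Oracle syntax, dropping the (+) from the constant comparisons corresponds to the second query: rows of table_a without a qualifying match disappear.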

Hive - how to reuse a sub-query in hive with optimal performance

What is the best way to structure/write a query in Hive when I have a complex sub-query that is repeated multiple times throughout the select statement?
I originally created a temporary table for the sub-query which was refreshed before each run. Then I began to use a CTE as part of the original query (discarding the temp table) for readability and noticed degraded performance. This made me curious about which implementation methods are best with respect to performance when needing to reuse sub-queries.
The data I am working with contains upwards of 10 million records. Below is an example of the query I wrote that made use of a CTE.
with temp as (
select
a.id,
x.type,
y.response
from sandbox.tbl_form a
left outer join sandbox.tbl_formStatus b
on a.id = b.id
left outer join sandbox.tbl_formResponse y
on b.id = y.id
left outer join sandbox.tbl_formType x
on y.id = x.typeId
where b.status = 'Completed'
)
select
a.id,
q.response as user,
r.response as system,
s.response as agent,
t.response as owner
from sandbox.tbl_form a
left outer join (
select * from temp x
where x.type= 'User'
) q
on a.id = q.id
left outer join (
select * from temp x
where x.type= 'System'
) r
on a.id = r.id
left outer join (
select * from temp x
where x.type= 'Agent'
) s
on a.id = s.id
left outer join (
select * from temp x
where x.type= 'Owner'
) t
on a.id = t.id;
There are issues in your query.
1) In the CTE you have three left joins without an ON clause. This may cause serious performance problems, because joins without an ON clause are CROSS JOINs.
2) By the way, the where b.status = 'Completed' clause converts the LEFT join with table b into an inner join, although without an ON clause it still multiplies all records from a by all records from b that pass the where filter.
3) Most probably you do not need the CTE at all. Just join correctly with an ON clause, use case when type='User' then response end, and aggregate with min() or max() grouped by id:
select a.id,
max(case when x.type='User' then y.response end) as user,
max(case when x.type='System' then y.response end) as system,
...
from sandbox.tbl_form a
left outer join sandbox.tbl_formStatus b
on a.id = b.id
left outer join sandbox.tbl_formResponse y
on b.id = y.id
left outer join sandbox.tbl_formType x
on y.id = x.typeId
where b.status = 'Completed' --if you want LEFT JOIN add --or b.status is null
group by a.id
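The conditional-aggregation idea can be tried in isolation. The sketch below pivots a small response table in a single pass instead of joining the same subquery once per type. It uses SQLite through Python rather than Hive, and the responses table with its rows is a made-up stand-in for the joined form data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical flattened result of the joins: one row per (form id, response type).
    CREATE TABLE responses (id INTEGER, type TEXT, response TEXT);
    INSERT INTO responses VALUES
        (1, 'User', 'alice'),
        (1, 'System', 'sys-1'),
        (2, 'User', 'bob');
""")

# One scan, one GROUP BY: each CASE picks out a single type, MAX collapses it per id.
pivoted = conn.execute("""
    SELECT id,
           MAX(CASE WHEN type = 'User'   THEN response END) AS user_response,
           MAX(CASE WHEN type = 'System' THEN response END) AS system_response
      FROM responses
     GROUP BY id
     ORDER BY id
""").fetchall()

print(pivoted)
```

A type with no row for a given id comes back as NULL, which matches what the repeated left outer joins in the question would produce.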

Doing a Left outer join in a linq query

I have the query below, which works in SQLite with a left outer join:
select * from customer cu
inner join contract cnt on cu.customerId = cnt.customerid
inner join address addy on cu.addressid = addy.addressId
inner join csrAssoc cassc on cu.customerid = cassc.customerId
left outer join CustomerServiceRepresentative csrr on cassc.csrid = csrr.customerservicerepresentativeId
inner join customerServiceManager csmm on cassc.csmid = csmm.customerservicemanagerId
where cu.customernumber = '22222234'
I want to be able to apply a left outer join on this line in the LINQ query below:
join csrr in objCsrCustServRep.AsEnumerable() on cassc.CsrId equals
csrr.CustomerServiceRepresentativeId
VisitRepData = (from cu in objCustomer.AsEnumerable()
join cnt in objContract.AsEnumerable() on cu.customerId equals cnt.customerId
join addy in objAddress.AsEnumerable() on cu.addressId equals addy.addressId
join cassc in objCsrAssoc.AsEnumerable() on cu.customerId equals cassc.CustomerId
join csrr in objCsrCustServRep.AsEnumerable() on cassc.CsrId equals
csrr.CustomerServiceRepresentativeId
join csmm in objCustServMan on cassc.CsmId.ToString() equals csmm.customerServiceManagerId
where cu.CustomerNumber == (customernbr)
How can I do a left outer join in a linq query?
Here is my comment after adjusting and running the code. The other section is also added. All I am getting is "object is not set to an instance of an object".
var VisitRepData = from cu in objCustomer.AsEnumerable()
join cnt in objContract.AsEnumerable() on cu.customerId equals cnt.customerId
join addy in objAddress.AsEnumerable() on cu.addressId equals addy.addressId
join cassc in objCsrAssoc.AsEnumerable() on cu.customerId equals cassc.CustomerId
join csrr in objCsrCustServRep.AsEnumerable() on cassc.CsrId equals
csrr.CustomerServiceRepresentativeId into temp
from tempItem in temp.DefaultIfEmpty()
join csmm in objCustServMan on cassc.CsmId.ToString() equals csmm.customerServiceManagerId
where cu.CustomerNumber == (customernbr)
select new
{
cu.customerId,
cu.CustomerNumber,
cu.customerName,
cu.dateActive,
cnt.contractExpirationDate,
addy.street,
addy.street2,
addy.city,
addy.state,
addy.zipcode,
cu.EMail,
cu.phoneNo,
cu.faxNumber,
csmm.customerServiceManagerName,
tempItem.CustomerServiceRepresentativeName,
};
foreach (var item in VisitRepData)
{
var one = item.customerId;
var two = item.CustomerNumber;
}
Per documentation:
A left outer join is a join in which each element of the first collection is returned, regardless of whether it has any correlated elements in the second collection. You can use LINQ to perform a left outer join by calling the DefaultIfEmpty method on the results of a group join.
(emphasis mine)
Based on the MSDN link above, and if I understood your requirements correctly, the query should look like this:
VisitRepData = from cu in objCustomer.AsEnumerable()
join cnt in objContract.AsEnumerable() on cu.customerId equals cnt.customerId
join addy in objAddress.AsEnumerable() on cu.addressId equals addy.addressId
join cassc in objCsrAssoc.AsEnumerable() on cu.customerId equals cassc.CustomerId
join csrr in objCsrCustServRep.AsEnumerable() on cassc.CsrId equals
csrr.CustomerServiceRepresentativeId into temp
from tempItem in temp.DefaultIfEmpty()
join csmm in objCustServMan on cassc.CsmId.ToString() equals csmm.customerServiceManagerId
where cu.CustomerNumber == (customernbr)
Specifically, the left outer join is performed with this code:
join csrr in objCsrCustServRep.AsEnumerable()
on cassc.CsrId equals csrr.CustomerServiceRepresentativeId
into temp
from tempItem in temp.DefaultIfEmpty()
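One plausible cause of the "not set to an instance of an object" error mentioned in the question: for a reference type, DefaultIfEmpty() yields null when the group is empty, so reading tempItem.CustomerServiceRepresentativeName in the projection fails for customers without a representative. In C# the usual guard is a null check on tempItem (or tempItem?.CustomerServiceRepresentativeName in C# 6 and later). The sketch below mimics the group-join-then-default pattern in Python rather than LINQ, purely to show the shape of the guard; the helper left_outer_join and the sample records are hypothetical.

```python
from collections import defaultdict

def left_outer_join(left, right, left_key, right_key):
    """Yield (left_item, right_item) pairs; right_item is None when nothing matches."""
    groups = defaultdict(list)
    for r in right:
        groups[right_key(r)].append(r)
    for l in left:
        # Analogue of "into temp ... temp.DefaultIfEmpty()": an empty group becomes [None].
        matches = groups.get(left_key(l)) or [None]
        for m in matches:
            yield l, m

# Hypothetical data: customer B points at a representative id that does not exist.
assocs = [{"customer": "A", "csr_id": 1}, {"customer": "B", "csr_id": 99}]
reps = [{"id": 1, "name": "Rita"}]

rows = [
    # Guard the right-hand side before reading from it.
    (assoc["customer"], rep["name"] if rep is not None else None)
    for assoc, rep in left_outer_join(
        assocs, reps, lambda a: a["csr_id"], lambda r: r["id"]
    )
]

print(rows)
```

Without the `rep is not None` guard, the second row would raise an error in Python for the same reason the LINQ projection throws.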

LEFT OUTER JOIN WHEN IT HAS MULTIPLE TABLE SELECT QUERY

Currently I have joined two tables using an inner join, like the following:
SELECT A.*,B.*
FROM A,B
WHERE A.COLUMN_A = B.COLUMN_B
Now I want to add a left outer join to the above result; let's say I want to join table C.
So I did the following:
SELECT A.*,B.*
FROM A,B
LEFT OUTER JOIN C ON B.COLUMN_X = C.COLUMN_X
WHERE A.COLUMN_A = B.COLUMN_B
This executes without errors in SQL Navigator, but I cannot see any output in the result.
Is anything wrong with this query? Please advise.
Change it to use proper join syntax, like:
SELECT A.*,B.*
FROM A
INNER JOIN B ON A.COLUMN_A = B.COLUMN_B
LEFT OUTER JOIN C ON B.COLUMN_X = C.COLUMN_X;
Better still, change them all to outer joins:
SELECT A.*,B.*
FROM A
LEFT JOIN B ON A.COLUMN_A = B.COLUMN_B
LEFT OUTER JOIN C ON B.COLUMN_X = C.COLUMN_X;
Use this
SELECT A.*,B.*,C.*
FROM A
INNER JOIN B
ON A.COLUMN_A = B.COLUMN_B
LEFT OUTER JOIN C
ON B.COLUMN_X = C.COLUMN_X
If you absolutely have to use legacy syntax, then use this. But I won't recommend it.
SELECT A.*,B.*,C.*
FROM A,B,C
where A.COLUMN_A = B.COLUMN_B
AND
B.COLUMN_X = C.COLUMN_X (+)
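The explicit INNER JOIN plus LEFT OUTER JOIN form suggested above can be checked on a toy data set. The sketch below uses SQLite via Python (not the asker's Oracle environment), and the column LABEL and all rows are invented for illustration. It shows that a row of B with no partner in C is still returned, with NULL in the C column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (COLUMN_A INTEGER);
    CREATE TABLE B (COLUMN_B INTEGER, COLUMN_X TEXT);
    CREATE TABLE C (COLUMN_X TEXT, LABEL TEXT);  -- LABEL is a hypothetical column
    INSERT INTO A VALUES (1), (2);
    INSERT INTO B VALUES (1, 'x'), (2, 'y');
    INSERT INTO C VALUES ('x', 'found');         -- no row for 'y'
""")

rows = conn.execute("""
    SELECT A.COLUMN_A, B.COLUMN_X, C.LABEL
      FROM A
      INNER JOIN B ON A.COLUMN_A = B.COLUMN_B
      LEFT OUTER JOIN C ON B.COLUMN_X = C.COLUMN_X
     ORDER BY A.COLUMN_A
""").fetchall()

print(rows)
```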

How to write multiple outer left join in linq?

I am trying to outer join multiple tables. Below is the code showing what I am trying to do.
from o in entities.O
join p in entities.P on o.PID equals p.ID
join dC in entities.C on o.DCID equals dC.ID
join hC in entities.C on o.HCID equals hC.ID
join e in entities.E on pat.AN equals e.UID
join oT in entities.OT on o.OT equals oT.ID
join plt in entities.PLT on p.LT equals plt.ID
join puO in entities.PUO on o.PUTId equals puO.ID into pots
from x in pots.DefaultIfEmpty()
join op in entities.OP on o.OPID equals op.ID
join prt in entities.PatientRelationshipTypes on pat.PRTId equals prt.ID
join secPRT in entities.PRT on pat.SRTId equals secPRT.ID
join patDiagCode in entities.PDiagnosisCodes on pat.ID equals patDiagCode.PID
//from y in pdc.DefaultIfEmpty()
join diagCode in entities.DCodes on patDiagCode.CodeID equals diagCode.ID into dc
from z in dc.DefaultIfEmpty()
where o.ID == orderId
//from x in pots.DefaultIfEmpty()
//from y in pdc.DefaultIfEmpty()
...
...
How do I write the correct syntax for multiple outer joins in LINQ?
Thanks in advance
