Using LAG to Find Previous Value in Oracle - oracle

I'm trying to use the LAG function in Oracle to find the previous registration value for donors.
To see my data, I started with this query to find all registrations for the particular donor:
select registration_id, registration_date from registration r where r.person_id=52503290 order by r.registration_date desc;
Then I used the LAG function to return the previous value along with the most recent value:
select registration_id as reg_id, registration_date as reg_date,
lag(registration_date,1) over (order by registration_date) as prev_reg_date
from registration
where person_id=52503290
order by registration_date desc;
And the results are as expected:
So I thought I should be good to place the LAG function within the main query to get the previous value but for some reason, the previous value returns NULL or no value at all.
SELECT
P.Person_Id AS Person_ID,
R.Registration_Date AS Drive_Date,
LAG(R.Registration_Date,1) OVER (ORDER BY R.REGISTRATION_DATE) AS Previous_Drive_Date,
P.Abo AS Blood_Type,
DT.Description AS Donation_Type
FROM
Person P
JOIN Registration R ON P.Person_Id = R.Person_Id AND P.First_Name <> 'Pooled' AND P.First_Name <> 'IMPORT'
LEFT OUTER JOIN Drives DR ON R.Drive_Id = DR.Drive_Id AND DR.Group_Id <> 24999
LEFT OUTER JOIN Branches B ON R.Branch_Id = B.Branch_Id
LEFT OUTER JOIN Donor_Group DG on DR.Group_Id = DG.Group_Id
LEFT OUTER JOIN Donation_Type DT ON R.Donation_Type_Id = DT.DONATION_TYPE_ID
WHERE
TRUNC(R.Registration_Date) = TRUNC(SYSDATE)-1
AND R.Person_Id=52503290
ORDER BY
R.Registration_Date DESC;
Here is the result set:
Any suggestions on what I am missing here? Or why this query isn't returning the values expected?
Based on #Alex Poole's suggestions, I changed the query to look like:
SELECT * FROM (
SELECT
P.Person_Id AS Person_ID,
R.Registration_Date AS Drive_Date,
LAG(R.Registration_Date,1) OVER (partition by p.person_id ORDER BY r.registration_date) AS Previous_Drive_Date,
P.Abo AS Blood_Type,
DT.Description AS Donation_Type
FROM
Person P
JOIN Registration R ON P.Person_Id = R.Person_Id AND P.First_Name <> 'Pooled' AND P.First_Name <> 'IMPORT'
LEFT OUTER JOIN Drives DR ON R.Drive_Id = DR.Drive_Id AND DR.Group_Id <> 24999
LEFT OUTER JOIN Branches B ON R.Branch_Id = B.Branch_Id
LEFT OUTER JOIN Donor_Group DG on DR.Group_Id = DG.Group_Id
LEFT OUTER JOIN Donation_Type DT ON R.Donation_Type_Id = DT.DONATION_TYPE_ID
--WHERE R.Person_Id=52503290
)
WHERE TRUNC(Drive_Date) = TRUNC(SYSDATE)-1
ORDER BY Drive_Date DESC;
It takes about 85 seconds to pull back the first 30 rows:
My original query (before the LAG function was added) took about 2 seconds to pull back approximately 2100 records. But it was nothing but a SELECT with a couple of JOINS and one item in the WHERE clause.
Looking at the record counts, Person has almost 5.5 million records and Registration has 9.1 million records.

The lag is only applied within the rows that match the where clause filter, so you would only see the previous value if that was also yesterday.
You can apply the lag in a subquery, and then filter in an outer query:
SELECT * FROM (
SELECT
P.Person_Id AS Person_ID,
R.Registration_Date AS Drive_Date,
LAG(R.Registration_Date,1) OVER (ORDER BY R.REGISTRATION_DATE) AS Previous_Drive_Date,
P.Abo AS Blood_Type,
DT.Description AS Donation_Type
FROM
Person P
JOIN Registration R ON P.Person_Id = R.Person_Id AND P.First_Name <> 'Pooled' AND P.First_Name <> 'IMPORT'
LEFT OUTER JOIN Drives DR ON R.Drive_Id = DR.Drive_Id AND DR.Group_Id <> 24999
LEFT OUTER JOIN Branches B ON R.Branch_Id = B.Branch_Id
LEFT OUTER JOIN Donor_Group DG on DR.Group_Id = DG.Group_Id
LEFT OUTER JOIN Donation_Type DT ON R.Donation_Type_Id = DT.DONATION_TYPE_ID
WHERE R.Person_Id=52503290
)
WHERE TRUNC(Drive_Date) = TRUNC(SYSDATE)-1
ORDER BY Drive_Date DESC;

Related

Hive - how to reuse a sub-query in hive with optimal performance

What is the best way to structure/write a query in Hive when I have a complex sub-query that is repeated multiple times throughout the select statement?
I originally created a temporary table for the sub-query which was refreshed before each run. Then I began to use a CTE as part of the original query (discarding the temp table) for readability and noticed degraded performance. This made me curious about which implementation methods are best with respect to performance when needing to reuse sub-queries.
The data I am working with contains upwards of 10 million records. Below is an example of the query I wrote that made use of a CTE.
with temp as (
select
a.id,
x.type,
y.response
from sandbox.tbl_form a
left outer join sandbox.tbl_formStatus b
on a.id = b.id
left outer join sandbox.tbl_formResponse y
on b.id = y.id
left outer join sandbox.tbl_formType x
on y.id = x.typeId
where b.status = 'Completed'
)
select
a.id,
q.response as user,
r.response as system,
s.response as agent,
t.response as owner
from sandbox.tbl_form a
left outer join (
select * from temp x
where x.type= 'User'
) q
on a.id = q.id
left outer join (
select * from temp x
where x.type= 'System'
) r
on a.id = r.id
left outer join (
select * from temp x
where x.type= 'Agent'
) s
on a.id = s.id
left outer join (
select * from temp x
where x.type= 'Owner'
) t
on a.id = t.id;
There are issues in your query.
1) In the CTE you have three left joins without ON clause. This may cause serious performance problems because joins without ON clause are CROSS JOINS.
2) BTW where b.status = 'Completed' clause converts LEFT join with table b to the inner join though still without ON clause it multiplicates all records from a by all records from b with a where.
3) Most probably you do not need CTE at all. Just join correctly with ON clause and use case when type='User' then response end + aggregate using min() or max() by id:
select a.id
max(case when x.type='User' then y.response end) as user,
max(case when x.type='System' then y.response end) as system,
...
from sandbox.tbl_form a
left outer join sandbox.tbl_formStatus b
on a.id = b.id
left outer join sandbox.tbl_formResponse y
on b.id = y.id
left outer join sandbox.tbl_formType x
on y.id = x.typeId
where b.status = 'Completed' --if you want LEFT JOIN add --or b.status is null
group by a.id

How can we use partition in Oracle query

I have query in which i used partition of avoid the duplicate value for particular column , but still it is giving duplicate row below i am mention my query in which i used partition
SELECT iol.M_product_id as faultyProduct , iol.SERIALNO,iol.M_product_id as newproduct, ma.Description,
mp.M_Product_category_id ,mi.issotrx, co.C_BPartner_ID,
ROW_NUMBER() OVER(PARTITION BY ma.Description ORDER BY iol.M_product_id DESC) rn
FROM M_inoutline iol
inner join M_inout mi ON (iol.m_inout_id = mi.m_inout_id)
inner join C_Order co ON (co.c_order_id = mi.c_order_id )
inner Join M_AttributeSetInstance ma ON (ma.m_attributesetinstance_id =iol.m_attributesetinstance_id)
inner join M_Product mp ON (mp.m_product_id = iol.m_product_id)
where mp.m_product_category_id= 1000447 AND mi.issotrx = 'Y';
Please help me out
For me it looks you want to do:
select * from (/*YOUR QUERY*/) where rn = 1;

How to left join with conditions in Toad Data Point Query Builder?

I'm trying to build a query in Toad Data Point. I have a subquery that has a row number to identify the records I'm interested in. This subquery needs to be left joined onto the main table only when the row number is 1. Here's the query I'm trying to visualize:
SELECT distinct E.EMPLID, E.ACAD_CAREER
FROM PS_STDNT_ENRL E
LEFT JOIN (
SELECT ACAD_CAREER, ROW_NUMBER() OVER (PARTITION BY ACAD_CAREER ORDER BY EFFDT DESC) as RN
FROM PS_ACAD_CAR_TBL
) T on T.ACAD_CAREER = E.ACAD_CAREER and RN = 1
When I try to replicate this, the row number condition is placed in the global WHERE clause. This is not the intended functionality because it removes any records that don't have a match in the subquery effectively making it an inner join.
Here is the query it's generating:
SELECT DISTINCT E.EMPLID, E.ACAD_CAREER, T.RN
FROM SYSADM.PS_STDNT_ENRL E
LEFT OUTER JOIN
(SELECT PS_ACAD_CAR_TBL.ACAD_CAREER,
ROW_NUMBER ()
OVER (PARTITION BY ACAD_CAREER ORDER BY EFFDT DESC)
AS RN
FROM SYSADM.PS_ACAD_CAR_TBL PS_ACAD_CAR_TBL) T
ON (E.ACAD_CAREER = T.ACAD_CAREER)
WHERE (T.RN = 1)
Is there a way to get the query builder to place that row number condition on the left join instead of the global WHERE clause?
I found a way to get this to work.
Add a calculated field to the main table with a value of 1.
Join the row number to this new calculated field.
Now the query has the filter in the join condition instead of the WHERE clause so that it joins as intended. Here is the query it made:
SELECT DISTINCT E.EMPLID, E.ACAD_CAREER, T.RN
FROM SYSADM.PS_STDNT_ENRL E
LEFT OUTER JOIN
(SELECT PS_ACAD_CAR_TBL.ACAD_CAREER,
ROW_NUMBER ()
OVER (PARTITION BY ACAD_CAREER ORDER BY EFFDT DESC)
AS RN
FROM SYSADM.PS_ACAD_CAR_TBL PS_ACAD_CAR_TBL) T
ON (E.ACAD_CAREER = T.ACAD_CAREER) AND (1 = T.RN)

Case based Joins in oracle query

I have below query which returns all the data from inner query including one more field i.e. ctry from tbl_account table.I have used
LEFT JOIN so that even though a_row_id not matched with row_id from tbl_account so that i would still get all the matched data from inner query.
select b.* ,sac.ctry as country from
(SELECT DISTINCT rec.* ,asset.a_row_id
FROM tbl_record rec LEFT JOIN tbl_asset asset
ON asset.from_sn = rec.to_sn
AND asset.from_name = rec.to_productname
AND asset.from_rel = rec.to_rel
LEFT JOIN tbl_country mas
ON rec.loc = mas.a_loc
WHERE rec.cust_id = 2456 ) b
LEFT JOIN tbl_account sac on b.a_row_id = sac.row_id ;
But,now i need to implement case based join in the above query i.e. when a_row_id is not null then inner join with one extra condition will be used and when a_row_id is null left join will be used.
I have tried using CASE statement as below but it is not working and cost of the query is very high as well and I suppose it is because of CASE statement.The
data in all the tables are in millions.
select b.* ,sac.ctry as country from
(SELECT DISTINCT rec.* ,asset.a_row_id
FROM tbl_record rec LEFT JOIN tbl_asset asset
ON asset.from_sn = rec.to_sn
AND asset.from_name = rec.to_productname
AND asset.from_rel = rec.to_rel
LEFT JOIN tbl_country mas
ON rec.loc = mas.a_loc
WHERE rec.cust_id = 2456 ) b
INNER JOIN tbl_account sac
ON CASE
WHEN b.a_row_id IS NOT NULL AND b.a_row_id = sac.row_id and b.from_cn = SAC.to_cn THEN 1
END = 1
LEFT JOIN tbl_account sac
ON CASE
WHEN b.a_row_id IS NULL AND b.a_row_id = sac.row_id THEN 1
END = 1 ;
Is there any other way that i can implement case based joins condition in above oracle query and at the same time cost of query would be less.Is it possible to use decode in this case ? Any help on this would be greatly appreciated.
This syntax is probably what you are looking for:
select b.*, sac.ctry as country
from b
left join tbl_account sac
on b.a_row_id = sac.row_id or (b.a_row_id is null and sac.row_id is null)
where b.from_cn = sac.to_cn or b.a_row_id is null
SQLFiddle

Linq to entities query adding inner join instead of left join

I'd like to know why INNER JOINs are generated instead of LEFT and why the whole view is selected before join instead of just adding LEFT JOIN view.
I'm trying to post a table of information which is spread out over several tables. Basically I want to search by the date and return all the information for events happening today, yesterday, this month - whatever the user selects. The query is quite long. I added DefaultIfEmpty to all the tables except the main one in an attempt to get LEFT JOINs but it just made a mess.
using (TransitEntities t = new TransitEntities())
{
var charters = from c in t.tblCharters
join v in t.tblChartVehicles.DefaultIfEmpty()
on c.Veh
equals v.ChartVehID
join n in t.tblNACharters.DefaultIfEmpty()
on c.Dpt.Substring(c.Dpt.Length - 1)
equals SqlFunctions.StringConvert((double)n.NAID)
join r in t.tblChartReqs.DefaultIfEmpty()
on c.ChartReqID
equals r.ChartReqID
join f in t.tblCharterCustomers.DefaultIfEmpty()
on c.Dpt
equals (f.DptID == "NONAFF" ? SqlFunctions.StringConvert((double)f.CustID) : f.DptID)
join d in t.tblChartReqDocs.DefaultIfEmpty()
on c.Attach
equals SqlFunctions.StringConvert((double)d.DocID)
join s in t.tblChartSupAttaches.DefaultIfEmpty()
on c.SupAttach
equals SqlFunctions.StringConvert((double)s.DocID)
join p in (from e in t.v_EmpData select new {e.UIN, e.First, e.Last}).DefaultIfEmpty()
on c.TakenUIN
equals p.UIN
where c.BeginTime > EntityFunctions.AddYears(DateTime.Now,-1)
select new
{
ChartID = c.ChartID,
Status = c.Status,
...
Website = r.Website,
};
//select today's events
gvCharters.DataSource = charters.Where(row => (row.BeginTime.Value >= midnight && row.BeginTime.Value < midnight1));
This results in very convoluted SQL:
SELECT
[Extent1].[ChartID] AS [ChartID],
[Extent1].[Status] AS [Status],
...
[Join5].[Website] AS [Website],
FROM [dbo].[tblCharters] AS [Extent1]
INNER JOIN (SELECT [Extent2].[ChartVehID] AS [ChartVehID], [Extent2].[Descr] AS [Descr]
FROM ( SELECT 1 AS X ) AS [SingleRowTable1]
LEFT OUTER JOIN [dbo].[tblChartVehicles] AS [Extent2] ON 1 = 1 ) AS [Join1] ON ([Extent1].[Veh] = [Join1].[ChartVehID]) OR (([Extent1].[Veh] IS NULL) AND ([Join1].[ChartVehID] IS NULL))
INNER JOIN (SELECT [Extent3].[NAID] AS [NAID], [Extent3].[Descr] AS [Descr]
FROM ( SELECT 1 AS X ) AS [SingleRowTable2]
LEFT OUTER JOIN [dbo].[tblNACharter] AS [Extent3] ON 1 = 1 ) AS [Join3] ON ((SUBSTRING([Extent1].[Dpt], ((LEN([Extent1].[Dpt])) - 1) + 1, (LEN([Extent1].[Dpt])) - ((LEN([Extent1].[Dpt])) - 1))) = (STR( CAST( [Join3].[NAID] AS float)))) OR ((SUBSTRING([Extent1].[Dpt], ((LEN([Extent1].[Dpt])) - 1) + 1, (LEN([Extent1].[Dpt])) - ((LEN([Extent1].[Dpt])) - 1)) IS NULL) AND (STR( CAST( [Join3].[NAID] AS float)) IS NULL))
INNER JOIN (SELECT [Extent4].[ChartReqID] AS [ChartReqID], [Extent4].[Event] AS [Event], [Extent4].[ContactName] AS [ContactName], [Extent4].[ContactPhone] AS [ContactPhone], [Extent4].[Website] AS [Website]
FROM ( SELECT 1 AS X ) AS [SingleRowTable3]
LEFT OUTER JOIN [dbo].[tblChartReq] AS [Extent4] ON 1 = 1 ) AS [Join5] ON ([Extent1].[ChartReqID] = [Join5].[ChartReqID]) OR (([Extent1].[ChartReqID] IS NULL) AND ([Join5].[ChartReqID] IS NULL))
INNER JOIN (SELECT [Extent5].[CustID] AS [CustID], [Extent5].[Dpt] AS [Dpt], [Extent5].[DptID] AS [DptID]
FROM ( SELECT 1 AS X ) AS [SingleRowTable4]
LEFT OUTER JOIN [dbo].[tblCharterCustomers] AS [Extent5] ON 1 = 1 ) AS [Join7] ON ([Extent1].[Dpt] = (CASE WHEN (N'NONAFF' = [Join7].[DptID]) THEN STR( CAST( [Join7].[CustID] AS float)) ELSE [Join7].[DptID] END)) OR (([Extent1].[Dpt] IS NULL) AND (CASE WHEN (N'NONAFF' = [Join7].[DptID]) THEN STR( CAST( [Join7].[CustID] AS float)) ELSE [Join7].[DptID] END IS NULL))
INNER JOIN (SELECT [Extent6].[DocID] AS [DocID], [Extent6].[FileName] AS [FileName]
FROM ( SELECT 1 AS X ) AS [SingleRowTable5]
LEFT OUTER JOIN [dbo].[tblChartReqDocs] AS [Extent6] ON 1 = 1 ) AS [Join9] ON ([Extent1].[Attach] = (STR( CAST( [Join9].[DocID] AS float)))) OR (([Extent1].[Attach] IS NULL) AND (STR( CAST( [Join9].[DocID] AS float)) IS NULL))
INNER JOIN (SELECT [Extent7].[DocID] AS [DocID], [Extent7].[FileName] AS [FileName]
FROM ( SELECT 1 AS X ) AS [SingleRowTable6]
LEFT OUTER JOIN [dbo].[tblChartSupAttach] AS [Extent7] ON 1 = 1 ) AS [Join11] ON ([Extent1].[SupAttach] = (STR( CAST( [Join11].[DocID] AS float)))) OR (([Extent1].[SupAttach] IS NULL) AND (STR( CAST( [Join11].[DocID] AS float)) IS NULL))
INNER JOIN (SELECT [Extent8].[First] AS [First], [Extent8].[Last] AS [Last], [Extent8].[UIN] AS [UIN]
FROM ( SELECT 1 AS X ) AS [SingleRowTable7]
LEFT OUTER JOIN (SELECT
[v_EmpData].[First] AS [First],
[v_EmpData].[Last] AS [Last],
[v_EmpData].[Legal] AS [Legal],
[v_EmpData].[Name] AS [Name],
[v_EmpData].[Email] AS [Email],
[v_EmpData].[UIN] AS [UIN],
[v_EmpData].[UserNM] AS [UserNM],
[v_EmpData].[Worker] AS [Worker],
[v_EmpData].[SUPERVISORNUM] AS [SUPERVISORNUM],
[v_EmpData].[Supervisor] AS [Supervisor],
[v_EmpData].[EmpArea] AS [EmpArea],
[v_EmpData].[Title] AS [Title],
[v_EmpData].[FullName] AS [FullName],
[v_EmpData].[HireDate] AS [HireDate],
[v_EmpData].[WORKERTYPENM] AS [WORKERTYPENM],
[v_EmpData].[Birth] AS [Birth],
[v_EmpData].[HOMESTREET] AS [HOMESTREET],
[v_EmpData].[HOMECITY] AS [HOMECITY],
[v_EmpData].[HOMEZIP] AS [HOMEZIP],
[v_EmpData].[HOMESTATE] AS [HOMESTATE],
[v_EmpData].[PicID] AS [PicID],
[v_EmpData].[WorkPhone] AS [WorkPhone],
[v_EmpData].[HomePhone] AS [HomePhone],
[v_EmpData].[WorkCellPhone] AS [WorkCellPhone]
FROM [dbo].[v_EmpData] AS [v_EmpData]) AS [Extent8] ON 1 = 1 ) AS [Join13] ON ([Extent1].[TakenUIN] = [Join13].[UIN]) OR (([Extent1].[TakenUIN] IS NULL) AND ([Join13].[UIN] IS NULL))
WHERE ([Extent1].[BeginTime] > (DATEADD (year, -1, SysDateTime())))
AND ('C' <> [Extent1].[Status])
AND ([Extent1].[BeginTime] >= '11/28/2012 12:00:00 AM')
AND ([Extent1].[BeginTime] < '11/29/2012 12:00:00 AM')
This is what my original SQL query looked like and what I was hoping it would be closer to:
SELECT
ChartID,
c.Status,
...
r.Website As Website,
FROM tblChartersNew c
LEFT JOIN (SELECT [Dpt],[DptID] FROM [DRVRDiscipline].[dbo].[tblCharterCustomers] Where Valid=1 and DptID <> 'NONAFF' UNION SELECT Dpt, CONVERT(nvarchar,CustID) AS DptID FROM [DRVRDiscipline].[dbo].[tblCharterCustomers] Where Valid=1 and DptID = 'NONAFF') f
ON RTRIM(c.Dpt) = f.DptID LEFT JOIN [tskronos].WfcSuite.dbo.VP_ALLPERSONV42 p ON p.PersonNUM = c.TakenUIN
LEFT JOIN tblChartVehicles v ON v.ChartVehID = c.Veh
LEFT JOIN tblNACharter n ON CAST(n.NAID AS varchar) = RIGHT(c.Dpt, LEN(c.Dpt)-1)
LEFT JOIN tblChartReq r
ON r.ChartReqID = c.ChartReqID
WHERE CONVERT(datetime,CONVERT(char(10),c.BeginTime,101)) = (SELECT TOP 1 CONVERT(datetime,CONVERT(char(10),BeginTime,101)) from tblChartersNew WHERE CONVERT(datetime,CONVERT(char(10),BeginTime,101)) >= CONVERT(datetime,CONVERT(char(10),GETDATE(),101)) ORDER BY BeginTime)
AND NOT c.ChartReqID IS NULL
ORDER BY BeginTime, ISNULL(f.Dpt,c.Dpt)
I also add a Select New on the view to avoid selecting all of the columns when I only need three but it didn't seem to make a difference. Instead of adding LEFT JOIN v_EmpData it adds LEFT OUTER JOIN and then selects all of the columns in the view. It seems to be ignoring the Select New.
I'd really like to transition to using Linq to Entities for the majority of my queries because intellisense makes it so much easier to make sure it's right and to have variations of queries without having to have separate functions for each but maybe I need to stick with plain old SQL. I know just enough to make a big mess. Any suggestions?
For complex queries like what you need.
I would suggest looking into FunctionImport.
MSDN Function Import
This would save you the headache of creating a LINQ that would be 1:1 to your expected generated SQL.

Resources