How to consolidate overlapping dates into a single row - Oracle

How would I get one row out of two rows with overlapping dates, from the same table and for the same ID? I have more than 50,000 records.
I have the following sample data.
ID start_dt end_dt division
1212 04/01/2006 03/01/2007 second
1212 05/01/2009 01/01/2010 second
1212 04/01/2006 03/01/2008 second --- This should be selected as longest timeframe
1212 09/03/2007 03/01/2008 third
1213 05/03/2005 04/11/2009 second
1214 07/03/2007 03/01/2008 third
And this is the data I should get:
ID start_dt end_dt division
1212 04/01/2006 03/01/2008 second
1212 05/01/2009 01/01/2010 second
1213 05/03/2005 04/11/2009 second
1214 07/03/2007 03/01/2008 third
Thank you.
Ramu

Now that I understand your issue, just subtract the 2 dates to determine the time frame difference:
SELECT S.Id, S.Start_dt, S.End_dt, S.Division
FROM Sample S
JOIN (
    SELECT S.Id, MAX(S.end_dt - S.start_dt) AS timeframe
    FROM Sample S
    GROUP BY S.Id
) S2 ON S.Id = S2.Id
    AND S.end_dt - S.start_dt = S2.timeframe
Here is the Fiddle.
Good luck.
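Note that joining on the maximum timeframe returns only the longest row per ID, so the second, non-overlapping period for 1212 would not come back. As a rough sketch of the grouping the question describes (this is not the query above and not Oracle SQL), the Python below sorts each ID's rows by start date, chains together rows whose dates overlap, and keeps the longest row of each chain. The MM/DD/YYYY date format, the function name consolidate, and the rule that periods touching on the same day count as overlapping are all assumptions.

```python
from datetime import datetime
from itertools import groupby

# Sample rows from the question: (id, start_dt, end_dt, division).
ROWS = [
    (1212, "04/01/2006", "03/01/2007", "second"),
    (1212, "05/01/2009", "01/01/2010", "second"),
    (1212, "04/01/2006", "03/01/2008", "second"),
    (1212, "09/03/2007", "03/01/2008", "third"),
    (1213, "05/03/2005", "04/11/2009", "second"),
    (1214, "07/03/2007", "03/01/2008", "third"),
]


def parse(text):
    # Assumption: dates are MM/DD/YYYY.
    return datetime.strptime(text, "%m/%d/%Y").date()


def longest(cluster):
    # Row with the largest end - start difference in one overlapping chain.
    return max(cluster, key=lambda r: r[2] - r[1])


def consolidate(rows):
    """Keep the longest row out of every chain of overlapping rows per id."""
    parsed = sorted((rid, parse(s), parse(e), div) for rid, s, e, div in rows)
    result = []
    for _, group in groupby(parsed, key=lambda r: r[0]):
        cluster, cluster_end = [], None
        for row in group:
            # A row starting after the current chain ends begins a new chain.
            if cluster and row[1] > cluster_end:
                result.append(longest(cluster))
                cluster = []
            cluster.append(row)
            cluster_end = row[2] if len(cluster) == 1 else max(cluster_end, row[2])
        result.append(longest(cluster))
    return result


for row in consolidate(ROWS):
    print(row)
```

On the sample data this yields the four rows listed as the desired output. In SQL the same chaining is usually expressed with analytic functions, but that is beyond this sketch.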


Get closest date with id and value Oracle

I ran into a problem and maybe there are experienced guys here to help me figure it out:
I have a table with rows:
ID    VALUE  DATE
2827  0      20.07.2022 10:40:01
490   27432  20.07.2022 10:40:01
565   189    20.07.2022 9:51:03
200   1      20.07.2022 9:50:01
731   0.91   20.07.2022 9:43:21
161   13004  19.07.2022 16:11:01
This table has a million records covering about 1000 distinct IDs; each row records only the time a value changed and the new value.
Whenever the value of an ID changes, a row is added to this table:
ID | Time the value was changed (DATE) | VALUE
My task is to get, for every ID, the value closest to an input date.
That is, if I input the date "20.07.2022 10:00:00", I want to get each ID (1-1000) with the "value, date" row that has the last date before "20.07.2022 10:00:00":
ID    VALUE  DATE
2827  0      20.07.2022 9:59:11
490   27432  20.07.2022 9:40:01
565   189    20.07.2022 9:51:03
200   1      20.07.2022 9:50:01
731   0.91   20.07.2022 8:43:21
161   13004  19.07.2022 16:11:01
What query will be the most optimal and correct in this case?
If you want the data for each ID with the latest change up to, but not after, your input date then you can just filter on that date, and use aggregate functions to get the most recent data in that filtered range:
select id,
       max(change_time) as change_time,
       max(value) keep (dense_rank last order by change_time) as value
from your_table
where change_time <= <your input date>
group by id
With your previous sample data, using midnight this morning as the input date would give:
select id,
       max(change_time) as change_time,
       max(value) keep (dense_rank last order by change_time) as value
from your_table
where change_time <= timestamp '2022-07-28 00:00:00'
group by id
order by id
ID  CHANGE_TIME          VALUE
1   2022-07-24 10:00:00  900
2   2022-07-22 21:51:00  422
3   2022-07-24 13:01:00  1
4   2022-07-24 10:48:00  67
and using midday today would give:
select id,
       max(change_time) as change_time,
       max(value) keep (dense_rank last order by change_time) as value
from your_table
where change_time <= timestamp '2022-07-28 12:00:00'
group by id
order by id
ID  CHANGE_TIME          VALUE
1   2022-07-24 10:00:00  900
2   2022-07-22 21:51:00  422
3   2022-07-28 11:59:00  12
4   2022-07-28 11:45:00  63
5   2022-07-28 10:20:00  55
db<>fiddle with some other input dates to show the result set changing.
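To make the "filter, then take the last row per ID" logic concrete, here is a small Python sketch of the same idea outside the database. It is only an illustration: the function name latest_at_or_before is made up, the rows are the six sample rows from the question, and ties on the timestamp are broken by taking the larger value, which is how I read max(value) keep (dense_rank last ...).

```python
from datetime import datetime

# (id, value, change_time) rows taken from the question's sample table.
ROWS = [
    (2827, 0, datetime(2022, 7, 20, 10, 40, 1)),
    (490, 27432, datetime(2022, 7, 20, 10, 40, 1)),
    (565, 189, datetime(2022, 7, 20, 9, 51, 3)),
    (200, 1, datetime(2022, 7, 20, 9, 50, 1)),
    (731, 0.91, datetime(2022, 7, 20, 9, 43, 21)),
    (161, 13004, datetime(2022, 7, 19, 16, 11, 1)),
]


def latest_at_or_before(rows, cutoff):
    """Return {id: (value, change_time)} for the last change at or before cutoff."""
    latest = {}
    for rid, value, changed in rows:
        if changed > cutoff:
            continue  # same role as the WHERE change_time <= ... filter
        current = latest.get(rid)
        # Later timestamp wins; on equal timestamps the larger value wins.
        if current is None or (changed, value) > (current[1], current[0]):
            latest[rid] = (value, changed)
    return latest


result = latest_at_or_before(ROWS, datetime(2022, 7, 20, 10, 0, 0))
for rid, (value, changed) in sorted(result.items()):
    print(rid, value, changed)
```

IDs 2827 and 490 do not appear in the result here because their only sample rows are later than the cutoff; an ID with no row at or before the input date is simply absent, just as it is in the grouped query.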

How to find changed values in column

I have a table which looks similar to:
AcctNbr  AcctTypCD  ContractDate  Emp         WrkStLct  WrkStRgn
10001    12M        11-01-2021    John Smith  Downtown  D
10002    BCK        11-02-2021    Jane Smith  Uptown    U
10003    HPLS       11-03-2021    Bob Jones   Midtown   M
10005    VPLS       11-04-2021    Chris Ice   Downtown  D
10006    CLBV       11-12-2021    Smith John  Uptown    U
10007    TI80       11-13-2021    Joann Penn  Midtown   M
10008    M360       10-04-2021    Jim Blue    Downtown  D
My initial query is:
Select acctnbr, accttypcd, contractdate, emp, wrkstlct, wrkstrgn
from tableA
where accttypcd in ('HPLS', 'VPLS')
and contractdate between trunc(sysdate,'mm') and sysdate
order by wrkstrgn, wrkstlct, emp, contractdate;
End users are now requesting a report that shows every time AcctTypCD changes from any value (a list of 80+ different values) to either 'HPLS' or 'VPLS', along with the emp who made the change. What would be the best way to accomplish this?
I apologize in advance for any mistakes in this question or if this is a duplicate; this is my first time asking a question.
It's hard to determine fully what you want to achieve, given that every row has a different AcctNbr but your description says "If acctnbr 10007 changes from ...". I will assume that a change means a new row in the table. On that assumption, you could do something like
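select acctnbr, accttypcd, prev_accttypcd, contractdate, emp, wrkstlct, wrkstrgn
from (
    -- compute LAG over all rows first: a WHERE in the same query block
    -- would remove the non-HPLS/VPLS rows before LAG could see them
    select a.*,
           lag(accttypcd) over (partition by acctnbr order by contractdate) as prev_accttypcd
    from tableA a
)
where accttypcd in ('HPLS', 'VPLS')
  and contractdate between trunc(sysdate,'mm') and sysdate
order by wrkstrgn, wrkstlct, emp, contractdate;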
where the LAG function will show you the previous value of 'accttypcd', with "previous" segmented by acctnbr (the 'partition by' part) and sequenced by contractdate (the 'order by' part).
Analytic SQL (lag, lead, etc.) is a big topic; a full tutorial is available here:
https://www.youtube.com/watch?v=0cjxYMxa1e4&list=PLJMaoEWvHwFIUwMrF4HLnRksF0H8DHGtt
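For readers who want to see what the "previous value per account" idea amounts to, here is a small Python sketch of it. The rows are hypothetical (the question's sample has only one row per AcctNbr), the function name changes_to_target is made up, and it assumes, as above, that each change is stored as a new row. A move from 'HPLS' to 'VPLS' (or back) is also reported, which may or may not be wanted.

```python
from datetime import date

TARGET_TYPES = {"HPLS", "VPLS"}

# Hypothetical history rows: (acctnbr, accttypcd, contractdate, emp).
ROWS = [
    (10003, "12M", date(2021, 10, 1), "John Smith"),
    (10003, "HPLS", date(2021, 11, 3), "Bob Jones"),
    (10005, "VPLS", date(2021, 11, 4), "Chris Ice"),   # no earlier row: not a change
    (10007, "TI80", date(2021, 11, 13), "Joann Penn"),
]


def changes_to_target(rows):
    """Report rows where an account's type changed to HPLS or VPLS."""
    previous = {}  # acctnbr -> type on that account's preceding row
    found = []
    # Sorting by account and date plays the role of PARTITION BY / ORDER BY.
    for acct, typ, when, emp in sorted(rows, key=lambda r: (r[0], r[2])):
        before = previous.get(acct)  # this is what LAG would return
        if typ in TARGET_TYPES and before is not None and before != typ:
            found.append((acct, before, typ, when, emp))
        previous[acct] = typ
    return found


for change in changes_to_target(ROWS):
    print(change)
```

The important detail is that the previous value is looked up across all rows of the account, and only afterwards is the list narrowed to HPLS/VPLS.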

How to flatten the queried data

I am currently using the query below to pull the data. It returns 4 rows for the same sample record, and I would like to have it flattened into 1 row per sample. The query results are attached for information; any help is much appreciated.
select s.name as CRF,
       a.name as Aliquot_Name,
       a.aliquot_type,
       au.u_step_yield as Step_Yield,
       au.u_step_concentration as Step_Concentration,
       au.u_pooled_plasma_volume as Pooled_Plasma_volume
from aliquot a
join aliquot_user au on a.aliquot_id = au.aliquot_id
join sample s on s.sample_id = a.SAMPLE_ID
where a.aliquot_type in ('DNA Extracted', 'Library', 'Target Enrichment', 'DNA Plasma')
order by s.name desc, a.aliquot_type, a.name, au.u_step_yield, au.u_step_concentration, au.u_pooled_plasma_volume;
CRF ALIQUOT_NAME ALIQUOT_TYPE STEP_YIELD STEP_CONCENTRATION POOLED_PLASMA_VOLUME
CRF007650 PE-0046758 DNA Plasma 10
CRF007650 LCNL-47275 Library 2,178 36
CRF007650 HCNLS-47467 Target Enrichment 105 2
CRF007649 1146667362 DNA Extracted 451 6
CRF007649 PE-0046774 DNA Plasma 10
CRF007649 LCNL-47291 Library 3,543 59
CRF007649 HCNLS-47483 Target Enrichment 132 2
CRF007648 1146668498 DNA Extracted 166 2
CRF007648 PE-0046755 DNA Plasma 9
CRF007648 LCNL-47272 Library 3,881 65
CRF007648 HCNLS-47463 Target Enrichment 381 6
CRF007647 1146635220 DNA Extracted 29 0
CRF007647 PE-0046764 DNA Plasma 8
CRF007647 LCNL-47281 Library 1,274 21
CRF007647 HCNLS-47473 Target Enrichment 57 1
CRF007646 1146736347 DNA Extracted 67 1
I think you have to be more specific.
There is no information about the tables, such as which columns are primary keys and which are not.
All I can say for now is that you have to join the same table to itself if you want to flatten the rows.
If you want an answer that includes a query, add your table definitions to the question so that others can help.
As far as I understand your data, you have 4 entries in table a with different a.aliquot_type values ('DNA Extracted', 'Library', 'Target Enrichment', 'DNA Plasma'), and you want 4 columns with the corresponding Aliquot_Name (one for 'DNA Extracted', etc.).
You could use 4 columns, each a subselect that reads the corresponding data from aliquot; to do that you have to drop the join
a.aliquot_id = au.aliquot_id
For example:
select s.name as CRF, (select a.aliquot_type from aliquot where a.aliquot_type = 'DNA Extracted' and ....) col1, (select a.aliquot_type from aliquot where a.aliquot_type = 'Library' and ....) col2, ...
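An alternative to the scalar subselects is conditional aggregation: group by the sample and pick each aliquot type's name with MAX(CASE ...). The sketch below runs that idea against an in-memory SQLite database through Python, purely so it can be executed; the single table aliquot_flat and its columns are a simplified, hypothetical stand-in for the real aliquot/sample join, and the rows are copied from the posted results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened stand-in for the joined aliquot/sample data.
conn.execute(
    "create table aliquot_flat (crf text, aliquot_name text, aliquot_type text)"
)
conn.executemany(
    "insert into aliquot_flat values (?, ?, ?)",
    [
        ("CRF007650", "PE-0046758", "DNA Plasma"),
        ("CRF007650", "LCNL-47275", "Library"),
        ("CRF007650", "HCNLS-47467", "Target Enrichment"),
        ("CRF007649", "1146667362", "DNA Extracted"),
        ("CRF007649", "PE-0046774", "DNA Plasma"),
        ("CRF007649", "LCNL-47291", "Library"),
        ("CRF007649", "HCNLS-47483", "Target Enrichment"),
    ],
)

# One output row per CRF; each CASE picks the name for one aliquot type,
# and MAX collapses the group to that single non-null value.
FLATTEN_SQL = """
select crf,
       max(case when aliquot_type = 'DNA Extracted' then aliquot_name end) as dna_extracted,
       max(case when aliquot_type = 'DNA Plasma' then aliquot_name end) as dna_plasma,
       max(case when aliquot_type = 'Library' then aliquot_name end) as library,
       max(case when aliquot_type = 'Target Enrichment' then aliquot_name end) as target_enrichment
from aliquot_flat
group by crf
order by crf desc
"""

flattened = conn.execute(FLATTEN_SQL).fetchall()
for row in flattened:
    print(row)
```

The SELECT uses only CASE, MAX and GROUP BY, so the same shape should carry over to Oracle; a sample with no aliquot of a given type simply gets NULL in that column. The yield and concentration columns could be flattened the same way.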

PIG Script How to

I am trying to clean up this employee volunteer data. There is no way to track whether an employee is already a registered volunteer, so he can sign up as a new volunteer and will get a new VOLUNTEER_ID. I have a data feed where I can tie each VOLUNTEER_ID to its EMP_ID. The volunteer data needs to be cleaned up so we can figure out how the employee moved from one volunteer level to another, and when.
The business logic is that, when there are overlapping dates, we give the employee the highest level for the timeframe between start_date and end_date.
I have posted an input sample of the data and what the output should be.
Is it possible to do this in a Pig script? Can someone please help me?
INPUT:
EMP_ID VOLUNTEER_ID V_LEVEL STATUS START_DATE END_DATE
10001 100 1 A 1/1/2006 12/31/2007
10001 200 1 A 5/1/2006
10001 100 1 A 1/1/2008
10001 300 3 P 3/1/2008 3/1/2008
10001 300 3 A 3/2/2008 12/1/2008
10001 1001 2 A 5/1/2008 6/30/2008
10001 1001 3 A 7/1/2008
10001 300 2 A 12/2/2008
OUTPUT NEEDED (VOLUNTEER_ID is not needed in the output; it is included below only to show which ID was selected and which was not):
EMP_ID VOLUNTEER_ID V_LEVEL STATUS START_DATE END_DATE
10001 100 1 A 1/1/2006 12/31/2007
10001 300 3 P 3/1/2008 3/1/2008
10001 300 3 A 3/2/2008 12/1/2008
10001 1001 2 A 5/1/2008 6/30/2008
10001 1001 3 A 7/1/2008
It seems like you want the row in your data with the earliest start date for each V_LEVEL, STATUS, EMP_ID, and VOLUNTEER_ID.
First we add a unix time column and then find the min for that column (ToUnixTime is only in the latest version of Pig, so you may need to update your version).
data_with_unix = foreach data generate EMP_ID, VOLUNTEER_ID, V_LEVEL, STATUS, START_DATE, END_DATE, ToUnixTime((datetime)START_DATE) as unix_time;
grp = group data_with_unix by (EMP_ID, VOLUNTEER_ID, V_LEVEL, STATUS);
min_date = foreach grp generate group, MIN(data_with_unix.unix_time);
Then join the start and end dates back into your dataset, since it doesn't look like there is currently a way to convert unix time back to a date.
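To check what this grouping produces on the posted input, here is the same step as a Python sketch: for each (EMP_ID, VOLUNTEER_ID, V_LEVEL, STATUS) group it keeps the whole row with the earliest START_DATE, so no join back is needed. The function name earliest_per_group and the M/D/YYYY date format are assumptions, and missing end dates are written as empty strings.

```python
from datetime import datetime

# Input rows from the question:
# (EMP_ID, VOLUNTEER_ID, V_LEVEL, STATUS, START_DATE, END_DATE).
ROWS = [
    (10001, 100, 1, "A", "1/1/2006", "12/31/2007"),
    (10001, 200, 1, "A", "5/1/2006", ""),
    (10001, 100, 1, "A", "1/1/2008", ""),
    (10001, 300, 3, "P", "3/1/2008", "3/1/2008"),
    (10001, 300, 3, "A", "3/2/2008", "12/1/2008"),
    (10001, 1001, 2, "A", "5/1/2008", "6/30/2008"),
    (10001, 1001, 3, "A", "7/1/2008", ""),
    (10001, 300, 2, "A", "12/2/2008", ""),
]


def earliest_per_group(rows):
    """Keep the row with the earliest start date in each group."""
    best = {}  # group key -> (parsed start date, original row)
    for row in rows:
        emp, vol, level, status, start, _end = row
        key = (emp, vol, level, status)
        started = datetime.strptime(start, "%m/%d/%Y")  # assumed M/D/YYYY
        if key not in best or started < best[key][0]:
            best[key] = (started, row)
    # Return the surviving rows ordered by start date.
    return [row for _, row in sorted(best.values(), key=lambda pair: pair[0])]


selected = earliest_per_group(ROWS)
for row in selected:
    print(row)
```

On this input the sketch keeps seven rows. That is two more than the requested output, which leaves out the rows for VOLUNTEER_ID 200 and for VOLUNTEER_ID 300 at level 2, so the "highest level wins on overlap" rule would still need to be applied on top of this step.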

Transpose without PIVOT in ORACLE

Currently I am using PL/SQL Developer (Oracle). I have been told to convert row-wise data into columns, but without using PIVOT. Since the table I am working on changes dynamically, I am not able to use DECODE either.
POLICY SEQS INVDATE SUM(AMT)
-------- ------ ----------- ----------
policA 123 01-JAN-10 40
policA 123 01-FEB-10 50
policA 123 01-MAR-10 60
policA 456 01-JAN-10 360
policA 456 01-FEB-10 450
policA 456 01-MAR-10 540
policA 789 01-FEB-10 1000
policA 789 01-MAR-10 1000
I have to rearrange the dates and the summed amounts column-wise, so that each combination of POLICY and SEQS has its dates and amounts side by side on a single line.
"POLICY","SEQS","INST1","INST1SUM","INST2","INST2SUM","INST3","INST3SUM"
"policA","123","01-JAN-10","40","01-FEB-10","50","01-MAR-10","60"
"policA","456","01-JAN-10","360","01-FEB-10","450","01-MAR-10","540"
"policA","789","01-FEB-10","1000","01-MAR-10","1000"
Some policies might not start in January; in that case INST1 must be February, INST2 must be March, and INST3 and its corresponding INSTSUM must be NULL.
Is there any way this can be done using CROSS JOINs or an XML function?
Can I use xmlagg with alternating data (INST and SUM)?
I have done some research and have not been able to work this out. Can you please help me with this?
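To pin down the rearrangement being asked for, here is a small Python sketch of it (not an Oracle solution): rows are grouped by POLICY and SEQS, ordered by invoice date, numbered 1 to 3, and laid out side by side, with missing installments left empty. The function name transpose, the fixed limit of three installments, and the DD-MON-YY parsing are assumptions; the 789 rows use the spelling policA, as in the desired output.

```python
from collections import defaultdict

MONTHS = {m: i for i, m in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}

# (POLICY, SEQS, INVDATE, SUM(AMT)) rows from the question.
ROWS = [
    ("policA", 123, "01-JAN-10", 40),
    ("policA", 123, "01-FEB-10", 50),
    ("policA", 123, "01-MAR-10", 60),
    ("policA", 456, "01-JAN-10", 360),
    ("policA", 456, "01-FEB-10", 450),
    ("policA", 456, "01-MAR-10", 540),
    ("policA", 789, "01-FEB-10", 1000),
    ("policA", 789, "01-MAR-10", 1000),
]


def sort_key(invdate):
    # Turn 'DD-MON-YY' into a sortable (year, month, day) tuple.
    day, mon, year = invdate.split("-")
    return (int(year), MONTHS[mon], int(day))


def transpose(rows, slots=3):
    """One output row per (policy, seqs): INST1, INST1SUM, INST2, INST2SUM, ..."""
    grouped = defaultdict(list)
    for policy, seqs, invdate, amt in rows:
        grouped[(policy, seqs)].append((sort_key(invdate), invdate, amt))
    result = []
    for (policy, seqs), items in sorted(grouped.items()):
        flat = []
        for _, invdate, amt in sorted(items)[:slots]:
            flat += [invdate, amt]
        flat += [None] * (2 * slots - len(flat))  # pad missing installments
        result.append((policy, seqs, *flat))
    return result


for row in transpose(ROWS):
    print(row)
```

The numbering step corresponds to what ROW_NUMBER() OVER (PARTITION BY policy, seqs ORDER BY invdate) would give in SQL; with that number available, each INSTn column could be picked with MAX(CASE WHEN rn = n THEN ... END) instead of PIVOT, assuming the maximum number of installments is known in advance.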
