I have data with overlapping date ranges. Example below:
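+-------------+---------+-------------+-------------+------------+
| Customer_ID | FAC_NUM | Start_Date  | End_Date    | New_Monies |
+-------------+---------+-------------+-------------+------------+
| 12345       | ABC1234 | 26/NOV/2014 | 26/MAY/2015 | 100000     |
| 12345       | ABC1234 | 12/DEC/2014 | 12/JUN/2015 | 200000     |
| 12345       | ABC1234 | 15/JUN/2015 | 15/DEC/2015 | 500000     |
| 12345       | ABC1234 | 20/DEC/2015 | 20/JUN/2016 | 600000     |
+-------------+---------+-------------+-------------+------------+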
I want to convert this table into data with non-overlapping ranges such that, for each overlapping period, the New_Monies column is summed and shown as a new row. For the example above, I want the output to be as follows:
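+-------------+---------+-------------+-------------+------------+
| Customer_ID | FAC_NUM | Start_Date  | End_Date    | New_Monies |
+-------------+---------+-------------+-------------+------------+
| 12345       | ABC1234 | 26/NOV/2014 | 11/DEC/2014 | 100000     |
| 12345       | ABC1234 | 12/DEC/2014 | 26/MAY/2015 | 300000     |
| 12345       | ABC1234 | 27/MAY/2015 | 12/JUN/2015 | 200000     |
| 12345       | ABC1234 | 15/JUN/2015 | 15/DEC/2015 | 500000     |
| 12345       | ABC1234 | 20/DEC/2015 | 20/JUN/2016 | 600000     |
+-------------+---------+-------------+-------------+------------+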
Row 2 above is the overlapping period, 12 Dec 2014 to 26 May 2015, showing the total New_Monies as 300000 (100000 + 200000).
What would be the best way to do this in Oracle?
Thanks in advance for your support.
Regards,
Ani
with
  prep (customer_id, fac_num, dt, amount) as (
    select t.customer_id, t.fac_num,
           case h.col when 's' then t.start_date else t.end_date + 1 end as dt,
           case h.col when 's' then t.new_monies else - t.new_monies end as amount
    from   sample_data t
           cross join
           (select 's' as col from dual union all select 'e' from dual) h
  )
, cumul_sums (customer_id, fac_num, dt, amount) as (
    select distinct
           customer_id, fac_num, dt,
           sum(amount) over (partition by customer_id, fac_num order by dt)
    from   prep
  )
, with_intervals (customer_id, fac_num, start_date, end_date, amount) as (
    select customer_id, fac_num, dt,
           lead(dt) over (partition by customer_id, fac_num order by dt) - 1,
           amount
    from   cumul_sums
  )
select customer_id, fac_num, start_date, end_date, amount
from   with_intervals
where  end_date is not null
order by customer_id, fac_num, start_date
;
The prep subquery unpivots the inputs. At the same time it changes each "end date" to the "start date" of the following interval (end date + 1), assigning the positive amount to the "start date" and the negative of the same amount to that following "start date".
cumul_sums computes the cumulative sums. Note that if two or more intervals begin on the same date (so the same date from prep appears multiple times for a customer and fac_num), the analytic sum will include the amounts from ALL the rows up to that date, because the default windowing clause is range between...... After the cumulative sums are computed, this subquery also de-duplicates the output rows (to handle precisely that complication of multiple intervals starting on the same date).
with_intervals recovers the "start date" - "end date" intervals, and the final step simply removes the last interval ("to infinity"), which would have an "amount" of zero.
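To check the logic of these three steps outside the database, here is a minimal Python sketch of the same idea. It is not the Oracle query itself: the function name split_overlaps and the tuple layout are assumptions made for illustration, and the sample rows are the ones from the question.

```python
from collections import defaultdict
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

def split_overlaps(rows):
    """rows: (customer_id, fac_num, start_date, end_date, new_monies) tuples."""
    # Like prep: +amount on the start date, -amount on the day after the end date.
    deltas = defaultdict(lambda: defaultdict(int))
    for cust, fac, start, end, amount in rows:
        deltas[(cust, fac)][start] += amount
        deltas[(cust, fac)][end + ONE_DAY] -= amount

    result = []
    for key in sorted(deltas):
        by_date = deltas[key]
        dates = sorted(by_date)  # one entry per distinct date (the de-duplication)
        running = 0
        # Like cumul_sums + with_intervals: running total per date, paired with
        # the next date minus one day. zip() drops the last, open-ended interval.
        for dt, next_dt in zip(dates, dates[1:]):
            running += by_date[dt]
            result.append(key + (dt, next_dt - ONE_DAY, running))
    return result

sample = [
    (12345, "ABC1234", date(2014, 11, 26), date(2015, 5, 26), 100000),
    (12345, "ABC1234", date(2014, 12, 12), date(2015, 6, 12), 200000),
    (12345, "ABC1234", date(2015, 6, 15), date(2015, 12, 15), 500000),
    (12345, "ABC1234", date(2015, 12, 20), date(2016, 6, 20), 600000),
]
intervals = split_overlaps(sample)
```

With this data the sketch also returns the two gaps between loans (13 to 14 Jun 2015 and 16 to 19 Dec 2015) as intervals with an amount of 0; filtering on a non-zero amount leaves the five rows the question asks for.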
EDIT This solution answers the OP's original question. After posting the solution, the OP changed the question. The solution can be changed easily to address the new formulation. I'm not going to chase shadows though; the solution will remain as is.
Here is a way to do this.
with all_data
as (select Customer_ID,FAC_NUM,start_date as dt,new_monies as calc_monies
      from t
    union all
    select Customer_ID,FAC_NUM,end_date as dt,new_monies*-1 as calc_monies
      from t
   )
select x.customer_id
      ,x.fac_num
      ,x.start_date
      ,case when row_number() over(order by end_date desc)=1 then
                 x.end_date + 1
            else x.end_date
        end as new_end_date
      ,x.new_monies -- running total; needed for the NEW_MONIES column in the output
  from (
        select t.customer_id
              ,t.fac_num
              ,t.dt as start_date
              ,lead(dt) over(order by dt)-1 as end_date
              ,sum(calc_monies) over(order by dt) as new_monies
          from all_data t
       )x
 where end_date is not null
order by 3
db fiddle link
https://dbfiddle.uk/?rdbms=oracle_11.2&fiddle=856c9ac0954e45429994f4ac45699e6f
+-------------+---------+------------+--------------+------------+
| CUSTOMER_ID | FAC_NUM | START_DATE | NEW_END_DATE | NEW_MONIES |
+-------------+---------+------------+--------------+------------+
| 12345 | ABC1234 | 26-NOV-14 | 11-DEC-14 | 100000 |
| 12345 | ABC1234 | 12-DEC-14 | 25-MAY-15 | 300000 |
| 12345 | ABC1234 | 26-MAY-15 | 12-JUN-15 | 200000 |
+-------------+---------+------------+--------------+------------+
I have a list of activities that is currently ordered by user, date and time of activity, and ID. I want to generate numbers for each group set by those same fields. Using the following code, I achieve considerable accuracy. However, there's a problem when the same ID is repeated at a later time and I need the row number count to restart instead of continuing from the previous iteration.
Here's my code:
ROW_NUMBER() OVER (PARTITION BY USER_ID, foc_id ORDER BY USER_ID, to_char(activity_date, 'MM/DD/YYYY HH24:MI:SS'), foc_id) seq_nbr
In the image below, we see that FOC_ID "A240" had activity around 2:20PM. Then FOC_ID "B410" had activity around 3:19PM, lastly the user returned to "A240" for additional activity around 3:20. Because there was activity between the first and second sequence of events of "A240," I need the row number (seq_nbr) to restart instead of continuing from the previous activity.
You can use MATCH_RECOGNIZE:
SELECT user_id,
activity_date,
foc_id,
ROW_NUMBER() OVER ( PARTITION BY user_id, mno ORDER BY activity_date ) AS seq_num
FROM table_name
MATCH_RECOGNIZE (
PARTITION BY user_id
ORDER BY activity_date
MEASURES
MATCH_NUMBER() AS mno
ALL ROWS PER MATCH
PATTERN ( same_foc_id* last_row )
DEFINE
same_foc_id AS FIRST( foc_id ) = NEXT( foc_id )
)
or, multiple ROW_NUMBERs:
SELECT user_id,
activity_date,
foc_id,
ROW_NUMBER() OVER ( PARTITION BY user_id, foc_id, grp ORDER BY activity_date ) AS seq_num
FROM (
SELECT user_id,
activity_date,
foc_id,
ROW_NUMBER() OVER ( PARTITION BY user_id ORDER BY activity_date )
- ROW_NUMBER() OVER ( PARTITION BY user_id, foc_id ORDER BY activity_date ) AS grp
FROM table_name
)
ORDER BY user_id, activity_date
Which, for the sample data:
CREATE TABLE table_name ( user_id, activity_date, foc_id ) AS
SELECT 'UVAC3', DATE '2020-11-04' + INTERVAL '14:20:34' HOUR TO SECOND, 'A240' FROM DUAL UNION ALL
SELECT 'UVAC3', DATE '2020-11-04' + INTERVAL '14:21:23' HOUR TO SECOND, 'A240' FROM DUAL UNION ALL
SELECT 'UVAC3', DATE '2020-11-04' + INTERVAL '14:21:23' HOUR TO SECOND, 'A240' FROM DUAL UNION ALL
SELECT 'UVAC3', DATE '2020-11-04' + INTERVAL '14:21:23' HOUR TO SECOND, 'A240' FROM DUAL UNION ALL
SELECT 'UVAC3', DATE '2020-11-04' + INTERVAL '15:19:39' HOUR TO SECOND, 'B410' FROM DUAL UNION ALL
SELECT 'UVAC3', DATE '2020-11-04' + INTERVAL '15:19:44' HOUR TO SECOND, 'B410' FROM DUAL UNION ALL
SELECT 'UVAC3', DATE '2020-11-04' + INTERVAL '15:19:58' HOUR TO SECOND, 'B410' FROM DUAL UNION ALL
SELECT 'UVAC3', DATE '2020-11-04' + INTERVAL '15:20:11' HOUR TO SECOND, 'B410' FROM DUAL UNION ALL
SELECT 'UVAC3', DATE '2020-11-04' + INTERVAL '15:22:16' HOUR TO SECOND, 'A240' FROM DUAL UNION ALL
SELECT 'UVAC3', DATE '2020-11-04' + INTERVAL '15:22:33' HOUR TO SECOND, 'A240' FROM DUAL;
Both output:
USER_ID | ACTIVITY_DATE | FOC_ID | SEQ_NUM
:------ | :------------------ | :----- | ------:
UVAC3 | 2020-11-04 14:20:34 | A240 | 1
UVAC3 | 2020-11-04 14:21:23 | A240 | 2
UVAC3 | 2020-11-04 14:21:23 | A240 | 3
UVAC3 | 2020-11-04 14:21:23 | A240 | 4
UVAC3 | 2020-11-04 15:19:39 | B410 | 1
UVAC3 | 2020-11-04 15:19:44 | B410 | 2
UVAC3 | 2020-11-04 15:19:58 | B410 | 3
UVAC3 | 2020-11-04 15:20:11 | B410 | 4
UVAC3 | 2020-11-04 15:22:16 | A240 | 1
UVAC3 | 2020-11-04 15:22:33 | A240 | 2
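The grouping behind both queries (number rows within each run of consecutive rows that share a foc_id) can be sanity-checked with a short Python sketch. This is only a model of the logic, not Oracle code; restart_sequence and the tuple layout are hypothetical, and the times are plain strings for brevity.

```python
from itertools import groupby

def restart_sequence(rows):
    """rows: (user_id, activity_time, foc_id) tuples; appends a sequence number."""
    # sorted() is stable, so rows with identical timestamps keep their input order.
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    numbered = []
    # groupby() only merges adjacent rows with equal keys, so a foc_id that
    # comes back after a different foc_id starts a new group and a new count.
    for _, run in groupby(ordered, key=lambda r: (r[0], r[2])):
        for seq, row in enumerate(run, start=1):
            numbered.append(row + (seq,))
    return numbered

rows = [
    ("UVAC3", "14:20:34", "A240"), ("UVAC3", "14:21:23", "A240"),
    ("UVAC3", "14:21:23", "A240"), ("UVAC3", "14:21:23", "A240"),
    ("UVAC3", "15:19:39", "B410"), ("UVAC3", "15:19:44", "B410"),
    ("UVAC3", "15:19:58", "B410"), ("UVAC3", "15:20:11", "B410"),
    ("UVAC3", "15:22:16", "A240"), ("UVAC3", "15:22:33", "A240"),
]
result = restart_sequence(rows)
```

For the sample data the last element of each tuple follows the same 1-2-3-4, 1-2-3-4, 1-2 pattern as the SEQ_NUM column above.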
Oracle - Say you have a table that has a unique key on name, ssn and effective date. The effective date makes it unique. What is the best way to update a current indicator to show inactive for the rows with dates less than the max effective date? I can't really wrap my head around it since there are multiple rows with the same name and ssn combinations. I haven't been able to find this scenario on here for Oracle and I'm having developer's block. Thanks.
"All name/ssn having a max effective date earlier than this time yesterday:"
SELECT name, ssn
FROM t
GROUP BY name, ssn
HAVING MAX(eff_date) < SYSDATE - 1
Oracle supports a multi-column IN, so:
UPDATE t
SET current_indicator = 'inactive'
WHERE (name,ssn,eff_date) IN (
SELECT name, ssn, max(eff_date)
FROM t
GROUP BY name, ssn
HAVING MAX(eff_date) < SYSDATE - 1
)
Use a MERGE statement using an analytic function to identify the rows to update and then merge on the ROWID pseudo-column so that Oracle can efficiently identify the rows to update (without having to perform an expensive self-join by comparing the values):
MERGE INTO table_name dst
USING (
SELECT rid,
max_eff_date
FROM (
SELECT ROWID AS rid,
effective_date,
status,
MAX( effective_date ) OVER ( PARTITION BY name, ssn ) AS max_eff_date
FROM table_name
)
WHERE ( effective_date < max_eff_date AND status <> 'inactive' )
OR ( effective_date = max_eff_date AND status <> 'active' )
) src
ON ( dst.ROWID = src.rid )
WHEN MATCHED THEN
UPDATE
SET status = CASE
WHEN src.max_eff_date = dst.effective_date
THEN 'active'
ELSE 'inactive'
END;
So, for some sample data:
CREATE TABLE table_name ( name, ssn, effective_date, status ) AS
SELECT 'aaa', 1, DATE '2020-01-01', 'inactive' FROM DUAL UNION ALL
SELECT 'aaa', 1, DATE '2020-01-02', 'inactive' FROM DUAL UNION ALL
SELECT 'aaa', 1, DATE '2020-01-03', 'inactive' FROM DUAL UNION ALL
SELECT 'bbb', 2, DATE '2020-01-01', 'active' FROM DUAL UNION ALL
SELECT 'bbb', 2, DATE '2020-01-02', 'inactive' FROM DUAL UNION ALL
SELECT 'bbb', 3, DATE '2020-01-01', 'inactive' FROM DUAL UNION ALL
SELECT 'bbb', 3, DATE '2020-01-03', 'active' FROM DUAL;
The query only updates the 3 rows that need changing and:
SELECT *
FROM table_name;
Outputs:
NAME | SSN | EFFECTIVE_DATE | STATUS
:--- | --: | :------------- | :-------
aaa | 1 | 01-JAN-20 | inactive
aaa | 1 | 02-JAN-20 | inactive
aaa | 1 | 03-JAN-20 | active
bbb | 2 | 01-JAN-20 | inactive
bbb | 2 | 02-JAN-20 | active
bbb | 3 | 01-JAN-20 | inactive
bbb | 3 | 03-JAN-20 | active
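The rule the MERGE applies (the row with the latest effective date per name and ssn is active, every other row is inactive, and only rows whose status differs are touched) can be modelled in a few lines of Python. This is a sketch under that reading; flag_latest and the dict layout are assumptions for illustration, not part of the answer.

```python
from datetime import date

def flag_latest(rows):
    """rows: dicts with name, ssn, effective_date, status. Returns rows changed."""
    # First pass: latest effective date per (name, ssn), like MAX(...) OVER (PARTITION BY ...).
    latest = {}
    for r in rows:
        key = (r["name"], r["ssn"])
        if key not in latest or r["effective_date"] > latest[key]:
            latest[key] = r["effective_date"]
    # Second pass: update only the rows whose status is wrong.
    changed = 0
    for r in rows:
        is_latest = r["effective_date"] == latest[(r["name"], r["ssn"])]
        wanted = "active" if is_latest else "inactive"
        if r["status"] != wanted:
            r["status"] = wanted
            changed += 1
    return changed

data = [
    {"name": "aaa", "ssn": 1, "effective_date": date(2020, 1, 1), "status": "inactive"},
    {"name": "aaa", "ssn": 1, "effective_date": date(2020, 1, 2), "status": "inactive"},
    {"name": "aaa", "ssn": 1, "effective_date": date(2020, 1, 3), "status": "inactive"},
    {"name": "bbb", "ssn": 2, "effective_date": date(2020, 1, 1), "status": "active"},
    {"name": "bbb", "ssn": 2, "effective_date": date(2020, 1, 2), "status": "inactive"},
    {"name": "bbb", "ssn": 3, "effective_date": date(2020, 1, 1), "status": "inactive"},
    {"name": "bbb", "ssn": 3, "effective_date": date(2020, 1, 3), "status": "active"},
]
changed_count = flag_latest(data)
```

On the sample data changed_count is 3, matching the three rows the MERGE updates.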
I have 2 rows with 2 periods of time that intersect. For example:
---------------------------------------------
| START_DATE | END_DATE |
---------------------------------------------
| 01/01/2018 08:00:00 | 01/01/2018 09:30:00 |
| 01/01/2018 08:30:00 | 01/01/2018 10:00:00 |
---------------------------------------------
There are 30 minutes where both periods intersect, and I want to avoid that. I would like to merge both rows into a single row, taking the earlier starting date and the later ending date:
---------------------------------------------
| START_DATE | END_DATE |
---------------------------------------------
| 01/01/2018 08:00:00 | 01/01/2018 10:00:00 |
---------------------------------------------
Do you have any idea how I can get this result with a SQL statement?
For two rows just use greatest() and least(). But the problem is when you have many rows which may overlap in different ways. You can:
1. add row numbers to each row,
2. assign groups for overlapping periods using a recursive query,
3. group data using this value and find min and max dates in each group.
with
r(rn, start_date, end_date) as (
select row_number() over(order by start_date), start_date, end_date from t ),
c(rn, start_date, end_date, grp) as (
select rn, start_date, end_date, 1 from r where rn = 1
union all
select r.rn,
case when r.start_date <= c.end_date and c.start_date <= r.end_date
then least(r.start_date, c.start_date) else r.start_date end,
case when r.start_date <= c.end_date and c.start_date <= r.end_date
then greatest(r.end_date, c.end_date) else r.end_date end,
case when r.start_date <= c.end_date and c.start_date <= r.end_date
then grp else grp + 1 end
from c join r on r.rn = c.rn + 1)
select min(start_date), max(end_date) from c group by grp
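The same steps (order the rows, carry the current group's merged period forward, start a new group when the next row does not overlap it) look like this in a minimal Python sketch. merge_periods is a hypothetical helper for checking the logic, not part of the query; like the SQL condition, it treats periods that merely touch as overlapping.

```python
from datetime import datetime

def merge_periods(periods):
    """periods: iterable of (start, end) pairs; returns merged, non-overlapping pairs."""
    merged = []
    for start, end in sorted(periods):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the current group: extend its end if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            # No overlap with the current group: start a new one.
            merged.append((start, end))
    return merged

periods = [
    (datetime(2018, 1, 1, 8, 0), datetime(2018, 1, 1, 9, 30)),
    (datetime(2018, 1, 1, 8, 30), datetime(2018, 1, 1, 10, 0)),
]
merged = merge_periods(periods)
```

For the two rows in the question, merged holds a single period from 08:00 to 10:00.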
If all you have is a set of date ranges, with no other correlating or constraining criteria, and you want to reduce that to a set of non-overlapping ranges, you can do that with a recursive query like this one:
with recur(start_date, end_date) as (
select * from yourdata yd
where not exists (select 1 from yourdata cyd
where yd.start_Date between cyd.start_date and cyd.end_date
and (yd.start_date <> cyd.start_date or yd.end_date <> cyd.end_date))
union all
select r.start_date
, yd.end_date
from recur r
join yourdata yd
on r.start_date < yd.start_date
and yd.start_date <= r.end_date
and r.end_date < yd.end_date
)
select start_date, max(end_date) end_Date from recur group by start_Date;
In this query the anchor (the part before the union all) selects all records whose start date is not contained in any other range.
The recursive part (the part after the union all) then selects ranges that extend the current range. In both halves the original start date is returned, while in the recursive part the new, extended end date is returned. This results in a set of overlapping ranges with a common start date.
Finally the output query returns the start date and max end date grouped by start date.
I'm using an Oracle database.
How can I get all columns while grouping by only one column (EMP_ID)?
For example, I have the table ESD_RESULTS:
FIRST_NAME | LAST_NAME | EMP_ID | WRIST_STATUS | LFOOT_STATUS | DATE
Dodo | A | 0101 | Pass | Pass | 2016-01-18 10:00
Wedi | Wil | 0105 | Pass | Pass | 2016-01-18 10:05
Dodo | A | 0101 | Pass | Fail | 2016-01-18 10:11
What I want to display is the latest row by date for each EMP_ID:
FIRST_NAME | LAST_NAME | EMP_ID | WRIST_STATUS | LFOOT_STATUS | DATE
Dodo | A | 0101 | Pass | Fail | 2016-01-18 10:11
Wedi | Wil | 0105 | Pass | Pass | 2016-01-18 10:05
I tried DISTINCT and GROUP BY, but the result still shows all rows.
One option is to use ROW_NUMBER() to identify the latest record for each employee:
-- DATE is a reserved word in Oracle, so the column is referenced as the quoted identifier "DATE"
SELECT t.FIRST_NAME,
       t.LAST_NAME,
       t.EMP_ID,
       t.WRIST_STATUS,
       t.LFOOT_STATUS,
       t."DATE"
FROM
(
    SELECT FIRST_NAME, LAST_NAME, EMP_ID, WRIST_STATUS, LFOOT_STATUS, "DATE",
           ROW_NUMBER() OVER (PARTITION BY EMP_ID ORDER BY "DATE" DESC) rn
    FROM ESD_RESULTS
) t
WHERE t.rn = 1
Since presumably the first name and the last name are determined by the emp_id (they don't change from one row to another), you might as well group by all three columns - resulting in less work. (On the other hand, it would make more sense to normalize your table design; one table shows the associated first name and last name for each emp_id, there is no need to repeat the first name and last name in "this" table, which you show in your post.)
Then: you can use the FIRST/LAST function, with keep (dense_rank ...), as demonstrated below, to eliminate the need for a subquery and an outer query. If there is the possibility of two rows having the exact same date and time for an emp_id, you may refine the query to accommodate "tie-breaks" of some kind. If there are no ties, then the query will work without modification.
DATE is a reserved word in Oracle; it shouldn't be used for table or column names. I changed it to DT.
with
test_data ( first_name, last_name, emp_id, wrist_status, lfoot_status, dt ) as (
select 'Dodo', 'A' , 0101, 'Pass', 'Pass', to_date('2016-01-18 10:00', 'yyyy-mm-dd hh24:mi') from dual union all
select 'Wedi', 'Wil', 0105, 'Pass', 'Pass', to_date('2016-01-18 10:05', 'yyyy-mm-dd hh24:mi') from dual union all
select 'Dodo', 'A' , 0101, 'Pass', 'Fail', to_date('2016-01-18 10:11', 'yyyy-mm-dd hh24:mi') from dual
)
-- end of test data (NOT part of the solution); SQL query begins BELOW THIS LINE
select first_name, last_name, emp_id,
min(wrist_status) keep (dense_rank last order by dt) as wrist_status,
min(lfoot_status) keep (dense_rank last order by dt) as lfoot_status,
max(dt) as dt
from test_data
group by first_name, last_name, emp_id
;
FIRST_NAME LAST_NAME EMP_ID WRIST_STATUS LFOOT_STATUS DT
---------- --------- ---------- ------------ ------------ ----------------
Dodo A 101 Pass Fail 2016-01-18 10:11
Wedi Wil 105 Pass Pass 2016-01-18 10:05
2 rows selected.
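Both answers boil down to "keep, for each emp_id, the row with the greatest date". A small Python sketch of that rule, useful for checking expected results by hand, is shown here. latest_per_employee and the tuple layout are assumptions for illustration; if two rows share the same timestamp, the first one seen is kept, which is an arbitrary tie-break.

```python
def latest_per_employee(rows):
    """rows: (first_name, last_name, emp_id, wrist_status, lfoot_status, dt) tuples."""
    latest = {}
    for row in rows:
        emp_id, dt = row[2], row[5]
        # Keep the row with the greatest dt for each emp_id.
        if emp_id not in latest or dt > latest[emp_id][5]:
            latest[emp_id] = row
    return sorted(latest.values(), key=lambda r: r[2])

# ISO-formatted timestamps compare correctly as plain strings.
esd_results = [
    ("Dodo", "A", "0101", "Pass", "Pass", "2016-01-18 10:00"),
    ("Wedi", "Wil", "0105", "Pass", "Pass", "2016-01-18 10:05"),
    ("Dodo", "A", "0101", "Pass", "Fail", "2016-01-18 10:11"),
]
latest_rows = latest_per_employee(esd_results)
```

For the sample rows this keeps the 10:11 row for 0101 and the 10:05 row for 0105, the same two rows the queries return.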
Imagine this scenario (YYYY/MM/DD):
Start date: 2015/01/01 End date: 2015/08/10
Start date: 2014/10/03 End date: 2015/07/06
Start date: 2015/09/30 End date: 2016/04/28
Using PL/SQL can I calculate the distinct days between these overlapping dates?
Edit: My table has 2 DATE columns, Start_Date and End_Date. The result I'm expecting is 515 days ((2015/08/10 - 2014/10/03) + (2016/04/28 -2015/09/30))
You can also do this with pure SQL (no need for PL/SQL):
with
minmax as (select min(start_date) min_dt, max(end_date) max_dt from myTable ),
dates as (
SELECT min_dt + rownum-1 dt1
FROM minmax CONNECT BY ROWNUM <= (max_dt - min_dt +1)
)
select count(*) from dates
where exists(
select 1 from MyTable T2
where dates.dt1 between T2.start_date and T2.end_date )
NOTE: this is an idea written from memory and not tested. Adapt the generated dates as needed, with the start date and required length.
Hope it helps.
EDIT: Using actual table dates
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE DATES ( start_date, end_date ) AS
SELECT DATE '2015-01-01', DATE '2015-08-10' FROM DUAL
UNION ALL SELECT DATE '2014-10-03', DATE '2015-07-06' FROM DUAL
UNION ALL SELECT DATE '2015-09-30', DATE '2016-04-28' FROM DUAL
Query 1:
SELECT COUNT( DISTINCT COLUMN_VALUE ) AS number_of_days
FROM DATES d,
TABLE(
CAST(
MULTISET(
SELECT d.START_DATE + LEVEL - 1
FROM DUAL
CONNECT BY d.START_DATE + LEVEL - 1 < d.END_DATE
)
AS SYS.ODCIDATELIST
)
)
ORDER BY 1
Results:
| NUMBER_OF_DAYS |
|----------------|
| 522 |
Query 2 - Check:
SELECT DATE '2015-08-10' - DATE '2014-10-03'
+ DATE '2016-04-28' - DATE '2015-09-30'
FROM DUAL
Results:
| DATE'2015-08-10'-DATE'2014-10-03'+DATE'2016-04-28'-DATE'2015-09-30' |
|---------------------------------------------------------------------|
| 522 |
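The figure of 522 can also be reproduced without generating one row per day, by merging the overlapping ranges first and then adding up the lengths. The Python sketch below is an alternative way to check the arithmetic, not a translation of either query; covered_days is a hypothetical helper, and like Query 1 it treats the end date as exclusive (add 1 per merged range to count both end points).

```python
from datetime import date

def covered_days(ranges):
    """ranges: iterable of (start_date, end_date); days covered by at least one range."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(ranges):
        if current_end is None or start > current_end:
            # Gap before this range: close the previous merged range, if any.
            if current_end is not None:
                total += (current_end - current_start).days
            current_start, current_end = start, end
        else:
            # Overlap: extend the merged range.
            current_end = max(current_end, end)
    if current_end is not None:
        total += (current_end - current_start).days
    return total

ranges = [
    (date(2015, 1, 1), date(2015, 8, 10)),
    (date(2014, 10, 3), date(2015, 7, 6)),
    (date(2015, 9, 30), date(2016, 4, 28)),
]
days = covered_days(ranges)
```

Here the first two ranges merge into 2014-10-03 to 2015-08-10 (311 days) and the third contributes 211 days, giving 522.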