Find nearest row of different type in Oracle

My table looks like this:
__ Key type timeStamp flag
1 ) 1 B 2015-06-28 22:19:26 Y
2 ) 1 B 2015-06-28 22:20:22 Y
3 ) 1 C 2015-06-28 22:22:06 N
4 ) 1 A 2015-06-28 22:25:11 N
5 ) 1 B 2015-06-28 22:29:44 Y
6 ) 1 A 2015-06-28 22:33:33 N
7 ) 1 B 2015-06-28 22:35:21 N
8 ) 1 B 2015-06-28 22:39:34 Y
9 ) 1 B 2015-06-28 22:43:53 N
10) 1 A 2015-06-28 22:45:53 N
I need to find all rows of type A with flag = 'N' for which there exists a row of type B such that timestamp(B) < timestamp(A), flag(B) = 'Y' and key(A) = key(B).
Note: if more than one B row precedes an A row, take the B row with the latest timestamp (for rows 8, 9 and 10, row 9 is used instead of row 8, so row 10 is not returned).
OUTPUT
__ Key type timeStamp flag
4 ) 1 A 2015-06-28 22:25:11 N
6 ) 1 A 2015-06-28 22:33:33 N
My approach
SELECT *
FROM tab TAB_OUT
WHERE TAB_OUT.TYPE='A'
AND TAB_OUT.FLAG='N'
AND EXISTS(
SELECT *
FROM tab TAB_IN
WHERE TAB_IN.KEY = TAB_OUT.KEY
AND TAB_IN.TYPE='B'
AND TAB_IN.FLAG='Y'
AND TAB_IN.timestamp<TAB_OUT.timestamp
AND TAB_IN.timestamp = (SELECT MAX(timestamp) from
tab where timestamp< TAB_OUT.timestamp)
);
But I cannot reference TAB_OUT.timestamp in the third-level subquery. Is there an alternative way to solve this problem?
My query also fails the note above: it skips row 9 and satisfies the condition with row 8.

A solution that only requires a single table scan:
Oracle 11g R2 Schema Setup:
CREATE TABLE table_name ( Key, type, timeStamp, flag ) AS
SELECT 1, 'B', CAST( TIMESTAMP '2015-06-28 22:19:26' AS DATE ), 'Y' FROM DUAL
UNION ALL SELECT 1, 'B', CAST( TIMESTAMP '2015-06-28 22:20:22' AS DATE ), 'Y' FROM DUAL
UNION ALL SELECT 1, 'C', CAST( TIMESTAMP '2015-06-28 22:22:06' AS DATE ), 'N' FROM DUAL
UNION ALL SELECT 1, 'A', CAST( TIMESTAMP '2015-06-28 22:25:11' AS DATE ), 'N' FROM DUAL
UNION ALL SELECT 1, 'B', CAST( TIMESTAMP '2015-06-28 22:29:44' AS DATE ), 'Y' FROM DUAL
UNION ALL SELECT 1, 'A', CAST( TIMESTAMP '2015-06-28 22:33:33' AS DATE ), 'N' FROM DUAL
UNION ALL SELECT 1, 'B', CAST( TIMESTAMP '2015-06-28 22:35:21' AS DATE ), 'N' FROM DUAL
UNION ALL SELECT 1, 'B', CAST( TIMESTAMP '2015-06-28 22:39:34' AS DATE ), 'Y' FROM DUAL
UNION ALL SELECT 1, 'B', CAST( TIMESTAMP '2015-06-28 22:43:53' AS DATE ), 'N' FROM DUAL
UNION ALL SELECT 1, 'A', CAST( TIMESTAMP '2015-06-28 22:45:53' AS DATE ), 'N' FROM DUAL
Query 1:
SELECT Key,
type,
timeStamp,
flag
FROM (
SELECT Key,
type,
timeStamp,
flag,
LAG( CASE WHEN type = 'B' THEN flag END ) IGNORE NULLS OVER ( PARTITION BY Key ORDER BY timeStamp ) AS prev_b_flag
FROM table_name t
WHERE type IN ( 'A', 'B' )
)
WHERE type = 'A'
AND flag = 'N'
AND prev_b_flag = 'Y'
Results:
| KEY | TYPE | TIMESTAMP | FLAG |
|-----|------|------------------------|------|
| 1 | A | June, 28 2015 22:25:11 | N |
| 1 | A | June, 28 2015 22:33:33 | N |
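To make the LAG ... IGNORE NULLS step easier to follow, here is a minimal Python sketch of the same single-pass idea. It is an illustration only, not part of the answer: the function name is hypothetical and the rows are copied from the question.

```python
# Sample rows from the question: (key, type, timestamp, flag).
# ISO-formatted timestamp strings sort chronologically, so no parsing is needed.
rows = [
    (1, "B", "2015-06-28 22:19:26", "Y"),
    (1, "B", "2015-06-28 22:20:22", "Y"),
    (1, "C", "2015-06-28 22:22:06", "N"),
    (1, "A", "2015-06-28 22:25:11", "N"),
    (1, "B", "2015-06-28 22:29:44", "Y"),
    (1, "A", "2015-06-28 22:33:33", "N"),
    (1, "B", "2015-06-28 22:35:21", "N"),
    (1, "B", "2015-06-28 22:39:34", "Y"),
    (1, "B", "2015-06-28 22:43:53", "N"),
    (1, "A", "2015-06-28 22:45:53", "N"),
]


def a_rows_after_flagged_b(rows):
    """Return type-A, flag-N rows whose nearest earlier type-B row
    (same key) has flag 'Y'."""
    last_b_flag = {}  # key -> flag of the most recent B row seen so far
    matches = []
    # Sorting by (key, timestamp) plays the role of
    # PARTITION BY Key ORDER BY timeStamp.
    for key, row_type, ts, flag in sorted(rows, key=lambda r: (r[0], r[2])):
        if row_type == "A" and flag == "N" and last_b_flag.get(key) == "Y":
            matches.append((key, row_type, ts, flag))
        if row_type == "B":
            # Only B rows update the remembered flag; A and C rows are
            # skipped, which is what IGNORE NULLS achieves in the query.
            last_b_flag[key] = flag
    return matches


matches = a_rows_after_flagged_b(rows)
for row in matches:
    print(row)
```

With the sample data this prints the rows at 22:25:11 and 22:33:33; the row at 22:45:53 is dropped because the B row just before it has flag 'N'.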

SELECT
*
FROM
tab A
WHERE
flag = 'N' AND type = 'A'
AND EXISTS (
SELECT
NULL
FROM
tab B
WHERE
type = 'B'
AND A.timestamp > timestamp AND A.Key = Key
GROUP BY
Key
HAVING
MAX(flag) KEEP (DENSE_RANK LAST ORDER BY timestamp) = 'Y'
);
There is no need for a further nested correlated query to select the flag from the last record. Using the aggregate KEEP clause is a more efficient way. Here it sorts each group by timestamp and keeps only the last row for the aggregation (the latest timestamp you wanted), so only a single record reaches the MAX function and we just take the FLAG value from it.
Here is a simple example:
WITH sample (value1, value2) AS (
SELECT 1, 'Y' FROM DUAL UNION ALL
SELECT 2, 'X' FROM DUAL
)
SELECT
MIN(value2) KEEP (DENSE_RANK LAST ORDER BY value1) value2
FROM
sample
This returns value2 from the record with highest value1.
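As a rough illustration of what KEEP (DENSE_RANK LAST ORDER BY ...) does, the following Python sketch first narrows the rows to those that rank last and only then aggregates. The helper name and its parameters are hypothetical and exist only for this example.

```python
def keep_dense_rank_last(rows, order_by, value, agg=max):
    """Aggregate `value` over only the rows that rank last by `order_by`."""
    highest = max(order_by(r) for r in rows)
    # Several rows can tie for the last rank; `agg` decides between them.
    return agg(value(r) for r in rows if order_by(r) == highest)


# Same data as the WITH sample (value1, value2) example.
sample = [(1, "Y"), (2, "X")]
result = keep_dense_rank_last(
    sample, order_by=lambda r: r[0], value=lambda r: r[1], agg=min
)
print(result)

# With a tie on the highest value1, the aggregate function matters.
tied = [(1, "Y"), (2, "X"), (2, "Z")]
tied_min = keep_dense_rank_last(tied, lambda r: r[0], lambda r: r[1], agg=min)
tied_max = keep_dense_rank_last(tied, lambda r: r[0], lambda r: r[1], agg=max)
```

When exactly one row ranks last, as in the sample, MIN and MAX give the same answer, which is why either can be used in the queries.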


ORACLE - How to use LAG to display strings from all previous rows into current row

I have data like below:
group | seq | activity
:---- | --: | :-------
A     |   1 | scan
A     |   2 | visit
A     |   3 | pay
B     |   1 | drink
B     |   2 | rest
I expect to have 1 new column "hist" like below:
group | seq | activity | hist
:---- | --: | :------- | :----------
A     |   1 | scan     | NULL
A     |   2 | visit    | scan
A     |   3 | pay      | scan, visit
B     |   1 | drink    | NULL
B     |   2 | rest     | drink
I was trying to solve this with the LAG function, but LAG only returns one previous row instead of multiple.
Truly appreciate any help!
Use a correlated sub-query:
SELECT t.*,
(SELECT LISTAGG(activity, ',') WITHIN GROUP (ORDER BY seq)
FROM table_name l
WHERE t."GROUP" = l."GROUP"
AND l.seq < t.seq
) AS hist
FROM table_name t
Or a hierarchical query:
SELECT t.*,
SUBSTR(SYS_CONNECT_BY_PATH(PRIOR activity, ','), 3) AS hist
FROM table_name t
START WITH seq = 1
CONNECT BY
PRIOR seq + 1 = seq
AND PRIOR "GROUP" = "GROUP"
Or a recursive sub-query factoring clause:
WITH rsqfc ("GROUP", seq, activity, hist) AS (
SELECT "GROUP", seq, activity, NULL
FROM table_name
WHERE seq = 1
UNION ALL
SELECT t."GROUP", t.seq, t.activity, r.hist || ',' || r.activity
FROM rsqfc r
INNER JOIN table_name t
ON (r."GROUP" = t."GROUP" AND r.seq + 1 = t.seq)
)
SEARCH DEPTH FIRST BY "GROUP" SET order_rn
SELECT "GROUP", seq, activity, SUBSTR(hist, 2) AS hist
FROM rsqfc
Which, for the sample data:
CREATE TABLE table_name ("GROUP", seq, activity) AS
SELECT 'A', 1, 'scan' FROM DUAL UNION ALL
SELECT 'A', 2, 'visit' FROM DUAL UNION ALL
SELECT 'A', 3, 'pay' FROM DUAL UNION ALL
SELECT 'B', 1, 'drink' FROM DUAL UNION ALL
SELECT 'B', 2, 'rest' FROM DUAL;
All output:
GROUP | SEQ | ACTIVITY | HIST
:---- | --: | :------- | :---------
A     |   1 | scan     | null
A     |   2 | visit    | scan
A     |   3 | pay      | scan,visit
B     |   1 | drink    | null
B     |   2 | rest     | drink
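All three queries build the same running history: for each row, the activities of the earlier rows in the same group, in seq order. A small Python sketch of that logic, using the sample rows above and a hypothetical function name, may help when checking the expected output by hand:

```python
from collections import defaultdict

# Sample rows: (group, seq, activity)
rows = [
    ("A", 1, "scan"),
    ("A", 2, "visit"),
    ("A", 3, "pay"),
    ("B", 1, "drink"),
    ("B", 2, "rest"),
]


def with_history(rows):
    """Append a comma-separated list of the earlier activities in the
    same group to every row (None when there are none)."""
    seen = defaultdict(list)  # group -> activities encountered so far
    out = []
    for group, seq, activity in sorted(rows, key=lambda r: (r[0], r[1])):
        hist = ",".join(seen[group]) if seen[group] else None
        out.append((group, seq, activity, hist))
        # Record the current activity only after emitting the row,
        # so a row never appears in its own history.
        seen[group].append(activity)
    return out


history = with_history(rows)
for row in history:
    print(row)
```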
To aggregate strings in Oracle, we use the LISTAGG function.
In general, you need a windowing_clause to specify a sliding window for an analytic function to calculate a running total.
Unfortunately, LISTAGG doesn't support it.
To simulate this behaviour, you may use the model_clause of the SELECT statement. Below is an example with explanation.
select
group_
, activity
, seq
, hist
from t
model
/*Where to restart calculation*/
partition by (group_)
/*Add consecutive numbers to reference "previous" row per group.
May use "seq" column if its values are consecutive*/
dimension by (
row_number() over(
partition by group_
order by seq asc
) as rn
)
measures (
/*Other columns to return*/
activity
, cast(null as varchar2(1000)) as hist
, seq
)
rules update (
/*Apply this rule sequentially*/
hist[any] order by rn asc =
/*Previous concatenated result*/
hist[cv()-1]
/*Plus comma for the third row and the next rows*/
|| presentv(activity[cv()-2], ',', '')
/*Plus previous row's value*/
|| activity[cv()-1]
)
GROUP_ | ACTIVITY | SEQ | HIST
:----- | :------- | --: | :---------
A | scan | 1 | null
A | visit | 2 | scan
A | pay | 3 | scan,visit
B | drink | 1 | null
B | rest | 2 | drink
A few more variants (without subqueries):
SELECT--+ NO_XML_QUERY_REWRITE
t.*,
regexp_substr(
listagg(activity, ',')
within group(order by SEQ)
over(partition by "GROUP")
,'^([^,]+,){'||(row_number()over(partition by "GROUP" order by seq)-1)||'}'
)
AS hist1
,xmlcast(
xmlquery(
'string-join($X/A/B[position()<$Y]/text(),",")'
passing
xmlelement("A", xmlagg(xmlelement("B", activity)) over(partition by "GROUP")) as x
,row_number()over(partition by "GROUP" order by seq) as y
returning content
)
as varchar2(1000)
) hist2
FROM table_name t;
DBFiddle: https://dbfiddle.uk/?rdbms=oracle_21&fiddle=9b477a2089d3beac62579d2b7103377a
Full test case with output:
with table_name ("GROUP", seq, activity) AS (
SELECT 'A', 1, 'scan' FROM DUAL UNION ALL
SELECT 'A', 2, 'visit' FROM DUAL UNION ALL
SELECT 'A', 3, 'pay' FROM DUAL UNION ALL
SELECT 'B', 1, 'drink' FROM DUAL UNION ALL
SELECT 'B', 2, 'rest' FROM DUAL
)
SELECT--+ NO_XML_QUERY_REWRITE
t.*,
regexp_substr(
listagg(activity, ',')
within group(order by SEQ)
over(partition by "GROUP")
,'^([^,]+,){'||(row_number()over(partition by "GROUP" order by seq)-1)||'}'
)
AS hist1
,xmlcast(
xmlquery(
'string-join($X/A/B[position()<$Y]/text(),",")'
passing
xmlelement("A", xmlagg(xmlelement("B", activity)) over(partition by "GROUP")) as x
,row_number()over(partition by "GROUP" order by seq) as y
returning content
)
as varchar2(1000)
) hist2
FROM table_name t;
GROUP SEQ ACTIV HIST1 HIST2
------ ---------- ----- ------------------------------ ------------------------------
A 1 scan
A 2 visit scan, scan
A 3 pay scan,visit, scan,visit
B 1 drink
B 2 rest drink, drink

group orders based on crossing date ranges

I need to group orders together where their date ranges cross.
Scenario A:
order 1, 1.3.2020-30.6.2020
order 2, 1.5.2020-31.8.2020
order 3, 31.7.2020-31.10.2020
order 4, 31.7.2020-31.12.2020
so the output should be
order 1, order 2
order 2, order 3, order 4
Order 1 is not grouped with orders 3 and 4 because their ranges don't cross at all.
scenario B.
same as above plus another order
order 5, 1.1.2020-31.12.2020
so output will be
order 1, order 2, order 5
order 2, order 3, order 4, order 5
I tried a self join to check which start dates fall within each range.
Only the start date of order 2 falls in the range of order 1, so we have one group.
The start dates of both order 3 and order 4 fall in the range of order 2, so we have a second group.
But the start date of order 4 falls in the range of order 3 and vice versa. That gives another two groups, which are invalid because order 2 crosses their date ranges as well and should be included. Since that would produce three duplicates, the group should be displayed just once, as in the desired output, so this approach fails.
Thanks
The result of the MATCH_RECOGNIZE solution is incorrect because order 5 should be in both groups.
I used some analytic functions to solve this:
-- create table
Create table cross_dates (order_id number, start_date date , end_date date);
-- insert dates
insert into cross_dates values( 1, to_date('01.03.2020', 'dd.mm.yyyy'), to_date('30.06.2020', 'dd.mm.yyyy'));
insert into cross_dates values( 2, to_date('01.05.2020', 'dd.mm.yyyy'), to_date( '31.08.2020', 'dd.mm.yyyy'));
insert into cross_dates values( 3, to_date('31.07.2020', 'dd.mm.yyyy'), to_date( '31.08.2020', 'dd.mm.yyyy'));
insert into cross_dates values( 4, to_date('31.07.2020', 'dd.mm.yyyy'), to_date( '31.10.2020', 'dd.mm.yyyy'));
insert into cross_dates values( 5, to_date('01.01.2020', 'dd.mm.yyyy'), to_date( '31.12.2020', 'dd.mm.yyyy'));
-- SQL
select 'Order '|| min_order_id ||': ', listagg( order_id, ',') within group (order by order_id) list
from (
select distinct min_order_id, order_id from (
with dates (cur_date, end_date, order_id, start_date) as (
select start_date, end_date, order_id, start_date
from cross_Dates
union all
select cur_date + 1, end_date, order_id,start_date
from dates
where cur_date < end_date )
select d.order_id,
min(d.order_id) over(partition by greatest(d.start_date, cd.start_date)) min_order_id
from dates d, cross_Dates cd
where d.cur_date between cd.start_date and cd.end_date ))
group by min_order_id
having count(*) > 1;
Result:
Order 1: 1,2,5
Order 2: 2,3,4,5
-- add new column and update old records
alter table cross_dates add (item varchar2(1));
update cross_dates set item = 'A';
-- insert new records B
insert into cross_dates values( 1, to_date('01.01.2020', 'dd.mm.yyyy'), to_date( '30.06.2020', 'dd.mm.yyyy'), 'B');
insert into cross_dates values( 1, to_date('01.07.2020', 'dd.mm.yyyy'), to_date( '31.12.2020', 'dd.mm.yyyy'), 'B');
My assumptions:
A and B are separate orders and do not go into the same groups even when their ranges cross.
Order 1 B has two records that continue one another; in my understanding they count as one order: order 1 B, 01.01.2020 - 31.12.2020.
If my assumptions are correct, the SQL could look like this:
select distinct min_order_id, order_id, item from (
with dates (cur_date, end_date, order_id, start_date, item) as (
select start_date, end_date, order_id, start_date, item
from cross_Dates
union all
select cur_date + 1, end_date, order_id,start_date, item
from dates
where cur_date < end_date )
select d.order_id, d.item,
min(d.order_id) over(partition by greatest(d.start_date, cd.start_date),d.item) min_order_id
from dates d, cross_Dates cd
where d.cur_date between cd.start_date and cd.end_date and d.item = cd.item )
order by item, min_order_id;
Result:
MIN_ORDER_ID ORDER_ID I
1 1 A
1 2 A
1 5 A
2 2 A
2 3 A
2 4 A
2 5 A
5 5 A
1 1 B
If my assumptions are not correct, please tell me what the result should look like in this case.
:)
You can use MATCH_RECOGNIZE to find groups where the next value's start date is before, or equal to, the end date of all the previous values in the group. Then you can aggregate and exclude groups that would be entirely contained in another group:
WITH groups ( id, ids, start_date, end_date ) AS (
SELECT id,
LISTAGG( grp_id, ',' ) WITHIN GROUP ( ORDER BY start_date ),
MIN( start_date ),
MIN( end_date )
FROM (
SELECT t.id,
x.id AS grp_id,
x.start_date,
x.end_date
FROM table_name t
INNER JOIN table_name x
ON (
x.start_date >= t.start_date
AND x.start_date <= t.end_date
)
)
MATCH_RECOGNIZE (
PARTITION BY id
ORDER BY start_date
MEASURES
MATCH_NUMBER() AS mno
ALL ROWS PER MATCH
PATTERN ( FIRST_ROW GROUPED_ROWS* )
DEFINE GROUPED_ROWS AS (
GROUPED_ROWS.start_date <= MIN( end_date )
)
)
WHERE mno = 1
GROUP BY id
)
SELECT id,
ids
FROM groups g
WHERE NOT EXISTS (
SELECT 1
FROM groups x
WHERE g.ID <> x.ID
AND x.start_date <= g.start_date
AND g.end_date <= x.end_date
)
Which for the sample data:
CREATE TABLE table_name ( id, start_date, end_date ) AS
SELECT 'order 1', DATE '2020-03-01', DATE '2020-06-30' FROM DUAL UNION ALL
SELECT 'order 2', DATE '2020-05-01', DATE '2020-08-31' FROM DUAL UNION ALL
SELECT 'order 3', DATE '2020-07-31', DATE '2020-10-31' FROM DUAL UNION ALL
SELECT 'order 4', DATE '2020-07-31', DATE '2020-12-31' FROM DUAL;
Outputs:
ID | IDS
:------ | :----------------------
order 2 | order 2,order 3,order 4
order 1 | order 1,order 2
If you then:
INSERT INTO table_name ( id, start_date, end_date )
VALUES ( 'order 5', DATE '2020-01-01', DATE '2020-12-31' );
The output would be:
ID | IDS
:------ | :----------------------
order 2 | order 2,order 3,order 4
order 5 | order 5,order 1,order 2
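As a cross-check outside the database, the groups requested in the question can be derived with a short Python sketch. It relies on one observation: if several date ranges all cross one another, they all contain the latest start date among them, so it is enough to look at the set of orders active on each start date and keep the sets that are not contained in a larger one. The function name is hypothetical and end dates are treated as inclusive, which is an assumption.

```python
from datetime import date


def overlap_groups(orders):
    """orders: dict name -> (start, end), both inclusive.
    Returns the maximal groups of mutually crossing orders."""
    candidates = set()
    for start, _ in orders.values():
        # Every order whose range contains this start date.
        active = frozenset(
            name for name, (s, e) in orders.items() if s <= start <= e
        )
        if len(active) > 1:
            candidates.add(active)
    # Drop groups that are strictly contained in another group.
    maximal = [g for g in candidates if not any(g < other for other in candidates)]
    return sorted(sorted(g) for g in maximal)


scenario_a = {
    "order 1": (date(2020, 3, 1), date(2020, 6, 30)),
    "order 2": (date(2020, 5, 1), date(2020, 8, 31)),
    "order 3": (date(2020, 7, 31), date(2020, 10, 31)),
    "order 4": (date(2020, 7, 31), date(2020, 12, 31)),
}
scenario_b = dict(scenario_a)
scenario_b["order 5"] = (date(2020, 1, 1), date(2020, 12, 31))

groups_a = overlap_groups(scenario_a)
groups_b = overlap_groups(scenario_b)
print(groups_a)
print(groups_b)
```

For scenario B this places order 5 in both groups, which is the output the question asks for.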

How to Choose a specific value from a table and to avoid duplicates?

I have two tables:
MainTable
id AccountNum status
1 11001 active
2 11002 active
3 11003 active
4 11004 active
AddTable
id date description
1 01.2020 ACCOUNT.SET
1 02.2020 ACCOUNT.CHANGE
1 03.2020 ACCOUNT.REMOVE
2 04.2020 ACCOUNT.SET
2 05.2020 ACCOUNT.CHANGE
3 08.2020 ACCOUNT.SET
4 05.2020 ACCOUNT.SET
4 09.2020 ACCOUNT.REMOVE
I need to get a result like this:
EffectiveFrom is date when Account was set,
EffectiveTo is date when Account was removed
id AccountNum EffectiveFrom EffectiveTo
1 11001 01.2020 03.2020
2 11002 04.2020 null
3 11003 08.2020 null
4 11004 05.2020 09.2020
The problem is that after joining on AddTable I get duplicates, but I need just one row per id, with only the dates where the description is ACCOUNT.SET or ACCOUNT.REMOVE.
Are you looking for a left join?
select m.id as id,
m.AccountNum as AccountNum,
a.date as EffectiveFrom,
b.date as EffectiveTo
from MainTable m left join
AddTable a on (a.id = m.id and a.description = 'ACCOUNT.SET') left join
AddTable b on (b.id = m.id and b.description = 'ACCOUNT.REMOVE')
order by m.AccountNum
Use a PIVOT and a LEFT OUTER JOIN:
SELECT m.id,
a.EffectiveFrom,
a.EffectiveTo
FROM MainTable m
LEFT OUTER JOIN
(
SELECT *
FROM AddTable
PIVOT( MAX( dt ) FOR description IN (
'ACCOUNT.SET' AS EffectiveFrom,
'ACCOUNT.REMOVE' AS EffectiveTo
) )
) a
ON ( a.id = m.id )
ORDER BY m.id
So for your test data:
CREATE TABLE MainTable ( id, AccountNum, status ) AS
SELECT 1, 11001, 'active' FROM DUAL UNION ALL
SELECT 2, 11002, 'active' FROM DUAL UNION ALL
SELECT 3, 11003, 'active' FROM DUAL UNION ALL
SELECT 4, 11004, 'active' FROM DUAL;
CREATE TABLE AddTable ( id, dt, description ) AS
SELECT 1, DATE '2020-01-01', 'ACCOUNT.SET' FROM DUAL UNION ALL
SELECT 1, DATE '2020-01-02', 'ACCOUNT.CHANGE' FROM DUAL UNION ALL
SELECT 1, DATE '2020-01-03', 'ACCOUNT.REMOVE' FROM DUAL UNION ALL
SELECT 2, DATE '2020-01-04', 'ACCOUNT.SET' FROM DUAL UNION ALL
SELECT 2, DATE '2020-01-05', 'ACCOUNT.CHANGE' FROM DUAL UNION ALL
SELECT 3, DATE '2020-01-08', 'ACCOUNT.SET' FROM DUAL UNION ALL
SELECT 4, DATE '2020-01-05', 'ACCOUNT.SET' FROM DUAL UNION ALL
SELECT 4, DATE '2020-01-09', 'ACCOUNT.REMOVE' FROM DUAL;
This outputs:
ID | EFFECTIVEFROM | EFFECTIVETO
-: | :------------ | :----------
1 | 01-JAN-20 | 03-JAN-20
2 | 04-JAN-20 | null
3 | 08-JAN-20 | null
4 | 05-JAN-20 | 09-JAN-20
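The pivot-then-join idea can also be sketched in plain Python, which may make it clearer why each id ends up on a single row. This is only an illustration using the test data above; the function name and the dictionary keys are hypothetical.

```python
from datetime import date

main_table = [(1, 11001), (2, 11002), (3, 11003), (4, 11004)]
add_table = [
    (1, date(2020, 1, 1), "ACCOUNT.SET"),
    (1, date(2020, 1, 2), "ACCOUNT.CHANGE"),
    (1, date(2020, 1, 3), "ACCOUNT.REMOVE"),
    (2, date(2020, 1, 4), "ACCOUNT.SET"),
    (2, date(2020, 1, 5), "ACCOUNT.CHANGE"),
    (3, date(2020, 1, 8), "ACCOUNT.SET"),
    (4, date(2020, 1, 5), "ACCOUNT.SET"),
    (4, date(2020, 1, 9), "ACCOUNT.REMOVE"),
]

# Only these two descriptions become columns; everything else is ignored.
COLUMN_FOR = {"ACCOUNT.SET": "from", "ACCOUNT.REMOVE": "to"}


def effective_ranges(main_table, add_table):
    """Return one (id, AccountNum, EffectiveFrom, EffectiveTo) row per id."""
    pivoted = {}
    for acc_id, dt, description in add_table:
        column = COLUMN_FOR.get(description)
        if column is None:
            continue  # e.g. ACCOUNT.CHANGE
        slot = pivoted.setdefault(acc_id, {"from": None, "to": None})
        # Keep the latest date per column, like MAX(dt) in the PIVOT.
        if slot[column] is None or dt > slot[column]:
            slot[column] = dt
    result = []
    for acc_id, account_num in sorted(main_table):
        # Ids with no matching events still appear, as in a LEFT OUTER JOIN.
        slot = pivoted.get(acc_id, {"from": None, "to": None})
        result.append((acc_id, account_num, slot["from"], slot["to"]))
    return result


ranges = effective_ranges(main_table, add_table)
for row in ranges:
    print(row)
```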

calculate the average time difference between each stage

How can I calculate the average time difference between each stage?
The challenge with the actual data set is that not every id goes through all stages: some skip stages, and the dates are not continuous for all ids, as shown below.
id date status
1 1/1/18 requirement
1 1/8/18 analysis
1 ? design
1 1/30/18 closed
2 2/1/18 requirement
2 2/18/18 closed
3 1/2/18 requirement
3 1/29/18 analysis
3 ? accepted
3 2/5/18 closed
?--we have missing dates as well
Expected output
id date status time_spent
1 1/1/18 requirement 0
1 1/8/18 analysis 7
1 ? design
1 1/30/18 closed 22
2 2/1/18 requirement 0
2 2/18/18 closed 17
3 1/2/18 requirement 0
3 1/29/18 analysis 27
3 ? accepted
3 2/5/18 closed 24
status avg(timespent)
requirement 0
analysis 17
design
closed 21
You can use windowing functions LAG (or LEAD) to get the data of the previous (or next) status for each id. That will let you compute the time elapsed in each stage. Then, compute the average time elapsed for each stage.
Here is an example of how to do that:
with input_data (id, dte, status) as (
SELECT 1, TO_DATE('1/1/18','MM/DD/YY'), 'requirement' FROM DUAL UNION ALL
SELECT 1, TO_DATE('1/8/18','MM/DD/YY'), 'analysis' FROM DUAL UNION ALL
SELECT 1, NULL, 'design' FROM DUAL UNION ALL
SELECT 1, TO_DATE('1/30/18','MM/DD/YY'), 'closed' FROM DUAL UNION ALL
SELECT 2, TO_DATE('2/1/18','MM/DD/YY'), 'requirement' FROM DUAL UNION ALL
SELECT 2, TO_DATE('2/18/18','MM/DD/YY'), 'closed' FROM DUAL UNION ALL
SELECT 3, TO_DATE('1/2/18','MM/DD/YY'), 'requirement' FROM DUAL UNION ALL
SELECT 3, TO_DATE('1/29/18','MM/DD/YY'), 'analysis' FROM DUAL UNION ALL
SELECT 3, NULL, 'accepted' FROM DUAL UNION ALL
SELECT 3, TO_DATE('2/5/18','MM/DD/YY'), 'closed' FROM DUAL ),
----- Solution begins here
data_with_elapsed_days as (
SELECT id.*, dte-nvl(lag(dte ignore nulls) over ( partition by id order by dte ), dte) elapsed
from input_data id)
SELECT status, avg(elapsed)
FROM data_with_elapsed_days d
group by status
order by decode(status,'requirement',1,'analysis',2,'design',3,'accepted',4,'closed',5,99);
+-------------+-------------------------------------------+
| STATUS | AVG(ELAPSED) |
+-------------+-------------------------------------------+
| requirement | 0 |
| analysis | 17 |
| design | |
| accepted | |
| closed | 15.33333333333333333333333333333333333333 |
+-------------+-------------------------------------------+
As I said in my comment, that logic computes the elapsed days as the time to the given status from the prior status. Since "requirement" has no prior status, this logic will always show zero days spent in requirements. It would probably be better to compute the time from the given status to the next status. For "closed", there would be no next status. You could just leave that blank or use SYSDATE as the date of the next status. Here is an example of that:
with input_data (id, dte, status) as (
SELECT 1, TO_DATE('1/1/18','MM/DD/YY'), 'requirement' FROM DUAL UNION ALL
SELECT 1, TO_DATE('1/8/18','MM/DD/YY'), 'analysis' FROM DUAL UNION ALL
SELECT 1, NULL, 'design' FROM DUAL UNION ALL
SELECT 1, TO_DATE('1/30/18','MM/DD/YY'), 'closed' FROM DUAL UNION ALL
SELECT 2, TO_DATE('2/1/18','MM/DD/YY'), 'requirement' FROM DUAL UNION ALL
SELECT 2, TO_DATE('2/18/18','MM/DD/YY'), 'closed' FROM DUAL UNION ALL
SELECT 3, TO_DATE('1/2/18','MM/DD/YY'), 'requirement' FROM DUAL UNION ALL
SELECT 3, TO_DATE('1/29/18','MM/DD/YY'), 'analysis' FROM DUAL UNION ALL
SELECT 3, NULL, 'accepted' FROM DUAL UNION ALL
SELECT 3, TO_DATE('2/5/18','MM/DD/YY'), 'closed' FROM DUAL ),
----- Solution begins here
data_with_elapsed_days as (
SELECT id.*, nvl(lead(dte ignore nulls) over ( partition by id order by dte ), trunc(sysdate))-dte elapsed
from input_data id)
SELECT status, avg(elapsed)
FROM data_with_elapsed_days d
group by status
order by decode(status,'requirement',1,'analysis',2,'design',3,'accepted',4,'closed',5,99);
+-------------+------------------------------------------+
| STATUS | AVG(ELAPSED) |
+-------------+------------------------------------------+
| requirement | 17 |
| analysis | 14.5 |
| design | |
| accepted | |
| closed | 361.666666666666666666666666666666666667 |
+-------------+------------------------------------------+
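For readers who want to verify the "time until the next dated status" figures by hand, here is a minimal Python sketch of that logic. It takes the "leave it blank" option for the last status instead of substituting SYSDATE, so "closed" gets no value; the function name is hypothetical.

```python
from collections import defaultdict
from datetime import date

# Sample rows: (id, date or None, status)
rows = [
    (1, date(2018, 1, 1), "requirement"),
    (1, date(2018, 1, 8), "analysis"),
    (1, None, "design"),
    (1, date(2018, 1, 30), "closed"),
    (2, date(2018, 2, 1), "requirement"),
    (2, date(2018, 2, 18), "closed"),
    (3, date(2018, 1, 2), "requirement"),
    (3, date(2018, 1, 29), "analysis"),
    (3, None, "accepted"),
    (3, date(2018, 2, 5), "closed"),
]


def average_days_in_stage(rows):
    """Average number of days from each status to the next dated status."""
    by_id = defaultdict(list)
    for rec_id, dte, status in rows:
        if dte is not None:  # undated rows are skipped, like IGNORE NULLS
            by_id[rec_id].append((dte, status))
    durations = defaultdict(list)
    for events in by_id.values():
        events.sort()  # chronological order within one id
        # Pair every dated status with the one that follows it (LEAD).
        for (start, status), (next_start, _) in zip(events, events[1:]):
            durations[status].append((next_start - start).days)
    return {status: sum(d) / len(d) for status, d in durations.items()}


averages = average_days_in_stage(rows)
print(averages)
```

The values for "requirement" and "analysis" agree with the table above; statuses without a date, and the final status of each id, simply do not appear in the result.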
I agree with @MatthewMcPeak. Your requirements seem a bit odd: you spend zero days on the requirement stage but an average of 21 days on closed? Fnord.
This solution treats the presented date as the start date of the stage and calculates the difference between it and the start_date of the next phase.
with cte as (
select status
, lead(dd ignore nulls) over (partition by id order by dd) - dd as dt_diff
from your_table)
select status, avg(dt_diff) as avg_ela
from cte
group by status
/
If you wish to include all stages for each d and estimate the time spent in each (using linear interpolation) then you can create a sub-query with all the statuses and use a PARTITION OUTER JOIN to join them and then use LAG and LEAD to find the date range the status is in and interpolate between:
Oracle Setup:
CREATE TABLE data ( d, dt, status ) AS
SELECT 1, TO_DATE( '1/1/18', 'MM/DD/YY' ), 'requirement' FROM DUAL UNION ALL
SELECT 1, TO_DATE( '1/8/18', 'MM/DD/YY' ), 'analysis' FROM DUAL UNION ALL
SELECT 1, NULL, 'design' FROM DUAL UNION ALL
SELECT 1, TO_DATE( '1/30/18', 'MM/DD/YY' ), 'closed' FROM DUAL UNION ALL
SELECT 2, TO_DATE( '2/1/18', 'MM/DD/YY' ), 'requirement' FROM DUAL UNION ALL
SELECT 2, TO_DATE( '2/18/18', 'MM/DD/YY' ), 'closed' FROM DUAL UNION ALL
SELECT 3, TO_DATE( '1/2/18', 'MM/DD/YY' ), 'requirement' FROM DUAL UNION ALL
SELECT 3, TO_DATE( '1/29/18', 'MM/DD/YY' ), 'analysis' FROM DUAL UNION ALL
SELECT 3, NULL, 'accepted' FROM DUAL UNION ALL
SELECT 3, TO_DATE( '2/5/18', 'MM/DD/YY' ), 'closed' FROM DUAL;
Query:
WITH statuses ( status, id ) AS (
SELECT 'requirement', 1 FROM DUAL UNION ALL
SELECT 'analysis', 2 FROM DUAL UNION ALL
SELECT 'design', 3 FROM DUAL UNION ALL
SELECT 'accepted', 4 FROM DUAL UNION ALL
SELECT 'closed', 5 FROM DUAL
),
ranges ( d, dt, status, id, recent_dt, recent_id, next_dt, next_id ) AS (
SELECT d.d,
d.dt,
s.status,
s.id,
NVL(
d.dt,
LAG( d.dt, 1 )
IGNORE NULLS OVER ( PARTITION BY d.d ORDER BY s.id )
),
NVL2(
d.dt,
s.id,
LAG( CASE WHEN d.dt IS NOT NULL THEN s.id END, 1 )
IGNORE NULLS OVER ( PARTITION BY d.d ORDER BY s.id )
),
LEAD( d.dt, 1, d.dt )
IGNORE NULLS OVER ( PARTITION BY d.d ORDER BY s.id ),
LEAD( CASE WHEN d.dt IS NOT NULL THEN s.id END, 1, s.id + 1 )
IGNORE NULLS OVER ( PARTITION BY d.d ORDER BY s.id )
FROM data d
PARTITION BY ( d )
RIGHT OUTER JOIN statuses s
ON ( d.status = s.status )
)
SELECT d,
dt,
status,
( next_dt - recent_dt ) / (next_id - recent_id ) AS estimated_duration
FROM ranges;
Output:
D | DT | STATUS | ESTIMATED_DURATION
-: | :-------- | :---------- | ---------------------------------------:
1 | 01-JAN-18 | requirement | 7
1 | 08-JAN-18 | analysis | 7.33333333333333333333333333333333333333
1 | null | design | 7.33333333333333333333333333333333333333
1 | null | accepted | 7.33333333333333333333333333333333333333
1 | 30-JAN-18 | closed | 0
2 | 01-FEB-18 | requirement | 4.25
2 | null | analysis | 4.25
2 | null | design | 4.25
2 | null | accepted | 4.25
2 | 18-FEB-18 | closed | 0
3 | 02-JAN-18 | requirement | 27
3 | 29-JAN-18 | analysis | 2.33333333333333333333333333333333333333
3 | null | design | 2.33333333333333333333333333333333333333
3 | null | accepted | 2.33333333333333333333333333333333333333
3 | 05-FEB-18 | closed | 0
Query 2:
You can then easily change that to take the average for each status:
WITH statuses ( status, id ) AS (
SELECT 'requirement', 1 FROM DUAL UNION ALL
SELECT 'analysis', 2 FROM DUAL UNION ALL
SELECT 'design', 3 FROM DUAL UNION ALL
SELECT 'accepted', 4 FROM DUAL UNION ALL
SELECT 'closed', 5 FROM DUAL
),
ranges ( d, dt, status, id, recent_dt, recent_id, next_dt, next_id ) AS (
SELECT d.d,
d.dt,
s.status,
s.id,
NVL(
d.dt,
LAG( d.dt, 1 )
IGNORE NULLS OVER ( PARTITION BY d.d ORDER BY s.id )
),
NVL2(
d.dt,
s.id,
LAG( CASE WHEN d.dt IS NOT NULL THEN s.id END, 1 )
IGNORE NULLS OVER ( PARTITION BY d.d ORDER BY s.id )
),
LEAD( d.dt, 1, d.dt )
IGNORE NULLS OVER ( PARTITION BY d.d ORDER BY s.id ),
LEAD( CASE WHEN d.dt IS NOT NULL THEN s.id END, 1, s.id + 1 )
IGNORE NULLS OVER ( PARTITION BY d.d ORDER BY s.id )
FROM data d
PARTITION BY ( d )
RIGHT OUTER JOIN statuses s
ON ( d.status = s.status )
)
SELECT status,
AVG( ( next_dt - recent_dt ) / (next_id - recent_id ) ) AS estimated_duration
FROM ranges
GROUP BY status, id
ORDER BY id;
Results:
STATUS | ESTIMATED_DURATION
:---------- | ---------------------------------------:
requirement | 12.75
analysis | 4.63888888888888888888888888888888888889
design | 4.63888888888888888888888888888888888889
accepted | 4.63888888888888888888888888888888888889
closed | 0

Oracle 11g - How to calculate the value of a number in range minimum or max

I need help finding a solution to my problem, please.
I have a table like this :
ID Number
|6 |20.90 |
|7 |45.00 |
|8 |52.00 |
|9 |68.00 |
|10 |120.00 |
|11 |220.00 |
|12 |250.00 |
The first range is 0 - 20.90.
When the value is past the halfway point between two numbers, it gets the ID of the upper number.
When the value is 20.91, I want to get ID = 6.
If the value is 31.00, I want to get ID = 6.
If the value is 33.95, I want to get ID = 7.
If the value is 44.99, I want to get ID = 7.
How can I do it? Is there a function that will do what I need?
If you want the record with a number that is closest to your input, then you can use this:
select *
from (
select *
from mytable
order by abs(number - my_input_number), id
)
where rownum < 2
The inner query selects all records, but orders them by the distance they have from your input number. This distance can be calculated with number - my_input_number. But that could be negative, so we take the absolute value of that. This result is not output; it is just used to order by. So records with smaller distances will come first.
Now we need just the first of those records, and that is what the outer query does with the Oracle pseudocolumn rownum: it numbers the rows of the result set as they are returned (1, 2, 3, ...). The where clause effectively filters away all records we do not want to see, leaving only one (the one with the smallest distance).
As mathguy suggested in comments, the order by now also has a second value to order by in case the input value is right at the mid point between the two closest records. In that case the record with the lowest id value will be chosen.
This is a good illustration of the power of analytic functions:
with mytable ( id, value ) as (
select 6, 20.90 from dual union all
select 7, 45.00 from dual union all
select 8, 52.00 from dual union all
select 9, 68.00 from dual union all
select 10, 120.00 from dual union all
select 11, 220.00 from dual union all
select 12, 250.00 from dual
),
inputs ( x ) as (
select 0.00 from dual union all
select 20.91 from dual union all
select 31.00 from dual union all
select 33.95 from dual union all
select 44.99 from dual union all
select 68.00 from dual union all
select 32.95 from dual union all
select 400.11 from dual
)
-- End of test data (not part of the solution). SQL query begins BELOW THIS LINE
select val as x, new_id as closest_id
from (
select id, val,
last_value(id ignore nulls) over (order by val desc) as new_id
from (
select id, (value + lead(value) over (order by value))/2 as val
from mytable
union all
select null, x
from inputs
)
)
where id is null
order by x -- if needed
;
Output:
X CLOSEST_ID
------ ----------
0 6
20.91 6
31 6
32.95 6
33.95 7
44.99 7
68 9
400.11 12
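The midpoint idea behind this query can be sketched in Python with a binary search: compute the midpoint between each pair of neighbouring values, then find the first midpoint that is not below the input. The names here are hypothetical, and Decimal is used so that a value such as 32.95 compares exactly with its midpoint.

```python
from bisect import bisect_left
from decimal import Decimal

# (id, value) pairs from the question, already sorted by value.
table = [
    (6, "20.90"), (7, "45.00"), (8, "52.00"), (9, "68.00"),
    (10, "120.00"), (11, "220.00"), (12, "250.00"),
]

ids = [row_id for row_id, _ in table]
values = [Decimal(v) for _, v in table]
# Midpoint between each value and the next one; the last value has none.
midpoints = [(a + b) / 2 for a, b in zip(values, values[1:])]


def closest_id(x):
    """Return the id of the nearest value; an input exactly on a
    midpoint goes to the lower id."""
    return ids[bisect_left(midpoints, Decimal(x))]


inputs = ["0.00", "20.91", "31.00", "32.95", "33.95", "44.99", "68.00", "400.11"]
results = [closest_id(x) for x in inputs]
for x, row_id in zip(inputs, results):
    print(x, row_id)
```

bisect_left returns the position of the first midpoint greater than or equal to the input, which corresponds to the lowest id whose midpoint is not below the input, the same row the last_value(... ignore nulls) expression picks.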
