Create ("force") island in "gaps and island" problem - oracle

I have code that's partitioning my data for a gaps and island solution. The data itself is reporting on user activity, time spent working, and idle time based on logged timestamps and activities. My code is working great, but every once in a while I have a user_id that logs a string of activities for one application, goes idle, then returns to the same application to log additional activity. Based on my current code, it looks like the user spent nearly two hours in one application when in reality there was significant downtime in the middle. I want to "force" the creation of an island, restarting the partition if there is a lapse of greater than 30 minutes between activities.
ACTIVITY_DATE    | USER_ID | APPL_ID | PR1 | PR2
-----------------|---------|---------|-----|----
11/20/2020 10:55 | A       | 9340    | 1   | 1
11/20/2020 10:55 | A       | 9340    | 2   | 2
11/20/2020 10:58 | A       | 9340    | 3   | 3
11/20/2020 10:58 | A       | 9340    | 4   | 4
11/20/2020 10:59 | A       | 9340    | 5   | 5
11/20/2020 13:09 | A       | 9340    | 6   | 6
11/20/2020 13:09 | A       | 9340    | 7   | 7
11/20/2020 13:10 | A       | 9340    | 8   | 8
11/20/2020 13:10 | A       | 9340    | 9   | 9
11/20/2020 17:12 | A       | 8354    | 10  | 1
11/20/2020 17:14 | A       | 8354    | 11  | 2
11/20/2020 17:14 | A       | 8354    | 12  | 3
The final result needs to restart the partition for column PR2 at the sixth row in this example, because the gap between logged activities exceeds 30 minutes for the same appl_id:
ACTIVITY_DATE    | USER_ID | APPL_ID | PR1 | PR2
-----------------|---------|---------|-----|----
11/20/2020 10:55 | A       | 9340    | 1   | 1
11/20/2020 10:55 | A       | 9340    | 2   | 2
11/20/2020 10:58 | A       | 9340    | 3   | 3
11/20/2020 10:58 | A       | 9340    | 4   | 4
11/20/2020 10:59 | A       | 9340    | 5   | 5
11/20/2020 13:09 | A       | 9340    | 6   | 1
11/20/2020 13:09 | A       | 9340    | 7   | 2
11/20/2020 13:10 | A       | 9340    | 8   | 3
11/20/2020 13:10 | A       | 9340    | 9   | 4
11/20/2020 17:12 | A       | 8354    | 10  | 1
11/20/2020 17:14 | A       | 8354    | 11  | 2
11/20/2020 17:14 | A       | 8354    | 12  | 3
Here's my current code:
select activity_date, user_id, appl_id,
row_number() over(partition by user_id order by activity_date) rn1,
row_number() over(partition by user_id, appl_id order by activity_date) rn2
from
(select
activity_date, user_id, appl_id, count(*)
from mytable tt
where
user_id in ('A', 'B', 'C')
and activity_date >= trunc(sysdate - 4,'DD')
and activity_date <= trunc(sysdate - 3,'DD')
group by
activity_date, user_id, appl_id) tt

You can use MATCH_RECOGNIZE:
SELECT activity_date,
user_id,
appl_id,
pr1,
ROW_NUMBER() OVER ( PARTITION BY user_id, appl_id, mno ORDER BY pr1 )
AS pr2
FROM (
SELECT t.*,
ROW_NUMBER() OVER ( PARTITION BY user_id ORDER BY activity_date) AS pr1
FROM table_name t
)
MATCH_RECOGNIZE(
PARTITION BY user_id, appl_id
ORDER BY pr1
MEASURES
MATCH_NUMBER() AS mno
ALL ROWS PER MATCH
PATTERN ( activities* last_activity )
DEFINE activities AS
NEXT(activity_date) <= LAST(activity_date) + INTERVAL '30' MINUTE
)
ORDER BY user_id, pr1;
Which, for the sample data:
CREATE TABLE table_name ( ACTIVITY_DATE, USER_ID, APPL_ID ) AS
SELECT DATE '2020-11-20' + INTERVAL '10:55' HOUR TO MINUTE, 'A', 9340 FROM DUAL UNION ALL
SELECT DATE '2020-11-20' + INTERVAL '10:55' HOUR TO MINUTE, 'A', 9340 FROM DUAL UNION ALL
SELECT DATE '2020-11-20' + INTERVAL '10:58' HOUR TO MINUTE, 'A', 9340 FROM DUAL UNION ALL
SELECT DATE '2020-11-20' + INTERVAL '10:58' HOUR TO MINUTE, 'A', 9340 FROM DUAL UNION ALL
SELECT DATE '2020-11-20' + INTERVAL '10:59' HOUR TO MINUTE, 'A', 9340 FROM DUAL UNION ALL
SELECT DATE '2020-11-20' + INTERVAL '13:09' HOUR TO MINUTE, 'A', 9340 FROM DUAL UNION ALL
SELECT DATE '2020-11-20' + INTERVAL '13:09' HOUR TO MINUTE, 'A', 9340 FROM DUAL UNION ALL
SELECT DATE '2020-11-20' + INTERVAL '13:10' HOUR TO MINUTE, 'A', 9340 FROM DUAL UNION ALL
SELECT DATE '2020-11-20' + INTERVAL '13:10' HOUR TO MINUTE, 'A', 9340 FROM DUAL UNION ALL
SELECT DATE '2020-11-20' + INTERVAL '17:12' HOUR TO MINUTE, 'A', 8354 FROM DUAL UNION ALL
SELECT DATE '2020-11-20' + INTERVAL '17:14' HOUR TO MINUTE, 'A', 8354 FROM DUAL UNION ALL
SELECT DATE '2020-11-20' + INTERVAL '17:14' HOUR TO MINUTE, 'A', 8354 FROM DUAL;
Outputs:
ACTIVITY_DATE | USER_ID | APPL_ID | PR1 | PR2
:------------------ | :------ | ------: | --: | --:
2020-11-20 10:55:00 | A | 9340 | 1 | 1
2020-11-20 10:55:00 | A | 9340 | 2 | 2
2020-11-20 10:58:00 | A | 9340 | 3 | 3
2020-11-20 10:58:00 | A | 9340 | 4 | 4
2020-11-20 10:59:00 | A | 9340 | 5 | 5
2020-11-20 13:09:00 | A | 9340 | 6 | 1
2020-11-20 13:09:00 | A | 9340 | 7 | 2
2020-11-20 13:10:00 | A | 9340 | 8 | 3
2020-11-20 13:10:00 | A | 9340 | 9 | 4
2020-11-20 17:12:00 | A | 8354 | 10 | 1
2020-11-20 17:14:00 | A | 8354 | 11 | 2
2020-11-20 17:14:00 | A | 8354 | 12 | 3
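If MATCH_RECOGNIZE is not available (it was introduced in Oracle 12c), the same islands can probably be built with plain analytic functions. This is a minimal sketch, assuming the same table_name sample table as above; the is_start and grp column names are made up for illustration. LAG flags a row whose gap to the previous row for the same user and application exceeds 30 minutes, and a running SUM of those flags numbers the islands:

```sql
SELECT activity_date,
       user_id,
       appl_id,
       pr1,
       -- restart the counter for every (user, application, island)
       ROW_NUMBER() OVER ( PARTITION BY user_id, appl_id, grp ORDER BY pr1 ) AS pr2
FROM   (
  SELECT s.*,
         -- running total of island starts = island number (hypothetical name: grp)
         SUM( is_start ) OVER ( PARTITION BY user_id, appl_id ORDER BY pr1 ) AS grp
  FROM   (
    SELECT r.*,
           -- 1 when the gap to the previous row exceeds 30 minutes, else 0
           -- (the first row of a partition has no previous row, so it gets 0)
           CASE
             WHEN activity_date > LAG( activity_date ) OVER (
                                    PARTITION BY user_id, appl_id
                                    ORDER BY pr1
                                  ) + INTERVAL '30' MINUTE
             THEN 1
             ELSE 0
           END AS is_start
    FROM   (
      SELECT t.*,
             ROW_NUMBER() OVER ( PARTITION BY user_id ORDER BY activity_date ) AS pr1
      FROM   table_name t
    ) r
  ) s
)
ORDER BY user_id, pr1;
```

For the sample data this should number the rows the same way as the MATCH_RECOGNIZE query, but test both against your real data before choosing one.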

Oracle sql set end date based on previous start date

I have a table to which I need to add a new endDate column for a future implementation. Currently every record has only a start date, so I need to set endDate equal to the start date of the next record for the same userId. If a user has only one record, or the record is the user's latest, the end date should be some value in the future.
For example:
Table structure:
ID | USER_ID | START_DATE | END_DATE
-------------------------------------
1 | 1 | 01.01.2015 |
2 | 1 | 01.01.2016 |
3 | 1 | 01.07.2018 |
4 | 1 | 01.08.2021 |
5 | 2 | 01.01.2015 |
6 | 3 | 01.01.2016 |
7 | 3 | 01.07.2018 |
8 | 4 | 01.08.2021 |
Expected result should be like this
ID | USER_ID | START_DATE | END_DATE
-------------------------------------
1 | 1 | 01.01.2015 | 01.01.2016
2 | 1 | 01.01.2016 | 01.07.2018
3 | 1 | 01.07.2018 | 01.08.2021
4 | 1 | 01.08.2021 | 01.01.2050
5 | 2 | 01.01.2015 | 01.01.2050
6 | 3 | 01.01.2016 | 01.07.2018
7 | 3 | 01.07.2018 | 01.01.2050
8 | 4 | 01.08.2021 | 01.01.2050
Can someone help me with how the query in Oracle database should look to update the table like this?
I've tried something with a FOR loop, but I'm not sure how to continue from this step:
DECLARE
  CURSOR c_contract
  IS
    SELECT USER_ID
    FROM CONTRACT
    ORDER BY START_DATE;
BEGIN
  FOR r_contract IN c_contract
  LOOP
    dbms_output.put_line( r_contract.USER_ID );
  END LOOP;
END;
Use the LEAD analytic function with the default date as the third argument:
SELECT t.*,
LEAD( start_date, 1, DATE '2050-01-01') OVER (
PARTITION BY user_id
ORDER BY start_date
) AS end_date
FROM table_name t
Which, for the sample data:
ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD HH24:MI:SS';
CREATE TABLE table_name ( ID, USER_ID, START_DATE ) AS
SELECT 1, 1, DATE '2015-01-01' FROM DUAL UNION ALL
SELECT 2, 1, DATE '2016-01-01' FROM DUAL UNION ALL
SELECT 3, 1, DATE '2018-07-01' FROM DUAL UNION ALL
SELECT 4, 1, DATE '2021-08-01' FROM DUAL UNION ALL
SELECT 5, 2, DATE '2015-01-01' FROM DUAL UNION ALL
SELECT 6, 3, DATE '2016-01-01' FROM DUAL UNION ALL
SELECT 7, 3, DATE '2018-07-01' FROM DUAL UNION ALL
SELECT 8, 4, DATE '2021-08-01' FROM DUAL;
Outputs:
ID | USER_ID | START_DATE          | END_DATE
-: | ------: | :------------------ | :------------------
 1 |       1 | 2015-01-01 00:00:00 | 2016-01-01 00:00:00
 2 |       1 | 2016-01-01 00:00:00 | 2018-07-01 00:00:00
 3 |       1 | 2018-07-01 00:00:00 | 2021-08-01 00:00:00
 4 |       1 | 2021-08-01 00:00:00 | 2050-01-01 00:00:00
 5 |       2 | 2015-01-01 00:00:00 | 2050-01-01 00:00:00
 6 |       3 | 2016-01-01 00:00:00 | 2018-07-01 00:00:00
 7 |       3 | 2018-07-01 00:00:00 | 2050-01-01 00:00:00
 8 |       4 | 2021-08-01 00:00:00 | 2050-01-01 00:00:00
If you want to add a column then:
ALTER TABLE table_name ADD (end_date DATE);
MERGE INTO table_name dst
USING (
SELECT ROWID AS rid,
LEAD( start_date, 1, DATE '2050-01-01') OVER (
PARTITION BY user_id
ORDER BY start_date
) AS end_date
FROM table_name
) src
ON (dst.ROWID = src.rid)
WHEN MATCHED THEN
UPDATE SET end_date = src.end_date;
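One caveat, offered as an assumption about your data rather than something the sample shows: if a user can have two rows with the same start_date, ordering by start_date alone leaves the LEAD result for the tied rows non-deterministic. A sketch that adds id as a tie-breaker (assuming id is unique, as it is in the sample data):

```sql
SELECT t.*,
       LEAD( start_date, 1, DATE '2050-01-01' ) OVER (
         PARTITION BY user_id
         ORDER BY start_date, id   -- id breaks ties between equal start dates
       ) AS end_date
FROM   table_name t;
```

The same ORDER BY start_date, id could be used inside the USING subquery of the MERGE.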

how to check length of "concatenate" string

Does anybody have an idea how to write a query in Oracle to get the total number of characters per user:
| user | action |
-------------------------
| mary | aaa | # 3 characters from action
| mary | bbbbb | # 5 characters from action
| mary | c | # 1 character from action
| adam | xx | # 2 characters from action
| adam | yyyy | # 4 characters from action
| adam | zzzzzzz | # 7 characters from action
So the result should be the sum of characters for each user:
| mary | 9 |
| adam | 13 |
Thanks.
SUM + LENGTH along with GROUP BY user. Sample data in lines #1 - 14; query you need begins at line #15.
SQL> WITH
2 test (cuser, action)
3 AS
4 (SELECT 'mary', 'aaa' FROM DUAL
5 UNION ALL
6 SELECT 'mary', 'bbbbb' FROM DUAL
7 UNION ALL
8 SELECT 'mary', 'c' FROM DUAL
9 UNION ALL
10 SELECT 'adam', 'xx' FROM DUAL
11 UNION ALL
12 SELECT 'adam', 'yyyy' FROM DUAL
13 UNION ALL
14 SELECT 'adam', 'zzzzzzz' FROM DUAL)
15 SELECT cuser, SUM (LENGTH (action))
16 FROM test
17 GROUP BY cuser;
CUSE SUM(LENGTH(ACTION))
---- -------------------
mary 9
adam 13
SQL>
Use the LENGTH function in your aggregation:
SELECT "USER",
SUM( LENGTH( action ) ) AS total_length
FROM table_name
GROUP BY "USER"
Which, for your sample data:
CREATE TABLE table_name( "USER", action ) AS
SELECT 'mary', 'aaa' FROM DUAL UNION ALL
SELECT 'mary', 'bbbbb' FROM DUAL UNION ALL
SELECT 'mary', 'c' FROM DUAL UNION ALL
SELECT 'adam', 'xx' FROM DUAL UNION ALL
SELECT 'adam', 'yyyy' FROM DUAL UNION ALL
SELECT 'adam', 'zzzzzzz' FROM DUAL;
Outputs:
USER | TOTAL_LENGTH
:--- | -----------:
mary | 9
adam | 13
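LENGTH returns NULL for a NULL string (Oracle also treats the empty string as NULL), and SUM skips NULLs, so a user whose actions are all NULL would get NULL rather than 0. The sample data has no such rows, so this is only a hedged variant in case yours does; it wraps the total in COALESCE:

```sql
SELECT "USER",
       -- fall back to 0 when every action for the user is NULL
       COALESCE( SUM( LENGTH( action ) ), 0 ) AS total_length
FROM   table_name
GROUP BY "USER";
```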

Get the sales count within a month

I have monthly sales data for agents. I need the sales count for the first 25 days and for the last 5 days of the month as separate columns.
How can I get them separately?
I have the table below:
Agent_ID Date Device
2343 1/1/2019 33330
3245 1/1/2019 43554
2343 5/1/2019 46665
3245 10/1/2019 78900
2343 15/1/2019 55678
2343 26/1/2019 45678
3245 28/1/2019 48900
2343 30/1/2019 56710
5645 12/1/2019 33067
5645 15/1/2019 44890
2121 31/1/2019 55810
I need to get the output table below:
Agent_ID first_25days_sale_count Last_5days_sale_count
2343 3 2
3245 2 1
5645 2 0
2121 0 1
Some months have 28, 29 or 31 days, so naively using "first 25 days" and "last 5 days" may lead to either double counting (e.g. days 24 and 25 when February has 28 days) or not counting some days (e.g. day 26 when the month has 31 days). You should decide whether you want to count:
The first 25 days and then the remaining 3-6 days after that; or
The last 5 days and then the 23-26 days before that.
Whichever you choose, you can use conditional aggregation:
SELECT agent_id,
COUNT(
CASE
WHEN EXTRACT( DAY FROM "Date" ) <= 25
THEN 1
END
) AS first_25days_sale_count,
COUNT(
CASE
WHEN EXTRACT( DAY FROM "Date" ) > 25
THEN 1
END
) AS after_first_25days_sale_count,
COUNT(
CASE
WHEN "Date" < TRUNC( LAST_DAY( "Date" ) ) - INTERVAL '4' DAY
THEN 1
END
) AS not_last_5days_sale_count,
COUNT(
CASE
WHEN "Date" >= TRUNC( LAST_DAY( "Date" ) ) - INTERVAL '4' DAY
THEN 1
END
) AS last_5days_sale_count
FROM your_table
GROUP BY agent_id;
So, for your sample data:
CREATE TABLE your_table ( Agent_ID, "Date", Device ) AS
SELECT 2343, DATE '2019-01-01', 33330 FROM DUAL UNION ALL
SELECT 3245, DATE '2019-01-01', 43554 FROM DUAL UNION ALL
SELECT 2343, DATE '2019-01-05', 46665 FROM DUAL UNION ALL
SELECT 3245, DATE '2019-01-10', 78900 FROM DUAL UNION ALL
SELECT 2343, DATE '2019-01-15', 55678 FROM DUAL UNION ALL
SELECT 2343, DATE '2019-01-26', 45678 FROM DUAL UNION ALL
SELECT 3245, DATE '2019-01-28', 48900 FROM DUAL UNION ALL
SELECT 2343, DATE '2019-01-30', 56710 FROM DUAL UNION ALL
SELECT 5645, DATE '2019-01-12', 33067 FROM DUAL UNION ALL
SELECT 5645, DATE '2019-01-15', 44890 FROM DUAL UNION ALL
SELECT 2121, DATE '2019-01-31', 55810 FROM DUAL;
This outputs:
AGENT_ID | FIRST_25DAYS_SALE_COUNT | AFTER_FIRST_25DAYS_SALE_COUNT | NOT_LAST_5DAYS_SALE_COUNT | LAST_5DAYS_SALE_COUNT
-------: | ----------------------: | ----------------------------: | ------------------------: | --------------------:
3245 | 2 | 1 | 2 | 1
2121 | 0 | 1 | 0 | 1
5645 | 2 | 0 | 2 | 0
2343 | 3 | 2 | 4 | 1
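The query above groups by agent only, which is fine for a single month of data. If the table holds several months (an assumption, since the sample covers January 2019 only), a possible sketch is to group by the month as well, so that each month's last five days are counted separately; sale_month is a made-up alias:

```sql
SELECT agent_id,
       TRUNC( "Date", 'MM' ) AS sale_month,   -- first day of the sale's month
       COUNT(
         CASE
         WHEN "Date" < TRUNC( LAST_DAY( "Date" ) ) - INTERVAL '4' DAY
         THEN 1
         END
       ) AS not_last_5days_sale_count,
       COUNT(
         CASE
         WHEN "Date" >= TRUNC( LAST_DAY( "Date" ) ) - INTERVAL '4' DAY
         THEN 1
         END
       ) AS last_5days_sale_count
FROM   your_table
GROUP BY agent_id, TRUNC( "Date", 'MM' )
ORDER BY agent_id, sale_month;
```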

repeat rows in a given sequence format

I have a table with the following data:
Order_no | Part_No | R_from | R_to
1001 | 1010037-00L| 1 | 5
1001 | 1010025-00L| 6 | 12
I need to get the above data into a report in the following manner:
R_NO | PART_NO
------------------
1 | 1010037-00L
2 | 1010037-00L
3 | 1010037-00L
4 | 1010037-00L
5 | 1010037-00L
6 | 1010025-00L
7 | 1010025-00L
8 | 1010025-00L
9 | 1010025-00L
10 | 1010025-00L
11 | 1010025-00L
12 | 1010025-00L
Something like:
WITH r_nos ( r_no ) AS (
SELECT LEVEL
FROM DUAL
CONNECT BY LEVEL <= ( SELECT MAX( R_to ) FROM your_table )
)
SELECT r_no,
part_no
FROM r_nos r
INNER JOIN
your_table y
ON ( r.r_no BETWEEN y.r_from AND y.r_to )
Here's an alternative which doesn't require a separate join. You should test both solutions to see which is more performant for your data etc, though.
WITH your_table AS (SELECT 1001 order_no, '1010037-00L' part_no, 1 r_from, 5 r_to FROM dual UNION ALL
SELECT 1001 order_no, '1010025-00L' part_no, 6 r_from, 12 r_to FROM dual)
SELECT r_from + LEVEL -1 r_no,
order_no,
part_no
FROM your_table
CONNECT BY PRIOR order_no = order_no
AND PRIOR part_no = part_no
AND PRIOR sys_guid() IS NOT NULL
AND LEVEL <= r_to - r_from + 1
ORDER BY r_no;
R_NO ORDER_NO PART_NO
---------- ---------- -----------
1 1001 1010037-00L
2 1001 1010037-00L
3 1001 1010037-00L
4 1001 1010037-00L
5 1001 1010037-00L
6 1001 1010025-00L
7 1001 1010025-00L
8 1001 1010025-00L
9 1001 1010025-00L
10 1001 1010025-00L
11 1001 1010025-00L
12 1001 1010025-00L
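A third option, sketched here under the assumption that you are on Oracle 11g Release 2 or later, is recursive subquery factoring. It starts each part at r_from and adds one row at a time until r_to is reached; the expanded name is made up for illustration:

```sql
WITH expanded ( r_no, r_to, part_no ) AS (
  -- anchor member: one row per part, starting at r_from
  SELECT r_from, r_to, part_no
  FROM   your_table
  UNION ALL
  -- recursive member: step to the next number until r_to is reached
  SELECT r_no + 1, r_to, part_no
  FROM   expanded
  WHERE  r_no < r_to
)
SELECT r_no,
       part_no
FROM   expanded
ORDER BY r_no;
```

As with the other two solutions, it is worth comparing performance on your own data.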

Oracle Group by salary range(0-999,1000-1999,2000-2999,...) [closed]

+---------+----------+--------+
| user_id | username | salary |
+---------+----------+--------+
| 1       | John     | 4000   |
| 2       | Paul     | 0900   |
| 3       | Adam     | 0589   |
| 4       | Ben      | 2154   |
| 5       | Charles  | 2489   |
| 6       | Dean     | 2500   |
| 7       | Edward   | 2900   |
| 8       | Fred     | 2800   |
| 9       | George   | 4100   |
| 10      | Hugo     | 5200   |
I need output like this
range count
--------------------
0-999 2
1000-1999 0
2000-2999 5
3000-3999 0
4000-4999 2
5000-5999 1
Here is an attempt:
with w as
(
select 1000 * (level - 1) low, 1000 * level high from dual
connect by level <= 10
)
select w.low, w.high, sum(decode(t.user_id, null, 0, 1)) nb
from w, test_epn t
where w.low <= t.salary (+)
and w.high > t.salary (+)
group by w.low, w.high
order by w.low
;
This gives:
1 0 1000 2
2 1000 2000 0
3 2000 3000 5
4 3000 4000 0
5 4000 5000 2
6 5000 6000 1
7 6000 7000 0
8 7000 8000 0
9 8000 9000 0
10 9000 10000 0
col range format a30
with t as (
  select 'John' name, 4000 sal from dual union all
  select 'Paul' name, 900 from dual union all
  select 'Adam' name, 589 from dual union all
  select 'Ben' name, 2154 from dual union all
  select 'Charles' name, 2489 from dual union all
  select 'Dean' name, 2500 from dual union all
  select 'Edward' name, 2900 from dual union all
  select 'Fred' name, 2800 from dual union all
  select 'George' name, 4100 from dual union all
  select 'Hugo' name, 5200 from dual
)
select to_char(pvtid*1000)||'-'||to_char(pvtid*1000+999) range, count(t.sal)
from t,
(
  select rownum-1 pvtid
  from dual connect by level <= (select floor(max(sal)/1000) from t)+1
) piv
where piv.pvtid = floor(t.sal(+)/1000)
group by piv.pvtid
order by 1
/
RANGE COUNT(T.SAL)
------------------------------ ------------
0-999 2
1000-1999 0
2000-2999 5
3000-3999 0
4000-4999 2
5000-5999 1
Oracle 11g R2 Schema Setup:
create table test_table as
select 1 user_id, 'John' username , 4000 salary from dual union all
select 2 , 'Paul' , 0900 from dual union all
select 3 , 'Adam' , 0589 from dual union all
select 4 , 'Ben' , 2154 from dual union all
select 5 , 'Charles' , 2489 from dual union all
select 6 , 'Dean' , 2500 from dual union all
select 7 , 'Edward' , 2900 from dual union all
select 8 , 'Fred' , 2800 from dual union all
select 9 , 'George' , 4100 from dual union all
select 10 , 'Hugo' , 5200 from dual
Query 1:
with range_tab(f,t) as (select (level - 1)*1000 , (level - 1)*1000 + 999
from dual
connect by (level - 1)*1000 <= (select max(salary) from test_table))
select f ||'-'|| t as range, count(user_id)
from test_table
right outer join range_tab on (salary between f and t)
group by f, t
order by 1
Results:
| RANGE | COUNT(USER_ID) |
|-----------|----------------|
| 0-999 | 2 |
| 1000-1999 | 0 |
| 2000-2999 | 5 |
| 3000-3999 | 0 |
| 4000-4999 | 2 |
| 5000-5999 | 1 |
For fixed-width intervals you can also use Oracle's WIDTH_BUCKET function.
select count(*),
(WIDTH_BUCKET(salary, 0, 10000,10)-1)*1000 ||'-'||to_char(WIDTH_BUCKET(salary, 0, 10000,10)*1000-1) as salary_range
from table1
group by WIDTH_BUCKET(salary, 0, 10000,10)
order by salary_range;
| COUNT(*) | SALARY_RANGE |
|----------|--------------|
| 2 | 0-999 |
| 5 | 2000-2999 |
| 2 | 4000-4999 |
| 1 | 5000-5999 |
The disadvantage is that it does not return empty buckets, but it may satisfy your needs anyway.
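If the empty buckets are needed, one possible workaround is to generate the bucket numbers first and outer join the salaries to them. This is a sketch only, assuming the same table1 with a numeric salary column and the same ten buckets of width 1000; the buckets and cnt names are made up for illustration:

```sql
WITH buckets ( bucket_no ) AS (
  -- one row per bucket number, 1 to 10
  SELECT LEVEL
  FROM   DUAL
  CONNECT BY LEVEL <= 10
)
SELECT ( b.bucket_no - 1 ) * 1000 || '-' || ( b.bucket_no * 1000 - 1 ) AS salary_range,
       COUNT( t.salary ) AS cnt   -- counts only matched rows, so empty buckets give 0
FROM   buckets b
       LEFT OUTER JOIN table1 t
       ON ( WIDTH_BUCKET( t.salary, 0, 10000, 10 ) = b.bucket_no )
GROUP BY b.bucket_no
ORDER BY b.bucket_no;
```

Note that this returns all ten ranges up to 9000-9999, not just those up to the highest salary, and that ordering by bucket_no avoids sorting the range labels as strings.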
