Determine start of data's most recent uninterrupted 'streak' by date - oracle

I have a dataset that looks something like:
asset_id,date_logged
1234,2018-02-01
1234,2018-02-02
1234,2018-02-03
1234,2018-02-04
1234,2018-02-05
1234,2018-02-06
1234,2018-02-07
1234,2018-02-08
1234,2018-02-09
1234,2018-02-10
9876,2018-02-01
9876,2018-02-02
9876,2018-02-03
9876,2018-02-07
9876,2018-02-08
9876,2018-02-09
9876,2018-02-10
For the purpose of this exercise, imagine today's date is 2018-02-10 (10 Feb 2018). For all the asset_ids in the table, I am trying to identify the start of the most recent unbroken streak for date_logged.
For asset_id = 1234, this would be 2018-02-01. The asset_id was logged all 10 days in an unbroken streak. For asset_id = 9876, this would be 2018-02-07. Because the asset_id was not logged on 2018-02-04, 2018-02-05, and 2018-02-06, the most recent unbroken streak starts on 2018-02-07.
So, my result set would hopefully look something like:
asset_id,Number_of_days_in_most_recent_logging_streak
1234,10
9876,4
Or, alternatively:
asset_id,Date_Begin_Most_Recent_Streak
1234,2018-02-01
9876,2018-02-07
I haven't been able to work out anything that gets me close. My best effort so far is to compute the number of days between the first log date and today and the number of days the asset_id appears in the dataset, then compare the two to identify cases where the streak started later than the first day the asset appears. For my real dataset this isn't particularly problematic, but it's an ugly solution and I would like to understand a better way of getting to the outcome.

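Perhaps something like this. The prep step subtracts a row number from each date; within a run of consecutive dates that difference stays constant, so it can serve as a group key. To see what each step does, cut the query off after each subquery in the WITH clause and SELECT * FROM the most recent one.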
with
inputs ( asset_id, date_logged ) as (
select 1234, to_date('2018-02-01', 'yyyy-mm-dd') from dual union all
select 1234, to_date('2018-02-02', 'yyyy-mm-dd') from dual union all
select 1234, to_date('2018-02-03', 'yyyy-mm-dd') from dual union all
select 1234, to_date('2018-02-04', 'yyyy-mm-dd') from dual union all
select 1234, to_date('2018-02-05', 'yyyy-mm-dd') from dual union all
select 1234, to_date('2018-02-06', 'yyyy-mm-dd') from dual union all
select 1234, to_date('2018-02-07', 'yyyy-mm-dd') from dual union all
select 1234, to_date('2018-02-08', 'yyyy-mm-dd') from dual union all
select 1234, to_date('2018-02-09', 'yyyy-mm-dd') from dual union all
select 1234, to_date('2018-02-10', 'yyyy-mm-dd') from dual union all
select 9876, to_date('2018-02-01', 'yyyy-mm-dd') from dual union all
select 9876, to_date('2018-02-02', 'yyyy-mm-dd') from dual union all
select 9876, to_date('2018-02-03', 'yyyy-mm-dd') from dual union all
select 9876, to_date('2018-02-07', 'yyyy-mm-dd') from dual union all
select 9876, to_date('2018-02-08', 'yyyy-mm-dd') from dual union all
select 9876, to_date('2018-02-09', 'yyyy-mm-dd') from dual union all
select 9876, to_date('2018-02-10', 'yyyy-mm-dd') from dual
),
prep ( asset_id, date_logged, grp ) as (
select asset_id, date_logged,
date_logged - row_number()
over (partition by asset_id order by date_logged)
from inputs
),
agg ( asset_id, date_logged, cnt ) as (
select asset_id, min(date_logged), count(*)
from prep
group by asset_id, grp
)
select asset_id, max(date_logged) as date_start_recent_streak,
max(cnt) keep (dense_rank last order by date_logged) as cnt
from agg
group by asset_id
order by asset_id -- If needed
;
ASSET_ID DATE_START_RECENT_STREAK CNT
---------- ------------------------ ----------
1234 2018-02-01 10
9876 2018-02-07 4
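The grouping trick in the prep step may be easier to see outside SQL. Below is a minimal Python sketch of the same idea, not part of the original answer: subtracting a row's position from its date gives a value that stays constant across consecutive days, so each distinct value identifies one streak. The function name latest_streak and the in-memory data are illustrative assumptions, and the sketch assumes at least one date per asset.

```python
from datetime import date, timedelta
from itertools import groupby

def latest_streak(dates):
    """Return (start_date, length) of the most recent run of consecutive days."""
    ordered = sorted(set(dates))
    # date minus position is constant within a run of consecutive days
    keyed = [(d - timedelta(days=i), d) for i, d in enumerate(ordered)]
    runs = [[d for _, d in grp] for _, grp in groupby(keyed, key=lambda kd: kd[0])]
    last = runs[-1]  # runs are in date order, so the last one is the most recent
    return last[0], len(last)

logged = {
    1234: [date(2018, 2, d) for d in range(1, 11)],
    9876: [date(2018, 2, d) for d in (1, 2, 3, 7, 8, 9, 10)],
}
for asset_id, dates in sorted(logged.items()):
    print(asset_id, *latest_streak(dates))
# 1234 2018-02-01 10
# 9876 2018-02-07 4
```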

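You can try this. It ranks the dates from the latest one backwards and keeps only the rows whose distance from the latest date matches their rank: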
with test (asset_id, date_logged) as
(select 1234, date '2018-02-01' from dual union all
select 1234, date '2018-02-02' from dual union all
select 1234, date '2018-02-03' from dual union all
select 1234, date '2018-02-04' from dual union all
select 1234, date '2018-02-05' from dual union all
select 1234, date '2018-02-06' from dual union all
select 1234, date '2018-02-07' from dual union all
select 1234, date '2018-02-08' from dual union all
select 1234, date '2018-02-09' from dual union all
select 1234, date '2018-02-10' from dual union all
select 9876, date '2018-02-01' from dual union all
select 9876, date '2018-02-02' from dual union all
select 9876, date '2018-02-03' from dual union all
select 9876, date '2018-02-07' from dual union all
select 9876, date '2018-02-08' from dual union all
select 9876, date '2018-02-09' from dual union all
select 9876, date '2018-02-10' from dual union all
select 9876, date '2018-02-11' from dual union all
select 9876, date '2018-02-12' from dual
)
SELECT asset_id, MIN(date_logged), COUNT(1)
FROM (SELECT asset_id, date_logged,
MAX(date_logged) OVER (PARTITION BY asset_id)+1 max_date_logged_plus_one,
DENSE_RANK() OVER (PARTITION BY asset_id ORDER BY date_logged desc) rown
FROM test
ORDER BY asset_id, date_logged desc)
WHERE max_date_logged_plus_one - date_logged = rown
GROUP BY asset_id;
ASSET_ID MIN(DATE_LOGGED) COUNT(1)
---------- ---------------- ----------
1234 01-FEB-18 10
9876 07-FEB-18 6
If the row below is commented out, the output is:
select 9876, date '2018-02-10' from dual union all
ASSET_ID MIN(DATE_LOGGED) COUNT(1)
---------- ---------------- ----------
1234 01-FEB-18 10
9876 11-FEB-18 2
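To check the WHERE condition by hand, here is a small Python sketch of the same filter (an illustration only; the names streak_by_rank and feb are made up for this example). A date survives only if it is exactly rank days before the day after the latest date, which stops being true at the first gap.

```python
from datetime import date, timedelta

def streak_by_rank(dates):
    """Return (start_date, length) of the streak that ends on the latest date."""
    ordered = sorted(set(dates), reverse=True)      # rank 1 = latest date
    day_after_latest = ordered[0] + timedelta(days=1)
    # Keep a date only if its distance from the latest date matches its rank
    run = [d for rank, d in enumerate(ordered, start=1)
           if (day_after_latest - d).days == rank]
    return min(run), len(run)

def feb(*days):
    """Helper: build February 2018 dates from day numbers."""
    return [date(2018, 2, d) for d in days]

print(*streak_by_rank(feb(1, 2, 3, 7, 8, 9, 10, 11, 12)))  # 2018-02-07 6
print(*streak_by_rank(feb(1, 2, 3, 7, 8, 9, 11, 12)))      # 2018-02-11 2
```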

Would this make any sense?
with test (asset_id, date_logged) as
(select 1234, date '2018-02-01' from dual union all
select 1234, date '2018-02-02' from dual union all
select 1234, date '2018-02-03' from dual union all
select 1234, date '2018-02-04' from dual union all
select 1234, date '2018-02-05' from dual union all
select 1234, date '2018-02-06' from dual union all
select 1234, date '2018-02-07' from dual union all
select 1234, date '2018-02-08' from dual union all
select 1234, date '2018-02-09' from dual union all
select 1234, date '2018-02-10' from dual union all
select 9876, date '2018-02-01' from dual union all
select 9876, date '2018-02-02' from dual union all
select 9876, date '2018-02-03' from dual union all
select 9876, date '2018-02-07' from dual union all
select 9876, date '2018-02-08' from dual union all
select 9876, date '2018-02-09' from dual union all
select 9876, date '2018-02-10' from dual
),
inter as
-- difference between DATE_LOGGED and its previous DATE_LOGGED
(select asset_id,
date_logged,
date_logged - lag(date_logged) over (partition by asset_id order by date_logged) diff
from test
)
-- a streak starts on a row with no predecessor (diff is null) or after a gap (diff > 1);
-- the most recent streak starts at the latest such row
select i.asset_id, max(i.date_logged) date_logged
from inter i
where nvl(i.diff, 2) > 1
group by i.asset_id
order by i.asset_id;
ASSET_ID DATE_LOGGE
---------- ----------
1234 2018-02-01
9876 2018-02-07
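One way to reason about the LAG-based idea, sketched in Python under the assumption that the most recent streak starts at the latest date whose predecessor is either missing or more than one day earlier. The function name streak_start and the extra sample with two gaps are illustrative, not from the thread.

```python
from datetime import date

def streak_start(dates):
    """Return the first day of the most recent run of consecutive days."""
    ordered = sorted(set(dates))
    starts = [ordered[0]]                     # the first row has no predecessor
    for prev, cur in zip(ordered, ordered[1:]):
        if (cur - prev).days > 1:             # a gap: a new streak starts here
            starts.append(cur)
    return starts[-1]

# Two gaps of different sizes: the most recent streak still starts on the 9th
sample = [date(2018, 2, d) for d in (1, 5, 6, 7, 9, 10)]
print(streak_start(sample))  # 2018-02-09
```

The two-gap sample is worth trying against any candidate query, because data with a single gap cannot distinguish "after the largest gap" from "after the latest gap".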

Related

How to get the last business day of last month in Oracle

I have data like this in my table:
2020-01-01 H
2020-01-02 B
2020-01-03 B
2020-01-04 B
.
2020-01-29 B
2020-01-30 H
2020-01-31 H
2020-02-01 H
2020-02-02 H
2020-02-03 B
2020-02-04 B
2020-02-05 B
.
My problem: in the current month I need to check the third business day (in this case 2020-02-05), and I need to get the last business day of last month (i.e. 2020-01-29).
By adding two columns:
row_number() over(partition by trunc(date_value,'MM'), day_type order by date_value) as rn_month_asc,
row_number() over(partition by trunc(date_value,'MM'), day_type order by date_value desc) as rn_month_desc
Within a month, the third business day is the row with rn_month_asc = 3 and day_type = 'B', and the last business day is the row with rn_month_desc = 1 and day_type = 'B'. Other cases are just as easy to query if you need them.
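As a rough illustration of numbering business days from both ends of a month, here is a Python sketch. It is not Oracle code; nth_business_day and the calendar list (a subset of the sample rows) are assumptions made for the example.

```python
from datetime import date

def nth_business_day(calendar, year, month, n):
    """n > 0 counts from the start of the month, n < 0 from the end."""
    days = sorted(d for d, kind in calendar
                  if kind == "B" and (d.year, d.month) == (year, month))
    return days[n - 1] if n > 0 else days[n]

calendar = [
    (date(2020, 1, 1), "H"), (date(2020, 1, 2), "B"), (date(2020, 1, 3), "B"),
    (date(2020, 1, 4), "B"), (date(2020, 1, 29), "B"), (date(2020, 1, 30), "H"),
    (date(2020, 1, 31), "H"), (date(2020, 2, 2), "H"), (date(2020, 2, 3), "B"),
    (date(2020, 2, 4), "B"), (date(2020, 2, 5), "B"),
]
print(nth_business_day(calendar, 2020, 2, 3))    # 2020-02-05
print(nth_business_day(calendar, 2020, 1, -1))   # 2020-01-29
```

Counting from the end with a negative index corresponds to the descending row number, and counting from the start corresponds to the ascending one.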
Regarding "in the current month I need to check third business day":
From Oracle 12, you can use:
SELECT date_value
FROM table_name
WHERE TRUNC(SYSDATE, 'MM') <= date_value
AND date_value < ADD_MONTHS(TRUNC(SYSDATE, 'MM'), 1)
AND day_type = 'B'
ORDER BY date_value ASC
OFFSET 2 ROWS
FETCH NEXT ROW ONLY;
Which, for the sample data:
CREATE TABLE table_name (date_value, day_type) AS
SELECT DATE '2020-01-01', 'H' FROM DUAL UNION ALL
SELECT DATE '2020-01-02', 'B' FROM DUAL UNION ALL
SELECT DATE '2020-01-03', 'B' FROM DUAL UNION ALL
SELECT DATE '2020-01-04', 'B' FROM DUAL UNION ALL
SELECT DATE '2020-01-05', 'B' FROM DUAL UNION ALL
SELECT DATE '2020-01-28', 'B' FROM DUAL UNION ALL
SELECT DATE '2020-01-29', 'B' FROM DUAL UNION ALL
SELECT DATE '2020-01-30', 'H' FROM DUAL UNION ALL
SELECT DATE '2020-01-31', 'H' FROM DUAL UNION ALL
SELECT DATE '2020-02-01', 'H' FROM DUAL UNION ALL
SELECT DATE '2020-02-02', 'H' FROM DUAL UNION ALL
SELECT DATE '2020-02-03', 'B' FROM DUAL UNION ALL
SELECT DATE '2020-02-04', 'B' FROM DUAL UNION ALL
SELECT DATE '2020-02-05', 'B' FROM DUAL;
If the current month was 2020-01 then the output is:
DATE_VALUE
04-JAN-20
Regarding "I need to get last business day of last month":
SELECT date_value
FROM table_name
WHERE ADD_MONTHS(TRUNC(SYSDATE, 'MM'), -1) <= date_value
AND date_value < TRUNC(SYSDATE, 'MM')
AND day_type = 'B'
ORDER BY date_value DESC
FETCH FIRST ROW ONLY;
If the current month is 2020-02 then the output is:
DATE_VALUE
29-JAN-20

Duplicated rows numbering

I need to number the rows so that rows with the same ID get the same row number.
I am using an Oracle database. Any ideas?
Use the DENSE_RANK analytic function:
SELECT DENSE_RANK() OVER (ORDER BY id) AS row_number,
id
FROM your_table
Which, for the sample data:
CREATE TABLE your_table ( id ) AS
SELECT 86325 FROM DUAL UNION ALL
SELECT 86325 FROM DUAL UNION ALL
SELECT 86326 FROM DUAL UNION ALL
SELECT 86326 FROM DUAL UNION ALL
SELECT 86352 FROM DUAL UNION ALL
SELECT 86353 FROM DUAL UNION ALL
SELECT 86354 FROM DUAL UNION ALL
SELECT 86354 FROM DUAL;
Outputs:
ROW_NUMBER ID
1 86325
1 86325
2 86326
2 86326
3 86352
4 86353
5 86354
5 86354
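For readers who want to see what dense ranking does step by step, a minimal Python sketch follows (illustrative only; the helper name dense_rank is chosen here to echo the SQL function). Equal values share a rank, and ranks increase by one with no gaps.

```python
def dense_rank(values):
    """Pair each value with its dense rank: ties share a rank, no gaps."""
    rank_of = {v: r for r, v in enumerate(sorted(set(values)), start=1)}
    return [(rank_of[v], v) for v in sorted(values)]

ids = [86325, 86325, 86326, 86326, 86352, 86353, 86354, 86354]
for row_number, id_ in dense_rank(ids):
    print(row_number, id_)
```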

Oracle: Is it possible to filter out duplicates only when they appear in succession?

Thank you in advance for your help.
I have a table that holds itinerary information for drivers. There will be times when the itinerary seems to have the same stop (but is several days apart). I'd like to be able to query the table and filter out any record where the address is the same AND the dates are consecutive.
Is this possible?
Thanks again,
josh
with tst as(
select timestamp '2020-08-01 00:00:00' dt, '123 street' loc from dual
union all
select timestamp '2020-08-01 00:00:00', '89 street' from dual
union all
select timestamp '2020-08-02 00:00:00', '456 airport' from dual
union all
select timestamp '2020-08-04 00:00:00', '456 airport' from dual
union all
select timestamp '2020-08-05 00:00:00', '67 street' from dual
union all
select timestamp '2020-08-06 00:00:00', '89 street' from dual
union all
select timestamp '2020-08-07 00:00:00', '123 street' from dual
)
select dt, loc
from (
select dt, loc, nvl(lag(loc) over(order by dt), 'FIRST_ROW') prev_loc
from tst
) where loc <> prev_loc;
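The comparison against the previous row can be mimicked in a few lines of Python. This is only a sketch of the logic; drop_successive_repeats and the rows list are hypothetical names, and the rows are assumed to be already ordered by date. Like the query, it compares locations only and does not check how far apart the dates are.

```python
from datetime import date

def drop_successive_repeats(rows):
    """Keep a row only if its location differs from the previous row's."""
    kept = []
    prev_loc = object()   # sentinel: the first row never matches it
    for dt, loc in rows:
        if loc != prev_loc:
            kept.append((dt, loc))
        prev_loc = loc
    return kept

rows = [
    (date(2020, 8, 1), "123 street"), (date(2020, 8, 1), "89 street"),
    (date(2020, 8, 2), "456 airport"), (date(2020, 8, 4), "456 airport"),
    (date(2020, 8, 5), "67 street"), (date(2020, 8, 6), "89 street"),
    (date(2020, 8, 7), "123 street"),
]
for dt, loc in drop_successive_repeats(rows):
    print(dt, loc)
```

With the sample data only the second '456 airport' row is dropped; the later '89 street' and '123 street' rows stay because they do not directly follow a row with the same location.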
Another approach is the Tabibitosan method (found on the AskTom website), which assigns consecutive rows the same group number; you can then count the rows per group.
with test_data as(
select date'2020-08-01' dt, '123 street' loc from dual
union all
select date '2020-08-01', '89 street' from dual
union all
select date '2020-08-02', '456 airport' from dual
union all
select date '2020-08-04', '456 airport' from dual
union all
select date '2020-08-05', '67 street' from dual
union all
select date '2020-08-06', '89 street' from dual
union all
select date '2020-08-07', '123 street' from dual
)
select max(dt),loc
from
(
select t.*
,row_number() over (order by dt) -
row_number() over (partition by loc order by dt) grp
from test_data t
)
group by grp,loc
having count(*) > 1;
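The difference of the two row numbers can be traced in Python. The sketch below is an illustration, not the answer's code; tabibitosan_groups is a made-up name and the rows are assumed to be sorted by date. The overall position minus the position within the same location stays constant while a location repeats on successive rows.

```python
from collections import Counter, defaultdict
from datetime import date

def tabibitosan_groups(rows):
    """Attach grp = overall row number minus row number within the same loc."""
    seen = defaultdict(int)
    out = []
    for overall, (dt, loc) in enumerate(rows, start=1):
        seen[loc] += 1
        out.append((dt, loc, overall - seen[loc]))
    return out

rows = [
    (date(2020, 8, 1), "123 street"), (date(2020, 8, 1), "89 street"),
    (date(2020, 8, 2), "456 airport"), (date(2020, 8, 4), "456 airport"),
    (date(2020, 8, 5), "67 street"), (date(2020, 8, 6), "89 street"),
    (date(2020, 8, 7), "123 street"),
]
sizes = Counter((grp, loc) for _, loc, grp in tabibitosan_groups(rows))
repeated = {key for key, n in sizes.items() if n > 1}
print(repeated)  # {(2, '456 airport')}
```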
Another approach uses match_recognize, available from 12c onwards. The quantifier {1,} in the pattern means "repeated one or more times".
with test_data as(
select date'2020-08-01' dt, '123 street' loc from dual
union all
select date '2020-08-01', '89 street' from dual
union all
select date '2020-08-02', '456 airport' from dual
union all
select date '2020-08-04', '456 airport' from dual
union all
select date '2020-08-05', '67 street' from dual
union all
select date '2020-08-06', '89 street' from dual
union all
select date '2020-08-07', '123 street' from dual
)
select *
from test_data
match_recognize (
order by dt
all rows per match
pattern (equal{1,})
define
equal as loc = prev(loc)
);

oracle group by date with specific time

Some will say "another question from that guy", but here is my problem. All of this works as designed:
with tab1 as (
select to_timestamp( '04.02.15 14:25:21.503000000' ) as dt from dual union all
select to_timestamp( '04.02.15 14:25:25.154000000' ) as dt from dual union all
select to_timestamp( '09.02.15 22:20:36.861000000' ) as dt from dual union all
select to_timestamp( '09.02.15 22:20:36.883000000' ) as dt from dual union all
select to_timestamp( '10.02.15 04:19:13.839000000' ) as dt from dual union all
select to_timestamp( '10.02.15 04:13:18.142000000' ) as dt from dual union all
select to_timestamp( '10.02.15 12:43:18.171000000' ) as dt from dual union all
select to_timestamp( '11.02.15 04:30:53.654000000' ) as dt from dual union all
select to_timestamp( '11.02.15 22:00:38.951000000' ) as dt from dual union all
select to_timestamp( '11.02.15 22:00:42.014000000' ) as dt from dual union all
select to_timestamp( '16.02.15 08:50:43.967000000' ) as dt from dual union all
select to_timestamp( '16.02.15 16:35:41.387000000' ) as dt from dual union all
select to_timestamp( '16.02.15 16:35:42.835000000' ) as dt from dual union all
select to_timestamp( '17.02.15 04:21:08.542000000' ) as dt from dual union all
select to_timestamp( '17.02.15 04:21:08.912000000' ) as dt from dual union all
select to_timestamp( '17.02.15 04:06:09.818000000' ) as dt from dual union all
select to_timestamp( '17.02.15 04:40:39.411000000' ) as dt from dual union all
select to_timestamp( '18.02.15 04:41:08.218000000' ) as dt from dual union all
select to_timestamp( '18.02.15 03:20:40.609000000' ) as dt from dual union all
select to_timestamp( '18.02.15 01:20:40.712000000' ) as dt from dual union all
select to_timestamp( '20.02.15 06:55:42.185000000' ) as dt from dual union all
select to_timestamp( '20.02.15 12:55:42.364000000' ) as dt from dual union all
select to_timestamp( '20.02.15 12:55:42.518000000' ) as dt from dual union all
select to_timestamp( '20.02.15 12:55:43.874000000' ) as dt from dual union all
select to_timestamp( '20.02.15 14:16:05.080000000' ) as dt from dual union all
select to_timestamp( '20.02.15 18:14:17.630000000' ) as dt from dual union all
select to_timestamp( '22.02.15 21:25:40.683000000' ) as dt from dual union all
select to_timestamp( '22.02.15 21:25:42.046000000' ) as dt from dual union all
select to_timestamp( '23.02.15 12:43:27.246000000' ) as dt from dual
order by dt
),
tab2 as(
select trunc(dt) as leaddate, dt,
case
when dt between (to_timestamp(trunc(dt)) + interval '04:30' hour to minute) and (to_timestamp(trunc(dt)) + interval '28:29' hour to minute) then (dt)
else (dt) - interval '04:30' hour to minute
end as newBaseTime
from tab1
)
select trunc(newBaseTime),
sum(case when ( dt <= to_timestamp(trunc( trunc(dt)),'dd.MM.yy') + interval '17:30' hour to minute) then 1 else 0 end) as beforeTS,
sum(case when ( dt > to_timestamp(trunc( trunc(dt)),'dd.MM.yy') + interval '17:30' hour to minute) then 1 else 0 end) as afterTS
from tab2
group by trunc(newBaseTime)
order by trunc(newBaseTime)
The idea is to group by days with a "new time base" and check whether the timestamps fall before or after a defined time of day. Due to contracts, a day in our company lasts from 4:30 a.m. on one day to 4:30 a.m. on the next. My solution above works (with little data), but I guess there is an easier way to get the result. Any ideas?
Not sure exactly what you are trying to do, but you seem stuck on this question... perhaps this is the solution you are looking for?
select dt, trunc(dt - interval '270' minute) as leaddate from tab1
This will preserve the timestamp (showing perhaps "today's" date) but if the time is before 4:30 am, the leaddate will be "yesterday's" date.
If this is NOT what you were looking for, please try to clarify your question.
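The effect of shifting by 270 minutes before truncating can be checked with a short Python sketch. It is an illustration only: business_date is a hypothetical name, and the sample timestamps assume the question's values are in dd.mm.yy format.

```python
from datetime import datetime, timedelta

SHIFT = timedelta(hours=4, minutes=30)   # the 270-minute offset

def business_date(ts):
    """Date of the 04:30-to-04:30 'day' that a timestamp belongs to."""
    return (ts - SHIFT).date()

print(business_date(datetime(2015, 2, 10, 4, 19, 13)))   # 2015-02-09
print(business_date(datetime(2015, 2, 11, 4, 30, 53)))   # 2015-02-11
```

A timestamp of exactly 04:30:00 lands on midnight after the shift, so it counts as the start of the new day.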

Group 'n' rows in to columns - oracle

I have a situation where I need to split 'n' rows into groups of columns. For example, given the dataset below:
COMMENT_TEXT
T1
T2
T3
T4
T5
T6
Expected Output:
SUN MON TUE
T1 T2 T3
T4 T5 T6
My Query:
SELECT htbp1.comment_text
FROM hxc_time_building_blocks htbp,
hxc_time_building_blocks htbp1
WHERE htbp1.parent_building_block_id = htbp.time_building_block_id
AND htbp1.parent_building_block_ovn = htbp.parent_building_block_ovn
AND htbp.parent_building_block_id = 116166
AND htbp.parent_building_block_ovn = 1
ORDER BY htbp1.time_building_block_id
Is there any way I can do a PIVOT with 'n' rows and without an aggregate function?
Edit: T1/T2/T3 are sample data; in reality the values can be any free text or null.
SELECT * FROM (SELECT htbp1.comment_text, TO_CHAR (htbp.start_time, 'DY') par_time,
trunc((rownum-1) / 7) buck
FROM hxc_time_building_blocks htbp,
hxc_time_building_blocks htbp1,
hxc_timecard_summary hts
WHERE hts.RESOURCE_ID = :p_resource_id
AND TRUNC(hts.STOP_TIME) = TRUNC(:p_wkend_date)
AND htbp1.parent_building_block_id = htbp.time_building_block_id
AND htbp1.parent_building_block_ovn = htbp.parent_building_block_ovn
AND htbp.parent_building_block_id = hts.timecard_id
AND htbp.parent_building_block_ovn = hts.timecard_ovn
ORDER BY htbp1.time_building_block_id ) PIVOT( max(comment_text) FOR par_time
IN ('SUN' AS "SUN",
'MON' AS "MON",
'TUE' AS "TUE",
'WED' AS "WED",
'THU' AS "THU",
'FRI' AS "FRI",
'SAT' AS "SAT"));
When I add another table, hxc_timecard_summary (the parent), the data goes crazy, but if I use hardcoded parameters as in the first query, the rows show up fine.
PIVOT also uses an aggregate function but you don't need a GROUP BY:
with tab as (
select sysdate - 7 date_col, 'T1' comment_text from dual
union all select sysdate - 6, 'T2' from dual
union all select sysdate - 5, 'T3' from dual
union all select sysdate - 4, 'T4' from dual
union all select sysdate - 3, 'T5' from dual
union all select sysdate - 2, 'T6' from dual
union all select sysdate - 1, 'T7' from dual
)
select * from (select to_char(date_col, 'D') day_of_week, comment_text from tab)
PIVOT (max(comment_text) for day_of_week in (7 as sun, 1 as mon, 2 as tue));
Also, I suppose you need a second column with a date to form your new columns.
And you cannot use an expression in the FOR clause; it must be one or more columns. For example, this won't work:
select * from tab
PIVOT (max(comment_text) for to_char(date_col, 'D') in (7 as sun, 1 as mon, 2 as tue));
because of to_char(date_col, 'D')
Try using PIVOT.
It allows rows to be mapped to columns.
It is available from 11g onwards, I believe.
with tab as (
select 'T1' comment_text from dual
union all select 'T2' from dual
union all select 'T3' from dual
union all select 'T4' from dual
union all select 'T5' from dual
union all select 'T6' from dual
)
select regexp_substr(txt, '[^,]+', 1, 1) sun,
regexp_substr(txt, '[^,]+', 1, 2) mon,
regexp_substr(txt, '[^,]+', 1, 3) tue
from (
select buck, wm_concat(comment_text) txt
from (
select comment_text, trunc((rownum-1) / 3) buck
from (select comment_text from tab order by comment_text)
)
group by buck
);
wm_concat(comment_text) (Oracle 10g) is equivalent to
listagg(comment_text, ',') within group (order by comment_text) (Oracle 11g).
But both of these functions are aggregates.
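The bucketing expression trunc((rownum-1) / 3) is the core of this approach: it decides which output row an item falls into, while the remainder gives its column. A minimal Python sketch of that arithmetic (rows_to_columns is an illustrative name, not something from the thread):

```python
def rows_to_columns(items, width):
    """Lay items out in rows of `width` columns, padding the last row with None."""
    table = []
    for position, item in enumerate(items):
        # bucket = trunc(position / width), column = mod(position, width)
        bucket, column = divmod(position, width)
        if column == 0:
            table.append([None] * width)   # start a new output row
        table[bucket][column] = item
    return table

for row in rows_to_columns(["T1", "T2", "T3", "T4", "T5", "T6"], 3):
    print(*row)
# T1 T2 T3
# T4 T5 T6
```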
My third try, no aggregate functions at all (works fine in Oracle 10g)
with tab as (
select 'T1' comment_text from dual
union all select 'T2' from dual
union all select 'T3' from dual
union all select 'T4' from dual
union all select 'T5' from dual
union all select 'T6' from dual
)
-- assumes comment_text never contains '#'
select regexp_substr(txt, '[^#]+', 1, 1) sun,
regexp_substr(txt, '[^#]+', 1, 2) mon,
regexp_substr(txt, '[^#]+', 1, 3) tue
from (
select sys_connect_by_path(comment_text, '#####') txt, pos
from (
-- id numbers the rows; pos is the column position (0, 1, 2) within each output row
select rownum id, comment_text, mod(rownum-1, 3) pos
from (select comment_text from tab order by comment_text)
)
start with pos = 0
-- chain each row to the next one, stopping before the next output row begins
connect by id = prior id + 1 and pos > 0
) where pos = 2;

Resources