Calculating values in date ranges in Oracle (possibly with recursive CTE) - oracle

I have a problem which can be handled by a recursive CTE, but not within an acceptable period of time. Can anyone point me at ways to improve the performance and/or get the same result a different way?
Here's my scenario!
I have : A large table which contains in each row an id, a start date, an end date, and a ranking number. There are multiple rows for each id and the date ranges often overlap. Dates are from 2010 onward.
I want: A table which contains a row for each combination of id + date which falls inside any date range for that id from the previous table. Each row should have the lowest ranking number for that id and day.
Eg:
ID Rank Range
1 1 1/1/2010-1/4/2010
1 2 1/2/2010-1/5/2010
2 1 1/1/2010-1/2/2010
becomes
ID Rank Day
1 1 1/1/2010
1 1 1/2/2010
1 1 1/3/2010
1 1 1/4/2010
1 2 1/5/2010
2 1 1/1/2010
2 1 1/2/2010
I can do this with a recursive CTE, but the performance is terrible (20-25 minutes for a relatively small data set which produces a final table with 31 million rows):
with enc(PersonID, EncounterDate, EndDate, Type_Rank) as (
select PersonID, EncounterDate, EndDate, Type_Rank
from Big_Base_Table
union all
select PersonID, EncounterDate + 1, EndDate, Type_Rank
from enc
where EncounterDate + 1 <= EndDate
)
select PersonID, EncounterDate, min(Type_Rank) Type_Rank
from enc
group by PersonID, EncounterDate
;

You could extract all possible dates from the table once in a CTE, and then join that back to the table:
with all_dates (day) as (
select start_date + level - 1
from (
select min(start_date) as start_date, max(end_date) as end_date
from big_base_table
)
connect by level <= end_date - start_date + 1
)
select bbt.id, min(bbt.type_rank) as type_rank, to_char(ad.day, 'YYYY-MM-DD') as day
from all_dates ad
join big_base_table bbt
on bbt.start_date <= ad.day
and bbt.end_date >= ad.day
group by bbt.id, ad.day
order by bbt.id, ad.day;
ID TYPE_RANK DAY
---------- ---------- ----------
1 1 2010-01-01
1 1 2010-01-02
1 1 2010-01-03
1 1 2010-01-04
1 2 2010-01-05
2 1 2010-01-01
2 1 2010-01-02
7 rows selected.
The CTE gets all dates from the lowest for any ID, up to the highest for any ID. You could also use a static calendar table for that if you have one, to save hitting the table twice (and getting min/max at the same time is slow in some versions at least).
You could also write it the other way round, as:
...
from big_base_table bbt
join all_dates ad
on ad.day >= bbt.start_date
and ad.day <= bbt.end_date
...
but I think the optimisier will probably end up treating them the same, with a single full scan of your base table; worth checking the plan it actually comes up with for both though, and if one is more efficnet that the other.

Related

reset index every 5th row - Oracle SQL

How can I make index column start over after reaching 5th row? I can't do that with a window function as there are no groups, I just need an index with max number of 5 like this:
date
index
01.01.21
1
02.01.21
2
03.01.21
3
04.01.21
4
05.01.21
5
06.01.21
1
07.01.21
2
and so on.
Appreciate any ideas.
You can use below solution for that purpose.
First, rank (row_number analytic function)the rows in your table within inline view
Then, use again the row_number function with partition by clause to group the previously ranked rows by TRUNC((rnb - 1)/5)
SELECT t."DATE"
, row_number()over(PARTITION BY TRUNC((rnb - 1)/5) ORDER BY rnb) as "INDEX"
FROM (
select "DATE", row_number()OVER(ORDER BY "DATE") rnb
from Your_table
) t
ORDER BY 1
;
demo on db<>fiddle
Your comment about using analytic functions is wrong; you can use analytic functions even when there are no "groups" (or "partitions"). Here you do need an analytic function, to order the rows (even if you don't need to partition them).
Here is a very simple solution, using just row_number(). Note the with clause, which is not part of the solution; I included it just for testing. In your real-life case, remove the with clause, and use your actual table and column names. The use of mod(... , 5) is pretty much obvious; it looks a little odd (subtracting 1, taking the modulus, then adding 1) because in Oracle we seem to count from 1 in all cases, instead of the much more natural counting from 0 common in other languages (like C).
Note that both date and index are reserved keywords, which shouldn't be used as column names. I used one common way to address that - I added an underscore at the end.
alter session set nls_date_format = 'dd.mm.rr';
with
sample_inputs (date_) as (
select date '2021-01-01' from dual union all
select date '2021-01-02' from dual union all
select date '2021-01-03' from dual union all
select date '2021-01-04' from dual union all
select date '2021-01-05' from dual union all
select date '2021-01-06' from dual union all
select date '2021-01-07' from dual
)
select date_, 1 + mod(row_number() over (order by date_) - 1, 5) as index_
from sample_inputs
;
DATE_ INDEX_
-------- ----------
01.01.21 1
02.01.21 2
03.01.21 3
04.01.21 4
05.01.21 5
06.01.21 1
07.01.21 2
You can combine MOD() with ROW_NUMBER() to get the index you want. For example:
select date, 1 + mod(row_number() over(order by date) - 1, 5) as idx from t

Efficiently get array of all previous dates per id per date limited to past 6 months in BigQuery

I have a very big table 'DATES_EVENTS' (20 T) that looks like this:
ID DATE
1 '2022-04-01'
1 '2022-03-02'
1 '2022-03-01'
2 '2022-05-01'
3 '2021-12-01'
3 '2021-11-11'
3 '2020-11-11'
3 '2020-10-01'
I want per each row to get all past dates (per user) limited to up to 6 months.
My desired table:
ID DATE DATE_list
1 '2022-04-01' ['2022-04-01','2022-03-02','2022-03-01']
1 '2022-03-02' ['2022-03-02','2022-03-01']
1 '2022-03-01' ['2022-03-01']
2 '2022-05-01' ['2022-05-01']
3 '2021-12-01' ['2021-12-01','2021-11-11']
3 '2021-11-11' ['2021-11-11']
3 '2020-11-11' ['2020-11-11','2020-10-01']
3 '2020-10-01' ['2020-10-01']
I have a solution for all dates not limited:
SELECT
ID, DATE, ARRAY_AGG(DATE) OVER (PARTITION BY ID ORDER BY DATE) as DATE_list
FROM
DATES_EVENTS
But for a limited up to 6 months I don't have an efficient solution:
SELECT
distinct A.ID, A.DATE, ARRAY_AGG(B.DATE) OVER (PARTITION BY B.ID ORDER BY B.DATE) as DATE_list
FROM
DATES_EVENTS A
INNER JOIN
DATES_EVENTS B
ON
A.ID=B.ID
AND B.DATE BETWEEN DATE_SUB(A.DATE, INTERVAL 180 DAY) AND A.DATE
** ruffly a solution
Anyone know of a good and efficient way to do what I need?
Consider below approach
select id, date, array(
select day
from t.date_list day
where day <= date
order by day desc
) as date_list
from (
select *, array_agg(date) over win as date_list
from dates_events
window win as (
partition by id
order by extract(year from date) * 12 + extract(month from date)
range between 5 preceding and current row
)
) t
if applied to sample data in your question - output is
In case if (as I noticed in your question) 180 days is appropriate substitution for 6 months for you - you can use below simpler version
select *, array_agg(date) over win as date_list
from dates_events
window win as (
partition by id
order by unix_date(date)
range between current row and 179 following
)

How to group by month including all months?

I group my table by months
SELECT TO_CHAR (created, 'YYYY-MM') AS operation, COUNT (id)
FROM user_info
WHERE created IS NOT NULL
GROUP BY ROLLUP (TO_CHAR (created, 'YYYY-MM'))
2015-04 1
2015-06 10
2015-08 22
2015-09 8
2015-10 13
2015-12 5
2016-01 25
2016-02 37
2016-03 24
2016-04 1
2016-05 1
2016-06 2
2016-08 2
2016-09 7
2016-10 103
2016-11 5
2016-12 2
2017-04 14
2017-05 2
284
But the records don't cover all the months.
I would like the output to include all the months, with the missing ones displayed in the output with a default value:
2017-01 ...
2017-02 ...
2017-03 ZERO
2017-04 ZERO
2017-05 ...
Oracle has a good array of date manipulation functions. The two pertinent ones for this problem are
MONTHS_BETWEEN() which calculates the number of months between two dates
ADD_MONTHS() which increments a date by the given number of months
We can combine these functions to generate a table of all the months spanned by your table's records. Then we use an outer join to conditionally join records from USER_INFO to that calendar. When no records match count(id) will be zero.
with cte as (
select max(trunc(created, 'MM')) as max_dt
, min(trunc(created, 'MM')) as min_dt
from user_info
)
, cal as (
select add_months(min_dt, (level-1)) as mth
from cte
connect by level <= months_between(max_dt, min_dt) + 1
)
select to_char(cal.mth, 'YYYY-MM') as operation
, count(id)
from cal
left outer join user_info
on trunc(user_info.created, 'mm') = cal.mth
group by rollup (cal.mth)
order by 1
/

oracle sql, counting working hours

I have been trying to find something related but couldn't.
I have an issue that i need to produce an availability percentage of something. I have a table that includes events that are happening, which i managed to count them by the day they are happening, but i am finding issues to count the total number of working hours in a quarter or a year.
when each day of the week has a different weight.
Basically my question is: can i do it without making a table with all dates in that month/year?
An example of the data:
ID DATE duration Environment
1 23/10/15 25 a
2 15/01/15 50 b
3 01/01/15 43 c
8 05/06/14 7 b
It can work for me by a calculated field or just a general query to get the information.
sorry I don't really understand the question but if you want to generate dates using connect by level is an easy way to do it (you could also use the model clause or recursive with) I did it here for just 10 days but you get the idea. I put in your dates as t1 and generated a list of dates (t) and then did a left outer join to put them together.
WITH t AS
(SELECT to_date('01-01-2015', 'mm-dd-yyyy') + level - 1 AS dt,
NULL AS duration
FROM dual
CONNECT BY level < = 10
),
t1 AS
(
SELECT to_date('10/01/15', 'dd-mm-yy') as dt, 50 as duration FROM dual
UNION ALL
SELECT to_date('01/01/15', 'dd-mm-yy'), 43 FROM dual
UNION ALL
SELECT to_date('06/01/15', 'dd-mm-yy'), 43 FROM dual
)
SELECT t.dt,
NVL(NVL(t1.duration, t.duration),0) duration
FROM t,
t1
WHERE t.dt = t1.dt(+)
ORDER BY dt
results
DT Duration
01-JAN-15 43
02-JAN-15 0
03-JAN-15 0
04-JAN-15 0
05-JAN-15 0
06-JAN-15 43
07-JAN-15 0
08-JAN-15 0
09-JAN-15 0
10-JAN-15 50
This was my intention, and the full answer below.
WITH t AS
(SELECT to_date('01-01-2015', 'mm-dd-yyyy') + level - 1 AS dt
FROM dual
CONNECT BY level < =365
),
t1 as
(
SELECT dt,
CASE
WHEN (to_char(TO_DATE( t.dt,'YYYY-MM-DD HH24:MI:SS'),'DY') in ('MON', 'TUE', 'WED', 'THU', 'FRI'))
THEN 14*60
WHEN (to_char(TO_DATE( t.dt,'YYYY-MM-DD HH24:MI:SS'),'DY') in ('SAT'))
THEN 8*60
WHEN (to_char(TO_DATE( t.dt,'YYYY-MM-DD HH24:MI:SS'),'DY') in ('SUN'))
THEN 10*60
ELSE 0 END duration ,
to_char(t.dt,'Q') as quarter
FROM t
)
select to_char(t1.dt,'yyyy'), to_char(t1.dt,'Q'),sum(t1.duration)
from t1
group by
to_char(t1.dt,'yyyy'), to_char(t1.dt,'Q');

Finding a count of rows in an arbitrary date range using Oracle

The question I need to answer is this "What is the maximum number of page requests we have ever received in a 60 minute period?"
I have a table that looks similar to this:
date_page_requested date;
page varchar(80);
I'm looking for the MAX count of rows in any 60 minute timeslice.
I thought analytic functions might get me there but so far I'm drawing a blank.
I would love a pointer in the right direction.
You have some options in the answer that will work, here is one that uses Oracle's "Windowing Functions with Logical Offset" feature instead of joins or correlated subqueries.
First the test table:
Wrote file afiedt.buf
1 create table t pctfree 0 nologging as
2 select date '2011-09-15' + level / (24 * 4) as date_page_requested
3 from dual
4* connect by level <= (24 * 4)
SQL> /
Table created.
SQL> insert into t values (to_date('2011-09-15 11:11:11', 'YYYY-MM-DD HH24:Mi:SS'));
1 row created.
SQL> commit;
Commit complete.
T now contains a row every quarter hour for a day with one additional row at 11:11:11 AM. The query preceeds in three steps. Step 1 is to, for every row, get the number of rows that come within the next hour after the time of the row:
1 with x as (select date_page_requested
2 , count(*) over (order by date_page_requested
3 range between current row
4 and interval '1' hour following) as hour_count
5 from t)
Then assign the ordering by hour_count:
6 , y as (select date_page_requested
7 , hour_count
8 , row_number() over (order by hour_count desc, date_page_requested asc) as rn
9 from x)
And finally select the earliest row that has the greatest number of following rows.
10 select to_char(date_page_requested, 'YYYY-MM-DD HH24:Mi:SS')
11 , hour_count
12 from y
13* where rn = 1
If multiple 60 minute windows tie in hour count, the above will only give you the first window.
This should give you what you need, the first row returned should have
the hour with the highest number of pages.
select number_of_pages
,hour_requested
from (select to_char(date_page_requested,'dd/mm/yyyy hh') hour_requested
,count(*) number_of_pages
from pages
group by to_char(date_page_requested,'dd/mm/yyyy hh')) p
order by number_of_pages
How about something like this?
SELECT TOP 1
ranges.date_start,
COUNT(data.page) AS Tally
FROM (SELECT DISTINCT
date_page_requested AS date_start,
DATEADD(HOUR,1,date_page_requested) AS date_end
FROM #Table) ranges
JOIN #Table data
ON data.date_page_requested >= ranges.date_start
AND data.date_page_requested < ranges.date_end
GROUP BY ranges.date_start
ORDER BY Tally DESC
For PostgreSQL, I'd first probably write something like this for a "window" aligned on the minute. You don't need OLAP windowing functions for this.
select w.ts,
date_trunc('minute', w.ts) as hour_start,
date_trunc('minute', w.ts) + interval '1' hour as hour_end,
(select count(*)
from weblog
where ts between date_trunc('minute', w.ts) and
(date_trunc('minute', w.ts) + interval '1' hour) ) as num_pages
from weblog w
group by ts, hour_start, hour_end
order by num_pages desc
Oracle also has a trunc() function, but I'm not sure of the format. I'll either look it up in a minute, or leave to see a friend's burlesque show.
WITH ranges AS
( SELECT
date_page_requested AS StartDate,
date_page_requested + (1/24) AS EndDate,
ROWNUMBER() OVER(ORDER BY date_page_requested) AS RowNo
FROM
#Table
)
SELECT
a.StartDate AS StartDate,
MAX(b.RowNo) - a.RowNo + 1 AS Tally
FROM
ranges a
JOIN
ranges b
ON a.StartDate <= b.StartDate
AND b.StartDate < a.EndDate
GROUP BY a.StartDate
, a.RowNo
ORDER BY Tally DESC
or:
WITH ranges AS
( SELECT
date_page_requested AS StartDate,
date_page_requested + (1/24) AS EndDate,
ROWNUMBER() OVER(ORDER BY date_page_requested) AS RowNo
FROM
#Table
)
SELECT
a.StartDate AS StartDate,
( SELECT MIN(b.RowNo) - a.RowNo
FROM ranges b
WHERE b.StartDate > a.EndDate
) AS Tally
FROM
ranges a
ORDER BY Tally DESC

Resources