Finding a count of rows in an arbitrary date range using Oracle - oracle

The question I need to answer is this "What is the maximum number of page requests we have ever received in a 60 minute period?"
I have a table that looks similar to this:
date_page_requested date;
page varchar(80);
I'm looking for the MAX count of rows in any 60 minute timeslice.
I thought analytic functions might get me there but so far I'm drawing a blank.
I would love a pointer in the right direction.

You have some options in the answer that will work, here is one that uses Oracle's "Windowing Functions with Logical Offset" feature instead of joins or correlated subqueries.
First the test table:
Wrote file afiedt.buf
1 create table t pctfree 0 nologging as
2 select date '2011-09-15' + level / (24 * 4) as date_page_requested
3 from dual
4* connect by level <= (24 * 4)
SQL> /
Table created.
SQL> insert into t values (to_date('2011-09-15 11:11:11', 'YYYY-MM-DD HH24:Mi:SS'));
1 row created.
SQL> commit;
Commit complete.
T now contains a row every quarter hour for a day with one additional row at 11:11:11 AM. The query preceeds in three steps. Step 1 is to, for every row, get the number of rows that come within the next hour after the time of the row:
1 with x as (select date_page_requested
2 , count(*) over (order by date_page_requested
3 range between current row
4 and interval '1' hour following) as hour_count
5 from t)
Then assign the ordering by hour_count:
6 , y as (select date_page_requested
7 , hour_count
8 , row_number() over (order by hour_count desc, date_page_requested asc) as rn
9 from x)
And finally select the earliest row that has the greatest number of following rows.
10 select to_char(date_page_requested, 'YYYY-MM-DD HH24:Mi:SS')
11 , hour_count
12 from y
13* where rn = 1
If multiple 60 minute windows tie in hour count, the above will only give you the first window.

This should give you what you need, the first row returned should have
the hour with the highest number of pages.
select number_of_pages
,hour_requested
from (select to_char(date_page_requested,'dd/mm/yyyy hh') hour_requested
,count(*) number_of_pages
from pages
group by to_char(date_page_requested,'dd/mm/yyyy hh')) p
order by number_of_pages

How about something like this?
SELECT TOP 1
ranges.date_start,
COUNT(data.page) AS Tally
FROM (SELECT DISTINCT
date_page_requested AS date_start,
DATEADD(HOUR,1,date_page_requested) AS date_end
FROM #Table) ranges
JOIN #Table data
ON data.date_page_requested >= ranges.date_start
AND data.date_page_requested < ranges.date_end
GROUP BY ranges.date_start
ORDER BY Tally DESC

For PostgreSQL, I'd first probably write something like this for a "window" aligned on the minute. You don't need OLAP windowing functions for this.
select w.ts,
date_trunc('minute', w.ts) as hour_start,
date_trunc('minute', w.ts) + interval '1' hour as hour_end,
(select count(*)
from weblog
where ts between date_trunc('minute', w.ts) and
(date_trunc('minute', w.ts) + interval '1' hour) ) as num_pages
from weblog w
group by ts, hour_start, hour_end
order by num_pages desc
Oracle also has a trunc() function, but I'm not sure of the format. I'll either look it up in a minute, or leave to see a friend's burlesque show.

WITH ranges AS
( SELECT
date_page_requested AS StartDate,
date_page_requested + (1/24) AS EndDate,
ROWNUMBER() OVER(ORDER BY date_page_requested) AS RowNo
FROM
#Table
)
SELECT
a.StartDate AS StartDate,
MAX(b.RowNo) - a.RowNo + 1 AS Tally
FROM
ranges a
JOIN
ranges b
ON a.StartDate <= b.StartDate
AND b.StartDate < a.EndDate
GROUP BY a.StartDate
, a.RowNo
ORDER BY Tally DESC
or:
WITH ranges AS
( SELECT
date_page_requested AS StartDate,
date_page_requested + (1/24) AS EndDate,
ROWNUMBER() OVER(ORDER BY date_page_requested) AS RowNo
FROM
#Table
)
SELECT
a.StartDate AS StartDate,
( SELECT MIN(b.RowNo) - a.RowNo
FROM ranges b
WHERE b.StartDate > a.EndDate
) AS Tally
FROM
ranges a
ORDER BY Tally DESC

Related

Stop condition for recursive CTE on Oracle (ORA-32044)

I have the following recursive CTE which splits each element coming from base per month:
with
base (id, start_date, end_date) as (
select 1, date '2022-01-15', date '2022-03-15' from dual
union
select 2, date '2022-09-15', date '2022-12-31' from dual
union
select 3, date '2023-09-15', date '2023-09-25' from dual
),
split (id, start_date, end_date) as (
select base.id, base.start_date, least(last_day(base.start_date), base.end_date) from base
union all
select base.id, split.end_date + 1, least(last_day(split.end_date + 1), base.end_date) from base join split on base.id = split.id and split.end_date < base.end_date
)
select * from split order by id, start_date, end_date;
It works on Oracle and gives the following result:
id
start_date
end_date
1
2022-01-15
2022-01-31
1
2022-02-01
2022-02-28
1
2022-03-01
2022-03-15
2
2022-09-15
2022-09-30
2
2022-10-01
2022-10-31
2
2022-11-01
2022-11-30
2
2022-12-01
2022-12-31
3
2023-09-15
2023-09-25
The two following stop conditions work correctly:
... from base join split on base.id = split.id and split.end_date < base.end_date
... from base, split where base.id = split.id and split.end_date < base.end_date
The following one fails with the message ORA-32044: cycle detected while executing recursive WITH query:
... from base join split on base.id = split.id where split.end_date < base.end_date
I fail to understand how the last one is different from the two others.
It looks like a bug as all your queries should result in identical explain plans.
However, you can rewrite the recursive sub-query without the join (and using a SEARCH clause so you may not have to re-order the query later):
WITH split (id, start_date, month_end, end_date) AS (
SELECT id,
start_date,
LEAST(
ADD_MONTHS(TRUNC(start_date, 'MM'), 1) - INTERVAL '1' SECOND,
end_date
),
end_date
FROM base
UNION ALL
SELECT id,
month_end + INTERVAL '1' SECOND,
LEAST(
ADD_MONTHS(month_end, 1),
end_date
),
end_date
FROM split
WHERE month_end < end_date
) SEARCH DEPTH FIRST BY id, start_date SET order_id
SELECT id,
start_date,
month_end AS end_date
FROM split;
Note: if you want to just use values at midnight rather than the entire month then use INTERVAL '1' DAY rather than 1 second.
Which, for the sample data:
CREATE TABLE base (id, start_date, end_date) as
select 1, date '2022-01-15', date '2022-04-15' from dual union all
select 2, date '2022-09-15', date '2022-12-31' from dual union all
select 3, date '2023-09-15', date '2023-09-25' from dual;
Outputs:
ID
START_DATE
END_DATE
1
2022-01-15T00:00:00Z
2022-01-31T23:59:59Z
1
2022-02-01T00:00:00Z
2022-02-28T23:59:59Z
1
2022-03-01T00:00:00Z
2022-03-31T23:59:59Z
1
2022-04-01T00:00:00Z
2022-04-15T00:00:00Z
2
2022-09-15T00:00:00Z
2022-09-30T23:59:59Z
2
2022-10-01T00:00:00Z
2022-10-31T23:59:59Z
2
2022-11-01T00:00:00Z
2022-11-30T23:59:59Z
2
2022-12-01T00:00:00Z
2022-12-31T00:00:00Z
3
2023-09-15T00:00:00Z
2023-09-25T00:00:00Z
fiddle
It's because WHERE and ON conditions are not evaluated at the same level:
when the condition is in the ON clause it's limiting the rows concerned by the JOIN, where it's in the WHERE it's filtering the results after the JOIN has been applied, and since a recursive CTE see all rows selected up to now...

Efficiently get array of all previous dates per id per date limited to past 6 months in BigQuery

I have a very big table 'DATES_EVENTS' (20 T) that looks like this:
ID DATE
1 '2022-04-01'
1 '2022-03-02'
1 '2022-03-01'
2 '2022-05-01'
3 '2021-12-01'
3 '2021-11-11'
3 '2020-11-11'
3 '2020-10-01'
I want per each row to get all past dates (per user) limited to up to 6 months.
My desired table:
ID DATE DATE_list
1 '2022-04-01' ['2022-04-01','2022-03-02','2022-03-01']
1 '2022-03-02' ['2022-03-02','2022-03-01']
1 '2022-03-01' ['2022-03-01']
2 '2022-05-01' ['2022-05-01']
3 '2021-12-01' ['2021-12-01','2021-11-11']
3 '2021-11-11' ['2021-11-11']
3 '2020-11-11' ['2020-11-11','2020-10-01']
3 '2020-10-01' ['2020-10-01']
I have a solution for all dates not limited:
SELECT
ID, DATE, ARRAY_AGG(DATE) OVER (PARTITION BY ID ORDER BY DATE) as DATE_list
FROM
DATES_EVENTS
But for a limited up to 6 months I don't have an efficient solution:
SELECT
distinct A.ID, A.DATE, ARRAY_AGG(B.DATE) OVER (PARTITION BY B.ID ORDER BY B.DATE) as DATE_list
FROM
DATES_EVENTS A
INNER JOIN
DATES_EVENTS B
ON
A.ID=B.ID
AND B.DATE BETWEEN DATE_SUB(A.DATE, INTERVAL 180 DAY) AND A.DATE
** ruffly a solution
Anyone know of a good and efficient way to do what I need?
Consider below approach
select id, date, array(
select day
from t.date_list day
where day <= date
order by day desc
) as date_list
from (
select *, array_agg(date) over win as date_list
from dates_events
window win as (
partition by id
order by extract(year from date) * 12 + extract(month from date)
range between 5 preceding and current row
)
) t
if applied to sample data in your question - output is
In case if (as I noticed in your question) 180 days is appropriate substitution for 6 months for you - you can use below simpler version
select *, array_agg(date) over win as date_list
from dates_events
window win as (
partition by id
order by unix_date(date)
range between current row and 179 following
)

Max number of counts in a tparticular hour

I have a table called Orders, i want to get maximum number of orders for each day with respect to hours with following query
SELECT
trunc(created,'HH') as dated,
count(*) as Counts
FROM
orders
WHERE
created > trunc(SYSDATE -2)
group by trunc(created,'HH') ORDER BY counts DESC
this gets the result of all hours, I want only max hour of a day e.g.
Image
This result looks good but now i want only rows with max number of count for a day
e.g.
for 12/23/2019 max number of counts is 90 for "12/23/2019 4:00:00 PM",
for 12/22/2019 max number of counts is 25 for "12/22/2019 3:00:00 PM"
required dataset
1 12/23/2019 4:00:00 PM 90
2 12/24/2019 12:00:00 PM 76
3 12/22/2019 1:00:00 PM 25
This could be the solution and in my opinion is the most trivial.
Use the WITH clause to make a sub query then search for the greatest value in the data set on a specific date.
WITH ORD AS (
SELECT
trunc(created,'HH') as dated,
count(*) as Counts
FROM
orders
WHERE
created > trunc(SYSDATE-2)
group by trunc(created,'HH')
)
SELECT *
FROM ORD ord
WHERE NOT EXISTS (
SELECT 'X'
FROM ORD ord1
WHERE trunc(ord1.dated) = trunc(ord.dated) AND ord1.Counts > ord.Counts
)
Use ROW_NUMBER analytic function over your original query and filter the rows with number 1.
You need to partition on the day, i.e. TRUNC(dated) to get the correct result
with ord1 as (
SELECT
trunc(created,'HH') as dated,
count(*) as Counts
FROM
orders
WHERE
created > trunc(SYSDATE -2)
group by trunc(created,'HH')
),
ord2 as (
select dated, Counts,
row_number() over (partition by trunc(dated) order by Counts desc) as rn
from ord1)
select dated, Counts
from ord2
where rn = 1
The advantage of using the ROW_NUMBER is that it correct handels ties, i.e. cases where there are more hour in a day with the same maximal count. The query shows only one record and you can controll with the order by e.g. to show the first / last hour.
You can use the analytical function ROW_NUMBER as following to get the desired result:
SELECT DATED, COUNTS
FROM (
SELECT
TRUNC(CREATED, 'HH') AS DATED,
COUNT(*) AS COUNTS,
ROW_NUMBER() OVER(
PARTITION BY TRUNC(CREATED)
ORDER BY COUNT(*) DESC NULLS LAST
) AS RN
FROM ORDERS
WHERE CREATED > TRUNC(SYSDATE - 2)
GROUP BY TRUNC(CREATED, 'HH'), TRUNC(CREATED)
)
WHERE RN = 1
Cheers!!

How to convert this code from oracle to redshift?

I am trying to implement the same in redshift and i am finding it little difficult to do that. Since redshift is in top of postgresql engine, if any one can do it in postgresql it would be really helpfull. Basically the code gets the count for previous two month at column level. If there is no count for exact previous month then it gives 0.
This is my code:
with abc(dateval,cnt) as(
select 201908, 100 from dual union
select 201907, 200 from dual union
select 201906, 300 from dual union
select 201904, 600 from dual)
select dateval, cnt,
last_value(cnt) over (order by dateval
range between interval '1' month preceding
and interval '1' month preceding ) m1,
last_value(cnt) over (order by dateval
range between interval '2' month preceding
and interval '2' month preceding ) m2
from (select to_date(dateval, 'yyyymm') dateval, cnt from abc)
I get error in over by clause. I tried to give cast('1 month' as interval) but still its failing. Can someone please help me with this windows function.
expected output:
Regards
This is how I would do it. In Redshift there's no easy way to generate sequences, do I select row_number() from an arbitrary table to create a sequence:
with abc(dateval,cnt) as(
select 201908, 100 union
select 201907, 200 union
select 201906, 300 union
select 201904, 600),
cal(date) as (
select
add_months(
'20190101'::date,
row_number() over () - 1
) as date
from <an arbitrary table to generate a sequence of rows> limit 10
),
with_lag as (
select
dateval,
cnt,
lag(cnt, 1) over (order by date) as m1,
lag(cnt, 2) over (order by date) as m2
from abc right join cal on to_date(dateval, 'YYYYMM') = date
)
select * from with_lag
where dateval is not null
order by dateval

Query to find row till which sum less than an amount

I have an account where interest is debited corresponding to each account as below
amount Date
2 01-01-2012
5 02-01-2012
2 05-01-2012
1 07-01-2012
If the total credit in the account is 8. Ineed a query to find till what dates interest the credit amount can adjust.
Here the query should give output as 02-01-2012(2+5 < 8). I know this can be handled through cursor. But is there any method to write this as a single query in ORACLE.
SELECT pdate
FROM (
SELECT t.*,
LAG(date) OVER (ORDER BY date) AS pdate
8 - SUM(amount) OVER (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS diff
FROM mytable t
ORDER BY
date
)
WHERE diff < 0
AND rownum = 1
Not knowing the structure of your table, here's a guess:
SELECT date from your_table
GROUP BY AMOUNT
HAVING SUM(AMOUNT) < 8
Note: this is LESS THAN 8. Change the conditional as appropriate.
Doesn't do the (2+5)<8 thing yet:
select max(cum_sum), max(date)
from (
select date,
sum(amount) over (order by date) cum_sum
) where cum_sum < 8

Resources