How to group by month including all months? - oracle

I group my table by month:
SELECT TO_CHAR (created, 'YYYY-MM') AS operation, COUNT (id)
FROM user_info
WHERE created IS NOT NULL
GROUP BY ROLLUP (TO_CHAR (created, 'YYYY-MM'))
2015-04 1
2015-06 10
2015-08 22
2015-09 8
2015-10 13
2015-12 5
2016-01 25
2016-02 37
2016-03 24
2016-04 1
2016-05 1
2016-06 2
2016-08 2
2016-09 7
2016-10 103
2016-11 5
2016-12 2
2017-04 14
2017-05 2
284
But the records don't cover all the months.
I would like the output to include all the months, with the missing ones displayed in the output with a default value:
2017-01 ...
2017-02 ...
2017-03 ZERO
2017-04 ZERO
2017-05 ...

Oracle has a good array of date manipulation functions. The two pertinent ones for this problem are:
MONTHS_BETWEEN(), which calculates the number of months between two dates
ADD_MONTHS(), which increments a date by the given number of months
We can combine these functions to generate a table of all the months spanned by your table's records. Then we use an outer join to conditionally join records from USER_INFO to that calendar. When no records match, count(id) will be zero.
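with cte as (
-- first day of the earliest and of the latest month in the data
select max(trunc(created, 'MM')) as max_dt
, min(trunc(created, 'MM')) as min_dt
from user_info
)
, cal as (
-- one row per month from min_dt to max_dt inclusive
select add_months(min_dt, (level-1)) as mth
from cte
connect by level <= months_between(max_dt, min_dt) + 1
)
select to_char(cal.mth, 'YYYY-MM') as operation
, count(id) -- counts only matched rows, so empty months show 0
from cal
left outer join user_info
on trunc(user_info.created, 'mm') = cal.mth
group by rollup (cal.mth) -- rollup adds the grand-total row, as in the question
order by 1
/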
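The calendar-plus-outer-join idea can be tried outside Oracle as well. Below is a minimal Python/SQLite sketch of the same technique, assuming SQLite 3.8.3 or later for recursive CTEs: since SQLite has no CONNECT BY, ADD_MONTHS or ROLLUP, a recursive CTE with the 'start of month' and '+1 month' date modifiers stands in for them, and the grand-total row is left out. The table mirrors the question's user_info(id, created); the function name and sample rows are made up for illustration.

```python
import sqlite3

# Assumption: SQLite >= 3.8.3 (recursive CTEs). No ROLLUP, so no total row.
CALENDAR_QUERY = """
WITH RECURSIVE bounds(min_dt, max_dt) AS (
    -- first day of the earliest and of the latest month in the data
    SELECT date(min(created), 'start of month'),
           date(max(created), 'start of month')
    FROM user_info
),
cal(mth) AS (
    -- one row per month, playing the role of CONNECT BY LEVEL + ADD_MONTHS
    SELECT min_dt FROM bounds
    UNION ALL
    SELECT date(mth, '+1 month') FROM cal, bounds WHERE mth < max_dt
)
SELECT strftime('%Y-%m', cal.mth) AS operation, count(u.id) AS cnt
FROM cal
LEFT OUTER JOIN user_info u
  ON date(u.created, 'start of month') = cal.mth
GROUP BY cal.mth
ORDER BY cal.mth
"""


def monthly_counts(rows):
    """rows: iterable of (id, created) with created as 'YYYY-MM-DD' text.
    Returns [(month, count)] for every month in range, empty months as 0."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE user_info (id INTEGER, created TEXT)")
        con.executemany("INSERT INTO user_info VALUES (?, ?)", rows)
        return con.execute(CALENDAR_QUERY).fetchall()
    finally:
        con.close()
```

For example, rows in April and June produce an explicit ('2015-05', 0) row for May, because count(u.id) ignores the NULLs produced by the outer join.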

Related

Efficiently get array of all previous dates per id per date limited to past 6 months in BigQuery

I have a very big table 'DATES_EVENTS' (20 T) that looks like this:
ID DATE
1 '2022-04-01'
1 '2022-03-02'
1 '2022-03-01'
2 '2022-05-01'
3 '2021-12-01'
3 '2021-11-11'
3 '2020-11-11'
3 '2020-10-01'
For each row, I want to get all past dates (per user), limited to the last 6 months.
My desired table:
ID DATE DATE_list
1 '2022-04-01' ['2022-04-01','2022-03-02','2022-03-01']
1 '2022-03-02' ['2022-03-02','2022-03-01']
1 '2022-03-01' ['2022-03-01']
2 '2022-05-01' ['2022-05-01']
3 '2021-12-01' ['2021-12-01','2021-11-11']
3 '2021-11-11' ['2021-11-11']
3 '2020-11-11' ['2020-11-11','2020-10-01']
3 '2020-10-01' ['2020-10-01']
I have a solution for all dates, without the limit:
SELECT
ID, DATE, ARRAY_AGG(DATE) OVER (PARTITION BY ID ORDER BY DATE) as DATE_list
FROM
DATES_EVENTS
But for the 6-month limit I don't have an efficient solution:
SELECT
distinct A.ID, A.DATE, ARRAY_AGG(B.DATE) OVER (PARTITION BY B.ID ORDER BY B.DATE) as DATE_list
FROM
DATES_EVENTS A
INNER JOIN
DATES_EVENTS B
ON
A.ID=B.ID
AND B.DATE BETWEEN DATE_SUB(A.DATE, INTERVAL 180 DAY) AND A.DATE
** roughly a solution
Does anyone know of a good, efficient way to do what I need?
Consider the approach below:
select id, date, array(
select day
from t.date_list day
where day <= date
order by day desc
) as date_list
from (
select *, array_agg(date) over win as date_list
from dates_events
window win as (
partition by id
order by extract(year from date) * 12 + extract(month from date)
range between 5 preceding and current row
)
) t
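To see what the calendar-month frame computes, here is a small pure-Python model of the same logic (not BigQuery code): each date gets the month index year * 12 + month, the frame keeps dates of the same id whose index is within the 5 preceding months or the current one, and the "day <= date" filter then drops later dates in the same month. The function names and the input shape are illustrative assumptions, and the nested loop is quadratic per id, so it is only meant for checking results on small samples.

```python
from collections import defaultdict
from datetime import date


def month_index(d):
    # same ordering key as extract(year from date) * 12 + extract(month from date)
    return d.year * 12 + d.month


def six_month_lists(rows):
    """rows: list of (id, date) pairs.

    Returns {(id, date): that id's dates, newest first}, covering the current
    month and the 5 preceding months, cut off at the current date.
    """
    by_id = defaultdict(list)
    for id_, d in rows:
        by_id[id_].append(d)
    result = {}
    for id_, dates in by_id.items():
        for d in dates:
            m = month_index(d)
            # RANGE BETWEEN 5 PRECEDING AND CURRENT ROW on the month index ...
            frame = [x for x in dates if m - 5 <= month_index(x) <= m]
            # ... then keep only dates up to the current one, newest first
            result[(id_, d)] = sorted((x for x in frame if x <= d), reverse=True)
    return result
```

On the question's sample rows this reproduces the desired DATE_list column, for example [2021-12-01, 2021-11-11] for id 3 on 2021-12-01.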
If (as your question suggests) 180 days is an acceptable substitute for 6 months, you can use the simpler version below:
select *, array_agg(date) over win as date_list
from dates_events
window win as (
partition by id
order by unix_date(date)
-- the current day plus the 179 days before it (past dates only)
range between 179 preceding and current row
)
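A RANGE frame over a day number is cheap because, within each id, the rows are sorted once and each frame boundary can be found by position. The Python sketch below models a frame holding the current day and the 179 days before it, using bisect on a sorted list of day numbers per id (date.toordinal() standing in for unix_date()). It illustrates the frame semantics only; it is an assumption-laden model, not how BigQuery executes the query, and the function name is hypothetical.

```python
import bisect
from collections import defaultdict
from datetime import date


def lookback_lists(rows, days=180):
    """rows: list of (id, date). For each row, return that id's dates falling
    on the current day or the (days - 1) days before it, oldest first."""
    by_id = defaultdict(list)
    for id_, d in rows:
        by_id[id_].append(d.toordinal())  # a day number, like unix_date()
    result = {}
    for id_, ords in by_id.items():
        ords.sort()  # the ORDER BY of the window
        for o in ords:
            lo = bisect.bisect_left(ords, o - (days - 1))  # first day in frame
            hi = bisect.bisect_right(ords, o)  # includes same-day peers
            result[(id_, date.fromordinal(o))] = [date.fromordinal(x) for x in ords[lo:hi]]
    return result
```

With days=180, a row on 2022-06-30 no longer sees 2022-01-01 (180 days earlier), while a row on 2022-06-29 still does.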

Calculating values in date ranges in Oracle (possibly with recursive CTE)

I have a problem which can be handled by a recursive CTE, but not within an acceptable period of time. Can anyone point me at ways to improve the performance and/or get the same result a different way?
Here's my scenario!
I have: a large table in which each row contains an id, a start date, an end date, and a ranking number. There are multiple rows for each id, and the date ranges often overlap. Dates are from 2010 onward.
I want: a table that contains a row for each combination of id + date that falls inside any date range for that id in the previous table. Each row should have the lowest ranking number for that id and day.
Eg:
ID Rank Range
1 1 1/1/2010-1/4/2010
1 2 1/2/2010-1/5/2010
2 1 1/1/2010-1/2/2010
becomes
ID Rank Day
1 1 1/1/2010
1 1 1/2/2010
1 1 1/3/2010
1 1 1/4/2010
1 2 1/5/2010
2 1 1/1/2010
2 1 1/2/2010
I can do this with a recursive CTE, but the performance is terrible (20-25 minutes for a relatively small data set which produces a final table with 31 million rows):
with enc(PersonID, EncounterDate, EndDate, Type_Rank) as (
select PersonID, EncounterDate, EndDate, Type_Rank
from Big_Base_Table
union all
select PersonID, EncounterDate + 1, EndDate, Type_Rank
from enc
where EncounterDate + 1 <= EndDate
)
select PersonID, EncounterDate, min(Type_Rank) Type_Rank
from enc
group by PersonID, EncounterDate
;
You could extract all possible dates from the table once in a CTE, and then join that back to the table:
with all_dates (day) as (
select start_date + level - 1
from (
select min(start_date) as start_date, max(end_date) as end_date
from big_base_table
)
connect by level <= end_date - start_date + 1
)
select bbt.id, min(bbt.type_rank) as type_rank, to_char(ad.day, 'YYYY-MM-DD') as day
from all_dates ad
join big_base_table bbt
on bbt.start_date <= ad.day
and bbt.end_date >= ad.day
group by bbt.id, ad.day
order by bbt.id, ad.day;
ID TYPE_RANK DAY
---------- ---------- ----------
1 1 2010-01-01
1 1 2010-01-02
1 1 2010-01-03
1 1 2010-01-04
1 2 2010-01-05
2 1 2010-01-01
2 1 2010-01-02
7 rows selected.
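For checking expected output on small samples, the same result can be modelled in a few lines of Python: expand each (id, rank, start, end) range into days and keep the lowest rank per (id, day). This is an in-memory illustration with a hypothetical function name and tuple input, not a substitute for the set-based SQL on tens of millions of rows.

```python
from datetime import date, timedelta


def min_rank_per_day(ranges):
    """ranges: iterable of (id, rank, start, end) with inclusive date bounds.

    Returns [(id, rank, day)] ordered by id and day, keeping the lowest rank
    whenever ranges for the same id overlap on a day.
    """
    best = {}
    for id_, rank, start, end in ranges:
        day = start
        while day <= end:  # walk the range one day at a time
            key = (id_, day)
            if key not in best or rank < best[key]:
                best[key] = rank
            day += timedelta(days=1)
    return [(id_, best[(id_, day)], day) for id_, day in sorted(best)]
```

Fed the three ranges from the question, it yields the same 7 rows as the query output above.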
The CTE gets all dates from the lowest for any ID, up to the highest for any ID. You could also use a static calendar table for that if you have one, to save hitting the table twice (and getting min/max at the same time is slow in some versions at least).
You could also write it the other way round, as:
...
from big_base_table bbt
join all_dates ad
on ad.day >= bbt.start_date
and ad.day <= bbt.end_date
...
but I think the optimiser will probably end up treating them the same, with a single full scan of your base table. It is worth checking the plan it actually comes up with for both, though, and whether one is more efficient than the other.

oracle sql, counting working hours

I have been trying to find something related but couldn't.
I need to produce an availability percentage for something. I have a table of events, which I have managed to count by the day they happen, but I am having trouble counting the total number of working hours in a quarter or a year when each day of the week has a different weight.
Basically, my question is: can I do it without building a table of all the dates in that month/year?
An example of the data:
ID DATE duration Environment
1 23/10/15 25 a
2 15/01/15 50 b
3 01/01/15 43 c
8 05/06/14 7 b
Either a calculated field or a plain query that returns the information would work for me.
Sorry, I don't fully understand the question, but if you want to generate dates, CONNECT BY LEVEL is an easy way to do it (you could also use the MODEL clause or a recursive WITH). I did it here for just 10 days, but you get the idea. I put your dates in t1, generated a list of dates in t, and then used a left outer join to put them together.
WITH t AS
(SELECT to_date('01-01-2015', 'mm-dd-yyyy') + level - 1 AS dt,
NULL AS duration
FROM dual
CONNECT BY level <= 10
),
t1 AS
(
SELECT to_date('10/01/15', 'dd-mm-yy') as dt, 50 as duration FROM dual
UNION ALL
SELECT to_date('01/01/15', 'dd-mm-yy'), 43 FROM dual
UNION ALL
SELECT to_date('06/01/15', 'dd-mm-yy'), 43 FROM dual
)
SELECT t.dt,
NVL(NVL(t1.duration, t.duration),0) duration
FROM t,
t1
WHERE t.dt = t1.dt(+)
ORDER BY dt
results
DT Duration
01-JAN-15 43
02-JAN-15 0
03-JAN-15 0
04-JAN-15 0
05-JAN-15 0
06-JAN-15 43
07-JAN-15 0
08-JAN-15 0
09-JAN-15 0
10-JAN-15 50
This is what I intended; the full answer is below.
WITH t AS
(SELECT to_date('01-01-2015', 'mm-dd-yyyy') + level - 1 AS dt
FROM dual
CONNECT BY level <= 365 -- one row per day of 2015
),
t1 as
(
SELECT dt,
-- working minutes per weekday; t.dt is already a DATE, so format it directly,
-- with NLS_DATE_LANGUAGE fixing the 'DY' abbreviations to English
CASE
WHEN to_char(t.dt, 'DY', 'NLS_DATE_LANGUAGE=ENGLISH') in ('MON', 'TUE', 'WED', 'THU', 'FRI')
THEN 14*60
WHEN to_char(t.dt, 'DY', 'NLS_DATE_LANGUAGE=ENGLISH') = 'SAT'
THEN 8*60
WHEN to_char(t.dt, 'DY', 'NLS_DATE_LANGUAGE=ENGLISH') = 'SUN'
THEN 10*60
ELSE 0 END duration,
to_char(t.dt,'Q') as quarter
FROM t
)
select to_char(t1.dt,'yyyy'), to_char(t1.dt,'Q'), sum(t1.duration)
from t1
group by
to_char(t1.dt,'yyyy'), to_char(t1.dt,'Q');
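As a cross-check of the weighting logic, here is a Python sketch that walks every day of a year and sums the per-weekday minutes by quarter, using the same weights as the query (14 hours Monday to Friday, 8 hours on Saturday, 10 hours on Sunday). The function name and the WEIGHTS mapping are illustrative assumptions, not part of the original answer.

```python
from datetime import date, timedelta

# minutes per weekday, keyed by date.weekday(): Monday=0 ... Sunday=6
WEIGHTS = {0: 14 * 60, 1: 14 * 60, 2: 14 * 60, 3: 14 * 60, 4: 14 * 60,
           5: 8 * 60, 6: 10 * 60}


def minutes_per_quarter(year):
    """Return {quarter: total weighted minutes} for every day of the year."""
    totals = {1: 0, 2: 0, 3: 0, 4: 0}
    day = date(year, 1, 1)
    while day.year == year:
        quarter = (day.month - 1) // 3 + 1  # same grouping as to_char(dt, 'Q')
        totals[quarter] += WEIGHTS[day.weekday()]
        day += timedelta(days=1)
    return totals
```

For 2015 (52 full weeks of 5280 minutes plus one extra Thursday, since 1 January 2015 was a Thursday) the quarterly totals add up to 275400 minutes.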

How to group by category, year so as to include zero sums for each year per category?

My imaginary results would look like:
Category | Year | sum |
--------- ------ --------
A 2008 200
A 2009 0
B 2008 100
B 2009 5
... ... ...
i.e. the sum of the transactions per year and per category.
There are cases where a category does not have any transaction for one year. In those cases the corresponding row (such as the second line of the results above) does not appear. How do I rewrite the query below so that it includes 2008 and 2009 for every category?
select category, to_char(trans_date, 'YYYY') year, sum(trans_value)
from transaction
group by category, to_char(trans_date, 'YYYY')
order by 1, 2;
With a partitioned outer join, you don't need a categories table.
I used the same transactions table as "dcp" used:
SQL> create table transactions
( category varchar(1)
, trans_date date
, trans_value number(25,8)
);
Table created.
SQL> insert into transactions values ('A',to_date('2008-01-01','yyyy-mm-dd'),100.0);
1 row created.
SQL> insert into transactions values ('A',to_date('2008-02-01','yyyy-mm-dd'),100.0);
1 row created.
SQL> insert into transactions values ('B',to_date('2008-01-01','yyyy-mm-dd'),50.0);
1 row created.
SQL> insert into transactions values ('B',to_date('2008-02-01','yyyy-mm-dd'),50.0);
1 row created.
SQL> insert into transactions values ('B',to_date('2009-08-01','yyyy-mm-dd'),5.0);
1 row created.
For the partitioned outer join you only need a set of years to partition outer join against. In the query below I used 2 years (2008 and 2009), but you can easily adjust that set.
SQL> with the_years as
( select 2007 + level year
, trunc(to_date(2007 + level,'yyyy'),'yy') start_of_year
, trunc(to_date(2007 + level + 1,'yyyy'),'yy') - interval '1' second end_of_year
from dual
connect by level <= 2
)
select t.category "Category"
, y.year "Year"
, nvl(sum(t.trans_value),0) "sum"
from the_years y
left outer join transactions t
partition by (t.category)
on (t.trans_date between y.start_of_year and y.end_of_year)
group by t.category
, y.year
order by t.category
, y.year
/
Category Year sum
-------- ---------- ----------
A 2008 200
A 2009 0
B 2008 100
B 2009 5
4 rows selected.
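The effect of the partitioned outer join can be pictured as densification: every category present in the data is paired with every year in the_years, falling back to 0 where no transaction matches. The plain-Python sketch below reproduces that result on the same five sample rows; it only models the output, not how Oracle executes PARTITION BY, and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import date


def yearly_sums(transactions, years):
    """transactions: list of (category, trans_date, trans_value).
    Returns [(category, year, sum)] for every category x year, 0 if empty."""
    sums = defaultdict(float)
    for category, trans_date, value in transactions:
        sums[(category, trans_date.year)] += value
    # like PARTITION BY (t.category): only categories that occur in the data
    categories = sorted({c for c, _, _ in transactions})
    return [(c, y, sums.get((c, y), 0.0)) for c in categories for y in years]
```

Because the categories come from the transactions themselves, a category with no transactions at all would still be missing; that case needs a categories table, as in the cross-join answer.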
Also note that I used start_of_year and end_of_year so that, if you want to filter on trans_date and you have an index on that column, the index could be used. Another option is to simply use extract(year from t.trans_date) = y.year as the ON condition.
Hope this helps.
Regards,
Rob.
You ideally need a table of categories and a table of years:
select c.category, y.year, nvl(sum(t.trans_value),0)
from categories c
cross join years y
left outer join transaction t
on to_char(t.trans_date, 'YYYY') = y.year
and t.category = c.category
group by c.category, y.year
order by 1, 2;
Hopefully you do have a table of categories, but you may well not have a table of years, in which case you can "fake" one like this:
with years as
( select 2007+rownum year
from dual
connect by rownum < 10) -- returns 2008, 2009, ..., 2016
select c.category, y.year, nvl(sum(t.trans_value),0)
from categories c
cross join years y
left outer join transaction t
on to_char(t.trans_date, 'YYYY') = y.year
and t.category = c.category
group by c.category, y.year
order by 1, 2;
Here's a complete, working example:
CREATE TABLE transactions (CATEGORY VARCHAR(1), trans_date DATE, trans_value NUMBER(25,8));
CREATE TABLE YEAR (YEAR NUMBER(4));
CREATE TABLE categories (CATEGORY VARCHAR(1));
INSERT INTO categories VALUES ('A');
INSERT INTO categories VALUES ('B');
INSERT INTO transactions VALUES ('A',to_date('2008-01-01','YYYY-MM-DD'),100.0);
INSERT INTO transactions VALUES ('A',to_date('2008-02-01','YYYY-MM-DD'),100.0);
INSERT INTO transactions VALUES ('B',to_date('2008-01-01','YYYY-MM-DD'),50.0);
INSERT INTO transactions VALUES ('B',to_date('2008-02-01','YYYY-MM-DD'),50.0);
INSERT INTO transactions VALUES ('B',to_date('2009-08-01','YYYY-MM-DD'),5.0);
INSERT INTO YEAR VALUES (2008);
INSERT INTO YEAR VALUES (2009);
SELECT b.category
, b.year
, SUM(nvl(a.trans_value,0))
FROM (SELECT to_char(a.trans_date,'YYYY') YEAR
, CATEGORY
, SUM(NVL(trans_value,0)) trans_value
FROM transactions a
GROUP BY to_char(a.trans_date,'YYYY')
, a.category ) a
, (SELECT
DISTINCT a.category
, b.year
FROM categories a
, YEAR b ) b
WHERE b.year = to_char(a.year(+))
AND b.category = a.category(+)
GROUP BY
b.category
, b.year
ORDER BY 1
,2;
Output:
CATEGORY YEAR SUM(NVL(A.TRANS_VALUE,0))
A 2008 200
A 2009 0
B 2008 100
B 2009 5

Finding a count of rows in an arbitrary date range using Oracle

The question I need to answer is this "What is the maximum number of page requests we have ever received in a 60 minute period?"
I have a table that looks similar to this:
date_page_requested date;
page varchar(80);
I'm looking for the MAX count of rows in any 60 minute timeslice.
I thought analytic functions might get me there but so far I'm drawing a blank.
I would love a pointer in the right direction.
The other answers give you some options that will work; here is one that uses Oracle's "Windowing Functions with Logical Offset" feature instead of joins or correlated subqueries.
First the test table:
SQL> create table t pctfree 0 nologging as
select date '2011-09-15' + level / (24 * 4) as date_page_requested
from dual
connect by level <= (24 * 4)
/
Table created.
SQL> insert into t values (to_date('2011-09-15 11:11:11', 'YYYY-MM-DD HH24:Mi:SS'));
1 row created.
SQL> commit;
Commit complete.
T now contains a row every quarter hour for a day, plus one additional row at 11:11:11 AM. The query proceeds in three steps. Step 1 gets, for every row, the number of rows from that row's time up to one hour later:
with x as (select date_page_requested
, count(*) over (order by date_page_requested
range between current row
and interval '1' hour following) as hour_count
from t)
Step 2 ranks the rows by hour_count:
, y as (select date_page_requested
, hour_count
, row_number() over (order by hour_count desc, date_page_requested asc) as rn
from x)
Step 3 selects the earliest row that has the greatest number of following rows:
select to_char(date_page_requested, 'YYYY-MM-DD HH24:Mi:SS')
, hour_count
from y
where rn = 1
If multiple 60 minute windows tie in hour count, the above will only give you the first window.
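The logical-offset window can be modelled with two pointers over the sorted timestamps: for each row, an end pointer advances while the next timestamp is at most one hour later (the frame is closed at both ends, like RANGE BETWEEN CURRENT ROW AND INTERVAL '1' HOUR FOLLOWING), and the earliest row with the highest count wins. This Python sketch, with a hypothetical function name, rebuilds the test table described above: quarter-hour rows for 2011-09-15 plus one row at 11:11:11.

```python
from datetime import datetime, timedelta


def busiest_window(timestamps, width=timedelta(hours=1)):
    """Return (start, count) for the earliest row whose closed window
    [start, start + width] contains the most rows."""
    ts = sorted(timestamps)
    best_start, best_count = None, 0
    end = 0
    for i, start in enumerate(ts):
        end = max(end, i)
        # the end pointer only moves forward because starts are sorted
        while end + 1 < len(ts) and ts[end + 1] <= start + width:
            end += 1
        count = end - i + 1
        if count > best_count:  # strict ">" keeps the earliest of tied windows
            best_start, best_count = start, count
    return best_start, best_count
```

On that data the result is the window starting at 10:15 with 6 rows (10:15, 10:30, 10:45, 11:00, 11:11:11 and 11:15); every earlier quarter-hour window holds only 5.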
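This should give you what you need: the first row returned should have the hour with the highest number of pages. Note that it counts fixed clock hours (for example 10:00 to 10:59), not every possible 60-minute window.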
select number_of_pages
,hour_requested
from (select to_char(date_page_requested,'dd/mm/yyyy hh24') hour_requested
,count(*) number_of_pages
from pages
group by to_char(date_page_requested,'dd/mm/yyyy hh24')) p
order by number_of_pages desc
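The difference between clock-hour buckets and sliding windows is easy to check with a small Python sketch (hypothetical function and sample data): four requests at 10:50, 10:55, 11:05 and 11:10 all fall within 20 minutes of each other, yet grouping by clock hour splits them into two buckets of 2.

```python
from collections import Counter
from datetime import datetime


def busiest_clock_hour(timestamps):
    """Group timestamps by 24-hour clock hour and return (hour_start, count)
    for the busiest hour, the earliest one on ties."""
    buckets = Counter(t.replace(minute=0, second=0, microsecond=0) for t in timestamps)
    # max() returns the first maximal item, and the items are sorted by hour
    return max(sorted(buckets.items()), key=lambda item: item[1])
```

So the clock-hour query reports a peak of 2 for this data, while the true busiest 60-minute period contains all 4 requests.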
How about something like this?
SELECT TOP 1
ranges.date_start,
COUNT(data.page) AS Tally
FROM (SELECT DISTINCT
date_page_requested AS date_start,
DATEADD(HOUR,1,date_page_requested) AS date_end
FROM #Table) ranges
JOIN #Table data
ON data.date_page_requested >= ranges.date_start
AND data.date_page_requested < ranges.date_end
GROUP BY ranges.date_start
ORDER BY Tally DESC
For PostgreSQL, I'd probably first write something like this, for a "window" aligned on the minute. You don't need OLAP window functions for this.
select w.ts,
date_trunc('minute', w.ts) as hour_start,
date_trunc('minute', w.ts) + interval '1' hour as hour_end,
(select count(*)
from weblog
where ts between date_trunc('minute', w.ts) and
(date_trunc('minute', w.ts) + interval '1' hour) ) as num_pages
from weblog w
group by ts, hour_start, hour_end
order by num_pages desc
Oracle also has a TRUNC() function for dates; TRUNC(ts, 'MI') truncates to the minute, playing the same role as date_trunc('minute', ts) above.
WITH ranges AS
( SELECT
date_page_requested AS StartDate,
date_page_requested + (1/24) AS EndDate,
ROW_NUMBER() OVER(ORDER BY date_page_requested) AS RowNo
FROM
#Table
)
SELECT
a.StartDate AS StartDate,
MAX(b.RowNo) - a.RowNo + 1 AS Tally
FROM
ranges a
JOIN
ranges b
ON a.StartDate <= b.StartDate
AND b.StartDate < a.EndDate
GROUP BY a.StartDate
, a.RowNo
ORDER BY Tally DESC
or:
WITH ranges AS
( SELECT
date_page_requested AS StartDate,
date_page_requested + (1/24) AS EndDate,
ROW_NUMBER() OVER(ORDER BY date_page_requested) AS RowNo
FROM
#Table
)
SELECT
a.StartDate AS StartDate,
( SELECT MIN(b.RowNo) - a.RowNo
FROM ranges b
WHERE b.StartDate > a.EndDate
) AS Tally
FROM
ranges a
ORDER BY Tally DESC
