Postgres timeline simulator - performance

I want to order search results by (age group, rank), and have age groups of 1 day, 1 week, 1 month, 6 months etc. I know I can get the "days old" with
SELECT NOW()::DATE - created_at::DATE FROM blah
and am thinking to do a CASE statement based on that, but am I barking up the right tree performance wise? Is there a nicer way?

You can also create a separate table with the interval definitions and labels. However, this comes at the cost of an extra join to get the data.
create table distance (
d_start int,
d_end int,
d_description varchar
);
insert into distance values
(1,7,'1 week'),
(8,30,'1 month'),
(31,180,'6 months'),
(181,365,'1 year'),
(366,999999,'more than one year')
;
with
sample_data as (
select *
from generate_series('2013-01-01'::date,'2014-01-01'::date,'1 day') created_at
)
select
created_at,
d_description
from
sample_data sd
join distance d on ((current_date-created_at::date) between d.d_start and d.d_end)
;

I ended up using this function to update an INT column stored on the table for performance reasons, and running an occasional update task. What's nice about this approach is that it's only necessary to run it against a small subset of the data once per hour (anything less than about 1 week old); every 24 hours I can run it against anything more than 1 week old (perhaps even a weekly task for even older stuff).
CREATE OR REPLACE FUNCTION age_group(_date timestamp) RETURNS int AS
$$
DECLARE
days_old int;
age_group int;
BEGIN
days_old := current_date - _date::DATE;
age_group := CASE
WHEN days_old < 2 THEN 0
WHEN days_old < 8 THEN 1
WHEN days_old < 30 THEN 2
WHEN days_old < 90 THEN 3
ELSE 4
END;
RETURN age_group;
END;
$$
LANGUAGE plpgsql;
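For anyone who wants to check the bucket boundaries outside the database, here is a minimal Python sketch of the same thresholds. It is only an illustration of the CASE logic above, not part of the original answer; the `BOUNDS` list and the explicit `today` parameter are assumptions made so the function is easy to test.

```python
from bisect import bisect_right
from datetime import date

# Exclusive upper bounds in days, copied from the CASE expression above.
BOUNDS = [2, 8, 30, 90]

def age_group(created: date, today: date) -> int:
    """Return the bucket index 0..4 for a row created on `created`."""
    days_old = (today - created).days
    # bisect_right counts how many bounds are <= days_old,
    # which is exactly the index of the first "days_old < bound" that holds.
    return bisect_right(BOUNDS, days_old)
```

Because the buckets come from a sorted list, adding a group (say 180 days) is a one-element change, which is the same idea as the lookup table in the other answer.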

Related

Oracle: Update values in table with aggregated values from same table

I am looking for a possibly better approach to this.
I have created a temp table in Oracle 11.2 that I'm using to pre calculate values that I will need in other selects instead of always generating them again with each select.
create global temporary table temp_foo (
DT timestamp(6), --only the date part will be used in this example but for later things I will need the time
Something varchar2(100),
Customer varchar2(100),
MinDate timestamp(6),
MaxDate timestamp(6),
Filecount int,
Errorcount int,
AvgFilecount int,
constraint PK_foo primary key (DT, Customer)
) on commit preserve rows;
I then first insert some fixed values for everything except AvgFilecount. AvgFilecount should contain the average for the Filecount for the 3 previous records (going by the date in DT). It doesn’t matter that the result will be converted to an int, I don’t need the decimal places
DT | Customer | Filecount | AvgFilecount
2019-04-30 | x | 10 | avg(2+3+9)
2019-04-29 | x | 2 | based on values before this
2019-04-28 | x | 3 | based on values before this
2019-04-27 | x | 9 | based on values before this
I thought about using a normal UPDATE statement, as this should be faster than looping through the values. I should mention that there are no gaps in the DT field, but obviously there is a first record for which I won’t find any previous records. If I looped through, I could easily calculate AvgFilecount with (the record before previous record/2 + previous record)/3, which I cannot do with UPDATE because I cannot guarantee the order in which the rows are processed. So I’m fine with just taking the last 3 records (going by DT) and calculating it from there.
What I thought would be an easy update is giving me headaches. I mostly work with SQL Server, where I would just join the 3 other records, but it seems to be a bit different in Oracle. I found https://stackoverflow.com/a/2446834/4040068 and wanted to use the second approach in the answer.
update
(select curr.DT, curr.Customer, curr.Filecount, curr.AvgFilecount as OLD, (coalesce(Minus1.Filecount, 0) + coalesce(Minus2.Filecount, 0) + coalesce(Minus3.Filecount, 0)) / 3 as NEW
from temp_foo curr
left join temp_foo Minus1 ON Minus1.Customer = curr.Customer and trunc(Minus1.DT) = trunc(curr.DT-1)
left join temp_foo Minus2 ON Minus2.Customer = curr.Customer and trunc(Minus2.DT) = trunc(curr.DT-2)
left join temp_foo Minus3 ON Minus3.Customer = curr.Customer and trunc(Minus3.DT) = trunc(curr.DT-3)
order by 1, 2
)
set OLD = NEW;
Which gives me an
ORA-01779: cannot modify a column which maps to a non key-preserved table
01779. 00000 - "cannot modify a column which maps to a non key-preserved table"
*Cause: An attempt was made to insert or update columns of a join view which
map to a non-key-preserved table.
*Action: Modify the underlying base tables directly.
I thought this should work as both join conditions are in the primary key and thus unique. I am currently implementing the first approach in the above mentioned answer but it is getting quite big and it feels like there should be a better solution to this.
Other things I thought about trying:
using a nested subselect (nested because Oracle doesn’t know top(n) and I need to sort the subselect) to select the previous 3 records ordered by DT, then the outer select with rownum <= 3, and then I could just use AVG(). However, I was told subselects can be quite slow and joins are better in Oracle performance-wise. I don’t know if that is really the case; I haven’t done any testing.
Edit: My insert right now looks like this. I am already aggregating the Filecount for a day as there can be multiple records per DT per Customer per Something.
insert into temp_foo (DT, Something, Customer, Filecount)
select dates.DT, tbl1.Something, tbl1.Customer, coalesce(sum(tbl3.Filecount),0)
from table(Function_Returning_Daterange(NULL, NULL)) dates
cross join
(SELECT Something,
Code,
Value
FROM Table2 tbl2
WHERE (Something = 'Value')) tbl1
left outer join Table3 tbl3
on tbl3.Customer = tbl1.Customer
and trunc(tbl3.MinDate) = trunc(dates.DT)
group by dates.DT, tbl1.Something, tbl1.Customer;
You could use an analytic average with a window clause:
select dt, customer, filecount,
avg(filecount) over (partition by customer order by dt
rows between 3 preceding and 1 preceding) as avgfilecount
from tmp_foo
order by dt desc;
DT CUSTOMER FILECOUNT AVGFILECOUNT
---------- -------- ---------- ------------
2019-04-30 x 10 4.66666667
2019-04-29 x 2 6
2019-04-28 x 3 9
2019-04-27 x 9
and then do the update part with a merge statement:
merge into tmp_foo t
using (
select dt, customer,
avg(filecount) over (partition by customer order by dt
rows between 3 preceding and 1 preceding) as avgfilecount
from tmp_foo
) s
on (s.dt = t.dt and s.customer = t.customer)
when matched then update set t.avgfilecount = s.avgfilecount;
4 rows merged.
select dt, customer, filecount, avgfilecount
from tmp_foo
order by dt desc;
DT CUSTOMER FILECOUNT AVGFILECOUNT
---------- -------- ---------- ------------
2019-04-30 x 10 4.66666667
2019-04-29 x 2 6
2019-04-28 x 3 9
2019-04-27 x 9
You haven't shown your original insert statement; it might be possible to add the analytic calculation to that, and avoid the separate update step.
Also, if you want the first two date values to be calculated as if the 'missing' extra days before them had zero counts, you could use sum and division instead of avg:
select dt, customer, filecount,
sum(filecount) over (partition by customer order by dt
rows between 3 preceding and 1 preceding)/3 as avgfilecount
from tmp_foo
order by dt desc;
DT CUSTOMER FILECOUNT AVGFILECOUNT
---------- -------- ---------- ------------
2019-04-30 x 10 4.66666667
2019-04-29 x 2 4
2019-04-28 x 3 3
2019-04-27 x 9
It depends what you expect those last calculated values to be.
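To make the difference between the two window calculations concrete, here is a small Python sketch that reproduces both result columns from the sample counts. It is an illustration only; the function name `trailing_stats` and the oldest-to-newest list input are assumptions, not anything from the Oracle side.

```python
from typing import List, Optional, Tuple

def trailing_stats(
    counts: List[int], window: int = 3
) -> Tuple[List[Optional[float]], List[Optional[float]]]:
    """For each position, look at up to `window` preceding values.

    `counts` must be ordered oldest to newest (like ORDER BY dt).
    Returns (avg over the rows that exist, sum divided by the full window).
    """
    avgs: List[Optional[float]] = []
    padded: List[Optional[float]] = []
    for i in range(len(counts)):
        prev = counts[max(0, i - window):i]  # rows between 3 preceding and 1 preceding
        if not prev:
            # No earlier rows: both SQL variants return NULL here.
            avgs.append(None)
            padded.append(None)
        else:
            avgs.append(sum(prev) / len(prev))  # avg(): ignores missing days
            padded.append(sum(prev) / window)   # sum()/3: missing days count as 0
    return avgs, padded
```

With the sample data in date order (9, 3, 2, 10), the first list matches the avg() output and the second matches the sum()/3 output; they only differ for the rows that have fewer than three predecessors.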

Referencing value from select column in where clause : Oracle

My tables are as below
MS_ISM_ISSUE
ISSUE_ID ISSUE_DUE_DATE ISSUE_SOURCE_TYPE
I1 25-11-2018 1
I2 25-12-2018 1
I3 27-03-2019 2
MS_ISM_SOURCE_SETUP
SOURCE_ID MODULE_NAME
1 IT-Compliance
2 Risk Assessment
I have written following query.
with rs as
(select
count(ISSUE_ID) as ISSUE_COUNT, src.MODULE_NAME,
case
when ISSUE_DUE_DATE<sysdate then 'Overdue'
when ISSUE_DUE_DATE between sysdate and sysdate + 90 then 'Within 3 months'
when ISSUE_DUE_DATE>sysdate+90 then 'Beyond 90 days'
end as date_range
from MS_ISM_ISSUE issue, MS_ISM_SOURCE_SETUP src
where issue.Issue_source_type = src.source_id
group by src.MODULE_NAME, case
when ISSUE_DUE_DATE<sysdate then 'Overdue'
when ISSUE_DUE_DATE between sysdate and sysdate + 90 then 'Within 3 months'
when ISSUE_DUE_DATE>sysdate+90 then 'Beyond 90 days'
end)
select ISSUE_COUNT,MODULE_NAME, DATE_RANGE,
(select count(ISSUE_COUNT) from rs where rs.MODULE_NAME=MODULE_NAME) as total from rs;
The output of the code is as below.
ISSUE_COUNT MODULE_NAME DATE_RANGE Total
1 IT-Compliance Overdue 3
1 IT-Compliance Within 3 months 3
1 Risk Assessment Beyond 90 days 3
The result is correct till 3rd column. In 4th column what I want is, total of Issue count for given module name. Hence in above case Total column will have value as 2 for first and second row (since there are 2 Issues for IT-Compliance) and value 1 for the third row (since one issue is present for Risk Assessment).
Essentially, I want to achieve is to replace current row's MODULE_NAME in last where clause. How do I achieve this using query?
OK, this condition
where rs.MODULE_NAME=MODULE_NAME
is essentially the same as if you wrote
where MODULE_NAME = MODULE_NAME
which is simply always true (if there are no nulls in module_name).
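Try using a different table alias for the inner query and the outer query. Also note that count(ISSUE_COUNT) counts the rows of rs (one per date range), not the issues; it only happens to give the right answer on your sample data. To total the issues per module, use sum, e.g.
select sum(ISSUE_COUNT) from rs rs2 where rs2.MODULE_NAME=rs.MODULE_NAME
You can also use an analytic function here, something like
select ISSUE_COUNT,
MODULE_NAME,
DATE_RANGE,
SUM(ISSUE_COUNT) OVER (PARTITION BY RS.MODULE_NAME) AS TOTAL
from rs
instead of your subquery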
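As a sanity check on what the per-module total should contain, here is a rough Python sketch of the partitioned total, using the three rows from the question's output. The tuple layout and the function name are assumptions for illustration only.

```python
from collections import defaultdict

# (ISSUE_COUNT, MODULE_NAME, DATE_RANGE), as in the question's output.
rows = [
    (1, "IT-Compliance", "Overdue"),
    (1, "IT-Compliance", "Within 3 months"),
    (1, "Risk Assessment", "Beyond 90 days"),
]

def with_module_totals(rows):
    """Append the per-module issue total to every row."""
    totals = defaultdict(int)
    for issue_count, module, _ in rows:
        totals[module] += issue_count  # accumulate issues per module
    # Second pass: every row gets the total of its own module.
    return [(c, m, r, totals[m]) for c, m, r in rows]
```

The totals are keyed by each row's own module name, which is what the correlated subquery needs a distinct alias for.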

Showing Different aggregate for different products

I want code that generates an aggregate product by product. The aggregate can be Year to Date (YTD), Month to Date (MTD) or Quarter to Date (QTD). The user passes a parameter, and on that basis the code should decide which kind of output the user wants.
If the year is passed in the parameter, then the code should generate the aggregate from the start of the year to sysdate.
If the quarter is passed in the parameter, then the code should generate the aggregate from the start of the quarter to sysdate.
If the month is passed in the parameter, then the code should generate the aggregate from the start of the month to sysdate.
That is, based on the parameter, the code should decide which of those 3 aggregates the user wants. My input data looks like this:
Product Table
Product_ID Product_name Price
1 Mobile 200
2 T.V. 400
3 Mixer 300
and
Sales Table-
Product_ID Sales_Date Quantity
1 01-01-2015 30
2 03-01-2015 40
3 06-02-2015 10
1 22-03-2015 30
2 09-04-2015 10
3 21-05-2015 40
1 04-06-2015 40
2 29-07-2015 30
1 31-08-2015 30
3 14-09-2015 30
My output contains 3 columns: Product_id, Product_Name and Total. The Total column, i.e. the total amount (quantity*price), has to be calculated on the basis of the input given by the user, something like this:
For example ,
If pro_test is the procedure then
call pro_test('YTD') -- Should Return the ProductWise YTD,
call pro_test('QTD') -- Should Return the ProductWise QTD and so on..
You are looking for a WHERE clause :-) List your conditions with OR and you are done.
select
p.product_id,
p.product_name,
coalesce(sum(s.quantity * p.price), 0) as total
from product p
left join sales s on s.product_id = p.product_id
where
(:aggregate = 'YTD' and to_char(s.sales_date, 'yyyy') = to_char(sysdate, 'yyyy'))
or
(:aggregate = 'MTD' and to_char(s.sales_date, 'yyyymm') = to_char(sysdate, 'yyyymm'))
or
(:aggregate = 'QTD' and to_char(s.sales_date, 'yyyyq') = to_char(sysdate, 'yyyyq'))
group by p.product_id, p.product_name;
EDIT: Here is what the corresponding PL/SQL function would look like:
create or replace function matches_date_aggregate(in_sales_date date, in_aggregate char)
return integer as
begin
if (in_aggregate = 'YTD' and to_char(in_sales_date, 'yyyy') = to_char(sysdate, 'yyyy'))
or (in_aggregate = 'MTD' and to_char(in_sales_date, 'yyyymm') = to_char(sysdate, 'yyyymm'))
or (in_aggregate = 'QTD' and to_char(in_sales_date, 'yyyyq') = to_char(sysdate, 'yyyyq')) then
return 1;
else
return 0;
end if;
end matches_date_aggregate;
Your query's WHERE clause would become:
where matches_date_aggregate(s.sales_date, :aggregate) = 1
The function cannot return BOOLEAN unfortunately, for even though Oracle's PL/SQL knows the BOOLEAN data type, Oracle SQL doesn't.
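The period test itself is easy to try outside the database. Below is a minimal Python sketch of the same three comparisons; it is an illustration, not the Oracle function. The explicit `today` argument stands in for sysdate and is an assumption made for testability.

```python
from datetime import date

def quarter(d: date) -> int:
    """Calendar quarter 1..4, like to_char(d, 'q')."""
    return (d.month - 1) // 3 + 1

def matches_date_aggregate(sales_date: date, aggregate: str, today: date) -> bool:
    """True if sales_date falls in the same year/month/quarter as today."""
    same_year = sales_date.year == today.year
    if aggregate == "YTD":
        return same_year
    if aggregate == "MTD":
        return same_year and sales_date.month == today.month
    if aggregate == "QTD":
        return same_year and quarter(sales_date) == quarter(today)
    return False  # unknown aggregate code: nothing matches
```

Like the SQL, this compares period labels only, so a sale dated later in the current period than today would also match; add an upper bound if that matters.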

Sum of INTERVAL DAY in Oracle [duplicate]

I am trying to sum INTERVAL. E.g.
SELECT SUM(TIMESTAMP1 - TIMESTAMP2) FROM DUAL
Is it possible to write a query that would work both on Oracle and SQL Server? If so, how?
Edit: changed DATE to INTERVAL
I'm afraid you're going to be out of luck with a solution which works in both Oracle and MSSQL. Date arithmetic is something which is very different on the various flavours of DBMS.
Anyway, in Oracle we can use dates in straightforward arithmetic. And we have a function NUMTODSINTERVAL which turns a number into a DAY TO SECOND INTERVAL. So let's put them together.
Simple test data, two rows with pairs of dates roughly twelve hours apart:
SQL> alter session set nls_date_format = 'dd-mon-yyyy hh24:mi:ss'
2 /
Session altered.
SQL> select * from t42
2 /
D1 D2
-------------------- --------------------
27-jul-2010 12:10:26 27-jul-2010 00:00:00
28-jul-2010 12:10:39 28-jul-2010 00:00:00
SQL>
Simple SQL query to find the sum of elapsed time:
SQL> select numtodsinterval(sum(d1-d2), 'DAY')
2 from t42
3 /
NUMTODSINTERVAL(SUM(D1-D2),'DAY')
-----------------------------------------------------
+000000001 00:21:04.999999999
SQL>
Just over a day, which is what we would expect.
"Edit: changed DATE to INTERVAL"
Working with TIMESTAMP columns is a little more laborious, but we can still work the same trick.
In the following sample, T42T is the same as T42, only the columns have TIMESTAMP rather than DATE for their datatype. The query extracts the various components of the DS INTERVAL and converts them into seconds, which are then summed and converted back into an INTERVAL:
SQL> select numtodsinterval(
2 sum(
3 extract (day from (t1-t2)) * 86400
4 + extract (hour from (t1-t2)) * 3600
5 + extract (minute from (t1-t2)) * 60
6 + extract (second from (t1-t2))
7 ), 'SECOND')
8 from t42t
9 /
NUMTODSINTERVAL(SUM(EXTRACT(DAYFROM(T1-T2))*86400+EXTRACT(HOURFROM(T1-T2))*
---------------------------------------------------------------------------
+000000001 00:21:05.000000000
SQL>
At least this result is in round seconds!
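The arithmetic behind that query can be checked by hand or with a few lines of Python. This is a sketch using the two sample rows shown earlier, not a replacement for the SQL; the `pairs` list and function name are assumptions for illustration.

```python
from datetime import datetime, timedelta

# (end, start) pairs matching the two sample rows above.
pairs = [
    (datetime(2010, 7, 27, 12, 10, 26), datetime(2010, 7, 27)),
    (datetime(2010, 7, 28, 12, 10, 39), datetime(2010, 7, 28)),
]

def total_elapsed(pairs) -> timedelta:
    """Sum the differences by converting each component to seconds."""
    total_seconds = 0
    for end, start in pairs:
        delta = end - start
        hours, rem = divmod(delta.seconds, 3600)
        minutes, seconds = divmod(rem, 60)
        # Same weights as the EXTRACT terms: day, hour, minute, second.
        total_seconds += delta.days * 86400 + hours * 3600 + minutes * 60 + seconds
    return timedelta(seconds=total_seconds)
```

The weights are the whole trick: a day is 86400 seconds, an hour 3600 and a minute 60, so a wrong multiplier shows up directly as a shifted total.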
Ok, after a bit of hell, with the help of the stackoverflowers' answers I've found the solution that fits my needs.
SELECT
SUM(CAST((DATE1 + 0) - (DATE2 + 0) AS FLOAT)) AS SUM_TURNAROUND
FROM MY_BEAUTIFUL_TABLE
GROUP BY YOUR_CHOSEN_COLUMN
This returns a float (which is totally fine for me) that represents days on both Oracle and SQL Server.
The reason I added zero to both DATEs is because in my case date columns on Oracle DB are of TIMESTAMP type and on SQL Server are of DATETIME type (which is obviously weird). So adding zero to TIMESTAMP on Oracle works just like casting to date and it does not have any effect on SQL Server DATETIME type.
Thank you guys! You were really helpful.
You can't sum two datetimes. It wouldn't make sense - i.e. what does 15:00:00 plus 23:59:00 equal? Some time the next day? etc
But you can add a time increment by using a function like Dateadd() in SQL Server.
In SQL Server as long as your individual timespans are all less than 24 hours you can do something like
WITH TIMES AS
(
SELECT CAST('01:01:00' AS DATETIME) AS TimeSpan
UNION ALL
SELECT '00:02:00'
UNION ALL
SELECT '23:02:00'
UNION ALL
SELECT '17:02:00'
--UNION ALL SELECT '24:02:00' /*This line would fail!*/
),
SummedTimes As
(
SELECT cast(SUM(CAST(TimeSpan AS FLOAT)) as datetime) AS [Summed] FROM TIMES
)
SELECT
FLOOR(CAST(Summed AS FLOAT)) AS D,
DATEPART(HOUR,[Summed]) AS H,
DATEPART(MINUTE,[Summed]) AS M,
DATEPART(SECOND,[Summed]) AS S
FROM SummedTimes
Gives
D H M S
----------- ----------- ----------- -----------
1 17 7 0
If you wanted to handle timespans greater than 24 hours I think you'd need to look at CLR integration and the TimeSpan structure. Definitely not portable!
Edit: SQL Server 2008 has a DateTimeOffset datatype that might help but that doesn't allow either SUMming or being cast to float
I also do not think this is possible. Go with a custom solution that calculates the date value according to your preferences.
You can also use this:
select
EXTRACT (DAY FROM call_end_Date - call_start_Date)*86400 +
EXTRACT (HOUR FROM call_end_Date - call_start_Date)*3600 +
EXTRACT (MINUTE FROM call_end_Date - call_start_Date)*60 +
extract (second FROM call_end_Date - call_start_Date) as interval
from table;
You can write your own aggregate function :-). Please read carefully http://docs.oracle.com/cd/B19306_01/appdev.102/b14289/dciaggfns.htm
You must create an object type and its body following the template, and then an aggregate function that uses this object:
create or replace type Sum_Interval_Obj as object
(
-- Object for creating and support custom aggregate function
duration interval day to second, -- In this property You sum all interval
-- Object Init
static function ODCIAggregateInitialize(
actx IN OUT Sum_Interval_Obj
) return number,
-- Iterate getting values from dataset
member function ODCIAggregateIterate(
self IN OUT Sum_Interval_Obj,
ad_interval IN interval day to second
) return number,
-- Merge parallel summed data
member function ODCIAggregateMerge(
self IN OUT Sum_Interval_Obj,
ctx2 IN Sum_Interval_Obj
) return number,
-- End of query, returning summary result
member function ODCIAggregateTerminate
(
self IN Sum_Interval_Obj,
returnValue OUT interval day to second,
flags IN number
) return number
)
/
create or replace type body Sum_Interval_Obj is
-- Object Init
static function ODCIAggregateInitialize(
actx IN OUT Sum_Interval_Obj
) return number
is
begin
actx := Sum_Interval_Obj(numtodsinterval(0,'SECOND'));
return ODCIConst.Success;
end ODCIAggregateInitialize;
-- Iterate getting values from dataset
member function ODCIAggregateIterate(
self IN OUT Sum_Interval_Obj,
ad_interval IN interval day to second
) return number
is
begin
self.duration := self.duration + ad_interval;
return ODCIConst.Success;
exception
when others then
return ODCIConst.Error;
end ODCIAggregateIterate;
-- Merge parallel calculated intervals
member function ODCIAggregateMerge(
self IN OUT Sum_Interval_Obj,
ctx2 IN Sum_Interval_Obj
) return number
is
begin
self.duration := self.duration + ctx2.duration; -- Add two intervals
-- return = All Ok!
return ODCIConst.Success;
exception
when others then
return ODCIConst.Error;
end ODCIAggregateMerge;
-- End of query, returning summary result
member function ODCIAggregateTerminate(
self IN Sum_Interval_Obj,
returnValue OUT interval day to second,
flags IN number
) return number
is
begin
-- return = All Ok, too!
returnValue := self.duration;
return ODCIConst.Success;
end ODCIAggregateTerminate;
end;
/
-- Your own new aggregate function:
CREATE OR REPLACE FUNCTION Sum_Interval(
a_Interval interval day to second
) RETURN interval day to second
PARALLEL_ENABLE AGGREGATE USING Sum_Interval_Obj;
/
Last, check your function:
select sum_interval(duration)
from (select numtodsinterval(1,'SECOND') as duration from dual union all
select numtodsinterval(1,'MINUTE') as duration from dual union all
select numtodsinterval(1,'HOUR') as duration from dual union all
select numtodsinterval(1,'DAY') as duration from dual);
Finally, you can create a SUM function, if you want.

T-SQL Use Table Variable or Sum Against Parent Table

The scenario is this, I am creating a log table that will end up being quite large once it is all said and done and I want to create a status table that will query from the table with different date ranges and sum the results into multiple total fields.
I plan on writing this as a stored procedure, but my question is: would I get the best performance by reading all my records from the log table into a temp table before doing the sum operations?
IE I have this table:
SummaryValues
90DayValues
60DayValues
30DayValues
14DayValues
7DayValues
1DayValues
Would it be logical to take all values for the previous 90 days and insert them into a table variable before calculating the sums for the 6 fields in my summary table, or would it be just as fast to execute 6 sum statements against the log table?
Sometimes you are better off reading into a temp table first, sometimes not. It makes sense if you do multiple passes of processing over the same data.
However, if you want "last 90 days", "last 60 days" etc., then it can be done in one query.
Reading the question again, I'd just run one query and calculate all values in one go, and not bother with any intermediate tables.
SELECT
Stuff,
SUM(CASE WHEN dayDiff <= 90 THEN SomeValue ELSE 0 END) AS SumValue90,
SUM(CASE WHEN dayDiff <= 60 THEN SomeValue ELSE 0 END) AS SumValue60,
SUM(CASE WHEN dayDiff <= 30 THEN SomeValue ELSE 0 END) AS SumValue30
FROM
(
SELECT
Stuff,
SomeValue,
DATEDIFF(day, SomeData, GETDATE()) AS dayDiff
FROM
Mytable
WHERE
...
) foo
GROUP BY
...
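The single-pass idea is not specific to T-SQL. Here is a minimal Python sketch of the same conditional sums, assuming the log rows arrive as (date, value) pairs; the function name, the `windows` tuple and the explicit `today` argument are all assumptions for illustration.

```python
from datetime import date

def window_sums(rows, today: date, windows=(90, 60, 30)):
    """One pass over the rows; each row is added to every window it falls in."""
    sums = {w: 0 for w in windows}
    for some_date, value in rows:
        day_diff = (today - some_date).days  # like DATEDIFF(day, SomeData, GETDATE())
        for w in windows:
            if day_diff <= w:  # same test as CASE WHEN dayDiff <= w
                sums[w] += value
    return sums
```

Each row is read once no matter how many windows there are, which is why the intermediate table is unnecessary; a filter that limits the scan to the widest window keeps the read small.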
