Oracle: Update values in table with aggregated values from same table

I am looking for a possibly better approach to this.
I have created a temp table in Oracle 11.2 that I'm using to pre-calculate values I will need in several other selects, instead of generating them again with each select.
create global temporary table temp_foo (
  DT timestamp(6), -- only the date part will be used in this example, but for later things I will need the time
  Something varchar2(100),
  Customer varchar2(100),
  MinDate timestamp(6),
  MaxDate timestamp(6),
  Filecount int,
  Errorcount int,
  AvgFilecount int,
  constraint PK_foo primary key (DT, Customer)
) on commit preserve rows;
I first insert fixed values for everything except AvgFilecount. AvgFilecount should contain the average of Filecount over the 3 previous records (going by the date in DT). It doesn't matter that the result will be converted to an int; I don't need the decimal places.
DT         | Customer | Filecount | AvgFilecount
2019-04-30 | x        | 10        | avg(2+3+9)
2019-04-29 | x        | 2         | based on the values before this
2019-04-28 | x        | 3         | based on the values before this
2019-04-27 | x        | 9         | based on the values before this
I thought about using a normal UPDATE statement, as this should be faster than looping through the values. I should mention that there are no gaps in the DT field, but obviously there is a first record for which I won't find any previous records. If I were looping through, I could easily calculate AvgFilecount from the values already computed for the previous rows, which I cannot do with an UPDATE, as I cannot guarantee the order in which the rows are processed. So I'm fine with just taking the last 3 records (going by DT) and calculating it from there.
What I thought would be an easy update is giving me headaches. I mostly work with SQL Server, where I would just join the 3 other records, but it seems to be a bit different in Oracle. I found https://stackoverflow.com/a/2446834/4040068 and wanted to use the second approach from that answer.
update
  (select curr.DT, curr.Customer, curr.Filecount, curr.AvgFilecount as OLD,
          (coalesce(Minus1.Filecount, 0) + coalesce(Minus2.Filecount, 0) + coalesce(Minus3.Filecount, 0)) / 3 as NEW
   from temp_foo curr
   left join temp_foo Minus1 on Minus1.Customer = curr.Customer and trunc(Minus1.DT) = trunc(curr.DT-1)
   left join temp_foo Minus2 on Minus2.Customer = curr.Customer and trunc(Minus2.DT) = trunc(curr.DT-2)
   left join temp_foo Minus3 on Minus3.Customer = curr.Customer and trunc(Minus3.DT) = trunc(curr.DT-3)
   order by 1, 2)
set OLD = NEW;
Which gives me an
ORA-01779: cannot modify a column which maps to a non key-preserved
table
01779. 00000 - "cannot modify a column which maps to a non key-preserved table"
*Cause: An attempt was made to insert or update columns of a join view which
map to a non-key-preserved table.
*Action: Modify the underlying base tables directly.
I thought this should work as both join conditions are in the primary key and thus unique. I am currently implementing the first approach in the above mentioned answer but it is getting quite big and it feels like there should be a better solution to this.
Other things I thought about trying:
using a nested subselect (nested because Oracle doesn't know top(n) and I need to sort in an inner select) to select the previous 3 records ordered by DT, then an outer select with rownum <= 3, and then I could just use AVG(). However, I was told subselects can be quite slow and that joins perform better in Oracle. I don't know if that is really the case; I haven't done any testing. (A sketch of what I mean is below.)
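For reference, here is a sketch of the nested subselect I have in mind, for a single row only, with bind variables standing in for the current row's Customer and DT (both placeholders, since 11.2 has no top(n) or fetch first):
select avg(Filecount) as AvgFilecount
from (select Filecount
      from temp_foo prev
      where prev.Customer = :customer
        and trunc(prev.DT) < trunc(:dt)
      order by prev.DT desc)
where rownum <= 3;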
Edit: My insert currently looks like this. I am already aggregating Filecount per day, as there can be multiple records per DT per Customer per Something.
insert into temp_foo (DT, Something, Customer, Filecount)
select dates.DT, tbl1.Something, tbl1.Customer, coalesce(sum(tbl3.Filecount), 0)
from table(Function_Returning_Daterange(NULL, NULL)) dates
cross join
  (select Something,
          Customer, -- needed for the joins below
          Code,
          Value
   from Table2 tbl2
   where (Something = 'Value')) tbl1
left outer join Table3 tbl3
  on tbl3.Customer = tbl1.Customer
  and trunc(tbl3.MinDate) = trunc(dates.DT)
group by dates.DT, tbl1.Something, tbl1.Customer;

You could use an analytic average with a window clause:
select dt, customer, filecount,
       avg(filecount) over (partition by customer order by dt
                            rows between 3 preceding and 1 preceding) as avgfilecount
from temp_foo
order by dt desc;
DT         CUSTOMER  FILECOUNT AVGFILECOUNT
---------- -------- ---------- ------------
2019-04-30 x                10   4.66666667
2019-04-29 x                 2            6
2019-04-28 x                 3            9
2019-04-27 x                 9
and then do the update part with a merge statement:
merge into temp_foo t
using (
  select dt, customer,
         avg(filecount) over (partition by customer order by dt
                              rows between 3 preceding and 1 preceding) as avgfilecount
  from temp_foo
) s
on (s.dt = t.dt and s.customer = t.customer)
when matched then update set t.avgfilecount = s.avgfilecount;
4 rows merged.
select dt, customer, filecount, avgfilecount
from temp_foo
order by dt desc;
DT         CUSTOMER  FILECOUNT AVGFILECOUNT
---------- -------- ---------- ------------
2019-04-30 x                10   4.66666667
2019-04-29 x                 2            6
2019-04-28 x                 3            9
2019-04-27 x                 9
If it is possible to add the analytic calculation to your insert statement, you can avoid the separate update step entirely.
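Based on the insert shown in your edit, that could look something like this - a sketch, untested, with your aggregating query simply wrapped so the analytic function runs over the aggregated rows (I trimmed the Table2 subquery to the referenced columns):
insert into temp_foo (DT, Something, Customer, Filecount, AvgFilecount)
select DT, Something, Customer, Filecount,
       avg(Filecount) over (partition by Customer order by DT
                            rows between 3 preceding and 1 preceding)
from (
  select dates.DT, tbl1.Something, tbl1.Customer,
         coalesce(sum(tbl3.Filecount), 0) as Filecount
  from table(Function_Returning_Daterange(NULL, NULL)) dates
  cross join (select Something, Customer
              from Table2
              where Something = 'Value') tbl1
  left outer join Table3 tbl3
    on tbl3.Customer = tbl1.Customer
    and trunc(tbl3.MinDate) = trunc(dates.DT)
  group by dates.DT, tbl1.Something, tbl1.Customer
);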
Also, if you want the first two date values to be calculated as if the 'missing' extra days before them had zero counts, you could use sum and division instead of avg:
select dt, customer, filecount,
       sum(filecount) over (partition by customer order by dt
                            rows between 3 preceding and 1 preceding)/3 as avgfilecount
from temp_foo
order by dt desc;
DT         CUSTOMER  FILECOUNT AVGFILECOUNT
---------- -------- ---------- ------------
2019-04-30 x                10   4.66666667
2019-04-29 x                 2            4
2019-04-28 x                 3            3
2019-04-27 x                 9
It depends what you expect those last calculated values to be.
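And if you would rather see 0 than null for the very first date, which has no preceding rows at all, you can wrap either version in coalesce:
select dt, customer, filecount,
       coalesce(avg(filecount) over (partition by customer order by dt
                                     rows between 3 preceding and 1 preceding),
                0) as avgfilecount
from temp_foo
order by dt desc;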

Related

Sort a list using loops

I'm still fairly new to PL/SQL and I have tried to solve the following problem on my own for a long time, but I cannot get to a solution (and I did not find a similar problem solved), hence I thought I'd ask here.
I am trying to sort a list by date (which indicates the 'time_left_to_live'), so that the result is the same list but, as you can guess, sorted by that date.
EDIT: Maybe I should be even more precise:
My original goal is to write an AFTER UPDATE TRIGGER which re-sorts the table whenever time_left_to_live is updated. My idea was to write a procedure that sorts (or rather updates) the list, and to call it from the trigger.
Example for clarification:
UPDATE testv1 SET time_left_to_live = '01.05.2020' WHERE list_id = 3;
so after that, the row whose list_id was 3 should get list_id 1, since it now has the shortest time left to live, and the list_ids of the other rows should shift accordingly:
1 01.05.2020
2 03.05.2020
3 31.05.2020
I hope I explained the situation reasonably well.
I tried it with a nested loop, but I simply can't wrap my head around it. Any help is appreciated.
My table:
DROP TABLE testv1;
DROP PROCEDURE wlp_sort;
CREATE TABLE testv1(
list_ID INT,
time_left_to_live DATE
);
INSERT INTO testv1 VALUES (3, '01.06.2020');
INSERT INTO testv1 VALUES (2, '31.05.2020');
INSERT INTO testv1 VALUES (1, '03.05.2020');
--UPDATE testv1 SET time_left_to_live = '01.05.2020' WHERE list_ID = 2;
SELECT * FROM testv1;
I'd suggest you not do that. I presume that the table you posted is exactly that - an example - and that reality is more complex. Furthermore, I presume that the list_id column identifies each row, which is OK, but it is wrong to use it for sorting purposes, especially the way you wanted - by updating its value via database triggers.
What should you do, then? Use ORDER BY, as it is the only mechanism that guarantees that rows will be returned in the desired order. A long time ago (I think the last such Oracle database version was 8i) GROUP BY happened to sort the result for you in the background, but those times have passed, so - ORDER BY it is.
Here's an example of what I mean.
This is the original contents of your table. The query includes an additional column - the one you might want to use - the row_number analytic function, which calculates the ordinal number on the fly. Initially it matches the list_id value.
SQL> select list_id,
2 time_left_to_live,
3 row_number() over (order by time_left_to_live) rn
4 from testv1
5 order by time_left_to_live;
   LIST_ID TIME_LEFT_         RN
---------- ---------- ----------
         1 03.05.2020          1
         2 31.05.2020          2
         3 01.06.2020          3
SQL>
Let's update one row (also from your example) and see what happens. Note that I used a date literal, while you updated a date column with a string; Oracle implicitly tried - and succeeded - to convert it to a valid date value, but you shouldn't rely on that. '01.05.2020' could be the 1st of May or the 5th of January, depending on the date format, which may change and might differ between databases. A date literal, on the other hand, is always written date 'yyyy-mm-dd', so there's no confusion:
SQL> update testv1 set time_left_to_live = date '2020-05-01' where list_id = 2;
1 row updated.
SQL> select list_id,
2 time_left_to_live,
3 row_number() over (order by time_left_to_live) rn
4 from testv1
5 order by time_left_to_live;
   LIST_ID TIME_LEFT_         RN
---------- ---------- ----------
         2 01.05.2020          1
         1 03.05.2020          2
         3 01.06.2020          3
SQL>
The ORDER BY clause sorts the data, but now list_id and rn differ; list_id didn't change, but rn reflects the new order.
If your next step is to do something with the row whose ordinal number is 1, you'd just use the query above as an inline view and fetch the row whose rn = 1:
SQL> select list_id,
2 time_left_to_live
3 from (select list_id,
4 time_left_to_live,
5 row_number() over (order by time_left_to_live) rn
6 from testv1
7 )
8 where rn = 1;
   LIST_ID TIME_LEFT_
---------- ----------
         2 01.05.2020
SQL>
I suggest you use this option.
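If you need that ordering in many places, one option is to put the ranked query into a view - just a sketch, the view name is made up - and select from that instead of maintaining list_id by trigger:
create or replace view v_testv1_ordered as
  select list_id,
         time_left_to_live,
         row_number() over (order by time_left_to_live) rn
  from testv1;

select * from v_testv1_ordered order by rn;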
An additional drawback relates to the database trigger itself. If you write an ordinary update statement that modifies list_id, no problem - it runs outside of a trigger, and list_id and rn are synchronized again:
SQL> update testv1 a set
2 a.list_id = (select x.rn
3 from (select b.list_id,
4 b.time_left_to_live,
5 row_number() over (order by b.time_left_to_live) rn
6 from testv1 b
7 ) x
8 where x.list_id = a.list_id
9 );
3 rows updated.
SQL> select list_id,
2 time_left_to_live,
3 row_number() over (order by time_left_to_live) rn
4 from testv1
5 order by time_left_to_live;
   LIST_ID TIME_LEFT_         RN
---------- ---------- ----------
         1 01.05.2020          1
         2 03.05.2020          2
         3 01.06.2020          3
SQL>
But, in a trigger, you modify a column, which fires the trigger, which modifies the column, which fires the trigger ... until you exceed the maximum number of recursive SQL levels, at which point Oracle raises an error.
Or, if you planned to select from the table within the trigger and then do something with it, you'll hit the mutating table error, as you can't select from a table that is currently being modified. True, you might use a compound trigger (or - in previous Oracle database versions - a package and a custom type), but - once again, in my opinion - that's just not the way you should handle this problem.
I fully agree with #Littlefoot. But even if you could build a user-written sort in a trigger, there is NO guarantee that the order is preserved when written. A table is by definition an unordered set of rows, and the SQL engine is under no obligation to maintain any ordering of inserted rows. The only way to guarantee a sequence is the ORDER BY clause.

update rows from multiple tables

I have two tables, affiliation and customer, containing data like this:
aff_id  From_cus_id
------  -----------
1       10
2       20
3       30
4       40
5       50

cust_id  cust_aff_id
-------  -----------
10
20
30
40
50
I need to update the cust_aff_id column with the corresponding aff_id from the affiliation table, like below:
cust_id  cust_aff_id
-------  -----------
10       1
20       2
30       3
40       4
50       5
Could anyone please suggest how to do this?
Oracle doesn't have an UPDATE with join syntax, but you can use a subquery instead:
UPDATE customer
SET customer.cust_aff_id =
    (SELECT aff_id FROM affiliation WHERE From_cus_id = customer.cust_id);
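Note that, as written, this updates every row in customer - rows with no match in affiliation get set to NULL. If that is not wanted, guard it with EXISTS:
UPDATE customer
SET customer.cust_aff_id =
    (SELECT aff_id FROM affiliation WHERE From_cus_id = customer.cust_id)
WHERE EXISTS
    (SELECT 1 FROM affiliation WHERE From_cus_id = customer.cust_id);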
Alternatively, with a merge statement:
merge into customer t2
using affiliation t1
on (t1.From_cus_id = t2.cust_id)
when matched then
  update set t2.cust_aff_id = t1.aff_id;
Here is an update with join syntax. This, quite reasonably, works only if from_cus_id is declared unique (e.g. as the primary key) in the affiliation table, so that customer is key-preserved in the join view; a foreign key on cust_id referencing it fits naturally as well. Without uniqueness the requirement doesn't make much sense in the first place anyway... but Oracle requires that the constraint be stated explicitly on the table, which is also reasonable on Oracle's part, IMO.
update
  (select t1.aff_id, t2.cust_aff_id
   from affiliation t1
   join customer t2 on t2.cust_id = t1.from_cus_id) j
set j.cust_aff_id = j.aff_id;
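If that uniqueness is not declared yet, something along these lines (the constraint name here is made up) is what makes customer key-preserved in the join view:
alter table affiliation
  add constraint affiliation_from_cus_uq unique (from_cus_id);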

UPDATE with JOIN syntax for Oracle Database

First, I execute the following SQL statements.
drop table names;
drop table ages;
create table names (id number, name varchar2(20));
insert into names values (1, 'Harry');
insert into names values (2, 'Sally');
insert into names values (3, 'Barry');
create table ages (id number, age number);
insert into ages values (1, 25);
insert into ages values (2, 30);
insert into ages values (3, 35);
select * from names;
select * from ages;
As a result, the following tables are created.
        ID NAME
---------- ----------
         1 Harry
         2 Sally
         3 Barry

        ID        AGE
---------- ----------
         1         25
         2         30
         3         35
Now, I want to increment the age of Sally by 1, i.e. set it to 31. The following query works fine.
update ages set age = age + 1 where id = (select id from names where name = 'Sally');
select * from ages;
The table now looks like this.
        ID        AGE
---------- ----------
         1         25
         2         31
         3         35
I want to know if there is a way it can be done by joins. For example, I tried the following queries but they fail.
SQL> update ages set age = age + 1 from ages, names where ages.id = names.id and names.name = 'Sally';
update ages set age = age + 1 from ages, names where ages.id = names.id and names.name = 'Sally'
*
ERROR at line 1:
ORA-00933: SQL command not properly ended
SQL> update ages set age = age + 1 from names join ages on ages.id = names.id where names.name = 'Sally';
update ages set age = age + 1 from names join ages on ages.id = names.id where names.name = 'Sally'
*
ERROR at line 1:
ORA-00933: SQL command not properly ended
The syntax of the UPDATE statement is documented here:
http://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_10007.htm
The update target is a dml_table_expression_clause, and the part to pay attention to is its ( subquery ) variant.
The subquery is the feature that allows you to perform an update over a join.
In its simplest form it can be:
UPDATE (
subquery-with-a-join
)
SET cola=colb
Before updating a join, you must know the restrictions listed here:
https://docs.oracle.com/cd/B28359_01/server.111/b28286/statements_8004.htm
The view must not contain any of the following constructs:
A set operator
A DISTINCT operator
An aggregate or analytic function
A GROUP BY, ORDER BY, MODEL, CONNECT BY, or START WITH clause
A collection expression in a SELECT list
A subquery in a SELECT list
A subquery designated WITH READ ONLY
Joins, with some exceptions, as documented in Oracle Database Administrator's Guide
and also the common rules related to updatable views - here (section: Updating a Join View):
http://docs.oracle.com/cd/B19306_01/server.102/b14231/views.htm#sthref3055
All updatable columns of a join view must map to columns of a
key-preserved table. See "Key-Preserved Tables" for a discussion of
key-preserved tables. If the view is defined with the WITH CHECK
OPTION clause, then all join columns and all columns of repeated
tables are not updatable.
We can first create a subquery with a join:
SELECT age
FROM ages a
JOIN names m ON a.id = m.id
WHERE m.name = 'Sally'
This query simply returns the following result:
       AGE
----------
        30
and now we can try to run our update:
UPDATE (
SELECT age
FROM ages a
JOIN names m ON a.id = m.id
WHERE m.name = 'Sally'
)
SET age = age + 1;
but we get an error:
SQL Error: ORA-01779: cannot modify a column which maps to a non key-preserved table
This error means that one of the above restrictions is not met (the key-preserved table requirement).
However if we add primary keys to our tables:
alter table names add primary key( id );
alter table ages add primary key( id );
then the update works without any error, and the final outcome is:
select * from ages;
        ID        AGE
---------- ----------
         1         25
         2         31
         3         35
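As an aside, a MERGE sidesteps the key-preserved requirement altogether; it only requires the source to produce at most one row per target row. A sketch for the same change:
merge into ages a
using (select id from names where name = 'Sally') n
on (a.id = n.id)
when matched then update set a.age = a.age + 1;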

Add next unique value to SQL column

I have two tables which I am trying to join based on two criteria. One of the criteria is that a date from t1 is between a date in t2 and the next date in t2. The other is that the name from t1 matches the name from t2.
I.e. if t2 looks like this:
Record  Name   Date
1       A1234  2016-01-03 04:58:00
2       A1234  2015-12-15 08:34:00
3       A5678  2016-01-04 03:14:00
4       A1234  2016-01-05 21:06:00
Then:
Any records from t1 for Name A1234 with dates between 2016-01-03 04:58:00 and 2016-01-05 21:06:00 would be joined to record 1.
Any records from t1 for Name A1234 with dates between 2015-12-15 08:34:00 and 2016-01-03 04:58:00 would be joined to record 2
Any records from t1 for A1234 after the date of record 4 would be joined to record 4
Any records from t1 for A5678 would be joined to record 3 because there's only one date.
My initial approach was to use a correlated subquery to find the next date. However, due to the large number of records, I determined this would take over a year to execute, because it searches all of t2 for the next later date during each iteration. Original SQLite:
CREATE TABLE outputtable AS
SELECT * FROM t1, t2 d
WHERE t1.Name = d.Name AND t1.Date BETWEEN d.Date AND (
    SELECT * FROM (
        SELECT Date FROM t2
        WHERE t2.Name = d.Name
        ORDER BY Date ASC )
    WHERE Date > d.Date
    LIMIT 1 )
Now, I would like to find the next date only once for all records in t2 and create a new column in t2 that contains the next date. This way, I only search for the next date about 400,000 times instead of 56 billion times, significantly improving my performance.
Thus the output of the query I'm looking for would make t2 look like this:
Record  Name   Date                 Next_Date
1       A1234  2016-01-03 04:58:00  2016-01-05 21:06:00
2       A1234  2015-12-15 08:34:00  2016-01-03 04:58:00
3       A5678  2016-01-04 03:14:00  2999-12-31 23:59:59
4       A1234  2016-01-05 21:06:00  2999-12-31 23:59:59
Then I would be able to simply query whether t1.Date is between t2.Date and t2.Next_Date.
How can I build a query that will add the next date to a new column in t2?
Rather than add the new column, you should just be able to use a query like the one below to join the tables:
SELECT
T1.*,
T2_1.*
FROM
T1
INNER JOIN T2 T2_1 ON
T2_1.Name = T1.Name AND
T2_1.some_date < T1.some_date
LEFT OUTER JOIN T2 T2_2 ON
T2_2.Name = T1.Name AND
T2_2.some_date > T2_1.some_date
LEFT OUTER JOIN T2 T2_3 ON
T2_3.Name = T1.Name AND
T2_3.some_date > T2_1.some_date AND
T2_3.some_date < T2_2.some_date
WHERE
T2_3.Name IS NULL
You can do the same with NOT EXISTS, but this method often has better performance.
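For reference, the NOT EXISTS variant might look like this - a sketch, assuming the goal is to join each T1 row to the T2 row with the latest earlier date:
SELECT
    T1.*,
    T2_1.*
FROM
    T1
    INNER JOIN T2 T2_1 ON
        T2_1.Name = T1.Name AND
        T2_1.some_date < T1.some_date
WHERE
    NOT EXISTS (
        SELECT 1
        FROM T2 later
        WHERE
            later.Name = T1.Name AND
            later.some_date > T2_1.some_date AND
            later.some_date < T1.some_date
    );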
You can speed up (sub)queries by using proper indexes.
To check which indexes are actually used, use EXPLAIN QUERY PLAN.
Your original query, without any indexes, would be executed by SQLite 3.10.0 like this:
0|0|0|SCAN TABLE t1
0|1|1|SEARCH TABLE t2 AS d USING AUTOMATIC COVERING INDEX (name=?)
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 1
1|0|0|SCAN TABLE t2
1|0|0|USE TEMP B-TREE FOR ORDER BY
(The "automatic" index is created temporarily just for this query; the optimizer has estimated that this would still be faster than not using any index.)
In this case, you get the best query plan by indexing all columns used for lookups:
create index i1nd on t1(name, date);
create index i2nd on t2(name, date);
0|0|1|SCAN TABLE t2 AS d
0|1|0|SEARCH TABLE t1 USING INDEX i1nd (name=? AND date>? AND date<?)
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 1
1|0|0|SEARCH TABLE t2 USING COVERING INDEX i2nd (name=? AND date>?)
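As an aside: SQLite 3.25 and later (newer than the version shown above) supports window functions, so the next date can be computed directly with LEAD - a sketch using the question's column names:
SELECT Record, Name, Date,
       COALESCE(LEAD(Date) OVER (PARTITION BY Name ORDER BY Date),
                '2999-12-31 23:59:59') AS Next_Date
FROM t2;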
I've used this method successfully on tables with around 1 million rows. Obviously, creating an index that covers this query will help performance.
This approach uses RANK to create a value to join against. After creating the rank in a CTE (used here for readability; adjust for style or personal preference), a sub-query joins rnk to rnk + 1 - aka the next date.
Here's an example of what the code looks like using your sample values.
IF OBJECT_ID('tempdb..#T2') IS NOT NULL
DROP TABLE #T2
CREATE TABLE #T2
(
Record INT NOT NULL PRIMARY KEY,
Name VARCHAR(10),
[DATE] DATETIME,
)
INSERT INTO #T2
VALUES (1, 'A1234', '2016-01-03 04:58:00'),
(2, 'A1234', '2015-12-15 08:34:00'),
(3, 'A5678', '2016-01-04 03:14:00'),
(4, 'A1234', '2016-01-05 21:06:00');
WITH Rank_Dates
AS (Select *
,rank() OVER(PARTITION BY #t2.name ORDER BY #t2.date DESC) AS rnk
FROM #T2)
select RD1.Record,
RD1.Name,
RD1.DATE,
COALESCE (RD2.DATE, '2999-12-31 23:59:59') AS NEXT_DATE
FROM Rank_Dates RD1
LEFT JOIN Rank_Dates RD2
ON RD1.rnk = RD2.rnk + 1
AND RD1.Name = RD2.Name
ORDER BY RD1.Record -- ORDER BY is optional
;
EDIT: added code output below.
The code above produces the following output.
Record  Name   DATE                     NEXT_DATE
1       A1234  2016-01-03 04:58:00.000  2016-01-05 21:06:00.000
2       A1234  2015-12-15 08:34:00.000  2016-01-03 04:58:00.000
3       A5678  2016-01-04 03:14:00.000  2999-12-31 23:59:59.000
4       A1234  2016-01-05 21:06:00.000  2999-12-31 23:59:59.000
On a random note: would using CURRENT_TIMESTAMP in place of hard-coding '2999-12-31 23:59:59.000' produce a similar result?

Aggregate only new rows from source table

I have a source table with a timestamp column (YYYY.MM.DD HH24:MI:SS) and a target table with rows aggregated on a daily basis (date column: YYYY.MM.DD).
My Problem is: How do I bring new data from source to target and aggregate it?
I tried:
select
    a.Sales,
    trunc(a.timestamp, 'DD') as TIMESTAMP,
    count(1) as COUNT
from
    tbl_Source a
where trunc(a.timestamp, 'DD') > nvl((select max(b.TIME_TO_DAY) from tbl_target b),
                                     to_date('01.01.1975 00:00:00', 'dd.mm.yyyy hh24:mi:ss'))
group by a.sales,
    trunc(a.timestamp, 'DD')
The problem with that is: when I have a row with timestamp '2013.11.15 00:01:32' and the max day in the target is the 14th of November, it will only aggregate the 15th. If I used >= instead of >, some rows would get loaded twice.
It looks like you are looking for a merge statement: if the day is already present in tbl_target, then update the count, else insert the record.
merge into tbl_target dest
using
(
select sales, trunc(timestamp) as theday , count(*) as sales_count
from tbl_Source
where trunc(timestamp) >= ( select nvl(max(time_to_day),to_date('01.01.1975','dd.mm.yyyy')) from tbl_target )
group by sales, trunc(timestamp)
) src
on (src.theday = dest.time_to_day)
when matched then update set
dest.sales_count = src.sales_count
when not matched then
insert (time_to_day, sales_count)
values (src.theday, src.sales_count)
;
As far as I can understand your question: you need to get everything that has arrived since the last load of the target table.
The problem here: you need that exact point in time, but it is truncated during the load.
If my guess is correct, you cannot do anything except store the timestamp of the load separately, because there is no way to get it back from the data presented here.
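A sketch of that idea, with a separate one-row control table (all names here are made up):
-- remember the exact high-water mark of each load
create table tbl_load_control (loaded_up_to timestamp);
insert into tbl_load_control
values (to_timestamp('01.01.1975 00:00:00', 'dd.mm.yyyy hh24:mi:ss'));

-- each load picks up only rows newer than the stored mark ...
select a.sales, trunc(a.timestamp, 'dd') as day, count(*) as cnt
from tbl_source a
where a.timestamp > (select loaded_up_to from tbl_load_control)
group by a.sales, trunc(a.timestamp, 'dd');

-- ... and afterwards advances the mark
update tbl_load_control
set loaded_up_to = (select max(a.timestamp) from tbl_source a);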
About your query:
count(*) and count(1) are the same in performance (proved many times, at least in versions 10-11) - do not write count(1), it just looks ugly
do not use nvl; use coalesce instead - it is much faster
I would write your query like that:
with t as (select max(b.time_to_day) mx from tbl_target b)
select a.sales,trunc(a.timestamp,'dd') as timestamp,count(*) as count
from tbl_source a,t
where trunc(a.timestamp,'dd') > t.mx or t.mx is null
group by a.sales,trunc(a.timestamp,'dd')
Does this fit your needs:
WHERE trunc(a.timestamp,'DD') > nvl((select MAX(b.TIME_TO_DAY) + 1 - 1/(24*60*60) from tbl_target b), to_date('01.01.1975 00:00:00','dd.mm.yyyy hh24:mi:ss'))
i.e. instead of comparing to 2013-11-15 00:00:00, compare to 2013-11-15 23:59:59
Update:
This one?
WHERE trunc(a.timestamp,'DD') BETWEEN nvl((select MAX(b.TIME_TO_DAY) from ...) AND nvl((select MAX(b.TIME_TO_DAY) + 1 - 1/(24*60*60) from ...)
