Oracle performance issue with INSERT INTO ... SELECT

I have an INSERT INTO ... SELECT statement. The statement consists of three SELECTs joined by a UNION ALL clause:
Insert into
(Select... (1)
UNION ALL
Select... (2)
UNION ALL
Select... (3)
)
The table I'm inserting into is in NOLOGGING mode, has parallel degree 8, and I'm using the /*+ APPEND */ hint.
Selects 1 and 2 are just simple selects over one table. I ran these two selects in a PL/SQL block like:
Select count(1) into x From
(Select... (1) UNION ALL Select... (2))
It returned about 1.5 million records in 3 seconds. Then I put these selects into the insert statement:
INSERT INTO TABLE_NAME
(Select... (1) UNION ALL Select... (2))
And the insert took about the same time as above.
Now, the last select (3) is from one table with a few joins (including one left join).
When I ran it as SELECT COUNT(1) INTO x, it returned 80 000 records in about 3 minutes.
I expected the insert of 80 000 records to take no longer than the SELECT COUNT(1) INTO itself, but the INSERT INTO ... SELECT has already been running for half an hour.
1.5 million rec - 3 secs
80 000 rec - ...
Can anyone enlighten me as to what's going on? Am I doing something wrong?
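Not an answer in itself, but one way to see where the time is going is to pull the actual plan and statistics of the running statement. This is only a sketch; v$sql and DBMS_XPLAN are standard Oracle views/packages, and TABLE_NAME is just the placeholder from the question.
-- Find the SQL_ID and elapsed time of the running INSERT ... SELECT
SELECT sql_id, elapsed_time / 1e6 AS elapsed_seconds, sql_text
FROM   v$sql
WHERE  sql_text LIKE 'INSERT INTO TABLE_NAME%';

-- Show the actual execution plan for that SQL_ID (substitute the value found above)
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('&sql_id', NULL, 'ALLSTATS LAST'));
If the plan shows a LOAD AS SELECT step and PX operators, the APPEND hint and the parallel degree are actually taking effect; if it shows LOAD TABLE CONVENTIONAL instead, that is the first thing to investigate.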

Related

Snowflake select max date from date array

Imagine I have a table with some fields, one of which is an array of dates, as below:
col1  col2  alldate                        Max_date
1     2     ["2021-02-12","2021-02-13"]    "2021-02-13"
2     3     ["2021-01-12","2021-02-13"]    "2021-02-13"
4     4     ["2021-01-12"]                 "2021-01-12"
5     3     ["2021-01-11","2021-02-12"]    "2021-02-12"
6     7     ["2021-02-13"]                 "2021-02-13"
I need to write a query that selects only the rows which have the max date in their array. (There is also a column which holds the max date.) The select statement should show:
col1  col2  alldate                        Max_date
1     2     ["2021-02-12","2021-02-13"]    "2021-02-13"
2     3     ["2021-01-12","2021-02-13"]    "2021-02-13"
6     7     ["2021-02-13"]                 "2021-02-13"
The table is huge, so an optimized query is needed.
Till now I was thinking of:
select col1, col2, maxdate
from t1 where array_contains((select max(max_date) from t1)::variant, alldate);
But running a select statement inside the query seems like a bad idea to me.
Any suggestions?
If you want pure speed, using LATERAL FLATTEN is 10% faster than the ARRAY_CONTAINS approach over 500,000,000 records on an XS warehouse. You can copy-paste the code below straight into Snowflake to test it for yourself.
Why is the lateral flatten approach faster?
Well, if you look at the query plans, the optimiser filters at the first step (immediately culling records), whereas the array_contains approach waits until the 4th step before doing the same. The filter is the QUALIFY on the max(max_date) ...
Create Random Dataset:
create or replace table stack_overflow_68132958 as
SELECT
seq4() col_1,
UNIFORM (1, 500, random()) col_2,
DATEADD(day, UNIFORM (-40, -0, random()), current_date()) random_date_1,
DATEADD(day, UNIFORM (-40, -0, random()), current_date()) random_date_2,
DATEADD(day, UNIFORM (-40, -0, random()), current_date()) random_date_3,
ARRAY_CONSTRUCT(random_date_1, random_date_2, random_date_3) date_array,
greatest(random_date_1, random_date_2, random_date_3) max_date,
to_array(greatest(random_date_1, random_date_2, random_date_3)) max_date_array
FROM
TABLE (GENERATOR (ROWCOUNT => 500000000)) ;
Test Felipe/Mike approach -> 51 secs
select
distinct
col_1
,col_2
from
stack_overflow_68132958
qualify
array_contains(max(max_date) over () :: variant, date_array);
Test Adrian approach -> 47 secs
select
distinct
col_1
, col_2
from
stack_overflow_68132958
, lateral flatten(input => date_array) g
qualify
max(max_date) over () = g.value;
I would likely use a CTE for this, like:
WITH x AS (
SELECT max(max_date) as max_max_date
FROM t1
)
select col1, col2, maxdate
from t1
cross join x
where array_contains(x.max_max_date::variant,alldate);
I have not tested the syntax exactly, and the data types might vary things a bit, but the concept here is that the CTE will be VERY fast and return a single record with a single value. A MAX() function leverages metadata in Snowflake, so it won't even use a warehouse to get it.
That said, the Snowflake profiler is pretty smart, so your query might actually create the exact same query profile as this statement. Test them both and see what the Profile looks like to see if it truly makes a difference.
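For what it's worth, one quick way to compare the two variants after running them both is to pull their elapsed times from the query history. This is just a sketch and assumes you have access to the INFORMATION_SCHEMA query_history table function:
-- Compare recent runs of both variants by elapsed time
SELECT query_text, total_elapsed_time
FROM TABLE(information_schema.query_history(result_limit => 50))
WHERE query_text ILIKE '%array_contains%' OR query_text ILIKE '%lateral flatten%'
ORDER BY start_time DESC;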
To build on Mike's answer, we can do everything in the QUALIFY, without the need for a CTE:
with t1 as (
select 'a' col1, 'b' col2, '2020-01-01'::date maxdate, array_construct('2020-01-01'::date, '2018-01-01', '2017-01-01') alldate
)
select col1, col2, alldate, maxdate
from t1
qualify array_contains((max(maxdate) over())::variant, alldate)
;
Note that you should be careful with types. Both of these are true:
select array_contains('2020-01-01'::date::string::variant, array_construct('2020-01-01', '2019-01-01'));
select array_contains('2020-01-01'::date::variant, array_construct('2020-01-01'::date, '2019-01-01'));
But this is false:
select array_contains('2020-01-01'::date::variant, array_construct('2020-01-01', '2019-01-01'));
You have some great answers already, which I only saw after I wrote mine up.
If your data types match, you should be good to go; copy-paste this directly into Snowflake ... and it should work.
create or replace schema abc;
use schema abc;
create or replace table myarraytable(col1 number, col2 number, alldates variant, max_date timestamp_ltz);
insert into myarraytable
select 1,2,array_construct('2021-02-12'::timestamp_ltz,'2021-02-13'::timestamp_ltz), '2021-02-13'
union
select 2,3,array_construct('2021-01-12'::timestamp_ltz,'2021-02-13'::timestamp_ltz),'2021-02-13'
union
select 4,4,array_construct('2021-01-12'::timestamp_ltz) , '2021-01-12'
union
select 5,3,array_construct('2021-01-11'::timestamp_ltz,'2021-02-12'::timestamp_ltz) , '2021-02-12'
union
select 6,7,array_construct('2021-02-13'::timestamp_ltz) , '2021-02-13';
select * from myarraytable
order by 1 ;
WITH cte_max AS (
SELECT max(max_date) as max_date
FROM myarraytable
)
select myarraytable.*
from myarraytable, cte_max
where array_contains(cte_max.max_date::variant, alldates)
order by 1 ;

Problem in MERGE condition in Oracle when merging two tables

I have a table which has data as:
id payor_name
---------------
1 AETNA
2 UMR
3 CIGNA
4 METLIFE
4 AETNAU
5 ktm
6 ktm
Id and payor_name are the two columns. So,
My expected output is:
id payor_name
---------------
1 AETNA
2 UMR
3 CIGNA
4 METLIFE
4 AETNAU
6 ktm ...> I want to change the id of this row from 5 to 6.
6 ktm
I want a one-to-one mapping between id and payor_name. So this is what I tried:
MERGE INTO offc.payor_collec A
USING (select id from offc.payor_collec where payor_name in(
select payor_name from offc.payor_collec group by payor_name having count(distinct id)>=2)) B
ON (A.id=B.id)
WHEN MATCHED THEN
UPDATE SET A.id=B.id
But when I ran it, I got this error:
Error at line 1
ORA-38104: Columns referenced in the ON Clause cannot be updated: "A"."ID"
id is a NUMBER, whereas payor_name is a VARCHAR2.
How can I achieve this result?
MERGE works, but slightly differently from your code.
SQL> select * from test;
ID PAYOR
---------- -----
1 aetna
2 umr
5 ktm
6 ktm
SQL> merge into test t
2 using (select max(t1.id) id,
3 t1.payor_name
4 from test t1
5 group by t1.payor_name
6 ) x
7 on (x.payor_name = t.payor_name)
8 when matched then update set
9 t.id = x.id;
4 rows merged.
SQL> select * from test;
ID PAYOR
---------- -----
1 aetna
2 umr
6 ktm
6 ktm
SQL>
Use a correlated subquery:
UPDATE PAYOR_COLLEC pc
SET pc.ID = (SELECT MAX(pc2.ID)
FROM PAYOR_COLLEC pc2
WHERE pc2.PAYOR_NAME = pc.PAYOR_NAME)
dbfiddle here
You can use a MERGE statement, as you tried and as Littlefoot has shown.
You can also use a correlated subquery as Bob Jarvis has shown, but that will be quite inefficient.
Many Oracle developers are unaware that you can also update through a join. Worse, there are many who say "there is no such thing in Oracle."
In your problem, you need to join your table to an aggregate query (picking just the max id for each payor_name) and the join is on the group by column in the aggregate. This already guarantees that the join column will be unique in the right-hand table; that is all Oracle needs to allow the update through join.
Here is a complete example, starting with the create table statement, then the update and then showing the table after the update. Note that I don't need any constraints (like primary key, not null, unique, etc.) or indexes on the base table. If they do exist, so much the better, but the solution works in the most general case.
create table t (id, payor_name) as
select 1, 'AETNA' from dual union all
select 2, 'UMR' from dual union all
select 3, 'CIGNA' from dual union all
select 4, 'METLIFE' from dual union all
select 4, 'AETNAU' from dual union all
select 5, 'ktm' from dual union all
select 6, 'ktm' from dual;
Table T created.
update
(
select id, payor_name, max_id
from t inner join
(select max(id) as max_id, payor_name from t group by payor_name)
using (payor_name)
)
set id = max_id where id != max_id
;
1 row updated.
select * from t;
ID PAYOR_NAME
----- ----------
1 AETNA
2 UMR
3 CIGNA
4 METLIFE
4 AETNAU
6 ktm
6 ktm
Notice the where clause at the end of the update statement, too. You don't want to update rows to their pre-existing value; that will still generate undo and redo data (although I understand that Oracle has changed that in more recent versions - it now doesn't generate undo and redo unless a row did indeed change). I assume ID is NOT NULL - otherwise you should rewrite the where clause as
where decode(id, max_id, 0) is null
or equivalent
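A quick illustration of the difference (made-up values, just to show the NULL behaviour):
-- A NULL id never satisfies the plain inequality: NULL != 6 is UNKNOWN,
-- so such a row would silently be skipped by "where id != max_id".
SELECT 'would update' FROM dual WHERE CAST(NULL AS NUMBER) != 6;                   -- no rows

-- DECODE compares null-safely: when id and max_id differ it falls through to its
-- (implicit NULL) default, so "IS NULL" is true and the row is picked up.
SELECT 'would update' FROM dual WHERE DECODE(CAST(NULL AS NUMBER), 6, 0) IS NULL;  -- one row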

Use a sub-select in the PIVOT's FOR clause?

The standard PIVOT syntax uses a static FOR list:
SELECT *
FROM (
SELECT log_id, event_id, event_time
FROM patient_events
WHERE event_id IN (10,20,30,40,50)
) v
PIVOT (
max(event_time) event_time
FOR event_id IN( 10,20,30,40,50 )
)
Is there a way to make this dynamic?
I know the sub-select in the WHERE clause will work, but can I use one in the FOR?
SELECT *
FROM (
SELECT log_id, event_id, event_time
FROM patient_events
WHERE event_id IN ( sub-select to generate list of IDs )
) v
PIVOT (
max(event_time) event_time
FOR event_id IN( sub-select to generate list of IDs )
)
You can't in pure SQL, but I don't think it's quite for the reason suggested - it's not that the IN clause needs to be ordered, it's that it has to be constant.
When given a query, the database needs to know the shape of the result set and the shape needs to be consistent across queries (assuming no other DDL operations have taken place that might affect it). For a PIVOT query, the shape of the result is defined by the IN clause - each entry becomes a column, with a data type corresponding to the aggregation clause.
Hypothetically if you were to allow a sub-select for the IN clause then you could alter the shape of the result set just by performing DML operations. Imagine your sub-select worked and got you a list of all event_ids known to the system - by inserting a new record into whatever drives that sub-select, your query returns a different number of columns even though no DDL has occurred.
Now we're stuck - any view built on that query is invalid because its shape wouldn't match that of the query, but Oracle couldn't know that it's invalid because none of the objects it depends on have been changed by DDL.
Depending on where you're consuming the result, dynamic SQL's your only option - either at the application level (build the IN list yourself) or via a ref cursor in a database function or procedure.
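As a rough sketch of the ref-cursor route (untested; the function name get_event_pivot is invented for illustration - only patient_events, log_id, event_id and event_time come from the question):
CREATE OR REPLACE FUNCTION get_event_pivot RETURN SYS_REFCURSOR IS
  l_in_list  VARCHAR2(4000);
  l_cursor   SYS_REFCURSOR;
BEGIN
  -- Build the constant IN list (e.g. "10,20,30,40,50") from whatever drives it
  SELECT LISTAGG(event_id, ',') WITHIN GROUP (ORDER BY event_id)
    INTO l_in_list
    FROM (SELECT DISTINCT event_id FROM patient_events);

  -- The generated statement has a fixed IN list, so PIVOT accepts it
  OPEN l_cursor FOR
    'SELECT * FROM (
       SELECT log_id, event_id, event_time FROM patient_events
     )
     PIVOT (MAX(event_time) AS event_time FOR event_id IN (' || l_in_list || '))';

  RETURN l_cursor;
END;
/
The caller still has to discover the column list at runtime (for example by describing the cursor), which is exactly the shape problem described above.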
Interesting question.
On the face of it, it shouldn't work, since the list of values (which will become column names) must be ordered. This is not the case for an "IN" list in the WHERE clause. But perhaps it would work with an ORDER BY condition in the sub-SELECT?
Unfortunately, no. This is easy to test. Got the same error message with or without ORDER BY. (And the query works fine if the IN list is just 10, 20, 30, 40 - the actual department numbers from the DEPT table.) Using tables from the standard SCOTT schema.
SQL> select deptno from scott.dept;
DEPTNO
----------
10
20
30
40
4 rows selected.
SQL> select * from (
2 select sal, deptno
3 from scott.emp
4 )
5 pivot (sum(sal) as total_sal
6 for deptno in (10, 20, 30, 40))
7 ;
10_TOTAL_SAL 20_TOTAL_SAL 30_TOTAL_SAL 40_TOTAL_SAL
------------ ------------ ------------ ------------
8750 10875 9400
1 row selected.
SQL> select * from (
2 select sal, deptno
3 from scott.emp
4 )
5 pivot (sum(sal) as total_sal
6 for deptno in (select deptno from scott.dept order by deptno))
7 ;
for deptno in (select deptno from scott.dept order by deptno))
*
ERROR at line 6:
ORA-00936: missing expression

How to get minimum unused number from a column in Oracle?

In my database I have a table with a column that holds the code of each record (aside from the ID column). This field is unique, and each time the user inserts a record into the table, the first unused code should be assigned to it. Right now the table's code column holds the following values:
+------+
| code |
+------+
|    1 |
|    2 |
|    3 |
|    5 |
+------+
I want a query to return 4 as the result.
Note that this query runs very frequently in my system, so the query with minimum execution time will be appreciated.
Is using a self-join acceptable? If so:
-- your test data:
WITH data AS (SELECT 1 AS code FROM DUAL
UNION SELECT 2 FROM DUAL
UNION SELECT 3 FROM DUAL
UNION SELECT 5 FROM DUAL)
-- request:
SELECT COALESCE(MIN(d1.code+1),1)
FROM data d1 LEFT JOIN data d2 ON d1.code+1 = d2.code
WHERE d2.code IS NULL;
This builds the list of data.code values that have no successor, and using MIN(... + 1) you get the first empty slot. I used COALESCE(...) to handle the specific case where there are no entries in the data table at all.
An alternate form using a sequence generator might lead to better performance, as it does not require the whole table to be traversed in order to compute the aggregate MIN():
-- your test data:
WITH data AS (SELECT 1 AS code FROM DUAL
UNION SELECT 5 FROM DUAL
UNION SELECT 2 FROM DUAL
UNION SELECT 3 FROM DUAL)
-- request:
SELECT T.code FROM (SELECT d1.code
FROM (SELECT LEVEL code FROM DUAL CONNECT BY LEVEL < 9999) d1 LEFT JOIN data d2
ON d1.code = d2.code
WHERE d2.code IS NULL
ORDER BY d1.code ASC
) T WHERE ROWNUM < 2
The drawback is that you now have a hard-coded upper limit. It can be dynamically inferred from the data table, though (see the sketch below), so it is not really blocking. I'll let you compare timings yourself.
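A possible way to do that (a sketch, not benchmarked; data stands for your real table, as in the WITH blocks above):
SELECT MIN(c.code)
FROM (
      -- generate 1 .. max(code)+1 instead of a hard-coded upper limit
      SELECT LEVEL AS code
      FROM (SELECT COALESCE(MAX(code), 0) + 1 AS n FROM data)
      CONNECT BY LEVEL <= n
     ) c
LEFT JOIN data d ON d.code = c.code
WHERE d.code IS NULL;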
this field is unique and each time the user tries to insert a record into the table, the first unused code should be assigned to the record
Please note, however, that this will lead to a race condition if two concurrent sessions try to insert a row at the same time. Given your example, both would try to insert a row with code = 4 -- and obviously they cannot both succeed, since your column is unique...
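If you do go down that road, the collision typically has to be handled by retrying. A minimal sketch (table and column names my_table / code are invented for illustration):
DECLARE
  l_code  NUMBER;
BEGIN
  LOOP
    BEGIN
      -- first free code, same self-join logic as above
      SELECT COALESCE(MIN(t1.code + 1), 1)
        INTO l_code
        FROM my_table t1
        LEFT JOIN my_table t2 ON t1.code + 1 = t2.code
       WHERE t2.code IS NULL;

      INSERT INTO my_table (code) VALUES (l_code);
      EXIT;  -- success
    EXCEPTION
      WHEN DUP_VAL_ON_INDEX THEN
        NULL;  -- another session took that code; recompute and retry
    END;
  END LOOP;
END;
/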
I recently used the code below:
SELECT t1.id+1
FROM your_table t1
LEFT OUTER JOIN your_table t2 ON (t1.id + 1 = t2.id)
WHERE t2.id IS NULL
/* and rownum = 1 -- need to use a sub-select if you want this to work, see the sketch below */
ORDER BY t1.id;
I run it every time that I need to insert a new row and use the minimum unused id.
I hope it works for your purposes.
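To apply the commented-out ROWNUM filter, the ordering has to happen in an inline view first; a sketch using the same placeholder table name:
SELECT code
FROM (
      SELECT t1.id + 1 AS code
      FROM   your_table t1
      LEFT OUTER JOIN your_table t2 ON t1.id + 1 = t2.id
      WHERE  t2.id IS NULL
      ORDER BY t1.id
     )
WHERE ROWNUM = 1;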
select min(unusedval)
from (select level unusedval from dual connect by level < 10
      minus
      select tno from t2);
You can change the LEVEL condition depending on your maximum value.

Fetching the last row from a table in Toad

I have to fetch the first and the last row of a table in Toad.
I have used the following queries:
select * from grade_master where rownum=(select max(rownum) from grade_master)
select * from grade_master where rownum=1
The second query works to fetch the first row, but the first one does not work. Can anyone please help me?
Thanks in advance.
Such a request only makes sense if you specify the sort order of the results - there are no such things in a database as "first" and "last" rows unless a sort order is specified.
SQL> with t as (
2 select 'X' a, 1 b from dual union all
3 select 'C' , 2 from dual union all
4 select 'A' a, 3 b from dual
5 )
6 select a, b, decode(rn, 1, 'First','Last')
7 from (
8 select a, b, row_number() over(order by a) rn,
9 count(*) over() cn
10 from t
11 )
12 where rn in (1, cn)
13 order by rn
14 /
A B DECOD
- ---------- -----
A 3 First
X 1 Last
In Oracle the data is not ordered until you specify the order in your SQL statement.
So when you do:
select * from grade_master
Oracle will give you the rows in any order it wants.
On the other hand, if you do:
select * from grade_master order by id desc
then Oracle will give the rows back ordered by id descending.
select *
from (select * from grade_master order by id desc)
where rownum = 1
The rownum is determined BEFORE the "order by" clause is assessed, so what this query does is order the rows descending (the inner query) and then hand this ordered set to the outer query. The outer query gets the first row of the set and returns it.
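As a side note, on Oracle 12c and later (assuming your version supports it), the same result can be written without the ROWNUM wrapper:
SELECT *
FROM   grade_master
ORDER  BY id DESC
FETCH FIRST 1 ROW ONLY;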
