Delete the same set of values from multiple tables - Oracle

I would like to not have to repeat the same subquery over and over for all tables.
Example:
begin
  -- subquery is quite complex and returns thousands of records of a single ID column
  delete from t1 where exists (select 1 from subquery where t1.ID = subquery.ID);
  delete from t2 where exists (select 1 from subquery where t2.ID = subquery.ID);
  delete from t3 where exists (select 1 from subquery where t3.ID = subquery.ID);
end;
/
An alternative I've found is:
declare
  type id_table_type is table of table.column%type index by pls_integer;
  ids id_table_type;
begin
  select ID
  bulk collect into ids
  from subquery;

  forall indx in 1 .. ids.COUNT
    delete from t1 where ID = ids(indx);
  forall indx in 1 .. ids.COUNT
    delete from t2 where ID = ids(indx);
  forall indx in 1 .. ids.COUNT
    delete from t3 where ID = ids(indx);
end;
/
What are your thoughts about this alternative? Is there a more efficient way of doing this?

Create a temporary table, once, to hold the results of the subquery.
For each run, insert the results of the subquery into the temporary table. The subquery only runs once and each delete is simple: delete from mytable t where t.id in (select id from tmptable);.
Truncate the table when finished.
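A minimal sketch of that approach (tmptable, t1..t3, and subquery are the placeholder names from this thread; the ID column is assumed numeric):
-- One-time setup: a session-private scratch table.
create global temporary table tmptable (id number)
on commit preserve rows;

-- Each run: materialize the subquery once...
insert into tmptable (id)
select id from subquery;

-- ...then each delete stays simple and reuses the saved IDs.
delete from t1 where t1.id in (select id from tmptable);
delete from t2 where t2.id in (select id from tmptable);
delete from t3 where t3.id in (select id from tmptable);

commit;

-- Empty the scratch table for the next run.
truncate table tmptable;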

If you can do it in pure SQL, then do it in SQL; there is no need for PL/SQL. Every SQL call from PL/SQL (or vice versa, though less so in this case) carries the overhead of a context switch between the two engines.
Having said that, if you must do it in PL/SQL, you can reduce context switches by bulk-binding the whole collection to the DML statement in one operation.
A cursor FOR loop already does an implicit BULK COLLECT with a LIMIT of 100, which is usually much better than a plain explicit cursor.
But it's not just about bulk collecting; it is also about the operations we subsequently perform on the array we have fetched incrementally. Performance can be improved further by combining the FORALL statement with BULK COLLECT, as sketched below.
IMO, the best option would be to do it in pure SQL. If you really want to do it in PL/SQL, then do it as I mentioned above.
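For illustration, a minimal sketch of the chunked BULK COLLECT ... LIMIT plus FORALL pattern, assuming the ID column is numeric (the LIMIT of 1000 is an arbitrary choice to bound memory use):
declare
  cursor c is select id from subquery;
  type id_tab is table of number index by pls_integer;
  ids id_tab;
begin
  open c;
  loop
    fetch c bulk collect into ids limit 1000; -- fetch in chunks
    exit when ids.count = 0;
    forall i in 1 .. ids.count
      delete from t1 where id = ids(i);
    forall i in 1 .. ids.count
      delete from t2 where id = ids(i);
    forall i in 1 .. ids.count
      delete from t3 where id = ids(i);
  end loop;
  close c;
end;
/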
I would go with the SQL approach, and since you have the same subquery repeated, I would use the query result cache, which Oracle 11g introduced.
In your subquery:
SELECT /*+ RESULT_CACHE */ <column_list> .. <your subquery>...
For example,
SQL> EXPLAIN PLAN FOR
2 SELECT /*+ RESULT_CACHE */
3 deptno,
4 AVG(sal)
5 FROM emp
6 GROUP BY deptno;
Explained.
Let's look at the plan table output:
SQL> SELECT * FROM TABLE(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------
Plan hash value: 4067220884
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 21 | 3 (0)| 00:00:01 |
| 1 | RESULT CACHE | b9aa181887ufz5341w1zqpf1d1 | | | | |
| 2 | HASH GROUP BY | | 3 | 21 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| EMP | 14 | 98 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------
Result Cache Information (identified by operation id):
------------------------------------------------------
1 - column-count=2; dependencies=(SCOTT.EMP); name="SELECT /*+ RESULT_CACHE */
deptno,
AVG(sal)
FROM emp
GROUP BY deptno"
15 rows selected.

An alternative would be:
begin
  for s in (select distinct id from subquery) loop
    delete from t1 where t1.id = s.id;
    delete from t2 where t2.id = s.id;
    delete from t3 where t3.id = s.id;
  end loop;
end;
/
In general this is a questionable technique, as "row-by-row" processing is slower than multi-row operations. But if the point is that the subquery itself is very slow, this may be an improvement.

There seems to be a dependency between tables t1, t2, and t3 on their id field.
You could probably solve this with a delete trigger on t1 that removes the corresponding entries in t2 and t3 as well (see the sketch below).
I don't know exactly which tables your subquery touches; maybe you can set an update flag (e.g. a date field) on a further table t0 and delete the entries in t1, t2, and t3 via an update trigger on t0 when that date field changes.
The advantage of this solution is database consistency and pure SQL.
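A minimal sketch of the delete-trigger variant, assuming t2 and t3 carry the same id values as t1 (all names are illustrative):
create or replace trigger t1_cascade_delete
after delete on t1
for each row
begin
  -- remove the matching rows from the dependent tables
  delete from t2 where t2.id = :old.id;
  delete from t3 where t3.id = :old.id;
end;
/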

I don't think it is possible to delete from multiple tables in a single statement.
One option could be subquery factoring (the WITH clause).
I don't have an exact answer yet, but the WITH clause might help; I will come back with an exact query once I can devote some time to it.
Also, if you need to delete the subquery's IDs from the subquery's own tables on a daily basis as well, it would be easier to create foreign keys with ON DELETE CASCADE, along the lines sketched below.
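A sketch of the ON DELETE CASCADE idea, assuming t1.id is the primary key and t2/t3 reference it (constraint names are illustrative):
alter table t2 add constraint t2_t1_fk
  foreign key (id) references t1 (id) on delete cascade;
alter table t3 add constraint t3_t1_fk
  foreign key (id) references t1 (id) on delete cascade;

-- A single delete against the parent now cascades to t2 and t3:
delete from t1
where exists (select 1 from subquery where t1.id = subquery.id);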
Hope it helps.

Related

Return first row in each group from Oracle SQL

Hi, I need to come up with a query that efficiently returns just one record per group (I might be thinking about it wrong) and stops searching for more records in a group as soon as it has found one.
This is my table:
| col1 | col2 |
|------|------|
| A    | 1    |
| A    | 2    |
| B    | 3    |
| B    | 4    |
I want to return:
| col1 | col2 |
|------|------|
| A    | 1    |
| B    | 3    |
Note that I don't actually care whether row one holds A,1 or A,2 (the same applies to the second row).
What I want is one record that has A in the first column (any record matching that criterion will do), and similarly one record that has B in col1.
The closest things I know of to getting this are these two queries:
SELECT col1, MIN(col2)
FROM tablename
GROUP BY col1
and the other:
SELECT *
FROM tablename
WHERE col1 = 'A'
AND ROWNUM = 1
The first query is not good enough because it will try to find all records that have A in col1 (in the actual table this means searching through millions of rows, and my indices won't be of much help here). The second query returns just one value of col1 at a time, so I'd have to run it thousands of times to get all the records I need.
NOTE:
I did see a similar question here, but the answers focused on just getting the right query results; in my case the issue is how long I have to wait for those results.
Sounds like this is the query you are looking for:
select col1
, min(col2) keep (dense_rank first order by rownum) col2
from tablename
group by col1;
I feel the query below returns the result quickly:
SELECT col1, col2 FROM (
  SELECT col1, col2,
         ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2 ASC) minseq
  FROM tablename
  --where rownum < 1000000000
)
WHERE minseq = 1;
The normal query
SELECT col1, MIN(col2)
FROM tablename
GROUP BY col1
took 1 minute to fetch 500 records out of 100,000,000 records, while this query took 0.03 seconds.

Finding summary & basic statistics from data in Vertica

Recently I have been exploring HPE Vertica a bit. Is it possible to find summary statistics (mean, sd, quartiles, max, min, counts, etc.) for a data table loaded into Vertica?
These two links:
https://my.vertica.com/docs/7.0.x/HTML/Content/Authoring/SQLReferenceManual/Functions/VerticaFunctions/ANALYZE_STATISTICS.htm
https://my.vertica.com/docs/7.0.x/HTML/Content/Authoring/SQLReferenceManual/Functions/VerticaFunctions/ANALYZE_HISTOGRAM.htm
say that we can find statistics & histograms from the data, but the result makes no sense to me.
According to the docs, the ANALYZE_STATISTICS command returns 0 on successful execution, like:
NEWDB_aug17=> SELECT ANALYZE_STATISTICS ('MM_schema.capitalline');
ANALYZE_STATISTICS
--------------------
0
(1 row)
Here NEWDB_aug17 is the database and MM_schema is the schema under which the capitalline table was created. But where are the summary measures, I mean the numbers we are actually looking for? A bare 0 is not going to serve my purpose.
Can you please guide me in this context?
Vertica saves the statistics collected by ANALYZE_STATISTICS() in the catalog location.
These statistics are later used to calculate the best query execution plan.
You can find the statistics details in the system table v_internal.dc_analyze_statistics:
[dbadmin@vertica-1 ~]$ vsql
dbadmin=> \x
Expanded display is on.
dbadmin=> select * from v_internal.dc_analyze_statistics limit 1;
-[ RECORD 1 ]----+-----------------------------------
time | 2017-08-21 02:07:03.287895+00
node_name | v_test_node0001
session_id | v_test_node0001-502811:0x834a4
user_id | 45035996273704962
user_name | dbadmin
transaction_id | 45035996307673368
statement_id | 9
request_id | 1
table_name | test_table
proj_column_name | test_column
proj_name | test_table_sp_v11_b1
table_oid | 45036013037102108
proj_column_oid | 45036013037111264
proj_row_count | 119878353211
disk_percent | 10
disk_read_rows | 11987835321
sample_rows | 131072
sample_bytes | 7602176
start_time | 2017-08-21 02:07:03.657377+00
end_time | 2017-08-21 02:07:24.799398+00
Time: First fetch (1 row): 849.467 ms. All rows formatted: 849.594 ms
Or at this path:
{your_catalog_location}/{db_name}/{node_name}_catalog/DataCollector/AnalyzeStatistics_*.log
Vertica's percentile_cont function is helpful for retrieving quartiles:
create table test
(metric_value integer);
insert into test values(1);
insert into test values(2);
insert into test values(3);
insert into test values(4);
insert into test values(5);
insert into test values(6);
insert into test values(7);
insert into test values(8);
insert into test values(9);
insert into test values(10);
alter table test add column metric varchar(100) default 'abc';
select
metric_value,
percentile_cont(1) within group (order by metric_value) over (partition by metric) as max,
percentile_cont(.75) within group (order by metric_value ) over (partition by metric) as q3,
percentile_cont(.5) within group (order by metric_value ) over (partition by metric) as median,
percentile_cont(.25) within group (order by metric_value ) over (partition by metric) as q1,
percentile_cont(0) within group (order by metric_value ) over (partition by metric) as min
from test;
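You can also compute the summary measures directly with ordinary aggregate functions; a sketch against the same test table (APPROXIMATE_PERCENTILE is assumed to be available in your Vertica version):
select count(metric_value)  as n,
       min(metric_value)    as min_val,
       max(metric_value)    as max_val,
       avg(metric_value)    as mean,
       stddev(metric_value) as sd,
       approximate_percentile(metric_value using parameters percentile = 0.25) as q1,
       approximate_percentile(metric_value using parameters percentile = 0.50) as median,
       approximate_percentile(metric_value using parameters percentile = 0.75) as q3
from test;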

Oracle Index with multiple Columns querying on single column

In our Oracle installation we have a table with an index on two of its columns (X and Y). If I run a query against the table with a WHERE clause touching only column X, will Oracle be able to use the index?
For example:
Table Y:
Col_A,
Col_B,
Col_C,
Index exists on (Col_A, Col_B)
SELECT * FROM Table_Y WHERE Col_A = 'STACKOVERFLOW';
Will the index be used, or will a table scan be done?
It depends.
You could check it by letting Oracle explain the execution plan:
EXPLAIN PLAN FOR
SELECT * FROM Table_Y WHERE Col_A = 'STACKOVERFLOW';
and then
select * from table(dbms_xplan.display);
So, for example with
create table table_y (
col_a varchar2(30),
col_b varchar2(30),
col_c varchar2(30)
);
create unique index table_y_ix on table_y (col_a, col_b);
and then a
explain plan for
select * from table_y
where col_a = 'STACKOVERFLOW';
select * from table(dbms_xplan.display);
The plan (on my installation) looks like:
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 51 | 1 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TABLE_Y | 1 | 51 | 1 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | TABLE_Y_IX | 1 | | 1 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("COL_A"='STACKOVERFLOW')
Id 2 shows you that the index TABLE_Y_IX is indeed used, for an index range scan.
Whether Oracle chooses to use the index on another installation depends on many things; it's Oracle's query optimizer that makes this decision.
Update: If you feel you'd be better off (performance-wise, that is) if Oracle used the index, you might want to try the index_asc(...) hint (see index hints).
So in your case that would be something like
SELECT /*+ index_asc(TABLE_Y TABLE_Y_IX) */ *
FROM Table_Y
WHERE Col_A = 'STACKOVERFLOW';
Additionally, I would ensure that you have gathered statistics on the table and its columns. You can check the date of the last statistics gathering with:
select last_analyzed from dba_tables where table_name = 'TABLE_Y';
and
select column_name, last_analyzed from dba_tab_columns where table_name = 'TABLE_Y';
If there are no statistics or if they're stale, make yourself familiar with the dbms_stats package to gather such statistics; a minimal example follows.
These statistics are the data that the query optimizer relies on heavily to make its decisions.
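A sketch of gathering the statistics with dbms_stats, assuming the table belongs to the current schema (adjust ownname otherwise):
begin
  dbms_stats.gather_table_stats(
    ownname => user,     -- or the owning schema
    tabname => 'TABLE_Y',
    cascade => true      -- also gather statistics on the table's indexes
  );
end;
/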

Performance tuning about the "ORDER BY" and "LIKE" clause

I have two tables, each with many records (say TableA and TableB both have about 3,000,000 records). vr2_input is a varchar parameter entered by the user, and I want the 200 TableA records with the largest dateField whose stringField is like 'vr2_input'. The two tables are joined as follows:
select * from(
select * from
TableA join TableB on TableA.id = TableB.id
where TableA.stringField like 'vr2_input' || '%'
order by TableA.dateField desc
) where rownum < 201
The query is slow. I googled around and found that this is because "like" and "order by" involve a full table scan. However, I cannot find a solution to the problem. How can I tune this type of SQL? I have already created an index on TableA.stringField and TableA.dateField, but how can I make the SELECT statement use it? The database is Oracle 10g. Thanks so much!
Update: I used iddqd's suggestion and selected only the fields that I want, then ran the explain plan. The query takes about 4 minutes to finish. IX_TableA_stringField is the name of the index on the TableA.srv_ref field. I ran the explain plan again without the hint, and it produced the same result.
EXPLAIN PLAN FOR
select * from(
select
/*+ INDEX(TableB IX_TableA_stringField)*/
TableA.id,
TableA.stringField,
TableA.dateField,
TableA.someField2,
TableA.someField3,
TableB.someField1,
TableB.someField2,
TableB.someField3
from TableA
join TableB on TableA.id=TableB.id
WHERE TableA.stringField like '21'||'%'
order by TableA.dateField desc
) where rownum < 201
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 871807846
--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 200 | 24000 | 3293 (1)| 00:00:18 |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | VIEW | | 1397 | 163K| 3293 (1)| 00:00:18 |
|* 3 | SORT ORDER BY STOPKEY | | 1397 | 90805 | 3293 (1)| 00:00:18 |
| 4 | NESTED LOOPS | | 1397 | 90805 | 3292 (1)| 00:00:18 |
| 5 | TABLE ACCESS BY INDEX ROWID| TableA | 1397 | 41910 | 492 (1)| 00:00:03 |
|* 6 | INDEX RANGE SCAN | IX_TableA_stringField | 1397 | | 6 (0)| 00:00:01 |
| 7 | TABLE ACCESS BY INDEX ROWID| TableB | 1 | 35 | 2 (0)| 00:00:01 |
|* 8 | INDEX UNIQUE SCAN | PK_TableB | 1 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<201)
3 - filter(ROWNUM<201)
6 - access("TableA"."stringField" LIKE '21%')
filter("TableA"."stringField" LIKE '21%')
8 - access("TableA"."id"="TableB"."id")
You say it's taking about 4 minutes to run the query. The EXPLAIN PLAN output shows an estimate of 18 seconds. So the optimizer is probably far off on some of its estimates in this case. (It could still be choosing the best possible plan, but maybe not.)
The first step in a case like this is to get the actual execution plan and statistics. Run your query with the hint /*+ gather_plan_statistics */, then immediately afterwards execute select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST')).
This will show the actual execution plan that was run, and for each step it will show the estimated rows, actual rows, and actual time taken. Post the output here and maybe we can say something more meaningful about your issue.
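Applied to the query from the question, that would look something like this:
select /*+ gather_plan_statistics */ *
from (select *
      from TableA join TableB on TableA.id = TableB.id
      where TableA.stringField like '21' || '%'
      order by TableA.dateField desc)
where rownum < 201;

select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));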
Without that information, my suggestion is to try the following rewrite of the query. I believe it is equivalent, since ID appears to be the primary key of TableB.
select TableA.id,
       TableA.stringField,
       TableA.dateField,
       TableA.someField2,
       TableA.someField3,
       TableB.someField1,
       TableB.someField2,
       TableB.someField3
from (select *
      from (select TableA.id,
                   TableA.stringField,
                   TableA.dateField,
                   TableA.someField2,
                   TableA.someField3
            from TableA
            where TableA.stringField like '21' || '%'
            order by TableA.dateField desc)
      where rownum < 201) TableA
join TableB on TableA.id = TableB.id
Do you need to select all columns (*)? The optimizer is more likely to full-scan if you select all columns. If you need all columns in the output, you may be better off selecting just the id in your inline view and then joining back to fetch the other columns, which can be done with an index lookup. Try running an explain plan for both cases to see what the optimizer is doing.
Create indexes on the stringField and dateField columns. The SQL engine uses them automatically.
select id from (
  select /*+ INDEX(TableA stringField_indx) */ TableB.id
  from TableA
  join TableB on TableA.id = TableB.id
  where TableA.stringField like 'vr2_input' || '%'
  order by TableA.dateField desc
) where rownum < 201;
Then:
SELECT * FROM TableB WHERE id IN (<ids from the first query>);
Please send the stats and DDL of these tables.
If you have enough memory, you can hint the query to use a hash join. Could you please attach the explain plan?
How many records does TableA have? If it's the smaller table, could you do the select on that table and then loop through the results retrieving the TableB records, since both the select and the sort are on TableA?
A good experiment would be to remove the join and test the speed of that. Also, if allowed, can you put the rownum < 201 as an AND clause on the main query? It's probable at the moment that the query is returning all rows to the outer query and only then getting trimmed.
To optimize the LIKE predicate, you can create an Oracle Text (contextual) index and use the CONTAINS clause.
See: http://docs.oracle.com/cd/B28359_01/text.111/b28303/ind.htm
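A rough sketch of that approach (note that a CONTEXT index is not maintained transactionally and needs periodic syncing; the index name is illustrative):
create index tablea_string_ctx on TableA (stringField)
  indextype is ctxsys.context;

-- Right-truncated wildcard search on the indexed tokens:
select *
from TableA
where contains(stringField, 'vr2_input%') > 0;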
Thanks
You can create a function-based index on TableA that returns 1 or 0 depending on whether the condition TableA.stringField like 'vr2_input' || '%' is satisfied. That index will make the query run faster. The logic of the function would be:
if substr(TableA.stringField, 1, 9) = 'vr2_input' then
  return 1;
else
  return 0;
end if;
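A sketch of how that could look; note the function must be declared DETERMINISTIC to be indexable, and this only helps for a fixed, known prefix, whereas vr2_input is a user parameter in the question (all names are illustrative):
create or replace function starts_with_vr2 (p_val in varchar2)
  return number deterministic
is
begin
  if substr(p_val, 1, 9) = 'vr2_input' then
    return 1;
  else
    return 0;
  end if;
end;
/

create index ix_tablea_vr2 on TableA (starts_with_vr2(stringField));

-- The query must repeat the indexed expression for the index to be usable:
select * from TableA where starts_with_vr2(stringField) = 1;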
Using actual column names instead of "*" may help. At the least, column names common to both tables should be removed.

How can I get a COUNT(col) ... GROUP BY to use an index?

I've got a table (col1, col2, ...) with an index on (col1, col2, ...). The table has millions of rows in it, and I want to run this query:
SELECT col1, COUNT(col2) FROM my_table WHERE col1 NOT IN (<couple of exclusions>) GROUP BY col1
Unfortunately, this results in a full table scan of the table, which takes upwards of a minute. Is there any way to get Oracle to use the index on those columns to return the results much faster?
EDIT:
more specifically, I'm running the following query:
SELECT owner, COUNT(object_name) FROM all_objects GROUP BY owner
and there is an index on SYS.OBJ$ (SYS.I_OBJ2) which indexes the owner# and name columns; I believe the query should be able to use this index rather than doing a full table scan of SYS.OBJ$.
I have had the chance to play around with this, and my previous comments regarding the NOT IN are a red herring in this case. The key thing is the presence of NULLs, or rather whether the indexed columns have NOT NULL constraints enforced.
This is going to depend on the version of the database you're using, because the optimizer gets smarter with each release. I'm using 11gR1, and the optimizer used the index in all cases except one: when both columns were nullable and I didn't include the NOT IN clause:
SQL> desc big_table
Name Null? Type
----------------------------------- ------ -------------------
ID NUMBER
COL1 NUMBER
COL2 VARCHAR2(30 CHAR)
COL3 DATE
COL4 NUMBER
Without the NOT IN clause...
SQL> explain plan for
2 select col4, count(col1) from big_table
3 group by col4
4 /
Explained.
SQL> select * from table(dbms_xplan.display)
2 /
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------
Plan hash value: 1753714399
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 31964 | 280K| | 7574 (2)| 00:01:31 |
| 1 | HASH GROUP BY | | 31964 | 280K| 45M| 7574 (2)| 00:01:31 |
| 2 | TABLE ACCESS FULL| BIG_TABLE | 2340K| 20M| | 4284 (1)| 00:00:52 |
----------------------------------------------------------------------------------------
9 rows selected.
SQL>
When I put the NOT IN clause back in, the optimizer opted to use the index. Weird.
SQL> explain plan for
2 select col4, count(col1) from big_table
3 where col1 not in (12, 19)
4 group by col4
5 /
Explained.
SQL> select * from table(dbms_xplan.display)
2 /
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------
Plan hash value: 343952376
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 31964 | 280K| | 5057 (3)| 00:01:01 |
| 1 | HASH GROUP BY | | 31964 | 280K| 45M| 5057 (3)| 00:01:01 |
|* 2 | INDEX FAST FULL SCAN| BIG_I2 | 2340K| 20M| | 1767 (2)| 00:00:22 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------
2 - filter("COL1"<>12 AND "COL1"<>19)
14 rows selected.
SQL>
Just to repeat: in all other cases, as long as one of the indexed columns was declared NOT NULL, the index was used to satisfy the query. This may not be true on earlier versions of Oracle, but it probably points the way forward.
You could use a hint (http://download.oracle.com/docs/cd/B10501_01/server.920/a96533/hintsref.htm), but remember that using an index might not always result in faster execution.
(Just in case, are you sure it's doing a table scan and not an index scan?)
Try using COUNT(*) instead of COUNT(col2) (assuming this is appropriate for your problem, of course). Also, maybe try an index with just col1.
You are querying against Oracle's fixed tables. Since you've not stated which db version this is, I'll assume a recent one. Have the fixed tables been analyzed, and do they have up-to-date statistics? Have you tried your query with the rule-based optimizer via the /*+ rule */ hint? I've often seen queries against Oracle's own fixed tables perform better when the rule-based optimizer is used.
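Applied to the query from the question, that would be:
SELECT /*+ RULE */ owner, COUNT(object_name)
FROM all_objects
GROUP BY owner;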
