Need input on deleting bulk data faster - performance

Database: Oracle 11g
Scenario:
TABLE_A has around 50 million records
TABLE_A has COLUMN_A, COLUMN_B, COLUMN_C, COLUMN_D, COLUMN_E
COLUMN_A is the primary key of TABLE_A
We need to delete around 30 million records from TABLE_A
So we created another table, TABLE_B
TABLE_B has COLUMN_A, holding all the GUIDs (matching TABLE_A.COLUMN_A) that qualify for deletion from TABLE_A
TABLE_B has another column, QUALIFIER, populated with a sequence from 1 up to the total record count (about 30 million)
TABLE_B is range-partitioned on the QUALIFIER column, with 3 million records per partition
Given the above scenario, which of the approaches below would be the most efficient way to delete the records? We plan to perform this task over a weekend with minimal downtime, and we also want to avoid any segment space issues caused by the bulk delete:
Approach-I: Use a single direct DELETE statement, with no batching, as follows:
Delete from TABLE_A where COLUMN_A in (select COLUMN_A from TABLE_B)
Also, can we use parallel hints to improve performance?
Delete /*+ parallel first_rows*/ from TABLE_A where COLUMN_A in (select /*+ parallel first_rows*/ COLUMN_A from TABLE_B);
Approach-II: Delete the records from TABLE_A in iterations, splitting the data into QUALIFIER ranges to avoid any segment space issues:
Delete from TABLE_A where COLUMN_A in (select COLUMN_A from TABLE_B where QUALIFIER between 1 and 3000000);
Delete from TABLE_A where COLUMN_A in (select COLUMN_A from TABLE_B where QUALIFIER between 3000001 and 6000000);
Delete from TABLE_A where COLUMN_A in (select COLUMN_A from TABLE_B where QUALIFIER between 6000001 and 9000000);
etc., until
Delete from TABLE_A where COLUMN_A in (select COLUMN_A from TABLE_B where QUALIFIER between 27000001 and 30000000);
Also, can we use parallel hints in the above DELETE statements to improve performance?
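Approach-II's loop can be sketched with Python's built-in sqlite3 module standing in for Oracle. This is a scaled-down illustration under stated assumptions: lower-case table names, 50 rows instead of 50 million, a batch size of 3 instead of 3 million, and a hypothetical helper `delete_in_batches`. Oracle-specific features (partitions, parallel DML, hints) are not modeled; the sketch only shows one DELETE per QUALIFIER range with a commit after each range.

```python
import sqlite3


def delete_in_batches(conn: sqlite3.Connection, batch_size: int) -> int:
    """Hypothetical helper: delete table_a rows listed in table_b, one QUALIFIER range at a time."""
    (max_q,) = conn.execute("SELECT COALESCE(MAX(qualifier), 0) FROM table_b").fetchone()
    deleted = 0
    for lo in range(1, max_q + 1, batch_size):
        hi = lo + batch_size - 1
        cur = conn.execute(
            "DELETE FROM table_a WHERE column_a IN "
            "(SELECT column_a FROM table_b WHERE qualifier BETWEEN ? AND ?)",
            (lo, hi),
        )
        deleted += cur.rowcount
        # Commit per range so no single transaction covers the whole delete.
        conn.commit()
    return deleted


# Scaled-down demo data (assumption): 50 rows, the first 30 qualify for deletion.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (column_a TEXT PRIMARY KEY, column_b TEXT)")
conn.execute("CREATE TABLE table_b (column_a TEXT, qualifier INTEGER)")
conn.executemany("INSERT INTO table_a VALUES (?, ?)",
                 [(f"guid-{i}", "x") for i in range(1, 51)])
conn.executemany("INSERT INTO table_b VALUES (?, ?)",
                 [(f"guid-{i}", i) for i in range(1, 31)])
conn.commit()

deleted = delete_in_batches(conn, batch_size=3)  # 10 ranges, like the 10 partitions
remaining = conn.execute("SELECT COUNT(*) FROM table_a").fetchone()[0]
```

Committing after each range keeps every transaction, and in Oracle the undo it generates, bounded by one range rather than the full 30 million rows.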
Approach-III: Delete the records from TABLE_A by splitting the data based on the partitions of TABLE_B:
delete from TABLE_A where COLUMN_A in (select COLUMN_A from TABLE_B PARTITION (TABLE_B_PARTITION_1));
delete from TABLE_A where COLUMN_A in (select COLUMN_A from TABLE_B PARTITION (TABLE_B_PARTITION_2));
delete from TABLE_A where COLUMN_A in (select COLUMN_A from TABLE_B PARTITION (TABLE_B_PARTITION_3));
etc, until
delete from TABLE_A where COLUMN_A in (select COLUMN_A from TABLE_B PARTITION (TABLE_B_PARTITION_10));
Also, can we use parallel hints in the above DELETE statements to improve performance?
Is there a better approach for this scenario?

Related

Oracle SQL: Execute Multiple DML Statements Concurrently within a Single Stored Procedure

I have written a stored procedure that runs 10 DML statements in series; each statement takes around 3 minutes, and the whole procedure runs for about 29 minutes in production (each DML statement loads millions of records).
Only the first two DML statements need to run in series; the remaining 8 could run in parallel, since they do not depend on each other.
I need advice on how to achieve this without using dbms_job or dbms_scheduler.
begin
insert all
into table1 values (...)
into table2 values (...)
select ...;
insert into table3 select ... from table1 join table2...;
insert into table4 select ... from table2 join tableA...;
insert into table5 select ... from table1 join tableB...;
insert into table6 select ... from table1 join tableC...;
insert into table7 select ... from table1 join tableD...;
insert into table8 select ... from table1 join tableE...;
insert into table9 select ... from table1 join tableF...;
insert into table10 select ... from table1 join tableG...;
end;
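The dependency structure described above (a serial first stage, then 8 independent statements) can be sketched as a client-side driver. This is an assumption-laden illustration, not an Oracle feature: `run_in_stages` and `make_task` are hypothetical names, and each task is a placeholder that only records its name where a real task would run one INSERT ... SELECT. Because a single Oracle session executes one statement at a time, a real version would give each parallel task its own database connection.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Task = Callable[[], None]


def run_in_stages(serial: List[Task], parallel: List[Task], max_workers: int = 8) -> None:
    """Run the serial tasks one by one, then all parallel tasks concurrently."""
    # Stage 1: statements that later ones depend on (table1/table2) finish first.
    for task in serial:
        task()
    # Stage 2: independent statements; each real task would use its own session.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(task) for task in parallel]
        for future in futures:
            future.result()  # re-raises the exception of a failed task


# Demo with placeholder tasks that only log their names.
log: List[str] = []
log_lock = threading.Lock()


def make_task(name: str) -> Task:
    def task() -> None:
        # Hypothetical stand-in for executing one DML statement.
        with log_lock:
            log.append(name)
    return task


run_in_stages(
    serial=[make_task("insert all -> table1, table2")],  # one INSERT ALL loads both
    parallel=[make_task(f"insert -> table{i}") for i in range(3, 11)],
)
```

A PL/SQL block on its own executes its statements sequentially, so concurrency of this kind needs several sessions, whichever mechanism opens them.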

Oracle - Deleting or inserting rows via cursor

Apologies in advance; I am an occasional Oracle user. I have put together a lookup table used by various functions/procedures, and I need to refresh it once a day with rows that need either removing or inserting. I have written the following simple queries, which return the columns I can use to determine the required action. Once I have my deletion data, I need to delete from Table_A all records whose site_id and zone_id match. I can't figure out the best way to achieve this. I have thought about running the SELECT statements as cursors, but I am not sure how to then delete the rows from Table_A using the site_id and zone_id from each returned record.
Query that returns the records to be deleted from Table_A:
SELECT site_id,zone_id,upper(ebts_switch_name)
FROM Table_A
minus
(SELECT site_id,zone_id, upper(ebts_switch_name)
FROM Table_B#remote_db
UNION
SELECT site_id,zone_id,upper(ebts_switch_name)
FROM Table_C);
Query that returns the records to be inserted into Table_A:
SELECT cluster_id, site_id,zone_id, upper(trigram),upper(ebts_switch_name)
FROM Table_B#remote_db
WHERE site_id is NOT NULL
minus
SELECT cluster_name,site_id,zone_id,upper(trigram),upper(ebts_switch_name)
FROM Table_A
You can use your statements directly in the manner shown below:
DELETE FROM TABLE_A
WHERE (SITE_ID, ZONE_ID, UPPER(EBTS_SWITCH_NAME)) IN
(SELECT site_id, zone_id, upper(ebts_switch_name)
FROM Table_A
minus
(SELECT site_id, zone_id, upper(ebts_switch_name)
FROM Table_B#remote_db
UNION
SELECT site_id, zone_id, upper(ebts_switch_name)
FROM Table_C));
INSERT INTO TABLE_A (CLUSTER_NAME, SITE_ID, ZONE_ID, TRIGRAM, EBTS_SWITCH_NAME)
SELECT cluster_id, site_id, zone_id, upper(trigram), upper(ebts_switch_name)
FROM Table_B#remote_db
WHERE site_id is NOT NULL
minus
SELECT cluster_name, site_id, zone_id, upper(trigram), upper(ebts_switch_name)
FROM Table_A;
Best of luck.
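The set-based DELETE and INSERT above can be tried out with a small sqlite3 sketch. Assumptions: SQLite stands in for Oracle, a local table_b replaces Table_B#remote_db (SQLite has no database links), EXCEPT replaces MINUS, and the demo rows and the helper name `refresh_lookup` are made up. SQLite rejects a parenthesised compound SELECT, so `A MINUS (B UNION C)` is written as the equivalent `A EXCEPT B EXCEPT C`.

```python
import sqlite3


def refresh_lookup(conn: sqlite3.Connection) -> None:
    """Hypothetical daily refresh of table_a without a cursor loop."""
    # Delete table_a rows that appear in neither table_b nor table_c.
    conn.execute("""
        DELETE FROM table_a
        WHERE (site_id, zone_id, UPPER(ebts_switch_name)) IN (
            SELECT site_id, zone_id, UPPER(ebts_switch_name) FROM table_a
            EXCEPT
            SELECT site_id, zone_id, UPPER(ebts_switch_name) FROM table_b
            EXCEPT
            SELECT site_id, zone_id, UPPER(ebts_switch_name) FROM table_c)
    """)
    # Insert table_b rows not yet present in table_a.
    conn.execute("""
        INSERT INTO table_a (cluster_name, site_id, zone_id, trigram, ebts_switch_name)
        SELECT cluster_id, site_id, zone_id, UPPER(trigram), UPPER(ebts_switch_name)
        FROM table_b WHERE site_id IS NOT NULL
        EXCEPT
        SELECT cluster_name, site_id, zone_id, UPPER(trigram), UPPER(ebts_switch_name)
        FROM table_a
    """)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (cluster_name TEXT, site_id INTEGER, zone_id INTEGER,
                          trigram TEXT, ebts_switch_name TEXT);
    CREATE TABLE table_b (cluster_id TEXT, site_id INTEGER, zone_id INTEGER,
                          trigram TEXT, ebts_switch_name TEXT);
    CREATE TABLE table_c (site_id INTEGER, zone_id INTEGER, ebts_switch_name TEXT);
    INSERT INTO table_a VALUES ('C1', 1, 10, 'ABC', 'SW1'), ('C1', 2, 20, 'DEF', 'SW2'),
                               ('C2', 3, 30, 'GHI', 'SW3');
    INSERT INTO table_b VALUES ('C1', 1, 10, 'abc', 'sw1'), ('C3', 4, 40, 'jkl', 'sw4'),
                               ('C4', NULL, 50, 'mno', 'sw5');
    INSERT INTO table_c VALUES (2, 20, 'sw2');
""")
refresh_lookup(conn)
rows = conn.execute("SELECT * FROM table_a ORDER BY site_id").fetchall()
```

Here site 3 is removed (it is in neither table_b nor table_c), site 4 is added, and the table_b row with a NULL site_id is skipped by the WHERE filter.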
I don't understand what you mean by the first query, because it's almost the same as
SELECT *
FROM table_a
MINUS
SELECT *
FROM table_a
which returns an empty result set.
But in general, use this DELETE syntax:
DELETE
FROM table_a
WHERE (col1, col2) IN (SELECT col1, col2
FROM table_b);
And this INSERT syntax:
INSERT INTO table_a (col1, col2)
SELECT col1, col2
FROM table_b;

Hive join with LIKE operator

I have two tables that use ORC compression, and I am using Tez as the execution engine. table_a contains more than 900k records and table_b contains 17 million records. This query takes a very long time: I waited for 2 days and it still had not finished. What am I doing wrong in this query?
select min(up.id) as comp002uniqueid, min(cp.product_id) as p_id
from
(select * from table_a where u_id is null) up , table_b cp
where cp.title like concat('% ',up.productname,' %')
group by up.productname;
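A rough count of the work involved, assuming the engine finds no equality key in `cp.title like concat('% ',up.productname,' %')` and therefore has to test the LIKE pattern for every (up, cp) pair. The `u_id is null` filter can only shrink |up| below table_a's row count; the figure below further assumes most of table_a's roughly 900k rows pass that filter.

```latex
N_{\text{pairs}} = |up| \cdot |cp| \approx 9 \times 10^{5} \times 1.7 \times 10^{7} = 1.53 \times 10^{13}
```

Each of those pairs also costs a string pattern match, which is consistent with a run that does not finish in two days.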

Oracle join rows in order where order is defined differently on each table

I have two tables
TABLE_A with columns project_id, id and load_date
and TABLE_B with columns project_id, delete_flag and delete_date
where TABLE_A.load_date is a new column and I want to populate it based on TABLE_B.delete_date for historic data. Basically, a file has been repeatedly loaded into the system and historically we didn't keep track of when it was loaded. However, each time the file is re-loaded, the previous version of it is updated in TABLE_B with a delete_date (i.e. a soft delete). The previous version just stays in TABLE_A without any changes.
I would like to populate TABLE_A.load_date based on matching projects in TABLE_B. The oldest row in TABLE_A (smallest TABLE_A.id) matches the oldest row in TABLE_B (oldest delete_date), and so on, so the rows should match up if you keep picking the next one in order from each table. But I don't know how to turn that into an Oracle statement. What I've got so far is this, which doesn't handle matching on row order:
MERGE INTO TABLE_A a
USING
(
SELECT PROJECT_ID, DELETE_DATE
FROM TABLE_B
WHERE DELETE_FLAG = 'Y'
ORDER BY DELETE_DATE ASC
) b ON (a.PROJECT_ID = b.PROJECT_ID)
WHEN MATCHED THEN UPDATE
SET a.LOAD_DATE = b.DELETE_DATE;
This MERGE should do the job, if I have understood your criteria correctly:
merge into table_a ta
using (
select pid project_id, id, delete_date
from (
select project_id pid, id,
row_number() over (partition by project_id order by id) rn
from table_a) a
join (
select project_id pid, delete_date,
row_number() over (partition by project_id order by delete_date ) rn
from table_b
where delete_flag='Y') b using (pid, rn) ) tb
on (ta.project_id = tb.project_id and ta.id = tb.id)
when matched then update
set ta.load_date = tb.delete_date
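The core of this MERGE is pairing the n-th row of TABLE_A with the n-th soft-deleted row of TABLE_B within each project via ROW_NUMBER(). A minimal sqlite3 sketch of that pairing, under assumptions: SQLite (3.25 or later, for window functions) stands in for Oracle, dates are ISO strings, the demo rows and the helper name `backfill_load_dates` are made up, and the MERGE is replaced by a SELECT of the pairs followed by a parameterised UPDATE.

```python
import sqlite3

# Number each side per project, then join on (project_id, rn), as in the MERGE above.
PAIRS_SQL = """
    SELECT a.id, b.delete_date
    FROM (SELECT project_id, id,
                 ROW_NUMBER() OVER (PARTITION BY project_id ORDER BY id) AS rn
          FROM table_a) AS a
    JOIN (SELECT project_id, delete_date,
                 ROW_NUMBER() OVER (PARTITION BY project_id ORDER BY delete_date) AS rn
          FROM table_b
          WHERE delete_flag = 'Y') AS b
      USING (project_id, rn)
"""


def backfill_load_dates(conn: sqlite3.Connection) -> None:
    """Hypothetical helper: copy each paired delete_date into table_a.load_date."""
    pairs = conn.execute(PAIRS_SQL).fetchall()
    conn.executemany("UPDATE table_a SET load_date = ? WHERE id = ?",
                     [(delete_date, row_id) for row_id, delete_date in pairs])
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (project_id INTEGER, id INTEGER PRIMARY KEY, load_date TEXT);
    CREATE TABLE table_b (project_id INTEGER, delete_flag TEXT, delete_date TEXT);
    INSERT INTO table_a (project_id, id) VALUES (1, 101), (1, 102), (1, 103),
                                                (2, 201), (2, 202);
    INSERT INTO table_b VALUES (1, 'Y', '2020-03-01'), (1, 'Y', '2020-01-01'),
                               (1, 'N', NULL), (2, 'Y', '2021-06-15'), (2, 'N', NULL);
""")
backfill_load_dates(conn)
result = conn.execute("SELECT id, load_date FROM table_a ORDER BY id").fetchall()
```

The newest row of each project (103 and 202 here) has no soft-deleted counterpart, so its load_date stays NULL, just as the inner join in the MERGE leaves it unmatched.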

Group By Clause - Oracle SQL

I have to display about 5 columns from 2 tables using the GROUP BY clause, as follows:
SELECT T1.COLUMN_A, T1.COLUMN_B, T1.COLUMN_C, SUM(T2.COLUMN_D)
FROM TABLE1 T1, TABLE2 T2
WHERE T1.COLUMN_A = T2.COLUMN_A
GROUP BY T1.COLUMN_A
COLUMN_B has the same value across all rows that share the same COLUMN_A, and COLUMN_B is an amount field.
COLUMN_C is a date field and is also the same across rows with the same COLUMN_A value.
For example, here is some dummy data.
TABLE T1:
COLUMN_A  COLUMN_B  COLUMN_C
1         $25       09/15/2011 12:00:00 AM
1         $25       09/15/2011 12:00:00 AM
2         $20       12/12/2011 12:00:00 AM
...
TABLE T2:
COLUMN_A  COLUMN_D
1         $100
1         $10
2         $200
2         $200
...
Running the query fails with the following error: ORA-00979: not a GROUP BY expression
Removing COLUMN_B and COLUMN_C makes it work, but I need these columns as well.
Can anyone please suggest the required changes?
This should work
SELECT T1.COLUMN_A, T1.COLUMN_B, T1.COLUMN_C, SumColumnD
FROM TABLE1 T1
INNER JOIN
(SELECT COLUMN_A, SUM(COLUMN_D) AS SumColumnD
FROM TABLE2 T2
GROUP BY COLUMN_A) t ON T1.COLUMN_A = t.COLUMN_A
If the values of COLUMN_B and COLUMN_C are the same across rows with the same COLUMN_A, then you can simply add them to the GROUP BY clause:
SELECT T1.COLUMN_A, T1.COLUMN_B, T1.COLUMN_C, SUM(T2.COLUMN_D)
FROM TABLE1 T1, TABLE2 T2
WHERE T1.COLUMN_A = T2.COLUMN_A
GROUP BY T1.COLUMN_A, T1.COLUMN_B, T1.COLUMN_C
You've specified COLUMN_B and COLUMN_C in your SELECT list, so Oracle needs to provide a value for them when grouping by COLUMN_A. However, Oracle doesn't know that these columns are constant across rows with the same COLUMN_A, and in general it has no way of returning a single value for them, which is why you get the error.
Adding COLUMN_B and COLUMN_C to the GROUP BY clause shouldn't affect the results of the query and should allow you to use them in your SELECT list.
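Both answers can be compared on the dummy data above with a small sqlite3 sketch. Assumptions: SQLite stands in for Oracle, amounts are plain integers and dates ISO strings, and both COLUMN_A = 1 rows in T1 are dated 09/15/2011. SQLite is more permissive than Oracle about non-aggregated columns, so it would not raise ORA-00979 for the original query; the sketch only runs the two fixed versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (column_a INTEGER, column_b INTEGER, column_c TEXT);
    CREATE TABLE table2 (column_a INTEGER, column_d INTEGER);
    INSERT INTO table1 VALUES (1, 25, '2011-09-15'), (1, 25, '2011-09-15'),
                              (2, 20, '2011-12-12');
    INSERT INTO table2 VALUES (1, 100), (1, 10), (2, 200), (2, 200);
""")

# First answer: aggregate table2 per key, then join the sums to table1.
pre_aggregated = conn.execute("""
    SELECT t1.column_a, t1.column_b, t1.column_c, t.sum_d
    FROM table1 t1
    JOIN (SELECT column_a, SUM(column_d) AS sum_d
          FROM table2 GROUP BY column_a) t
      ON t1.column_a = t.column_a
    ORDER BY t1.column_a
""").fetchall()

# Second answer: join first, then group by every non-aggregated column.
grouped = conn.execute("""
    SELECT t1.column_a, t1.column_b, t1.column_c, SUM(t2.column_d)
    FROM table1 t1 JOIN table2 t2 ON t1.column_a = t2.column_a
    GROUP BY t1.column_a, t1.column_b, t1.column_c
    ORDER BY t1.column_a
""").fetchall()
```

On this data the two forms differ: because T1 holds two identical rows for COLUMN_A = 1, the join-then-group query counts each T2 row twice (220 instead of 110), while the pre-aggregated query keeps the per-key sum of 110 but returns one output row per T1 row. With exactly one T1 row per COLUMN_A, both return the same result.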
