Database: Oracle 11g
Scenario:
TABLE_A has around 50 million records
TABLE_A has COLUMN_A, COLUMN_B, COLUMN_C, COLUMN_D, COLUMN_E
COLUMN_A is the primary key of TABLE_A
We need to delete around 30 million records from TABLE_A
So we created another table, TABLE_B
TABLE_B has COLUMN_A, containing all the GUIDs that qualify for deletion from TABLE_A (matched against TABLE_A.COLUMN_A)
TABLE_B has another column, QUALIFIER, populated with a sequence running from 1 up to the maximum record count, say 30 million
TABLE_B is also range-partitioned on the QUALIFIER column, with each range holding 3 million records
Which of the approaches below would be the most efficient way of deleting the records in the above scenario? We are planning to perform this task over a weekend with minimal downtime, and we also want to avoid any segment space issues caused by the bulk delete:
Approach-I: Use a single direct delete statement, without any splitting of the data, as follows:
Delete from TABLE_A where COLUMN_A in (select COLUMN_A from TABLE_B)
Also, can we use parallel hints to improve the performance:
Delete /*+ parallel first_rows*/ from TABLE_A where COLUMN_A in (select /*+ parallel first_rows*/ COLUMN_A from TABLE_B);
Approach-II: Delete the records from TABLE_A in batches based on ranges of the QUALIFIER column, to avoid any segment space issues; the records can then be deleted in iterations (a looped version is sketched after this approach).
Delete from TABLE_A where COLUMN_A in (select COLUMN_A from TABLE_B where QUALIFIER between 1 and 3000000);
Delete from TABLE_A where COLUMN_A in (select COLUMN_A from TABLE_B where QUALIFIER between 3000001 and 6000000);
Delete from TABLE_A where COLUMN_A in (select COLUMN_A from TABLE_B where QUALIFIER between 6000001 and 9000000);
etc., until
Delete from TABLE_A where COLUMN_A in (select COLUMN_A from TABLE_B where QUALIFIER between 27000001 and 30000000);
Also, can we use parallel hints in the above delete statements to improve the performance?
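For reference, Approach-II could be wrapped in a single PL/SQL block so the ranges are generated in a loop and each batch is committed separately. This is only a sketch, assuming 3,000,000-row batches and that committing after each batch is acceptable for restartability:
-- Sketch only: ten batches of 3,000,000 QUALIFIER values each,
-- committing after every batch so undo usage stays bounded.
BEGIN
  FOR i IN 0 .. 9 LOOP
    DELETE FROM TABLE_A
    WHERE  COLUMN_A IN (SELECT COLUMN_A
                        FROM   TABLE_B
                        WHERE  QUALIFIER BETWEEN i * 3000000 + 1
                                             AND (i + 1) * 3000000);
    COMMIT;
  END LOOP;
END;
/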
Approach-III: Delete the records from TABLE_A by splitting the data based on the partitions of TABLE_B
delete from TABLE_A where COLUMN_A in (select COLUMN_A from TABLE_B PARTITION (TABLE_B_PARTITION_1));
delete from TABLE_A where COLUMN_A in (select COLUMN_A from TABLE_B PARTITION (TABLE_B_PARTITION_2));
delete from TABLE_A where COLUMN_A in (select COLUMN_A from TABLE_B PARTITION (TABLE_B_PARTITION_3));
etc., until
delete from TABLE_A where COLUMN_A in (select COLUMN_A from TABLE_B PARTITION (TABLE_B_PARTITION_10));
Also, can we use parallel hints in the above delete statements to improve the performance?
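Regarding the parallel-hint question in Approaches II and III: a hint on its own only parallelizes the query side of the statement; for the delete itself to run in parallel, parallel DML has to be enabled in the session first. A rough sketch (the degree of parallelism of 8 is only an assumption, and the partition name is the one from Approach-III):
ALTER SESSION ENABLE PARALLEL DML;

DELETE /*+ PARALLEL(a 8) */ FROM TABLE_A a
WHERE  a.COLUMN_A IN (SELECT /*+ PARALLEL(b 8) */ b.COLUMN_A
                      FROM   TABLE_B PARTITION (TABLE_B_PARTITION_1) b);

COMMIT;  -- parallel DML requires a commit before the table can be queried again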
Is there any other, better approach for the above-mentioned scenario?
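One further option often weighed when well over half of a table has to go (not one of the three approaches above, and sketched here only under the assumption that a short outage is acceptable and that indexes, constraints, grants and statistics are rebuilt afterwards) is to copy the roughly 20 million surviving rows into a new table and swap it in, which sidesteps the undo/redo and segment space issues of a bulk delete:
-- Sketch only: keep the surviving rows instead of deleting the doomed ones.
CREATE TABLE TABLE_A_KEEP NOLOGGING PARALLEL 8 AS
  SELECT a.*
  FROM   TABLE_A a
  WHERE  NOT EXISTS (SELECT 1 FROM TABLE_B b WHERE b.COLUMN_A = a.COLUMN_A);

-- After validating the row counts and recreating indexes/constraints:
-- DROP TABLE TABLE_A;
-- ALTER TABLE TABLE_A_KEEP RENAME TO TABLE_A;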
Related
I have written a stored procedure which has 10 DML statements running in series, where each DML statement takes around 3 minutes to execute. The whole stored procedure runs for about 29 minutes in production (each DML statement moves millions of records).
I need only the first two DML statements to run in series; the remaining 8 can run in parallel, since they have no dependencies.
I need advice on how to achieve this without using dbms_job or dbms_scheduler.
begin
  insert all
    into table1 values ()
    into table2 values ()
  select ...;
  insert into table3 select ... from table1 join table2...;
  insert into table4 select ... from table2 join tableA...;
  insert into table5 select ... from table1 join tableB...;
  insert into table6 select ... from table1 join tableC...;
  insert into table7 select ... from table1 join tableD...;
  insert into table8 select ... from table1 join tableE...;
  insert into table9 select ... from table1 join tableF...;
  insert into table10 select ... from table1 join tableG...;
end;
Apologies in advance, I am an occasional Oracle user. I have put together a lookup table used by various functions/procedures, and I need to refresh it once a day with rows that need either removing or inserting. I have put together the following simple queries that return the columns against which I can determine the required action. Once I have returned my deletion data, I then need to delete from Table_A all records where the site_id and zone_id match. I can't figure out the best way to achieve this; I have thought about running the select statements as cursors, but I am not sure how I would then delete the rows from Table_A using the site_id and zone_id from each record returned.
Query that returns records to be deleted from Table_A:
SELECT site_id,zone_id,upper(ebts_switch_name)
FROM Table_A
minus
(SELECT site_id,zone_id, upper(ebts_switch_name)
FROM Table_B#remote_db
UNION
SELECT site_id,zone_id,upper(ebts_switch_name)
FROM Table_C);
Query that returns records to be inserted into Table_A:
SELECT cluster_id, site_id,zone_id, upper(trigram),upper(ebts_switch_name)
FROM Table_B#remote_db
WHERE site_id is NOT NULL
minus
SELECT cluster_name,site_id,zone_id,upper(trigram),upper(ebts_switch_name)
FROM Table_A
You can use your statements directly in the manner shown below:
DELETE FROM TABLE_A
WHERE (SITE_ID, ZONE_ID, UPPER(EBTS_SWITCH_NAME)) IN
(SELECT site_id, zone_id, upper(ebts_switch_name)
FROM Table_A
minus
(SELECT site_id, zone_id, upper(ebts_switch_name)
FROM Table_B#remote_db
UNION
SELECT site_id, zone_id, upper(ebts_switch_name)
FROM Table_C));
INSERT INTO TABLE_A (CLUSTER_NAME, SITE_ID, ZONE_ID, TRIGRAM, EBTS_SWITCH_NAME)
SELECT cluster_id, site_id, zone_id, upper(trigram), upper(ebts_switch_name)
FROM Table_B#remote_db
WHERE site_id is NOT NULL
minus
SELECT cluster_name, site_id, zone_id, upper(trigram), upper(ebts_switch_name)
FROM Table_A;
Best of luck.
I can't understand what you mean by the first query, because it's almost the same as
SELECT *
FROM table_a
MINUS
SELECT *
FROM table_a
which means an empty record set.
But generally, use this DELETE syntax:
DELETE
FROM table_a
WHERE (col1, col2) IN (SELECT col1, col2
FROM table_b);
And this INSERT syntax:
INSERT INTO table_a (col1, col2)
SELECT col1, col2
FROM table_b;
I have two tables which use ORC compression, and I am using TEZ as the execution engine. Table_a contains more than 900k records and table_b contains 17 million records. This query is taking a long time; I have waited for 2 days but the query execution has still not completed. What am I doing wrong in this query?
select min(up.id) as comp002uniqueid, min(cp.product_id) as p_id
from (select * from table_a where u_id is null) up, table_b cp
where cp.title like concat('% ', up.productname, ' %')
group by up.productname;
I have two tables
TABLE_A with columns project_id, id and load_date
and TABLE_B with columns project_id, delete_flag and delete_date
where TABLE_A.load_date is a new column and I want to populate it based on TABLE_B.delete_date for historic data. Basically, a file has been repeatedly loaded into the system and historically we didn't keep track of when it was loaded. However, each time the file is re-loaded, the previous version of it is updated in TABLE_B with a delete_date (i.e. a soft delete). The previous version just stays in TABLE_A without any changes.
I would like to populate TABLE_A.load_date based on matching projects in TABLE_B. The oldest row in TABLE_A (smallest TABLE_A.id) matches the oldest row in TABLE_B (oldest delete_date), etc. So the rows should match up if you keep picking the next one in order from each table. But I don't know how to turn that into an Oracle statement. What I've got so far is this, which doesn't deal with matching on row order:
MERGE INTO TABLE_A a
USING
(
  SELECT PROJECT_ID, DELETE_DATE
  FROM   TABLE_B
  WHERE  DELETE_FLAG = 'Y'
  ORDER BY DELETE_DATE ASC
) b ON (a.PROJECT_ID = b.PROJECT_ID)
WHEN MATCHED THEN UPDATE
SET a.LOAD_DATE = b.DELETE_DATE;
This merge should do the work, as far as I have properly understood your criteria:
merge into table_a ta
using (
  select pid project_id, id, delete_date
  from (select project_id pid, id,
               row_number() over (partition by project_id order by id) rn
        from table_a) a
  join (select project_id pid, delete_date,
               row_number() over (partition by project_id order by delete_date) rn
        from table_b
        where delete_flag = 'Y') b
  using (pid, rn)
) tb
on (ta.project_id = tb.project_id and ta.id = tb.id)
when matched then update
  set ta.load_date = tb.delete_date;
I have to display about 5 columns from 2 database tables using the Group By clause as follows:
SELECT T1.COLUMN_A, T1.COLUMN_B, T1.COLUMN_C, SUM(T2.COLUMN_D)
FROM TABLE1 T1, TABLE2 T2
WHERE T1.COLUMN_A = T2.COLUMN_A
GROUP BY T1.COLUMN_A
Now, COLUMN_B has the same value across all rows having the same COLUMN_A, and COLUMN_B is an amount field.
COLUMN_C is a date field and is likewise the same across rows with the same COLUMN_A value.
Ex. Here is dummy data for table T1:
COLUMN_A  COLUMN_B  COLUMN_C
1         $25       09/15/2011 12:00:00 AM
1         $25       09/15/2011 12:00:00 AM
2         $20       12/12/2011 12:00:00 AM
...
Table T2:
COLUMN_A  COLUMN_D
1         $100
1         $10
2         $200
2         $200
...
Running the query fails with the following error: ORA-00979: not a GROUP BY expression
Removing COLUMN_B and COLUMN_C would make it work; however, I need these columns as well.
Can anyone please suggest the required changes?
This should work:
SELECT T1.COLUMN_A, T1.COLUMN_B, T1.COLUMN_C, SumColumnD
FROM TABLE1 T1
INNER JOIN
(SELECT COLUMN_A, SUM(COLUMN_D) AS SumColumnD
FROM TABLE2 T2
GROUP BY COLUMN_A) t ON T1.COLUMN_A = t.COLUMN_A
If the values of COLUMN_B and COLUMN_C are the same across the same values of COLUMN_A, then you can simply add them to the GROUP BY clause:
SELECT T1.COLUMN_A, T1.COLUMN_B, T1.COLUMN_C, SUM(T2.COLUMN_D)
FROM TABLE1 T1, TABLE2 T2
WHERE T1.COLUMN_A = T2.COLUMN_A
GROUP BY T1.COLUMN_A, T1.COLUMN_B, T1.COLUMN_C
You've specified columns COLUMN_B and COLUMN_C in your SELECT list, so Oracle needs to provide a value for them when grouping by COLUMN_A. However, Oracle doesn't know that these columns are constant across the same values of COLUMN_A, so you get the error because, in general, it has no way of returning a single value for these columns.
Adding COLUMN_B and COLUMN_C to the GROUP BY clause shouldn't affect the results of the query and should allow you to use them in your SELECT list.