I am trying to use the following statement for the delete process, and it has to delete around 23,566,424 rows, but Oracle takes almost 3 hours to complete the process. We have already created an index on "SCHEDULE_DATE_KEY", but the process is still very slow. Can someone advise on how to make deletes faster in Oracle?
DELETE
FROM
EDWSOURCE.SCHEDULE_DAY_F
WHERE
SCHEDULE_DATE_KEY >
(
SELECT
LAST_PAYROLL_DATE_KEY
FROM
EDWSOURCE.LAST_PAYROLL_DATE
WHERE
CURRENT_FLAG = 'Y'
);
I don't think any index will help here; Oracle will probably decide the best approach is a full table scan to delete 20M rows out of 300M. It is deleting at a rate of over 2,000 rows per second, which isn't bad. In fact, any additional indexes will slow it down, since the row's entry has to be deleted from each index as well.
A quicker approach could be to create a new table of the rows you want to keep, something like:
create table EDWSOURCE.SCHEDULE_DAY_F_KEEP
as
select * from EDWSOURCE.SCHEDULE_DAY_F
where SCHEDULE_DATE_KEY <=
(
SELECT
LAST_PAYROLL_DATE_KEY
FROM
EDWSOURCE.LAST_PAYROLL_DATE
WHERE
CURRENT_FLAG = 'Y'
);
Then recreate any constraints and indexes to use the new table.
Finally drop the old table and rename the new one.
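A rough sketch of those final steps (the index name here is hypothetical; recreate whatever indexes and constraints actually exist on the original table, and verify the copy before dropping anything):
create index SCHEDULE_DAY_F_KEEP_I1
  on EDWSOURCE.SCHEDULE_DAY_F_KEEP (SCHEDULE_DATE_KEY);
drop table EDWSOURCE.SCHEDULE_DAY_F;
alter table EDWSOURCE.SCHEDULE_DAY_F_KEEP rename to SCHEDULE_DAY_F;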
You can try testing a filtered table move. This has an ONLINE clause, so you can do it while the application is still running.
Note that in 12.2 and later the indexes will remain valid. In earlier versions you will need to rebuild the indexes, as they will become invalid (see the rebuild sketch at the end of the example below). Good luck.
Move a Table
Create and populate a new test table.
DROP TABLE t1 PURGE;
CREATE TABLE t1 AS
SELECT level AS id,
'Description for ' || level AS description
FROM dual
CONNECT BY level <= 100;
COMMIT;
Check the contents of the table.
SELECT COUNT(*) AS total_rows,
MIN(id) AS min_id,
MAX(id) AS max_id
FROM t1;
TOTAL_ROWS MIN_ID MAX_ID
---------- ---------- ----------
100 1 100
SQL>
Move the table, filtering out rows with an ID value greater than 50.
ALTER TABLE t1 MOVE ONLINE
INCLUDING ROWS WHERE id <= 50;
Check the contents of the table.
SELECT COUNT(*) AS total_rows,
MIN(id) AS min_id,
MAX(id) AS max_id
FROM t1;
TOTAL_ROWS MIN_ID MAX_ID
---------- ---------- ----------
50 1 50
SQL>
The rows with an ID value between 51 and 100 have been removed.
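On a pre-12.2 database this is the point where the indexes on the moved table would have become UNUSABLE; the demo table t1 has no index, but on a real table the rebuild would look something like this (hypothetical index name):
ALTER INDEX t1_idx1 REBUILD;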
As mentioned above, it may be best to PARTITION the table and drop a partition every N days as part of a daily task.
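A minimal sketch of that idea, assuming SCHEDULE_DATE_KEY is a DATE (if it is a numeric date key, range partitions on the key values work the same way); ageing out rows then becomes a metadata operation instead of a multi-million-row DELETE:
CREATE TABLE edwsource.schedule_day_f_part
PARTITION BY RANGE (schedule_date_key)
INTERVAL (NUMTODSINTERVAL(1, 'DAY'))
( PARTITION p_initial VALUES LESS THAN (DATE '2020-01-01') )
AS SELECT * FROM edwsource.schedule_day_f;

-- Daily task: drop the partitions that are no longer needed.
ALTER TABLE edwsource.schedule_day_f_part
  DROP PARTITION FOR (DATE '2020-01-02') UPDATE INDEXES;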
Related
I have identified some duplicates in my table:
-- DUPLICATES: ----
select PPLP_NAME,
START_TIME,
END_TIME,
count(*)
from PPLP_LOAD_GENSTAT
group by PPLP_NAME,
START_TIME,
END_TIME
having count(*) > 1
-- DUPLICATES: ----
How is it possible to delete them?
Even if you don't have the primary key, each record has a unique rowid associated.
The query below deletes only the records that don't have the maximum ROWID, by self-joining the table on the columns that cause the duplication. This makes sure that only the duplicates are deleted.
DELETE FROM PPLP_LOAD_GENSTAT plg_outer
WHERE ROWID NOT IN(
  select MAX(ROWID)
  from PPLP_LOAD_GENSTAT plg_inner
  WHERE plg_outer.pplp_name = plg_inner.pplp_name
  AND plg_outer.start_time = plg_inner.start_time
  AND plg_outer.end_time = plg_inner.end_time
);
I'd suggest something easier:
CREATE table NewTable as
SELECT DISTINCT pplp_name,start_time,end_time
FROM YourTable
Then drop your table, and rename the new table.
If you really want to delete records, you can find a few examples of how to do it here.
I am pretty new to Oracle.
I am just stuck when I try to achieve the following logic. I am creating a SQL script in Oracle that will help me generate a report. This script will run twice a day, so it shouldn't pick up the same records when it runs the next time.
1) Run the query, save the result set, and store the Order Id in the temp table when the job runs at 11 AM.
2) Run the query a second time at 3 PM, check the temp table, and return only the result set that's not in the temp table.
The following query will generate the result set, but I am not sure how to create a temp table and validate against it when the script runs.
select
    rownum as LineNum,
    'New' as ActionCode,
    ORDER_ID,
    AmountType,
    trunc(sysdate),
    trunc(systime)
from crd.V_IVZ_T19 t19
where
    (t19.acct_cd in
        (select fc.child_acct_cd
         from cs_config fc
         where fc.parent_acct like 'G_TRI_RPT'))
    and t19.date >= trunc(sysdate)
    and t19.date <= trunc(sysdate);
Any help is much appreciated. I am also not sure how to get only the timestamp.
A TEMP table is not the right idea here, because a temporary table will not keep the data for long (only for the session); you just need to create a normal table. Hope this helps:
--- table for storing ORDER_ID for further checking; use a correct data type. You can also add a date period column to the table to control expired order_ids.
CREATE TABLE order_id_store (
order_id NUMBER,
end_date DATE
);
--- filling the table for further checking
INSERT INTO order_id_store
SELECT ORDER_ID, trunc(sysdate)
FROM crd.V_IVZ_T19 t19
WHERE t19.order_id NOT IN (SELECT DISTINCT order_id FROM order_id_store)
AND t19.date>= trunc(sysdate)
AND t19.date<= trunc(sysdate);
--- delete data that is no longer needed, by date period, for example anything older than 2 days:
DELETE FROM order_id_store WHERE end_date <= trunc(SYSDATE - 2);
COMMIT;
---- select the report without the already-processed data
SELECT
    rownum as LineNum,
    'New' as ActionCode,
    ORDER_ID,
    AmountType,
    trunc(sysdate),
    trunc(systime)
FROM crd.V_IVZ_T19 t19
WHERE
    (t19.acct_cd in
        (select fc.child_acct_cd
         from cs_config fc
         where fc.parent_acct like 'G_TRI_RPT'))
    AND t19.order_id NOT IN (SELECT DISTINCT order_id FROM order_id_store)
    AND t19.date >= trunc(sysdate)
    AND t19.date <= trunc(sysdate);
I'm not sure about your "t19.date >=" and "t19.date <=" conditions, since that range only matches exactly trunc(sysdate) (midnight today); correct it if that's not what you intended.
I'm doing some experimentation with query plans in Oracle, and I have the following table:
--create a table to use
create table SKEWED_DATA(
EMP_ID int,
DEPT int,
COL2 int,
CONSTRAINT SKEWED_DATA_PK PRIMARY KEY (EMP_ID)
);
--add an index on dept
create index SKEWED_DATA_INDEX1 on SKEWED_DATA(DEPT);
I then insert 1 million rows of data where 999,999 rows have dept id 1, and 1 row has dept id 99.
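The insert itself isn't shown above; one way to load data with that skew would be something like this (an assumption, and the COL2 values are arbitrary):
insert into SKEWED_DATA (EMP_ID, DEPT, COL2)
select level,
       case when level = 1000000 then 99 else 1 end,
       mod(level, 1000)
from dual
connect by level <= 1000000;
commit;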
Before calculating statistics on the table, Oracle Autotrace shows that when running the following queries, it is using an index scan for both:
select AVG(COL2) from SKEWED_DATA D where DEPT = 1;
select AVG(COL2) from SKEWED_DATA D where DEPT = 99;
It's my understanding that it would be more efficient in this case to use a full table scan for dept id 1, and an index scan for dept id 99.
I then run the following command to generate statistics for the table:
execute DBMS_STATS.GATHER_TABLE_STATS ('HARRY','SKEWED_DATA');
And querying the dba_tab_statistics and user_tab_col_statistics confirms that stats and histograms have been gathered.
Running an autotrace on the following queries now shows full table scan for both!
select AVG(COL2) from SKEWED_DATA D where DEPT = 1;
select AVG(COL2) from SKEWED_DATA D where DEPT = 99;
My question is: why is Oracle using a full table scan for dept id 99 when there is only 1 row with this value?
UPDATE
I tried running the query for dept 99 with a hint to force Oracle to use the index, and whilst Autotrace believes it to be less efficient, the time it takes is 0.001 seconds, compared to 0.03 seconds when using the full table scan, thus proving (I think?) my theory that Oracle should be using the index in this instance.
select /*+ INDEX(D SKEWED_DATA_INDEX1) */ AVG(COL2) from SKEWED_DATA D where DEPT = 99;
OK, I think I might have solved it. When I had 999,999 rows with dept 1 and 1 row with dept 99, I inspected the number of histogram buckets by running the following query:
select COLUMN_NAME, HISTOGRAM, NUM_BUCKETS, NUM_DISTINCT from USER_TAB_COL_STATISTICS where TABLE_NAME = 'SKEWED_DATA';
This showed that there are 2 distinct values but only 1 bucket. If I change the stats gathering to this:
execute DBMS_STATS.GATHER_TABLE_STATS('HARRY','SKEWED_DATA',estimate_percent=>100);
It then correctly comes up with 2 buckets, and the autotrace shows the 'correct' execution plans. So, I guess it's because of the extreme 'skewness' of my data that Oracle cannot generate the correct stats for it unless the estimate_percent is massive.
Interestingly if I have slightly less skewed data (say about 2-3% of all records with a dept id of 99) Oracle does treat it correctly even when I leave the estimate_percent as default.
So, the moral of the story seems to be: if you have ridiculously skewed data like this and Oracle is not using the correct execution plan, try playing around with the estimate_percent parameter.
I have 5 development schemas, and each of them has partitioned tables. We also have scripts to dynamically create partitioned tables (monthly/yearly). We have to go to the DBA every time to gather details about the partitioned tables. Our real problem is that we have a partitioned table with 9 partitions. Every day it gets a delta load (updates/deletes using PL/SQL) as well as an APPEND load using SQL*Loader. This happens when the database has its peak load, and we have some performance issues with SELECT queries over this table.
When we report this to the DBAs, they say the table statistics are stale, and after they gather stats, magically the query works faster. I searched about this and found some information about dynamic performance views.
So now, I have the following questions.
1) Can the developer generate a list of all partitioned tables, with partition names and the number of records, without going to the DBA?
2) Can we identify the last analyzed date of every partition?
3) Also, the status of each partition's index: is it usable or unusable?
Normally there is no need to identify objects that need statistics gathered. Oracle automatically gathers statistics for stale objects, unless the task has been
manually disabled. This is usually good enough for OLTP systems. Use this query to find the status of the task:
select status
from dba_autotask_client
where client_name = 'auto optimizer stats collection';
STATUS
------
ENABLED
For data warehouse systems there is also not much need to query the data dictionary for stale stats. In a data warehouse statistics need to be considered after almost
every operation. Developers need to get in the habit of always thinking about statistics after a truncate, insert, swap, etc. Eventually they will "just know" when to gather statistics.
But if you still want to see how Oracle determines if statistics are stale, look at DBA_TAB_STATISTICS and DBA_TAB_MODIFICATIONS.
Here is an example of an initial load with statistics gathering. The table and partitions are not stale.
create table test1(a number, b number) partition by list(a)
(
partition p1 values (1),
partition p2 values (2)
);
insert into test1 select 1, level from dual connect by level <= 50000;
begin
dbms_stats.gather_table_stats(user, 'test1');
dbms_stats.flush_database_monitoring_info;
end;
/
select table_name, partition_name, num_rows, last_analyzed, stale_stats
from user_tab_statistics
where table_name = 'TEST1'
order by 1, 2;
TABLE_NAME PARTITION_NAME NUM_ROWS LAST_ANALYZED STALE_STATS
---------- -------------- -------- ------------- -----------
TEST1 P1 50000 2014-01-22 NO
TEST1 P2 0 2014-01-22 NO
TEST1 50000 2014-01-22 NO
Now add a large number of rows and the statistics are stale.
begin
insert into test1 select 2, level from dual connect by level <= 25000;
commit;
dbms_stats.flush_database_monitoring_info;
end;
/
select table_name, partition_name, num_rows, last_analyzed, stale_stats
from user_tab_statistics
where table_name = 'TEST1'
order by 1, 2;
TABLE_NAME PARTITION_NAME NUM_ROWS LAST_ANALYZED STALE_STATS
---------- -------------- -------- ------------- -----------
TEST1 P1 50000 2014-01-22 NO
TEST1 P2 0 2014-01-22 YES
TEST1 50000 2014-01-22 YES
USER_TAB_MODIFICATIONS gives more specific information on table staleness.
--Stale statistics.
select user_tables.table_name, user_tab_modifications.partition_name
,inserts+updates+deletes modified_rows, num_rows, last_analyzed
,case when num_rows = 0 then null
else (inserts+updates+deletes) / num_rows * 100 end percent_modified
from user_tab_modifications
join user_tables
on user_tab_modifications.table_name = user_tables.table_name
where user_tables.table_name = 'TEST1';
TABLE_NAME PARTITION_NAME MODIFIED_ROWS NUM_ROWS LAST_ANALYZED PERCENT_MODIFIED
---------- -------------- ------------- -------- ------------- ----------------
TEST1 P2 25000 50000 2014-01-22 50
TEST1 25000 50000 2014-01-22 50
Yes, you can generate a list of partitioned tables, and a lot of related data which you would like to see, by using ALL_PART_TABLES or USER_PART_TABLES (provided you have access).
ALL_TAB_PARTITIONS can be used to get the number of rows per partition, along with other details.
Check other views Oracle has for gathering details about partitioned tables.
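For example (sketches only; adjust the owner filter to your schemas):
select owner, table_name, partitioning_type, partition_count
from all_part_tables
where owner in ('SCHEMA1','SCHEMA2','SCHEMA3');

select table_name, partition_name, num_rows, last_analyzed
from all_tab_partitions
where table_owner in ('SCHEMA1','SCHEMA2','SCHEMA3');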
I would suggest that you should analyze the tables, and possibly rebuild the indexes, every day after your data load. If your data load is affecting a lot of records in the table, and is going to affect the existing indexes, it's a good idea to proactively update the statistics for the table and index.
You can use one of the system views to get this information (check http://docs.oracle.com/cd/E18283_01/server.112/e16541/part_admin005.htm).
I had a somewhat similar problem, and I solved it by gathering stats on stale partitions only, using the new 11g INCREMENTAL option.
It's the reverse approach to your problem, but it might be worth investigating (specifically, how Oracle determines what a "stale" partition is).
dbms_stats.set_table_prefs('DWH','FACT_TABLE','INCREMENTAL','TRUE')
I always prefer the proactive approach - meaning, gather stats on the stale partitions as the last step of my ETL, rather than giving the developer stronger privileges.
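With the incremental preference set as above, a gather at the end of the ETL only needs to scan the changed partitions while still refreshing global statistics (a sketch, using the same schema/table names as the pref call):
begin
  dbms_stats.gather_table_stats('DWH', 'FACT_TABLE', granularity => 'AUTO');
end;
/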
I used to query the ALL_* views mentioned below.
The statistics and histogram details you mention are normally updated automatically by Oracle on a schedule. But when the database is busy with many loads, I have seen that these operations need to be triggered manually. We faced a similar situation, so we used to force the analyze operation after our load for critical tables. You need the required privileges on the id you use to load the table.
ANALYZE TABLE table_name PARTITION (partition_name) COMPUTE STATISTICS;
EDIT: ANALYZE no longer gathers CBO statistics, as mentioned here.
So, the DBMS_STATS package has to be used.
DBMS_STATS.GATHER_TABLE_STATS (
ownname VARCHAR2,
tabname VARCHAR2,
partname VARCHAR2 DEFAULT NULL,
estimate_percent NUMBER DEFAULT to_estimate_percent_type
(get_param('ESTIMATE_PERCENT')),
block_sample BOOLEAN DEFAULT FALSE,
method_opt VARCHAR2 DEFAULT get_param('METHOD_OPT'),
degree NUMBER DEFAULT to_degree_type(get_param('DEGREE')),
granularity VARCHAR2 DEFAULT GET_PARAM('GRANULARITY'),
cascade BOOLEAN DEFAULT to_cascade_type(get_param('CASCADE')),
stattab VARCHAR2 DEFAULT NULL,
statid VARCHAR2 DEFAULT NULL,
statown VARCHAR2 DEFAULT NULL,
no_invalidate BOOLEAN DEFAULT to_no_invalidate_type (
get_param('NO_INVALIDATE')),
force BOOLEAN DEFAULT FALSE);
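For example, to gather statistics for a single partition (the table and partition names here are hypothetical):
exec DBMS_STATS.GATHER_TABLE_STATS(ownname => 'SCHEMA1', tabname => 'MY_PART_TABLE', partname => 'P_2014_01');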
And until the analyze is complete, the views below may not produce accurate results (especially the last_analyzed and num_rows columns).
Note: Try replacing all_ with dba_ in the view names; if you have access to them, you can use those instead.
You can also try to get SELECT_CATALOG_ROLE for the development id you use, so that you can SELECT from the data dictionary views; this reduces the dependency on the DBA for such queries. (Still, the DBAs are the right people for a few issues!)
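The grant itself has to be done by the DBA, for example (the user name is a placeholder):
GRANT SELECT_CATALOG_ROLE TO your_dev_user;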
Query to identify the partitioned table, partition name, number of rows and last analyzed date:
select
    all_part.owner as schema_name,
    all_part.table_name,
    NVL(all_tab.partition_name,'N/A'),
    all_tab.num_rows,
    all_tab.last_analyzed
from
    all_part_tables all_part,
    all_tab_partitions all_tab
where all_part.table_name = all_tab.table_name and
      all_part.owner = all_tab.table_owner and
      all_part.owner in ('SCHEMA1','SCHEMA2','SCHEMA3')
order by all_part.table_name, all_tab.partition_name;
The query below returns the index/table names that are UNUSABLE:
SELECT INDEX_NAME,
TABLE_NAME,
STATUS
FROM ALL_INDEXES
WHERE status NOT IN ('VALID','N/A');
The query below returns the index/table (partition) names that are UNUSABLE:
SELECT INDEX_NAME,
PARTITION_NAME,
STATUS ,
GLOBAL_STATS
FROM ALL_IND_PARTITIONS
WHERE status != 'USABLE';
I have an application that uses the Oracle MERGE INTO... DML statement to update table A to correspond with some of the changes in another table B (table A is a summary of selected parts of table B along with some other info). In a typical merge operation, 5-6 rows (out of 10's of thousands) might be inserted in table B and 2-3 rows updated.
It turns out that the application is to be deployed in an environment that has a security policy on the target tables. The MERGE INTO... statement can't be used with these tables (ORA-28132: Merge into syntax does not support security policies)
So we have to change the MERGE INTO... logic to use regular inserts and updates instead. Is this a problem anyone else has run into? Is there a best-practice pattern for converting the WHEN MATCHED/WHEN NOT MATCHED logic in the merge statement into INSERT and UPDATE statements? The merge is within a stored procedure, so it's fine for the solution to use PL/SQL in addition to the DML if that is required.
Another way to do this (other than MERGE) would be to use two SQL statements, one for the insert and one for the update. The "WHEN MATCHED" and "WHEN NOT MATCHED" logic can be handled using joins or an "IN" clause.
If you decide to take the approach below, it is better to run the update first (since it only runs for the matching records) and then insert the non-matching records. The data sets would be the same either way; it just updates fewer records in that order.
Also, similar to the MERGE, this update statement updates the Name column even if the names in Source and Target already match. If you don't want that, add that condition to the WHERE clause as well.
create table src_table(
id number primary key,
name varchar2(20) not null
);
create table tgt_table(
id number primary key,
name varchar2(20) not null
);
insert into src_table values (1, 'abc');
insert into src_table values (2, 'def');
insert into src_table values (3, 'ghi');
insert into tgt_table values (1, 'abc');
insert into tgt_table values (2,'xyz');
SQL> select * from Src_Table;
ID NAME
---------- --------------------
1 abc
2 def
3 ghi
SQL> select * from Tgt_Table;
ID NAME
---------- --------------------
2 xyz
1 abc
Update tgt_Table tgt
set Tgt.Name =
       (select Src.Name
        from Src_Table Src
        where Src.id = Tgt.id)
where exists            -- restrict to matching rows so non-matching target rows are not set to NULL
       (select 1
        from Src_Table Src
        where Src.id = Tgt.id);
2 rows updated. --Notice that ID 1 is updated even though value did not change
select * from Tgt_Table;
ID NAME
----- --------------------
2 def
1 abc
insert into tgt_Table
select src.*
from Src_Table src,
tgt_Table tgt
where src.id = tgt.id(+)
and tgt.id is null;
1 row created.
SQL> select * from tgt_Table;
ID NAME
---------- --------------------
2 def
1 abc
3 ghi
commit;
There could be better ways to do this, but this seems simple and SQL-oriented. If the data set is large, a PL/SQL solution won't be as performant.
There are at least two options I can think of aside from digging into the security policy, which I don't know much about.
Process the records to merge row by row. Attempt the update; if it fails to update, then insert (or vice versa, depending on whether you expect most records to need updating or inserting, i.e. optimize for the most common case to reduce the number of SQL statements fired), e.g.:
begin
  for rec in (select ... from source_table) loop
    update table_to_be_merged ...;
    if sql%rowcount = 0 then -- no row matched, so need to insert
      insert ...;
    end if;
  end loop;
end;
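A concrete version of that loop, reusing the SRC_TABLE/TGT_TABLE from the answer above (an illustration only, not the poster's original code):
begin
  for rec in (select id, name from src_table) loop
    update tgt_table set name = rec.name where id = rec.id;
    if sql%rowcount = 0 then -- no row matched, so insert instead
      insert into tgt_table (id, name) values (rec.id, rec.name);
    end if;
  end loop;
end;
/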
Another option may be to bulk collect the records you want to merge into a collection, and then attempt to bulk insert them, catching all the primary key exceptions (I cannot recall the syntax for this right now, but you can get a bulk insert to place all the rows that fail to insert into another array and then process them).
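The syntax being referred to is FORALL ... SAVE EXCEPTIONS; a minimal sketch against the same SRC_TABLE/TGT_TABLE (again, just an illustration):
declare
  type t_src_tab is table of src_table%rowtype;
  l_src      t_src_tab;
  dml_errors exception;
  pragma exception_init(dml_errors, -24381); -- raised when SAVE EXCEPTIONS has collected errors
begin
  select * bulk collect into l_src from src_table;

  forall i in 1 .. l_src.count save exceptions
    insert into tgt_table values l_src(i);
exception
  when dml_errors then
    -- each failed insert (e.g. ORA-00001 on the primary key) is listed here;
    -- update the corresponding target row instead
    for i in 1 .. sql%bulk_exceptions.count loop
      update tgt_table t
      set    t.name = l_src(sql%bulk_exceptions(i).error_index).name
      where  t.id   = l_src(sql%bulk_exceptions(i).error_index).id;
    end loop;
end;
/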
Logically a merge statement has to check for the presence of each record behind the scenes anyway, and I think it is processed quite similarly to the code I posted above. However, merge will always be more efficient than coding it in PL/SQL, as it is only one SQL call instead of many.