Identify partitions having stale statistics in a list of schemas - performance

I have 5 development schemas, and each of them has partitioned tables. We also have scripts that dynamically create partitioned tables (monthly/yearly), and we have to go to the DBA every time to gather details about them. Our real problem is a partitioned table with 9 partitions. Every day it receives a delta load (updates/deletes via PL/SQL) as well as an APPEND load via SQL*Loader, and this happens while the database is at peak load. We have performance issues with SELECT queries on this table.
When we report this to the DBA, they say the table statistics are stale, and after they gather stats, the query magically runs faster. I searched around and found some information about dynamic performance views.
So now I have the following questions.
1) Can a developer generate a list of all partitioned tables, with partition names and the number of records in each, without going to the DBA?
2) Can we identify the last-analyzed date of every partition?
3) Can we also see the status of each partition's index, i.e. whether it is usable or unusable?

Normally there is no need to identify objects that need statistics gathered. Oracle automatically gathers statistics for stale objects, unless the task has been
manually disabled. This is usually good enough for OLTP systems. Use this query to find the status of the task:
select status
from dba_autotask_client
where client_name = 'auto optimizer stats collection';
STATUS
------
ENABLED
For data warehouse systems there is also not much need to query the data dictionary for stale stats. In a data warehouse, statistics need to be considered after almost every operation. Developers need to get into the habit of always thinking about statistics after a truncate, insert, swap, etc. Eventually they will "just know" when to gather statistics.
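For example, a nightly ETL job might simply end by gathering the loaded table's statistics. A minimal sketch; the table name is a placeholder:
begin
    dbms_stats.gather_table_stats(ownname => user, tabname => 'SALES_FACT');  -- placeholder table name
end;
/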
But if you still want to see how Oracle determines if statistics are stale, look at DBA_TAB_STATISTICS and DBA_TAB_MODIFICATIONS.
Here is an example of an initial load with statistics gathering. The table and partitions are not stale.
create table test1(a number, b number) partition by list(a)
(
partition p1 values (1),
partition p2 values (2)
);
insert into test1 select 1, level from dual connect by level <= 50000;
begin
dbms_stats.gather_table_stats(user, 'test1');
dbms_stats.flush_database_monitoring_info;
end;
/
select table_name, partition_name, num_rows, last_analyzed, stale_stats
from user_tab_statistics
where table_name = 'TEST1'
order by 1, 2;
TABLE_NAME PARTITION_NAME NUM_ROWS LAST_ANALYZED STALE_STATS
---------- -------------- -------- ------------- -----------
TEST1 P1 50000 2014-01-22 NO
TEST1 P2 0 2014-01-22 NO
TEST1 50000 2014-01-22 NO
Now add a large number of rows and the statistics are stale.
begin
insert into test1 select 2, level from dual connect by level <= 25000;
commit;
dbms_stats.flush_database_monitoring_info;
end;
/
select table_name, partition_name, num_rows, last_analyzed, stale_stats
from user_tab_statistics
where table_name = 'TEST1'
order by 1, 2;
TABLE_NAME PARTITION_NAME NUM_ROWS LAST_ANALYZED STALE_STATS
---------- -------------- -------- ------------- -----------
TEST1 P1 50000 2014-01-22 NO
TEST1 P2 0 2014-01-22 YES
TEST1 50000 2014-01-22 YES
USER_TAB_MODIFICATIONS gives more specific information on table staleness.
--Stale statistics.
select user_tables.table_name, user_tab_modifications.partition_name
,inserts+updates+deletes modified_rows, num_rows, last_analyzed
,case when num_rows = 0 then null
else (inserts+updates+deletes) / num_rows * 100 end percent_modified
from user_tab_modifications
join user_tables
on user_tab_modifications.table_name = user_tables.table_name
where user_tables.table_name = 'TEST1';
TABLE_NAME PARTITION_NAME MODIFIED_ROWS NUM_ROWS LAST_ANALYZED PERCENT_MODIFIED
---------- -------------- ------------- -------- ------------- ----------------
TEST1 P2 25000 50000 2014-01-22 50
TEST1 25000 50000 2014-01-22 50

Yes, you can generate a list of partitioned tables, along with much of the related data you would like to see, by using ALL_PART_TABLES or USER_PART_TABLES (provided you have access).
ALL_TAB_PARTITIONS can be used to get the number of rows per partition, along with other details.
Check the other views Oracle provides for gathering details about partitioned tables.
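As a minimal sketch (the schema name is a placeholder), ALL_TAB_PARTITIONS can be queried like this:
select table_owner, table_name, partition_name, num_rows, last_analyzed
from all_tab_partitions
where table_owner = 'MY_SCHEMA'   -- placeholder schema name
order by table_name, partition_position;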
I would suggest that you analyze the tables, and possibly rebuild the indexes, every day after your data load. If your data load affects a lot of records in the table and touches the existing indexes, it's a good idea to proactively update the statistics for the table and indexes.
You can use the system views to get this information (see http://docs.oracle.com/cd/E18283_01/server.112/e16541/part_admin005.htm).
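A minimal sketch of such a daily post-load step, assuming a monthly range-partitioned table (all object names are placeholders):
begin
    dbms_stats.gather_table_stats(
        ownname     => user,
        tabname     => 'MY_PART_TABLE',   -- placeholder
        partname    => 'P_2014_01',       -- placeholder: the partition just loaded
        granularity => 'PARTITION',
        cascade     => true);             -- also gather stats on the indexes
end;
/
-- rebuild a local index partition if the load left it unusable
alter index my_local_index rebuild partition p_2014_01;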

I had a somewhat similar problem, and I solved it by gathering stats on stale partitions only, using the new 11g INCREMENTAL option.
It's the reverse approach to your problem, but it might be worth investigating (specifically, how Oracle determines what a "stale" partition is).
dbms_stats.set_table_prefs('DWH','FACT_TABLE','INCREMENTAL','TRUE');
I always prefer the proactive approach: gather stats on stale partitions at the last step of my ETL, rather than giving the developer stronger privileges.
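A minimal sketch of how that ETL step might look, using the DWH.FACT_TABLE names from above (the preference only needs to be set once):
begin
    -- with INCREMENTAL=TRUE and AUTO granularity, Oracle gathers stats only on
    -- changed partitions and derives the global stats from partition synopses
    dbms_stats.set_table_prefs('DWH', 'FACT_TABLE', 'INCREMENTAL', 'TRUE');
    dbms_stats.gather_table_stats('DWH', 'FACT_TABLE', granularity => 'AUTO');
end;
/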

I used to query the ALL_ views mentioned below.
The statistics and histogram details you mention are updated automatically by Oracle on a schedule. But when the database is busy with many loads, I have seen that these operations need to be triggered manually. We faced a similar situation, so we forced the analyze operation after our loads for critical tables. The ID you use to load the table needs the appropriate privilege.
ANALYZE TABLE table_name PARTITION (partition_name) COMPUTE STATISTICS;
EDIT: ANALYZE no longer gathers CBO statistics, as mentioned here.
So the DBMS_STATS package has to be used:
DBMS_STATS.GATHER_TABLE_STATS (
ownname VARCHAR2,
tabname VARCHAR2,
partname VARCHAR2 DEFAULT NULL,
estimate_percent NUMBER DEFAULT to_estimate_percent_type
(get_param('ESTIMATE_PERCENT')),
block_sample BOOLEAN DEFAULT FALSE,
method_opt VARCHAR2 DEFAULT get_param('METHOD_OPT'),
degree NUMBER DEFAULT to_degree_type(get_param('DEGREE')),
granularity VARCHAR2 DEFAULT GET_PARAM('GRANULARITY'),
cascade BOOLEAN DEFAULT to_cascade_type(get_param('CASCADE')),
stattab VARCHAR2 DEFAULT NULL,
statid VARCHAR2 DEFAULT NULL,
statown VARCHAR2 DEFAULT NULL,
no_invalidate BOOLEAN DEFAULT to_no_invalidate_type (
get_param('NO_INVALIDATE')),
force BOOLEAN DEFAULT FALSE);
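For example, a typical call against a single partition might look like this (all object names are placeholders):
begin
    dbms_stats.gather_table_stats(
        ownname  => 'SCHEMA1',          -- placeholder
        tabname  => 'MY_PART_TABLE',    -- placeholder
        partname => 'P_2014_01',        -- placeholder
        estimate_percent => dbms_stats.auto_sample_size,
        cascade  => true);              -- also gather index statistics
end;
/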
Until the analyze is complete, the views below may not produce accurate results (especially the LAST_ANALYZED and NUM_ROWS columns).
Note: try replacing the all_ prefix with dba_ in the view names; if you have access to those views, you can use them instead.
You can also try to get the SELECT_CATALOG_ROLE for the development ID you use, so that you can SELECT from the data dictionary views; this reduces the dependency on the DBA for such queries. (The DBA is still the right person for a few issues!)
Query to identify the partitioned table, partition name, number of rows, and last-analyzed date:
select
    all_part.owner as schema_name,
    all_part.table_name,
    nvl(all_tab.partition_name, 'N/A') as partition_name,
    all_tab.num_rows,
    all_tab.last_analyzed
from all_part_tables all_part
join all_tab_partitions all_tab
    on  all_part.table_name = all_tab.table_name
    and all_part.owner      = all_tab.table_owner
where all_part.owner in ('SCHEMA1','SCHEMA2','SCHEMA3')
order by all_part.table_name, all_tab.partition_name;
The query below returns the indexes (and their tables) whose status is UNUSABLE:
SELECT INDEX_NAME,
TABLE_NAME,
STATUS
FROM ALL_INDEXES
WHERE status NOT IN ('VALID','N/A');
The query below returns the index partitions that are UNUSABLE:
SELECT INDEX_NAME,
PARTITION_NAME,
STATUS ,
GLOBAL_STATS
FROM ALL_IND_PARTITIONS
WHERE status != 'USABLE';
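Once an unusable index (or index partition) shows up, it can be rebuilt. A minimal sketch, with placeholder index and partition names:
alter index my_global_index rebuild;                      -- global or non-partitioned index
alter index my_local_index rebuild partition p_2014_01;   -- one local index partition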

Related

Delete logic is taking a very long time to process in Oracle

I am trying to use the following statement for the delete process, and it has to delete around 23566424 rows, but Oracle takes almost 3 hours to complete the process. We have already created an index on "SCHEDULE_DATE_KEY", but the process is still very slow. Can someone advise on how to make deletes faster in Oracle?
DELETE FROM EDWSOURCE.SCHEDULE_DAY_F
WHERE SCHEDULE_DATE_KEY >
      (SELECT LAST_PAYROLL_DATE_KEY
       FROM EDWSOURCE.LAST_PAYROLL_DATE
       WHERE CURRENT_FLAG = 'Y');
I don't think any index will help here; Oracle will probably decide the best approach is a full table scan to delete 20M rows from 300M. It is deleting at a rate of over 2000 rows per second, which isn't bad. In fact, any additional indexes will slow it down, since the row entry has to be deleted from each index as well.
A quicker approach could be to create a new table of the rows you want to keep, something like:
create table EDWSOURCE.SCHEDULE_DAY_F_KEEP
as
select * from EDWSOURCE.SCHEDULE_DAY_F
where SCHEDULE_DATE_KEY <=
(
SELECT
LAST_PAYROLL_DATE_KEY
FROM
EDWSOURCE.LAST_PAYROLL_DATE
WHERE
CURRENT_FLAG = 'Y'
);
Then recreate any constraints and indexes to use the new table.
Finally drop the old table and rename the new one.
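The final swap might look like this (a sketch; it assumes no foreign keys reference the old table and that any grants are re-applied afterwards):
drop table EDWSOURCE.SCHEDULE_DAY_F;
alter table EDWSOURCE.SCHEDULE_DAY_F_KEEP rename to SCHEDULE_DAY_F;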
You can try testing a filtered table move. This has an ONLINE clause, so you can do it while the application is still running.
Note: in 12.2 and later the indexes will remain valid. In earlier versions you will need to rebuild the indexes, as they will become unusable. Good luck.
Move a Table
Create and populate a new test table.
DROP TABLE t1 PURGE;
CREATE TABLE t1 AS
SELECT level AS id,
'Description for ' || level AS description
FROM dual
CONNECT BY level <= 100;
COMMIT;
Check the contents of the table.
SELECT COUNT(*) AS total_rows,
MIN(id) AS min_id,
MAX(id) AS max_id
FROM t1;
TOTAL_ROWS MIN_ID MAX_ID
---------- ---------- ----------
100 1 100
SQL>
Move the table, filtering out rows with an ID value greater than 50.
ALTER TABLE t1 MOVE ONLINE
INCLUDING ROWS WHERE id <= 50;
Check the contents of the table.
SELECT COUNT(*) AS total_rows,
MIN(id) AS min_id,
MAX(id) AS max_id
FROM t1;
TOTAL_ROWS MIN_ID MAX_ID
---------- ---------- ----------
50 1 50
SQL>
The rows with an ID value between 51 and 100 have been removed.
As mentioned above, it may be best to partition the table and drop a partition every N days as part of a daily task.
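A sketch of what that daily task might run, assuming the table is range-partitioned on SCHEDULE_DATE_KEY (the partition name is a placeholder):
alter table EDWSOURCE.SCHEDULE_DAY_F drop partition p_20200101 update indexes;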

Query to check the full schema scan for tables in Oracle DB

Hi, I have a requirement to scan through a schema and identify tables which are redundant (candidates for dropping). So I did a select on DBA_DEPENDENCIES to check whether the tables are being used in any DB object types (procedure, package body, view, materialized view, ...). I was able to find some tables and excluded them. Since I also need to capture the total counts and when each table was last loaded/used: is there an automated way to select only the remaining tables (those not found in the dependencies list) and capture their counts, and also when they were used/loaded?
Difficulty: there are 500+ tables.
I have used the queries below.
Query 1
select table_name,
to_number(extractvalue(xmltype(dbms_xmlgen.getxml('select count(*) c from '||owner||'.'||table_name)),'/ROWSET/ROW/C')) as count
from all_tables
where owner = 'SCHEMA_NAME'
Query 2
select owner, table_name, num_rows, sample_size, last_analyzed from all_tables;
Query 1 Result
Filter Table_name=CUST_ORDER
OWNER  TABLE_NAME  COUNT  SAMPLE_SIZE  LAST_ANALYZED
ABCD   CUST_ORDER  1083   1023         01.01.2020
Query 2 Result
Filter Table_name=CUST_ORDER
OWNER  TABLE_NAME  NUM_ROWS  SAMPLE_SIZE  LAST_ANALYZED
ABCD   CUST_ORDER  1023      1023         01.01.2020
Question
Query 1 and Query 2 results do not match, even though the same table and filter are applied in both queries. Why don't the results match? When I randomly checked other filters the results did match. Does anyone know the reason?
Upon further testing I encountered an error. What does this error signify? Permissions?
ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
KUP-04040: file **-**.csv in ****_***_***_***** not found
29913. 00000 - "error in executing %s callout"
*Cause: The execution of the specified callout caused an error.
*Action: Examine the error messages take appropriate action.
The number you see in ALL_TABLES is a point-in-time capture of the number of rows. It is only updated when statistics are gathered for that table.
Here is an example:
CREATE TABLE t1 AS
SELECT *
FROM all_objects;
SELECT t.num_rows
FROM all_tables t
WHERE t.table_name = 'T1';
-- 78570
SELECT COUNT(*)
FROM t1;
-- 78570
The stats and the physical number of rows match!
INSERT INTO t1
SELECT *
FROM all_objects ao
WHERE rownum <= 5;
-- 5 rows inserted
SELECT t.num_rows
FROM all_tables t
WHERE t.table_name = 'T1';
-- 78570
SELECT COUNT(*)
FROM t1;
-- 78575
Here we have the mismatch because rows were inserted (or maybe even deleted), but the stats for the table have not been updated. Let's update them:
BEGIN
dbms_stats.gather_table_stats(ownname => 'SCHEMA',
tabname => 'T1');
END;
/
SELECT t.num_rows
FROM all_tables t
WHERE t.table_name = 'T1';
-- 78575
Now you can see the numbers match. Using the value from ALL_TABLES may be good enough for your research (and will certainly be faster than counting the rows in every table).
Query 1 returns the actual data of the table and hence is accurate; one can rely on this query's output.
Query 2 does not return actual data. It returns the figures captured when the table was last analyzed, and one should not depend on it for finding the number of records in the table.
If you gather stats on the table and then execute query 2, you will get the same figure as query 1.
If no records are inserted or deleted after the stats are gathered, then the query 1 and query 2 figures will match for that table.

WITH Clause performance issue in Oracle 11g

Table myfirst3 has 4 columns and 1.2 million records.
Table mtl_object_genealogy has over 10 million records.
Running the code below takes a very long time. How can I tune this code using the WITH clause?
WITH level1 as (
SELECT mln_parent.lot_number,
mln_parent.inventory_item_id,
gen.lot_num ,--fg_lot,
gen.segment1,
gen.rcv_date
FROM mtl_lot_numbers mln_parent,
(SELECT MOG1.parent_object_id,
p.segment1,
p.lot_num,
p.rcv_date
FROM mtl_object_genealogy MOG1 ,
myfirst3 p
START WITH MOG1.object_id = p.gen_object_id
AND (MOG1.end_date_active IS NULL OR MOG1.end_date_active > SYSDATE)
CONNECT BY nocycle PRIOR MOG1.parent_object_id = MOG1.object_id
AND (MOG1.end_date_active IS NULL OR MOG1.end_date_active > SYSDATE)
UNION all
SELECT p1.gen_object_id,
p1.segment1,
p1.lot_num,
p1.rcv_date
FROM myfirst3 p1 ) gen
WHERE mln_parent.gen_object_id = gen.parent_object_id )
select /*+ NO_CPU_COSTING */ *
from level1;
(execution plan omitted)
The table definitions:
CREATE TABLE APPS.MYFIRST3
(
TO_ORGANIZATION_ID NUMBER,
LOT_NUM VARCHAR2(80 BYTE),
ITEM_ID NUMBER,
FROM_ORGANIZATION_ID NUMBER,
GEN_OBJECT_ID NUMBER,
SEGMENT1 VARCHAR2(40 BYTE),
RCV_DATE DATE
);
CREATE TABLE INV.MTL_OBJECT_GENEALOGY
(
OBJECT_ID NUMBER NOT NULL,
OBJECT_TYPE NUMBER NOT NULL,
PARENT_OBJECT_ID NUMBER NOT NULL,
START_DATE_ACTIVE DATE NOT NULL,
END_DATE_ACTIVE DATE,
GENEALOGY_ORIGIN NUMBER,
ORIGIN_TXN_ID NUMBER,
GENEALOGY_TYPE NUMBER
);
CREATE INDEX INV.MTL_OBJECT_GENEALOGY_N1 ON INV.MTL_OBJECT_GENEALOGY(OBJECT_ID);
CREATE INDEX INV.MTL_OBJECT_GENEALOGY_N2 ON INV.MTL_OBJECT_GENEALOGY(PARENT_OBJECT_ID);
Your explain plan shows some very big numbers. The optimizer reckons the final result set will be about 3,227,000,000,000 rows. Just returning that many rows will take some time.
All table accesses are full table scans. As you have big tables, that will eat time too.
As for improvements, it's pretty hard for us to understand the logic of your query. This is your data model, your business rules, your data. You haven't explained anything, so all we can do is guess.
Why are you using the WITH clause? You only use the level1 result set once, so just have a regular FROM clause.
Why are you using UNION ALL? That operation just duplicates the records retrieved from myfirst3 (all those values are already included as rows where MOG1.object_id = p.gen_object_id).
The MERGE JOIN CARTESIAN operation is interesting. Oracle uses it to implement the transitive closure. It is an expensive operation, but that's because tree-walking a hierarchy is an expensive thing to do. It is unfortunate for you that you are generating all the parent-child relationships for a table with 27 million records. That's bad.
The full table scans aren't the problem. There are no filters on myfirst3, so obviously the database has to read all its records. If there were one parent for each myfirst3 record, that would be 10% of the contents of mtl_object_genealogy, so a full table scan would be efficient; but you're rolling up the entire hierarchy, so it's like you're looking at a much greater chunk of the table.
Your indexes are irrelevant in the face of such numbers. What might help is a composite index on mtl_object_genealogy(OBJECT_ID, PARENT_OBJECT_ID, END_DATE_ACTIVE).
You want all the levels of PARENT_OBJECT_ID for the records in myfirst3. If you run this query often and mtl_object_genealogy is a slowly changing table you should consider materializing the transitive closure into a table which just has records for all the permutations of leaf records and parents.
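A sketch of both suggestions; the composite index is straightforward, and the closure table name is a placeholder whose CONNECT BY logic mirrors the original query:
create index inv.mtl_object_genealogy_n3
    on inv.mtl_object_genealogy(object_id, parent_object_id, end_date_active);

-- materialize the transitive closure for the leaf records in myfirst3,
-- to be refreshed as often as mtl_object_genealogy changes
create table mog_closure as
select connect_by_root mog.object_id as leaf_object_id,
       mog.parent_object_id
from   inv.mtl_object_genealogy mog
start with mog.object_id in (select gen_object_id from apps.myfirst3)
connect by nocycle prior mog.parent_object_id = mog.object_id;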
To sum up:
Ditch the WITH clause
Drop the UNION ALL
Tune the tree-walk with a composite index (or materialize it)

Why is Oracle using full table scan when it should use an index?

I'm doing some experimentation with query plans in Oracle, and I have the following table:
--create a table to use
create table SKEWED_DATA(
EMP_ID int,
DEPT int,
COL2 int,
CONSTRAINT SKEWED_DATA_PK PRIMARY KEY (EMP_ID)
);
--add an index on dept
create index SKEWED_DATA_INDEX1 on SKEWED_DATA(DEPT);
I then insert 1 million rows of data where 999,999 rows have dept id 1, and 1 row has dept id 99.
Before calculating statistics on the table, Oracle Autotrace shows that when running the following queries, it is using an index scan for both:
select AVG(COL2) from SKEWED_DATA D where DEPT = 1;
select AVG(COL2) from SKEWED_DATA D where DEPT = 99;
It's my understanding that it would be more efficient in this case to use a full table scan for dept id 1, and an index scan for dept id 99.
I then run the following command to generate statistics for the table:
execute DBMS_STATS.GATHER_TABLE_STATS ('HARRY','SKEWED_DATA');
And querying the dba_tab_statistics and user_tab_col_statistics confirms that stats and histograms have been gathered.
Running an autotrace on the following queries now shows full table scan for both!
select AVG(COL2) from SKEWED_DATA D where DEPT = 1;
select AVG(COL2) from SKEWED_DATA D where DEPT = 99;
My question is: why is Oracle using a full table scan for dept id 99 when there is only 1 row with this value?
UPDATE
I tried running the query for dept 99 with a hint to force Oracle to use the index, and whilst Autotrace believes it to be less efficient, it takes 0.001 seconds, compared to 0.03 seconds when using the full table scan, thus proving (I think?) my theory that Oracle should be using the index in this instance.
select /*+ INDEX(D SKEWED_DATA_INDEX1) */ AVG(COL2) from SKEWED_DATA D where DEPT = 99;
OK, I think I might have solved it. When I had 999,999 rows with dept 1 and 1 row with dept 99, I inspected the number of histogram buckets by running the following query:
select COLUMN_NAME, HISTOGRAM, NUM_BUCKETS, NUM_DISTINCT from USER_TAB_COL_STATISTICS where TABLE_NAME = 'SKEWED_DATA';
This showed that there are 2 distinct values but only 1 bucket. If I change the stats gathering to this:
execute DBMS_STATS.GATHER_TABLE_STATS('HARRY','SKEWED_DATA',estimate_percent=>100);
It then correctly comes up with 2 buckets, and autotrace shows the 'correct' execution plans. So I guess it's because of the extreme skewness of my data that Oracle cannot generate the correct stats for it unless estimate_percent is massive.
Interestingly, with slightly less skewed data (say about 2-3% of all records having a dept id of 99), Oracle does treat it correctly even when I leave estimate_percent at the default.
So the moral of the story seems to be: if you have ridiculously skewed data like this and Oracle is not using the correct execution plan, try playing around with the estimate_percent parameter.
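An alternative worth testing (an assumption on my part, not something verified above) is to leave estimate_percent alone and instead request a histogram on the skewed column explicitly via method_opt:
execute DBMS_STATS.GATHER_TABLE_STATS('HARRY','SKEWED_DATA',method_opt=>'FOR COLUMNS DEPT SIZE 254');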

Issue with creating index organized table

I'm having a weird problem with an index-organized table. I'm running Oracle 11g Standard.
I have a table src_table:
SQL> desc src_table;
Name Null? Type
--------------- -------- ----------------------------
ID NOT NULL NUMBER(16)
HASH NOT NULL NUMBER(3)
........
SQL> select count(*) from src_table;
COUNT(*)
----------
21108244
Now let's create another table and copy 2 columns from src_table:
set timing on
SQL> create table dest_table(id number(16), hash number(20), type number(1));
Table created.
Elapsed: 00:00:00.01
SQL> insert /*+ APPEND */ into dest_table (id,hash,type) select id, hash, 1 from src_table;
21108244 rows created.
Elapsed: 00:00:15.25
SQL> ALTER TABLE dest_table ADD ( CONSTRAINT dest_table_pk PRIMARY KEY (HASH, id, TYPE));
Table altered.
Elapsed: 00:01:17.35
It took Oracle < 2 min.
Now the same exercise, but with an IOT:
SQL> CREATE TABLE dest_table_iot (
id NUMBER(16) NOT NULL,
hash NUMBER(20) NOT NULL,
type NUMBER(1) NOT NULL,
CONSTRAINT dest_table_iot_PK PRIMARY KEY (HASH, id, TYPE)
) ORGANIZATION INDEX;
Table created.
Elapsed: 00:00:00.03
SQL> INSERT /*+ APPEND */ INTO dest_table_iot (HASH,id,TYPE)
SELECT HASH, id, 1
FROM src_table;
"insert" into IOT takes 18 hours !!! I have tried it on 2 different instances of Oracle running on win and linux and got same results.
What is going on here ? Why is it taking so long ?
The APPEND hint is only useful for a heap-organized table.
When you insert into an IOT, I suspect that each row has to be inserted into the real index structure separately, causing a lot of re-balancing of the index.
When you build the index on a heap table, a temp segment is used and I'm guessing that this allows it to reduce the re-balancing overhead that would otherwise take place.
I suspect that if you created an empty, heap-organized table with the primary key, and did the same insert without the APPEND hint, it would take more like the 18 hours.
You might try putting an ORDER BY on your SELECT and see how that affects the performance of the insert into the IOT. It's not guaranteed to be an improvement by any means, but it might be.
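A sketch of that suggestion: sort the source rows to match the IOT's primary key order, so consecutive rows land in adjacent index blocks (the APPEND hint is dropped, since it doesn't help here):
INSERT INTO dest_table_iot (hash, id, type)
SELECT hash, id, 1
FROM src_table
ORDER BY hash, id;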
