Query to check the full schema scan for tables in Oracle DB

Hi, I have a requirement to scan through a schema and identify the tables that are redundant (candidates for dropping). I queried DBA_DEPENDENCIES to check whether the tables are used by any database object types (procedures, package bodies, views, materialized views, ...). I was able to find some tables and excluded them. Since I also need to capture the total row counts and when each table was last loaded/used, is there an automated way to select only the tables not found in the dependencies list and capture their counts and when they were last used/loaded?
Difficulty: there are a lot of tables (500+).
I have used the queries below.
Query 1:
select table_name,
       to_number(extractvalue(xmltype(dbms_xmlgen.getxml(
         'select count(*) c from ' || owner || '.' || table_name)),
         '/ROWSET/ROW/C')) as count
from all_tables
where owner = 'SCHEMA_NAME';
Query 2:
select owner, table_name, num_rows, sample_size, last_analyzed from all_tables;
Query 1 result (filtered on TABLE_NAME = 'CUST_ORDER'):
OWNER TABLE_NAME COUNT SAMPLE_SIZE LAST_ANALYZED
----- ---------- ----- ----------- -------------
ABCD  CUST_ORDER  1083        1023 01.01.2020
Query 2 result (filtered on TABLE_NAME = 'CUST_ORDER'):
OWNER TABLE_NAME NUM_ROWS SAMPLE_SIZE LAST_ANALYZED
----- ---------- -------- ----------- -------------
ABCD  CUST_ORDER     1023        1023 01.01.2020
Question
Query 1 and Query 2 return different results for the same table, even though the same filter is applied in both queries. Why don't the results match? When I randomly checked other tables, they do match. Does anyone know the reason?
On further testing I encountered the following error. What does it signify? Is it a permissions issue?
ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
KUP-04040: file **-**.csv in ****_***_***_***** not found
29913. 00000 - "error in executing %s callout"
*Cause: The execution of the specified callout caused an error.
*Action: Examine the error messages take appropriate action.

The number you see in all_tables (NUM_ROWS) is a point-in-time capture of the number of rows. It is only updated when the statistics for that table are regathered.
Here is an example:
CREATE TABLE t1 AS
SELECT *
FROM all_objects;
SELECT t.num_rows
FROM all_tables t
WHERE t.table_name = 'T1';
-- 78570
SELECT COUNT(*)
FROM t1;
-- 78570
The stats and the physical number of rows match!
INSERT INTO t1
SELECT *
FROM all_objects ao
WHERE rownum <= 5;
-- 5 rows inserted
SELECT t.num_rows
FROM all_tables t
WHERE t.table_name = 'T1';
-- 78570
SELECT COUNT(*)
FROM t1;
-- 78575
Here we have a mismatch because rows were inserted (or maybe even deleted), but the stats for the table have not been updated. Let's update them:
BEGIN
dbms_stats.gather_table_stats(ownname => 'SCHEMA',
tabname => 'T1');
END;
/
SELECT t.num_rows
FROM all_tables t
WHERE t.table_name = 'T1';
-- 78575
Now you can see the rows match. Using the value from all_tables may be good enough for your research (and will certainly be faster to query than counting every table).

Query 1 counts the actual rows in the table, so its result is accurate and can be relied on.
Query 2 does not read the table itself. NUM_ROWS is the value captured when the table was last analyzed, so you should not depend on it for the current number of records in the table.
If you gather the stats on this table and then run Query 2, you will get the same number as Query 1.
As long as no rows are inserted into or deleted from the table after the stats are gathered, Query 1 and Query 2 will match for that table.
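The same point-in-time behaviour can be reproduced outside Oracle. The sketch below is an analogy only: it uses SQLite through Python's built-in `sqlite3` module, where `ANALYZE` stores a row-count snapshot in the `sqlite_stat1` table, playing the role NUM_ROWS plays in ALL_TABLES. The table `t1` and the helper `snapshot_and_count` are made up for the illustration.

```python
import sqlite3


def snapshot_and_count(conn: sqlite3.Connection, table: str) -> tuple:
    """Return (row count stored by the last ANALYZE, exact COUNT(*)) for a table."""
    # For a table with no indexes, ANALYZE writes one sqlite_stat1 row with idx NULL;
    # its stat column starts with the row count seen at that moment.
    row = conn.execute(
        "SELECT stat FROM sqlite_stat1 WHERE tbl = ? AND idx IS NULL", (table,)
    ).fetchone()
    snapshot = int(row[0].split()[0]) if row else None
    # The table name is interpolated directly, so it must come from trusted code.
    exact = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    return snapshot, exact


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER, description TEXT)")  # no indexes on purpose
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?)",
    [(i, f"Description for {i}") for i in range(1, 101)],
)
conn.execute("ANALYZE")                # like gathering stats: the snapshot is taken now
print(snapshot_and_count(conn, "t1"))  # snapshot and exact count agree

conn.executemany(
    "INSERT INTO t1 VALUES (?, ?)",
    [(i, f"Description for {i}") for i in range(101, 106)],
)
print(snapshot_and_count(conn, "t1"))  # snapshot is stale: still the old count

conn.execute("ANALYZE")                # regather: the snapshot catches up
print(snapshot_and_count(conn, "t1"))
```

As with Oracle, counting is always exact but costs a scan, while the snapshot is cheap to read but only as fresh as the last statistics run.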

Related

Delete logic is taking a very long time to process in Oracle

I am using the following DELETE statement, which has to delete around 23,566,424 rows, but Oracle takes almost 3 hours to complete it. We have already created an index on SCHEDULE_DATE_KEY, but the process is still very slow. Can someone advise how to make deletes faster in Oracle?
DELETE
FROM
EDWSOURCE.SCHEDULE_DAY_F
WHERE
SCHEDULE_DATE_KEY >
(
SELECT
LAST_PAYROLL_DATE_KEY
FROM
EDWSOURCE.LAST_PAYROLL_DATE
WHERE
CURRENT_FLAG = 'Y'
);
I don't think any index will help here; Oracle will probably decide that the best approach to delete 20M rows out of 300M is a full table scan. It is deleting at a rate of over 2,000 rows per second, which isn't bad. In fact, any additional indexes will slow it down, because the entry for each deleted row has to be removed from every index as well.
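The "over 2,000 rows per second" figure follows directly from the numbers in the question. A quick check, treating "almost 3 hours" as exactly 3:

```python
def rows_per_second(rows: int, hours: float) -> float:
    """Average deletion throughput given a row count and an elapsed time in hours."""
    return rows / (hours * 3600)


rate = rows_per_second(23_566_424, 3)
print(f"{rate:,.0f} rows/s")  # about 2,182 rows/s
```

Since the job took slightly less than 3 hours, the real rate is a little higher.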
A quicker approach could be to create a new table of the rows you want to keep, something like:
create table EDWSOURCE.SCHEDULE_DAY_F_KEEP
as
select * from EDWSOURCE.SCHEDULE_DAY_F
where SCHEDULE_DATE_KEY <=
(
SELECT
LAST_PAYROLL_DATE_KEY
FROM
EDWSOURCE.LAST_PAYROLL_DATE
WHERE
CURRENT_FLAG = 'Y'
);
Then recreate any constraints and indexes to use the new table.
Finally drop the old table and rename the new one.
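To see the keep-and-swap sequence end to end, here is a small sketch that runs it against an in-memory SQLite database from Python. It is an analogy, not Oracle: SQLite's CREATE TABLE ... AS SELECT and ALTER TABLE ... RENAME TO stand in for the Oracle statements, and the extra column `hours` and all sample data are invented. As in the answer, constraints and indexes are not copied by the CTAS and would have to be recreated on the new table.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE schedule_day_f (schedule_date_key INTEGER, hours REAL);
        CREATE TABLE last_payroll_date (last_payroll_date_key INTEGER, current_flag TEXT);
        INSERT INTO last_payroll_date VALUES (20231231, 'N'), (20240105, 'Y');
    """)
    # One hypothetical row per day, keys 20240101 .. 20240110
    conn.executemany("INSERT INTO schedule_day_f VALUES (?, ?)",
                     [(20240100 + d, 8.0) for d in range(1, 11)])
    return conn


def keep_and_swap(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        -- 1. Copy only the rows to keep (the inverse of the DELETE predicate)
        CREATE TABLE schedule_day_f_keep AS
        SELECT * FROM schedule_day_f
        WHERE schedule_date_key <= (SELECT last_payroll_date_key
                                    FROM last_payroll_date
                                    WHERE current_flag = 'Y');
        -- 2. (Recreate constraints and indexes on schedule_day_f_keep here.)
        -- 3. Drop the old table and give the new one its name.
        DROP TABLE schedule_day_f;
        ALTER TABLE schedule_day_f_keep RENAME TO schedule_day_f;
    """)


conn = build_demo()
keep_and_swap(conn)
print(conn.execute("SELECT COUNT(*), MAX(schedule_date_key) FROM schedule_day_f").fetchone())
```

The point of the design is that writing the surviving rows once into a fresh table avoids generating undo and index maintenance for every deleted row.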
You can try a filtered table move (the INCLUDING ROWS clause requires 12.2 or later). It supports an ONLINE clause, so you can run it while the application is still running.
Note: in 12.2 and later the indexes remain valid after an online move; in earlier versions a table move leaves the indexes unusable, so you will need to rebuild them. Good luck!
Move a Table
Create and populate a new test table.
DROP TABLE t1 PURGE;
CREATE TABLE t1 AS
SELECT level AS id,
'Description for ' || level AS description
FROM dual
CONNECT BY level <= 100;
COMMIT;
Check the contents of the table.
SELECT COUNT(*) AS total_rows,
MIN(id) AS min_id,
MAX(id) AS max_id
FROM t1;
TOTAL_ROWS MIN_ID MAX_ID
---------- ---------- ----------
100 1 100
SQL>
Move the table, filtering out rows with an ID value greater than 50.
ALTER TABLE t1 MOVE ONLINE
INCLUDING ROWS WHERE id <= 50;
Check the contents of the table.
SELECT COUNT(*) AS total_rows,
MIN(id) AS min_id,
MAX(id) AS max_id
FROM t1;
TOTAL_ROWS MIN_ID MAX_ID
---------- ---------- ----------
50 1 50
SQL>
The rows with an ID value between 51 and 100 have been removed.
As mentioned above, it may be best to PARTITION the table and drop a PARTITION every N days as part of a daily task.
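If the table were range-partitioned on SCHEDULE_DATE_KEY, the DELETE from the question would turn into partition maintenance. The Python sketch below is purely illustrative: for the question's predicate `SCHEDULE_DATE_KEY > cutoff`, a partition whose lowest key is above the cutoff can be dropped outright, while one that straddles the cutoff still needs a row-level DELETE. The partition names, bounds and integer YYYYMMDD keys are hypothetical, and the code only builds statement text; it does not connect to a database.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RangePartition:
    name: str
    low_key: int   # smallest key the partition can hold (previous partition's bound)
    high_key: int  # VALUES LESS THAN bound (exclusive)


def plan_purge(table: str, partitions: List[RangePartition],
               cutoff_key: int) -> Tuple[List[str], List[str]]:
    """Split the removal of rows with key > cutoff_key into whole-partition drops
    and partitions that still need a filtered DELETE (integer keys assumed)."""
    drops, partial = [], []
    for p in partitions:
        if p.low_key > cutoff_key:
            # Every key in this partition is above the cutoff: drop it as a whole.
            drops.append(f"ALTER TABLE {table} DROP PARTITION {p.name}")
        elif p.high_key - 1 > cutoff_key:
            # Keys on both sides of the cutoff: only some rows must go.
            partial.append(p.name)
    return drops, partial


parts = [
    RangePartition("P_202401", 20240101, 20240201),
    RangePartition("P_202402", 20240201, 20240301),
    RangePartition("P_202403", 20240301, 20240401),
]
drops, partial = plan_purge("EDWSOURCE.SCHEDULE_DAY_F", parts, cutoff_key=20240215)
print(drops)    # ['ALTER TABLE EDWSOURCE.SCHEDULE_DAY_F DROP PARTITION P_202403']
print(partial)  # ['P_202402']
```

If the table has global indexes, Oracle's UPDATE GLOBAL INDEXES clause on DROP PARTITION keeps them usable.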

Compare differences before insert into oracle table

Could you please tell me how to compare the differences between a table and my SELECT query, and insert those differences into a separate table? My plan is to create one base table (named RESULT) from the SELECT statement and populate it with the current result set. Then, the next day, I would like a procedure that compares the same SELECT with the RESULT table and inserts the differences into another table called DIFFERENCES.
Any ideas?
Thanks!
You can create the RESULT_TABLE using CTAS as follows:
CREATE TABLE RESULT_TABLE
AS SELECT ... -- YOUR QUERY
Then you can use the following procedure which calculates the difference between your query and data from RESULT_TABLE:
CREATE OR REPLACE PROCEDURE FIND_DIFF
AS
BEGIN
  INSERT INTO DIFFERENCES
  SELECT * FROM (
    --data present in the query but not in RESULT_TABLE
    (SELECT ... -- YOUR QUERY
     MINUS
     SELECT * FROM RESULT_TABLE)
    UNION
    --data present in the RESULT_TABLE but not in the query
    (SELECT * FROM RESULT_TABLE
     MINUS
     SELECT ... -- YOUR QUERY
    )
  );
END;
/
I combined two MINUS queries, taken in opposite orders, with UNION so that deleted rows are also inserted into the DIFFERENCES table. If you do not need the deleted rows, remove the part after the UNION.
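Here is a small runnable sketch of the same two-way difference, using Python's built-in `sqlite3` module. SQLite has no MINUS; its EXCEPT operator does the same job. The tables `result_table` (yesterday's snapshot) and `current_result` (a stand-in for "YOUR QUERY") and their data are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE result_table   (id INTEGER, val TEXT);  -- yesterday's snapshot
    CREATE TABLE current_result (id INTEGER, val TEXT);  -- stand-in for YOUR QUERY
    CREATE TABLE differences    (id INTEGER, val TEXT);
    INSERT INTO result_table   VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO current_result VALUES (1, 'a'), (2, 'B'), (4, 'd');

    -- (query EXCEPT snapshot) UNION (snapshot EXCEPT query); each half is wrapped
    -- in a subquery because SQLite rejects parenthesised compound members.
    INSERT INTO differences
    SELECT * FROM (SELECT * FROM current_result EXCEPT SELECT * FROM result_table)
    UNION
    SELECT * FROM (SELECT * FROM result_table EXCEPT SELECT * FROM current_result);
""")
diffs = conn.execute("SELECT id, val FROM differences ORDER BY id, val").fetchall()
print(diffs)  # [(2, 'B'), (2, 'b'), (3, 'c'), (4, 'd')]
```

The changed row (id 2) appears twice, once with its old value and once with its new one, and nothing records which side each row came from. If you need to tell new, deleted and changed rows apart, a full outer join on the key is a better fit.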
-- Create a table with the results from the query; ID is assumed to be the primary key
create table result_t as
select id, col_1, col_2, col_3
from (<some-query>);
-- Create a table with new rows, deleted rows or updated rows
create table differences_t as
select id
      -- Old values
      ,b.col_1 as old_col_1
      ,b.col_2 as old_col_2
      ,b.col_3 as old_col_3
      -- New values
      ,a.col_1 as new_col_1
      ,a.col_2 as new_col_2
      ,a.col_3 as new_col_3
-- Execute the query once again
from (<some-query>) a
-- Full outer join to also detect new/deleted rows
full join result_t b using(id)
-- Null-aware comparison: DECODE treats two NULLs as equal
where decode(a.col_1, b.col_1, 1, 0) = 0
   or decode(a.col_2, b.col_2, 1, 0) = 0
   or decode(a.col_3, b.col_3, 1, 0) = 0;
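The full-join logic can be modelled in plain Python to see what each kind of change produces. This is a sketch, not the SQL itself: dictionaries keyed by `id` stand in for the query and `result_t`, and Python's `==`, which treats `None == None` as true, stands in for the null-aware DECODE comparison. The sample rows are invented.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[Optional[str], ...]  # (col_1, col_2, col_3); None plays the role of NULL


def diff_by_id(old: Dict[int, Row],
               new: Dict[int, Row]) -> List[Tuple[int, Optional[Row], Optional[Row]]]:
    """Return (id, old_values, new_values) for every new, deleted or changed id."""
    changes = []
    for key in sorted(old.keys() | new.keys()):  # full outer join on id
        before = old.get(key)  # None: id exists only in the query (new row)
        after = new.get(key)   # None: id exists only in result_t (deleted row)
        if before != after:    # tuple equality is null-aware, like DECODE
            changes.append((key, before, after))
    return changes


result_t = {1: ("a", None, "x"), 2: ("b", "c", "d"), 3: ("e", "f", "g")}
query = {1: ("a", None, "x"), 2: ("b", "C", "d"), 4: ("h", "i", "j")}
for change in diff_by_id(result_t, query):
    print(change)
```

Row 1 is not reported even though col_2 is NULL on both sides, which is why the answer uses DECODE rather than `a.col_2 <> b.col_2`. One difference from the SQL: in the full join a missing row looks like a row of NULLs, so a new or deleted row whose compared columns are all NULL would slip through the SQL filter, while the Python version still reports it.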

Why is Oracle using full table scan when it should use an index?

I'm doing some experimentation with query plans in Oracle, and I have the following table:
--create a table to use
create table SKEWED_DATA(
EMP_ID int,
DEPT int,
COL2 int,
CONSTRAINT SKEWED_DATA_PK PRIMARY KEY (EMP_ID)
);
--add an index on dept
create index SKEWED_DATA_INDEX1 on SKEWED_DATA(DEPT);
I then insert 1 million rows of data where 999,999 rows have dept id 1, and 1 row has dept id 99.
Before calculating statistics on the table, Oracle Autotrace shows that when running the following queries, it is using an index scan for both:
select AVG(COL2) from SKEWED_DATA D where DEPT = 1;
select AVG(COL2) from SKEWED_DATA D where DEPT = 99;
It's my understanding that it would be more efficient in this case to use a full table scan for dept id 1, and an index scan for dept id 99.
I then run the following command to generate statistics for the table:
execute DBMS_STATS.GATHER_TABLE_STATS ('HARRY','SKEWED_DATA');
And querying the dba_tab_statistics and user_tab_col_statistics confirms that stats and histograms have been gathered.
Running an autotrace on the following queries now shows full table scan for both!
select AVG(COL2) from SKEWED_DATA D where DEPT = 1;
select AVG(COL2) from SKEWED_DATA D where DEPT = 99;
My question is: why is Oracle using a full table scan for dept id 99 when there is only 1 row with this value?
UPDATE
I tried running the query for dept 99 with a hint to force Oracle to use the index, and whilst Autotrace believes it to be less efficient, the time it takes is 0.001 seconds, compared to 0.03 seconds when using the full table scan, thus proving (I think?) my theory that Oracle should be using the index in this instance.
select /*+ INDEX(D SKEWED_DATA_INDEX1) */ AVG(COL2) from SKEWED_DATA D where DEPT = 99;
OK, I think I might have solved it. When I had 999,999 rows with dept 1 and 1 row with dept 99, I inspected the number of histogram buckets by running the following query:
select COLUMN_NAME, HISTOGRAM, NUM_BUCKETS, NUM_DISTINCT from USER_TAB_COL_STATISTICS where TABLE_NAME = 'SKEWED_DATA';
This showed that there are 2 distinct values but only 1 bucket. If I change the stats gathering to this:
execute DBMS_STATS.GATHER_TABLE_STATS('HARRY','SKEWED_DATA',estimate_percent=>100);
It then correctly comes up with 2 buckets, and the autotrace shows the 'correct' execution plans. So, I guess it's because of the extreme 'skewness' of my data that Oracle cannot generate the correct stats for it unless the estimate_percent is massive.
Interestingly if I have slightly less skewed data (say about 2-3% of all records with a dept id of 99) Oracle does treat it correctly even when I leave the estimate_percent as default.
So, the moral of the story seems to be: if you have ridiculously skewed data like this and Oracle is not using the correct execution plan, try playing around with the estimate_percent parameter.
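The asker's explanation can be put into numbers. If a statistic is built from a simple random sample of n rows out of N, the chance that the sample misses every one of k rare rows is C(N-k, n)/C(N, n), which for a single row is just (N - n)/N. The Python sketch below computes that. The sample size of 5,500 rows is an arbitrary example, and Oracle's actual sampling (AUTO_SAMPLE_SIZE, block sampling, histogram-specific samples) is not modelled.

```python
def prob_sample_misses(total_rows: int, rare_rows: int, sample_rows: int) -> float:
    """Probability that a simple random sample drawn without replacement
    contains none of the rare rows: C(N - k, n) / C(N, n)."""
    if sample_rows > total_rows - rare_rows:
        return 0.0  # the sample is too large to avoid every rare row
    p = 1.0
    for i in range(sample_rows):
        # The i-th draw must also avoid the rare rows, given the earlier draws did.
        p *= (total_rows - rare_rows - i) / (total_rows - i)
    return p


N, n = 1_000_000, 5_500  # hypothetical sample size
print(prob_sample_misses(N, 1, n))       # one dept 99 row: about 0.9945
print(prob_sample_misses(N, 25_000, n))  # 2.5% of rows: vanishingly small
print(prob_sample_misses(N, 1, N))       # 100% sample: 0.0, the row is always seen
```

With one row in a million, the value is almost always absent from such a sample, so the histogram cannot record it; with 2-3% of rows it is practically certain to be sampled, which is consistent with what the asker observed. estimate_percent=>100 removes the sampling altogether.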

Identify partitions having stale statistics in a list of schema

I have 5 development schemas, and each of them has partitioned tables. We also have scripts that dynamically create partitioned tables (monthly/yearly). We have to go to the DBA every time to get details about the partitioned tables. Our real problem is a partitioned table with 9 partitions. Every day it goes through a delta load (updates/deletes using PL/SQL) followed by an APPEND load using SQL*Loader. This operation happens when the database is at peak load, and we have some performance issues with SELECT queries on this table.
When we report it to the DBA, they say the table statistics are stale, and after they gather stats, the query magically runs faster. I searched about this and found some information about dynamic performance views.
So now I have the following questions:
1) Can a developer generate a list of all partitioned tables, partition names and the number of records in each, without going to the DBA?
2) Can we identify the last analyzed date of every partition?
3) Can we also see whether each partition (index) is usable or unusable?
Normally there is no need to identify objects that need statistics gathered. Oracle automatically gathers statistics for stale objects, unless the task has been manually disabled. This is usually good enough for OLTP systems. Use this query to find the status of the task:
select status
from dba_autotask_client
where client_name = 'auto optimizer stats collection';
STATUS
------
ENABLED
For data warehouse systems there is also not much need to query the data dictionary for stale stats. In a data warehouse, statistics need to be considered after almost every operation. Developers need to get in the habit of always thinking about statistics after a truncate, insert, swap, etc. Eventually they will "just know" when to gather statistics.
But if you still want to see how Oracle determines if statistics are stale, look at DBA_TAB_STATISTICS and DBA_TAB_MODIFICATIONS.
Here is an example of an initial load with statistics gathering. The table and partitions are not stale.
create table test1(a number, b number) partition by list(a)
(
partition p1 values (1),
partition p2 values (2)
);
insert into test1 select 1, level from dual connect by level <= 50000;
begin
dbms_stats.gather_table_stats(user, 'test1');
dbms_stats.flush_database_monitoring_info;
end;
/
select table_name, partition_name, num_rows, last_analyzed, stale_stats
from user_tab_statistics
where table_name = 'TEST1'
order by 1, 2;
TABLE_NAME PARTITION_NAME NUM_ROWS LAST_ANALYZED STALE_STATS
---------- -------------- -------- ------------- -----------
TEST1      P1                50000 2014-01-22    NO
TEST1      P2                    0 2014-01-22    NO
TEST1                        50000 2014-01-22    NO
Now add a large number of rows and the statistics are stale.
begin
insert into test1 select 2, level from dual connect by level <= 25000;
commit;
dbms_stats.flush_database_monitoring_info;
end;
/
select table_name, partition_name, num_rows, last_analyzed, stale_stats
from user_tab_statistics
where table_name = 'TEST1'
order by 1, 2;
TABLE_NAME PARTITION_NAME NUM_ROWS LAST_ANALYZED STALE_STATS
---------- -------------- -------- ------------- -----------
TEST1      P1                50000 2014-01-22    NO
TEST1      P2                    0 2014-01-22    YES
TEST1                        50000 2014-01-22    YES
USER_TAB_MODIFICATIONS gives more specific information on table staleness.
--Stale statistics.
select user_tables.table_name, user_tab_modifications.partition_name
,inserts+updates+deletes modified_rows, num_rows, last_analyzed
,case when num_rows = 0 then null
else (inserts+updates+deletes) / num_rows * 100 end percent_modified
from user_tab_modifications
join user_tables
on user_tab_modifications.table_name = user_tables.table_name
where user_tables.table_name = 'TEST1';
TABLE_NAME PARTITION_NAME MODIFIED_ROWS NUM_ROWS LAST_ANALYZED PERCENT_MODIFIED
---------- -------------- ------------- -------- ------------- ----------------
TEST1      P2                     25000    50000 2014-01-22                  50
TEST1                             25000    50000 2014-01-22                  50
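The PERCENT_MODIFIED column above can be turned into a yes/no flag. The Python sketch below is a simplified model, assuming Oracle's default STALE_PERCENT preference of 10 (statistics count as stale once the tracked inserts, updates and deletes exceed 10% of NUM_ROWS); exact boundary handling and other triggers such as truncates are not modelled. Note that the query above joins to USER_TABLES, so the P2 row is measured against the table-level NUM_ROWS of 50,000.

```python
from typing import Optional


def percent_modified(inserts: int, updates: int, deletes: int,
                     num_rows: int) -> Optional[float]:
    """Same calculation as the PERCENT_MODIFIED column (NULL when NUM_ROWS is 0)."""
    if num_rows == 0:
        return None
    return (inserts + updates + deletes) / num_rows * 100


def looks_stale(inserts: int, updates: int, deletes: int, num_rows: int,
                stale_percent: float = 10.0) -> bool:
    """Simplified staleness rule: modifications exceed stale_percent of NUM_ROWS."""
    return inserts + updates + deletes > num_rows * stale_percent / 100


# Figures from the TEST1 example after the 25,000-row insert into P2
print(percent_modified(25_000, 0, 0, 50_000))  # 50.0, as in the query output
print(looks_stale(25_000, 0, 0, 50_000))       # True: table level
print(looks_stale(25_000, 0, 0, 0))            # True: P2 had NUM_ROWS = 0
print(looks_stale(0, 0, 0, 50_000))            # False: P1 untouched
```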
Yes, you can generate a list of partitioned tables, and a lot of related data which you would like to see, by using ALL_PART_TABLES or USER_PART_TABLES (provided you have access).
ALL_TAB_PARTITIONS can be used to get the number of rows per partition, along with other details.
Check other views Oracle has for gathering details about partitioned tables.
I would suggest that you should analyze the tables, and possibly rebuild the indexes, every day after your data load. If your data load is affecting a lot of records in the table, and is going to affect the existing indexes, it's a good idea to proactively update the statistics for the table and index.
You can use the system views to get this information (check http://docs.oracle.com/cd/E18283_01/server.112/e16541/part_admin005.htm).
I had a somewhat similar problem, and I solved it by gathering stats only on stale partitions, using the INCREMENTAL option introduced in 11g.
It's the reverse approach to your problem, but it might be worth investigating (specifically, how Oracle determines what a "stale" partition is).
exec dbms_stats.set_table_prefs('DWH','FACT_TABLE','INCREMENTAL','TRUE');
I always prefer the proactive approach: gather stats on stale partitions as the last step of my ETL, rather than giving the developers stronger privileges.
I used to query the ALL_ views mentioned below.
The statistics and histogram details you mention are updated automatically by Oracle at regular intervals. But when the database is busy with many loads, I have seen that these operations need to be triggered manually. We faced a similar situation, so we used to force the ANALYZE operation on critical tables after our load. The ID you use to load the table needs the privilege to do this.
ANALYZE TABLE table_name PARTITION (partition_name) COMPUTE STATISTICS;
EDIT: ANALYZE should no longer be used to gather CBO statistics (its COMPUTE and ESTIMATE clauses are kept only for backward compatibility), so the DBMS_STATS package has to be used. The signature of DBMS_STATS.GATHER_TABLE_STATS is:
DBMS_STATS.GATHER_TABLE_STATS (
ownname VARCHAR2,
tabname VARCHAR2,
partname VARCHAR2 DEFAULT NULL,
estimate_percent NUMBER DEFAULT to_estimate_percent_type
(get_param('ESTIMATE_PERCENT')),
block_sample BOOLEAN DEFAULT FALSE,
method_opt VARCHAR2 DEFAULT get_param('METHOD_OPT'),
degree NUMBER DEFAULT to_degree_type(get_param('DEGREE')),
granularity VARCHAR2 DEFAULT GET_PARAM('GRANULARITY'),
cascade BOOLEAN DEFAULT to_cascade_type(get_param('CASCADE')),
stattab VARCHAR2 DEFAULT NULL,
statid VARCHAR2 DEFAULT NULL,
statown VARCHAR2 DEFAULT NULL,
no_invalidate BOOLEAN DEFAULT to_no_invalidate_type (
get_param('NO_INVALIDATE')),
force BOOLEAN DEFAULT FALSE);
Until the statistics gathering is complete, the views below may not return accurate results (especially the LAST_ANALYZED and NUM_ROWS columns).
Note: if you have access to them, try replacing the ALL_ views with their DBA_ equivalents.
You can also try to get SELECT_CATALOG_ROLE granted to the development ID you use, so that you can SELECT from the data dictionary views; this reduces your dependency on the DBA for such queries. (Still, the DBA is the right person for some issues!)
Query to identify the partitioned table, partition name, number of rows and last analyzed date:
select
  all_part.owner as schema_name,
  all_part.table_name,
  NVL(all_tab.partition_name, 'N/A') as partition_name,
  all_tab.num_rows,
  all_tab.last_analyzed
from
  all_part_tables all_part,
  all_tab_partitions all_tab
where all_part.table_name = all_tab.table_name
  and all_part.owner = all_tab.table_owner
  and all_part.owner in ('SCHEMA1','SCHEMA2','SCHEMA3')
order by all_part.owner, all_part.table_name, all_tab.partition_name;
The query below returns the indexes (with their tables) that are UNUSABLE:
SELECT INDEX_NAME,
TABLE_NAME,
STATUS
FROM ALL_INDEXES
WHERE status NOT IN ('VALID','N/A');
The query below returns the index partitions that are UNUSABLE:
SELECT INDEX_NAME,
PARTITION_NAME,
STATUS ,
GLOBAL_STATS
FROM ALL_IND_PARTITIONS
WHERE status != 'USABLE';

CHAR_USED query returned '0 rows fetched from 1 column'

I created the following table:
Create table temp.test(c1 VARCHAR2(10 BYTE));
I was trying to use CHAR_USED to determine whether the column size is in BYTES or CHARS, but all I get back is '0 rows fetched from 1 column'. The database version I am using is Oracle 11g. Does anyone have a clue why it is not returning the length semantics information for this table?
The queries I used are as follows:
select CHAR_USED from all_tab_columns where table_name='temp.test'
select CHAR_USED from all_tab_columns where table_name='test' and owner = 'temp'
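Assuming that you are not using case-sensitive identifiers (which you are not, and should not be), object names are stored in the data dictionary in upper case. So when you query a view like all_tab_columns, you need to use upper-case names. Also note that TABLE_NAME holds only the table name, so the schema goes in OWNER rather than in a 'temp.test' string: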
SELECT column_name, char_used
FROM all_tab_columns
WHERE table_name = 'TEST'
AND owner = 'TEMP';
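If the table name comes from user input or a script, it may help to normalise it the way Oracle stores it before querying the dictionary: unquoted identifiers are stored in upper case, quoted ones exactly as written, and the owner goes in OWNER rather than in TABLE_NAME. The Python helpers below (`dictionary_name`, `split_qualified`) are a hypothetical sketch of that rule; they do not validate identifier syntax.

```python
from typing import List, Optional, Tuple


def dictionary_name(identifier: str) -> str:
    """Return an identifier in the form the data dictionary stores it."""
    if len(identifier) >= 2 and identifier.startswith('"') and identifier.endswith('"'):
        return identifier[1:-1]  # quoted: case preserved, quotes removed
    return identifier.upper()    # unquoted: stored in upper case


def split_qualified(name: str) -> Tuple[Optional[str], str]:
    """Split '[owner.]table' (dots inside double quotes are kept) into
    (OWNER, TABLE_NAME) values suitable for ALL_TAB_COLUMNS."""
    parts: List[str] = []
    current: List[str] = []
    in_quotes = False
    for ch in name.strip():
        if ch == '"':
            in_quotes = not in_quotes
        if ch == "." and not in_quotes:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    if len(parts) == 1:
        return None, dictionary_name(parts[0])
    if len(parts) == 2:
        return dictionary_name(parts[0]), dictionary_name(parts[1])
    raise ValueError(f"expected [owner.]table, got {name!r}")


print(split_qualified("temp.test"))        # ('TEMP', 'TEST')
print(split_qualified('"Temp"."my.tab"'))  # ('Temp', 'my.tab')
```

The resulting pair can then be passed as bind values for owner and table_name in the query above.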
