We have a GoldenGate replication from a prod environment.
The tables were initially dumped from prod, and afterwards we started the GoldenGate replication. Now we want to migrate the data to another database, but the query plans differ from the prod environment. We think this is because all the statistics in the replication database are broken/wrong.
The number of rows stated in dba_tables is NULL, 0, or differs by 50-80%.
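For illustration, a comparison along these lines shows the mismatch between the stored and the actual row counts ('SCHEMA' and 'TABLE_NAME' are placeholders):
-- Row count the optimizer believes, and when it was gathered
select num_rows, last_analyzed from dba_tables where owner = 'SCHEMA' and table_name = 'TABLE_NAME';
-- Actual row count
select count(*) from SCHEMA.TABLE_NAME;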
We tried to do dbms_stats.gather_table_stats on all relevant tables.
It's still broken. We ran this call for all tables that had wrong statistics:
exec dbms_stats.GATHER_TABLE_STATS(OWNNAME => 'SCHEMA', TABNAME => 'TABLE_NAME', CASCADE => true);
We can't migrate with the bad query plans.
We are using Oracle Release 12.2.0.1.0 - Production
EDIT: After Jon Heller's answer, we saw that some indexes are partitioned in the prod environment but not in the replication. Additionally, the global preference DEGREE is 32768 on the replication and NULL on prod.
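If the DEGREE preference turns out to be the culprit, one option (an untested sketch; passing NULL restores the Oracle default) would be to reset it on the replication and regather:
exec dbms_stats.set_global_prefs('DEGREE', null);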
Are the tables built exactly the same way? Maybe a different table structure is causing the statistics to break, like if one table is partitioned and another is not. Try comparing the DDL:
select dbms_metadata.get_ddl('TABLE', 'TABLE1') from dual;
I'm surprised to hear that statistics are wrong even after gathering stats. Especially the number of rows - since 10g, that number should always be 100% accurate with the default settings.
Can you list the exact commands you are using to gather stats? Also, this is a stretch, but possibly the global preferences were changed on one database. It would be pretty evil, but you could set a database default to only look at 0.00001% of the data, which would create terrible statistics. Check the global preferences on both databases.
--Thanks to Tim Hall for this query: https://oracle-base.com/dba/script?category=monitoring&file=statistics_prefs.sql
SELECT DBMS_STATS.GET_PREFS('AUTOSTATS_TARGET') AS autostats_target,
DBMS_STATS.GET_PREFS('CASCADE') AS cascade,
DBMS_STATS.GET_PREFS('DEGREE') AS degree,
DBMS_STATS.GET_PREFS('ESTIMATE_PERCENT') AS estimate_percent,
DBMS_STATS.GET_PREFS('METHOD_OPT') AS method_opt,
DBMS_STATS.GET_PREFS('NO_INVALIDATE') AS no_invalidate,
DBMS_STATS.GET_PREFS('GRANULARITY') AS granularity,
DBMS_STATS.GET_PREFS('PUBLISH') AS publish,
DBMS_STATS.GET_PREFS('INCREMENTAL') AS incremental,
DBMS_STATS.GET_PREFS('STALE_PERCENT') AS stale_percent
FROM dual;
If gathering statistics still leads to different results, the only thing I can think of is corruption. It may be time to create an Oracle service request.
(This is more of an extended comment than an answer, but it might take a lot of code to diagnose this problem. Please update the original question with more information as you find it.)
We have an Oracle 19c database (19.0.0.0.ru-2021-04.rur-2021-04.r1) on AWS RDS, hosted on a 4 CPU, 32 GB RAM instance. The database is not big (35 GB); the PGA Aggregate Limit is 8 GB and the Target is 4 GB. Whenever the scheduled internal Oracle Auto Optimizer Stats Collection Job (ORA$AT_OS_OPT_SY_nnn) runs, it consumes substantially high PGA memory (approx. 7 GB), and sometimes this makes the database unstable; AWS then loses communication with the RDS instance and restarts the database.
We thought this might be linked to the existing Oracle bug 30846782 (19C+: Fast/Excessive PGA growth when using DBMS_STATS.GATHER_TABLE_STATS), but Oracle and AWS had fixed it in the 19c version we are using. There are no application-level operations that consume this much PGA, and the database restarts have always happened while the Auto Optimizer Stats Collection Job was running. There are a couple more databases on the same version where the same pattern was observed and the database was restarted by AWS. We have disabled the job on those databases to avoid further occurrences of this issue; however, we want to run this job, as disabling it leaves old stats in the database.
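A query along these lines can confirm whether the fix for that bug is actually installed (assuming the patch registry is readable on your RDS instance):
select patch_id, status, description
from dba_registry_sqlpatch
order by action_time desc;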
Any pointers on how to tackle this issue?
I ran into the same issue on my AWS RDS Oracle 18c and 19c instances, even though I am not on the same patch level as you.
In my case, I applied this workaround and it worked.
SQL> alter system set "_fix_control"='20424684:OFF' scope=both;
However, before applying this change, I strongly suggest that you test it on your non production environments, and if you can, try to consult with Oracle Support. Dealing with hidden parameters might lead to unexpected side effects, so apply it at your own risk.
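To record the original setting and verify the change took effect, the fix control can be checked before and after:
select bugno, value, description
from v$system_fix_control
where bugno = 20424684;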
Instead of completely abandoning automatic statistics gathering, try to find the specific objects that are causing the problem. If only a small number of tables are responsible for a large amount of statistics gathering, you can manually analyze those tables or change their preferences.
First, use the below SQL to see which objects are causing the most statistics gathering. According to the test case in bug 30846782, the problem seems to be only related to the number of times DBMS_STATS is called.
select *
from dba_optstat_operations
order by start_time desc;
In addition, you may be able to find specific SQL statements or sessions that generate a lot of PGA memory with the below query. (However, if the database restarts, it's possible that AWR won't save the recorded values.)
select username, event, sql_id, pga_allocated/1024/1024/1024 pga_allocated_gb, gv$active_session_history.*
from gv$active_session_history
join dba_users on gv$active_session_history.user_id = dba_users.user_id
where pga_allocated/1024/1024/1024 >= 1
order by sample_time desc;
If the problem is only related to a small number of tables with a large number of partitions, you can manually gather the stats on just that table in a separate session. Once the stats are gathered, the table won't be analyzed again until about 10% of the data is changed.
begin
dbms_stats.gather_table_stats(user, 'PGA_STATS_TEST');
end;
/
It's not uncommon for a database to spend a long time gathering statistics, but it is uncommon for a database to constantly analyze thousands of objects. Running into this bug implies there is something unusual about your database - are you constantly dropping and creating objects, or do you have a large number of objects that have 10% of their data modified every day? You may need to add a manual gather step to a few of your processes.
Turning off the automatic statistics job entirely will eventually cause many performance problems. Even if you can't add manual gathering steps, you may still want to keep the job enabled. For example, if tables are being analyzed too frequently, you may want to increase the table preference for the "STALE_PERCENT" threshold from 10% to 20%:
begin
dbms_stats.set_table_prefs
(
ownname => user,
tabname => 'PGA_STATS_TEST',
pname => 'STALE_PERCENT',
pvalue => '20'
);
end;
/
We are running Oracle Applications 12.2.4 on a 12.1.0.2.0 database. When I do the following query:
select DBMS_STATS.GET_PREFS('AUTOSTATS_TARGET') as autostats_target,
DBMS_STATS.GET_PREFS('CASCADE') as cascade,
DBMS_STATS.GET_PREFS('DEGREE') as degree,
DBMS_STATS.GET_PREFS('ESTIMATE_PERCENT') as estimate_percent,
DBMS_STATS.GET_PREFS('METHOD_OPT') as method_opt,
DBMS_STATS.GET_PREFS('NO_INVALIDATE') as no_invalidate,
DBMS_STATS.GET_PREFS('GRANULARITY') as granularity,
DBMS_STATS.GET_PREFS('PUBLISH') as publish,
DBMS_STATS.GET_PREFS('INCREMENTAL') as incremental,
DBMS_STATS.GET_PREFS('STALE_PERCENT') as stale_percent
from dual;
I get:
"AUTOSTATS_TARGET","CASCADE","DEGREE","ESTIMATE_PERCENT","METHOD_OPT","NO_INVALIDATE","GRANULARITY","PUBLISH","INCREMENTAL","STALE_PERCENT"
"AUTO","DBMS_STATS.AUTO_CASCADE","NULL","DBMS_STATS.AUTO_SAMPLE_SIZE","FOR ALL COLUMNS SIZE AUTO","DBMS_STATS.AUTO_INVALIDATE","AUTO","TRUE","FALSE","10"
However when I run:
select distinct histogram
from user_tab_col_statistics
I only get NONE
How is it possible that an Oracle Applications instance has no tables with skews that need histograms? Or am I not understanding the settings?
Also, when you want a histogram on a column, do you have to use method_opt => 'for all columns size skewonly'? How can you specify AUTO for all columns and SKEWONLY for one column?
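My guess, completely untested, is that the two clauses can be chained in one method_opt string, something like:
exec dbms_stats.gather_table_stats(ownname => 'HR', tabname => 'PQH_SS_TRANSACTION_HISTORY', method_opt => 'for all columns size auto for columns size skewonly selected_person_id');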
I would really like the potentially huge speed increase that histograms can bring, and I am surprised that Oracle Applications does not provide this by default. There is a Gather Schema Statistics process that runs every night; could it be that the code in it is very old and that it kills any dbms_stats calls? I specifically created the following index, which should have a histogram.
create index xxpqh_ss_trans_history_idx1 on hr.pqh_ss_transaction_history (process_name, nvl(selected_person_id, -1)) compress 1 tablespace apps_ts_tx_idx;
exec dbms_stats.gather_table_stats(ownname => 'HR', tabname => 'PQH_SS_TRANSACTION_HISTORY', cascade => true, method_opt => 'for all columns size skewonly');
Oracle Applications uses its own mechanisms for statistics collection, and you should not be calling dbms_stats directly.
"Oracle E-Business Suite statistics should only be gathered using FND_STATS or the Gather Statistics concurrent request. Gathering statistics with DBMS_STATS or the desupported ANALYZE command may result in suboptimal executions plans for E-Business Suite"
Please refer to the following whitepaper for recommendations:
Best Practices for Gathering Statistics with Oracle E-Business Suite (MOS Note 1586374.1)
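For example, the manual gather from the question would look roughly like this with FND_STATS (a sketch; check the whitepaper for the exact parameters your EBS version expects):
begin
   fnd_stats.gather_table_stats(ownname => 'HR', tabname => 'PQH_SS_TRANSACTION_HISTORY');
end;
/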
I have the following query, which monitors whether anyone has tried to log on with a technical user on the database:
SELECT COUNT (OS_USERNAME)
FROM DBA_AUDIT_SESSION
WHERE USERNAME IN ('USER1','USER2','USER3')
AND TIMESTAMP>=SYSDATE - 10/(24*60)
AND RETURNCODE !='0'
Unfortunately the performance of this SQL is quite poor, since it does a TABLE ACCESS FULL on sys.aud$. I tried to narrow it down with:
SELECT COUNT (sessionid)
FROM sys.aud$
WHERE userid IN ('USER1','USER2','USER3')
AND ntimestamp# >=SYSDATE - 10/(24*60)
AND RETURNCODE !='0'
and action# between 100 and 102;
And it is even worse. Is it possible at all to optimize this query by forcing Oracle to use indexes here? I would be grateful for any help and tips.
SYS.AUD$ does not have any default indexes but it is possible to create one on ntimestamp#.
But proceed with caution. The support document "The Effect Of Creating Index On Table Sys.Aud$ (Doc ID 1329731.1)" includes this warning:
Creating additional indexes on SYS objects including table AUD$ is not supported.
Normally that would be the end of the conversation and you'd want to try another approach. But in this case there are a few reasons why it's worth a shot:
The document goes on to say that an index may be helpful, and to test it first.
It's just an index. The SYS schema is special, but we're still just talking about an index on a table. It could slow things down, or maybe cause space errors, like any index would. But I doubt there's any chance it could do something crazy like cause wrong results bugs.
It's somewhat common to change the tablespace of the audit trail, so that table isn't sacred.
I've seen indexes on it before. 2 of the 400 databases I manage have an index on the columns SESSIONID,SES$TID (although I don't know why). Those indexes have been there for years, have been through an upgrade and patches, and haven't caused problems as far as I know.
Creating an "unsupported" index may be a good option for you, if you're willing to test it and accept a small amount of risk.
From Oracle 10g onwards, the optimizer should choose the best plan for your query, provided you write proper joins. I'm not sure how many records exist in your DBA_AUDIT_SESSION, but you can always use a PARALLEL hint to somewhat speed up the execution.
SELECT /*+Parallel*/ COUNT (OS_USERNAME)
FROM DBA_AUDIT_SESSION
WHERE USERNAME IN ('USER1','USER2','USER3')
AND TIMESTAMP>=SYSDATE - 10/(24*60)
AND RETURNCODE !='0'
With the hint, the query cost drops to 3 from its earlier value.
NumRows: 8080019
So it is pretty large due to company regulations. Unfortunately using /*+Parallel*/ here makes it run longer, so the performance is still worse.
Any other suggestions?
We have a system that performs quite a bit of database activity in terms of INSERT/UPDATE/DELETE statements against various tables. Because of this, the statistics become stale, and this is reflected in overall performance.
We want to create a scheduled job that periodically invokes DBMS_STATS.GATHER_SCHEMA_STATS. Because we don't want the stats gathering itself to impact system processing even more, we are thinking of collecting statistics quite frequently, using the GATHER STALE option:
EXECUTE DBMS_STATS.GATHER_SCHEMA_STATS(OWNNAME => 'MY_SCHEMA', OPTIONS => 'GATHER STALE');
This executes almost instantly, but running the statement below before and after stats gathering seems to bring back the same records with the same values:
SELECT * FROM user_tab_modifications WHERE inserts + updates + deletes > 0;
The very short time it takes to execute, and the fact that the user_tab_modifications content stays the same, makes me question whether OPTIONS => 'GATHER STALE' actually does what we expect it to do. On the other hand, if I run this before and after statistics gathering, I can see that the tables reported as stale before are no longer reported as stale after:
DECLARE
stale dbms_stats.objecttab;
BEGIN
DBMS_STATS.GATHER_SCHEMA_STATS(ownname => 'MY_SCHEMA', OPTIONS =>'LIST STALE', objlist => stale);
FOR i in 1 .. stale.count
LOOP
dbms_output.put_line( stale(i).objName );
END LOOP;
END;
On the other hand, if, say, MY_TABLE is one of the tables listed in user_tab_modifications with inserts + updates + deletes > 0, and I run the statement below, I can see MY_TABLE is no longer reported as having changes.
EXECUTE DBMS_STATS.GATHER_TABLE_STATS(ownname => 'MY_SCHEMA', tabname => 'MY_TABLE');
So my questions are:
Is my approach correct? Can I trust that I am getting fresh stats just by running OPTIONS => 'GATHER STALE', or should I manually collect stats for all tables that come back with a reasonable number of inserts, updates, and deletes?
When does user_tab_modifications actually get reset? The GATHER STALE option does not seem to do it.
We are using Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
Got the following info from Oracle docs.
You should enable monitoring if you use GATHER_DATABASE_STATS or GATHER_SCHEMA_STATS with the GATHER AUTO or GATHER STALE options.
This view USER_TAB_MODIFICATIONS is populated only for tables with the MONITORING attribute. It is intended for statistics collection over a long period of time. For performance reasons, the Oracle Database does not populate this view immediately when the actual modifications occur. Run the FLUSH_DATABASE_MONITORING_INFO procedure in the DBMS_STATS PL/SQL package to populate this view with the latest information. The ANALYZE_ANY system privilege is required to run this procedure.
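In practice, that means flushing the monitoring data before each check so that the before/after comparison is reliable:
exec dbms_stats.flush_database_monitoring_info;
select * from user_tab_modifications where inserts + updates + deletes > 0;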
Hope this helps you to identify which of your assumptions are incorrect and understand the correct usage of "GATHER STALE".
We have a table called t_reading, with the following schema:
MEAS_ASS_ID NUMBER(12,0)
READ_DATE DATE
READ_TIME VARCHAR2(5 BYTE)
NUMERIC_VAL NUMBER
CHANGE_REASON VARCHAR2(240 BYTE)
OLD_IND NUMBER(1,0)
This table is indexed as follows:
CREATE INDEX RED_X4 ON T_READING
(
"OLD_IND",
"READ_DATE" DESC,
"MEAS_ASS_ID",
"READ_TIME"
)
This exact table (with the same data) exists on two servers, the only difference is the Oracle version installed on each one.
The query in question is:
SELECT * FROM t_reading WHERE OLD_IND = 0 AND MEAS_ASS_ID IN (5022, 5003) AND read_date BETWEEN to_date('30/10/2012', 'dd/mm/yyyy') AND to_date('31/10/2012', 'dd/mm/yyyy');
This query executes in less than a second on Oracle 10, and around a minute in Oracle 9.
Are we missing something?
EDIT:
Execution plan for Oracle 9:
Execution plan for Oracle 10:
"Are we missing something?"
Almost certainly, but it's difficult for us to tell you what.
There were some performance improvements in the CBO from 9i to 10g but it's unlikely to make that much difference. So it must be some variation in your systems, which is obviously the hardest thing for us to diagnose, blind and remote as we are.
So, the first things to rule out are general system differences - disk speeds, I/O bottlenecks, memory sizing, etc. You say you have two servers; do they have different specs? Whilst it will require assistance from a sysadmin type to investigate these things, we can discount them with a single question: is it just this query, or can you reproduce this effect with many different queries?
If it is just the query, there are at least three possible explanations.
One is data distribution. How was the data populated in the two databases? If the 10g database was exported from the 9i database, was it sorted in some fashion? Even if it wasn't, it is possible that the ETL process has compacted and organised the data and built the fresh indexes in a way which improves the access times.
Another is statistics. Are the 10g statistics fresh and realistic, whilst the 9i statistics are stale and misleading?
A third possibility is a stored execution plan. (You have posted a query with literals, this only applies to queries with bind variables.) Searches on date ranges are notoriously hard to tune. A date range of to_date('30/10/2012', 'dd/mm/yyyy') AND to_date('31/10/2012', 'dd/mm/yyyy') suits one sort of plan, whereas date range of to_date('01/01/2010', 'dd/mm/yyyy') AND to_date('31/10/2012', 'dd/mm/yyyy') may well suit a different approach. If the extant plan on the 9i database suits a broader range then queries for a narrow range may take a long time.
While I've been typing this you have published the explain plans. The killer detail is at the bottom of the 9i plan:
Note: rule-based optimization
You haven't got any stats for the table or the index, so the optimizer is applying the dumb defaults of the RBO. You should really address this, but it's not a simple task. You may need to gather stats for all your tables. You may need to change the OPTIMIZER_MODE in the init.ora file. You may need to undertake a regression test of all the queries on your database. So, it's not something you should do lightly.
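If you do take it on, a minimal first experiment might look like this (a sketch: CHOOSE lets 9i use the CBO wherever statistics exist, and it can be tried at session level before touching init.ora):
alter session set optimizer_mode = choose;
exec dbms_stats.gather_schema_stats(ownname => user, cascade => true);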
In the meantime, if this query is bugging you, you'll need to wrangle the Rule-Based Optimizer the old-fashioned way. Find out more.
A couple of potential explanations:
You're range scanning different indexes. Assuming that you've got the same index on your 10g table but you've just called it a different thing, the explain plans are different.
The main worry I would have is the lack of information in the rows, bytes and cost columns of the explain plan for your 9i query. Oracle 9i does not collect statistics by default, and this detail would indicate that you have not collected statistics on this table. Use dbms_stats to gather statistics on your table and the indexes, specifically the procedure gather_table_stats:
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS (
    ownname          => user,
    tabname          => 'T_READING',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    method_opt       => 'FOR ALL INDEXED COLUMNS',
    cascade          => TRUE  -- gather index statistics
  );
END;
/
There are plenty of other options if you're interested. Assuming the indexes are different this may help the CBO (assuming it's "turned on") to pick the correct index.
The other options include what server they are on and what the database parameters are. If they're on different servers then the relative "power", disk-speed, I/O and a never-ending list of other options could easily cause a difference. If the database parameters are different then you have the same problem.
Database tuning is an art as much as a science. Oracle has a whole book on it and there are plenty of other resources out there.
A few observations:
your index is a DESCENDING index. This is a function-based index and, as such, it won't work as expected under the RULE optimizer.
your 9i plan shows access only on OLD_IND; your 10g plan (you cut off the important predicate bits) shows a range scan + INLIST ITERATOR, so depending on that RED_PK, it may be accessing on MEAS_ASS_ID, which is perhaps more selective.
in terms of indexing too, to answer your query WHERE OLD_IND = 0 AND MEAS_ASS_ID IN (5022, 5003) AND read_date BETWEEN ... (i.e. OLD_IND equality, MEAS_ASS_ID equality, and READ_DATE range scanned), a better index is (OLD_IND, MEAS_ASS_ID, READ_DATE): do the range scan last to cut down on I/O.
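For example (RED_X5 is just an illustrative name):
create index red_x5 on t_reading (old_ind, meas_ass_id, read_date);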
Have you tried running an explain on the queries on the two servers? The query optimiser for 9i is different from the one for 10g, and the 10g query optimiser is much faster and parallelised. Check out the following link: Upgrading Query Optimiser
explain SELECT * FROM t_reading WHERE OLD_IND = 0 AND MEAS_ASS_ID IN (5022, 5003) AND read_date BETWEEN to_date('30/10/2012', 'dd/mm/yyyy') AND to_date('31/10/2012', 'dd/mm/yyyy');