Oracle Optimizer Statistics Advisor and its effect on gathering table statistics

Recently I have been reading more about the 'Optimizer Statistics Advisor' and I have done some tests on my test database. It gave the below recommendation:
Rule Name: UseConcurrent
Rule Description: Use Concurrent preference for Statistics Collection
Finding: The CONCURRENT preference is not used.
Recommendation: Set the CONCURRENT preference.
Example:
dbms_stats.set_global_prefs('CONCURRENT', 'ALL');
Rationale: The system's condition satisfies the use of concurrent statistics
gathering. Using CONCURRENT increases the efficiency of statistics
gathering.
As I understand from Oracle Base:
Concurrent statistics collection is simply the ability to gather statistics on multiple tables, or table partitions, at the same time. This is done using a combination of the job scheduler, advanced queuing and resource manager.
so this recommendation applies to the whole database, not to individual tables. What I am saying is: if I gather statistics for a single table, such a recommendation will not have any benefit, correct? Also, is there a way the 'Optimizer Statistics Advisor' can be applied to specific tables?

Unlikely, because one of the overall aims for the optimizer team is that for 99% of customers, the default settings of the optimizer statistics gathering mechanisms will be sufficient to ensure good plans.
This is why (for example)
when you use CTAS, you'll get stats collected automatically on creation, so (in most cases) there is no need for additional steps before using the table in queries
a table that has queries that use predicates on skewed data will ultimately end up with histograms (via SYS.COL_USAGE$) without user intervention needed
tables without stats will get them automatically in the next run, and until then will be dynamically sampled
tables that change by 10% will get stats picked up again on the next run
and so forth. The aim is that (except in niche cases) you'll not need to "worry" about the optimizer statistics process.
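If you do want to check or adjust the preferences yourself, here is a minimal sketch (the schema and table names are placeholders; CONCURRENT itself is a global preference, so the table-level call only applies to other preferences such as STALE_PERCENT):

-- Check the current global setting.
SELECT DBMS_STATS.GET_PREFS('CONCURRENT') FROM dual;

-- Enable concurrent gathering, as the advisor recommends.
EXEC DBMS_STATS.SET_GLOBAL_PREFS('CONCURRENT', 'ALL');

-- Other preferences can be set per table, e.g. a tighter staleness
-- threshold for one specific table (placeholder names).
EXEC DBMS_STATS.SET_TABLE_PREFS('MY_SCHEMA', 'MY_TABLE', 'STALE_PERCENT', '5');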

Related

Oracle 12c full text index maintainence

We are planning to use a context index for full text search in Oracle 12c standard edition.
The data on which the search will run is JSON containing one channel post and its replies from a 3rd-party tool, loaded into our database (basically, all the chats and replies, including other attributes like timestamp/user etc., are stored in this table).
We are expecting about 50k rows of data per year and 100-150 DML operations per day. Our index is currently "SYNC ON COMMIT", so what are the recommendations for optimizing the Oracle Text index?
First, let me preface my response with a disclaimer: I am exploring using Oracle Text as part of a POC currently, and my knowledge is somewhat limited as we're still in the research phase. Additionally, our datasets are in the 10s of millions with 100k DML operations daily.
From what I've read, Oracle Docs suggest scheduling both a FULL and REBUILD optimization for indexes which incur DML, so I currently have implemented the following in our dev environment.
execute ctx_ddl.optimize_index('channel_json_ctx_idx', 'FULL'); --run daily
execute ctx_ddl.optimize_index('channel_json_ctx_idx', 'REBUILD'); --run weekly
I cannot imagine with the dataset you've identified that your index will really become that fragmented and cause performance issues. You could probably get away with less frequent optimizations than what I've mentioned.
You could even forego scheduling the optimization and benchmark your performance. If you see it start to degrade, then note the timespan and perhaps the count of DML operations for reference. Then run a 'FULL' and test performance. If performance improves, create a schedule. If performance does not improve, then run 'REBUILD' and test performance again. Assuming performance improves, you could schedule the 'REBUILD' for that time range and consider adding a more frequent 'FULL'.
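If you do decide to schedule the optimizations, a minimal sketch using DBMS_SCHEDULER could look like the following (the job names and the 2 a.m./3 a.m. windows are arbitrary placeholders; adjust to your own maintenance window):

BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'OPTIMIZE_CHANNEL_CTX_FULL',        -- arbitrary job name
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN ctx_ddl.optimize_index(''channel_json_ctx_idx'', ''FULL''); END;',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=DAILY;BYHOUR=2',               -- nightly FULL pass
    enabled         => TRUE,
    comments        => 'Nightly FULL optimization of the Text index');

  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'OPTIMIZE_CHANNEL_CTX_REBUILD',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN ctx_ddl.optimize_index(''channel_json_ctx_idx'', ''REBUILD''); END;',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=WEEKLY;BYDAY=SUN;BYHOUR=3',    -- weekly REBUILD
    enabled         => TRUE,
    comments        => 'Weekly REBUILD optimization of the Text index');
END;
/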

Oracle SQL table and index statistics

I've been doing some reading on gathering table and index statistics on Oracle databases but it's left me ... confused.
For the sake of argument, let's assume Oracle 11gR2 as the RDBMS. Regarding gathering table and index statistics, when should it be done, which is the preferred way of doing it, and does Oracle really automatically gather the necessary statistics for us?
Regarding the first point: when should it be done. I've read that, as a rule of thumb, gathering table and index statistics should be done after around 10% of the table's records have been modified (inserted, updated, etc) since the last time the table was analyzed.
Regarding the second point: which is the preferred way of doing it. If we want to calculate both table and index statistics, does executing DBMS_STATS.GATHER_TABLE_STATS with default options, assuming the table is not partitioned, suffice?
Regarding the third point: does Oracle really gather the necessary statistics automatically for us? If this is the case, should I not worry about gathering table statistics (see points 1 and 2)?
Thanks in advance.
EDIT: Following the comment by ammoQ, I realized that the question is not clear on what the use case really is here. My question is about tables that aren't "manipulated" via a user's actions, i.e. manually, but rather via procedures typically run by database jobs. Take my example, for instance. My ETL process loads several tables on a daily basis and it does so in approximately 1 hour. Of that 1 hour, about half is spent analyzing the tables themselves. Thus, the tables are analyzed daily, following insertions or updates. This seems overkill, hence the question.
In general, you need to have statistics that are representative (not necessarily accurate) and that give you the right execution plan. By default, Oracle will run a statistics collection job, during the nightly batch window. That may be fine for some applications, but if you have a data warehouse, which presumably includes a regular data load process, then managing the stats should be part of that process. Note that I have said "managing" and not "collecting" statistics. That's just my way of saying that there are other options for statistics in addition to just gathering statistics, although gathering statistics would be where I would start.
There are also things that can be done to optimize statistics gathering, incremental statistics for example. The other thing that is very important is to use the AUTO sample size when gathering stats. Do not specify a percentage, not even 100%. The reason is that auto sample size enables a number of internal optimizations and capabilities that are disabled if you do not use AUTO sample size.
So, taking your specific points:
10% staleness is a fairly arbitrary figure, and is just the default threshold used by the auto stats job.
dbms_stats.gather_table_stats() with default values is the preferred method. One parameter that I might change is the DEGREE, to enable stats gathering in parallel (see the sketch after these points).
In 12c, basic stats are gathered on load into an empty table (or empty partition). Stats are built on indexes when indexes are created. So to reiterate what I said above, stats gathering should be part of your ELT process.
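To make point 2 concrete, a minimal sketch of such a call (owner, table name and degree are placeholders for your own objects):

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'ETL_OWNER',                      -- placeholder schema
    tabname          => 'FACT_SALES',                     -- placeholder table
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,      -- keep the AUTO sample size
    degree           => 4,                                -- gather in parallel
    cascade          => TRUE);                            -- include the indexes
END;
/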
I hope that makes sense and helps.

Can a single SELECT significantly degrade the performance of an Oracle Database?

All,
Consider
an Oracle Database of version 10gR2 or greater.
Hardware sufficient to exceed existing peak workload by more than 100%
Can you construct any circumstance in which a single user with a single connection and read-only access to user tables (not system tables or v$ or x$ tables) can degrade the overall performance of an Oracle database? Also list mitigation strategies, if any.
As stipulated, consider that this is not about a database which is a few CPU-cycles away from saturation such that any additional load would be dangerous. This is about a well sized box for the current workload.
E.G.
If a user uses a parallel hint, Oracle may use a very high DOP to execute that query and starve other processes of CPU. Mitigation: explain to the user that parallel hints are forbidden.
The simplest approach would be to issue a query that does a Cartesian product of your biggest table with itself a bunch of times. That will blow out your TEMP tablespace rather quickly and generate errors for other sessions that need to sort. You can mitigate that either by granting limited quotas on TEMP (which may get tricky if this is an application account that is used by multiple people simultaneously rather than an individually identifiable account) or by using Resource Manager to kill sessions that either run too long or that use too much CPU or I/O resources.
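Purely as an illustration of the kind of statement meant here (placeholder table and column names; obviously do not run this anywhere that matters), a self cross join with a sort will spill enormous amounts of data to TEMP:

-- Each extra cross join multiplies the row count; the ORDER BY forces a sort
-- that has to spill to the TEMP tablespace.
SELECT *
  FROM big_table a
 CROSS JOIN big_table b
 CROSS JOIN big_table c
 ORDER BY a.id;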
Even without an explicit PARALLEL hint, it's possible that Oracle will use parallelism automatically. Depending on the Oracle version and how you've configured parallelism, you can turn on parallel automatic tuning to limit the total number of parallel workers active at any time. Of course, that doesn't prevent the read-only user from creating a dozen sessions, each of which spawns a couple of parallel workers that choke off other sessions that you actually want to run more parallel workers. You can use Resource Manager to configure priorities for different types of workload if and when the system becomes CPU constrained.
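A minimal Resource Manager sketch of the "cancel runaway SQL" mitigation (all names are placeholders, and the available directives and switch groups vary by version, so treat this as a starting point rather than a finished plan):

BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();

  DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP(
    consumer_group => 'READ_ONLY_GROUP',
    comment        => 'Ad-hoc read-only users');

  DBMS_RESOURCE_MANAGER.CREATE_PLAN(
    plan    => 'LIMIT_ADHOC_PLAN',
    comment => 'Cancel runaway ad-hoc SQL');

  -- Cancel any call in the ad-hoc group that runs for more than 10 minutes.
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan             => 'LIMIT_ADHOC_PLAN',
    group_or_subplan => 'READ_ONLY_GROUP',
    comment          => 'Cap elapsed time for ad-hoc queries',
    switch_group     => 'CANCEL_SQL',
    switch_time      => 600);

  -- A directive for OTHER_GROUPS is required before the plan can be submitted.
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan             => 'LIMIT_ADHOC_PLAN',
    group_or_subplan => 'OTHER_GROUPS',
    comment          => 'Everything else, unrestricted');

  DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/

You would still need to map the read-only account to READ_ONLY_GROUP (for example with DBMS_RESOURCE_MANAGER.SET_CONSUMER_GROUP_MAPPING) and activate the plan via the RESOURCE_MANAGER_PLAN parameter.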
And if you allow an anonymous PL/SQL block, there are more ways to generate havoc by, for example, creating a nested table that gets filled with billions of rows until your PGA is exhausted.

Oracle: Difference in execution plans between databases

I am comparing queries between my development and production databases.
They are both Oracle 9i, but almost every single query has a completely different execution plan depending on the database.
All tables/indexes are the same, but the dev database has about 1/10th the rows for each table.
On production, the execution plan it picks for most queries is different from development, and the cost is sometimes 1000x higher. Queries on production also seem not to be using the correct indexes in some cases (full table access).
I have run dbms_utility.analyze_schema on both databases recently as well, in the hope that the CBO would figure something out.
Is there some other underlying oracle configuration that could be causing this?
I am mostly a developer, so this kind of DBA analysis is fairly confusing at first.
1) The first thing I would check is whether the database parameters are equivalent across Prod and Dev. If one of the parameters that affects the decisions of the Cost Based Optimizer is different then all bets are off. You can see the parameters in the v$parameter view.
2) Having up to date object statistics is great but keep in mind the large difference you pointed out - Dev has 10% of the rows of Prod. This rowcount is factored into how the CBO decides the best way to execute a query. Given the large difference in row counts I would not expect plans to be the same.
Depending on the circumstances, the optimizer may choose to full table scan a table with 20,000 rows (Dev) where it may decide an index is lower cost on the table that has 200,000 rows (Prod). (Numbers are just for demonstration; the CBO uses costing algorithms to determine what to full-scan and what to index-scan, not absolute values.)
3) System statistics also factor into the explain plans. This is a set of statistics that represents CPU and disk I/O characteristics. If the hardware on the two systems is different then I would expect your system statistics to be different, and this can affect the plans. There is some good discussion of this from Jonathan Lewis.
You can view system stats via the sys.aux_stats$ view.
Now I'm not sure why different plans are a bad thing for you... if stats are up to date and parameters set correctly you should be getting decent performance from either system no matter what the difference in size...
but it is possible to export statistics from your Prod system and load them into your Dev system. This makes your Prod statistics available to your Dev database.
Check the Oracle documentation for the DBMS_STATS package, specifically the EXPORT_SCHEMA_STATS, EXPORT_SYSTEM_STATS, IMPORT_SCHEMA_STATS and IMPORT_SYSTEM_STATS procedures. Keep in mind you may need to disable the 10pm nightly statistics jobs on 10g/11g, or you can investigate locking statistics after import so they are not updated by the nightly jobs.
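As a rough sketch of that workflow (the schema name and stat-table name are placeholders):

-- On Prod: create a stat table and export the schema statistics into it.
EXEC DBMS_STATS.CREATE_STAT_TABLE(ownname => 'APP', stattab => 'STATS_EXPORT');
EXEC DBMS_STATS.EXPORT_SCHEMA_STATS(ownname => 'APP', stattab => 'STATS_EXPORT');

-- Copy the STATS_EXPORT table to Dev (export/import, a database link, etc.),
-- then import the statistics and optionally lock them so the nightly job
-- cannot overwrite them.
EXEC DBMS_STATS.IMPORT_SCHEMA_STATS(ownname => 'APP', stattab => 'STATS_EXPORT');
EXEC DBMS_STATS.LOCK_SCHEMA_STATS(ownname => 'APP');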

How Oracle uses statistics data

In a previous question I got a comment about Oracle statistics:
Oracle doesn't know that 50M is greater than the number of rows. Sure, it has statistics, but they could be old and wrong - and Oracle would never allow itself to deliver an incorrect result only because the statistics are wrong
I was pretty sure that Oracle relies on statistics when preparing a query execution plan. Before version 10 it was recommended to refresh statistics from time to time, and from 10g onwards Oracle gathers statistics automatically.
Can somebody explain how much Oracle query analyzer relies on statistics data?
Oracle uses statistics a lot, to generate query execution plans. What it does not (and should not) do is use those statistics in a way that will affect query results, which is what you were trying to do with "ROWNUM < 50000000". The statistics may be out of date, or missing. However, this will only mean that Oracle may be slow to generate the correct result, it does not mean that Oracle will return an incorrect result.
If Oracle worked as you hoped, then it might decide that "ROWNUM < 50000000" meant "get all rows" even though the table now contained 60,000,000 rows (but had out of date stats saying it contained only 49,000,000). Fortunately it does not.
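You can see the distinction for yourself: statistics drive the Rows estimate shown in the plan, not the rows that actually come back (the table name here is a placeholder):

EXPLAIN PLAN FOR
  SELECT * FROM big_table WHERE ROWNUM < 50000000;

-- The "Rows" column in this output is only the optimizer's estimate,
-- derived from statistics; the query result itself is unaffected.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);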
Statistics are VERY important for the query optimizer. They should be gathered on a regular basis either automatically or manually.
When executing a query Oracle will produce a pool of available execution plans in order to satisfy your query. Those execution plans are the same from the standpoint that they will return you the same exact result, it's just the road to getting there may be far more efficient for one plan over another. To determine this efficiency, Oracle uses the stats generated on the objects used in each of the execution plans to determine their individual costs. If those stats are non-existent or are stale, the cost associated with each plan will be less accurate and therefore the optimal plan may not be chosen.
Here are some of the key stats that Oracle uses for determining this cost (a sketch of how to inspect them follows the list):
Table statistics
* Number of rows
* Number of blocks
* Average row length
Column statistics
* Number of distinct values (NDV) in column
* Number of nulls in column
* Data distribution (histogram)
* Extended statistics
Index statistics
* Number of leaf blocks
* Levels
* Clustering factor
System statistics
* I/O performance and utilization
* CPU performance and utilization
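Most of these figures can be inspected directly in the data dictionary, for example (placeholder table name):

-- Table statistics
SELECT num_rows, blocks, avg_row_len, last_analyzed
  FROM user_tab_statistics
 WHERE table_name = 'ORDERS';

-- Column statistics, including histograms
SELECT column_name, num_distinct, num_nulls, histogram
  FROM user_tab_col_statistics
 WHERE table_name = 'ORDERS';

-- Index statistics
SELECT index_name, leaf_blocks, blevel, clustering_factor
  FROM user_ind_statistics
 WHERE table_name = 'ORDERS';

-- System (CPU and I/O) statistics; requires access to SYS objects
SELECT sname, pname, pval1 FROM sys.aux_stats$;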
Statistics are used by the Oracle cost-based optimizer (CBO) to calculate the relative costs of different ways of executing a query so that the most appropriate one can be chosen.
On the whole this works very well, and it is being continually improved. For example, in 11g you can gather multicolumn histograms that help greatly with queries having predicates on correlated columns (e.g. strongly correlated like month of birth and star sign, or more weakly correlated like gender and height).
However it is not perfect. For example estimating the cardinality of the result set of a join between two tables is reasonably accurate, as is estimating the cardinality from a filter operation, but combining the two requires a lot of estimation that can easily be inaccurate. In some cases these issues can be worked around with hints, or with the use of global temporary tables for intermediate result sets.
Another problem with statistics is that changing them can change the execution plan, so there is more of a movement recently either to discourage continual gathering of statistics, or to analyse the impact of changes to statistics before implementing them.
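One way to analyse the impact before implementing new statistics is pending statistics, available from 11g (the object names below are placeholders):

-- Gather into the pending area instead of publishing immediately.
EXEC DBMS_STATS.SET_TABLE_PREFS('APP', 'ORDERS', 'PUBLISH', 'FALSE');
EXEC DBMS_STATS.GATHER_TABLE_STATS('APP', 'ORDERS');

-- In a test session, let the optimizer use the pending stats and compare plans.
ALTER SESSION SET optimizer_use_pending_statistics = TRUE;

-- If the plans look good, publish the pending statistics.
EXEC DBMS_STATS.PUBLISH_PENDING_STATS('APP', 'ORDERS');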
Look for the Jonathan Lewis book -- it is a very thorough treatment of the subject.
