Oracle: Difference in execution plans between databases

I am comparing queries between my development and production databases.
They are both Oracle 9i, but almost every single query has a completely different execution plan depending on the database.
All tables/indexes are the same, but the dev database has about 1/10th the rows for each table.
On production, the execution plan it picks for most queries is different from development, and the cost is sometimes 1000x higher. Queries on production also seem not to be using the expected indexes in some cases (full table access).
I have run dbms_utility.analyze_schema on both databases recently as well, in the hope that the CBO would figure something out.
Is there some other underlying oracle configuration that could be causing this?
I am mostly a developer, so this kind of DBA analysis is fairly confusing at first.

1) The first thing I would check is whether the database parameters are equivalent across Prod and Dev. If one of the parameters that affects the decisions of the Cost Based Optimizer is different, then all bets are off. You can see the parameters in the v$parameter view (a sample query follows point 3 below);
2) Having up to date object statistics is great but keep in mind the large difference you pointed out - Dev has 10% of the rows of Prod. This rowcount is factored into how the CBO decides the best way to execute a query. Given the large difference in row counts I would not expect plans to be the same.
Depending on the circumstance, the optimizer may choose to Full Table Scan a table with 20,000 rows (Dev) where it may decide an index is lower cost on the table that has 200,000 rows (Prod). (Numbers just for demonstration; the CBO uses costing algorithms for determining what to FTS and what to index scan, not absolute values.)
3) System statistics also factor into the explain plans. This is a set of statistics that represent CPU and disk i/o characteristics. If your hardware on both systems is different then I would expect your System Statistics to be different and this can affect the plans. Some good discussion from Jonathan Lewis here
You can view system stats via the sys.aux_stats$ view.
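For points 1 and 3, a couple of quick queries (run as a suitably privileged user on both databases and compare the output) would look something like this; the parameter names listed are just common examples:
-- Optimizer-related initialization parameters (compare Dev vs Prod)
select name, value
from v$parameter
where name like 'optimizer%'
   or name in ('db_file_multiblock_read_count', 'cursor_sharing');

-- System (CPU / I/O) statistics used by the CBO
select sname, pname, pval1, pval2
from sys.aux_stats$;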
Now I'm not sure why different plans are a bad thing for you... if stats are up to date and parameters set correctly you should be getting decent performance from either system no matter what the difference in size...
but it is possible to export statistics from your Prod system and load them into your Dev system. This makes your Prod statistics available to your Dev database.
Check the Oracle documentation for the DBMS_STATS package, specifically the EXPORT_SCHEMA_STATS, EXPORT_SYSTEM_STATS, IMPORT_SCHEMA_STATS, IMPORT_SYSTEM_STATS procedures. Keep in mind you may need to disable the 10pm nightly statistics jobs on 10g/11g... or you can investigate Locking statistics after import so they are not updated by nightly jobs.
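A minimal sketch of that export/import round trip (the schema name and staging table name here are placeholders, and the staging table has to be moved between the databases yourself, e.g. via export/import or a database link):
-- On Prod: create a staging table and export the schema statistics into it
exec DBMS_STATS.CREATE_STAT_TABLE(ownname => 'MYAPP', stattab => 'STATS_STAGE');
exec DBMS_STATS.EXPORT_SCHEMA_STATS(ownname => 'MYAPP', stattab => 'STATS_STAGE');

-- Copy STATS_STAGE to Dev, then on Dev:
exec DBMS_STATS.IMPORT_SCHEMA_STATS(ownname => 'MYAPP', stattab => 'STATS_STAGE');

-- Optionally lock the imported stats so nightly jobs do not overwrite them (10g and later)
exec DBMS_STATS.LOCK_SCHEMA_STATS(ownname => 'MYAPP');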

Related

Oracle SQL table and index statistics

I've been doing some reading on gathering table and index statistics on Oracle databases but it's left me ... confused.
For the sake of argument, let's assume Oracle 11gR2 as the RDBMS. Regarding gathering table and index statistics, when should it be done, which is the preferred way of doing it, and does Oracle really automatically gather the necessary statistics for us?
Regarding the first point: when should it be done. I've read that, as a rule of thumb, gathering table and index statistics should be done after around 10% of the table's records have been modified (inserted, updated, etc) since the last time the table was analyzed.
Regarding the second point: which is the preferred way of doing it. If we want to calculate both table and index statistics, does executing DBMS_STATS.GATHER_TABLE_STATS with default options, assuming the table is not partitioned, suffice?
Regarding the third point: does Oracle really gather the necessary statistics automatically for us? If this is the case, should I not worry about gathering table statistics (see points 1 and 2)?
Thanks in advance.
EDIT: Following the comment by ammoQ, I realized that the question is not clear on what the use case really is here. My question is about tables that aren't "manipulated" via a user's actions, i.e. manually, but rather via procedures typically run by database jobs. Take my example, for instance. My ETL process loads several tables on a daily basis and it does so in approximately 1 hour. Of that 1 hour, about half is spent analyzing the tables themselves. Thus, the tables are analyzed daily, following insertions or updates. This seems overkill, hence the question.
In general, you need to have statistics that are representative (not necessarily accurate) and that give you the right execution plan. By default, Oracle will run a statistics collection job, during the nightly batch window. That may be fine for some applications, but if you have a data warehouse, which presumably includes a regular data load process, then managing the stats should be part of that process. Note that I have said "managing" and not "collecting" statistics. That's just my way of saying that there are other options for statistics in addition to just gathering statistics, although gathering statistics would be where I would start.
There are also things that can be done to optimize statistics gathering, incremental statistics for example. The other thing that is very important is to use the AUTO sample size when gathering stats. Do not specify a percentage, not even 100%. The reason is that auto sample size enables a number of internal optimizations and capabilities that are disabled if you do not use AUTO sample size.
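As a sketch of how those preferences can be set per table (the schema and table names are placeholders), DBMS_STATS.SET_TABLE_PREFS does it:
begin
  -- Enable incremental statistics for a (partitioned) table; names are placeholders
  dbms_stats.set_table_prefs('MY_SCHEMA', 'MY_FACT_TABLE', 'INCREMENTAL', 'TRUE');
  -- AUTO_SAMPLE_SIZE is already the default, but it can be pinned explicitly
  dbms_stats.set_table_prefs('MY_SCHEMA', 'MY_FACT_TABLE', 'ESTIMATE_PERCENT', 'DBMS_STATS.AUTO_SAMPLE_SIZE');
end;
/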
So, taking your specific points
10% staleness is fairly arbitrary; it is just the default threshold used by the automatic stats job.
dbms_stats.gather_table_stats() with default values is the preferred method. One parameter that I may change would be the DEGREE, to enable stats gathering in parallel (a minimal example follows this list).
In 12c, basic stats are gathered on load into an empty table (or empty partition). Stats are built on indexes when indexes are created. So to reiterate what I said above, stats gathering should be part of your ELT process.
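A minimal sketch of that gather call, with placeholder schema and table names and the optional DEGREE override mentioned above:
begin
  -- Defaults (including AUTO sample size) are usually the right choice
  dbms_stats.gather_table_stats(
    ownname => 'MY_SCHEMA',
    tabname => 'MY_FACT_TABLE',
    cascade => true,   -- also gather index statistics
    degree  => 4       -- optional: gather in parallel
  );
end;
/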
I hope that makes sense and helps.

Faking Oracle statistics?

Is there a way to force Oracle to "see" a table and associated indexes as being bigger than they really are?
In other words, is there a way to "fake" database statistics, so the cost based optimizer would make decisions on a nearly-empty database, that are closer to decisions that would be made in a real, big production database?
The idea is to be able to experiment (vis-a-vis execution plan) with various indexing / querying / (de)normalization strategies very early in the database design process, without wasting time writing code that fills it with representative test data (most of which will end-up being discarded anyway, since the database design is still not settled).
Importing statistics is out of question, since the production database does not even exist yet.
Sure. The DBMS_STATS package has a number of procedures that allow you to force statistics on objects. There are dbms_stats.set_table_stats and dbms_stats.set_index_stats procedures, for example.
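As a hedged sketch (the object names and figures here are made up), this is how you could tell the optimizer that a nearly empty table looks like a million-row table:
begin
  -- Pretend MY_TABLE has 1,000,000 rows spread over 20,000 blocks; names and numbers are placeholders
  dbms_stats.set_table_stats(
    ownname => 'MY_SCHEMA',
    tabname => 'MY_TABLE',
    numrows => 1000000,
    numblks => 20000,
    avgrlen => 120
  );
  -- Make its index look correspondingly large and selective
  dbms_stats.set_index_stats(
    ownname  => 'MY_SCHEMA',
    indname  => 'MY_TABLE_PK',
    numrows  => 1000000,
    numdist  => 1000000,   -- distinct keys
    numlblks => 5000       -- leaf blocks
  );
end;
/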

when I try to use Oracle Statistics my cost grows?

I have a big query on four tables and I want to optimize it.
The weird part is that when I get the execution plan without statistics it says something like 1.2M. However, if I gather statistics for one of the tables involved in the query, my cost lowers to 4k. But if I gather statistics on the other tables, the cost grows to 50k, so I am not sure what's happening.
Can anyone explain a reason why giving more statistics actually increases query cost?
The Cost Based Optimiser uses as much information as you can give it in order to calculate the cost of a plan. If you update (i.e. change) the statistics it uses, then obviously that will change the calculated cost of the plan.
It's not actually the gathering of stats that causes the cost to grow - it's how those stats have changed (whether up or down) that causes the calculated cost to change.
In the absence of statistics, Oracle may use heuristics, guesswork or a quick sample of the data (depending on the settings in your instance).
Generally, the better (more accurate or representative) the statistics, the more accurate the cost calculation.
The cost based optimizer has its challenges. There are rounding errors that can have quite an impact on the decisions that it makes. This is one of the reasons that SQL Plan Stability, introduced in 11g, is so nice. Forget about 10g if you can, or prepare for long debugging sessions.
At the first use, a plan is generated based on the current statistics and executed. If the SQL is repeated, the SQL and the plan are stored in a baseline. In the maintenance window, the most expensive plans are re-evaluated and in many cases a better plan can be provided. This is possible because at runtime the optimizer is limited in the time it is given to search for a plan. In the maintenance window, a lot more time can be spent to find the best plan.
In 11g, bind variable peeking is also fixed (adaptive cursor sharing), and a single SQL statement can now have multiple plans, based on the values of the bind variables.
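As a hedged illustration of seeding one of those baselines yourself on 11g (the sql_id is a placeholder you would look up in v$sql first; set serveroutput on to see the count):
declare
  n pls_integer;
begin
  -- Load the cached plan(s) for one statement into a SQL plan baseline
  n := dbms_spm.load_plans_from_cursor_cache(sql_id => 'abcd1234efgh5678');
  dbms_output.put_line(n || ' plan(s) loaded');
end;
/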
The query cost is based on many factors, where IO is a very important factor.
How are your tables filled and where are the high water marks located? A table that is filled and emptied constantly can have its high water mark far away....
There are lots of bugs in the optimizer, and lots of options controlled by hidden parameters. You could try to use them to tweak the behaviour. Upgrading to 11g might be a lot smarter, as it solves lots of performance problems for many applications.

How do I correctly performance test SELECT queries with Oracle?

I would like to test two queries to find out their performance, as opposed to just looking at the execution plan. I have seen Tom Kyte do this all the time on his website as a way to gather evidence on his theories.
I believe there are many pitfalls in performance testing. For example, when I run a query in SQL Developer for the first time, that query might take a fair amount of time. Running that exact same query again returns instantaneously. There must be some sort of caching on the server or client going on, and I understand this is important; however, I am only interested in non-cached performance.
What are the guidelines to performance test? And how do I write a performance test which repeats the query? Do I just write an anonymous block & loop? How do I get timing information, averages, medians, standard deviations?
Oracle (and other databases) cache queries, which is where you see the behavior you describe. A "hard" parse means there's no query plan for the query, which leaves Oracle to figure out the query plan based on indexes and statistics. A "soft" parse is what happens when you run the identical query afterwards, and receive an instantaneous result, because the query plan exists & Oracle re-uses it. See the Ask Tom question about it for more details.
Be aware of the EXPLAIN output:
With the cost-based optimizer, execution plans can and do change as the underlying costs change. EXPLAIN PLAN output shows how Oracle runs the SQL statement when the statement was explained. This can differ from the plan during actual execution for a SQL statement, because of differences in the execution environment and explain plan environment.
Focusing on the non-cached performance gives a worst-case scenario, but given that caching will occur - non-cached benchmarks aren't realistic in everyday use.
To build off OMG Ponies answer, tuning based on timing is something that's possible, but not realistic. You'd have to start either with a fully-cached buffer cache in every case, or a fully-empty buffer cache, and neither of those is going to be representative of reality - especially if there's no competing load.
When I'm tuning, it's generally against a live system with activity, and I focus on tuning logical I/Os, either through using the extended SQL trace (dbms_monitor.session_trace_enable / dbms_monitor.session_trace_disable) and the tkprof utility, or using SQL*Plus and set autotrace traceonly - which does all the work of the query, but throws the output away, because I'm usually not interested in watching a jillion rows scroll by.
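For the trace route, a minimal sketch (tracing your own session; the trace file name and location depend on your instance settings) would be:
-- Switch on extended SQL trace for the current session, including wait events
exec dbms_monitor.session_trace_enable(waits => true, binds => false);

-- ... run the query under test here ...

exec dbms_monitor.session_trace_disable;

-- Then, on the database server, format the raw trace file, e.g.:
-- tkprof <tracefile>.trc report.txt sys=no sort=prsela,exeela,fchela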
The exact mechanism usually involves bound SQL, using something like the following:
variable my_bind1 number
variable my_bind2 varchar2(30)
begin
:my_bind1 := 42;
:my_bind2 := 'some meaningful string';
end;
/
set timing on;
set autotrace traceonly;
[godawful query with binds]
set autotrace off;
Within the results, I'm looking for the plan I'd expect, a comparative value for sorts - assuming any exist - and most importantly, the number of consistent I/Os. That's how many blocks Oracle had to read in consistent mode to satisfy the query. I can't find the original source of the quote, but I think it's Cary Millsap of Method R.
"Tune your logical I/Os, and your physical I/Os will follow."
In performance tuning, if the only piece of data you look at is wall-clock time, you will only be getting a small part of the whole picture. You need to at least look at the execution plan, as well as IO stats, in order to work out how best to tune the query.
Also, you need to eliminate other causes of performance issues - e.g. if there is a general performance issue across many queries, it might not be the fault of just one of them - it might be an architecture problem, or significant concurrent activity on the database, or even an underlying hardware issue.
I've had similar issues to what you describe before; e.g. a certain type of query which should be very fast was taking 30 seconds to run on the first time, then would settle down to a second or two. As soon as I looked at the execution plan, however, it was obvious that it was using a full table scan, because it couldn't use the unique index that had been created. The first time the query ran, most of the data was loaded into the cache (in fact, there were two levels of cache involved - the database buffer cache, as well as a storage-level cache over the disks) so subsequent full table scans were extremely fast.
What does "correctly" mean?
Since 11g there are a few extra complications to take into account. The optimizer's bind variable peeking has become a lot smarter, and SQL plan stability has a BIG influence. These two features make the database self-tuning, but they can also have unexpected effects during performance tests, for example because not all variations of the plans are known and accepted at the beginning of the tests.
This might be the cause that a second test run, the day after the first run, suddenly runs much quicker, without any apparent changes.
Since 11g, performance testing is less important compared to writing logically correct code. For example, a Cartesian product followed by filtering out one distinct value can be functionally correct, but in most cases it is wrong code because it fetches more data than is logically needed.
If the query fetches the data that is really needed and is in the correct control structure, let the database processes tune the code during the maintenance windows. In many cases the differences between the test environment and production are such that a comparison cannot be safely made.
Don't get me wrong, testing is important, but mostly for the logic; compared to performance testing before 11g, there are extra steps to be taken.
For nice reading see Oracle® Database 2 Day + Performance Tuning Guide 11g Release 2 (11.2)

Does Oracle 11g automatically index fields frequently used for full table scans?

I have an app using an Oracle 11g database. I have a fairly large table (~50k rows) which I query thus:
SELECT omg, ponies FROM table WHERE x = 4
Field x was not indexed, I discovered. This query happens a lot, but the thing is that the performance wasn't too bad. Adding an index on x did make the queries approximately twice as fast, which is far less than I expected. On, say, MySQL, it would've made the query ten times faster, at the very least. (Edit: I did test this on MySQL, and saw a huge difference there.)
I'm suspecting Oracle adds some kind of automatic index when it detects that I query a non-indexed field often. Am I correct? I can find nothing even implying this in the docs.
As has already been indicated, Oracle11g does NOT dynamically build indexes based on prior experience. It is certainly possible and indeed happens often that adding an index under the right conditions will produce the order of magnitude improvement you note.
But as has also already been noted, 50K (seemingly short?) rows is nothing to Oracle. The Oracle database in fact has a great deal of intelligence that allows it to scan data without indexes most efficiently. Every new release of the Oracle RDBMS gets better at moving large amounts of data. I would suggest to you that the reason Oracle was so close to its "best" timing even without the index as compared to MySQL is that Oracle is just a more intelligent database under the covers.
However, the Oracle RDBMS does have many features that touch upon the subject area you have opened. For example:
10g introduced a feature called AUTOMATIC SQL TUNING which is exposed via a gui known as the SQL TUNING ADVISOR. This feature is intended to analyze queries on its own, in depth and includes the ability to do WHAT-IF analysis of alternative query plans. This includes simulation of indexes which do not actually exist. However, this would not explain any performance differences you have seen because the feature needs to be turned on and it does not actually build any indexes, it only makes recommendations for the DBA to make indexes, among other things.
11g includes AUTOMATIC STATISTICS GATHERING which when enabled will automatically collect statistics on database objects as it deems necessary based on activity on those objects.
Thus the Oracle RDBMS is doing what you have suggested, dynamically altering its environment on its own based on its experience with your workload over time in order to improve performance. Creating indexes on the fly is just not one of the things it does yet. As an aside, this has been hinted at by Oracle in private several times, so I figure it is in the works for some future release.
Does Oracle 11g automatically index fields frequently used for full table scans?
No.
In regards the MySQL issue, what storage engine you use can make a difference.
"MyISAM relies on the operating system for caching reads and writes to the data rows while InnoDB does this within the engine itself"
Oracle will cache the table/data rows, so it won't need to hit the disk. Depending on the OS and hardware, there's a chance that MySQL MyISAM had to physically read the data off the disk each time.
~50K rows, depending greatly on how big each row is, could conceivably be stored in under 1000 blocks, which could be quickly read into the buffer cache by a full table scan (FTS) in under 50 multi-block reads.
Adding appropriate index(es) will allow queries on the table to scale smoothly as the data volume and/or access frequency goes up.
"Adding an index on x did make the
queries approximately twice as fast,
which is far less than I expected. On,
say, MySQL, it would've made the query
ten times faster, at the very least."
How many distinct values of X are there? Are they clustered in one part of the table or spread evenly throughout it?
Indexes are not some voodoo device: they must obey the laws of physics.
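A quick way to check that selectivity (the table name here is a placeholder standing in for the question's table) is something like:
-- Roughly how selective is x? Compare distinct values to total rows
select count(distinct x) as distinct_x, count(*) as total_rows
from my_table;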
edit
"Duplicates could appear, but as it
is, there are none."
If that column has neither a unique constraint nor a unique index, the optimizer will choose an execution path on the basis that there could be duplicate values in that column. This is the value of declaring the data model as accurately as possible: the provision of metadata to the optimizer. Keeping the statistics up to date is also very useful in this regard.
You should have a look at the estimated execution plan for your query, before and after the index has been created. (Also, make sure that the statistics are up-to-date on your table.) That will tell you what exactly is happening and why performance is what it is.
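A hedged sketch of doing that comparison in SQL*Plus (the table name is a placeholder standing in for the question's table):
explain plan for
  select omg, ponies from my_table where x = 4;

-- Display the plan the optimizer estimates for the statement
select * from table(dbms_xplan.display);

-- Repeat after: create index my_table_x_idx on my_table (x);
-- and after refreshing statistics, then compare the two plans.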
50k rows is not that big of a table, so I wouldn't be surprised if the performance was decent even without the index. Thus adding the index to the equation can't really bring much improvement to query execution speed.
