Faking Oracle statistics?

Is there a way to force Oracle to "see" a table and its associated indexes as being bigger than they really are?
In other words, is there a way to "fake" database statistics, so the cost based optimizer would make decisions on a nearly-empty database, that are closer to decisions that would be made in a real, big production database?
The idea is to be able to experiment (vis-a-vis the execution plan) with various indexing / querying / (de)normalization strategies very early in the database design process, without wasting time writing code that fills it with representative test data (most of which will end up being discarded anyway, since the database design is still not settled).
Importing statistics is out of the question, since the production database does not even exist yet.

Sure. The DBMS_STATS package has a number of procedures that allow you to force statistics on objects. There are dbms_stats.set_table_stats and dbms_stats.set_index_stats procedures, for example.

Related

Usability vs Visibility in Oracle

Oracle states that 'Invisible indexes are especially useful for testing the removal of an index before dropping it or using indexes temporarily without affecting the overall application.'
I don't understand why invisibility is 'especially' useful for this. Wouldn't making an index unusable be more useful, since an unusable index is not maintained by DML operations and therefore resembles a dropped index more closely than an invisible one does? I've never actually worked with this; I'm guessing that making an index invisible/visible is easier than making it usable/unusable because you have to rebuild an index when you make it usable again?
It is referring to the impact on your queries via statistics and the optimizer.
Many Oracle databases have complex schemas, user bases, as well as really large tables and indexes. Some even have very controlled schema statistics. Dropping a big index can be an expensive step (time-wise).
The statistics gatherer collects statistics to populate the data dictionary, which is used for the optimizer. Index stats are one of the key inputs to the Cost Based Optimizer. "Faking" the drop will cause the optimizer to act as if the index is gone, and you can then see the impact on the query plans. If you find that the drop wasn't such a good idea, you can immediately revert it. On the other hand, some indexes take hours to build, so you can see how it is valuable to be able to test it out first.

Does Oracle 11g automatically index fields frequently used for full table scans?

I have an app using an Oracle 11g database. I have a fairly large table (~50k rows) which I query thus:
SELECT omg, ponies FROM table WHERE x = 4
Field x was not indexed, I discovered. This query happens a lot, but the thing is that the performance wasn't too bad. Adding an index on x did make the queries approximately twice as fast, which is far less than I expected. On, say, MySQL, it would've made the query ten times faster, at the very least. (Edit: I did test this on MySQL, and there saw a huge difference.)
I'm suspecting Oracle adds some kind of automatic index when it detects that I query a non-indexed field often. Am I correct? I can find nothing even implying this in the docs.
As has already been indicated, Oracle 11g does NOT dynamically build indexes based on prior experience. It is certainly possible, and indeed happens often, that adding an index under the right conditions will produce the order-of-magnitude improvement you note.
But as has also already been noted, 50K (seemingly short?) rows is nothing to Oracle. The Oracle database in fact has a great deal of intelligence that allows it to scan data without indexes most efficiently. Every new release of the Oracle RDBMS gets better at moving large amounts of data. I would suggest to you that the reason Oracle was so close to its "best" timing even without the index as compared to MySQL is that Oracle is just a more intelligent database under the covers.
However, the Oracle RDBMS does have many features that touch upon the subject area you have opened. For example:
10g introduced a feature called AUTOMATIC SQL TUNING, which is exposed via a GUI known as the SQL TUNING ADVISOR. This feature is intended to analyze queries on its own, in depth, and includes the ability to do WHAT-IF analysis of alternative query plans. This includes simulation of indexes which do not actually exist. However, this would not explain any performance differences you have seen, because the feature needs to be turned on and it does not actually build any indexes; it only makes recommendations to the DBA, such as indexes to create.
11g includes AUTOMATIC STATISTICS GATHERING which when enabled will automatically collect statistics on database objects as it deems necessary based on activity on those objects.
Thus the Oracle RDBMS is doing what you have suggested: dynamically altering its environment on its own, based on its experience with your workload over time, in order to improve performance. Creating indexes on the fly is just not one of the things it does yet. As an aside, this has been hinted at by Oracle in private several times, so I figure it is in the works for some future release.
Does Oracle 11g automatically index fields frequently used for full table scans?
No.
Regarding the MySQL issue, which storage engine you use can make a difference.
"MyISAM relies on the operating system for caching reads and writes to the data rows while InnoDB does this within the engine itself"
Oracle will cache the table/data rows, so it won't need to hit the disk. Depending on the OS and hardware, there's a chance that MySQL MyISAM had to physically read the data off the disk each time.
~50K rows, depending greatly on how big each row is, could conceivably be stored in under 1000 blocks, which could be quickly read into the buffer cache by a full table scan (FTS) in under 50 multi-block reads.
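The arithmetic behind that estimate can be sketched as follows. This is only a back-of-the-envelope illustration: the 100-byte average row, the 8 KB block size and the 16-block multi-block read are assumptions picked for the example (none of them come from the question), and block overhead and free space are ignored.

```python
import math

# All figures below are illustrative assumptions, not measurements.
ROWS = 50_000
AVG_ROW_BYTES = 100      # assumed average row length
BLOCK_BYTES = 8192       # assumed 8 KB database block
BLOCKS_PER_READ = 16     # assumed number of blocks fetched per multi-block read


def full_scan_reads(rows, row_bytes, block_bytes, blocks_per_read):
    """Return (blocks, multi-block reads) needed to scan the whole table.

    Ignores block headers and free space, so real figures would be higher.
    """
    blocks = math.ceil(rows * row_bytes / block_bytes)
    reads = math.ceil(blocks / blocks_per_read)
    return blocks, reads


blocks, reads = full_scan_reads(ROWS, AVG_ROW_BYTES, BLOCK_BYTES, BLOCKS_PER_READ)
print(blocks, reads)  # 611 39
```

With these assumed numbers the table fits in 611 blocks and is read in 39 I/O requests, which is consistent with the "under 1000 blocks, under 50 reads" estimate above. Wider rows or a smaller read size would change the result proportionally.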
Adding appropriate index(es) will allow queries on the table to scale smoothly as the data volume and/or access frequency goes up.
"Adding an index on x did make the queries approximately twice as fast, which is far less than I expected. On, say, MySQL, it would've made the query ten times faster, at the very least."
How many distinct values of X are there? Are they clustered in one part of the table or spread evenly throughout it?
Indexes are not some voodoo device: they must obey the laws of physics.
edit
"Duplicates could appear, but as it is, there are none."
If that column has neither a unique constraint nor a unique index, the optimizer will choose an execution path on the basis that there could be duplicate values in that column. This is the value of declaring the data model as accurately as possible: the provision of metadata to the optimizer. Keeping the statistics up to date is also very useful in this regard.
You should have a look at the estimated execution plan for your query, before and after the index has been created. (Also, make sure that the statistics are up-to-date on your table.) That will tell you what exactly is happening and why performance is what it is.
50k rows is not that big of a table, so I wouldn't be surprised if the performance was decent even without the index. Thus adding the index to the equation can't really bring much improvement to query execution speed.

Oracle: Difference in execution plans between databases

I am comparing queries between my development and production databases.
They are both Oracle 9i, but almost every single query has a completely different execution plan depending on the database.
All tables/indexes are the same, but the dev database has about 1/10th the rows for each table.
On production, the execution plan picked for most queries is different from development, and the cost is sometimes 1000x higher. Queries on production also seem not to be using the correct indexes in some cases (full table access).
I have run dbms_utility.analyze_schema on both databases recently as well, in the hope that the CBO would figure something out.
Is there some other underlying oracle configuration that could be causing this?
I am mostly a developer, so this kind of DBA analysis is fairly confusing at first.
1) The first thing I would check is whether the database parameters are equivalent across Prod and Dev. If one of the parameters that affects the decisions of the Cost Based Optimizer is different, then all bets are off. You can see the parameters in the v$parameter view.
2) Having up to date object statistics is great but keep in mind the large difference you pointed out - Dev has 10% of the rows of Prod. This rowcount is factored into how the CBO decides the best way to execute a query. Given the large difference in row counts I would not expect plans to be the same.
Depending on the circumstances, the optimizer may choose to full table scan a table with 20,000 rows (Dev) where it may decide an index is lower cost on the table that has 200,000 rows (Prod). (Numbers just for demonstration; the CBO uses costing algorithms for determining what to FTS and what to index scan, not absolute values.)
3) System statistics also factor into the explain plans. This is a set of statistics that represent CPU and disk i/o characteristics. If your hardware on both systems is different then I would expect your System Statistics to be different and this can affect the plans. Some good discussion from Jonathan Lewis here
You can view system stats via the sys.aux_stats$ view.
Now I'm not sure why different plans are a bad thing for you... if stats are up to date and parameters set correctly you should be getting decent performance from either system no matter what the difference in size...
But it is possible to export statistics from your Prod system and load them into your Dev system. This makes your Prod statistics available to your Dev database.
Check the Oracle documentation for the DBMS_STATS package, specifically the EXPORT_SCHEMA_STATS, EXPORT_SYSTEM_STATS, IMPORT_SCHEMA_STATS, IMPORT_SYSTEM_STATS procedures. Keep in mind you may need to disable the 10pm nightly statistics jobs on 10g/11g... or you can investigate Locking statistics after import so they are not updated by nightly jobs.

What makes Oracle more scalable?

Oracle seems to have a reputation for being more scalable than other RDBMSes. After working with it a bit, I can say that it's more complex than other RDBMSes, but I haven't really seen anything that makes it more scalable than other RDBMSes. But then again, I haven't really worked on it in a whole lot of depth.
What features does Oracle have that are more scalable?
Oracle's RAC architecture is what makes it scalable where it can load balance across nodes and parallel queries can be split up and pushed to other nodes for processing.
Some of the tricks like loading blocks from another node's buffer cache instead of going to disc make performance a lot more scalable.
Also, the maintainability of RAC with rolling upgrades helps make the operation of a large system more sane.
There is also a different aspect of scalability: storage scalability. ASM makes increasing the storage capacity very straightforward. A well-designed ASM-based solution should scale past hundreds of terabytes without needing to do anything very special.
Whether these make Oracle more scalable than other RDBMSs, I don't know. But I think I would feel less happy about trying to scale up a non-Oracle database.
Cursor sharing is (or was) a big advantage over the competition.
Basically, the same query plan is used for matching queries. An application will have a standard set of queries it issues (e.g. get the orders for this customer id). The simple way is to treat every query individually, so if you see 'SELECT * FROM ORDERS WHERE CUSTOMER_ID = :b1', you look at whether table ORDERS has an index on CUSTOMER_ID etc. As a result, you can spend as much work looking up metadata to get a query plan as actually retrieving the data. With simple keyed lookups, a query plan is easy. Complex queries with multiple tables joined on skewed columns are harder.
Oracle has a cache of query plans, and older/less used plans are aged out as new ones are required.
If you don't cache query plans, there's a limit to how smart you can make your optimizer as the more smarts you code into it, the bigger impact you have on each query processed. Caching queries means you only incur that overhead the first time you see the query.
The 'downside' is that for cursor sharing to be effective you need to use bind variables. Some programmers don't realise that and write code that doesn't get shared, and then complain that Oracle isn't as fast as MySQL.
Another advantage of Oracle is the UNDO log. As a change is made, the 'old version' of the data is written to an undo log. Other databases keep old versions of the record in the same place as the record. This requires VACUUM-style cleanup operations, or you bump into space and organisation issues. This is most relevant in databases with high update or delete activity.
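To make the effect of bind variables concrete, here is a minimal sketch of a plan cache keyed by the exact SQL text. The `PlanCache` class and its `hard_parses` counter are hypothetical stand-ins invented for this illustration; they are not Oracle's actual implementation.

```python
class PlanCache:
    """Toy cache keyed by exact SQL text (hypothetical, not Oracle's shared pool)."""

    def __init__(self):
        self.plans = {}
        self.hard_parses = 0

    def get_plan(self, sql_text):
        # The expensive optimisation step runs only when the text is new.
        if sql_text not in self.plans:
            self.hard_parses += 1
            self.plans[sql_text] = f"plan #{self.hard_parses}"
        return self.plans[sql_text]


customer_ids = range(1000)

# Literals embedded in the text: every statement looks different to the cache.
literal_cache = PlanCache()
for cid in customer_ids:
    literal_cache.get_plan(f"SELECT * FROM ORDERS WHERE CUSTOMER_ID = {cid}")

# Bind variable: the text is identical each time, so one plan is reused.
bind_cache = PlanCache()
for cid in customer_ids:
    bind_cache.get_plan("SELECT * FROM ORDERS WHERE CUSTOMER_ID = :b1")

print(literal_cache.hard_parses, bind_cache.hard_parses)  # 1000 1
```

In this toy model the literal version pays the optimisation cost 1000 times and fills the cache with single-use entries, while the bind-variable version pays it once.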
Also Oracle doesn't have a central lock registry. A lock bit is stored on each individual data record. SELECT doesn't take a lock. In databases where SELECT locks, you could have multiple users reading data and locking each other or preventing updates, introducing scalability limits. Other databases would lock a record when a SELECT was done to ensure that no-one else could change that data item (so it would be consistent if the same query or transaction looked at the table again). Oracle uses UNDO for its read consistency model (ie looking up the data as it appeared at a specific point in time).
Tom Kyte's "Expert Oracle Database Architecture" from Apress does a good job of describing Oracle's architecture, with some comparisons with other RDBMSs. Worth reading.

Is there a major performance gain by using stored procedures?

Is it better to use a stored procedure, or to do it the old way with a connection string and all that good stuff? Our system has been running slow lately, and our manager wants us to see if we can speed things up a little. We were thinking about changing some of the old database calls over to stored procedures. Is it worth it?
The first thing to do is check the database has all the necessary indexes set up. Analyse where your code is slow, and examine the relevant SQL statements and indexes relating to them. See if you can rewrite the SQL statement to be more efficient. Check that you aren't recompiling an SQL (prepared) statement for every iteration in a loop instead of outside it once.
Moving an SQL statement into a stored procedure isn't going to help if it is grossly inefficient in implementation. However the database will know how to best optimise the SQL and it won't need to do it repeatedly. It can also make the client side code cleaner by turning a complex SQL statement into a simple procedure call.
I would take a quick look at Stored Procedures are EVIL.
So long as your calls are consistent, the database will store the execution plan (MS SQL anyway). The strongest remaining reason for using stored procedures is easy and sure security management.
If I were you, I'd first look at adding indices where required. Also run a profiling tool to examine what is taking long and whether that SQL needs to be changed, e.g. by adding more WHERE clauses or restricting the result set.
You should consider caching where you can.
Stored procedures will not make things faster.
However, rearranging your logic will have a huge impact. The tidy, focused transactions that you design when thinking of stored procedures are hugely beneficial.
Also, stored procedures tend to use bind variables, where other programming languages sometimes rely on building SQL statements on-the-fly. A small, fixed set of SQL statements and bind variables is fast. Dynamic SQL statements are slow.
An application which is "running slow lately" does not need coding changes.
Measure. Measure. Measure. "slow" doesn't mean much when it comes to performance tuning. What is slow? Which exact transaction is slow? Which table is slow? Focus.
Control all change. All. What changed? OS patch? RDBMS change? Application change? Something changed to slow things down.
Check for constraints in scale. Is a table slowing down because 80% of the data is history that you use for reporting once a year?
Stored procedures are never the solution to performance problems until you can absolutely point to a specific block of code which is provably faster as a stored procedure.
Stored procedures can really help if they avoid sending huge amounts of data and/or avoid round trips to the server, so they can be valuable if your application has one of these problems.
After you finish your research, you will realize there are two extreme views at opposite ends of the spectrum. Historically, the Java community has been against stored procs due to the availability of frameworks such as Hibernate; conversely, the .NET community has used more stored procs, and this legacy goes as far back as the VB5/6 days. Put all this information in context and stay away from the extreme opinions on either side of the coin.
Speed should not be the primary factor in deciding for or against stored procs. You can achieve stored-proc performance using inline SQL with Hibernate and other frameworks. Consider maintenance, and which other programs (such as reports and scripts) could use the same stored procs used by your application. If your scenario requires multiple consumers for the same SQL code, stored procedures are a good candidate, since maintenance will be easier. If this is not the case and you decide to use inline SQL, consider externalizing it in config files to facilitate maintenance.
At the end of the day, what counts is what will make your particular scenario a success for your stakeholders.
If your server is getting noticeably slower in your busy season, it may be because of saturation rather than anything inefficient in the database. Basic queuing theory tells us that a server gets hyperbolically slower as it approaches saturation.
The basic relationship is 1/(1-X) where X is the proportion of load. This describes the average queue length or time to wait before being served. Therefore a server that is getting saturated will slow down very rapidly when the load spikes.
A server that is 25% loaded will have an average service time of 1.333K for some constant K (loosely, K is the time for the machine to perform one transaction). A server that is 50% loaded will have an average service time of 2K and a server that is 90% loaded will have an average service time of 10K. Given that the slowdowns are hyperbolic in nature, it often doesn't take a large change in overall load to produce a significant degradation in response time.
Obviously this is somewhat simplistic as the server will be processing multiple requests concurrently (there are more elaborate queuing models for this situation), but the broad principle still applies.
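The figures above follow directly from the 1/(1-X) relationship. A minimal sketch of that calculation (the function name is made up for this example, and it inherits the same simplifying single-queue assumption as the formula itself):

```python
def relative_response_time(load):
    """Return 1 / (1 - load): the slowdown factor relative to an idle server.

    `load` is the fraction of capacity in use, 0 <= load < 1.
    """
    if not 0 <= load < 1:
        raise ValueError("load must be in the range [0, 1)")
    return 1.0 / (1.0 - load)


# Multiples of K, the time to perform one transaction on an idle machine.
for load in (0.25, 0.50, 0.90, 0.95, 0.99):
    print(f"{load:.0%} loaded -> {relative_response_time(load):.2f} K")
```

Going from 90% to 95% load doubles the average response time (10K to 20K), and 99% load gives 100K, which is why a small increase in load near saturation feels like a sudden collapse.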
So, if your server is experiencing transient loads that are saturating it, you will experience patches of noticeable slow-down. Note that these slow-downs need only be in one bottlenecked area of the system to slow the whole process down. If you are only experiencing this now in a busy season there is a possibility that your server has simply hit a constraint on a resource, rather than being particularly slow or inefficient.
Note that this possibility is not antithetical to the possibility of inefficiencies in the code. You may find that the way to ease the bottleneck is to tune some of your queries.
In order to tell if the system is bottlenecked, start gathering profiling information. If you can find resources with a large number of waits, this should give you a good starting point.
The final possibility is that you need to upgrade your server. If there are no major inefficiencies in the code (this might well be the case if profiling doesn't indicate any disproportionately large bottlenecks) you may simply need bigger hardware. I have no idea what your volumes are, but don't discount the possibility that you may have outgrown your server.
Yes, stored procs are a step towards achieving good performance. The main reason is that stored procedures can be pre-compiled and their execution plans cached.
However, you first need to analyse where your performance bottlenecks really are, so that you approach this exercise in a structured way.
As suggested in one of the other responses, try using a profiling tool to analyse where the problem is, e.g. whether you need to create indexes.
Cheers
As all of the above posts suggest, you first want to clean up your SQL statements and have appropriate indexes. Caching can be tricky; I can't comment unless I have more detail on what you are trying to accomplish.
But one thing about sprocs: make sure you don't let them generate dynamic SQL statements, because for one it would be pointless, and it can be subject to SQL injection attacks. This has happened in one of the projects I looked into.
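The injection risk of dynamically built SQL can be shown with a small sketch. It uses Python's built-in sqlite3 module purely because it is self-contained; the `users` table and the hostile input are invented for the example, and the same principle applies to dynamic SQL assembled inside a stored procedure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)

user_input = "nobody' OR '1'='1"  # hostile input supplied by a caller

# Dynamic SQL: the input is spliced into the statement text and parsed as SQL.
unsafe_sql = "SELECT name FROM users WHERE name = '" + user_input + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Bind parameter: the input is passed as data and never parsed as SQL.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))  # 2 0
```

The concatenated statement returns every row because the injected `OR '1'='1'` becomes part of the WHERE clause, while the parameterised statement simply finds no user with that odd name.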
I would recommend sprocs for updates mainly, and then select statements.
good luck :)
You can never say in advance. You must do it and measure the difference because in 9 out of 10 cases, the bottleneck is not where you think.
If you use a stored procedure, you don't have to transmit the data. DBs are usually slow at executing complex stored procedures (with loops, higher math, etc.). So it really depends on how much work you would need to do, how slow your network is, how fast the DB executes this particular code, etc.
