I've been doing some reading on gathering table and index statistics on Oracle databases but it's left me ... confused.
For the sake of argument, let's assume Oracle 11gR2 as the RDBMS. Regarding gathering table and index statistics, when should it be done, which is the preferred way of doing it, and does Oracle really automatically gather the necessary statistics for us?
Regarding the first point: when should it be done. I've read that, as a rule of thumb, gathering table and index statistics should be done after around 10% of the table's records have been modified (inserted, updated, etc) since the last time the table was analyzed.
Regarding the second point: which is the preferred way of doing it. If we want to calculate both table and index statistics, does executing DBMS_STATS.GATHER_TABLE_STATS with default options, assuming the table is not partitioned, suffice?
Regarding the third point:does Oracle really gather the necessary statistics automatically for us. If this is the case, should i not worry abouth gathering table statistics (see points 1 and 2)?
Thanks in advance.
EDIT: Following the comment by ammoQ, i realized that the question is not clear in what the use case really is, here. My question is about tables that aren' "manipulated" via a user's actions, i.e manually, rather via procedures typically ran by database jobs. Take my example, for instance. My ETL process loads several tables on a daily basis and it does so in approximately 1 hour. Of that 1 hour, about half is spent analyzing the tables themselves. Thus, the tables area analyzed daily, following insertions or updates. This seems overkill, hence the question.
In general, you need to have statistics that are representative (not necessarily accurate) and that give you the right execution plan. By default, Oracle will run a statistics collection job, during the nightly batch window. That may be fine for some applications, but if you have a data warehouse, which presumably includes a regular data load process, then managing the stats should be part of that process. Note that I have said "managing" and not "collecting" statistics. That's just my way of saying that there are other options for statistics in addition to just gathering statistics, although gathering statistics would be where I would start.
There are also things that can be done to optimize statistics gathering, incremental statistics for example. The other thing that is very important is is to use the AUTO Sample size when gathering stats. Do not specify a percentage, not even 100%. The reason is that auto sample size enables a number of internal optimizations and capabilities that are disabled if you do not use AUTO sample size.
So, taking your specific points
10% staleness is pretty random, and is just a number used by the auto stats.
dbms_stats.gather_table_stats() with default values is the preferred method. One parameter that I may change would be the DEGREE, to enable stats gathering in parallel
In 12c, basic stats are gathered on load into an empty table (or empty partition). Stats are built on indexes when indexes are created. So to reiterate what I said above, stats gathering should be part of your ELT process.
I hope that makes sense and helps.
Related
Recently i am reading more about 'Optimizer Statistics Advisor' and I have done some test on my test database. It gave the below recomondation:
Rule Name: UseConcurrent
Rule Description: Use Concurrent preference for Statistics Collection
Finding: The CONCURRENT preference is not used.
Recommendation: Set the CONCURRENT preference.
Example:
dbms_stats.set_global_prefs('CONCURRENT', 'ALL');
Rationale: The system's condition satisfies the use of concurrent statistics
gathering. Using CONCURRENT increases the efficiency of statistics
gathering.
As I understand from oracle base
Concurrent statistics collection is simply the ability to gather statistics on multiple tables, or table partitions, at the same time. This is done using a combination of the job scheduler, advanced queuing and resource manager.
so this recomondation for all the database , not for the tables. What I am saying if I gather statistics for a table such remocommndation will not have any benefits correct ? Also is there a way there 'Optimizer Statistics Advisor' can be implemented on specific tables ?
Unlikely, because one of the overall aims for the optimizer team is that for 99% of customers, the default settings of the optimizer statistics gathering mechanisms will be sufficient to ensure good plans.
This is why (for example)
when you use CTAS, you'll get stats collected automatically on creation, so (in most cases) no need for additional steps before using table in queries.
a table that has queries that use predicates on skewed data will ultimately end up with histograms (via SYS.COL_USAGE$) without user intervention needed
tables without stats will get them automatically in the next run, and until then will be dynamically sampled
tables that change by 10% will get stats picked up again on next run
and so forth. The aim is that (except in niche cases) you'll not need to "worry" about the optimizer statistics process.
We are planning to use a context index for full text search in Oracle 12c standard edition.
The data on which search will run is a JSON containing one Channel post and its replies from a 3rd party tool that is loaded into our database.(basically, all the chats and replies(including other attributes like timestamp/user etc) are stored in this table).
We are expecting about 50k rows of data per year and a daily of 100-150 DMLs per day. Our index is "SYNC ON COMMIT" currently,so what are the recommendations for optimizing the Oracle Text index?
First, let me preface my response with a disclaimer: I am exploring using Oracle Text as part of a POC currently, and my knowledge is somewhat limited as we're still in the research phase. Additionally, our datasets are in the 10s of millions with 100k DML operations daily.
From what I've read, Oracle Docs suggest scheduling both a FULL and REBUILD optimization for indexes which incur DML, so I currently have implemented the following in our dev environment.
execute ctx_ddl.optimize_index('channel_json_ctx_idx', 'FULL'); --run daily
execute ctx_ddl.optimize_index('channel_json_ctx_idx', 'REBUILD'); --run weekly
I cannot imagine with the dataset you've identified that your index will really become that fragmented and cause performance issues. You could probably get away with less frequent optimizations than what I've mentioned.
You could even forego scheduling the optimization and benchmark your performance. If you see it start to degrade, then note the timespan and perhaps count of DML operations for reference. Then run a 'FULL'. Then test performance. If performance improves, create a schedule. If performance does not improve, then run 'REBUILD'. Test performance. Assuming performance improves then you could schedule the 'REBUILD'for that time range and consider adding a more frequent 'FULL'.
I have an app using an Oracle 11g database. I have a fairly large table (~50k rows) which I query thus:
SELECT omg, ponies FROM table WHERE x = 4
Field x was not indexed, I discovered. This query happens a lot, but the thing is that the performance wasn't too bad. Adding an index on x did make the queries approximately twice as fast, which is far less than I expected. On, say, MySQL, it would've made the query ten times faster, at the very least. (Edit: I did test this on MySQL, and there saw a huge difference.)
I'm suspecting Oracle adds some kind of automatic index when it detects that I query a non-indexed field often. Am I correct? I can find nothing even implying this in the docs.
As has already been indicated, Oracle11g does NOT dynamically build indexes based on prior experience. It is certainly possible and indeed happens often that adding an index under the right conditions will produce the order of magnitude improvement you note.
But as has also already been noted, 50K (seemingly short?) rows is nothing to Oracle. The Oracle database in fact has a great deal of intelligence that allows it to scan data without indexes most efficiently. Every new release of the Oracle RDBMS gets better at moving large amounts of data. I would suggest to you that the reason Oracle was so close to its "best" timing even without the index as compared to MySQL is that Oracle is just a more intelligent database under the covers.
However, the Oracle RDBMS does have many features that touch upon the subject area you have opened. For example:
10g introduced a feature called AUTOMATIC SQL TUNING which is exposed via a gui known as the SQL TUNING ADVISOR. This feature is intended to analyze queries on its own, in depth and includes the ability to do WHAT-IF analysis of alternative query plans. This includes simulation of indexes which do not actually exist. However, this would not explain any performance differences you have seen because the feature needs to be turned on and it does not actually build any indexes, it only makes recommendations for the DBA to make indexes, among other things.
11g includes AUTOMATIC STATISTICS GATHERING which when enabled will automatically collect statistics on database objects as it deems necessary based on activity on those objects.
Thus the Oracle RDBMS is doing what you have suggested, dynamically altering its environment on its own based on its experience with your workload over time in order to improve performance. Creating indexes on the fly is just not one of the things is does yet. As an aside, this has been hinted to by Oracle in private sevearl times so I figure it is in the works for some future release.
Does Oracle 11g automatically index fields frequently used for full table scans?
No.
In regards the MySQL issue, what storage engine you use can make a difference.
"MyISAM relies on the operating system for caching reads and writes to the data rows while InnoDB does this within the engine itself"
Oracle will cache the table/data rows, so it won't need to hit the disk. depending on the OS and hardware, there's a chance that MySQL MyISAM had to physically read the data off the disk each time.
~50K rows, depending greatly on how big each row is, could conceivably be stored in under 1000 blocks, which could be quickly read into the buffer cache by a full table scan (FTS) in under 50 multi-block reads.
Adding appropriate index(es) will allow queries on the table to scale smoothly as the data volume and/or access frequency goes up.
"Adding an index on x did make the
queries approximately twice as fast,
which is far less than I expected. On,
say, MySQL, it would've made the query
ten times faster, at the very least."
How many distinct values of X are there? Are they clustered in one part of the table or spread evenly throughout it?
Indexes are not some voodoo device: they must obey the laws of physics.
edit
"Duplicates could appear, but as it
is, there are none."
If that column has neither a unique constraint nor a unique index the optimizer will choose an execution path on the basis that there could be duplicate values in that column. This is the value of declaring the data model as accuratley as possible: the provision of metadata to the optimizer. Keeping the statistics up to date is also very useful in this regard.
You should have a look at the estimated execution plan for your query, before and after the index has been created. (Also, make sure that the statistics are up-to-date on your table.) That will tell you what exactly is happening and why performance is what it is.
50k rows is not that big of a table, so I wouldn't be surprised if the performance was decent even without the index. Thus adding the index to equation can't really bring much improvement to query execution speed.
I am comparing queries my development and production database.
They are both Oracle 9i, but almost every single query has a completely different execution plan depending on the database.
All tables/indexes are the same, but the dev database has about 1/10th the rows for each table.
On production, the query execution plan it picks for most queries is different from development, and the cost is somtimes 1000x higher. Queries on production also seem to be not using the correct indexes for queries in some cases (full table access).
I have ran dbms_utility.analyze schema on both databases recently as well in the hopes the CBO would figure something out.
Is there some other underlying oracle configuration that could be causing this?
I am a developer mostly so this kind of DBA analysis is fairly confusing at first..
1) The first thing I would check is if the database parameters are equivalent across Prod and Dev. If one of the parameters that affects the decisions of the Cost Based Optimizer is different then all bets are off. You can see the parameter in v$parameter view;
2) Having up to date object statistics is great but keep in mind the large difference you pointed out - Dev has 10% of the rows of Prod. This rowcount is factored into how the CBO decides the best way to execute a query. Given the large difference in row counts I would not expect plans to be the same.
Depending on the circumstance the optimizer may choose to Full Table Scan a table with 20,000 rows (Dev)where it may decide an index is lower cost on the table that has 200,000 rows (Prod). (Numbers just for demonstration, the CBO uses costing algorighms for determining what to FTS and what to Index scan, not absolute values).
3) System statistics also factor into the explain plans. This is a set of statistics that represent CPU and disk i/o characteristics. If your hardware on both systems is different then I would expect your System Statistics to be different and this can affect the plans. Some good discussion from Jonathan Lewis here
You can view system stats via the sys.aux_stats$ view.
Now I'm not sure why different plans are a bad thing for you... if stats are up to date and parameters set correctly you should be getting decent performance from either system no matter what the difference in size...
but it is possible to export statistics from your Prod system and load them into your Dev system. This make your Prod statistics available to your Dev database.
Check the Oracle documentation for the DBMS_STATS package, specifically the EXPORT_SCHEMA_STATS, EXPORT_SYSTEM_STATS, IMPORT_SCHEMA_STATS, IMPORT_SYSTEM_STATS procedures. Keep in mind you may need to disable the 10pm nightly statistics jobs on 10g/11g... or you can investigate Locking statistics after import so they are not updated by nightly jobs.
I'm wondering if you have a table that contains 24 million record, how does that impact performance (does each insert/update/delete) take significantly longer to go through?
This is our Audit table, so when we make change changes in other tables we log then on to the Audit tale, does it also take significantly longer to carry out these update as well ?
The right answer is "it depends", of course...
But as far as I get, your concern is in how Audit table affects performance of queries (on other tables) when Audit table grows.
Probably you only insert into your Audit table. Insert time doesn't depend on amount of data already in table. So, no matter how big Audit table is, it should affect performance equally (given that database design isn't incredibly bad).
Of course, select or delete on Audit table itseft can take longer when the table grows.
If I read your question as "does a large Oracle table take longer for IUD operations", generally speaking the answer is no. I think the most impact on the insert/update/delete operations will be felt from the indexes present on this table (more indexes = slower performance for these operations).
However, if your auditing logic needs to look up existing rows in the audit table for procedural logic in some manner that doesn't use primary or unique keys, then there will be a performance impact with a large table.
There are many factors that come into play in regards to how fast an insert/update/delete occurs. For example, how many indexes are on the table? If a table has many indexes and you insert/update the table, it can cause the operation to take longer. How is the data stored in the physical structures of the database (i.e. the tablespaces if you're using Oracle, for example)? Are your indexes and data on separate disks, which can help speed up I/O?
Obviously, if you are writing out audit records then it can affect performance. But in a well-tuned database, it shouldn't be slowing it down enough to where you notice.
The approach I use for audit tables is to use triggers on the main tables and these triggers write out the audit records. But from a performance standpoint, it really depends on a lot of factors as to how fast the updates to your main tables will run.
I would recommend looking at the explain plan output for one of your slow updates if you are using Oracle (other DBs usually have such tools as well, google can help here). You can then see what plan the optimizer is generating and diagnose where the problems could be. You could potentially get a DBA to assist you as well to help figure out what's causing the slowness.
I'd suspect performance will be more related to contention than table size. Generally inserts happen at the 'end' of the table. Sessions inserting into that table have to take a latch on the block while they are writing records to it. During that time other sessions may have to wait for that block (which is where you may see busy buffer wait events).
But really you need to look at the SQLs in question and see what they are waiting on, and whether they are contributing significantly to an unacceptable level of performance. Then determine a course of action based on the specific wait events that are causing the problem.
Check out anything from Cary Milsap on performance tuning
The impact of table size is different for INSERT,DELETE AND UPDATE operation
Insert statement is not impacted much by table size as when we Insert data into table it will add to the next data block available.If there are Indexes on that particular table then Oracle has to search for particular data block before inserting data in that block ,which require search operation that need time.
Delete and Update statements are impacted by Table size as more the data more time is require to search for the particular row to Delete and Update operation