Gathering statistics on some Oracle tables takes a long time. These tables have record counts ranging from 2 million to 9 million rows, and each table has around 5-6 indexes.
The Oracle version is:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production
PL/SQL Release 10.2.0.1.0 - Production
CORE 10.2.0.1.0 Production
TNS for IBM/AIX RISC System/6000: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production
The gather stats syntax is:
dbms_stats.gather_table_stats('OWNER', 'TABLE_NAME', estimate_percent => 100, method_opt => 'for all columns size auto', cascade => true);
We cannot change the parameters of the above gather stats command, as the application vendor insists that these parameters be used.
Please let me know if there is anything we could do to reduce the time taken by the gather statistics job. I have noticed that while the job runs, application performance degrades slightly, and this is not acceptable.
I also noticed that some of the tables occupy a lot of space on disk, but the real data (estimated by multiplying record count by average row length) is much smaller. It looks like the tables need to be compacted/shrunk/have their high water mark reset, etc.
Some tables, for example, occupy 9 GB on disk but hold only about 1.2 GB of real data: almost 87% of the space is wasted in fragmentation.
Will an ALTER TABLE ... SHRINK SPACE reduce the overall time taken to collect statistics on the table? Is it recommended?
Yes, shrinking space will help. If you can afford to take the application down for a while, and the tables aren't going to just bounce right back to their previous size, then shrinking space is always a good idea.
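If you do shrink, the usual sequence looks like this (owner, table, and index names here are placeholders; SHRINK SPACE requires the segment to live in an ASSM tablespace):

ALTER TABLE owner.table_name ENABLE ROW MOVEMENT;
-- Optional first pass: compact rows without moving the high water mark
ALTER TABLE owner.table_name SHRINK SPACE COMPACT;
-- Second pass: reset the high water mark (takes a brief exclusive lock at the end)
ALTER TABLE owner.table_name SHRINK SPACE;
-- Dependent indexes can be shrunk as well
ALTER INDEX owner.index_name SHRINK SPACE;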
Other than that, if the parameters can't change, there's not much you can do to improve things. Setting the DEGREE parameter can significantly improve performance in some cases. I know you said you can't change any parameters, but I don't see how the vendor could complain about that one. Although it may make the job run faster, it would probably impact system performance even more (but for a shorter period of time).
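For illustration, here is the vendor's exact call with only DEGREE added (the value 4 is just an example; choose it based on the CPUs you can spare):

BEGIN
  dbms_stats.gather_table_stats(
    ownname          => 'OWNER',
    tabname          => 'TABLE_NAME',
    estimate_percent => 100,
    method_opt       => 'for all columns size auto',
    cascade          => true,
    degree           => 4  -- the only parameter added; runs the gather in parallel
  );
END;
/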
The best solution might be to upgrade to 11g, where any sane application would use estimate_percent => dbms_stats.auto_sample_size. Statistics collection in 11g is much better than 10g. With features like improved auto sample algorithms, incremental statistics, setting statistics preferences, and concurrent statistics, gathering statistics is often much faster and more accurate.
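As a sketch of what that could look like on 11g, you can set per-table preferences once and let every later gather pick them up (owner and table names are placeholders; the INCREMENTAL preference only matters for partitioned tables):

BEGIN
  dbms_stats.set_table_prefs('OWNER', 'TABLE_NAME', 'ESTIMATE_PERCENT', 'DBMS_STATS.AUTO_SAMPLE_SIZE');
  dbms_stats.set_table_prefs('OWNER', 'TABLE_NAME', 'INCREMENTAL', 'TRUE');  -- partitioned tables only
END;
/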
Related
I've been doing some reading on gathering table and index statistics on Oracle databases but it's left me ... confused.
For the sake of argument, let's assume Oracle 11gR2 as the RDBMS. Regarding gathering table and index statistics, when should it be done, which is the preferred way of doing it, and does Oracle really automatically gather the necessary statistics for us?
Regarding the first point: when should it be done? I've read that, as a rule of thumb, gathering table and index statistics should be done after around 10% of the table's records have been modified (inserted, updated, etc.) since the last time the table was analyzed.
Regarding the second point: which is the preferred way of doing it. If we want to calculate both table and index statistics, does executing DBMS_STATS.GATHER_TABLE_STATS with default options, assuming the table is not partitioned, suffice?
Regarding the third point: does Oracle really gather the necessary statistics automatically for us? If this is the case, should I not worry about gathering table statistics (see points 1 and 2)?
Thanks in advance.
EDIT: Following the comment by ammoQ, I realized that the question is not clear about what the use case really is. My question is about tables that aren't "manipulated" via a user's actions, i.e. manually, but rather via procedures typically run by database jobs. Take my example, for instance. My ETL process loads several tables on a daily basis and does so in approximately 1 hour. Of that 1 hour, about half is spent analyzing the tables themselves. Thus, the tables are analyzed daily, following insertions or updates. This seems like overkill, hence the question.
In general, you need to have statistics that are representative (not necessarily accurate) and that give you the right execution plan. By default, Oracle will run a statistics collection job during the nightly batch window. That may be fine for some applications, but if you have a data warehouse, which presumably includes a regular data load process, then managing the stats should be part of that process. Note that I have said "managing" and not "collecting" statistics. That's just my way of saying that there are other options for statistics in addition to just gathering them, although gathering statistics would be where I would start.
There are also things that can be done to optimize statistics gathering, incremental statistics for example. The other thing that is very important is to use the AUTO sample size when gathering stats. Do not specify a percentage, not even 100%. The reason is that auto sample size enables a number of internal optimizations and capabilities that are disabled if you do not use AUTO sample size.
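A minimal sketch of such a call, with hypothetical schema and table names (on 11g, AUTO_SAMPLE_SIZE is already the default, so simply omitting estimate_percent achieves the same thing):

BEGIN
  dbms_stats.gather_table_stats(
    ownname          => 'ETL_OWNER',
    tabname          => 'FACT_SALES',
    estimate_percent => dbms_stats.auto_sample_size  -- shown explicitly for emphasis
  );
END;
/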
So, taking your specific points:
10% staleness is fairly arbitrary; it is just the threshold used by the automatic stats job.
dbms_stats.gather_table_stats() with default values is the preferred method. One parameter that I might change is DEGREE, to enable stats gathering in parallel (see the sketch after these points).
In 12c, basic stats are gathered on load into an empty table (or empty partition), and stats are built on indexes when the indexes are created. So, to reiterate what I said above, stats gathering should be part of your ETL process.
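Picking up the DEGREE suggestion from the second point, a sketch of gathering stats as the final step of a daily load might look like this (schema and table names are hypothetical; AUTO_DEGREE lets Oracle choose the parallel degree):

BEGIN
  -- ... daily load into ETL_OWNER.FACT_SALES completes here ...
  dbms_stats.gather_table_stats(
    ownname => 'ETL_OWNER',
    tabname => 'FACT_SALES',
    degree  => dbms_stats.auto_degree  -- parallel gather; everything else at defaults
  );
END;
/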
I hope that makes sense and helps.
Is there any performance overhead to using Oracle Change Notification on a database table of fairly large size, on which 5k-8k operations are performed daily?
After running it continuously for two days, I have found a few java.lang.IndexOutOfBoundsException errors.
8k operations per day is trivial. My 10-year-old laptop can do that. I suspect your Java error is not performance/overhead related.
I have one table with 7,515,966 rows, and this table depends on other tables. We created a view over it for generating SSRS reports.
The view's result set has now grown to the point that the report has performance problems.
We have started archiving data from the large table, but I can't work out which methodology to use. Please guide us.
Thank you...
Table partitioning in 2012 is only available in Enterprise Edition. See https://msdn.microsoft.com/en-us/library/cc645993(v=sql.110).aspx for details on what's available for each edition.
7 million rows is not a lot for SQL Server; we routinely deal with billions of rows. However, as your row counts get into the tens of millions, you'll probably expose various performance gaps in your system. E.g. are your queries efficiently written so they only touch the rows they need, do you have the right indexes, are statistics up to date, is tempdb optimized, etc.?
One common weak link in 9 out of 10 databases (regardless of make) I've worked with is the storage subsystem. Is yours able to keep up with the large data set you need to work with? Storage for databases should be designed and configured based on throughput, concurrency, and latency requirements first. Space is generally the last thing to worry about once the other requirements, including HA/DR, are met.
If you have deficiencies in your current system, you can pay for the expensive enterprise edition and implement table partitioning but you will likely still suffer performance problems soon after, if not immediately.
I'm trying to improve the execution time of a view in an Oracle database which takes ages to load, and it involves a table which has 15,352,595 records. I'm considering gathering statistics on it because I suspect the poor performance is due to stale stats.
However, I'm worried that this will burden the server a lot, and I'm not very confident that its hard disks (or other hardware components) can withstand heavy workloads without breaking.
Is that a real concern I should be worried about?
You can gather statistics with a low value for the estimate percentage.
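For example (owner and table names are hypothetical; 5% is an arbitrary low sample, and on 11g and later dbms_stats.auto_sample_size is usually the better choice):

BEGIN
  dbms_stats.gather_table_stats(
    ownname          => 'OWNER',
    tabname          => 'BIG_TABLE',
    estimate_percent => 5  -- sample roughly 5% of rows instead of scanning everything
  );
END;
/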
I am comparing queries between my development and production databases.
They are both Oracle 9i, but almost every single query has a completely different execution plan depending on the database.
All tables/indexes are the same, but the dev database has about 1/10th the rows for each table.
On production, the execution plan picked for most queries is different from development, and the cost is sometimes 1000x higher. Queries on production also seem not to be using the correct indexes in some cases (full table access).
I have also run DBMS_UTILITY.ANALYZE_SCHEMA on both databases recently, in the hope that the CBO would figure something out.
Is there some other underlying oracle configuration that could be causing this?
I am mostly a developer, so this kind of DBA analysis is fairly confusing at first.
1) The first thing I would check is whether the database parameters are equivalent across Prod and Dev. If one of the parameters that affect the decisions of the Cost Based Optimizer is different, then all bets are off. You can see the parameters in the V$PARAMETER view.
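For instance, a quick way to compare the two systems is to list only the non-default parameters on each:

SELECT name, value
FROM   v$parameter
WHERE  isdefault = 'FALSE'
ORDER  BY name;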
2) Having up to date object statistics is great but keep in mind the large difference you pointed out - Dev has 10% of the rows of Prod. This rowcount is factored into how the CBO decides the best way to execute a query. Given the large difference in row counts I would not expect plans to be the same.
Depending on the circumstances, the optimizer may choose to full table scan a table with 20,000 rows (Dev) where it may decide an index is lower cost on the table that has 200,000 rows (Prod). (Numbers are just for demonstration; the CBO uses costing algorithms to determine what to full scan and what to index scan, not absolute values.)
3) System statistics also factor into the execution plans. This is a set of statistics that represents CPU and disk I/O characteristics. If the hardware on the two systems is different, then I would expect your system statistics to be different, and this can affect the plans. Jonathan Lewis has some good discussion on this topic.
You can view system stats via the sys.aux_stats$ view.
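For example, to see the main workload figures (CPU speed, I/O transfer rates) on each system:

SELECT pname, pval1
FROM   sys.aux_stats$
WHERE  sname = 'SYSSTATS_MAIN';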
Now I'm not sure why different plans are a bad thing for you... if stats are up to date and parameters set correctly you should be getting decent performance from either system no matter what the difference in size...
But it is possible to export statistics from your Prod system and load them into your Dev system. This makes your Prod statistics available to your Dev database.
Check the Oracle documentation for the DBMS_STATS package, specifically the EXPORT_SCHEMA_STATS, EXPORT_SYSTEM_STATS, IMPORT_SCHEMA_STATS, and IMPORT_SYSTEM_STATS procedures. Keep in mind you may need to disable the 10pm nightly statistics job on 10g/11g, or you can investigate locking statistics after the import so they are not updated by the nightly job.
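A sketch of the round trip, using a hypothetical schema (SCOTT) and staging table name (STATS_XFER):

-- On Prod: create a staging table and export the schema stats into it
BEGIN
  dbms_stats.create_stat_table(ownname => 'SCOTT', stattab => 'STATS_XFER');
  dbms_stats.export_schema_stats(ownname => 'SCOTT', stattab => 'STATS_XFER');
END;
/

-- Copy STATS_XFER to Dev (e.g. via exp/imp or a database link), then on Dev:
BEGIN
  dbms_stats.import_schema_stats(ownname => 'SCOTT', stattab => 'STATS_XFER');
  dbms_stats.lock_schema_stats('SCOTT');  -- optional, 10g and later: stops nightly jobs overwriting them
END;
/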