I am running a query on a table which has only 283 records. The query does a full table scan because no indexed column is used in the predicate.
The cost is only 12, but the CPU cost is very high: 475,189.
What is the reason for the high CPU cost even though the table has so few records?
What is the difference between Cost and CPU Cost?
PL/SQL Developer is used as the IDE.
Query:
SELECT qmh_client, qmh_ip_timestamp, qmh_plant, qmh_key_tsklst_grp,
qmh_grp_cntr, qmh_valid_dt, qmh_tdc_desc, qmh_cert_std,
qmh_tsklst_usage, qmh_statu, qmh_lot_size_from, qmh_lot_size_to,
qmh_tl_usage_uom, qmh_ctyp, qmh_cp_no, qmh_cp_version, qmh_tdform,
qmh_ref_tdc, qmh_licn_no, qmh_guege_len, qmh_ip_activity,
qmh_cp_activity, qmh_ip_sts_code, qmh_cp_sts_code, qmh_ltext_act,
qmh_ltxt_sts_code, qph_ip_id, qmh_ip_mess, qmh_cp_id, qmh_cp_mess,
qmh_rfd, qmh_smtp_addr, qmh_crt_time, qmh_crt_date, qmh_crt_by,
qmh_ip_upd_time, qmh_ip_upd_date, qmh_ip_upd_by, qmh_cp_upd_time,
qmh_upd_date, qmh_cp_upd_by, qmh_clas_sts_code, qmh_clas_id,
qmh_clas_mess, qmh_clas_upd_time, qmh_clas_upd_date,
qmh_clas_upd_by, qmh_prd_ind, qmh_tdc_type, qmh_pi_status
FROM ipdmdm.t_qm_insp_hdr
WHERE qmh_pi_status = 'N'
FOR UPDATE
According to the manual, CPU_COST and IO_COST are measured in different ways: IO_COST is "proportional to the number of data blocks read by the operation", while CPU_COST is "proportional to the number of machine cycles required for the operation".
The difference between the costs should not be too surprising since many database operations require orders of magnitude more CPU cycles than disk reads. A simple test on my PC produced similar results:
create table test1(a char(1000));
insert into test1 select level from dual connect by level <= 283;
begin
dbms_stats.gather_table_stats(user, 'TEST1');
end;
/
explain plan set statement_id = 'cost test' for select * from test1 for update;
select cpu_cost, io_cost from plan_table where statement_id = 'cost test' and id = 0;
CPU_COST  IO_COST
--------  -------
  348672       13
Even though it's called the Cost Based Optimizer, the cost is usually not a helpful metric when evaluating execution plans. The "Operation" and "Rows" columns are much more useful.
Also, if you're interested in explain plans, stop using the IDE's crippled view of them and use the text version that Oracle supplies. Use explain plan for select ... and select * from table(dbms_xplan.display);. PL/SQL Developer is a great tool, but its explain plan window is missing critical information (the Notes section) and has some bugs (it does not include session settings).
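For example, a minimal sketch against the table from the question (assuming a default PLAN_TABLE is available):
explain plan for
  select * from ipdmdm.t_qm_insp_hdr where qmh_pi_status = 'N' for update;
-- The text output includes the Note section that the IDE plan window omits
select * from table(dbms_xplan.display);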
Check this
COST: Cost of the operation as estimated by the optimizer’s query
approach. Cost is not determined for table access operations. The
value of this column does not have any particular unit of measurement;
it is merely a weighted value used to compare costs of execution
plans. The value of this column is a function of the CPU_COST and
IO_COST columns.
CPU_COST: CPU cost of the operation as estimated by the query
optimizer’s approach. The value of this column is proportional to the
number of machine cycles required for the operation. For statements
that use the rule-based approach, this column is null.
You can refer to this article to understand it: What is the cost column in an explain plan?
Depending on your release and the setting of the hidden parameter
_optimizer_cost_model (cpu or io), the cost is taken from the cpu_cost and
io_cost columns in the plan table, which are, in turn, estimates derived from
sys.aux_stats$. The "cost" column is not in any particular unit of
measurement; it is a weighted average of the costs derived from the
cost-based decision tree generated when the SQL statement is being
processed. The cost column is essentially an estimate of the run-time
for a given operation.
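If you want to see what feeds those estimates, you can query the system statistics directly, roughly like this (a sketch; it requires access to the SYS-owned table):
-- CPU speed and single/multi-block read times used by the cost model
select sname, pname, pval1, pval2
from   sys.aux_stats$
where  sname in ('SYSSTATS_INFO', 'SYSSTATS_MAIN');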
Related
This query ONE
SELECT * FROM TEST_RANDOM WHERE EMPNO >= '236400' AND EMPNO <= '456000';
in the Oracle Database is running with cost 1927.
And this query (TWO):
SELECT * FROM TEST_RANDOM WHERE EMPNO = '236400';
is running with cost 1924.
The table TEST_RANDOM has 1,000,000 rows; I created it like this:
Create table test_normal (empno varchar2(10), ename varchar2(30), sal number(10), faixa varchar2(10));
Begin
For i in 1..1000000
Loop
Insert into test_normal values(
to_char(i), dbms_random.string('U',30),
dbms_random.value(1000,7000), 'ND'
);
If mod(i, 10000) = 0 then
Commit;
End if;
End loop;
End;
/
Create table test_random
as
select /*+ append */ * from test_normal order by dbms_random.random;
I created a B-tree index on the column EMPNO like this:
CREATE INDEX IDX_RANDOM_1 ON TEST_RANDOM (EMPNO);
After this, the query TWO improved, and the cost changed to 4.
But query ONE did not improve, because Oracle ignored the index; for some reason Oracle decided that executing this query through the index was not worth it...
My question is: what can we do to improve the performance of query ONE? The index did not solve it, and its cost remains high...
For this query, Oracle does not use an index because the optimizer correctly estimated the number of rows and correctly decided that a full table scan would be faster or more efficient.
B-Tree indexes are generally only useful when they can be used to return a small percentage of rows, and your first query returns about 25% of the rows. It's hard to say what the ideal percentage of rows is, but 25% is almost always too large. On my system, the execution plan changes from full table scan to index range scan when the query returns 1723 rows - but that number will likely be different for you.
There are several reasons why full table scans are better than indexes for retrieving a large percentage of rows:
Single-block versus multi-block: In Oracle, like in almost all computer systems, it can be significantly faster to retrieve multiple chunks of data at a time (sequential access) instead of retrieving one random chunk of data at a time (random access).
Clustering factor: Oracle stores rows in blocks, which are usually 8 KB and are analogous to pages. If the index is very inefficient, for example because it is built on randomly sorted data and two sequential index reads rarely touch the same block, then reading 25% of all the rows through the index may still require reading 100% of the table blocks. (The query after this list shows how to check the clustering factor.)
Algorithmic complexity: A full table scan reads the data as a simple heap, which is O(N). A single index access is much faster, at O(LOG(N)). But as the number of index accesses increases, the benefit wears off, until eventually using the index is O(N * LOG(N)).
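As a rough illustration of the clustering factor point, using the tables from the question (a sketch; it assumes statistics have been gathered on the index and table):
-- A clustering factor close to BLOCKS means the index order matches the table order;
-- a value close to NUM_ROWS means almost every index entry points to a different block
select i.index_name, i.clustering_factor, t.blocks, t.num_rows
from   user_indexes i
join   user_tables  t on t.table_name = i.table_name
where  i.table_name = 'TEST_RANDOM';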
Some things you can do to improve performance without indexes:
Partitioning: Partitioning is the ideal solution for retrieving a large percentage of data from a table (but the option must be licensed). With partitioning, Oracle splits the logical table into multiple physical segments, and the query only reads the required partitions. This keeps the benefit of multi-block reads while still limiting the amount of data scanned. (A small sketch follows this list.)
Parallelism: Make Oracle work harder instead of smarter. But parallelism probably isn't worth the trouble for such a small table.
Materialized views: Create tables that only store exactly what you need.
Ordering the data: Improve the index clustering factor by sorting the table data by the relevant column instead of doing it randomly. In your case, replace order by dbms_random.random with order by empno. Depending on your version and platform, you may be able to use a materialized zone map to keep the table sorted.
Compression: Shrink the table to make it faster to read the whole thing.
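As a rough sketch of the partitioning and ordering ideas, using the tables from the question (the partition boundaries are arbitrary and, like the original predicate, compare EMPNO as a string; partitioning must be licensed):
-- Ordering: rebuild the table sorted by empno to improve the clustering factor
create table test_random_sorted as
  select * from test_normal order by empno;

-- Partitioning: a range-partitioned copy so a range predicate on empno
-- only needs to scan the partitions that cover it
create table test_random_part
  partition by range (empno)
  (partition p1 values less than ('3'),
   partition p2 values less than ('6'),
   partition p3 values less than (maxvalue))
as select * from test_normal;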
That's quite a lot of information for what is possibly a minor performance problem. Before you go down this rabbit hole, it might be worth asking whether you actually have an important performance problem, as measured by a clock or by resource consumption, or whether you are just fishing for performance problems by looking at the somewhat meaningless cost metric.
I have a query doing a join between two tables, with many filters.
When I run the explain plan, I see
cost:214, Bytes: 6154, Cardinality:67
To reduce the cost, I created a function-based index on a column that was already being used as one of the filters in the query. I gathered table stats and then gathered index stats. Now, I ran the explain plan again. This time I see
cost:214, Bytes: 122604, Cardinality:1202
My question: what is the relation between Cost and Bytes? Why did the number of Bytes and the Cardinality increase? Shouldn't creating a function-based index have reduced the Cost a little?
Can someone please help me understand this?
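For reference, the kind of steps described above might look roughly like this (hypothetical table, column, and index names, not taken from the actual query):
-- Hypothetical function-based index on an expression used as a filter
create index idx_orders_upper_status on orders (upper(order_status));

begin
  -- cascade => true gathers index statistics along with the table statistics
  dbms_stats.gather_table_stats(user, 'ORDERS', cascade => true);
end;
/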
Cost is documented in the SQL Tuning Guide (but not the number of bytes):
The optimizer cost model accounts for the machine resources that a
query is predicted to use.
The cost is an internal numeric measure that represents the estimated
resource usage for a plan. The cost is specific to a query in an
optimizer environment. To estimate cost, the optimizer considers
factors such as the following:
System resources, which includes estimated I/O, CPU, and memory
Estimated number of rows returned (cardinality)
Size of the initial data sets
Distribution of the data
Access structures
Note:
The cost is an internal measure that the optimizer uses to compare
different plans for the same query. You cannot tune or change cost.
The execution time is a function of the cost, but cost does not equate
directly to time. For example, if the plan for query A has a lower
cost than the plan for query B, then the following outcomes are
possible:
A executes faster than B.
A executes slower than B.
A executes in the same amount of time as B.
Therefore, you cannot compare the costs of different queries with one
another. Also, you cannot compare the costs of semantically equivalent
queries that use different optimizer modes.
See Query optimizer concepts in SQL tuning guide.
Don't forget that EXPLAIN PLAN is only the estimated plan.
To check what the actual plan is really doing use DBMS_XPLAN.DISPLAY_CURSOR or SQL trace.
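A minimal sketch (hypothetical table and filter; it requires access to the V$ views):
select /*+ gather_plan_statistics */ *
from   your_table
where  some_column = 'X';

-- NULL, NULL = last statement run in this session;
-- ALLSTATS LAST shows estimated vs. actual row counts for the last execution
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));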
I would like to know whether the explain plan cost in Oracle always determines if a specific query is more efficient (in terms of performance, resource usage, disk access, etc.) than another.
My question is because I have two tables.
One with a local partitioned index.
The other with a global partitioned index.
Both have the same structure and the same data. For the same query, the cost is significantly different: the table with the global partitioned index has a very small cost, and the one with the local partitioned index a very high cost. However, when I run the queries in SQL Developer, the response time is higher for the table with the global partitioned index.
Thanks.
Cost is not comparable across two different SQL statements. It cannot and should not be inferred that higher cost = higher runtime or IO/CPU usage.
Cost is just an internal ranking that Oracle applies while it is calculating all possible plans for a specific SQL statement.
As you've seen, a low-cost SQL statement can take longer to run than a high-cost one. The cost numbers are affected by a great many things, such as SQL hints (first_rows etc.), table statistics, and system-level statistics (load statistics, or different settings for optimizer_index_cost_adj/optimizer_index_caching etc.).
Always tune SQL by IO/CPU, i.e. actual resource usage, and largely ignore the "cost".
also see here: http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:313416745628
I have a question regarding the columns LOW_VALUE and HIGH_VALUE in the view USER_TAB_COLUMNS (or equivalent).
I was just wondering if these values are always correct, as in, if you have a column with 500k rows with value 1, 500k rows with value of 5 and 1 row with a value of 1000, the LOW_VALUE should be 1 (after you convert the raw figure) and HIGH_VALUE should be 1000 (after you convert the raw figure). However, are there any circumstances where Oracle would 'miss' this outlier value and instead have 5 for HIGH_VALUE?
Also, what is the purpose of these 2 values?
Thanks
As with all optimizer-related statistics, these values are estimates with varying degrees of accuracy from whenever statistics were gathered on the table. As such, it is entirely expected that they would be close but not completely accurate and entirely possible that they would be wildly incorrect.
When you gather statistics, you specify a percentage of the rows (or blocks) that should be sampled. It is possible to specify a 100% sample size, in which case Oracle would examine every row, but it is relatively rare to ask for a sample size nearly that large. It is much more efficient to ask for a much smaller sample size (either explicitly or by letting Oracle automatically determine the sample size). If your sample of rows happens not to include the one row with a value of 1000, the HIGH_VALUE would not be 1000, the HIGH_VALUE would be 5 assuming that is the largest value that the sample saw.
Statistics are also a snapshot in time. By default, 11g will gather statistics every night on objects that have undergone enough change since the last time statistics were gathered on them to warrant a refresh, though you can disable that job or change its parameters. So if you gather statistics today with a 100% sample size in order to get a HIGH_VALUE of 1000, and then insert one row with a value of 3000 and never modify the table again, it's likely that Oracle will never gather statistics on that table again (unless you explicitly request it to) and that the HIGH_VALUE will remain 1000 forever.
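For example, forcing a 100% sample and then looking at the stored boundary values might look like this (SOME_TABLE is a placeholder; LOW_VALUE and HIGH_VALUE are stored as RAW and need decoding per data type):
begin
  dbms_stats.gather_table_stats(
    ownname          => user,
    tabname          => 'SOME_TABLE',
    estimate_percent => 100);  -- examine every row instead of a sample
end;
/

select column_name, num_distinct, low_value, high_value, last_analyzed
from   user_tab_columns
where  table_name = 'SOME_TABLE';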
Assuming that there is no histogram on the column (which is another whole discussion), Oracle uses the LOW_VALUE and HIGH_VALUE to estimate how selective a particular predicate would be. If the LOW_VALUE is 1, the HIGH_VALUE is 1000, there are 1,000,000 rows in the table, there is no histogram on the column, and you run a query like
SELECT *
FROM some_table
WHERE column_name BETWEEN 100 and 101
Oracle will guess that the data is uniformly distributed between 1 and 1000 so that this query would return 1,000 rows (multiplying the number of rows in the table (1 million) by the fraction of the range the query covers (1/1000)). This selectivity estimate, in turn, would drive the optimizer's determination of whether it would be more efficient to use an index or to do a table scan, what join methods to use, what order to evaluate the various predicates, etc. If you have a non-uniform distribution of data, however, you'll likely end up with a histogram on the column which gives Oracle more detailed information about the distribution of data in the column than the LOW_VALUE and HIGH_VALUE provide.
I'm trying to optimize a query but don't quite understand some of the information returned from Explain Plan. Can anyone tell me the significance of the OPTIONS and COST columns? In the OPTIONS column, I only see the word FULL. In the COST column, I can deduce that a lower cost means a faster query. But what exactly does the cost value represent and what is an acceptable threshold?
The output of EXPLAIN PLAN is a debug output from Oracle's query optimiser. The COST is the final output of the Cost-based optimiser (CBO), the purpose of which is to select which of the many different possible plans should be used to run the query. The CBO calculates a relative Cost for each plan, then picks the plan with the lowest cost.
(Note: in some cases the CBO does not have enough time to evaluate every possible plan; in these cases it just picks the plan with the lowest cost found so far)
In general, one of the biggest contributors to a slow query is the number of rows read to service the query (blocks, to be more precise), so the cost will be based in part on the number of rows the optimiser estimates will need to be read.
For example, let's say you have the following query:
SELECT emp_id FROM employees WHERE months_of_service = 6;
(The months_of_service column has a NOT NULL constraint on it and an ordinary index on it.)
There are two basic plans the optimiser might choose here:
Plan 1: Read all the rows from the "employees" table, for each, check if the predicate is true (months_of_service=6).
Plan 2: Read the index where months_of_service=6 (this results in a set of ROWIDs), then access the table based on the ROWIDs returned.
Let's imagine the "employees" table has 1,000,000 (1 million) rows. Let's further imagine that the values for months_of_service range from 1 to 12 and are fairly evenly distributed for some reason.
The cost of Plan 1, which involves a FULL SCAN, will be the cost of reading all the rows in the employees table, which is approximately equal to 1,000,000; but since Oracle will often be able to read the blocks using multi-block reads, the actual cost will be lower (depending on how your database is set up). For example, let's imagine the multi-block read count is 10: the calculated cost of the full scan will be 1,000,000 / 10; overall cost = 100,000.
The cost of Plan 2, which involves an INDEX RANGE SCAN and a table lookup by ROWID, will be the cost of scanning the index, plus the cost of accessing the table by ROWID. I won't go into how index range scans are costed but let's imagine the cost of the index range scan is 1 per row; we expect to find a match in 1 out of 12 cases, so the cost of the index scan is 1,000,000 / 12 = 83,333; plus the cost of accessing the table (assume 1 block read per access, we can't use multi-block reads here) = 83,333; Overall cost = 166,666.
As you can see, the cost of Plan 1 (full scan) is LESS than the cost of Plan 2 (index scan + access by rowid) - which means the CBO would choose the FULL scan.
If the assumptions made here by the optimiser are true, then in fact Plan 1 will be preferable and much more efficient than Plan 2 - which disproves the myth that FULL scans are "always bad".
The results would be quite different if the optimiser goal was FIRST_ROWS(n) instead of ALL_ROWS - in which case the optimiser would favour Plan 2 because it will often return the first few rows quicker, at the cost of being less efficient for the entire query.
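To see the effect of the optimizer goal, you could compare the plans with hints, roughly like this (same hypothetical employees table as above):
-- ALL_ROWS: optimise total throughput; likely to choose the full scan (Plan 1)
explain plan for
  select /*+ all_rows */ emp_id from employees where months_of_service = 6;
select * from table(dbms_xplan.display);

-- FIRST_ROWS(10): optimise time to the first rows; likely to choose the index (Plan 2)
explain plan for
  select /*+ first_rows(10) */ emp_id from employees where months_of_service = 6;
select * from table(dbms_xplan.display);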
The CBO builds a decision tree, estimating the cost of each possible execution path available for the query. The costs are driven by the CPU and I/O cost settings on the instance, and the CBO estimates them as best it can from the existing statistics on the tables and indexes that the query will use. You should not tune your query based on cost alone. Cost allows you to understand WHY the optimizer is doing what it does; without cost, you could not figure out why the optimizer chose the plan it did. A lower cost does not mean a faster query: there are cases where this is true and cases where it is wrong. Cost is based on your table stats, and if they are wrong the cost will be wrong.
When tuning your query, you should look at the cardinality and the number of rows of each step. Do they make sense? Is the cardinality the optimizer is assuming correct? Are the rows being returned reasonable? If the information presented is wrong, then it is very likely the optimizer doesn't have the proper information it needs to make the right decision. This could be due to stale or missing statistics on the tables and indexes, as well as missing system (CPU) statistics. It's best to have statistics up to date when tuning a query to get the most out of the optimizer. Knowing your schema is also a great help when tuning: spotting when the optimizer has made a really bad decision and pointing it down the correct path with a small hint can save a load of time.
Here is a reference for using EXPLAIN PLAN with Oracle: http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/ex_plan.htm, with specific information about the columns found here: http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/ex_plan.htm#i18300
Your mention of 'FULL' indicates to me that the query is doing a full-table scan to find your data. This is okay in certain situations; otherwise it is an indicator of poor indexing or query writing.
Generally, with explain plans, you want to ensure your query is utilizing keys, so that Oracle can find the data you're looking for while accessing the fewest rows possible. Ultimately, you can sometimes only get so far with the architecture of your tables. If the costs remain too high, you may have to think about adjusting the layout of your schema to be more performance-oriented.
In recent Oracle versions the COST represent the amount of time that the optimiser expects the query to take, expressed in units of the amount of time required for a single block read.
So if a single block read takes 2ms and the cost is expressed as "250", the query could be expected to take 500ms to complete.
The optimiser calculates the cost based on the estimated number of single-block and multi-block reads, and on the CPU consumption of the plan. The latter can be very useful in minimising the cost by performing certain operations before others to try to avoid high-CPU-cost operations.
This raises the question of how the optimiser knows how long operations take. Recent Oracle versions allow the collection of "system statistics", which are definitely not to be confused with statistics on tables or indexes. The system statistics are measurements of the performance of the hardware, most importantly:
How long a single block read takes
How long a multiblock read takes
How large a multiblock read is (often different to the maximum possible due to table extents being smaller than the maximum, and other reasons).
CPU performance
These numbers can vary greatly according to the operating environment of the system, and different sets of statistics can be stored for "daytime OLTP" operations and "nighttime batch reporting" operations, and for "end of month reporting" if you wish.
Given these sets of statistics, a given query execution plan can be evaluated for cost in different operating environments, which might promote use of full table scans at some times or index scans at others.
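For example, workload system statistics for a "daytime OLTP" window might be captured roughly like this (a sketch; the exact parameters and privileges depend on your version):
begin
  dbms_stats.gather_system_stats('START');  -- begin capturing workload statistics
end;
/

-- ... let the representative daytime OLTP workload run ...

begin
  dbms_stats.gather_system_stats('STOP');   -- stop and store the results (visible in sys.aux_stats$)
end;
/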
The cost is not perfect, but the optimiser gets better at self-monitoring with every release, and can feed back the actual cost in comparison to the estimated cost in order to make better decisions for the future. This also makes it rather more difficult to predict.
Note that the cost is not necessarily wall clock time, as parallel query operations consume a total amount of time across multiple threads.
In older versions of Oracle the cost of CPU operations was ignored, and the relative costs of single and multiblock reads were effectively fixed according to init parameters.
FULL is probably referring to a full table scan, which means that no indexes are used. This usually indicates that something is wrong, unless the query is supposed to use all the rows in the table.
Cost is a number that represents the sum of the different loads (processor, memory, disk, I/O), and high numbers are typically bad. The numbers are added up moving towards the root of the plan, and each branch should be examined to locate the bottlenecks.
You may also want to query v$sql and v$session to get statistics about SQL statements; these views have detailed metrics for all kinds of resources, timings, and executions.
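For example, a quick look at actual resource usage per statement might be (the LIKE filter is just a placeholder):
select sql_id, child_number, executions, elapsed_time, cpu_time,
       buffer_gets, disk_reads, rows_processed
from   v$sql
where  sql_text like 'SELECT emp_id FROM employees%'
order  by elapsed_time desc;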