I'm a bit confused by this; I hope someone can help. I'm reading through Markus Winand's excellent Use The Index, Luke! and there's this thing about concatenated indexes.
There is an index created on (EMPLOYEE_ID, SUBSIDIARY_ID), so when he queries
SELECT first_name, last_name
FROM employees
WHERE subsidiary_id = 20
this execution plan comes up:
----------------------------------------------------
| Id | Operation | Name | Rows | Cost |
----------------------------------------------------
| 0 | SELECT STATEMENT | | 106 | 478 |
|* 1 | TABLE ACCESS FULL| EMPLOYEES | 106 | 478 |
----------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("SUBSIDIARY_ID"=20)
But here's the thing: on my own employees table (empno, ename, init, job, mgr, bdate, msal, comm, deptno) I created a concatenated index on (ENAME, JOB).
The query select ename from employees where job = 'TRAINER'; gives me the following execution plan:
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 4271702361
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 45 | 1 (0)| 00:00:01 |
|* 1 | INDEX SKIP SCAN | ENAME_INDEX | 3 | 45 | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
1 - access("JOB"='TRAINER')
filter("JOB"='TRAINER')
So now I'm a bit confused.
1) How come, despite the order, my index was still used?
2) Does an Index Skip Scan work for any concatenated index where I don't use the first column in a where clause?
3) Does an Index Skip Scan have a major impact on performance?
4) How come there's both an access and a filter predicate?
And while I'm here, I have another question:
5) Do I need to take any precautions in indexing dates?
Oracle does have the ability, via an index skip scan, to use a composite index when you don't specify the leading column of the index in your predicate. This is generally much less efficient than a regular index scan, however. Conceptually, you can think of it as doing an index scan for every distinct value of the leading column of the index. Normally, Oracle would only consider this sort of plan if the leading column had a few distinct values and the trailing column(s) were particularly selective. I wouldn't expect either to be true here: presumably ENAME is nearly unique and JOB is rather less selective. I would expect a full scan of the table to be more efficient, so I would guess that something about your statistics is "wonky". If your table is particularly small, that could certainly cause query plans to be unusual, simply because every plan appears to be exceptionally cheap.
In the real world, there are exceptionally few cases where someone sees an "index skip scan" in a query plan and thinks, "Great! That's the plan I wanted." It generally means that something has gone wrong, but that it may not have gone as far wrong as it might have.
Good question.
Obviously, if your query had contained both ENAME and JOB, then Oracle would have used the index, with either an INDEX RANGE SCAN or an INDEX UNIQUE SCAN. However, the leading edge of the index, ENAME, was not provided in the query predicates. So, Oracle's Cost Based Optimizer (CBO) has a choice: it may do a FULL TABLE SCAN (ignoring the index), or an INDEX SKIP SCAN.
I assume you know what a FULL TABLE SCAN is, so I won't go into that.
So, what's an INDEX SKIP SCAN? Well, the CBO has the option, depending on the shape and size of the index, to do a skip scan. This usually happens when there is a relatively small number of distinct values for the leading column in the index. What happens is, Oracle takes the index and effectively breaks it down into several sub-indexes. Suppose the leading column has 4 distinct values (1 to 4). Oracle looks at the subset of the index where the leading column equals 1 and does a range scan of that subset, then does the same for the subsets where the leading column equals 2, then 3, then 4. In some cases, depending on how many distinct values the leading column has, and how selective the range scan on the second column is, this access path may be less costly than a FULL TABLE SCAN.
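To make that concrete, here's a sketch (the table, index, and data names are invented for illustration, not from the book or the question):

create table staff (gender char(1), ename varchar2(30), job varchar2(20));
create index staff_gender_job_ix on staff (gender, job);

-- gender (the leading column, with only 2 distinct values) is absent from
-- the predicate, yet the optimizer may still use the index via an INDEX
-- SKIP SCAN: in effect, one range scan per distinct gender value.
explain plan for
select ename from staff where job = 'TRAINER';

select * from table(dbms_xplan.display);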
This is another reason why, if all other things are equal, you may want to put less selective columns in the leading edge of the index. (The other main reason being compression.)
Answers for most of your questions: https://oracle-base.com/articles/9i/index-skip-scanning
1) This is exactly what INDEX SKIP SCAN is for.
2) Yes, it can be used, but it depends on your statistics
3) It might and it might not; it depends on your statistics.
4) The access predicate determines which index/data blocks are read, while the filter predicate determines which rows from those blocks are discarded.
5) An index on a DATE column works much like an index on any other data type; Oracle stores a DATE in 7 bytes. One precaution is sketched below.
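The precaution worth illustrating (my sketch, with an invented orders table): wrapping the indexed DATE column in a function prevents a plain index on it from being used, so prefer range predicates, or create a function-based index.

-- cannot use a plain index on order_date:
select * from orders where trunc(order_date) = date '2009-01-01';

-- an index range scan on order_date is possible:
select * from orders
where order_date >= date '2009-01-01'
and   order_date <  date '2009-01-02';

-- or support the first form with a function-based index:
create index orders_trunc_date_ix on orders (trunc(order_date));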
I was trying to understand explain plans in Oracle and wanted to know what conditions Oracle considers while forming them.
I was testing a simple query against the HR schema that ships with Oracle 11g:
select * from countries
where region_id in (select region_id from regions where region_name = 'Europe');
When I ran the following queries:
explain plan for
select * from countries
where region_id in (select region_id from regions where region_name = 'Europe');
SELECT * FROM table(dbms_xplan.display(null,null,'basic'));
I got the following output in the explain table:
--------------------------------------------------------
| Id | Operation | Name |
--------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | NESTED LOOPS | |
| 2 | INDEX FULL SCAN | COUNTRY_C_ID_PK |
| 3 | TABLE ACCESS BY INDEX ROWID| REGIONS |
| 4 | INDEX UNIQUE SCAN | REG_ID_PK |
--------------------------------------------------------
Here I observed that the outer query was executed first, i.e. the countries table was accessed first, as indicated by the INDEX FULL SCAN of COUNTRY_C_ID_PK (Id 2).
Now I added an index on region_name of the regions table, ran the explain plan again, and got the following output:
--------------------------------------------------------------
| Id | Operation | Name |
--------------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | NESTED LOOPS | |
| 2 | TABLE ACCESS BY INDEX ROWID| REGIONS |
| 3 | INDEX RANGE SCAN | REGIONNAME_REGIONS_IX |
| 4 | INDEX UNIQUE SCAN | COUNTRY_C_ID_PK |
| 5 | INDEX RANGE SCAN | COUNTRIES_REGIONID_IX |
--------------------------------------------------------------
Now my questions are:
1) Shouldn't the inner query be executed first, irrespective of whether an index is present?
2) If adding an index alters the execution plan, what other factors can alter it?
3) What is the execution process in the general case? Is it sequential (first execute the join that occurs first, then go on to the next join in the query)?
Thanks in advance for your help.
-Varun
The explain plan relies heavily on the Cost Based Optimizer (CBO). You can help this process out by gathering statistics on the table(s) you are querying (a sketch follows the analogy below). Now, in terms of why the index would change the plan: it is because you have supplied critical information to the CBO that it did not have before. It is the equivalent of me asking you this question:
No index:
"Where is the street?"
With index:
"Where is the street that has a blue house on it?"
The second question gives greater context and is thus faster for you to answer; you don't have to enumerate every street.
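To gather statistics, something like the following should do (a sketch against the HR objects from the question, with all other DBMS_STATS parameters left at their defaults):

begin
  dbms_stats.gather_table_stats(ownname => 'HR', tabname => 'COUNTRIES');
  dbms_stats.gather_table_stats(ownname => 'HR', tabname => 'REGIONS');
end;
/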
You can supply hints to a query, e.g.:
select /*+ parallel */ * from my_table
to give a hint to run this query in parallel.
For the third question: that, I imagine, is part of Oracle's internal processing and is not documented for the world to consume.
For the first question: no, not necessarily; it is all cost-based.
I don't know if they changed anything in the execution plan outputs in 11g, but are you sure you are showing us the right query? You are selecting all columns (select *) from the countries table, yet the explain plan does not show any table access. Or does COUNTRY_C_ID_PK include all columns?
I would expect the following plan (without the index):
SELECT
NESTED LOOP
FULL TABLE SCAN (regions)
TABLE ACCESS BY INDEX ROWID (countries)
INDEX RANGE SCAN (COUNTRIES_REGIONID_IX)
With the index in place, I would expect something like this:
SELECT
NESTED LOOP
TABLE ACCESS BY INDEX ROWID (regions)
INDEX RANGE SCAN (REGIONNAME_REGIONS_IX)
TABLE ACCESS BY INDEX ROWID (countries)
INDEX RANGE SCAN (COUNTRIES_REGIONID_IX)
For your questions:
Oracle may drive the query from the inner or outer query as it sees fit depending on the available statistics
There are so many things that influence the execution plan...
Oracle can only join two tables (or row sources) at a time. The result of a join is also a row source that can be joined to the next table
The cost-based optimiser goes through a few stages, including query transformation. Your query has almost certainly been rewritten by the optimiser to:
select countries.*
from countries join regions on (countries.region_id = regions.region_id)
where regions.region_name = 'Europe';
So the concept of inner and outer queries as represented in the original query may not apply post-transformation. Incidentally, this is why arguments concerning EXISTS () vs IN () are often moot: the query in both cases can often be rewritten as a join.
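For example, the EXISTS formulation of your query would typically be transformed to that same join:

select *
from countries c
where exists (select null
              from regions r
              where r.region_id = c.region_id
              and r.region_name = 'Europe');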
Among the information that the optimiser uses (version dependent) are:
Statistics on the table
Statistics on the table columns
Histograms of table column values
Presence of indexes
Size and type of, and statistics on, indexes, in particular the clustering factor
Presence of constraints -- including not null and check constraints.
Estimated cost of single-block and multiblock reads, and CPU operations per second.
Partitioning
Presence and state of materialised views and/or query rewrite declarations.
Performance of previous versions of the query.
So in short, don't be surprised by anything the optimiser does. It's a very sophisticated piece of kit.
I'm not sure what is going on with my Derby database, but I seem to have tables that I can see from the ij interface...
ij> show tables in derbytest;
TABLE_SCHEM |TABLE_NAME |REMARKS
------------------------------------------------------------------------
DERBYTEST |DATATYPETEST |
DERBYTEST |LOCATION |
DERBYTEST |SUIVI |
Now I get the table's description...
ij> describe derbytest.datatypetest;
COLUMN_NAME |TYPE_NAME|DEC&|NUM&|COLUM&|COLUMN_DEF|CHAR_OCTE&|IS_NULL&
------------------------------------------------------------------------------
A_DATE |DATE |0 |10 |10 |NULL |NULL |NO
AN_INT |INTEGER |0 |10 |10 |NULL |NULL |YES
A_DECIMAL |DECIMAL |0 |10 |5 |NULL |NULL |YES
A_STRING |VARCHAR |NULL|NULL|20 |NULL |40 |YES
A_SWITCH |BOOLEAN |NULL|NULL|1 |NULL |NULL |YES
So I guess the table exists, but...
ij> select * from derbytest.datatypetest;
ERREUR XSAI2 : Le conglomérat (1 232) demandé n'existe pas.
(In English: ERROR XSAI2: The requested conglomerate does not exist.)
So a quick check to see if the problem is being caused by an 'empty' table..
ij> select * from derbytest.suivi;
OBS |DATE |TIME
-----------------------------------------------------------------------
which to me suggests not!
I'm not sure I fully understand the implication of the error message. I found this in the docs:
Table 36. Class XSAI: Store - access.protocol.interface
SQLSTATE: XSAI2
Message Text: The conglomerate () requested does not exist.
which isn't amazingly helpful!
I've had a look at the various API docs for the engine, language, testing, and tools, but I don't know where to start looking; any pointers would be helpful.
It may be related to how I am setting up the database, so some quick background.
I'm connecting to this test DB from a Java test class. It gathers info from another data source (Excel or flat file) and then drops it into this database (or that is the aim). I am only showing a small 'test' that I made to ensure my connection was working.
I have another schema in this file that has more tables, and they all have this same problem.
Have I not correctly closed a connection and lost data?
Have I somehow inadvertently deleted a data file that contained the missing 'conglomerate'?
Any help is greatly appreciated.
David.
PS: I have other test DBs that I haven't checked to see if they have the same problem.
I'm running Java 6 on XP.
edit1: Just checked the other test DB I am using; it contains no tables! I obviously cleaned up after myself. Now where did that cat go?
That is odd behavior, to be sure. I'm not sure what's wrong.
Did you create your table inside a transaction but not yet commit that transaction?
Did you create your table using an in-memory database, in which case it disappears when you close the database?
Did you create your database in one location on the disk, then later connect to a different location with 'create=true', in which case Derby creates a new blank database in the new location?
Did you create your database using one schema, then connect with a different schema?
The error message does suggest some internal damage to the table. The number in parentheses (1 232) is a conglomerate number, which is also used to identify the conglomerate's filename in your filesystem. So you could look in the filesystem and match up the files that are there with the tables in your database (by selecting from sys.sysconglomerates).
You get a conglomerate for the table itself, plus additional conglomerates for each secondary index, both those created by CREATE INDEX and those created by implicit constraints such as UNIQUE or REFERENCES.
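A sketch of that cross-check, using Derby's system catalogs:

select t.tablename, c.conglomeratenumber, c.conglomeratename, c.isindex
from sys.sysconglomerates c
join sys.systables t on t.tableid = c.tableid
order by t.tablename, c.conglomeratenumber;

Each conglomerate number should, as far as I know, correspond to a data file named c<number-in-hex>.dat in the database's seg0 directory, so a table whose file has gone missing will show up in this comparison.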
If you suspect you have table damage, it's best to restore from a backup. Did you experience any system crashes, disk full events, etc., which might have indicated table damage?
I have a large table that is replicated from Oracle 10.2.0.4 to an Oracle 9i database using MView replication over the network. The master table is about 50 GB and 160M rows, and there are about 2-3M new or updated rows per day.
The master table has a materialized view log created using rowid.
The full refresh of the view works and takes about 5 hours, which we can live with.
However, the fast refresh is struggling to keep up. Oracle seems to require two queries against the mlog and master tables to do the refresh; the first looks like this:
SELECT /*+ */
DISTINCT "A1"."M_ROW$$"
FROM "GENEVA_ADMIN"."MLOG$_BILLSUMMARY" "A1"
WHERE "A1"."M_ROW$$" <> ALL (SELECT "A2".ROWID
FROM "GENEVA_ADMIN"."BILLSUMMARY" "A2"
WHERE "A2".ROWID = "A1"."M_ROW$$")
AND "A1"."SNAPTIME$$" > :1
AND "A1"."DMLTYPE$$" <> 'I'
The current plan is:
---------------------------------------------------------------
| Id | Operation | Name |
---------------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | HASH UNIQUE | |
| 2 | FILTER | |
| 3 | TABLE ACCESS BY INDEX ROWID| MLOG$_BILLSUMMARY |
| 4 | INDEX RANGE SCAN | MLOG$_BILLSUMMARY_AK1 |
| 5 | TABLE ACCESS BY USER ROWID | BILLSUMMARY |
When there are 3M changed rows, this query literally runs forever; it's basically useless. However, if I rewrite it slightly and tell it to full scan the master table and the mlog table, it completes in 20 minutes.
The problem is that the above query comes out of the internals of Oracle and I cannot change it. The problem is really the FILTER operation on line 2: if I could get it to full scan both tables and do a hash join / anti-join, I am confident I could get it to complete quickly enough, but no recipe of hints I offer will get this query to stop using the FILTER operation; maybe it's not even valid. I can use hints to get it to full scan both tables, but the FILTER operation remains, and I understand it executes line 5 once for each row returned by line 3, which will be 2-3M rows.
Has anyone got any ideas on how to trick this query into the plan I want without changing the actual query, or better, any way of getting replication to take a more sensible plan for my table sizes?
Thanks,
Stephen.
As you wrote, the queries are part of an internal Oracle mechanism, so your tuning options are limited. The fast-refresh algorithm seems to behave differently in more recent versions; check Alberto Dell'Era's analysis.
You could also look into SQL Profiles (a 10g feature). With the DBMS_SQLTUNE package, this should allow you to tune individual SQL statements.
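A sketch of the DBMS_SQLTUNE workflow (the sql_id and task name here are placeholders you would substitute with your own):

declare
  l_task varchar2(64);
begin
  l_task := dbms_sqltune.create_tuning_task(
              sql_id    => '0abcd1efgh2ij',  -- placeholder: the refresh query's sql_id
              task_name => 'mv_refresh_tuning');
  dbms_sqltune.execute_tuning_task(task_name => l_task);
end;
/

select dbms_sqltune.report_tuning_task('mv_refresh_tuning') from dual;

-- if the advisor recommends a profile, you can accept it:
begin
  dbms_sqltune.accept_sql_profile(task_name => 'mv_refresh_tuning');
end;
/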
How do the estimated cardinalities look for the refresh query in comparison to the actual cardinalities? Maybe the MLOG$ table statistics are incorrect.
It might be better to have no statistics on the table and lock them in order to invoke dynamic sampling, which ought to give a reasonable estimate based on the multiple predicates in the query.
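A sketch, using the mlog from the question (DELETE_TABLE_STATS and LOCK_TABLE_STATS are standard DBMS_STATS calls in 10g):

begin
  dbms_stats.delete_table_stats(ownname => 'GENEVA_ADMIN', tabname => 'MLOG$_BILLSUMMARY');
  dbms_stats.lock_table_stats(ownname => 'GENEVA_ADMIN', tabname => 'MLOG$_BILLSUMMARY');
end;
/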
I enjoyed the answers and questions about hidden features in SQL Server.
What can you tell us about Oracle?
Hidden tables, inner workings of ..., secret stored procs, packages that have good utils...
Since Apex is now part of every Oracle database, these Apex utility functions are useful even if you aren't using Apex:
SQL> declare
2 v_array apex_application_global.vc_arr2;
3 v_string varchar2(2000);
4 begin
5
6 -- Convert delimited string to array
7 v_array := apex_util.string_to_table('alpha,beta,gamma,delta', ',');
8 for i in 1..v_array.count
9 loop
10 dbms_output.put_line(v_array(i));
11 end loop;
12
13 -- Convert array to delimited string
14 v_string := apex_util.table_to_string(v_array,'|');
15 dbms_output.put_line(v_string);
16 end;
17 /
alpha
beta
gamma
delta
alpha|beta|gamma|delta
PL/SQL procedure successfully completed.
"Full table scans are not always bad. Indexes are not always good."
An index-based access method is less efficient at reading rows than a full scan when you measure it in terms of rows accessed per unit of work (typically per logical read). However, many tools will interpret a full table scan as a sign of inefficiency.
Take an example where you are reading a few hundred invoices from an invoice table and looking up a payment method in a small lookup table. Using an index to probe the lookup table for every invoice probably means three or four logical I/Os per invoice. However, a full scan of the lookup table in preparation for a hash join from the invoice data would probably require only a couple of logical reads, and the hash join itself would complete in memory at almost no cost at all.
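As a sketch of the two shapes (the table names are invented; the hints just force each plan so you can compare logical reads):

-- index-driven nested loops: a few logical I/Os per invoice row
select /*+ use_nl(p) index(p) */ i.invoice_id, p.method_name
from invoices i, payment_methods p
where p.method_id = i.method_id;

-- full scan of the small lookup table feeding a hash join:
-- a couple of logical reads total, then an in-memory join
select /*+ use_hash(p) full(p) */ i.invoice_id, p.method_name
from invoices i, payment_methods p
where p.method_id = i.method_id;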
However, many tools would look at this and see "full table scan", and tell you to try to use an index. If you do so, then you may have just de-tuned your code.
Incidentally, over-reliance on indexes, as in the above example, causes the "Buffer Cache Hit Ratio" to rise. This is why the BCHR is mostly nonsense as a predictor of system efficiency.
The cardinality hint is mostly undocumented.
explain plan for
select /*+ cardinality(#inner 5000) */ *
from (select /*+ qb_name(inner) */ * from dual)
/
select * from table(dbms_xplan.display)
/
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5000 | 10000 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS FULL| DUAL | 1 | 2 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------
The Buffer Cache Hit Ratio is virtually meaningless as a predictor of system efficiency
You can view table data as of a previous time using Flashback Query, with certain limitations.
Select *
from my_table as of timestamp(timestamp '2008-12-01 15:21:13')
11g has a whole new feature set around preserving historical changes more robustly.
Frequent rebuilding of indexes is almost always a waste of time.
wm_concat works like the MySQL group_concat, but it is undocumented.
with data:
-car-     -maker-
Corvette  Chevy
Taurus    Ford
Impala    Chevy
Aveo      Chevy
select wm_concat(car) Cars, maker from cars
group by maker
gives you:
-Cars-                  -maker-
Corvette, Impala, Aveo  Chevy
Taurus                  Ford
The OVERLAPS predicate is undocumented.
http://oraclesponge.wordpress.com/2008/06/12/the-overlaps-predicate/
I just found out about the pseudocolumn ORA_ROWSCN. If you don't set your table up for this, this pseudocolumn gives you the block SCN. This could be really useful for the emergency "Oh crap, I have no auditing on this table and wonder if someone has changed the data since yesterday."
But even better is if you create the table with ROWDEPENDENCIES on. That puts the SCN of the last change on every row. This will help you avoid a "lost edit" problem without having to include every column in your query.
IOW, when your app grabs a row for user modification, also select the ORA_ROWSCN. Then, when you post the user's edits, include ora_rowscn = :v_rscn in addition to the unique key in the where clause. If someone has touched the row since you grabbed it, aka a lost edit, the update will match zero rows since the ORA_ROWSCN will have changed.
So cool.
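Sketched out (table, column, and bind names are invented for illustration):

create table accounts (
  id     number primary key,
  amount number
) rowdependencies;   -- must be chosen at CREATE TABLE time

-- 1. read the row and remember its row SCN
select amount, ora_rowscn from accounts where id = :id;

-- 2. post the edit; zero rows updated means someone changed it in between
update accounts
set    amount = :new_amount
where  id = :id
and    ora_rowscn = :v_rscn;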
If you get the value of the PASSWORD column on DBA_USERS, you can back up and restore passwords without knowing them:
ALTER USER xxx IDENTIFIED BY VALUES 'xxxx';
Bypass the buffer cache and read straight from disk using direct path reads.
alter session set "_serial_direct_read"=true;
This causes a tablespace (9i) or fast object (10g+) checkpoint, so be careful on busy OLTP systems.
More undocumented stuff at http://awads.net/wp/tag/undocumented/
Warning: Use at your own risk.
I don't know if this counts as hidden, but I was pretty happy when I saw this way of quickly seeing what happened with a SQL statement you are tuning.
SELECT /*+ GATHER_PLAN_STATISTICS */ * FROM DUAL;
SELECT * FROM TABLE(dbms_xplan.display_cursor( NULL, NULL, 'RUNSTATS_LAST'))
;
PLAN_TABLE_OUTPUT
-----------------------------------------------------
SQL_ID 5z36y0tq909a8, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */ * FROM DUAL
Plan hash value: 272002086
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
---------------------------------------------------------------------------------------------
| 1 | TABLE ACCESS FULL| DUAL | 1 | 1 | 1 |00:00:00.02 | 3 | 2 |
---------------------------------------------------------------------------------------------
12 rows selected.
Where:
E-Rows is estimated rows.
A-Rows is actual rows.
A-Time is actual time.
Buffers is actual buffers.
Where the estimated plan varies from the actual execution by orders of magnitude, you know you have problems.
Not a hidden feature, but fine-grained access control (FGAC), also known as row-level security, is something I have used in the past and was impressed with the efficiency of its implementation. If you are looking for something that guarantees you can control the granularity of how rows are exposed to users with differing permissions, regardless of the application that is used to view data (SQL*Plus as well as your web app), then this is a gem.
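The moving parts, as a sketch (the schema, table, policy, and predicate here are all invented; DBMS_RLS.ADD_POLICY is the real API):

-- a policy function returns a predicate that Oracle silently appends
-- to every query against the protected table
create or replace function orders_predicate (
  p_schema in varchar2,
  p_object in varchar2
) return varchar2
as
begin
  return 'owner_id = sys_context(''USERENV'', ''SESSION_USER'')';
end;
/

begin
  dbms_rls.add_policy(
    object_schema   => 'APP',
    object_name     => 'ORDERS',
    policy_name     => 'orders_rls',
    function_schema => 'APP',
    policy_function => 'orders_predicate',
    statement_types => 'select, insert, update, delete');
end;
/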
The built-in fulltext indexing is more widely documented, but still stands out because of its stability (just try running a full-reindexing of fulltext-indexed columns on similar data samples on MS-SQL and Oracle and you'll see the speed difference).
WITH Clause
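For the uninitiated, a quick sketch (scott-style table names assumed): it lets you name a subquery once and reuse it, often more readably than nested inline views.

with dept_counts as (
  select deptno, count(*) as headcount
  from emp
  group by deptno
)
select deptno, headcount
from dept_counts
where headcount > 5;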
Snapshot tables. Also found in Oracle Lite, and extremely useful for rolling your own replication mechanism.
#Peter
You can actually bind a variable of type "Cursor" in TOAD, then use it in your statement and it will display the results in the result grid.
exec open :cur for select * from dual;
Q: How do you call a stored procedure with a cursor from TOAD?
A: An example; change to your cursor type, package name, and stored proc name:
declare
  l_cursor PCK_UTILS.typ_cursor;
begin
  PCK_UTILS.spc_get_encodedstring(
    'U',
    10000002,
    null,
    'none',
    l_cursor);
end;
The Model Clause (available for Oracle 10g and up)
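A small taste of it (a classic demo shape, not production code): MODEL treats the result set like a spreadsheet, so you can even generate rows, here the first ten Fibonacci numbers out of DUAL.

select x, fib
from dual
model
  dimension by (0 as x)
  measures (cast(null as number) as fib)
  rules iterate (10) (
    fib[iteration_number] =
      case
        when iteration_number < 2 then 1
        else fib[iteration_number - 1] + fib[iteration_number - 2]
      end
  )
order by x;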
WM_CONCAT for string aggregation
Scalar subquery caching is one of the most surprising features in Oracle
-- my_function is NOT deterministic but it is cached!
select t.x, t.y, (select my_function(t.x) from dual)
from t
-- logically equivalent to this, uncached
select t.x, t.y, my_function(t.x) from t
The "caching" subquery above evaluates my_function(t.x) only once per unique value of t.x. If you have large partitions of the same t.x value, this will immensely speed up your queries, even if my_function is not declared DETERMINISTIC. Even if it was DETERMINISTIC, you can safe yourself a possibly expensive SQL -> PL/SQL context switch.
Of course, if my_function is not a deterministic function, then this can lead to wrong results, so be careful!