query taking too long to execute from plsql block - oracle

This query when executed alone it takes 1 second to executed when the same query is executed through procedure it is taking 20 seconds, please help me on this
SELECT * FROM
(SELECT TAB1.*,ROWNUM ROWNUMM FROM
(SELECT wh.workitem_id, wh.workitem_priority, wh.workitem_type_id, wt.workitem_type_nm,
wh.workitem_status_id, ws.workitem_status_nm, wh.analyst_group_id,
ag.analyst_group_nm, wh.owner_uuid, earnings_estimate.pr_get_name_from_uuid(owner_uuid) owner_name,
wh.create_user_id, earnings_estimate.pr_get_name_from_uuid( wh.create_user_id) create_name, wh.create_ts,
wh.update_user_id,earnings_estimate.pr_get_name_from_uuid(wh.update_user_id) update_name, wh.update_ts, wh.bb_ticker_id, wh.node_id,
wh.eqcv_analyst_uuid, earnings_estimate.pr_get_name_from_uuid( wh.eqcv_analyst_uuid) eqcv_analyst_name,
WH.WORKITEM_NOTE,Wh.PACKAGE_ID ,Wh.COVERAGE_STATUS_NUM ,CS.COVERAGE_STATUS_CD ,Wh.COVERAGE_REC_NUM,I.INDUSTRY_CD INDUSTRY_CODE,I.INDUSTRY_NM
INDUSTRY_NAME,WOT.WORKITEM_OUTLIER_TYPE_NM as WORKITEM_SUBTYPE_NM
,count(1) over() AS total_count,bro.BB_ID BROKER_BB_ID,bro.BROKER_NM BROKER_NAME, wh.assigned_analyst_uuid,earnings_estimate.pr_get_name_from_uuid(wh.assigned_analyst_uuid)
assigned_analyst_name
FROM earnings_estimate.workitem_type wt,
earnings_estimate.workitem_status ws,
earnings_estimate.workitem_outlier_type wot,
(SELECT * FROM (
SELECT WH.ASSIGNED_ANALYST_UUID,WH.DEFERRED_TO_DT,WH.WORKITEM_NOTE,WH.UPDATE_USER_ID,EARNINGS_ESTIMATE.PR_GET_NAME_FROM_UUID(WH.UPDATE_USER_ID)
UPDATE_NAME, WH.UPDATE_TS,WH.OWNER_UUID, EARNINGS_ESTIMATE.PR_GET_NAME_FROM_UUID(OWNER_UUID)
OWNER_NAME,WH.ANALYST_GROUP_ID,WH.WORKITEM_STATUS_ID,WH.WORKITEM_PRIORITY,EARNINGS_ESTIMATE.PR_GET_NAME_FROM_UUID( WI.CREATE_USER_ID) CREATE_NAME, WI.CREATE_TS,
wi.create_user_id,wi.workitem_type_id,wi.workitem_id,RANK() OVER (PARTITION BY WH.WORKITEM_ID ORDER BY WH.CREATE_TS DESC NULLS LAST, ROWNUM) R,
wo.bb_ticker_id, wo.node_id,wo.eqcv_analyst_uuid,
WO.PACKAGE_ID ,WO.COVERAGE_STATUS_NUM ,WO.COVERAGE_REC_NUM,
wo.workitem_outlier_type_id
FROM earnings_estimate.workitem_history wh
JOIN EARNINGS_ESTIMATE.workitem_outlier wo
ON wh.workitem_id=wo.workitem_id
JOIN earnings_estimate.workitem wi
ON wi.workitem_id=wo.workitem_id
AND WI.WORKITEM_TYPE_ID=3
and wh.workitem_status_id not in (1,7)
WHERE ( wo.bb_ticker_id IN (SELECT
column_value from table(v_tickerlist) )
)
)wh
where r=1
AND DECODE(V_DATE_TYPE,'CreatedDate',WH.CREATE_TS,'LastModifiedDate',WH.UPDATE_TS) >= V_START_DATE
AND decode(v_date_type,'CreatedDate',wh.create_ts,'LastModifiedDate',wh.update_ts) <= v_end_date
and decode(wh.owner_uuid,null,-1,wh.owner_uuid)=decode(v_analyst_id,null,decode(wh.owner_uuid,null,-1,wh.owner_uuid),v_analyst_id)
) wh,
earnings_estimate.analyst_group ag,
earnings_estimate.coverage_status cs,
earnings_estimate.research_document rd,
( SELECT
BB.BB_ID ,
BRK.BROKER_ID,
BRK.BROKER_NM
FROM EARNINGS_ESTIMATE.BROKER BRK,COMMON.BB_ID BB
WHERE BRK.ORG_ID = BB.ORG_ID
AND BRK.ORG_LOC_REC_NUM = BB.ORG_LOC_REC_NUM
AND BRK.primary_broker_ind='Y') bro,
earnings_estimate.industry i
WHERE wh.analyst_group_id = ag.analyst_group_id
AND wh.workitem_status_id = ws.workitem_status_id
AND wh.workitem_type_id = wt.workitem_type_id
AND wh.coverage_status_num=cs.coverage_status_num
AND wh.workitem_outlier_type_id=wot.workitem_outlier_type_id
AND wh.PACKAGE_ID=rd.PACKAGE_ID(+)
AND rd.industry_id=i.industry_id(+)
AND rd.BROKER_BB_ID=bro.BB_ID(+)
ORDER BY wh.create_ts)tab1 )
;

I agree that the problem is most likely related to SELECT column_value from table(v_tickerlist).
By default, Oracle estimates that table functions return 8168 rows. Since you're testing the query with a single value, I assume that the actual number of values is usually much smaller. Cardinality estimates, like any forecast, are always wrong. But they should at least be in the ballpark of the actual cardinality for the optimizer to do its job properly.
You can force Oracle to always check the size with dynamic sampling. This will require more time to generate the plan, but it will probably be worth it in this case.
For example:
SQL> --Sample type
SQL> create or replace type v_tickerlist is table of number;
2 /
Type created.
SQL> --Show explain plans
SQL> set autotrace traceonly explain;
SQL> --Default estimate is poor. 8168 estimated, versus 3 actual.
SQL> SELECT column_value from table(v_tickerlist(1,2,3));
Execution Plan
----------------------------------------------------------
Plan hash value: 1748000095
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8168 | 16336 | 16 (0)| 00:00:01 |
| 1 | COLLECTION ITERATOR CONSTRUCTOR FETCH| | 8168 | 16336 | 16 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
SQL> --Estimate is perfect when dynamic sampling is used.
SQL> SELECT /*+ dynamic_sampling(tickerlist, 2) */ column_value
2 from table(v_tickerlist(1,2,3)) tickerlist;
Execution Plan
----------------------------------------------------------
Plan hash value: 1748000095
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 6 | 6 (0)| 00:00:01 |
| 1 | COLLECTION ITERATOR CONSTRUCTOR FETCH| | 3 | 6 | 6 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Note
-----
- dynamic sampling used for this statement (level=2)
SQL>
If that doesn't help, look at your explain plan (and post it here). Find where the cardinality estimate is most wrong, then try to figure out why that is.

Your query is too big and will take time when executing on bulk data. Try putting few de-normalised temp tables, extract the data there and then join between the temp tables. That will increase the performance.
With this stand alone query, do not pass any variable inside the subqueries as in the below line...
WHERE ( wo.bb_ticker_id IN (SELECT
column_value from table(v_tickerlist)
Also, the outer joins will toss the performance.. Better to implement the denormalised temp tables

Related

Understanding characteristics of a query for which an index makes a dramatic difference

I am trying to come up with an example showing that indexes can have a dramatic (orders of magnitude) effect on query execution time. After hours of trial and error I am still at square one. Namely, the speed-up is not large even when the execution plan shows using the index.
Since I realized that I better have a large table for the index to make a difference, I wrote the following script (using Oracle 11g Express):
CREATE TABLE many_students (
student_id NUMBER(11),
city VARCHAR(20)
);
DECLARE
nStudents NUMBER := 1000000;
nCities NUMBER := 10000;
curCity VARCHAR(20);
BEGIN
FOR i IN 1 .. nStudents LOOP
curCity := ROUND(DBMS_RANDOM.VALUE()*nCities, 0) || ' City';
INSERT INTO many_students
VALUES (i, curCity);
END LOOP;
COMMIT;
END;
I then tried quite a few queries, such as:
select count(*)
from many_students M
where M.city = '5467 City';
and
select count(*)
from many_students M1
join many_students M2 using(city);
and a few other ones.
I have seen this post and think that my queries satisfy the requirements stated in the replies there. However, none of the queries I tried showed dramatic improvement after building an index: create index myindex on many_students(city);
Am I missing some characteristic that distinguishes a query for which an index makes a dramatic difference? What is it?
The test case is a good start but it needs a few more things to get a noticeable performance difference:
Realistic data sizes. One million rows of two small values is a small table. With a table that small the performance difference between a good and a bad execution plan may not matter much.
The below script will double the table size until it gets to 64 million rows. It takes about 20 minutes on my machine. (To make it go quicker, for larger sizes, you could make the table nologging and add an /*+ append */ hint to the insert.
--Increase the table to 64 million rows. This took 20 minutes on my machine.
insert into many_students select * from many_students;
insert into many_students select * from many_students;
insert into many_students select * from many_students;
insert into many_students select * from many_students;
insert into many_students select * from many_students;
insert into many_students select * from many_students;
commit;
--The table has about 1.375GB of data. The actual size will vary.
select bytes/1024/1024/1024 gb from dba_segments where segment_name = 'MANY_STUDENTS';
Gather statistics. Always gather statistics after large table changes. The optimizer cannot do its job well unless it has table, column, and index statistics.
begin
dbms_stats.gather_table_stats(user, 'MANY_STUDENTS');
end;
/
Use hints to force a good and bad plan. Optimizer hints should usually be avoided. But to quickly compare different plans they can be helpful to fix a bad plan.
For example, this will force a full table scan:
select /*+ full(M) */ count(*) from many_students M where M.city = '5467 City';
But you'll also want to verify the execution plan:
explain plan for select /*+ full(M) */ count(*) from many_students M where M.city = '5467 City';
select * from table(dbms_xplan.display);
Flush the cache. Caching is probably the main culprit behind the index and full table scan queries taking the same amount of time. If the table fits entirely in memory then the time to read all the rows may be almost too small to measure. The number could be dwarfed by the time to parse the query or to send a simple result across the network.
This command will force Oracle to remove almost everything from the buffer cache. This will help you test a "cold" system. (You probably do not want to run this statement on a production system.)
alter system flush buffer_cache;
However, that won't flush the operating system or SAN cache. And maybe the table really would fit in memory on production. If you need to test a fast query it may be necessary to put it in a PL/SQL loop.
Multiple, alternating runs. There many things happening in the background, like caching and other processes. It's so easy to get bad results because something unrelated changed on the system.
Maybe the first run takes extra long to put things in a cache. Or maybe some huge job was started between queries. To avoid those issues, alternate running the two queries. Run them five times, throw out the highs and lows, and compare the averages.
For example, copy and paste the statements below five times and run them. (If using SQL*Plus, run set timing on first.) I already did that and posted the times I got in a comment before each line.
--Seconds: 0.02, 0.02, 0.03, 0.234, 0.02
alter system flush buffer_cache;
select count(*) from many_students M where M.city = '5467 City';
--Seconds: 4.07, 4.21, 4.35, 3.629, 3.54
alter system flush buffer_cache;
select /*+ full(M) */ count(*) from many_students M where M.city = '5467 City';
Testing is hard. Putting together decent performance tests is difficult. The above rules are only a start.
This might seem like overkill at first. But it's a complex topic. And I've seen so many people, including myself, waste a lot of time "tuning" something based on a bad test. Better to spend the extra time now and get the right answer.
An index really shines when the database doesn't need to go to every row in a table to get your results. So COUNT(*) isn't the best example. Take this for example:
alter session set statistics_level = 'ALL';
create table mytable as select * from all_objects;
select * from mytable where owner = 'SYS' and object_name = 'DUAL';
---------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 300 |00:00:00.01 | 12 |
| 1 | TABLE ACCESS FULL| MYTABLE | 1 | 19721 | 300 |00:00:00.01 | 12 |
---------------------------------------------------------------------------------------
So, here, the database does a full table scan (TABLE ACCESS FULL), which means it has to visit every row in the database, which means it has to load every block from disk. Lots of I/O. The optimizer guessed that it was going to find 15000 rows, but I know there's only one.
Compare that with this:
create index myindex on mytable( owner, object_name );
select * from mytable where owner = 'SYS' and object_name = 'JOB$';
select * from table( dbms_xplan.display_cursor( null, null, 'ALLSTATS LAST' ));
----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 3 | 2 |
| 1 | TABLE ACCESS BY INDEX ROWID| MYTABLE | 1 | 2 | 1 |00:00:00.01 | 3 | 2 |
|* 2 | INDEX RANGE SCAN | MYINDEX | 1 | 1 | 1 |00:00:00.01 | 2 | 2 |
----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OWNER"='SYS' AND "OBJECT_NAME"='JOB$')
Here, because there's an index, it does an INDEX RANGE SCAN to find the rowids for the table that match our criteria. Then, it goes to the table itself (TABLE ACCESS BY INDEX ROWID) and looks up only the rows we need and can do so efficiently because it has a rowid.
And even better, if you happen to be looking for something that is entirely in the index, the scan doesn't even have to go back to the base table. The index is enough:
select count(*) from mytable where owner = 'SYS';
select * from table( dbms_xplan.display_cursor( null, null, 'ALLSTATS LAST' ));
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 46 | 46 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 46 | 46 |
|* 2 | INDEX RANGE SCAN| MYINDEX | 1 | 8666 | 9294 |00:00:00.01 | 46 | 46 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OWNER"='SYS')
Because my query involved the owner column and that's contained in the index, it never needs to go back to the base table to look anything up there. So the index scan is enough, then it does an aggregation to count the rows. This scenario is a little less than perfect, because the index is on (owner, object_name) and not just owner, but its definitely better than doing a full table scan on the main table.

ORDER BY subquery and ROWNUM goes against relational philosophy?

Oracle 's ROWNUM is applied before ORDER BY. In order to put ROWNUM according to a sorted column, the following subquery is proposed in all documentations and texts.
select *
from (
select *
from table
order by price
)
where rownum <= 7
That bugs me. As I understand, table input into FROM is relational, hence no order is stored, meaning the order in the subquery is not respected when seen by FROM.
I cannot remember the exact scenarios but this fact of "ORDER BY has no effect in the outer query" I have read more than once. Examples are in-line subqueries, subquery for INSERT, ORDER BY of PARTITION clause, etc. For example in
OVER (PARTITION BY name ORDER BY salary)
the salary order will not be respected in outer query, and if we want salary to be sorted at outer query output, another ORDER BY need to be added in the outer query.
Some insights from everyone on why the relational property is not respected here and order is stored in the subquery ?
The ORDER BY in this context is in effect Oracle's proprietary syntax for generating an "ordered" row number on a (logically) unordered set of rows. This is a poorly designed feature in my opinion but the equivalent ISO standard SQL ROW_NUMBER() function (also valid in Oracle) may make it clearer what is happening:
select *
from (
select ROW_NUMBER() OVER (ORDER BY price) rn, *
from table
) t
where rn <= 7;
In this example the ORDER BY goes where it more logically belongs: as part of the specification of a derived row number attribute. This is more powerful than Oracle's version because you can specify several different orderings defining different row numbers in the same result. The actual ordering of rows returned by this query is undefined. I believe that's also true in your Oracle-specific version of the query because no guarantee of ordering is made when you use ORDER BY in that way.
It's worth remembering that Oracle is not a Relational DBMS. In common with other SQL DBMSs Oracle departs from the relational model in some fundamental ways. Features like implicit ordering and DISTINCT exist in the product precisely because of the non-relational nature of the SQL model of data and the consequent need to work around keyless tables with duplicate rows.
Not surprisingly really, Oracle treats this as a bit of a special case. You can see that from the execution plan. With the naive (incorrect/indeterminate) version of the limit that crops up sometimes, you get SORT ORDER BY and COUNT STOPKEY operations:
select *
from my_table
where rownum <= 7
order by price;
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 3 (34)| 00:00:01 |
| 1 | SORT ORDER BY | | 1 | 13 | 3 (34)| 00:00:01 |
|* 2 | COUNT STOPKEY | | | | | |
| 3 | TABLE ACCESS FULL| MY_TABLE | 1 | 13 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(ROWNUM<=7)
If you just use an ordered subquery, with no limit, you only get the SORT ORDER BY operation:
select *
from (
select *
from my_table
order by price
);
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 3 (34)| 00:00:01 |
| 1 | SORT ORDER BY | | 1 | 13 | 3 (34)| 00:00:01 |
| 2 | TABLE ACCESS FULL| MY_TABLE | 1 | 13 | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------------
With the usual subquery/ROWNUM construct you get something different,
select *
from (
select *
from my_table
order by price
)
where rownum <= 7;
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 3 (34)| 00:00:01 |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | VIEW | | 1 | 13 | 3 (34)| 00:00:01 |
|* 3 | SORT ORDER BY STOPKEY| | 1 | 13 | 3 (34)| 00:00:01 |
| 4 | TABLE ACCESS FULL | MY_TABLE | 1 | 13 | 2 (0)| 00:00:01 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=7)
3 - filter(ROWNUM<=7)
The COUNT STOPKEY operation is still there for the outer query, but the inner query (inline view, or derived table) now has a SORT ORDER BY STOPKEY instead of the simple SORT ORDER BY. This is all hidden away in the internals so I'm speculating, but it looks like the stop key - i.e. the row number limit - is being pushed into the subquery processing, so in effect the subquery may only end up with seven rows anyway - though the plan's ROWS value doesn't reflect that (but then you get the same plan with a different limit), and it still feels the need to apply the COUNT STOPKEY operation separately.
Tom Kyte covered similar ground in an Oracle Magazine article, when talking about "Top- N Query Processing with ROWNUM" (emphasis added):
There are two ways to approach this:
- Have the client application run that query and fetch just the first N rows.
- Use that query as an inline view, and use ROWNUM to limit the results, as in SELECT * FROM ( your_query_here ) WHERE ROWNUM <= N.
The second approach is by far superior to the first, for two reasons. The lesser of the two reasons is that it requires less work by the client, because the database takes care of limiting the result set. The more important reason is the special processing the database can do to give you just the top N rows. Using the top- N query means that you have given the database extra information. You have told it, "I'm interested only in getting N rows; I'll never consider the rest." Now, that doesn't sound too earth-shattering until you think about sorting—how sorts work and what the server would need to do.
... and then goes on to outline what it's actually doing, rather more authoritatively than I can.
Interestingly I don't think the order of the final result set is actually guaranteed; it always seems to work, but arguably you should still have an ORDER BY on the outer query too to make it complete. It looks like the order isn't really stored in the subquery, it just happens to be produced like that. (I very much doubt that will ever change as it would break too many things; this ends up looking similar to a table collection expression which also always seems to retain its ordering - breaking that would stop dbms_xplan working though. I'm sure there are other examples.)
Just for comparison, this is what the ROW_NUMBER() equivalent does:
select *
from (
select ROW_NUMBER() OVER (ORDER BY price) rn, my_table.*
from my_table
) t
where rn <= 7;
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 52 | 4 (25)| 00:00:01 |
|* 1 | VIEW | | 2 | 52 | 4 (25)| 00:00:01 |
|* 2 | WINDOW SORT PUSHED RANK| | 2 | 26 | 4 (25)| 00:00:01 |
| 3 | TABLE ACCESS FULL | MY_TABLE | 2 | 26 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("RN"<=7)
2 - filter(ROW_NUMBER() OVER ( ORDER BY "PRICE")<=7)
Adding to sqlvogel's good answer :
"As I understand, table input into FROM is relational"
No, table input into FROM is not relational. It is not relational because "table input" are tables and tables are not relations. The myriads of quirks and oddities in SQL eventually all boil down to that simple fact : the core building brick in SQL is the table, and a table is not a relation. To sum up the differences :
Tables can contain duplicate rows, relations cannot. (As a consequence, SQL offers bag algebra, not relational algebra. As another consequence, it is as good as impossible for SQL to even define equality comparison for its most basic building brick !!! How would you compare tables for equality given that you might have to deal with duplicate rows ?)
Tables can contain unnamed columns, relations cannot. SELECT X+Y FROM ... As a consequence, SQL is forced into "column identity by ordinal position", and as a consequence of that, you get all sorts of quirks, e.g. in SELECT A,B FROM ... UNION SELECT B,A FROM ...
Tables can contain duplicate column names, relations cannot. A.ID and B.ID in a table are not distinct column names. The part before the dot is not part of the name, it is a "scope identifier", and that scope identifier "disappears" once you're "outside the SELECT" it appears/is introduced in. You can verify this with a nested SELECT : SELECT A.ID FROM (SELECT A.ID, B.ID FROM ...). It won't work (unless your particular implementation departs from the standard in order to make it work).
Various SQL constructs leave people with the impression that tables do have an ordering to rows. The ORDER BY clause, obviously, but also the GROUP BY clause (which can be made to work only by introducing rather dodgy concepts of "intermediate tables with rows grouped together"). Relations simply are not like that.
Tables can contain NULLs, relations cannot. This one has been beaten to death.
There should be some more, but I don't remember them off the tip of the hat.

Oracle is not using the Indexes

I have a very large table in oracle 11g that has a very simple index in a char field (that is normally Y or N)
If I just execute the queue as bellow it takes around 10s to return
select QueueId, QueueSiteId, QueueData from queue where QueueProcessed = 'N'
However if I force it to use the index I create it takes 80ms
select /*+ INDEX(avaqueue QUEUEPROCESSED_IDX) */ QueueId, QueueSiteId, QueueData
from queue where QueueProcessed = 'N'
Also if I run under the explain plan for as bellow:
explain plan for select QueueId, QueueSiteId, QueueData
from queue where QueueProcessed = 'N'
and
explain plan for select /*+ INDEX(avaqueue QUEUEPROCESSED_IDX) */
QueueId, QueueSiteId, QueueData
from queue where QueueProcessed = 'N'
For the frist plan I got:
------------------------------------------------------------------------------
Plan hash value: 803924726
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 691K| 128M| 12643 (1)| 00:02:32 |
|* 1 | TABLE ACCESS FULL| AVAQUEUE | 691K| 128M| 12643 (1)| 00:02:32 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("QUEUEPROCESSED"='N')
For the second pla I got:
Plan hash value: 2012309891
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 691K| 128M| 24386 (1)| 00:04:53 |
| 1 | TABLE ACCESS BY INDEX ROWID| AVAQUEUE | 691K| 128M| 24386 (1)| 00:04:53 |
|* 2 | INDEX RANGE SCAN | QUEUEPROCESSED_IDX | 691K| | 1297 (1)| 00:00:16 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("QUEUEPROCESSED"='N')
------------------------------------------------------------------------------
What proves that if I don't explicit tell oracle to use the index it does not use it, my question is why is oracle not using this index? Oracle is normally smart enough to make decisions 10 times better than me, that is the first time I actually have to force oracle to use a index and I am not very comfortable with it.
Does anyone have a good explanation for oracle decision to not use the index in this very explicit case?
The QueueProcessed column is probably missing a histogram so Oracle does not know the data is skewed.
If Oracle does not know the data is skewed it will assume the equality predicate, QueueProcessed = 'N', returns DBA_TABLES.NUM_ROWS /
DBA_TAB_COLUMNS.NUM_DISTINCT. The optimizer thinks the query returns half the rows in the table. Based on the 80ms return time the real number of rows returned is small.
Index range scans generally only work well when they select a small percentage of the rows. Index range scans read from a data structure one block at a time. And if the data is randomly distributed, it may need to read every block of data from the table anyway. For those reasons, if the query accesses a large portion of the table, it is more efficient to use a multi-block full table scan.
The bad cardinality estimate from the skewed data causes Oracle to think a full table scan is better. Creating a histogram will fix the issue.
Sample schema
Create a table, fill it with skewed data, and gather statistics the first time.
drop table queue;
create table queue(
queueid number,
queuesiteid number,
queuedata varchar2(4000),
queueprocessed varchar2(1)
);
create index QUEUEPROCESSED_IDX on queue(queueprocessed);
--Skewed data - only 100 of the 100000 rows are set to N.
insert into queue
select level, level, level, decode(mod(level, 1000), 0, 'N', 'Y')
from dual connect by level <= 100000;
begin
dbms_stats.gather_table_stats(user, 'QUEUE');
end;
/
The first execution will have the problem.
In this case the default statistics settings do not gather histograms the first time. The plan shows a full table scan and estimates Rows=50000, exactly half.
explain plan for
select QueueId, QueueSiteId, QueueData
from queue where QueueProcessed = 'N';
select * from table(dbms_xplan.display);
Plan hash value: 1157425618
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 50000 | 878K| 103 (1)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| QUEUE | 50000 | 878K| 103 (1)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("QUEUEPROCESSED"='N')
Create a histogram
The default statistics settings are usually sufficient. Histogram may not be collected for several reasons. They may be manually disabled - check for the tasks, jobs or preferences set by the DBA.
Also, histograms are only automatically collected on columns that are both skewed and used. Gathering histograms can take time, there's no need to create the histogram on a column that is never used in a relevant predicate. Oracle tracks when a column is used and could benefit from a histogram, although that data is lost if the table is dropped.
Running a sample query and re-gathering statistics will make the histogram appear:
select QueueId, QueueSiteId, QueueData
from queue where QueueProcessed = 'N';
begin
dbms_stats.gather_table_stats(user, 'QUEUE');
end;
/
Now the Rows=100 and the Index is used.
explain plan for
select QueueId, QueueSiteId, QueueData
from queue where QueueProcessed = 'N';
select * from table(dbms_xplan.display);
Plan hash value: 2630796144
----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100 | 1800 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| QUEUE | 100 | 1800 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | QUEUEPROCESSED_IDX | 100 | | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("QUEUEPROCESSED"='N')
Here's the histogram:
select column_name, histogram
from dba_tab_columns
where table_name = 'QUEUE'
order by column_name;
COLUMN_NAME HISTOGRAM
----------- ---------
QUEUEDATA NONE
QUEUEID NONE
QUEUEPROCESSED FREQUENCY
QUEUESITEID NONE
Create the histogram
Try to determine why the histogram was missing. Check that statistics are gathered with the defaults, there are no weird column or table preferences, and that table is not constantly dropped and re-loaded.
If you cannot rely on the default statistics job for your process you can manually gather histograms with the method_opt parameter like this:
begin
dbms_stats.gather_table_stats(user, 'QUEUE', method_opt=>'for columns size 254 queueprocessed');
end;
/
The answer - at least the first one that will just lead to more questions - is right there in the plans. The first plan has an estimated cost and estimated execution time about half that of the second plan. In the absence of the hint, Oracle is choosing the plan that it thinks will run faster.
So of course the next question is why is its estimate so far off in this case. Not only are the estimated times wrong relative to each other, both are much greater than what you actually experience when running the query.
The first thing I would look at is the estimated number of rows returned. The optimizer is guessing, in both cases, that there are about 691,000 rows in table matching your predicate. Is this close to the truth, or very far off? If it's far off, then refreshing statistics may be the right solution. Although if the column only has two possible values, I'd be kind of surprised if the existing stats are so off base.

Generalising Oracle Static Cursors

I am creating an OLAP-like package in Oracle where you call a main, controlling function that assembles its returning output table by making numerous left joins. These joined tables are defined in 'slave' functions within the package, which return specific subsets using static cursors, parameterised by the function's arguments. The thing is, these cursors are all very similar.
Is there a way, beyond generating dynamic queries and using them in a ref cursor, that I can generalise these. Every time I add a function, I get this weird feeling, as a developer, that this isn't particularly elegant!
Pseduocode
somePackage
function go(param)
return select myRows.id,
stats1.value,
stats2.value
from myRows
left join table(somePackage.stats1(param)) stats1
on stats1.id = myRows.id
left join table(somePackage.stats2(param)) stats2
on stats2.id = myRows.id
function stats1(param)
return [RESULTS OF SOME QUERY]
function stats2(param)
return [RESULTS OF A RELATED QUERY]
The stats queries all have the same structure:
First they aggregate the data in a useful way
Then they split this data into logical sections, based on criteria, and aggregate again (e.g., by department, by region, etc.) then union the results
Then they return the results, cast into the relevant object type, so I can easily do a bulk collect
Something like:
cursor myCursor is
with fullData as (
[AGGREGATE DATA]
),
fullStats as (
[AGGREGATE FULLDATA BY TOWN]
union all
[AGGREGATE FULLDATA BY REGION]
union all
[AGGREGATE FULLDATA BY COUNTRY]
)
select myObjectType(fullStats.*)
from fullStats;
...
open myCursor;
fetch myCursor bulk collect into output limit 1000;
close myCursor;
return output;
Filter operations can help build dynamic queries with static SQL. Especially when the column list is static.
You may have already considered this approach but discarded it for performance reasons. "Why execute every SQL block if we only need the results
from one of them?" You're in luck, the optimizer already does this for you with a FILTER operation.
Example Query
First create a function that waits 5 seconds every time it is run. It will help find which query blocks were executed.
create or replace function slow_function return number is begin
dbms_lock.sleep(5);
return 1;
end;
/
This static query is controlled by bind variables. There are three query blocks but the entire query runs in 5 seconds instead of 15.
declare
v_sum number;
v_query1 number := 1;
v_query2 number := 0;
v_query3 number := 0;
begin
select sum(total)
into v_sum
from
(
select total from (select slow_function() total from dual) where v_query1 = 1
union all
select total from (select slow_function() total from dual) where v_query2 = 1
union all
select total from (select slow_function() total from dual) where v_query3 = 1
);
end;
/
Execution Plan
This performance is not the result of good luck; it's not simply Oracle randomly executing one predicate before another. Oracle analyzes the bind variables before
run-time and does not even execute the irrelevant query blocks. That's what the FILTER operation below is doing. (Which is a poor name, many people generally
refer to all predicates as "filters". But only some of them result in a FILTER operation.)
select * from table(dbms_xplan.display_cursor(sql_id => '0cfqc6a70kzmt'));
SQL_ID 0cfqc6a70kzmt, child number 0
-------------------------------------
SELECT SUM(TOTAL) FROM ( SELECT TOTAL FROM (SELECT SLOW_FUNCTION()
TOTAL FROM DUAL) WHERE :B1 = 1 UNION ALL SELECT TOTAL FROM (SELECT
SLOW_FUNCTION() TOTAL FROM DUAL) WHERE :B2 = 1 UNION ALL SELECT TOTAL
FROM (SELECT SLOW_FUNCTION() TOTAL FROM DUAL) WHERE :B3 = 1 )
Plan hash value: 926033116
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 6 (100)| |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
| 2 | VIEW | | 3 | 39 | 6 (0)| 00:00:01 |
| 3 | UNION-ALL | | | | | |
|* 4 | FILTER | | | | | |
| 5 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
|* 6 | FILTER | | | | | |
| 7 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
|* 8 | FILTER | | | | | |
| 9 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter(:B1=1)
6 - filter(:B2=1)
8 - filter(:B3=1)
Issues
The FILTER operation is poorly documented. I can't explain in detail when it does or does not work, and exactly what affects it has on other parts of the query. For example, in the explain plan the Rows estimate is 3, but at run time Oracle should easily be able to estimate the cardinality is 1. Apparently the execution plan is not that dynamic, that poor cardinality estimate may cause later issues. Also, I've seen some weird cases where static expressions are not appropriately filtered. But if a query uses a simple equality predicate it should be fine.
This approach allows you remove all dynamic SQL and replace it with a large static SQL statement. Which has some advantages; dynamic SQL is often "ugly" and difficult to debug. But people only familiar with procedural programming tend to think of a single SQL statement as one huge God-function, a bad practice. They won't appreciate that the UNION ALLs create independent blocks of SQL
Dynamic SQL is Still Probably Better
In general I would recommend against this approach. What you have is good because it looks good. The main problem with dynamic SQL is that people don't treat it like real code; it's not commented or formatted and ends up looking like a horrible mess that nobody can understand. If you are able to spend the extra time to generate clean code then you should stick with that.

is there a tricky way to optimize this query

I'm working on a table that has 3008698 rows
exam_date is a DATE field.
But queries I run want to match only the month part. So what I do is:
select * from my_big_table where to_number(to_char(exam_date, 'MM')) = 5;
which I believe takes long because of function on the column. Is there a way to avoid this and make it faster? other than making changes to the table? exam_date in the table have different date values. like 01-OCT-10 or 12-OCT-10...and so on
I don't know Oracle, but what about doing
WHERE exam_date BETWEEN first_of_month AND last_of_month
where the two dates are constant expressions.
select * from my_big_table where MONTH(exam_date) = 5
oops.. Oracle huh?..
select * from my_big_table where EXTRACT(MONTH from exam_date) = 5
Bear in mind that since you want approximately 1/12th of all the data, it may well be more efficient for Oracle to perform a full table scan anyway. This may explain why performance was worse when you followed harpo's advice.
Why? Suppose your data is such that 20 rows fit on each database block (on average), so that you have a total of 3,000,000/20 = 150,000 blocks. That means a full table scan will require 150,000 block reads. Now about 1/12th of the 3,000,000 rows will be for month 05. 3,000,000/12 is 250,000. So that's 250,000 table reads if you use the index - and that's ignoring the index reads that will also be required. So in this example the full table scan does a lot less work than the indexed search.
Bear in miond that there are only twelve distinct values for MONTH. So unless you have a strongly clustered set of records (say if you use partitioining) it is possible that using an index is not necessarily the most efficient way of querying in this fashion.
I didn't find that using EXTRACT() lead the optimizer to use a regular index on my date column but YMMV:
SQL> create index big_d_idx on big_table(col3) compute statistics
2 /
Index created.
SQL> set autotrace traceonly explain
SQL> select * from big_table
2 where extract(MONTH from col3) = 'MAY'
3 /
Execution Plan
----------------------------------------------------------
Plan hash value: 3993303771
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 23403 | 1028K| 4351 (3)| 00:00:53 |
|* 1 | TABLE ACCESS FULL| BIG_TABLE | 23403 | 1028K| 4351 (3)| 00:00:53 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(EXTRACT(MONTH FROM INTERNAL_FUNCTION("COL3"))=TO_NUMBER('M
AY'))
SQL>
What definitely can persuade the optimizer to use an index in these scenarios is building a function-based index:
SQL> create index big_mon_fbidx on big_table(extract(month from col3))
2 /
Index created.
SQL> select * from big_table
2 where extract(MONTH from col3) = 'MAY'
3 /
Execution Plan
----------------------------------------------------------
Plan hash value: 225326446
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 23403 | 1028K| 475 (0)|00:00:06|
| 1 | TABLE ACCESS BY INDEX ROWID| BIG_TABLE | 23403 | 1028K| 475 (0)|00:00:06|
|* 2 | INDEX RANGE SCAN | BIG_MON_FBIDX | 9361 | | 382 (0)|00:00:05|
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access(EXTRACT(MONTH FROM INTERNAL_FUNCTION("COL3"))=TO_NUMBER('MAY'))
SQL>
The function call means that Oracle won't be able to use any index that might be defined on the column.
Either remove the function call (as in harpo's answer) or use a function based index.

Resources