The following query performs very poorly, due to the "order by". My goal is to get only a small subset of the resultset (using ROWNUM, for example). However, when I add "order by" it goes through the entire resultset performing an index lookup for each record, which makes it extremely slow. Without sorting the query is about 100 times faster when I limit the resultset to, for example, 1000 records.
QUERY:
SELECT text_field
from mytable where
contains(text_field,'ABC', 1)>0
order by another_field;
THIS IS HOW I CREATED THE INDEX:
CREATE INDEX myindex ON mytable (text_field) INDEXTYPE IS ctxsys.context FILTER BY another_field
EXECUTION PLAN:
---------------------------------------------------------------
| Id | Operation | Name |
---------------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | SORT ORDER BY | |
| 2 | TABLE ACCESS BY INDEX ROWID| MYTABLE |
|* 3 | DOMAIN INDEX | MYINDEX |
---------------------------------------------------------------
I also tried CTXCAT instead of CONTEXT, with no improvement. I think the problem is that when I want the results sorted (only the top 1000), it performs an index lookup for every record in the "entire" resultset. Is there a way to avoid that?
Thank you.
To have the ordering applied before the rownum filter, you need to use an in-line view:
SELECT text_field
from (
SELECT text_field
from mytable where
contains(text_field,'ABC', 1)>0
order by another_field
)
where rownum <= 1000;
With your index in place Oracle should optimise this to do as little work as possible. You should see 'sort order by stopkey' and 'count stopkey' steps in the plan, which is Oracle being clever and knowing it only needs to get 1000 values from the index.
If you don't use the in-line view but just add the rownum filter to your original query, it will still optimise it, but as you state it will order the first 1000 random (or indeterminate, anyway) rows it finds, because of the sequence in which it performs the operations.
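For reference, this is roughly the shape of plan you would hope to see with the in-line view in place. It is an illustrative sketch based on the index and query above, not output from a real run:
---------------------------------------------------------------
| Id | Operation                       | Name    |
---------------------------------------------------------------
|  0 | SELECT STATEMENT                |         |
|  1 |  COUNT STOPKEY                  |         |
|  2 |   VIEW                          |         |
|  3 |    SORT ORDER BY STOPKEY        |         |
|  4 |     TABLE ACCESS BY INDEX ROWID | MYTABLE |
|  5 |      DOMAIN INDEX               | MYINDEX |
---------------------------------------------------------------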
Related
I'm trying to optimize a set of stored procs which are going against many tables including this view. The view is as such:
We have TBL_A (id, hist_date, hist_type, other_columns) with two types of rows: hist_type 'O' vs. hist_type 'N'. The view joins TBL_A to itself and transposes the N rows against the corresponding O rows. If no N row exists for an O row, the O row values are repeated. Like so:
CREATE OR REPLACE FORCE VIEW V_A (id, hist_date, hist_type, other_columns_o, other_columns_n) AS
select
o.id, o.hist_date, o.hist_type,
o.other_columns as other_columns_o,
case when n.id is not null then n.other_columns else o.other_columns end as other_columns_n
from
TBL_A o left outer join TBL_A n
on o.id=n.id and o.hist_date=n.hist_date and n.hist_type = 'N'
where o.hist_type = 'O';
TBL_A has a unique index on: (id, hist_date, hist_type). It also has a unique index on: (hist_date, id, hist_type) and this is the primary key.
The following query is at issue (in a stored proc, with x declared as TYPE_TABLE_OF_NUMBER):
select b.id BULK COLLECT into x from TBL_B b where b.parent_id = input_id;
select v.id from v_a v
where v.id in (select column_value from table(x))
and v.hist_date = input_date
and v.status_new = 'CLOSED';
This query ignores the index on the id column when accessing TBL_A and instead does a range scan using the date to pick up all the rows for that date. It then filters that set using the values from the array. However, if I simply give the list of ids as a literal list of numbers, the optimizer uses the index just fine:
select v.id from v_a v
where v.id in (123, 234, 345, 456, 567, 678, 789)
and v.hist_date = input_date
and v.status_new = 'CLOSED';
The problem also doesn't exist when going against TBL_A directly (and I have a workaround that does that, but it's not ideal). Is there a way to get the optimizer to first retrieve the array values and use them as predicates when accessing the table? Or a good way to restructure the view to achieve this?
Oracle does not use the index because it assumes select column_value from table(x) returns 8168 rows.
Indexes are faster for retrieving small amounts of data. At some point it's faster to scan the whole table than repeatedly walk the index tree.
Estimating the cardinality of a regular SQL statement is difficult enough; creating an accurate estimate for procedural code is almost impossible. I don't know exactly where 8168 comes from, but table functions are normally used with pipelined functions in data warehouses, so a fairly large default makes sense.
Dynamic sampling can generate a more accurate estimate and likely generate a plan that will use the index.
Here's an example of a bad cardinality estimate:
create or replace type type_table_of_number as table of number;
explain plan for
select * from table(type_table_of_number(1,2,3,4,5,6,7));
select * from table(dbms_xplan.display(format => '-cost -bytes'));
Plan hash value: 1748000095
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Time |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8168 | 00:00:01 |
| 1 | COLLECTION ITERATOR CONSTRUCTOR FETCH| | 8168 | 00:00:01 |
-------------------------------------------------------------------------
Here's how to fix it:
explain plan for select /*+ dynamic_sampling(2) */ *
from table(type_table_of_number(1,2,3,4,5,6,7));
select * from table(dbms_xplan.display(format => '-cost -bytes'));
Plan hash value: 1748000095
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Time |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 7 | 00:00:01 |
| 1 | COLLECTION ITERATOR CONSTRUCTOR FETCH| | 7 | 00:00:01 |
-------------------------------------------------------------------------
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
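Applied to the query from the question, that would look something like this (a sketch only; level 2 is just a starting point and may need adjusting):
select /*+ dynamic_sampling(2) */ v.id from v_a v
where v.id in (select column_value from table(x))
and v.hist_date = input_date
and v.status_new = 'CLOSED';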
I am currently facing a performance problem when using table functions. Let me explain.
I am working with Oracle types and one of them is defined like below:
create or replace TYPE TYPESTRUCTURE AS OBJECT
(
ATTR1 VARCHAR2(30),
ATTR2 VARCHAR2(20),
ATTR3 VARCHAR2(20),
ATTR4 VARCHAR2(20),
ATTR5 VARCHAR2(20),
ATTR6 VARCHAR2(20),
ATTR7 VARCHAR2(20),
ATTR8 VARCHAR2(20),
ATTR9 VARCHAR2(20),
ATTR10 VARCHAR2(20),
ATTR11 VARCHAR2(20),
ATTR12 VARCHAR2(20),
ATTR13 VARCHAR2(10),
ATTR14 VARCHAR2(50),
ATTR15 VARCHAR2(13)
);
Then I have one table of this type like:
create or replace TYPE TYPESTRUCTURE_ARRAY AS TABLE OF TYPESTRUCTURE ;
I have one procedure with the following variables:
arr TYPESTRUCTURE_ARRAY;
arr2 TYPESTRUCTURE_ARRAY;
ARR contains only a single instance of TYPESTRUCTURE, with all its attributes set to NULL except ATTR4, which is set to 'ABC'.
ARR2 is completely empty.
Here comes the part which is giving me the performance issue.
The purpose is to take some values from a view (depending on the value of ATTR4) and fill them into the same or a similar structure. So I do the following:
SELECT TYPESTRUCTURE(MV.A,null,null,MV.B,MV.C,MV.D,null,null,MV.E,null,null,MV.F,MV.F,MV.G,MV.H)
BULK COLLECT INTO arr2
FROM TABLE(arr) PARS
JOIN MYVIEW MV
ON MV.B = PARS.ATTR4;
The code here works correctly, except that the query takes 15 seconds to execute...
This query fills around 20 instances of TYPESTRUCTURE (rows) into ARR2.
It might look like there is simply a lot of data in the view. But what strikes me as strange is that if I change the query to use a hardcoded value, like the one below, it is completely fast (milliseconds):
SELECT TYPESTRUCTURE(MV.A,null,null,MV.B,MV.C,MV.D,null,null,MV.E,null,null,MV.F,MV.F,MV.G,MV.H)
BULK COLLECT INTO arr2
FROM (SELECT 'ABC' ATTR4 FROM DUAL) PARS
JOIN MYVIEW MV
ON MV.B = PARS.ATTR4;
In this new query I hardcode the value directly but keep the join, in order to test something as similar as possible to the query above but without the TABLE() function.
So here is my question: is it possible that this TABLE() function creates such a big delay when it contains only a single record? I would appreciate advice on what is wrong in my approach and whether there is some other way to achieve this.
Thanks!!
This problem is likely caused by a poor optimizer estimate for the number of rows returned by the TABLE function. The CARDINALITY or DYNAMIC_SAMPLING hints may be the best way to solve the problem.
Cardinality estimate
Oracle gathers statistics on tables and indexes in order to estimate the cost of accessing those objects. The most important estimate is how many rows an object will return. Procedural code does not have statistics by default, and Oracle does not attempt to parse the code and estimate how many rows it will produce. Whenever Oracle sees a procedural row source it uses a static number. On my database that number is 16360; on most databases the estimate is 8168, as beherenow pointed out.
explain plan for
select * from table(sys.odcinumberlist(1,2,3));
select * from table(dbms_xplan.display(format => 'basic +rows'));
Plan hash value: 2234210431
--------------------------------------------------------------
| Id | Operation | Name | Rows |
--------------------------------------------------------------
| 0 | SELECT STATEMENT | | 16360 |
| 1 | COLLECTION ITERATOR CONSTRUCTOR FETCH| | 16360 |
--------------------------------------------------------------
Fix #1: CARDINALITY hint
As beherenow suggested, the CARDINALITY hint can solve this problem by statically telling Oracle how many rows to estimate.
explain plan for
select /*+ cardinality(1) */ * from table(sys.odcinumberlist(1,2,3));
select * from table(dbms_xplan.display(format => 'basic +rows'));
Plan hash value: 2234210431
--------------------------------------------------------------
| Id | Operation | Name | Rows |
--------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 1 | COLLECTION ITERATOR CONSTRUCTOR FETCH| | 16360 |
--------------------------------------------------------------
Fix #2: DYNAMIC_SAMPLING hint
A more "official" solution is to use the DYNAMIC_SAMPLING hint. This hint tells Oracle to sample some data at run time before it builds the explain plan. This adds some cost to building the explain plan, but it will return the true number of rows. This may work much better if you don't know the number ahead of time.
explain plan for
select /*+ dynamic_sampling(2) */ * from table(sys.odcinumberlist(1,2,3));
select * from table(dbms_xplan.display(format => 'basic +rows'));
Plan hash value: 2234210431
--------------------------------------------------------------
| Id | Operation | Name | Rows |
--------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 |
| 1 | COLLECTION ITERATOR CONSTRUCTOR FETCH| | 3 |
--------------------------------------------------------------
But what's really slow?
We don't know exactly what was slow in your query. But whenever things are slow it's usually best to focus on the worst cardinality estimate. Row estimates are never perfect, but being off by several orders of magnitude can have a huge impact on an execution plan. In the simplest case it may change an index range scan into a full table scan.
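As a rough sketch of how a CARDINALITY hint might be applied to the query in the question (assuming ARR usually holds a single element, as described above; the alias and value in the hint are illustrative, and the hint itself is undocumented, so test it on your version):
SELECT /*+ CARDINALITY(PARS, 1) */
       TYPESTRUCTURE(MV.A,null,null,MV.B,MV.C,MV.D,null,null,MV.E,null,null,MV.F,MV.F,MV.G,MV.H)
BULK COLLECT INTO arr2
FROM TABLE(arr) PARS
JOIN MYVIEW MV
ON MV.B = PARS.ATTR4;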
I have created the following query in Oracle:
SELECT SUBSTR(title,1,INSTR(title,' ',1,1)) AS first_word, COUNT(*) AS word_count
FROM FILM
GROUP BY SUBSTR(title,1,INSTR(title,' ',1,1))
HAVING COUNT(*) >= 20;
Results after running:
539 rows selected. Elapsed: 00:00:00.22
I need to improve the performance of this and created a function-based index as so:
CREATE INDEX INDX_FIRSTWRD ON FILM(SUBSTR(title,1,INSTR(title,' ',1,1)));
After running the same query at the top of this post, I still get the same performance:
539 rows selected. Elapsed: 00:00:00.22
Is the index not being applied, or is it being overridden, or am I doing something wrong?
Thanks for any help you could provide. :)
EDIT:
Execution Plan:
----------------------------------------------------------
Plan hash value: 2033354507
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 20000 | 2968K| 138 (2)| 00:00:02 |
|* 1 | FILTER | | | | | |
| 2 | HASH GROUP BY | | 20000 | 2968K| 138 (2)| 00:00:02 |
| 3 | TABLE ACCESS FULL| FILM | 20000 | 2968K| 136 (0)| 00:00:02 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(COUNT(*)>=20)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
471 consistent gets
0 physical reads
0 redo size
14030 bytes sent via SQL*Net to client
908 bytes received via SQL*Net from client
37 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
539 rows processed
The problem is that the value you're using for the index may be null - if there is no space in the title (i.e. it's a one-word title like "Jaws") then your substr evaluates to null. That probably isn't what you want, incidentally - you probably want the end position to be conditional on whether there is a space at all, but that's beyond the scope of the question. (And even if you correct that logic, Oracle may still not be able to trust that the result can't be null, even if the underlying column is not nullable). Edit: see below for more on using nvl to handle single-word titles.
Since nulls aren't included in indexes, the rows for single-word titles won't be indexed. But you're asking for all rows, and Oracle knows the index doesn't hold all rows, so it can't use the index to fulfil the query; even if you add a hint telling it to, it has to ignore that hint.
The only time the index will be used is if you include a filter that references the indexed value too, and explicitly or implicitly exclude nulls, e.g.:
SELECT SUBSTR(title,1,INSTR(title,' ',1,1)) AS first_word, COUNT(*) AS word_count
FROM FILM
WHERE SUBSTR(title,1,INSTR(title,' ',1,1)) IS NOT NULL
GROUP BY SUBSTR(title,1,INSTR(title,' ',1,1))
HAVING COUNT(*) >= 20;
(which also probably isn't what you actually want).
SQL Fiddle for queries with and without a filter, and with and without an index hint. (Click the 'execution plan' link against each result section to see whether it's doing a full table scan or a full index scan).
And another Fiddle showing that the index can't be used even with the filter if the filter still allows null values, again since they are not in the index.
Since SylvainLeroux brought it up, Oracle isn't quite clever enough to know the computed value can't be null if you coalesce it, even if the underlying column is not-null (as a function-based index or as a virtual column). Possibly because there could be a lot of branches to evaluate. But it is clever enough if you use the simpler and proprietary nvl instead:
CREATE INDEX INDX_FIRSTWRD
ON FILM(NVL(SUBSTR(title,1,INSTR(title,' ',1,1)),title));
SELECT NVL(SUBSTR(title,1,INSTR(title,' ',1,1)),title) AS first_word,
COUNT(*) AS word_count
FROM FILM
GROUP BY NVL(SUBSTR(title,1,INSTR(title,' ',1,1)),title)
HAVING COUNT(*) >= 20;
But only if title is defined as not-null. And coalesce does work if the virtual column is also declared not-null (thanks Sylvain).
SQL Fiddle with a function-based index and another with a virtual column.
539 rows selected. Elapsed: 00:00:00.22
Do you really think you need to tune a query that returns 539 rows in less than a second? 220 milliseconds, precisely! Think about it.
In your case, I think the CBO does the best possible thing, and that is the reason it doesn't use the index. To read every row from the table, going through the index is overhead: it has to read the index and then do a table access by rowid. For your small table, it can probably read the entire table with less I/O.
If the table is small enough to fit in a single block, then a full table scan needs just one I/O to fetch the required data from that block.
You can try to check the explain plan by hinting the query to use the index and see if anything really improves. Remember, you are trying unnecessarily to improve the performance of a query which executes in less than a second!
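For example, something along these lines lets you compare the hinted plan (the hint just names the index from the question; as the other answer explains, Oracle may still have to ignore it while single-word titles produce NULL index keys):
SELECT /*+ INDEX(f INDX_FIRSTWRD) */
       SUBSTR(title,1,INSTR(title,' ',1,1)) AS first_word, COUNT(*) AS word_count
FROM FILM f
GROUP BY SUBSTR(title,1,INSTR(title,' ',1,1))
HAVING COUNT(*) >= 20;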
I need to paginate on a set of models that can/will become large. The results have to be sorted so that the latest entries are the ones that appear on the first page (and then, we can go all the way to the start using 'next' links).
The query to retrieve the first page is the following, 4 is the number of entries I need per page:
SELECT "relationships".* FROM "relationships" WHERE ("relationships".followed_id = 1) ORDER BY created_at DESC LIMIT 4 OFFSET 0;
Since this needs to be sorted and since the number of entries is likely to become large, am I going to run into serious performance issues?
What are my options to make it faster?
My understanding is that an index on 'followed_id' will only help the where clause. My concern is with the 'order by'.
Create an index that contains these two fields in this order (followed_id, created_at)
Now, how large is 'large' here? If it will be on the order of millions of rows, how about something like the following.
Create an index on the keys (followed_id, created_at, id). (This might change depending upon the fields in the select, where and order by clauses; I have tailored this to your question.)
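For instance, something along these lines (the index name is just an example):
CREATE INDEX idx_followed_created_id ON relationships (followed_id, created_at, id);
Then the query looks like this: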
SELECT relationships.*
FROM relationships
JOIN (SELECT id
FROM relationships
WHERE followed_id = 1
ORDER BY created_at
LIMIT 10 OFFSET 10) itable
ON relationships.id = itable.id
ORDER BY relationships.created_at
An explain would yield this:
+----+-------------+---------------+------+---------------+-------------+---------+------+------+-----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+------+---------------+-------------+---------+------+------+-----------------------------------------------------+
| 1 | PRIMARY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Impossible WHERE noticed after reading const tables |
| 2 | DERIVED | relationships | ref | sample_rel2 | sample_rel2 | 5 | | 1 | Using where; Using index |
+----+-------------+---------------+------+---------------+-------------+---------+------+------+-----------------------------------------------------+
If you examine it carefully, the sub-query containing the order, limit and offset clauses operates directly on the index instead of the table, and only then joins back to the table to fetch the 10 records.
This makes a difference once your query reaches something like LIMIT 10 OFFSET 10000: without it, the database would have to read 10010 rows from the table and throw away the first 10000. This trick restricts that traversal to just the index.
An important note: I tested this in MySQL. Other database might have subtle differences in behavior, but the concept holds good no matter what.
You can index these fields, but it depends:
You can (mostly) assume that created_at is already ordered, so indexing it might be unnecessary; but that depends on your app.
In any case you should index followed_id (unless it's the primary key).
Let's say users have 1 - n accounts in a system. When they query the database, they may choose to select from m acounts, with m between 1 and n. Typically the SQL generated to fetch their data is something like
SELECT ... FROM ... WHERE account_id IN (?, ?, ..., ?)
So depending on the number of accounts a user has, this will cause a new hard-parse in Oracle, and a new execution plan, etc. Now there are a lot of queries like that and hence, a lot of hard-parses, and maybe the cursor/plan cache will be full quite early, resulting in even more hard-parses.
Instead, I could also write something like this
-- use any of these
CREATE TYPE numbers AS VARRAY(1000) of NUMBER(38);
CREATE TYPE numbers AS TABLE OF NUMBER(38);
SELECT ... FROM ... WHERE account_id IN (
SELECT column_value FROM TABLE(?)
)
-- or
SELECT ... FROM ... JOIN (
SELECT column_value FROM TABLE(?)
) ON column_value = account_id
And use JDBC to bind a java.sql.Array (i.e. an oracle.sql.ARRAY) to the single bind variable. Clearly, this will result in fewer hard-parses and fewer cursors in the cache for functionally equivalent queries. But is there any general performance drawback, or any other issues that I might run into?
E.g.: does bind variable peeking work in a similar fashion for varrays or nested tables? Because the amount of data associated with each account may differ greatly.
I'm using Oracle 11g in this case, but I think the question is interesting for any Oracle version.
I suggest you try a plain old join like in
SELECT Col1, Col2
FROM ACCOUNTS ACCT,
TABLE TAB
WHERE ACCT.User = :ParamUser
AND TAB.account_id = ACCT.account_id;
An alternative could be a table subquery
SELECT Col1, Col2
FROM (
SELECT account_id
FROM ACCOUNTS
WHERE User = :ParamUser
) ACCT,
TABLE TAB
WHERE TAB.account_id = ACCT.account_id;
or a where subquery
SELECT Col1, Col2
FROM TABLE TAB
WHERE TAB.account_id IN
(
SELECT account_id
FROM ACCOUNTS
WHERE User = :ParamUser
);
The first one should be better for performance, but you had better check them all with explain plan.
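A quick way to compare them is EXPLAIN PLAN plus DBMS_XPLAN, the same pattern used elsewhere on this page; for example, for the first alternative (same placeholder table and column names as above):
EXPLAIN PLAN FOR
SELECT Col1, Col2
FROM ACCOUNTS ACCT,
TABLE TAB
WHERE ACCT.User = :ParamUser
AND TAB.account_id = ACCT.account_id;

SELECT * FROM TABLE(dbms_xplan.display);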
Looking at V$SQL_BIND_CAPTURE in a 10g database, I have a few rows where the datatype is VARRAY or NESTED_TABLE; the actual bind values were not captured. In an 11g database, there is just one such row, but it also shows that the bind value is not captured. So I suspect that bind value peeking essentially does not happen for user-defined types.
In my experience, the main problem you run into using nested tables or varrays in this way is that the optimizer does not have a good estimate of the cardinality, which could lead it to generate bad plans. But, there is an (undocumented?) CARDINALITY hint that might be helpful. The problem with that is, if you calculate the actual cardinality of the nested table and include that in the query, you're back to having multiple distinct query texts. Perhaps if you expect that most or all users will have at most 10 accounts, using the hint to indicate that as the cardinality would be helpful. Of course, I'd try it without the hint first, you may not have an issue here at all.
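For example, with the nested-table bind from the question it might look something like this (the alias and the value 10 are illustrative, and as noted the hint is undocumented):
SELECT ... FROM ... WHERE account_id IN (
SELECT /*+ CARDINALITY(t, 10) */ column_value FROM TABLE(?) t
)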
(I also think that perhaps Miguel's answer is the right way to go.)
For a medium-sized list (several thousand items) I would use this approach:
First: generate a prepared statement with an XMLTABLE joined to your main table.
For instance:
String myQuery = "SELECT ...
+" FROM ACCOUNTS A,"
+ "XMLTABLE('tab/row' passing XMLTYPE(?) COLUMNS id NUMBER path 'id') t
+ "WHERE A.account_id = t.id"
Then loop through your data and build a StringBuffer with this content:
StringBuffer idList = new StringBuffer("<tab><row><id>101</id></row><row><id>907</id></row> ...</tab>");
Finally, prepare and execute your statement, then fetch the results:
PreparedStatement stmt = connection.prepareStatement(myQuery); // 'connection' is your java.sql.Connection
stmt.setString(1, idList.toString());
ResultSet rs = stmt.executeQuery();
while (rs.next()) {...}
With this approach it is also possible to pass a multi-valued list, as in the select statement
SELECT * FROM TABLE t WHERE (t.COL1, t.COL2) in (SELECT X.COL1, X.COL2 FROM X);
In my experience performances are pretty good, and the approach is flexible enough to be used in very complex query scenarios.
The only limit is the size of the string passed to the DB, but I suppose it is possible to use a CLOB in place of a String for an arbitrarily long XML wrapper around the input list.
This problem of binding a variable number of items into an IN list seems to come up a lot, in various forms. One option is to concatenate the IDs into a comma-separated string and bind that, then use a bit of a trick to split it into a table you can join against, e.g.:
with bound_inlist
as
(
select
substr(txt,
instr (txt, ',', 1, level ) + 1,
instr (txt, ',', 1, level+1) - instr (txt, ',', 1, level) -1 )
as token
from (select ','||:txt||',' txt from dual)
connect by level <= length(:txt)-length(replace(:txt,',',''))+1
)
select *
from bound_inlist a, actual_table b
where a.token = b.token
Bind variable peeking is going to be a problem though.
Does the query plan actually change for larger numbers of accounts, i.e. would it be more efficient to move from an index to a full table scan in some cases, or is it borderline? As someone else suggested, you could use the CARDINALITY hint to indicate how many IDs are being bound; the following test case shows this actually works:
create table actual_table (id integer, padding varchar2(100));
create unique index actual_table_idx on actual_table(id);
insert into actual_table
select level, 'this is just some padding for '||level
from dual connect by level <= 1000;
explain plan for
with bound_inlist
as
(
select /*+ CARDINALITY(10) */
substr(txt,
instr (txt, ',', 1, level ) + 1,
instr (txt, ',', 1, level+1) - instr (txt, ',', 1, level) -1 )
as token
from (select ','||:txt||',' txt from dual)
connect by level <= length(:txt)-length(replace(:txt,',',''))+1
)
select *
from bound_inlist a, actual_table b
where a.token = b.id;
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 840 | 2 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 10 | 840 | 2 (0)| 00:00:01 |
| 3 | VIEW | | 10 | 190 | 2 (0)| 00:00:01 |
|* 4 | CONNECT BY WITHOUT FILTERING| | | | | |
| 5 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
|* 6 | INDEX UNIQUE SCAN | ACTUAL_TABLE_IDX | 1 | | 0 (0)| 00:00:01 |
| 7 | TABLE ACCESS BY INDEX ROWID | ACTUAL_TABLE | 1 | 65 | 0 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------
Another option is to always use n bind variables in every query. Use null for m+1 to n.
Oracle ignores repeated items in the expression_list. Your queries will perform the same way and there will be fewer hard parses. But there will be extra overhead to bind all the variables and transfer the data. Unfortunately I have no idea what the overall effect on performance would be; you'd have to test it.
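As a sketch of that approach with n fixed at, say, 8 (the number is arbitrary): bind the real account ids to the first m positions and NULL to the rest, since an IN list never matches NULL:
SELECT ... FROM ... WHERE account_id IN (?, ?, ?, ?, ?, ?, ?, ?)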