Oracle Date index is slow. Query is 300 times faster without it - performance

I had an Oracle query as below that took 10 minutes or longer to run:
select
r.range_text as duration_range,
nvl(count(c.call_duration),0) as calls,
nvl(SUM(call_duration),0) as total_duration
from
call_duration_ranges r
left join
big_table c
on c.call_duration BETWEEN r.range_lbound AND r.range_ubound
and c.aaep_src = 'MAIN_SOURCE'
and c.calltimestamp_local >= to_date('01-02-2014 00:00:00' ,'dd-MM-yyyy HH24:mi:ss')
AND c.calltimestamp_local <= to_date('28-02-2014 23:59:59','dd-MM-yyyy HH24:mi:ss')
and c.destinationnumber LIKE substr( 'abc:1301#company.com:5060;user=phone',1,8) || '%'
group by
r.range_text
order by
r.range_text
If I changed the date part of the query to:
(c.calltimestamp_local+0) >= to_date('01-02-2014 00:00:00' ,'dd-MM-yyyy HH24:mi:ss')
(AND c.calltimestamp_local+0) <= to_date('28-02-2014 23:59:59','dd-MM-yyyy HH24:mi:ss')
It runs in 2 seconds. I did this based on another post to avoid using the date index. Seems counter intuitive though--the index slowing things down so much.
Ran the explain plan and it seems identical between the new and updated query. Only difference is the the MERGE JOIN operation is 16,269 bytes in the old query and 1,218 bytes in the new query. Actually cardinality is higher in the old query as well. And I actually don't see an "INDEX" operation on the old or new query in the explain plan, just for the index on the destinationnumber field.
So why is the index slowing down the query so much? What can I do to the index--don't think using the "+0" is the best solution going forward...
Querying for two days of data, suppressing use of destinationnumber index:
0 SELECT STATEMENT ALL_ROWS 329382 1218 14
1 SORT GROUP BY 329382 1218 14
2 MERGE JOIN OUTER 329381 1218 14
3 SORT JOIN 4 308 14
4 TABLE ACCESS FULL CALL_DURATION_RANGES ANALYZED 3 308 14
5 FILTER
6 SORT JOIN 329377 65 1
7 TABLE ACCESS BY GLOBAL INDEX ROWID BIG_TABLE ANALYZED 329376 65 1
8 INDEX RANGE SCAN IDX_CDR_CALLTIMESTAMP_LOCAL ANALYZED 1104 342104
Querying for 2 days using destinationnumber index:
0 SELECT STATEMENT ALL_ROWS 11 1218 14
1 SORT GROUP BY 11 1218 14
2 MERGE JOIN OUTER 10 1218 14
3 SORT JOIN 4 308 14
4 TABLE ACCESS FULL CALL_DURATION_RANGES ANALYZED 3 308 14
5 FILTER
6 SORT JOIN 6 65 1
7 TABLE ACCESS BY GLOBAL INDEX ROWID BIG_TABLE ANALYZED 5 65 1
8 INDEX RANGE SCAN IDX_DESTINATIONNUMBER_PART ANALYZED 4 4
Querying for one month, suppressing destinationnumber index--full scan:
0 SELECT STATEMENT ALL_ROWS 824174 1218 14
1 SORT GROUP BY 824174 1218 14
2 MERGE JOIN OUTER 824173 1218 14
3 SORT JOIN 4 308 14
4 TABLE ACCESS FULL CALL_DURATION_RANGES ANALYZED 3 308 14
5 FILTER
6 SORT JOIN 824169 65 1
7 PARTITION RANGE ALL 824168 65 1
8 TABLE ACCESS FULL BIG_TABLE ANALYZED 824168 65 1

Seems counter intuitive though--the index slowing things down so much.
Counter-intuitive only if you don't understand how indexes work.
Indexes are good for retrieving individual rows. They are not suited to retrieving large numbers of records. You haven't bothered to provide any metrics but it seems likely your query is touching a large number of rows. In which case a full table scan or other set=based operation will be more much efficient.
Tuning date range queries is tricky, because it's very hard for the database to know how many records lie between the two bounds, no matter how up-to-date our statistics are. (Even more tricky to tune when the date bounds can vary - one day is a different matter from one month or one year.) So often we need to help the optimizer by using our knowledge of our data.
don't think using the "+0" is the best solution going forward...
Why not? People have been using that technique to avoid using an index in a specific query for literally decades.
However, there are more modern solutions. The undocumented cardinality hint is one:
select /*+ cardinality(big_table,10000) */
... should be enough to dissuade the optimizer from using an index - provided you have accurate statistics gathered for all the tables in the query.
Alternatively you can force the optimizer to do a full table scan with ...
select /*+ full(big_table) */
Anyway, there's nothing you can do to the index to change the way databases work. You could make things faster with partitioning, but I would guess if your organisation had bought the Partitioning option you'd be using it already.

These are the reasons using an index slows down a query:
A full tablescan would be faster. This happens if a substantial fraction of rows has to be retrieved. The concrete numbers depend on various factors, but as a rule of thumb in common situations using an index is slower if you retrieve more than 10-20% of your rows.
Using another index would be even better, because fewer rows are left after the first stage. Using a certain index on a table usually means that other indexes cannot be used.
Now it is the optimizers job to decide which variant is the best. To perform this task, he has to guess (among other things) how much rows are left after applying certain filtering clauses. This estimate is based on the tables statistics, and is usually quite ok. It even takes into account skewed data, but it might be off if either your statitiscs are outdated or you have a rather uncommon distribution of data. For example, if you computed your statistics before the data of february was inserted in your example, the optimizer might wrongly conclude that there are only few (if any) rows left after applaying the date range filter.
Using combined indexes on several columns might also be an option dependent on your data.
Another note on the "skewed data issue": There are cases the optimizer detects skewed data in column A if you have an index on cloumn A but not if you only have a combined index on columns A and B, because the combination might make the distribution more even. This is one of the few cases where an index on A,B does not make an index on A redundant.
APCs answer show how to use hints to direct the optimizer in the right direction if it still produces wrong plans even with right statistics.

Related

Oracle JOIN VS PARTITION BY

I maintain a table in Oracle that contains several hundred thousand lines of code, including a priority column, which indicates for each line its importance according to the needs of the system.
ID
BRAND
COLOR
VALUE
SIZE
PRIORITY
EFFECTIVE_DATE_FROM
EFFECTIVE_DATE_FROM
1
BL
BLUE
58345
12
1
10/07/2022
NULL
2
TK
BLACK
4455
1
1
10/07/2022
NULL
3
TK
RED
16358
88
2
11/01/2022
NULL
4
WRA
RED
98
10
6
18/07/2022
NULL
5
BL
BLUE
20942
18
7
02/06/2022
NULL
At any given moment thousands more rows may enter the table, and it is necessary to SELECT from it the 1000 rows with the highest priority.
Although the naive solution is to SELECT using ORDER BY PRIORITY ASC, we find that the process takes a long time when the table contains a very large amount of rows (say over 2 million records).
The solutions proposed so far are to divide the table into 2 different tables, so that in advance the records with priority 1 will be entered into Table A, and the rest of the records will be entered in Table B, and then we will SELECT using UNION between the two tables.
This way we save the ORDER BY process since Table A always contains priority 1, and it remains to sort only the data in Table B, which is expected to be smaller than the overall large table.
On the other hand, it was also suggested to leave the large table in place, and perform the SELECT using PARTITION BY on the priority column.
I searched the web for differences in speed or efficiency between the two options but did not find one, and I am debating how to do it. So which of the options is preferable if we focus on the efficiency and complexity of time?

Force partition pruning on Oracle

I have a query similar to this
select *
from small_table A
inner join huge_table B on A.DATE =B.DATE
The huge_table is partitioned by DATE, and the PK is DATE, some_id and some_other_id (so the join not is done by pk index).
small_table just contains a few dates.
The total cost of the SQL is 48 minutes
By some reason the explain plan give me a "PARTITION RANGE (ALL)" with a high numbers on cardinality. Looks like access to the full table, not just the partitions indicated by small_table.DATE
If I put the SQL inside a loop and do
for o in (select date from small_table)
loop
select *
from small_table A
inner join huge_table B on A.DATE =B.DATE
where B.DATE=O.DATE
end loop;
Only takes 2 minutes 40 seconds (the full loop).
There is any way to force the partition pruning on Oracle 12c?
Additional info:
small_table has 37 records for 13 different dates. huge_table has 8,000 million of records with 179 dates/partitions. The SQL needs one field from small_table, but I can tweak the SQL to not use it
Update:
With the use_nl hint, now the cardinality show in the execution plan is more accurate and the execution time downs from 48 minutes to 4 minutes.
select /* use_nl(B) */*
from small_table A
inner join huge_table B on A.DATE =B.DATE
This seems like the problem:
"small_table have 37 registries for 13 different dates. huge_table has 8.000 millions of registries with 179 dates/partitions....
The SQL need one field from small_table, but I can tweak the SQL to not use it "
According to the SQL you posted you're joining the two tables on just their DATE columns with no additional conditions. If that's really the case you are generating a cross join in which each partition of huge_table is joined to small_table 2-3 times. So your result set may be much large than you're expecting, which means more database effort, which means more time.
The other thing to notice is that the cardinality of small_table to huge_table partitions is about 1:4; the optimizer doesn't know that there are really only thirteen distinct huge_table partitions in play.
Optimization ought to be a science and this is more guesswork than anything but try this:
select B.*
from ( select /*+ cardinality(t 13) */
distinct t.date
from small_table t ) A
inner join huge_table B
on A.DATE =B.DATE
This should communicate to the optimizer that only a small percentage of the huge_table partitions are required, which may make it choose partition pruning. Also it removes that Cartesian product, which should improve performance too. Obviously you will need to apply that tweak you mentioned, to remove the need to query anything else from small_table.

Join Performance Not Good enough - Oracle SQL Query Optimisation

I am investigating a problem where our application takes too much time to get data from Oracle Database. In my investigation, I found that the slowness of the query traces to the join between tables and because of the aggregate function- SUM.
This may look simple but I am not a good with SQL query optimization.
The query is below
SELECT T1.TONNES, SUM(R.TONNES) AS TOTAL_TONNES
FROM
RECLAIMED R ,
(SELECT DELIVERY_OUT_ID, SUM(TONNES) AS TONNES FROM RECLAIMED WHERE DELIVERY_IN_ID=53773 GROUP BY DELIVERY_OUT_ID) T1
where
R.DELIVERY_OUT_ID = T1.DELIVERY_OUT_ID
GROUP BY
T1.TONNES
SUM(R.TONNES) is the total tonnes per delivery out.
SUM(TONNES) is the total tonnes per delivery in.
My table looks like
I have 16 million entries in this table, and by trying multiple delivery_in_id's by average I am getting about 6 seconds for the query to comeback.
I have similar database (complete copy but only have 4 million entries) and when the same query is applied I am getting less than 1 seconds.
They have both the same indexes so I am confident that index is not a problem.
I am certain that it is just the data, it is heavy on the first database(16 million). I have a feeling that when this query is optimized then the problem will be solved.
Open for suggestions : )
Are the two DB on the same server? If it's not, first, compare the computer configuration, settings and running applications.
If there is no differences, you can try to check if your have NULL values in the column you want to SUM. Use NVL-function to improve your query if there are some.
Also, you may "Analyse index" (or "Rebuild Index"). It cleans up the index. (it's quite fast and safe for your data).
If it is not helping, look if the TABLESPACE of your table is not full. It might have some impact... but I am not sure.
;-)
I've solved the performance problem by updating the stored procedure. It is optimized in a way by adding a filter in the first table before joining the second table. Below is the outcome stored procedure
SELECT R.DELIVERY_IN_ID, R.DELIVERY_OUT_ID, SUM(R.TONNES),
(SELECT SUM(TONNES) AS TONNES FROM RECLAIMED WHERE DELIVERY_OUT_ID=R.DELIVERY_OUT_ID) AS TOTAL_TONNES
FROM
CTSBT_RECLAIMED R
WHERE DELIVERY_IN_ID=53733
GROUP BY DELIVERY_IN_ID, R.DELIVERY_OUT_ID
The result in timing/performance is huge for my case since I am joining a huge table(16M).This query now goes for less than a second. The slowness, I am suspecting is due to 1 table having no index(see T1) and even though in my case it is only about 20 items, it is enough to slow the query down because it compares it to 16 million entries.
The optimized query does the filtering of this 16 million and merge to T1 after.
Should there be a better way how to optimized this? Probably. But I am happy with this result and solved what I intended to solve. Now moving on.
Thanks for those who commented.

performance for sum oracle

I have to sum a huge number of data with aggregation and where clause, using this query
what I am doing is like this : I have three tables one contains terms the second contains user terms , and the third contains correlation factor between term and user term.
I want to calculate the similarity between the sentence that that user inserted with an already existing sentences, and take the results greater than .5 by summing the correlation factor between sentences' terms
The problem is that this query takes more than 15 min. because I have huge tables
any suggestions to improve performance please?
insert into PLAG_SENTENCE_SIMILARITY
SELECT plag_TERMS.SENTENCE_ID ,plag_User_TERMS.SENTENCE_ID,
least( sum( plag_TERM_CORRELATIONS3.CORRELATION_FACTOR)/ plag_terms.sentence_length,
sum (plag_TERM_CORRELATIONS3.CORRELATION_FACTOR)/ plag_user_terms.sentence_length),
plag_TERMs.isn,
plag_user_terms.isn
FROM plag_TERM_CORRELATIONS3,
plag_TERMS,
Plag_User_TERMS
WHERE ( Plag_TERMS.TERM_ROOT = Plag_TERM_CORRELATIONS3.TERM1
AND Plag_User_TERMS.TERM_ROOT = Plag_TERM_CORRELATIONS3.TERM2
AND Plag_User_Terms.ISN=123)
having
least( sum( plag_TERM_CORRELATIONS3.CORRELATION_FACTOR)/ plag_terms.sentence_length,
sum (plag_TERM_CORRELATIONS3.CORRELATION_FACTOR)/ plag_user_terms.sentence_length) >0.5
group by (plag_User_TERMS.SENTENCE_ID,plag_TERMS.SENTENCE_ID , plag_TERMs.isn, plag_terms.sentence_length,plag_user_terms.sentence_length, plag_user_terms.isn);
plag_terms contains more than 50 million records and plag_correlations3 contains 500000
If you have a sufficient amount of free disk space, then create a materialized view
over the join of the three tables
fast-refreshable on commit (don't use the ANSI join syntax here, even if tempted to do so, or the mview won't be fast-refreshable ... a strange bug in Oracle)
with query rewrite enabled
properly physically organized for quick calculations
The query rewrite is optional. If you can modify the above insert-select, then you can just select from the materialized view instead of selecting from the join of the three tables.
As for the physical organization, consider
hash partitioning by Plag_User_Terms.ISN (with a sufficiently high number of partitions; don't hesitate to partition your table with e.g. 1024 partitions, if it seems reasonable) if you want to do a bulk calculation over all values of ISN
single-table hash clustering by Plag_User_Terms.ISN if you want to retain your calculation over a single ISN
If you don't have a spare disk space, then just hint your query to
either use nested loops joins, since the number of rows processed seems to be quite low (assumed by the estimations in the execution plan)
or full-scan the plag_correlations3 table in parallel
Bottom line: Constrain your tables with foreign keys, check constraints, not-null constraints, unique constraints, everything! Because Oracle optimizer is capable of using most of these informations to its advantage, as are the people who tune SQL queries.

How to force oracle to use index range scan?

I have a series of extremely similar queries that I run against a table of 1.4 billion records (with indexes), the only problem is that at least 10% of those queries take > 100x more time to execute than others.
I ran an explain plan and noticed that the for the fast queries (roughly 90%) Oracle is using an index range scan; on the slow ones, it's using a full index scan.
Is there a way to force Oracle to do an index range scan?
To "force" Oracle to use an index range scan, simply use an optimizer hint INDEX_RS_ASC. For example:
CREATE TABLE mytable (a NUMBER NOT NULL, b NUMBER NOT NULL, c CHAR(10)) NOLOGGING;
INSERT /*+ APPEND */ INTO mytable(a,b,c)
SELECT level, mod(level,100)+1, 'a' FROM dual CONNECT BY level <= 1E6;
CREATE INDEX myindex_ba ON mytable(b, a);
EXECUTE dbms_stats.gather_table_stats(NULL,'mytable');
SELECT /*+ FULL(m) */ b FROM mytable m WHERE b=10; -- full table scan
SELECT /*+ INDEX_RS_ASC(m) */ b FROM mytable m WHERE b=10; -- index range scan
SELECT /*+ INDEX_FFS(m) */ b FROM mytable m WHERE b=10; -- index fast full scan
Whether this will make your query actually run faster depends on many factors like the selectivity of the indexed value or the physical order of the rows in your table. For instance, if you change the query to WHERE b BETWEEN 10 AND <xxx>, the following costs appear in the execution plans on my machine:
b BETWEEN 10 AND 10 20 40 80
FULL 749 750 751 752
INDEX_RS_ASC 29 325 865 1943
INDEX_FFS 597 598 599 601
If you change the query very slightly to not only select the indexed column b, but also other, non-index columns, the costs change dramatically:
b BETWEEN 10 AND 10 20 40 80
FULL 749 750 751 754
INDEX_RS_ASC 3352 40540 108215 243563
INDEX_FFS 3352 40540 108215 243563
I suggest the following approach:-
Get an explain plan on the slow statement
Using an INDEX hint, get an explain plan on using the index
You'll notice that the cost of the INDEX plan is greater. This is why Oracle is not choosing the index plan. The cost is Oracle's estimate based on the statistics it has and various assumptions.
If the estimated cost of a plan is greater, but it actually runs quicker then the estimate is wrong. Your job is to figure out why the estimate is wrong and correct that. Then Oracle will choose the right plan for this statement and others on it's own.
To figure out why it's wrong, look at the number of expected rows in the plan. You will probably find one of these is an order of magnitude out. This might be due to non-uniformly distributed column values, old statistics, columns that corelate with each other etc.
To resolve this, you can get Oracle to collect better statistics and hint it with better starting assumptions. Then it will estimate accurate costs and come up with the fastest plan.
If you post more information I might be able to comment further.
If you want to know why the optimizer takes the decisions it does you need to use the 10053 trace.
SQL> alter session set events '10053 trace name context forever, level 1';
Then run explain plans for a sample fast query and a sample slow query. In the user dump directory you will get trace files detailing the decision trees which the CBO goes through. Somewhere in those files you will find the reasons why it chooses a full index scan over an index range scan.
I'm not saying the trace files are an easy read. The best resource for understanding them is Wolfgang Breitling's excellent whitepaper "A look under the hood of CBO" (PDF)
you can use oracle sql hints. you can force to use specific index or exclude index
check the documentation
http://psoug.org/reference/hints.html
http://www.adp-gmbh.ch/ora/sql/hints/index.html
like
select /*+ index(scott.emp ix_emp) */ from scott.emp emp_alias
I have seen hint is ignored by Oracle.
Recently, our DBA uses "optimizer_index_cost_adj" and it utilized the index.
This is Oracle parameter but you could use it as session level.
100 is default value and we used 10 as the parameter.

Resources