We are optimizing performance and want to create a materialized view (several million records) built on joins of a handful of tables. This view will be used to show users documents in folders with a delay of no more than a few (3-5) seconds.
I suppose it must be an out-of-place MV with a refresh interval of a few seconds.
Is this an acceptable solution from a database point of view?
The view will be something like this:
SELECT *
FROM documents this_
LEFT OUTER JOIN account_statements this_1_
ON this_.Id = this_1_.FK_Document
LEFT OUTER JOIN contracts this_2_ ON this_.Id = this_2_.FK_Document
LEFT OUTER JOIN pension_agreements this_3_
ON this_.Id = this_3_.FK_Contract
LEFT OUTER JOIN dead this_4_ ON this_.Id = this_4_.FK_Document
LEFT OUTER JOIN pay_orders this_5_ ON this_.Id = this_5_.FK_Document
LEFT OUTER JOIN pay_registers this_6_
ON this_.Id = this_6_.FK_Document
LEFT OUTER JOIN pocards this_7_ ON this_.Id = this_7_.FK_Document
LEFT OUTER JOIN ransom_agreements this_8_
ON this_.Id = this_8_.FK_Document
LEFT OUTER JOIN successor_statements this_9_
ON this_.Id = this_9_.FK_Document
INNER JOIN document_treenodes treenodes14_
ON this_.Id = treenodes14_.fk_document
INNER JOIN treenodes treenode2_
ON treenodes14_.fk_treenode = treenode2_.Id
LEFT OUTER JOIN registration_cards regcard1_
ON this_.fk_registration_card = regcard1_.Id
LEFT OUTER JOIN employees todirectem12_
ON regcard1_.to_direct = todirectem12_.Id
LEFT OUTER JOIN REG_CARD_STATUSES regcardsta11_
ON regcard1_.status = regcardsta11_.Id
LEFT OUTER JOIN filestorages filestorag10_
ON this_.fk_file = filestorag10_.Id
LEFT OUTER JOIN actions holdaction4_
ON this_.fk_hold = holdaction4_.Id
LEFT OUTER JOIN employees holdemploy5_
ON holdaction4_.fk_operator = holdemploy5_.Id
LEFT OUTER JOIN actions doneaction6_
ON this_.fk_done = doneaction6_.Id
LEFT OUTER JOIN employees doneemploy7_
ON doneaction6_.fk_operator = doneemploy7_.Id
LEFT OUTER JOIN actions signaction8_
ON this_.fk_signed = signaction8_.Id
LEFT OUTER JOIN employees signemploy9_
ON signaction8_.fk_operator = signemploy9_.Id
LEFT OUTER JOIN actions scanaction3_
ON this_.fk_scan = scanaction3_.Id
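For reference, the kind of MV definition and scheduled refresh I have in mind is roughly the following. This is only a sketch: the object names and the 5-second interval are placeholders, out-of-place refresh needs 12c or newer, and the MV body is the join query shown above.
CREATE MATERIALIZED VIEW documents_in_folders_mv
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
AS
  SELECT ...;  -- the full join query shown above

BEGIN
  -- Re-run a complete, out-of-place refresh every 5 seconds.
  DBMS_SCHEDULER.create_job(
    job_name        => 'REFRESH_DOCUMENTS_IN_FOLDERS_MV',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN DBMS_MVIEW.REFRESH(''DOCUMENTS_IN_FOLDERS_MV'', method => ''C'', out_of_place => TRUE); END;',
    repeat_interval => 'FREQ=SECONDLY;INTERVAL=5',
    enabled         => TRUE);
END;
/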
UPDATE
The bottleneck is in the following:
SELECT *
FROM documents this_
INNER JOIN document_treenodes treenodes14_ ON this_.Id = treenodes14_.fk_document
INNER JOIN treenodes treenode2_ ON treenodes14_.fk_treenode = treenode2_.Id
LEFT OUTER JOIN registration_cards regcard1_ ON this_.fk_registration_card = regcard1_.Id
WHERE (
regcard1_.status IS NULL OR
(
NOT (
regcard1_.status = 3 /* :p0 */)
AND
NOT (
regcard1_.status = 4 /* :p1 */)
)
)
AND
this_.fk_deleted IS NULL AND
(
this_.isdelete IS NULL OR
this_.isdelete = 0 /* :p2 */)
AND
treenode2_.Id = 1235 /* :p3 */ AND
this_.fk_done IS NULL AND
(
regcard1_.status IS NULL OR
NOT (
regcard1_.status = 1 /* :p4 */)
)
ORDER BY this_.Id DESC
OFFSET 0 ROWS
FETCH FIRST 50 /* :p5 */ ROWS ONLY
And the plan is:
Plan hash value: 3579815467
-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 105K| 85M| | 50518 (1)| 00:00:04 |
|* 1 | VIEW | | 105K| 85M| | 50518 (1)| 00:00:04 |
|* 2 | WINDOW SORT PUSHED RANK | | 105K| 13M| 14M| 50518 (1)| 00:00:04 |
|* 3 | HASH JOIN RIGHT OUTER | | 105K| 13M| | 48503 (1)| 00:00:04 |
| 4 | INDEX FULL SCAN | REG_CARD_STATUSES_PK | 4 | 12 | | 1 (0)| 00:00:01 |
|* 5 | FILTER | | | | | | |
|* 6 | HASH JOIN RIGHT OUTER | | 105K| 13M| 4048K| 48502 (1)| 00:00:04 |
| 7 | TABLE ACCESS FULL | REGISTRATION_CARDS | 84317 | 3046K| | 171 (2)| 00:00:01 |
|* 8 | HASH JOIN | | 183K| 17M| 3936K| 47339 (1)| 00:00:04 |
|* 9 | INDEX FAST FULL SCAN| DOCUMENT_TREENODE_PK | 183K| 1788K| | 1872 (2)| 00:00:01 |
|* 10 | TABLE ACCESS FULL | DOCUMENTS | 5064K| 425M| | 24635 (2)| 00:00:02 |
-----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("from$_subquery$_010"."rowlimit_$$_rownumber"<=0+50 AND
"from$_subquery$_010"."rowlimit_$$_rownumber">0)
2 - filter(ROW_NUMBER() OVER ( ORDER BY INTERNAL_FUNCTION("THIS_"."ID") DESC )<=0+50)
3 - access("REGCARD1_"."STATUS"="REGCARDSTA11_"."ID"(+))
5 - filter(("REGCARD1_"."STATUS" IS NULL OR "REGCARD1_"."STATUS"<>3 AND
"REGCARD1_"."STATUS"<>4) AND ("REGCARD1_"."STATUS" IS NULL OR "REGCARD1_"."STATUS"<>1))
6 - access("THIS_"."FK_REGISTRATION_CARD"="REGCARD1_"."ID"(+))
8 - access("THIS_"."ID"="TREENODES14_"."FK_DOCUMENT")
9 - filter("TREENODES14_"."FK_TREENODE"=1235)
10 - filter("THIS_"."FK_DONE" IS NULL AND ("THIS_"."ISDELETE"=0 OR "THIS_"."ISDELETE" IS NULL)
AND "THIS_"."FK_DELETED" IS NULL)
I took the liberty of 'reformatting' your query (BTW: the original had a bit of a copy-paste error, some parts looked doubled near the end)
The 'layout' will have zero effect on the actual execution times, but it makes it easier for me to understand what you're doing (simply because I'm used to my own style, I'm not claiming it's better, it's simply what I am used to)
Anyway, if I understood correctly and didn't mess up the brackets, then this should be equivalent to your query:
SELECT *
FROM documents this_
INNER JOIN document_treenodes treenodes14_
ON treenodes14_.fk_document = this_.Id
INNER JOIN treenodes treenode2_
ON treenode2_.Id = treenodes14_.fk_treenode
AND treenode2_.Id = 1235 /* :p3 */
LEFT OUTER JOIN registration_cards regcard1_
ON regcard1_.Id = this_.fk_registration_card
WHERE this_.fk_deleted IS NULL
AND this_.fk_done IS NULL
AND (
this_.isdelete IS NULL OR this_.isdelete = 0 /* :p2 */
)
AND (
regcard1_.status IS NULL OR regcard1_.status NOT IN (3 /* :p0 */, 4 /* :p1 */, 1 /* :p4 */)
)
ORDER BY this_.Id DESC
OFFSET 0 ROWS
FETCH FIRST 50 /* :p5 */ ROWS ONLY
I doubt this change will make much difference, but it might change the way the system approaches steps 8 & 9 in the current plan. Worth a try =)
Anyway, what I 'learned' from the query is that you seem to want all records that have no matching [registration_cards] record, but if there is one, then it should not have status 3, 4 or 1 (:p0, :p1, :p4 respectively).
=> wouldn't this be equivalent to saying that you want all [documents] records for which there is no matching [registration_cards] record that has status 3, 4 or 1?
SELECT *
FROM documents this_
INNER JOIN document_treenodes treenodes14_
ON treenodes14_.fk_document = this_.Id
INNER JOIN treenodes treenode2_
ON treenode2_.Id = treenodes14_.fk_treenode
AND treenode2_.Id = 1235 /* :p3 */
WHERE this_.fk_deleted IS NULL
AND this_.fk_done IS NULL
AND (
this_.isdelete IS NULL OR this_.isdelete = 0 /* :p2 */
)
AND NOT EXISTS ( SELECT *
FROM registration_cards regcard1_
WHERE regcard1_.Id = this_.fk_registration_card
AND regcard1_.status IN (3 /* :p0 */, 4 /* :p1 */, 1 /* :p4 */) )
ORDER BY this_.Id DESC
OFFSET 0 ROWS
FETCH FIRST 50 /* :p5 */ ROWS ONLY
Assuming [registration_cards].Id is the PK of the table, or there is a covering index on the status and Id fields, this might be slightly faster. Like I said before, I'm under the impression most of the time is lost in sorting the result-set, but then again I might be totally misinterpreting the explain plan. Googling around actually seems to inform me that the explain plan is but guesswork and not 'the real deal'... sigh, at times I really pity you poor Oracle users =P
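For what it's worth, such a covering index could look like this (the index name is made up by me):
CREATE INDEX ix_regcards_id_status ON registration_cards (Id, status);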
Related
I have 3 tables to join to get the output in the below format.
My table 1 is like:
--------------------------------------------------------
T1_ID1 | T1_ID2 | NAME
--------------------------------------------------------
123 | T11231 | TestName11
123 | T11232 | TestName12
234 | T1234 | TestName13
345 | T1345 | TestName14
--------------------------------------------------------
My table 2 is like:
--------------------------------------------------------
T2_ID1 | T2_ID2 | NAME
--------------------------------------------------------
T11231 | T21231 | TestName21
T11232 | T21232 | TestName21
T1234 | T2234 | TestName22
--------------------------------------------------------
My table 3 is like:
----------------------------------------------------------
T3_ID1 | TYPE | REF
----------------------------------------------------------
T21231 | 1 | 123456
T21232 | 2 | 1234#test.com
T2234 | 2 | 123#test.com
----------------------------------------------------------
My desired output is:
------------------------------------------------------
T1_ID1 | PHONE | EMAIL
------------------------------------------------------
123 | 123456 | 1234#test.com
234 | | 123#test.com
345 | |
------------------------------------------------------
Requirements:
T1_ID2 of table 1 left joins with T2_ID1 of table 2.
T2_ID2 of table 2 left joins with T3_ID1 of table 3.
TYPE of table 3 is 1 if the value is a phone number and 2 if the value is an email.
My output should contain T1_ID1 of table 1 and its corresponding REF values from table 3, with phone and email in the same row.
That is, in this case, T1_ID1 with value 123 has both a phone and an email, so both are displayed in the same row of the output.
If only a phone is available for the corresponding T1_ID1, then the phone should be populated in the result with email as null, and vice versa.
If neither a phone nor an email is available, nothing should be populated.
I have tried the SQLs below, but in vain. What am I missing? Please help.
Option 1:
SELECT DISTINCT
t1.t1_id1,
t3.ref
|| (
CASE
WHEN t3.type = 1 THEN
1
ELSE
0
END
) phone,
t3.ref
|| (
CASE
WHEN t3.type = 2 THEN
1
ELSE
0
END
) email
FROM
table1 t1
LEFT JOIN table2 t2 ON t1.t1_id2 = t2.t2_id1
LEFT JOIN table3 t3 ON t2.t2_id2 = t3.t3_id1;
Option 2:
SELECT DISTINCT
t1.t1_id1,
t3.ref,
(
CASE
WHEN t3.type = 1 THEN
1
ELSE
0
END
) phone,
t3.ref,
(
CASE
WHEN t3.type = 2 THEN
1
ELSE
0
END
) email
FROM
table1 t1
LEFT JOIN table2 t2 ON t1.t1_id2 = t2.t2_id1
LEFT JOIN table3 t3 ON t2.t2_id2 = t3.t3_id1;
Option 3:
SELECT DISTINCT
t1.t1_id1,
(
CASE
WHEN t3.type = 1 THEN
1
ELSE
0
END
) phone,
(
CASE
WHEN t3.type = 2 THEN
1
ELSE
0
END
) email
FROM
table1 t1
LEFT JOIN table2 t2 ON t1.t1_id2 = t2.t2_id1
LEFT JOIN table3 t3 ON t2.t2_id2 = t3.t3_id1;
select t1_id1, max(t3.ref) phone, max(t33.ref) email
from table1
left outer join table2 on t1_id2 = t2_id1
left outer join table3 t3 on t3.t3_id1 = t2_id2 and t3.type = 1
left outer join table3 t33 on t33.t3_id1 = t2_id2 and t33.type = 2
group by t1_id1
This works if you have at most one phone and one email in table3 for each t2_id2 entry in table2.
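An alternative under the same assumption joins table3 only once and pivots TYPE with conditional aggregation. This is a sketch using the table and column names from the question:
select t1.t1_id1,
       max(case when t3.type = 1 then t3.ref end) as phone,
       max(case when t3.type = 2 then t3.ref end) as email
from table1 t1
left join table2 t2 on t2.t2_id1 = t1.t1_id2
left join table3 t3 on t3.t3_id1 = t2.t2_id2
group by t1.t1_id1;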
I have a SQL statement joining a big table (232 million records) with a GTT by index. The explain plan looks like the below:
4 NESTED LOOPS
( Estim. Costs = 439,300 , Estim. #Rows = 548,275 )
Estim. CPU-Costs = 3,642,574,678 Estim. IO-Costs = 438,956
1 INDEX FAST FULL SCAN ZTRM_REXP_PRESEL~0
( Estim. Costs = 336 , Estim. #Rows = 548,275 )
Estim. CPU-Costs = 3,432,714 Estim. IO-Costs = 336
3 TABLE ACCESS BY INDEX ROWID BATCHED TEXT_REXP_ITEM
( Estim. Costs = 1 , Estim. #Rows = 1 )
Estim. CPU-Costs = 6,637 Estim. IO-Costs = 1
Filter Predicates
2 INDEX RANGE SCAN TEXT_REXP_ITEM~Y01
( Estim. Costs = 1 , Estim. #Rows = 1 )
Search Columns: 3
Estim. CPU-Costs = 4,523 Estim. IO-Costs = 1
Access Predicates
It shows wrong estimations because of the GTT usage. The goal is to perform the nested loop between the index (2) and the GTT (1) first, and only then access the table itself (3). For some reason, the hint USE_NL_WITH_INDEX("TEXT_REXP_ITEM" "TEXT_REXP_ITEM~Y01") is simply being ignored. Any ideas why?
(1) consists of
EXPOSURE_ID
VERSION
(2) consists of
Column Name #Distinct
MANDT 1
ZZHEAD_EXPOSURE_ID 251,454
ZZHEAD_VERSION 3,217
ZZHEAD_ATTRIBUTE_DH01 1,691
EXT_ITEM_ID 823
ZZHEAD_ATTRIBUTE_LH01 3
ZZHEAD_RELEASE_STATE 1
(1) and (2) are joined by exposure_id and version fields
The same plan in text form:
| 3 | NESTED LOOPS | | 548K| 135M| 439K (1)| 00:00:18 |
| 4 | INDEX FAST FULL SCAN | ZTRM_REXP_PRESEL~0 | 548K| 16M| 336 (0)| 00:00:01 |
|* 5 | TABLE ACCESS BY INDEX ROWID BATCHED| TEXT_REXP_ITEM | 1 | 228 | 1 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | TEXT_REXP_ITEM~Y01 | 1 | | 1 (0)| 00:00:01 |
thank you
The optimizer is obeying the hint. As the docs say:
The USE_NL_WITH_INDEX hint instructs the optimizer to join the
specified table to another row source with a nested loops join using
the specified table as the inner table
In a nested loop, the outer table is the first one accessed. The inner table is the second.
So the plan uses ZTRM_REXP_PRESEL~0 as the outer table. And TEXT_REXP_ITEM as the inner table. Which is exactly what you've asked for!
Constructing a similar example and using Oracle Database 19c's hint reporting mechanism shows the hint is followed:
create table t1 (
c1 int
);
create table t2 (
c1 int, c2 varchar2(100)
);
create index i1
on t1 ( c1 );
create index i2
on t2 ( c1 );
insert into t1 values ( 1 );
insert into t2
with rws as (
select level x from dual
connect by level <= 1000
)
select x, rpad ( 'stuff', 100, 'f' )
from rws;
exec dbms_stats.gather_table_stats ( user, 't1' ) ;
exec dbms_stats.gather_table_stats ( user, 't2' ) ;
set serveroutput off
select /*+ USE_NL_WITH_INDEX ( T2 I2 ) */*
from t1
join t2
on t1.c1 = t2.c1;
select *
from table(dbms_xplan.display_cursor(null, null, 'BASIC LAST +HINT_REPORT'));
Plan hash value: 3271411982
---------------------------------------------
| Id | Operation | Name |
---------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | NESTED LOOPS | |
| 2 | NESTED LOOPS | |
| 3 | INDEX FULL SCAN | I1 |
| 4 | INDEX RANGE SCAN | I2 |
| 5 | TABLE ACCESS BY INDEX ROWID| T2 |
---------------------------------------------
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1
---------------------------------------------------------------------------
4 - SEL$58A6D7F6 / T2#SEL$1
- USE_NL_WITH_INDEX ( T2 I2 )
I have a query in which I am producing results with rows that contain 0 values. I would like to exclude any rows in which column B or C = 0. To exclude such rows, I added T2.A <> 0 (and also tried T2.A != 0). When I do this, the 0 values are replaced with NULLs, so I also added T2.A IS NOT NULL.
My results still produce rows that I do not need, which show (null), and I would like to exclude these.
SELECT
(SELECT
SUM(T2.A) as prem
FROM Table_2 T2, Table_2 T1
WHERE T2.ENT_REF = T1.ENT_REF
AND UPPER(T2.PER) = 'HURR'
AND UPPER(T2.ENT_TYPE) = 'POL'
AND T2.Cov NOT IN ('OUTPROP','COV')
AND T2.A <> 0
AND T2.A IS NOT NULL
) as B,
(SELECT
SUM(T2.A) as prem
FROM Table_2 T2, Table_2 T1
WHERE T2.ENT_REFE = T1.ENT_REF
AND UPPER(T2.PER) IN ('I', 'II', 'II')
AND UPPER(T2.ENT_TYPE) = 'POL'
AND T2.Cov NOT IN ('OUTPROP','COV')
AND T2.A <> 0
AND T2.A IS NOT NULL
) as C
Ideally the result will go from:
+----+--------+--------+
| ID | B | C |
+----+--------+--------+
| 1 | 24 | 123 |
| 2 | 65 | 78 |
| 3 | 43 | 89 |
| 3 | 0 | 0 |
| 4 | 95 | 86 |
| 5 | 43 | 65 |
| 5 | (null) | (null) |
+----+--------+--------+
To something similar to the following:
+----+-----+-----+
| ID | B | C |
+----+-----+-----+
| 1 | 24 | 123 |
| 2 | 65 | 78 |
| 3 | 43 | 89 |
| 4 | 95 | 86 |
| 5 | 43 | 65 |
+----+-----+-----+
I have also attempted distinct values, but I have other columns such as dates which are different per row. Although I need to include dates, they are not as important to me as only getting B and C columns with only values > 0. I have also tried using a GROUP BY ID statement, but I get an error that states 'ORA-00979: not a GROUP BY expression'
You have written all the conditions in the SELECT clause.
You are facing the issue because the WHERE clause decides the number of rows to be fetched, while the SELECT clause decides the values to be returned.
In your case, something like the following is happening:
Simple Example:
-- MANUAL DATA
WITH DATAA AS (
SELECT
1 KEY,
'VALS' VALUE,
1 SEQNUM
FROM
DUAL
UNION ALL
SELECT
2,
'IDEAL OPTION',
2
FROM
DUAL
UNION ALL
SELECT
10,
'EXCLUDE',
3
FROM
DUAL
)
-- QUERY OF YOUR TYPE
SELECT
(
SELECT
KEY
FROM
DATAA I
WHERE
I.KEY = 1
AND O.KEY = I.KEY
) AS KEY, -- DECIDE VALUES TO BE SHOWN
(
SELECT
KEY
FROM
DATAA I
WHERE
I.SEQNUM = 1
AND O.SEQNUM = I.SEQNUM
) AS SEQNUM -- DECIDE VALUES TO BE SHOWN
FROM
DATAA O
WHERE
O.KEY <= 2; -- DECIDES THE NUMBER OF RECORDS
OUTPUT:
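(Running the example, the outer WHERE keeps two rows, and the scalar subqueries for the second row find no match and return NULL:)
+--------+--------+
| KEY    | SEQNUM |
+--------+--------+
| 1      | 1      |
| (null) | (null) |
+--------+--------+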
If you don't want to change much logic in your query then just use additional WHERE clause outside your final query like:
SELECT <bla bla bla>
FROM <YOUR FINAL QUERY>
WHERE B IS NOT NULL AND C IS NOT NULL
Cheers!!
I guess you were on the right track, trying to group values.
In order to do that, columns (that are supposed to be distinct) will be left alone (such as ID in the following example), while the rest should be aggregated (using min, max or any other you find appropriate).
For example, since you said there's some date column you don't care much about - I mean, which one of them you'll select - take the first one (i.e. min(date_column)), and do the same with the rest. The GROUP BY clause should contain all non-aggregated columns (id in this example).
select id,
       sum(b) b,
       sum(c) c,
       min(date_column) date_column
from your_current_query
group by id
If I understand your query right, it would be much easier and more performant to avoid the lookups in the SELECT clause. Try to bring it all into one query:
SELECT * FROM (
SELECT T2.ENT_REF AS ID,
SUM(CASE WHEN UPPER(T2.PER) = 'HURR' THEN T2.A END) AS B,
SUM(CASE WHEN UPPER(T2.PER) IN ('I', 'II', 'II') THEN T2.A END) as C
FROM Table_2 T2
WHERE UPPER(T2.ENT_TYPE) = 'POL'
AND T2.Cov NOT IN ('OUTPROP','COV')
GROUP BY T2.ENT_REF
)
WHERE B IS NOT NULL
OR C IS NOT NULL
I'm having some trouble with an update statement in my Oracle database.
The query takes too much time and the temp tablespace is running out of space, but it produces the correct data.
I tried to convert the subqueries to joins, but I couldn't figure out how to do it correctly.
If someone knows how to improve the statement or how to convert it into a join, I would be really grateful.
UPDATE table1 t1
SET t1.inxdc = (SELECT sda_x
FROM table2 t2
WHERE t1.c1 = t2.c1
AND t1.c2 = t2.c2
AND t1.c3 = t2.c3
AND t1.c4 = t2.c4
AND t1.c5 = t2.c5
AND t1.c6 = t2.c6
AND t2.ident = 'K_SDA_W'
AND rownum=1)
WHERE EXISTS
(SELECT 1
FROM table2 t2
WHERE t1.c1 = t2.c1
AND t1.c2 = t2.c2
AND t1.c3 = t2.c3
AND t1.c4 = t2.c4
AND t1.c5 = t2.c5
AND t1.c6 = t2.c6
AND t2.ident = 'K_SDA_W');
edit1:
Some information for the tables
table1 PKs = c1,c2,c3,c4,c5,c6
table2 PKs = ident,c4,c5,c6, and 3 others not mentioned in the statement (c7,c8,c9)
index: besides the PKs only on table2 c1
table1 data: 12466 rows
table2 data: 194827 rows
edit2:
Execution Plan
--------------------------------------------------------------
| Id | Operation | Name |
--------------------------------------------------------------
| 0 | UPDATE STATEMENT | |
| 1 | UPDATE | table1 |
| 2 | NESTED LOOPS SEMI | |
| 3 | TABLE ACCESS FULL | table1 |
| 4 | TABLE ACCESS BY INDEX ROWID| table2 |
| 5 | INDEX RANGE SCAN | t2.c1 |
| 6 | COUNT STOPKEY | |
| 7 | TABLE ACCESS BY INDEX ROWID| table2 |
| 8 | INDEX RANGE SCAN | t2.PK |
--------------------------------------------------------------
There are very few rows in Table1, so in this particular situation you can just drop the WHERE clause and add NVL to the value returned from the subquery:
UPDATE table1 t1
SET t1.inxdc = NVL((SELECT sda_x
FROM table2 t2
WHERE t1.c1 = t2.c1
AND t1.c2 = t2.c2
AND t1.c3 = t2.c3
AND t1.c4 = t2.c4
AND t1.c5 = t2.c5
AND t1.c6 = t2.c6
AND t2.ident = 'K_SDA_W'
AND rownum=1), t1.inxdc);
In general your update should be quick; have you checked the performance of the subquery? Check whether an index (and which one) is used on table2 for the subquery (best, show us the execution plan).
I think table t2 should have an index on c1, c2, c3, c4, c5, c6, ident.
In that case the update of t1 should be much faster.
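A sketch of that index (the name is made up; put the most selective columns first if you know the data distribution):
CREATE INDEX ix_table2_c1_to_c6_ident
  ON table2 (c1, c2, c3, c4, c5, c6, ident);
And one way to express the 'convert it into a join' idea from the question is a MERGE. This is also only a sketch: note that MAX(sda_x) picks a deterministic value per key, whereas the original ROWNUM = 1 picks an arbitrary one.
MERGE INTO table1 t1
USING (
  -- One sda_x value per key combination, so the join keys are unique.
  SELECT c1, c2, c3, c4, c5, c6, MAX(sda_x) AS sda_x
  FROM table2
  WHERE ident = 'K_SDA_W'
  GROUP BY c1, c2, c3, c4, c5, c6
) t2
ON (t1.c1 = t2.c1 AND t1.c2 = t2.c2 AND t1.c3 = t2.c3
    AND t1.c4 = t2.c4 AND t1.c5 = t2.c5 AND t1.c6 = t2.c6)
WHEN MATCHED THEN
  UPDATE SET t1.inxdc = t2.sda_x;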
I have 2 tables which both have many records (say TableA and TableB each have about 3,000,000 records). vr2_input is a varchar input parameter entered by the users, and I want to get the 200 TableA records with the largest "dateField" whose stringField is like 'vr2_input'. The 2 tables are joined as follows:
select * from(
select * from
TableA join TableB on TableA.id = TableB.id
where TableA.stringField like 'vr2_input' || '%'
order by TableA.dateField desc
) where rownum < 201
The query is slow. I googled it and found out that this is because "like" and "order by" involve full table scans. However, I cannot find a solution to the problem. How can I tune this type of SQL? I have already created an index on TableA.stringField and TableA.dateField, but how can I make the select statement use the indexes? The database is Oracle 10g. Thanks so much!!
Update: I used iddqd's suggestion and selected only the fields that I want, then ran the explain plan. The query takes about 4 minutes to finish. IX_TableA_stringField is the name of the index on the TableA.srv_ref field. I ran the explain plan again without the hint, and it still gives the same result.
EXPLAIN PLAN FOR
select * from(
select
/*+ INDEX(TableB IX_TableA_stringField)*/
TableA.id,
TableA.stringField,
TableA.dateField,
TableA.someField2,
TableA.someField3,
TableB.someField1,
TableB.someField2,
TableB.someField3
from TableA
join TableB on TableA.id=TableB.id
WHERE TableA.stringField like '21'||'%'
order by TableA.dateField desc
) where rownum < 201
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 871807846
--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 200 | 24000 | 3293 (1)| 00:00:18 |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | VIEW | | 1397 | 163K| 3293 (1)| 00:00:18 |
|* 3 | SORT ORDER BY STOPKEY | | 1397 | 90805 | 3293 (1)| 00:00:18 |
| 4 | NESTED LOOPS | | 1397 | 90805 | 3292 (1)| 00:00:18 |
| 5 | TABLE ACCESS BY INDEX ROWID| TableA | 1397 | 41910 | 492 (1)| 00:00:03 |
|* 6 | INDEX RANGE SCAN | IX_TableA_stringField | 1397 | | 6 (0)| 00:00:01 |
| 7 | TABLE ACCESS BY INDEX ROWID| TableB | 1 | 35 | 2 (0)| 00:00:01 |
|* 8 | INDEX UNIQUE SCAN | PK_TableB | 1 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<201)
3 - filter(ROWNUM<201)
6 - access("TableA"."stringField" LIKE '21%')
filter("TableA"."stringField" LIKE '21%')
8 - access(TableA"."id"="TableB"."id")
You say it's taking about 4 minutes to run the query. The EXPLAIN PLAN output shows an estimate of 18 seconds. So the optimizer is probably far off on some of its estimates in this case. (It could still be choosing the best possible plan, but maybe not.)
The first step in a case like this is to get the actual execution plan and statistics. Run your query with the hint /*+ gather_plan_statistics */, then immediately afterwards execute select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST')).
This will show the actual execution plan that was run, and for each step it will show the estimated rows, actual rows, and actual time taken. Post the output here and maybe we can say something more meaningful about your issue.
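For example, applied to the query from the question (a sketch):
select /*+ gather_plan_statistics */ *
from (
  select *
  from TableA
  join TableB on TableA.id = TableB.id
  where TableA.stringField like 'vr2_input' || '%'
  order by TableA.dateField desc
)
where rownum < 201;

select *
from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));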
Without that information, my suggestion is to try out the following rewrite of the query. I believe it is equivalent since it appears that ID is the primary key of TableB.
select TableA.id,
TableA.stringField,
TableA.dateField,
TableA.someField2,
TableA.someField3,
TableB.someField1,
TableB.someField2,
TableB.someField3
from (select * from(
select
TableA.id,
TableA.stringField,
TableA.dateField,
TableA.someField2,
TableA.someField3
from TableA
WHERE TableA.stringField like '21'||'%'
order by TableA.dateField desc
)
where rownum < 201
) TableA
join TableB on TableA.id=TableB.id
Do you need to select all columns (*)? The optimizer will be more likely to full scan if you select all columns. If you need all columns in output you may be better to select the id in your inline view and then join back to select other columns, which could be done with an index lookup. Try running an explain plan for both cases to see what the optimizer is doing.
Create indexes on the stringField and dateField columns. The SQL engine uses them automatically.
select id from(
select /*+ INDEX(TableB stringField_indx)*/ TableB.id from
TableA join TableB on TableA.id = TableB.id
where TableA.stringField like 'vr2_input' || '%'
order by TableA.dateField desc
) where rownum < 201
Next:
SELECT * FROM TableB WHERE id IN (<ids from the first query>)
Please send stats and DDL of these tables.
If you have enough memory you can hint the query to use a hash join. Could you please attach the explain plan?
How many records does TableA have? If it's the smaller table, could you do the select on that table and then loop through the results, retrieving the TableB records, as both the select and the sort are on TableA?
A good experiment would be to remove the join and test the speed of that. Also, if allowed, can you put the rownum < 201 as an AND clause on the main query? It's probable that at the moment the query is returning all rows to the outer query and only then getting trimmed.
To optimize the LIKE predicate, you can create an Oracle Text (CONTEXT) index and use the CONTAINS clause.
Look: http://docs.oracle.com/cd/B28359_01/text.111/b28303/ind.htm
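A rough sketch of that approach (the index name is a placeholder, and note that CONTAINS matching semantics differ from LIKE 'prefix%', so verify it returns what you expect):
CREATE INDEX ix_tablea_stringfield_txt ON TableA (stringField)
  INDEXTYPE IS CTXSYS.CONTEXT;

SELECT *
FROM TableA
WHERE CONTAINS(stringField, 'vr2_input%') > 0;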
Thanks
You can create a function-based index on TableA that returns 1 or 0 depending on whether the condition TableA.stringField like 'vr2_input' || '%' is satisfied. That index can make the query run faster. The logic of the function would be:
IF substr(TableA.stringField, 1, 9) = 'vr2_input' THEN
  RETURN 1;
ELSE
  RETURN 0;
END IF;
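A concrete version of that idea could look like the following. The function and index names are made up, and the prefix is hard-coded; also note that for a plain LIKE 'prefix%' an ordinary index on stringField can usually be range-scanned anyway (as the plan above shows), so measure before adopting this.
-- Hypothetical deterministic wrapper function and function-based index.
CREATE OR REPLACE FUNCTION is_vr2_prefix (p_val IN VARCHAR2)
  RETURN NUMBER DETERMINISTIC
IS
BEGIN
  RETURN CASE WHEN SUBSTR(p_val, 1, 9) = 'vr2_input' THEN 1 ELSE 0 END;
END;
/

CREATE INDEX ix_tablea_vr2_prefix ON TableA (is_vr2_prefix(stringField));

-- The query then has to use the same expression for the index to be usable:
-- WHERE is_vr2_prefix(TableA.stringField) = 1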
Using actual column names instead of "*" may help. At least common column names should be removed.