I have a query which uses ROW_NUMBER(). I have something like this:
ROW_NUMBER() OVER (ORDER BY publish_date DESC) rnum
The query runs pretty fast. However, if I add any reference to the "rnum" column, the query slows to a crawl. So it appears that just having ROW_NUMBER() is not the issue; it is only when I actually use "rnum" in the query that it crawls for around 30 seconds.
Any thoughts?
For reference, here is the query:
WITH aquire AS (
SELECT rtnum, trans_id, source, provider, publish_date, story_link, industry_name, sector_name, subject, teaser, tickers
FROM (SELECT d.trans_id, d.source, 'AquireMedia' AS provider,
d.trans_time AS publish_date, '/research/get_news.php?id=' || d.trans_id AS story_link,
i.name AS industry_name, s.sector_name, d.headline AS subject, NULL AS teaser,
NEWS.NEWS_FUNCTIONS.CONCATENATE_TICKERS(d.trans_id,'AQUIREMEDIA') AS tickers,
ROW_NUMBER() OVER (PARTITION BY d.trans_id ORDER BY d.trans_time DESC) as rtnum
FROM story_descriptions_3m d, story_tickers_3m t, uber_master_mv m, industry i, ind_sector ix, sectors s, comp_ind c
WHERE d.trans_id = t.trans_id
AND t.m_ticker = m.m_ticker
AND t.m_ticker = c.m_ticker(+)
AND c.ind_code = i.ind_code(+)
AND i.ind_code = ix.ind_code(+)
AND ix.sector_id = s.sector_id(+) AND s.sector_id = 10 )
WHERE rtnum = 1),
partner AS (
SELECT rtnum, trans_id, source, provider, publish_date, story_link, industry_name, sector_name, subject, teaser, tickers
FROM (SELECT CAST(n.story_id AS VARCHAR2(20)) trans_id, n.provider AS source, 'Partner News' AS provider,
n.story_date AS publish_date, n.link AS story_link, i.name AS industry_name, s.sector_name, n.title AS subject,
CAST(substr(n.teaser,1,4000) AS VARCHAR2(4000)) AS teaser, NEWS.NEWS_FUNCTIONS.CONCATENATE_TICKERS(n.story_id,'OTHER') AS tickers,
ROW_NUMBER() OVER (PARTITION BY n.story_id ORDER BY n.story_date DESC) as rtnum
FROM news_stories_3m n, news_stories_lookup_3m t, comp_ind c, uber_master_mv m, industry i, ind_sector ix, sectors s
WHERE t.story_id = n.story_id
AND t.ticker = m.ticker
AND m.m_ticker = c.m_ticker(+)
AND c.ind_code = i.ind_code(+)
AND i.ind_code = ix.ind_code(+)
AND ix.sector_id = s.sector_id(+) AND s.sector_id = 10 )
WHERE rtnum = 1)
SELECT trans_id, source, provider,
TO_CHAR(publish_date,'MM/DD/YYYY HH24:MI:SS') AS publish_date,
UNIX_TIMESTAMP(publish_date) AS timestamp,
story_link, industry_name, sector_name, subject, teaser, tickers
FROM (SELECT trans_id, source, provider, publish_date, story_link, industry_name, sector_name, subject, teaser, tickers,
ROW_NUMBER() OVER (ORDER BY publish_date DESC) rnum
FROM (SELECT trans_id, source, provider, publish_date, story_link, industry_name, sector_name, subject, teaser, tickers
FROM aquire WHERE rtnum <= 5
UNION ALL
SELECT trans_id, source, provider, publish_date, story_link, industry_name, sector_name, subject, teaser, tickers
FROM partner WHERE rtnum <= 5))
WHERE rnum BETWEEN 1 AND 1 * 5;
Let's simulate your query on a simple example, to demonstrate and explain that the results you encounter are to be expected.
Sample Data
create table tab1 as
select rownum id, lpad('x',3000,'y') pad from dual connect by level <= 1000000;
Now if you run the query below in your IDE, you will instantly see the first page of the result set.
Note that you define the row_number but do not use it.
select id, pad from (
select id, pad,
row_number() over (order by id) as rnum
from tab1
)
The answer is in the execution plan below:
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000K| 2866M| 135K (1)| 00:00:06 |
| 1 | TABLE ACCESS FULL| TAB1 | 1000K| 2866M| 135K (1)| 00:00:06 |
--------------------------------------------------------------------------
You can see that no sorting or filtering is performed; the row_number is simply ignored.
This (fetching only a few initial rows, with no sorting) explains why the query performs well.
In contrast, if you constrain on the row_number as follows:
SQL> select id, pad from (
2 select id, pad,
3 row_number() over (order by id) as rnum
4 from tab1
5 ) where rnum between 1 and 5
6 ;
Elapsed: 00:00:07.80
you observe a considerable elapsed time. Once again, the execution plan provides the answer.
Here is how you can get the execution plan for your query.
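A minimal way to do it, assuming you have access to the DBMS_XPLAN package:

explain plan for
select id, pad from (
  select id, pad,
         row_number() over (order by id) as rnum
  from tab1
) where rnum between 1 and 5;

select * from table(dbms_xplan.display);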
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 7640 | | 762K (1)| 00:00:30 |
|* 1 | VIEW | | 5 | 7640 | | 762K (1)| 00:00:30 |
|* 2 | WINDOW SORT PUSHED RANK| | 1000K| 2866M| 3906M| 762K (1)| 00:00:30 |
| 3 | TABLE ACCESS FULL | TAB1 | 1000K| 2866M| | 135K (1)| 00:00:06 |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("RNUM">=1 AND "RNUM"<=5)
2 - filter(ROW_NUMBER() OVER ( ORDER BY "ID")<=5)
The consequence is that now all the records must be processed (in your case, all the joins performed), which breaks the performance.
To prove it, simply run your performant query with a fetch-all option or with an added ORDER BY clause. It is quite possible that you will get the same non-performant result as in your second query.
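For example, adding an ORDER BY to the otherwise fast query above should force the same full sort over all one million rows:

select id, pad from (
  select id, pad,
         row_number() over (order by id) as rnum
  from tab1
)
order by id;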
Final Remark
Instead of the ROW_NUMBER() you may use the row_limiting_clause.
Put the ordering column from the row_number in the ORDER BY clause and use OFFSET and FETCH FIRST to limit the result.
select id, pad
from tab1
order by id
fetch first 5 rows only;
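The same syntax also supports pagination; e.g. the second page of five rows from the sample table would be:

select id, pad
from tab1
order by id
offset 5 rows fetch next 5 rows only;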
Under the covers you'll see an identical execution plan, using the same WINDOW SORT PUSHED RANK as above.
Related
I have a query which produces results with rows that contain 0 values. I would like to exclude any rows in which column B or C = 0. To exclude such rows, I have added T2.A <> 0 and T2.A != 0. When I do this, the 0 values are replaced with NULLs, so I also added T2.A IS NOT NULL.
My results still include rows that I do not need, which show (null), and I would like to exclude these.
SELECT
(SELECT
SUM(T2.A) as prem
FROM Table_2 T2, Table_2 T1
WHERE T2.ENT_REF = T1.ENT_REF
AND UPPER(T2.PER) = 'HURR'
AND UPPER(T2.ENT_TYPE) = 'POL'
AND T2.Cov NOT IN ('OUTPROP','COV')
AND T2.A <> 0
AND T2.A IS NOT NULL
) as B,
(SELECT
SUM(T2.A) as prem
FROM Table_2 T2, Table_2 T1
WHERE T2.ENT_REF = T1.ENT_REF
AND UPPER(T2.PER) IN ('I', 'II', 'II')
AND UPPER(T2.ENT_TYPE) = 'POL'
AND T2.Cov NOT IN ('OUTPROP','COV')
AND T2.A <> 0
AND T2.A IS NOT NULL
) as C
Ideally the result will go from:
+----+--------+--------+
| ID | B | C |
+----+--------+--------+
| 1 | 24 | 123 |
| 2 | 65 | 78 |
| 3 | 43 | 89 |
| 3 | 0 | 0 |
| 4 | 95 | 86 |
| 5 | 43 | 65 |
| 5 | (null) | (null) |
+----+--------+--------+
To something similar to the following:
+----+-----+-----+
| ID | B | C |
+----+-----+-----+
| 1 | 24 | 123 |
| 2 | 65 | 78 |
| 3 | 43 | 89 |
| 4 | 95 | 86 |
| 5 | 43 | 65 |
+----+-----+-----+
I have also attempted selecting distinct values, but I have other columns, such as dates, which differ per row. Although I need to include the dates, they are not as important to me as getting only rows whose B and C columns have values > 0. I have also tried using a GROUP BY ID statement, but I get an error that states 'ORA-00979: not a GROUP BY expression'.
You have written all the conditions in the SELECT clause.
You are facing the issue because the WHERE clause decides the number of rows to be fetched, while the SELECT clause decides the values to be returned.
In your case, something like the following is happening:
Simple Example:
-- MANUAL DATA
WITH DATAA AS (
SELECT
1 KEY,
'VALS' VALUE,
1 SEQNUM
FROM
DUAL
UNION ALL
SELECT
2,
'IDEAL OPTION',
2
FROM
DUAL
UNION ALL
SELECT
10,
'EXCLUDE',
3
FROM
DUAL
)
-- QUERY OF YOUR TYPE
SELECT
(
SELECT
KEY
FROM
DATAA I
WHERE
I.KEY = 1
AND O.KEY = I.KEY
) AS KEY, -- DECIDE VALUES TO BE SHOWN
(
SELECT
KEY
FROM
DATAA I
WHERE
I.SEQNUM = 1
AND O.SEQNUM = I.SEQNUM
) AS SEQNUM -- DECIDE VALUES TO BE SHOWN
FROM
DATAA O
WHERE
O.KEY <= 2; -- DECIDES THE NUMBER OF RECORDS
OUTPUT:
KEY        SEQNUM
---------- ----------
1          1
(null)     (null)
If you don't want to change much logic in your query, then just add an additional WHERE clause outside your final query, like:
SELECT <bla bla bla>
FROM <YOUR FINAL QUERY>
WHERE B IS NOT NULL AND C IS NOT NULL
Cheers!!
I guess you were on the right track, trying to group values.
In order to do that, the columns that are supposed to stay distinct are left alone (such as ID in the following example), while the rest should be aggregated (using MIN, MAX or any other function you find appropriate).
For example, as you said there's some date column and you don't much care which of its values you select, take the first one (i.e. min(date_column)). Do the same with the rest. The GROUP BY clause should then contain all non-aggregated columns (id in this example).
select id,
sum(b) b,
sum(c) c,
min(date_column) date_column
from your_current_query
group by id
If I understand your query right, it would be much easier and more performant to avoid the lookups in the SELECT clause. Try to bring it all into one query:
SELECT * FROM (
SELECT T2.ENT_REF AS ID,
SUM(CASE WHEN UPPER(T2.PER) = 'HURR' THEN T2.A END) AS B,
SUM(CASE WHEN UPPER(T2.PER) IN ('I', 'II', 'II') THEN T2.A END) as C
FROM Table_2 T2
WHERE UPPER(T2.ENT_TYPE) = 'POL'
AND T2.Cov NOT IN ('OUTPROP','COV')
GROUP BY T2.ENT_REF
)
WHERE B IS NOT NULL
OR C IS NOT NULL
I have Incoming Stock transaction data using Oracle:
ID | DESCRIPTION | PART_NO | QUANTITY | DATEADDED
TR5 | FG | P0025 | 5 | 06-SEP-2017 08:20:33 <-- just now added
TR4 | Test | TEST1 | 8 | 05-SEP-2017 15:11:15
TR3 | FG | GSDFGSG | 10 | 31-AUG-2017 16:26:04
TR2 | FG | GSDFGSG | 2 | 31-AUG-2017 16:05:39
TR1 | FG | GSDFGSG | 2 | 30-AUG-2017 16:30:16
And now I'm grouping that data to be:
TR_ID | PART_NO | TOTAL
TR1 | GSDFGSG | 14
TR4 | TEST1 | 8
TR5 | P0025 | 5 <-- just now added
Query Code:
SELECT MIN(TRANSACTION_EQUIPMENTID) as TR_ID,
PART_NO,
SUM(T.QUANTITY) AS TOTAL
FROM WA_II_TBL_TR_EQUIPMENT T
GROUP BY T.PART_NO
As you can see from that data and the query code, I show TR_ID using MIN to get the ID of the first transaction.
And now I have Outgoing transaction data:
Assume I try to get quantity 8
ID_FK | QUANTITY
TR1 | 8
And now I want to get the last ID, because the quantity of 8 has been consumed:
ID | DESCRIPTION | PART_NO | QUANTITY
TR3| FG | GSDFGSG | 10 <-- CONSUMED 4+2+2, TOTAL 8
TR2| FG | GSDFGSG | 2 <-- CONSUMED 2+2, TOTAL 4
TR1| FG | GSDFGSG | 2 <-- CONSUMED 2
As you can see above, TR1 and TR2 have been consumed. Now I want the query
SELECT MIN(TRANSACTION_EQUIPMENTID) as TR_ID,
PART_NO,
SUM(T.QUANTITY) AS TOTAL
FROM WA_II_TBL_TR_EQUIPMENT T
GROUP BY T.PART_NO
to get TR3 as the last ID, because TR1 and TR2 have been consumed.
How can I do that in a query?
Take the minimum id where the running sum reaches 8 or more. Use the analytic sum():
select min(id) id
from (select t.*,
sum(quantity) over (partition by part_no order by id) sq
from t
where part_no = 'GSDFGSG'
)
where sq >= 8
Test data, output:
create table t(ID varchar2(3), DESCRIPTION varchar2(5),
PART_NO varchar2(8), QUANTITY number(5), DATEADDED date);
insert into t values ('TR4', 'Test', 'TEST1', 8, timestamp '2017-09-05 15:11:15');
insert into t values ('TR3', 'FG', 'GSDFGSG', 10, timestamp '2017-08-31 16:26:04');
insert into t values ('TR2', 'FG', 'GSDFGSG', 2, timestamp '2017-08-31 16:05:39');
insert into t values ('TR1', 'FG', 'GSDFGSG', 2, timestamp '2017-08-30 16:30:16');
insert into t values ('TR5', 'FG', 'GSDFGSG', 3, timestamp '2017-08-31 17:00:00');
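Against this test data, the first query should return:

ID
---
TR3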
Edit:
Add the part_no and total columns and a group by clause:
select min(id) id, part_no, min(sq) total
from (select t.*,
sum(quantity) over (partition by part_no order by id) sq
from t
where part_no = 'GSDFGSG'
)
where sq >= 8
group by part_no
ID PART_NO TOTAL
--- -------- ----------
TR3 GSDFGSG 14
Suppose I have a table with the following data:
+----------+-----+--------+
| CLASS_ID | Day | Period |
+----------+-----+--------+
| 1 | A | CCR |
+----------+-----+--------+
| 1 | B | CCR |
+----------+-----+--------+
| 2 | A | 1 |
+----------+-----+--------+
| 2 | A | 2 |
+----------+-----+--------+
| 3 | A | 3 |
+----------+-----+--------+
| 3 | B | 4 |
+----------+-----+--------+
| 4 | A | 5 |
+----------+-----+--------+
As you could probably guess from the nature of the data, I'm working on an Oracle SQL query that pulls class schedule data from a Student Information System. I'm trying to build a class's "period expression", a calculated value that combines the Day and Period fields into a single field. Let's get my expectation out of the way first:
If the Periods match, Period should be the GROUP BY field, and Day should be the aggregated field (via a LISTAGG function), so the calculated field would be CCR (A-B)
If the Days match, Day should be the GROUP BY field, and Period should be the aggregated field, so the calculated field would be 1-2 (A)
I'm only aware of how to do each GROUP BY individually, something like this for where the Days match:
SELECT
day,
LISTAGG(period, '-') WITHIN GROUP (ORDER BY period)
FROM schedule
GROUP BY day
and vice versa for matching Periods, but I'm not seeing how I could do that dynamically for Period and Day in the same query.
You'll also notice that the last row in the example data set doesn't span multiple days or periods, so I also need to account for classes that don't need a GROUP BY at all.
Edit
The end result should be:
+------------+
| Expression |
+------------+
| CCR(A-B) |
+------------+
| 1-2(A) |
+------------+
| 3-4(A-B) |
+------------+
| 5(A) |
+------------+
It is really not clear to me WHY you want output in that way. It doesn't provide any useful information (I don't think) - you can't tell, for example for class_id = 3, which combinations of day and period are actually used. There are four possible combinations (according to the output), but only two are actually in the class schedule.
Anyway - you may have your reasons. Here is how you can do it. You seem to want to LISTAGG both the day and the period (both grouped by class_id; they are not grouped by each other). The difficulty is that you want only distinct values in the aggregate lists - no duplicates. So you will need to select distinct values, separately for period and for day, then do the list aggregations, and then concatenate the results via an inner join.
Something like this:
with
test_data ( class_id, day, period ) as (
select 1, 'A', 'CCR' from dual union all
select 1, 'B', 'CCR' from dual union all
select 2, 'A', '1' from dual union all
select 2, 'A', '2' from dual union all
select 3, 'A', '3' from dual union all
select 3, 'B', '4' from dual union all
select 4, 'A', '5' from dual
)
-- end of test data; the actual solution (SQL query) begins below this line
select a.class_id, a.list_per || '(' || b.list_day || ')' as expression
from ( select class_id,
listagg(period, '-') within group (order by period) as list_per
from ( select distinct class_id, period from test_data )
group by class_id
) a
inner join
( select class_id,
listagg(day, '-') within group (order by day) as list_day
from ( select distinct class_id, day from test_data )
group by class_id
) b
on a.class_id = b.class_id
;
CLASS_ID EXPRESSION
-------- ----------
1 CCR(A-B)
2 1-2(A)
3 3-4(A-B)
4 5(A)
How about union with having count(*) = 1?
select LISTAGG(period, '-') list WITHIN GROUP (ORDER BY period)
from schedule
group by CLASS_ID, day
having count(*) = 1
union all
select LISTAGG(day, '-') list WITHIN GROUP (ORDER BY day)
from schedule
group by CLASS_ID, period
having count(*) = 1
I am trying to:
1. Create a cursor that gets all the current prices of items in a store.
2. Bulk collect the cursor and loop through it, upserting into the STORE_INVENTORY table using a MERGE statement.
3. NULL out the PRICE column of the STORE_INVENTORY rows that are not in the cursor.
How can step 3 be done? I can do steps 1 and 2 already, as I have already updated or inserted the items that are pulled from the cursor.
Here is some example data:
There are three source tables that are updated by an external party. My objective is to take these three sources of data and merge them into a single table.
SOURCE TABLES
ITEM_TYPES
DESC_ID | TYPE
A | Kitchen
B | Bath
ITEM_MANIFEST
LOC_ID | ORIGIN
U | USA
C | CHINA
ITEM_PRICE
ITEM_ID | PRICE | DESC_ID | LOC_ID | DATE
0 | 3.99 | A | U | 9/11/2015
1 | 2.99 | B | C | 9/11/2015
2 | 1.99 | A | U | 9/05/2015
DESTINATION TABLE
STORE_INVENTORY
ITEM_ID | TYPE | ORIGIN | PRICE
0 | Kitchen | CHINA | 3.99
8 | Bath | USA | 2.99
After I execute the SQL procedure, which takes a date as a parameter, it will only pull rows from ITEM_PRICE whose date is after the given date.
If I execute the procedure with the passed-in date 9/10/2015:
Expected Output
STORE_INVENTORY
0 | Kitchen | USA | 3.99
1 | Bath | China | 2.99
8 | Bath | USA | NULL
So, something like this, then?
drop table item_description;
drop table item_manifest;
drop table item_price;
drop table store_inventory;
create table item_description
as
select 'A' desc_id, 'Kitchen' type from dual union all
select 'B' desc_id, 'Bath' type from dual;
create table item_manifest
as
select 'U' loc_id, 'USA' origin from dual union all
select 'C' loc_id, 'CHINA' origin from dual;
create table item_price
as
select 0 item_id, 3.99 price, 'A' desc_id, 'U' loc_id, to_date('11/09/2015', 'dd/mm/yyyy') dt from dual union all
select 1 item_id, 2.99 price, 'B' desc_id, 'C' loc_id, to_date('11/09/2015', 'dd/mm/yyyy') dt from dual union all
select 2 item_id, 1.99 price, 'A' desc_id, 'U' loc_id, to_date('05/09/2015', 'dd/mm/yyyy') dt from dual;
create table store_inventory
as
select 0 item_id, 'Kitchen' type, 'CHINA' origin, 3.99 price from dual union all
select 8 item_id, 'Bath' type, 'USA' origin, 2.99 price from dual;
select * from store_inventory;
ITEM_ID TYPE ORIGIN PRICE
---------- ------- ------ ----------
0 Kitchen CHINA 3.99
8 Bath USA 2.99
select coalesce(ip.item_id, si.item_id) item_id,
coalesce(id.type, si.type) type,
coalesce(im.origin, si.origin) origin,
ip.price
from item_description id
inner join item_price ip on (id.desc_id = ip.desc_id and ip.dt > to_date('10/09/2015', 'dd/mm/yyyy')) -- use a parameter for the date here
inner join item_manifest im on (ip.loc_id = im.loc_id)
full outer join store_inventory si on (si.item_id = ip.item_id);
ITEM_ID TYPE ORIGIN PRICE
---------- ------- ------ ----------
0 Kitchen USA 3.99
8 Bath USA
1 Bath CHINA 2.99
merge into store_inventory tgt
using (select coalesce(ip.item_id, si.item_id) item_id,
coalesce(id.type, si.type) type,
coalesce(im.origin, si.origin) origin,
ip.price
from item_description id
inner join item_price ip on (id.desc_id = ip.desc_id and ip.dt > to_date('10/09/2015', 'dd/mm/yyyy')) -- use a parameter for the date here
inner join item_manifest im on (ip.loc_id = im.loc_id)
full outer join store_inventory si on (si.item_id = ip.item_id)) src
on (src.item_id = tgt.item_id)
when matched then
update set tgt.type = src.type,
tgt.origin = src.origin,
tgt.price = src.price
when not matched then
insert (tgt.item_id, tgt.type, tgt.origin, tgt.price)
values (src.item_id, src.type, src.origin, src.price);
commit;
select * from store_inventory;
ITEM_ID TYPE ORIGIN PRICE
---------- ------- ------ ----------
0 Kitchen USA 3.99
8 Bath USA
1 Bath CHINA 2.99
Obviously, your procedure would have an input parameter of DATE datatype to pass into the query, and the query would use the parameter rather than a hardcoded date as in my example, e.g. ip.dt > p_cutoff_date.
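For illustration, a sketch of how the MERGE above could be wrapped; the procedure name refresh_store_inventory is made up here, and p_cutoff_date is the assumed date parameter:

create or replace procedure refresh_store_inventory (
    p_cutoff_date in date
) as
begin
    merge into store_inventory tgt
    using (select coalesce(ip.item_id, si.item_id) item_id,
                  coalesce(id.type, si.type) type,
                  coalesce(im.origin, si.origin) origin,
                  ip.price
           from item_description id
           inner join item_price ip
                   on (id.desc_id = ip.desc_id and ip.dt > p_cutoff_date)
           inner join item_manifest im on (ip.loc_id = im.loc_id)
           full outer join store_inventory si on (si.item_id = ip.item_id)) src
    on (src.item_id = tgt.item_id)
    when matched then
        update set tgt.type   = src.type,
                   tgt.origin = src.origin,
                   tgt.price  = src.price
    when not matched then
        insert (tgt.item_id, tgt.type, tgt.origin, tgt.price)
        values (src.item_id, src.type, src.origin, src.price);
end refresh_store_inventory;
/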
I can do step 1 and 2 already as I have already updated or inserted
the items that are pulled from the cursor.
Hmm. These steps seem unnecessary - why not do them as part of the MERGE statement? What does the store_inventory table look like before you do your insert/update from the cursor? Also, what is the cursor you're using to do this?
Couldn't you do a date-limited subselect of ITEM_PRICE.PRICE, after pulling in the TYPE and ORIGIN via the main join to ITEM_PRICE without limiting on the date?
I.e. something like:
select ITEM_ID, TYPE, ORIGIN
       /* not selecting PRICE in the main join */
      ,(select PRICE from ITEM_PRICE
        where <your join conditions>
        and DATE >= <your param>)
from ITEM_TYPES, ITEM_MANIFEST, ITEM_PRICE
where <your join conditions, but no criteria on DATE>
Sorry - this would be clearer and easier to type up if you had provided your existing query.
From re-reading your question, I am unsure whether you are inserting only 2 rows but want to get 3, or whether you already have the 3 rows but want to NULL out the missing price.
If the target table already has the 3 rows then, instead of a CURSOR-based approach (which can be slow on high volumes and is fussy to write), why not do an UPDATE instead, with DATE as a criterion? NULL will be assigned to PRICE if there is no match; that's how such UPDATEs work.
UPDATE STORE_INVENTORY
SET PRICE = (select PRICE from ITEM_PRICE
             where <your join conditions>
             and DATE >= <your param>)
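A concrete sketch against the sample tables created earlier in this thread (run inside the procedure, with p_cutoff_date as the assumed date parameter):

update store_inventory si
   set si.price = (select ip.price
                     from item_price ip
                    where ip.item_id = si.item_id
                      and ip.dt >= p_cutoff_date);
-- rows with no matching item_price row get PRICE set to NULL,
-- because a scalar subquery that returns no row yields NULL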
I have 2 tables, both of which have many records (say both TableA and TableB have about 3,000,000 records). vr2_input is a varchar input parameter entered by the users, and I want to get the 200 TableA records with the largest "dateField" whose stringField is like 'vr2_input'. The 2 tables are joined as follows:
select * from(
select * from
TableA join TableB on TableA.id = TableB.id
where TableA.stringField like 'vr2_input' || '%'
order by TableA.dateField desc
) where rownum < 201
The query is slow. I googled this and found out that it is because "like" and "order by" involve full table scans. However, I cannot find a solution to the problem. How can I tune this type of SQL? I have already created an index on TableA.stringField and one on TableA.dateField, but how can I make the SELECT statement use these indexes? The database is Oracle 10g. Thanks so much!!
Update: I used iddqd's suggestion and selected only the fields that I want, then ran the explain plan. The query takes about 4 minutes to finish. IX_TableA_stringField is the name of the index on TableA.stringField. I ran the explain plan again without the hint, and it still produces the same plan.
EXPLAIN PLAN FOR
select * from(
select
/*+ INDEX(TableA IX_TableA_stringField) */
TableA.id,
TableA.stringField,
TableA.dateField,
TableA.someField2,
TableA.someField3,
TableB.someField1,
TableB.someField2,
TableB.someField3
from TableA
join TableB on TableA.id=TableB.id
WHERE TableA.stringField like '21'||'%'
order by TableA.dateField desc
) where rownum < 201
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 871807846
--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 200 | 24000 | 3293 (1)| 00:00:18 |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | VIEW | | 1397 | 163K| 3293 (1)| 00:00:18 |
|* 3 | SORT ORDER BY STOPKEY | | 1397 | 90805 | 3293 (1)| 00:00:18 |
| 4 | NESTED LOOPS | | 1397 | 90805 | 3292 (1)| 00:00:18 |
| 5 | TABLE ACCESS BY INDEX ROWID| TableA | 1397 | 41910 | 492 (1)| 00:00:03 |
|* 6 | INDEX RANGE SCAN | IX_TableA_stringField | 1397 | | 6 (0)| 00:00:01 |
| 7 | TABLE ACCESS BY INDEX ROWID| TableB | 1 | 35 | 2 (0)| 00:00:01 |
|* 8 | INDEX UNIQUE SCAN | PK_TableB | 1 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<201)
3 - filter(ROWNUM<201)
6 - access("TableA"."stringField" LIKE '21%')
filter("TableA"."stringField" LIKE '21%')
8 - access("TableA"."id"="TableB"."id")
You say it's taking about 4 minutes to run the query. The EXPLAIN PLAN output shows an estimate of 18 seconds. So the optimizer is probably far off on some of its estimates in this case. (It could still be choosing the best possible plan, but maybe not.)
The first step in a case like this is to get the actual execution plan and statistics. Run your query with the hint /*+ gather_plan_statistics */, then immediately afterwards execute select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST')).
This will show the actual execution plan that was run, and for each step it will show the estimated rows, actual rows, and actual time taken. Post the output here and maybe we can say something more meaningful about your issue.
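A sketch of that sequence; the count(*) query below is just a stand-in for your real statement:

-- run the statement with runtime statistics collection enabled
select /*+ gather_plan_statistics */ count(*) from TableA;

-- immediately afterwards, in the same session, dump the actual plan with row counts
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));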
Without that information, my suggestion is to try out the following rewrite of the query. I believe it is equivalent since it appears that ID is the primary key of TableB.
select TableA.id,
TableA.stringField,
TableA.dateField,
TableA.someField2,
TableA.someField3,
TableB.someField1,
TableB.someField2,
TableB.someField3
from (select * from(
select
TableA.id,
TableA.stringField,
TableA.dateField,
TableA.someField2,
TableA.someField3
from TableA
WHERE TableA.stringField like '21'||'%'
order by TableA.dateField desc
)
where rownum < 201
) TableA
join TableB on TableA.id=TableB.id
Do you need to select all columns (*)? The optimizer will be more likely to full scan if you select all columns. If you need all columns in the output, you may be better off selecting only the id in your inline view and then joining back to fetch the other columns, which can be done with an index lookup. Try running an explain plan for both cases to see what the optimizer is doing.
Create indexes on the stringField and dateField columns. The SQL engine uses them automatically.
select id from(
select /*+ INDEX(TableA stringField_indx) */ TableB.id from
TableA join TableB on TableA.id = TableB.id
where TableA.stringField like 'vr2_input' || '%'
order by TableA.dateField desc
) where rownum < 201
Next:
SELECT * FROM TableB WHERE id IN (<ids from the first query>)
Please send the stats and DDL of these tables.
If you have enough memory, you can hint the query to use a hash join. Could you please attach the explain plan?
How many records does TableA have? If it is the smaller table, could you do the select on that table and then loop through the results retrieving the TableB records, since both the select and the sort are on TableA?
A good experiment would be to remove the join and test the speed of that. Also, if allowed, can you put the rownum < 201 as an AND clause on the main query? It is probable that at the moment the query is returning all rows to the outer query and only then getting trimmed.
To optimize the LIKE predicate, you can create an Oracle Text (context) index and use the CONTAINS clause.
Look: http://docs.oracle.com/cd/B28359_01/text.111/b28303/ind.htm
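A rough sketch of that approach, assuming Oracle Text is installed (here 'vr2_input%' stands in for the user's prefix):

create index ix_tablea_stringfield_ctx on TableA (stringField)
  indextype is ctxsys.context;

-- prefix search served by the Text index instead of LIKE 'vr2_input' || '%'
select *
from TableA
where contains(stringField, 'vr2_input%') > 0;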
Thanks
You can create a function-based index on TableA that returns 1 or 0 based on whether the condition TableA.stringField like 'vr2_input' || '%' is satisfied. That index will make the query run faster. The logic of the function would be:
if substr(TableA.stringField, 1, 9) = 'vr2_input' then
  return 1;
else
  return 0;
end if;
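A minimal sketch of that idea; the function and index names are made up, the function must be declared DETERMINISTIC to be indexable, and note that it hardwires one specific prefix, so it only helps for that one input value:

create or replace function has_vr2_prefix (p_val in varchar2)
  return number deterministic
as
begin
  if substr(p_val, 1, 9) = 'vr2_input' then
    return 1;
  else
    return 0;
  end if;
end;
/

create index ix_tablea_vr2_prefix on TableA (has_vr2_prefix(stringField));

-- the query must use the identical expression for the index to be usable
select * from TableA where has_vr2_prefix(stringField) = 1;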
Using actual column names instead of "*" may help. At the very least, column names that appear in both tables should be qualified or removed.