Seeking hints to simplify complex query - Oracle

Building a web page to show summary data and a chart. The query to obtain my summary data appears to be overly complex, and there must be a simpler way to accomplish it. I'm mainly experienced with SQL Server, where getting row- and column-level totals is done within the main query; no unions or subqueries are required unless you are doing something more complex.
However, under Oracle 10g, this appears to be the way to accomplish the same thing.
The resulting data is put into a JSON array and populates a v1.10 DataTable.
The source data has a row containing the date, item and a count of items.
The ending table uses a pivot, becoming 8 columns: 6 for the items, a date, and a row-level total. I trimmed 2 columns to reduce the clutter in the question. The final row has column-level totals and the final grand total. Any suggestions welcome.
The query is here:
SELECT *
FROM (
       SELECT TO_CHAR("DATE", 'MM/DD/YYYY') AS "DATE"
            , ITEM_NAME
            , SUM(ITEM_COUNT) AS TOTAL
       FROM MY_VIEW
       WHERE "DATE" > ADD_MONTHS(TRUNC(SYSDATE), -1)
         AND ITEM_NAME IN ('ITEM-01','ITEM-02','ITEM-03','ITEM-04')
       GROUP BY "DATE", ITEM_NAME
       UNION ALL
       SELECT TO_CHAR("DATE", 'MM/DD/YYYY') AS "DATE"
            , 'ROW_TOTAL' AS ITEM_NAME
            , SUM(ITEM_COUNT) AS TOTAL
       FROM MY_VIEW
       WHERE "DATE" > ADD_MONTHS(TRUNC(SYSDATE), -1)
         AND ITEM_NAME IN ('ITEM-01','ITEM-02','ITEM-03','ITEM-04')
       GROUP BY "DATE"
     )
PIVOT
(
  MAX(TOTAL) FOR ITEM_NAME IN ('ITEM-01','ITEM-02','ITEM-03','ITEM-04','ROW_TOTAL')
)
UNION ALL
SELECT *
FROM (
       SELECT 'GRAND TOTAL' AS "DATE"
            , ITEM_NAME
            , SUM(ITEM_COUNT) AS TOTAL
       FROM MY_VIEW
       WHERE "DATE" > ADD_MONTHS(TRUNC(SYSDATE), -1)
         AND ITEM_NAME IN ('ITEM-01','ITEM-02','ITEM-03','ITEM-04')
       GROUP BY ITEM_NAME
       UNION ALL
       SELECT 'GRAND TOTAL' AS "DATE"
            , 'ROW_TOTAL' AS ITEM_NAME
            , SUM(ITEM_COUNT) AS TOTAL
       FROM MY_VIEW
       WHERE "DATE" > ADD_MONTHS(TRUNC(SYSDATE), -1)
         AND ITEM_NAME IN ('ITEM-01','ITEM-02','ITEM-03','ITEM-04')
     )
PIVOT
(
  MAX(TOTAL) FOR ITEM_NAME IN ('ITEM-01','ITEM-02','ITEM-03','ITEM-04','ROW_TOTAL')
)
ORDER BY 1
And the end results should look like this:
DATE ITEM-01 ITEM-02 ITEM-03 ITEM-04 ROW_TOTAL
======================================================
4/18/17 1,063,008 460,436 106,715 97,532 1,829,364
4/19/17 1,061,819 479,338 103,946 108,179 1,859,825
4/20/17 1,095,853 536,835 107,437 101,949 1,944,677
4/21/17 1,153,345 642,364 108,940 106,988 2,121,068
4/22/17 1,075,849 633,873 102,459 99,999 2,012,710
4/23/17 913,952 591,783 95,291 100,144 1,794,358
4/24/17 1,036,377 626,043 115,105 98,339 1,977,043
4/25/17 1,079,163 602,237 118,189 100,478 2,001,529
4/26/17 1,110,499 639,640 109,793 103,360 2,069,311
4/27/17 1,119,696 620,081 105,781 108,276 2,061,452
4/28/17 1,125,676 618,763 113,234 96,326 2,057,169
4/29/17 1,026,974 620,059 102,856 96,150 1,940,394
4/30/17 903,913 539,694 83,531 97,073 1,716,114
5/1/17 1,043,598 590,027 100,272 96,519 1,932,843
5/2/17 1,074,912 623,392 101,793 97,724 2,000,981
5/3/17 1,078,865 620,662 101,699 102,900 2,010,014
5/4/17 1,090,501 628,785 110,248 103,593 2,040,658
5/5/17 1,125,984 686,945 128,657 105,356 2,150,037
5/6/17 1,031,267 625,189 117,290 99,358 1,967,819
5/7/17 921,467 551,497 97,482 93,520 1,752,940
5/8/17 1,064,291 624,366 93,463 98,860 1,979,863
5/9/17 1,085,062 661,509 97,791 98,083 2,039,114
5/10/17 1,103,794 634,868 94,364 102,345 2,033,911
5/11/17 1,107,449 617,931 94,420 103,717 2,024,126
5/12/17 1,130,463 647,744 97,616 102,684 2,079,009
5/13/17 1,056,653 621,182 96,743 99,801 1,974,710
5/14/17 970,969 583,865 87,953 97,682 1,831,516
5/15/17 1,075,979 633,102 95,356 101,336 2,003,830
5/16/17 1,094,805 634,421 96,802 99,533 2,026,891
GRAND TOTAL 30,822,183 17,596,631 2,985,226 2,917,804 57,233,276

It might go faster if you use 'analytic functions' to perform the totalling, without needing to run separate grouping queries. An example analytic expression might be:
select
    sum(item_count) over (partition by "DATE") -- btw "date" is a poor name choice for a column
from
    my_table
where
    item_name in (...)
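Applied to the view in the question, a sketch (untested; column names taken from the query above) of the analytic approach might be:
SELECT DISTINCT
       TO_CHAR("DATE", 'MM/DD/YYYY') AS day_label
     , ITEM_NAME
     , SUM(ITEM_COUNT) OVER (PARTITION BY "DATE", ITEM_NAME) AS item_total
     , SUM(ITEM_COUNT) OVER (PARTITION BY "DATE") AS row_total
FROM MY_VIEW
WHERE "DATE" > ADD_MONTHS(TRUNC(SYSDATE), -1)
  AND ITEM_NAME IN ('ITEM-01','ITEM-02','ITEM-03','ITEM-04')
Each output row carries its per-item total and, in an extra column, the total for its whole date; the DISTINCT collapses the duplicated source rows.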
Or alternatively, use 'grouping sets', 'cube' or 'rollup'.
The difference? Analytic functions add an extra column to the report, carrying an aggregate over the row's group; grouping sets, cubes and rollups add extra rows to the report, carrying aggregates of a column.
Apologies for not being able to test an example of this; it's quite an extensive topic requiring in-depth discussion, and I'm writing this on an iPad with no recent use of them to call on from memory and no way to run anything, so treat what follows as a pointer for further background research rather than verified code. Essentially a grouping set is an instruction akin to "here's a single data set; iterate it once and perform these N different GROUP BY aggregates as you go". Here, one group would be by date and name (so the single detail lines are output) and another would be by name alone (so totals for each name are output).
Then do your pivot. For more info, the 'phrases in quotes' are what you'd look up in the manual or on the web.
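As a rough, untested sketch of the grouping-sets idea (column names taken from the question; GROUPING() is used to tell rollup rows apart from real data):
SELECT *
FROM (
       SELECT CASE WHEN GROUPING("DATE") = 1 THEN 'GRAND TOTAL'
                   ELSE TO_CHAR("DATE", 'MM/DD/YYYY') END AS "DATE"
            , CASE WHEN GROUPING(ITEM_NAME) = 1 THEN 'ROW_TOTAL'
                   ELSE ITEM_NAME END AS ITEM_NAME
            , SUM(ITEM_COUNT) AS TOTAL
       FROM MY_VIEW
       WHERE "DATE" > ADD_MONTHS(TRUNC(SYSDATE), -1)
         AND ITEM_NAME IN ('ITEM-01','ITEM-02','ITEM-03','ITEM-04')
       GROUP BY GROUPING SETS (("DATE", ITEM_NAME), ("DATE"), (ITEM_NAME), ())
     )
PIVOT
(
  MAX(TOTAL) FOR ITEM_NAME IN ('ITEM-01','ITEM-02','ITEM-03','ITEM-04','ROW_TOTAL')
)
ORDER BY 1
A single pass over MY_VIEW then produces the detail rows, the per-date row totals, the per-item column totals and the grand total, so all four UNION ALL branches of the original disappear.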
All this is a little bit dirty, by the way: your reporting tool front end should really be building this summary, rather than Oracle, though doing the grouping (but not the pivoting) in the DB helpfully reduces network traffic.


How to calculate longest period between two specific dates in SQL?

I have a problem with the following task. I have a table Warehouse containing a list of items that a company has on stock. This
table contains the columns ItemID, ItemTypeID, InTime and OutTime, where InTime (OutTime)
specifies the point in time at which a respective item entered (left) the warehouse. I have to calculate the longest period that the company has gone without an item entering or leaving the warehouse. I am trying to solve it this way:
select MAX(OutTime-InTime) from Warehouse where OutTime is not null
Is my understanding correct? Because I believe that it is not ;)
You want the greatest gap between any two consecutive actions (item entering or leaving the warehouse). One method is to unpivot the in and out times to rows, then use lag() to get the date of the "previous" action. The final step is aggregation:
select max(x_time - lag_x_time) as max_time_diff
from (
    -- unpivot each row's two times into single events, then look back one event
    select x.x_time,
           lag(x.x_time) over (order by x.x_time) as lag_x_time
    from warehouse w
    cross apply (
        select w.in_time as x_time from dual
        union all
        select w.out_time from dual
    ) x
)
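Note that CROSS APPLY requires Oracle 12c or later. On older versions the same unpivot-then-lag idea can be written with a plain UNION ALL; a sketch under the same table layout:
select max(x_time - lag_x_time) as max_time_diff
from (
    select x_time,
           lag(x_time) over (order by x_time) as lag_x_time
    from (
        select in_time as x_time from warehouse
        union all
        select out_time from warehouse
    )
)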
You can directly perform date arithmetic in Oracle.
The result is calculated in days; if you want it in hours, multiply the result by 24.
To calculate the duration in days, and check all the information in the table:
SELECT ROUND(OutTime - InTime) AS periodDay, Warehouse.*
FROM Warehouse
WHERE OutTime IS NOT NULL
ORDER BY periodDay DESC
To calculate the duration in hours:
SELECT ROUND((OutTime - InTime) * 24) AS periodHour, Warehouse.*
FROM Warehouse
WHERE OutTime IS NOT NULL
ORDER BY periodHour DESC
ROUND() is used to drop the fractional digits (strictly speaking it rounds to the nearest whole number; use TRUNC() if you want to simply cut the decimals off).
To select only the record with the maximum period:
SELECT *
FROM Warehouse
WHERE (OutTime - InTime) =
( SELECT MAX(OutTime - InTime) FROM Warehouse)
To select only the record with the maximum period, with the period included:
SELECT (OutTime - InTime) AS period, Warehouse.*
FROM Warehouse
WHERE (OutTime - InTime) =
( SELECT MAX(OutTime - InTime) FROM Warehouse)
When finding the longest period, the OutTime IS NOT NULL condition is not needed: for rows with a NULL OutTime the difference is NULL, and NULL can never equal the maximum.
SQL Server has DATEDIFF; in Oracle you can simply subtract one date from the other.
The code looks OK. Oracle has a Live SQL tool where you can test out queries in your browser; that should help you:
https://livesql.oracle.com/

Oracle tuning for a query with a nested query

I am trying to improve a query. I have a dataset of opened tickets. Every ticket has several rows, and every row represents an update of the ticket. There is a field (dt_update) that differs on every row.
I have these indexes on st_remedy_full_light:
IDX_ASSIGNMENT (ASSIGNMENT)
IDX_REMEDY_INC_ID (REMEDY_INC_ID)
IDX_REMDULL_LIGHT_DTUPD (DT_UPDATE)
Now, the query runs in 8 seconds. That is too slow for me.
WITH last_ticket AS
( SELECT *
FROM st_remedy_full_light a
WHERE a.dt_update IN
( SELECT MAX(dt_update)
FROM st_remedy_full_light
WHERE remedy_inc_id = a.remedy_inc_id
)
)
SELECT remedy_inc_id, ASSIGNMENT FROM last_ticket
This is the plan:
How could I improve this query?
P.S. This is just a part of a bigger query.
Additional information:
- The table st_remedy_full_light contains 529,507 rows
You could try:
WITH last_ticket AS
( SELECT remedy_inc_id, ASSIGNMENT,
rank() over (partition by remedy_inc_id order by dt_update desc) rn
FROM st_remedy_full_light a
)
SELECT remedy_inc_id, ASSIGNMENT FROM last_ticket
where rn = 1;
The best alternative query, which is also much easier for the database to execute, is this:
select remedy_inc_id
, max(assignment) keep (dense_rank last order by dt_update)
from st_remedy_full_light
group by remedy_inc_id
This will use only one full table scan and a (hash/sort) group by, no self joins.
Don't bother with indexed access, as you'll probably find a full table scan is most appropriate here, unless the table is really wide and a composite index on all the columns used (remedy_inc_id, dt_update, assignment) would be significantly quicker to read than the table.
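If you do want to test that covering-index idea, the DDL would be something along these lines (the index name is made up):
CREATE INDEX idx_remdull_light_covering
    ON st_remedy_full_light (remedy_inc_id, dt_update, assignment);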

HIVE equivalent of FIRST and LAST

I have a table with the following columns:
table1: ID, CODE, RESULT, RESULT2, RESULT3
I have this SAS code:
data table1
set table1;
BY ID, CODE;
IF FIRST.CODE and RESULT='A' THEN OUTPUT;
ELSE IF LAST.CODE and RESULT NE 'A' THEN OUTPUT;
RUN;
So we are grouping the data by ID and CODE, and then writing to the dataset if certain conditions are met. I want to write a Hive query to replicate this. This is what I have:
proc sql;
create table temp as
select *, row_number() over (partition by ID, CODE) as rowNum
from table1;
create table temp2 as
select a.ID, a.CODE, a.RESULT, a.RESULT2, a.RESULT3
from temp a
inner join (select ID, CODE, max(rowNum) as maxRowNum
from temp
group by ID, CODE) b
on a.ID=b.ID and a.CODE=b.CODE
where (a.rowNum=1 and a.RESULT='A') or (a.rowNum=b.maxRowNum and a.RESULT NE 'A');
quit;
There are two issues I see with this.
1) The row that is first or last in each BY group is entirely dependent on the order of rows in table1 in SAS; we aren't ordering by anything. I don't think row order is preserved when translating to a Hive query.
2) The SAS code is taking the first row in each BY group or the last, not both. I think that my Hive query is taking both, resulting in more rows than I want.
Any suggestions or insight on how to improve my query is appreciated. Is it even possible to replicate this SAS code in HIVE?
The SAS code has a by statement (BY ID CODE;), which tells SAS that the set dataset is sorted at those levels. So, it is not a random selection for first. and last..
That said, we can replicate this in HIVE by using the first_value and last_value window functions.
FIRST.CODE should replicate to
first_value(code) over (partition by Id order by code)fcode
Similarly, LAST.CODE would be
last_value(code) over (partition by Id order by code rows between unbounded preceding and unbounded following) lcode
(the explicit frame matters: with an ORDER BY, the default window ends at the current row, so last_value would otherwise just return the current row's value).
Once you have the fcode and lcode columns, use case when statements for the result column criteria. Like,
case when (code=fcode and result='A') or (code=lcode and result<>'A')
then 1 else 0 end as op_flag
Then fetch the rows where op_flag = 1.
SAMPLE
select id, code, result from (
select *,
first_value(code) over (partition by id order by code) fcode,
last_value(code) over (partition by id order by code rows between unbounded preceding and unbounded following) lcode
from footab) f
where (code=fcode and result='A') or (code=lcode and result<>'A')
Regarding point 1): BY-group processing requires the input data to be sorted or indexed on the BY variables, so although the code contains no ordering, the source data is processed in order. If the input data were not indexed/sorted, SAS would throw an error.
That said, possible differences arise on rows with the same values of the BY variables, especially if the RESULT is different.
In SAS, I would pre-sort the data by ID, CODE, RESULT, then use BY ID CODE, in order not to be influenced by the order of rows.
Regarding 2): FIRST. and LAST. can both be true in SAS. Since your conditions on RESULT for first and last are different, I guess this is not a source of differences.
I guess you could add another field, such as
row_number() over (partition by ID, CODE order by RESULT desc) as rowNumDesc
to detect the last row with rowNumDesc = 1 (so that you skip the join); see the sketch below.
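A sketch of that join-free variant (ordering by RESULT, in line with the pre-sort suggestion above; any deterministic ORDER BY column works):
select ID, CODE, RESULT, RESULT2, RESULT3
from (
    select t.*,
           row_number() over (partition by ID, CODE order by RESULT) as rowNum,
           row_number() over (partition by ID, CODE order by RESULT desc) as rowNumDesc
    from table1 t
) x
where (rowNum = 1 and RESULT = 'A')
   or (rowNumDesc = 1 and RESULT <> 'A');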
EDIT:
I think the two programs above both include a random selection of rows for groups with the same values of the ID and CODE variables, especially with the same values of RESULT, but you should get the same number of rows from both. If not, just debug it.
However, the random aspect in SAS code/storage is based on the physical order of rows, while ROW_NUMBER's randomness within a group will be influenced by the implementation of the function in the engine.

Selecting data from one table or another in multiple queries PL/SQL

The easiest way to ask my question is with a Hypothetical Scenario.
Let's say we have 3 tables: Singapore_Prices, Produce_Val, and Bosses_Unreasonable_Demands.
So Prices is a pretty simple table: an Item column containing a name, and a Price column containing a number.
Produce_Val is also a simple 2-column table: a Type column containing what type the produce is (fruit or veggie) and a Name column (Tomato, pineapple, etc.).
The Bosses_unreasonable_demands only contains one column, Fruit, which CAN contain the names of some fruits.
OK? Ok.
SO, My boss wants me to write a query that returns the prices for every fruit in his unreasonable demands table. Simple enough. BUT, if he doesn't have any entries in his table, he just wants me to output the prices of ALL fruits that exist in produce_val.
Now, assuming I don't know where the DBA who designed this silly hypothetical system lives (and therefore can't get him to fix this), our query would look like this:
if <Logic to determine if Bosses demands are empty>
Then
select Item, Price
from Singapore_Prices
where Item in (select Fruit from Bosses_Unreasonable_demands)
Else
select Item, Price
from Singapore_Prices
where Item in (select Name from Produce_val where type = 'Fruit')
end if;
(Well, we'd select those into a variable, and then output the variable, probably with bulk-collect shenanigans, but that's not important)
Which works. It is entirely functional, and won't be slow even if we extend it out to 2,000 stores other than Singapore (well, no slower than anything else that touches 2,000-some tables). BUT, I'm still doing two different select statements that are practically identical. My Comp Sci teacher rolls in their grave every time my fingers hit Ctrl-V. I can cut this code in half and only do one select statement. I KNOW I can.
I just have no earthly idea how. I can't use cursors as an in statement, I can't use nested tables or varrays, I can't use cleverly crafted strings, I... I just... I don't know. I don't know how to do this. Is there a way? Does it exist?
Or do I have to copy/paste forever?
Your best bet would be dynamic SQL, because you can't parameterize table or column names.
You will have a SQL query template and some logic to determine the tables and columns you want to query, then blend them together and execute.
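For instance, a minimal sketch of that idea (table names from the question; checking COUNT(*) for the "boss table is empty" test is an assumption, and fetching from the ref cursor is left open):
DECLARE
    v_sql    VARCHAR2(4000);
    v_count  NUMBER;
    v_cursor SYS_REFCURSOR;
BEGIN
    SELECT COUNT(*) INTO v_count FROM bosses_unreasonable_demands;
    -- choose the IN-list subquery based on whether the boss's table is empty
    v_sql := 'SELECT item, price FROM singapore_prices WHERE item IN (' ||
             CASE WHEN v_count > 0
                  THEN 'SELECT fruit FROM bosses_unreasonable_demands'
                  ELSE 'SELECT name FROM produce_val WHERE type = ''Fruit'''
             END || ')';
    OPEN v_cursor FOR v_sql;
    -- fetch from v_cursor here (loop, bulk collect, etc.)
END;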
Another approach (still a lot of Ctrl-V-like code) is to use the set construction UNION ALL:
select 1st query where boss_condition
union all
select 2nd query where not boss_condition
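Concretely, with the hypothetical tables from the question (where "boss_condition" is simply "the demands table has rows"), a sketch:
select Item, Price
from Singapore_Prices
where Item in (select Fruit from Bosses_Unreasonable_Demands)
union all
select Item, Price
from Singapore_Prices
where Item in (select Name from Produce_Val where Type = 'Fruit')
  and not exists (select 1 from Bosses_Unreasonable_Demands)
The first branch is empty whenever the demands table is empty, and the NOT EXISTS guard switches the second branch on only in that case, so exactly one branch ever returns rows.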
Try this:
SELECT ITEM, PRICE
FROM (SELECT s.ITEM, s.PRICE, 'BOSS' AS FRUIT_SOURCE
        FROM BOSSES_UNREASONABLE_DEMANDS b
        INNER JOIN SINGAPORE_PRICES s
          ON s.ITEM = b.FRUIT
      UNION ALL
      SELECT s.ITEM, s.PRICE, 'NORMAL' AS FRUIT_SOURCE
        FROM PRODUCE_VAL p
        INNER JOIN SINGAPORE_PRICES s
          ON s.ITEM = p.NAME
       WHERE p.TYPE = 'Fruit')
CROSS JOIN (SELECT COUNT(*) AS BOSS_COUNT
              FROM BOSSES_UNREASONABLE_DEMANDS)
WHERE (BOSS_COUNT > 0 AND FRUIT_SOURCE = 'BOSS') OR
      (BOSS_COUNT = 0 AND FRUIT_SOURCE = 'NORMAL')
Share and enjoy.
I think you can use nested tables. Assume you have a schema-level nested table type FRUIT_NAME_LIST (defined using CREATE TYPE).
SELECT fruit
BULK COLLECT INTO my_fruit_name_list
FROM bosses_unreasonable_demands
;
IF my_fruit_name_list.count = 0 THEN
SELECT name
BULK COLLECT INTO my_fruit_name_list
FROM produce_val
WHERE type='Fruit'
;
END IF;
SELECT item, price
FROM singapore_prices
WHERE item MEMBER OF my_fruit_name_list
;
(or, WHERE item IN (SELECT column_value FROM TABLE(CAST(my_fruit_name_list AS fruit_name_list))), if you like that better)
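If the schema-level type doesn't exist yet, it can be created like this (the VARCHAR2 length is an assumption):
CREATE TYPE fruit_name_list AS TABLE OF VARCHAR2(100);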

How to optimize a select from several tables with millions of rows

I have the following tables (Oracle 10g):
catalog (
id NUMBER PRIMARY KEY,
name VARCHAR2(255),
owner NUMBER,
root NUMBER REFERENCES catalog(id)
...
)
university (
id NUMBER PRIMARY KEY,
...
)
securitygroup (
id NUMBER PRIMARY KEY
...
)
catalog_securitygroup (
catalog REFERENCES catalog(id),
securitygroup REFERENCES securitygroup(id)
)
catalog_university (
catalog REFERENCES catalog(id),
university REFERENCES university(id)
)
Catalog: 500,000 rows; catalog_university: 500,000; catalog_securitygroup: 1,500,000.
I need to select any 50 rows from catalog with a specified root, ordered by name, for the current university and current securitygroup. Here is the query:
SELECT ccc.* FROM (
SELECT cc.*, ROWNUM AS n FROM (
SELECT c.id, c.name, c.owner
FROM catalog c, catalog_securitygroup cs, catalog_university cu
WHERE c.root = 100
AND cs.catalog = c.id
AND cs.securitygroup = 200
AND cu.catalog = c.id
AND cu.university = 300
ORDER BY name
) cc
) ccc WHERE ccc.n > 0 AND ccc.n <= 50;
Where 100 is some catalog, 200 some securitygroup, and 300 some university. This query returns 50 rows out of ~170,000 in 3 minutes.
But the next query returns those rows in 2 seconds:
SELECT ccc.* FROM (
SELECT cc.*, ROWNUM AS n FROM (
SELECT c.id, c.name, c.owner
FROM catalog c
WHERE c.root = 100
ORDER BY name
) cc
) ccc WHERE ccc.n > 0 AND ccc.n <= 50;
I built the following indexes: (catalog.id, catalog.name, catalog.owner), (catalog_securitygroup.catalog, catalog_securitygroup.index), (catalog_university.catalog, catalog_university.university).
Plan for the first query (using PL/SQL Developer):
http://habreffect.ru/66c/f25faa5f8/plan2.jpg
Plan for second query:
http://habreffect.ru/f91/86e780cc7/plan1.jpg
What are the ways to optimize the query I have?
The indexes that can be useful and should be considered deal with
WHERE c.root = 100
AND cs.catalog = c.id
AND cs.securitygroup = 200
AND cu.catalog = c.id
AND cu.university = 300
So the following fields can be interesting for indexes
c: id, root
cs: catalog, securitygroup
cu: catalog, university
So, try creating
(catalog_securitygroup.catalog, catalog_securitygroup.securitygroup)
and
(catalog_university.catalog, catalog_university.university)
EDIT:
I missed the ORDER BY - these fields should also be considered, so
(catalog.name, catalog.id)
might be beneficial (or some other composite index that could be used for sorting and the conditions - possibly (catalog.root, catalog.name, catalog.id))
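In DDL form, those suggestions would look something like this (the index names are made up):
CREATE INDEX cs_catalog_secgrp_idx ON catalog_securitygroup (catalog, securitygroup);
CREATE INDEX cu_catalog_univ_idx ON catalog_university (catalog, university);
CREATE INDEX cat_root_name_id_idx ON catalog (root, name, id);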
EDIT2
Although another answer has been accepted, I'll provide some more food for thought.
I have created some test data and run some benchmarks.
The test cases are minimal in terms of record width (in catalog_securitygroup and catalog_university the primary keys are (catalog, securitygroup) and (catalog, university)). Here is the number of records per table:
test=# SELECT (SELECT COUNT(*) FROM catalog), (SELECT COUNT(*) FROM catalog_securitygroup), (SELECT COUNT(*) FROM catalog_university);
?column? | ?column? | ?column?
----------+----------+----------
500000 | 1497501 | 500000
(1 row)
The database is PostgreSQL 8.4, a default Ubuntu install, on i5 hardware with 4 GB RAM.
First I rewrote the query to
SELECT c.id, c.name, c.owner
FROM catalog c, catalog_securitygroup cs, catalog_university cu
WHERE c.root < 50
AND cs.catalog = c.id
AND cu.catalog = c.id
AND cs.securitygroup < 200
AND cu.university < 200
ORDER BY c.name
LIMIT 50 OFFSET 100
note: the conditions are turned into less-than comparisons to maintain a comparable number of intermediate rows (the above query would return 198,801 rows without the LIMIT clause)
If run as above, without any extra indexes (save for PKs and foreign keys), it runs in 556 ms on a cold database (this is actually an indication that I oversimplified the sample data somehow; I would be happier if I had 2-4 s here without resorting to less-than operators).
This brings me to my point: any straight query that only joins and filters (a certain number of tables) and returns only a certain number of records should run under 1 s on any decent database, without needing cursors or denormalized data (one of these days I'll have to write a post on that).
Furthermore, if a query returns only 50 rows and does simple equality joins with restrictive equality conditions, it should run even much faster.
Now let's see if adding some indexes helps. The biggest potential in queries like this is usually the sort order, so let me try that:
CREATE INDEX test1 ON catalog (name, id);
This brings the execution time of the query down to 22 ms on a cold database.
And that's the point: if you are trying to get only a page of data, you should only get a page of data, and execution times of queries such as this on normalized data with proper indexes should be less than 100 ms on decent hardware.
I hope I didn't oversimplify the case to the point of no comparison (as I stated before, some simplification is present, as I don't know the cardinality of the relationships between catalog and the many-to-many tables).
So, the conclusion is
if I were you, I would not stop tweaking the indexes (and the SQL) until I got the performance of the query below 200 ms, as a rule of thumb.
Only if I found an objective explanation why it can't go below such a value would I resort to denormalization and/or cursors, etc...
First, I assume that your university and securitygroup tables are rather small. You posted the sizes of the large tables, but it's really the other sizes that are part of the problem.
Your problem comes from the fact that you can't join the smallest tables first. Your join order should be from small to large, but because your mapping tables don't include a securitygroup-to-university table, you can't join the smallest ones first. So you wind up starting with one or the other, going to a big table, then to another big table, and only then, with that large intermediate result, going to a small table.
If you always have current_univ, current_secgrp and root as inputs, you want to use them to filter as soon as possible. The only way to do that is to change your schema somewhat. In fact, you can leave the existing tables in place if you have to, but you'll be adding to the space used with this suggestion.
You've normalized the data very well. That's great for speed of update... not so great for querying. We denormalize to speed up querying (that's the whole reason for data warehouses; ok, that and history). Build a single mapping table with the following columns:
Univ_id, SecGrp_ID, Root, Catalog_id. Make it an index-organized table with the first three columns leading the primary key.
Now when you probe that index with all three input values, you'll finish the index scan with the complete list of allowable catalog ids; then it's just a single join to the catalog table to get the item details, and you're off and running.
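A sketch of that mapping table (all names are hypothetical; note that an index-organized table's primary key must be unique, so catalog_id is included as the trailing key column, which still leaves (univ_id, secgrp_id, root) as the leading edge for the probe):
CREATE TABLE univ_secgrp_catalog_map (
    univ_id     NUMBER,
    secgrp_id   NUMBER,
    root        NUMBER,
    catalog_id  NUMBER,
    CONSTRAINT univ_secgrp_catalog_pk
        PRIMARY KEY (univ_id, secgrp_id, root, catalog_id)
) ORGANIZATION INDEX;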
The Oracle cost-based optimizer makes use of all the information that it has to decide what the best access paths are for the data and what the least costly methods are for getting that data. So below are some random points related to your question.
The first three tables that you've listed all have primary keys. Do the other tables (catalog_university and catalog_securitygroup) also have primary keys on them? A primary key defines a column or set of columns that is non-null and unique, and these are very important in a relational database.
Oracle generally enforces a primary key by generating a unique index on the given columns. The Oracle optimizer is more likely to make use of a unique index if it available as it is more likely to be more selective.
If possible an index that contains unique values should be defined as unique (CREATE UNIQUE INDEX...) and this will provide the optimizer with more information.
The additional indexes that you have provided are no more selective than the existing indexes. For example, the index on (catalog.id, catalog.name, catalog.owner) is unique but is less useful than the existing primary key index on (catalog.id). If a query is written to select on the catalog.name column, it is possible to do an index skip scan, but this starts being costly (and may not even be possible in this case).
Since you are trying to select based on the catalog.root column, it might be worth adding an index on that column. This would mean that it could quickly find the relevant rows from the catalog table. The timing for the second query could be a bit misleading: it might be taking 2 seconds to find 50 matching rows from catalog, but these could easily be the first 50 rows from the catalog table..... finding 50 that match all your conditions might take longer, and not just because you need to join to other tables to get them. I would always use CREATE TABLE AS SELECT without restricting on ROWNUM when trying to performance-tune. With a complex query I would generally care about how long it takes to get all the rows back... and a simple select with ROWNUM can be misleading.
Everything about Oracle performance tuning is about providing the optimizer enough information and the right tools (indexes, constraints, etc) to do its job properly. For this reason it's important to get optimizer statistics using something like DBMS_STATS.GATHER_TABLE_STATS(). Indexes should have stats gathered automatically in Oracle 10g or later.
Somehow this grew into quite a long answer about the Oracle optimizer. Hopefully some of it answers your question. Here is a summary of what is said above:
Give the optimizer as much information as possible, e.g. if an index is unique then declare it as such.
Add indexes on your access paths
Find the correct times for queries without limiting by rownum. It will always be quicker to find the first 50 M&Ms in a jar than to find the first 50 red M&Ms.
Gather optimizer stats
Add unique/primary keys on all tables where they exist.
The use of rownum here is wrong and causes all the rows to be processed: it will process all the rows, assign them all a row number, and then find those between 0 and 50. What you want to look for in the explain plan is COUNT STOPKEY rather than just COUNT.
The query below should be an improvement as it will only get the first 50 rows... but there is still the issue of the joins to look at too:
SELECT ccc.* FROM (
  SELECT cc.*, ROWNUM AS n FROM (
    SELECT c.id, c.name, c.owner
    FROM catalog c
    WHERE c.root = 100
    ORDER BY name
  ) cc
  WHERE ROWNUM <= 50
) ccc WHERE ccc.n > 0 AND ccc.n <= 50;
Also, assuming this is for a web page or something similar, maybe there is a better way to handle this than just running the query again to get the data for the next page.
Try declaring a cursor. I don't know Oracle, but in SQL Server it would look like this:
declare @result table (
    id numeric,
    name varchar(255)
);

declare __dyn_select_cursor cursor local scroll dynamic for
    --Select
    select distinct
        c.id, c.name
    from [catalog] c
    inner join catalog_university u
        on u.catalog = c.id
        and u.university = 300
    inner join catalog_securitygroup s
        on s.catalog = c.id
        and s.securitygroup = 200
    where
        c.root = 100
    order by name

--Cursor
declare @id numeric;
declare @name varchar(255);
declare @maxrowscount int;
set @maxrowscount = 50;

open __dyn_select_cursor;
fetch relative 1 from __dyn_select_cursor into @id, @name;

while (@@fetch_status = 0 and @maxrowscount <> 0)
begin
    insert into @result values (@id, @name);
    set @maxrowscount = @maxrowscount - 1;
    fetch next from __dyn_select_cursor into @id, @name;
end

close __dyn_select_cursor;
deallocate __dyn_select_cursor;

--Select temp, final result
select
    id,
    name
from @result;
