I am trying to create a sales table, but after 8 hours it still had not completed. I have attempted to speed up the query by adding hints and by reducing the timeframe to 2022 only, but after 3 hours it is still running. Is there a way to optimise this query?
DROP TABLE BIRTHDAY_SALES;
CREATE TABLE BIRTHDAY_SALES AS
--(
SELECT /*+ parallel(32) */
DISTINCT T.CONTACT_KEY
, S.CAMPAIGN_NAME
, S.CONTROL_GROUP_FLAG
, S.SEGMENT_NAME
, count(distinct t.ORDER_NUM) as TRANS
, count(distinct case when p.store_key = '42381' then t.ORDER_NUM else NULL end) as TRANS_ONLINE
, count(distinct case when p.store_key != '42381' then t.ORDER_NUM else NULL end) as TRANS_OFFLINE
, sum(t.ITEM_AMT) as SALES
, sum(case when p.store_key = '42381' then t.ITEM_AMT else NULL end) as SALES_ONLINE
, sum(case when p.store_key != '42381' then t.ITEM_AMT else NULL end) as SALES_OFFLINE
, sum(case when t.item_quantity_val>0 and t.item_amt<=0 then 0 else t.item_quantity_val end) QTY
, sum(case when (p.store_key = '42381' and t.ITEM_QUANTITY_VAL>0 and t.ITEM_AMT>0) then t.ITEM_QUANTITY_VAL else null end) QTY_ONLINE
, sum(case when (p.store_key != '42381' and t.ITEM_QUANTITY_VAL>0 and t.ITEM_AMT>0) then t.ITEM_QUANTITY_VAL else null end) QTY_OFFLINE
FROM CRM_TARGET.B_TRANSACTION T
JOIN BDAY_PROG S
ON T.CONTACT_KEY = S.CONTACT_KEY
JOIN CRM_TARGET.T_ORDITEM_SD P
ON T.PRODUCT_KEY = P.PRODUCT_KEY
where t.TRANSACTION_TYPE_NAME = 'Item'
and t.BU_KEY = '15'
and t.TRANSACTION_DT_KEY >= '20220101'
and t.TRANSACTION_DT_KEY <= '20221231'
and t.member_sale_flag = 'Y'
and t.bu_key = '15'
and t.CONTACT_KEY != 0
group by
T.CONTACT_KEY
, S.CAMPAIGN_NAME
, S.CONTROL_GROUP_FLAG
, S.SEGMENT_NAME
-- )
;
Performance tuning is not something we can effectively deal with on a forum like this, as there are too many factors to consider. You will have to examine the explain plan and look at ASH data (v$active_session_history) to see what the predominant waits are and on what plan step. Only then can you determine what's wrong and take steps to fix it.
However, here are some obvious things to look for:
Make sure there are no many-to-many joins. I'm guessing B_TRANSACTION probably has many rows with the same CONTACT_KEY and many rows with the same PRODUCT_KEY. That's okay, but then you must ensure that CONTACT_KEY is unique within BDAY_PROG and PRODUCT_KEY is unique within T_ORDITEM_SD. If that's not the case, you will get a partial Cartesian product from the hidden many-to-many join and will spend a huge amount of time on reading/writing to temp.
Make sure no more than one of those joins is one-to-many. Multiple one-to-manies stemming off the same parent table will effectively give you a many-to-many between the children, with the same effect.
You are asking for a date range of a full year. In most systems, you are better off doing a full table scan (with parallel query if you can) than using indexes to get a whole year's worth of transactional data. If it is using an index, that can really mess you up. You can fix this with hints (see below).
It might be using nested loops joins when a reporting query like this is likely better off using hash joins. Again, I'm just guessing based on the names of your tables; only knowledge of your data can determine this for sure.
Ensure that the PGA workareas are of reasonable size. Ask your DBA to query v$pgastat and report the global memory bound. It should be at its max of 1G, but probably anything over 100M is reasonable. If it's less than that, you may need to ask the DBA to increase the pga_aggregate_target, or you can manually set your own sort_area_size/hash_area_size session parameters (not the best thing to do).
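For example, a quick check (a sketch; it assumes you have SELECT privilege on the v$ views):
SELECT name, ROUND(value / 1024 / 1024) AS mb
FROM   v$pgastat
WHERE  name IN ('aggregate PGA target parameter', 'global memory bound');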
You are asking for DOP 32. That's pretty high. Ensure there are that many CPU cores on the database server, that parallel_max_servers > 64 and that you aren't getting downgraded to serial by anything. Ask your DBA what a reasonable DOP would be.
Do you really need COUNT(DISTINCT ...) on ORDER_NUM? If you are just counting the number of transactions, it would be less work to simply say SUM(CASE WHEN ... THEN 1 ELSE 0 END).
Remove the DISTINCT keyword. It's not doing anything - your GROUP BY will already result in the results being distinct.
Consult ASH (v$active_session_history) to see if you are actually blocked by something, showing some kind of concurrency wait. Your CTAS might not be doing anything at all because of some library cache lock or full tablespace if the database is configured to suspend until space is added.
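For example, something along these lines (a sketch only; it assumes Diagnostics Pack licensing, a recent enough Oracle version for sql_plan_line_id, and that you know the SQL_ID of the running statement):
SELECT sql_plan_line_id,
       NVL(event, 'ON CPU') AS event,
       COUNT(*)             AS ash_samples
FROM   v$active_session_history
WHERE  sql_id = '&sql_id_of_the_ctas'   -- placeholder: substitute your SQL_ID
AND    sample_time > SYSDATE - 1/24     -- last hour of samples
GROUP  BY sql_plan_line_id, NVL(event, 'ON CPU')
ORDER  BY ash_samples DESC;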
Here's something to try - again, it's a long shot without knowing your data or table structure. But I've seen enough reports like this to make at least a somewhat educated guess:
SELECT /*+ USE_HASH(t s p) FULL(t) FULL(s) FULL(p) PARALLEL(8) */ t.contact_key . . .
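Fleshed out against your statement, it would look something like the following - a sketch only: keep your aggregate expressions exactly as they are, and treat the DOP of 8 as a guess to confirm with your DBA.
CREATE TABLE BIRTHDAY_SALES AS
SELECT /*+ USE_HASH(t s p) FULL(t) FULL(s) FULL(p) PARALLEL(8) */
       t.contact_key
     , s.campaign_name
     , s.control_group_flag
     , s.segment_name
     , COUNT(DISTINCT t.order_num) AS trans
     -- ... the remaining aggregate columns from your original query, unchanged ...
FROM   crm_target.b_transaction t
JOIN   bday_prog s
  ON   t.contact_key = s.contact_key
JOIN   crm_target.t_orditem_sd p
  ON   t.product_key = p.product_key
WHERE  t.transaction_type_name = 'Item'
AND    t.bu_key = '15'
AND    t.transaction_dt_key BETWEEN '20220101' AND '20221231'
AND    t.member_sale_flag = 'Y'
AND    t.contact_key != 0
GROUP  BY t.contact_key, s.campaign_name, s.control_group_flag, s.segment_name;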
Related
I am trying to improve a query. I have a dataset of opened tickets. Every ticket has several rows, and every row represents an update of the ticket. There is a field (dt_update) that differs in every row.
I have these indexes on st_remedy_full_light:
IDX_ASSIGNMENT (ASSIGNMENT)
IDX_REMEDY_INC_ID (REMEDY_INC_ID)
IDX_REMDULL_LIGHT_DTUPD (DT_UPDATE)
Currently, the query runs in 8 seconds. That is too slow for me.
WITH last_ticket AS
( SELECT *
FROM st_remedy_full_light a
WHERE a.dt_update IN
( SELECT MAX(dt_update)
FROM st_remedy_full_light
WHERE remedy_inc_id = a.remedy_inc_id
)
)
SELECT remedy_inc_id, ASSIGNMENT FROM last_ticket
This is the plan
How could I improve this query?
P.S. This is just a part of a big query
Additional information:
- The table st_remedy_full_light contains 529,507 rows
You could try:
WITH last_ticket AS
( SELECT remedy_inc_id, ASSIGNMENT,
rank() over (partition by remedy_inc_id order by dt_update desc) rn
FROM st_remedy_full_light a
)
SELECT remedy_inc_id, ASSIGNMENT FROM last_ticket
where rn = 1;
The best alternative query, which is also much easier to execute, is this:
select remedy_inc_id
, max(assignment) keep (dense_rank last order by dt_update)
from st_remedy_full_light
group by remedy_inc_id
This will use only one full table scan and a (hash/sort) group by, no self joins.
Don't bother with indexed access, as you'll probably find a full table scan is most appropriate here - unless the table is really wide and a composite index on all the columns used (remedy_inc_id, dt_update, assignment) would be significantly quicker to read than the table.
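If you do want to test that route, the index would be something like this (the index name is just a placeholder):
CREATE INDEX st_remedy_inc_upd_ix
  ON st_remedy_full_light (remedy_inc_id, dt_update, assignment);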
I have a block of SQL that runs pretty smoothly outside the procedure. The moment I put the SQL block in a procedure to return a ref cursor, the procedure takes quite a long time to execute.
With help from our DBAs, we implemented a SQL profile and it worked great to speed up the procedure, but then any minor change in that particular procedure makes it go haywire. I am not sure what the problem is and I am running out of options. How should I go about troubleshooting this particular weird issue?
Thank you in advance.
Edit: here is the query
with query_ownership as (SELECT leeo.legal_entity_id,
leeo.parent_le_id,
SUM(leeo.effective_ownership) ownership_percent
FROM data_ownership leeo
WHERE leeo.start_date <=
to_date('12/31/2012','mm/dd/yyyy')
AND ((leeo.end_date < &lvTaxYearDate and leeo.end_date > &lvTaxYearBeginDate)
to_date('12/31/2012','mm/dd/yyyy') OR
leeo.end_date IS NULL)
and leeo.stock_type in ('E')
GROUP BY leeo.legal_entity_id, leeo.parent_le_id
HAVING SUM(leeo.effective_ownership) > 0
),
query_branches as ( SELECT b.branch_id as legal_entity_id,
b.legal_entity_id as parent_le_id,
1.00 as ownership_percent
FROM company_branches b
WHERE b.tax_year = 2012),
child_query as (select * from query_ownership
UNION
select * from query_branches),
parent_query as (select * from query_ownership
UNION
select * from query_branches),
inner_query as (SELECT rownum as sortcode,
-level as lvl,
child_query.parent_le_id,
child_query.legal_entity_id,
child_query.ownership_percent
FROM child_query
START WITH child_query.legal_entity_id = 'AB1203'
CONNECT BY NOCYCLE PRIOR child_query.legal_entity_id =
child_query.parent_le_id
AND child_query.ownership_percent >= 0.01
and level = 0
UNION
SELECT rownum as sortcode,
level - 1 as lvl,
parent_query.parent_le_id,
parent_query.legal_entity_id,
parent_query.ownership_percent
FROM parent_query
START WITH parent_query.legal_entity_id = 'AB1203'
CONNECT BY NOCYCLE
PRIOR parent_query.parent_le_id =
parent_query.legal_entity_id
AND parent_query.ownership_percent >= 0.01)
,ownership_heirarchy as (
SELECT max(inner_query.sortcode) as sortcode,
max(inner_query.lvl) as lvl,
inner_query.parent_le_id,
inner_query.legal_entity_id,
inner_query.ownership_percent from inner_query
GROUP BY inner_query.parent_le_id,
inner_query.legal_entity_id,
inner_query.ownership_percent
)
,goldList as (
SELECT lem2.legal_entity_id from ownership_heirarchy,
company_entity_year lem1,
company_entity_year lem2
WHERE ownership_heirarchy.parent_le_id = lem2.legal_entity_id
AND lem2.tax_year = 2012
AND ownership_heirarchy.legal_entity_id = lem1.legal_entity_id
AND lem1.tax_year = 2012
AND lem1.legal_entity_type <> 'EXT'
AND lem1.non_legal_entity_flag is null
AND lem2.legal_entity_type <> 'EXT'
AND lem2.non_legal_entity_flag is null
and TRIM(lem2.alt_tax_type) is null
and UPPER(lem2.tax_type) in ('DC', 'DPS', 'TXN')
),
fulllist as (
select * from goldList
union
select gc.parent_le_id from company_entity_year e, consolidation_group gc
where e.LEGAL_ENTITY_ID = 'AB1203' and e.tax_year = 2012
and e.TAX_CONSOLIDATION_GRP = gc.group_id
union
select e.leid from vdst_entity e where e.TAX_YEAR = 2012
and e.ALT_TAX_TYPE in (3,8)
and e.LEID = 'AB1203'
)
select distinct dc.dcn_id as dcnId,
dc.dcn_name as dcnName,
dy.dcn_year_id dcnYearId,
ty.tax_year_id taxYearId,
ty.tax_year taxYear
from company_dcn dc, company_dcn_year dy, company_tax_year ty
where dc.dcn_id = dy.dcn_id
and dy.year_id = ty.tax_year_id
and ty.tax_year = 2012
and dc.leid in (
select * from fulllist
);
First, ensure that statistics are up-to-date by running DBMS_STATS.GATHER_TABLE_STATS for each table involved in the query. Next, obtain the plan for the query with different parameter values - it's entirely possible that a change in the parameters may make the plan better or worse. Given that you're showing us no information about the query, the procedure, and the tables involved, there's no way to be more specific.
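For example (the schema and table names are placeholders for your own):
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => 'YOUR_SCHEMA',
                                tabname => 'DATA_OWNERSHIP',
                                cascade => TRUE);  -- also refresh the index statistics
END;
/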
Best of luck.
Find out what part of the execution plan is causing problems. There are several ways to do this:
Use DBMS_XPLAN to find a good and bad plan. Use explain plan for ... and select * from table(dbms_xplan.display); to find the good plan in your session. Use dbms_xplan.display_cursor(sql_id => 'some sql_id') to find the bad plan. Compare the plans and look for differences. This can be very difficult because you can't usually tell which parts of the execution plan are slow. If you're lucky there will be only one difference and then obviously that difference is the problem.
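For example (the SQL_ID below is a placeholder - look it up in v$sql):
-- the "good" plan, from your own session
EXPLAIN PLAN FOR
  SELECT ... ;   -- your query goes here
SELECT * FROM TABLE(dbms_xplan.display);

-- the "bad" plan, as actually executed
SELECT * FROM TABLE(dbms_xplan.display_cursor(sql_id => '7ks2a9d4hmx1b', format => 'TYPICAL'));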
Use DBMS_SQLTUNE.REPORT_SQL_MONITOR to find what part of the plan is bad. Run the bad query and use SQL Monitoring to find out which operation in the execution plan is bad. The report shows which operations take the longest, and which cardinality estimates are off by the most. Focus on the slow parts, and the first steps of the plan with a huge cardinality difference between estimated and actual.
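For example (again the SQL_ID is a placeholder; note that Real-Time SQL Monitoring requires the Tuning Pack and only captures parallel or longer-running statements):
SELECT DBMS_SQLTUNE.REPORT_SQL_MONITOR(sql_id       => '7ks2a9d4hmx1b',
                                       type         => 'TEXT',
                                       report_level => 'ALL')
FROM   dual;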
Look at profile hints to find out how Oracle fixes the bad plan. Profiles are a collection of hints that help nudge the optimizer into making the correct decisions. Those hints might tell you what the problem is. For example, if one of the hints is OPT_ESTIMATE(JOIN (A B) SCALE_ROWS=100), the profile is telling the optimizer to increase the cardinality estimate by 100X. You may be able to recreate that same effect by either including that hint in the query or by creating and locking fake table statistics. Use this process from Kerry Osborne to find the profile hints.
Either way, this process can be difficult and time-consuming. Try to shrink the query as much as possible. Tuning a 97-line query can sometimes be almost impossible. It's possible there is only one root problem, but that problem changes so much of the execution plan that it looks like there are a dozen problems.
These steps only help you identify the problem. Fixing it may be a whole other question and answer.
I am using DBMS_PROFILER for basic profiling of my PL/SQL packages. I am also using it to get code coverage statistics using the following query:
SELECT EXEC.unit_name unitname,
       ROUND(EXEC.cnt / total.cnt * 100, 1) Code_coverage
FROM   (SELECT u.unit_name, COUNT(1) cnt
        FROM   plsql_profiler_data d, plsql_profiler_units u
        WHERE  u.unit_number = d.unit_number
        GROUP  BY u.unit_name) total,
       (SELECT u.unit_name, COUNT(1) cnt
        FROM   plsql_profiler_data d, plsql_profiler_units u
        WHERE  u.unit_number = d.unit_number
        AND    d.total_occur > 0
        GROUP  BY u.unit_name) EXEC
WHERE  EXEC.unit_name = total.unit_name
I clear the plsql_profiler_data, plsql_profiler_units and plsql_profiler_runs tables before each profiler run so that I need not know the run id each time.
This gives me package-wise information on the percentage of code that was covered during the profiling. Now I am trying to see if this can be turned into a normal coverage report where I can know which line of code was covered and which one wasn't (say, select lineOfCode, isCovered from ...), so that I can build a report with HTML formatting to indicate whether a line was covered or not.
I am not too proficient in the Oracle data dictionary views where the functions and procedures get saved, etc. (I got the above query from a blog and modified it slightly to remove the run IDs.)
Is this possible?
If so how can I achieve this?
I think this approaches what you're after:
-- View lines of code profiled, along with run times, next to the complete, ordered source..
-- Provides an annotated view of profiled packages, procs, etc.
-- Only the first line of a multiline SQL statement will register with timings.
SELECT u.UNIT_OWNER || '.' || u.UNIT_NAME AS "Unit"
, s.line
, CASE WHEN d.TOTAL_OCCUR >= 0 THEN 'C'
ELSE ' ' END AS Covered
, s.TEXT
, TO_CHAR(d.TOTAL_TIME / (1000*1000*1000), 'fm990.000009') AS "Total Time (sec)"
, CASE WHEN NVL(d.TOTAL_OCCUR, 1) > 0 THEN d.TOTAL_OCCUR ELSE 1 END AS "# Iterations"
, TO_CHAR(CASE WHEN d.TOTAL_OCCUR > 0 THEN d.TOTAL_TIME / (d.TOTAL_OCCUR * (1000*1000*1000))
ELSE NULL END, 'fm990.000009') AS "Avg Time (sec)"
FROM all_source s
LEFT JOIN plsql_profiler_units u ON s.OWNER = u.UNIT_OWNER
AND s.NAME = u.UNIT_NAME
AND s.TYPE = u.UNIT_TYPE
LEFT JOIN plsql_profiler_data d ON u.UNIT_NUMBER = d.UNIT_NUMBER
AND s.LINE = d.LINE#
AND d.RUNID = u.RUNID
WHERE u.RUNID = ? -- Add RUNID of profiler run to investigate here
ORDER BY u.UNIT_NAME
, s.LINE
There are a few issues to keep in mind.
1) Many rows in the plsql_profiler_data table will NOT have accurate values in their TOTAL_TIME column because they executed faster than the resolution of the timer.
Ask Tom re: timings:
The timings are collected using some unit of time, typically only
granular to the HSECS.
That means that many discrete events that take less than 1/100th of a
second appear to take ZERO seconds.
Many discrete events that take less than 1/100th of a second may
appear to take 1/100th of a second.
2) Only the FIRST line in a multiline statement will show as covered. So if you split an INSERT or whatever across multiple lines, I don't know of any easy way to have every line of that statement show as profiled in an Annotated Source style of report.
Also, check out Oracle's dbms_profiler documentation and this useful package reference for help crafting queries against the collected profiler data.
Actually there are some tools for PL/SQL that do code coverage. See the answers to this question for more information.
That said, you can find information on user-created data structures and code in the following views:
user_source: here you can find the source in the TEXT column, broken down by object type (function, procedure, package, etc.).
User_tables
user_indexes
user_types: if you use some kind of OO code.
Other tables beginning with user_ that you may need.
Basically you would need to check the result of your query against user_source and get extra information from the other tables.
I have a situation in my application where I display counts of data matching different criteria. Since the performance of the counting is degrading as the database grows, we decided to show only availability information, using an EXISTS clause.
Below is my table structure
Table: DocInfo
---------------------------------------
DocId number
DocName varchar(250)
DocStatus number
SignedBy number
ForwardedBy number
ForwardCount number
DocOwner number
MgrID number
ProjectId number
The current query which does the counting is like this
SELECT NVL(SUM(CASE
WHEN (DocStatus IN (1150,1155,1170,1182,1190) AND
DocOwner=56366 AND
ForwardCount=0)
THEN 1
ELSE 0
END), 0) "ForReview",
NVL(SUM(CASE
WHEN (DocStatus IN (1200) And
MgrID = 56366 AND
ForwardCount = 0 )
THEN 1
ELSE 0
END), 0) "Accepted" ,
NVL(SUM(CASE
WHEN (DocStatus IN (1150,1155,1170,1182,1190) AND
DocOwner=56366 AND
MgrID = 0 )
THEN 1
ELSE 0
END), 0) "Waiting"
FROM DocInfo
WHERE ProjectId = 313 and
(DocOwner = 56366 or MgrID = 56366)
I need to change the counting to an EXISTS clause so that I can show whether documents are available or not in each category.
Since this change is meant to improve performance, running these as separate queries is also not advisable. Please help; I have run out of ideas.
Sorry, I missed out the part I have already tried.
I have changed the above query to a UNION with an EXISTS clause in each branch, as below.
SELECT 'ForReview' AS A
FROM DUAL
WHERE EXISTS (SELECT NULL
FROM DocInfo
WHERE ProjectId = 313 and
(DocOwner = 56366 or MgrID = 56366) and
(DocStatus IN (1150,1155,1170,1182,1190) AND
DocOwner=56366 AND
ForwardCount=0))
UNION
SELECT 'Accepted' AS A
FROM DUAL
WHERE EXISTS (SELECT NULL
FROM DocInfo
WHERE ProjectId = 313 and
(DocOwner = 56366 or MgrID = 56366) and
(DocStatus IN (1200) And
MgrID = 56366 AND
ForwardCount = 0 ))
UNION
SELECT 'Waiting' AS A
FROM DUAL
WHERE EXISTS (SELECT NULL
FROM DocInfo
WHERE ProjectId = 313 and
(DocOwner = 56366 or MgrID = 56366) and
(DocStatus IN (1150,1155,1170,1182,1190) AND
DocOwner=56366 AND
MgrID = 0))
I have mentioned only 3 conditions, whereas my actual application has 8 different criteria to be added to this query. So when I have 8 EXISTS clauses, it runs internally as 8 different queries, and in effect it takes more time - a single segment of the entire union query takes only 560 ms, whereas all the queries together take around 7 seconds to generate the output.
Since my requirement is only to identify the availability of any such record, I do not want to navigate through the entire recordset and count it.
Is there any way to optimize/rewrite this query?
Thank You
"so when i have 8 Exists clauses, it runs internally as 8 different
queries, and in effect it takes more time - single segment in the
entire union query takes only 560 ms whereas all queries together
takes around 7 seconds to generate the output."
Surprise, surprise. Running what amounts to the same query eight times will not be faster than running that query once.
Now it is true that EXISTS can be faster, because it only needs to find a single row which matches the given criteria, rather than retrieving an entire data set. However you have just shifted the retrieved data into the WHERE clause so the database still has to do the same amount of work. In fact, it is apparently doing a lot more work, because 7s > (560ms * 8).
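For what it's worth, if availability flags are genuinely all you need, you can keep the single scan of your original query and simply swap SUM for MAX - a sketch only, reusing your own column names and constants, and still paying for the same scan:
SELECT MAX(CASE WHEN DocStatus IN (1150,1155,1170,1182,1190)
                 AND DocOwner = 56366 AND ForwardCount = 0
            THEN 1 ELSE 0 END) AS "ForReview",
       MAX(CASE WHEN DocStatus = 1200
                 AND MgrID = 56366 AND ForwardCount = 0
            THEN 1 ELSE 0 END) AS "Accepted",
       MAX(CASE WHEN DocStatus IN (1150,1155,1170,1182,1190)
                 AND DocOwner = 56366 AND MgrID = 0
            THEN 1 ELSE 0 END) AS "Waiting"
FROM   DocInfo
WHERE  ProjectId = 313
AND    (DocOwner = 56366 OR MgrID = 56366);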
To solve your problem properly you need to understand how the database works and how to tune it. Find out more.
For a start, define a tuning goal. Your original query takes half a second to run: that's not lightning fast but it is pretty quick. Why is this a problem? How quickly do you want it to run?
Next, run an EXPLAIN PLAN. Is the query using indexes? How efficient is its index usage? What percentage of the rows is being selected?
Now you also need to understand your data. Is the selected data evenly distributed throughout the table or are there clusters? Do some projects, owners or managers have more records than others? How does that distribution affect performance?
Please bear in mind, tuning is a science and it is complicated: there are whole books on the subject and some people make very fine livings as performance troubleshooters. It requires a lot of information about your system, both knowledge of what your application does and low-level information on which activities your database is doing. We can help you in your quest to find a more performant solution but we cannot just look at a shonky query and tell you how to re-write so it runs quicker.
I have the following tables (Oracle 10g):
catalog (
id NUMBER PRIMARY KEY,
name VARCHAR2(255),
owner NUMBER,
root NUMBER REFERENCES catalog(id)
...
)
university (
id NUMBER PRIMARY KEY,
...
)
securitygroup (
id NUMBER PRIMARY KEY
...
)
catalog_securitygroup (
catalog REFERENCES catalog(id),
securitygroup REFERENCES securitygroup(id)
)
catalog_university (
catalog REFERENCES catalog(id),
university REFERENCES university(id)
)
Catalog: 500 000 rows, catalog_university: 500 000, catalog_securitygroup: 1 500 000.
I need to select any 50 rows from catalog with a specified root, ordered by name, for the current university and the current securitygroup. Here is the query:
SELECT ccc.* FROM (
SELECT cc.*, ROWNUM AS n FROM (
SELECT c.id, c.name, c.owner
FROM catalog c, catalog_securitygroup cs, catalog_university cu
WHERE c.root = 100
AND cs.catalog = c.id
AND cs.securitygroup = 200
AND cu.catalog = c.id
AND cu.university = 300
ORDER BY name
) cc
) ccc WHERE ccc.n > 0 AND ccc.n <= 50;
Here 100 is some catalog, 200 some securitygroup, and 300 some university. This query returns 50 rows out of ~170,000 in 3 minutes.
But the next query returns these rows in 2 seconds:
SELECT ccc.* FROM (
SELECT cc.*, ROWNUM AS n FROM (
SELECT c.id, c.name, c.owner
FROM catalog c
WHERE c.root = 100
ORDER BY name
) cc
) ccc WHERE ccc.n > 0 AND ccc.n <= 50;
I have built the following indexes: (catalog.id, catalog.name, catalog.owner), (catalog_securitygroup.catalog, catalog_securitygroup.index), (catalog_university.catalog, catalog_university.university).
Plan for first query (using PLSQL Developer):
http://habreffect.ru/66c/f25faa5f8/plan2.jpg
Plan for second query:
http://habreffect.ru/f91/86e780cc7/plan1.jpg
What are the ways to optimize the query I have?
The indexes that can be useful and should be considered deal with
WHERE c.root = 100
AND cs.catalog = c.id
AND cs.securitygroup = 200
AND cu.catalog = c.id
AND cu.university = 300
So the following fields can be interesting for indexes
c: id, root
cs: catalog, securitygroup
cu: catalog, university
So, try creating
(catalog_securitygroup.catalog, catalog_securitygroup.securitygroup)
and
(catalog_university.catalog, catalog_university.university)
EDIT:
I missed the ORDER BY - these fields should also be considered, so
(catalog.name, catalog.id)
might be beneficial (or some other composite index that could be used for sorting and the conditions - possibly (catalog.root, catalog.name, catalog.id))
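In DDL terms that would be something like this (the index names are just placeholders):
CREATE INDEX cs_catalog_secgrp_ix ON catalog_securitygroup (catalog, securitygroup);
CREATE INDEX cu_catalog_univ_ix   ON catalog_university (catalog, university);
CREATE INDEX cat_root_name_id_ix  ON catalog (root, name, id);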
EDIT2
Although another question is accepted I'll provide some more food for thought.
I have created some test data and run some benchmarks.
The test cases are minimal in terms of record width (in catalog_securitygroup and catalog_university the primary keys are (catalog, securitygroup) and (catalog, university)). Here is the number of records per table:
test=# SELECT (SELECT COUNT(*) FROM catalog), (SELECT COUNT(*) FROM catalog_securitygroup), (SELECT COUNT(*) FROM catalog_university);
?column? | ?column? | ?column?
----------+----------+----------
500000 | 1497501 | 500000
(1 row)
The database is Postgres 8.4, a default Ubuntu install; hardware is an i5 with 4 GB RAM.
First I rewrote the query as:
SELECT c.id, c.name, c.owner
FROM catalog c, catalog_securitygroup cs, catalog_university cu
WHERE c.root < 50
AND cs.catalog = c.id
AND cu.catalog = c.id
AND cs.securitygroup < 200
AND cu.university < 200
ORDER BY c.name
LIMIT 50 OFFSET 100
note: the conditions are turned into less-than comparisons to maintain a comparable number of intermediate rows (the above query would return 198,801 rows without the LIMIT clause)
If run as above, without any extra indexes (save for PKs and foreign keys), it runs in 556 ms on a cold database (this is actually an indication that I oversimplified the sample data somehow - I would be happier if it took 2-4 s here without resorting to less-than operators).
This brings me to my point - any straight query that only joins and filters (a certain number of tables) and returns only a certain number of records should run in under 1 s on any decent database, without needing cursors or denormalised data (one of these days I'll have to write a post on that).
Furthermore, if a query returns only 50 rows and does simple equality joins with restrictive equality conditions, it should run much faster still.
Now let's see what happens if I add some indexes; the biggest potential in queries like this is usually the sort order, so let me try that:
CREATE INDEX test1 ON catalog (name, id);
This brings the execution time of the query down to 22 ms on a cold database.
And that's the point - if you are trying to get only a page of data, you should only get a page of data, and execution times of queries such as this on normalized data with proper indexes should be less than 100 ms on decent hardware.
I hope I didn't oversimplify the case to the point of no comparison (as I stated before some simplification is present as I don't know the cardinality of relationships between catalog and the many-to-many tables).
So, the conclusion is
if I were you I would not stop tweaking the indexes (and the SQL) until I got the performance of the query below 200 ms, as a rule of thumb.
only if I found an objective explanation of why it can't go below that value would I resort to denormalisation and/or cursors, etc.
First, I assume that your University and SecurityGroup tables are rather small. You posted the sizes of the large tables, but it's really the other sizes that are part of the problem.
Your problem comes from the fact that you can't join the smallest tables first. Your join order should be from small to large. But because your mapping tables don't include a securitygroup-to-university table, you can't join the smallest ones first. So you wind up starting with one or the other, going to a big table, then to another big table, and only then, with that large intermediate result, do you go to a small table.
If you always have current_univ and current_secgrp and root as inputs you want to use them to filter as soon as possible. The only way to do that is to change your schema some. In fact, you can leave the existing tables in place if you have to but you'll be adding to the space with this suggestion.
You've normalized the data very well. That's great for speed of update... not so great for querying. We denormalize to speed up querying (that's the whole reason for data warehouses - OK, that and history). Build a single mapping table with the following columns:
Univ_id, SecGrp_ID, Root, Catalog_id. Make it an index-organized table keyed on those columns, with the first three as the leading edge (catalog_id has to be part of the key too so that it stays unique).
Now when you query that index with all three leading values, you'll finish that index scan with a complete list of allowable catalog IDs; then it's just a single join to the catalog table to get the item details, and you're off and running.
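Something along these lines (a sketch; the table and column names are mine, not from your schema, and catalog_id is included in the key so that it stays unique):
CREATE TABLE catalog_access_map (
  univ_id    NUMBER NOT NULL,
  secgrp_id  NUMBER NOT NULL,
  root       NUMBER NOT NULL,
  catalog_id NUMBER NOT NULL,
  CONSTRAINT catalog_access_map_pk
    PRIMARY KEY (univ_id, secgrp_id, root, catalog_id)
) ORGANIZATION INDEX;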
The Oracle cost-based optimizer makes use of all the information that it has to decide what the best access paths are for the data and what the least costly methods are for getting that data. So below are some random points related to your question.
The first three tables that you've listed all have primary keys. Do the other tables (catalog_university and catalog_securitygroup) also have primary keys on them? A primary key defines a column or set of columns that is non-null and unique, and is very important in a relational database.
Oracle generally enforces a primary key by generating a unique index on the given columns. The Oracle optimizer is more likely to make use of a unique index if it is available, as it is likely to be more selective.
If possible an index that contains unique values should be defined as unique (CREATE UNIQUE INDEX...) and this will provide the optimizer with more information.
The additional indexes that you have provided are no more selective than the existing indexes. For example, the index on (catalog.id, catalog.name, catalog.owner) is unique but is less useful than the existing primary key index on (catalog.id). If a query is written to select on the catalog.name column, it is possible to do an index skip scan, but this starts being costly (and may not even be possible in this case).
Since you are trying to select based on the catalog.root column, it might be worth adding an index on that column. This would mean that it could quickly find the relevant rows from the catalog table. The timing for the second query could be a bit misleading. It might be taking 2 seconds to find 50 matching rows from catalog, but these could easily be the first 50 rows from the catalog table... finding 50 that match all your conditions might take longer, and not just because you need to join to other tables to get them. I would always use CREATE TABLE AS SELECT without restricting on rownum when trying to performance tune. With a complex query I would generally care about how long it takes to get all the rows back... and a simple select with rownum can be misleading.
Everything about Oracle performance tuning is about providing the optimizer enough information and the right tools (indexes, constraints, etc) to do its job properly. For this reason it's important to get optimizer statistics using something like DBMS_STATS.GATHER_TABLE_STATS(). Indexes should have stats gathered automatically in Oracle 10g or later.
Somehow this grew into quite a long answer about the Oracle optimizer. Hopefully some of it answers your question. Here is a summary of what is said above:
Give the optimizer as much information as possible, e.g. if an index is unique then declare it as such.
Add indexes on your access paths
Find the correct times for queries without limiting by rownum. It will always be quicker to find the first 50 M&Ms in a jar than to find the first 50 red M&Ms.
Gather optimizer stats
Add unique/primary keys on all tables where they exist.
The use of rownum is wrong and causes all the rows to be processed. It will process all the rows, assign them all a row number, and then find those between 0 and 50. What you want to look for in the explain plan is COUNT STOPKEY rather than just COUNT.
The query below should be an improvement as it will only get the first 50 rows... but there is still the issue of the joins to look at too:
SELECT ccc.* FROM (
SELECT cc.*, ROWNUM AS n FROM (
SELECT c.id, c.name, c.owner
FROM catalog c
WHERE c.root = 100
ORDER BY name
) cc
where rownum <= 50
) ccc WHERE ccc.n > 0 AND ccc.n <= 50;
Also, assuming this is for a web page or something similar, maybe there is a better way to handle this than just running the query again to get the data for the next page.
Try declaring a cursor. I don't know Oracle, but in SQL Server it would look like this:
declare @result
table (
id numeric,
name varchar(255)
);
declare __dyn_select_cursor cursor LOCAL SCROLL DYNAMIC for
--Select
select distinct
c.id, c.name
From [catalog] c
inner join catalog_university u
on u.catalog = c.id
and u.university = 300
inner join catalog_securitygroup s
on s.catalog = c.id
and s.securitygroup = 200
Where
c.root = 100
Order by name
--Cursor
declare @id numeric;
declare @name varchar(255);
declare @maxrowscount int
set @maxrowscount = 50
open __dyn_select_cursor;
fetch relative 1 from __dyn_select_cursor into @id, @name;
while (@@fetch_status = 0 and @maxrowscount <> 0)
begin
insert into @result values (@id, @name);
set @maxrowscount = @maxrowscount - 1;
fetch next from __dyn_select_cursor into @id, @name;
end
close __dyn_select_cursor;
deallocate __dyn_select_cursor;
--Select temp, final result
select
id,
name
from @result;