Why the parallel hint is not working in oracle 19c - oracle

SELECT /*+PARALLEL 8*/
A.ACC_ID,
COALESCE (B.PR_ID, NULL),
COALESCE (B.DESC, NULL),
COALESCE (B.CNT, 0)
FROM (SELECT ACC_ID
FROM ACC_ID_IN
WHERE XXX = 1 AND YYY = 'ABCD') A
LEFT OUTER JOIN
( SELECT /*+PARALLEL 8*/
A.ACC_ID,
B.PR_ID,
B.DESC,
COUNT (DISTINCT A.ACC_NBR) CNT
FROM DB1.TABLE1 A,
TABLE2 B,
TABLE3 C
WHERE A.ACC_ID IN (8888888888)
AND A.P_ID = B.P_ID
AND A.P_ID = C.P_ID
AND A.START <=TO_DATE (SUBSTR (A.ACC_ID, 1, 4) || '1231','YYYY/MM/DD')
AND A.END > TO_DATE (SUBSTR (A.ACC_ID, 1, 4) || '1230','YYYY/MM/DD')
GROUP BY A.ACC_ID, B.PR_ID, B.DESC
ORDER BY A.ACC_ID, B.PR_ID, B.DESC) B
ON A.ACC_ID = B.ACC_ID
WHERE A.ACC_ID IN (8888888888)
GROUP BY A.ACC_ID,
COALESCE (B.PR_ID, NULL),
COALESCE (B.DESC, NULL),
COALESCE (B.CNT, 0)
The above query with parallel hints taking very long time to execute where as same query executed in secs when I removed the parallel hints. This issue observed when oracle is upgraded to 19c. Could you please help me to understand what's happening here and do I need to perform any settings to enable parallel hints as part of DB upgrade activity.

Parallelism is tricky and there are many reasons why the degree of parallelism can be unexpected. Below are potential issues with your query, some of which others mentioned in comments:
Do you even want parallelism for such a fast query? Parallelism works harder, not smarter, so you generally only want to use it on large objects that take a long time to process. If the query runs in seconds without parallelism, it's probably not a good candidate for parallelism in the first place.
Statement level hints The /*+ PARALLEL */ hint at the top of the query applies to the whole statement, so you don't need to also add hints to other query blocks.
Incorrect hint format If you want to specify the degree of parallelism, you need to use parentheses around the number, like /*+ PARALLEL(8) */. Without the parentheses, Oracle will compute the degree of parallelism. I can't even remember all of the rules and details, but if you ask Oracle to generate the DOP for you, it will often times use CPU_COUNT * PARALLEL_THREADS_PER_CPU * NUMBER_OF_INSTANCES.
CPU_COUNT Since you've just upgraded, check if the CPU_COUNT was changed. On some platforms, such as Solaris, Oracle counts every virtual processor as a CPU, and then still multiplies it by PARALLEL_THREADS_PER_CPU, which can lead to ridiculous numbers. Although that parameter seems like something you wouldn't touch, I've shrunk the CPU_COUNT on many systems with good results. Run the below query to look at the most relevant parallel parameters.
select *
from v$parameter
where name in ('cpu_count', 'parallel_threads_per_cpu', 'parallel_degree_policy', 'parallel_degree_limit');
Find the Degree of Parallelism There are several ways to find the DOP and to find information on why the DOP was chosen. In 19c, the "Note" and "Hint Report" section of the explain plan can give detailed information on how the DOP was calculated. Use explain plan for select ... and then select * from table(dbms_xplan.display); to find that information. As astentx suggested, select dbms_sqltune.report_sql_monitor('&SQL_ID') from dual can also be a great help with long-running queries.

Instead of providing hint to the entire query - You can specify to a driving table.
/*+ parallel(A,8) */
and then produce a explain plan and see the DOP was in place or not.

Related

When do we use WITH clause, and what are main benefits of it?

I was working on task about optimization queries. One of the improvement ways was using WITH clause. I notice that it did very good job, and it lead to shorter time of execution, but i am not sure now, when should I use WITH clause and is there any risk of using it?
Here is one of the queries that I am working on :
WITH MY_TABLE AS
( SELECT PROD_KY,
sum(GROUPISPRIVATE) AS ISPRIVATE,
sum(GROUPISSHARED) AS ISSHARED
FROM
(
SELECT GRP_PROD_CUSTOMER.PROD_KY,
1 as ISPRIVATE,
0 as ISSHARED
FROM CUSTOMER
JOIN GRP_CUSTOMER ON GRP_CUSTOMER.CUST_KY = CUSTOMER.CUST_KY
JOIN GRP_PROD_CUSTOMER ON GRP_PROD_CUSTOMER.GRP_KY = GRP_CUSTOMER.GRP_KY
GROUP BY GRP_PROD_CUSTOMER.PROD_KY
)
GROUP BY PROD_KY
)
SELECT * FROM MY_TABLE;
is there any risk of using it?
Yes. Oracle may decide to materialize the subquery, which means writing its result set to disk and then reading it back (except it might not mean that in 12cR2 or later). That unexpected I/O could be a performance hit. Not always, and usually we can trust the optimizer to make the correct choice. However, Oracle has provided us with hints to tell the optimizer how to handle the result set: /*+ materialize */ to um materialize it and /*+ inline */ to keep it in memory.
I start with this potential downside because I think it's important to understand that the WITH clause is not a silver bullet and it won't improve every single query, and may even degrade performance. For instance I share the scepticism of the other commenters that the query you posted is in any way faster because you re-wrote it as a common table expression.
Generally, the use cases for the WITH clause are:
We want to use the result set from the subquery multiple times
with cte as
( select blah from meh )
select *
from t1
join t2 on t1.id = t2.id
where t1.col1 in ( select blah from cte )
and t2.col2 not in ( select blah from cte)
We want to be build a cascade of subqueries:
with cte as
( select id, blah from meh )
, cte2 as
( select t2.*, cte.blah
from cte
join t2 on t2.id = cte.id)
, cte3 as
( select t3.*, cte2.*
from cte2
join t3 on t3.col2 = cte2.something )
….
This second approach is beguiling and can be useful for implementing complex business logic in pure SQL. But it can lead to a procedural mindset and lose the power sets and joins. This too is a risk.
We want to use recursive WITH clause. This allows us to replace Oracle's own CONNECT BY syntax with a more standard approach. Find out more
In 12c and later we can write user-defined functions in the WITH clause. This is a powerful feature, especially for users who need to implement some logic in PL/SQL but only have SELECT access to the database. Find out more
For the record I have seen some very successful and highly performative uses of the second type of WITH clause. However I have also seen uses of WITH when it would have been just as easy to write an inline view. For instance, this is just using the WITH clause as syntactic sugar ...
with cte as
( select id, blah from meh )
select t2.*, cte.blah
from t2
join cte on cte.id = t2.id
… and would be clearer as ...
select t2.*, cte.blah
from t2
join ( select id, blah from meh ) cte on cte.id = t2.id
WITH clause is introduced in oracle to match SQL-99 standard.
The main purpose is to reduce the complexity and repetitive code.
Lets say you need to find the average salary of one department and then need to fetch all the department(d1) with more than average salary of that department(d1).
This can make multiple references to the subquery more efficient and readable.
The MATERIALIZE and INLINE optimizer hints can be used to influence the decision. The undocumented MATERIALIZE hint tells the optimizer to resolve the subquery as a global temporary table, while the INLINE hint tells it to process the query inline. Decision to use the hint is purely depends on logic that we are going to implement in query.
In oracle 12c, declaration of PL/SQL Block in WITH clause is introduced.
You must refer it from oracle documents.
Cheers!!
Your query is rather useless in terms of WITH statement (aka Common Table Expression, CTE)
Anyway, using the WITH clause brings several benefits:
The query is better readable (in my opinion)
You can use the same subquery several times in the main query. You can even cascade them.
Oracle can materialize the subquery, i.e. Oracle may create a temporary table and stores result of the subquery in it. This can give better performance.
The WITH clause may be processed as an inline view or resolved as a temporary table. The SQL WITH clause is very similar to the use of Global temporary tables. This technique is often used to improve query speed for complex subqueries and enables the Oracle optimizer to push the necessary predicates into the views.
The advantage of the latter is that repeated references to the subquery may be more efficient as the data is easily retrieved from the temporary table, rather than being requeried by each reference. You should assess the performance implications of the WITH clause on a case-by-case basis.
You can read more here:
http://www.dba-oracle.com/t_with_clause.htm
https://oracle-base.com/articles/misc/with-clause
one point to consider is, that different RDBMS handle the with clause - aka common table expressions (CTE) aka subquery factoring - differently:
Oracle may use a materialization or an inlining (as already explained in the answer provided by APC)
postgres always uses a materialization in releases up to 11 (so here a CTE is an optimization fence). In postgres 12 the behaviour changes and is similar to Oracles approach: https://info.crunchydata.com/blog/with-queries-present-future-common-table-expressions. You even have something that almost looks like a hint (though it is known that postgres does not use hints...)
in SQL Server currently a CTE is always inlined, as explained in https://erikdarlingdata.com/2019/08/what-would-materialized-ctes-look-like-in-sql-server/
So depending on the RDBMS you use and its version your mileage may vary.

Difference between count (*) and count (1) with join [duplicate]

Just wondering if any of you people use Count(1) over Count(*) and if there is a noticeable difference in performance or if this is just a legacy habit that has been brought forward from days gone past?
The specific database is SQL Server 2005.
There is no difference.
Reason:
Books on-line says "COUNT ( { [ [ ALL | DISTINCT ] expression ] | * } )"
"1" is a non-null expression: so it's the same as COUNT(*).
The optimizer recognizes it for what it is: trivial.
The same as EXISTS (SELECT * ... or EXISTS (SELECT 1 ...
Example:
SELECT COUNT(1) FROM dbo.tab800krows
SELECT COUNT(1),FKID FROM dbo.tab800krows GROUP BY FKID
SELECT COUNT(*) FROM dbo.tab800krows
SELECT COUNT(*),FKID FROM dbo.tab800krows GROUP BY FKID
Same IO, same plan, the works
Edit, Aug 2011
Similar question on DBA.SE.
Edit, Dec 2011
COUNT(*) is mentioned specifically in ANSI-92 (look for "Scalar expressions 125")
Case:
a) If COUNT(*) is specified, then the result is the cardinality of T.
That is, the ANSI standard recognizes it as bleeding obvious what you mean. COUNT(1) has been optimized out by RDBMS vendors because of this superstition. Otherwise it would be evaluated as per ANSI
b) Otherwise, let TX be the single-column table that is the
result of applying the <value expression> to each row of T
and eliminating null values. If one or more null values are
eliminated, then a completion condition is raised: warning-
In SQL Server, these statements yield the same plans.
Contrary to the popular opinion, in Oracle they do too.
SYS_GUID() in Oracle is quite computation intensive function.
In my test database, t_even is a table with 1,000,000 rows
This query:
SELECT COUNT(SYS_GUID())
FROM t_even
runs for 48 seconds, since the function needs to evaluate each SYS_GUID() returned to make sure it's not a NULL.
However, this query:
SELECT COUNT(*)
FROM (
SELECT SYS_GUID()
FROM t_even
)
runs for but 2 seconds, since it doen't even try to evaluate SYS_GUID() (despite * being argument to COUNT(*))
I work on the SQL Server team and I can hopefully clarify a few points in this thread (I had not seen it previously, so I am sorry the engineering team has not done so previously).
First, there is no semantic difference between select count(1) from table vs. select count(*) from table. They return the same results in all cases (and it is a bug if not). As noted in the other answers, select count(column) from table is semantically different and does not always return the same results as count(*).
Second, with respect to performance, there are two aspects that would matter in SQL Server (and SQL Azure): compilation-time work and execution-time work. The Compilation time work is a trivially small amount of extra work in the current implementation. There is an expansion of the * to all columns in some cases followed by a reduction back to 1 column being output due to how some of the internal operations work in binding and optimization. I doubt it would show up in any measurable test, and it would likely get lost in the noise of all the other things that happen under the covers (such as auto-stats, xevent sessions, query store overhead, triggers, etc.). It is maybe a few thousand extra CPU instructions. So, count(1) does a tiny bit less work during compilation (which will usually happen once and the plan is cached across multiple subsequent executions). For execution time, assuming the plans are the same there should be no measurable difference. (One of the earlier examples shows a difference - it is most likely due to other factors on the machine if the plan is the same).
As to how the plan can potentially be different. These are extremely unlikely to happen, but it is potentially possible in the architecture of the current optimizer. SQL Server's optimizer works as a search program (think: computer program playing chess searching through various alternatives for different parts of the query and costing out the alternatives to find the cheapest plan in reasonable time). This search has a few limits on how it operates to keep query compilation finishing in reasonable time. For queries beyond the most trivial, there are phases of the search and they deal with tranches of queries based on how costly the optimizer thinks the query is to potentially execute. There are 3 main search phases, and each phase can run more aggressive(expensive) heuristics trying to find a cheaper plan than any prior solution. Ultimately, there is a decision process at the end of each phase that tries to determine whether it should return the plan it found so far or should it keep searching. This process uses the total time taken so far vs. the estimated cost of the best plan found so far. So, on different machines with different speeds of CPUs it is possible (albeit rare) to get different plans due to timing out in an earlier phase with a plan vs. continuing into the next search phase. There are also a few similar scenarios related to timing out of the last phase and potentially running out of memory on very, very expensive queries that consume all the memory on the machine (not usually a problem on 64-bit but it was a larger concern back on 32-bit servers). Ultimately, if you get a different plan the performance at runtime would differ. I don't think it is remotely likely that the difference in compilation time would EVER lead to any of these conditions happening.
Net-net: Please use whichever of the two you want as none of this matters in any practical form. (There are far, far larger factors that impact performance in SQL beyond this topic, honestly).
I hope this helps. I did write a book chapter about how the optimizer works but I don't know if its appropriate to post it here (as I get tiny royalties from it still I believe). So, instead of posting that I'll post a link to a talk I gave at SQLBits in the UK about how the optimizer works at a high level so you can see the different main phases of the search in a bit more detail if you want to learn about that. Here's the video link: https://sqlbits.com/Sessions/Event6/inside_the_sql_server_query_optimizer
Clearly, COUNT(*) and COUNT(1) will always return the same result. Therefore, if one were slower than the other it would effectively be due to an optimiser bug. Since both forms are used very frequently in queries, it would make no sense for a DBMS to allow such a bug to remain unfixed. Hence you will find that the performance of both forms is (probably) identical in all major SQL DBMSs.
In the SQL-92 Standard, COUNT(*) specifically means "the cardinality of the table expression" (could be a base table, `VIEW, derived table, CTE, etc).
I guess the idea was that COUNT(*) is easy to parse. Using any other expression requires the parser to ensure it doesn't reference any columns (COUNT('a') where a is a literal and COUNT(a) where a is a column can yield different results).
In the same vein, COUNT(*) can be easily picked out by a human coder familiar with the SQL Standards, a useful skill when working with more than one vendor's SQL offering.
Also, in the special case SELECT COUNT(*) FROM MyPersistedTable;, the thinking is the DBMS is likely to hold statistics for the cardinality of the table.
Therefore, because COUNT(1) and COUNT(*) are semantically equivalent, I use COUNT(*).
COUNT(*) and COUNT(1) are same in case of result and performance.
I would expect the optimiser to ensure there is no real difference outside weird edge cases.
As with anything, the only real way to tell is to measure your specific cases.
That said, I've always used COUNT(*).
As this question comes up again and again, here is one more answer. I hope to add something for beginners wondering about "best practice" here.
SELECT COUNT(*) FROM something counts records which is an easy task.
SELECT COUNT(1) FROM something retrieves a 1 per record and than counts the 1s that are not null, which is essentially counting records, only more complicated.
Having said this: Good dbms notice that the second statement will result in the same count as the first statement and re-interprete it accordingly, as not to do unnecessary work. So usually both statements will result in the same execution plan and take the same amount of time.
However from the point of readability you should use the first statement. You want to count records, so count records, not expressions. Use COUNT(expression) only when you want to count non-null occurences of something.
I ran a quick test on SQL Server 2012 on an 8 GB RAM hyper-v box. You can see the results for yourself. I was not running any other windowed application apart from SQL Server Management Studio while running these tests.
My table schema:
CREATE TABLE [dbo].[employee](
[Id] [bigint] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](50) NOT NULL,
CONSTRAINT [PK_employee] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
Total number of records in Employee table: 178090131 (~ 178 million rows)
First Query:
Set Statistics Time On
Go
Select Count(*) From Employee
Go
Set Statistics Time Off
Go
Result of First Query:
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 35 ms.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 10766 ms, elapsed time = 70265 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
Second Query:
Set Statistics Time On
Go
Select Count(1) From Employee
Go
Set Statistics Time Off
Go
Result of Second Query:
SQL Server parse and compile time:
CPU time = 14 ms, elapsed time = 14 ms.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 11031 ms, elapsed time = 70182 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
You can notice there is a difference of 83 (= 70265 - 70182) milliseconds which can easily be attributed to exact system condition at the time queries are run. Also I did a single run, so this difference will become more accurate if I do several runs and do some averaging. If for such a huge data-set the difference is coming less than 100 milliseconds, then we can easily conclude that the two queries do not have any performance difference exhibited by the SQL Server Engine.
Note : RAM hits close to 100% usage in both the runs. I restarted SQL Server service before starting both the runs.
SET STATISTICS TIME ON
select count(1) from MyTable (nolock) -- table containing 1 million records.
SQL Server Execution Times:
CPU time = 31 ms, elapsed time = 36 ms.
select count(*) from MyTable (nolock) -- table containing 1 million records.
SQL Server Execution Times:
CPU time = 46 ms, elapsed time = 37 ms.
I've ran this hundreds of times, clearing cache every time.. The results vary from time to time as server load varies, but almost always count(*) has higher cpu time.
There is an article showing that the COUNT(1) on Oracle is just an alias to COUNT(*), with a proof about that.
I will quote some parts:
There is a part of the database software that is called “The
Optimizer”, which is defined in the official documentation as
“Built-in database software that determines the most efficient way to
execute a SQL statement“.
One of the components of the optimizer is called “the transformer”,
whose role is to determine whether it is advantageous to rewrite the
original SQL statement into a semantically equivalent SQL statement
that could be more efficient.
Would you like to see what the optimizer does when you write a query
using COUNT(1)?
With a user with ALTER SESSION privilege, you can put a tracefile_identifier, enable the optimizer tracing and run the COUNT(1) select, like: SELECT /* test-1 */ COUNT(1) FROM employees;.
After that, you need to localize the trace files, what can be done with SELECT VALUE FROM V$DIAG_INFO WHERE NAME = 'Diag Trace';. Later on the file, you will find:
SELECT COUNT(*) “COUNT(1)” FROM “COURSE”.”EMPLOYEES” “EMPLOYEES”
As you can see, it's just an alias for COUNT(*).
Another important comment: the COUNT(*) was really faster two decades ago on Oracle, before Oracle 7.3:
Count(1) has been rewritten in count(*) since 7.3 because Oracle like
to Auto-tune mythic statements. In earlier Oracle7, oracle had to
evaluate (1) for each row, as a function, before DETERMINISTIC and
NON-DETERMINISTIC exist.
So two decades ago, count(*) was faster
For another databases as Sql Server, it should be researched individually for each one.
I know that this question is specific for SQL Server, but the other questions on SO about the same subject (without mention a specific database) were closed and marked as duplicated from this answer.
In all RDBMS, the two ways of counting are equivalent in terms of what result they produce. Regarding performance, I have not observed any performance difference in SQL Server, but it may be worth pointing out that some RDBMS, e.g. PostgreSQL 11, have less optimal implementations for COUNT(1) as they check for the argument expression's nullability as can be seen in this post.
I've found a 10% performance difference for 1M rows when running:
-- Faster
SELECT COUNT(*) FROM t;
-- 10% slower
SELECT COUNT(1) FROM t;
COUNT(1) is not substantially different from COUNT(*), if at all. As to the question of COUNTing NULLable COLUMNs, this can be straightforward to demo the differences between COUNT(*) and COUNT(<some col>)--
USE tempdb;
GO
IF OBJECT_ID( N'dbo.Blitzen', N'U') IS NOT NULL DROP TABLE dbo.Blitzen;
GO
CREATE TABLE dbo.Blitzen (ID INT NULL, Somelala CHAR(1) NULL);
INSERT dbo.Blitzen SELECT 1, 'A';
INSERT dbo.Blitzen SELECT NULL, NULL;
INSERT dbo.Blitzen SELECT NULL, 'A';
INSERT dbo.Blitzen SELECT 1, NULL;
SELECT COUNT(*), COUNT(1), COUNT(ID), COUNT(Somelala) FROM dbo.Blitzen;
GO
DROP TABLE dbo.Blitzen;
GO

Optimize Oracle 11g Procedure

I have a procedure to find the first, last, max and min prices for a series of transactions in a very large table which is organized by date, object name, and a code. I also need the sum of quantities transacted. There are about 3 billion rows in the table and this procedure takes many days to run. I would like to cut that time down as much as possible. I have an index on the distinct fields in the trans table, and looking at the explain plan on the select portion of the queries, the index is being used. I am open to suggestions on an alternate approach. I use Oracle 11g R2. Thank you.
declare
cursor c_iter is select distinct dt, obj, cd from trans;
r_iter c_iter%ROWTYPE;
v_fir number(15,8);
v_las number(15,8);
v_max number(15,8);
v_min number(15,8);
v_tot number;
begin
open c_iter;
loop
fetch c_iter into r_iter;
exit when c_iter%NOTFOUND;
select max(fir), max(las) into v_fir, v_las
from
( select
first_value(prc) over (order by seq) as "FIR",
first_value(prc) over (order by seq desc) as "LAS"
from trans
where dt = r_iter.DT and obj = r_iter.OBJ and cd = r_iter.CD );
select max(prc), min(prc), sum(qty) into v_max, v_min, v_tot
from trans
where dt = r_iter.DT and obj = r_iter.OBJ and cd = r_iter.CD;
insert into stats (obj, dt, cd, fir, las, max, min, tot )
values (r_iter.OBJ, r_iter.DT, r_iter.CD, v_fir, v_las, v_max, v_min, v_tot);
commit;
end loop;
close c_iter;
end;
alter session enable parallel dml;
insert /*+ append parallel(stats)*/
into stats(obj, dt, cd, fir, las, max, min, tot)
select /*+ parallel(trans) */ obj, dt, cd
,max(prc) keep (dense_rank first order by seq) fir
,max(prc) keep (dense_rank first order by seq desc) las
,max(prc) max, min(prc) min, sum(qty) tot
from trans
group by obj, dt, cd;
commit;
A single SQL statement is usually significantly faster than multiple SQL statements. They sometimes require more resources, like more temporary tablespace, but your distinct cursor is probably already sorting the entire table on disk anyway.
You may want to also enable parallel DML and parallel query, although depending on your object and system settings this may already be happening. (And it may not necessarily be a good thing, depending on your resources, but it usually helps large queries.)
Parallel write and APPEND should improve performance if the SQL writes a lot of data, but it also means that the new table will not be recoverable until the next backup. (Parallel DML will automatically use direct path writes, but I usually include APPEND anyway just in case the parallelism doesn't work correctly.)
There's a lot to consider, even for such a small query, but this is where I'd start.
Not the solid answer I'd like to give, but a few things to consider:
The first would be using a bulk collect. However, since you're using 11g, hopefully this is already being done for you automatically.
Do you really need to commit after every single iteration? I could be wrong, but am guessing this is one of your top time consumers.
Finally, +1 for jonearles' answer. (I wasn't sure if I'd be able to write everything into a single SQL query, but I was going to suggest this as well.)
You could try and make the query run in parallel, there is a reasonable Oracle White Paper on this here. This isn't an Oracle feature that I've ever had to use myself so I've no first hand experience of it to pass on. You will also need to have enough resources free on the Oracle server to allow you run the parallel processes that this will create.

oracle : how to ensure that a function in the where clause will be called only after all the remaining where clauses have filtered the result?

I am writing a query to this effect:
select *
from players
where player_name like '%K%
and player_rank<10
and check_if_player_is_eligible(player_name) > 1;
Now, the function check_if_player_is_eligible() is heavy and, therefore, I want the query to filter the search results sufficiently and then only run this function on the filtered results.
How can I ensure that the all filtering happens before the function is executed, so that it runs the minimum number of times ?
Here's two methods where you can trick Oracle into not evaluating your function before all the other WHERE clauses have been evaluated:
Using rownum
Using the pseudo-column rownum in a subquery will force Oracle to "materialize" the subquery. See for example this askTom thread for examples.
SELECT *
FROM (SELECT *
FROM players
WHERE player_name LIKE '%K%'
AND player_rank < 10
AND ROWNUM >= 1)
WHERE check_if_player_is_eligible(player_name) > 1
Here's the documentation reference "Unnesting of Nested Subqueries":
The optimizer can unnest most subqueries, with some exceptions. Those exceptions include hierarchical subqueries and subqueries that contain a ROWNUM pseudocolumn, one of the set operators, a nested aggregate function, or a correlated reference to a query block that is not the immediate outer query block of the subquery.
Using CASE
Using CASE you can force Oracle to only evaluate your function when the other conditions are evaluated to TRUE. Unfortunately it involves duplicating code if you want to make use of the other clauses to use indexes as in:
SELECT *
FROM players
WHERE player_name LIKE '%K%'
AND player_rank < 10
AND CASE
WHEN player_name LIKE '%K%'
AND player_rank < 10
THEN check_if_player_is_eligible(player_name)
END > 1
There is the NO_PUSH_PRED hint to do it without involving rownum evaluation (that is a good trick anyway) in the process!
SELECT /*+NO_PUSH_PRED(v)*/*
FROM (
SELECT *
FROM players
WHERE player_name LIKE '%K%'
AND player_rank < 10
) v
WHERE check_if_player_is_eligible(player_name) > 1
You usually want to avoid forcing a specific order of execution. If the data or the query changes, your hints and tricks may backfire. It's usually better to provide useful metadata to Oracle so it can make the correct decisions for you.
In this case, you can provide better optimizer statistics about the function with ASSOCIATE STATISTICS.
For example, if your function is very slow because it has to read 50 blocks each time it is called:
associate statistics with functions
check_if_player_is_eligible default cost(1000 /*cpu*/, 50 /*IO*/, 0 /*network*/);
By default Oracle assumes that a function will select a row 1/20th of the time. Oracle wants to eliminate as many rows as soon
as possible, changing the selectivity should make the function less likely to be executed first:
associate statistics with functions
check_if_player_is_eligible default selectivity 90;
But this raises some other issues. You have to pick a selectivity for ALL possible conditions, 90% certainly won't always be accurate. The IO cost is the number of blocks fetched, but CPU cost is "machine instructions used", what exactly does that mean?
There are more advanced ways to customize statistics,for example using the Oracle Data Cartridge Extensible Optimizer. But data cartridge is probably one of the most difficult Oracle features.
You did't specify whether player.player_name is unique or not. One could assume that it is and then the database has to call the function at least once per result record.
But, if player.player_name is not unique, you would want to minimize the calls down to count(distinct player.player_name) times. As (Ask)Tom shows in Oracle Magazine, the scalar subquery cache is an efficient way to do this.
You would have to wrap your function call into a subselect in order to make use of the scalar subquery cache:
SELECT players.*
FROM players,
(select check_if_player_is_eligible(player.player_name) eligible) subq
WHERE player_name LIKE '%K%'
AND player_rank < 10
AND ROWNUM >= 1
AND subq.eligible = 1
Put the original query in a derived table then place the additional predicate in the where clause of the derived table.
select *
from (
select *
from players
where player_name like '%K%
and player_rank<10
) derived_tab1
Where check_if_player_is_eligible(player_name) > 1;

Hibernate Index Query Slow

My question is similar to the one posed in this thread:
How to avoid this very heavy query that slows down the application?
We checked for missing indexes on foreign keys and found some. Adding the missing indexes actually had the opposite effect in that it slowed the query even more. One important piece of information is that our customer has a single Oracle install with our schema replicated on it 21 times. Each schema has just shy of 1,000 tables in it. Are we asking too much of Oracle with such a large number of tables (and of course indexes)? I don't know what their hardware is but my question is whether this is a reasonable approach or would it be be better to break up the users to different SIDs?
Below is the query that is being executed by Hibernate. The customer is telling us that this query is consuming about 45% of the processor when it is being executed (though I don't know for how long).
Any suggestions are appreciated,
Steve
SELECT NULL AS table_cat,
owner AS table_schem,
table_name,
0 AS non_unique,
NULL AS index_qualifier,
NULL AS index_name,
0 AS TYPE,
0 AS ordinal_position,
NULL AS column_name,
NULL AS asc_or_desc,
num_rows AS CARDINALITY,
blocks AS pages,
NULL AS filter_condition
FROM all_tables
WHERE table_name = 'BOOKING'
AND owner = 'FORWARD_TN'
UNION
SELECT NULL AS table_cat,
i.owner AS table_schem,
i.table_name,
DECODE (i.uniqueness, 'UNIQUE', 0, 1),
NULL AS index_qualifier,
i.index_name,
1 AS TYPE,
c.column_position AS ordinal_position,
c.column_name,
NULL AS asc_or_desc,
i.distinct_keys AS CARDINALITY,
i.leaf_blocks AS pages,
NULL AS filter_condition
FROM all_indexes i,
all_ind_columns c
WHERE i.table_name = 'BOOKING'
AND i.owner = 'FORWARD_TN'
AND i.index_name = c.index_name
AND i.table_owner = c.table_owner
AND i.table_name = c.table_name
AND i.owner = c.index_owner
ORDER BY non_unique,
TYPE,
index_name,
ordinal_position
You're not hitting any kind of capacity issue with 1,000 tables. That's still relatively small in the Oracle world. Just doing a quick check of our E-Business Suite install and it has 23,000 tables. A query using up a ton of CPU is almost always an execution plan problem. Some things to look at
Have you collected optimizer statistics? Without them the optimizer may be making a really poor decision on how to execute the query.
The next step is to look the execution plan itself. If you have the enterprise manager running, it probably has that query right up on the front page for consuming resources. You can just click on it and see what it's doing. Without that you have to use sql_trace or explain plan to see what's happening.
You could be easily hit by excessive parse time on the Oracle server if all of your statements use slightly different, literal SQL (i.e. no bind variables) for each of the 22.000 tables. If this is the case, just switch to bind variables and the CPU consumption should lower substantially.
Can your customer tell in what Oracle function the CPU time is spent? Oracle offers the neccessary statistics. As the CPU consumption is reported to be a significant part of the whole instance's consumption, your customer could run a statspack report (or ASH if he has licensed the diagnostic pack). That should show where exactly the CPU time is being spent.
Our customer reviewed their Oracle configuration and changed the configuration to the following values:
optimizer_index_caching=90
optimizer_index_cost_adj=15
optimizer_mode='CHOOSE'
They report that this seems to have fixed their speed issue.

Resources