Oracle EXECUTE IMMEDIATE changes explain plan of query

I have a stored procedure that I am calling with EXECUTE IMMEDIATE. The issue I am facing is that the explain plan is different when I call the procedure directly versus when I call it through EXECUTE IMMEDIATE, and this increases the execution time about 5x. The main difference between the plans is that with EXECUTE IMMEDIATE the optimizer does not unnest the subquery (a NOT EXISTS condition). We use the Rule Based Optimizer here at work for most queries, but this one has a hint to use an index, so the CBO is being used (we do not, however, collect statistics on tables). We are running Oracle9i Enterprise Edition Release 9.2.0.4.0 - 64bit Production.
Example:
Fast:
begin
package.procedure;
end;
/
Slow:
begin
execute immediate 'begin package.' || proc_name || '; end;';
end;
/
Query:
SELECT /*+ INDEX(A IDX_A_1) */
a.store_cd,
b.itm_cd itm_cd,
CEIL ( (dow.new_date - a.dt) / 7) week_num,
SUM (a.qty * b.demand_weighting * b.CONVERT) qty
FROM a
INNER JOIN
b
ON (a.itm_cd = b.old_itm_cd)
INNER JOIN
(SELECT g.store_grp_cd, g.store_cd
FROM g, h
WHERE g.store_grp_cd = h.fdo_cd AND h.fdo_type = '1') d
ON (a.store_cd = d.store_cd AND b.store_grp_cd = d.store_grp_cd)
CROSS JOIN
dow
WHERE a.dt BETWEEN dow.new_date - 91 AND dow.new_date - 1
AND a.sls_wr_cd = 'W'
AND b.demand_type = 'S'
AND b.old_itm_cd IS NOT NULL
AND NOT EXISTS
(SELECT
NULL
FROM f
WHERE f.store_grp_cd = a.store_cd
AND b.old_itm_cd = f.old_itm_cd)
GROUP BY a.store_cd, b.itm_cd, CEIL ( (dow.new_date - a.dt) / 7)
Good Explain Plan:
OPERATION OPTIONS OBJECT_NAME OBJECT_TYPE ID PARENT_ID
SELECT STATEMENT 0
SORT GROUP BY 1 0
NESTED LOOPS 2 1
HASH JOIN ANTI 3 2
TABLE ACCESS BY INDEX ROWID H 4 3
NESTED LOOPS 5 4
NESTED LOOPS 6 5
NESTED LOOPS 7 6
TABLE ACCESS FULL B 8 7
TABLE ACCESS BY INDEX ROWID A 9 7
INDEX RANGE SCAN IDX_A_1 UNIQUE 10 9
INDEX UNIQUE SCAN G UNIQUE 11 6
INDEX RANGE SCAN H_UK UNIQUE 12 5
TABLE ACCESS FULL F 13 3
TABLE ACCESS FULL DOW 14 2
Bad Explain Plan:
OPERATION OPTIONS OBJECT_NAME OBJECT_TYPE ID PARENT_ID
SELECT STATEMENT 0
SORT GROUP BY 1 0
NESTED LOOPS 2 1
NESTED LOOPS 3 2
NESTED LOOPS 4 3
NESTED LOOPS 5 4
TABLE ACCESS FULL B 6 5
TABLE ACCESS BY INDEX ROWID A 7 5
INDEX RANGE SCAN IDX_A_1 UNIQUE 8 7
TABLE ACCESS FULL F 9 8
INDEX UNIQUE SCAN G UNIQUE 10 4
TABLE ACCESS BY INDEX ROWID H 11 3
INDEX RANGE SCAN H_UK UNIQUE 12 11
TABLE ACCESS FULL DOW 13 2
In the bad explain plan the subquery is not being unnested. I was able to reproduce the bad plan by adding a NO_UNNEST hint to the subquery; however, I couldn't reproduce the good plan using the UNNEST hint (when running the procedure through EXECUTE IMMEDIATE). Other hints are honored by the optimizer under EXECUTE IMMEDIATE; just not the UNNEST hint.
This issue only occurs when I use execute immediate to call the procedure. If I use execute immediate on the query itself it uses the good plan.

You've used ANSI join syntax, which will force the use of the CBO
(see http://jonathanlewis.wordpress.com/2008/03/20/ansi-sql/)
"Once you’re running cost-based with no statistics, there are all sorts of little things that might be enough to cause unexpected behaviour in execution plan."

There are a few steps you can take. The first is a 10046 trace.
Ideally I would start a trace on a single session that executes both the 'good' and 'bad' queries. The trace file should contain both queries with a hard parse. I'd be interested in WHY the second has a hard parse as, if it has the same SQL structure and same parsing user, there's not a lot of reason for the second hard parse. Using the same session should mean there are no oddities from different memory settings etc.
The SQL doesn't show any use of variables, so there should be no datatype issues. All columns are 'tied' to a table alias, so there seems no scope for confusing variables with columns.
The more extreme step is a 10053 trace. There's a viewer posted on Jonathan Lewis' site. That can allow you to get into the guts of the optimization to try to work out the reason for the differing plans.
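For reference, a minimal sketch of enabling both traces in a single session (the event syntax is standard; directory setup and tkprof formatting are omitted):
-- 10046: extended SQL trace; level 12 adds bind values and wait events
ALTER SESSION SET EVENTS '10046 trace name context forever, level 12';
-- run the 'good' call and the 'bad' call here, in this same session
-- 10053: optimizer trace, written only when a statement hard parses
ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';
-- ... re-run the statement so it hard parses ...
ALTER SESSION SET EVENTS '10053 trace name context off';
ALTER SESSION SET EVENTS '10046 trace name context off';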
In the wider view, 9i is pretty much dead and the RBO is pretty much dead. I'd be seriously evaluating a project to move the app to the CBO. There are features that force the CBO to be used, and without stats this kind of problem will keep cropping up.

It turns out that this is a known bug in Oracle 9i. Below is the text from a bug report.
Execute Immediate Gives Bad Query Plan [ID 398605.1]
Modified 09-NOV-2006 Type PROBLEM Status MODERATED
This document is being delivered to you via Oracle Support's Rapid Visibility (RaV) process, and therefore has not been subject to an independent technical review.
Applies to:
Oracle Server - Enterprise Edition - Version: 9.2.0.6
This problem can occur on any platform.
Symptoms
When a procedure is run through EXECUTE IMMEDIATE, the plan produced is different than when the procedure is run directly.
Cause
The cause of this problem has been identified and verified in an unpublished Bug 2906307.
It is caused by the fact that SQL statements issued from PL/SQL at a recursive depth greater than 1 may get different execution plans to those issued directly from SQL. There are multiple optimizer features affected by this bug (for example _unnest_subquery, _pred_move_around=true). Hints related to the features may also be ignored.
This bug covers the same basic issue as Bug 2871645 "Complex view merging does not occur for recursive SQL > depth 1", but for features other than complex view merging. Bug 2906307 is closed as a duplicate of Bug 3182582 "SQL STATEMENT RUN SLOWER IN DBMS_JOB THAN IN SQL*PLUS". It is fixed in 10.2.
Solution
For insert statements use hint BYPASS_RECURSIVE_CHECK:
INSERT /*+ BYPASS_RECURSIVE_CHECK */ INTO table
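For illustration, a hedged sketch of how that hint might be applied from PL/SQL in a situation like the question's (the table names are hypothetical; per the note above, the hint is only documented for INSERT statements):
begin
  execute immediate
    'INSERT /*+ BYPASS_RECURSIVE_CHECK */ INTO result_table
     SELECT store_cd, itm_cd FROM staging_table';
end;
/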
References
BUG:2871645 - COMPLEX VIEW MERGING DOES NOT OCCUR FOR RECURSIVE SQL > DEPTH 1
BUG:3182582 - SQL STATEMENT RUN SLOWER IN DBMS_JOB THAN IN SQL*PLUS


Related

SP using table/index with volatile statistics that differ at compile and run time

I’m a longtime MSSQL developer who finds himself back in PL/SQL for the first time since Oracle 7. I’m looking for some tuning advice regarding a large export stored procedure, which is sporadically and not very reproducibly running slow at certain points. This happens around some static working tables which it truncates, fills and uses as part of the export. The code in outline typically looks like this:
create or replace procedure BigMultiPurposeExport as
begin
-- about 2000 lines of other code
INSERT INTO WORK_TABLE_5 SELECT WHATEVER1 FROM WHEREVER1;
INSERT INTO WORK_TABLE_5 SELECT WHATEVER2 FROM WHEREVER2;
INSERT INTO WORK_TABLE_5 SELECT WHATEVER3 FROM WHEREVER3;
INSERT INTO WORK_TABLE_5 SELECT WHATEVER4 FROM WHEREVER4;
-- WORK_TABLE_5 now has 0 to ~500k rows whose content can vary drastically from run to run
-- e.g. one hourly run exports 3 whale sightings, next exports all tourist visits to Kenya this decade
-- about 1000 lines of other code
INSERT INTO OUTPUT_TABLE_3
SELECT THIS, THAT, THE_OTHER
FROM BUSINESS_TABLE_1 BT1
INNER JOIN BUSINESS_TABLE_2 ON etc -- typical join on indexed columns
INNER JOIN BUSINESS_TABLE_3 ON etc -- typical join on indexed columns
INNER JOIN BUSINESS_TABLE_4 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_1 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_2 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_3 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_4 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_5 WT5 ON BT1.ID = WT5.BT1_ID AND WT5.RECORD_TYPE = 21
-- join above is now supported by indexes on BUSINESS_TABLE_1 (ID) and WORK_TABLE_5 (BT1_ID, RECORD_TYPE), originally wasn't
LEFT OUTER JOIN WORK_TABLE_6 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_7 ON etc -- typical join on indexed columns
-- about 4000 lines of other code
end;
That final insert into OUTPUT_TABLE_3 usually runs in under 10 seconds, but once in a while on certain customer servers it times out at our default 99 minutes. Then we have them take the timeout off and run it on Friday night, and it finishes but takes 16 hours.
I narrowed the problem down to the join to WORK_TABLE_5, which had no index support, and put an index on the join terms. The next run took 4 seconds. But success has been intermittent: the customer occasionally gets slow runs when they drastically change their export selection (i.e., drastically change the data in WORK_TABLE_5). And if we update statistics and rebuild indexes after a timed-out export, it runs fine at the next attempt.
So, I am wondering about how best to handle truncating/filling static work tables with static indexes, statistics updated overnight, and a stored procedure compiled when the statistics are nothing like runtime.
I have a few general questions about things I'd like to understand better:
Is the nature of the data in the work table going to substantially affect the query plan? Does Oracle form its query plan when you compile the stored procedure? Could we get a highly inappropriate query plan if we compile the stored procedure with the table empty, then use a table with 500k rows at runtime?
I expect that if this were an ad-hoc script then updating statistics on the problem table just before selecting from it would eliminate the sporadic slowdowns. But what if I were to update statistics inside the stored procedure, which is compiled with different statistics from runtime?
Anything else you'd like to add...
Thanks for any advice. I hope my MSSQL preconceptions haven't led me too far off base.
This is happening in Oracle 11g, but the code is deployed to assorted customers using Oracle 10 through 12 and I'd like to cater to all of those if possible.
-- Joel
Huge differences in table or index sizes can most definitely cause performance problems. The solution is to add statistics gathering to the procedure instead of relying on the default statistics jobs.
If you've been away from Oracle since version 7, the most important new feature is the Cost Based Optimizer. Oracle now builds query execution plans based on the optimizer statistics of tables, indexes, columns, expressions, system statistics, outlines, directives, dynamic sampling, etc. If you're a full time Oracle developer you should probably spend a day reading about optimizer statistics. Start with Managing Optimizer Statistics and DBMS_STATS in the official documentation.
Eventually the stored procedure should look like this:
--1: Insert into working tables.
insert into work_table...
--2: Gather statistics on working tables.
dbms_stats.gather_table_stats('SCHEMA_NAME', 'WORK_TABLE', ...);
--3: Use working tables.
insert into other_table select * from work_table...
There are so many statistics features that it's hard to know exactly which parameters to use in that second step above. Here are some guesses about features you might find useful (a combined sketch follows the list):
DEGREE - One reason people avoid gathering statistics inside a process is the time it takes. You can significantly improve the run time by setting the degree, although this also uses significantly more resources.
NO_INVALIDATE - It can be tricky to know when exactly the statistics are "set" for a query. Gathering statistics usually quickly invalidates execution plans that were based on old statistics, but not always. If you want to be 100% sure that the next query is using the latest statistics, set NO_INVALIDATE=>FALSE.
ESTIMATE_PERCENT - In 11g and above you definitely want to use the default, which uses a faster algorithm. In 10g and below you may need to set the value to something low to make the gathering fast enough.
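Putting those together, a hedged sketch of step 2 above (schema, table and parameter values are illustrative, not prescriptive):
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'SCHEMA_NAME',
    tabname          => 'WORK_TABLE_5',
    degree           => 4,      -- parallel gathering to cut elapsed time
    no_invalidate    => FALSE,  -- make dependent cursors re-parse immediately
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE  -- the 11g+ default
  );
END;
/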
Although Oracle 10g and above come with default statistics gathering jobs, you cannot rely on them, for a few reasons:
They are scheduled and may not run at the right time. If a process significantly changes the data then new stats are needed right away, not at 10 PM. If there are a lot of tables that need to be analyzed the job may not get to them all in one day.
Many DBAs disable the jobs. This is ridiculous and almost always a mistake, but you'll find many DBAs who disabled the job because they think they can do it better. Instead of working with the auto tasks and setting preferences, many DBAs like to throw the whole thing out and replace it with a custom procedure that rots over time.

Difference between count (*) and count (1) with join [duplicate]

Just wondering if any of you use COUNT(1) over COUNT(*), and if there is a noticeable difference in performance, or if this is just a legacy habit brought forward from days gone by?
The specific database is SQL Server 2005.
There is no difference.
Reason:
Books on-line says "COUNT ( { [ [ ALL | DISTINCT ] expression ] | * } )"
"1" is a non-null expression: so it's the same as COUNT(*).
The optimizer recognizes it for what it is: trivial.
The same as EXISTS (SELECT * ... or EXISTS (SELECT 1 ...
Example:
SELECT COUNT(1) FROM dbo.tab800krows
SELECT COUNT(1),FKID FROM dbo.tab800krows GROUP BY FKID
SELECT COUNT(*) FROM dbo.tab800krows
SELECT COUNT(*),FKID FROM dbo.tab800krows GROUP BY FKID
Same IO, same plan, the works
Edit, Aug 2011
Similar question on DBA.SE.
Edit, Dec 2011
COUNT(*) is mentioned specifically in ANSI-92 (look for "Scalar expressions 125")
Case:
a) If COUNT(*) is specified, then the result is the cardinality of T.
That is, the ANSI standard recognizes it as bleeding obvious what you mean. COUNT(1) has been optimized out by RDBMS vendors because of this superstition. Otherwise it would be evaluated as per ANSI:
b) Otherwise, let TX be the single-column table that is the
result of applying the <value expression> to each row of T
and eliminating null values. If one or more null values are
eliminated, then a completion condition is raised: warning-
In SQL Server, these statements yield the same plans.
Contrary to the popular opinion, in Oracle they do too.
SYS_GUID() in Oracle is quite a computation-intensive function.
In my test database, t_even is a table with 1,000,000 rows.
This query:
SELECT COUNT(SYS_GUID())
FROM t_even
runs for 48 seconds, since the function needs to evaluate each SYS_GUID() returned to make sure it's not a NULL.
However, this query:
SELECT COUNT(*)
FROM (
SELECT SYS_GUID()
FROM t_even
)
runs for but 2 seconds, since it doesn't even try to evaluate SYS_GUID() (despite * being the argument to COUNT(*)).
I work on the SQL Server team and I can hopefully clarify a few points in this thread (I had not seen it previously, so I am sorry the engineering team has not done so sooner).
First, there is no semantic difference between select count(1) from table vs. select count(*) from table. They return the same results in all cases (and it is a bug if not). As noted in the other answers, select count(column) from table is semantically different and does not always return the same results as count(*).
Second, with respect to performance, there are two aspects that would matter in SQL Server (and SQL Azure): compilation-time work and execution-time work. The compilation-time work is a trivially small amount of extra work in the current implementation. There is an expansion of the * to all columns in some cases, followed by a reduction back to 1 column being output, due to how some of the internal operations work in binding and optimization. I doubt it would show up in any measurable test, and it would likely get lost in the noise of all the other things that happen under the covers (such as auto-stats, xevent sessions, query store overhead, triggers, etc.). It is maybe a few thousand extra CPU instructions. So, count(1) does a tiny bit less work during compilation (which will usually happen once, with the plan cached across multiple subsequent executions). For execution time, assuming the plans are the same, there should be no measurable difference. (One of the earlier examples shows a difference - it is most likely due to other factors on the machine if the plan is the same.)
As to how the plan can potentially be different: this is extremely unlikely to happen, but it is potentially possible given the architecture of the current optimizer. SQL Server's optimizer works as a search program (think: a computer program playing chess, searching through various alternatives for different parts of the query and costing out the alternatives to find the cheapest plan in reasonable time). This search has a few limits on how it operates to keep query compilation finishing in reasonable time. For queries beyond the most trivial, there are phases of the search, and they deal with tranches of queries based on how costly the optimizer thinks the query is to potentially execute. There are 3 main search phases, and each phase can run more aggressive (expensive) heuristics trying to find a cheaper plan than any prior solution. Ultimately, there is a decision process at the end of each phase that tries to determine whether it should return the plan it found so far or keep searching. This process uses the total time taken so far vs. the estimated cost of the best plan found so far. So, on different machines with different speeds of CPUs it is possible (albeit rare) to get different plans due to timing out in an earlier phase with a plan vs. continuing into the next search phase. There are also a few similar scenarios related to timing out of the last phase and potentially running out of memory on very, very expensive queries that consume all the memory on the machine (not usually a problem on 64-bit, but it was a larger concern back on 32-bit servers). Ultimately, if you get a different plan, the performance at runtime would differ. I don't think it is remotely likely that the difference in compilation time would EVER lead to any of these conditions happening.
Net-net: Please use whichever of the two you want as none of this matters in any practical form. (There are far, far larger factors that impact performance in SQL beyond this topic, honestly).
I hope this helps. I did write a book chapter about how the optimizer works, but I don't know if it's appropriate to post it here (as I believe I still get tiny royalties from it). So, instead of posting that, I'll post a link to a talk I gave at SQLBits in the UK about how the optimizer works at a high level, so you can see the different main phases of the search in a bit more detail if you want to learn about that. Here's the video link: https://sqlbits.com/Sessions/Event6/inside_the_sql_server_query_optimizer
Clearly, COUNT(*) and COUNT(1) will always return the same result. Therefore, if one were slower than the other it would effectively be due to an optimiser bug. Since both forms are used very frequently in queries, it would make no sense for a DBMS to allow such a bug to remain unfixed. Hence you will find that the performance of both forms is (probably) identical in all major SQL DBMSs.
In the SQL-92 Standard, COUNT(*) specifically means "the cardinality of the table expression" (which could be a base table, VIEW, derived table, CTE, etc.).
I guess the idea was that COUNT(*) is easy to parse. Using any other expression requires the parser to ensure it doesn't reference any columns (COUNT('a') where a is a literal and COUNT(a) where a is a column can yield different results).
In the same vein, COUNT(*) can be easily picked out by a human coder familiar with the SQL Standards, a useful skill when working with more than one vendor's SQL offering.
Also, in the special case SELECT COUNT(*) FROM MyPersistedTable;, the thinking is the DBMS is likely to hold statistics for the cardinality of the table.
Therefore, because COUNT(1) and COUNT(*) are semantically equivalent, I use COUNT(*).
COUNT(*) and COUNT(1) are the same in both result and performance.
I would expect the optimiser to ensure there is no real difference outside weird edge cases.
As with anything, the only real way to tell is to measure your specific cases.
That said, I've always used COUNT(*).
As this question comes up again and again, here is one more answer. I hope to add something for beginners wondering about "best practice" here.
SELECT COUNT(*) FROM something counts records, which is an easy task.
SELECT COUNT(1) FROM something retrieves a 1 per record and then counts the 1s that are not null, which is essentially counting records, only more complicated.
Having said this: a good DBMS notices that the second statement will result in the same count as the first and reinterprets it accordingly, so as not to do unnecessary work. So usually both statements will result in the same execution plan and take the same amount of time.
However, from the point of readability you should use the first statement. You want to count records, so count records, not expressions. Use COUNT(expression) only when you want to count non-null occurrences of something.
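A quick illustration of that last point (table and column names are hypothetical):
-- COUNT(*) counts rows; COUNT(column) counts non-null values in that column
SELECT COUNT(*) AS all_orders,             -- every record
       COUNT(ship_date) AS shipped_orders  -- only records with a non-null ship_date
FROM orders;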
I ran a quick test on SQL Server 2012 on an 8 GB RAM hyper-v box. You can see the results for yourself. I was not running any other windowed application apart from SQL Server Management Studio while running these tests.
My table schema:
CREATE TABLE [dbo].[employee](
[Id] [bigint] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](50) NOT NULL,
CONSTRAINT [PK_employee] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
Total number of records in Employee table: 178090131 (~ 178 million rows)
First Query:
Set Statistics Time On
Go
Select Count(*) From Employee
Go
Set Statistics Time Off
Go
Result of First Query:
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 35 ms.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 10766 ms, elapsed time = 70265 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
Second Query:
Set Statistics Time On
Go
Select Count(1) From Employee
Go
Set Statistics Time Off
Go
Result of Second Query:
SQL Server parse and compile time:
CPU time = 14 ms, elapsed time = 14 ms.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 11031 ms, elapsed time = 70182 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
You can see there is a difference of 83 ms (= 70265 - 70182), which can easily be attributed to the exact system conditions at the time the queries were run. I also did only a single run, so the comparison would become more accurate with several runs and some averaging. If for such a huge data set the difference is less than 100 milliseconds, then we can easily conclude that the two queries do not exhibit any performance difference in the SQL Server engine.
Note: RAM usage was close to 100% in both runs. I restarted the SQL Server service before starting each run.
SET STATISTICS TIME ON
select count(1) from MyTable (nolock) -- table containing 1 million records.
SQL Server Execution Times:
CPU time = 31 ms, elapsed time = 36 ms.
select count(*) from MyTable (nolock) -- table containing 1 million records.
SQL Server Execution Times:
CPU time = 46 ms, elapsed time = 37 ms.
I've run this hundreds of times, clearing the cache every time. The results vary from time to time as server load varies, but almost always COUNT(*) has a higher CPU time.
There is an article showing that COUNT(1) on Oracle is just an alias for COUNT(*), with proof.
I will quote some parts:
There is a part of the database software that is called “The
Optimizer”, which is defined in the official documentation as
“Built-in database software that determines the most efficient way to
execute a SQL statement“.
One of the components of the optimizer is called “the transformer”,
whose role is to determine whether it is advantageous to rewrite the
original SQL statement into a semantically equivalent SQL statement
that could be more efficient.
Would you like to see what the optimizer does when you write a query
using COUNT(1)?
With a user that has the ALTER SESSION privilege, you can set a tracefile_identifier, enable optimizer tracing and run the COUNT(1) select, like: SELECT /* test-1 */ COUNT(1) FROM employees;
After that, you need to locate the trace files, which can be done with SELECT VALUE FROM V$DIAG_INFO WHERE NAME = 'Diag Trace';. Later in the file, you will find:
SELECT COUNT(*) "COUNT(1)" FROM "COURSE"."EMPLOYEES" "EMPLOYEES"
As you can see, it's just an alias for COUNT(*).
Another important comment: COUNT(*) really was faster two decades ago on Oracle, before Oracle 7.3:
Count(1) has been rewritten as count(*) since 7.3, because Oracle likes to
auto-tune mythic statements. In earlier Oracle 7, Oracle had to
evaluate (1) for each row, as a function, before DETERMINISTIC and
NON-DETERMINISTIC existed.
So two decades ago, count(*) was faster.
For other databases such as SQL Server, this should be researched individually for each one.
I know that this question is specific to SQL Server, but the other questions on SO about the same subject (without mentioning a specific database) were closed and marked as duplicates of this one.
In all RDBMS, the two ways of counting are equivalent in terms of what result they produce. Regarding performance, I have not observed any performance difference in SQL Server, but it may be worth pointing out that some RDBMS, e.g. PostgreSQL 11, have less optimal implementations for COUNT(1) as they check for the argument expression's nullability as can be seen in this post.
I've found a 10% performance difference for 1M rows when running:
-- Faster
SELECT COUNT(*) FROM t;
-- 10% slower
SELECT COUNT(1) FROM t;
COUNT(1) is not substantially different from COUNT(*), if at all. As to the question of COUNTing NULLable columns, it is straightforward to demo the difference between COUNT(*) and COUNT(<some col>):
USE tempdb;
GO
IF OBJECT_ID( N'dbo.Blitzen', N'U') IS NOT NULL DROP TABLE dbo.Blitzen;
GO
CREATE TABLE dbo.Blitzen (ID INT NULL, Somelala CHAR(1) NULL);
INSERT dbo.Blitzen SELECT 1, 'A';
INSERT dbo.Blitzen SELECT NULL, NULL;
INSERT dbo.Blitzen SELECT NULL, 'A';
INSERT dbo.Blitzen SELECT 1, NULL;
SELECT COUNT(*), COUNT(1), COUNT(ID), COUNT(Somelala) FROM dbo.Blitzen;
GO
DROP TABLE dbo.Blitzen;
GO

SELECT * FROM TABLE(pipelined function): can I be sure of the order of the rows in the result?

In the following example, will I always get “1, 2”, or is it possible to get “2, 1”? And can you tell me where in the documentation you see that guarantee, if it exists?
If the answer is yes, it means that without ORDER BY or ORDER SIBLINGS BY there is a way to be sure of the result set order in a SELECT statement.
CREATE TYPE temp_row IS OBJECT(x number);
/
CREATE TYPE temp_table IS TABLE OF temp_row;
/
CREATE FUNCTION temp_func
RETURN temp_table PIPELINED
IS
BEGIN
PIPE ROW(temp_row(1));
PIPE ROW(temp_row(2));
RETURN;
END;
/
SELECT * FROM table(temp_func());
Thank you.
I don't think that there's anywhere in the documentation that guarantees the order that data will be returned in.
There's an old Tom Kyte thread from 2003 (so might be out of date) which states that relying on the implicit order would not be advisable, for the same reasons as you would not rely on the order in ordinary SQL.
1st: is the order of rows returned from the table function within a
SQL statement the exact same order in which the entries were "piped"
into the internal collection (so that no order by clause is needed)?
...
Followup May 18, 2003 - 10am UTC:
1) maybe, maybe not, I would not count on it. You should not count
on the order of rows in a result set without having an order by. If
you join or do something more complex then simply "select * from
table( f(x) )", the rows could well come back in some other order.
empirically -- they appear to come back as they are piped. I do not
believe it is documented that this is so.
In fact, collections of type NESTED TABLE are documented to explicitly
not have the ability to preserve order.
To be safe, you should do as you always would in a query, state an explicit ORDER BY, if you want the query results ordered.
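With the question's own function, that would simply be:
SELECT * FROM table(temp_func()) ORDER BY x;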
Having said that, I've taken your function and run 10 million iterations to check whether the implicit order was ever broken; it wasn't.
SQL> begin
2 for i in 1 .. 10000000 loop
3 for j in ( SELECT a.*, rownum as rnum FROM table(temp_func()) a ) loop
4
5 if j.x <> j.rnum then
6 raise_application_error(-20000,'It broke');
7 end if;
8 end loop;
9 end loop;
10 end;
11 /
PL/SQL procedure successfully completed.
This procedural logic works differently to table-based queries. The reason that you cannot rely on orders in a select from a table is that you cannot rely on the order in which the RDBMS will identify rows as part of the required set. This is partly because of execution plans changing, and partly because there are very few situations in which the physical order of rows in a table is predictable.
However, here you are selecting from a function that does guarantee the order in which the rows are emitted from the function. In the absence of joins, aggregations, or just about anything else (i.e. for a straight "select ... from table(function)") I would be pretty certain that the row order is deterministic.
That advice does not apply where there is a table involved unless there is an explicit order-by, so if you load your pl/sql collection from a query that does not use an order-by then of course the order of rows in the collection is not deterministic.
The AskTom link in the accepted answer is broken at this time, but I found a newer yet very similar question. After some "misunderstanding ping-pong", Connor McDonald finally admits the ordering is stable under certain conditions involving parallelism and ref cursors, and only for current releases. Quoting:
Parallelism is the (potential) risk here.
As it currently stands, a pipelined function can be run in parallel only if it takes a ref cursor as input. There is of course no guarantee that this will not change in future.
So you could run on the assumption that in current releases you will get the rows back in order, but you could never 100% rely on it being the case now and forever more.
So no guarantee is given for future releases.
The function in question would pass this criterion, hence it should provide stable ordering. However, I wouldn't personally trust it. My case (when I found this question) was even simpler: selecting from a collection specified literally - select column_value from table(my_collection(5,3,7,2)) - and I preferred explicit pairing between the data and an index anyway. It's not so hard and not much longer.
Oracle should learn from Postgres, where this situation is solved by unnest(array) with ordinality, which is a clearly understandable, trustworthy and well-documented feature.
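For comparison, here is that PostgreSQL feature; the ordinality column gives an explicit, documented ordering key to sort on (the array values mirror the collection example above):
-- PostgreSQL 9.4+: WITH ORDINALITY attaches each element's position
SELECT elem, ord
FROM unnest(ARRAY[5, 3, 7, 2]) WITH ORDINALITY AS t(elem, ord)
ORDER BY ord;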

How to find the cost of a stored procedure in Oracle and optimize it

Can anybody let me know if there is any way to find out the cost of a stored procedure in Oracle? If there is no direct way, I would like to know of any substitutes.
The way I found the cost is to autotrace all the queries used in the stored procedure and then estimate the procedure's cost according to the frequency of each query's execution.
In addition to that I would like suggestions to optimize my stored procedure especially the query given below.
Logic of the procedure:
Below is the dynamic SQL query used as a cursor in my stored procedure. The cursor is opened and fetched inside a loop. I fetch the info and put it in a varray, count the data and then insert it into a table.
My objective is to find out the cost of the proc as well as optimize the sp.
SELECT DISTINCT acct_no
FROM raw
WHERE 1=1
AND code = ''' || code ||
''' AND qty < 0
AND acct_no
IN (SELECT acct_no FROM ' || table_name || ' WHERE counter =
(SELECT MAX(counter) FROM ' || table_name || '))
One of the best tools for analyzing SQL and PL/SQL performance is the native SQL trace.
1. Enable tracing in your session:
SQL> alter session set SQL_TRACE=TRUE;
Session altered
2. Run your procedure.
3. Exit your session.
4. Navigate to your server's udump directory and find your trace file (usually the latest).
5. Run tkprof.
This will produce a file containing a list of all statements with lots of information, including the number of times each was executed, its query plan and statistics. This is more detailed and precise than manually running the plan for each select.
If you want to optimize the performance of a procedure, you would usually sort the statements in the trace file by elapsed execution time (with sort=exeela) or fetch time, and try to optimize the queries that do the most work.
You can also make the trace file log wait events by using the following command at step 1:
ALTER SESSION SET EVENTS '10046 trace name context forever, level 8';
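A minimal sketch of the whole cycle (the trace file name is hypothetical; on 9i/10g it lands in udump as described above):
-- in the session under test:
ALTER SESSION SET SQL_TRACE = TRUE;
-- run the procedure, then end the session; on the server:
tkprof ora_12345.trc proc_report.prf sort=exeela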
The way to find out the cost (in execution time) of a stored procedure is to employ a profiler. 11g introduced the Hierarchical Profiler, which is highly neat. Find out more.
Prior to 11g there was only the DBMS_PROFILER, which is good enough, especially if your stored procedure doesn't use objects in other schemas. Find out more.
Trace is good for identifying poorly performing SQL. Profilers are good for identifying the cost of the PL/SQL elements of a stored proc. If your proc has some expensive computation elements which don't read or write to tables then that won't show up in SQL trace.
Likewise, if you have a well-tuned SQL statement but use it badly, a profiler run is likely to be more help than trace. An example of what I mean is repeatedly executing the same SELECT statement inside a cursor loop: I know that's not quite what you're doing, but it's close enough.
Apparently the hierarchical profiler DBMS_HPROF is installed by default in 11g but a DBA has to grant some privileges to developers who want to use it. Find out more.
To install the DBMS_PROFILER in 10g (or earlier) a DBA has to run this script:
$ORACLE_HOME/rdbms/admin/proftab.sql
Be sure to get the reporting infrastructure as well:
$ORACLE_HOME/plsql/demo/profsum.sql
(The name or location of this script may vary in earlier versions).
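For the 11g hierarchical profiler, a hedged sketch (PROF_DIR is an assumed directory object; the analysis tables and privileges must already have been set up by a DBA as described above):
begin
  dbms_hprof.start_profiling(location => 'PROF_DIR', filename => 'my_proc.trc');
  my_package.my_procedure;  -- hypothetical procedure under test
  dbms_hprof.stop_profiling;
end;
/
-- load the raw profile into the analysis tables; returns a run id to report on
select dbms_hprof.analyze(location => 'PROF_DIR', filename => 'my_proc.trc') as runid
from dual;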
The easy way is to execute the procedure and then query v$sql.
If you want a little tip to make your life easier (not just for packages), add a dummy comment to the query inside the procedure, something like
select /* BIG DADDY */ * from dual;
and then query v$sql as follows
select * from v$sql where sql_text like '%BIG DADDY%';
The best way is definitely the one @Vincent Malgrat suggested.
Good luck.

Sql vs Oracle - profiler [duplicate]

I work with SQL Server, but I must migrate to an application with an Oracle DB.
To trace my application's queries, in SQL Server I use the wonderful Profiler tool. Is there an equivalent for Oracle?
I found an easy solution:
Step 1. Connect to the DB with an admin user using PL/SQL Developer, SQL Developer or any other query interface.
Step 2. Run the script below; in the S.SQL_FULLTEXT column you will see the executed queries.
SELECT
S.LAST_ACTIVE_TIME,
S.MODULE,
S.SQL_FULLTEXT,
S.SQL_PROFILE,
S.EXECUTIONS,
S.LAST_LOAD_TIME,
S.PARSING_USER_ID,
S.SERVICE
FROM
SYS.V_$SQL S,
SYS.ALL_USERS U
WHERE
S.PARSING_USER_ID=U.USER_ID
AND UPPER(U.USERNAME) IN ('oracle user name here')
ORDER BY TO_DATE(S.LAST_LOAD_TIME, 'YYYY-MM-DD/HH24:MI:SS') desc;
The only issue with this is that I can't find a way to show the input parameter values (for function calls), but at least we can see what is run in Oracle and the order of it without using a specific tool.
You can use Oracle Enterprise Manager to monitor the active sessions, with the query that is being executed, its execution plan, locks, some statistics and even a progress bar for the longer tasks.
See: http://download.oracle.com/docs/cd/B10501_01/em.920/a96674/db_admin.htm#1013955
Go to Instance -> sessions and watch the SQL Tab of each session.
There are other ways. Enterprise Manager just presents, with pretty colors, what is already available in special views like those documented here:
http://www.oracle.com/pls/db92/db92.catalog_views?remark=homepage
And, of course, you can also use EXPLAIN PLAN FOR, the TRACE tool and tons of other instrumentation. There are some reports in Enterprise Manager for the most expensive SQL queries. You can also search recent queries kept in the cache.
alter system set timed_statistics=true
--or
alter session set timed_statistics=true --if want to trace your own session
-- must be big enough:
select value from v$parameter p
where name='max_dump_file_size'
-- Find out sid and serial# of session you interested in:
select sid, serial# from v$session
where ...your_search_params...
--you can begin tracing with 10046 event, the fourth parameter sets the trace level(12 is the biggest):
begin
sys.dbms_system.set_ev(sid, serial#, 10046, 12, '');
end;
--turn off tracing with setting zero level:
begin
sys.dbms_system.set_ev(sid, serial#, 10046, 0, '');
end;
/*possible levels:
0 - turned off
1 - minimal level. Much like set sql_trace=true
4 - bind variables values are added to trace file
8 - waits are added
12 - both bind variable values and wait events are added
*/
--same if you want to trace your own session with bigger level:
alter session set events '10046 trace name context forever, level 12';
--turn off:
alter session set events '10046 trace name context off';
--file with raw trace information will be located:
select value from v$parameter p
where name='user_dump_dest'
--name of the file(*.trc) will contain spid:
select p.spid from v$session s, v$process p
where s.paddr=p.addr
and ...your_search_params...
--also you can set the name by yourself:
alter session set tracefile_identifier='UniqueString';
--finally, use TKPROF to make trace file more readable:
C:\ORACLE\admin\databaseSID\udump>
C:\ORACLE\admin\databaseSID\udump>tkprof my_trace_file.trc output=my_file.prf
TKPROF: Release 9.2.0.1.0 - Production on Wed Sep 22 18:05:00 2004
Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
C:\ORACLE\admin\databaseSID\udump>
--to view state of trace file use:
set serveroutput on size 30000;
declare
ALevel binary_integer;
begin
SYS.DBMS_SYSTEM.Read_Ev(10046, ALevel);
if ALevel = 0 then
DBMS_OUTPUT.Put_Line('sql_trace is off');
else
DBMS_OUTPUT.Put_Line('sql_trace is on');
end if;
end;
/
This is just a rough translation of http://www.sql.ru/faq/faq_topic.aspx?fid=389. The original is fuller, but anyway this is better than what the others posted, IMHO.
GI Oracle Profiler v1.2
It's a tool for Oracle that captures executed queries, similar to the SQL Server Profiler.
An indispensable tool for the maintenance of applications that use this database server.
You can download it from the official site, iacosoft.com.
Try PL/SQL Developer; it has a nice, user-friendly GUI interface to the profiler. It's pretty nice, give the trial a try. I swear by this tool when working on Oracle databases.
http://www.allroundautomations.com/plsqldev.html
Seeing as I've just voted a recent question as a duplicate and pointed it in this direction...
A couple more: in SQL*Plus, SET AUTOTRACE ON will give the explain plan and statistics for each statement executed.
TOAD also allows for client side profiling.
The disadvantage of both of these is that they only tell you the execution plan for the statement, but not how the optimiser arrived at that plan - for that you will need lower level server side tracing.
Another important one to understand is Statspack snapshots - they are a good way for looking at the performance of the database as a whole. Explain plan, etc, are good at finding individual SQL statements that are bottlenecks. Statspack is good at identifying the fact your problem is that a simple statement with a good execution plan is being called 1 million times in a minute.
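A minimal Statspack sketch, assuming Statspack has been installed (spcreate.sql) and you are connected as a user with access to the PERFSTAT objects:
exec statspack.snap   -- snapshot before the workload
-- ... run the workload ...
exec statspack.snap   -- snapshot after the workload
-- then generate a report between the two snapshot ids:
@?/rdbms/admin/spreport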
The trick is to capture all SQL run between two points in time, the way SQL Server does.
There are situations where it is useful to capture the SQL that a particular user is running in the database. Usually you would simply enable session tracing for that user, but there are two potential problems with that approach.
The first is that many web based applications maintain a pool of persistent database connections which are shared amongst multiple users.
The second is that some applications connect, run some SQL and disconnect very quickly, making it tricky to enable session tracing at all (you could of course use a logon trigger to enable session tracing in this case).
A quick and dirty solution to the problem is to capture all SQL statements that are run between two points in time.
The following procedure will create two tables, each containing a snapshot of the database at a particular point. The tables will then be queried to produce a list of all SQL run during that period.
If possible, you should do this on a quiet development system - otherwise you risk getting way too much data back.
1. Take the first snapshot. Run the following SQL to create it:
create table sql_exec_before as
select executions,hash_value
from v$sqlarea
/
2. Get the user to perform their task within the application.
3. Take the second snapshot:
create table sql_exec_after as
select executions, hash_value
from v$sqlarea
/
4. Check the results.
Now that you have captured the SQL it is time to query the results.
This first query will list all query hashes that have been executed:
select aft.hash_value
from sql_exec_after aft
left outer join sql_exec_before bef
on aft.hash_value = bef.hash_value
where aft.executions > bef.executions
or bef.executions is null;
This one will display the hash and the SQL itself:
set pages 999 lines 100
break on hash_value
select hash_value, sql_text
from v$sqltext
where hash_value in (
select aft.hash_value
from sql_exec_after aft
left outer join sql_exec_before bef
on aft.hash_value = bef.hash_value
where aft.executions > bef.executions
or bef.executions is null
)
order by
hash_value, piece
/
5. Tidy up. Don't forget to remove the snapshot tables once you've finished:
drop table sql_exec_before
/
drop table sql_exec_after
/
Oracle, along with other databases, analyzes a given query to create an execution plan. This plan is the most efficient way of retrieving the data.
Oracle provides the 'explain plan' statement which analyzes the query but doesn't run it, instead populating a special table that you can query (the plan table).
The syntax (simple version, there are other options such as to mark the rows in the plan table with a special ID, or use a different plan table) is:
explain plan for <sql query>
The analysis of that data is left for another question, or your further research.
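A minimal sketch of the round trip (the statement is hypothetical; DBMS_XPLAN.DISPLAY is available from 9.2 onward):
-- populate the plan table without executing the query
EXPLAIN PLAN FOR
  SELECT deptno, COUNT(*) FROM emp GROUP BY deptno;
-- read the plan back in readable form
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);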
There is a commercial tool, FlexTracer, which can be used to trace Oracle SQL queries.
There is also an Oracle doc explaining how to trace SQL queries, including a couple of tools (SQL Trace and tkprof).
Apparently there is no small, simple, cheap utility that would help perform this task. There are, however, 101 ways to do it in a complicated and inconvenient manner.
The following article describes several; there are probably dozens more...
http://www.petefinnigan.com/ramblings/how_to_set_trace.htm
