Memory and CPU Utilization of an Oracle Query

Assuming I execute a query like the one below in Oracle:
SELECT t1.check_in,t1.check_out, t1.room_id,t2.room_name
FROM mydb.reservation_tb as t1
LEFT JOIN mydb.rooms_tb as t2 on t1.room_id = t2.room_id
WHERE t1.check_in = '2012-08-01' and t1.check_out = '2012-08-08' and t1.room_id = 12
I want to know if there is a way I can determine the amount of memory and CPU time that this query used when it executed in Oracle.

The queries below can find the CPU and memory used by a query.
The dynamic performance views are tricky and can take a while to get used to. They may contain results from multiple runs and they will age out over time.
--1. First run the query, if it hasn't already been run.
select ...
--2. Find the SQL_ID by searching for some unique text in the query:
select sql_id, sql_fulltext from gv$sql where lower(sql_fulltext) like '%unique string%';
--3. Find the amount of CPU time used by this query.
--GV$SQL is based on the shared_pool, it may contain the time for multiple runs
--in the "recent past", usually the past few hours or days.
--The data can be flushed with: alter system flush shared_pool;
select cpu_time/1000000 cpu_seconds, gv$sql.*
from gv$sql
where sql_id = 'f6krg7hkt08k2';
--4. The memory and temporary tablespace used by a query.
--This only counts memory and temporary tablespace used for things like sorting and hashing.
--There's no easy way to tell how a query is affecting the buffer cache.
--(Calculating for a parallel query may require different aggregation.)
--GV$ACTIVE_SESSION_HISTORY is also transient: it only holds sampled activity and will
--usually only have data for the past few hours or days.
select
max(pga_allocated/1024/1024/1024) max_pga_allocated_gb,
max(temp_space_allocated/1024/1024/1024) max_temp_space_allocated_gb
from gv$active_session_history
where sql_id = 'f6krg7hkt08k2';
Also, I recommend you change '2012-08-01' to date '2012-08-01'. Without the date keyword, the query relies on an implicit conversion whose date format depends on the user's session settings. With the date keyword the string is a date literal, conforms to ISO 8601, and will work on any client.
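For example, a rewrite of the original query using date literals might look like this (a sketch, assuming check_in and check_out are DATE columns with no time component; note that Oracle table aliases are written without the AS keyword):
SELECT t1.check_in, t1.check_out, t1.room_id, t2.room_name
FROM mydb.reservation_tb t1
LEFT JOIN mydb.rooms_tb t2 ON t1.room_id = t2.room_id
WHERE t1.check_in = DATE '2012-08-01'
  AND t1.check_out = DATE '2012-08-08'
  AND t1.room_id = 12;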

Related

Extract sql_id from hash_value

I have sql hash value of a query.
I don't have the sqltext, plan hash or sql_id - only the hash value of the sql.
When I query the v$sql view there are no results (I guess the SQL is no longer in memory):
select * from v$sql where hash_value = 'hv_Example';
Is there a way to obtain the sql_id from the hash_value?
I didn't find a view that holds the hash value and the sql_id together...
V$SQL contains both HASH_VALUE and SQL_ID, but not forever.
According to Tanel Poder's blog, the HASH_VALUE can be derived from the SQL_ID, but the inverse operation is not possible. But converting HASH_VALUE to SQL_ID wouldn't solve your problem anyway.
First of all, I'm not sure why you would have access to the HASH_VALUE but not the SQL_ID. The SQL_ID is much more commonly used for performance tuning. Whatever program or source is providing the HASH_VALUE should be modified to return the SQL_ID instead.
But the real problem here, whichever value you have, is that data is not permanently stored in V$SQL. As statements get old they are aged out of the shared pool, and disappear from V$SQL. Also, if you're using a clustered system, make sure you're using GV$SQL.
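For example, while the statement is still in the shared pool, a lookup like the one below returns both identifiers (the hash value here is a placeholder):
select inst_id, sql_id, hash_value, sql_text
from gv$sql
where hash_value = 1234567890;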
If you are using Enterprise Edition, and have licensed the Oracle Diagnostic Pack, you may be able to find historical SQL_ID values with one of these two statements:
select sql_id
from gv$active_session_history
where sql_full_plan_hash_value = 'XYZ';
select sql_id
from dba_hist_active_sess_history
where sql_full_plan_hash_value = 'XYZ';
Data in GV$ACTIVE_SESSION_HISTORY typically lasts for about a day. Data in DBA_HIST_* stays for 8 days by default, but is configurable. You can check the AWR retention with this query: select retention from dba_hist_wr_control;
If the HASH_VALUE is within the retention period but still not in AWR, then you can probably ignore it. ASH and AWR use sampling, which means they do not capture everything that happens in a database. ASH takes a sample every second and AWR keeps a sample every 10 seconds.
A lot of people struggle with this concept, but it's helpful to come to terms with the idea of sampling. Systems that capture all of the information, like tracing everything, are overwhelming and use up too much space and processing power. If we're looking at performance problems, we generally do not care about something that doesn't run for more than 10 seconds out of 8 days.

SP using table/index with volatile statistics that differ at compile and run time

I'm a longtime MSSQL developer who finds himself back in PL/SQL for the first time since Oracle 7. I'm looking for some tuning advice regarding a large export stored procedure, which is sporadically and not very reproducibly running slow at certain points. This happens around some static working tables which it truncates, fills and uses as part of the export. The code in outline typically looks like this:
create or replace Procedure BigMultiPurposeExport as (
-- about 2000 lines of other code
INSERT WORK_TABLE_5 SELECT WHATEVER1 FROM WHEREVER1;
INSERT WORK_TABLE_5 SELECT WHATEVER2 FROM WHEREVER2;
INSERT WORK_TABLE_5 SELECT WHATEVER3 FROM WHEREVER3;
INSERT WORK_TABLE_5 SELECT WHATEVER4 FROM WHEREVER4;
-- WORK_TABLE_5 now has 0 to ~500k rows whose content can vary drastically from run to run
-- e.g. one hourly run exports 3 whale sightings, next exports all tourist visits to Kenya this decade
-- about 1000 lines of other code
INSERT OUTPUT_TABLE_3
SELECT THIS, THAT, THE_OTHER
FROM BUSINESS_TABLE_1 BT1
INNER JOIN BUSINESS_TABLE_2 ON etc -- typical join on indexed columns
INNER JOIN BUSINESS_TABLE_3 ON etc -- typical join on indexed columns
INNER JOIN BUSINESS_TABLE_4 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_1 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_2 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_3 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_4 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_5 WT5 ON BT1.ID = WT5.BT1_ID AND WT5.RECORD_TYPE = 21
-- join above is now supported by indexes on BUSINESS_TABLE_1 (ID) and WORK_TABLE_5 (BT1_ID, RECORD_TYPE), originally wasn't
LEFT OUTER JOIN WORK_TABLE_6 ON etc -- typical join on indexed columns
LEFT OUTER JOIN WORK_TABLE_7 ON etc -- typical join on indexed columns
-- about 4000 lines of other code
)
That final insert into OUTPUT_TABLE_3 usually runs in under 10 seconds, but once in a while on certain customer servers it times out at our default 99 minutes. Then we have them take the timeout off and run it on Friday night, and it finishes but takes 16 hours.
I narrowed the problem down to the join to WORK_TABLE_5, which had no index support, and put an index on the join terms. The next run took 4 seconds. But success has been intermittent: the customer occasionally gets slow runs when they drastically change their export selection (i.e. drastically change the data in WORK_TABLE_5). And if we update statistics and rebuild indexes after a timed-out export, it runs fine at the next attempt.
So, I am wondering about how best to handle truncating/filling static work tables with static indexes, statistics updated overnight, and a stored procedure compiled when the statistics are nothing like runtime.
I have a few general questions about things I'd like to understand better:
Is the nature of the data in the work table going to substantially affect the query plan? Does Oracle form its query plan when you compile the stored procedure? Could we get a highly inappropriate query plan if we compile the stored procedure with the table empty then use a table with 500k rows at runtime?
I expect that if this were an ad-hoc script then updating statistics on the problem table just before selecting from it would eliminate the sporadic slowdowns. But what if I were to update statistics inside the stored procedure, which is compiled with different statistics from runtime?
Anything else you'd like to add...
Thanks for any advice. I hope my MSSQL preconceptions haven't led me too far off base.
This is happening in Oracle 11g, but the code is deployed to assorted customers using Oracle 10 through 12 and I'd like to cater to all of those if possible.
-- Joel
Huge differences in table or index sizes can most definitely cause performance problems. The solution is to add statistics gathering to the procedure instead of relying on the default statistics jobs.
If you've been away from Oracle since version 7, the most important new feature is the Cost Based Optimizer. Oracle now builds query execution plans based on the optimizer statistics of tables, indexes, columns, expressions, system statistics, outlines, directives, dynamic sampling, etc. If you're a full time Oracle developer you should probably spend a day reading about optimizer statistics. Start with Managing Optimizer Statistics and DBMS_STATS in the official documentation.
Eventually the stored procedure should look like this:
--1: Insert into working tables.
insert into work_table...
--2: Gather statistics on working tables.
dbms_stats.gather_table_stats('SCHEMA_NAME', 'WORK_TABLE', ...);
--3: Use working tables.
insert into other_table select * from work_table...
There are so many statistics features it's hard to know exactly what parameters to use in that second step above. Here are some guesses about some features you might find useful:
DEGREE - One reason people avoid gathering statistics inside a process is the time it takes. You can significantly improve the run time by setting the degree, although this also uses significantly more resources.
NO_INVALIDATE - It can be tricky to know exactly when the statistics are "set" for a query. Gathering statistics usually quickly invalidates execution plans that were based on old statistics, but not always. If you want to be 100% sure that the next query is using the latest statistics, set NO_INVALIDATE=>FALSE.
ESTIMATE_PERCENT - In 11g and above you definitely want to use the default, which uses a faster algorithm. In 10g and below you may need to set the value to something low to make the gathering fast enough.
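Putting those options together, the gather step inside the procedure might look roughly like this (a sketch; the schema name, table name and degree are placeholders to adjust for your system):
begin
  dbms_stats.gather_table_stats(
    ownname          => 'SCHEMA_NAME',
    tabname          => 'WORK_TABLE_5',
    degree           => 4,                           -- parallelize the gathering to cut elapsed time
    no_invalidate    => false,                       -- force dependent cursors to re-parse with the new stats
    estimate_percent => dbms_stats.auto_sample_size  -- the 11g+ default sampling algorithm
  );
end;
/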
Although Oracle 10g and above comes with default statistics gathering jobs you cannot rely on them for a few reasons:
They are scheduled and may not run at the right time. If a process significantly changes the data then new stats are needed right away, not at 10 PM. If there are a lot of tables that need to be analyzed the job may not get to them all in one day.
Many DBAs disable the jobs. This is ridiculous and almost always a mistake. But you'll find many DBAs who disabled the job because they think they can do it better. Instead of working with the auto tasks and setting preferences, many DBAs like to throw the whole thing out and replace it with a custom procedure that rots over time.

Difference between count (*) and count (1) with join [duplicate]

Just wondering if any of you people use Count(1) over Count(*) and if there is a noticeable difference in performance or if this is just a legacy habit that has been brought forward from days gone past?
The specific database is SQL Server 2005.
There is no difference.
Reason:
Books on-line says "COUNT ( { [ [ ALL | DISTINCT ] expression ] | * } )"
"1" is a non-null expression: so it's the same as COUNT(*).
The optimizer recognizes it for what it is: trivial.
The same as EXISTS (SELECT * ... or EXISTS (SELECT 1 ...
Example:
SELECT COUNT(1) FROM dbo.tab800krows
SELECT COUNT(1),FKID FROM dbo.tab800krows GROUP BY FKID
SELECT COUNT(*) FROM dbo.tab800krows
SELECT COUNT(*),FKID FROM dbo.tab800krows GROUP BY FKID
Same IO, same plan, the works
Edit, Aug 2011
Similar question on DBA.SE.
Edit, Dec 2011
COUNT(*) is mentioned specifically in ANSI-92 (look for "Scalar expressions 125")
Case:
a) If COUNT(*) is specified, then the result is the cardinality of T.
That is, the ANSI standard recognizes it as bleeding obvious what you mean. COUNT(1) has been optimized out by RDBMS vendors because of this superstition. Otherwise it would be evaluated as per the ANSI rule below:
b) Otherwise, let TX be the single-column table that is the result of applying the <value expression> to each row of T and eliminating null values. If one or more null values are eliminated, then a completion condition is raised: warning-
In SQL Server, these statements yield the same plans.
Contrary to the popular opinion, in Oracle they do too.
SYS_GUID() in Oracle is quite a computation-intensive function.
In my test database, t_even is a table with 1,000,000 rows
This query:
SELECT COUNT(SYS_GUID())
FROM t_even
runs for 48 seconds, since the function needs to evaluate each SYS_GUID() returned to make sure it's not a NULL.
However, this query:
SELECT COUNT(*)
FROM (
SELECT SYS_GUID()
FROM t_even
)
runs for but 2 seconds, since it doesn't even try to evaluate SYS_GUID() (despite * being the argument to COUNT(*))
I work on the SQL Server team and I can hopefully clarify a few points in this thread (I had not seen it previously, so I am sorry the engineering team has not done so previously).
First, there is no semantic difference between select count(1) from table vs. select count(*) from table. They return the same results in all cases (and it is a bug if not). As noted in the other answers, select count(column) from table is semantically different and does not always return the same results as count(*).
Second, with respect to performance, there are two aspects that would matter in SQL Server (and SQL Azure): compilation-time work and execution-time work. The Compilation time work is a trivially small amount of extra work in the current implementation. There is an expansion of the * to all columns in some cases followed by a reduction back to 1 column being output due to how some of the internal operations work in binding and optimization. I doubt it would show up in any measurable test, and it would likely get lost in the noise of all the other things that happen under the covers (such as auto-stats, xevent sessions, query store overhead, triggers, etc.). It is maybe a few thousand extra CPU instructions. So, count(1) does a tiny bit less work during compilation (which will usually happen once and the plan is cached across multiple subsequent executions). For execution time, assuming the plans are the same there should be no measurable difference. (One of the earlier examples shows a difference - it is most likely due to other factors on the machine if the plan is the same).
As to how the plan can potentially be different: such differences are extremely unlikely to happen, but they are potentially possible in the architecture of the current optimizer. SQL Server's optimizer works as a search program (think: computer program playing chess searching through various alternatives for different parts of the query and costing out the alternatives to find the cheapest plan in reasonable time). This search has a few limits on how it operates to keep query compilation finishing in reasonable time. For queries beyond the most trivial, there are phases of the search and they deal with tranches of queries based on how costly the optimizer thinks the query is to potentially execute. There are 3 main search phases, and each phase can run more aggressive (expensive) heuristics trying to find a cheaper plan than any prior solution. Ultimately, there is a decision process at the end of each phase that tries to determine whether it should return the plan it found so far or keep searching. This process uses the total time taken so far vs. the estimated cost of the best plan found so far. So, on different machines with different speeds of CPUs it is possible (albeit rare) to get different plans due to timing out in an earlier phase with a plan vs. continuing into the next search phase. There are also a few similar scenarios related to timing out of the last phase and potentially running out of memory on very, very expensive queries that consume all the memory on the machine (not usually a problem on 64-bit but it was a larger concern back on 32-bit servers). Ultimately, if you get a different plan the performance at runtime would differ. I don't think it is remotely likely that the difference in compilation time would EVER lead to any of these conditions happening.
Net-net: Please use whichever of the two you want as none of this matters in any practical form. (There are far, far larger factors that impact performance in SQL beyond this topic, honestly).
I hope this helps. I did write a book chapter about how the optimizer works but I don't know if it's appropriate to post it here (as I believe I still get tiny royalties from it). So, instead of posting that, I'll post a link to a talk I gave at SQLBits in the UK about how the optimizer works at a high level, so you can see the different main phases of the search in a bit more detail if you want to learn about that. Here's the video link: https://sqlbits.com/Sessions/Event6/inside_the_sql_server_query_optimizer
Clearly, COUNT(*) and COUNT(1) will always return the same result. Therefore, if one were slower than the other it would effectively be due to an optimiser bug. Since both forms are used very frequently in queries, it would make no sense for a DBMS to allow such a bug to remain unfixed. Hence you will find that the performance of both forms is (probably) identical in all major SQL DBMSs.
In the SQL-92 Standard, COUNT(*) specifically means "the cardinality of the table expression" (could be a base table, VIEW, derived table, CTE, etc.).
I guess the idea was that COUNT(*) is easy to parse. Using any other expression requires the parser to ensure it doesn't reference any columns (COUNT('a') where a is a literal and COUNT(a) where a is a column can yield different results).
In the same vein, COUNT(*) can be easily picked out by a human coder familiar with the SQL Standards, a useful skill when working with more than one vendor's SQL offering.
Also, in the special case SELECT COUNT(*) FROM MyPersistedTable;, the thinking is the DBMS is likely to hold statistics for the cardinality of the table.
Therefore, because COUNT(1) and COUNT(*) are semantically equivalent, I use COUNT(*).
COUNT(*) and COUNT(1) are the same in terms of both result and performance.
I would expect the optimiser to ensure there is no real difference outside weird edge cases.
As with anything, the only real way to tell is to measure your specific cases.
That said, I've always used COUNT(*).
As this question comes up again and again, here is one more answer. I hope to add something for beginners wondering about "best practice" here.
SELECT COUNT(*) FROM something counts records which is an easy task.
SELECT COUNT(1) FROM something retrieves a 1 per record and then counts the 1s that are not null, which is essentially counting records, only more complicated.
Having said this: a good DBMS notices that the second statement will result in the same count as the first and reinterprets it accordingly, so as not to do unnecessary work. So usually both statements will result in the same execution plan and take the same amount of time.
However, from the point of view of readability you should use the first statement. You want to count records, so count records, not expressions. Use COUNT(expression) only when you want to count non-null occurrences of something.
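A tiny illustration of that last point (the table name is made up):
CREATE TABLE demo_nulls (val INT NULL);
INSERT INTO demo_nulls VALUES (1);
INSERT INTO demo_nulls VALUES (NULL);
-- COUNT(*) counts rows; COUNT(val) counts only the non-null values
SELECT COUNT(*) AS all_rows, COUNT(val) AS non_null_vals FROM demo_nulls;
-- all_rows = 2, non_null_vals = 1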
I ran a quick test on SQL Server 2012 on an 8 GB RAM hyper-v box. You can see the results for yourself. I was not running any other windowed application apart from SQL Server Management Studio while running these tests.
My table schema:
CREATE TABLE [dbo].[employee](
[Id] [bigint] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](50) NOT NULL,
CONSTRAINT [PK_employee] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
Total number of records in Employee table: 178090131 (~ 178 million rows)
First Query:
Set Statistics Time On
Go
Select Count(*) From Employee
Go
Set Statistics Time Off
Go
Result of First Query:
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 35 ms.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 10766 ms, elapsed time = 70265 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
Second Query:
Set Statistics Time On
Go
Select Count(1) From Employee
Go
Set Statistics Time Off
Go
Result of Second Query:
SQL Server parse and compile time:
CPU time = 14 ms, elapsed time = 14 ms.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 11031 ms, elapsed time = 70182 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
You can notice there is a difference of 83 (= 70265 - 70182) milliseconds, which can easily be attributed to the exact system conditions at the time the queries were run. Also, I did a single run, so this difference would become more meaningful with several runs and some averaging. If for such a huge data set the difference is less than 100 milliseconds, then we can easily conclude that the two queries do not have any performance difference exhibited by the SQL Server engine.
Note: RAM hit close to 100% usage in both runs. I restarted the SQL Server service before starting each run.
SET STATISTICS TIME ON
select count(1) from MyTable (nolock) -- table containing 1 million records.
SQL Server Execution Times:
CPU time = 31 ms, elapsed time = 36 ms.
select count(*) from MyTable (nolock) -- table containing 1 million records.
SQL Server Execution Times:
CPU time = 46 ms, elapsed time = 37 ms.
I've run this hundreds of times, clearing the cache every time. The results vary from time to time as server load varies, but almost always count(*) has a higher CPU time.
There is an article showing that COUNT(1) on Oracle is just an alias for COUNT(*), with proof.
I will quote some parts:
There is a part of the database software that is called "The Optimizer", which is defined in the official documentation as "Built-in database software that determines the most efficient way to execute a SQL statement".
One of the components of the optimizer is called "the transformer", whose role is to determine whether it is advantageous to rewrite the original SQL statement into a semantically equivalent SQL statement that could be more efficient.
Would you like to see what the optimizer does when you write a query using COUNT(1)?
With a user that has the ALTER SESSION privilege, you can set a tracefile_identifier, enable optimizer tracing, and run the COUNT(1) select, like: SELECT /* test-1 */ COUNT(1) FROM employees;.
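For example (a sketch; event 10053 is the classic optimizer trace and the identifier is arbitrary):
ALTER SESSION SET tracefile_identifier = 'count1_test';
ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';
SELECT /* test-1 */ COUNT(1) FROM employees;
ALTER SESSION SET EVENTS '10053 trace name context off';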
After that, you need to locate the trace file, which can be done with SELECT VALUE FROM V$DIAG_INFO WHERE NAME = 'Diag Trace';. Later in the file, you will find:
SELECT COUNT(*) "COUNT(1)" FROM "COURSE"."EMPLOYEES" "EMPLOYEES"
As you can see, it's just an alias for COUNT(*).
Another important comment: COUNT(*) really was faster two decades ago on Oracle, before Oracle 7.3:
Count(1) has been rewritten in count(*) since 7.3 because Oracle likes to auto-tune mythic statements. In earlier Oracle7, Oracle had to evaluate (1) for each row, as a function, before DETERMINISTIC and NON-DETERMINISTIC existed.
So two decades ago, count(*) was faster.
For other databases such as SQL Server, this should be researched individually for each one.
I know that this question is specific to SQL Server, but the other questions on SO about the same subject (without mentioning a specific database) were closed and marked as duplicates of this one.
In all RDBMS, the two ways of counting are equivalent in terms of what result they produce. Regarding performance, I have not observed any performance difference in SQL Server, but it may be worth pointing out that some RDBMS, e.g. PostgreSQL 11, have less optimal implementations for COUNT(1) as they check for the argument expression's nullability as can be seen in this post.
I've found a 10% performance difference for 1M rows when running:
-- Faster
SELECT COUNT(*) FROM t;
-- 10% slower
SELECT COUNT(1) FROM t;
COUNT(1) is not substantially different from COUNT(*), if at all. As to the question of COUNTing NULLable columns, it is straightforward to demo the differences between COUNT(*) and COUNT(<some col>):
USE tempdb;
GO
IF OBJECT_ID( N'dbo.Blitzen', N'U') IS NOT NULL DROP TABLE dbo.Blitzen;
GO
CREATE TABLE dbo.Blitzen (ID INT NULL, Somelala CHAR(1) NULL);
INSERT dbo.Blitzen SELECT 1, 'A';
INSERT dbo.Blitzen SELECT NULL, NULL;
INSERT dbo.Blitzen SELECT NULL, 'A';
INSERT dbo.Blitzen SELECT 1, NULL;
SELECT COUNT(*), COUNT(1), COUNT(ID), COUNT(Somelala) FROM dbo.Blitzen;
GO
DROP TABLE dbo.Blitzen;
GO

Oracle scalar function in WHERE clause leads to poor performance

I've written a scalar function (DYNAMIC_DATE) that converts a text value to a date/time. For example, DYNAMIC_DATE('T-1') (T-1 = today minus 1 = 'yesterday') returns 08-AUG-2012 00:00:00. It also accepts date strings: DYNAMIC_DATE('10/10/1974').
The function makes use of CASE statements to parse the sole parameter and calculate a date relative to sysdate.
While it doesn't make use of any table in its schema, it does make use of TABLE type to store date-format strings:
TYPE VARCHAR_TABLE IS TABLE OF VARCHAR2(10);
formats VARCHAR_TABLE := VARCHAR_TABLE ('mm/dd/rrrr','mm-dd-rrrr','rrrr/mm/dd','rrrr-mm-dd');
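In simplified form, the function looks something like this (a sketch for illustration only; the real function handles many more relative-date codes and all of the formats above):
CREATE OR REPLACE FUNCTION dynamic_date (p_expr IN VARCHAR2)
  RETURN DATE
  DETERMINISTIC   -- note: originally declared DETERMINISTIC; see the edit below
IS
BEGIN
  CASE
    WHEN UPPER(p_expr) LIKE 'T%' THEN
      -- relative dates: 'T' = today, 'T-1' = yesterday, etc.
      RETURN TRUNC(SYSDATE) + TO_NUMBER(NVL(SUBSTR(p_expr, 2), '0'));
    ELSE
      -- plain date strings, e.g. '10/10/1974'
      RETURN TO_DATE(p_expr, 'mm/dd/rrrr');
  END CASE;
END;
/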
When I use the function in the SELECT clause, the query returns in < 1 second:
SELECT DYNAMIC_DATE('MB-1') START_DATE, DYNAMIC_DATE('ME-1') END_DATE
FROM DUAL
If I use it against our date dimension table (91311 total records), the query completes in < 1 second:
SELECT count(1)
from date_dimension
where calendar_dt between DYNAMIC_DATE('MB-1') and DYNAMIC_DATE('ME-1')
Others, however, are having problems with the function if it is used against a larger table (26,301,317 records):
/*
cost: 148,840
records: 151,885
time: ~20 minutes
*/
SELECT count(1)
FROM ORDERS ord
WHERE trunc(ord.ordering_date) between DYNAMIC_DATE('mb-1') and DYNAMIC_DATE('me-1')
However, the same query, using 'hard coded' dates, returns fairly rapidly:
/*
cost: 144,257
records: 151,885
time: 62 seconds
*/
SELECT count(1)
FROM ORDERS ord
WHERE trunc(ord.ordering_date) between to_date('01-JUL-2012','dd-mon-yyyy') AND to_date('31-JUL-2012','dd-mon-yyyy')
The vendor's vanilla installation doesn't include an index on the ORDERING_DATE field.
The explain plans for both queries (with the function and with hard-coded dates) are similar; the plan output is not reproduced here.
Is the DYNAMIC_DATE function being called repeatedly in the WHERE clause?
What else might explain the disparity?
** edit **
A NONUNIQUE index was added to the ORDERS table. Both queries now execute in < 1 second. Both plans use the same approach, but the one with the function has a lower cost.
I removed the DETERMINISTIC keyword from the function; the query executed in < 1 second.
Is the issue really with the function or was it related to the table?
3 years from now, when this table is even larger, and if I don't include the DETERMINISTIC keyword, will query performance suffer?
Will the DETERMINISTIC keyword have any effect on the function's results? If I run DYNAMIC_DATE('T-1') tomorrow, will I get the same results as if I ran it today (08/09/2012)? If so, this approach won't work.
If the steps of the plan are identical, then the total amount of work being done should be identical. If you trace the session (something simple like set autotrace on in SQL*Plus or something more sophisticated like an event 10046 trace), or if you look at DBA_HIST_SQLSTAT assuming you have licensed access to the AWR tables, are you seeing (roughly) the same amount of logical I/O and CPU consumption for the two queries? Is it possible that the difference in runtime you are seeing is the result of the data being cached when you run the second query?
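For example, if you do have Diagnostic Pack access, something along these lines compares the work done by the two statements (the two SQL_IDs are placeholders):
select sql_id,
       sum(executions_delta)  as executions,
       sum(buffer_gets_delta) as logical_reads,
       sum(cpu_time_delta)    as cpu_time_us
from dba_hist_sqlstat
where sql_id in ('sqlid_with_func', 'sqlid_with_dates')
group by sql_id;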
I am guessing that the problem isn't with your function. Try creating a function-based index on trunc(ordering_date) and check the explain plans again.
CREATE INDEX ord_date_index ON orders(trunc(ordering_date));

Oracle refuses to use index

I have a partitioned table like so:
create table demo (
ID NUMBER(22) not null,
TS TIMESTAMP not null,
KEY VARCHAR2(5) not null,
...lots more columns...
)
The partition is on the TS column (one partition per year).
Since I search a lot via the timestamp, I created a combined index:
create index demo.x1 on demo (ts, key);
The query looks like this:
select *
from demo t
where t.TS = to_timestamp('2009-06-30 07:47:57', 'YYYY-MM-DD HH24:MI:SS')
I also tried to add and t.KEY = '00101' but that doesn't help.
But EXPLAIN PLAN shows a full table scan (TABLE ACCESS FULL):
# Operation Options Object Mode Cost Bytes Cardinality
0 SELECT STATEMENT ALL_ROWS 583804 287145 2127
1 PARTITION RANGE ALL 583804 287145 2127
2 TABLE ACCESS FULL HEADER ANALYZED 583804 287145 2127
No mention of the index. What could be wrong?
[EDIT] For some reason, Oracle completely miscalculated the cost for the operation. I have 112 million rows in that table. The cost for a full scan of a single partition should be 20 million, not 600'000. That's why it even ignores optimizer hints.
[EDIT2] During my tests, I ran across this puzzling result. When I run this select:
select tx_ts
from kt.header
where tx_ts = to_timestamp('2009-06-30 07:47:57', 'YYYY-MM-DD HH24:MI:SS')
I get this EXPLAIN PLAN:
0 SELECT STATEMENT ALL_ROWS 152 15616 1952
1 PARTITION RANGE ALL 152 15616 1952
2 INDEX FAST FULL SCAN HEADERX2 ANALYZED 152 15616 1952
So when I restrict myself to the indexed column as the result of the select, Oracle decides to use the index. When I want to get all columns, I have to wait for a full table scan. What's going on here?
[EDIT3] Found it; see my answer below.
Okay, it was a mistake on my part: The column had the type DATE, not TIMESTAMP. Since I used to_timestamp(), Oracle saw no way to use the index.
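Presumably the fix is simply to compare against the column's own datatype, something like:
select *
from demo t
where t.TS = to_date('2009-06-30 07:47:57', 'YYYY-MM-DD HH24:MI:SS');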
I'm not really an expert on partitioning, but I think what has happened here is that you have created a global index -- a single index that covers rows in all of the partitions. Therefore, the optimizer has to choose between two mutually exclusive access paths: (A) an index range scan, or (B) partition pruning. I believe the PARTITION RANGE operation indicates that it has chosen B.
Updating statistics, as others have suggested, may change the behavior. When you drop and recreate the index, you discard any statistics that existed for the index.
Creating the index as UNIQUE, if the timestamp and key uniquely identify a row, would be a good idea and might change the behavior as well.
However, I think the real "fix" is that you should instead create local indexes -- a separate index on each partition. This should enable the optimizer to do partition pruning followed by an index lookup. Honestly, I'm not sure what the exact syntax is to do this. Maybe you just create the index on each partition explicitly using the individual partition names.
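For what it's worth, the syntax for a local index is just the LOCAL keyword on the CREATE INDEX statement (a sketch, not tested against this particular table):
create index x1_local on demo (ts, key) local;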
If everything else fails, you might try an optimizer hint:
select /*+ index(t x1) */ *
from demo t
where t.TS = to_timestamp('2009-06-30 07:47:57', 'YYYY-MM-DD HH24:MI:SS')
Are your stats up to date? Stale statistics may mean that Oracle believes a full table scan is faster than using the index. Are you using any hints in your query that might be telling Oracle to do a full scan?
Can you supply us with the full query and explain plan results?
Edit: Aaron, you can update the stats using the "dbms_stats.gather_schema_stats" or "dbms_stats.gather_table_stats" commands. See here for more information on the commands. This will update all the relevant stats for the schema or table specified. Oracle's Cost Based Optimiser will use the statistics to determine which execution plan to choose. It never uses the actual table sizes. You'll need to re-update your stats when the size of your table changes significantly (+/- 10% or so).
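For example, a table-level gather might look roughly like this (a sketch; adjust the options for your environment):
begin
  dbms_stats.gather_table_stats(
    ownname => user,    -- current schema
    tabname => 'DEMO',
    cascade => true     -- also gather statistics for the table's indexes
  );
end;
/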
Another thing. When you use a compound index, you need to specify all the columns used in the index in your query for the optimizer to consider the index (and I think you need to specify them in the same order as well, though I could be wrong about that, it's been a while since I looked at this stuff)
There may just be a typo in your transcription of the "CREATE INDEX..." statement that you posted, but are you sure you actually have created the index?
To give us some first-pass idea of the statistics, use these queries:
select table_name, num_rows
from user_tables
where table_name = 'DEMO';
select table_name, num_rows
from user_tab_partitions
where table_name = 'DEMO';
select index_name, num_rows from user_indexes
where table_name in
(select table_name
from user_tables where table_name = 'DEMO');
Also, exactly how are you generating the EXPLAIN PLAN? Do you have access to the database host to retrieve a trace file if you enable tracing?
[edit]
As I commented, it would be good to see the trace of an actual execution of the query. Since you've indicated you have access to the db host filesystems, run a SQL script that (in the same session) issues the following:
alter session set sql_trace=true;
select /* THIS IS THE TRACE */
*
from demo t
where t.TS = to_timestamp('2009-06-30 07:47:57', 'YYYY-MM-DD HH24:MI:SS');
exit
After the script exits:
1. Find out where the trace file directory is with this query:
select value from v$parameter where name = 'user_dump_dest';
2. Use whatever searching tool is available to you to find the file that includes the string "THIS IS THE TRACE".
3. Process/profile the trace file by issuing the OS command: tkprof traceFileName.trc tkprof.out
Examine this file - you'll see some overhead information, but there will be a section that details the actual execution plan and statistics for the query. If you see the same results in this information then the next step is to add another statement (after the first "alter session") that will dump information on why the Oracle CBO is ignoring the index:
ALTER SESSION SET EVENTS='10053 trace name context forever, level 1';
