Hidden features in Oracle - oracle

I enjoyed the questions and answers about hidden features in SQL Server.
What can you tell us about Oracle?
Hidden tables, inner workings of ..., secret stored procs, packages with good utilities...

Since Apex is now part of every Oracle database, these Apex utility functions are useful even if you aren't using Apex:
SQL> declare
  v_array apex_application_global.vc_arr2;
  v_string varchar2(2000);
begin
  -- Convert delimited string to array
  v_array := apex_util.string_to_table('alpha,beta,gamma,delta', ',');
  for i in 1..v_array.count
  loop
    dbms_output.put_line(v_array(i));
  end loop;
  -- Convert array to delimited string
  v_string := apex_util.table_to_string(v_array,'|');
  dbms_output.put_line(v_string);
end;
/
alpha
beta
gamma
delta
alpha|beta|gamma|delta
PL/SQL procedure successfully completed.

"Full table scans are not always bad. Indexes are not always good."
An index-based access method is less efficient at reading rows than a full scan when you measure it in terms of rows accessed per unit of work (typically per logical read). However, many tools will interpret a full table scan as a sign of inefficiency.
Take an example where you are reading a few hundred invoices from an invoice table and looking up a payment method in a small lookup table. Using an index to probe the lookup table for every invoice probably means three or four logical I/Os per invoice. However, a full scan of the lookup table in preparation for a hash join from the invoice data would probably require only a couple of logical reads, and the hash join itself would complete in memory at almost no cost at all.
However, many tools would look at this, see "full table scan", and tell you to try to use an index. If you do so then you may have just de-tuned your code.
Incidentally over reliance on indexes, as in the above example, causes the "Buffer Cache Hit Ratio" to rise. This is why the BCHR is mostly nonsense as a predictor of system efficiency.
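The invoice example can be put into rough numbers. The Python sketch below is only a back-of-the-envelope model, not a measurement: the figures (300 invoices, 3 logical reads per index probe, 2 blocks in the lookup table) are assumptions standing in for the "few hundred", "three or four" and "a couple" in the text, and the function names are made up for illustration.

```python
# Toy cost model for the invoice / payment-method example.
# All numbers are illustrative assumptions, not Oracle measurements.

def index_probe_reads(outer_rows: int, reads_per_probe: int) -> int:
    """Logical reads when the lookup table is probed via its index once per outer row."""
    return outer_rows * reads_per_probe


def hash_join_reads(lookup_table_blocks: int) -> int:
    """Logical reads when the lookup table is full-scanned once to build an in-memory hash table."""
    return lookup_table_blocks


invoices = 300        # "a few hundred invoices"
reads_per_probe = 3   # "three or four logical I/Os per invoice"
lookup_blocks = 2     # "a couple of logical reads"

nested_loop_cost = index_probe_reads(invoices, reads_per_probe)
hash_join_cost = hash_join_reads(lookup_blocks)

print(f"index probes:          {nested_loop_cost} logical reads")
print(f"full scan + hash join: {hash_join_cost} logical reads")
```

Under these assumptions the index-driven plan does several hundred times more logical reads, and since nearly all of them hit blocks that are already cached, it also reports the better hit ratio.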

The cardinality hint is mostly undocumented.
explain plan for
select /*+ cardinality(@inner 5000) */ *
from (select /*+ qb_name(inner) */ * from dual)
/
select * from table(dbms_xplan.display)
/
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5000 | 10000 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS FULL| DUAL | 1 | 2 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------

The Buffer Cache Hit Ratio is virtually meaningless as a predictor of system efficiency

You can view table data as of a previous time using Flashback Query, with certain limitations.
Select *
from my_table as of timestamp(timestamp '2008-12-01 15:21:13')
11g has a whole new feature set around preserving historical changes more robustly.

Frequent rebuilding of indexes is almost always a waste of time.

wm_concat works like MySQL's group_concat, but it is undocumented.
with data:
car       maker
--------  -----
Corvette  Chevy
Taurus    Ford
Impala    Chevy
Aveo      Chevy
select wm_concat(car) Cars, maker from cars
group by maker
gives you:
Cars                    maker
----------------------  -----
Corvette, Impala, Aveo  Chevy
Taurus                  Ford

The OVERLAPS predicate is undocumented.
http://oraclesponge.wordpress.com/2008/06/12/the-overlaps-predicate/

I just found out about the pseudo-column ORA_ROWSCN. If you don't set your table up for this, the pseudo-column gives you the block SCN. This could be really useful for the emergency, "Oh crap I have no auditing on this table and wonder if someone has changed the data since yesterday."
But even better is if you create the table with ROWDEPENDENCIES on. That puts the SCN of the last change on every row. This will help you avoid a "Lost Edit" problem without having to include every column in your query.
IOW, when your app grabs a row for user modification, also select the ORA_ROWSCN. Then when you post the user's edits, include ora_rowscn = v_rscn in addition to the unique key in the where clause. If someone has touched the row since you grabbed it (a lost edit), the update will match zero rows since the ora_rowscn will have changed.
So cool.
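The lost-edit check described above can be sketched outside Oracle. SQLite has no ORA_ROWSCN, so this Python example substitutes a hypothetical row_version column that the application bumps on every update; the table, column and function names are assumptions for illustration only. The point is just the shape of the UPDATE: key plus the stamp seen at read time.

```python
import sqlite3

# row_version is a hypothetical stand-in for ORA_ROWSCN, maintained by the app.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, row_version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO emp VALUES (1, 'Smith', 1)")


def read_row(emp_id):
    """Fetch the data together with the version stamp seen at read time."""
    return conn.execute(
        "SELECT name, row_version FROM emp WHERE id = ?", (emp_id,)
    ).fetchone()


def save_edit(emp_id, new_name, version_seen):
    """Apply the edit only if nobody changed the row since it was read.

    Returns the number of rows updated: 1 on success, 0 on a lost-edit conflict.
    """
    cur = conn.execute(
        "UPDATE emp SET name = ?, row_version = row_version + 1 "
        "WHERE id = ? AND row_version = ?",
        (new_name, emp_id, version_seen),
    )
    return cur.rowcount


# Two users read the same row, then both try to save.
_, version_a = read_row(1)
_, version_b = read_row(1)

first_save = save_edit(1, "Smythe", version_a)   # version still matches
second_save = save_edit(1, "Smithe", version_b)  # stale version, matches no rows

print(first_save, second_save, read_row(1))
```

The second save matching zero rows is the signal to re-read the row and ask the user to redo the edit.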

If you get the value of the PASSWORD column in DBA_USERS, you can back up and restore passwords without knowing them:
ALTER USER xxx IDENTIFIED BY VALUES 'xxxx';

Bypass the buffer cache and read straight from disk using direct path reads.
alter session set "_serial_direct_read"=true;
This causes a tablespace (9i) or fast object (10g+) checkpoint, so be careful on busy OLTP systems.

More undocumented stuff at http://awads.net/wp/tag/undocumented/
Warning: Use at your own risk.

I don't know if this counts as hidden, but I was pretty happy when I saw this way of quickly seeing what happened with a SQL statement you are tuning.
SELECT /*+ GATHER_PLAN_STATISTICS */ * FROM DUAL;
SELECT * FROM TABLE(dbms_xplan.display_cursor( NULL, NULL, 'RUNSTATS_LAST'))
;
PLAN_TABLE_OUTPUT
-----------------------------------------------------
SQL_ID 5z36y0tq909a8, child number 0
-------------------------------------
SELECT /*+ GATHER_PLAN_STATISTICS */ * FROM DUAL
Plan hash value: 272002086
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
---------------------------------------------------------------------------------------------
| 1 | TABLE ACCESS FULL| DUAL | 1 | 1 | 1 |00:00:00.02 | 3 | 2 |
---------------------------------------------------------------------------------------------
12 rows selected.
Where:
E-Rows is estimated rows.
A-Rows is actual rows.
A-Time is actual time.
Buffers is actual buffers.
Where the estimated plan varies from the actual execution by orders of magnitude, you know you have problems.

Not a hidden feature, but fine-grained access control (FGAC), also known as row-level security, is something I have used in the past, and I was impressed with the efficiency of its implementation. If you are looking for something that guarantees you can control the granularity of how rows are exposed to users with differing permissions, regardless of the application that is used to view the data (SQL*Plus as well as your web app), then this is a gem.
The built-in fulltext indexing is more widely documented, but still stands out because of its stability (just try running a full-reindexing of fulltext-indexed columns on similar data samples on MS-SQL and Oracle and you'll see the speed difference).

WITH Clause

Snapshot tables. Also found in Oracle Lite, and extremely useful for rolling your own replication mechanism.

@Peter
You can actually bind a variable of type "Cursor" in TOAD, then use it in your statement and it will display the results in the result grid.
exec open :cur for select * from dual;

Q: How to call a stored procedure with a cursor from TOAD?
A: Example; change it to your cursor type, package name and stored proc name
declare
  -- "cursor" is a reserved word in PL/SQL, so the variable needs another name
  v_cursor PCK_UTILS.typ_cursor;
begin
  PCK_UTILS.spc_get_encodedstring(
    'U',
    10000002,
    null,
    'none',
    v_cursor);
end;

The Model Clause (available for Oracle 10g and up)

WM_CONCAT for string aggregation

Scalar subquery caching is one of the most surprising features in Oracle
-- my_function is NOT deterministic but it is cached!
select t.x, t.y, (select my_function(t.x) from dual)
from t
-- logically equivalent to this, uncached
select t.x, t.y, my_function(t.x) from t
The "caching" subquery above evaluates my_function(t.x) only once per unique value of t.x. If you have large partitions of the same t.x value, this will immensely speed up your queries, even if my_function is not declared DETERMINISTIC. Even if it were DETERMINISTIC, you can save yourself a possibly expensive SQL -> PL/SQL context switch.
Of course, if my_function is not a deterministic function, then this can lead to wrong results, so be careful!
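The effect can be illustrated with a toy model in Python. This is not how Oracle implements the cache (the real one is a bounded in-memory structure, while an unbounded dict is assumed here), and the row values and the body of my_function are made up; the sketch only shows why repeated t.x values cut the number of function calls.

```python
# Toy model of scalar subquery caching: the function runs once per distinct
# input instead of once per row. The unbounded dict is a simplification.

call_count = 0


def my_function(x):
    """Stand-in for an expensive PL/SQL function; counts how often it runs."""
    global call_count
    call_count += 1
    return x * 10


rows = [1, 1, 1, 2, 2, 3, 1, 2]  # values of t.x, with many repeats

# Uncached: select t.x, my_function(t.x) from t
call_count = 0
uncached = [(x, my_function(x)) for x in rows]
uncached_calls = call_count

# Cached: select t.x, (select my_function(t.x) from dual) from t
call_count = 0
cache = {}
cached = []
for x in rows:
    if x not in cache:
        cache[x] = my_function(x)
    cached.append((x, cache[x]))
cached_calls = call_count

print(uncached_calls, cached_calls, uncached == cached)
```

The two result sets only agree because this my_function returns the same output for the same input, which is exactly the caveat about non-deterministic functions.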

Related

What does it mean when last_analyzed and stale_stats is null

We are currently running Oracle 11g and I am looking into if we need to run statistics after a large import. We have statistics_level set to 'TYPICAL'. Based on this I'm thinking that we do NOT need to update statistics:
Starting with Oracle Database 11g, the MONITORING and NOMONITORING
keywords have been deprecated and statistics are collected
automatically.
https://docs.oracle.com/cd/B28359_01/server.111/b28310/tables005.htm
However, after creating my database and running my modest import (100's of thousands to millions of records in a handful of tables and the creation of a number of indexes) all of the tables affected by the import show null for last_analyzed and stale_stats using the query below.
select
table_name,
stale_stats,
last_analyzed
from
dba_tab_statistics
where
owner = 'MY_SCHEMA'
order by
last_analyzed desc, table_name asc
;
Should I expect certain queries to have poor performance in this state?
Should I expect the statistics to eventually run and last_analyzed and stale_stats to eventually be populated (the documentation suggests that these values are updated about every three hours by default)?
It has been my experience that for moderately sized databases (tables with millions of records and less than 10's of millions of records) that mucking around with stats is not necessary and generally causes more problems than it solves. Is this generally the case?
* * * NOTES ON OUR RESOLUTION * * *
We were using this:
analyze table my_table compute statistics
We switched to this:
dbms_stats.gather_table_stats('MY_SCHEMA', 'MY_TABLE');
The analyze table statement took about 1:30 minutes in one environment and about 15:00 - 20:00 minutes in the second environment.
The gather_table_stats statement took about 0:30 to 1:00 minutes in both of the two instances we were able to examine.
Our plan moving forward is to switch our analyze table statements to gather_table_stats calls.
STATISTICS_LEVEL and gathering table/index statistics are entirely different things. STATISTICS_LEVEL affects whether row source statistics are gathered during statement execution, which lets you compare the optimizer estimates with the actual values for each step in the DBMS_XPLAN.DISPLAY_CURSOR output.
So table/index statistics are used for execution plan optimization, while STATISTICS_LEVEL controls the gathering of execution statistics while the plan is being executed, mostly for diagnostic purposes.
When last_analyzed is null it means that table statistics haven't been gathered yet.
stale_stats says whether the stats are considered fresh or stale, and therefore whether they will be gathered again by the next automatic run. The default threshold is 10 percent. If you gather table statistics and then insert/update/delete less than 10 percent of the rows, the statistics are considered fresh. When you reach 10 percent of modified rows they become stale.
Oracle by default gathers table/index statistics automatically during maintenance window which is automatically configured when a database is created. It's usually reconfigured by DBAs if there are specific requirements.
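The 10 percent rule above can be written down as a small Python function. This is a toy restatement of the rule as described in this answer, not Oracle's code: the function name and parameters are hypothetical, and the handling of an empty table and of the exact boundary are simplifying assumptions.

```python
def is_stale(rows_at_last_gather: int, rows_modified: int, stale_percent: float = 10.0) -> bool:
    """Toy version of the staleness rule described above.

    rows_modified is the total number of inserted, updated and deleted rows
    since statistics were last gathered. The statistics count as stale once
    the modifications reach stale_percent of the row count at gather time.
    """
    if rows_at_last_gather == 0:
        # Simplifying assumption: any change to an empty table counts as stale.
        return rows_modified > 0
    return rows_modified * 100 >= stale_percent * rows_at_last_gather


print(is_stale(1_000_000, 50_000))    # 5 percent modified
print(is_stale(1_000_000, 100_000))   # 10 percent modified
print(is_stale(1, 1_000_000))         # a one-row table that grew to a million rows
```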
Regarding the STATISTICS_LEVEL, with default value TYPICAL it looks like this:
HUSQVIK@hq_pdb_tcp> select * from dual;
D
-
X
HUSQVIK@hq_pdb_tcp> SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------
SQL_ID a5ks9fhw2v9s1, child number 0
-------------------------------------
select * from dual
Plan hash value: 272002086
-------------------------------------------
| Id | Operation | Name | E-Rows |
-------------------------------------------
| 0 | SELECT STATEMENT | | |
| 1 | TABLE ACCESS FULL| DUAL | 1 |
-------------------------------------------
Note
-----
- Warning: basic plan statistics not available. These are only collected when:
* hint 'gather_plan_statistics' is used for the statement or
* parameter 'statistics_level' is set to 'ALL', at session or system level
We don't see anything more than estimated number of rows. If you set ALTER SESSION SET statistics_level = ALL then
HUSQVIK@hq_pdb_tcp> ALTER SESSION SET statistics_level = ALL;
HUSQVIK@hq_pdb_tcp> select * from dual;
D
-
X
HUSQVIK@hq_pdb_tcp> SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------
SQL_ID a5ks9fhw2v9s1, child number 1
-------------------------------------
select * from dual
Plan hash value: 272002086
------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 3 |
| 1 | TABLE ACCESS FULL| DUAL | 1 | 1 | 1 |00:00:00.01 | 3 |
------------------------------------------------------------------------------------
Now we see also the actual number of rows and time taken to execute each step as well as number of consistent reads (buffers column).
With more complex queries you will get much more information than this. You should check the documentation at https://docs.oracle.com/database/121/ARPLS/d_xplan.htm
Also be aware that the statistics sampling is not done with every row but by default every 128 rows (can be changed using undocumented _rowsource_statistics_sampfreq parameter)
(Husqvik thoroughly explained the meaning of the columns and parameters, this answer only addresses how to gather statistics.)
Statistics should be manually gathered after any significant* change to a table. Oracle has had a great default automatic statistics gathering process since 11g. But even with that new system there are still at least two good reasons to manually gather statistics. The default statistics gathering auto-task is normally meant for slowly-changing OLTP tables, not fast-changing data warehouse tables.
Significant data changes can easily lead to significant performance problems. If the tables are going to be used right after they are loaded then they need good statistics immediately.
A common problem in ETL processes is when tables go from 1 row to a million rows. The optimizer thinks there is still only one row in large tables and uses lots of nested loops joins instead of hash joins. Those algorithms work well in different contexts; without good statistics Oracle does not know the correct context.
It's important to note that a NULL LAST_ANALYZED is not the worst case scenario. When there are no statistics at all, Oracle will use dynamic sampling to generate quick statistics estimates. The worst case is when the statistics job ran last night while the table was empty; Oracle thinks it has good statistics when it really doesn't.
The statistics auto-task may not be able to keep up with large changes. The statistics auto-task is a low-priority, single-threaded process. If there are too many large tables left to the automatic process it may not be able to process them during the maintenance window.
The bad news is that developers can't ignore optimizer statistics. The DBAs can't just handle it later. It might help to read some of the chapters from the manuals, such as Managing Optimizer Statistics.
The good news is that Oracle 11g finally has nice default settings. You usually don't need to muck around with the parameters. In most cases there's a simple rule to follow: if the table changed significantly, run this:
dbms_stats.gather_table_stats('SCHEMA_NAME', 'TABLE_NAME');
*: "Significant" is a subjective word. A change is normally significant in terms of relative size, not absolute. Adding one million rows to a table is significant if the table currently has one row, but not if the table has a billion rows.

Understanding Oracle Index Order

I'm a bit confused by this; I hope someone can help. I'm reading through Markus Winand's excellent Use The Index, Luke book and there's this thing about concatenated indexes.
There is an (EMPLOYEE_ID, SUBSIDIARY_ID) index created so when he queries
SELECT first_name, last_name
FROM employees
WHERE subsidiary_id = 20
This execution plan comes up :
----------------------------------------------------
| Id | Operation | Name | Rows | Cost |
----------------------------------------------------
| 0 | SELECT STATEMENT | | 106 | 478 |
|* 1 | TABLE ACCESS FULL| EMPLOYEES | 106 | 478 |
----------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("SUBSIDIARY_ID"=20)
But here's the thing : on my own employees table (empno, ename, init, job, mgr, bdate, msal, comm, deptno) I created a concatenated index on (ENAME, JOB)
The query select ename from employees where job = 'TRAINER'; gives me the following execution plan :
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 4271702361
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 45 | 1 (0)| 00:00:01 |
|* 1 | INDEX SKIP SCAN | ENAME_INDEX | 3 | 45 | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("JOB"='TRAINER')
filter("JOB"='TRAINER')
So now I'm a bit confused.
1) How come, despite the order, my index was still used?
2) Does an Index Skip Scan work for any concatenated index where I don't use the first column in a where clause?
3) Does an Index Skip Scan have a major impact on performance?
4) How come there's both an access and a filter predicate?
And while I'm here, I have another question
5) Do I need to take any precautions in indexing dates?
Oracle does have the ability, via an index skip scan, to use a composite index when you don't specify the leading column of the index in your predicate. This is generally much less efficient than a regular index scan, however. Conceptually, you can think of it doing an index scan for every distinct value of the leading column of the index. Normally, Oracle would only consider this sort of plan if the leading column had a few distinct values and the trailing column(s) were particularly selective. I wouldn't expect either to be true here-- presumably ename is nearly unique and job is rather less selective. I would expect that a full scan of the table would be more efficient so I would guess that something about your statistics is "wonky". If your table is particularly small, that could certainly cause query plans to be unusual simply because every plan appears to be exceptionally cheap.
In the real world, there are exceptionally few cases that someone sees an "index skip scan" in a query plan and thinks "Great! That's the plan I wanted." It generally means that something has gone wrong but that it may not have gone as far wrong as it might have.
Good question.
Obviously, if your query had contained both ENAME and JOB, then Oracle would have used the index, either with an INDEX RANGE SCAN or an INDEX UNIQUE SCAN. However, the leading edge of the index, ENAME, was not provided in the query predicates. So, Oracle's Cost Based Optimizer (CBO) has a choice. It may choose to do a FULL TABLE SCAN (ignoring the index), or an INDEX SKIP SCAN.
I assume you know what a FULL TABLE SCAN is, so I won't go into that.
So, what's an INDEX SKIP SCAN? Well, the CBO has the option, depending on the shape and size of the index, to do a skip scan. This usually happens when there is a relatively small number of distinct values for the leading column in the index. What happens is, Oracle takes the index and effectively breaks it down into several indexes. Suppose the leading column has 4 distinct values, (1-4). So, Oracle looks at the subset of the index where leading column equals 1, and does a range scan of that subset of the index, then does the same for the subsets of the index where leading column equals 2, then 3, then 4. In some cases, depending on how many distinct values the leading column has, and how selective the range scan on the second column is, this access path may be less costly than a FULL TABLE SCAN.
This is another reason why, if all other things are equal, you may want to put less selective columns in the leading edge of the index. (The other main reason being compression.)
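The "break the index into several sub-indexes" idea can be sketched in Python. This is a conceptual model only, not Oracle's implementation: the composite index is represented as a sorted list, the sample data (a two-valued leading column and a job column) and the names are made up, and each "range scan" is a pair of binary searches.

```python
from bisect import bisect_left, bisect_right

# A composite index on (leading, trailing), modelled as a sorted list of
# (leading, trailing, rowid) entries. The data is made up for illustration.
index = sorted([
    ("F", "CLERK", 1), ("F", "TRAINER", 2), ("F", "MANAGER", 3),
    ("M", "TRAINER", 4), ("M", "CLERK", 5), ("M", "TRAINER", 6),
])
leading_keys = [entry[0] for entry in index]
full_keys = [(entry[0], entry[1]) for entry in index]


def skip_scan(trailing_value):
    """Find rowids by trailing column only: one range scan per distinct leading value."""
    rowids = []
    range_scans = 0
    pos = 0
    while pos < len(index):
        leading = leading_keys[pos]
        # Range scan inside the slice of the index for this leading value.
        lo = bisect_left(full_keys, (leading, trailing_value))
        hi = bisect_right(full_keys, (leading, trailing_value))
        rowids.extend(entry[2] for entry in index[lo:hi])
        range_scans += 1
        # Skip ahead to the first entry with the next leading value.
        pos = bisect_right(leading_keys, leading)
    return rowids, range_scans


matches, scans = skip_scan("TRAINER")
print(matches, scans)
```

The number of range scans equals the number of distinct leading values, which is why the access path only pays off when that number is small.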
Answers for most of your questions: https://oracle-base.com/articles/9i/index-skip-scanning
1) This is exactly what INDEX SKIP SCAN is for.
2) Yes, it can be used, but it depends on your statistics
3) It might and it might not -> depends on your statistics
4) The access predicate is about selecting which blocks to read, and the filter predicate is about how rows from those blocks are then filtered out.
5) Index on DATE works very similarly to the index on other data types. DATE is 7 bytes long.

understanding explain plan in oracle

I was trying to understand the explain plan in oracle and wanted to know what conditions oracle considers while forming the explain plan
I was testing a simple query in HR schema present in oracle 11g
select * from countries
where region_id in (select region_id from regions where region_name = 'Europe');
When I ran the following queries:
explain plan for
select * from countries
where region_id in (select region_id from regions where region_name = 'Europe');
SELECT * FROM table(dbms_xplan.display(null,null,'basic'));
I got the following output in the explain table:
--------------------------------------------------------
| Id | Operation | Name |
--------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | NESTED LOOPS | |
| 2 | INDEX FULL SCAN | COUNTRY_C_ID_PK |
| 3 | TABLE ACCESS BY INDEX ROWID| REGIONS |
| 4 | INDEX UNIQUE SCAN | REG_ID_PK |
--------------------------------------------------------
Here I observed that the outer query was executed first, i.e countries table was executed first as indicated by Row 3.
Now I added an index on the region_name of the regions table and ran the explain plan again
and got the following output
--------------------------------------------------------------
| Id | Operation | Name |
--------------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | NESTED LOOPS | |
| 2 | TABLE ACCESS BY INDEX ROWID| REGIONS |
| 3 | INDEX RANGE SCAN | REGIONNAME_REGIONS_IX |
| 4 | INDEX UNIQUE SCAN | COUNTRY_C_ID_PK |
| 5 | INDEX RANGE SCAN | COUNTRIES_REGIONID_IX |
--------------------------------------------------------------
Now my question is:
Shouldn't the inner query be executed first irrespective of whether index is present or not
If the adding an index alters the execution plan, what other features can alter it?
In general case what is the execution process like is it sequential (first executes the join which occurs first and then goes to next join in the query) ?
Thanks in advance for your help.
-Varun
The explain plan relies heavily on the Cost Based Optimizer (CBO). You can help this process out by gathering statistics on the table(s) you are querying against. Now in terms of why would the index change the plan, that is because you have supplied critical information to the CBO that it did not have before. It is the equivalent of me asking you this question:
No index:
"Where is the street?"
With index:
"Where is the street that has a blue house on it?"
The second question gives greater context and is thus faster for you to deduce and you don't have to enumerate all such things that are streets.
You can supply hints to a query i.e.:
select /*+ parallel */ * from table
to give a hint to run this query in parallel.
For the third question, that I imagine is a bit of the Oracle process and is not documented for the world to consume.
In the first question, no not necessarily, it is all cost based.
I don't know if they changed anything in the execution plan outputs in 11g, but are you sure you are showing us the right query? You are selecting all columns (select *) from table countries, but the explain plan does not show any table access? Or does COUNTRY_C_ID_PK include all columns?
I would expect the following plan (without the index):
SELECT
  NESTED LOOP
    FULL TABLE SCAN (regions)
    TABLE ACCESS BY INDEX ROWID (countries)
      INDEX RANGE SCAN (COUNTRIES_REGIONID_IX)
With the index in place, I would expect something like this:
SELECT
  NESTED LOOP
    TABLE ACCESS BY INDEX ROWID (regions)
      INDEX RANGE SCAN (REGIONNAME_REGIONS_IX)
    TABLE ACCESS BY INDEX ROWID (countries)
      INDEX RANGE SCAN (COUNTRIES_REGIONID_IX)
For your questions:
Oracle may drive the query from the inner or outer query as it sees fit depending on the available statistics
There are so many things that influence the execution plan...
Oracle can only join two tables (or row sources) at a time. The result of a join is also a row source that can be joined to the next table
The cost-based optimiser goes through a few stages, including query transformation. Your query has almost certainly been rewritten by the optimiser to:
select countries.*
from countries join regions on (countries.region_id = regions.region_id)
where regions.region_name = 'Europe';
So the concept of inner and outer queries as represented in the original query may not apply post-transformation. Incidentally, this is why arguments concerning EXISTS () vs IN () are often moot -- the query in both cases can often be rewritten as a join.
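That the IN form and the join form return the same rows can be checked on a small data set. The Python sketch below uses SQLite rather than Oracle, with a made-up three-country sample and only some of the HR columns, so it says nothing about Oracle's plans; it only demonstrates the logical equivalence, which relies on region_id being unique in regions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE regions (region_id INTEGER PRIMARY KEY, region_name TEXT);
    CREATE TABLE countries (country_id TEXT PRIMARY KEY, country_name TEXT,
                            region_id INTEGER REFERENCES regions(region_id));
    INSERT INTO regions VALUES (1, 'Europe'), (2, 'Americas');
    INSERT INTO countries VALUES ('DE', 'Germany', 1), ('FR', 'France', 1),
                                 ('US', 'United States of America', 2);
""")

# The query as written in the question.
in_form = conn.execute("""
    SELECT * FROM countries
    WHERE region_id IN (SELECT region_id FROM regions WHERE region_name = 'Europe')
    ORDER BY country_id
""").fetchall()

# The join the optimiser is likely to transform it into.
join_form = conn.execute("""
    SELECT countries.*
    FROM countries JOIN regions ON countries.region_id = regions.region_id
    WHERE regions.region_name = 'Europe'
    ORDER BY country_id
""").fetchall()

print(in_form == join_form, in_form)
```

If region_id were not unique in regions, the plain join could return duplicate country rows, and the optimiser would have to use a semi-join instead.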
Among the information that the optimiser uses (version dependent) are:
Statistics on the table
Statistics on the table columns
Histograms of table column values
Presence of indexes
Size and type of, and statistics on, indexes -- in particular the clustering factor
Presence of constraints -- including not null and check constraints.
Estimated cost of single and multiblock reads and cpu ops per second.
Partitioning
Presence and state of materialised views and/or query rewrite declarations.
Performance of previous versions of the query.
So in short, don't be surprised by anything the optimiser does. It's a very sophisticated piece of kit.

Materialized View fast refresh taking a long time

I have a large table that is replicated from Oracle 10.2.0.4 to an Oracle 9i database using MView replication over the network. The master table is about 50GB, 160M rows, and there are about 2 - 3M new or updated rows per day.
The master table has a materialized view log created using rowid.
The full refresh of the view works and takes about 5 hours, which we can live with.
However the fast refresh is struggling to keep up. Oracle seems to require two queries against the mlog and master table to do the refresh, the first looks like this:
SELECT /*+ */
DISTINCT "A1"."M_ROW$$"
FROM "GENEVA_ADMIN"."MLOG$_BILLSUMMARY" "A1"
WHERE "A1"."M_ROW$$" <> ALL (SELECT "A2".ROWID
FROM "GENEVA_ADMIN"."BILLSUMMARY" "A2"
WHERE "A2".ROWID = "A1"."M_ROW$$")
AND "A1"."SNAPTIME$$" > :1
AND "A1"."DMLTYPE$$" <> 'I'
The current plan is:
---------------------------------------------------------------
| Id | Operation | Name |
---------------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | HASH UNIQUE | |
| 2 | FILTER | |
| 3 | TABLE ACCESS BY INDEX ROWID| MLOG$_BILLSUMMARY |
| 4 | INDEX RANGE SCAN | MLOG$_BILLSUMMARY_AK1 |
| 5 | TABLE ACCESS BY USER ROWID | BILLSUMMARY |
When there are 3M rows changed, this query literally runs forever; it's basically useless. However, if I rewrite it slightly and tell it to full scan the master table and mlog table, it completes in 20 minutes.
The problem is that the above query comes out of the internals of Oracle and I cannot change it. The problem is really the FILTER operation on line 2. If I could get it to full scan both tables and hash join / anti-join, I am confident I could get it to complete quickly enough, but no recipe of hints I offer will get this query to stop using the FILTER operation (maybe it's not even valid). I can use hints to get it to full scan both tables, but the FILTER operation remains, and I understand it executes line 5 for each row returned by line 3, which will be 2 - 3M rows.
Has anyone got any ideas on how to trick this query into the plan I want without changing the actual query, or better, any ways of getting replication to take a more sensible plan for my tablesizes?
Thanks,
Stephen.
As you wrote the queries are part of an internal Oracle mechanism so your tuning options are limited. The fast-refresh algorithm seems to behave differently in the more recent versions, check Alberto Dell’Era’s analysis.
You could also look into SQL profiles (10g feature). With the package DBMS_SQLTUNE this should allow you to tune individual SQL statements.
How do the estimated cardinalities look for the refresh query in comparison to the actual cardinalities? Maybe the MLOG$ table statistics are incorrect.
It might be better to have no statistics on the table and lock them in order to invoke dynamic sampling, which ought to give a reasonable estimation based on the multiple predicates in the query.

Update statement optimization (functions in the where clause)

I feel dumb right now.
I have to update 100,000 rows in a database that I don't have direct access to. The table has roughly 500,000 rows in total.
The update just adds one character to a field if its length is < 3. So basically:
UPDATE X SET VALUE = '0'||VALUE WHERE LENGTH(VALUE) < 3
So I sent this update to the DBAs and they returned it to me saying that the statement has too high a performance cost (because of the full table access and the 100k-row commit)
and that I should write a process instead. And then they provided me with a code example, in case I don't know how to write one.
I say WTF, how would a process ever run faster than a single update statement? After doing some tests, my update takes 30 seconds to run;
the process, following their code example, takes 10 minutes.
So the real question, after all this frustration, is: is there any way to avoid the full table access when using such a function in the where clause? (The column is indexed.)
Your statement is already optimized. It is set-based and queries the table in the most efficient way possible (Full Table Scan). You won't be able to write a program that does the same work with less resources / time. You CAN write a program that performs poorly, that is non-restartable in case of error (ie: commit every 100 rows) and will monopolize more resources.
Follow Tom Kyte's mantra:
You should do it in a single SQL statement if at all possible.
If you cannot do it in a single SQL Statement, then do it in PL/SQL.
If you cannot do it in PL/SQL, try a Java Stored Procedure.
If you cannot do it in Java, do it in a C external procedure.
If you cannot do it in a C external routine, you might want to seriously
think about why it is you need to do it
Accessing 100k rows out of 500k (i.e. 20%) by index will almost certainly require more logical I/Os than a full scan.
I believe your statement is OK.
On a side note it might be more robust to:
UPDATE X
SET VALUE = LPAD(VALUE,3,'0')
WHERE LENGTH(VALUE) < 3
... just in case.
The only reason not to do it in one statement is when you have to update so many rows that your rollback segments become too small.
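In such a case (only!), sacrificing some speed, you can do it like this in PL/SQL:
DECLARE
  mylimit CONSTANT PLS_INTEGER := 10000; /* Arbitrary limit */
  updated PLS_INTEGER;
BEGIN
  LOOP
    UPDATE X SET VALUE = '0'||VALUE WHERE LENGTH(VALUE) < 3 AND ROWNUM <= mylimit;
    updated := SQL%ROWCOUNT; /* capture the count before committing */
    COMMIT; /* commit each chunk so the undo can be reused */
    EXIT WHEN updated < mylimit;
  END LOOP;
END;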
But this also doesn't work perfectly, because rows where length(VALUE)=1 will be updated twice, until they do no longer fulfill the WHERE condition. Sadly, this cannot easily be avoided...
It looks like the only option you have is to perform your UPDATE in chunks. If you for example put a LIMIT 1000 in your statement, the performance should not notably decrease (I assume this query has to be executed on a live database).
You say that you don't have direct access yourself; if these people are able to run Bash scripts you could just loop the statement with the LIMIT as many times as necessary, putting a sleep # in the loop. Maybe this would be a viable workaround.
As others pointed out - yes, the single UPDATE is already the fastest method. But it seems that his problem is that even this takes too long, so I proposed to do this chunk by chunk.
The chunked run will run even longer until it is done, but it shouldn't occupy the database and make it inaccessible for other users (if you choose a good interval, that is). The problem is the writing to the database, not the searching (i.e. using WHERE LENGTH(name) < 3). So while this method will increase the total strain on the database, it will spread it over time and therefore not block the database. You could e.g. run this on 100-chunks and pause two seconds after each. Have this run over night and nobody will notice.
If updating the entire table in one transaction is not an option (e.g. due to rollback space issues), another method is to break up the table into chunks (e.g. on ranges of PK values) and update them one chunk at a time.
Try to pick a "chunking" method that will tend to have all the rows within the same blocks - e.g. if the rows have generally been inserted roughly in order of ID, that'd be a good candidate - that way the update will tend to update all the rows in a block in one go.
If your predicate covers the majority of rows in the table, I'd expect full-table-scans for each update, which should be ok. (You can even monitor their progress by querying v$session_longops.)
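Chunking on primary key ranges can be sketched as follows. The Python example runs against SQLite instead of Oracle, so it shows only the control flow: the table contents, the chunk size of 250 and the function name are assumptions for illustration. Because every row falls into exactly one ID range, no row is updated twice, unlike a loop that re-selects rows with ROWNUM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO x (id, value) VALUES (?, ?)",
    [(i, str(i)) for i in range(1, 1001)],  # values '1' .. '1000'
)
conn.commit()

CHUNK_SIZE = 250  # arbitrary; a real size depends on undo space and data layout


def update_in_chunks(conn, chunk_size):
    """Pad short values with one '0', one primary-key range per transaction."""
    min_id, max_id = conn.execute("SELECT MIN(id), MAX(id) FROM x").fetchone()
    chunks = 0
    total = 0
    low = min_id
    while low <= max_id:
        high = low + chunk_size - 1
        cur = conn.execute(
            "UPDATE x SET value = '0' || value "
            "WHERE id BETWEEN ? AND ? AND LENGTH(value) < 3",
            (low, high),
        )
        total += cur.rowcount
        conn.commit()  # each chunk is its own transaction
        chunks += 1
        low = high + 1
    return chunks, total


chunks_run, rows_updated = update_in_chunks(conn, CHUNK_SIZE)
print(chunks_run, rows_updated)
```

A job like this can also be stopped and resumed from the last committed range, since the ranges do not depend on which rows still match the predicate.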
A function-based index could help speed up the updates.
create index x_idx1 on x(length(value));
Here is an example.
sqlplus>create table t
  ( id NUMBER(9) PRIMARY KEY,
    name VARCHAR2(100)
  );
Table created.
sqlplus>insert into t select object_id, object_name from user_objects;
2188 rows created.
sqlplus>exec dbms_stats.gather_table_stats(ownname=>'test',tabname =>'t');
PL/SQL procedure successfully completed.
sqlplus>create index t_idx3 on t(length(name));
Index created.
sqlplus>explain plan for update t set name = name || '1' where length(name) < 25;
Explained.
sqlplus>select * from table(dbms_xplan.display);
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 109 | 2616 | 4 (0)| 00:00:01 |
| 1 | UPDATE | T | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| T | 109 | 2616 | 4 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | T_IDX3 | 20 | | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access(LENGTH("NAME")<25)
15 rows selected.

Resources