I would like to hear different opinions about using the ROWID type as an input parameter of a function or procedure.
I have normally used, and seen others use, primary keys as input parameters, but are there any disadvantages to using a ROWID as an input parameter? It seems simple, and SELECTs are quick when the ROWID is used in the WHERE clause.
For example:
FUNCTION get_row(p_rowid IN ROWID) RETURN TABLE%ROWTYPE IS...
From the concept guide:
Physical rowids provide the fastest possible access to a row of a given table. They contain the physical address of a row (down to the specific block) and allow you to retrieve the row in a single block access. Oracle guarantees that as long as the row exists, its rowid does not change.
The main drawback of a ROWID is that while it is normally stable, it can change under some circumstances:
The table is rebuilt (ALTER TABLE MOVE...)
Export/import, obviously
A partitioned table with row movement enabled (when a row's partition key is updated)
A primary key identifies a row logically: you will always find the correct row, even after a delete and re-insert. A ROWID identifies the row physically and is not as persistent as a primary key.
You can safely use a ROWID within a single SQL statement, since Oracle guarantees that the result is consistent, for example to remove duplicates from a table. To be on the safe side, I would suggest you only use a ROWID across statements when you hold a lock on the row (SELECT ... FOR UPDATE).
From a performance point of view, primary key access is a bit more expensive, but you will normally notice this only if you do a lot of single-row accesses. If performance is critical, though, you can usually gain more by switching from single-row processing to set processing than by using ROWIDs for single-row access. In particular, if there are a lot of round trips between the database and the application, the cost of the row access will probably be negligible compared to the cost of the round trips.
Related
In my design I want to fetch the rowid, as in
select rowid r from table_name;
into a C variable, and I was wondering what the maximum size/length in characters of a rowid is.
In one of the biggest tables in my database, the maximum rowid length is 18, and every rowid in that table is 18 characters long.
Thanks in advance.
Edit:
The block of code below is executed in a loop for multiple tables, so to keep the code generic, without having to specify each table's PK in the query, we use ROWID.
select rowid from table_name ... where ....;
delete from table_name where rowid = selectedrowid;
Since the rowid is fetched and used immediately, without being stored for later, I think it is safe to use in this particular scenario.
Please refer to the answer below:
Is it safe to use ROWID to locate a Row/Record in Oracle?
I'd say no. It could be safe if, for instance, the application stores ROWIDs only temporarily (say, when generating a list of selectable items, each identified by its ROWID, where the list is routinely regenerated and never stored). But if ROWIDs are used in any persistent way, it's not safe.
A physical ROWID has a fixed size in a given Oracle version, it does not depend on the number of rows in a table. It consists of the number of the datafile, the number of the block within this file, and the number of the row within this block. Therefore it is unique in the whole database and allows direct access to the block and row without any further lookup.
As things in the IT world continue to grow, it is safe to assume that the format will change in the future.
Besides volume, there are also structural changes, like the advent of transportable tablespaces, which made it necessary to store the object number (= internal number of the table/partition/subpartition) inside the ROWID.
Or the advent of index-organized tables (mentioned by #ibre5041), which look like a table but are in reality just an index without such a physical address (because rows move around constantly in an index). This made it necessary to introduce UROWIDs, which can store both physical and index-based (logical) ROWIDs.
Please be aware that a ROWID can change, for instance if the row moves from one table partition to another one, or if the table is defragmented to fill the holes left by many DELETEs.
According to the documentation, a ROWID has a length of 10 bytes:
Rowids of Row Pieces
A rowid is effectively a 10-byte physical address of a row. Every row in a heap-organized table has a rowid unique to this table that corresponds to the physical address of a row piece. For table clusters, rows in different tables that are in the same data block can have the same rowid.
Oracle also documents the (current) format; see Rowid Format.
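As a sketch of that format, the following Python decodes a ROWID string into its parts. It assumes the extended physical ROWID layout OOOOOOFFFBBBBBBRRR: 6 characters of data object number, 3 of relative file number, 6 of block number and 3 of row number, each written in base 64 with the alphabet A-Z, a-z, 0-9, +, /. The sample value is made up. Inside the database, the DBMS_ROWID package (for example DBMS_ROWID.ROWID_BLOCK_NUMBER) does the same job. The fixed field widths are why every physical ROWID string is 18 characters, so a C buffer of 19 chars (18 plus the terminating NUL) should be enough for the character form; UROWIDs from index-organized tables can be longer.

```python
# Sketch: decode/encode the 18-character extended ROWID display format.
# Assumed layout: OOOOOO FFF BBBBBB RRR (object, file, block, row), base 64.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
FIELDS = (("data_object", 6), ("relative_file", 3), ("block", 6), ("row", 3))


def decode_digits(text: str) -> int:
    """Interpret text as a big-endian base-64 number."""
    value = 0
    for ch in text:
        value = value * 64 + ALPHABET.index(ch)
    return value


def encode_digits(value: int, width: int) -> str:
    """Write value as exactly `width` base-64 digits."""
    chars = []
    for _ in range(width):
        value, digit = divmod(value, 64)
        chars.append(ALPHABET[digit])
    if value:
        raise ValueError("value does not fit in the field width")
    return "".join(reversed(chars))


def decode_rowid(rowid: str) -> dict:
    if len(rowid) != sum(width for _, width in FIELDS):
        raise ValueError("an extended physical ROWID has 18 characters")
    parts, pos = {}, 0
    for name, width in FIELDS:
        parts[name] = decode_digits(rowid[pos:pos + width])
        pos += width
    return parts


def encode_rowid(data_object: int, relative_file: int, block: int, row: int) -> str:
    values = (data_object, relative_file, block, row)
    return "".join(encode_digits(v, w) for v, (_, w) in zip(values, FIELDS))


print(decode_rowid("AAAR3sAAEAAAACXAAA"))
# {'data_object': 73196, 'relative_file': 4, 'block': 151, 'row': 0}
```

Even the largest values that fit the fields encode to 18 characters, which is consistent with the length observed in the question.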
In general, you can use ROWIDs in your application, provided the affected rows are locked.
Thus your statement may look like this:
CURSOR ... IS
select rowid from table_name ... where .... FOR UPDATE;
delete from table_name where rowid = selectedrowid;
See SELECT FOR UPDATE and FOR UPDATE Cursors.
Oracle even provides a shortcut: instead of where rowid = selectedrowid, you can use WHERE CURRENT OF ...
According to the Oracle documentation:
You should not use ROWID as the primary key of a table. If you delete and reinsert a row with the Import and Export utilities, for example, then its rowid may change. If you delete a row, then Oracle may reassign its rowid to a new row inserted later.
I don't understand the actual reason. Does it mean we shouldn't use ROWID as a primary key only when we use the Import/Export utilities, or that we should never use ROWID as a primary key?
As explained above, when we delete a row and insert a new one, the same ROWID may get assigned, but the old row has already been deleted, so there shouldn't be any problem if the new row gets the same ROWID, should there? Can anyone explain this with an example?
If you rebuild your table, the ROWIDs of its rows may change, and you don't want your primary key to change.
Also, if you delete one record, a new record could be given its ROWID; any row elsewhere that stored the old ROWID as a reference would then silently point to the new, unrelated record. You should also understand that ROWIDs do not persist across a database export and import.
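To make the delete-then-insert problem from the question concrete, here is a deliberately simplified Python model. ToyBlock is a hypothetical class and not how Oracle really manages free space: a block hands out the first free slot, and the toy "rowid" is just (block number, slot number). An order that stored a customer's rowid as its key ends up pointing at a different customer after the original customer is deleted and a new one is inserted.

```python
class ToyBlock:
    """Hypothetical data block with a fixed number of row slots (not Oracle's real layout)."""

    def __init__(self, block_no: int, slots: int):
        self.block_no = block_no
        self.rows = [None] * slots

    def insert(self, row: dict) -> tuple:
        # Reuse the first free slot; the toy "rowid" is (block number, slot number)
        for slot, existing in enumerate(self.rows):
            if existing is None:
                self.rows[slot] = row
                return (self.block_no, slot)
        raise RuntimeError("block is full")

    def delete(self, rowid: tuple) -> None:
        self.rows[rowid[1]] = None

    def fetch(self, rowid: tuple):
        return self.rows[rowid[1]]


block = ToyBlock(block_no=151, slots=4)
alice = block.insert({"customer": "Alice"})
bob = block.insert({"customer": "Bob"})

# An order that (wrongly) uses the customer's rowid as its key
order = {"order_id": 1, "customer_rowid": alice}

block.delete(alice)                          # Alice is deleted...
carol = block.insert({"customer": "Carol"})  # ...and Carol gets the freed slot

print(carol == alice)                        # True
print(block.fetch(order["customer_rowid"]))  # {'customer': 'Carol'}
```

With a primary key reference (ideally backed by a foreign key constraint), the orphaned order would either be prevented or find no customer, rather than silently matching the wrong one.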
From here
If rows are moved, the ROWID will change. Rows can move due to maintenance operations like shrinks and table moves. As a result, storing ROWIDs for long periods of time is a bad idea. They should only be used in a single transaction, preferably as part of a SELECT ... FOR UPDATE, where the row is locked, preventing row movement.
We should never use ROWIDs as primary keys for permanent and business-important data.
ROWID is the technical address of a row. There are several scenarios in which
a) the rowid of an existing record changes
b) different records end up with the same rowid.
For example, in a partitioned table, updating a record's partitioning key moves the record and changes its rowid. Such scenarios rule out ROWID keys unless we can afford to lose the reference without serious consequences.
ROWID keys can be used for disposable temporary data, such as exceptions tables, or for short-term navigation, such as in the WHERE CURRENT OF clause.
And what is the CACHE option for?
CREATE SEQUENCE Race_SEQ
START WITH 1
INCREMENT BY 1
CACHE 10;
CREATE OR REPLACE TRIGGER RACE_INC
BEFORE INSERT ON RACE
FOR EACH ROW
BEGIN
:NEW.RACE_ID := RACE_SEQ.NEXTVAL;
END;
You can use the same sequence to generate the primary key for many tables. It generally doesn't make a lot of sense to do so, but you can. It doesn't cost you anything to have many different sequences and it generally makes for a more sensible application design when race_seq is used to populate the race table and foo_seq is used to populate the foo table rather than having all the tables use the same sequence.
You would need one trigger per table. A trigger can only be defined on a single table.
The CACHE attribute specifies how many sequence values should be cached in memory (per node, if you happen to be using RAC). That makes generating new values more efficient, at the expense of more gaps: cached values that were never used are lost when, for example, the instance restarts or the sequence is aged out of the shared pool. In general, you'd use a larger cache the more frequently rows are inserted.
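A toy Python model of that trade-off, assuming a simplified behavior: each time the in-memory cache runs out, the next range of values is reserved with one persistent write to the data dictionary, and values still in the cache when the instance restarts are never handed out. ToySequence and writes_for are hypothetical names; this is not Oracle's implementation, and cache=1 merely stands in for NOCACHE.

```python
class ToySequence:
    """Hypothetical model of a cached sequence; cache=1 stands in for NOCACHE."""

    def __init__(self, start: int = 1, increment: int = 1, cache: int = 10):
        self.increment = increment
        self.cache = cache
        self.high_water = start      # next value not yet handed to any cache
        self.cached = []             # values held in memory, not yet used
        self.dictionary_writes = 0   # one persistent write per cache refill

    def nextval(self) -> int:
        if not self.cached:
            self.cached = [self.high_water + i * self.increment for i in range(self.cache)]
            self.high_water += self.cache * self.increment
            self.dictionary_writes += 1
        return self.cached.pop(0)

    def restart_instance(self) -> None:
        # Unused cached values are simply forgotten: this is where gaps come from
        self.cached = []


def writes_for(cache: int, values: int = 100) -> int:
    seq = ToySequence(cache=cache)
    for _ in range(values):
        seq.nextval()
    return seq.dictionary_writes


seq = ToySequence(cache=10)
first = [seq.nextval() for _ in range(3)]  # [1, 2, 3]
seq.restart_instance()
after_restart = seq.nextval()              # 11: values 4..10 are lost
print(first, after_restart, writes_for(1), writes_for(20))  # [1, 2, 3] 11 100 5
```

In this model, a cache of 20 cuts the persistent writes for 100 values from 100 to 5, and the price is that up to 19 values can disappear per restart.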
I am converting GTTs to Oracle collection types, as explained in an excellent answer by APC. However, some GTTs are updated based on a SELECT query against another table. For example:
UPDATE my_gtt_1 c
SET (street, city, STATE, zip) = (SELECT src.unit_address,
src.unit_city,
src.unit_state,
src.unit_zip_code
FROM (SELECT mbr.ROWID row_id,
unit_address,
RTRIM(a.unit_city) unit_city,
RTRIM(a.unit_state) unit_state,
RTRIM(a.unit_zip_code) unit_zip_code
FROM table_1 b,
table_2 a,
my_gtt_1 mbr
WHERE type = 'ABC'
AND id = b.ssn_head
AND a.h_id = b.h_id
AND row_id >= v_start_row
AND row_id <= v_end_row) src
WHERE c.ROWID = src.row_id)
WHERE state IS NULL
OR state = ' ';
If my_gtt_1 were not a global temporary table but an Oracle collection type, would it be possible to do updates this complex? Or are we better off using the global temporary table in such cases?
You cannot perform set-based UPDATE operations on in-memory collections of object types. You will have to do it row by row, as in:
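FOR i IN 1 .. l_tab.COUNT LOOP  -- assumes a dense collection (e.g. filled by BULK COLLECT); runs zero times if it is empty
SELECT src.unit_address,
src.unit_city,
src.unit_state,
src.unit_zip_code
INTO l_tab(i).street,
l_tab(i).city,
l_tab(i).STATE,
l_tab(i).zip
FROM (your_query) src;  -- your_query must return exactly one row for element i
END LOOP;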
You should therefore try to do all computations at creation time (where you can BULK COLLECT). Obviously, if your process needs many steps you might find that a global temporary table outperforms an in-memory structure.
From the last questions you have asked, it seems you are trying to replace all global temporary tables with object tables. I would suggest caution because in general, they are not interchangeable:
Object tables are in-memory structures: you don't want to load a table with a million+ rows into memory. They are mainly used as a buffer: you load a few rows (100, for example) into the structure, do what you need with them, then load the next batch. You cannot easily treat this structure as a regular table: for example, you can only search it efficiently by its indexing key (in your example, you cannot search by rowid unless you define the structure to be indexed by rowid).
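A small Python analogy of the indexing point (the rows and ROWID strings are made up): a dense array can only be addressed by position, so finding a row by ROWID means a linear scan, while a structure keyed by the ROWID string finds it in one lookup. In PL/SQL, the keyed shape would roughly correspond to an associative array declared INDEX BY VARCHAR2(18) and keyed by ROWIDTOCHAR(rowid); that mapping is an illustration under that assumption.

```python
# Hypothetical staging rows, as if BULK COLLECTed together with their ROWIDs
rows = [
    {"row_id": "AAAR3sAAEAAAACXAAA", "street": None, "state": None},
    {"row_id": "AAAR3sAAEAAAACXAAB", "street": None, "state": " "},
    {"row_id": "AAAR3sAAEAAAACXAAC", "street": "2 Elm St", "state": "NY"},
]


def find_by_scan(rows: list, row_id: str):
    """Dense array: positions are the only index, so a ROWID search is a linear scan."""
    for row in rows:
        if row["row_id"] == row_id:
            return row
    return None


# Associative-array shape: keyed by the ROWID string, one hash lookup per search
by_rowid = {row["row_id"]: row for row in rows}

# Both structures hold the same record objects, so an update through the key is visible in the array
by_rowid["AAAR3sAAEAAAACXAAB"]["street"] = "1 Main St"
print(rows[1]["street"])  # 1 Main St
```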
Temporary tables, on the other hand, are very similar to ordinary tables. You can load millions of rows into them and perform joins and complex set operations. You can also index a temporary table for further optimization.
In my opinion, the change you are trying to make will require a massive overhaul of your logic, and it may not perform better. In general, you would not replace GTTs with object tables. You may, however, be able to remove GTTs with a significant gain in performance by using set operations directly (performing massive UPDATE/DELETE/INSERT on your data directly, without a staging table).
I would suggest performing benchmarks before choosing a solution (this is probably what you are doing right now :)
I think this part of APC's answer to your previous question is relevant here:
Global temporary tables are also good if we have a lot of intermediate processing which is just too complicated to be solved with a single SQL query. Especially if that processing must be applied to subsets of the retrieved rows.
You cannot update the in-memory data with an UPDATE statement like you can a GTT; you would need to write procedural code to locate and change the array elements in question.
We have two tables like so:
Event
id
type
... a bunch of other columns
ProcessedEvent
event_id
process
There are indexes defined for
Event(id) (PK)
ProcessedEvent (event_id, process)
The first represents events in an application.
The second represents the fact that a certain event was processed by a certain process. Many processes need to process a given event, so there are multiple entries in the second table for each entry in the first.
In order to find all the events that need processing we execute the following query:
select * -- of course we do name the columns in the production code
from Event
where type in ( 'typeA', 'typeB', 'typeC')
and id not in (
select event_id
from ProcessedEvent
where process = :1
)
Statistics are up to date.
Since most events are processed, I think the best execution plan should look something like this:
full index scan on the ProcessedEvent Index
full index scan on the Event Index
anti join between the two
table access with the rest
filter
Instead Oracle does the following
full index scan on the ProcessedEvent Index
full table scan on the Event table
filter the Event table
anti join between the two sets
With an index hint I get Oracle to do the following:
full index scan on the ProcessedEvent Index
full index scan on the Event Index
table access on the Event table
filter the Event table
anti join between the two sets
which is really stupid IMHO.
So my question is: what might be the reason for oracle to insist on the early table access?
Addition:
The performance is bad. We are fixing the performance problem by selecting just the Event.IDs and then fetching the needed rows 'manually'. But of course that is just a work around.
Your FULL INDEX SCAN will probably be faster than a FULL TABLE SCAN, since the index is likely "thinner" than the table. Still, the FULL INDEX SCAN reads a complete segment, and its cost will be in the same range as the FULL TABLE SCAN.
However, you're also adding a TABLE ACCESS BY ROWID step. It is an expensive step: one logical I/O per row for the ROWID access, whereas the FULL TABLE SCAN reads several blocks per I/O call (how many depends on your db_file_multiblock_read_count parameter).
In conclusion, the optimizer computes that:
cost(FULL TABLE SCAN) < cost(FULL INDEX SCAN) + cost(TABLE ACCESS BY ROWID)
Update: The FULL TABLE SCAN also enables the filter on type sooner than in the FULL INDEX SCAN path (since the INDEX doesn't know what type an event is), therefore reducing the size of the set that will be anti-joined (yet another advantage of the FULL TABLE SCAN).
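The cost inequality can be put into rough numbers. This is a crude model, not the optimizer's actual costing formula, and every figure in it is hypothetical: a full scan costs one multiblock read per db_file_multiblock_read_count (MBRC) blocks, and the index path costs multiblock reads over the smaller index plus one block visit per row fetched by ROWID.

```python
import math


def full_table_scan_cost(table_blocks: int, mbrc: int) -> int:
    # One multiblock read per `mbrc` blocks of the table
    return math.ceil(table_blocks / mbrc)


def index_then_rowid_cost(index_blocks: int, mbrc: int, rows_fetched: int) -> int:
    # Multiblock reads over the (thinner) index, then one block visit per ROWID lookup
    return math.ceil(index_blocks / mbrc) + rows_fetched


# Hypothetical Event table: 10,000 blocks, index of 2,000 blocks, MBRC = 16
fts = full_table_scan_cost(10_000, 16)
many_rows = index_then_rowid_cost(2_000, 16, 50_000)
few_rows = index_then_rowid_cost(2_000, 16, 100)
print(fts, many_rows, few_rows)  # 625 50125 225
```

In this model the ROWID step dominates as soon as many rows must be fetched, which matches the optimizer's preference for the full scan when the table access happens before the anti-join; the index path only wins when very few rows reach the ROWID step.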
The optimizer does many things which do not make sense at first, but it has its reasons. They may not always be right, but they're understandable.
The Event table may be cheaper to full-scan than to access by rowid because of its size. It could be that far fewer I/O operations are needed to read the entire table sequentially than to read bits and pieces of it.
Is the performance bad, or are you just asking why the optimizer did that?
I can't explain the behavior of the optimizer, but my experience has been to avoid "NOT IN" at all costs, replacing it instead with MINUS, like so:
select * from Event
where id in (
select id from Event where type in ( 'typeA', 'typeB', 'typeC')
minus
select event_id from ProcessedEvent where process = :1
)
I've seen orders-of-magnitude improvements in query performance with similar transformations.
Something like:
WITH
PROCEEDED AS
(
SELECT
event_id
FROM
ProcessedEvent
WHERE
PROCESS = :1
)
SELECT
* -- of course we do name the columns in the production code
FROM
EVENT
LEFT JOIN PROCEEDED P
ON
p.event_id = EVENT.id
WHERE
type IN ( 'typeA', 'typeB', 'typeC')
AND p.event_id IS NULL; -- exclude already processed events
could work fast enough (at least much faster than NOT IN).
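The three formulations above (NOT IN, MINUS, LEFT JOIN ... IS NULL) can be checked for equivalence on toy data. This sketch uses SQLite through Python's sqlite3 module rather than Oracle, so it says nothing about performance or execution plans, and SQLite spells MINUS as EXCEPT. Table and column names follow the question in snake_case; the rows are made up. It also shows one semantic difference worth knowing before rewriting: if ProcessedEvent.event_id contains a NULL, NOT IN returns no rows at all, while the other two forms are unaffected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event (id INTEGER PRIMARY KEY, type TEXT);
    CREATE TABLE processed_event (event_id INTEGER, process TEXT);
    INSERT INTO event VALUES (1, 'typeA'), (2, 'typeB'), (3, 'typeC'), (4, 'typeD'), (5, 'typeA');
    INSERT INTO processed_event VALUES (1, 'p1'), (2, 'p1'), (2, 'p2'), (5, 'p2');
""")

TYPES = "('typeA', 'typeB', 'typeC')"
QUERIES = {
    "not_in": f"""
        SELECT id FROM event
        WHERE type IN {TYPES}
          AND id NOT IN (SELECT event_id FROM processed_event WHERE process = ?)
        ORDER BY id""",
    "minus": f"""
        SELECT id FROM event
        WHERE id IN (SELECT id FROM event WHERE type IN {TYPES}
                     EXCEPT
                     SELECT event_id FROM processed_event WHERE process = ?)
        ORDER BY id""",
    "left_join": f"""
        WITH processed AS (SELECT event_id FROM processed_event WHERE process = ?)
        SELECT e.id FROM event e
        LEFT JOIN processed p ON p.event_id = e.id
        WHERE e.type IN {TYPES} AND p.event_id IS NULL
        ORDER BY e.id""",
}


def unprocessed(process: str) -> dict:
    # Run every formulation for the given process and collect the event ids
    return {name: [row[0] for row in conn.execute(sql, (process,))]
            for name, sql in QUERIES.items()}


before = unprocessed("p1")  # all three: [3, 5]
conn.execute("INSERT INTO processed_event VALUES (NULL, 'p1')")
after = unprocessed("p1")   # not_in: [], minus and left_join: [3, 5]
print(before)
print(after)
```

If event_id is (or can be declared) NOT NULL, the three forms return the same rows, and the choice between them is purely about the plan the optimizer produces.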