Detect if data in Oracle table has changed - oracle

I'm looking for a best performance solution to detect if data in an Oracle table has changed. This will be used to kick start a calculation that uses lots of data from the same tables. It would be too expensive to poll the data to track changes. The changes happen rarely.
I have analyzed the following solutions to the problem.
ORA_ROWSCN : Too slow, will do a full table scan.
Oracle Audit : Not possible to set up in my environment.
DBMS_ALERT : Writers are not able to signal.
Then I came up with the following simple idea. Add a trigger to the tables that increments a sequence on insert, update or delete. I know this will be materialized if rolled back but I can afford some false positives. My calculations service then polls the sequence current value to detect possible changes (= very cheap query)
CREATE OR REPLACE TRIGGER trg_change_tracker
before insert or update or delete on mytable
declare
dummy number;
begin
select seq_event_seqno.nextval into dummy from dual;
end;
How does this sound? Any pitfalls?
EDIT: Yes there is a major pitfall: When the writer holds back the commit, and the reader sees the new sequence value and query for changes before the writer has committed.

I can suggest you some addition to your trigger soulution:
When your polling see changes of the sequence, you can check opened transactions on interested table, if it exists, then skip current recalc.
UPD:
Also, you can save min SCN of opened transactions, and in case of frequently table changes you will not freeze
UPD2:
It is some heuristic improvement, not full problem solution. If you will skip recal every time when you see opened transaction, then you can freeze much time in case of frequently (and may be long) DML on table.
I mean, when seq is changed and the polling see opened transactions on your table, you can store min(start_scn) from v$transaction, and when the polling see opened transactions at next time you can compare current min(start_scn) with sotred min(start_scn), if current is greater then it is some chance that it is time to recalc.

You can use ( for particular XTABLE ):
SELECT * FROM dba_tab_modifications WHERE TABLE_NAME = 'XTABLE';
it will allow you ( among other data ) to see:
INSERTS UPDATES DELETES
43,708 1,845 0
.. there are issues ( bugs in some Oracle versions ), but if you will test thoroughly in your environment - you may find this useful.
.. as this question is pretty old - it would be nice to know how you implemented the change detection mechanism.

Related

Oracle Temporary Table to convert a Long Raw to Blob

Questions have been asked in the past that seems to handle pieces of my full question, but I'm not finding a totally good answer.
Here is the situation:
I'm importing data from an old, but operational and production, Oracle server.
One of the columns is created as LONG RAW.
I will not be able to convert the table to a BLOB.
I would like to use a global temporary table to pull out data each time I call to the server.
This feels like a good answer, from here: How to create a temporary table in Oracle
CREATE GLOBAL TEMPORARY TABLE newtable
ON COMMIT PRESERVE ROWS
AS SELECT
MYID, TO_LOB("LONGRAWDATA") "BLOBDATA"
FROM oldtable WHERE .....;
I do not want the table hanging around, and I'd only do a chunk of rows at a time, to pull out the old table in pieces, each time killing the table. Is it acceptable behavior to do the CREATE, then do the SELECT, then DROP?
Thanks..
--- EDIT ---
Just as a follow up, I decided to take an even different approach to this.
Branching the strong-oracle package, I was able to do what I originally hoped to do, which was to pull the data directly from the table without doing a conversion.
Here is the issue I've posted. If I am allowed to publish my code to a branch, I'll post a follow up here for completeness.
Oracle ODBC Driver Release 11.2.0.1.0 says that Prefetch for LONG RAW data types is supported, which is true.
One caveat is that LONG RAW can technically be up to 2GB in size. I had to set a hard max size of 10MB in the code, which is adequate for my use, so far at least. This probably could be a variable sent in to the connection.
This fix is a bit off original topic now however, but it might be useful to someone else.
With Oracle GTTs, it is not be necessary to drop and create each time, and you don't need to worry about data "hanging around." In fact, it's inadvisable to drop and re-create. The structure itself persists, but the data in it does not. The data only persists within each session. You can test this by opening up two separate clients, loading data with one, and you will notice it's not there in the second client.
In effect, each time you open a session, it's like you are reading a completely different table, which was just truncated.
If you want to empty the table within your stored procedure, you can always truncate it. Within a stored proc, you will need to execute immediate if you do this.
This is really handy, but it also can make debugging a bear if you are implementing GTTs through code.
Out of curiosity, why a chunk at a time and not the entire dataset? What kind of data volumes are you talking about?
-- EDIT --
Per our comments conversation, this is very raw and untested, but I hope it will give you an idea what I mean:
CREATE OR REPLACE PROCEDURE LOAD_DATA()
AS
TOTAL_ROWS number;
TABLE_ROWS number := 1;
ROWS_AT_A_TIME number := 100;
BEGIN
select count (*)
into TOTAL_ROWS
from oldtable;
WHILE TABLE_ROWS <= TOTAL_ROWS
LOOP
execute immediate 'truncate table MY_TEMP_TABLE';
insert into MY_TEMP_TABLE
SELECT
MYID, TO_LOB(LONGRAWDATA) as BLOBDATA
FROM oldtable
WHERE ROWNUM between TABLE_ROWS and TABLE_ROWS + ROWS_AT_A_TIME;
insert into MY_REMOTE_TABLE#MY_REMOTE_SERVER
select * from MY_TEMP_TABLE;
commit;
TABLE_ROWS := TABLE_ROWS + ROWS_AT_A_TIME;
END LOOP;
commit;
end LOAD_DATA;

Consecutive application threads and uncommitted data in Oracle

Our application reads a record from an Oracle 'Event' table. When the event record exists we update the 'count' field of that record. If the record doesn't exist we insert it. So we want only 1 record for a particular event in the table.
The problem with this is probably quite predictable: one application thread will read the table, see the event is not there, insert the new event and commit. But before it commits a second thread will also read the table and see the event is not there. And then both threads will insert the event and we end up with 2 records for the same event.
I guess synchronizing access to this particular method in our application will prevent this problem, but what is the best option in Oracle to prevent this? Will MERGE for example always prevent this problem?
Serialising access to the procedure that implements this functionality would be trivial to implement, using DBMS_LOCK to define and take an exclusive lock.
Serialising through SQL based methods is practically impossible, due to the read consistency model.
CREATE TABLE EVENTS (ID NUMBER PRIMARY KEY, COUNTER NUMBER NOT NULL);
MERGE INTO EVENTS
USING (SELECT ID, COUNTER FROM DUAL LEFT JOIN EVENTS ON EVENTS.ID = :EVENT_ID) SRC
ON (EVENTS.ID = SRC.ID)
WHEN MATCHED THEN UPDATE SET COUNTER = SRC.COUNTER + 1
WHEN NOT MATCHED THEN INSERT (ID, COUNTER) VALUES (:EVENT_ID, 1);
Simple SQL securing single record for each ID and consistently increasing the counter no matter what application fires it or number of concurrent thread. You don't need to code anything at all and it's very lightweight as well.
It also doesn't produce any exception related to data consistency so you don't need any special handling.
UPDATE: It actually produces unique violation exception if both threads are inserting. I thought the second merge would switch to update, but it doesn't.
UPDATE: Just tested the same case on SQL Server and when executing in parallel and the record doesn't exist one MERGE inserts and the second updates.

Oracle: difference between max(id)+1 and sequence.nextval

I am using Oracle
What is difference when we create ID using max(id)+1 and using sequance.nexval,where to use and when?
Like:
insert into student (id,name) values (select max(id)+1 from student, 'abc');
and
insert into student (id,name) values (SQ_STUDENT.nextval, 'abc');
SQ_STUDENT.nextval sometime gives error that duplicate record...
please help me on this doubt
With the select max(id) + 1 approach, two sessions inserting simultaneously will see the same current max ID from the table, and both insert the same new ID value. The only way to use this safely is to lock the table before starting the transaction, which is painful and serialises the transactions. (And as Stijn points out, values can be reused if the highest record is deleted). Basically, never use this approach. (There may very occasionally be a compelling reason to do so, but I'm not sure I've ever seen one).
The sequence guarantees that the two sessions will get different values, and no serialisation is needed. It will perform better and be safer, easier to code and easier to maintain.
The only way you can get duplicate errors using the sequence is if records already exist in the table with IDs above the sequence value, or if something is still inserting records without using the sequence. So if you had an existing table with manually entered IDs, say 1 to 10, and you created a sequence with a default start-with value of 1, the first insert using the sequence would try to insert an ID of 1 - which already exists. After trying that 10 times the sequence would give you 11, which would work. If you then used the max-ID approach to do the next insert that would use 12, but the sequence would still be on 11 and would also give you 12 next time you called nextval.
The sequence and table are not related. The sequence is not automatically updated if a manually-generated ID value is inserted into the table, so the two approaches don't mix. (Among other things, the same sequence can be used to generate IDs for multiple tables, as mentioned in the docs).
If you're changing from a manual approach to a sequence approach, you need to make sure the sequence is created with a start-with value that is higher than all existing IDs in the table, and that everything that does an insert uses the sequence only in the future.
Using a sequence works if you intend to have multiple users. Using a max does not.
If you do a max(id) + 1 and you allow multiple users, then multiple sessions that are both operating at the same time will regularly see the same max and, thus, will generate the same new key. Assuming you've configured your constraints correctly, that will generate an error that you'll have to handle. You'll handle it by retrying the INSERT which may fail again and again if other sessions block you before your session retries but that's a lot of extra code for every INSERT operation.
It will also serialize your code. If I insert a new row in my session and go off to lunch before I remember to commit (or my client application crashes before I can commit), every other user will be prevented from inserting a new row until I get back and commit or the DBA kills my session, forcing a reboot.
To add to the other answers, a couple of issues.
Your max(id)+1 syntax will also fail if there are no rows in the table already, so use:
Coalesce(Max(id),0) + 1
There's nothing wrong with this technique if you only have a single process that inserts into the table, as might be the case with a data warehouse load, and if max(id) is fast (which it probably is).
It also avoids the need for code to synchronise values between tables and sequences if you are moving restoring data to a test system, for example.
You can extend this method to multirow insert by using:
Coalesce(max(id),0) + rownum
I expect that might serialise a parallel insert, though.
Some techniques don't work well with these methods. They rely of course on being able to issue the select statement, so SQL*Loader might be ruled out. However SQL*Loader has support for this technique in general through the SEQUENCE parameter of the column specification: http://docs.oracle.com/cd/E11882_01/server.112/e22490/ldr_field_list.htm#i1008234
Assuming MAX(ID) is actually fast enough, wouldn't it be possible to:
First get MAX(ID)+1
Then get NEXTVAL
Compare those two and increase sequence in case NEXTVAL is smaller then MAX(ID)+1
Use NEXTVAL in INSERT statement
In that case I would have a fully stable procedure and manual inserts would also be allowed without worrying about updating the sequence

Do the time of the COMMIT and ROLLBACK affect performance?

Suppose I have a set of ID . For each ID , I will insert many records to many different tables based on the ID .Between inserting difference tables, different business checks will be called . If any checking fail , all the records that are inserted based on this ID will be ROLLBACK .This bulk insert action is done through using PL/SQL . Do the time of the COMMIT and ROLLBACK affect the performance and how does it affect ? For example , should I COMMIT after finish the process for one ID or COMMIT after finish all ID?
This is not so much of a performance decision but a process design decision. Do you want the other IDs to stay in the database when you have to roll back a faulty ID?
For obvious reasons, rollback takes longer when more rows must be rolled back. Rollback usually takes longer (sometimes much longer!) than the operations that have to be rolled back. Commit is always fast in Oracle, so it probably doesn't matter how often you commit in that regard.
Your problem description indicates you have a large set of smaller logical transactions (each new ID is a transaction). You should commit each logical transaction. The two reasons to wait to commit the entire set of transactions are:
If the entire set of transactions is in fact a transaction itself - all inserts must succeed for any rows to be committed. In that context, your smaller "transactions" aren't truly transactions.
You don't have a restart capability in your bulk load process, which in effect makes this a special case of item 1. If your bulk load process aborts, you need a way to skip successfully applied ID's.
Tom Kyte's advice is to commit each logical unit of work - the transaction.
Don't take the transaction time longer. make it short as possible as you can. Because according to your query some locks have been created. This locks may cause perfomance issues... so do it ID by ID...
There are two "forces" at work....
locking
during your open transaction, oracle puts locks on the changed rows.
whenever another transaction needs to update any of the locked rows,
it has to wait.
in the worst case, you can even build a deadlock.
synchronous write
every commit performs a synchronous write.
(there are ways to disable that, but it is usually the thing everybody wants: integrity).
that synchronous write can take (much) longer then the a regular write (that can be buffered).
Not to forget that there is usually an additional network round trip involved with an commit.
so, the one force says "commit as soon as possible (considering your integrity requirements)" the other says "commit as as less often as possible".
There are some other issues to consider as well, e.g. the maximum transaction size. every uncommited transaction needs some temporary space. the bigger the transaction gets, the more you need. You can also run into ORA-01555 "snapshot too old".
If there is any advice to give, then it is to implement a configurable "commit frequency" so that you can easily change it as needed.
One option if you need to control the individual sets but retain the ability to commit or rollback the entire transaction is to use savepoints. You can set a savepoint at the beginning of the outermost loop, then rollback to it if an error occurs. You might end up with something like this:
begin
--Initial batch logging
for r_record in cur_cursor loop
savepoint s_cursor loop;
begin
--Process rows
exception
when others then
rollback to s_cursor;
end;
end loop;
--Final batch logging
exception
when others then
rollback;
raise;
end;

How to find out when an Oracle table was updated the last time

Can I find out when the last INSERT, UPDATE or DELETE statement was performed on a table in an Oracle database and if so, how?
A little background: The Oracle version is 10g. I have a batch application that runs regularly, reads data from a single Oracle table and writes it into a file. I would like to skip this if the data hasn't changed since the last time the job ran.
The application is written in C++ and communicates with Oracle via OCI. It logs into Oracle with a "normal" user, so I can't use any special admin stuff.
Edit: Okay, "Special Admin Stuff" wasn't exactly a good description. What I mean is: I can't do anything besides SELECTing from tables and calling stored procedures. Changing anything about the database itself (like adding triggers), is sadly not an option if want to get it done before 2010.
I'm really late to this party but here's how I did it:
SELECT SCN_TO_TIMESTAMP(MAX(ora_rowscn)) from myTable;
It's close enough for my purposes.
Since you are on 10g, you could potentially use the ORA_ROWSCN pseudocolumn. That gives you an upper bound of the last SCN (system change number) that caused a change in the row. Since this is an increasing sequence, you could store off the maximum ORA_ROWSCN that you've seen and then look only for data with an SCN greater than that.
By default, ORA_ROWSCN is actually maintained at the block level, so a change to any row in a block will change the ORA_ROWSCN for all rows in the block. This is probably quite sufficient if the intention is to minimize the number of rows you process multiple times with no changes if we're talking about "normal" data access patterns. You can rebuild the table with ROWDEPENDENCIES which will cause the ORA_ROWSCN to be tracked at the row level, which gives you more granular information but requires a one-time effort to rebuild the table.
Another option would be to configure something like Change Data Capture (CDC) and to make your OCI application a subscriber to changes to the table, but that also requires a one-time effort to configure CDC.
Ask your DBA about auditing. He can start an audit with a simple command like :
AUDIT INSERT ON user.table
Then you can query the table USER_AUDIT_OBJECT to determine if there has been an insert on your table since the last export.
google for Oracle auditing for more info...
SELECT * FROM all_tab_modifications;
Could you run a checksum of some sort on the result and store that locally? Then when your application queries the database, you can compare its checksum and determine if you should import it?
It looks like you may be able to use the ORA_HASH function to accomplish this.
Update: Another good resource: 10g’s ORA_HASH function to determine if two Oracle tables’ data are equal
Oracle can watch tables for changes and when a change occurs can execute a callback function in PL/SQL or OCI. The callback gets an object that's a collection of tables which changed, and that has a collection of rowid which changed, and the type of action, Ins, upd, del.
So you don't even go to the table, you sit and wait to be called. You'll only go if there are changes to write.
It's called Database Change Notification. It's much simpler than CDC as Justin mentioned, but both require some fancy admin stuff. The good part is that neither of these require changes to the APPLICATION.
The caveat is that CDC is fine for high volume tables, DCN is not.
If the auditing is enabled on the server, just simply use
SELECT *
FROM ALL_TAB_MODIFICATIONS
WHERE TABLE_NAME IN ()
You would need to add a trigger on insert, update, delete that sets a value in another table to sysdate.
When you run application, it would read the value and save it somewhere so that the next time it is run it has a reference to compare.
Would you consider that "Special Admin Stuff"?
It would be better to describe what you're actually doing so you get clearer answers.
How long does the batch process take to write the file? It may be easiest to let it go ahead and then compare the file against a copy of the file from the previous run to see if they are identical.
If any one is still looking for an answer they can use Oracle Database Change Notification feature coming with Oracle 10g. It requires CHANGE NOTIFICATION system privilege. You can register listeners when to trigger a notification back to the application.
Please use the below statement
select * from all_objects ao where ao.OBJECT_TYPE = 'TABLE' and ao.OWNER = 'YOUR_SCHEMA_NAME'

Resources