Oracle LogMiner Results Inconsistent - oracle

I'm working on a LogMiner-based solution for capturing changes, and I've run into what appears to be unusual behavior when attempting to mine redo events that pertain to CLOB or BLOB operations.
In my use case, I've inserted a record into a table that contains 3 CLOB fields, where one CLOB field's value is small while the other two CLOB fields must be set using LOB_WRITE operations.
When I set a LogMiner SCN range that starts before and ends after the transaction commit, I get the full expected rows in V$LOGMNR_CONTENTS, which are:
0a00070084220000 37717288 START
0a00070084220000 37717288 INSERT
0a00070084220000 37717312 SEL_LOB_LOCATOR
0a00070084220000 37717312 LOB_WRITE (several of these as expected)
0a00070084220000 37717331 SEL_LOB_LOCATOR
0a00070084220000 37717331 LOB_WRITE (several of these as expected)
0a00070084220000 37717332 INSERT (sets the smaller clob data values)
0a00070084220000 37717334 COMMIT
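For reference, a listing like the one above can be produced with a query along these lines (XID, SCN and OPERATION are standard V$LOGMNR_CONTENTS columns; the literal XID is the transaction id shown in the listing):
SELECT xid, scn, operation
FROM v$logmnr_contents
WHERE xid = HEXTORAW('0A00070084220000')
ORDER BY scn;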
The unusual bit occurs when starting the mining session with certain start/end SCN ranges.
For example, when I mine from 37717239 to 37717289, I expected LogMiner to provide both the START and the INSERT in the table; however only the START operation was present.
Additionally, when I mine from 37717290 to 37717340, I expected LogMiner to provide all the SEL_LOB_LOCATOR, LOB_WRITE, and subsequent INSERT and the COMMIT; however only the subsequent INSERT and COMMIT were present.
The only conclusion I can draw from this is that LogMiner seems to have trouble when you split a transaction whose redo events include the various synthetic operations related to LOBs. The only way I've been able to reliably reconstruct the series of events has been to mine from 37717288 forward, forcing LogMiner to have the full scope of the transaction available when it materializes the rows in the contents view.
Why does LogMiner behave like this? Why does it not correctly materialize when splitting the transaction with the SCN ranges I presented above?

For LogMiner, any single command is atomic by definition.
In this case it starts at 37717288 and ends at 37717332.
It cannot be split. If you request any range that splits it, LogMiner will deliberately not return it (so you won't get partial results of a single command).
This is also true for large non-LOB commands, like DDL that generates many internal commands (e.g. alter table modify column default value).
Besides that, be aware that fetching LOB values from LogMiner is not reliable. Just play around with the values and you will see that it is highly inconsistent. (I have tests somewhere to prove it, so if you are interested I can provide them.)
Here is the test: define a table with 2 LOBs and create 2 rows, the first with a single LOB and the second with two LOBs.
drop table sample1.clobs2;
create table sample1.clobs2 (id number not null, clob1 clob not null, clob2 clob);
--start
select current_scn from v$database;
insert into sample1.clobs2 (id, clob1) values (3, 'abc');
insert into sample1.clobs2 (id, clob1, clob2) values (4, 'abc', '2abc');
commit;
update sample1.clobs2 set clob1='def' where id=3;
update sample1.clobs2 set clob1='def', clob2='2def' where id=4;
commit;
update sample1.clobs2 set clob1=rpad('ghj',30000,'Z') where id=3;
update sample1.clobs2 set clob1=rpad('ghj',30000,'Z'), clob2=rpad('ghj',30000,'Z') where id=4;
commit;
--end
select current_scn from v$database;
Start the logminer:
exec DBMS_LOGMNR.end_LOGMNR;
exec DBMS_LOGMNR.ADD_LOGFILE('put here any logfile(select MEMBER from v$logfile), logminer will do the rest');
begin DBMS_LOGMNR.START_LOGMNR(
STARTSCN => put here the scn from the above test,
ENDSCN => put here the scn from the above test,
OPTIONS => -- I leave all the possible parameters here just for you to play
--DBMS_LOGMNR.DICT_FROM_REDO_LOGS +
--DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG +
DBMS_LOGMNR.CONTINUOUS_MINE +
--DBMS_LOGMNR.COMMITTED_DATA_ONLY+
--DBMS_LOGMNR.DDL_DICT_TRACKING+
DBMS_LOGMNR.NO_ROWID_IN_STMT+
DBMS_LOGMNR.NO_SQL_DELIMITER
);
end;
/
Check the results:
select scn, (XIDUSN || '.' || XIDSLT || '.' || XIDSQN) AS transaction_id, operation, seg_name, ROW_ID, rollback, csf,SQL_REDO, c.*
from v$logmnr_contents c
where 1=1
and (seg_name = 'OBJ# put here the object id of the table sample1.clobs2' or operation = 'COMMIT');
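To fill in the OBJ# placeholder above, you can look up the object id of the test table first, for example:
select object_id from all_objects where owner = 'SAMPLE1' and object_name = 'CLOBS2';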
You will see:
large LOBs behave differently from small ones
when only one LOB is updated, it is impossible to tell which one it was (the first or the second)
In addition, this behaviour changes between different Oracle versions.

Related

Precise difference between statement on Row and on Table

What's the difference between these two blocks and when to use the first or the second?
Create OR Replace trigger trig_before_insert before insert on Employee For each Row
Begin
DBMS_OUTPUT.PUT_LINE('Inserting');
END;
And
Create OR Replace trigger trig_before_insert before insert on Employee
Begin
DBMS_OUTPUT.PUT_LINE('Inserting');
END;
If you perform an
INSERT INTO EMPLOYEE
SELECT ...
and that SELECT returns 100 rows so that the INSERT inserts 100 rows, your first trigger will execute 100 times, once for each row. In the same situation, your second trigger will execute only once.
You can use a BEFORE INSERT...FOR EACH ROW trigger to change the values that are being inserted by accessing them via the :NEW variable. E.g.,
:new.column_1 := 'a different value';
You cannot do that in a statement level trigger (which is what your 2nd trigger is).
There are also limitations in row level triggers (which is what your 1st trigger is). In particular, you may not SELECT from the trigger's base table (EMPLOYEE in this case), because that table is said to be "mutating". The exact reasons, as I understand them, go back to the core principles of relational databases -- specifically that the results of a statement (like INSERT INTO...SELECT) should not depend on the order in which the rows are processed. There are workarounds to this limitation, however, which are beyond the scope of your original question, I think.
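To make the mutating-table limitation concrete, here is a minimal sketch (the SALARY column is assumed purely for illustration) of a row-level trigger that selects from its own base table:
Create OR Replace trigger trig_check_avg before insert on Employee For each Row
Declare
    l_avg NUMBER;
Begin
    -- Selecting from the trigger's own base table is what Oracle rejects
    SELECT AVG(salary) INTO l_avg FROM Employee;
    DBMS_OUTPUT.PUT_LINE('Average salary: ' || l_avg);
END;
A multi-row statement such as INSERT INTO EMPLOYEE SELECT ... then fails with ORA-04091 (table EMPLOYEE is mutating, trigger/function may not see it), while a single-row INSERT ... VALUES typically does not hit the error.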

ETL into operational oracle database - used by jsp/spring/hibernate app

I need to load some legacy data into an operational Oracle (11gR2) database. The database is used by a JSP/Spring/Hibernate (3.2.5.ga) application. A sequence is used for generating unique keys across all the tables. The sequence definition is as below:
CREATE SEQUENCE "TEST"."HIBERNATE_SEQUENCE" MINVALUE 1 MAXVALUE 999999999999999999999999999 INCREMENT BY 1 START WITH 1000 CACHE 20 NOORDER NOCYCLE
The idea for the data load/ETL is to come up with a script that starts out with the max sequence value by running
select HIBERNATE_SEQUENCE.NEXTVAL from dual
at the beginning of the script generation process, and then generates SQL INSERT statements for the data that needs to be populated. There is some logic involved in handling data cleanup, business rules, etc. that gets applied through the script, and the generated SQL INSERT statements are expected to be run in one batch, which should bring in all of the legacy data.
Assuming that the max sequence value was 1000, the script uses this as a variable and increments it as necessary, and the output SQL INSERTs will be as below:
INSERT INTO USER_STATUS(ID, CREATE_DATE, UPDATE_DATE, STATUS_ID, USER_ID)
VALUES (1001, CURRENT_DATE, CURRENT_DATE, 20, 445);
INSERT INTO USER_ACTIVITY_LOG(ID, CREATE_DATE, UPDATE_DATE, DETAILS, LAST_USER_STATUS_ID)
VALUES (1002, CURRENT_DATE, CURRENT_DATE, 'USER ACTIVITY 1', 1001);
INSERT INTO USER_STATUS(ID, CREATE_DATE, UPDATE_DATE, STATUS_ID, USER_ID)
VALUES (1003, CURRENT_DATE, CURRENT_DATE, 10, 445);
INSERT INTO USER_ACTIVITY_LOG(ID, CREATE_DATE, UPDATE_DATE, DETAILS, LAST_USER_STATUS_ID)
VALUES (1004, CURRENT_DATE, CURRENT_DATE, 'USER ACTIVITY 3', 1003);
I have created some mock SQL to show how the output INSERTs are going to look; there are going to be a lot more tables involved in the insert operations. Whenever we need to make data changes from the back end we would use HIBERNATE_SEQUENCE.NEXTVAL to get the next unique key value, but since the SQL generation script runs in a disconnected mode, it does not use HIBERNATE_SEQUENCE.NEXTVAL; it increments a local variable instead.
The assumptions we are making in order to be able to generate (and run) this script are to:
have the application taken down for maintenance
have no database activity during the time of running the script and start out with the max sequence value.
generate the SQL
run the SQL - commit.
assuming that, in the process of script generation, the max sequence value goes up from 1000 to 5000 - after the script is run and the data is loaded, the HIBERNATE_SEQUENCE would need to be dropped/recreated to start at 5001.
bring the application back up.
Now, to the reason I am posting this in such detail... I need your suggestions/input about any loopholes in this design and whether there is anything I am overlooking.
Any input is appreciated.
Thanks!
I would suggest against dropping and creating the sequence if it's used for any other task in your application; doing so means you also need to re-add any permissions, synonyms, etc.
Do you know at the start of the script how many inserts you will do? If so, and assuming that you won't have any other activity, then you can adjust the 'increment by' value of the sequence, so a single select from it will move the sequence forward by whatever value you want.
> drop sequence seq_test;
sequence SEQ_TEST dropped.
> create sequence seq_test start with 1 increment by 1;
sequence SEQ_TEST created.
> select seq_test.nextval from dual;
NEXTVAL
----------------------
1
> alter sequence seq_test increment by 500;
sequence SEQ_TEST altered.
> select seq_test.nextval from dual;
NEXTVAL
----------------------
501
> alter sequence seq_test increment by 1;
sequence SEQ_TEST altered.
> select seq_test.nextval from dual;
NEXTVAL
----------------------
502
Just be aware that the DDL statements will issue an implicit commit, so once they have run any inflight transaction will be commited, and any work performed after them will be a separate transaction.
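Applied to the question's HIBERNATE_SEQUENCE, a sketch of that approach might look like the following (the 4001 increment assumes the sequence last returned 1000 and the load consumed ids up to 5000; with CACHE 20 the resulting value may land slightly higher, which only leaves a harmless gap):
ALTER SEQUENCE TEST.HIBERNATE_SEQUENCE INCREMENT BY 4001;
SELECT TEST.HIBERNATE_SEQUENCE.NEXTVAL FROM dual;  -- returns roughly 5001
ALTER SEQUENCE TEST.HIBERNATE_SEQUENCE INCREMENT BY 1;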

Incrementing Oracle Sequence by certain amount

I am programming a Windows Application (in Qt 4.6) which - at some point - inserts any number of datasets between 1 and around 76000 into some oracle (10.2) table. The application has to retrieve the primary keys, or at least the primary key range, from a sequence. It will then store the IDs in a list which is used for Batch Execution of a prepared query.
(Note: Triggers shall not be used, and the sequence is used by other tasks as well)
In order to avoid calling the sequence X times, I would like to increment the sequence by X instead.
What I have found out so far, is that the following code would be possible in a procedure:
ALTER SEQUENCE my_sequence INCREMENT BY X;
SELECT my_sequence.CURRVAL + 1, my_sequence.NEXTVAL
INTO v_first_number, v_last_number
FROM dual;
ALTER SEQUENCE my_sequence INCREMENT BY 1;
I have two major concerns though:
I have read that ALTER SEQUENCE produces an implicit commit. Does this mean the transaction started by the Windows Application will be commited? If so, can you somehow avoid it?
Is this concept multi-user proof? Or could the following thing happen:
Sequence is at 10,000
Session A sets increment to 2,000
Session A selects 10,001 as first and 12,000 as last
Session B sets increment to 5,000
Session A sets increment to 1
Session B selects 12,001 as first and 12,001 as last
Session B sets increment to 1
Even if the procedure is rather quick, it is not that unlikely in my application that two different users would cause the procedure to be called almost simultaneously.
1) ALTER SEQUENCE is DDL so it implicitly commits before and after the statement. The database transaction started by the Windows application will be committed. If you are using a distributed transaction coordinator other than the Oracle database, hopefully the transaction coordinator will commit the entire distributed transaction but transaction coordinators will sometimes have problems with commits issued that it is not aware of.
There is nothing that you can do to prevent DDL from committing.
2) The scenario you outline with multiple users is quite possible. So it doesn't sound like this approach would behave correctly in your environment.
You could potentially use the DBMS_LOCK package to ensure that only one session is calling your procedure at any point in time and then call the sequence N times from a single SQL statement. But if other processes are also using the sequence, there is no guarantee that you'll get a contiguous set of values.
CREATE PROCEDURE some_proc( p_num_rows  IN  NUMBER,
                            p_first_val OUT NUMBER,
                            p_last_val  OUT NUMBER )
AS
  l_lockhandle       VARCHAR2(128);
  l_lock_return_code INTEGER;
BEGIN
  -- Serialize callers on a named user lock
  dbms_lock.allocate_unique( 'SOME_PROC_LOCK',
                             l_lockhandle );
  l_lock_return_code := dbms_lock.request( lockhandle        => l_lockhandle,
                                           lockmode          => dbms_lock.x_mode,
                                           release_on_commit => true );
  if( l_lock_return_code IN (0, 4) ) -- Success or already owned
  then
    <<do something>>
  end if;
  dbms_lock.release( l_lockhandle );
END;
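A hypothetical call reserving 500 values (the variable names are illustrative) would then look like:
DECLARE
    l_first NUMBER;
    l_last  NUMBER;
BEGIN
    some_proc( p_num_rows  => 500,
               p_first_val => l_first,
               p_last_val  => l_last );
    DBMS_OUTPUT.PUT_LINE('Reserved ids ' || l_first || ' to ' || l_last);
END;
/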
Altering the sequence in this scenario is a really bad idea, particularly in a multi-user environment. You'll get your transaction committed and probably several "race condition" data bugs or integrity errors.
It would be appropriate if you had legacy data already imported and wanted to insert new data with ids from the sequence. Then you may alter the sequence to move currval to the max existing ...
It seems to me that here you want to generate ids from the sequence. That need not be done by
select seq.nextval into l_variable from dual;
insert into table (id, ...) values (l_variable, ....);
You can use the sequence directly in the insert:
insert into table (id, ...) values (seq.nextval, ....);
and optionally get the assigned value back by
insert into table (id, ...) values (seq.nextval, ....)
returning id into l_variable;
It certainly is possible even for bulk operations with executeBatch, either just creating the ids or even returning them. I am not sure about the exact syntax in Java, but it will be something along the lines of
insert into table (id, ...) values (seq.nextval, ....)
returning id bulk collect into l_cursor;
and you'll be given a ResultSet to browse the assigned numbers.
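In PL/SQL, a sketch of the bulk variant (the table some_table and its columns are assumptions; my_sequence is the sequence from the question) could be:
DECLARE
    TYPE t_num_tab IS TABLE OF NUMBER;
    l_payload t_num_tab := t_num_tab(10, 20, 30);  -- values to insert
    l_ids     t_num_tab;                           -- ids assigned by the sequence
BEGIN
    FORALL i IN 1 .. l_payload.COUNT
        INSERT INTO some_table (id, val)
        VALUES (my_sequence.NEXTVAL, l_payload(i))
        RETURNING id BULK COLLECT INTO l_ids;
    DBMS_OUTPUT.PUT_LINE('Assigned ids ' || l_ids(1) || ' .. ' || l_ids(l_ids.COUNT));
END;
/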
You can't prevent the implicit commit.
Your solution is not multi user proof. It is perfectly possible that another session will have 'restored' the increment to 1, just as you described.
I would suggest you keep fetching values one by one from the sequence, store these IDs one by one on your list and have the batch execution operate on that list.
What is the reason that you want to fetch a contiguous block of values from the sequence? I would not be too worried about performance, but maybe there are other requirements that I don't know of.
In Oracle, you can use the following query to get the next N values from a sequence that increments by one:
select level, PDQ_ACT_COMB_SEQ.nextval as seq from dual connect by level <= 5;

select * through dblink

I am having trouble updating a table by looping over a cursor that selects from a source table through a dblink.
I have two databases, DB1 and DB2.
They are two different database instances.
I am using the following statement in DB1:
CURSOR TestCursor IS
SELECT a.*, 'A' TEST_COL_A, 'B' TEST_COL_B
FROM rpt.SOURCE@DB2 a;
BEGIN
For C1 in TestCursor loop
INSERT into RPT.TARGET
(
/*The company_name and cust_id are select from SOURCE table from DB2*/
COMPANY_NAME, CUST_ID, TEST_COL_A, TEST_COL_B
)
values
(
C1.COMPANY_NAME, C1.CUST_ID, C1.TEST_COL_A , C1.TEST_COL_B
) ;
End loop;
/*Some code...*/
End;
Everything works fine until I add a column "NEW_COL" to the SOURCE table on DB2.
The inserted data gets the wrong values.
The value of TEST_COL_A, as I expect, should be 'A'.
However, it contains the value of NEW_COL, the column I added to the SOURCE table.
And the value of TEST_COL_B contains 'A'.
Has anyone encountered the same issue?
It seems like Oracle caches the table columns when it compiles.
Is there any way to add a column to the source table without recompiling?
According to this:
Oracle Database does not manage dependencies among remote schema objects other than local-procedure-to-remote-procedure dependencies.
For example, assume that a local view is created and defined by a query that references a remote table. Also assume that a local procedure includes a SQL statement that references the same remote table. Later, the definition of the table is altered.
Therefore, the local view and procedure are never invalidated, even if the view or procedure is used after the table is altered, and even if the view or procedure now returns errors when used. In this case, the view or procedure must be altered manually so that errors are not returned. In such cases, lack of dependency management is preferable to unnecessary recompilations of dependent objects.
In this case you aren't quite seeing errors, but the cause is the same. You also wouldn't have a problem if you used explicit column names instead of *, which is usually safer anyway. If you're using * you can't avoid recompiling (unless, I suppose, the * is the last item in the select list, in which case any extra columns on the end wouldn't cause a problem - as long as their names didn't clash).
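For example, a version of the cursor with explicit columns (the column list is inferred from the INSERT in the question) is unaffected when NEW_COL is later appended to the remote table:
CURSOR TestCursor IS
SELECT a.COMPANY_NAME, a.CUST_ID, 'A' TEST_COL_A, 'B' TEST_COL_B
FROM rpt.SOURCE@DB2 a;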
I recommend that you use a single set processing insert statement in DB1 rather than a row at a time cursor for loop for the insert, for example:
INSERT into RPT.TARGET
select COMPANY_NAME, CUST_ID, 'A' TEST_COL_A, 'B' TEST_COL_B
FROM rpt.SOURCE@DB2
;
Rationale:
Set processing will almost always outperform row-at-a-time processing [which is really "slow-at-a-time" processing].
Set processing the insert is a scalable solution. If the application needs to scale to tens of thousands or millions of rows, the row-at-a-time solution will not likely scale.
Also, using the select * construct is dangerous for the reason you encountered [and other similar reasons].

select only new row in oracle

I have a table with a "varchar2" primary key.
It has about 1,000,000 transactions per day.
My app wakes up every 5 minutes to generate a text file by querying only the new records.
It will remember the last point and process only new records.
Do you have an idea how to query this with good performance?
I am able to add a new column if necessary.
What do you think this process should be done with?
PL/SQL?
Java?
Everyone here is really really close. However:
Scott Bailey's wrong about using a bitmap index if the table's under any sort of continuous DML load. That's exactly the wrong time to use a bitmap index.
Everyone else's answer about the PROCESSED CHAR(1) CHECK IN ('Y','N') column is right, but missing how to index it; you should use a function-based index like this:
CREATE INDEX MY_UNPROCESSED_ROWS_IDX ON MY_TABLE
(CASE WHEN PROCESSED_FLAG = 'N' THEN 'N' ELSE NULL END);
You'd then query it using the same expression:
SELECT * FROM MY_TABLE
WHERE (CASE WHEN PROCESSED_FLAG = 'N' THEN 'N' ELSE NULL END) = 'N';
The reason to use the function-based index is that Oracle doesn't write index entries for entirely NULL values being indexed, so the function-based index above will only contain the rows with PROCESSED_FLAG = 'N'. As you update your rows to PROCESSED_FLAG = 'Y', they'll "fall out" of the index.
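A hedged sketch of the processing side, using the same MY_TABLE and PROCESSED_FLAG names as above (a real job would normally restrict the update to the rows it just handled):
UPDATE MY_TABLE
SET PROCESSED_FLAG = 'Y'
WHERE (CASE WHEN PROCESSED_FLAG = 'N' THEN 'N' ELSE NULL END) = 'N';
COMMIT;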
Well, if you can add a new column, you could create a Processed column, which will indicate processed records, and create an index on this column for performance.
Then the query should only be for those rows that have been newly added, and not processed.
This should be easily done using sql queries.
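A minimal sketch of that change (table, column and index names are illustrative):
ALTER TABLE my_table ADD (processed CHAR(1) DEFAULT 'N');
CREATE INDEX my_table_processed_ix ON my_table (processed);
-- the 5-minute job then queries only unprocessed rows:
SELECT * FROM my_table WHERE processed = 'N';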
Ah, I really hate to add another answer when the others have come so close to nailing it. But
As Ponies points out, Oracle does have a pseudocolumn (ORA_ROWSCN, based on the System Change Number) that can pinpoint when each row was modified. Unfortunately, the default is that it gets the information from the block instead of storing it with each row, and changing that behavior will require you to rebuild a really large table. So while this answer is good for quieting the SQL Server fella, I'd not recommend it.
Astander is right but needs a few caveats. Add a new column needs_processed CHAR(1) DEFAULT 'Y' and add a bitmap index. For low-cardinality columns ('Y'/'N') the bitmap index will be faster. Once you have that, the rest is pretty easy. But you've got to be careful about how you select the new rows, process them, and mark them as processed; otherwise, rows could be inserted while you are processing that will get marked processed even though they have not been.
The easiest way would be to use pl/sql to open a cursor that selects unprocessed rows, processes them and then updates the row as processed. If you have an aversion to walking cursors, you could collect the pk's or rowids into a nested table, process them and then update using the nested table.
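A hedged sketch of the collect-then-update variant (table and column names follow the needs_processed example above):
DECLARE
    TYPE t_rowid_tab IS TABLE OF ROWID;
    l_rowids t_rowid_tab;
BEGIN
    SELECT rowid BULK COLLECT INTO l_rowids
    FROM my_table
    WHERE needs_processed = 'Y';

    -- ... process the rows identified by l_rowids here ...

    FORALL i IN 1 .. l_rowids.COUNT
        UPDATE my_table
        SET needs_processed = 'N'
        WHERE rowid = l_rowids(i);
    COMMIT;
END;
/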
In the MS SQL Server world where I work, we have a 'version' column of type 'timestamp' on our tables.
So, to answer #1, I would add a new column.
To answer #2, I would do it in plsql for performance.
Mark
"astander" pretty much did the work for you. You need to ALTER your table to add one more column (lets say PROCESSED)..
You can also consider creating an INDEX on the PROCESSED ( a bitmap index may be of some advantage, as the possible value can be only 'y' and 'n', but test it out ) so that when you query it will use INDEX.
Also if sure, you query only for every 5 mins, check whether you can add another column with TIMESTAMP type and partition the table with it. ( not sure, check out again ).
I would also think about writing job or some thing and write using UTL_FILE and show it front end if it can be.
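A minimal UTL_FILE sketch of such a job (the DATA_DIR directory object, the file name, and the column names are assumptions):
DECLARE
    l_file UTL_FILE.FILE_TYPE;
BEGIN
    l_file := UTL_FILE.FOPEN('DATA_DIR', 'new_rows.txt', 'w');
    FOR r IN (SELECT id FROM my_table WHERE processed = 'N') LOOP
        UTL_FILE.PUT_LINE(l_file, r.id);
    END LOOP;
    UTL_FILE.FCLOSE(l_file);
END;
/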
If performance is really a problem and you want to create your file asynchronously, you might want to use Oracle Streams, which will actually get modification data from your redo log without affecting performance of the main database. You may not even need a separate job, as you can configure Oracle Streams to do asynchronous replication of the changes, through which you can trigger the file creation.
Why not create an extra table that holds two columns, the ID column and a processed flag column? Have an insert trigger on the original table place its ID in this new table, as sketched below. Your logging process can then select records from this new table and mark them as processed. Finally, delete the processed records from this table.
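A sketch of that staging-table approach (names and the VARCHAR2 length are illustrative; the original table's primary key is a VARCHAR2 per the question):
CREATE TABLE new_rows_queue (id VARCHAR2(100), processed CHAR(1) DEFAULT 'N');

CREATE OR REPLACE TRIGGER trg_queue_new_rows
AFTER INSERT ON my_table
FOR EACH ROW
BEGIN
    INSERT INTO new_rows_queue (id) VALUES (:new.id);
END;
/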
I'm pretty much in agreement with Adam's answer. But I'd want to do some serious testing compared to an alternative.
The issue I see is that you need to not only select the rows, but also do an update of those rows. While that should be pretty fast, I'd like to avoid the update. And avoid having any large transactions hanging around (see below).
The alternative would be to add CREATE_DATE date default sysdate. Index that. And then select records where create_date >= (start date/time of your previous select).
But I don't have enough data on the relative costs of setting a sysdate default vs. setting a value of Y, of updating the function-based index vs. the date index, and of doing a range select on the date vs. a specific select on a single value for the Y. You'll probably want to preserve stats or hint the query to use the index on the Y/N column, and definitely want to use a hint on a date column -- the stats on the date column will almost certainly be old.
If data are also being added to the table continuously, including during the period when your query is running, you need to watch out for transaction control. After all, you don't want to read 100,000 records that have the flag = Y, then do your update on 120,000, including the 20,000 that arrived while your query was running.
In the flag case, there are two easy ways: SET TRANSACTION before your select and commit after your update, or start by doing an update from Y to Q, then do your select for those that are Q, and then update to N. Oracle's read consistency is wonderful but needs to be handled with care.
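A sketch of the Y -> Q -> N variant described above ('Q' marks rows claimed by the current run; the column name follows the earlier needs_processed example):
UPDATE my_table SET needs_processed = 'Q' WHERE needs_processed = 'Y';
COMMIT;
-- process the claimed rows:
SELECT * FROM my_table WHERE needs_processed = 'Q';
-- then mark them done:
UPDATE my_table SET needs_processed = 'N' WHERE needs_processed = 'Q';
COMMIT;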
For the date column version, if you don't mind a risk of processing a few rows more than once, just update your table that has the last processed date/time immediately before you do your select.
If there's not much information in the table, consider making it Index Organized.
What about using Materialized view logs? You have a lot of options to play with:
SQL> create table test (id_test number primary key, dummy varchar2(1000));
Table created
SQL> create materialized view log on test;
Materialized view log created
SQL> insert into test values (1, 'hello');
1 row inserted
SQL> insert into test values (2, 'bye');
1 row inserted
SQL> select * from mlog$_test;
ID_TEST SNAPTIME$$ DMLTYPE$$ OLD_NEW$$ CHANGE_VECTOR$$
---------- ----------- --------- --------- ---------------------
1 01/01/4000 I N FE
2 01/01/4000 I N FE
SQL> delete from mlog$_test where id_test in (1,2);
2 rows deleted
SQL> insert into test values (3, 'hello');
1 row inserted
SQL> insert into test values (4, 'bye');
1 row inserted
SQL> select * from mlog$_test;
ID_TEST SNAPTIME$$ DMLTYPE$$ OLD_NEW$$ CHANGE_VECTOR$$
---------- ----------- --------- --------- ---------------
3 01/01/4000 I N FE
4 01/01/4000 I N FE
I think this solution should work.
You need to do the following steps.
For the first run, you will have to copy all records. In the first run you need to execute the following query:
insert into new_table (max_rowid) select max(rowid) from yourtable;
Now the next time you want to get only newly inserted values, you can do it by executing the following command:
Select * from yourtable where rowid > (select max_rowid from new_table);
Once you are done with processing the above query, simply truncate new_table and insert max(rowid) from yourtable again.
I think this should work and would be the fastest solution.
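The reset step described above might look like this (note that TRUNCATE issues an implicit commit, so run it only after the processing has been committed):
TRUNCATE TABLE new_table;
INSERT INTO new_table (max_rowid) SELECT MAX(rowid) FROM yourtable;
COMMIT;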
