IBM DB2 batch update behavior on duplicate key - JDBC

I am writing a Java application that uses batch insertion in autocommit mode. My question: if I insert 4 rows in a batch and a BatchUpdateException is thrown because the second row of the batch triggers a duplicate key violation, does the JDBC driver continue to process the 2 remaining rows, leaving the database with 3 inserted rows? Does it stop at row 2, leaving the database with 1 inserted row? Or does it roll back the whole batch, leaving the database with 0 inserted rows?

It works like this (this answer assumes Spring Batch's chunk-oriented processing; for plain JDBC, see the sketch below):
You have a chunk size configured on the step. Say, for example, the chunk size is 10.
So a batch of 10 items is committed at a time.
Say that in a batch of 10 items, the 4th item throws a duplicate key exception, as in your case.
In that case, the whole chunk will be rejected and the job will stop (if no skip policy is implemented).
However, all the previous correct chunks, which are already committed, will not be rolled back.
Further, if the incorrect data is removed and the same job is restarted, the job will resume exactly from the chunk where it last failed.
So nothing happens to the data already written.
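At the plain JDBC level, the spec deliberately leaves this driver-dependent: after a command in a batch fails, a driver may either stop or continue processing, and BatchUpdateException.getUpdateCounts() tells you which happened. A minimal probe, with placeholder connection details and a made-up table:

import java.sql.*;

public class BatchProbe {
    public static void main(String[] args) throws SQLException {
        // URL, credentials, and table name are placeholders
        try (Connection con = DriverManager.getConnection(
                 "jdbc:db2://host:50000/SAMPLE", "user", "pass");
             PreparedStatement ps = con.prepareStatement(
                 "INSERT INTO T(ID) VALUES (?)")) {
            con.setAutoCommit(true);
            for (int id : new int[] {1, 1, 2, 3}) {  // second row duplicates the first
                ps.setInt(1, id);
                ps.addBatch();
            }
            try {
                ps.executeBatch();
            } catch (BatchUpdateException e) {
                int[] counts = e.getUpdateCounts();
                // length == 4: the driver kept going; failed commands are marked
                //   Statement.EXECUTE_FAILED (3 rows inserted).
                // length == 1: it stopped at the duplicate (1 row inserted).
                System.out.println("commands processed: " + counts.length);
            }
        }
    }
}

Note that with autocommit on, the commit behavior during a batch is itself implementation-defined, which is why the JDBC documentation recommends disabling autocommit before calling executeBatch().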


Database read locking

I have a use case where I need to do the following things in one transaction:
start the transaction
INSERT an item into a table
SELECT all the items in the table
dump the selected items into a file (this file is versioned and another program always uses the latest version)
If all the above things succeed, commit the transaction, if not, rollback.
If two transactions begin almost simultaneously, it is possible that before the first transaction A commits what it has inserted into the table (step 4), the second transaction B has already performed its SELECT (step 2), whose result does not yet contain the item inserted by A (not yet committed by A, so not visible to B). In this case, when A finishes, it will have correctly dumped a file File1 containing its inserted item. Later, when B finishes, it will have dumped another file File2 containing only its own inserted item, but not the one inserted by A. Since File2 is more recent, we will use File2. The problem is that File2 doesn't contain the item inserted by A, even though that item is indeed in the DB.
I would like to know whether it is feasible to solve this problem by locking reads (SELECT) of the table while a transaction is inserting into it, until that transaction commits or rolls back, and if so, how this locking can be implemented in Spring with Oracle as the DB.
You need some sort of synchronization between the transactions:
start the transaction
Obtain a lock to prevent the transaction in another session from proceeding, or wait until the transaction in the other session finishes
INSERT an item into a table
SELECT ......
......
Commit and release the lock
The easiest way is to use the LOCK TABLE command, at least in SHARE mode (SHARE ROW EXCLUSIVE or EXCLUSIVE mode could also be used, but they are too restrictive for this case).
The advantage of this approach is that the lock is automatically released at commit or rollback.
The disadvantage is that this lock can interfere with other transactions in the system that update this table at the same time, and could reduce overall performance.
Another approach is to use the DBMS_LOCK package. This lock doesn't affect other transactions that don't explicitly use it. The drawback is that this package is difficult to use: the lock is not released on commit or rollback, you must explicitly release it at the end of the transaction, and thus all exceptions must be carefully handled, otherwise a deadlock could easily occur.
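If you do go the DBMS_LOCK route from JDBC, a sketch might look like this (assuming an open java.sql.Connection con and the usual java.sql imports; the lock name is arbitrary and error handling is elided). Note that DBMS_LOCK.ALLOCATE_UNIQUE itself issues a commit, so allocate the handle before starting the transactional work:

// allocate a handle for a named lock (ALLOCATE_UNIQUE commits internally)
String handle;
try (CallableStatement cs = con.prepareCall(
        "BEGIN DBMS_LOCK.ALLOCATE_UNIQUE(?, ?); END;")) {
    cs.setString(1, "my_app_lock");            // arbitrary application lock name
    cs.registerOutParameter(2, Types.VARCHAR);
    cs.execute();
    handle = cs.getString(2);
}
// request the lock in exclusive mode with a 10-second timeout (0 = success)
try (CallableStatement cs = con.prepareCall(
        "BEGIN ? := DBMS_LOCK.REQUEST(?, DBMS_LOCK.X_MODE, 10); END;")) {
    cs.registerOutParameter(1, Types.INTEGER);
    cs.setString(2, handle);
    cs.execute();
    if (cs.getInt(1) != 0) throw new SQLException("lock request returned " + cs.getInt(1));
}
try {
    // INSERT the item, SELECT all items, dump the file, commit
} finally {
    // must release explicitly: by default the lock survives commit and rollback
    try (CallableStatement cs = con.prepareCall(
            "BEGIN ? := DBMS_LOCK.RELEASE(?); END;")) {
        cs.registerOutParameter(1, Types.INTEGER);
        cs.setString(2, handle);
        cs.execute();
    }
}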
One more solution is to create a "dummy" table with a single row in it, for example:
CREATE TABLE my_special_lock_table(
  x INT
);
INSERT INTO my_special_lock_table VALUES(1);
COMMIT;
and then use SELECT x FROM my_special_lock_table FOR UPDATE
or - even easier - simply UPDATE my_special_lock_table SET x=x in your transaction.
This will place an exclusive lock on a row in this table and synchronize only this one transaction.
A drawback is that another "dummy" table must be created.
But this solution doesn't affect the other transactions in the system, the lock is automatically released upon commit or rollback, and it is portable - it should work in all other databases, not only in Oracle.
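A minimal JDBC sketch of the dummy-table variant (assuming an open Connection con; the second transaction blocks on the FOR UPDATE until the first commits or rolls back):

con.setAutoCommit(false);
try (Statement st = con.createStatement()) {
    // serializes concurrent transactions: whoever gets the row lock first wins
    st.executeQuery("SELECT x FROM my_special_lock_table FOR UPDATE");
    // ... INSERT the item, SELECT all items, dump the file ...
    con.commit();   // releases the row lock
} catch (SQLException e) {
    con.rollback(); // also releases the row lock
    throw e;
}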
Use Spring's REPEATABLE_READ or SERIALIZABLE isolation levels:
REPEATABLE_READ A constant indicating that dirty reads and
non-repeatable reads are prevented; phantom reads can occur. This
level prohibits a transaction from reading a row with uncommitted
changes in it, and it also prohibits the situation where one
transaction reads a row, a second transaction alters the row, and the
first transaction rereads the row, getting different values the second
time (a "non-repeatable read").
SERIALIZABLE A constant indicating that dirty reads, non-repeatable
reads and phantom reads are prevented. This level includes the
prohibitions in ISOLATION_REPEATABLE_READ and further prohibits the
situation where one transaction reads all rows that satisfy a WHERE
condition, a second transaction inserts a row that satisfies that
WHERE condition, and the first transaction rereads for the same
condition, retrieving the additional "phantom" row in the second read.
With serializable or repeatable read, the group of statements is protected from non-repeatable reads:
connection 1:                              connection 2:
set transaction isolation level
  repeatable read
begin transaction
select name from users where id = 1
                                           update users set name = 'Bill'
                                             where id = 1   (blocks here)
select name from users where id = 1
commit transaction
                                           (update executes here)
In this scenario, the update will block until the first transaction is complete.
Higher isolation levels are rarely used because they reduce the number of sessions that can work in the database at the same time. At the highest level, SERIALIZABLE, a reporting query halts any update activity.
I think you need to serialize the whole transaction. While a SELECT ... FOR UPDATE could work, it does not really buy you anything, since you would be selecting all rows. You may as well just take and release a lock using DBMS_LOCK.
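For reference, the isolation level is requested per connection in plain JDBC, or declaratively in Spring; a sketch (the method name is illustrative). Note that Oracle itself only implements READ COMMITTED and SERIALIZABLE, so a REPEATABLE_READ request will not work there:

// plain JDBC
con.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);

// Spring, declaratively
@Transactional(isolation = Isolation.SERIALIZABLE)
public void insertAndDump() {
    // INSERT the item, SELECT all items, write the file
}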

Reset cursor position after ResultSet updateRow

For some reason, the Oracle JDBC driver may move the ResultSet cursor somewhere else (not to the same row, and not to the next row) when updateRow is called (I am also inserting and deleting rows). How can I avoid this problem?
Note: The results are ordered by the primary key of the table (I've specified this in the SQL). But I'm increasingly suspecting that the "order by" clause is not working properly.
Nowhere does the documentation say that after an updateRow operation the cursor is moved to the next row...
Sources:
http://docs.oracle.com/cd/B28359_01/java.111/b31224/resltset.htm
http://docs.oracle.com/javase/tutorial/jdbc/basics/retrieving.html#rs_update
http://docs.oracle.com/cd/A97335_02/apps.102/a83724/resltse4.htm
http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#updateRow%28%29
Forward-only updatable result sets maintain a cursor which can only move in one direction (forward), and can also update rows. They have to be created with concurrency mode ResultSet.CONCUR_UPDATABLE and type ResultSet.TYPE_FORWARD_ONLY.
Note: The default type is ResultSet.TYPE_FORWARD_ONLY.
Visibility of changes
After an update or delete is made on a forward only result set, the result set's cursor is no longer on the row just updated or deleted, but immediately before the next row in the result set (it is necessary to move to the next row before any further row operations are allowed). This means that changes made by ResultSet.updateRow() and ResultSet.deleteRow() are never visible.
If a row has been inserted, e.g. using ResultSet.insertRow(), it may be visible in a forward-only result set.
Conflicting operations
The current row of the result set cannot be changed by other transactions, since it will be locked with an update lock. Result sets held open after a commit have to move to the next row before allowing any operations on it.
Some conflicts may prevent the result set from doing updates/deletes:
If the current row is deleted by a statement in the same transaction, calls to ResultSet.updateRow() will cause an exception, since the cursor is no longer positioned on a valid row.
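For completeness, an updatable result set is requested when the Statement is created; a minimal sketch with an illustrative table (assuming an open Connection con):

try (Statement st = con.createStatement(
         ResultSet.TYPE_FORWARD_ONLY,     // the default type
         ResultSet.CONCUR_UPDATABLE);
     ResultSet rs = st.executeQuery("SELECT id, name FROM users")) {
    while (rs.next()) {                   // always reposition explicitly with next()
        if ("old".equals(rs.getString("name"))) {
            rs.updateString("name", "new");
            rs.updateRow();               // don't rely on the cursor position after this
        }
    }
}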
This problem was due to user error. I had another copy of the application running on a different machine, which I had forgotten about, and that was changing the database at the same time.

JDBC batch creation in Sybase

I have a requirement to update a table which has about 5 million rows.
For that purpose I want to create batch statements in Java and update as a bulk operation.
Right now I have 100 batches and it works fine. But when I increase the number of batches over a hundred I get an exception: com.sybase.jdbc2.jdbc.SybBatchUpdateException: JZ0BE: BatchUpdateException: Error occurred while executing batch statement: Message empty.
How can I have more batch statements in my CallableStatement object?
Not enough reputation to leave comments... but what types of statements are you batching? How many of these rows are you updating? Does the table have a primary key? How many columns are in the table, and how many of those columns are you updating?
Generic answer:
The JDBC framework in Sybase is extremely fast. You might at least consider writing a simple procedure that receives the primary key (or other) information you're using to identify the row, along with the new values the row will be updated to, as input variables. This procedure will update a single row only.
Wrap this procedure in its own Java method that handles the CallableStatement, registers your out error-number and error-message params, etc. (see the sketch below).
Then you can loop through whatever constructs you're using now to update data, and use the same Java method to call the procedure to update the values row by row.
Again, I don't know the volume of what you're trying to do... but I do know that if you're doing single-row updates, this will be VERY fast.
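A sketch of that wrapper (the procedure name, its parameters, and the Row holder class are all hypothetical; assumes an open Connection con):

// assumes a procedure like:
//   update_row(p_id IN INT, p_value IN VARCHAR2,
//              p_err_no OUT INT, p_err_msg OUT VARCHAR2)
try (CallableStatement cs = con.prepareCall("{call update_row(?, ?, ?, ?)}")) {
    cs.registerOutParameter(3, Types.INTEGER);
    cs.registerOutParameter(4, Types.VARCHAR);
    for (Row r : rowsToUpdate) {          // whatever construct supplies your data
        cs.setInt(1, r.id);
        cs.setString(2, r.value);
        cs.execute();
        if (cs.getInt(3) != 0) {
            System.err.println("row " + r.id + " failed: " + cs.getString(4));
        }
    }
}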

Finding all statements involved in a deadlock from an Oracle trace file?

As I understand it, the typical case of a deadlock involving row locking requires four SQL statements: two in one transaction to update row A and row B, and then a further two in a separate transaction to update the same rows, requiring the same locks, but in the reverse order.
Transaction 1 gets the lock on row A before transaction 2 can request it, transaction 2 gets the lock on row B before transaction 1 can get it, and neither can get the remaining required locks. One or the other transaction has to be rolled back so the other can complete.
When I review an Oracle trace file after a deadlock, it only seems to highlight two queries. These seem to be the last one out of each transaction.
How can I identify the other statements involved in each transaction, or is this missing in an Oracle trace file?
I can include relevant bits of the specific trace file if required.
You're correct, in a typical row-level deadlock, you'll have session 1 execute sql_a that will lock row 1. Then session 2 will execute sql_b that will lock row 2. Then session 1 will execute sql_c to attempt to lock row 2, but session 2 has not committed, and so session 1 starts waiting. Finally, session 2 comes along, and it issues sql_d, attempting to lock row 1, but, since session 1 holds that lock, it starts waiting. Three seconds later, the deadlock is detected, and one of the sessions will catch ORA-00060 and the trace file is written.
In this scenario, the trace file will contain sql_c and sql_d, but not sql_a or sql_b.
The problem is that that information just really isn't available anywhere. Consider what happens when you execute a DML statement: it starts a transaction if one doesn't exist, generates a bunch of undo and redo, and the change is made. But once that happens, the session is no longer associated with that SQL statement. There's really no clean way to go back and find that information.
sql_c and sql_d, on the other hand, are the statements that were associated with those sessions when the deadlock occurred, so, clearly, Oracle can identify them, and include that in the trace file.
So, you're correct, the information about sql_a and sql_b is not in the trace, and it's really not readily available.
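If you want to reproduce the four-statement pattern for testing, two threads updating the same two rows in opposite order will do it; a sketch with made-up table and column names, assuming a javax.sql.DataSource ds (one of the two sessions will get ORA-00060 about three seconds after the cycle forms):

static void updateTwoRows(DataSource ds, int first, int second) throws Exception {
    try (Connection con = ds.getConnection()) {
        con.setAutoCommit(false);
        try (PreparedStatement ps = con.prepareStatement(
                "UPDATE accounts SET val = val + 1 WHERE id = ?")) {
            ps.setInt(1, first);
            ps.executeUpdate();            // sql_a / sql_b: locks row `first`
            Thread.sleep(1000);            // widen the window
            ps.setInt(1, second);
            ps.executeUpdate();            // sql_c / sql_d: waits on the other session
        }
        con.commit();
    }
}
// session 1: updateTwoRows(ds, 1, 2);
// session 2 (another thread): updateTwoRows(ds, 2, 1);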
Hope that helps.

Select from table in Oracle with multiple sessions

I have multiple threads processing rows from the same table, which is, in fact, a queue.
I want each row to be handled by one thread only. So I've added a column, "IsInProccess", and to the threads' SELECT statement I've added "WHERE IsInProccess = 0".
In addition I use SELECT FOR UPDATE, so after a thread gets a row from the table, no other thread will get it before it puts 1 in "IsInProccess".
The problem is that I have many threads, and the following scenario happens frequently:
Thread A selects with SELECT FOR UPDATE from the table and gets row no. 1. Before it changes IsInProccess to 1, thread B selects from the table in the same way and gets row no. 1 too. Oracle has reserved row no. 1 for thread A's session, so thread B can't change the row and an error is returned: "Fail to be fetched".
I want that, when a thread selects from the table, Oracle returns only rows that are not reserved by another open session.
Can I do that?
This is a sketch of a solution that I've seen used very successfully before:
Use the SELECT FOR UPDATE NOWAIT syntax so that if the session cannot get a lock on a row immediately it raises an exception instead of waiting for the lock. The exception handler could wait a few seconds (e.g. with dbms_lock.sleep); the whole block could be wrapped in a loop which tries again a certain number of times before it gives up.
Add WHERE ROWNUM<=n to the query so that it only tries to get a certain number of rows (e.g. 1) at a time; if it succeeds, update the rows as "in process".
A good way to mark rows as "in process" (that I've seen used successfully) is to have two columns - SID and SERIAL# - and update these columns with the SID and SERIAL# for the current session.
In case a session fails while it has the rows marked as "in process", another process could "clean up" the rows that were marked as "in process" by searching for any rows that have SID/SERIAL# that are not found as active sessions in v$session.
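That cleanup boils down to a single statement along these lines (the queue table name is hypothetical; reading v$session requires the appropriate privilege; assumes an open Connection con):

// release rows whose owning session no longer exists
try (Statement st = con.createStatement()) {
    int reclaimed = st.executeUpdate(
        "UPDATE work_queue q SET q.sid = NULL, q.serial# = NULL " +
        "WHERE q.sid IS NOT NULL AND NOT EXISTS " +
        "  (SELECT 1 FROM v$session s " +
        "   WHERE s.sid = q.sid AND s.serial# = q.serial#)");
    System.out.println("reclaimed " + reclaimed + " orphaned row(s)");
}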
Oracle has already solved this for you: use the Advanced Queuing API.
If you have 11g, look at SKIP LOCKED
It is there, but undocumented (and therefore unsupported and maybe buggy) in 10g.
That way, when Session A locks the row, Session B can skip it and process the next.
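A rough JDBC sketch of the hand-rolled version (the table name is hypothetical; the column name follows the question). Note that combining this with ROWNUM to grab exactly one row is trickier than it looks, because ROWNUM is applied before the lock test, which is one more reason to prefer the AQ API:

con.setAutoCommit(false);
try (Statement st = con.createStatement();
     ResultSet rs = st.executeQuery(
         "SELECT id FROM work_queue " +
         "WHERE IsInProccess = 0 FOR UPDATE SKIP LOCKED")) {
    // rows already locked by other sessions are silently skipped
    while (rs.next()) {
        long id = rs.getLong(1);
        // process the row, mark it with a separate UPDATE ... SET IsInProccess = 1
    }
}
con.commit();  // releases the locks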
One solution:
Put the select and update queries into a transaction. This sort of problem is exactly why transactions were invented.
You also need to worry about "orphan" rows - e.g. a thread picks up a row and then dies without finishing the work. One solution is to have 2 columns: "IsInProcess" and "StartprocessingTime".
isInProcess will have 3 values: 0 (not processed), 1 (picked up), 2 (done).
The original transaction will set the row's isInProcess to 1 and StartprocessingTime to the current time, and the SELECT will also add this to the WHERE clause (assuming you can specify a valid timeout period):
"WHERE isInProcess = 0 OR (isInProcess = 1 AND StartprocessingTime < getdate()-timeout)".
Please note that the syntax above is not Oracle, just pseudo-code.
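In Oracle syntax, with a 10-minute timeout for example, that condition would come out roughly as:
WHERE isInProcess = 0
   OR (isInProcess = 1 AND StartprocessingTime < SYSDATE - 10/1440)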
