How to handle binary (BLOB) data in a Spring Batch application - Oracle

My requirement is to read data from a database, aggregate it, convert it to bytes, and then stream it to another database (Oracle) into a BLOB column.
Oracle requires JDBC autocommit to be disabled in order to stream to a BLOB column, and Connection#commit to be called when finished.
I currently have 3 steps.
Step 1 (Tasklet):
It has two SQL queries: one to initialize the column (UPDATE DATABASEUSER.TABLENAME SET payload = empty_blob() WHERE PrimaryKey = ?),
and the second returns the BLOB locator (SELECT payload AS payload FROM DATABASEUSER.TABLENAME WHERE PrimaryKey = ? FOR UPDATE).
I also get the connection object from the DataSource to disable autocommit.
Step 2 (Chunk):
I have an ItemReader that reads data from the source DB in a generic way and a Processor that converts the rows to CSV format, but as bytes. Then I have a custom ItemWriter to stream the data to the BLOB column.
Step 3 (Tasklet):
This is where I clean up and commit the connection.
Question 1: Is this the correct strategy? I'd appreciate any direction, as I'm kinda unsure.

I solved it.
I used ResourcelessTransactionManager as the transaction manager in all my steps. In step 1 I get a connection from the DataSource and disable autocommit, and I call commit in the final step. I use the same connection in all steps.
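A minimal sketch of that approach, assuming Spring Batch 4.x interfaces (the pre-Chunk ItemWriter signature); the SharedConnection holder, the primary-key value, and the table/column names are illustrative assumptions, not the exact code used:

import java.sql.Blob;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import javax.sql.DataSource;
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.repeat.RepeatStatus;

// Hypothetical singleton bean so all three steps share one JDBC connection.
class SharedConnection {
    private final AtomicReference<Connection> ref = new AtomicReference<>();
    void set(Connection c) { ref.set(c); }
    Connection get() { return ref.get(); }
}

// Step 1: open a connection, disable autocommit, initialize the BLOB column.
class OpenBlobTasklet implements Tasklet {
    private final DataSource dataSource;
    private final SharedConnection shared;

    OpenBlobTasklet(DataSource dataSource, SharedConnection shared) {
        this.dataSource = dataSource;
        this.shared = shared;
    }

    public RepeatStatus execute(StepContribution contribution, ChunkContext context) throws Exception {
        Connection con = dataSource.getConnection();
        con.setAutoCommit(false); // required by Oracle to stream into the BLOB
        try (PreparedStatement ps = con.prepareStatement(
                "UPDATE DATABASEUSER.TABLENAME SET payload = empty_blob() WHERE PrimaryKey = ?")) {
            ps.setLong(1, 42L); // hypothetical key
            ps.executeUpdate();
        }
        shared.set(con); // later steps reuse this connection
        return RepeatStatus.FINISHED;
    }
}

// Step 2 writer: select the locator FOR UPDATE on the same connection and append each chunk's bytes.
class BlobStreamingItemWriter implements ItemWriter<byte[]> {
    private final SharedConnection shared;

    BlobStreamingItemWriter(SharedConnection shared) {
        this.shared = shared;
    }

    public void write(List<? extends byte[]> items) throws Exception {
        Connection con = shared.get();
        try (PreparedStatement ps = con.prepareStatement(
                "SELECT payload FROM DATABASEUSER.TABLENAME WHERE PrimaryKey = ? FOR UPDATE")) {
            ps.setLong(1, 42L);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                Blob blob = rs.getBlob(1);
                long pos = blob.length() + 1; // append after existing content
                for (byte[] bytes : items) {
                    blob.setBytes(pos, bytes);
                    pos += bytes.length;
                }
            }
        }
    }
}

// Step 3: commit the streamed data and release the connection.
class CommitBlobTasklet implements Tasklet {
    private final SharedConnection shared;

    CommitBlobTasklet(SharedConnection shared) {
        this.shared = shared;
    }

    public RepeatStatus execute(StepContribution contribution, ChunkContext context) throws Exception {
        try (Connection con = shared.get()) {
            con.commit();
        }
        return RepeatStatus.FINISHED;
    }
}

ResourcelessTransactionManager keeps Spring Batch from wrapping each chunk in its own database transaction, so the manually managed connection stays in control until the final commit.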

Related

Failure loading parquet in Synapse Analytics - INT mapped as UTF8

We have an on-premises Oracle database from which we need to extract data and store it in a Synapse dedicated pool. I have created a Synapse pipeline which first copies the data from Oracle to a data lake in a parquet file, which should then be imported into Synapse using a second copy task.
The data from Oracle is extracted through a dynamically created query. This query has 2 hard-coded INT values which are generated at runtime. The query runs fine and the parquet file is created correctly, but if I use PolyBase or the COPY command to import the file into Synapse, it fails with the following error:
"errorCode": "2200",
"message": "ErrorCode=UserErrorSqlDWCopyCommandError,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=SQL DW Copy Command operation failed with error 'HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: ClassCastException: ',Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Data.SqlClient.SqlException,Message=HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: ClassCastException: ,Source=.Net SqlClient Data Provider,SqlErrorNumber=106000,Class=16,ErrorCode=-2146232060,State=1,Errors=[{Class=16,Number=106000,State=1,Message=HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: ClassCastException: ,},],'",
Bulk insert works but is less efficient on large quantities of data so I don't want to use that.
The mapping for the copy activities is created dynamically based on the target database table definition. However, when I created a separate copy task and imported the mapping to check what is going on, I noticed that the 2 INT columns are mapped as UTF8 on the parquet source side, while the sink table columns are INT32. When I exclude both columns the copy task completes successfully. It seems that the copy activity fails because it cannot implicitly cast a string to an integer.
The 2 columns are explicitly cast as integers in the Oracle query that is the source for the parquet file.
SELECT t.*
, CAST(419 AS INT) AS "Execution_id"
, CAST(4832 AS INT) AS "Task_id"
, TO_DATE('2022-07-05 14:40:34', 'YYYY-MM-DD HH24:MI:SS') AS "ProcessedDTS"
, t.DEMUTDT AS "EffectiveDTS"
FROM CBO.DRKASTR t
WHERE DEMUTDT >= TO_DATE('2022-07-05 13:37:35', 'YYYY-MM-DD HH24:MI:SS');
Adding an explicit Oracle-to-parquet mapping that maps them as INT also doesn't solve the problem.
How do I prevent these 2 columns from being interpreted as strings instead of integers?!
We ended up resolving this by first importing the data as strings in the database and casting to the correct data type during further processing.

Why Update using @Query in Spring Data JPA requires @Transactional

I am new to Spring Boot. I use Spring Data JPA to deal with the database. I have a method to update a table in the database using @Query. But when I try to update I get an InvalidDataAccessApiUsageException. When I tried it with @Transactional it gets updated successfully. Aren't updates a single operation, so wouldn't they get committed automatically?
There are 2 ways in which transactions execute in SQL:
Implicit -> the database, while running a write query (UPDATE, INSERT, ...), creates an isolated transaction on its own and then executes. If an error occurs, that isolation is discarded and no change is written.
Explicit -> here you explicitly mark the start of the transaction with BEGIN, discard it with ROLLBACK, and finally write it with COMMIT.
Early versions of Postgres (< 7.4) had a configuration setting called AUTOCOMMIT which, when set off, would disable implicit transactions. This was removed in 2003, since databases were smart enough to discard failed isolations and not create inconsistencies.
In a nutshell, at any point, running the following queries
UPDATE table_name SET column = value WHERE id IN (....)
or
BEGIN
UPDATE table_name SET column = value WHERE id IN (....)
COMMIT
are EXACTLY the same.
In JPA autocommit is now just a runtime validation for write queries.
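For the Spring Data JPA side, a typical modifying query looks like the sketch below (the Employee entity and its fields are made up). The @Modifying query must run inside a transaction, which @Transactional provides; without it, the update fails with an InvalidDataAccessApiUsageException wrapping a TransactionRequiredException:

import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Modifying;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;
import org.springframework.transaction.annotation.Transactional;

// Employee is a hypothetical JPA entity with id and status fields.
public interface EmployeeRepository extends JpaRepository<Employee, Long> {

    @Transactional // opens the transaction the modifying query needs
    @Modifying
    @Query("UPDATE Employee e SET e.status = :status WHERE e.id = :id")
    int updateStatus(@Param("id") Long id, @Param("status") String status);
}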

Update table on Spring Batch Failure

I am using Spring Batch to update employee status based on input received from a third-party API. Can anyone help me with how to update the status of an employee in the EMPLOYEE table if a step fails with some exception, and write the overall job status of FAILED to my own table instead of the Spring Batch tables?
You can proceed in two steps:
Step 1 (tasklet): make the REST call and save the result in a file (remove the file after the job if necessary).
Step 2 (chunk-oriented): read employee items and update their statuses in the database.
For the writer, you can use a JdbcBatchItemWriter configured with a SQL statement like: update employee set status = ? where id = ?.
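A hedged sketch of that writer configuration (the Employee item type, its getters, and the table/column names are assumptions):

import javax.sql.DataSource;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class WriterConfig {

    @Bean
    public JdbcBatchItemWriter<Employee> employeeStatusWriter(DataSource dataSource) {
        // One UPDATE per item; all updates in a chunk are committed together.
        return new JdbcBatchItemWriterBuilder<Employee>()
                .dataSource(dataSource)
                .sql("update employee set status = ? where id = ?")
                .itemPreparedStatementSetter((item, ps) -> {
                    ps.setString(1, item.getStatus());
                    ps.setLong(2, item.getId());
                })
                .build();
    }
}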
As per the step failure question: if any exception occurs during the processing of a chunk, the transaction will be rolled back and no updates will be committed to the database. More details about this can be found in the reference documentation.
Hope this helps.

Oracle: two different sessions

In our work we create two .NET listeners.
First one:
calls an Oracle stored procedure that inserts a bulk of data into a table (table1) using insert-into-select syntax:
insert into table1 select c1,c2... from tbl2 inner join tbl3....
then we use an explicit commit;
Second listener:
calls an Oracle procedure that reads the data inserted into table1 via listener 1.
But we notice that even after the record is inserted into table1, listener 2 couldn't see that record at the same time, even though commit is used.
My question is: how does commit work when we use insert ... select?
Is this issue related to the session? When listener 1's session ends, can listener 2 read the data?
Please help,
thanks in advance.
You're using the wrong terms...
A listener is a server application that listens for incoming client requests and hands them to the DB engine. A listener is not used on the client end.
A session is not related to the data you can see; a transaction is the object that controls that.
Oracle works in a very clear way: after a transaction has committed, all new transactions can see its changes, and already existing transactions can see the new content depending on their transaction configuration.
I recommend reading about isolation levels in that context: http://msdn.microsoft.com/en-us/library/system.transactions.isolationlevel(v=vs.110).aspx
By default, the moment a transaction has been committed (in the DB this is defined by the SCN), the data is visible to the client.
Bottom line: your issue is related either to transaction isolation levels (in case the reading transaction started before the commit), or to the writer, which does not commit the data when you think it does (a transaction issue).
After the call to transaction.Commit() in .NET returns, the data is already visible and other transactions can see it.
Your second question was how commit works.
This is a very complicated process in Oracle, so I'll give a really short description:
1. When you commit, Oracle first runs some verifications before the commit itself (for example, runs the deferred constraints).
2. After Oracle knows it can safely commit the changes, it gets the system change number (SCN), writes the commit itself to the redo log, and flushes it to disk (for consistency).
3. Sends an ACK to the user that the data is now visible to the world.
4. Marks the buffers that were used as free.
Something I want to add, just to make sure (I'm writing this half asleep, so excuse me if it does not compile...):
Your .NET code should be logically equivalent to this:
OracleConnection con = new OracleConnection(connStr);
con.Open();
OracleTransaction trans = con.BeginTransaction();
OracleCommand cmd = con.CreateCommand();
cmd.Connection = con;              // the command runs on this connection (and its open transaction)
cmd.CommandText = "insert into ...";
cmd.ExecuteNonQuery();
cmd.Dispose();
trans.Commit();                    // only after this do other sessions see the inserted rows
trans.Dispose();
con.Close();
con.Dispose();
And if you're using LINQ, make sure you create the transaction scope in the right area.

JDBC batch creation in Sybase

I have a requirement to update a table which has about 5 million rows.
For that purpose I want to create batch statements in Java and update as a bulk operation.
Right now I have 100 batches and it works fine. But when I increase the number of batches over a hundred, I get an exception: com.sybase.jdbc2.jdbc.SybBatchUpdateException: JZ0BE: BatchUpdateException: Error occurred while executing batch statement: Message empty.
How can I have more batch statements in my CallableStatement object?
Not enough reputation to leave comments... but what types of statements are you batching? How many of these rows are you updating? Does the table have a primary key? How many columns are in the table, and how many of those columns are you updating?
Generic answer:
The JDBC framework in Sybase is extremely fast. You might at least consider writing a simple procedure that receives, as input variables, the primary key (or other) information you're using to identify the row, along with the new values that row will be updated to. This procedure will update a single row only.
Wrap this procedure in its own Java method that handles the CallableStatement, registers your OUT error number and error message params, etc.
Then you can loop through whatever constructs you're using now to update data, and use the same Java method to call the procedure to update the values row by row.
Again, I don't know the volume of what you're trying to do, but I do know that if you're doing single-row updates, this will be VERY fast.
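A rough sketch of that wrapper method (the procedure name, its parameters, and the OUT error params are assumptions, not an existing API):

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Types;

// Calls a hypothetical single-row update procedure once per row.
public final class SingleRowUpdater {

    public static void updateRow(Connection con, long id, String newValue) throws SQLException {
        try (CallableStatement cs = con.prepareCall("{call update_one_row(?, ?, ?, ?)}")) {
            cs.setLong(1, id);                          // primary key identifying the row
            cs.setString(2, newValue);                  // new value to write
            cs.registerOutParameter(3, Types.INTEGER);  // OUT: error number
            cs.registerOutParameter(4, Types.VARCHAR);  // OUT: error message
            cs.execute();
            if (cs.getInt(3) != 0) {
                throw new SQLException("Procedure failed: " + cs.getString(4));
            }
        }
    }
}

The caller loops over the rows to change and invokes updateRow once per row, committing however often fits the workload.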
