Spring int-jdbc:inbound-channel-adapter transaction

I have gone through this link: spring integration jdbc adapter for multiple nodes, which is quite helpful. I have a doubt about the point below.
I have a multi-threaded environment (multiple nodes) where a select query has n eligible rows, but I have configured max-rows-per-poll=5, followed by an update for those 5 records.
The poller is configured with a transaction.
While these 5 records are being processed by one thread on one node, will all the other threads wait, or will they each pick 5 records from the remaining n-5 records and process them?
I am using int-jdbc:inbound-channel-adapter and an Oracle database.

You need to read about the difference between max-messages-per-poll and max-rows: https://docs.spring.io/spring-integration/docs/5.0.7.RELEASE/reference/html/jdbc.html#jdbc-max-rows-per-poll-versus-max-messages-per-poll.
Also, for Oracle I would recommend FOR UPDATE SKIP LOCKED if you really want each poll to pick up new records rather than wait for rows that are already locked.
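A minimal sketch of that advice using the Java DSL, assuming a hypothetical work_item table with id, payload, and status columns (the XML namespace equivalent would set the same query and max-rows-per-poll attributes). With SKIP LOCKED, each node claims its own rows instead of blocking on rows another node has locked:

import javax.sql.DataSource;
import org.springframework.context.annotation.Bean;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;
import org.springframework.integration.dsl.Pollers;
import org.springframework.integration.jdbc.JdbcPollingChannelAdapter;
import org.springframework.transaction.PlatformTransactionManager;

@Bean
public IntegrationFlow pollingFlow(DataSource dataSource, PlatformTransactionManager txManager) {
    JdbcPollingChannelAdapter adapter = new JdbcPollingChannelAdapter(dataSource,
            "SELECT id, payload FROM work_item WHERE status = 'NEW' FOR UPDATE SKIP LOCKED");
    adapter.setMaxRowsPerPoll(5); // equivalent of max-rows-per-poll in the XML namespace
    adapter.setUpdateSql("UPDATE work_item SET status = 'IN_PROGRESS' WHERE id IN (:id)");
    return IntegrationFlows
            .from(adapter, e -> e.poller(Pollers.fixedDelay(5000)
                    .transactional(txManager))) // select and update run in one transaction
            .channel("claimedWorkItems")
            .get();
}

With this setup, a second node polling concurrently simply skips the 5 locked rows and claims the next unlocked ones, so the nodes process disjoint batches instead of waiting on each other.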

Related

What is the best approach when polling data from the DB and then querying the DB again to fetch additional information?

The Spring Boot application that I am working on:
polls 1000 messages from table X (table X is populated by another service, s1);
for each message, gets the account number and queries table Y for additional information about the account.
I am using Spring Integration to poll the messages from table X, and to read the additional account information I am planning to use Spring JDBC.
We are expecting about 10k messages every day.
Is the above approach, querying table Y for each message, a good approach?
No, it indeed is not. If all of that data is in the same database, consider writing a proper SELECT that joins those tables in a single query performed by that source polling channel adapter.
Another approach is to implement a stored procedure which will do that job for you and return all the needed data: https://docs.spring.io/spring-integration/reference/html/jdbc.html#stored-procedures.
Although if the memory needed to handle that number of records at once is a limit in your environment, or you don't care how fast all of them are processed, then an integration flow with parallel processing of the split polling result is indeed OK. For that goal you can use a JdbcOutboundGateway as a service in your flow instead of playing with a plain JdbcTemplate: https://docs.spring.io/spring-integration/reference/html/jdbc.html#jdbc-outbound-gateway
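A minimal sketch of the single-query option, with hypothetical table and column names; the join runs once per poll, so table Y is never queried per message:

import javax.sql.DataSource;
import org.springframework.context.annotation.Bean;
import org.springframework.integration.core.MessageSource;
import org.springframework.integration.jdbc.JdbcPollingChannelAdapter;

@Bean
public MessageSource<Object> enrichedSource(DataSource dataSource) {
    // One poll returns rows from X already enriched with the account data from Y
    JdbcPollingChannelAdapter adapter = new JdbcPollingChannelAdapter(dataSource,
            "SELECT x.id, x.account_no, y.account_name, y.account_type "
            + "FROM table_x x JOIN table_y y ON y.account_no = x.account_no "
            + "WHERE x.status = 'NEW'");
    // Mark the polled rows so the next poll does not pick them up again
    adapter.setUpdateSql("UPDATE table_x SET status = 'PROCESSED' WHERE id IN (:id)");
    return adapter;
}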

JdbcBatchItemWriterBuilder vs org.springframework.jdbc.core.JdbcTemplate.batchUpdate

I understand jdbcTemplate.batchUpdate is used for sending several records to the database in one communication.
Let's say I have 1000 records to be updated; instead of 1000 communications from the application to the database, the application will send the 1000 records in one request.
Coming to JdbcBatchItemWriterBuilder, it is a combination of tasks in a job.
My question is: if there are 1000 records to be processed (INSERT statements) via JdbcBatchItemWriterBuilder, are all INSERTs executed in one go, or one after another?
If one after another, does connecting to the database 1000 times using JdbcBatchItemWriterBuilder cause performance issues? How is that handled?
I would like to understand whether Spring Batch performs better than running 1000 INSERT statements using jdbcTemplate.update.
The JdbcBatchItemWriter uses java.sql.PreparedStatement#addBatch and java.sql.Statement#executeBatch internally (See https://github.com/spring-projects/spring-batch/blob/c4010fbffa6b71cbcfe79d523023251ce73666a4/spring-batch-infrastructure/src/main/java/org/springframework/batch/item/database/JdbcBatchItemWriter.java#L189-L195), so there will be a single batch insert for all items of the chunk.
Moreover, this will be executed in a single transaction as described in the Chunk-oriented Processing section of the reference documentation.
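For illustration, a minimal sketch of such a writer built with JdbcBatchItemWriterBuilder (the Person type, table, and column names are hypothetical). With a chunk size of 1000, all 1000 inserts of a chunk are flushed as one JDBC batch inside a single transaction:

import javax.sql.DataSource;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.context.annotation.Bean;

@Bean
public JdbcBatchItemWriter<Person> writer(DataSource dataSource) {
    return new JdbcBatchItemWriterBuilder<Person>()
            .dataSource(dataSource)
            .sql("INSERT INTO person (first_name, last_name) VALUES (:firstName, :lastName)")
            .beanMapped() // bind the named parameters from Person getters
            .build();
}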

How to lock on select and release the lock after the update is committed using Spring?

I have been using Spring for the last few months and I have a question about transactions. I have a Java method inside my Spring Batch job which first does a select operation to get the first 100 rows with status 'NOT COMPLETED' and then does an update on the selected rows to change the status to 'IN PROGRESS'. Since I'm processing around 10 million records, I want to run multiple instances of my batch job, and each instance has multiple threads.
For a single instance, to make sure two threads do not fetch the same set of records, I have made my method synchronized. But if I run multiple instances of the batch job (multiple JVMs), there is a high probability that the same set of records will be fetched by both instances even if I use an "optimistic" or "pessimistic" lock or "select for update", since we cannot lock records during selection. An example is shown below: transaction 1 has fetched 100 records and meanwhile transaction 2 has also fetched 100 records; if I enable locking, transaction 2 waits until transaction 1 has updated and committed, but transaction 2 then performs the same update again.
Is there any way in Spring to make transaction 2's select operation wait until transaction 1's select is completed?
Transaction 1             Transaction 2
fetch 100 records
                          fetch 100 records
update 100 records
commit
                          update 100 records
                          commit
@Transactional
public synchronized List<Student> processStudentRecords() {
    List<Student> students = getNotCompletedRecords();
    if (students != null && !students.isEmpty()) {
        updateStatusToInProgress(students);
    }
    return students;
}
Note: I cannot perform the update first and then the select. I would appreciate it if an alternative approach could be suggested.
Transaction synchronization should be left to the database server and not managed at the application level. From the database server's point of view, no matter how many JVMs (threads) you have, those are just concurrent database clients asking for read/write operations. You should not have to bother yourself with such concerns.
What you should do, though, is try to minimize contention as much as possible in the design of your solution, for example by using the (remote) partitioning technique.
if I run multiple instances of my batch job (multiple JVMs), there is a high probability that the same set of records will be fetched by both instances even if I use an "optimistic" or "pessimistic" lock or "select for update", since we cannot lock records during selection
Partitioning the data will, by design, remove all these problems. If you give each instance a set of data to work on, there is no chance that a worker would select the same set of records as another worker. Michael gave a detailed example in this answer: https://stackoverflow.com/a/54889092/5019386.
(Logical) partitioning, however, will not solve the contention problem, since all workers still read/write from/to the same table, but that is the nature of the problem you are trying to solve. What I'm saying is that you don't need to do the locking/unlocking of the table in your design; leave this to the database. Some database servers, like Oracle, can write data of the same table to different partitions on disk to optimize concurrent access (which might help if you use partitioning), but again, that is Oracle's business, not Spring's (or any other framework's).
Not everybody can afford Oracle, so I would look for a solution at the conceptual level. I have successfully applied the following "pseudo" physical partitioning to a problem similar to yours:
Step 1 (in serial): copy/partition the unprocessed data into temporary tables
Step 2 (in parallel): run multiple workers on these tables instead of on the source table with millions of rows
Step 3 (in serial): copy/update the processed data back to the original table
Step 2 removes the contention problem. Usually, the cost of (Step 1 + Step 3) is negligible compared to Step 2 (and even more negligible than if Step 2 were done in serial). This works well if the processing is the bottleneck.
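As a sketch of the partitioning idea in Spring Batch terms (the id column and range bounds are hypothetical), a Partitioner can hand each worker its own id range, so no two workers can ever select the same rows:

import java.util.HashMap;
import java.util.Map;
import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;

public class IdRangePartitioner implements Partitioner {

    private final long minId;
    private final long maxId;

    public IdRangePartitioner(long minId, long maxId) {
        this.minId = minId;
        this.maxId = maxId;
    }

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        // Split [minId, maxId] into gridSize disjoint ranges, one per worker step
        long rangeSize = (maxId - minId) / gridSize + 1;
        Map<String, ExecutionContext> partitions = new HashMap<>();
        for (int i = 0; i < gridSize; i++) {
            ExecutionContext context = new ExecutionContext();
            context.putLong("minId", minId + i * rangeSize);
            context.putLong("maxId", Math.min(maxId, minId + (i + 1) * rangeSize - 1));
            partitions.put("partition" + i, context);
        }
        return partitions;
    }
}

Each worker's reader then restricts its query with WHERE id BETWEEN #{stepExecutionContext['minId']} AND #{stepExecutionContext['maxId']}, so the "same records fetched twice" situation cannot occur by construction.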
Hope this helps.

Processing millions of records using Spring Batch, including pattern matching

My use case is as follows:
1) Read 20 million records from a Db2 database, and read the filter criteria from Db2 as well; the criteria involve multiple columns, and some of the columns have patterns such as column A with value %EMP%.
2) For each combination of the rules, filter the data in the 20M dataset and at the same time update a database column with a flag indicating that the record has been filtered out.
3) At the end of the process, invoke an Informatica workflow which will take the unfiltered records out of the 20M and process them.
We do not want to have the filtering logic in Informatica, as it would be expensive, so we are looking for an option to do it using Spring Batch, where we can spawn multiple threads and run the filtering logic.
I am not sure whether Spring Batch is the right candidate for this, but I need some suggestions if I am to implement this in Java.
Please suggest.
You should consider using Camel routes and Spring Boot.
You could use a Camel JPA consumer to place the records on an ActiveMQ queue, then use a JMS consumer with multiple concurrent consumers to process the records.
Use an aggregation strategy to invoke your Informatica workflow.
I haven't used Spring Batch, so I can't say whether it's a better solution, but Spring Boot and Camel are pretty sweet.
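A minimal sketch of that pipeline in the Camel Java DSL; the entity name, endpoint URIs, and filter logic are all hypothetical placeholders:

import org.apache.camel.builder.RouteBuilder;

public class FilterRoutes extends RouteBuilder {

    @Override
    public void configure() {
        // JPA consumer: poll batches of records and put them on a queue
        from("jpa:com.example.EmployeeRecord?consumeDelete=false&maximumResults=500")
            .to("activemq:queue:records");

        // Competing JMS consumers apply the pattern-matching rules in parallel
        from("activemq:queue:records?concurrentConsumers=10")
            .process(exchange -> {
                // apply the filter rules here and set the filtered-out flag
            })
            .to("jpa:com.example.EmployeeRecord");
    }
}

The queue decouples reading from filtering, so throughput scales roughly with the number of concurrent consumers rather than with the single reader.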

BPEL for data synchronization

I am trying to use Oracle SOA BPEL to sync data for about 1000 employees between an HR service and our local database. I get the IDs of all employees with a findEmp call and loop through them empCount times, calling getEmp(empID) on the same HR service and updating/inserting into our database in every loop. This times out after about 60-odd employees, even though this process is an asynchronous process. How should I redesign the process flow?
The timeout is occurring because you don't have any dehydration points in your BPEL code. Oracle BPEL needs to dehydrate before the Java transaction times out.
If you are using the Oracle BPEL DB Adapter, you can actually submit many objects at once for processing to the database: simply put more than one record in the element sent to the DB Adapter. This may help a lot, since you can fetch all your data at once and then write it all at once.
Additionally, you can extend the transaction timeout for Oracle BPEL; it is a configuration parameter in transaction-manager.xml (there are also some tweaks to the EJB timeouts you need to make on 10.1.3.3.x and 10.1.3.4.x). The Oracle BPEL docs tell you how to change this variable.
