I'm working on a Spring Batch application that was running into a DB2 deadlock while using a default JdbcCursorItemReader. We had set up a SkipListener so that when the batch job ran into an error, it would write an "Error" status to the relevant row, and that is when the deadlock occurred.
We found that by using a default JdbcPagingItemReader, we were able to avoid the deadlock scenario, though we aren't exactly sure why this is the case.
My understanding of Spring Batch is that either reader should have released its lock on the database once the ResultSet was read in from the query, but this didn't appear to be happening with the JdbcCursorItemReader.
Would anyone be able to help me understand why this is the case?
Thanks!
The JdbcCursorItemReader maintains a position (a cursor) within the database so it knows where to read from next, and that open cursor is what causes the database to hold a lock. The JdbcPagingItemReader, on the other hand, submits a separate query for each page, requesting data between a known start and end point, so it only reads the data between those two points and does not need to hold locks between calls.
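For illustration, here is a minimal sketch of a paging reader driven by bounded queries (the payment table, the Payment POJO and the bean names are my assumptions, not from the question):

    import java.util.Map;
    import javax.sql.DataSource;
    import org.springframework.batch.item.database.JdbcPagingItemReader;
    import org.springframework.batch.item.database.Order;
    import org.springframework.batch.item.database.builder.JdbcPagingItemReaderBuilder;
    import org.springframework.batch.item.database.support.Db2PagingQueryProvider;
    import org.springframework.context.annotation.Bean;
    import org.springframework.jdbc.core.BeanPropertyRowMapper;

    // Inside a @Configuration class. Each page is fetched with its own bounded query
    // (WHERE ... ORDER BY ... with a row limit), so no cursor has to stay open, and no
    // lock has to be held, between chunks.
    @Bean
    public JdbcPagingItemReader<Payment> paymentReader(DataSource dataSource) {
        Db2PagingQueryProvider queryProvider = new Db2PagingQueryProvider();
        queryProvider.setSelectClause("SELECT id, amount, status");
        queryProvider.setFromClause("FROM payment");
        queryProvider.setWhereClause("WHERE status = 'NEW'");
        queryProvider.setSortKeys(Map.of("id", Order.ASCENDING));

        return new JdbcPagingItemReaderBuilder<Payment>()
                .name("paymentReader")
                .dataSource(dataSource)
                .queryProvider(queryProvider)
                .pageSize(100)
                .rowMapper(new BeanPropertyRowMapper<>(Payment.class))
                .build();
    }

Because each page is a self-contained query, the SkipListener's "Error" update no longer has to compete with a lock held open by the reader's cursor, which is presumably why switching readers made the deadlock go away.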
Related
I need to read from one DB and, based on each result, fetch data from another DB that is on another server, and afterwards write it to a file. The solution that came to mind is to use a Spring Batch reader for reading from the first DB and then read from the second DB in the processor.
But my concern with this approach is that reading in the processor may not be a good idea, because the processor handles a single item at a time. (Please correct me if I am wrong.)
Is there any other way to do this so that we can perform this task efficiently?
Thanks in advance.
Please suggest what the options could be.
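For what it's worth, a rough sketch of the pattern described above (reader on DB #1, lookup on DB #2 in the processor, file writer at the end); all class, table and bean names here are hypothetical:

    import org.springframework.batch.item.ItemProcessor;
    import org.springframework.jdbc.core.JdbcTemplate;

    // Hypothetical enrichment processor: for each item read from the first DB,
    // look up extra data in the second DB via a JdbcTemplate bound to the second DataSource.
    public class CustomerEnrichmentProcessor implements ItemProcessor<Customer, EnrichedCustomer> {

        private final JdbcTemplate secondDbTemplate;

        public CustomerEnrichmentProcessor(JdbcTemplate secondDbTemplate) {
            this.secondDbTemplate = secondDbTemplate;
        }

        @Override
        public EnrichedCustomer process(Customer customer) {
            // One lookup per item; if this proves too slow, the lookups could instead be
            // batched in the writer or pre-loaded into a cache before the step runs.
            String address = secondDbTemplate.queryForObject(
                    "SELECT address FROM customer_address WHERE customer_id = ?",
                    String.class, customer.getId());
            return new EnrichedCustomer(customer, address);
        }
    }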
Using Spring Boot and JPA/Hibernate, I'm looking for a solution that prevents a table record from being read by another process while I'm reading and then updating an entity. The isolation levels (dirty read, non-repeatable read and phantom read) are not very clear to me. What I mean is: if process #1 starts a read/update, I don't want process #2 to be able to read the old value (before it is updated by #1) and then update the row with stale values.
Isolation levels all prevent reading changes, at different levels of strictness:
Dirty read -> reading changes that have not yet been committed
Non-repeatable read -> querying the same row a second time finds that its data has changed
Phantom read -> like the previous one, but instead of changed data it finds additional rows
The Serializable level, being the strictest, prevents reading any concurrent changes, essentially resulting in sequential processing in the DB (no concurrency), and would probably solve your problem.
What you are looking for, if I understood correctly, is to block the second process from doing any work until the row update is complete. That is called row locking, and it can be controlled directly as well (without setting serializable isolation).
See more about row locking with Spring JPA here: https://www.baeldung.com/java-jpa-transaction-locks
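For example (just a sketch; the Account entity and repository are my own names), Spring Data JPA lets you request a pessimistic row lock on the finder used by process #1:

    import java.util.Optional;
    import jakarta.persistence.LockModeType; // javax.persistence on older Spring Boot versions
    import org.springframework.data.jpa.repository.JpaRepository;
    import org.springframework.data.jpa.repository.Lock;

    // PESSIMISTIC_WRITE translates to a SELECT ... FOR UPDATE, so a second transaction
    // reading the same row through this method blocks until the first one commits its update.
    public interface AccountRepository extends JpaRepository<Account, Long> {

        @Lock(LockModeType.PESSIMISTIC_WRITE)
        Optional<Account> findWithLockById(Long id);
    }

The read-then-update has to happen inside one transaction (e.g. a @Transactional service method) for the lock to be held until the update commits.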
If it weren't a different process (a different program) but just a different thread within the same Java program, a simple synchronized block would do the trick as well.
Is there a way to redefine the database "transactional" boundary on a spring batch job?
Context:
We have a simple payment-processing job that reads x number of payment records, processes them, and marks the records in the database as processed. Currently, the writer makes a REST API call (to the payment gateway), processes the API response, and marks the records as processed. We're using a chunk-oriented approach, so the updates aren't flushed to the database until the whole chunk has completed. Since basically the whole read/write happens within a transaction, we are starting to see excessive database locks and contention. For example, if the API takes a long time to respond (say 30 seconds), the whole application starts to suffer.
We can obviously reduce the timeout for the API call to a smaller value, but that still doesn't solve the issue of the tables potentially being locked for longer than the desirable duration. Ideally, we want to keep the database transaction as short-lived as possible. Our thought is that if the "meat" of what the job does can be done outside of the database transaction, we could get around this issue. So, if the API call happens outside of a database transaction, we can afford to let it take a few more seconds to accept the response without causing or adding to the long lock duration.
Is this the right approach? If not, what would be the recommended way to approach this "simple" job in a Spring Batch fashion? Are there other batch tools better suited for the task (if Spring Batch is not the right choice)?
Open to providing more context if needed.
I don't have a precise answer to all your questions but I will try to give some guidelines.
Since basically the whole read/write happens within a transaction, we are starting to see excessive database locks and contention. For example, if the API takes a long time to respond (say 30 seconds), the whole application starts to suffer.
Since its inception, batch processing (processing data in "batches") has been based on the idea that a batch of records is treated as a unit: either all records are processed (whatever the term "process" means) or none of them is. This "all or nothing" semantic is exactly what Spring Batch implements in its chunk-oriented processing model. Achieving such a (powerful) property comes with trade-offs; in your case, you need to make a trade-off between consistency and responsiveness.
We can obviously reduce the timeout for the API call to a smaller value, but that still doesn't solve the issue of the tables potentially being locked for longer than the desirable duration.
The chunk size is the parameter with the most impact on the transaction behaviour. What you can do is try to reduce the number of records processed within a single transaction and see the result. There is no universally best value; this is an empirical process. It will also depend on the responsiveness of the API you are calling during the processing of a chunk.
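As an illustration (Spring Batch 5 style builder; with Batch 4 you would use StepBuilderFactory instead, and the Payment type and bean names are placeholders), the chunk size is simply the value passed to chunk():

    import org.springframework.batch.core.Step;
    import org.springframework.batch.core.repository.JobRepository;
    import org.springframework.batch.core.step.builder.StepBuilder;
    import org.springframework.batch.item.ItemReader;
    import org.springframework.batch.item.ItemWriter;
    import org.springframework.context.annotation.Bean;
    import org.springframework.transaction.PlatformTransactionManager;

    // Inside a @Configuration class. The chunk size (here 100) is the number of items
    // read/processed/written, and therefore committed, per transaction; lowering it
    // shortens the time each transaction holds its locks.
    @Bean
    public Step paymentStep(JobRepository jobRepository,
                            PlatformTransactionManager transactionManager,
                            ItemReader<Payment> reader,
                            ItemWriter<Payment> writer) {
        return new StepBuilder("paymentStep", jobRepository)
                .<Payment, Payment>chunk(100, transactionManager)
                .reader(reader)
                .writer(writer)
                .build();
    }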
Our thought is that if the "meat" of what the job does can be done outside of the database transaction, we could get around this issue. So, if the API call happens outside of a database transaction, we can afford to let it take a few more seconds to accept the response without causing or adding to the long lock duration.
A common technique to avoid doing such updates on a live system is to offload the processing to another datastore and then replicate the updates in a single transaction. The idea is to mark records with a given batch id and copy those records to a different datastore (or even a temporary table within the same datastore) that the batch process can use without impacting the live datastore. Once the processing is done (and it could be done in parallel to improve performance), records can be marked as processed in the live system within a single transaction (this is usually very fast and can be based on the batch id to identify which records to update).
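A rough sketch of that final "replicate in a single transaction" step, assuming the records were tagged with a batch id up front (table, column and parameter names are assumptions of mine):

    import org.springframework.batch.core.StepContribution;
    import org.springframework.batch.core.scope.context.ChunkContext;
    import org.springframework.batch.core.step.tasklet.Tasklet;
    import org.springframework.batch.repeat.RepeatStatus;
    import org.springframework.jdbc.core.JdbcTemplate;

    // Hypothetical tasklet run as the last step of the job: one short UPDATE flips the
    // status of every record in the batch, so the live table is only locked briefly.
    public class MarkProcessedTasklet implements Tasklet {

        private final JdbcTemplate jdbcTemplate;

        public MarkProcessedTasklet(JdbcTemplate jdbcTemplate) {
            this.jdbcTemplate = jdbcTemplate;
        }

        @Override
        public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
            Long batchId = chunkContext.getStepContext().getStepExecution()
                    .getJobParameters().getLong("batchId");
            jdbcTemplate.update(
                    "UPDATE payment SET status = 'PROCESSED' WHERE batch_id = ?", batchId);
            return RepeatStatus.FINISHED;
        }
    }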
I am currently working with Spring Batch for the first time. I've set the commit interval to 1000, which gave me better performance, but now I have an issue identifying the corrupt or failing item. We need to send a mail update with the record line (item number) and the exception data.
I tried an item listener, chunk listener, step listener and job listener, but I'm not able to figure out how to get that information from the execution context while generating the mail in the job listener. I'm able to get information about the exception, but not able to track which record has the issue or its item count within the chunk.
For example, if I have 1000 lines in a file or DB and a commit interval of 100, and item 165 has an issue, I need to get the line number 165 in some listener so I can attach it to the context and populate the logging info, giving a quick turnaround time to fix the issue before reprocessing.
I searched but couldn't find a suggestion or an idea. I believe this is a common problem when the commit interval is greater than 1. Please suggest a better way to handle it.
Thanks in advance
You'll want to perform the checks that can cause an issue in the processor, and create an error item out of them which will get persisted to its own table/file. Some errors are unavoidable, and unfortunately you'll need to do manual debugging within that chunk.
Edit:
To find the commit range, you need to preserve ordering. If you're using a FlatFileItemReader, it will store the line number for you if your POJO implements ItemCountAware. If you're running against a DB, you'll want to make sure the query preserves order with an ORDER BY on the unique index. Then you'll be able to track the chunk down by checking the READ_COUNT column in the BATCH_STEP_EXECUTION table.
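For example, a sketch of an item class that captures its own position (class and field names are mine):

    import org.springframework.batch.item.ItemCountAware;

    // FlatFileItemReader (and the other readers built on AbstractItemCountingItemStreamItemReader)
    // calls setItemCount() with the position of the item as it is read, so the value can later be
    // logged, attached to an error record, or put in the execution context for the mail.
    public class PaymentRecord implements ItemCountAware {

        private int itemCount;   // line/item number filled in by the reader
        private String payload;  // whatever fields the line is mapped to

        @Override
        public void setItemCount(int count) {
            this.itemCount = count;
        }

        public int getItemCount() {
            return itemCount;
        }
    }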
You can enable skipping. After a chunk fails due to a skippable exception, Spring Batch processes each item of that chunk again in its own transaction; this is how it detects which item caused the exception.
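A sketch of what enabling skipping looks like with the step builder (Spring Batch 5 style; the item type, the skipped exception class and the listener bean are placeholders you would adapt):

    import org.springframework.batch.core.SkipListener;
    import org.springframework.batch.core.Step;
    import org.springframework.batch.core.repository.JobRepository;
    import org.springframework.batch.core.step.builder.StepBuilder;
    import org.springframework.batch.item.ItemReader;
    import org.springframework.batch.item.ItemWriter;
    import org.springframework.batch.item.file.FlatFileParseException;
    import org.springframework.context.annotation.Bean;
    import org.springframework.transaction.PlatformTransactionManager;

    // When a skippable exception occurs, the failed chunk is rolled back and re-run one item
    // per transaction; the SkipListener then receives exactly the item (or line) that failed.
    @Bean
    public Step importStep(JobRepository jobRepository,
                           PlatformTransactionManager transactionManager,
                           ItemReader<PaymentRecord> reader,
                           ItemWriter<PaymentRecord> writer,
                           SkipListener<PaymentRecord, PaymentRecord> errorMailSkipListener) {
        return new StepBuilder("importStep", jobRepository)
                .<PaymentRecord, PaymentRecord>chunk(100, transactionManager)
                .reader(reader)
                .writer(writer)
                .faultTolerant()
                .skip(FlatFileParseException.class)
                .skipLimit(10)
                .listener(errorMailSkipListener)
                .build();
    }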
I am using MongoDB to store users' events; there is a document for every user, containing an array of events. The system processes thousands of events a minute and inserts each one of them into Mongo.
The problem is that I get poor performance for the update operation. Using a profiler, I noticed that WriteResult.getError is what incurs the performance impact.
That makes sense: the update is asynchronous, but if one wants to retrieve the result of the operation, one needs to wait until the operation has completed.
My question: is there a way to keep the update asynchronous but only get an exception if an error occurs? (99.999% of the time there is no error, so the system waits for nothing.) I understand this means the exception will be raised somewhere further down the process flow, but I can live with that.
Any other suggestions?
The application is written in Java, so we're using the Java driver, but I'm not sure that's relevant.
Have you created indexes on your records? Missing indexes may be hurting your performance.
If you haven't done so already, you should create an index on your collection, for example:
db.collectionName.ensureIndex({"event.type":1})
For more help, see http://www.mongodb.org/display/DOCS/Indexes
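Since the question mentions the Java driver, the equivalent of the shell command above (assuming a reasonably recent driver; the connection string, database and collection names are placeholders) would be:

    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.model.Indexes;
    import org.bson.Document;

    public class CreateEventTypeIndex {
        public static void main(String[] args) {
            try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
                MongoCollection<Document> events = client.getDatabase("mydb")
                        .getCollection("collectionName");
                // Ascending index on the embedded event.type field, as in the shell example above.
                events.createIndex(Indexes.ascending("event.type"));
            }
        }
    }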