How can I define a "global" job variable that each processor can read/update using Spring Batch?

I have a Spring Batch job with a reader/processor/writer that reads a batch of EmailQueue records, processes/sends them, and then writes the results (success, fail) back into the EmailQueue database table. However, if 5 or more emails fail to send during the job (e.g. because the email API is down), I would like the processor to stop attempting to send and instead mark the remaining EmailQueue objects as "failed", so that the writer stores them back into the database. I would like my processor to look something like the one below, but I can't figure out how to have a "global" monitor for the job that the processor can access.
It may be important to note that my appUserEmailSender.send(emailQueue) method doesn't throw an error if the email failed to send, it only stores the results in the EmailQueue object itself so I can write the results back into the EmailQueue db table.
public EmailQueue process(@NonNull EmailQueue emailQueue) {
    // can this variable be defined globally for each job somewhere???
    int emailFailSendCount = 0;
    // if fail count is less than 5, attempt to send the email
    if (emailFailSendCount < 5) {
        // send the email
        EmailQueue result = appUserEmailSender.send(emailQueue);
        // if it failed, increase the fail count
        if (EmailQueueState.FAILED == result.getEmailQueueState()) {
            emailFailSendCount++;
        }
    // if fail count is 5 or more, don't attempt to send, just mark as "failed"
    } else {
        emailQueue.setEmailQueueState(EmailQueueState.FAILED);
    }
    return emailQueue;
}
Clearly the above code wouldn't work, because the counter is reset on every call. My question is: can I define a "global" emailFailSendCount variable that each process() call can read or update?
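One possible direction, sketched here in plain Java without any Spring dependencies: keep the counter as a field of the processor instance instead of a local variable, so it survives across process() calls. The names FailFastEmailProcessor, Email, State and Sender are hypothetical stand-ins for the question's processor, EmailQueue, EmailQueueState and appUserEmailSender. In Spring Batch, such a processor would typically need to be a step-scoped bean (or keep the count in the execution context) so that each job run starts from zero; that wiring is an assumption and is not shown.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the question's processor; not Spring Batch API.
class FailFastEmailProcessor {
    enum State { PENDING, SENT, FAILED }

    // Stand-in for EmailQueue: only carries the send result.
    static class Email {
        State state = State.PENDING;
    }

    // Stand-in for appUserEmailSender: records the outcome on the object, never throws.
    interface Sender {
        Email send(Email email);
    }

    private static final int MAX_FAILURES = 5;

    // Lives as long as the processor instance, so it is shared by all process() calls.
    private final AtomicInteger failCount = new AtomicInteger();
    private final Sender sender;

    FailFastEmailProcessor(Sender sender) {
        this.sender = sender;
    }

    Email process(Email email) {
        if (failCount.get() < MAX_FAILURES) {
            Email result = sender.send(email);
            if (result.state == State.FAILED) {
                failCount.incrementAndGet();
            }
            return result;
        }
        // Too many failures already: skip the send and just mark the item as failed.
        email.state = State.FAILED;
        return email;
    }

    int failures() {
        return failCount.get();
    }
}
```

The AtomicInteger keeps increments safe if the step is multi-threaded, although the check and the increment are separate operations, so a few extra sends could slip through around the threshold.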

Related

Spring Boot JPA save() method trying to insert existing row

I have a simple Kafka consumer that collects events and, based on the data in them, inserts or updates a record in the database. The table has a unique constraint on the ID column, and the entity field is declared unique as well.
Everything works fine when the table is pre-populated and inserts happen only now and then. However, when I truncate the table and send a couple of thousand events with a limited number of IDs (I used 50 unique IDs across 3k events), the events are processed simultaneously and the save() method randomly fails with a unique constraint violation exception. I debugged it and the cause is pretty simple.
event1={id = 1 ... //somedata} gets picked up, service method saveOrUpdateRecord() looks for the record by ID=1, finds none, inserts a new record.
event2={id = 1 ... //somedata} gets picked up almost at the same time, service method saveOrUpdateRecord() looks for the record by ID=1, finds none (the previous one is mid-insert), tries to insert and fails with the constraint violation exception. It should instead find the record and merge it with the input from the event based on my conditions.
How can I make saveOrUpdateRecord() run only after the previous call has fully completed, to prevent this behaviour? I really don't want to slow the Kafka consumer down by tuning the poll size etc.; I just want my service to execute one transaction at a time.
The service method:
public void saveOrUpdateRecord(Object input) {
    Object output = repository.findById(input.getId());
    if (output == null) {
        repository.save(input);
    } else {
        mergeRecord(input, output);
        repository.save(output);
    }
}
Will the @Transactional annotation on the method do the job?
Make your service thread-safe.
For example, synchronize the method (note that this serializes calls only within a single application instance):
public synchronized void saveOrUpdateRecord(Object input) {
    Object output = repository.findById(input.getId());
    if (output == null) {
        repository.save(input);
    } else {
        mergeRecord(input, output);
        repository.save(output);
    }
}
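To illustrate why synchronizing the check-then-insert sequence removes the duplicate insert, here is a minimal, self-contained sketch. It assumes a hypothetical in-memory map in place of the real repository and a plain thread pool in place of the Kafka consumer threads; SynchronizedUpsertDemo and its methods are illustrative names only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; the map stands in for the database table.
class SynchronizedUpsertDemo {
    // id -> number of events merged into that record
    private final Map<Integer, Integer> table = new HashMap<>();
    private int inserts = 0;

    // The lookup and the write happen under one lock, so two events with the
    // same id can never both observe "no record yet".
    synchronized void saveOrUpdateRecord(int id) {
        Integer existing = table.get(id);
        if (existing == null) {
            table.put(id, 1);            // insert
            inserts++;
        } else {
            table.put(id, existing + 1); // merge/update
        }
    }

    synchronized int inserts() {
        return inserts;
    }

    synchronized int mergedCount(int id) {
        return table.getOrDefault(id, 0);
    }

    // Fires `events` calls from 8 threads, cycling through `uniqueIds` ids.
    static SynchronizedUpsertDemo run(int events, int uniqueIds) {
        SynchronizedUpsertDemo service = new SynchronizedUpsertDemo();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < events; i++) {
            final int id = i % uniqueIds;
            pool.submit(() -> service.saveOrUpdateRecord(id));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return service;
    }
}
```

With 3000 events over 50 ids, each id should be inserted once and then updated by the remaining events. With a real database and several application instances, a JVM-level lock is not enough, and a database-level upsert or lock would be needed instead.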

Stop KafkaListener (Spring Kafka Consumer) after it has read all messages till some specific time

I am trying to schedule my consumption process from a single-partition topic. I can start it using endpointlistenerregistry.start(), but I want to stop it after I have consumed all the messages in the current partition, i.e. when I reach the last offset. Production into the topic only happens after I have finished consuming and closed the consumer. How can I be sure that I have read all the messages that existed at the time the scheduler started, and then stop my consumer? I am using @KafkaListener for the consumer.
Set the idleEventInterval container property and add an @EventListener method to listen for ListenerContainerIdleEvents.
When the idle event arrives, stop the container.
To read up to the last offset, you can simply poll until you get empty records.
You can invoke kafkaConsumer.pause() at the end of consumption. On the next scheduled run you then need to invoke kafkaConsumer.resume(). From the KafkaConsumer.pause() Javadoc:
"Suspend fetching from the requested partitions. Future calls to poll(Duration) will not return any records from these partitions until they have been resumed using resume(Collection). Note that this method does not affect partition subscription. In particular, it does not cause a group rebalance when automatic assignment is used."
Something like this:
List<TopicPartition> topicPartitions = new ArrayList<>();

void scheduleProcess() {
    topicPartitions = ... // assign partition info for this
    kafkaConsumer.resume(topicPartitions);
    while (true) {
        ConsumerRecords<String, Object> events = kafkaConsumer.poll(Duration.ofMillis(1000));
        if (!events.isEmpty()) {
            // processing logic
        } else {
            // nothing left to read: pause until the next scheduled run
            kafkaConsumer.pause(topicPartitions);
            break;
        }
    }
}

Spring Boot Manual Acknowledgement of kafka messages is not working

I have a Spring Boot Kafka consumer which consumes data from a topic, stores it in a database, and acknowledges it once stored.
It works fine, but there is a problem if the application fails to get a DB connection after consuming a record. In that case we do not send the acknowledgement, yet the message is never consumed again unless we change the group id and restart the consumer.
My consumer looks like this:
@KafkaListener(id = "${group.id}", topics = {"${kafka.edi.topic}"})
public void onMessage(ConsumerRecord record, Acknowledgment acknowledgment) {
    boolean shouldAcknowledge = false;
    try {
        String tNo = getTrackingNumber((String) record.key());
        log.info("Check Duplicate By Comparing With DB records");
        if (!ediRecordService.isDuplicate(tNo)) { // this checks the record in my DB
            shouldAcknowledge = insertEDIRecord(record, tNo); // this returns true
        } else {
            log.warn("Duplicate record found.");
            shouldAcknowledge = true;
        }
        if (shouldAcknowledge) {
            acknowledgment.acknowledge();
        }
As you can see in the above snippet, we did not send the acknowledgment in that case.
That is not how Kafka offsets work.
The records in the partitions are each assigned a sequential id number called the offset that uniquely identifies each record within the partition.
For example, suppose the first poll returns the message at offset 300 and the consumer fails to persist it to the database because of some issue, so it does not commit the offset.
In the next poll it gets the next record, at offset 301. If that one is persisted successfully, the consumer commits offset 301, which marks all records in that partition up to that offset as processed, including the failed record at offset 300.
Solution: retry, with a limited number of attempts, until the data is stored successfully; or publish the failed record to an error topic and reprocess it later; or save the offsets of failed records somewhere so that you can reprocess them later.
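The first two suggestions can be combined: retry a bounded number of times, and if the record still cannot be stored, park it somewhere for later reprocessing. A minimal plain-Java sketch of that idea follows; RetryThenDeadLetter is a hypothetical helper, the Predicate stands in for the database insert, and the list stands in for an error topic. None of this is Spring Kafka API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper illustrating "bounded retry, then error topic".
class RetryThenDeadLetter<T> {
    private final int maxAttempts;
    // Stand-in for the DB insert: returns true when the record was persisted.
    private final Predicate<T> store;
    // Stand-in for an error topic / dead-letter table.
    private final List<T> deadLetters = new ArrayList<>();

    RetryThenDeadLetter(int maxAttempts, Predicate<T> store) {
        this.maxAttempts = maxAttempts;
        this.store = store;
    }

    /**
     * Returns true if the record was stored, false if it was parked as a dead letter.
     * Either way the record has been dealt with, so the caller can acknowledge it.
     */
    boolean handle(T record) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (store.test(record)) {
                    return true;
                }
            } catch (RuntimeException e) {
                // e.g. no DB connection: treat as a failed attempt and try again
            }
        }
        deadLetters.add(record);
        return false;
    }

    List<T> deadLetters() {
        return deadLetters;
    }
}
```

A real implementation would usually wait between attempts (back-off) rather than retrying immediately; that is left out to keep the sketch short.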

What to do when an API call fails in a Laravel job

I have a background job that fetches data from Google AdWords. My issue arises when I fetch the data using a background worker.
When the response is empty, what is the best thing to do? Is there a way to run the job again, or what is the best approach?
public function handle()
{
    $googleService = new GoogleAds;
    $data = $googleService->report()
        ->from('CRITERIA_PERFORMANCE_REPORT')
        ->during('20170101', '20170210')
        ->select('CampaignId, Id, Criteria, IsNegative, Clicks, Ctr, Cost, Labels')
        ->getObject();
    if (!isset($data->result) || empty($data->result)) {
        // what to do when no data comes back?
    }
    $this->transform->response($data);
}
You can throw an exception; the job will then go back to your queue, and the worker will try to execute it again.
When you launch your worker, there is a --tries parameter that specifies how many times a job will be attempted before it is moved to the failed_jobs table.
You can check the reference in the official documentation.

How can I receive real-time updates from a long asynchronous process?

I'm writing a small, internal web application that reads in form data and creates an excel file which then gets emailed to the user.
However, I'm struggling to understand how I can implement real-time updates for the user as the process is being completed. Sometimes the process takes 10 seconds, and sometimes the process takes 5 minutes.
Currently the user waits until the process is complete before they see any results; they do not see any updates while the process is running. The front-end waits for a 201 response from the server before displaying the report information, and the user is "blocked" until the RC is complete.
I'm having difficulty understanding how I can asynchronously start the Report Creation (RC) process and at the same time allow the user to navigate to other pages of the site or see updates happening in the background. I should clarify here that some of the steps in the RC process use Promises.
I'd like to poll the server every second to get an update on the report being generated.
Here's some simple code to clarify my understanding:
Endpoints
// CREATE REPORT
router.route('/report')
    .post(function(req, res, next) {
        // Generate unique ID to keep track of report later on.
        const uid = generateRandomID();
        // Start report process ... this should keep executing even after a response (201) is returned.
        CustomReportLibrary.createNewReport(req.formData, uid);
        // Respond with a successful creation, returning the ID so the client can poll for it.
        res.status(201).json({ id: uid });
    });
// GET REPORT
router.route('/report/:id')
    .get(function(req, res, next) {
        // Get our report from ID.
        let report = CustomReportLibrary.getReport(req.params.id);
        // Respond with report data
        if (report) { res.status(200).json(report); }
        else { res.status(404).end(); }
    });
CustomReportLibrary
// Initialize array to hold reports
let _dataStorage = [];
function createNewReport(data, id) {
    // Create an object to store our report information
    let reportObject = {
        id: id,
        status: 'Report has started the process',
        data: data
    };
    // Add new report to global array.
    _dataStorage.push(reportObject);
    // ... continue with report generation. Assume this takes 5 minutes.
    // ...
    // ... update reportObject.status after each step
    // ...
    // ... finish generation.
}
function getReport(id) {
    // Return the report with the matching ID, or null if no match is found.
    // IDs are compared as strings because route parameters arrive as strings.
    return _dataStorage.find(function(report) {
        return String(report.id) === String(id);
    }) || null;
}
From my understanding, CustomReportLibrary.createNewReport() will execute in the background even after a 201 response is returned. In the front-end, I'd make an AJAX call to /report/:id at a regular interval to get updates on my report. Is this the right way to do this? Is there a better way to do this?
I think you are on the right track. HTTP 202 Accepted (the request has been accepted for processing, but the processing has not been completed) is a proper way to handle your case.
It can be done like this:
The client sends POST /reports; the server starts creating the new report and returns:
202 Accepted
Location: http://api.domain.com/reports/1
The client then issues GET /reports/1 to get the status of the report.
The whole flow is asynchronous, so users are not blocked.
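The server side of this flow boils down to two operations: start the work in the background and hand back an ID immediately, then answer status lookups for that ID. A language-neutral sketch of that shape, written here in plain Java with no web framework, follows. ReportTracker, start and status are hypothetical names, and the HTTP layer (202 response, Location header, GET handler) is assumed to sit on top and is not shown.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical status store behind POST /reports and GET /reports/{id}.
class ReportTracker {
    private final Map<String, String> statusById = new ConcurrentHashMap<>();

    /**
     * Starts the work on a background thread and returns the report ID right away.
     * This corresponds to answering the POST with 202 Accepted and a Location header.
     */
    String start(Runnable work) {
        String id = UUID.randomUUID().toString();
        statusById.put(id, "IN_PROGRESS");
        CompletableFuture.runAsync(work)
            .whenComplete((ignored, error) ->
                statusById.put(id, error == null ? "DONE" : "FAILED"));
        return id;
    }

    /**
     * What the GET handler would return for a poll.
     * A null result means the ID is unknown (404 Not Found).
     */
    String status(String id) {
        return statusById.get(id);
    }
}
```

The client keeps polling status(id) until it sees DONE or FAILED. The work itself could also write intermediate progress messages into the same map if finer-grained updates are wanted.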
