Concurrent transaction issue in activiti framework we have fixed but need some feedback - jdeveloper

We are using Activiti framework version 5.15 and are getting a concurrent modification error during job execution.
The error stack trace is below:
2018-09-19 16:13:46,083 ERROR [org.activiti.engine.impl.jobexecutor.ExecuteJobsRunnable] (pool-4-thread-30) exception during job execution: ProcessInstance[34391064] was updated by another transaction concurrently: org.activiti.engine.ActivitiOptimisticLockingException: ProcessInstance[34391064] was updated by another transaction concurrently
at org.activiti.engine.impl.db.DbSqlSession.flushUpdates(DbSqlSession.java:622)
at org.activiti.engine.impl.db.DbSqlSession.flush(DbSqlSession.java:503)
at org.activiti.engine.impl.interceptor.CommandContext.flushSessions(CommandContext.java:182)
at org.activiti.engine.impl.interceptor.CommandContext.close(CommandContext.java:128)
at org.activiti.engine.impl.interceptor.CommandContextInterceptor.execute(CommandContextInterceptor.java:66)
at org.activiti.spring.SpringTransactionInterceptor$1.doInTransaction(SpringTransactionInterceptor.java:47)
at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:133)
at org.activiti.spring.SpringTransactionInterceptor.execute(SpringTransactionInterceptor.java:45)
at org.activiti.engine.impl.interceptor.LogInterceptor.execute(LogInterceptor.java:31)
at org.activiti.engine.impl.cfg.CommandExecutorImpl.execute(CommandExecutorImpl.java:40)
at org.activiti.engine.impl.cfg.CommandExecutorImpl.execute(CommandExecutorImpl.java:35)
at org.activiti.engine.impl.jobexecutor.ExecuteJobsRunnable.run(ExecuteJobsRunnable.java:52)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
We fixed this issue with a Hazelcast lock keyed on the processInstanceId, because it is unique during the whole workflow execution.
We acquire the Hazelcast lock when Activiti begins the transaction for executing a service task (in ExecuteJobCommand.java, before job.execute(commandContext);) and release it when Activiti commits the transaction for the service task (in CommandContext.java, after transactionContext.commit();).
With this mechanism the concurrent modification exception no longer occurs.
Could this approach cause any problems for job execution in production?
Also, does anyone have another solution to this issue?

It is normal to see this when multiple Activiti 5 engines are running with async execution. This is part of the design of the Activiti 5 job executor: if several of them are running, each will try to run the job and the first to get there wins. You can treat this as a benign exception, because whatever happens in the losing executions will not be committed (unless custom code in that execution does something that can't be rolled back, such as an HTTP call). See https://community.alfresco.com/thread/221722-activitioptimisticlockingexception-on-even-the-simplest-process
The most popular way to avoid seeing this tends to be to disable the job executor on all but one of the engines, so that only that one processes async jobs (effectively it is a 'leader').

I'm not familiar with internals of Activiti but Hazelcast distributed locks work very well. I think the main problem you might have would be performance related. As with any locked/synchronized code block, you are serializing your process there which could be an issue as you scale up. Also you need to be very sure that you always release the lock when a process leaves the code block, especially on exceptions. You will get a deadlock if you don't.
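The point about always releasing the lock can be sketched as follows. This is a minimal illustration, not the poster's actual code: it assumes the distributed lock is exposed through the standard java.util.concurrent.locks.Lock interface, uses a plain ReentrantLock as a local stand-in, and the class and method names (GuardedJobRunner, runGuarded) are hypothetical. The acquire uses a timeout so a stuck holder does not block other workers forever.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class GuardedJobRunner {

    /**
     * Runs the job while holding the lock and always releases the lock,
     * even when the job throws. Returns false if the lock could not be
     * acquired within the timeout (or the thread was interrupted), so the
     * caller can retry later instead of blocking forever.
     */
    public static boolean runGuarded(Lock lock, long timeoutMillis, Runnable job) {
        boolean acquired;
        try {
            acquired = lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
        if (!acquired) {
            return false;
        }
        try {
            job.run();
            return true;
        } finally {
            lock.unlock(); // runs on normal completion and on exceptions
        }
    }

    public static void main(String[] args) {
        // ReentrantLock is only a local stand-in for a distributed lock.
        ReentrantLock lock = new ReentrantLock();
        try {
            runGuarded(lock, 100, () -> {
                throw new IllegalStateException("job failed");
            });
        } catch (IllegalStateException expected) {
            // The job failed, but the finally block has released the lock.
        }
        System.out.println("still locked after failure: " + lock.isLocked());
    }
}
```

In the setup described in the question the lock is acquired and released in two different classes, so the same guarantee would have to cover every path between those two points, including rollbacks.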

Related

JobRunr Job Processing at least once or exactly once?

We want to use JobRunr along with Spring Boot. I am looking at the documentation and find it somewhat confusing.
The main page says the following:
Reliable
Once a background job was created without any exception,
JobRunr takes the responsibility to process it at least once.
And in the FAQ page https://www.jobrunr.io/en/documentation/faq/ it says
How does JobRunr make sure to only process a job once?
My reading of the FAQ is that JobRunr uses optimistic locking to coordinate that a job is processed only once. But that does not mean it will be processed exactly once: a job might be processed without its status being updated in the DB, in which case double processing can occur.
Am I understanding this correctly?
Also, the FAQ does not explain what happens when the status is updated to PROCESSING but the actual processing fails.
Thanks a lot for the feedback.
Best Regards
It seems that this has been answered already in the Discussion tab in Github.
If no exceptions during the run of your job, JobRunr
will process your job exactly once by means of optimistic locking.
If however your job is existing out of multiple phases and
one of those last phases fails, all the prior phases will
be re-executed when your job is retried.
https://github.com/jobrunr/jobrunr/discussions/358
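To make the optimistic-locking idea concrete, here is a minimal sketch of a version-checked claim. It is not JobRunr's implementation: the class OptimisticClaim, the JobRecord type and the in-memory AtomicReference (standing in for a stored job row) are assumptions made for illustration, and the state name ENQUEUED is only a placeholder for "not yet picked up".

```java
import java.util.concurrent.atomic.AtomicReference;

public class OptimisticClaim {

    /** Immutable snapshot of a job row; the version changes on every update. */
    static final class JobRecord {
        final String state;
        final int version;

        JobRecord(String state, int version) {
            this.state = state;
            this.version = version;
        }
    }

    // In-memory stand-in for the stored job (hypothetical, for illustration).
    private final AtomicReference<JobRecord> row =
            new AtomicReference<>(new JobRecord("ENQUEUED", 0));

    /**
     * Tries to move the job to PROCESSING. The update only succeeds if the
     * record is still the one that was read, similar in spirit to
     * "UPDATE ... WHERE id = ? AND version = ?" affecting exactly one row.
     */
    public boolean tryClaim() {
        JobRecord seen = row.get();
        if (!"ENQUEUED".equals(seen.state)) {
            return false; // another worker already claimed the job
        }
        JobRecord claimed = new JobRecord("PROCESSING", seen.version + 1);
        return row.compareAndSet(seen, claimed);
    }

    public String state() {
        return row.get().state;
    }

    public static void main(String[] args) {
        OptimisticClaim job = new OptimisticClaim();
        System.out.println("worker A claims: " + job.tryClaim());
        System.out.println("worker B claims: " + job.tryClaim());
    }
}
```

The check only decides which worker starts the job. It does not undo side effects of a run that fails part-way, which is why the quoted answer notes that earlier phases are re-executed on retry; keeping those phases idempotent is the usual way to live with that.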

Using custom executor with quartz scheduling library

I am using the Quartz library, mainly to schedule multiple tasks that run forever at specific times. I have a customized executor: a retry executor that reschedules a task a specified number of times in case of failure (the number of retries is configurable). Is there a way to set up Quartz to use this customized executor? Currently I use the executor inside the Job, i.e. I call executor.execute() inside the Job's execute method.
The guide only explains how to configure your own ThreadPool.
Fine-tuning the Scheduler
You can also implement Quartz's ThreadExecutor interface, or adjust your "RetryExecutor" implementation to do so.
It can then be passed as a component via setThreadExecutor to the QuartzSchedulerResources, which in turn is used to configure the QuartzScheduler, the heart of Quartz.
Keep the Job isolated
It's discouraged to modify scheduling or execute additional jobs from within a job's execute method. This control-flow is kept outside from the jobs by Quartz. It's part of the Scheduler's responsibilities. Thus your current solution:
using the executor inside the Job i.e. call executor.execute() inside the Job's execute method
can interfere with the correct functioning of the Scheduler itself.
Retry controlled from within the job
There might be a couple of ways how to deal with retries in Quartz.
Search Stack Overflow for [quartz-scheduler] retry:
Automatic retry of failed jobs
Count-based retries, increasing delays between retries, etc.
This question explains some:
Quartz retry when failure
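As a rough illustration of count-based retries with increasing delays, the following sketch shows the policy on its own, independent of Quartz. It uses no Quartz API; the class RetrySketch and its methods are hypothetical names, and the doubling delay with a cap is just one possible choice.

```java
import java.util.function.Supplier;

public class RetrySketch {

    /** Delay after the n-th failed attempt: base, 2x base, 4x base, ... capped at maxMillis. */
    public static long delayMillis(int failedAttempts, long baseMillis, long maxMillis) {
        long delay = baseMillis;
        for (int i = 1; i < failedAttempts && delay < maxMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMillis);
    }

    /**
     * Calls the task up to maxAttempts times, sleeping between attempts.
     * Rethrows the last failure when all attempts are used up or the
     * thread is interrupted while waiting.
     */
    public static <T> T callWithRetry(Supplier<T> task, int maxAttempts,
                                      long baseMillis, long maxMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt == maxAttempts) {
                    break; // no attempts left
                }
                try {
                    Thread.sleep(delayMillis(attempt, baseMillis, maxMillis));
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    break; // stop retrying when interrupted
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("temporary failure");
            }
            return "done";
        }, 5, 10, 1000);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

Note that sleeping inside a job keeps a worker thread busy for the whole wait, so for longer delays rescheduling the job from outside, as discussed in the linked questions, is likely the better fit.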

Launch Mongock faster so when changelog fails the application crashes before a health check can pass

We recently added Mongock to our Spring 5 app (using the Spring runner), but are having some issues during our deploys. The final step in our deploy process is a health check where the deployment server checks a health page every 5s for 5 minutes. Once it gets the correct response, the deployment is considered successful and finishes.
The issue is that Mongock seems to start the migration only around 30s after the application context loads, so the health check passes and the migration may then fail after the service was "successfully" launched.
Using a standalone runner might solve this, but we really like the availability of other beans during the changelogs. So is there a way to enforce the changelogs to be processed as part of loading the application context? Or where is this delay coming from, and how can we reduce it?
You don't provide much information, but you are saying that Mongock starts 30 secs after the application context is loaded. That could be happening for two reasons:
The most likely possibility is that you are using runner-type ApplicationRunner (the default). This means that Spring decides when to run it, after the entire context is loaded. From what you describe, runner-type InitializingBean is a better fit for you.
Please try this:
mongock:
  runner-type: InitializingBean
You have multiple instances fighting for the lock. There is nothing we can do about that; this process is already optimised (although we are improving it further). However, as said, I believe the issue is related to the runner-type.

Scheduled tasks with multiple servers - single point of responsibility

We have a Spring + JPA web application.
We use two Tomcat servers that both run the application and use the same DB.
One of our application requirements is to perform cron/scheduled tasks.
After some brief research we found that the Spring framework offers a very straightforward, annotation-based solution for cron jobs.
However, since both Tomcats run the same webapp, using Spring's solution would create a very problematic scenario where two crons run at the same time (one on each Tomcat).
Is there any way to solve this issue? Or is this approach not suitable for our purpose?
thanks!
As a general rule, you're going to want to save a setting to indicate that a job is running. Similar to how "Spring Batch" does the trick, you might want to create a table in your database simply for storing a job execution. You can choose to implement this however you'd like, but ultimately, your scheduled tasks should check the database to see if an identical task is already running, and if not, proceed with the task execution. Once the task has completed, update the database appropriately so that a future execution will be able to proceed.
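The check-then-run idea above could look roughly like this. The sketch is an in-memory stand-in, not a database implementation: the class JobExecutionRegistry, its methods and the server ids are hypothetical, and a ConcurrentHashMap plays the role of the job execution table. The important detail is that "check whether it is running" and "mark it as running" happen as one atomic step; with a real table, a unique key on the job name would be one way to get the same effect.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class JobExecutionRegistry {

    // Maps job name to the id of the server currently running it.
    // Stand-in for a shared database table (assumption for illustration).
    private final ConcurrentMap<String, String> running = new ConcurrentHashMap<>();

    /** Atomically records the job as running; returns false if it already is. */
    public boolean tryStart(String jobName, String serverId) {
        return running.putIfAbsent(jobName, serverId) == null;
    }

    /** Clears the entry, but only if this server is the one that owns it. */
    public void finish(String jobName, String serverId) {
        running.remove(jobName, serverId);
    }

    /** Runs the task only if no other server is running it; always clears the entry afterwards. */
    public boolean runExclusively(String jobName, String serverId, Runnable task) {
        if (!tryStart(jobName, serverId)) {
            return false; // identical task already running elsewhere
        }
        try {
            task.run();
            return true;
        } finally {
            finish(jobName, serverId);
        }
    }

    public static void main(String[] args) {
        JobExecutionRegistry registry = new JobExecutionRegistry();
        boolean first = registry.tryStart("nightly-report", "tomcat-1");
        boolean second = registry.tryStart("nightly-report", "tomcat-2");
        System.out.println("tomcat-1 started: " + first + ", tomcat-2 started: " + second);
    }
}
```

With a real table you would also want a way to recover from a server that crashes while holding an entry, for example a timestamp after which the entry is considered stale.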
@kungfuters' solution is certainly a better end goal, but as a simple first implementation, you could use a property to enable/disable the tasks, and only have the tasks run on one of the servers.

Weblogic "Abandoning transaction" warning

We randomly get warnings such as below on our WL server. We'd like to better understand what exactly these warnings are and what we should possibly do to avoid them.
Abandoning transaction after 86,606
seconds:
Xid=BEA1-52CE4A8A9B5CD2587CA9(14534444),
Status=Committing,numRepliesOwedMe=0,numRepliesOwedOthers=0,seconds
since begin=86605, seconds
left=0,XAServerResourceInfo[JMS_goJDBCStore]=(ServerResourceInfo[JMS_goJDBCStore]= (state=committed,assigned=go_server),xar=JMS_goJDBCStore,re-Registered
= true),XAServerResourceInfo[weblogic.jdbc.wrapper.JTSXAResourceImpl]=
(ServerResourceInfo[weblogic.jdbc.wrapper.JTSXAResourceImpl]=(state=new,assigned=none),xar=
weblogic.jdbc.wrapper.JTSXAResourceImpl#1a8fb80,re-Registered
= true),SCInfo[go+go_server]= (state=committed),properties=({weblogic.jdbc=t3://10.6.202.37:18080}),local
properties=
({weblogic.transaction.recoveredTransaction=true}),OwnerTransactionManager=
ServerTM[ServerCoordinatorDescriptor=(CoordinatorURL=go_server+10.6.202.37:18080+go+t3+,
XAResources={JMS_goJDBCStore,
weblogic.jdbc.wrapper.JTSXAResourceImpl},NonXAResources=
{})],CoordinatorURL=go_server+10.6.202.37:18080+go+t3+)
I do understand the BEA explanation:
Error: Abandoning transaction after secs seconds: tx
Description: When a transaction is abandoned,
knowledge of the transaction is
removed from the transaction manager
that was attempting to drive the
transaction to completion. The JTA
configuration attribute
AbandonTimeoutSeconds determines how
long the transaction manager should
persist in trying to commit or
rollback the transaction.
Cause: A resource or participating server may
have been unavailable for the duration of the
AbandonTimeoutSeconds period.
Action: Check participating resources for heuristic
completions and correct any data inconsistencies.
We have observed that you can get rid of these warnings by deleting the *.tlog files but this doesn't seem like the right strategy to deal with the warnings.
The warnings refer to JMS and our JMS store. We do use JMS. We just don't understand why transactions are hanging out there and why they would be "abandoned"??
I know it's not very satisfying, but we do delete *.tlog files before startup in our app hosted on WLS 7.
Our app is an event-processing back-end, largely driven by JMS. We aren't interested in preserving transactions across WLS restarts. If it didn't complete before the shutdown, it tends not to complete after a restart. So doing this *.tlog cleanup just eliminates some warnings and potential flaky behavior.
I don't think JMS is fundamental to any of this, by the way. At least not that I know.
By the way, we moved from JDBC JMS store to local files. That was said to be better performing and we didn't need the location independence we'd get from using JDBC. If that describes your situation also, maybe moving to local files would eliminate the root cause for you?
