I use jBPM as my workflow engine, but my system is very busy and may start up to 500 processes per second. A jBPM session is lightweight, but creating one still costs some time. Can a jBPM session be reused? Or how do I clear the state of a session?
You can reuse a session if you want, or there are other strategies like a session per request or per process instance. The state of a process instance is stored separately in the database, so you don't have to clear a session.
For reaching 500 processes per second (when using persistence), you probably want to run multiple sessions in parallel though, possibly across multiple nodes in a cluster.
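For illustration, here is a minimal sketch of both strategies, assuming jBPM 6 and the KIE API (the process id "com.example.MyProcess" and the class names are hypothetical, not taken from the question):

import java.util.Map;

import org.kie.api.KieServices;
import org.kie.api.runtime.KieContainer;
import org.kie.api.runtime.KieSession;

public class ProcessStarter {

    private static final KieContainer CONTAINER =
            KieServices.Factory.get().getKieClasspathContainer();

    // Strategy 1: one session per process instance, disposed when done.
    // Instance state is persisted by the engine, so nothing needs to be "cleared".
    public void startWithFreshSession(Map<String, Object> params) {
        KieSession session = CONTAINER.newKieSession();
        try {
            session.startProcess("com.example.MyProcess", params);
        } finally {
            session.dispose();
        }
    }

    // Strategy 2: reuse one long-lived session for many process instances.
    private final KieSession sharedSession = CONTAINER.newKieSession();

    public void startOnSharedSession(Map<String, Object> params) {
        // Serializing access is a conservative assumption; for real throughput you would
        // run several sessions in parallel (and across nodes), as suggested above.
        synchronized (sharedSession) {
            sharedSession.startProcess("com.example.MyProcess", params);
        }
    }
}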
My application currently polls the database every n seconds to see if there are any new records.
To reduce the network round trips and CPU cycles spent on this polling, I was thinking of replacing it with a CQN-based approach, where the database itself notifies the subscribed application whenever there is a commit.
The only problem is: what if Oracle is not able to notify the application due to a connection issue between Oracle and the subscribed application, or if the application has crashed or been killed for any reason? ... Is there a way to know whether the application has missed any CQN notifications?
Is polling the database from application code the only way for mission-critical applications?
You didn't say whether every 'n' seconds means you're expecting data every few seconds, or you just need your "staleness" to be as low as that. That has an impact on the choice of CQN because, as per the docs, https://docs.oracle.com/en/database/oracle/oracle-database/12.2/adfns/cqn.html#GUID-98FB4276-0827-4A50-9506-E5C1CA0B7778
"Good candidates for CQN are applications that cache the result sets of queries on infrequently changed objects in the middle tier, to avoid network round trips to the database. These applications can use CQN to register the queries to be cached. When such an application receives a notification, it can refresh its cache by rerunning the registered queries"
However, you have control over how persistent you want the notifications to be:
"Reliable Option:
By default, a CQN registration is stored in shared memory. To store it in a persistent database queue instead—that is, to generate reliable notifications—specify QOS_RELIABLE in the QOSFLAGS attribute of the CQ_NOTIFICATION$_REG_INFO object.
The advantage of reliable notifications is that if the database fails after generating them, it can still deliver them after it restarts. In an Oracle RAC environment, a surviving database instance can deliver them.
The disadvantage of reliable notifications is that they have higher CPU and I/O costs than default notifications do."
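The quote above refers to the PL/SQL registration attributes; if you register from a Java application over JDBC, the Oracle driver exposes the same option. A minimal sketch, assuming the Oracle JDBC change-notification API (the table name my_table and the query are placeholders):

import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

import oracle.jdbc.OracleConnection;
import oracle.jdbc.OracleStatement;
import oracle.jdbc.dcn.DatabaseChangeEvent;
import oracle.jdbc.dcn.DatabaseChangeListener;
import oracle.jdbc.dcn.DatabaseChangeRegistration;

public class CqnRegistration {

    public static void register(OracleConnection conn) throws Exception {
        Properties options = new Properties();
        // Ask for "reliable" notifications (stored in a persistent queue on the server,
        // so they survive an instance failure) instead of the default in-memory ones.
        options.setProperty(OracleConnection.NTF_QOS_RELIABLE, "true");
        options.setProperty(OracleConnection.DCN_NOTIFY_ROWIDS, "true");

        DatabaseChangeRegistration registration = conn.registerDatabaseChangeNotification(options);
        registration.addListener(new DatabaseChangeListener() {
            @Override
            public void onDatabaseChangeNotification(DatabaseChangeEvent event) {
                // Refresh the cache / re-run the registered query here.
                System.out.println("Received change notification: " + event);
            }
        });

        // Tie the query we care about to the registration.
        try (Statement stmt = conn.createStatement()) {
            ((OracleStatement) stmt).setDatabaseChangeRegistration(registration);
            try (ResultSet rs = stmt.executeQuery("SELECT id FROM my_table")) {
                while (rs.next()) {
                    // consume the result set to complete the registration
                }
            }
        }
    }
}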
I've got multiple servers sharing a database - on each of them a cron job fires every 5 minutes, checks whether a text message log entry already exists, creates the log entry if it doesn't, and sends out a text message. I thought there would never be a situation where text messages are sent multiple times, as one server should always get there first.
Well - I was wrong and that scenario did happen:
A - check if log exists - it doesn't
B - check if log exists - it doesn't
A - create log
B - create log
A - send message
B - send message
I've changed this behaviour and introduced a queue, which should mitigate the issue. While the crons will still fire, multiple jobs will be queued, and workers should pick up the jobs at different times, thus preventing the message from being sent twice. Though it might just as well end up as:
A - pick up job 1
B - pick up job 2
A - check if log exists - it doesn't
B - check if log exists - it doesn't
Etc., or A and B might just as well pick up the same job at exactly the same time.
The solution would be, I guess, to run one worker server. But then I have the situation that jobs from multiple servers are queued many times, and I can't check whether they're already enqueued, as that brings us back to the first scenario.
I'm at a loss on how to proceed here - while a multiple-server, single-worker setup will work, I don't want to end up with multiple instances of the same job (coming from different servers) in the queue.
Maybe the solution is to have one dedicated cron/queue/worker server, but I don't have experience with Laravel in a multi-server environment to set it up.
The other problematic thing for me is - how do I test this? I can't, I guess, test it locally unless there's a way to spin up VM instances that are synchronized with each other.
The easy answer:
The code that checks the database for the existing entry could use a database transaction with an isolation level high enough to make sure that everyone else trying to do the same thing at the same time will be blocked and have to wait for the job to finish/commit.
A really naive solution (assuming MySQL) would be LOCK TABLES entries WRITE; followed by the logic, then UNLOCK TABLES when you're done (sketched in the example further below).
This also means that no one can access the table while your job is doing the check. I hope the check is really quick, because you'll block all access to the table for a small time period every five minutes.
WRITE lock:
The session that holds the lock can read and write the table.
Only the session that holds the lock can access the table. No other session can access it until the lock is released.
Lock requests for the table by other sessions block while the WRITE lock is held.
Source: https://dev.mysql.com/doc/refman/5.7/en/lock-tables.html
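The question is about Laravel, but the SQL is what matters; here is the same check-create-send flow sketched against plain JDBC (the table name message_log, the column period_key, and the sendTextMessage helper are illustrative assumptions):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class SendOnce {

    public static void sendIfNotAlreadySent(Connection conn, String periodKey) throws Exception {
        try (Statement lock = conn.createStatement()) {
            // Blocks every other session's access to the table until UNLOCK TABLES.
            lock.execute("LOCK TABLES message_log WRITE");
            try {
                try (PreparedStatement check = conn.prepareStatement(
                        "SELECT 1 FROM message_log WHERE period_key = ?")) {
                    check.setString(1, periodKey);
                    try (ResultSet rs = check.executeQuery()) {
                        if (rs.next()) {
                            return;   // another server already handled this interval
                        }
                    }
                }
                try (PreparedStatement insert = conn.prepareStatement(
                        "INSERT INTO message_log (period_key) VALUES (?)")) {
                    insert.setString(1, periodKey);
                    insert.executeUpdate();
                }
                sendTextMessage();   // only the first server to obtain the lock gets here
            } finally {
                lock.execute("UNLOCK TABLES");
            }
        }
    }

    private static void sendTextMessage() {
        // call the SMS provider here
    }
}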
That was a really boring answer, so I'll move on to the answer you're probably more interested in...
The server architecture answer:
Your wish to have only one job per time interval in your queue means that you should have only one machine dispatching the jobs. This is most easily done with one dedicated machine that only dispatches jobs from scheduled commands. (Laravel 5.5 introduced the ability to dispatch jobs directly from the scheduler; see Scheduling Queued Jobs.)
You can then have several worker machines processing the queue, and only one of them will pick up a given job and execute it. Two worker machines will never execute the same job at the same time if everything works as usual*.
I would split the web machines from the worker machines so that they can scale independently. I prefer having my web machines dedicated to web traffic; they don't process jobs, which makes sure that a large amount of queued jobs won't affect my HTTP response times.
So, I recommend the following machine types in your setup:
The scheduler - one single machine that runs the schedule and dispatches jobs.
Worker machines that handle your queue.
Web machines that handle visitors' traffic.
All machines will have identical source code for your Laravel application. They will also have an identical configuration. The only thing that is unique per machine type is ...
The scheduler has php artisan schedule:run in the crontab.
The workers have supervisor (or something similar) that runs php artisan queue:work.
The web servers have nginx + php-fpm and handles incoming web requests.
This setup will make sure that you only get one job per 5 minutes, since there is only one machine pushing it. This setup will also make sure that the CPU load generated by the workers doesn't affect the web requests.
One issue with my answer is obvious: that single scheduler machine is a single point of failure. If it dies, you will no longer have any of these scheduled jobs dispatched to the queue. That touches areas like server monitoring and health checks, which are out of scope for your question and are also highly dependent on your hosting provider.
Regarding that little asterisk: I can make up weird scenarios where a job is executed on several machines. This involves jobs that sleep for longer than the timeout, while at the same time you've got an environment without support for terminating the job. This will cause the first worker to keep executing the job (since it cannot be terminated), and a second worker will consider the job timed out and retry it.
Since Laravel 5.6, you can ensure your scheduled tasks only run on a single instance by using the onOneServer function, e.g.
$schedule->command('loggingTask')
    ->everyFiveMinutes()
    ->onOneServer();
This requires an APC or Redis cache to be set up because it seems to use a mutual exclusion lock, probably RedisLock if Redis is set up.
Using a queue you shouldn't really have such a problem because popping a task off a queue should be an atomic operation.
Source
I'm looking to store sessions in an external data store instead of in memory. I want to do this in order to have session data available to all my Jetty server instances.
So I read the Jetty docs:
Session Clustering with a Database
Session Clustering with MongoDB
Both docs have the following statement:
The persistent session mechanism works in conjunction with a load balancer that supports stickiness.
My question is - why do I need stickiness with this mechanism? The whole point is to allow any server in the cluster to serve any request, because the session data is stored externally.
The problem is that the servlet specification does not provide any transactional semantics around sessions. Thus, unless your sessions are sticky, it's possible for concurrent requests involving the same session to go to multiple servers and produce inconsistent results, depending on the interleaving of the writes. If your sessions are read-mostly you may get away with non-sticky sessions, although you may find that sessions time out a little sooner or a little later than you'd expect, due to having multiple servers trying to manage the same session.
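As an illustration of those interleaved writes, consider a hypothetical servlet that keeps a counter in the session. With non-sticky sessions, two concurrent requests for the same session can land on two different nodes, both read the same stored value, and one increment is silently lost:

import java.io.IOException;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;

public class CartCountServlet extends HttpServlet {

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        HttpSession session = req.getSession();

        // Read-modify-write with no transactional protection: if two nodes run this
        // concurrently for the same session, the last writer wins and an update is lost.
        Integer count = (Integer) session.getAttribute("cartCount");
        count = (count == null) ? 1 : count + 1;
        session.setAttribute("cartCount", count);

        resp.getWriter().println("cartCount=" + count);
    }
}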
In a normal enterprise application, there is just one database user (set in hibernate.xml or another config file) and many concurrent connections/sessions (because it's a multi-threaded application).
So, will that one user's multiple sessions interfere with each other?
Depends what you mean by "interfere".
Your middle tier connection pool will open a number of physical connections to the database. Sessions in the middle tier will request a connection from the pool, do some work, and return the connection to the pool. Assuming that your connection pool is large enough to handle the number of simultaneous calls being made from your application (based on the number of sessions, the length of time each session needs a logical connection, and the fraction of "think time" to "action time" in each session), you won't experience contention due to opening connections.
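A minimal sketch of that borrow/work/return cycle, using HikariCP as an example pool (the JDBC URL, credentials, pool size, and query are placeholder assumptions):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class PoolExample {

    public static void main(String[] args) throws Exception {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:oracle:thin:@//db-host:1521/ORCLPDB1");
        config.setUsername("app_user");   // the single user configured for the whole application
        config.setPassword("secret");
        config.setMaximumPoolSize(20);    // size it to cover the expected number of simultaneous calls

        try (HikariDataSource pool = new HikariDataSource(config)) {
            // Each unit of work borrows a connection, uses it, and returns it to the pool.
            try (Connection conn = pool.getConnection();
                 PreparedStatement ps = conn.prepareStatement("SELECT COUNT(*) FROM orders");
                 ResultSet rs = ps.executeQuery()) {
                if (rs.next()) {
                    System.out.println("orders: " + rs.getLong(1));
                }
            }   // closing conn here hands the physical connection back to the pool
        }
    }
}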
Oracle is perfectly happy to run queries in multiple sessions simultaneously. Obviously, though, there is the potential for one session to contend with another session for resources. Two sessions might contend for the same row-level lock if they are both trying to update the same row. If you have enough sessions, you might end up in a situation where CPU or RAM or I/O is being overtaxed and the load that one session creates causes performance issues in another session. Oracle doesn't care which Oracle user(s) are involved in this sort of contention-- you'd have the same potential for interference with 10 sessions all running as 1 user as you would if there were 10 sessions running as 10 different users assuming the sessions were doing the same things.
I am working on a webapp project and we are considering deploying it on multiple servers.
What solution do you advise for clustering/load-balancing with Spring?
What are the issues to take into account?
For example: How do singletons behave in a cluster of machines? What about session replication? Are there any other issues to take into account?
Here is the list of possible issues (not necessarily Spring-related):
stateful beans - if your beans have state, like collections accumulating something or counters, you need to think about whether this state should be replicated or not. E.g. should this counter be local to one JVM or global across the whole cluster? In the latter case consider Terracotta or Hazelcast (see the sketch after this list).
filesystem - as long as all instances use the same database, everything is fine. But if one node writes to disk, another instance can't read it. Solutions? Either use the database for all storage or a distributed file system.
HTTP sessions - either use sticky sessions or replicate sessions. If you go for replication, keep sessions as small as possible.
asynchronous jobs - if you have a job running every hour, should it run on every machine, or just on a dedicated one (or maybe on a random one)?
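To make the first point concrete, here is a small sketch (assuming Hazelcast 4+; the class, map, and key names are illustrative) contrasting a per-JVM counter in a singleton bean with a cluster-wide counter kept in a Hazelcast map:

import java.util.concurrent.atomic.AtomicLong;

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.map.IMap;

public class VisitCounter {

    // Local variant: each JVM in the cluster counts only the visits it served itself.
    private final AtomicLong localVisits = new AtomicLong();

    public long recordVisitLocally() {
        return localVisits.incrementAndGet();
    }

    // Cluster-wide variant: the value lives in a distributed map shared by all nodes.
    private final HazelcastInstance hz = Hazelcast.newHazelcastInstance();

    public long recordVisitClusterWide() {
        IMap<String, Long> counters = hz.getMap("counters");
        counters.lock("visits");   // per-key lock that is honoured across the whole cluster
        try {
            long next = counters.getOrDefault("visits", 0L) + 1;
            counters.put("visits", next);
            return next;
        } finally {
            counters.unlock("visits");
        }
    }
}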