Strategy to create task scheduler - algorithm

I am creating an app which allows users to schedule repeated tasks with friends, and I am thinking about how to build the scheduling queue.
The first way would be to let the server handle the tasks: the task is created, and the scheduling algorithm runs on the server, which sends notifications to the involved users. The problem with this approach is that it could overload the server if many users are active.
The second approach would be to let the user's app handle the task scheduling on their phone as a background service. This would be ideal in that it would not load the server, but the disadvantage is that notifications would depend on the state of the user's phone (internet connection, on/off, etc.).
Which of the two methods would be ideal, or is there some other way to approach this problem?

I would advocate running this on the server, because of the following advantages:
- There is only one place to keep all the details for one notification (e.g. message text, schedule). This ensures we don't somehow end up with clients holding different notification data in their applications, and it results in simpler logic when notification details are updated or new friends are added. Although sometimes underestimated, I think simplicity should be an important criterion in the architecture choice.
- It seems relatively easy to scale horizontally: multiple message queues can be set up on multiple machines, without interference between the queues.
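As a rough illustration of this server-side approach, here is a minimal sketch using Java's ScheduledExecutorService; the Notification type and the sendPush call are hypothetical placeholders, not part of the original design:

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class NotificationScheduler {

    // Hypothetical domain type: one shared notification for several friends.
    record Notification(String message, List<String> userIds) {}

    // A single shared pool on the server; all schedule state lives here,
    // so every client sees the same notification data.
    private final ScheduledExecutorService pool = Executors.newScheduledThreadPool(4);

    // Schedule a repeating task for everyone involved.
    public void scheduleRepeating(Notification n, Duration initialDelay, Duration interval) {
        pool.scheduleAtFixedRate(
                () -> n.userIds().forEach(id -> sendPush(id, n.message())),
                initialDelay.toMillis(), interval.toMillis(), TimeUnit.MILLISECONDS);
    }

    // Placeholder for the real push delivery (e.g. FCM/APNs).
    private void sendPush(String userId, String message) {
        System.out.printf("push to %s: %s%n", userId, message);
    }
}
```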

Related

How to design a notification system that sends real-time alerts created by users

I've been thinking about how to design a system that supports user-created scheduled alerts. My problem is that once the alerts are created and inserted into a database, I don't know the best way to schedule them. Polling the database to see which alerts need to go out next doesn't seem entirely right to me.
What are some ways this could be handled at a scale where, say, a million users could create their own custom alerts like "change baby's diaper at 3pm every day"?
This problem is very well suited to cloud platforms. For example, you could use GCP Cloud Scheduler to invoke a cloud function when the alert is due to be sent out. The cloud function then calls some API to alert the user.
If cloud platforms are not an option, you could have your application spawn a new thread when an alert is created and sleep that thread for the required duration. When it wakes up, it sends the alert. Less elegant and less scalable than the first solution, but it would still work.
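A minimal sketch of that thread-per-alert idea, assuming a hypothetical Alert type and sendAlert delivery call:

```java
import java.time.Duration;
import java.time.Instant;

public class AlertThread {

    // Hypothetical alert: who to notify, what to say, and when.
    record Alert(String userId, String message, Instant fireAt) {}

    // One thread per alert, as described above. Note that at a million
    // alerts this means a million sleeping threads, which is exactly why
    // this is less scalable than the cloud-scheduler approach.
    static void schedule(Alert alert) {
        new Thread(() -> {
            try {
                long millis = Duration.between(Instant.now(), alert.fireAt()).toMillis();
                if (millis > 0) Thread.sleep(millis); // sleep until the alert is due
                sendAlert(alert);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();   // shutting down: drop the alert
            }
        }).start();
    }

    // Placeholder for the real delivery call.
    static void sendAlert(Alert alert) {
        System.out.printf("alert for %s: %s%n", alert.userId(), alert.message());
    }
}
```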

Microservice - persisting to RDBMS & queue within a transaction

I have a REST service; all its requests are persisted to its own relational database. So far, so good. But there is also a small piece of business functionality (email notification, SMS alert) that should run on the newly received/updated data. For this background process to work on the data, it needs some way to learn about the persisted data; a message queue would fix that. I see three common ways of designing this:
1. The REST service inserts into the database and also publishes to the queue.
The problem here is the distributed transaction: combining two different resource types (the relational database and the queue) within one transaction. Some tools may support this, some may not.
2. The REST service persists only to its database, as usual. Additionally, it inserts the data into another table, which a scheduled job queries and publishes to the queue (from which the background job starts its work).
The problem I see is the scheduler: not reactive, batch processing, limited by its time slot, not real-time, slow, and so on.
3. The REST endpoint publishes the data directly to a topic. One consumer persists it to the database, while another processes it in the background.
Something like event sourcing. To my understanding, it is a bit complex to implement as the number of services grows. Also, if the database is down, the persisting service would fail to save the data while the background service (say, the emailer) would still send the email, which is functionally wrong. This may lead to inconsistency among the services, functional as well.
I have also thought of reading the database transaction logs, but that seems more complex, requires tooling and configuration to make it work, and seems better suited to data-processing systems than to our use case.
What are your thoughts on this? Did I miss anything? How do you manage such scenarios? What should I look out for? Thinking reactive, say Vert.x?
Apologies if this looks very naive, but I have to ask.
I think the best approach is option 2, with a CDC (change data capture) system like Debezium.
See https://microservices.io/patterns/data/transactional-outbox.html
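For illustration, a minimal sketch of the transactional outbox in plain JDBC; the table and column names are made up, and the relay that drains the outbox (a polling job, or Debezium tailing the transaction log) is only described in comments:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class OutboxWriter {

    // Transactional outbox sketch: the business row and the outbox row are
    // written in ONE local transaction, so no distributed transaction is
    // needed between the database and the queue.
    public void saveOrder(Connection conn, String orderId, String payloadJson)
            throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement order = conn.prepareStatement(
                     "INSERT INTO orders (id, payload) VALUES (?, ?)");
             PreparedStatement outbox = conn.prepareStatement(
                     "INSERT INTO outbox (aggregate_id, event_type, payload) VALUES (?, ?, ?)")) {
            order.setString(1, orderId);
            order.setString(2, payloadJson);
            order.executeUpdate();

            outbox.setString(1, orderId);
            outbox.setString(2, "OrderCreated");
            outbox.setString(3, payloadJson);
            outbox.executeUpdate();

            conn.commit(); // both rows, or neither
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }

    // A separate relay (a polling job, or Debezium reading the log) drains
    // the outbox table and publishes each row to the queue, then marks it sent.
}
```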
I usually recommend option 3 if you don't need immediate read-after-write consistency. The background job should retry if the database record has not yet been updated by the message it is processing.
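A minimal sketch of that retry loop, with hypothetical findRecord/sendEmail helpers and an arbitrary backoff policy:

```java
import java.time.Duration;
import java.util.Optional;

public class BackgroundRetry {

    // Retry with backoff until the consumer that persists the record has
    // caught up. findRecord and sendEmail are hypothetical placeholders.
    static void handleMessage(String recordId) throws InterruptedException {
        Duration backoff = Duration.ofMillis(100);
        for (int attempt = 0; attempt < 8; attempt++) {
            Optional<String> record = findRecord(recordId);
            if (record.isPresent()) {
                sendEmail(record.get());           // safe: the row now exists
                return;
            }
            Thread.sleep(backoff.toMillis());
            backoff = backoff.multipliedBy(2);     // exponential backoff
        }
        // Retry budget exhausted: park the message on a dead-letter queue.
        throw new IllegalStateException("record " + recordId + " never appeared");
    }

    static Optional<String> findRecord(String id) { return Optional.empty(); }
    static void sendEmail(String record) {}
}
```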
Your post exemplifies why queues shouldn't be used for these types of scenarios. They are good for delivering analytical data or logs, but for task orchestration developers have to reinvent the wheel every time.
A much better approach is to use a task orchestration system like Cadence Workflow, which eliminates the issues you described and makes multi-service orchestration much simpler.
See this presentation that explains the Cadence programming model.

Micro-services architecture, need advice

We are working on a system that is supposed to 'run' jobs on distributed systems.
When jobs are accepted they need to go through a pipeline before they can be executed on the end system.
We've decided to go with a micro-services architecture, but there is one thing that bothers me, and I'm not sure what would be the best practice.
When a job is accepted it will first be persisted to a database; then each micro-service in the pipeline will do some additional work to prepare the job for execution.
I want the persisted data to be updated at each such station in the pipeline, to reflect the actual state of the job, or its status in the pipeline.
In addition, while a job is being executed on the end system, its status should also be updated.
What would be the best practice for updating the database (the job's status) at each station:
1. Each station (micro-service) in the pipeline accesses the database directly and updates the job's status.
2. There is another micro-service that exposes the data (REST) and serves as a DAL; each micro-service in the pipeline updates the job's status through this service.
3. Other?
Help/advice would be highly appreciated.
Thanks a lot!!
To add to what was said by @Anunay and @Mohamed Abdul Jawad:
I'd consider writing the state from the units of work in your pipeline to a view (a table, or an insert-only cache). You can use messaging, or simply insert a row into that view, and have the readers of the state pick up the correct state based on some logic (a date, a state, or a composite key). As this view is not really owned by any domain service, it can be made available to any readers (read-only) to consume. A sketch of the idea is below.
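A minimal sketch of such an insert-only view in JDBC; the table and column names are illustrative:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.time.Instant;

public class JobStatusView {

    // Insert-only status view: every station appends a row instead of
    // updating in place, so no single domain service owns a mutable record.
    public void append(Connection conn, String jobId, String station, String status)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO job_status_view (job_id, station, status, recorded_at) "
                        + "VALUES (?, ?, ?, ?)")) {
            ps.setString(1, jobId);
            ps.setString(2, station);
            ps.setString(3, status);
            ps.setTimestamp(4, Timestamp.from(Instant.now()));
            ps.executeUpdate();
        }
    }

    // Readers resolve the "current" state with logic such as:
    //   SELECT status FROM job_status_view
    //   WHERE job_id = ? ORDER BY recorded_at DESC LIMIT 1
}
```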
Consider also the Saga pattern (a sketch follows the links below):
A Saga is a sequence of local transactions where each transaction updates data within a single service. The first transaction is initiated by an external request corresponding to the system operation, and then each subsequent step is triggered by the completion of the previous one.
http://microservices.io/patterns/data/saga.html
https://dzone.com/articles/saga-pattern-how-to-implement-business-transaction
https://medium.com/@tomasz_96685/saga-pattern-and-microservices-architecture-d4b46071afcf
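A minimal, orchestration-style sketch of the idea in Java; the Step interface and the reverse-order compensation are my own framing of the pattern, not code from the links above:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class JobPipelineSaga {

    // One pipeline step: a local transaction plus its compensation.
    interface Step {
        void execute(String jobId);     // local transaction in one service
        void compensate(String jobId);  // undo, if a later step fails
    }

    private final Deque<Step> completed = new ArrayDeque<>();

    // Run the steps in order; on failure, compensate in reverse order,
    // which is the defining behavior of a saga.
    public void run(String jobId, Step... steps) {
        for (Step step : steps) {
            try {
                step.execute(jobId);
                completed.push(step);
            } catch (RuntimeException e) {
                while (!completed.isEmpty()) {
                    completed.pop().compensate(jobId);
                }
                throw e;
            }
        }
    }
}
```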
If you would like to code the workflow:
- Microservice A, which accepts the Job and commands to update it
- Microservice B, which provides a read model for the Job
Based on JobCreatedEvents, use a message queue and process and update the job through the queue pipeline, updating JobStatus at every node in the pipeline. A rough sketch follows.
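A minimal sketch of one such pipeline node; an in-memory BlockingQueue stands in for the real broker, and the JobEvent type and reportStatus call are hypothetical:

```java
import java.util.concurrent.BlockingQueue;

public class PipelineStage implements Runnable {

    // Stand-in for real broker messages (Kafka, RabbitMQ, ...); the shape of
    // the loop is the same: take an event, do the stage's work, report status.
    record JobEvent(String jobId, String payload) {}

    private final String stageName;
    private final BlockingQueue<JobEvent> in;
    private final BlockingQueue<JobEvent> out;

    PipelineStage(String stageName, BlockingQueue<JobEvent> in, BlockingQueue<JobEvent> out) {
        this.stageName = stageName;
        this.in = in;
        this.out = out;
    }

    @Override
    public void run() {
        try {
            while (true) {
                JobEvent event = in.take();                      // consume
                reportStatus(event.jobId(), "IN_PROGRESS");
                JobEvent next = process(event);                  // stage-specific work
                reportStatus(event.jobId(), "DONE");
                out.put(next);                                   // hand to the next stage
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private JobEvent process(JobEvent e) { return e; }           // placeholder

    // Could call Microservice B's status endpoint, or append to a status view.
    private void reportStatus(String jobId, String status) {
        System.out.printf("%s: job %s %s%n", stageName, jobId, status);
    }
}
```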
I am assuming you know about queues and consumers.
I'm new to Camunda (a workflow engine) myself; it might be usable here, though I'm not completely sure.
Accessing a shared database from multiple microservices is highly discouraged, as it violates a basic rule of microservices architecture: a microservice must be autonomous and keep its own logic and data.
Also, to achieve a good microservice design, you should loosely couple your microservices.
Multiple microservices accessing the database is not recommended. Here you have the case where each of the services needs to be triggered; it then updates the data and then somehow calls the next service.
You really need a mechanism to orchestrate the services. A workflow engine might fit the bill.
I would, however, suggest an event-driven system. I might be going beyond what I can judge with my limited knowledge of your data. Have one service that gives you basic CRUD on the data, and other services that hold the logic to change it. (At this point I would ask why you want different services to change the state; if it's a business requirement, that's fine.) Once the data is written, just publish an event to which services can subscribe and react.
This will allow you to easily add more states to your pipeline in the future.
You will need a service to manage the event queue.
As far as logging the state is concerned, it can be done easily by logging the events.
If you opt for the workflow route you may use Amazon SWF or Camunda; there are really quite a few options out there.
If you go the event route, you need to look into event-driven systems in microservices.

Web server and ZeroMQ patterns

I am running an Apache server that receives HTTP requests and connects to a daemon script over ZeroMQ. The script implements the Multithreaded Server pattern (http://zguide.zeromq.org/page:all#header-73): it receives the request and dispatches it to one of its worker threads, which performs the action and responds back to the server, and the server responds back to the client. Everything is done synchronously, as the client needs to receive a success or failure response to its request.
As the number of users grows into a few thousand, I am looking into potentially improving this. The first thing I looked at is the different patterns of ZeroMQ, and whether what I am using is optimal for my scenario. I've read the guide, but I find it challenging to understand all the details and differences across the patterns. I was looking, for example, at the Load Balancing Message Broker pattern (http://zguide.zeromq.org/page:all#header-73). It seems quite a bit more complicated to implement than what I am currently using, and if I understand things correctly, its advantages are:
Actual load balancing vs the round-robin task distribution that I currently have
Asynchronous requests/replies
Is that everything? Am I missing something? Given the description of my problem and its synchronous requirement, what would you say is the best pattern to use? Lastly, how would the answer change if I wanted to make my setup distributed (i.e. have the Apache server load-balance the requests across different machines)? I was thinking of doing that by simply creating yet another layer, based on the Multithreaded Server pattern, and having that layer bridge the communication between the web server and my workers.
Some thoughts about the subject...
Keep it simple
I would try to keep things simple and use "plain" ZeroMQ as long as possible. To increase performance, I would simply change your backend script to send requests out from a DEALER socket and move the request-handling code into its own program. Then you could just run multiple worker servers on different machines to get more requests handled (see the sketch after the quote below).
I assume this was the approach you took:
I was thinking of doing that by simply creating yet another layer, based on the Multithreaded Server pattern, and have that layer bridge the communication between the web server and my workers.
The only problem here is that there is no request retry in the backend: if a worker fails to handle a given task, it is lost forever. However, one could write the worker servers so that they handle all the requests they have received before shutting down. With this kind of setup it is possible to update the backend workers without clients noticing any outage. It will not save requests that get lost if a server crashes.
I have the feeling that in common scenarios this kind of approach would be more than enough.
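A minimal sketch of that bridge, assuming the JeroMQ Java binding; the endpoints are illustrative, and workers on other machines would connect REP sockets to the backend address:

```java
import org.zeromq.SocketType;
import org.zeromq.ZContext;
import org.zeromq.ZMQ;

public class Bridge {

    // ROUTER/DEALER bridge: the web-server side connects to the frontend,
    // worker processes (possibly on other machines) connect REP sockets to
    // the backend. Replies are routed back to the right client automatically,
    // and requests are fair-queued across connected workers.
    public static void main(String[] args) {
        try (ZContext ctx = new ZContext()) {
            ZMQ.Socket frontend = ctx.createSocket(SocketType.ROUTER);
            frontend.bind("tcp://*:5555");   // Apache-facing side

            ZMQ.Socket backend = ctx.createSocket(SocketType.DEALER);
            backend.bind("tcp://*:5556");    // workers connect here

            // Blocks forever, shuttling messages in both directions.
            ZMQ.proxy(frontend, backend, null);
        }
    }
}
```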
Mongrel2
Mongrel2 seems to handle quite a few of the things you have already implemented. It might be worthwhile to check it out. It probably does not completely solve your problem, but it provides tested infrastructure for distributing the workload. It could be used to deliver the requests to multithreaded servers running on different machines.
Broker
One way to increase the robustness of the setup is a broker. In this scenario, the broker's main role would be to provide robustness by implementing a queue for the requests. I understood that all the requests the workers handle are basically of the same type; if the requests had different types, the broker could also do lookups to find the correct server for each request.
Using a queue provides a way to ensure that every request is handled by some worker, even if worker servers crash. This does not come for free: the broker is itself a single point of failure, and if it crashes or is restarted, all messages could be lost.
These problems can be avoided, but it requires quite a lot of work: requests could be persisted to disk, and servers could be clustered. The need has to be weighed against the payoffs: do you want to spend your time writing a message broker, or the actual system?
If a message broker seems like a good idea, the time required to implement one can be reduced by using an existing product (like RabbitMQ). The negative side effect is that it may come with a lot of unwanted features, and adding new things is not as straightforward as with a self-made broker.
Writing your own broker risks reinventing the wheel: many brokers provide similar things, such as security, logging, a management interface, and so on, and it seems likely these will eventually be needed in a homemade solution as well. But if not, then a single homemade broker that does one thing and does it well can be a good choice.
Even if a broker product is chosen, I think it is a good idea to hide the broker behind a ZeroMQ proxy: a dedicated piece of code that sends/receives messages to and from the broker. Then no other part of the system has to know anything about the broker, and it can easily be replaced.
Using a broker is somewhat heavy on developer time: you either need time to implement the broker or time to get used to a product. I would avoid this route until it is clearly needed.
Some links
Comparison between broker and brokerless
RabbitMQ
Mongrel2

How to perform process interrupts in jBPM based on business data

I've been doing some research into BPM solutions and am looking to hopefully use jBPM to achieve my goal. I am aware it is possible to start a process instance with an event signal sent to the process engine, but I would like to be able to interact with process instances currently running in that engine WITHOUT knowing their instance ID.
I am aiming to achieve this in an interrupt fashion by sending an event, carrying business data, to the process engine, to be matched against the process instance containing that specific business data (for instance, a customer number unique to a process instance).
I have not yet been able to figure out how to do this. Another of my goals is to expose this via REST/SOAP, and I am aware that this functionality is NOT currently implemented in the jBPM5 console REST interface.
How would I go about doing this, what are the standard patterns for doing so, or what other process engines should I be looking at to achieve this?
Yeah, you can achieve that with jBPM, and I would recommend you check out jBPM 6 CR2.
To do what you need, you can start multiple processes inside a KieSession and then send your customer as the payload of your event. Only the process instance that has that customer will catch the event (if it's modeled correctly, with a catch-event node that filters by customer).
The REST endpoints are already there in jBPM 6. A rough sketch of the session-side code is below.
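A minimal sketch using the jBPM 6 public API (KieServices, KieSession.signalEvent); the process id, signal name, and Customer class are all illustrative, and the actual filtering happens in the BPMN model's catch event, not in this code:

```java
import java.util.Collections;

import org.kie.api.KieServices;
import org.kie.api.runtime.KieContainer;
import org.kie.api.runtime.KieSession;

public class CustomerSignal {

    // Illustrative payload; in the BPMN model, the intermediate catch event
    // would inspect this object (e.g. the customer number) to decide whether
    // a given running instance reacts to the signal.
    public static class Customer {
        public final String customerNumber;
        public Customer(String customerNumber) { this.customerNumber = customerNumber; }
    }

    public static void main(String[] args) {
        KieServices ks = KieServices.Factory.get();
        KieContainer container = ks.getKieClasspathContainer();
        KieSession session = container.newKieSession(); // default session from kmodule.xml

        // Start several instances, each parameterized with its own customer.
        // The process id "com.example.orderProcess" is hypothetical.
        session.startProcess("com.example.orderProcess",
                Collections.singletonMap("customer", (Object) new Customer("C-1001")));
        session.startProcess("com.example.orderProcess",
                Collections.singletonMap("customer", (Object) new Customer("C-1002")));

        // Broadcast the signal to the whole session; only the instance whose
        // catch event matches customer C-1001 should proceed.
        session.signalEvent("customerEvent", new Customer("C-1001"));
    }
}
```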
Hope it helps