Microservice Decomposition for Batch Job - spring

I have been reading various posts and books on microservice architecture, trying to answer a question related to decomposition strategies: should we create a new microservice specifically to handle a batch job?
For context, the batch job reads data from the database and makes REST calls to an external system when the data is in a particular state. The job is supposed to run only once a day.
My questions related to this are:
Is it an industry norm/practice that a batch job should live in its own microservice, since the job consumes resources that could otherwise hinder incoming traffic and increase latency?
Does running a batch job affect the latency of the APIs exposed to clients?

I would say yes, it makes sense. Batch jobs usually have a very different development lifecycle and deployment frequency.
I've done something similar myself and I'm sure it's worth it.
It would then also be possible to spin up an instance just to run the job once a day, which can save money in cloud environments.
Latency: it depends on that other system. You might want to throttle your requests to the external system so you don't bring it down under heavy load.
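To illustrate the throttling idea, here is a minimal sketch in Java using Guava's RateLimiter; RecordRepository and ExternalClient are hypothetical stand-ins for your DAO and REST client, not anything from your codebase:

```java
import com.google.common.util.concurrent.RateLimiter;

import java.util.List;

public class DailyBatchRunner {

    // Hypothetical collaborators; substitute your real DAO and REST client.
    interface RecordRepository { List<String> findIdsByState(String state); }
    interface ExternalClient  { void notify(String recordId); }

    // Cap outbound calls at 10/s so the external system is not flooded
    // (tune this to whatever the external system can actually handle).
    private final RateLimiter rateLimiter = RateLimiter.create(10.0);

    private final RecordRepository repository;
    private final ExternalClient client;

    public DailyBatchRunner(RecordRepository repository, ExternalClient client) {
        this.repository = repository;
        this.client = client;
    }

    public void run() {
        for (String id : repository.findIdsByState("READY_TO_SEND")) {
            rateLimiter.acquire();   // blocks until a permit is available
            client.notify(id);       // REST call to the external system
        }
    }
}
```

Because the job runs once a day and is already isolated, a simple blocking limiter like this is usually enough; there is no need for anything fancier unless the record volume is huge.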


Monolith to microservice

We are planning to migrate from a monolith to a microservices-based architecture, and I now own the responsibility of taking a module out of the monolith.
Existing monolith:
1) Code is very tightly coupled.
2) APIs are called recursively with different parameters.
3) Some of the calls within the module I am planning to extract go to a system that takes approximately 9 minutes to complete. Unfortunately, those calls are synchronous.
Points to note:
1) I am starting with the migration of a single API, one that is very important and is not performing well.
2) This API makes parallel calls to another system to perform a bunch of tasks. All the calls are blocking and time-consuming (consider the average response time to be 5-6 minutes).
Moving to a microservice-based architecture: two approaches come to mind for moving the aforementioned API from the monolith to a separate microservice while also solving the problem of threads blocked on time-consuming calls.
a) Moving in phases (a rough sketch of the Kafka hand-off follows below):
- Create a separate module.
- In this module, provide an API that pushes events to Kafka; another module will in turn process the request and push the response back to Kafka.
- For now, the monolith will call the above-mentioned API to push events to Kafka.
- The new module will in turn call back the monolith when the task completes (response received on a separate Kafka topic).
- Once the monolith gets responses for all the tasks, it will trigger some post-processing activity.
Advantage:
1) It solves the problem of synchronous blocking calls.
Disadvantage:
1) Changes are required in the monolith, which could introduce bugs.
2) No fallback is available if a bug does get introduced.
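As a rough sketch of the Kafka hand-off in option (a), assuming a local broker and made-up topic names task-requests and task-responses (plain kafka-clients API, not a definitive implementation):

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class TaskHandoff {

    private static Properties baseProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        return props;
    }

    /** Monolith side: fire the task and return immediately, nothing blocks. */
    public static void publishTask(String taskId, String payload) {
        Properties props = baseProps();
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Key by task id so the response can be correlated later.
            producer.send(new ProducerRecord<>("task-requests", taskId, payload));
        }
    }

    /** Monolith side: poll the response topic and trigger post-processing. */
    public static void consumeResponses() {
        Properties props = baseProps();
        props.put("group.id", "monolith-callbacks");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("task-responses"));
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    // ...mark the task done; once all tasks for a batch are
                    // done, kick off the monolith's post-processing...
                    System.out.printf("task %s finished: %s%n", record.key(), record.value());
                }
            }
        }
    }
}
```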
b) Move the API to the microservice at once:
Initially the microservice will share a common data source with the monolith, and the problem of blocking calls is solved by introducing Kafka between the new microservice and the module that takes a long time to process requests.
Advantage:
1) A fallback is available in the monolith.
Disadvantage:
1) Initially, the data source is shared between the two systems.
What would be the best approach for this kind of complex task?
You have to take care of several things.
First
Going to microservices will be slower (90% of the time) than a monolith, because you introduce network latency. Never forget that when you go this route.
Second
You ask whether Kafka is a good way to go. I would answer yes in most cases, but you mentioned that today the process is synchronous. If that is for transactional reasons, you won't be able to solve it with a message broker, because you would be moving from a strongly consistent system to an eventually consistent one. https://en.wikipedia.org/wiki/Eventual_consistency
I am not saying it is a bad solution, only that it changes your workflow and may impact some business rules.
As a solution I offer this:
1 - Break the seams in your monolith by introducing functional keys and API composition inside the monolith (Sam Newman's book can help here).
2 - Introduce eventual consistency inside the monolith to test whether it fits the purpose. It will be easier to roll back if it does not.
Now you have two possibilities:
If the second step went well, go ahead and move the code of the service into a microservice outside the monolith.
If the second step did not fit, think about doing the risky work in a dedicated service, or use distributed transactions (be careful with that solution; they can be hard to manage).
I believe the best approach would be option (a): moving in phases. However, it is possible to do that while still having a fallback strategy: you can keep a version of the untouched backend to serve as a backup if your new service encounters issues.
The approach is described in more detail in the article Low risk monolith to microservice evolution, which covers the implementation and the reasoning behind why a phased approach carries lower risk. The need to change the backend is still present, but it can hopefully be mitigated through unit testing.

Are service fabric services entirely single-threaded?

I'm trying to get to grips with service fabric and I'm struggling a little bit. Some questions:
Are all Service Fabric service instances single-threaded? I created a stateless Web API with one instance and a method that did a Task.Delay and then returned a string. Two requests to this service were served one after the other, not concurrently. Am I right in thinking, then, that the number of concurrent requests that can be served is purely a function of the service instance count in the application manifest? Edit: thinking about this, it is probably to do with the setup of the OWIN Web API. Could it be blocking by session? I assumed there is no session by default.
I have long-running operations that I need to perform in Service Fabric (they can take several hours). Is there a recommended pattern for this in Service Fabric? They are currently handled by a storage queue that triggers a WebJob. Maybe something with Reliable Queues and a RunAsync loop?
It seems you handled the first part, so I will comment on the second part: long-running operations.
Long-running operations and workflows were being handled far before Service Fabric came about. For this reason, we can build on the shoulders of giants by looking at the design patterns that software experts have been using for decades, for example the famous and all-inclusive Process Manager. Mind you, this pattern is sometimes overkill; if it is in your case, just check out the rest of the related patterns in the Enterprise Integration Patterns book (by Gregor Hohpe).
As for the use of reliable collections: those are implementation details, decided when you choose a data structure to support the chosen design pattern.
I hope that helps.
With regard to your second point: it really depends on the nature of your long-running task.
Is it the kind of workload that runs on an isolated thread, depends on local OS/VM-level resources, and eventually comes back with a result (A)? Or is it the kind that goes through stages and builds up a model of the result through a series of persisted state changes (B)?
From what I understand of Service Fabric, it isn't really designed for long-running workloads of type (A); it is more for writing horizontally scalable, highly available systems.
If you are absolutely keen on using Service Fabric (and your kind of workload tends to be more like B than A), I would definitely find a way to break down those long-running tasks so they can be processed in parallel across the cluster. But even then, there are probably more appropriate technologies designed for this, such as Azure Batch.
P.S. If you are going to put a long-running process in the RunAsync method, you should design the workload so it is interruptible and its state is persisted in a way that allows it to be resumed on another node in the cluster:
In a stateful service, only the primary replica has write access to state and thus is generally when the service is performing actual work. The RunAsync method in a stateful service is executed only when the stateful service replica is primary. The RunAsync method is cancelled when a primary replica's role changes away from primary, as well as during the close and abort events.
P.P.S. Long-running operations are the devil when trying to write scalable systems. Try to tackle that now and save yourself the future pain if possible.
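To make the "interruptible and resumable" advice concrete, here is a minimal, framework-agnostic sketch (in Java rather than the Service Fabric API) of a checkpointing loop. CheckpointStore is a hypothetical durable store; in Service Fabric that role would be played by a reliable collection, elsewhere by a database row or a blob:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ResumableWorker {

    // Hypothetical durable store for progress; swap in a reliable
    // dictionary, database row, or blob in a real system.
    interface CheckpointStore {
        int loadLastCompletedStage(String jobId);          // -1 if never run
        void saveCompletedStage(String jobId, int stage);
    }

    private final CheckpointStore store;
    private final AtomicBoolean cancelled = new AtomicBoolean(false);

    public ResumableWorker(CheckpointStore store) {
        this.store = store;
    }

    /** Called when the node is shutting down or losing primary status. */
    public void cancel() {
        cancelled.set(true);
    }

    public void run(String jobId, int totalStages) {
        // Resume after the last stage that was durably recorded.
        int next = store.loadLastCompletedStage(jobId) + 1;
        for (int stage = next; stage < totalStages; stage++) {
            if (cancelled.get()) {
                return; // another node resumes from the saved checkpoint
            }
            processStage(jobId, stage);              // one bounded unit of work
            store.saveCompletedStage(jobId, stage);  // checkpoint before moving on
        }
    }

    private void processStage(String jobId, int stage) {
        // ...do one stage of the long-running task...
    }
}
```

The key design choice is that each stage is small and the checkpoint is written after every stage, so losing a node costs at most one stage of rework.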
To the first point: this turned out to be purely a client issue. Chrome saw my requests as identical and delayed the second request until the first got a response. Varying the parameters of the requests allowed them to be served concurrently.

What should I use for scheduling

I am using JSF 2, Spring 4, and Hibernate 4 in my application. I have a Spring service layer, DAO layer, models, and other things. I want to schedule some of the services to be executed automatically at a specified time; usually these services perform some kind of data mapping from an Excel file to the database.
I want to perform these tasks without user intervention, and the scheduler should take care of all the data mapping.
Note: I call these services from my view, and the same services should also be used by the scheduler to perform the data mapping.
I am a complete newbie and have never used any kind of scheduler. So my questions:
1) What should I use to schedule these tasks?
2) I am confused about Spring Batch and Spring Scheduler: do they both perform scheduling? If not, what is the actual use of Spring Batch?
3) Is Spring Scheduler by itself sufficient to perform this scheduling?
Any help would be highly appreciated.
1) What should I use to schedule these tasks?
Basically you need the classes that support the operations you want to perform (Excel creation from database queries); Spring covers that in both cases.
2) I am confused about Spring Batch and Spring Scheduler: do they both perform scheduling? If not, what is the actual use of Spring Batch?
Spring Batch provides reusable functions that are essential in processing large volumes of records, including logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management. It also provides more advanced technical services and features that enable extremely high-volume and high-performance batch jobs through optimization and partitioning techniques.
Spring Scheduler just runs a method at a certain time. It is not as robust: it only executes the logic of a process, with no statistics and no job restart. It simply starts a process at a predefined time (by calling a method of a class).
3) Is Spring Scheduler by itself sufficient to perform this scheduling?
Yes, it is. If you are not already familiar with Spring Batch, learning it will take more time than simply scheduling calls to the methods you already have.
A scheduler is a software product that allows an enterprise to schedule and track computer batch tasks.
The scheduler just runs the process.
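A minimal sketch of the Spring Scheduler approach, assuming a hypothetical DataMappingService facade over your existing service-layer methods (the interface and bean names below are my own, not from your application):

```java
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Configuration
@EnableScheduling // activates @Scheduled processing
class SchedulingConfig {
}

@Component
class DataMappingJob {

    // Hypothetical facade over your existing service-layer methods.
    interface DataMappingService { void mapExcelToDatabase(); }

    private final DataMappingService dataMappingService;

    @Autowired
    DataMappingJob(DataMappingService dataMappingService) {
        this.dataMappingService = dataMappingService;
    }

    // Runs every day at 02:00; calls the same method your view already uses.
    @Scheduled(cron = "0 0 2 * * *")
    public void runDailyMapping() {
        dataMappingService.mapExcelToDatabase();
    }
}
```

Because the scheduled method just delegates to the existing service bean, the view and the scheduler share a single code path, which is exactly what the question asks for.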

CPU bound/stateful distributed system design

I'm working on a web application front end to a legacy system that involves a lot of CPU-bound background processing. The application is also stateful on the server side, and the domain objects need to be held in memory for the entire session as the user operates on them via the web-based interface. Think of it as a web UI front end to Photoshop, where each filter can take 20-30 seconds to execute on the server side, so the app still has to interact with the user in real time while they wait.
The main problem is that each server instance can only support around 4-8 instances of each "workspace" at once, and I need to support a few hundred concurrent users. I'm going to build this on Amazon EC2 to make use of its auto-scaling functionality. To summarize, the system is:
A web application front end to a legacy backend system
Tasks performed are CPU-bound
Stateful; most calls will be some sort of RPC, and the user will make multiple actions that interact with stateful objects held in server-side memory
Most tasks are semi-realtime: they execute for 20-30 seconds and return the results to the user in the same session
Uses Amazon AWS auto scaling
I'm wondering about the best way to distribute a system like this.
Obviously I will need a web server to interact with the browser, which then sends the CPU-bound tasks to a bunch of dedicated servers that do the background processing. The question is how best to hook the two tiers together for my specific needs.
I've been looking at message queue systems such as RabbitMQ, but these seem geared towards one-time tasks where any worker node can simply grab a job from a queue, execute it, and forget the state. My needs are a little different, since there can be multiple 'tasks' that need to be 'sticky': for example, if step 1 is started on node 1, then step 2 for the same workspace has to go to the same worker process.
Another problem I see is that most worker queue systems seem geared towards background tasks that can be processed at any time, rather than a system that has to provide user feedback, as mine does.
My question is: is there an off-the-shelf solution for something like this that will let me easily build a system that can scale? Would love to hear your thoughts.
RabbitMQ has an RPC tutorial. I haven't used this pattern in particular, but I am running RabbitMQ on a couple of nodes and it can handle hundreds of connections and millions of messages. With a little monitoring work you can detect when there is more work to do than you have consumers for. Messages can also time out, so queues won't back up too badly. To scale out capacity you can create multiple RabbitMQ nodes/clusters. You could also have multiple rounds of RPC, so that after the first response you include the information required to route the second message to the correct destination.
0MQ has this as a basic pattern that will fan out work as needed. I've only played with it, but it is simpler to code and possibly simpler to maintain (it doesn't need a broker, though devices can provide one). It may not handle stickiness by default, but it should be possible to write your own routing layer to handle it.
Don't discount HTTP for this either. When you want request/reply, a strict throughput per backend node, and something that scales well, HTTP is well supported. With AWS you can easily put their ELB in front of an auto-scaling group to route from front end to back end. ELB supports sticky sessions as well.
I'm a big fan of RabbitMQ, but if this is the whole scope, then HTTP would work nicely and have fewer moving parts in AWS than the other solutions.
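If you do go the queue route and need stickiness, one low-tech approach is to hash each workspace ID onto a fixed set of per-worker queues, so every step for a workspace lands on the worker holding its in-memory state. A minimal sketch with the RabbitMQ Java client; the queue naming scheme is a made-up convention, not a RabbitMQ feature:

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

import java.nio.charset.StandardCharsets;

public class StickyDispatcher {

    private static final int WORKER_COUNT = 8; // one queue per worker process

    /** Every task for a given workspace hashes to the same queue, so the
     *  worker that holds that workspace's in-memory state sees all its steps. */
    static String queueFor(String workspaceId) {
        int bucket = Math.floorMod(workspaceId.hashCode(), WORKER_COUNT);
        return "workspace-worker-" + bucket;
    }

    public static void dispatch(String workspaceId, String taskJson) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {
            String queue = queueFor(workspaceId);
            // durable=true, exclusive=false, autoDelete=false
            channel.queueDeclare(queue, true, false, false, null);
            channel.basicPublish("", queue, null, taskJson.getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

Note the trade-off: a fixed bucket count fights with autoscaling, since adding workers changes the mapping. A routing layer that tracks live workspace-to-worker assignments avoids that, at the cost of extra state.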

Spring Batch Parallel Job Scaling

I'm currently working on a Spring Batch POC and have a pretty good handle on most of the actual Spring Batch features. I currently have a program that uses Spring Integration to receive an HTTP request and uses message channels to eventually queue job executions for the job launcher. What we'd really like to do is implement some kind of "scheduler/load balancer" (not quite sure what to call it) in front of the job launcher that looks at the currently running worker nodes and the size of the input file and decides how many worker nodes the job should be allowed. We would probably also want to be able to change the number of worker nodes a job has while it is running, to allow more jobs to run.
The idea is that we'd have a server running that can accept many job requests at any time, and a large cluster of machines onto which jobs are partitioned. We'd like to scale horizontally, so that whenever the server isn't busy it can make full use of the hardware, while also making sure that small jobs don't get constantly blocked by larger ones.
From my research it seems we'd have to bring in another framework to do this (do GridGain and Hadoop allow it?), but I figured I'd ask what people recommend for something like this, and whether there's a way to do it without adopting another large framework.
Sorry if anything is unclear or confusing; I'm just a lowly intern who started learning Spring and Spring Batch last month, and I'm far from completely understanding everything, especially this scaling stuff. Just ask and I'll try to clear things up.
Thanks for any help!
Take a look at the spring-batch-integration project under the spring-batch-admin umbrella project: https://github.com/SpringSource/spring-batch-admin
It has a number of examples of using Spring Integration to distribute work to other nodes. In particular, see the chunk and partition packages. Just swap out the Spring Integration channels for JMS channel adapters; by distributing work partitions via JMS, you can scale out the number of worker nodes as needed.
There are a number of threads on this subject in the Spring Integration forum; search for 'PartitionHandler'.
Hope that helps.
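On the "decide how many workers based on the input file" point: the lever Spring Batch gives you is the grid size handed to the Partitioner, so a launcher component can compute it from the file size before starting the job. Here is a minimal sketch of a row-range partitioner; the master-step and partition-handler wiring (local or JMS-backed) is left out, and the class and key names are my own:

```java
import java.util.HashMap;
import java.util.Map;

import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;

/**
 * Splits [0, totalRows) into gridSize contiguous ranges. Each range becomes
 * one worker step execution; with spring-batch-integration those executions
 * can be shipped to remote nodes over JMS instead of local threads.
 */
public class RowRangePartitioner implements Partitioner {

    private final long totalRows; // e.g. the row count of the input file

    public RowRangePartitioner(long totalRows) {
        this.totalRows = totalRows;
    }

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        Map<String, ExecutionContext> partitions = new HashMap<>();
        long chunk = (totalRows + gridSize - 1) / gridSize; // ceiling division
        for (int i = 0; i < gridSize; i++) {
            ExecutionContext ctx = new ExecutionContext();
            ctx.putLong("minRow", i * chunk);
            ctx.putLong("maxRow", Math.min((i + 1) * chunk, totalRows));
            partitions.put("partition" + i, ctx);
        }
        return partitions;
    }
}
```

Each worker step then reads its [minRow, maxRow) slice out of the step execution context, so a bigger input file simply means passing a larger grid size and getting more worker executions.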
