Batch processing for micro service - Spring service design? - spring

I am curious which design approach you would take in the following scenario.
I have an FTP server from which my Java Spring app (Service 1) will fetch a file. I will then store that file in an S3 bucket. The file may have up to a million records. There is going to be another microservice (Service 2) which will need to read this file's data.
How would you design and expose these million records from Service 1 so that they are readable by Service 2?
Which technology stack would you use on either side, and why?
Both services are going to be hosted on the same servers (for now).
Can you suggest a quick and efficient solution?
Thanks

Here's my approach. Given your scope of work, I wouldn't even create two services. The single service would use Spring Integration and Spring Batch.
The Spring Integration part gets the file from FTP, copies it to S3, and hands it over to Spring Batch for processing.
After processing with Spring Batch, save the records.
If the job only runs once in a while and doesn't have many records, I would not even use Spring Batch. I would just use Spring Integration and process the records in parallel. But if you know you'll have millions of records and tons of files, you should consider using Spring Batch for the batch processing.

Related

Message Aggregation using SQS and SpringBoot

I have a use case where a standard SQS queue will be flooded with messages (north of 500k), and a Spring Boot based microservice listens to these events, consumes them, and makes a batch-based REST API call to a third-party SaaS system (I have attached a high-level diagram).
The limitation here is that the Spring Boot consumer can receive a maximum of 10 messages at a time from SQS, so it transforms the payload and makes the REST API call with just these 10 messages (records).
Is there a way to aggregate these messages into, say, 100 messages before making the REST API call (assuming that the target SaaS system accepts 100 records of data)? Would Spring Batch help in this case?
Do I have to look at a different stack for this kind of need? Any help/guidance is much appreciated.
Thanks
What you are describing is actually the chunk-oriented processing model of Spring Batch: items could be read from the queue, accumulated in chunks of 100 items (that is the configurable chunk-size) and posted to your REST API in bulk mode.
Spring Batch handles the chunking of items (and much more) for you. So yes, even though I'm biased, I believe Spring Batch is a very good option for your use case.
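To make the chunking idea concrete, here is a minimal plain-Java sketch of accumulating messages that arrive ten at a time into chunks of 100 before handing them to a sender. It is not the Spring Batch API: `ChunkBuffer`, `add`, and `flush` are hypothetical names, and the `sink` stands in for the bulk REST call. In Spring Batch the framework does this accumulation for you based on the configured chunk size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: buffers items and emits them in fixed-size chunks. */
class ChunkBuffer<T> {
    private final int chunkSize;
    private final Consumer<List<T>> sink; // assumption: wraps the bulk REST call
    private final List<T> buffer = new ArrayList<>();

    ChunkBuffer(int chunkSize, Consumer<List<T>> sink) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
        this.sink = sink;
    }

    /** Adds one poll result (at most 10 messages for SQS) and emits every full chunk. */
    void add(List<T> polled) {
        for (T item : polled) {
            buffer.add(item);
            if (buffer.size() == chunkSize) {
                flush();
            }
        }
    }

    /** Emits whatever is buffered, for example when the queue has been drained. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer)); // copy so the sink owns its list
        buffer.clear();
    }
}
```

A real consumer would also need to delete the SQS messages only after the bulk call has succeeded; that bookkeeping is left out of this sketch.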
Maybe you should try the Aggregator from Spring Integration.
The Aggregator combines a group of related messages, by correlating
and storing them until the group is deemed to be complete. At that
point, the aggregator creates a single message by processing the whole
group and sends the aggregated message as output.
https://docs.spring.io/spring-integration/reference/html/aggregator.html
And please refer to this GitHub repo for Spring Integration with AWS services:
https://github.com/spring-projects/spring-integration-aws/tree/main/src/test/java/org/springframework/integration/aws
I'm assuming you have multiple instances of your application and can scale up easily if required (since you have 500k+ messages). Even so, your application is prone to data loss, and building a reliable system is always challenging. Since you are already on the cloud, maybe you should think about utilizing other cloud services.
I think for your case, you should have a look at Amazon Kinesis Data Streams and Kinesis Data Firehose.
You can refer to this:
https://aws.amazon.com/blogs/big-data/stream-data-to-an-http-endpoint-with-amazon-kinesis-data-firehose/

Spring Batch : Read from database and post to third party webservice

I need to develop an application to perform the following activities using Spring Batch. Read data from the database --> Process data and prepare REST API request --> Write or post to a third-party RESTful service.
I have seen a lot of examples for reading from the database and writing to CSV, DB, or JMS.
But I can't find any option to write to a web service. Is it possible to do this using Spring Batch? If not, please suggest some other technology.
Spring Batch does not have a RestItemWriter since the possibilities are a bit too great as to what we'd need to send. That being said, writing your own ItemWriter implementation that uses a RestTemplate to send the data would be very straightforward to do.
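As a rough illustration of that idea, the sketch below mirrors the shape of such a writer in plain Java so that it compiles with the JDK alone. It does not implement the real Spring Batch `ItemWriter` interface, and `RestPostingWriter` and `poster` are hypothetical names; the `poster` function is assumed to wrap the actual HTTP call (for example a RestTemplate POST) and return the HTTP status code.

```java
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical writer that receives one chunk of items per call,
 * similar in shape to a Spring Batch item writer.
 */
class RestPostingWriter<T> {
    // Assumption: performs the HTTP POST and returns the response status code.
    private final Function<List<? extends T>, Integer> poster;

    RestPostingWriter(Function<List<? extends T>, Integer> poster) {
        this.poster = poster;
    }

    /** Posts the whole chunk in one request and throws on a non-2xx status. */
    void write(List<? extends T> items) {
        if (items.isEmpty()) {
            return; // nothing to send
        }
        int status = poster.apply(items);
        if (status < 200 || status >= 300) {
            throw new IllegalStateException("Bulk POST failed with HTTP status " + status);
        }
    }
}
```

Throwing on failure matters because an exception from the writer is what lets the surrounding job treat the chunk as failed rather than silently dropping records.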

How to dynamically deploy standalone Spring Batch jobs using Spring Cloud Task

We are planning to retire our existing legacy Java batch applications and recreate them with the latest available batch framework.
Given that we have a large number of batch jobs to be modernised, we are looking for a framework or architecture that would allow us to do the following:
Develop a batch solution that allows us to dynamically deploy new batches as and when they are created, without disturbing the already deployed applications. Does Spring Cloud Task provide any such feature? Note: we are only looking to deploy the apps to our local server; this has nothing to do with the cloud.
If Spring Batch/Boot can provide the features we typically expect from a batch application, what is the special added value of Spring Cloud Task? I wasn't able to completely understand this from the Spring documentation available online.
From the Spring Cloud Task documentation, I understood that it allows an application to have many tasks within it. What should I do if each of the tasks has its own library dependencies, which might conflict with the dependencies of other tasks? In that case, should each of these tasks be moved to a new application, or is there a workaround?
To answer your questions:
Does Spring Cloud Task handle orchestration - No. Spring Cloud Task does not handle orchestration of tasks or jobs. The component in this ecosystem that handles the deployment/orchestration of tasks or jobs is really Spring Cloud Data Flow (which is why I asked if you use any type of cloud platform including YARN, Cloud Foundry, Kubernetes, or Mesos...the environments supported by Spring Cloud Data Flow).
What added value does Spring Cloud Task provide over Spring Boot/Spring Batch - Spring Cloud Task is designed to provide a few things:
Similar abilities to Spring Batch with regards to state management without needing to create a batch job. When running a Boot application on a cloud environment, there is no standard way of getting the results from environment to environment (YARN handles job results differently from tasks on Cloud Foundry which is different from jobs on Kubernetes, etc). Spring Batch provides this but now all short lived processes need the overhead of the Batch API so Spring Cloud Task provides a lighter touch to those use cases.
Automatically adds informational listeners. With Spring XD, when you ran a job in an XD container, the XD container automatically added a number of informational listeners that broadcast events that you could listen for. Spring Cloud Task brings the same functionality without the need for the XD container.
Integration with Spring Cloud Stream. Spring Cloud Task provides the ability to launch tasks from messages received from Spring Cloud Stream. Also, the informational messages previously mentioned (both Batch events as well as Task events) are sent via Spring Cloud Stream channels.
The DeployerPartitionHandler. When working in a cloud environment, this PartitionHandler implementation allows you to launch workers for a partitioned batch job as tasks. This allows for the dynamic scaling of partitioned batch jobs instead of the traditional option of pre-deploying workers that listen for work which wastes resources in a modern cloud environment.
How does the packaging of multiple tasks work with dependencies - In short, this is not recommended. The idea of a Spring Cloud Task is that the execution of the Spring Boot application is the Task. While you could package up multiple tasks and, using different methods, have them execute based on different stimuli, that goes against the 12 factor application concepts which are essential for correct use of Spring Cloud Task.
My two cents
For the best option for a modern batch platform, you really need to look into some form of platform first and that begins at the Cloud Foundry/Kubernetes/Mesos/YARN layer. Without that, you end up building a large part of the infrastructure yourself. That is why Spring XD evolved into Spring Cloud Data Flow. The added complexity that lived in the containers of Spring XD is removed by requiring a modern platform to run on (since they all handle those guarantees themselves). Without that piece, you're going to spend a lot of time managing the deployment and orchestration of applications that most modern platforms handle for you.
From there, the choice becomes pretty easy IMHO with Spring Cloud Task for simple tasks, Spring Batch for batch jobs, and Spring Cloud Data Flow for orchestration.

combine spring batch and spring integration?

Thanks for your attention. I defined a project that combines Spring Batch and Spring Integration and communicates with an FTP server to retrieve a file, process it, and write the result back to FTP. I am looking for a good architecture for my project. I designed an architecture with Spring Integration as in the diagram below:
When a file is retrieved from the server, it is processed and routed, based on a condition, to mvChannel and the toGet channel. I have many processing scenarios for the retrieved file, so I defined a router that handles the scenarios, routes to the job channels, and runs Spring Batch.
Now, my question is: is this architecture right?
It really looks good. It is a typical pattern to gain from both the Spring Integration and Spring Batch worlds by integrating them.
However, I don't see a reason for the last <aggregator>. All your jobs can send their results to the <int-ftp:outbound-gateway command="PUT">. It looks like there is no need to wait for all the results in order to analyze them in a single place.

accessing remote spring batch jobs from spring batch admin

I am new to Spring Batch. I want to run Spring Batch jobs on server A and launch those jobs from server B using Spring Batch Admin. Is it possible? I have looked into the following two ways:
1. JMX way: I could convert Spring Batch beans into MBeans, but I can't read them from Spring Batch Admin. Can you tell me how to read MBeans from Spring Batch Admin and launch them?
2. Common repository: I think if I use the same DB repository for both Spring Batch and Spring Batch Admin, then I can launch remote jobs from Spring Batch Admin (from server B). But in the job XML file in Spring Batch Admin, what should the classpath for the tasklet be?
Can you help with the above or tell me if any other way exists?
We ended up implementing a framework using MQ communication to handle this. Each 'batch node' registers itself and any 'batch class' parameters such as 'nodeType=A' or 'jobSizeiCanHandle=BIG' (these are fictitious but you get the point). The client console reads this information and queries the nodes via MQ for the job list. It then submits job requests with parameters via a rudimentary text-based protocol (property file format).
command=START_JOB
job=JobABC
param1=x
param2=y
One of the batch nodes will pick up the message and start the job. It returns the success/fail status in the same manner, in a message with the same correlation ID, so the client can show the response to the user.
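Because the command uses property file format, a batch node could parse the message body with `java.util.Properties`. The following is only a sketch of that parsing step under the assumption that `command` and `job` are mandatory keys and every other key is a job parameter; `JobCommand` is a hypothetical class, and the MQ transport and correlation ID handling are not shown.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical value object for the property-file style job command shown above. */
class JobCommand {
    final String command;
    final String job;
    final Map<String, String> params; // every key except 'command' and 'job'

    private JobCommand(String command, String job, Map<String, String> params) {
        this.command = command;
        this.job = job;
        this.params = params;
    }

    /** Parses a message body such as "command=START_JOB\njob=JobABC\nparam1=x". */
    static JobCommand parse(String messageBody) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(messageBody));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String command = props.getProperty("command");
        String job = props.getProperty("job");
        if (command == null || job == null) {
            throw new IllegalArgumentException("message must contain 'command' and 'job'");
        }
        Map<String, String> params = new TreeMap<>();
        for (String name : props.stringPropertyNames()) {
            if (!name.equals("command") && !name.equals("job")) {
                params.put(name, props.getProperty(name));
            }
        }
        return new JobCommand(command, job, params);
    }
}
```

Parsing the example message above would yield the command `START_JOB`, the job `JobABC`, and the two parameters `param1` and `param2`.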
This allows us to do what you're talking about AND start the jobs via an external scheduler (Control-M). The 'nodeType=A' mentioned above allows us to query individual nodes (the nodes listen where 'nodeType=A or nodeType=*'). This allows commands to be 'targeted' to specific nodes if that is necessary.
Keep in mind, this is our own console, not the Spring Batch Admin console. So perhaps that doesn't help you, but building up a simple console doesn't take that long using the Spring Batch APIs (4 or 5 asps).
The batch nodes could also have started up simple services like HTTP REST services or 'whatever', but we use MQ heavily and I liked the idea of not having to preregister nodes (the framework code doesn't know/care that it's in an HTTP container, so it couldn't register the endpoint easily). With MQ, the channel is preconfigured and all apps just 'use it', so it seemed easier.
Good luck.
I am trying to do the same thing. But it seems that in order to launch a job directly from Spring Batch Admin, all the job resources have to be added to the Spring Batch web app. Maybe try RESTful job submission with Spring MVC.
#chau
One way to use Spring Batch Admin as is, but still "discover" and "invoke" remote jobs, is to provide your own implementations of org.springframework.batch.admin.service.JobService and org.springframework.batch.core.launch.JobOperator that can query and invoke jobs from a remote job registry/repository.
You can find a custom implementation of JobService and a JMX-enabled job administrator in https://github.com/regunathb/Trooper/tree/master/batch-core as org.trpr.platform.batch.impl.spring.admin.SimpleJobService and org.trpr.platform.batch.impl.spring.jmx.JobAdministrator
The Spring beans XML that uses these beans is here: https://github.com/regunathb/Trooper/blob/master/batch-core/src/main/resources/packaged/common-batch-config.xml

Resources