Is it suitable to use Spring Cloud Data Flow to orchestrate long-running external batch jobs inside always-running apps? - spring-boot

We have Spring Batch applications with triggers defined in each app.
Each batch application runs tens of similar jobs with different parameters and is able to do that within 1400 MiB per app.
We use Spring Batch Admin, which was deprecated years ago, to launch individual jobs and to get a brief overview of what is going on in them. The migration guide recommends replacing Spring Batch Admin with Spring Cloud Data Flow.
The Spring Cloud Data Flow docs talk about grabbing a jar from a Maven repo and running it with some parameters. I don't like the idea of waiting 20 seconds for the application to download, 2 minutes for it to launch, and all the security/certificates/firewall issues that come with it (how can I download a proprietary jar across intranets?).
I'd like to register existing applications in Spring Cloud Data Flow via IP/port, pass job definitions to the Spring Batch applications, and monitor executions (including the ability to stop a job). Is Spring Cloud Data Flow usable for that?

A few things to unpack here. Here's an attempt at it.
The Spring Cloud Data Flow docs talk about grabbing a jar from a Maven repo and running it with some parameters. I don't like the idea of waiting 20 seconds for the application to download, 2 minutes for it to launch, and all the security/certificates/firewall issues.
Yes, there's an app resolution process. However, once downloaded, the app is reused from the local Maven cache.
As for the 2-minute bootstrap window, that comes down to Spring Boot, the number of configuration objects, and, of course, your business logic; maybe in your case all of that adds up to 2 minutes.
how can I download a proprietary jar across intranets?
There's an option to resolve artifacts from a Maven repository (such as Artifactory) hosted behind the firewall, through proxies; we have users on this model for proprietary JARs.
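(For example, the SCDF server exposes Maven-resolution settings such as maven.remote-repositories.<name>.url, maven.proxy.host, and maven.proxy.port; the exact property names vary by release, so check the server configuration docs for the version you run.)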
Each batch application runs tens of similar jobs with different parameters and is able to do that within 1400 MiB per app.
You may want to consider the composed task feature. It not only provides the ability to launch child tasks as a directed acyclic graph (DAG), but also allows transitions based on the exit code at each node, so you can split and branch to launch further tasks. All of this, of course, is automatically recorded at each execution level for tracking and monitoring from the SCDF Dashboard.
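For instance, a hypothetical composed-task definition such as ingest && process 'FAILED' -> cleanup would run ingest, then process, and branch to cleanup only when process ends with the FAILED exit code (the task names here are made up for illustration).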
I'd like to register existing applications in Spring Cloud Data Flow via IP/port, pass job definitions to the Spring Batch applications, and monitor executions (including the ability to stop a job).
As long as the batch jobs are wrapped into Spring Cloud Task apps, yes, you'd be able to register them in SCDF and use them in the DSL, or drag and drop them onto the visual canvas, to create coherent data pipelines. We have a few "batch-job as task" samples here and here.
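For illustration, wrapping an existing batch job as a Spring Cloud Task app is mostly a matter of adding the spring-cloud-starter-task dependency and the @EnableTask annotation; a minimal sketch (the class name is made up, and your existing job beans stay as they are):

    import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.cloud.task.configuration.EnableTask;

    // @EnableTask records each run in the task repository, which is what lets
    // SCDF track and monitor the batch-job executions.
    @SpringBootApplication
    @EnableBatchProcessing
    @EnableTask
    public class BatchJobTaskApplication {

        public static void main(String[] args) {
            SpringApplication.run(BatchJobTaskApplication.class, args);
        }
    }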

Related

Spring Batch disable Spring Boot AutoConfiguration for specific jobs

I have multiple jobs in my Spring Batch application, but only a single job uses some specific Spring Boot auto-configuration features:
a job that uses the spring-data-jpa auto-configuration to configure a database for business transactions (not for Spring Batch management)
a job that does not use the database at all
I have packaged both jobs in the same unit because it makes sense from a business perspective. The jobs will work together, and the output of one job will be the input of the other.
Is it possible to disable the database-specific auto-configuration when I run the second job?
I just tried using profiles and disabled the auto-configuration for a specific profile. I am pretty happy with this solution, but I wonder if there are other solutions?
This is similar to trying to lazy-load beans specific to a given job: How to apply something like @Lazy to Spring Batch?. While the Spring profiles feature may fix your issue, I believe it is working around the root issue, which is packaging all jobs in a monolithic way.
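(For reference, that workaround typically amounts to a profile-specific properties file, e.g. setting spring.autoconfigure.exclude=org.springframework.boot.autoconfigure.jdbc.DataSourceAutoConfiguration in an application-nodb.properties, where the nodb profile name is just an example.)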
I would package each job separately, and this problem (and many others) disappears by design. There are several advantages to this approach:
Independent lifecycle management (bugs, features, etc.)
Flexible deployment
Separate logs
Separate configurations (as in the current issue)
Easier/Better scalability
And all the good reasons to make one thing do one thing and do it well.

What modules to use for a synchronization service in Java/Spring?

I'm planning to build a synchronization service in Java. The use case is that I'm fetching data from an Exchange service (via Exchange Web Services), normalizing the data a bit (processing it, probably), and then writing it to a backend via GraphQL. I already had a look around the Spring modules, but am not quite sure which ones to use. I found Spring Batch and Quartz.
The synchronization will have to trigger every X seconds, fetch information from Exchange, check what's already in the backend, and update what's needed.
Do you have any suggestions? I started implementing this whole thing in Node.js before, but as it has to run on both Windows Server and Docker/Linux, it has been a real pain to keep it running smoothly (mostly because bundling Node.js into an application for Windows is painful).
Difference between Spring Batch & Quartz:
Spring Batch and Quartz have different goals. Spring Batch provides functionality for processing large volumes of data, while Quartz provides functionality for scheduling tasks.
So Quartz can complement Spring Batch. A common combination is to use Quartz as a trigger for a Spring Batch job, using a cron expression.
Conclusion: basically, Spring Batch defines what should be done, and Quartz defines when it should be done.
Quartz is a scheduling framework, as in "execute something every hour" or "on every last Friday of the month".
Spring Batch is a framework that defines the "something" that will be executed.
You can define a job that consists of steps. Usually a step consists of an item reader, an optional item processor, and an item writer, but you can also define a custom step. You can also tell Spring Batch to commit every 10 items, and a lot of other stuff.
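For example, a chunk-oriented step with a commit interval of 10 might be declared like this (just a sketch; the reader, processor, and writer beans are assumed to be defined elsewhere):

    import org.springframework.batch.core.Step;
    import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
    import org.springframework.batch.item.ItemProcessor;
    import org.springframework.batch.item.ItemReader;
    import org.springframework.batch.item.ItemWriter;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    public class SyncStepConfiguration {

        // A chunk-oriented step: read items, optionally process them, write them
        // out, and commit a transaction after every 10 items.
        @Bean
        public Step syncStep(StepBuilderFactory stepBuilderFactory,
                             ItemReader<String> reader,
                             ItemProcessor<String, String> processor,
                             ItemWriter<String> writer) {
            return stepBuilderFactory.get("syncStep")
                    .<String, String>chunk(10) // the commit interval
                    .reader(reader)
                    .processor(processor)      // the processor is optional
                    .writer(writer)
                    .build();
        }
    }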
You can use Quartz to start Spring Batch jobs.
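For example, a Quartz job that only delegates to Spring Batch's JobLauncher could look roughly like this (a sketch; it assumes the Quartz job is instantiated by Spring so that autowiring works, which Spring Boot's Quartz support takes care of, and the syncJob bean name is made up):

    import org.quartz.JobExecutionContext;
    import org.quartz.JobExecutionException;
    import org.springframework.batch.core.Job;
    import org.springframework.batch.core.JobParameters;
    import org.springframework.batch.core.JobParametersBuilder;
    import org.springframework.batch.core.launch.JobLauncher;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.scheduling.quartz.QuartzJobBean;

    // A Quartz job whose only responsibility is to launch a Spring Batch job.
    // Quartz decides *when* to run; Spring Batch defines *what* runs.
    public class LaunchSyncJob extends QuartzJobBean {

        @Autowired
        private JobLauncher jobLauncher;

        @Autowired
        private Job syncJob;

        @Override
        protected void executeInternal(JobExecutionContext context) throws JobExecutionException {
            try {
                // A unique parameter per run, so Spring Batch creates a new JobInstance.
                JobParameters params = new JobParametersBuilder()
                        .addLong("run.time", System.currentTimeMillis())
                        .toJobParameters();
                jobLauncher.run(syncJob, params);
            } catch (Exception e) {
                throw new JobExecutionException(e);
            }
        }
    }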
Recommended for your use case: Quartz scheduling, since you want to trigger after a specific interval.
Reference: https://projects.spring.io/spring-batch/faq.html

spring-cloud-task: how to pass messages or flags between two apps

I have already built an ingestion job using Spring Batch which reads an XML file and ingests it into AEM, and it's working fine.
Now I am trying to convert this app into a Spring Cloud Task. I want to split it into 4 different parts, each an individual app. I need to connect them in a Spring Cloud Data Flow workflow and pass some data and flags, based on which the next step of the flow will execute.
Is this possible with Spring Cloud Task? If yes, how can I bind them? Please point me to some programming tutorial.
In the recent 1.2.0.RELEASE, we released a new feature called "composed tasks". With it, you can define a directed graph made up of several spring-cloud-task (SCT) applications.
Each step in your flow can be an independent SCT application, which you can develop, test, and run through CI/CD in isolation. Once you're ready to orchestrate them as a composed graph, you'd register them and use them in the specially designed composed-task DSL, or drag and drop them in the GUI.
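For instance, if you split the ingestion into four task apps, a composed-task definition along the lines of parse && transform && validate && ingest (the app names here are made up) would launch them in sequence, with exit-code transitions available wherever you need branching.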
Check out this screencast for more details.

Spring Integration Invoking Spring Batch

Just looking for some information on whether others have solved this pattern. I want to use Spring Integration and Spring Batch together. Both of these are Spring Boot applications, and ideally I'd like to keep them and their respective configuration separate, so they are each their own executable jar. I'm having problems executing them in their own process space: I believe I want, unless someone can convince me otherwise, each to run as its own Spring Boot app and initialize itself with its own profiles and properties. What I'm having trouble with, though, is invoking the job in my Spring Batch project from my Spring Integration project. At first I couldn't get the properties loaded from the batch project, so I realized I needed to pass spring.profiles.active as a job parameter, and that seemed to solve that. But there are other things in the Spring Boot batch application that aren't loading correctly, like the schema-platform.sql file, and the database isn't getting initialized, etc.
On the initial launch of the job I might want the response to go back to Spring Integration for some messaging on job status. There might be times when I want to run a job without Spring Integration kicking it off, but still take advantage of sending statuses back to the Spring Integration project, provided it's listening on a channel or something.
I've reviewed quite a few Spring samples and have yet to find my exact scenario; most have the two dependencies in the same project. So maybe I'm attempting something that's not possible, but I suspect I'm just missing a little something in the Spring configuration.
My questions/issues are:
I don't want the Spring Integration project to know anything about the Spring Batch configuration other than the job it's kicking off. I haven't found a good way to reference the Job bean without loading my entire batch configuration.
Should I keep these two projects separate, or would it be better to combine them, since I have two-way communication between them?
How should the job be launched from the integration project? We're using the spring-batch-integration project with JobLaunchRequest and JobLauncher. This seems to run it in the same process as the Spring Integration project, and I'm missing a lot of my Spring Boot Batch project's initialization.
Should I be using a CommandLineRunner instead, to force it into another process?
Is SpringApplication.run(BatchConfiguration.class) the answer?
Looking for some general project configuration setup to meet these requirements.
Spring Cloud Data Flow in combination with Spring Cloud Task does exactly what you're asking. It launches Spring Cloud Task applications (which can contain batch jobs) as new processes on the platform you choose. I'd encourage you to check out that project here: http://cloud.spring.io/spring-cloud-dataflow/
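In practice, the flow with SCDF is roughly: register your task app (app register --name my-batch --type task --uri ...), create a task definition (task create my-batch-task --definition my-batch), and launch it (task launch my-batch-task); the names here are made up and the exact shell syntax varies by release. Each launch then runs in its own process, with the execution status recorded in the task repository for monitoring.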

How to replace Spring Batch Admin with Spring Boot Admin

My question, put plainly, is this: Spring Batch Admin has a clear concept and practical functionality for managing jobs. In Spring Batch Admin you can launch or stop one job, stop all jobs, restart, abandon, see the status of each job, and check whether it succeeded or failed. If I use Spring Cloud Task deployed as a war and Spring Boot Admin deployed as a war on the same server, how can I manage my jobs?
To explain my context better: I have developed a few Spring Batch jobs over the last 6 months. All of them were designed to run periodically, in other words, to be scheduled. I hadn't planned to deploy them in a web server, but at the moment of publishing to production I was informed by my company that any solution must run inside our mainframe WebSphere.
That wasn't an issue at all, since I realized I could use Spring Batch Admin to start/stop all the Spring Batch jobs deployed in the same EAR. Unfortunately, WebSphere ND 8.5 doesn't play well with Spring Batch Admin (you may have heard someone say they managed to get Spring Batch Admin up and running on WebSphere 8.5 ND, but I got the final position from IBM that neither JSR-352 nor Spring Batch Admin is safe to use).
One week ago, I first came across Spring Boot Admin, thanks to a certain comment on my question about how to register org.springframework.integration.monitor.IntegrationMBeanExporter.
In that comment I read "...consider to get rid of batch-admin. This project is obsolete already in favor of cloud.spring.io/spring-cloud-task...", but Spring Boot Admin doesn't seem to me to really provide the same functionality Spring Batch Admin does. It seems to be a bit more generic, aimed at application instances rather than job executions.
To make my question even clearer, let's say I have this project deployed as a war https://github.com/spring-cloud/spring-cloud-task/blob/master/spring-cloud-task-samples/batch-job/src/main/java/io/spring/configuration/JobConfiguration.java and the two jobs are both scheduled to run every 5 minutes. Additionally, in that project I added the necessary configuration to make it visible to Spring Boot Admin (e.g. spring-boot-admin-starter-client). On the same server I have this Spring Boot Admin deployed as a war too: https://github.com/codecentric/spring-boot-admin/tree/master/spring-boot-admin-samples/spring-boot-admin-sample-war. Now, my question, the same as in the title but with a more concrete example, is: will I be able to stop one job and let the other keep running from Spring Boot Admin? Can I launch one and leave the other stopped?
If you, the reader, have an overview of Spring Batch Admin, you will probably quickly understand my intention. I read in the Spring Boot Admin manual
http://codecentric.github.io/spring-boot-admin/1.4.1/#jmx-bean-management
under "3.2. JMX-bean management": "...To interact with JMX-beans in the admin UI you have to include Jolokia in your application". Is there some trick via Jolokia to manage each job individually?
