JDBC to HDFS import using Spring Batch job

I am able to import data from MS SQL to HDFS using the jdbchdfs Spring Batch job. But if the container running the job fails, the job does not shift to another container. How do I make the job fault tolerant?
I am using the Spring XD 1.0.1 release.

You don't mention which version of Spring XD you're currently using so I can't verify the exact behavior. However, on a container failure with a batch job running in the current version, the job should be re-deployed to a new eligible container. That being said, it will not restart the job automatically. We are currently looking at options for how to allow a user to specify if they want it restarted (there are scenarios that fall into both camps so we need to allow a user to configure that).

Related

Spring Batch with unknown datasource

I have a working Spring Boot application which embeds a Spring Batch job. The job is not run on a schedule; instead, we kick it off with an endpoint. It is working as it should. The basics of the batch are:
Kick the endpoint to start the job
Reader reads from input file
Processor reads from an Oracle database using a JPA repository and a simple Spring datasource config
Writer writes to output file
However there are new requirements:
The schema of the repository database is from here on unknown at application startup. The tables are the same; it is just an unknown schema. This fact is out of our control, and you might think it is stupid, but there are reasons for it and it can't be changed. With the current functionality, this means that when we learn the new schema name we need to reconfigure the datasource and restart the application. This is a job we will run a number of times while migrating from one system to another, so it has a limited lifecycle and we just need a "quick fix" to be able to use it without rewriting the whole app. So what I would like to do is:
Send the schema name as a query param to the application, put it in the job parameters, and then get a new datasource when the processor reads from the repository. Would this be doable at all using Spring Batch? Any help appreciated!
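One way this might be doable (a minimal sketch, assuming a plain JDBC DataSource; the bean name, connection details, and the "schema" parameter key are illustrative, not from the question): pass the schema as a job parameter and build the DataSource in a step-scoped bean, so it is created fresh for each execution.

```java
import javax.sql.DataSource;

import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.datasource.DriverManagerDataSource;

@Configuration
public class DynamicSchemaConfig {

    // Step-scoped: the bean is created per step execution, so the schema
    // can differ on every run without restarting the application.
    @Bean
    @StepScope
    public DataSource repositoryDataSource(
            @Value("#{jobParameters['schema']}") String schema) {
        DriverManagerDataSource ds = new DriverManagerDataSource();
        ds.setDriverClassName("oracle.jdbc.OracleDriver");
        ds.setUrl("jdbc:oracle:thin:@//db-host:1521/SERVICE"); // placeholder URL
        ds.setUsername("app_user");   // placeholder credentials
        ds.setPassword("secret");
        ds.setSchema(schema);         // default schema for new connections
        return ds;
    }
}
```

The endpoint that kicks the job would put the schema into the JobParameters, e.g. new JobParametersBuilder().addString("schema", schemaParam).toJobParameters(). Note that if the processor goes through a JPA repository bound to a fixed EntityManagerFactory, that factory would need the same step-scoped treatment, which is more involved.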

Spring task:scheduled or @Scheduled: restrict a job from running in multiple instances

I have one @Scheduled job which runs on multiple servers in a clustered environment. I want to restrict the job to run on only one server; the other servers should not run the same job once any server has started it.
I have seen that Spring Batch has a lock mechanism using a database table, but I am looking for a solution using only spring task:scheduler.
I had the same problem. The solution I implemented was a lock mechanism with Hazelcast; to make it easy to use, I also added a custom annotation and a bit of Spring AOP. With this trick I was able to enforce a single schedule across the cluster with a single annotation, as sketched below.
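The answerer's actual code isn't shown, so here is a minimal sketch of the idea without the annotation/AOP sugar, using Hazelcast's CP-subsystem lock (Hazelcast 4.x); the lock name and cron expression are illustrative:

```java
import java.util.concurrent.locks.Lock;

import com.hazelcast.core.HazelcastInstance;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class ClusterSingletonJob {

    private final HazelcastInstance hazelcast;

    public ClusterSingletonJob(HazelcastInstance hazelcast) {
        this.hazelcast = hazelcast;
    }

    @Scheduled(cron = "0 0 * * * *") // fires on every node
    public void run() {
        // Only the node that wins the cluster-wide lock does the work;
        // the others find it taken and skip this round.
        Lock lock = hazelcast.getCPSubsystem().getLock("nightly-sync");
        if (lock.tryLock()) {
            try {
                doWork();
            } finally {
                lock.unlock();
            }
        }
    }

    private void doWork() {
        // ... the actual job ...
    }
}
```

Note that plain tryLock/unlock leaves a small window under clock skew (a slow node may fire after a fast node already released the lock); the AOP wrapper the answerer describes would be the place to guard against that.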
Spring Batch has a nice feature whereby it will not run a job with the same job parameters twice.
You can use this so that when the same Spring Batch job kicks off on another server, it does not run there.
Usually people pass a timestamp as a parameter precisely to bypass this logic; you can change that, as sketched below.
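A minimal sketch of that idea, assuming every node derives identical JobParameters for the same scheduling window (the "runDate" key and the wiring are illustrative):

```java
import java.time.LocalDate;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecutionException;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;

public class DailyJobTrigger {

    private final JobLauncher jobLauncher;
    private final Job job;

    public DailyJobTrigger(JobLauncher jobLauncher, Job job) {
        this.jobLauncher = jobLauncher;
        this.job = job;
    }

    public void runOncePerDay() {
        // Every node computes the same parameters for the same day, so the
        // JobRepository treats all attempts as one JobInstance.
        JobParameters params = new JobParametersBuilder()
                .addString("runDate", LocalDate.now().toString())
                .toJobParameters();
        try {
            jobLauncher.run(job, params);
        } catch (JobExecutionException e) {
            // JobInstanceAlreadyCompleteException / AlreadyRunningException:
            // another node already owns today's instance, so just skip.
        }
    }
}
```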

Run a MapReduce jar in Spring Cloud Data Flow

I need to run a MapReduce Spring Boot application in Spring Cloud Data Flow. Usually, applications registered in SCDF are executed with a "java -jar jar-name" command. But my program is a MapReduce job and has to be executed with "hadoop jar jar-name". How do I achieve this? What would be the better approach for running a MapReduce application in SCDF? Is it possible to register MapReduce apps directly?
I'm using the local Data Flow server to register the application.
In SCDF, the format of the command used to run a JAR file is managed by a deployer; for example, there is a local deployer, a Cloud Foundry deployer, etc. There was a Hadoop/YARN deployer, but I believe it was discontinued.
Given that the deployer itself is an SPI, you can easily implement your own, or even fork/extend the local deployer and modify only what's needed; the sketch below shows the kind of change involved.
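As an illustration of the fork/extend route, the piece you would change is where the deployer builds the launch command: shell out to "hadoop jar" instead of "java -jar". This is a standalone, hypothetical sketch; the class and method names are not taken from the actual spring-cloud-deployer-local code.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class HadoopJarLauncher {

    // Builds and starts the process "hadoop jar <app.jar> <args...>",
    // mirroring what the local deployer does with "java -jar".
    public Process launch(File appJar, List<String> appArgs) throws IOException {
        List<String> command = new ArrayList<>();
        command.add("hadoop");                  // instead of "java"
        command.add("jar");                     // instead of "-jar"
        command.add(appJar.getAbsolutePath());
        command.addAll(appArgs);                // pass-through app arguments
        return new ProcessBuilder(command)
                .redirectErrorStream(true)      // merge stderr into stdout
                .start();
    }
}
```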

When a Spring Boot app is deployed on multiple nodes, how do you handle cron jobs?

I use Spring's task scheduling to handle a simple sync job. But when I deploy to multiple nodes, how do I make sure the cron job runs only once?
You might suggest:
1. Use a distributed lock to control a flag before the cron job runs.
2. Integrate Quartz's clustering function.
But I was hoping Spring's @EnableScheduling task support could take a flag argument, so we could set a flag when launching the app.
We are using https://github.com/lukas-krecan/ShedLock with success, the ZooKeeper provider in particular; a configuration sketch follows.
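A minimal ShedLock sketch with the ZooKeeper (Curator) provider, using ShedLock's Spring annotation integration as of the 4.x line; the lock name, cron expression, and timeouts are illustrative:

```java
import net.javacrumbs.shedlock.core.LockProvider;
import net.javacrumbs.shedlock.provider.zookeeper.curator.ZookeeperCuratorLockProvider;
import net.javacrumbs.shedlock.spring.annotation.EnableSchedulerLock;
import net.javacrumbs.shedlock.spring.annotation.SchedulerLock;
import org.apache.curator.framework.CuratorFramework;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Configuration
@EnableScheduling
@EnableSchedulerLock(defaultLockAtMostFor = "PT10M")
public class SchedulingConfig {

    @Bean
    public LockProvider lockProvider(CuratorFramework client) {
        // ShedLock stores its lock nodes in ZooKeeper via Curator.
        return new ZookeeperCuratorLockProvider(client);
    }
}

@Component
class SyncJob {

    // The schedule fires on every node, but only the node that acquires
    // the lock executes; lockAtLeastFor guards against clock skew.
    @Scheduled(cron = "0 */5 * * * *")
    @SchedulerLock(name = "syncJob", lockAtMostFor = "PT4M", lockAtLeastFor = "PT30S")
    public void run() {
        // ... the sync work ...
    }
}
```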
Spring Boot, in a nutshell, doesn't provide any kind of coordination between multiple instances of the same microservice.
All the work of such coordination is done by the third parties that Spring Boot integrates with.
One example of this is indeed the @Scheduled annotation.
Another is DB migration support via Flyway.
When many nodes start and the migration has to be done, Flyway is responsible for locking the migration table by itself; Spring Boot has nothing to do with it.
So, bottom line: there is no such built-in support, and all the options you've raised can work.

How to replace Spring Batch Admin with Spring Boot Admin

My straight question is: Spring Batch Admin has a clear concept and practical functionality for managing jobs. In Spring Batch Admin you can launch or stop one job, stop all jobs, restart, abandon, see the status of each job, and check whether it succeeded or failed. If I use Spring Cloud Task deployed as a WAR and Spring Boot Admin deployed as a WAR on the same server, how can I manage my jobs?
To explain my context better: I have developed a few Spring Batch jobs over the last 6 months. All of them were designed to run periodically, in other words to be scheduled. I hadn't planned to deploy them to a web server, but at the moment of publishing to production I was informed by my company that I must run any solution inside our mainframe WebSphere.
That wasn't an issue at all, since I realized I could use Spring Batch Admin to start/stop all the Spring Batch jobs deployed in the same EAR. Unfortunately, WebSphere ND 8.5 doesn't work with Spring Batch Admin (you may have heard of someone getting Spring Batch Admin up and running on WebSphere 8.5 ND, but I got the final position from IBM that neither JSR-352 nor Spring Batch Admin is safe to use there).
One week ago, I first got in touch with Spring Boot Admin thanks to a comment on my question about how to register org.springframework.integration.monitor.IntegrationMBeanExporter.
In that comment I read "...consider to get rid of batch-admin. This project is obsolete already in favor of cloud.spring.io/spring-cloud-task...", but Spring Boot Admin doesn't seem to me to really provide the same functionality Spring Batch Admin does. It seems a bit more generic, aimed at application instances rather than job executions.
To make my question even clearer, let's say I have this project deployed as a WAR, https://github.com/spring-cloud/spring-cloud-task/blob/master/spring-cloud-task-samples/batch-job/src/main/java/io/spring/configuration/JobConfiguration.java, with both jobs scheduled to run every 5 minutes. Additionally, in that project I added the necessary configuration to make it visible to Spring Boot Admin (e.g. spring-boot-admin-starter-client). On the same server I have this Spring Boot Admin deployed as a WAR too: https://github.com/codecentric/spring-boot-admin/tree/master/spring-boot-admin-samples/spring-boot-admin-sample-war. Now, my question, the same as in the title but more concrete: will I be able to stop one job and let the other keep running from Spring Boot Admin? Can I launch one and leave the other stopped?
If you, reader, have an overview of Spring Batch Admin, you will probably understand my intention quickly. I read in the Spring Boot Admin manual, http://codecentric.github.io/spring-boot-admin/1.4.1/#jmx-bean-management, section 3.2 "JMX-bean management": "...To interact with JMX-beans in the admin UI you have to include Jolokia in your application". Is there some trick via Jolokia to manage each job individually?
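One direction that might work (a hedged sketch, not a confirmed answer): expose Spring Batch's JobOperator as a JMX MBean, so that Spring Boot Admin's JMX view, via Jolokia, can invoke its start/stop/restart operations per job. The object name and bean wiring below are assumptions:

```java
import java.util.HashMap;
import java.util.Map;

import org.springframework.batch.core.launch.JobOperator;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jmx.export.MBeanExporter;

@Configuration
public class BatchJmxConfig {

    // Registers the JobOperator under a JMX object name; its public
    // methods (start, stop, restart, getJobNames, ...) become JMX
    // operations that Jolokia then exposes over HTTP.
    @Bean
    public MBeanExporter batchMBeanExporter(JobOperator jobOperator) {
        Map<String, Object> beans = new HashMap<>();
        beans.put("spring.batch:service=JobOperator", jobOperator);
        MBeanExporter exporter = new MBeanExporter();
        exporter.setBeans(beans);
        return exporter;
    }
}
```

Whether Spring Boot Admin's UI then gives you per-job control comparable to Spring Batch Admin would still need to be verified against your version.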
