Spring Batch with unknown datasource - spring-boot

I have a working Spring Boot application which embeds a Spring Batch job. The job is not run on a schedule; instead, we kick it off with an endpoint. It is working as it should. The basics of the batch are:
Kick the endpoint to start the job
Reader reads from input file
Processor reads from an Oracle database using a JPA repository and a simple Spring datasource config
Writer writes to output file
However, there are new requirements:
The schema of the repository database is, from here on, unknown at application startup. The tables are the same; it is just an unknown schema. This is out of our control, and you might think it is stupid, but there are reasons for it and it can't be changed. With the current functionality this means we need to reconfigure the datasource once we know the new schema name and restart the application. This is a job we will run a number of times while migrating from one system to another, so it has a limited lifecycle and we just need a "quick fix" to be able to use it without rewriting the whole app. So what I would like to do is:
Send the schema name as a query param to the application, put it in the job parameters, and then get a new datasource when the processor reads from the repository. Would this be doable at all using Spring Batch? Any help appreciated!
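One way this could be wired (a sketch under assumptions, not a confirmed solution: the connection details and bean names below are hypothetical, and it reads through a JdbcTemplate rather than the existing JPA repository, since rebuilding the EntityManagerFactory per job is considerably heavier) is to pass the schema as a job parameter and build a step-scoped DataSource from it:

import javax.sql.DataSource;

import com.zaxxer.hikari.HikariDataSource;
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.JdbcTemplate;

@Configuration
public class DynamicSchemaConfig {

    // Created per step execution, after the job parameters (including the schema) are known.
    @Bean
    @StepScope
    public DataSource dynamicSchemaDataSource(@Value("#{jobParameters['schema']}") String schema) {
        HikariDataSource ds = new HikariDataSource();
        ds.setJdbcUrl("jdbc:oracle:thin:@//dbhost:1521/SERVICE"); // hypothetical connection details
        ds.setUsername("app_user");
        ds.setPassword("app_password");
        ds.setMaximumPoolSize(2);
        // Every pooled connection switches to the schema supplied at launch time.
        // The schema value should be validated against a whitelist before concatenation.
        ds.setConnectionInitSql("ALTER SESSION SET CURRENT_SCHEMA = " + schema);
        return ds;
    }

    // The processor can query through this template instead of the startup-configured repository.
    @Bean
    @StepScope
    public JdbcTemplate dynamicSchemaJdbcTemplate(DataSource dynamicSchemaDataSource) {
        return new JdbcTemplate(dynamicSchemaDataSource);
    }
}

The job would then be launched with something like new JobParametersBuilder().addString("schema", schemaFromQueryParam).addLong("run.id", System.currentTimeMillis()).toJobParameters(), so each run carries its own schema.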

Related

Programmatically recreate H2 database schema in SpringBoot application (not while unit testing)?

I have a Spring Boot application with an in-memory H2 database and Spring Data JPA.
I need to configure a @Scheduled job that drops and recreates the schema and loads it with fresh data from a file.
How can I programmatically recreate the schema in my application?
You can use a database version control tool like e.g. Liquibase to create and maintain the database schema definition as well as the initial data. Then you will be able to easily invoke a database migration, including a drop of the whole schema, during application runtime. It already has some integration with Spring Boot.
Keep in mind that you will have to lock database access in order to execute the migration - DDL is not transactional, so the database will be of no use during the migration anyway, and your app can yield many errors during that time.
If locking is not an option, you should be able to create another instance, or at least a separate schema in the running instance, run the migration against it and, if everything succeeds, "switch" the persistence context to the brand new schema (and probably remove the old one).
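If a migration tool feels like overkill for an in-memory H2 database, a plain drop-and-reload inside the scheduled method is also possible. A minimal sketch, assuming the default auto-configured DataSource and a schema.sql/data.sql pair on the classpath (the file names and cron expression are assumptions):

import javax.sql.DataSource;

import org.springframework.core.io.ClassPathResource;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.datasource.init.ResourceDatabasePopulator;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class SchemaRefreshJob {

    private final DataSource dataSource;
    private final JdbcTemplate jdbcTemplate;

    public SchemaRefreshJob(DataSource dataSource) {
        this.dataSource = dataSource;
        this.jdbcTemplate = new JdbcTemplate(dataSource);
    }

    // Requires @EnableScheduling on a configuration class.
    @Scheduled(cron = "0 0 * * * *")
    public void recreateSchema() {
        // H2-specific statement: wipes all tables, views and sequences in the database.
        jdbcTemplate.execute("DROP ALL OBJECTS");

        // Re-run the DDL and load the fresh data file.
        ResourceDatabasePopulator populator = new ResourceDatabasePopulator(
                new ClassPathResource("schema.sql"),
                new ClassPathResource("data.sql"));
        populator.execute(dataSource);
    }
}

The caveat from the answer above still applies: while this runs, other requests hitting the database will fail, so it only fits if a short outage per refresh is acceptable.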

Populate database using spring / hibernate / flyway / postgresql

I'm trying to populate my database with around 150 different values (one for each row).
So far, I've found two different ways to implement the inserts, but neither of them seems to be the best way to do it.
Flyway + Postgres: One of them is to create a migration file and make use of the COPY command from Postgres, but to do so I need to give superuser permissions to the user, and that doesn't seem to be a good choice.
Spring Boot: place a data.sql file in the classpath with a lot of inserts. If I'm not wrong, I would have to write 150 insert into... statements.
In previous projects I have used Liquibase, and it has a loadData command which is very convenient for doing what it says it does. You just give it the file and the table name and that's it. You end up with your CSV file's values in your table rows.
Is there a similar way to do that in Flyway? What is the best way to populate the database?
Actually there is a way; you can find more info on the official documentation page.
You need to add some spring boot properties too:
spring.flyway.enabled=true
spring.flyway.locations=classpath:/db/migration
spring.flyway.schemas=public
Properties details here
In my case, I use repeatable scripts for my needs, but take care with the prefixes.
Flyway is a direct competitor of Liquibase, so if you need to track the status of migrations, manage distributed migrations (many instances of the same service start simultaneously, and only one instance should actually execute a migration), check upon startup which migrations should be applied and execute only the relevant ones, and all the other benefits you would expect from a "migration management system", then you should use Flyway rather than managing SQL scripts directly.
Spring Boot has integrations with both Flyway and Liquibase, so you can place your migrations in the "resources" folder, define a couple of properties, and Spring Boot will run Flyway automatically.
For example, here you can find a tutorial on Flyway integration with Spring Boot.
Since Flyway's migrations are SQL files, you can place whatever you want in them (even PL/SQL, I believe), and it will even manage a transaction per migration, guaranteeing the migration's atomicity (all or nothing, no partial migration).
So the straightforward approach would be creating a SQL file with 150 inserts and running it via Flyway in Spring, or even via Maven, depending on your actual setup.
If you want more fine-grained control and SQL is not flexible enough, it's possible to implement the migration in Java code. See the official Flyway documentation.
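A minimal sketch of such a Java-based migration, loading rows from a CSV on the classpath (the class, file and table names here are assumptions, not something from the thread):

package db.migration; // must live under a configured Flyway location, e.g. classpath:db/migration

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.sql.PreparedStatement;

import org.flywaydb.core.api.migration.BaseJavaMigration;
import org.flywaydb.core.api.migration.Context;

public class V2__Load_reference_data extends BaseJavaMigration {

    @Override
    public void migrate(Context context) throws Exception {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                     getClass().getResourceAsStream("/data/values.csv"), StandardCharsets.UTF_8));
             PreparedStatement insert = context.getConnection()
                     .prepareStatement("INSERT INTO reference_values (code, label) VALUES (?, ?)")) {

            String line;
            while ((line = reader.readLine()) != null) {
                String[] columns = line.split(",");
                insert.setString(1, columns[0]);
                insert.setString(2, columns[1]);
                insert.addBatch();
            }
            insert.executeBatch();
        }
    }
}

Flyway versions the class by its name (V2__...) just like a SQL migration, so it runs once and is recorded in the schema history table.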

Spring task:scheduled or @Scheduled to restrict a job from running in multiple instances

I have one @Scheduled job which runs on multiple servers in a clustered environment. However, I want to restrict the job to run on only one server; the other servers should not run the same job once any server has started it.
I have seen that Spring Batch has a lock mechanism using a database table, but I'm looking for a solution using only spring task:scheduler.
I had the same problem, and the solution I implemented was a lock mechanism with Hazelcast. To make it easy to use I also added a dedicated annotation and a bit of Spring AOP. With this trick I was able to enforce a single schedule over the cluster with a single annotation.
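Stripped of the annotation and AOP sugar, the core idea looks roughly like this (a sketch assuming Hazelcast 4+/5 with the CP subsystem; the bean, lock and schedule names are hypothetical):

import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.cp.lock.FencedLock;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class ClusterSingletonJob {

    private final HazelcastInstance hazelcast;

    public ClusterSingletonJob(HazelcastInstance hazelcast) {
        this.hazelcast = hazelcast;
    }

    @Scheduled(fixedDelayString = "${jobs.cleanup.delay:60000}") // hypothetical schedule
    public void runOnOneNodeOnly() {
        // Every node schedules the method, but only the node that wins the lock does the work.
        FencedLock lock = hazelcast.getCPSubsystem().getLock("cleanup-job-lock");
        if (!lock.tryLock()) {
            return; // another node is already running this job
        }
        try {
            // ... the actual job logic ...
        } finally {
            lock.unlock();
        }
    }
}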
Spring Batch has this nice functionality in that it will not run a job with the same job parameters twice.
You can use this feature so that when the Spring Batch job kicks off on another server, it does not run again.
Usually people pass a timestamp as a parameter precisely to bypass this logic; you can change that.
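Concretely, that means launching with identifying parameters derived from the scheduling window rather than the current timestamp, assuming all nodes point at the same job repository database. A sketch (the job and parameter names are made up):

import java.time.LocalDate;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecutionException;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class DailyJobTrigger {

    private final JobLauncher jobLauncher;
    private final Job dailyJob;

    public DailyJobTrigger(JobLauncher jobLauncher, Job dailyJob) {
        this.jobLauncher = jobLauncher;
        this.dailyJob = dailyJob;
    }

    @Scheduled(cron = "0 0 1 * * *")
    public void launch() throws Exception {
        // The identifying parameter is the business date, identical on every node,
        // so the shared job repository rejects a second run of the same job instance.
        JobParameters params = new JobParametersBuilder()
                .addString("businessDate", LocalDate.now().toString())
                .toJobParameters();
        try {
            jobLauncher.run(dailyJob, params);
        } catch (JobExecutionException alreadyHandledElsewhere) {
            // Already running or already completed on another node; nothing to do here.
        }
    }
}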

Disable autowiring for some jobs

I have one job in my Spring Batch application that doesn't use one of the data sources (an Oracle database). However, because I'm using autowiring and auto-configuration in the Spring configuration files, that job also opens a connection to Oracle when it starts.
I reduced the pool size to 1 because the data source will not be used. But I really need to remove that connection as well, meaning that I don't want to have even one inactive, unused connection.
Any idea how to solve that in Spring Batch?
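One approach that could work here (an assumption, not an answer from the thread) is to declare the Oracle DataSource lazily, so the connection pool is only created when something actually injects and uses it:

import javax.sql.DataSource;

import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.boot.jdbc.DataSourceBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Lazy;

@Configuration
public class OracleDataSourceConfig {

    // @Lazy defers pool creation until the bean is first used, so jobs that never
    // touch it never open a connection. Injection points must also be marked
    // @Lazy (or look the bean up on demand) for this to take effect.
    @Bean
    @Lazy
    @ConfigurationProperties("app.oracle.datasource") // hypothetical property prefix
    public DataSource oracleDataSource() {
        return DataSourceBuilder.create().build();
    }
}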

Spring Integration Invoking Spring Batch

Just looking for some information on whether others have solved this pattern. I want to use Spring Integration and Spring Batch together. Both of these are Spring Boot applications, and ideally I'd like to keep them and their respective configuration separate, so they are each their own executable jar. I'm having problems executing them in their own process space, and I believe I want, unless someone can convince me otherwise, each to run as its own Spring Boot app and initialize itself with its own profiles and properties. What I'm having trouble with, though, is the invocation of the job in my Spring Batch project from my Spring Integration project. At first I couldn't get the properties loaded from the batch project, so I realized I needed to pass spring.profiles.active as a job parameter, and that seemed to solve that. But there are other things in the Spring Boot Batch application that aren't loading correctly, like the schema-platform.sql file, and the database isn't getting initialized, etc.
On this initial launch of the job I might want the response to go back to Spring Integration for some messaging on job status. There might be times when I want to run a job without Spring Integration kicking it off, but still take advantage of sending statuses back to the Spring Integration project, provided it's listening on a channel or something.
I've reviewed quite a few Spring samples and have yet to find my exact scenario; most have the two dependencies in the same project, so maybe I'm doing something that's not possible, but I'm sure I'm just missing a little something in the Spring configuration.
My questions/issues are:
I don't want the Spring Integration project to know anything about the Spring Batch configuration other than the job it's kicking off. I have found a good way to do that by referencing the Job bean without loading my entire batch configuration.
Should I keep these two projects separate, or would it be better to combine them since I have two-way communication between both?
How should the job be launched from the integration project? We're using the spring-batch-integration project with JobLaunchRequest and JobLauncher. This seems to run it in the same process as the Spring Integration project, and I'm missing a lot of my Spring Boot Batch project's initialization.
Should I be using a CommandLineRunner instead to force it into another process?
Is SpringApplication.run(BatchConfiguration.class) the answer?
I'm looking for some general project configuration setup to meet these requirements.
Spring Cloud Data Flow in combination with Spring Cloud Task does exactly what you're asking. It launches Spring Cloud Task applications (which can contain batch jobs) as new processes on the platform you choose. I'd encourage you to check out that project here: http://cloud.spring.io/spring-cloud-dataflow/
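For reference, the in-process spring-batch-integration wiring the question describes typically looks something like this (a sketch assuming the Spring Integration 6 Java DSL; the channel, job and parameter names are hypothetical):

import java.io.File;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.integration.launch.JobLaunchRequest;
import org.springframework.batch.integration.launch.JobLaunchingGateway;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.dsl.IntegrationFlow;

@Configuration
public class JobLaunchingFlowConfig {

    @Bean
    public IntegrationFlow fileToJobFlow(Job importJob, JobLauncher jobLauncher) {
        return IntegrationFlow.from("incomingFiles")                 // hypothetical input channel
                // Turn the incoming payload into a launch request with its own parameters.
                .transform(File.class, file -> new JobLaunchRequest(importJob,
                        new JobParametersBuilder()
                                .addString("input.file", file.getAbsolutePath())
                                .addLong("launch.time", System.currentTimeMillis())
                                .toJobParameters()))
                .handle(new JobLaunchingGateway(jobLauncher))        // reply payload is the JobExecution
                .channel("jobStatusReplies")                         // hypothetical channel for status messages
                .get();
    }
}

Note that this still runs the job inside the integration application's process, which is exactly the limitation described in the question; Spring Cloud Data Flow with Spring Cloud Task, as suggested above, is the route to launching it as a separate process.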
