We have Spring Cloud Data Flow running in Kubernetes to orchestrate Spring Batch jobs. For each new incoming file, Spring Cloud Data Flow spins up a new Spring Batch task.
Spring Batch accesses the database through a connection pool that holds (by default) 10 connections. That limits the number of jobs we can run at the same time, which goes against scalability principles. The only solutions we've found so far are:
Reduce the Spring Batch connection pool: we cannot reduce it too much since we apply multithreading.
Increase the max number of connections in the database: it does not scale.
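To make the constraint concrete, here is a back-of-the-envelope sketch of how the per-task pool size caps concurrency. It assumes every running task keeps its full pool open. The class name, the method name, and the figure of 100 database connections are hypothetical and chosen only for illustration; the pool size of 10 is the default mentioned above.

```java
// Hypothetical helper, not part of Spring Batch or Spring Cloud Data Flow.
final class PoolCapacity {

    // Upper bound on tasks that can run at once when each task holds its own full pool.
    static int maxConcurrentTasks(int dbMaxConnections, int poolSizePerTask) {
        if (dbMaxConnections < 0 || poolSizePerTask <= 0) {
            throw new IllegalArgumentException("connection counts must be positive");
        }
        return dbMaxConnections / poolSizePerTask;
    }

    public static void main(String[] args) {
        // Assumed figure: a database that accepts 100 connections in total.
        System.out.println(maxConcurrentTasks(100, 10)); // prints 10
        System.out.println(maxConcurrentTasks(100, 4));  // prints 25
    }
}
```

Under these assumptions, shrinking the pool only raises the ceiling linearly, which is why neither of the two options above really scales.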
We were wondering whether there is any way of delegating the interaction with the Spring Batch database tables to Spring Cloud Data Flow through an API.
Thanks.
Related
I was reviewing code that seems to lose its connection to the DB intermittently, approximately once a month. It leaves no logs on the DB side, but the server logs Connection closed by peer.
It's a Spring Boot app (2.1.9) that uses Spring Integration to read from queues (JMS).
In pom.xml, Hikari is excluded from spring-boot-starter-jdbc and the Tomcat connection pool is used instead.
The app is designed so that it has an in-queue to receive traffic, then a series of internal queues to split the workload, and finally an out-queue to the destination.
The app creates 4 datasources, all pointing to the same Oracle database.
The first datasource is used for all CRUD operations in the flow between the in-queue and the first internal queue. The second datasource is used for all CRUD operations in the flow between the first and second internal queues. The third datasource is used between the second internal queue and the destination queue. The fourth datasource is not used anywhere, which suggests some flow was commented out but its datasource was never removed.
Each datasource has different min and max connection settings, but the rest of the properties are the same.
I wanted to understand:
Is it good practice to create a separate datasource for each Spring Integration flow when they all point to the same database?
Can leaving a datasource initialized without using it anywhere cause any problems?
Is there any obvious reason why HikariCP shouldn't be chosen for production, when many metrics online show Hikari to be superior to Tomcat and Spring Boot 2 chose it as the default for a reason?
I have seen a few projects where developers use RxJava with Tomcat and MySQL on Spring Boot.
As far as I know:
The main advantage of reactive streams is that a single thread can serve multiple requests, so the database connection should also be non-blocking.
Tomcat creates a thread per request.
Spring Data JPA is blocking.
I know that there are libraries for non-blocking access to relational databases (like R2DBC).
So I am specifically confused about the benefits of combining Tomcat and RxJava.
I would like to know the benefits of RxJava in the following scenarios:
REST API on Tomcat with Spring Data JPA (MySQL).
REST API on Tomcat with R2DBC (MySQL).
Thanks.
Benefits of Spring MVC and JPA (blocking): linear code that is easy to write and debug. Downside: slow clients may slow your app down.
Reactive Spring:
Small pool of threads handling many more requests - less memory consumed.
Downside: Takes time to start thinking 'reactively'.
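To illustrate the "small pool of threads" point, here is a minimal sketch using only the JDK's CompletableFuture, not RxJava, Reactor, or WebFlux. The class name, the two-thread pool, and the simulated delay are assumptions made for the example; a timer stands in for a non-blocking I/O call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; a timer simulates a non-blocking backend call.
final class SmallPoolDemo {

    // The result arrives after delayMillis, but no thread is parked while waiting.
    static CompletableFuture<String> fetch(ScheduledExecutorService scheduler, int id, long delayMillis) {
        CompletableFuture<String> result = new CompletableFuture<>();
        scheduler.schedule(() -> { result.complete("response-" + id); }, delayMillis, TimeUnit.MILLISECONDS);
        return result;
    }

    // Serves requestCount simulated requests on a pool of only two threads.
    static List<String> serve(int requestCount, long delayMillis) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);
        try {
            List<CompletableFuture<String>> pending = new ArrayList<>();
            for (int i = 0; i < requestCount; i++) {
                pending.add(fetch(scheduler, i, delayMillis));
            }
            List<String> responses = new ArrayList<>();
            for (CompletableFuture<String> future : pending) {
                // join() blocks only the calling thread, to collect results for the demo.
                responses.add(future.join());
            }
            return responses;
        } finally {
            scheduler.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(serve(100, 50).size()); // prints 100
    }
}
```

All 100 simulated calls are in flight at once on two threads. With a blocking driver such as JDBC, each in-flight call would instead hold a thread for its whole duration.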
More:
https://www.baeldung.com/spring-mvc-async-vs-webflux
Also:
https://dzone.com/articles/micronaut-mastery-using-reactor-mono-and-flux
If your REST API doesn't always go to the database, you could benefit from that approach.
I have multiple Spring Boot based microservices that connect to a DB2 database (Master DB). We want to have a replica of the Master DB, called the Slave DB. Every month we have maintenance on the Master DB for 5-10 hours. During this time we want all our apps to automatically connect to the Slave DB, and afterwards they should switch back to the Master without manual intervention.
Is this possible to achieve in Spring Boot? I thought of using Spring Cloud Hystrix, but is it the correct architectural pattern? Is there a better approach?
It's possible to do this at the infrastructure level, so your apps do not need to know that a failover happened.
If you want to solve this on the application side, you can use Spring Cloud Circuit Breaker (Hystrix is deprecated, but Spring Cloud Circuit Breaker can be used with Resilience4J).
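The application-side idea can be sketched as a tiny hand-rolled breaker in plain Java. This is not the Resilience4J or Spring Cloud Circuit Breaker API. The class name, the threshold, the retry window, and the injected clock are all hypothetical, and a real setup would wrap the actual master and replica calls.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical hand-rolled breaker; NOT a Resilience4J or Spring Cloud class.
final class FailoverCall<T> {
    private final Supplier<T> master;
    private final Supplier<T> replica;
    private final int failureThreshold;
    private final long retryAfterMillis;
    private final LongSupplier clock;
    private int consecutiveFailures;
    private long openedAtMillis;

    FailoverCall(Supplier<T> master, Supplier<T> replica,
                 int failureThreshold, long retryAfterMillis, LongSupplier clock) {
        this.master = master;
        this.replica = replica;
        this.failureThreshold = failureThreshold;
        this.retryAfterMillis = retryAfterMillis;
        this.clock = clock;
    }

    synchronized T call() {
        // While the breaker is open, skip the master entirely and use the replica.
        boolean open = consecutiveFailures >= failureThreshold
                && clock.getAsLong() - openedAtMillis < retryAfterMillis;
        if (open) {
            return replica.get();
        }
        try {
            T value = master.get();
            consecutiveFailures = 0; // master is healthy again: close the breaker
            return value;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                openedAtMillis = clock.getAsLong(); // (re)open and restart the retry window
            }
            return replica.get();
        }
    }
}
```

Once the retry window has elapsed, the next call probes the master again, so the switch back happens without manual intervention. Whether writes made against the replica during maintenance reach the master is a separate question that this sketch does not address.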
I would like to build a multi-tenant Spring Boot application. I prefer persisting the data for every tenant in a separate database.
This approach implies the usage of a data source (and a connection pool) for each tenant/database.
In my scenario I have about 1000 tenants. Does anyone have experience with having 1000 connection pools in a Spring Boot application?
What do you recommend for such a scenario?
Thanks!
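For reference, the one-pool-per-tenant approach described above can be sketched as a registry that creates a tenant's pool on first use. This is only an illustration in plain Java: the class name is hypothetical, the type parameter P stands in for a pooled DataSource, and the lazy creation is an assumption of the sketch, not something the question prescribes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical registry; P stands in for a pooled DataSource.
final class TenantPoolRegistry<P> {
    private final Map<String, P> pools = new ConcurrentHashMap<>();
    private final Function<String, P> poolFactory;

    TenantPoolRegistry(Function<String, P> poolFactory) {
        this.poolFactory = poolFactory;
    }

    // Creates the tenant's pool on first use and reuses it afterwards.
    P poolFor(String tenantId) {
        return pools.computeIfAbsent(tenantId, poolFactory);
    }

    // Number of pools currently held, i.e. tenants seen so far.
    int openPools() {
        return pools.size();
    }
}
```

With this shape, only tenants that have actually been served hold a pool, so the total number of open connections depends on active tenants and the per-pool size rather than on the full list of 1000.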
We are planning to retire our existing legacy Java batch applications and recreate them with the latest available batch framework.
Given that we have a large number of batch jobs to modernise, we are looking for a suitable framework or architecture and have the following questions:
We want a batch solution that lets us dynamically deploy new batch jobs as and when they are created, without disturbing the applications already deployed. Does Spring Cloud Task provide any such feature? Note: we only want to deploy the apps to our local server; this has nothing to do with cloud.
If Spring Batch/Boot can provide the features we typically expect from a batch application, what is the added value of Spring Cloud Task? I wasn't able to completely understand this from the Spring documentation available online.
From the Spring Cloud Task documentation, I understood that it allows an application to have many tasks within it. What should I do if each task has its own library dependencies, which might conflict with the dependencies of other tasks? In that case, should each of these tasks be moved to a new application, or is there a workaround?
To answer your questions:
Does Spring Cloud Task handle orchestration - No. Spring Cloud Task does not handle orchestration of tasks or jobs. The component in this ecosystem that handles the deployment/orchestration of tasks or jobs is really Spring Cloud Data Flow (which is why I asked if you use any type of cloud platform including YARN, Cloud Foundry, Kubernetes, or Mesos...the environments supported by Spring Cloud Data Flow).
What added value does Spring Cloud Task provide over Spring Boot/Spring Batch - Spring Cloud Task is designed to provide a few things:
Similar abilities to Spring Batch with regards to state management without needing to create a batch job. When running a Boot application on a cloud environment, there is no standard way of getting the results from environment to environment (YARN handles job results differently from tasks on Cloud Foundry which is different from jobs on Kubernetes, etc). Spring Batch provides this but now all short lived processes need the overhead of the Batch API so Spring Cloud Task provides a lighter touch to those use cases.
Automatically adds informational listeners. With Spring XD, when you ran a job in an XD container, the XD container automatically added a number of informational listeners that broadcast events that you could listen for. Spring Cloud Task brings the same functionality without the need for the XD container.
Integration with Spring Cloud Stream. Spring Cloud Task provides the ability to launch tasks from messages received from Spring Cloud Stream. Also, the informational messages previously mentioned (both Batch events as well as Task events) are sent via Spring Cloud Stream channels.
The DeployerPartitionHandler. When working in a cloud environment, this PartitionHandler implementation allows you to launch workers for a partitioned batch job as tasks. This allows for the dynamic scaling of partitioned batch jobs instead of the traditional option of pre-deploying workers that listen for work which wastes resources in a modern cloud environment.
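As a rough picture of what a partitioned job hands to its workers, the sketch below splits a range of items into contiguous slices, one per worker. It is plain Java and does not use the Spring Batch Partitioner interface or the DeployerPartitionHandler itself. The class name and the range-based split are assumptions for illustration; in the setup described above, each slice would be processed by a worker launched as a task.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not the Spring Batch Partitioner interface.
final class RangePartitioner {

    // Splits [0, itemCount) into at most workerCount contiguous ranges.
    // Each range is {startInclusive, endExclusive}; sizes differ by at most one.
    static List<long[]> partition(long itemCount, int workerCount) {
        if (itemCount < 0 || workerCount <= 0) {
            throw new IllegalArgumentException("itemCount must be >= 0 and workerCount > 0");
        }
        List<long[]> ranges = new ArrayList<>();
        long base = itemCount / workerCount;
        long remainder = itemCount % workerCount;
        long start = 0;
        for (int i = 0; i < workerCount && start < itemCount; i++) {
            long size = base + (i < remainder ? 1 : 0);
            ranges.add(new long[] {start, start + size});
            start += size;
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (long[] range : partition(10, 3)) {
            System.out.println(range[0] + ".." + range[1]); // prints 0..4, 4..7, 7..10
        }
    }
}
```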
How does the packaging of multiple tasks work with dependencies - In short, this is not recommended. The idea of a Spring Cloud Task is that the execution of the Spring Boot application is the Task. While you could package up multiple tasks and, using different methods, have them execute based on different stimuli, that goes against the 12-factor application concepts, which are essential for correct use of Spring Cloud Task.
My two cents
For the best option for a modern batch platform, you really need to look into some form of platform first, and that begins at the Cloud Foundry/Kubernetes/Mesos/YARN layer. Without that, you end up building a large part of the infrastructure yourself. That is why Spring XD evolved into Spring Cloud Data Flow. The added complexity that lived in the containers of Spring XD is removed by requiring a modern platform to run on (since they all handle those guarantees themselves). Without that piece, you're going to spend a lot of time managing the deployment and orchestration of applications, which most modern platforms handle for you.
From there, the choice becomes pretty easy IMHO with Spring Cloud Task for simple tasks, Spring Batch for batch jobs, and Spring Cloud Data Flow for orchestration.