Scale Spring Boot App based on Thread Pool State

We have a Spring Boot microservice that has to fetch data from an old legacy system. The microservice exposes a modern external REST API. Sometimes we have to issue 7-10 requests to the legacy system to get all the data we need for a single API call. Unfortunately we can't use Reactor / WebClient and have to stick with WebServiceTemplate to issue those "legacy" calls. The approach from "Reactive Spring WebClient - Making a SOAP call" is not an option for us either.
What is the best way to scale such a microservice in Kubernetes? We are very concerned that the thread pool used for parallel WebServiceTemplate invocations will be depleted very quickly, but I'm not sure that creating and exposing a custom metric based on active thread count / thread pool size is a good idea.
Any advice will be helpful.

Enable the Prometheus exporter in Spring.
Make sure the metrics are scraped. You're going to watch for a threadpool_size metric. Refer to your k8s/Prometheus distro docs to get Prometheus service discovery working for you.
Write a horizontal pod autoscaler (HPA) based on a Prometheus metric:
Set up Prometheus-Adapter and follow the HPA walkthrough.
Or follow this guide: https://github.com/stefanprodan/k8s-prom-hpa
Depending on which k8s distro you are using, you might have different ways to get Prometheus and Prometheus service discovery:
(example platform built-in) https://cloud.google.com/stackdriver/docs/solutions/gke/prometheus
(example product) https://docs.datadoghq.com/integrations/prometheus/
(example opensource) https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack
any other prometheus solution
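As a rough illustration of the metric side, here is a minimal plain-JDK sketch of the value such a gauge could report and of how an HPA would turn it into a replica count. The class and method names (PoolScalingSketch, utilization, desiredReplicas), the pool size of 10 and the target of 0.5 are made up for this example, and registering the value as a Prometheus gauge is left out. The replica rule is the one given in the Kubernetes HPA documentation, simplified here by ignoring its tolerance and min/max replica bounds.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class PoolScalingSketch {

    /** Fraction of the pool's maximum threads that are currently running a task (0.0 to 1.0). */
    static double utilization(ThreadPoolExecutor pool) {
        return (double) pool.getActiveCount() / pool.getMaximumPoolSize();
    }

    /** Simplified HPA rule: ceil(currentReplicas * currentMetric / targetMetric). */
    static int desiredReplicas(int currentReplicas, double currentMetric, double targetMetric) {
        return (int) Math.ceil(currentReplicas * (currentMetric / targetMetric));
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical pool standing in for the one that runs the legacy calls.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                10, 10, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());

        CountDownLatch started = new CountDownLatch(8);
        CountDownLatch release = new CountDownLatch(1);

        // Simulate 8 in-flight legacy calls that block until released.
        for (int i = 0; i < 8; i++) {
            pool.execute(() -> {
                started.countDown();
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        started.await();

        double current = utilization(pool);
        System.out.println("pool utilization: " + current);
        // With 3 replicas and an assumed target utilization of 0.5 per pod:
        System.out.println("desired replicas: " + desiredReplicas(3, current, 0.5));

        release.countDown();
        pool.shutdown();
    }
}
```

A ratio (active threads / max threads) is usually easier to put a target on than a raw thread count, because the target stays valid when the pool size changes. Note that getActiveCount() is documented as approximate, so the gauge is best treated as a trend rather than an exact reading.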

Related

Spring Reactive Stack with Spring for Apache Kafka

In a few words:
I'm trying to decide between using the default Spring for Apache Kafka stack, KafkaTemplate or the pair, ReactiveKafkaProducerTemplate and ReactiveKafkaConsumerTemplate for my Reactor based application.
Some more context:
At the company where I work, we're developing a high-availability application that publishes a set of requests directly to a Kafka broker. Since this is an API-centric application expected to receive a few million requests per week, we decided to go with a stack based on Project Reactor with Spring WebFlux and Kotlin.
After doing some digging I've discovered that Spring for Apache Kafka has a simple wrapper designed around the Reactor Kafka implementation, but this wrapper lacks a lot of the functionality present in the default KafkaTemplate mentioned before, things like: a metrics binder out of the box (for Prometheus integration), associated factories, extensive documentation, auto-configuration, etc.
I'm trying to understand what I'm really giving up when choosing the default implementation over the reactive one. Am I giving up back pressure functionality? Am I sacrificing the reactive stack present in my application? Will this take a toll in the future? Does anyone have experience working with a reactive stack alongside a non-reactive solution?
I also have a few concerns regarding the DLT flow facilitated in the default implementation, things like the SeekToCurrentErrorHandler strategy.

Jaeger with ElasticSearch

I have created a microservice based architecture using Spring Boot and deployed the application on Kubernetes/Istio platform.
The different microservices communicate with each other using either JMS (ActiveMQ) or REST API.
I am getting the tracing of REST communication on Istio's Jaeger but the JMS based communication is missing in Jaeger.
I am using ElasticSearch to store my application logs.
Is it possible to use the same ElasticSearch as the storage backend (DB) of Jaeger?
If yes then I will store tracing specific logs in ElasticSearch and query them on Jaeger UI.
I believe you can reuse Elasticsearch for multiple purposes - each would use a different set of indices, so separation is good.
from: https://www.jaegertracing.io/docs/1.11/deployment/ :
Collectors require a persistent storage backend. Cassandra and Elasticsearch are the primary supported storage backends
For tying the networking all together, see the docker-compose example in "How to configure Jaeger with elasticsearch?".
While this isn't exactly what you asked, it sounds like what you're trying to achieve is seeing tracing for your JMS calls in Jaeger. If that is the case, you could use an OpenTracing instrumentation for JMS or ActiveMQ to report tracing data directly to Jaeger. Here's one potential solution I found with a quick Google search. There may be others.
https://github.com/opentracing-contrib/java-jms

Spring Boot Micro Service Tracing Options

I have the requirements below. Is there an open source library that covers all of them?
1. We are building a distributed microservice architecture with Spring Boot, which includes more than 100 microservices.
2. A lot of inter-microservice communication may be needed to complete a single transaction.
3. We want to trace every microservice call, and the trace should provide the following information:
a. Transaction ID / Trace ID.
b. Back-end transaction status: HTTP status for REST, and likewise for SOAP.
c. Time taken for the call.
d. Request and response payload.
Currently we achieve this using an in-house tracing framework. Is there an open source project that handles all of this without any coding from the developer? I know we have a few options such as Spring Cloud Sleuth and Zipkin; do they handle the above requirements?
My project has similar requirements to yours. IMHO, Spring Cloud Sleuth + Zipkin work well in my case.
For all inter-microservice communication we are using Kafka, and Spring Cloud Sleuth + Zipkin have no problem tracing all the calls, from REST -> Kafka -> more Kafka -> REST.
To enable Kafka tracing, simply add:
spring:
  sleuth:
    propagation-keys: some-key
    sampler:
      probability: 1
    messaging:
      kafka:
        enabled: true
We are also using Azure Application Insights for centralized logging, which is well integrated with Spring Cloud.
Hope the above gives you some confidence in using Sleuth + Zipkin.

How can you scale a Spring Boot application?

I understand that Spring Boot has a built-in Tomcat server (or Jetty) which facilitates rapid development. But what do you do when you need to scale out your application because traffic has increased?
As pointed out in the comments, there is no silver bullet here. It depends on your infrastructure, and there are several tools out there to help you; you only need to choose what works best for you.
For load balancing you can either choose something like Nginx or leave it to Spring Cloud, which also has a lot of other handy features for scaling/clustering.
Scaling shouldn't be very hard because Spring Boot runs on its own embedded server.
Some tools that help with scaling/clustering:
Spring boot app:
If you are going to scale, your app has to be near-stateless (e.g: you cannot have a scheduled task or something like that because when you scale to x instances, they are executed x times).
You can use the spring cloud project for extra added features like service discovery and other goodies that make scaling easier (e.g: When you spin up a new instance, it can get the config easily from a config server, 'register' to ease the loadbalancing between services, have cluster-like behaviour, etc...).
Infrastructure and containers:
Docker is a no-brainer here to handle easy launching of your applications and their replicas, if needed. If you have the resources, you can go further and use Kubernetes, but it all depends on the use case.
Various servers (nodes), in case one of them fails and to easily distribute loads.
Nginx for load balancing is pretty straightforward if you don't already have something done with Spring Cloud.
Database:
You really do NOT want to go with MySQL here because it cannot scale as well as your Spring apps. You can choose something like Cassandra or Redis, but that would mean restructuring your data model. Maybe the least painful transition from MySQL to a NoSQL store that can scale is MongoDB (imho: Cassandra performs better).
Logging:
This can be a nightmare, but Spring also has a solution for this. Check out Zipkin and Spring Cloud Sleuth.
Also, there are a lot of resources here that talk about architecture in general and how it is necessary to change your mindset when trying to run distributed services.
Hope this helps.
Update 2021-02-23
Today, Kubernetes is pretty much the de facto standard when we talk about scaling. It is preferred because of its rich feature set: you can focus your app purely on business domain logic and drop things like Spring Cloud for service discovery. If you can use a managed offering from a public cloud, such as EKS or GKE, you are better off not having to manage the clusters yourself.
It provides autoscaling and built-in health checks. Starting from Spring Boot 2.3, you get many added benefits for running Spring Boot on K8s, such as dedicated health check endpoints for liveness and readiness probes, graceful shutdown, etc.
On the database side, aim for something that is managed and scales easily, such as AWS Aurora or similar.
An important thing to mention when managing Spring Boot services at scale is configuration management. A very useful solution that you can use out of the box is Consul. It enables you to hot reload the configuration, which matters when you have 50 services that you would otherwise need to restart only to change one boolean variable. Depending on how big your application is, a restart can be costly in terms of time as well as CPU/memory resources.

How to monitor streaming apps Inside SCDF?

I am a novice to Spring Cloud Data Flow and Spring Cloud Stream applications.
Currently my project diagram looks like the following:
I route a POST request from an outside client through a Zuul API gateway to a microservice called Composite. Composite creates a stream using a REST POST and deploys it onto the Spring Cloud Data Flow Server. As far as I know, the microservices mongodb and file run as co-existing JVM processes. If my client has to know the status of the stream and the status of the processed data, how should the Composite microservice interact with the Spring Cloud Data Flow Server? Currently, when I make the POST call to deploy the stream, I don't even get the status from the SCDF Server. Does SCDF expose any hooks to look at the individual apps? Also, how can I change the flow at runtime to create a dynamic mesh?
Currently I am using Local Spring Cloud Data Flow Server for development.
Runtime platform is local
The Local runtime is recommended only for development purposes. If you're preparing for production, please make sure to choose a platform variant (e.g. cf, k8s, yarn, ..) that provides the non-functional capabilities needed for reliable and durable execution of all the applications running in the streaming pipeline.
As far as I know the microservices mongodb and file run as co-existing JVM processes.
If your stream definition is file | mongodb, you'd have 2 different JVMs even when using the Local runtime. They're independent Boot applications.
How should Composite Microservice interact with Spring Cloud Data Flow Server?
It is not clear what you mean by "composite" here. All the microservice applications in SCDF communicate via messaging middleware such as Kafka or Rabbit. SCDF provides the orchestration capability to run such applications on various runtime platforms.
Currently when I make POST call to deploy the stream I don't even get the status from SCDF Server
You can use SCDF's REST APIs to query the current status of the apps, and this is platform agnostic. You can view the list of supported APIs by hitting the root URL. There's a gap in the docs; we will fix it. The following APIs could be useful for status checks.
Does SCDF expose any hooks to look at the individual apps?
Once the apps are deployed in a runtime platform, you can take advantage of Boot's actuator endpoints to explore more details such as trace, metrics, health, env among others at each application level. See Boot's actuator endpoints for more details. For instance, if your mongodb app is running locally and on port 23000, then you can check granular metrics for this application at: http://localhost:23000/metrics.
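A minimal sketch of polling such an endpoint from another service (for example, the Composite microservice) using the JDK's built-in HTTP client, Java 11 or later. The class name ActuatorProbe and the helper metricsRequest are made up for this example, and the host, port and path are taken from the localhost:23000 example above. Newer Boot versions typically serve actuator endpoints under /actuator instead, so treat the path as an assumption to adjust.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

final class ActuatorProbe {

    /** Builds a GET request for an actuator endpoint of one deployed app. */
    static HttpRequest metricsRequest(String host, int port, String path) {
        return HttpRequest.newBuilder(URI.create("http://" + host + ":" + port + path))
                .timeout(Duration.ofSeconds(2))
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumes the mongodb app from the example above is listening on port 23000.
        HttpRequest request = metricsRequest("localhost", 23000, "/metrics");

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        System.out.println("status: " + response.statusCode());
        System.out.println(response.body());
    }
}
```

The same helper can be pointed at /health or /env by changing the path argument.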
[As an FYI: future SCDF releases will include integration of Spring Boot + Spring Cloud Sleuth metrics and a visual representation of the same.]
Also how can I change the flow at runtime to create a dynamic mesh?
If you're referring to editing a running streaming pipeline by adding or deleting apps, we are currently exploring a design approach to support this functionality.
