Spring Boot Admin high cpu usage / details page refresh interval - performance

I am in the 'forbidden' scenario of having the SBA server and client in one app (so there is only one app registered for any SBA server instance, which is itself). For various reasons I can't change that.
I see huge CPU usage when viewing the insights > details page. No clue why; the only thing I can think of is that I have a "lot of" caches (around 40). Any guesses on that CPU usage?
The second thing is: I am unable to find a config setting for how often the graphs on the details page are updated. Is there a way to make that slower, to test whether it is the cause?

My hint for reducing the performance hunger of Spring Boot Admin is to increase the checking interval for the monitored services. You can do this in your Spring Boot Admin properties/yml file; for example, the following yml refreshes the status of connected services every 2 minutes:
spring:
  boot:
    admin:
      monitor:
        status-interval: 120000ms
        status-lifetime: 120000ms
        info-interval: 120000ms
        info-lifetime: 120000ms
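Since these properties bind as Durations, the same values can also be written with shorthand units; an equivalent sketch, assuming a current SBA 2.x setup:

spring:
  boot:
    admin:
      monitor:
        status-interval: 2m
        status-lifetime: 2m
        info-interval: 2m
        info-lifetime: 2m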
Further details on configuring the monitor intervals can be found here: https://codecentric.github.io/spring-boot-admin/current/#_configuration_options
I am not aware of a way to configure the refresh interval of the charts such as Threads, Memory, etc., and I believe there is currently no configuration option for that.
I hope that this information helps you.

Related

Configure timing of opening ports in Spring-Boot application

Question:
Is there an option within Spring or its embedded servlet container to open ports only when Spring is ready to handle traffic?
Situation:
In the current setup I use a Spring Boot application running in Google Cloud Run.
Circumstances:
Cloud Run does not support liveness/readiness probes; it considers an open port as "application ready".
Cloud Run sends requests to the container although Spring is not yet ready to handle them.
Spring starts its servlet container and opens its ports while still spinning up its beans.
Problem:
Traffic to an unready application will result in a lot of HTTP 429 status codes.
This affects:
new deployments
scaling capabilities of Cloud Run
My desire:
Configure Spring/the servlet container to delay opening ports until the application is actually ready
Delaying the opening of ports until the application is ready would ease much of the pain without interfering too much with the existing code base.
Any alternatives not causing too much pain?
Things I found and considered not viable:
Using native-image is not an option, as it is considered experimental and consumes more RAM at compile time than our deployment pipeline agents can allocate (max 8 GB vs. the needed 13 GB).
Another answer I found: readiness check for google cloud run - how?
I don't see how it could satisfy my needs, since Spring Boot startup time is still slow; that is why my initial idea was to delay opening the ports.
I did not have time to test the following, but one thing I stumbled upon is a blog post about using multiple processes within a container. Though it goes against recommended container principles, it seems viable until Cloud Run supports probes of any type.
As you are well aware that "Cloud Run currently does not have a readiness/liveness check to avoid sending requests to unready applications", I would say there is not much that can be done on Cloud Run's side except:
Try and optimise the Spring Boot app as per the docs (see the sketch after this list).
Make a heavier entrypoint in the Cloud Run service that takes care of more setup tasks. This Stack Overflow thread mentions how "A 'heavier' entrypoint will help post-deploy responsiveness, at the cost of slower cold-starts" (this is the most relevant solution from a Cloud Run perspective and outlines the issue correctly).
Run multiple processes in a container in Cloud Run, as you mentioned.
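On the first point, one startup optimisation from the Spring Boot docs that may be worth trying is lazy bean initialization; a minimal application.yml sketch (note the trade-off: beans are then created on first use, so the first requests after startup get slower, which may or may not be acceptable here):

spring:
  main:
    lazy-initialization: true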
This question seems more directed at Spring Boot specifically, and I found an article with a similar requirement.
However, if you absolutely need the app ready to serve when requests come in, there is an alternative to Cloud Run: Google Kubernetes Engine (GKE), which makes use of readiness/liveness probes.
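If you go the GKE route, a minimal sketch of a readiness probe in the pod spec, assuming Spring Boot 2.3+ with the actuator's Kubernetes probes enabled; the path, port, and timings below are illustrative defaults, not taken from the question:

# container spec excerpt
readinessProbe:
  httpGet:
    path: /actuator/health/readiness   # Spring Boot actuator readiness group
    port: 8080
  initialDelaySeconds: 20   # rough allowance for Spring Boot startup
  periodSeconds: 5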

502 server error in Google App Engine Flexible when load testing with JMeter

I have deployed a simple Spring Boot app in Google App Engine Flexible. The app has two APIs: one to add user data into the DB (xxx.appspot.com/add), the other to get all the user data from the DB (xxx.appspot.com/all).
I wanted to see how GAE scales under load, hence I used JMeter to create load with 100 concurrent users ramped up over 10 seconds, calling these two APIs with a half-second delay, forever. While it runs fine for some time (with just one instance), it starts to fail after 30 seconds or so with a "java.net.SocketException" or "The server responded with a status of 502".
After this error, when I try to access the same API from the browser, it displays:
Error: Server Error
The server encountered a temporary error and could not complete your
request. Please try again in 30 seconds.
The service is back to normal after 30 mins or so, and whenever the load test runs it repeats the same behavior as described above. I expect GAE to auto-scale based on the incoming load and handle it without any downtime (using multiple instances); instead, it just crashes or blocks the service (without any information in the log). My app.yaml configuration is:
runtime: java
env: flex
service: hello-service
automatic_scaling:
  min_num_instances: 1
  max_num_instances: 10
I am a bit stuck with this one; any help would be greatly appreciated. Thanks in advance.
The solution was to increase the resource configuration; details below.
Given that I did not set a resources parameter, it defaulted to the pre-defined values for both CPU and memory. In this case, the default memory was set at 0.6 GB. App Engine Flex instances use about 0.4 GB for overhead processes. Given that Java is known to consume more memory, there is a great likelihood that the overhead processes consumed more than the approximate 0.4 GB value. Instances in App Engine are restarted for a variety of reasons, including optimization due to memory use. This explains why your instances went down and the logs show Tomcat starting up (they got restarted), ending in 502 errors because nginx was not able to complete the request. Fixing the above may lessen, if not completely eliminate, the 502s.
After I specified the resources attribute and increased the configuration in app.yaml, the 502 errors seem to be gone.
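For reference, a minimal sketch of such a resources block in app.yaml; the values below are illustrative, as the question does not state the exact configuration that fixed it:

resources:
  cpu: 2
  memory_gb: 2.0     # well above the 0.6 GB default, leaving room for the ~0.4 GB overhead
  disk_size_gb: 10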

Eureka registry issue

I am running a few tests with Eureka and seeing this issue: though I shut down the microservices, Eureka still shows the services as up and running; Ribbon gets the server list and the call fails with a 404. I went through the 85% rule in the Eureka docs, but this one is still tricky. If I disable self-preservation mode it works, but I don't want to do that, per the recommendations for prod. So what is the best configuration to avoid this issue?
The configuration options are very rich on both the client and the server side, but first you must bear in mind that the default property values are meant to work for Netflix, where there are hundreds of microservices. When you have a small infrastructure, the 85% threshold is pretty strict. One way is to decrease it using the eureka.server.renewalPercentThreshold property. You need to estimate the best value for your needs, depending mainly on the number of instances that register in Eureka.
If you decide to switch self-preservation mode off, you can configure the eureka.server.evictionIntervalTimerInMs property so that services disappear from the registry after a time period of your choosing. Moreover, you can configure (per each instance that registers in Eureka) eureka.instance.leaseExpirationDurationInSeconds, which is the time the Eureka server waits after the last heartbeat it received from a service before evicting it.
The following classes are very well documented, and you can figure out what is configurable and may be useful for you:
com.netflix.discovery.EurekaClientConfig.java, com.netflix.appinfo.EurekaInstanceConfig.java, com.netflix.eureka.EurekaServerConfig.java
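A minimal yml sketch of the properties mentioned above, using Spring Cloud Netflix's relaxed property names; the values are illustrative, and lease-renewal-interval-in-seconds (the heartbeat interval) is added here only for context:

# Eureka server
eureka:
  server:
    renewal-percent-threshold: 0.49        # relaxed from the 0.85 default
    eviction-interval-timer-in-ms: 15000   # run eviction every 15 s

# Each registering service (client side)
eureka:
  instance:
    lease-renewal-interval-in-seconds: 10      # heartbeat every 10 s
    lease-expiration-duration-in-seconds: 30   # evict after 30 s without a heartbeat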

java.net.SocketException: Connection reset at Jmeter

I'm doing a load test on a web application, and with a minimum of 14-15 users I am getting this connection reset issue, even though I have ensured the following on my end:
Request retries has been set to 1 in the user.properties file
Stale check is set to true
Test data and LAN connectivity are good
The number of users is small, hence JMeter won't need more RAM
Hence, can this be concluded to be an issue in the application design and not an issue with JMeter?
To avoid a long trail of comments, I'll try to summarize and answer.
This issue looks like it stems from the application deployment setup.
JMeter ---------------> ( Web server <-> App server <-> DB )
Find out which area the bottleneck is in, using profilers.
The issue could be in any one of the layers below (a tuning sketch follows the list):
Web Server:
If the web server is the bottleneck, try to tune it for handling more load: a larger thread pool, longer timeouts, bigger buffers and queues.
Application Server:
If the app server is the bottleneck, tune your application server. Again, check the configuration and any settings specific to handling more load; if required, code improvements should be made.
Database Server:
If the DB is the bottleneck, check queries, indexes and statistics, and optimize them for your needs. Config settings also help sometimes.
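As an illustration of the web server tuning above, a minimal sketch for an embedded Tomcat configured through Spring Boot properties; this assumes the application runs on Spring Boot 2.3+ (the question does not say what the stack is), and the values are illustrative, not recommendations:

server:
  tomcat:
    threads:
      max: 400            # maximum worker threads (default is 200)
    accept-count: 200     # connection queue length when all threads are busy
    connection-timeout: 20s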
For all layers, check server resource utilization. If it is not high, there is room for performance improvement; otherwise vertical/horizontal scaling of the servers is required.
You are saying the problem is that some IDs were not generated in the DB, so you can start with the DB layer for possible bottlenecks.
Hope this helps :)

About App Engine scalability and the 60 seconds timeout

I have an App Engine + Spring + Hibernate mobile/web backend configured with F2 instances and a D2 Cloud SQL instance. I've also configured warm-ups by setting Idle Instances to a minimum of 1.
I have two questions:
Is it possible to configure Cloud SQL instances to scale up when needed?
My app takes about 20-40 seconds to start (after removing autowiring and doing all the optimization tips described here: https://developers.google.com/appengine/articles/spring_optimization). Still, I get high latency (~20-40 seconds) for some of the requests during load testing. I believe this is because App Engine starts new instances and it takes them this much time to start. After the instances are up and running, everything works fine, until too many users connect and the delay returns. Is there a way I can solve this other than configuring a higher minimum of idle instances?
For your question regarding Cloud SQL: it currently does not have auto-scaling capability.
As Tony said, you can't configure Cloud SQL to automatically scale according to demand. You could, of course, configure it to serve a higher expected demand from the beginning.
On the side, I'd like to suggest a few things you could do with your app servers:
Change from F2 to F4 or F4_1G (if you're using a lot of memory) to see if that reduces your startup time (see the sketch below).
If you're not doing it yet, you could use AppStats [1] to get a better understanding of where your app's bottlenecks are. If it's only the startup time, and (1) doesn't help, I'm sorry to say that configuring more idle instances would be the answer you're looking for.
[1] https://developers.google.com/appengine/docs/java/tools/appstats
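A minimal sketch of the instance-class change in app.yaml, assuming the newer app.yaml-based Java runtimes (the legacy Java runtime of that era configured this in appengine-web.xml instead):

instance_class: F4
automatic_scaling:
  min_idle_instances: 1   # keep at least one warm instance, as already configured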
