Spring Boot application not responding after pushing a large number of requests - spring-boot

I have a problem with one of my two servers, which I'll call Server A:
Server A: Red Hat Enterprise Linux Server release 7.2 (Maipo)
Server B: Red Hat Enterprise Linux Server release 7.7 (Maipo)
jdk-8u231 is installed on both servers.
I have a Spring Boot application running on both servers.
Whenever I use JMeter to send 100 concurrent requests to the application on each server, the application running on Server B has no problem.
But on Server A the application stops responding: the process (PID) is still running, but I can't reach the actuator endpoints or the Swagger page, and I can't send new requests. The log file shows nothing from that point on.
Thread dumps and heap dumps show no significant difference.
Could anyone show me how to analyze this problem?
I still have no idea why the problem occurs.

Well, I can only speculate here, but here are some ideas that may help:
There are two possible sources of the issue: the Java application and Linux (plus its network policies, firewalls and so forth).
Since you don't know for sure what is happening, try working by elimination.
Create a script that runs 100 concurrent requests. Place the script on Server A (the problematic one) and run it there, so that it runs against "localhost". If that works, the issue is probably not in Java at all but in some network policy or Linux setup.
Place a log message in the controller of the Java application and examine the log. The log should print the request number among other things, so that you can tell whether the application gets stuck after a well-defined number of requests or after a different number each time.
Check the configuration of the Spring Boot application on both servers. Maybe the number of threads allocated to serve requests by the embedded web server inside the Spring Boot application (assuming you're not using a reactive stack) differs between them. If all of those threads are busy, you won't be able to call REST endpoints, the actuator, etc.
If a JMX connection is available in this setup, connect via JMX and check the Tomcat MBeans (again, assuming there is a Tomcat under the hood) for pretty much the same thread-pool information as in the previous step.
You've mentioned thread dumps. Try to take more than one: one before you run the JMeter test, one while it is running (when everything still works), and one when everything is stuck.
In the thread dumps, check the actual stack traces. Maybe all the threads are stuck working with the database or something similar and can't serve requests, as explained in the step about the web server's thread pool.
Examine the GC logs; maybe the GC is working so hard that you can't really interact with the application.
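To compare the three thread dumps, it can help to first count the threads per state in each of them. Here is a minimal sketch, assuming the dumps were saved as text from jstack (which prints a "java.lang.Thread.State:" line for each Java thread). The class and method names are made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: counts threads per state in the text of a jstack dump. */
final class ThreadDumpSummary {

    // jstack prints one such line per Java thread, for example:
    //    java.lang.Thread.State: WAITING (parking)
    private static final Pattern STATE =
            Pattern.compile("java\\.lang\\.Thread\\.State: (\\w+)");

    /** Returns a map from thread state (RUNNABLE, BLOCKED, ...) to thread count. */
    static Map<String, Integer> countStates(String dumpText) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = STATE.matcher(dumpText);
        while (m.find()) {
            counts.merge(m.group(1), 1, Integer::sum);
        }
        return counts;
    }
}
```

Call countStates on the text of the "before", "during" and "stuck" dumps and compare the three maps. If the stuck dump shows nearly all request threads (http-nio-...-exec-N, assuming Tomcat) as BLOCKED or WAITING while the earlier dumps do not, the stack traces of those threads are the place to look.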

Related

Managing ElasticSearch resources in Tomcat

I have a web app running in Tomcat that needs to connect to ES using the High Level Java API. I am unsure about the best practices for managing the ES resources (client, transport) in that context.
In the past, I would create a brand new client for every request and close it (as well as its transport) when I was done with the request.
But now I read that it's best to use a single client in my app and across all threads (the client is apparently thread-safe).
I can see two issues with that approach.
Issue 1: client timing out
If the single client hasn't been used in a while, it may have timed out. So before I use the client, I need a way to check whether it is still alive. But I can't find clear documentation on how to do that (at least not without pinging the server every time).
Issue 2: can't tell when Tomcat is done with the client
When I run my app as a command-line main() app, I can close the client's transport at the end of that main. But in a Tomcat context, my code has no way of knowing when Tomcat is done with the client and its transport.
I tried all sorts of tricks using finalize(), but none of them work consistently. And from what I read, it's unwise to rely on finalize() to close resources, as the JVM offers no guarantee as to when an object will be GCed (if ever!).
Thanks for your guidance.
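For Issue 2, one common pattern is to keep the single client in a holder and close it from ServletContextListener.contextDestroyed(...), which the servlet container calls when the web application is stopped or undeployed. The sketch below shows only the holder and uses plain JDK types. SharedResource is a hypothetical name, and the factory that would build the real ES client is left to the caller. It does not address Issue 1.

```java
import java.util.function.Supplier;

/**
 * Hypothetical holder for one shared, lazily created resource
 * (for example a client object) that is closed exactly once.
 */
final class SharedResource<T extends AutoCloseable> {

    private final Supplier<T> factory;
    private T instance;      // guarded by "this"
    private boolean closed;  // guarded by "this"

    SharedResource(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns the single instance, creating it on first use. */
    synchronized T get() {
        if (closed) {
            throw new IllegalStateException("resource already closed");
        }
        if (instance == null) {
            instance = factory.get();
        }
        return instance;
    }

    /**
     * Closes the instance if it was ever created; safe to call more than once.
     * Intended to be called from ServletContextListener.contextDestroyed(...).
     */
    synchronized void shutdown() {
        closed = true;
        if (instance == null) {
            return;
        }
        try {
            instance.close();
        } catch (Exception e) {
            throw new IllegalStateException("failed to close shared resource", e);
        } finally {
            instance = null;
        }
    }
}
```

With this in place, request-handling code only ever calls get(), and the single call to shutdown() lives in the listener, so nothing depends on finalize().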

Configure timing of opening ports in Spring-Boot application

Question:
Is there an option within spring or its embedded servlet container to open ports when spring is ready to handle traffic?
Situation:
In the current setup I use a Spring Boot application running on Google Cloud Run.
Circumstances:
Cloud Run does not support liveness/readiness probes; it considers an open port to mean "application ready".
Cloud Run sends requests to the container although Spring is not yet ready to handle them.
Spring starts its servlet container and opens its ports while still spinning up its beans.
Problem:
Traffic to an unready application results in a lot of HTTP 429 status codes.
This affects:
new deployments
scaling capabilities of Cloud Run
My desire:
Configure Spring or the servlet container to delay opening ports until the application is actually ready.
Delaying the opening of ports until the application is ready would ease much pain without interfering too much with the existing code base.
Any alternatives not causing too much pain?
Things I found and considered not viable:
Using native-image is not an option, as it is considered experimental and consumes more RAM at compile time than our deployment pipeline agents are allowed to allocate (max 8GB vs. the 13GB needed).
Another answer I found: "readiness check for google cloud run - how?"
I don't see how it could satisfy my needs, since Spring Boot startup time is still slow. That's why my initial idea was to delay opening ports.
I did not have time to test the following, but one thing I stumbled upon is
a blog post about using multiple processes within a container. Though it goes against the recommended container principles, it seems viable until Cloud Run supports probes of any type.
As you are well aware, “Cloud Run currently does not have a readiness/liveness check to avoid sending requests to unready applications”, so I would say there is not much that can be done on Cloud Run’s side except:
Try to optimise the Spring Boot app as per the docs.
Make a heavier entrypoint in the Cloud Run service that takes care of more setup tasks. This Stack Overflow thread mentions how “A ’heavier’ entrypoint will help post-deploy responsiveness, at the cost of slower cold-starts” (this is the most relevant solution from a Cloud Run perspective and outlines the issue correctly).
Run multiple processes in a container in Cloud Run, as you mentioned.
This question seems more directed at Spring Boot specifically and I found an article with a similar requirement.
However, if you absolutely need the app ready to serve when requests come in, we have another alternative to Cloud Run, Google Kubernetes Engine (GKE) which makes use of readiness/liveness probes.
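The ordering the question asks for (finish the slow setup first, open the port afterwards) can be sketched in plain Java as follows. This is only an illustration of the principle with a bare ServerSocket. It is not a Spring Boot or Cloud Run API, and the names DelayedListener and openAfterWarmUp are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

/** Hypothetical sketch: the listening port is opened only after warm-up. */
final class DelayedListener {

    /**
     * Runs the warm-up work to completion and only then binds the port.
     * While warmUp is still running nothing listens on the port, so a
     * platform that treats "port open" as "ready" sends no traffic yet.
     * If warmUp throws, the port is never opened.
     */
    static ServerSocket openAfterWarmUp(Runnable warmUp, int port) {
        warmUp.run();
        try {
            return new ServerSocket(port); // port 0 picks any free port
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The trade-off is the one quoted above for a heavier entrypoint: requests no longer hit a half-started application, but every cold start takes at least as long as the warm-up.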

503 error on server load tests on Wildfly server on Jelastic

I have an app deployed on a WildFly server on the Jelastic PaaS. This app functions normally with a few users. I'm trying to do some load tests using JMeter, in this case calling a REST API 300 times in 1 second.
This leads to an error rate of around 60% on the requests, all of them being 503 (service temporarily unavailable). I don't know what I have to tweak in the environment to get rid of those errors. I'm pretty sure it's not my app's fault, since it is not heavy and I get the same results even when load testing the index page.
The topology of the environment is simply 1 wildfly node (with 20 cloudlets) and a Postgres database with 20 cloudlets. I had fancier topologies, but trying to narrow the problem down I cut the load balancer (NGINX) and the multiple wildfly nodes.
Requests via the shared load balancer (i.e. when your internet facing node does not have a public IP) face strict QoS limits to protect platform stability. The whole point of the shared load balancer is it's shared by many users, so you can't take 100% of its resources for yourself.
With a public IP, your traffic goes straight from the internet to your node and therefore those QoS limits are not needed or applicable.
As stated in the documentation, you need a public IP for production workloads (a load test should be considered 'production' in this context).
"I don't know what things I have to tweak in the environment to get rid of those errors"
We don't know either, and as your question doesn't provide a sufficient level of detail, we can only come up with generic suggestions like:
Check the WildFly log for any suspicious entries. HTTP 503 is a server-side error, so it should be logged along with a stack trace that will lead you to the root cause.
Check whether the WildFly instance(s) have enough headroom to operate in terms of CPU, RAM, etc.; this can be done using, e.g., the JMeter PerfMon Plugin.
Check JVM and WildFly specific JMX metrics using JVisualVM or the aforementioned JMeter PerfMon Plugin
Double check Undertow subsystem configuration for any connection/request/rate limiting entries
Use a profiler tool like JProfiler or YourKit to see what are the slowest functions, largest objects, etc.
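For the headroom and JMX checks, the standard java.lang.management MXBeans already expose heap usage, thread count and GC activity. A minimal sketch, assuming the code runs inside the JVM you want to inspect (JVisualVM shows the same attributes remotely). JvmSnapshot is a hypothetical name, and WildFly- or Undertow-specific MBeans are not covered here.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: a few JVM health numbers read from the platform MXBeans. */
final class JvmSnapshot {

    static Map<String, Long> capture() {
        Map<String, Long> values = new LinkedHashMap<>();

        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        values.put("heapUsedBytes", heap.getUsed());
        values.put("heapMaxBytes", heap.getMax()); // -1 if no maximum is defined

        values.put("liveThreads",
                (long) ManagementFactory.getThreadMXBean().getThreadCount());

        long gcCollections = 0;
        long gcTimeMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the collector does not report the value.
            gcCollections += Math.max(0, gc.getCollectionCount());
            gcTimeMillis += Math.max(0, gc.getCollectionTime());
        }
        values.put("gcCollections", gcCollections);
        values.put("gcTimeMillis", gcTimeMillis);

        values.put("availableProcessors",
                (long) Runtime.getRuntime().availableProcessors());
        return values;
    }
}
```

Taking one snapshot before the load test and one during it shows whether the heap is close to its maximum, whether the thread count explodes, or whether GC time grows much faster than wall-clock time.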

How to diagnose why a web server with idle applications becomes unresponsive?

I have a Digital Ocean droplet (512MB RAM, 20GB SSD Disk, Ubuntu 13.10 x64) on which
a MongoDB instance and
a Tomcat 7 server
run.
On the Tomcat server, following applications are installed
Apache CXF-based application, which processes web service requests, interacts with the database and executes scheduled jobs,
Vaadin application,
JSF (Primefaces) application and
Psi Probe.
When I
restart Tomcat,
use the Vaadin and/or JSF application,
then for several weeks do nothing on that machine (it basically is idle during that time),
then try to open the JSF and/or Vaadin application,
I find the site unresponsive (nothing is displayed after I enter the URL in the browser).
When I restart Tomcat (sudo service tomcat7 restart), everything works again. I don't see any obvious problems in the Tomcat logs.
How can I find out,
whether the problem is on the Tomcat side (one of the applications consumes too many resources even if idle) or on the OS side (nothing happens on the machine and therefore the OS puts itself into a "hibernating" mode) and
if the problem is with Tomcat, exactly which of the applications is causing it?
Please work through these checks from top to bottom.
"then try to open the JSF and/or Vaadin application, I find the site unresponsive (nothing is displayed after I enter the URL in the browser)."
Before restarting, check whether the service is still running: sudo service tomcat7 status and/or ps -ef | grep tomcat
Check with netstat -patune | grep <portnumber, e.g. 443> whether the server is listening on the configured ports.
Check your httpd/Apache/Tomcat access logs to see whether the request reaches the server and, if so, whether there are errors or timeouts related to the requests.
Check whether the DB connection is still possible.
To force some error logs, try changing the maxIdle, maxActive and maxWait attributes of your Tomcat connection pool configuration. The default for maxWait is -1, so a request for a connection made at some point during these weeks may wait forever.
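Why a finite maxWait produces an error you can see in the logs, while -1 just hangs, can be shown with a toy pool. This is a sketch of the semantics only, built on a Semaphore. It is not the code of Tomcat's connection pool, and BoundedPool, borrow and giveBack are made-up names.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Toy pool of permits that mimics maxActive / maxWait semantics. */
final class BoundedPool {

    private final Semaphore permits;
    private final long maxWaitMillis;

    /** maxWaitMillis below zero means "wait indefinitely", like maxWait = -1. */
    BoundedPool(int maxActive, long maxWaitMillis) {
        this.permits = new Semaphore(maxActive, true);
        this.maxWaitMillis = maxWaitMillis;
    }

    /** Takes one permit, or fails with an exception once maxWaitMillis has passed. */
    void borrow() {
        try {
            if (maxWaitMillis < 0) {
                permits.acquire(); // blocks silently until a permit is returned
                return;
            }
            if (!permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException(
                        "timed out after " + maxWaitMillis + " ms waiting for a connection");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a connection", e);
        }
    }

    /** Returns a permit to the pool. */
    void giveBack() {
        permits.release();
    }
}
```

With new BoundedPool(1, 50), a second borrow() without a giveBack() fails after about 50 ms with an exception that ends up in the log. With a negative wait the same call never returns, which from the browser looks exactly like an unresponsive site.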
@Patrick provided some excellent basic tests.
I notice that you only have 512 MB of RAM. With some fairly beefy software such as Tomcat, plus MongoDB on top of that, your machine may simply be overloaded.
Based on that, I would propose a couple of additional things to check:
sudo free
Should tell you how much memory is being used, and how much swap space you use.
sudo top
Will tell you which process is using the most memory. You may want to sort the output of top by memory (default is usually by CPU utilization).
Most importantly, check the log files in /var/log (in particular /var/log/messages). You may find indications that the kernel killed one of your processes (probably tomcat).

mod_jk vs mod_cluster

Can someone please tell me the pros and cons of mod_jk vs mod_cluster?
We are looking to do very simple load balancing. We are going to be using sticky sessions and just need something to route new requests to a new server if one server goes down. I feel that mod_jk does this and does a good job, so why do I need mod_cluster?
If your JBoss version is 5.x or above, you should use mod_cluster; it will give you better performance and reliability than mod_jk. Here are some reasons:
better load balancing between app servers: the load balancing logic is calculated based on information and metrics provided directly by the application servers (bear in mind they have first-hand information about their load), in contrast with mod_jk, where the logic is calculated by the proxy itself. For that, mod_cluster uses an extra connection between the servers and the proxy (apart from the data one) to send this load information.
better integration with the lifecycle of the applications deployed in the servers: the servers keep the proxy informed about changes to the application in each respective node. For example, if you undeploy the application in one of the nodes, the node will inform the proxy (mod_cluster) immediately, avoiding inconvenient 404 errors.
it doesn't require AJP: you can also use it with HTTP or HTTPS.
better management of the server lifecycle events: when a server shuts down or is restarted, it informs the proxy about its state, so that the proxy can reconfigure itself automatically.
You can use sticky sessions with mod_cluster as well, though of course, if one of the nodes fails, mod_cluster won't help to keep the user sessions (the same would happen with other balancers, unless you have the JBoss nodes in a cluster). But for the reasons given above (mainly keeping track of the server lifecycle events and better load balancing), if one of the servers goes down, mod_cluster will handle it better and more transparently to the user: the proxy will be informed immediately, so it will not send requests to that node until it's informed that the node has restarted.
Remember that you can use mod_cluster with JBoss AS/EAP 5.x or JBoss Web 2.1.1 or above (in the case of Tomcat I think it's version 6 or above).
To sum up, though your load balancing use case is simple, mod_cluster offers better performance and scalability.
You can look for more information on the JBoss site for mod_cluster, and in its documentation page.
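The difference described above (nodes report their own load and lifecycle, and the proxy routes on that information) can be illustrated with a small simulation. This is a conceptual sketch only. It is not mod_cluster's actual algorithm or protocol, and StickyBalancer with its methods is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Toy proxy that routes on node-reported load and keeps sessions sticky. */
final class StickyBalancer {

    // node name -> load last reported by that node (lower is better)
    private final Map<String, Integer> reportedLoad = new TreeMap<>();
    private final Map<String, String> sessionToNode = new HashMap<>();

    /** A node reports its own load, or announces itself for the first time. */
    void report(String node, int load) {
        reportedLoad.put(node, load);
    }

    /** A node tells the proxy that it is shutting down. */
    void remove(String node) {
        reportedLoad.remove(node);
    }

    /** Keeps a session on its node while that node is known, else picks the least loaded node. */
    String route(String sessionId) {
        String sticky = sessionToNode.get(sessionId);
        if (sticky != null && reportedLoad.containsKey(sticky)) {
            return sticky;
        }
        String best = null;
        for (Map.Entry<String, Integer> entry : reportedLoad.entrySet()) {
            if (best == null || entry.getValue() < reportedLoad.get(best)) {
                best = entry.getKey();
            }
        }
        if (best == null) {
            throw new IllegalStateException("no node available");
        }
        sessionToNode.put(sessionId, best);
        return best;
    }
}
```

Because remove() is called by the node itself, the proxy stops routing to it immediately instead of discovering the failure through a failed request. The session data is still lost on failover unless the nodes replicate sessions, as noted above.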

Resources