spring-cloud-config refresh causes thread leaks - spring-boot

Dependencies:
spring-boot: 1.5.2
spring-cloud: Dalston.SR3
Using the spring-cloud stack: config, eureka, zuul, bus, kafka
I use a git webhook and the bus to auto-refresh the zuul routing config. After a week, I found that over 3000 threads had been created.
Thread dump report here: report from fastthread
I also figured out that every call to the XXX/bus/refresh endpoint increases the thread count by 7.
increased thread list:
DiscoveryClient-0
DiscoveryClient-1
DiscoveryClient-2
...
After some debugging and tracing, I found that on refresh, EurekaClientConfiguration#eurekaClient is called first and then RefreshableEurekaClientConfiguration#eurekaClient.
Since these are annotated with @ConditionalOnMissingRefreshScope and @ConditionalOnRefreshScope respectively, I expected only one of them to be invoked.
I am not sure whether this is the cause of the problem, but when I removed the config parts, everything works fine. Can anyone help? Thanks!
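For anyone who wants to confirm a leak like this, here is a minimal sketch (my own, not from the original post) that counts live DiscoveryClient-* threads; call it, or expose it through an endpoint, before and after hitting /bus/refresh:

public class DiscoveryThreadCounter {

    // Count live threads whose name starts with "DiscoveryClient-".
    public static long countDiscoveryClientThreads() {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().startsWith("DiscoveryClient-"))
                .count();
    }

    public static void main(String[] args) {
        System.out.println("DiscoveryClient threads: " + countDiscoveryClientThreads());
    }
}

If the count grows by the same amount on every refresh, the old EurekaClient's threads are most likely not being shut down when the refresh-scoped bean is recreated.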

How to debug spring boot application not starting

Spring's community overview page lists SO as the only place to ask questions, which is why I ask this rather generic question here even though it may not be the best fit for SO.
I have a spring boot application built on spring cloud gateway (version 2) which also uses an embedded hazelcast cluster. It runs in multiple instances, which communicate via hazelcast. Everything works fine, except under heavy load. If one instance fails, restarting it is no longer possible.
When the instance is restarted while the cluster of instances is under heavy load, it starts creating and wiring beans up to some point, after which it no longer does anything Spring-related. Hazelcast-generated messages are visible in the log (with root log level DEBUG) past that point, but nothing generated by Spring or the application itself.
In order to restart that one instance that failed, I need to stop the load generation, wait some 10-15 minutes, then restart the failed instance. Then the new/restarted instance starts up rather quickly, with no problems at all.
The load consists of http requests which get proxied to another application, and is of such nature that it generates a lot of read accesses to hazelcast's distributed storage, but very few writes.
My problem: I have no idea how to debug this. Since the http endpoint never becomes available, there's no way I can query metrics or other actuator information.
So my question is: what tools or mechanisms can I employ to debug this problem? I.e. how can I find out exactly how the boot sequence under heavy load of the other instances of the hazelcast cluster differs from the boot sequence when there is no load at all in the cluster? Once I have this information, the problem is narrowed down enough for me to investigate it further on my own.
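One low-tech mechanism for exactly this situation (a sketch of my own, not from the thread): log every bean as it finishes initialization, so the last name in the log shows where wiring stalls even when no HTTP endpoint is available.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.config.BeanPostProcessor;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class StartupTraceConfig {

    private static final Logger log = LoggerFactory.getLogger(StartupTraceConfig.class);

    // Logs each bean after it is initialized; when startup hangs, the last
    // logged bean name points at (or just before) the one that is blocking.
    @Bean
    public static BeanPostProcessor startupTracer() {
        return new BeanPostProcessor() {
            @Override
            public Object postProcessBeforeInitialization(Object bean, String beanName) {
                return bean;
            }

            @Override
            public Object postProcessAfterInitialization(Object bean, String beanName) {
                log.info("Initialized bean: {}", beanName);
                return bean;
            }
        };
    }
}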
I didn't find a way to debug the problem, but I had an idea of what might be causing it, tried it, and it turned out to be the fix.
My application was running as a Kubernetes deployment. A few beans inside the application were relying on a usable CP subsystem during their initialization. Spring's bean initialization process is by necessity sequential and blocking, to account for inter-bean dependencies.
I hypothesized that under heavy load, for whatever reason, the initialization of those beans was blocking forever. As a first experiment, to see whether that was at least the problem, I made that initialization code async (roughly as sketched below), so that Spring could finish bean wiring, even though the instance couldn't do useful work until the async part had finished as well.
To my surprise, that fully fixed the problem. Spring finished bean wiring, the HZ-dependent initialization also finished rather quickly when executed async, even under high load, and the instance became usable soon after being started.
I didn't have the time to dig deeper to find out what the precise failure mechanism was. What I believe might have been the problem is the interaction between HZ and K8s. K8s-based discovery works using a K8s service. A pod/instance isn't added to the service until it becomes healthy. If a bean inside the application blocks initialization, the application never becomes healthy, so the instance is never added to the service, and discovery never finds the new/restarted instance. I don't know what effect this has on the HZ cluster's inner workings.
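A rough sketch of the kind of change described above (the class and bean names and the executor choice are mine, not the poster's): kick the Hazelcast-dependent initialization off the startup thread so that bean wiring can complete.

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

import com.hazelcast.core.HazelcastInstance;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class CacheWarmupConfig {

    // Hypothetical bean that previously blocked startup because its
    // construction waited on Hazelcast's CP subsystem.
    @Bean
    public CacheWarmup cacheWarmup(HazelcastInstance hazelcast) {
        CacheWarmup warmup = new CacheWarmup();
        // Run the HZ-dependent work asynchronously so Spring can finish wiring;
        // the instance reports itself ready only once this completes.
        CompletableFuture.runAsync(() -> warmup.initialize(hazelcast));
        return warmup;
    }

    public static class CacheWarmup {
        private final AtomicBoolean ready = new AtomicBoolean(false);

        void initialize(HazelcastInstance hazelcast) {
            hazelcast.getCPSubsystem(); // placeholder for the real CP-dependent setup
            ready.set(true);
        }

        public boolean isReady() {
            return ready.get();
        }
    }
}

Whether getCPSubsystem() is the right entry point depends on the Hazelcast version in use; the point is only that the blocking call no longer runs on the thread that wires the beans.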

Spring Boot Scheduler

I have used a Spring Boot scheduler with the @Scheduled annotation and a fixedRateString of 1 second. This scheduler intermittently stops working for approximately 2 minutes and then starts working again on its own. What could be the possible reasons for this behavior, and is there a resolution?
Below is the code snippet for the scheduler.
1st) Please read the SO guidelines: DO NOT post images of code, data, error messages, etc. - copy or type the text into the question. Please reserve the use of images for diagrams or demonstrating rendering bugs, things that are impossible to describe accurately via text.
2nd) To your problem
You use an XML-based Spring configuration in which you have configured your scheduler, and then you also use the @Scheduled annotation. You should not mix those two ways of configuring beans.
You also use some kind of thread synchronization in this method. Probably some thread is stuck outside the method because of the lock, and this breaks the behavior you expect.
Remove either the XML configuration or the scheduling annotation, and debug to see why the method behaves as it does; most probably the cause is what I mentioned above about the locks and the duplicate configuration.
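For reference, a minimal annotation-only setup (a sketch of my own, since the original snippet was posted as an image) would look like the following; if the XML configuration also schedules the same method, keep only one of the two:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Configuration
@EnableScheduling
class SchedulingConfig {
    // enables processing of @Scheduled annotations
}

@Component
class HeartbeatTask {

    private static final Logger log = LoggerFactory.getLogger(HeartbeatTask.class);

    // Runs every second. The default scheduler is single-threaded, so a run
    // that blocks (e.g. waiting on a lock) delays every later execution,
    // which can look like the scheduler "stopping" for minutes.
    @Scheduled(fixedRateString = "1000")
    public void tick() {
        log.info("scheduler tick");
    }
}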

spring boot datasource tomcat jdbc properties not working

I have a Spring Boot application (version 1.5.1.RELEASE) and I am using spring-boot-starter-data-jpa as a dependency to manage my database. I am using postgres as my database and configured it using the below properties.
spring.datasource.url=${POSTGRES_URL}
spring.datasource.username=${POSTGRES_USER}
Now when I run my tests, which number almost 120, about 10 of them fail right at startup with a "too many clients already" error (the remaining test cases pass, since they are able to get a connection to the database).
The first thing I did was increase the default postgres max connections count from 100 to 200 in the postgres server config file, and my tests pass successfully after this change.
I then investigated a bit and tried setting the connection pooling properties, such as:
spring.datasource.tomcat.max-active=200
spring.datasource.tomcat.test-on-borrow=true
spring.datasource.tomcat.max-wait=10000
However, these properties do not take effect and the tests fail again with the same error as above. I have read several blogs and the Spring documentation on setting connection pool properties, but did not find what I might be doing wrong.
I also think that if I set spring.datasource.tomcat.max-active to 100, it should work with the help of Tomcat JDBC pooling; in the current scenario it seems a new database connection is being opened for each test case, and I fear the same thing might happen when I deploy this code to production, with a new connection opened to the database for each request.
Has anyone faced this problem before, or is there something I am doing wrong?
Thanks in advance for the help.
Try upgrading your Spring Boot version; 1.5.10.RELEASE is the current version.
Also, I found the connection pool properties for my application were not being applied when the property prefix tomcat was included. If you are still having issues, try removing that.
i.e.
spring.datasource.tomcat.max-active=200
Becomes
spring.datasource.max-active=200
See https://artofcode.wordpress.com/2017/10/19/spring-boot-configuration-for-tomcats-pooling-data-source
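One way to check whether the pool settings are actually being applied (a sketch, assuming the auto-configured pool is Tomcat JDBC, which is the Spring Boot 1.5 default when tomcat-jdbc is on the classpath):

import javax.sql.DataSource;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.stereotype.Component;

@Component
public class PoolSettingsLogger implements CommandLineRunner {

    @Autowired
    private DataSource dataSource;

    @Override
    public void run(String... args) {
        // If the properties were picked up, these values should match
        // what is configured in application.properties.
        if (dataSource instanceof org.apache.tomcat.jdbc.pool.DataSource) {
            org.apache.tomcat.jdbc.pool.DataSource pool =
                    (org.apache.tomcat.jdbc.pool.DataSource) dataSource;
            System.out.println("maxActive    = " + pool.getMaxActive());
            System.out.println("maxWait      = " + pool.getMaxWait());
            System.out.println("testOnBorrow = " + pool.isTestOnBorrow());
        } else {
            System.out.println("Not a Tomcat JDBC pool: " + dataSource.getClass());
        }
    }
}

If the logged values still show the defaults, the properties are not reaching the pool, which points at the property prefix or names rather than at PostgreSQL itself.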

HawtIO + Camel plugin - Multiple context not showing up - Limits to max3

Our application is an enterprise application that contains multiple web applications. Each web application contains one or more camel contexts. Recently we have been exploring the option of using HawtIO for monitoring and administrative purposes.
We are using camel (fuse) version 2.12.0.redhat-610379 with Wildfly 8.1 (dev env; prod is WAS 8.5). I have tried HawtIO web app versions from 1.4.10 through 1.4.14, and the no-slf4j variant as well, but HawtIO shows a maximum of 3 camel contexts. I have tried setting managementNamePattern as well, but still with no positive results.
If I comment out some of the listed camel contexts, then the others get listed. Please note that each camel context contains around 10 to 15 routes, and there are around 30 endpoints (spring beans).
However, I am able to find the unlisted camel contexts in the JMX dashboard under org.apache.camel. Kindly let me know of any workaround, or whether I am missing something in the configuration. My camel contexts refer to multiple route contexts.
Not sure if you still need to know this, but you may need to increase the "Max Collection Size" in the HawtIO preferences, under Jolokia. HawtIO just grabs everything and then appears to filter on the client side, so if you have a lot of MBeans you won't see everything (it only fetches the first 500 entries by default).
I had a similar issue - but while I was seeing all the camel contexts, I was not seeing all the routes, which was the big issue for me.
It defaults to 500. I increased it to 5000, which was enough for me. You may wish to try fiddling with that yourself, and see if it makes a difference.
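If you want to verify, outside the UI, that the collection size limit is what truncates the result, you can query Jolokia directly with the maxCollectionSize processing parameter; a rough sketch (the /hawtio/jolokia path, port, and MBean pattern are assumptions for illustration):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class JolokiaProbe {

    public static void main(String[] args) throws Exception {
        // Search for Camel context MBeans, raising the limit that would
        // otherwise cut the response off (hawtio's default is 500).
        URL url = new URL("http://localhost:8080/hawtio/jolokia/search/"
                + "org.apache.camel:type=context,*?maxCollectionSize=5000");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}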

glassfish 3.1.2 monitoring EJB container, bean-methods

The GlassFish application server provides a nice monitoring REST interface.
To use it, you can enable several monitorable items in the admin console, for example the EJB container. The documentation says you can retrieve EJB statistics for every deployed application.
If you request a URL like localhost:4848/monitoring/domain1/server/applications/APPNAME/EJBNAME, you will get statistics for a given EJB of the application.
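As a sketch of how such a request can be issued programmatically (APPNAME/EJBNAME are placeholders, and requesting JSON via the Accept header is optional; the interface also serves XML and HTML):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class MonitoringClient {

    public static void main(String[] args) throws Exception {
        // Same resource as the URL above; adjust APPNAME/EJBNAME to your deployment.
        URL url = new URL("http://localhost:4848/monitoring/domain1/server/"
                + "applications/APPNAME/EJBNAME");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Accept", "application/json");
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}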
Further, it is possible to look more deeply into each bean method of the EJB, for example its execution time, about which the documentation says:
"Time, in milliseconds, spent executing the method for the last successful/unsuccessful attempt to run the operation. This is collected for stateless and stateful session beans and entity beans if monitoring is enabled on the EJB container."
The problem is that monitoring is enabled on the EJB container (level set to HIGH), but nothing is sampled for any bean method of any EJB in any deployed application.
Is there something special to do in the bean and/or in GlassFish?
Thanks in advance for help,
Chris
EDIT:
Ok, I noticed something more about that behaviour:
In the server log you get a log message for each deployed EJB like that:
INFO: EJB5181:Portable JNDI names for EJB DataFetcher // ...
If I set the ejb-container monitoring level to HIGH (which is what I want to do), I get the following warning for each deployed EJB, regardless of which app I deploy:
WARNING: MNTG0201:Flashlight listener registration failed for listener class : com.sun.ejb.monitoring.stats.StatelessSessionBeanStatsProvider , will retry later
I googled the warning, but none of the results really helped me enable EJB monitoring...
This seems to be a bug in GlassFish.
EJB monitoring is currently not working in 3.1.2.
A JIRA issue has already been raised: http://java.net/jira/browse/GLASSFISH-19677
There is nothing "special" to do.
http://docs.oracle.com/cd/E18930_01/html/821-2431/abeea.html
It seems to me that you probably enabled the monitoring option on the wrong configuration. Please double-check.
To get rid of this message, you can disable monitoring for the EJB container:
Monitor Data ---> Configure Monitoring ---> set the EJB container level to OFF