Spring Boot service restarts in a loop after adding Mail Server Dependency - spring

We have Spring Boot micro-services working well.
Recently we wanted to add a mail notification feature, so we added the Spring Boot mail starter dependency.
As soon as we made this change, all our services started shutting down and coming back up continuously, and the console log shows lines like the following:
Saw local status change event StatusChangeEvent [timestamp=...... , current=DOWN, previous=UP]
Saw local status change event StatusChangeEvent [timestamp=...... , current=UP, previous=DOWN]
After every four lines like the above there is one more line like:
Ignoring onDemand update due to rate limiter
Not sure what the issue could be, but it seems the service is pinging the mail server and shutting down when it gets no response, then getting a connection on the next check and coming back up.
Has anyone faced such an issue?

Finally, after debugging the code, we found the issue is in the health check of the mail server, since we use the actuator health check for all services.
Whenever the health check gets no response from the mail server, the service goes DOWN. The check runs every 30 seconds. We tried to find a parameter to reduce the frequency but could not find one.
So for now we set management.health.mail.enabled to false, and the services no longer shut down in a loop.
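For reference, here is the workaround as an application.properties snippet. The commented-out Eureka lines are assumptions on my side, not confirmed in the post: this kind of UP/DOWN flapping normally only reaches Eureka when eureka.client.healthcheck.enabled propagates the actuator status, and the 30-second cadence matches Eureka's default heartbeat interval.

```properties
# Workaround described above: stop the mail health indicator from
# contributing to the overall actuator health status.
management.health.mail.enabled=false

# Assumptions, not stated in the post: the DOWN/UP flapping usually only
# happens when the actuator status is propagated to Eureka, and the 30 s
# cadence is Eureka's default heartbeat interval.
#eureka.client.healthcheck.enabled=true
#eureka.instance.lease-renewal-interval-in-seconds=30
```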

Related

Why can Tomcat stop responding until the deployed app is stopped?

Tomcat 8 is running in a Docker container with a single app deployed there.
The app is mainly busy processing user requests and cron jobs (usually additional work needs to be done after a user request is finished).
What the problem looks like (from the logs):
The app (deployed under /mysoawesomeapp) is working as usual, processing requests and cron jobs.
Then there is a gap of a couple of minutes, as if the app had frozen.
Docker runs a health check against localhost:8080 every 30 s, waits 10 s for a response, and then restarts the container.
I can see the shutdown request in the logs, and then I can also see those health check responses with a 200 status. It doesn't really matter at that point, since the server is already being shut down.
My question is: how is it possible that a request to localhost:8080, which would normally load the Tomcat home page, can be held up until the server shutdown occurs? How can mysoawesomeapp have that impact, and how can I confirm it?
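One way to confirm it (a suggestion, not part of the original post): capture thread dumps during the freeze, for example with jstack <pid> inside the container, or with a small in-process helper like the sketch below, which keeps logging even when every HTTP worker thread is stuck.

```java
// Hedged diagnostic sketch (not from the post): a daemon thread inside the app
// that periodically logs a full thread dump, so you can see what the Tomcat
// worker threads were doing during the "frozen" minutes even though HTTP
// requests are not being answered.
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class PeriodicThreadDumper extends Thread {

    public PeriodicThreadDumper() {
        setDaemon(true);
        setName("periodic-thread-dumper");
    }

    @Override
    public void run() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        while (!Thread.currentThread().isInterrupted()) {
            StringBuilder dump = new StringBuilder("=== thread dump ===\n");
            for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
                dump.append(info); // includes thread state and (truncated) stack trace
            }
            System.out.println(dump);
            try {
                Thread.sleep(60_000); // dump once a minute
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

Start it once at application startup (for example from a ServletContextListener) and compare the dumps before and during a freeze; exhausted or blocked http-nio worker threads would show up there.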

Kubernetes pods graceful shutdown with TCP connections (Spring Boot)

I am hosting my services on the Azure cloud. Sometimes I get a "BackendConnectionFailure" without any apparent reason; after investigation I found a correlation between this exception and autoscaling (scaling down), almost to the same second in most cases.
According to the documentation, the termination grace period is 30 seconds by default, which is the case here. The pod is marked terminating and the load balancer no longer considers it, so it receives no more requests. According to this, if my service takes far less than 30 seconds per request, I should not need a preStop hook or any special handling in my application (please correct me if I am wrong).
If the previous paragraph is correct, why does this exception occur relatively frequently? My thought is that when the pod is marked terminating, the load balancer keeps forwarding requests to it when it should no longer do so.
Edit 1:
The architecture is simply like this:
Client -> Firewall(azure) -> API(azure APIM) -> Microservices(Spring boot) -> backend(third party) or azure RDB depending on the service
I think the exception comes from APIM. I found two patterns for this exception:
Pattern 1:
Message: The underlying connection was closed: The connection was closed unexpectedly.
Exception type: BackendConnectionFailure
Failed method: forward-request
Response time: 10.0 s
Pattern 2:
Message: The underlying connection was closed: A connection that was expected to be kept alive was closed by the server.
Exception type: BackendConnectionFailure
Failed method: forward-request
Response time: 3.6 ms
Spring Boot doesn't do graceful termination by default.
The Spring Boot app and its application container (not the Linux container) are in control of what happens to existing connections during the termination grace period. The protocols being used and how a client reacts to a "close" also play a part.
If you get to the end of the grace period, everything gets a hard reset.
Kubernetes
When a pod is deleted in k8s, the removal of the Pod's Endpoints from Services is triggered at the same time as the SIGTERM signal to the container(s).
At this point the cluster nodes will be reconfigured to remove any rules directing new traffic to the Pod. Any existing TCP connections to the Pod/containers will remain in connection tracking until they are closed (by the client, the server, or the network stack).
For HTTP keep-alive or HTTP/2 services, the client will keep hitting the same Pod Endpoint until it is told to close the connection (or it is forcibly reset).
App
The basic rules are, on SIGTERM the application should:
Allow running transactions to complete
Do any application cleanup required
Stop accepting new connections, just in case
Close any inactive connections it can (keep alive requests, websockets)
Some circumstances you might not be able to handle (it depends on the client):
A keep-alive connection that doesn't complete a request within the grace period can't be sent a Connection: close header; it will need a TCP-level FIN close.
A slow client with a long transfer: in a one-way HTTP transfer these will have to be waited for or forcibly closed.
Although keep-alive clients should respect a TCP FIN close, every client reacts differently. Microsoft APIM might be sensitive and produce the error even though there was no real-world impact. It's best to load test your setup while scaling to see if there is a real-world impact.
For more spring boot info see:
https://github.com/spring-projects/spring-boot/issues/4657
https://github.com/corentin59/spring-boot-graceful-shutdown
https://github.com/SchweizerischeBundesbahnen/springboot-graceful-shutdown
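For illustration, here is a minimal sketch of the connector-pausing pattern those projects implement; the class name and the 30-second timeout are illustrative, not taken from the post.

```java
// Illustrative sketch only: on context shutdown, pause the Tomcat connector so
// no new connections are accepted, then let in-flight requests finish before
// the application context closes. Recent Spring Boot versions pick up
// TomcatConnectorCustomizer beans automatically; older ones need to register
// this customizer on the Tomcat server factory explicitly.
import java.util.concurrent.Executor;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import org.apache.catalina.connector.Connector;
import org.springframework.boot.web.embedded.tomcat.TomcatConnectorCustomizer;
import org.springframework.context.ApplicationListener;
import org.springframework.context.event.ContextClosedEvent;
import org.springframework.stereotype.Component;

@Component
public class GracefulShutdown implements TomcatConnectorCustomizer, ApplicationListener<ContextClosedEvent> {

    private volatile Connector connector;

    @Override
    public void customize(Connector connector) {
        this.connector = connector;
    }

    @Override
    public void onApplicationEvent(ContextClosedEvent event) {
        if (connector == null) {
            return;
        }
        connector.pause(); // stop accepting new connections
        Executor executor = connector.getProtocolHandler().getExecutor();
        if (executor instanceof ThreadPoolExecutor) {
            ThreadPoolExecutor pool = (ThreadPoolExecutor) executor;
            pool.shutdown(); // let running requests complete
            try {
                if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                    pool.shutdownNow(); // hard stop once the grace period is exhausted
                }
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

Since Spring Boot 2.3 the same behaviour is also available out of the box via the server.shutdown=graceful property.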
You can use a preStop sleep if needed. While the pod is removed from the service endpoints immediately, it still takes time (10-100ms) for the endpoint update to be sent to every node and for them to update iptables.
When your application receives a SIGTERM (from the Pod termination), it first needs to stop reporting that it is ready (fail the readinessProbe) but still serve requests as they come in from clients. After a certain time (depending on your readinessProbe settings) you can shut down the application.
For Spring Boot there is a small library doing exactly that: springboot-graceful-shutdown
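A hedged sketch of the preStop sleep mentioned above; the pod, container, and image names are placeholders, and the sleep length is illustrative.

```yaml
# Illustrative only: the preStop sleep buys time for the endpoint/iptables
# update to reach every node before the container receives SIGTERM.
apiVersion: v1
kind: Pod
metadata:
  name: my-spring-app
spec:
  terminationGracePeriodSeconds: 30
  containers:
    - name: my-spring-app
      image: my-registry/my-spring-app:latest
      ports:
        - containerPort: 8080
      lifecycle:
        preStop:
          exec:
            command: ["sh", "-c", "sleep 10"]
```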

How can I listen in my client service to a newly registered service in the Eureka server? (Listen to Eureka Server events)

I have a service 'A' which is registered with the Eureka server, and I would like the Eureka server to notify that service every time a new service is registered with Eureka.
Is there any way of doing it?
Why not poll the registry regularly from the app that needs to know?
Failing that, I don't believe Eureka has a built-in feature to push this kind of alert. You could achieve it, though, by customising the Eureka project and reacting to the registration. According to similar discussions elsewhere, EurekaInstanceRenewedEvent, which is fired when a new instance first heartbeats, is a reliable event to work from. I'm not sure how quickly you need to be notified.
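A minimal sketch of the polling option, using Spring Cloud's DiscoveryClient; the interval, class name, and the onNewService hook are illustrative, and note the client's local registry cache itself only refreshes every 30 seconds by default.

```java
// Illustrative polling sketch: compare the current list of service IDs against
// the ones already seen and react to anything new. Requires @EnableScheduling
// on a configuration class.
import java.util.HashSet;
import java.util.Set;
import org.springframework.cloud.client.discovery.DiscoveryClient;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class RegistryPoller {

    private final DiscoveryClient discoveryClient;
    private final Set<String> knownServices = new HashSet<>();

    public RegistryPoller(DiscoveryClient discoveryClient) {
        this.discoveryClient = discoveryClient;
    }

    @Scheduled(fixedDelay = 30_000)
    public void checkForNewServices() {
        for (String serviceId : discoveryClient.getServices()) {
            if (knownServices.add(serviceId)) {
                onNewService(serviceId); // placeholder for your own notification logic
            }
        }
    }

    private void onNewService(String serviceId) {
        System.out.println("New service registered: " + serviceId);
    }
}
```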

How to use Report action in OSB proxy service to record retry attempts

I want to record the retry attempts of a proxy service in OSB using a Report action.
I have created a JMS transport proxy service which picks messages from an IN_QUEUE and routes them to a business service, which pushes the message to an OUT_QUEUE and reports the status (success or failure).
However, if there is an error while processing, the proxy service should retry 5 times before failing. To achieve this, I configured the routing options and set the retry count to 5, and it works well.
All I want now is to record the retry attempts of the proxy service (using a Report action). Please suggest how to do this.
Logging the retry attempts of a business service is difficult, since the retries are handled outside the scope of the proxy. About the closest you can come is to set up an SLA alert to notify you when the bizref fails, but that doesn't trigger on every message - only if it detects errors during the aggregation interval.
Logging the retry attempts of the proxy is a lot easier, especially since it's a JMS proxy. Failed processing will put the message back on the queue (with XA-enabled resources; you may want to enable Same Transaction For Response), and each retry will increment a counter in the JMS transport headers, which the proxy can extract and use to decide whether to report or not.
Just remember that unless you set QoS to Best Effort on the publishes/reports, the publishes themselves will be rolled back if a failure happens, which is probably not what you want.
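For illustration only, and as an assumption not spelled out in the answer: in plain JMS the redelivery counter is the standard JMSXDeliveryCount property, which the provider increments each time a rolled-back message is redelivered; how exactly it surfaces in the proxy's $inbound transport headers depends on the transport configuration.

```java
// Hedged illustration (not from the answer): reading the standard per-message
// redelivery counter in plain JMS code.
import javax.jms.JMSException;
import javax.jms.Message;

public final class DeliveryCount {

    // Returns 1 on the first delivery, 2 on the first retry, and so on.
    static int of(Message message) throws JMSException {
        return message.propertyExists("JMSXDeliveryCount")
                ? message.getIntProperty("JMSXDeliveryCount")
                : 1;
    }
}
```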

JMS Listener Not Picking Up Message From the Queue

I am planning to do code change for an existing application which has a JMS listener.
To test whether the listener works on my local server, I deployed the application to my localhost and shut down the other containers running the same application.
But my local listener won't pick up any messages. It is confirmed that the other containers work fine and can pick up and process new messages from the queue.
Can you think of any possible cause of this?
Way too general, too many missing details... but some things to look at:
if the message queue is on a different server, can you ping it from the local machine? It could be that the development environment can't see the production server, for example
does a netstat -n show the correct port number? You should see a remote port matching the port the message provider itself is listening on
can you verify that the messaging provider sees you as a consumer? I use ActiveMQ: I can look at the management console, dive into a specific queue, and view active consumers. Most providers will have something similar (a throwaway probe listener, as sketched after this list, can also confirm it)
are you running in an identical environment? Running a listener in a JEE environment where the queue is a JNDI reference might be different from running in a debugger where you need the actual queue name
is any JMS filtering going on, where the filter for your local environment doesn't match what's already on the queue?
is any transaction manager stuff getting in the way?
Again, just throwing stuff out to see what sticks to the wall, but these are the really obvious things.
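A minimal sketch of such a probe listener, assuming a Spring JMS setup; the destination name is a placeholder.

```java
// Minimal probe (assumed Spring JMS setup: @EnableJms and a ConnectionFactory
// already configured). Deploy it alongside the real listener to confirm the
// local container actually connects to the broker and consumes from the queue.
import org.springframework.jms.annotation.JmsListener;
import org.springframework.stereotype.Component;

@Component
public class ProbeListener {

    @JmsListener(destination = "YOUR_QUEUE_NAME") // placeholder queue name
    public void onMessage(String body) {
        System.out.println("Local listener received: " + body);
    }
}
```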
Thanks Scott for answering my question.
I finally found that Eclipse had somehow created another container, and my listener was deployed to it. That's why I couldn't see it working in my current container.
