Kubernetes pods graceful shutdown with TCP connections (Spring boot) - spring-boot

I am hosting my services on Azure. Sometimes I get "BackendConnectionFailure" without any apparent reason; after investigating, I found a correlation between this exception and autoscaling (scaling down), happening at almost the same second in most cases.
According to the documentation, the termination grace period is 30 seconds by default, which is the case here. The pod will be marked Terminating and the load balancer will no longer consider it, so it receives no more requests. Based on this, if my service takes far less than 30 seconds, I should not need a preStop hook or any special handling in my application (please correct me if I am wrong).
If the previous paragraph is correct, why does this exception occur relatively frequently? My thought is that when the pod is marked Terminating, the load balancer does not forward requests to the pod anymore, while it still should.
Edit 1:
The Architecture is simply like this
Client -> Firewall (Azure) -> API (Azure APIM) -> Microservices (Spring Boot) -> Backend (third party) or Azure RDB, depending on the service
I think the exception comes from APIM. I found two patterns for this exception:
Pattern 1:
Message: The underlying connection was closed: The connection was closed unexpectedly.
Exception type: BackendConnectionFailure
Failed method: forward-request
Response time: 10.0 s
Pattern 2:
Message: The underlying connection was closed: A connection that was expected to be kept alive was closed by the server.
Exception type: BackendConnectionFailure
Failed method: forward-request
Response time: 3.6 ms

Spring Boot doesn't do graceful termination by default.
The Spring Boot app and its application container (not the Linux container) are in control of what happens to existing connections during the termination grace period. The protocols being used and how a client reacts to a "close" also have a part to play.
If you get to the end of the grace period, then everything gets a hard reset.
Kubernetes
When a pod is deleted in k8s, the Pod Endpoint removal from Services is triggered at the same time as the SIGTERM signal to the container(s).
At this point the cluster nodes will be reconfigured to remove any rules directing new traffic to the Pod. Any existing TCP connections to the Pod/containers will remain in connection tracking until they are closed (by the client, server or network stack).
For HTTP keep-alive or HTTP/2 services, the client will continue hitting the same Pod Endpoint until it is told to close the connection (or it is forcibly reset).
App
The basic rules are, on SIGTERM the application should:
Allow running transactions to complete
Do any application cleanup required
Stop accepting new connections, just in case
Close any inactive connections it can (keep alive requests, websockets)
Some circumstances you might not be able to handle (it depends on the client):
A keep-alive connection that doesn't complete a request within the grace period can't be sent a Connection: close header; it will need a TCP-level FIN close.
A slow client with a long transfer: in a one-way HTTP transfer these will have to be waited for or forcibly closed.
Although keep-alive clients should respect a TCP FIN close, every client reacts differently. Microsoft APIM might be sensitive and produce the error even though there was no real-world impact. It's best to load test your setup while scaling to see if there is a real-world impact.
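As an illustration of the keep-alive points above: for connections that do carry a request during the grace period, the app can at least ask the client to drop the connection afterwards. A minimal sketch of a servlet filter doing this (the class name and shutdown flag are made up; it assumes a javax.servlet stack and that something else keeps the server serving during the grace period):

    import java.io.IOException;
    import java.util.concurrent.atomic.AtomicBoolean;

    import javax.servlet.FilterChain;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    import org.springframework.context.event.ContextClosedEvent;
    import org.springframework.context.event.EventListener;
    import org.springframework.stereotype.Component;
    import org.springframework.web.filter.OncePerRequestFilter;

    @Component
    public class ConnectionCloseOnShutdownFilter extends OncePerRequestFilter {

        private final AtomicBoolean shuttingDown = new AtomicBoolean(false);

        // ContextClosedEvent is published when the SIGTERM-triggered shutdown begins.
        @EventListener(ContextClosedEvent.class)
        public void markShuttingDown() {
            shuttingDown.set(true);
        }

        @Override
        protected void doFilterInternal(HttpServletRequest request, HttpServletResponse response, FilterChain chain)
                throws ServletException, IOException {
            if (shuttingDown.get()) {
                // Ask the keep-alive client to drop this connection after the response,
                // so its next request opens a fresh connection to a non-terminating pod.
                response.setHeader("Connection", "close");
            }
            chain.doFilter(request, response);
        }
    }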
For more spring boot info see:
https://github.com/spring-projects/spring-boot/issues/4657
https://github.com/corentin59/spring-boot-graceful-shutdown
https://github.com/SchweizerischeBundesbahnen/springboot-graceful-shutdown
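Newer Spring Boot versions (2.3 and later) ship graceful shutdown built in via the server.shutdown=graceful property. On older versions, the workaround discussed in the first issue linked above is to pause the embedded Tomcat connector on ContextClosedEvent and let in-flight requests drain. A minimal sketch, assuming the default embedded Tomcat:

    import java.util.concurrent.Executor;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    import org.apache.catalina.connector.Connector;
    import org.springframework.boot.web.embedded.tomcat.TomcatConnectorCustomizer;
    import org.springframework.boot.web.embedded.tomcat.TomcatServletWebServerFactory;
    import org.springframework.context.ApplicationListener;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.context.event.ContextClosedEvent;

    @Configuration
    public class GracefulShutdownConfig {

        @Bean
        public GracefulShutdown gracefulShutdown() {
            return new GracefulShutdown();
        }

        @Bean
        public TomcatServletWebServerFactory webServerFactory(GracefulShutdown gracefulShutdown) {
            TomcatServletWebServerFactory factory = new TomcatServletWebServerFactory();
            factory.addConnectorCustomizers(gracefulShutdown);
            return factory;
        }

        static class GracefulShutdown implements TomcatConnectorCustomizer, ApplicationListener<ContextClosedEvent> {

            private volatile Connector connector;

            @Override
            public void customize(Connector connector) {
                this.connector = connector;
            }

            @Override
            public void onApplicationEvent(ContextClosedEvent event) {
                // Stop accepting new connections, then let in-flight requests finish.
                connector.pause();
                Executor executor = connector.getProtocolHandler().getExecutor();
                if (executor instanceof ThreadPoolExecutor) {
                    ThreadPoolExecutor pool = (ThreadPoolExecutor) executor;
                    pool.shutdown();
                    try {
                        if (!pool.awaitTermination(25, TimeUnit.SECONDS)) {
                            pool.shutdownNow();
                        }
                    } catch (InterruptedException ex) {
                        Thread.currentThread().interrupt();
                    }
                }
            }
        }
    }

The 25-second wait is just an example, chosen to finish before the 30-second Kubernetes grace period runs out.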

You can use a preStop sleep if needed. While the pod is removed from the service endpoints immediately, it still takes time (10-100ms) for the endpoint update to be sent to every node and for them to update iptables.

When your application receives a SIGTERM (from the Pod termination), it first needs to stop reporting that it is ready (fail the readinessProbe) but still serve requests as they come in from clients. After a certain time (depending on your readinessProbe settings), you can shut down the application.
For Spring Boot there is a small library doing exactly that: springboot-graceful-shutdown
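The linked library does essentially that. A minimal hand-rolled sketch of the same idea, assuming Spring Boot Actuator is on the classpath and the readinessProbe points at /actuator/health (the class name and the 20-second delay are illustrative and should be tuned to your probe settings):

    import java.util.concurrent.atomic.AtomicBoolean;

    import org.springframework.boot.actuate.health.Health;
    import org.springframework.boot.actuate.health.HealthIndicator;
    import org.springframework.context.event.ContextClosedEvent;
    import org.springframework.context.event.EventListener;
    import org.springframework.stereotype.Component;

    @Component
    public class ReadinessHealthIndicator implements HealthIndicator {

        private final AtomicBoolean shuttingDown = new AtomicBoolean(false);

        @Override
        public Health health() {
            // Report DOWN once shutdown has started so the readinessProbe fails
            // and Kubernetes stops routing new traffic to this pod.
            return shuttingDown.get() ? Health.down().build() : Health.up().build();
        }

        @EventListener(ContextClosedEvent.class)
        public void onShutdown() throws InterruptedException {
            shuttingDown.set(true);
            // Keep serving requests while the failed probe propagates
            // (roughly failureThreshold * periodSeconds of your readinessProbe).
            Thread.sleep(20_000);
        }
    }

The web server only stops after this listener returns, so requests that arrive while the probe failures propagate are still served.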

Related

Reconnect Interval

I am looking for best practices to handle server restarts. Specifically, I push stock prices to users over websockets for a day-trading simulation web app, with about 10k concurrent users. To ensure a responsive UX, I reconnect to the websocket when the onclose event is fired. As our user base has grown, we have had to scale our hardware. In addition to better hardware, we have implemented a random delay before reconnecting. The goal is to spread out the influx of handshakes when the server restarts every night (continuous deployment). However, some of our users have poor internet (ISP and/or Wi-Fi) and their connection constantly drops. For these users I would prefer that they reconnect immediately. Is there a solution to this problem that doesn't have the aforementioned tradeoffs?
The question is calling for a subjective response; here is mine :)
Discriminating between a client disconnection and a server shutdown:
This can be achieved by sending a shutdown message over the websocket so that active clients can prepare and reconnect with a random delay. A client that encounters an onclose event without a prior shutdown broadcast can then reconnect as soon as possible. This means the client application needs to be modified to account for this special shutdown event.
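The question is about a Node/browser stack; purely to illustrate the idea in the language used elsewhere on this page, a hypothetical Spring WebSocket handler could broadcast such a shutdown event before the server goes away (the handler name and message format are made up):

    import java.io.IOException;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    import org.springframework.context.event.ContextClosedEvent;
    import org.springframework.context.event.EventListener;
    import org.springframework.stereotype.Component;
    import org.springframework.web.socket.CloseStatus;
    import org.springframework.web.socket.TextMessage;
    import org.springframework.web.socket.WebSocketSession;
    import org.springframework.web.socket.handler.TextWebSocketHandler;

    @Component
    public class PriceFeedHandler extends TextWebSocketHandler {

        private final Map<String, WebSocketSession> sessions = new ConcurrentHashMap<>();

        @Override
        public void afterConnectionEstablished(WebSocketSession session) {
            sessions.put(session.getId(), session);
        }

        @Override
        public void afterConnectionClosed(WebSocketSession session, CloseStatus status) {
            sessions.remove(session.getId());
        }

        // On a planned shutdown, tell every client this is a deliberate restart so it
        // reconnects with a random delay; an unannounced onclose means a network drop
        // and the client may reconnect immediately.
        @EventListener(ContextClosedEvent.class)
        public void broadcastShutdown() {
            sessions.values().forEach(session -> {
                try {
                    session.sendMessage(new TextMessage("{\"type\":\"server-shutdown\"}"));
                    session.close(CloseStatus.GOING_AWAY);
                } catch (IOException ignored) {
                    // Client already gone; nothing to do.
                }
            });
        }
    }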
Handle the handshake load: some web servers can handle incoming connections as an asynchronous parallel event queue, so at most X connections are initialized at the same time (in parallel) and the others wait in a queue until their turn comes. This safeguards server performance, and the websocket handshakes are automatically delayed based on the true processing capabilities of the server. Of course, this may mean a change of web server technology and depends on your use case.

Akka Websocket complete, but client still connected

(Akka 2.6.5, akka-http 10.1.12)
I have a server/client websocket setup using Source.queue and Sink.actorRef on each side of the connection.
It turns out my system has a rather critical and unexpected flaw (found in production no less):
Sink actor fails and terminates (Dead letters are logged)
Sink actor is sent the Stream Failure message (configured in the Sink.actorRef construction) - this is also logged to dead letters, since the actor is indeed dead.
So we have a finished websocket stream, right? That's what the Half-closed WebSockets documentation would suggest (although, I just noticed the heading id is "half-closed-client-websockets...")
What happens instead is... nothing. The connected client stays connected - there's no complete message or failure.
Is there some configuration I need in order to actively tell Akka to fully close the HTTP connection on failures like this?
Testing
I reproduced the issue in integrated testing:
Establish connection
Sleep for 70 seconds (just to ensure keep-alives are configured/working properly)
Send a message from server
Ensure receipt on client
Kill server actor sink (and see same Stream Failure -> dead letters as above)
Wait for client to acknowledge completion (100 seconds) - either:
If I did nothing -> Timeout
If I sent message from client to server before waiting for completion:
After 60s: Aborting tcp connection ... because of upstream failure: TcpIdleTimeoutException
Stream failed sent to client sink.
Notes
I've deliberately not included code at this stage because I'm trying to understand the technology properly - either I've found a bug, or I have a fundamental misunderstanding of how websockets are meant to work (and fail). If you think I should include code, you'll need to convince me how it might help produce an answer.
In production, this failure to close meant that the websocket client was waiting for data for 12 hours (it wasn't attempting to send messages at the time)

GKE + WebSocket + NodePort 30s dropped connections

I have a golang service that implements a WebSocket client using gorilla that is exposed to a Google Container Engine (GKE)/k8s cluster via a NodePort (30002 in this case).
I've got a manually created load balancer (i.e. NOT a k8s ingress/load balancer) with HTTP/HTTPS frontends (i.e. 80/443) that forwards traffic to nodes in my GKE/k8s cluster on port 30002.
I can get my JavaScript WebSocket implementation in the browser (Chrome 58.0.3029.110 on OSX) to connect, upgrade and send / receive messages.
I log ping/pongs in the golang WebSocket client and all looks good until 30s in. 30s after connection, my golang WebSocket client gets an EOF / close 1006 (abnormal closure) and my JavaScript code gets a close event. As far as I can tell, neither my Golang nor my JavaScript code is initiating the WebSocket closure.
I don't particularly care about session affinity in this case AFAIK, but I have tried both IP and cookie based affinity in the load balancer with long lived cookies.
Additionally, this exact same set of k8s deployment/pod/service specs and golang service code works great on my KOPS based k8s cluster on AWS through AWS' ELBs.
Any ideas where the 30s forced closures might be coming from? Could that be a k8s default cluster setting specific to GKE or something on the GCE load balancer?
Thanks for reading!
-- UPDATE --
There is a backend configuration timeout setting on the load balancer which is for "How long to wait for the backend service to respond before considering it a failed request".
The WebSocket is not unresponsive. It is sending ping/pong and other messages right up until getting killed which I can verify by console.log's in the browser and logs in the golang service.
That said, if I bump the load balancer backend timeout setting to 30000 seconds, things "work".
This doesn't feel like a real fix though, because the load balancer would then keep feeding traffic to genuinely unresponsive services, never mind what happens if the WebSocket itself becomes unresponsive.
I've isolated the high timeout setting to a specific backend setting using a path map, but hoping to come up with a real fix to the problem.
I think this may be Working as Intended. Google just updated the documentation today (about an hour ago).
LB Proxy Support docs
Backend Service Components docs
Cheers,
Matt
Check out the following example: https://github.com/kubernetes/ingress-gce/tree/master/examples/websocket

nodeJS being bombarded with reconnections after restart

We have a Node instance with about 2500 client socket connections. Everything runs fine until occasionally something happens to the service (a restart or failover event in Azure); when the Node instance comes back up and all the socket connections try to reconnect, the service comes to a halt and the log just shows repeated socket connects/disconnects. Even if we stop the service and start it again, the same thing happens. Currently we send out a package to our on-premise servers to kill the users' Chrome sessions, and then everything works fine as users begin logging in again. The clients currently connect with 'forceNew' and WebSockets only, not the default long-polling-then-upgrade. Has anyone seen this, or have any ideas?
In your socket.io client code, you can force the reconnects to be spread out in time more. The two configuration variables that appear to be most relevant here are:
reconnectionDelay
Determines how long socket.io will initially wait before attempting a reconnect (it should back off from there if the server is down awhile). You can increase this to make it less likely they are all trying to reconnect at the same time.
randomizationFactor
This is a number between 0 and 1.0 and defaults to 0.5. It determines how much the above delay is randomly modified to try to make client reconnects be more random and not all at the same time. You can increase this value to increase the randomness of the reconnect timing.
See client doc here for more details.
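Those two settings live in the JavaScript client, but the underlying idea is plain exponential backoff with jitter. A rough sketch of the delay calculation (in Java only to keep one language on this page; this shows the general technique, not socket.io's exact algorithm, and the cap parameter is an addition):

    import java.util.concurrent.ThreadLocalRandom;

    public final class ReconnectBackoff {

        // Roughly the kind of delay the two settings above produce: an exponentially
        // growing base delay, shifted by up to +/- randomizationFactor.
        static long nextDelayMillis(int attempt, long baseDelayMs, long maxDelayMs, double randomizationFactor) {
            long delay = Math.min(baseDelayMs * (1L << Math.min(attempt, 16)), maxDelayMs);
            double jitter = 1.0 + randomizationFactor * (ThreadLocalRandom.current().nextDouble() * 2 - 1);
            return (long) (delay * jitter);
        }

        public static void main(String[] args) {
            // Example: 1s base delay, 30s cap, randomizationFactor 0.5 (the documented default).
            for (int attempt = 0; attempt < 5; attempt++) {
                System.out.println("attempt " + attempt + ": " + nextDelayMillis(attempt, 1_000, 30_000, 0.5) + " ms");
            }
        }
    }

Increasing the base delay or the randomization factor spreads a reconnect storm over a wider window.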
You may also want to explore your server configuration to see if it is as scalable as possible with moderate numbers of incoming socket requests. While nobody expects a server to handle 2500 simultaneous connections all at once, the server should be able to queue up these connection requests and serve them as it gets time, without immediately failing any incoming connection that can't be handled right away. There is a desirable middle ground: some number of connections held in a queue (usually controllable by server-side TCP configuration parameters), and when the queue gets too large, connections are failed immediately so that socket.io backs off and tries again a little later. Adjusting the above variables tells it to wait longer before retrying.
Also, I'm curious why you are using forceNew. That does not seem like it would help you. Forcing webSockets only (no initial polling) is a good thing.

Websocket onclose/onerror events do not fire if server crashes

I have observed the following behavior in Firefox 4 and Chrome 7:
If the server running the websocket daemon crashes, reboots, loses network connectivity, etc then the 'onclose' or 'onerror' events are not fired on the client-side. I would expect one of those events to be fired when the connection is broken for any reason.
If however the daemon is shutdown cleanly first, then the 'onclose' event is fired (as expected).
Why do the clients perceive the websocket connection as open when the daemon is not shutdown properly?
I want to rely on the expected behavior to inform the user that the server has become unavailable or that the client's internet connection has suffered a disruption.
TCP is like that. The most recent WebSockets standard draft (v76) has a clean shutdown message mechanism. But without that (or if it doesn't have a chance to be sent), you are relying on normal TCP socket cleanup, which may take several minutes (or hours).
I would suggest adding some sort of signal handler/exit trap to the server so that when the server is killed/shutdown, a clean shutdown message is sent to all connected clients.
You could also add a heartbeat mechanism (à la TCP keepalive) to your application to detect when the other side goes away.
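The question doesn't say what the server side runs, so purely as a hypothetical illustration (here with Java/Spring WebSocket), a heartbeat could send protocol-level pings and drop sessions whose send fails; a browser client would do the application-level equivalent, sending a message and timing out if no reply arrives:

    import java.io.IOException;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    import org.springframework.scheduling.annotation.Scheduled;
    import org.springframework.stereotype.Component;
    import org.springframework.web.socket.CloseStatus;
    import org.springframework.web.socket.PingMessage;
    import org.springframework.web.socket.WebSocketSession;

    @Component
    public class WebSocketHeartbeat {

        // Sessions registered by the WebSocket handler on connect/disconnect.
        private final Map<String, WebSocketSession> sessions = new ConcurrentHashMap<>();

        public void register(WebSocketSession session) {
            sessions.put(session.getId(), session);
        }

        public void unregister(WebSocketSession session) {
            sessions.remove(session.getId());
        }

        // Requires @EnableScheduling on a configuration class.
        @Scheduled(fixedRate = 30_000)
        public void ping() {
            sessions.values().forEach(session -> {
                try {
                    session.sendMessage(new PingMessage());
                } catch (IOException e) {
                    // The peer is gone (crash, network loss); clean up our side.
                    sessions.remove(session.getId());
                    try {
                        session.close(CloseStatus.SESSION_NOT_RELIABLE);
                    } catch (IOException ignored) {
                    }
                }
            });
        }
    }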
