What if the server didn't receive FD_CLOSE - Windows

I have a high-performance client/server system programmed from scratch, and I am still improving it. The server uses overlapped I/O to handle connections, and it correctly handles disconnections and resource deallocation. On the client side I used the shutdown command with SD_RECEIVE to notify the server that the client has no more data to receive after its final send. This works well, and the server detects it as a graceful disconnection. Rarely, when the connection is very slow, I have observed that the server doesn't detect this; I suspect the partial closure from shutdown doesn't reach the server. How can I handle this? It is important that the server not keep such connections around, because while they exist the server cannot be stopped, and I do not want to forcibly close all such connections.

On the client side I used the shutdown command with SD_RECEIVE to notify the server that the client has no more data to receive after its final send.
It doesn't do that.
This works well.
It doesn't work at all. The shutdown command with SD_RECEIVE that you're using is completely pointless. A close, or a shutdown with SD_SEND or SD_BOTH, sends a FIN: shutdown with SD_RECEIVE does exactly nothing on the wire, and specifically it does not 'notify the server' of anything.
I feel that the shutdown partial closure doesn't reach the server.
It never reaches the server; your code doesn't work the way you think it does. What reaches the server is the FIN, which is the result of the close (or a shutdown with SD_SEND), not of the shutdown with SD_RECEIVE.
What you need here is a read timeout at the server end. Since you're using overlapped I/O (or select(), or whatever is delivering your events), you will have to implement that timeout yourself.
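A minimal sketch of both points, using Python's socket module (which wraps Winsock on Windows). The asker's server uses overlapped I/O; this blocking version, with an assumed 60-second timeout, only illustrates that SHUT_WR (SD_SEND) is what puts the FIN on the wire and that the server needs its own read timeout as a backstop.

```python
import socket

def client_finish(sock: socket.socket) -> None:
    sock.sendall(b"final request")
    # SHUT_WR corresponds to SD_SEND: it sends the FIN.
    # SHUT_RD (SD_RECEIVE) is purely local and sends nothing on the wire.
    sock.shutdown(socket.SHUT_WR)
    # Drain any remaining replies until the server closes its side.
    while sock.recv(4096):
        pass
    sock.close()

def server_handle(conn: socket.socket) -> None:
    # Backstop read timeout: if neither data nor the FIN arrives within
    # 60 seconds (an assumed value), treat the connection as dead.
    conn.settimeout(60.0)
    try:
        while True:
            data = conn.recv(4096)
            if not data:        # b"" means the peer's FIN arrived
                break           # graceful disconnect
            # ... process data, send replies ...
    except socket.timeout:
        pass                    # no FIN and no data: drop the connection
    finally:
        conn.close()
```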


Ways to wait if server is not available in gRPC from client side

I hope whoever is reading this is doing well.
Here's a scenario I'm wondering about: there's a global ClientConn that is used for all gRPC requests to a server, and then that server goes down. I was wondering if there's a way to wait, with some timeout, for the server to come back up, so that the use of gRPC here is more resilient to failures (either a transient failure or the server going down). My thought was to keep looping while the ClientConn state is Connecting or TransientFailure, and if a timeout occurs while the state is TransientFailure, return an error, since the server might be down.
I was also wondering whether this would work when multiple requests on the client side need this ClientConn, so that multiple goroutines would be running this loop. I would appreciate any other alternatives, suggestions, or advice.
When you call grpc.Dial to connect to a server and receive a grpc.ClientConn, it will automatically handle reconnections for you. When you call a method or request a stream, it will fail if it can't connect to the server or if there is an error processing the request.
You could retry a few times if the error indicates that it is due to the network. You can check the gRPC status codes here: https://github.com/grpc/grpc-go/blob/master/codes/codes.go#L31 and extract them from the returned error using status.FromError: https://pkg.go.dev/google.golang.org/grpc/status#FromError
You also have the grpc.WaitForReady option (https://pkg.go.dev/google.golang.org/grpc#WaitForReady) which can be used to block the grpc call until the server is ready if it is in a transient failure. In that case, you don't need to retry, but you should probably add a timeout that cancels the context to have control over how long you stay blocked.
If you want to avoid even trying to call the server, you could use ClientConn.WaitForStateChange (which is experimental) to detect any state change, and call ClientConn.GetState to determine what state the connection is in, so you know when it is safe to start calling the server again.
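The thread above is about grpc-go; purely as an illustration of the same retry/wait-for-ready pattern, here is a sketch using Python's grpcio, which exposes equivalent wait_for_ready and status-code APIs. The stub, the Echo method, and the attempt/timeout values are hypothetical placeholders.

```python
import grpc

def call_with_retry(stub, request, attempts: int = 3, timeout_s: float = 5.0):
    """Call stub.Echo, waiting for the channel to become ready and
    retrying only on connectivity-type errors."""
    last_err = None
    for _ in range(attempts):
        try:
            # wait_for_ready=True blocks while the channel is in
            # CONNECTING / TRANSIENT_FAILURE instead of failing fast;
            # the per-call timeout bounds how long each attempt may block
            # (the grpc-go equivalents are grpc.WaitForReady and a context
            # deadline).
            return stub.Echo(request, wait_for_ready=True, timeout=timeout_s)
        except grpc.RpcError as err:
            if err.code() not in (grpc.StatusCode.UNAVAILABLE,
                                  grpc.StatusCode.DEADLINE_EXCEEDED):
                raise              # not a connectivity problem, don't retry
            last_err = err         # server may still be coming back up
    raise last_err
```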

Keep process alive during shutdown

I have two desktop apps on the same machine; let's call them Client and Server. When Windows goes into shutdown, I would like the Client to do some short housecleaning with the Server. The Client knows it's closing time because in OnFormClosing the FormClosingEventArgs.CloseReason is CloseReason.WindowsShutDown. But in the meantime the Server may be forcefully killed by the OS. Is it possible to keep the Server alive for as long as possible, so that all the Clients can finish their jobs, but not halt the shutdown entirely?
The Server does not know which Clients are alive and in need of housecleaning.
Neither the Client nor the Server should cause Windows to show the message saying that an app is preventing Windows from shutting down.
I guess I'm asking for some Windows API calls that can negotiate with Windows to kill the process last if possible, but any working solution is welcome. The Client is written in C# and the Server is written in C++.
The Server should be keeping track of the Clients that are connected to it. So, if your apps are busy performing housecleaning, they ARE blocking shutdown, even if just momentarily. So what is wrong with letting Windows show a message to the user saying that?
When the Server gets notified of an imminent shutdown, have it call ShutdownBlockReasonCreate() if there are any Clients connected. Regardless of whether the Clients perform housecleaning or not, when the last Client disconnects then the Server can call ShutdownBlockReasonDestroy().
The obvious solution is to make the server a Windows service.
As a stop-gap solution you can try SetProcessShutdownParameters.
This function sets a shutdown order for a process relative to the other processes in the system.
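The Server in the question is C++, but purely as an illustration of the same Win32 calls mentioned in the two answers above, here is a ctypes sketch in Python (Windows only). The shutdown level 0x100 is the documented low end of the application range, so it asks Windows to shut this process down after other ordinary applications; it is a hint, not a guarantee.

```python
import ctypes

kernel32 = ctypes.windll.kernel32
user32 = ctypes.windll.user32

# Application shutdown levels run 0x100-0x3FF (default 0x280); higher levels
# are shut down earlier, so 0x100 asks Windows to end this process last
# among ordinary applications.
kernel32.SetProcessShutdownParameters(0x100, 0)

def block_shutdown(hwnd, reason: str) -> None:
    # hwnd must be a top-level window owned by this process; while the
    # block reason is in place, Windows shows it on the shutdown screen.
    user32.ShutdownBlockReasonCreate(hwnd, ctypes.c_wchar_p(reason))

def unblock_shutdown(hwnd) -> None:
    # Call this when the last Client has finished its housecleaning.
    user32.ShutdownBlockReasonDestroy(hwnd)
```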

nodeJS being bombarded with reconnections after restart

We have a node instance with about 2500 client socket connections. Everything runs fine, except that occasionally something happens to the service (a restart or a failover event in Azure); when the node instance comes back up and all the socket connections try to reconnect, the service grinds to a halt and the log just shows repeated socket connects/disconnects. Even if we stop the service and start it again, the same thing happens. We currently send out a package to our on-premise servers to kill the users' Chrome sessions, and then everything works fine as users begin logging in again. The clients currently connect with 'forceNew' and force WebSockets only, rather than the default long-polling-then-upgrade. Has anyone seen this, or have any ideas?
In your socket.io client code, you can force the reconnects to be spread out in time more. The two configuration variables that appear to be most relevant here are:
reconnectionDelay
Determines how long socket.io will initially wait before attempting a reconnect (it should back off from there if the server is down for a while). You can increase this to make it less likely that all clients try to reconnect at the same time.
randomizationFactor
This is a number between 0 and 1.0 and defaults to 0.5. It determines how much the above delay is randomly modified, to try to make client reconnects more random and not all at the same time. You can increase this value to increase the randomness of the reconnect timing (see the sketch after these options).
See client doc here for more details.
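This is not socket.io's own implementation, just a small Python simulation of the randomized-backoff idea, to show how reconnectionDelay and randomizationFactor spread 2500 clients' reconnect attempts; the delay values below are assumptions, not taken from any particular deployment.

```python
import random

def reconnect_delay_ms(attempt: int,
                       base_delay_ms: int = 1000,      # reconnectionDelay
                       max_delay_ms: int = 30000,      # reconnectionDelayMax
                       randomization: float = 0.5):    # randomizationFactor
    # Exponential backoff capped at the maximum delay.
    delay = min(base_delay_ms * (2 ** attempt), max_delay_ms)
    # Jitter the delay by +/- randomization: with 0.5, a 1000 ms delay
    # becomes anything from 500 ms to 1500 ms.
    jitter = delay * randomization
    return delay + random.uniform(-jitter, jitter)

# First-attempt delays for 2500 clients: the higher the randomization
# factor, the wider the spread, and the smaller the thundering herd.
for factor in (0.1, 0.5, 0.9):
    delays = [reconnect_delay_ms(0, randomization=factor) for _ in range(2500)]
    print(f"factor={factor}: min={min(delays):.0f} ms, max={max(delays):.0f} ms")
```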
You may also want to review your server configuration to see whether it is as scalable as possible under a moderate number of incoming socket requests. While nobody expects a server to handle 2500 simultaneous connection attempts all at once, the server should be able to queue up these connection requests and serve them as it gets time, without immediately failing any incoming connection that can't be handled right away. There is a desirable middle ground: some number of connections held in a queue (usually controllable by server-side TCP configuration parameters), and when the queue gets too large, connections are failed immediately, at which point socket.io should back off and try again a little later. Adjusting the above variables tells it to wait longer before retrying.
Also, I'm curious why you are using forceNew. That does not seem like it would help you. Forcing webSockets only (no initial polling) is a good thing.

How to drop inactive/disconnected peers in ZMQ

I have a client/server setup in which clients send a single request message to the server and get a bunch of data messages back.
The server is implemented using a ROUTER socket and the clients using a DEALER. The communication is asynchronous.
The clients are typically iPads/iPhones and they connect over wifi so the connection is not 100% reliable.
The issue I'm concerned about is this: the client connects to the server and sends a request for data, but before the response messages are delivered back, the communication goes down (e.g. the client moves out of wifi coverage).
In this case the messages will be queued up on the server side waiting for the client to reconnect. That is fine for a short time but eventually I would like to drop the messages and the connection to release resources.
By checking for activity/timeouts, both the server and the client applications could identify that the connection is gone. The client can shut down its socket and thereby free resources, but how can the same be done on the server?
Per the ZMQ FAQ:
How can I flush all messages that are in the ZeroMQ socket queue?
There is no explicit command for flushing a specific message or all messages from the message queue. You may set ZMQ_LINGER to 0 and close the socket to discard any unsent messages.
Per this mailing list discussion from 2013:
There is no option to drop old messages [from an outgoing message queue].
Your best bet is to implement heartbeating and, when a client stops responding without explicitly disconnecting, restart your ROUTER socket. Messy, I know; this is really something that should have a companion option to HWM. Pieter Hintjens (the creator of ZMQ) is clearly on board, but that discussion was from 2011, and it looks like nothing ever came of it.
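A minimal pyzmq sketch of the "set LINGER to 0 and recreate the ROUTER socket" approach described above. How you decide that a client's heartbeats have stopped is your application's bookkeeping; the endpoint is a placeholder.

```python
import zmq

ENDPOINT = "tcp://*:5555"   # placeholder

def restart_router(ctx: zmq.Context, old_sock: zmq.Socket) -> zmq.Socket:
    # LINGER=0 means close() discards any messages still queued for
    # clients that have gone away, instead of waiting to deliver them.
    old_sock.setsockopt(zmq.LINGER, 0)
    old_sock.close()

    # Recreate and rebind; if the bind races with lingering TCP state,
    # retry after a short pause.
    new_sock = ctx.socket(zmq.ROUTER)
    new_sock.bind(ENDPOINT)
    return new_sock
```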
This is a bit late, but setting TCP keepalive to a reasonable value will cause dead sockets to close after the timeouts have expired.
Heartbeating is necessary for either side to determine the other side is still responding.
The only thing I'm not sure about is how to go about heartbeating many thousands of clients without spending all available CPU just on dealing with the heartbeats.
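A sketch of the TCP keepalive suggestion using pyzmq, so that connections to peers that vanished (out of wifi range, crashed, ...) are eventually torn down by the kernel. The timing values below are examples, not recommendations.

```python
import zmq

ctx = zmq.Context.instance()
router = ctx.socket(zmq.ROUTER)

# Keepalive options must be set before bind/connect.
router.setsockopt(zmq.TCP_KEEPALIVE, 1)         # turn keepalive on
router.setsockopt(zmq.TCP_KEEPALIVE_IDLE, 60)   # probe after 60 s of silence
router.setsockopt(zmq.TCP_KEEPALIVE_INTVL, 10)  # re-probe every 10 s
router.setsockopt(zmq.TCP_KEEPALIVE_CNT, 3)     # give up after 3 failed probes

router.bind("tcp://*:5555")                     # placeholder endpoint
```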

Websockets and uwsgi - detect broken connections client side?

I'm using uwsgi's websockets support and so far it's looking great: the server detects when the client disconnects, and the client likewise detects when the server goes down. But I'm concerned this will not work in every case/browser.
In other frameworks, namely SockJS, the connection is monitored by sending regular messages that work as heartbeats/pings. But uwsgi sends PING/PONG frames (i.e. control frames, not regular messages) according to the websockets spec, so from the client side I have no way to know when the last ping was received from the server. So my question is this:
If the connection is dropped or blocked by some proxy, will browsers (i.e. Chrome, IE, Firefox, Opera) reliably detect that no PING was received from the server and signal the connection as down, or should I implement some additional ping/pong system so that the connection is detected as closed on the client side?
Thanks
You are totally right. There is no way from the client side to track or send ping/pongs. So if the connection drops, the server is able to detect this condition through the ping/pong, but the client is left hanging... until it tries to send something and the underlying TCP mechanism detects that the other side is not ACKnowledging its packets.
Therefore, if the client application expects to be "listening" most of the time, it may be convenient to implement a keep-alive system that works "both ways", as Stephen Cleary explains in the link you posted. But this keep-alive system would be part of your application layer, rather than part of the transport layer like the ping/pongs.
For example, you can have a message "{token:'whatever'}" that the server and client simply echo back with a 5-second delay. The client should have a timer with a 10-second timeout that is stopped every time that message is received and restarted every time the message is echoed; if the timer fires, the connection can be considered dropped.
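The real client in the question is a browser, so this is only an illustration of that echo-token/timer idea, sketched with Python's asyncio and the third-party "websockets" package; the URL, the token, and the 5 s / 10 s values are the assumed example parameters from the paragraph above. A real client would also multiplex its normal messages over the same socket rather than treating every received frame as the echo.

```python
import asyncio
import json
import websockets

URL = "ws://example.com/ws"   # placeholder

async def client_with_keepalive() -> None:
    async with websockets.connect(URL) as ws:
        while True:
            await ws.send(json.dumps({"token": "whatever"}))
            try:
                # If the echo does not come back within 10 s, treat the
                # connection as dead even though TCP has not noticed yet.
                await asyncio.wait_for(ws.recv(), timeout=10)
            except asyncio.TimeoutError:
                print("no echo within 10 s: connection considered dropped")
                return
            await asyncio.sleep(5)   # echo cadence

# asyncio.run(client_with_keepalive())
```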
Although browsers that implement the same RFC as uWSGI should reliably detect when the server closes the connection cleanly, they won't detect when the connection is interrupted midway (half-open connections). So from what I understand, we should employ an extra mechanism like application-level pings.
