Windows temporarily shuts down my TCP stack when I stress test my HTTP server

I'm building an HTTP server for Windows that uses I/O completion ports (IOCP). I have a stress-test app that hits the server continuously with HTTP requests. After a couple of seconds (a varying, unpredictable interval), my machine is unable to open any new TCP connections. I know this because my browser can't open any new connections either, and the server just sits waiting for an AcceptEx call to complete. If I back off the stress process, everything comes back to life again after a few seconds. I don't think it's a backlog issue, because the stresser is synchronous: it waits for a result before issuing the next request. The stresser does run a couple of threads (call it N) in parallel, but that can't cause a backlog of more than N (small HTTP) requests.
I'm on Windows 7 Pro; I'll test on a Windows Server OS on Monday. What is causing this behaviour?

Are you running out of ephemeral TCP ports because a very large number of them are stuck in the TIME_WAIT state?

Google Windows ephemeral ports.
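The stresser itself isn't shown in the question, so purely as an illustration of the failure mode this answer describes: if every request opens and closes its own socket, each iteration consumes a client-side ephemeral port that then lingers in TIME_WAIT for a couple of minutes, and a tight localhost loop can chew through the default dynamic port range (roughly 16k ports on Windows 7) very quickly. Below is a minimal Go sketch (hypothetical URL and loop counts) that avoids this by reusing keep-alive connections; it is not the asker's stresser.

```go
// Sketch only (not the asker's stresser): a synchronous HTTP stress loop in Go.
// With keep-alives enabled each worker reuses its connection, so client-side
// ephemeral ports are not consumed per request; flipping DisableKeepAlives on
// reproduces the one-port-per-request pattern that piles up TIME_WAIT sockets.
package main

import (
	"fmt"
	"io"
	"net/http"
	"sync"
	"time"
)

const (
	url        = "http://localhost:8080/" // hypothetical server address
	numWorkers = 4                        // "N" parallel synchronous workers, as in the question
)

func worker(client *http.Client, wg *sync.WaitGroup) {
	defer wg.Done()
	for i := 0; i < 10000; i++ {
		resp, err := client.Get(url)
		if err != nil {
			// This is where the stall shows up once no new connection can be opened.
			fmt.Println("request failed:", err)
			time.Sleep(100 * time.Millisecond)
			continue
		}
		io.Copy(io.Discard, resp.Body) // drain the body so the connection can be reused
		resp.Body.Close()
	}
}

func main() {
	client := &http.Client{Transport: &http.Transport{
		MaxIdleConnsPerHost: numWorkers,
		// DisableKeepAlives: true, // uncomment to burn one ephemeral port per request
	}}
	var wg sync.WaitGroup
	for n := 0; n < numWorkers; n++ {
		wg.Add(1)
		go worker(client, &wg)
	}
	wg.Wait()
}
```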

Related

TCP connection limit/timeout in virtual machine and native macOS/ARM-based Mac gRPC Go client?

I am currently working on a gRPC microservice, which is deployed to a Kubernetes cluster. As usual, I am benchmarking and load-/stress-testing my service, testing different load-balancing settings, the impact of SSL, and so forth.
Initially, I used my MacBook and my gRPC client written in Go, and executed this setup either in Docker or directly in containerd with nerdctl. The framework I use for this is called Colima, which basically builds on a lean Alpine VM to provide the container engine. Here I ran into connection timeouts and refusals once I crossed a certain number of parallel sessions, which I suspect is caused by the container engine.
Therefore, I went ahead and ran my Go client natively on macOS. This setup somehow runs into the default 20s keepalive timeout for gRPC (https://grpc.github.io/grpc/cpp/md_doc_keepalive.html) the moment the number of parallel connections exceeds what my service can process by some margin (#1).
When I run the very same Go client on an x86 Ubuntu 22 desktop, there are no such issues whatsoever, and I can start far more sessions in parallel, which are then processed without ever hitting the 20s keepalive timeout.
Any ideas why this happens, and whether I could change my setup so that I can run my stress-test benchmarks from macOS?
#1: Let's say my service can process and answer 1 request per second. For stress testing, I now start 20 parallel sessions and would expect them to be processed sequentially.
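There is no answer in the thread for this one, but for context: the 20 s figure in the linked doc is gRPC's default keepalive timeout, and in the Go client the corresponding knobs can be set per connection. A minimal, hedged sketch using grpc-go follows; the target address and the specific values are made up for illustration, not recommendations.

```go
// Sketch only: loosening client-side keepalive in grpc-go so slow replies under
// heavy load are not mistaken for a dead connection. Values are illustrative.
package main

import (
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/keepalive"
)

func main() {
	conn, err := grpc.Dial(
		"my-service.example:50051", // hypothetical address
		// Swap in real TLS credentials for the SSL runs.
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithKeepaliveParams(keepalive.ClientParameters{
			Time:                60 * time.Second, // send a ping only after 60s of inactivity
			Timeout:             30 * time.Second, // wait 30s for the ping ack before closing
			PermitWithoutStream: false,            // don't ping when there are no active RPCs
		}),
	)
	if err != nil {
		log.Fatalf("dial failed: %v", err)
	}
	defer conn.Close()
	// ... create service stubs from conn and run the benchmark as usual.
}
```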

nodeJS being bombarded with reconnections after restart

We have a Node instance that has about 2500 client socket connections. Everything runs fine, except that occasionally something happens to the service (a restart or failover event in Azure); when the Node instance comes back up and all the socket connections try to reconnect, the service comes to a halt and the log just shows repeated socket connects/disconnects. Even if we stop the service and start it again, the same thing happens. Currently we send out a package to our on-premises servers to kill the users' Chrome sessions, and then everything works fine as users begin logging in again. We have the clients connecting with 'forceNew' and forcing WebSockets only, rather than the default long-polling-then-upgrade. Has anyone ever seen this or have any ideas?
In your socket.io client code, you can force the reconnects to be spread out in time more. The two configuration variables that appear to be most relevant here are:
reconnectionDelay
Determines how long socket.io will initially wait before attempting a reconnect (it should back off from there if the server is down for a while). You can increase this to make it less likely that all the clients are trying to reconnect at the same time.
randomizationFactor
This is a number between 0 and 1.0 and defaults to 0.5. It determines how much the above delay is randomly modified to try to make client reconnects be more random and not all at the same time. You can increase this value to increase the randomness of the reconnect timing.
See the socket.io client documentation for more details.
You may also want to explore your server configuration to see if it is as scalable as possible with moderate numbers of incoming socket requests. While nobody expects a server to be able to handle 2500 simultaneous connections all at once, the server should be able to queue up these connection requests and serve them as it gets time, without immediately failing any incoming connection that can't be handled right away. There is a desirable middle ground: some number of connections held in a queue (usually controllable by server-side TCP configuration parameters such as the listen backlog), and when the queue gets too large, connections are failed immediately and socket.io backs off and tries again a little later. Adjusting the above variables tells it to wait longer before retrying.
Also, I'm curious why you are using forceNew. That does not seem like it would help you. Forcing WebSockets only (no initial polling) is a good thing.
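These two options just parameterize a standard exponential-backoff-with-jitter schedule. As a rough illustration of the arithmetic only, here is a Go sketch, not socket.io's actual implementation; the cap corresponds to the separate reconnectionDelayMax option.

```go
// Conceptual sketch (Go, not socket.io code): how an initial delay, an
// exponential backoff cap, and a 0..1 randomization factor spread out reconnect
// attempts so 2500 clients don't all hammer the server at the same instant.
package main

import (
	"fmt"
	"math"
	"math/rand"
	"time"
)

// reconnectDelay returns the wait before reconnect attempt n (0-based).
func reconnectDelay(attempt int, initial, maxDelay time.Duration, randomization float64) time.Duration {
	d := float64(initial) * math.Pow(2, float64(attempt)) // exponential backoff
	d = math.Min(d, float64(maxDelay))                    // capped at the maximum delay
	jitter := (rand.Float64()*2 - 1) * randomization * d  // +/- up to randomization*delay
	return time.Duration(d + jitter)
}

func main() {
	// Roughly mirrors reconnectionDelay=5s, a 60s cap, and randomizationFactor=0.5.
	for attempt := 0; attempt < 6; attempt++ {
		fmt.Printf("attempt %d: wait ~%v\n", attempt,
			reconnectDelay(attempt, 5*time.Second, 60*time.Second, 0.5))
	}
}
```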

TCP Retransmit and TCPCopy when using loopback device on Windows 7

I have two programs running on the same Windows 7 system which connect via TCP. The server transmits unencoded VGA-resolution images to the client at regular intervals.
The problem is that, from time to time, the transmission speed goes down by a factor of ~10 or so and stays that way for some time, or until the client process is restarted.
I used the Sysinternals Process Monitor to get some insight into what is going on.
When the transmission speed is reduced I can see that following an initial TCP Send event on the server side, I eventually (after a couple of receive/send pairs) get a number of TCPCopy events on the client side followed by a ~300ms pause in which no TCP events are recorded, followed by a TCP Retransmit event on the server side. I only get those TCPCopy events and the retransmit event when the speed is reduced.
I tried to find out what the TCPCopy event is all about but did not find a lot on the internet.
I have two questions:
What is the TCPCopy event?
What does the TCPCopy event and the Retransmit event tell me about the problems in the TCP connection?
TCPCopy events are sometimes caused by antivirus software. In many of the reports I saw on the web, people who deactivated their antivirus software found that this fixed the issue, especially with ESET NOD32. Please try deactivating your antivirus software on both the server and the client side and check again.

TCP connection time in windows

I am doing some performance testing with a large number of threads. Each thread is sending HTTP requests to another IP. It looks like at some stage the connections are closed (because there are too many threads) and then, of course, have to be reopened.
I am looking to get some ballpark figures for how long it takes Windows to open TCP connections.
Is there any way I can get this?
Thanks.
This is highly dependent on the endpoints you're trying to connect to, is it not?
As an extreme best case, you can test it yourself by targeting an IIS on localhost.
I wouldn't be surprised if routers and servers that you are connecting through drop connections as a measure against what could be perceived as connection storms or even denial-of-service attacks.
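One concrete way to get those ballpark figures is simply to time the connect call yourself against the endpoint you care about (for example, the localhost IIS suggested above). A minimal Go sketch with a hypothetical address:

```go
// Minimal sketch: measure how long TCP connection establishment takes by
// timing the dial. The target address is hypothetical (e.g. a local IIS).
package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	const addr = "127.0.0.1:80" // hypothetical local endpoint
	for i := 0; i < 10; i++ {
		start := time.Now()
		conn, err := net.DialTimeout("tcp", addr, 5*time.Second)
		elapsed := time.Since(start)
		if err != nil {
			fmt.Printf("connect %d failed after %v: %v\n", i, elapsed, err)
			continue
		}
		fmt.Printf("connect %d took %v\n", i, elapsed)
		conn.Close()
	}
}
```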

windows server 2003 - unable to create socket - exception

I have the following system:
A Windows 2003 server running WebSphere Application Server, listening on port 8080.
A lot of clients of this server.
I tried a load test, making clients connect to the server and ask for services. This didn't end well: many clients were denied service, and the server started reporting that it was unable to create new sockets.
My question is which parameters should I change in my Windows?
I thought about a maximum-number-of-connections setting, but I am not sure this exists on 2003 (from what I have read). Instead, there is a setting for the number of user ports (MaxUserPort), which I don't think is what I need, since I am only using one port (8080) on the server side.
Am I wrong assuming that I am only using one port in the server side?
Are there parameters for the number of connections per port or per system? Or is this perhaps affected by the amount of data transferred? I pass a lot of data, so if there is a parameter that limits based on the amount of data, I would be glad to hear about it.
Should I also reduce the time each connection waits after teardown? That might make the pool of connections available again sooner. If so, which parameter is this?
Any other parameters that are consistent with this problem?
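No answer is given in the thread, but for orientation: the two settings the question describes in words ("the number of user ports" and "the wait after teardown") are, on Windows Server 2003, the MaxUserPort and TcpTimedWaitDelay values under HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters. A hedged Go sketch that merely reads their current values, assuming the golang.org/x/sys/windows/registry package; this is an illustration of where the knobs live, not an answer from the thread.

```go
// Sketch only: read the two TCP/IP registry values the question alludes to
// (MaxUserPort = top of the ephemeral port range, TcpTimedWaitDelay = seconds a
// closed connection stays in TIME_WAIT). If a value is absent, the Windows
// default applies. Requires golang.org/x/sys/windows/registry; Windows only.
package main

import (
	"fmt"

	"golang.org/x/sys/windows/registry"
)

func readValue(k registry.Key, name string) {
	v, _, err := k.GetIntegerValue(name)
	if err != nil {
		fmt.Printf("%s: not set (Windows default applies): %v\n", name, err)
		return
	}
	fmt.Printf("%s = %d\n", name, v)
}

func main() {
	k, err := registry.OpenKey(registry.LOCAL_MACHINE,
		`SYSTEM\CurrentControlSet\Services\Tcpip\Parameters`, registry.QUERY_VALUE)
	if err != nil {
		fmt.Println("cannot open Tcpip\\Parameters key:", err)
		return
	}
	defer k.Close()

	readValue(k, "MaxUserPort")
	readValue(k, "TcpTimedWaitDelay")
}
```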
