Maximum WebSocket connections per client

I am developing a tool to load test my SignalR application, using the SignalR client hub APIs.
I want to test my application's behavior over WebSockets with 10K clients.
The issue is that my tool is not able to create new connections after a certain number of them; the maximum it can create is only in the range of 2,000-3,000.
I am not able to figure out the reason behind this. I checked the server side and found no issues; the server is still able to respond to browser clients afterwards.
The error logged by the tool is:
System.Net.WebSockets.WebSocketException (0x80004005): Unable to connect to the remote server ---> System.Net.WebException: The remote server returned an error: (503) Server Unavailable.
at System.Net.HttpWebRequest.EndGetResponse(IAsyncResult asyncResult)
at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Net.WebSockets.ClientWebSocket.<ConnectAsyncCore>d__0.MoveNext()
at System.Net.WebSockets.ClientWebSocket.<ConnectAsyncCore>d__0.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.AspNet.SignalR.Client.Transports.WebSocketTransport.<PerformConnect>d__2.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.AspNet.SignalR.Client.Transports.WebSocketTransport.<DoReconnect>d__9.MoveNext()
Sometimes the tool hangs when the WebSocket client count reaches around 1,700, sometimes more than that.
The tool is a .NET 4.5 Windows application. The SignalR server I am hitting is on the same network.
I am executing certain methods in my SignalR service: one method passes input data from the client to the server, and another is one the clients subscribe to in order to receive data from the server.
I also tested the tool from a different machine, from which it is able to generate distributed load, but I don't want to run multiple instances of the tool. My question is: why is a single instance not able to generate bulk WebSocket requests to my SignalR service?

Explain a little bit more about what you do in the test, and also where the client and server are running.
If you are sending messages non-stop, you may already be saturating the channel. What is the throughput of the NIC when the connections start to fail? What is the CPU usage? Maybe your client or your server cannot keep up with that much data.
Have you tried running two clients in parallel? Maybe the problem is your client, which is unable to handle that number of connections appropriately.
UPDATE
You should use some form of threading to create multiple parallel connections and keep them alive during the test. Please take a look at the WebSocket echo load generator that I use for performance tests, and follow the same approach to create a SignalR load generator.
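The linked generator is not reproduced here, but the pattern can be sketched briefly. Below is a minimal illustration in Java using the JDK's built-in java.net.http WebSocket client; the endpoint URL, connection count, and hold time are placeholders, and a plain WebSocket echo endpoint is assumed (the SignalR handshake itself is not performed). It starts all handshakes asynchronously, then holds references to the open sockets so they stay alive for the duration of the test:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.WebSocket;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class WebSocketLoadSketch {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        URI uri = URI.create("ws://example.test/echo"); // placeholder endpoint
        int target = 1_000;                             // placeholder connection count

        // Start every handshake without blocking the loop, so the
        // connections are established in parallel.
        List<CompletableFuture<WebSocket>> pending = new ArrayList<>();
        for (int i = 0; i < target; i++) {
            pending.add(client.newWebSocketBuilder()
                    .buildAsync(uri, new WebSocket.Listener() {}));
        }

        // Keep strong references so the sockets stay open during the test.
        List<WebSocket> open = new ArrayList<>();
        for (CompletableFuture<WebSocket> handshake : pending) {
            open.add(handshake.join());
        }
        System.out.println("Open connections: " + open.size());

        Thread.sleep(60_000); // placeholder hold time for the test window
    }
}

A real load generator would also throttle the connect rate and send periodic traffic on each socket. Note that opening thousands of sockets from one process eventually runs into OS limits (ephemeral ports, file descriptors), which is one plausible reason a single tool instance tops out at a few thousand connections.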

Related

Ways to wait if server is not available in gRPC from client side

I hope whoever is reading this is doing well.
Here's a scenario that I'm wondering about: there's a global ClientConn that is being used for all gRPC requests to a server. Then that server goes down. I was wondering if there's a way to wait, with some timeout, for the server to come back up, so that the use of gRPC in this scenario is more resilient to failures (either a transient failure or the server going down). I was thinking of looping while the ClientConn state is CONNECTING or TRANSIENT_FAILURE, and, if a timeout occurs while the state is TRANSIENT_FAILURE, returning an error, since the server might be down.
I was also wondering whether this would work when multiple requests come in on the client side that need this ClientConn, so that multiple goroutines would be running this loop. I would appreciate any other alternatives, suggestions, or advice.
When you call grpc.Dial to connect to a server and receive a grpc.ClientConn, it will automatically handle reconnections for you. When you call a method or request a stream, the call will fail if it can't connect to the server or if there is an error processing the request.
You could retry a few times if the error indicates that it is due to the network. You can check the gRPC status codes here: https://github.com/grpc/grpc-go/blob/master/codes/codes.go#L31 and extract them from the returned error using status.FromError: https://pkg.go.dev/google.golang.org/grpc/status#FromError
You also have the grpc.WaitForReady option (https://pkg.go.dev/google.golang.org/grpc#WaitForReady), which can be used to block the gRPC call until the server is ready if the connection is in a transient failure. In that case you don't need to retry, but you should probably add a timeout that cancels the context, to control how long you stay blocked.
If you want to avoid even trying to call the server, you could use ClientConn.WaitForStateChange (which is experimental) to detect any state change and call ClientConn.GetState to determine what state the connection is in, so you know when it is safe to start calling the server again.
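The options above are grpc-go APIs, but the same pattern exists in other gRPC implementations. As a hedged sketch of the wait-for-ready-plus-deadline combination, here it is in grpc-java, whose stubs expose equivalent options (GreeterGrpc and HelloRequest stand in for hypothetical generated classes; the address is a placeholder):

import java.util.concurrent.TimeUnit;

import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import io.grpc.Status;
import io.grpc.StatusRuntimeException;

public class WaitForReadyExample {
    public static void main(String[] args) {
        ManagedChannel channel = ManagedChannelBuilder
                .forAddress("localhost", 50051) // placeholder server address
                .usePlaintext()
                .build();

        // GreeterGrpc and HelloRequest are hypothetical generated classes.
        GreeterGrpc.GreeterBlockingStub stub = GreeterGrpc.newBlockingStub(channel)
                .withWaitForReady()                       // queue the RPC instead of failing fast
                .withDeadlineAfter(10, TimeUnit.SECONDS); // bound how long the call may block

        try {
            stub.sayHello(HelloRequest.newBuilder().setName("world").build());
        } catch (StatusRuntimeException e) {
            // With wait-for-ready, DEADLINE_EXCEEDED means the channel never
            // became ready in time, e.g. because the server is still down.
            Status status = Status.fromThrowable(e);
            System.err.println("RPC failed: " + status.getCode());
        } finally {
            channel.shutdownNow();
        }
    }
}

The deadline here plays the same role as the cancelable context timeout in the Go version.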

Dealing with OkHttp HTTP/2 REFUSED_STREAM errors

We are using OkHttp3 (v4.9.1) to establish h2c (HTTP/2 without TLS) connections in a highly concurrent fashion from a Spring Boot application. To do so, we have narrowed down the supported protocols using:
builder.protocols(List.of(Protocol.H2_PRIOR_KNOWLEDGE))
Establishing connections usually works fine, and HTTP/2 streams are used instead of dedicated connections. However, we observe sporadic error bursts when the server (based on nginx, single node, single address) closes the connection after a certain number of requests has been reached (as instructed by its keepalive_requests option). When this happens, OkHttp does not seem to attempt to retry the connection, but instead just throws an exception to the caller:
okhttp3.internal.http2.StreamResetException: stream was reset: REFUSED_STREAM
at okhttp3.internal.http2.Http2Stream.takeHeaders(Http2Stream.kt:148)
at okhttp3.internal.http2.Http2ExchangeCodec.readResponseHeaders(Http2ExchangeCodec.kt:96)
at okhttp3.internal.connection.Exchange.readResponseHeaders(Exchange.kt:106)
at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.kt:79)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.kt:34)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.kt:95)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.kt:83)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.kt:76)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
at org.example.OkHttpAutoConfiguration.lambda$authenticate$3(OkHttpAutoConfiguration.java:95)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
at okhttp3.internal.connection.RealCall.getResponseWithInterceptorChain$okhttp(RealCall.kt:201)
at okhttp3.internal.connection.RealCall.execute(RealCall.kt:154)
[...]
Requests are initiated like this:
httpClient.newCall(buildRequest(uri)).execute()
What is the recommended way to deal with these errors?
Is there an option (which we may have missed) that makes OkHttp handle this transparently for the application?
With this issue you've made the case for us to fix it in OkHttp. https://github.com/square/okhttp/issues/6700
In the interim, you'll want an interceptor that uses a try/catch block and retries in the catch clause if the exception matches these criteria.
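A minimal sketch of such an interceptor follows (Java; it assumes OkHttp 4.x, where StreamResetException lives in an internal package and exposes the errorCode of the reset, and the retry cap is arbitrary). Since REFUSED_STREAM means the server did not process the stream, retrying is safe even for non-idempotent requests:

import java.io.IOException;

import okhttp3.Interceptor;
import okhttp3.Request;
import okhttp3.Response;
import okhttp3.internal.http2.ErrorCode;
import okhttp3.internal.http2.StreamResetException;

/** Retries calls that failed only because the server refused the HTTP/2 stream. */
public final class RefusedStreamRetryInterceptor implements Interceptor {
    private static final int MAX_ATTEMPTS = 3; // arbitrary cap, tune as needed

    @Override
    public Response intercept(Chain chain) throws IOException {
        Request request = chain.request();
        StreamResetException last = null;
        for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            try {
                return chain.proceed(request);
            } catch (StreamResetException e) {
                // REFUSED_STREAM means the server did not process the stream,
                // so the request is safe to retry on a fresh connection.
                if (e.errorCode != ErrorCode.REFUSED_STREAM) {
                    throw e;
                }
                last = e;
            }
        }
        throw last;
    }
}

Register it as an application interceptor with new OkHttpClient.Builder().addInterceptor(new RefusedStreamRetryInterceptor()).build(); application interceptors are permitted to call chain.proceed() more than once, which is what makes this retry loop legal.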

Internal Server Error (500) while running load test in JMeter

I am unable to get a response from the server, so how will I correlate dynamic values?
As per HTTP Status Code 500 description:
The HyperText Transfer Protocol (HTTP) 500 Internal Server Error server error response code indicates that the server encountered an unexpected condition that prevented it from fulfilling the request.
This error response is a generic "catch-all" response. Usually, this indicates the server cannot find a better 5xx error code with which to respond. Sometimes, server administrators log error responses like the 500 status code with more details about the request, to prevent the error from happening again in the future.
If your test works fine with 1-2 users and you're seeing this HTTP 500 error only when your application is under load, most probably your application gets overloaded and hence fails to provide a valid response.
You can already report it as a bug, or if you want you can investigate a little bit further, to wit:
Use the Active Threads Over Time and Response Codes per Second listeners to see when these errors start occurring (e.g. the application works fine up to 200 concurrent users and starts throwing HTTP 500 errors at 201)
Inspect your application logs
Make sure that the application has enough headroom to operate in terms of CPU, RAM, network, disk, etc.; you can use the JMeter PerfMon Plugin for this
Inspect your application middleware configuration and logs (load balancer, application server, database, etc.)
Consider collecting APM and/or profiler output during the load test execution; this way you will be able to precisely identify the root cause of the problem

Unable to connect to Elasticsearch intermittently

I am trying to connect to Elasticsearch via the Jest client.
Sometimes, the client is not able to connect to the Elasticsearch cluster.
Stack trace:
org.apache.http.NoHttpResponseException: search-xxx-yyy.ap-southeast-1.es.amazonaws.com:443 failed to respond
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:143)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
The Elasticsearch cluster is on a public domain, so I don't understand why the client is unable to connect.
Also, the issue happens intermittently; if I retry the request, it sometimes connects.
Any help is appreciated. Thanks.
When JestClient initiates the HTTP request, it calls read() on the socket and blocks. When this read returns -1, it means that the server closed the connection before or while the client was waiting for the response.
Why it happens
There are two main causes of NoHttpResponseException:
1. The server end of the connection was closed before the client attempted to send a request down it.
2. The server end of the connection was closed in the middle of a request.
Stale Connection (connection closed before request)
Most often this is a stale connection. When using persistent connections, a connection may sit around in the connection pool, not being used for a while. If it is idle for longer than the server's or load balancer's HTTP keep-alive timeout, then the server or load balancer will close the connection due to its idleness. The Jakarta client isn't structured to receive a notification of this happening (it doesn't use NIO), so the connection sits around in a half-closed state. The only way the client can detect this state is by reading from the socket. So when you send a request, the write succeeds because the socket is only half closed (writes succeed until you close your end), but then the read indicates that the socket was closed. This causes the request to fail.
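To make the half-closed state concrete, here is a small illustration (Java, with a placeholder host, port, and idle time) of what the client observes: the write to the half-closed socket appears to succeed, and only the subsequent read reveals that the peer is gone:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;

// Demonstrates how a stale (half-closed) connection looks to a client:
// the write appears to succeed, and only the read reveals the closure.
public class StaleConnectionDemo {
    public static void main(String[] args) throws IOException, InterruptedException {
        try (Socket socket = new Socket("localhost", 9090)) { // placeholder server
            Thread.sleep(70_000); // idle past the server's keep-alive timeout

            OutputStream out = socket.getOutputStream();
            out.write("GET / HTTP/1.1\r\nHost: localhost\r\n\r\n".getBytes());
            out.flush(); // succeeds: our end of the socket is still open

            InputStream in = socket.getInputStream();
            int b = in.read(); // returns -1: the server closed its end while idle
            System.out.println(b == -1 ? "connection was stale" : "got a response");
        }
    }
}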
Connection Closed Mid-Request
The other reason this might occur is that the connection was actually closed while the service was working on it. Anything between your client and the service may close the connection, including load balancers, proxies, or the HTTP endpoint fronting your service. If your activities are quite long-running or you're transferring a lot of data, the window for something to go wrong is larger and the connection is more likely to be lost in the middle of the request. An example of this happening is a Java server process exiting after an OutOfMemoryError occurs while trying to return a large amount of data. You can verify whether this is the problem by looking at TCP dumps to see whether the connection is closed while the request is in flight. Also, failures of this type usually occur some time after sending the request, whereas stale-connection failures always occur immediately when the request is made.
Diagnosing The Cause
NoHttpResponseException is usually a stale connection (judging by the problems I've observed and helped people with)
When the failure always occurs immediately after submitting the request, a stale connection is almost certainly the problem
When failures occur some non-trivial amount of time into waiting for the response, then the connection wasn't stale when the request was made and the connection is being closed in the middle of the request
TCP dumps can be more conclusive: you can see when the connection is being closed (before or during the request)
What can be done about it
Use a better client
Nonblocking HTTP clients exist that allow the caller to know when a connection is closed without having to try to read from the connection.
Retry failed requests
If your call is safe to retry (e.g. it's idempotent), this is a good option. It also covers all sorts of transient failures besides stale-connection failures. NoHttpResponseException isn't necessarily a stale connection, and it's possible that the service received the request, so you should take care to retry only when it is safe to do so.
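For example, a small wrapper along these lines (names hypothetical) retries only on NoHttpResponseException and lets every other failure propagate:

import java.io.IOException;

import org.apache.http.NoHttpResponseException;

/** Minimal retry helper for idempotent calls that may hit stale pooled connections. */
public final class StaleConnectionRetry {

    @FunctionalInterface
    public interface IoCall<T> {
        T run() throws IOException;
    }

    /** Runs the call, retrying up to maxAttempts times (must be >= 1). */
    public static <T> T withRetries(IoCall<T> call, int maxAttempts) throws IOException {
        NoHttpResponseException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.run();
            } catch (NoHttpResponseException e) {
                // Likely a stale pooled connection; a retry will check out
                // (or open) a fresh connection from the pool.
                last = e;
            }
        }
        throw last;
    }
}

With Jest this would wrap the execute call, e.g. StaleConnectionRetry.withRetries(() -> jestClient.execute(searchAction), 3). Alternatively, the underlying Apache HttpClient can be built with a request retry handler to much the same effect.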

How can I make Sinatra simulate a refused connection?

I've been using Sinatra with Rack for simulating external services when running integration tests, and would like to write a test for the case when the server is down. Is it possible to have Sinatra simulate a 'Connection Refused' error without entirely shutting down the server process?
So far I've tried:
Raising an exception
Immediately closing the stream, as illustrated here, before the method returns:
post '/external_app' do
  stream(:keep_open) do |out|
    out.close
  end
end
Thanks!
You are trying to test the server being down, but the approach you've taken still relies on a response from the Sinatra server.
You can set a very short connection timeout in your HTTP client (any HTTP client should be able to do that),
and then put something like a sleep in your Sinatra action block that lasts longer than the timeout you set.
But you may not actually need to make things that complicated: you can simply raise the exception that your HTTP client throws for a connection timeout (or any other connection exception), and test whether your application is able to catch and process the exception(s) accordingly.
