I am testing my application using the JMeter tool. I have 2 EC2 m1.small instances behind an ELB (not an Auto Scaling group), 2 caching nodes, and a huge RDS DB (Multi-AZ). My Apache (prefork) is configured with default values like 256 MaxClients, so the two servers can handle 256 requests each. Now when JMeter throws 500 requests at them, I see a connection timed out error in one of the JMeter samplers. Can anyone figure out what the problem is?
Thanks in advance.
What does JMeter throw back?
There are a couple of things that could be happening.
Connection reset: If this error is thrown from your JMeter, it means that the server has maxed out and cannot support any more concurrent requests, i.e. the 256 threads allotted are all busy serving other requests. This basically means you have hit your limit on the server.
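For context, the prefork cap being discussed lives in Apache's httpd.conf and looks roughly like this (Apache 2.2-era directive names; 256 matches the defaults mentioned in the question):

    <IfModule mpm_prefork_module>
        StartServers          5
        MinSpareServers       5
        MaxSpareServers      10
        ServerLimit         256
        MaxClients          256
        MaxRequestsPerChild   0
    </IfModule>

Raising MaxClients (together with ServerLimit) is the usual first step once you have confirmed the boxes have RAM to spare.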
"Address already in use" exception: These kinds of errors are what you must not get when you do load tests. This basically means that there are no available ports on your system make more requests and all ports are busy. This could happen for a variety of reasons but you could try tweaking system settings like ulimit for linux or if your using a windows box you may want to look at the tcpTimedWaitDelay and corresponding settings to see the average turn around time for the ports to be handed over back into the active pool to be reused for the next connection. This condition is called tcp port exhaustion (http://www.outsystems.com/NetworkForums/ViewTopic.aspx?TopicId=6956&Topic=How-to-tune-the-TCP%2FIP-stack-for-high-volume-of-web-requests)
TO get around this you could also try distributed load testing and/or use timers to ensure that you always have the ports to make new connections.
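As a hedged illustration (the parameter names are standard Linux sysctls, but the right values depend on your kernel and workload), client-side tuning against port exhaustion usually goes into /etc/sysctl.conf:

    # widen the ephemeral port range available for outbound connections
    net.ipv4.ip_local_port_range = 1024 65535
    # allow sockets stuck in TIME_WAIT to be reused for new outbound connections
    net.ipv4.tcp_tw_reuse = 1
    # shorten how long orphaned sockets linger after close
    net.ipv4.tcp_fin_timeout = 30

Apply with sysctl -p and re-run the test to see whether the exception disappears.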
Related
In k6, I'm observing more failed requests in my performance test execution, with "dial tcp: i/o timeout". Please suggest any fine-tuning I may have missed in k6.
With lower concurrency, say 225 users, there are no issues, but when I increase to 300 users I face this issue. I'm using a MacBook for the test execution.
This error indicates that your server under test isn't able to keep up with TCP connection attempts from k6, which is usually a hint that you're reaching the performance limits of what your server is able to deliver.
At this point you would have to tweak your server settings or improve app performance to reach the levels you're aiming for. One sanity check you can do on the server side is to confirm that the maximum number of file descriptors (which includes network sockets) is sufficient for your test. See ulimit.
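For example, a quick sanity check on a typical Linux server (the target value below is only an illustration):

    # current per-process limit on open file descriptors
    ulimit -n
    # raise it for this shell before starting the server under test (example value)
    ulimit -n 65535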
I am building a TCP proxy: client <-> proxy <-> Vertica
I have a net.TCPListener which accepts incoming requests via AcceptTCP() and creates connections, then makes a connection to the destination socket with net.DialTCP("tcp", nil, raddr). It works like a bridge: the default proxy model.
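The shape of it is roughly this (a sketch, not my exact code; the addresses are placeholders and error handling is trimmed):

    package main

    import (
        "io"
        "log"
        "net"
    )

    func main() {
        // proxy listen address and Vertica address are placeholders
        laddr, err := net.ResolveTCPAddr("tcp", ":9000")
        if err != nil {
            log.Fatal(err)
        }
        raddr, err := net.ResolveTCPAddr("tcp", "127.0.0.1:5433")
        if err != nil {
            log.Fatal(err)
        }
        ln, err := net.ListenTCP("tcp", laddr)
        if err != nil {
            log.Fatal(err)
        }
        for {
            client, err := ln.AcceptTCP()
            if err != nil {
                log.Println("accept:", err)
                continue
            }
            go func(client *net.TCPConn) {
                defer client.Close()
                server, err := net.DialTCP("tcp", nil, raddr)
                if err != nil {
                    // "too many open files" would surface here when fd limits are hit
                    log.Println("dial:", err)
                    return
                }
                defer server.Close()
                go io.Copy(server, client) // client -> Vertica
                io.Copy(client, server)    // Vertica -> client
            }(client)
        }
    }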
In the first version I ran into trouble: with 59 parallel incoming requests everything is fine, but with one more (60) it breaks: connections 1-59 are OK, but 60 and later are faulty. I can't catch the error properly; it looks like some socket unexpectedly closes.
Secondly, I tried to set a queue for the listener. It helped a lot, but if I have more than 258 requests I get the error again.
My question: is there any connection limit in the net package? Or maybe it is a system limitation?
For background: Vertica runs in a Docker container; hw/system: MacBook; the Vertica connection pool limit is 5, but the pool logic is implemented in the proxy.
I also tried a "raw" proxy without the pool logic (that's why I set a queue for the listener: I must not exceed the threshold of the Vertica user's pool); the result is 258 requests again.
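One check worth running on the MacBook itself (an assumption on my part: macOS per-process open-file limits often default to just 256, suspiciously close to where it breaks):

    # show the current per-process open-file limit (often 256 by default on macOS)
    ulimit -n
    # raise it for the current shell before starting the proxy (example value)
    ulimit -n 4096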
UPDATED: (05.04.2020)
Looks like it was the fault of system limitations. Did I mention anywhere that I was trying to run the whole system on one PC?
So, what I had:
- 300 parallel processes making requests (via Python's multiprocessing.Pool), i.e. 300 sockets
- a listener that creates 300 connections (once more, 300 sockets)
- and a series of rapidly created/closed sockets deep inside the proxy (according to the queue and the Vertica pool)
What I have now:
- 300 Python requests made from another PC in my local network (on Windows)
- the proxy works fine
- but I get several errors on the Windows PC that creates the requests to my proxy, errors like low memory in the "swap file"
I still need to run some stress tests on the proxy. Allocating less memory to the swap file didn't solve my problem on the Windows PC. I will be grateful for any suggestions and ideas. Thanks!
How does the proxy connect to Vertica?
By default, a maximum of 50 ordinary mortal users can be connected to one Vertica node at any one time. The superuser "dbadmin" always has 5 connections in addition to that.
So if I try to connect 60 times as dbadmin, I get this on a default Vertica configuration:
Connection attempt failed: FATAL 4060: New session rejected due to limit, already 55 sessions active
You can increase the Vertica config item MaxClientSessions from its default of 50 per node.
The command is, for example:

    ALTER NODE <node_name> SET MaxClientSessions = 100;
I suppose you have set ConnectionLoadBalancing to FALSE, so you always connect to the same Vertica node, and soon reach its default maximum of 50.
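As a sanity check (system table names as in standard Vertica; verify against your version), you can compare open sessions per node with the configured limit:

    -- how many sessions are open per node right now
    SELECT node_name, COUNT(*) AS open_sessions
    FROM v_monitor.sessions
    GROUP BY node_name;

    -- current value of the limit
    SELECT node_name, parameter_name, current_value
    FROM v_monitor.configuration_parameters
    WHERE parameter_name = 'MaxClientSessions';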
Hope that's the reason found ....
It is a Spring Boot website deployed on one Linux server. We use JMeter to do the load test.
We mock 500 users visiting the website index page simultaneously. The index page is very simple HTML with no database connection, so each request is a quite short connection.
After about 2 minutes, JMeter starts to throw timeout exceptions as below.
I guess this is because the website has reached its capacity and run out of connections.
One question here: why does the website reach its capacity 2 minutes after JMeter starts? If the TCP connection capacity for this website is 1000, I would expect it to reach 1000 very soon after JMeter starts, not after 2 minutes.
Besides, I see many TCP connections in TIME_WAIT status on the Linux server. I guess this may be related to the connection timeout?
Edit: Someone thinks it is running out of ports. Someone thinks it is running out of connections. And someone thinks it is running out of processing threads (e.g. What does this messge java.net.ConnectException/Connection timed out mean in log.jtl file of Jmeter?). I don't know which one is the exact reason...
Most probably this is due to the underlying Linux TCP/IP kernel stack configuration, as per the Linux TCP/IP tuning for scalability article:
By default, a connection is supposed to stay in the TIME_WAIT state for twice the msl. Its purpose is to make sure any lost packets that arrive after a connection is closed do not confuse the TCP subsystem (the full details of this are beyond the scope of this article, but ask me if you’d like details). The default msl is 60 seconds, which puts the default TIME_WAIT timeout value at 2 minutes. Which means you’ll run out of available ports if you receive more than about 400 requests a second, or if we look back to how nginx does proxies, this actually translates to 200 requests per second. Not good for scaling.
So double-check the timeouts along with the maximum number of ports/sockets/files on the Linux server; my expectation is that the aforementioned parameters need to be tuned for high loads.
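For instance, you can count the sockets stuck in TIME_WAIT and, if that confirms the theory, tune the kernel (standard Linux sysctls; the values are illustrative, not recommendations):

    # how many sockets are currently in TIME_WAIT
    ss -tan state time-wait | wc -l
    # let the kernel reuse TIME_WAIT sockets for new outbound connections
    sysctl -w net.ipv4.tcp_tw_reuse=1
    # widen the ephemeral port range
    sysctl -w net.ipv4.ip_local_port_range="1024 65535"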
It's also a good practice to have monitoring of baseline OS health metrics in place (CPU, RAM, network, disk, swap usage, etc.). You can use e.g. the JMeter PerfMon Plugin or the JMeter SSHMon Listener for this.
All required changes have been made to the respective files (see the sketch after this list), like:
- stalecheck=true
- keepalive is checked in HTTP Request Defaults
- retrycount=1
- hc.parameters file changes
- socket timeout is 240000
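For reference, those tweaks roughly correspond to entries like these (file names as per the JMeter docs; exact keys may vary between JMeter versions):

    # user.properties
    httpclient4.retrycount=1

    # hc.parameters (enabled via the hc.parameters.file property)
    http.connection.stalecheck$Boolean=true

Keepalive and the 240000 ms timeout are set in the HTTP Request Defaults GUI element.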
We still see "java.net.SocketException: Connection reset" in the response data; however, I can see valid requests being passed to the server.
The issue didn't appear until we reached 3000 users; it worked smoothly up to 3000 users.
A connection reset can have a lot of meanings; possible reasons are:
One of the server components is not able to handle load so it closes connections on its side
On the JMeter side, check that you are running in non-GUI mode and that neither the JMeter JVM nor the injector machine is overloaded, which could explain this. See:
https://jmeter.apache.org/usermanual/get-started.html#non_gui
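A typical non-GUI invocation looks like this (file names are placeholders):

    # run the test plan headless; -l writes results, -e/-o generate an HTML report
    jmeter -n -t test_plan.jmx -l results.jtl -e -o report/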
I'm having an issue with an Amazon EC2 instance during auto scaling. Every command I typed worked, and I found no errors. But when testing whether auto scaling works, I found that it works only until the instance starts. The newly spawned instance does not work afterwards: it's under my load balancer, but its status is out of service. One more issue: when I copy and paste the public DNS link into the browser, it does not respond, and an error is triggered like "Firefox can't find ..."
I suspect there may be a problem with the image or the Linux configuration.
Thanks in advance.
Although it's been long since you posted this, try adjusting the health check of the load balancer.
If your health check is like this:
Ping Target: HTTP:80/index.php
Timeout: 10 seconds
Interval: 30 seconds
Unhealthy Threshold: 4
Healthy Threshold: 2
that means an instance will be marked out of service if the ping target doesn't respond within 10 seconds on 4 consecutive checks, while the ELB tries to reach it every 30 seconds.
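If you use the AWS CLI, the same health check can be adjusted in one command (the load balancer name is a placeholder; this is the classic ELB API):

    aws elb configure-health-check \
      --load-balancer-name my-load-balancer \
      --health-check Target=HTTP:80/index.php,Interval=30,Timeout=10,UnhealthyThreshold=4,HealthyThreshold=2

Relaxing Timeout or UnhealthyThreshold here gives slow-responding instances more headroom before they are marked out of service.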
Usually the fact that you get "Firefox can't find ..." when you try to access the instance directly means that the service is down. Try to log in on the instance and check whether the service is alive, and also check the firewall rules, which might block internet/ELB requests. Check your ELB health check too; it's a good place to start. If you still have issues, try to post some debug information like the instance's netstat output, the ELB description, and its parameters.
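For example (port 80 assumed; adjust to your service):

    # is anything listening on the web port?
    sudo netstat -tlnp | grep ':80'
    # does the service answer locally, bypassing the ELB and security groups?
    curl -I http://localhost/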
Rules on the security groups assigned to the instance and the load balancer were not allowing traffic to pass between the two. This caused the health check to fail, so your load balancer marked the instance out of service.
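As an illustration (both group IDs are placeholders), allowing the ELB's security group to reach the instances on port 80 looks like:

    aws ec2 authorize-security-group-ingress \
      --group-id sg-instances1234 \
      --protocol tcp --port 80 \
      --source-group sg-elb5678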
If you don't have index.html in the document root of the instance, the default health check will fail. In my experience, you can set a custom protocol, port, and path for the health check when creating the load balancer.