I am using JMeter with the MQTT JMeter Plugin to do load testing.
Here is my use case:
Start 8000 users (threads) over 30 minutes
Each user does one MQTT connect message
Each user does 720 loops, publishing a message with a 5-second timer
Here is my JMeter test plan:
My threads:
My loop controller:
My timer:
After starting JMeter, everything is good:
But after 20 minutes, I am getting many errors for my pub messages:
Here is the error message:
My MQTT server is up and there is no problem with it.
Jmeter logs:
Aug 01, 2021 3:04:33 PM java.util.Optional ifPresent
INFO: MQTT client is not connected.
Aug 01, 2021 3:04:33 PM net.xmeter.samplers.PubSampler sample
INFO: ** [clientId: ps303411a2200c4e1ca4f34, topic: /test/, payload: 1627830273593ts Publish failed for connection HiveMQTTConnection{clientId='ps303411a2200c4e1ca4f34'}.
Aug 01, 2021 3:04:33 PM java.util.Optional ifPresent
INFO: MQTT client is not connected.
What is the problem? Is it related to my JMeter test plan, or to my local machine? I am using an EC2 x3 large machine to run JMeter in the background.
Since your ramp-up period is 1800 seconds, roughly 8000 × 1200 / 1800 ≈ 5.3k threads are already active at the 20th minute, which I think is where your server starts to saturate. The 501 return code may indicate that some kind of fallback mechanism can give more details about the error, but I'm not sure...
MQTT client is not connected.
It indicates that the connection is down. If you don't see anything suspicious in the JMeter logs, most probably it means that your server gets overloaded and cannot handle that many concurrent connections/messages.
Use a combination of listeners like Active Threads Over Time and Response Codes per Second to see the exact number of users at which the problems start occurring
Monitor resource usage (CPU, RAM, network sockets, disk I/O, etc.) to ensure that the MQTT server has enough headroom to operate; you can use the JMeter PerfMon Plugin for this
Check your server logs
Increase JMeter logging verbosity for the MQTT plugin by adding the following line to the log4j2.xml file:
<Logger name="net.xmeter" level="debug" />
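For context, a minimal sketch of where that line sits, assuming a stock JMeter log4j2.xml (only the <Loggers> section is shown; everything else stays as shipped):
<Loggers>
    <!-- debug output from the xmeter MQTT plugin, e.g. net.xmeter.samplers.PubSampler seen in the log above -->
    <Logger name="net.xmeter" level="debug" />
    <!-- ... the existing Root logger and other Logger entries remain unchanged ... -->
</Loggers>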
Related
The target for us to achieve is 500 concurrent users.
We have tried running a test for 100 users over 3 machines, and it ran fine without any errors.
When I tried running the test for 150 or more users with the same number of machines, I started getting the following response code:
Response code: Non HTTP response code: java.net.SocketException
Response message: Connection reset
I have also tried increasing the number of machines to 8, but it did not help. The response time is also very high (156 seconds) for some of the requests.
When we checked the server logs to find out what could be causing this issue, no error logs were found there during the time of the execution.
I'm having a hard time finding out what the issue could be. The server side is ruling out an issue on their end.
Tried the following fixes on the JMeter side:
Increasing the heap size
Changing the retry count in the user.properties file
Changing Boolean=True in the hc.parameters file
Used HTTP Request Defaults to change the implementation to HttpClient4
CPU Config:
Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60 GHz (2 processors)
5 GB RAM
64-bit Operating System
The Connection Reset error means a failed attempt to write to a socket which has already been closed; at the TCP protocol level it means receiving a TCP RST.
It might be the case that JMeter is closing the connection prematurely: JMeter 5.0 had the httpclient4.time_to_live property set to 2000, and if you're seeing response times > 2 seconds (and you are), most probably JMeter is closing the connection before getting the full response.
You can try increasing this setting to 60000, which matches modern browsers' default settings, or even higher to match your application response time (if you think 3 minutes is acceptable), or consider upgrading to JMeter 5.3, which has a better default value.
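A minimal sketch of that override in user.properties (value in milliseconds; 60000 is just the suggested starting point):
# user.properties
# connection time-to-live for the HttpClient4 implementation, in milliseconds
httpclient4.time_to_live=60000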
More information: Bug 64289
I hit a problem using c3p0. In most cases it works fine, but in the prod environment behind firewalls it occasionally fails to check out a connection. The problem is that it takes 15 minutes to recognize that the connection is not usable. The pool is not exhausted, as other connections are being checked out and used happily during that 15-minute interval.
logs:
23 Apr 2015 09:08:16.426 [EventProcessor-1] DEBUG c.m.v.c.i.C3P0PooledConnectionPool - Testing PooledConnection [com.mchange.v2.c3p0.impl.NewPooledConnection#5a886282] on CHECKOUT.
15 minutes later:
23 Apr 2015 09:23:43.073 [EventProcessor-1] DEBUG c.m.v.c.i.C3P0PooledConnectionPool - Test of PooledConnection [com.mchange.v2.c3p0.impl.NewPooledConnection#5a886282] on CHECKOUT has FAILED.
java.sql.SQLException: Connection is invalid
at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.testPooledConnection(C3P0PooledConnectionPool.java:572) [c3p0-0.9.5.jar:0.9.5]
at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.finerLoggingTestPooledConnection(C3P0PooledConnectionPool.java:451) [c3p0-0.9.5.jar:0.9.5]
at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.finerLoggingTestPooledConnection(C3P0PooledConnectionPool.java:443) [c3p0-0.9.5.jar:0.9.5]
at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.refurbishResourceOnCheckout(C3P0PooledConnectionPool.java:336) [c3p0-0.9.5.jar:0.9.5]
at com.mchange.v2.resourcepool.BasicResourcePool.attemptRefurbishResourceOnCheckout(BasicResourcePool.java:1727) [c3p0-0.9.5.jar:0.9.5]
at com.mchange.v2.resourcepool.BasicResourcePool.checkoutResource(BasicResourcePool.java:553) [c3p0-0.9.5.jar:0.9.5]
at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool.checkoutAndMarkConnectionInUse(C3P0PooledConnectionPool.java:756) [c3p0-0.9.5.jar:0.9.5]
at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool.checkoutPooledConnection(C3P0PooledConnectionPool.java:683) [c3p0-0.9.5.jar:0.9.5]
at com.mchange.v2.c3p0.impl.AbstractPoolBackedDataSource.getConnection(AbstractPoolBackedDataSource.java:140) [c3p0-0.9.5.jar:0.9.5]
and then some more logs:
23 Apr 2015 09:23:43.073 [EventProcessor-1] DEBUG c.m.v.r.BasicResourcePool - A resource could not be refurbished for checkout. [com.mchange.v2.c3p0.impl.NewPooledConnection#5a886282]
java.sql.SQLException: Connection is invalid
...
23 Apr 2015 09:23:43.074 [EventProcessor-1] DEBUG c.m.v.r.BasicResourcePool - Resource [com.mchange.v2.c3p0.impl.NewPooledConnection#5a886282] could not be refurbished in preparation for checkout. Will try to find a better resource.
23 Apr 2015 09:23:43.074 [C3P0PooledConnectionPoolManager[identityToken->67oy4j981qzvkd716hgow4|4177fc5c]-HelperThread-#2] DEBUG c.m.v.r.BasicResourcePool - Preparing to destroy resource: com.mchange.v2.c3p0.impl.NewPooledConnection#5a886282
23 Apr 2015 09:23:43.074 [EventProcessor-1] DEBUG c.m.v.c.i.C3P0PooledConnectionPool - Testing PooledConnection [com.mchange.v2.c3p0.impl.NewPooledConnection#41318736] on CHECKOUT.
23 Apr 2015 09:23:43.074 [C3P0PooledConnectionPoolManager[identityToken->67oy4j981qzvkd716hgow4|4177fc5c]-HelperThread-#2] DEBUG c.m.v.c.i.C3P0PooledConnectionPool - Preparing to destroy PooledConnection: com.mchange.v2.c3p0.impl.NewPooledConnection#5a886282
23 Apr 2015 09:23:43.076 [C3P0PooledConnectionPoolManager[identityToken->67oy4j981qzvkd716hgow4|4177fc5c]-HelperThread-#2] DEBUG c.m.v.c3p0.impl.NewPooledConnection - Failed to close physical Connection: oracle.jdbc.driver.T4CConnection#25145762
java.sql.SQLRecoverableException: IO Error: Broken pipe
at oracle.jdbc.driver.T4CConnection.logoff(T4CConnection.java:612) ~[ojdbc6_g-11.2.0.1.0.jar:11.2.0.1.0]
at oracle.jdbc.driver.PhysicalConnection.close(PhysicalConnection.java:5094) ~[ojdbc6_g-11.2.0.1.0.jar:11.2.0.1.0]
at com.mchange.v2.c3p0.impl.NewPooledConnection.close(NewPooledConnection.java:642) [c3p0-0.9.5.jar:0.9.5]
c3p0 Configuration:
ComboPooledDataSource ods = new ComboPooledDataSource();
...
ods.setInitialPoolSize(5);
ods.setMinPoolSize(5);
ods.setMaxPoolSize(10);
ods.setMaxStatements(50);
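// note: with testConnectionOnCheckout the validity test runs synchronously at checkout, so a hung test blocks the calling thread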
ods.setTestConnectionOnCheckout(true);
So nothing too exotic. I know connection loss is possible, hence testing connections on checkout. Any ideas why it takes so long to verify/fail a connection? We are using an Oracle database.
thanks.
First, I presume you've verified that there are no checkouts of that Connection in between your log messages. Obviously, you'd expect lots of messages like...
Testing PooledConnection [com.mchange.v2.c3p0.impl.NewPooledConnection#5a886282] on CHECKOUT.
...before the final message just prior to the failure. Lots of those messages would occur much earlier. Only the final message just prior to the failure should, ideally, be much closer to detection of the failure than the 15 mins that you are seeing.
Assuming that is indeed the final such message, then the issue has to do with how your Connections die. c3p0 runs a test and then waits for either successful completion or an Exception. If your Connection dies in such a way that the Connection test merely hangs for 15 minutes, well, then you might see what you are seeing.
Here are a few suggestions.
Use c3p0's idleConnectionTestPeriod to detect these failures, ideally prior to client checkouts, so that clients are less likely to experience long hangs. (You might test on check-in as well; see the configuration sketch after these suggestions.)
Figure out what kind of Connection test is getting run. You are using c3p0 0.9.5, so if your driver supports it, the default test is a call to Connection.isValid(), which should be fast. I don't see anywhere in the logs you've quoted a stack trace of the actual test failure (perhaps it is a truncated root-cause Exception? It would definitely be logged at FINER/DEBUG level by a logger called com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool). Verify from the stack trace that your driver is using the fast isValid() Connection test rather than c3p0's slow default Connection test. If it is not (presumably because your driver doesn't support it), then consider setting a fast preferredTestQuery.
You could try maxAdministrativeTaskTime, but that is only likely to really help if whatever is hanging the Connection test responds to an interrupt() call.
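As a rough illustration of these suggestions, assuming the same ComboPooledDataSource from the question (the values and the Oracle test query are illustrative, not prescriptive):
// test idle connections every 5 minutes so dead ones are evicted before a client asks for them
ods.setIdleConnectionTestPeriod(300);
// additionally test connections when they are returned to the pool
ods.setTestConnectionOnCheckin(true);
// only needed if the driver lacks a fast Connection.isValid(); for Oracle a trivial query works
ods.setPreferredTestQuery("SELECT 1 FROM DUAL");
// bound how long pool tasks (including a hung test) may run, in seconds
ods.setMaxAdministrativeTaskTime(60);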
Anyway, I hope this isn't entirely useless!
It looks like a situation where the connection is terminated by a firewall in such a way that no response is sent back at all, not even a TCP ACK without data. In this case the query used to verify the connection will never return. This happens at the socket/JDBC-driver level.
Solution:
find out the firewall's disconnect policy (in our case 1 hour)
set the c3p0 maxConnectionAge property to force c3p0 to retire and replace connections every X seconds (see the sketch below)
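A minimal sketch on the existing ComboPooledDataSource, assuming the one-hour firewall policy above (the exact value just needs to sit comfortably below the firewall timeout):
// retire connections well before the firewall's ~1 hour disconnect policy (value in seconds)
ods.setMaxConnectionAge(1800);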
I have set up a simple proxy on an EC2 instance using Tinyproxy (default config, listening to/allowing all incoming connections). This works well. If I, for debugging, fill in the IP address and the port in my browser's proxy settings, I can browse through the proxy without any issues. Everything works. However, if I create an EC2 load balancer in front of the instance (making sure to forward the HTTP port correctly), it just hangs when I browse through the load balancer's IP. This seems like a puzzle to me. The instance is running, the load balancer reports "in service", and going around the load balancer works, but going through it just hangs. What am I missing, and where should I look for the error?
UPDATE
I have now had a look at the Tinyproxy logs: when trying to access google.com directly through the instance's proxy, I see logs like this:
CONNECT Apr 30 20:41:33 [1862]: Request (file descriptor 6): GET http://google.com/ HTTP/1.1
INFO Apr 30 20:41:33 [1862]: No upstream proxy for google.com
CONNECT Apr 30 20:41:33 [1862]: Established connection to host "google.com" using file descriptor 7.
INFO Apr 30 20:41:33 [1862]: Closed connection between local client (fd:6) and remote client (fd:7)
CONNECT Apr 30 20:41:33 [1901]: Connect (file descriptor 6): x1-6-84-1b-ADDJF-20-07-92.fsdfe [430.12327.65117.615]
CONNECT Apr 30 20:41:33 [1901]: Request (file descriptor 6): GET http://www.google.ie/?gws_rd=cr&ei=_V9hU8DeFMTpPJjygIgC HTTP/1.1
INFO Apr 30 20:41:33 [1901]: No upstream proxy for www.google.ie
CONNECT Apr 30 20:41:33 [1901]: Established connection to host "www.google.ie" using file descriptor 7.
However, if I try to access Google through the load balancer, which then forwards to the instance, I see logs like this:
CONNECT Apr 30 20:42:54 [1860]: Request (file descriptor 6): GET / HTTP/1.1
CONNECT Apr 30 20:42:54 [1869]: Connect (file descriptor 6): ip-432-2383245-53.eu-west-1.compute.internal [10.238.155.237]
CONNECT Apr 30 20:42:54 [2037]: Connect (file descriptor 6): ip-432-2383245-53.eu-west-1.compute.internal [10.238.155.237]
INFO Apr 30 20:42:54 [1860]: process_request: trans Host GET http://google.com:8888/ for 6
INFO Apr 30 20:42:54 [1860]: No upstream proxy for google.com
CONNECT Apr 30 20:43:12 [1861]: Connect (file descriptor 6): ip-432-2383245-53.eu-west-1.compute.internal [1230.23845.515.2537]
CONNECT Apr 30 20:43:12 [2035]: Connect (file descriptor 6): ip-432-2383245-53.eu-west-1.compute.internal [143.238.12345.117]
ERROR Apr 30 20:43:12 [2035]: read_request_line: Client (file descriptor: 6) closed socket before read.
ERROR Apr 30 20:43:12 [1861]: read_request_line: Client (file descriptor: 6) closed socket before read.
ERROR Apr 30 20:43:12 [2035]: Error reading readble client_fd 6
ERROR Apr 30 20:43:12 [1861]: Error reading readble client_fd 6
WARNING Apr 30 20:43:12 [2035]: Could not retrieve request entity
WARNING Apr 30 20:43:12 [1861]: Could not retrieve request entity
From what I notice, the ELB is trying to send the request through port 8888.
You can get ELB access logs. These access logs can help you determine the time taken for a request at different stages, e.g.:
request_processing_time: Total time elapsed (in seconds) from the time the load balancer receives the request and sends the request to a registered instance.
backend_processing_time: Total time elapsed (in seconds) from the time the load balancer sends the request to a registered instance and the instance begins sending the response headers.
response_processing_time: Total time elapsed (in seconds) from the time the load balancer receives the response header from the registered instance and starts sending the response to the client. This processing time includes both the queuing time at the load balancer and the connection acquisition time from the load balancer to the backend.
...and a lot more information. You need to configure the access logs first. Please follow the articles below to get a better understanding of using ELB access logs:
Access Logs for Elastic Load Balancers
Access Logs
These logs may or may not solve your problem, but they are certainly a good starting point. Besides, you can always check with AWS Technical Support for a more in-depth analysis.
It sounds like you're trying to use ELB in HTTP mode as, essentially, something resembling a forward proxy, or at least a gateway to a forward proxy that you're running behind it. That's not ELB's intended application, so it isn't surprising that it wouldn't work in that configuration.
ELB in HTTP listener mode is intended to be used as a reverse proxy in front of a web endpoint/origin server.
Configuring your ELB listener to use TCP mode instead of HTTP mode should allow your intended configuration to work, but this is somewhat outside the optimum application of ELB.
http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/elb-listener-config.html
I created a JMeter test plan with 2000 threads and a 10-second ramp-up time.
When I ran the test against the Apache server, some of my test results gave a connection refused error.
The connection refused error occurred after 21 seconds.
So, my question is: does this 21 seconds originate from JMeter or from the Apache web server?
As far as I know, the Apache server's default timeout is 30 seconds, and I didn't change that.
This means your Apache server is refusing connections, which suggests it is either overloaded or misconfigured.