c3p0 connection checkout taking 15 minutes to fail at times - oracle

hit a problem using c3p0. In most cases it works fine, but in the prod env behind firewalls it occasionally fails to check out a connection. The problem is that it takes 15 minutes to recognize that the connection is not usable. The pool is not exhausted, as other connections are being checked out and used happily during that 15 minute interval.
logs:
23 Apr 2015 09:08:16.426 [EventProcessor-1] DEBUG c.m.v.c.i.C3P0PooledConnectionPool - Testing PooledConnection [com.mchange.v2.c3p0.impl.NewPooledConnection#5a886282] on CHECKOUT.
15 minutes later:
23 Apr 2015 09:23:43.073 [EventProcessor-1] DEBUG c.m.v.c.i.C3P0PooledConnectionPool - Test of PooledConnection [com.mchange.v2.c3p0.impl.NewPooledConnection#5a886282] on CHECKOUT has FAILED.
java.sql.SQLException: Connection is invalid
at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.testPooledConnection(C3P0PooledConnectionPool.java:572) [c3p0-0.9.5.jar:0.9.5]
at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.finerLoggingTestPooledConnection(C3P0PooledConnectionPool.java:451) [c3p0-0.9.5.jar:0.9.5]
at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.finerLoggingTestPooledConnection(C3P0PooledConnectionPool.java:443) [c3p0-0.9.5.jar:0.9.5]
at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.refurbishResourceOnCheckout(C3P0PooledConnectionPool.java:336) [c3p0-0.9.5.jar:0.9.5]
at com.mchange.v2.resourcepool.BasicResourcePool.attemptRefurbishResourceOnCheckout(BasicResourcePool.java:1727) [c3p0-0.9.5.jar:0.9.5]
at com.mchange.v2.resourcepool.BasicResourcePool.checkoutResource(BasicResourcePool.java:553) [c3p0-0.9.5.jar:0.9.5]
at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool.checkoutAndMarkConnectionInUse(C3P0PooledConnectionPool.java:756) [c3p0-0.9.5.jar:0.9.5]
at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool.checkoutPooledConnection(C3P0PooledConnectionPool.java:683) [c3p0-0.9.5.jar:0.9.5]
at com.mchange.v2.c3p0.impl.AbstractPoolBackedDataSource.getConnection(AbstractPoolBackedDataSource.java:140) [c3p0-0.9.5.jar:0.9.5]
and then some more logs:
23 Apr 2015 09:23:43.073 [EventProcessor-1] DEBUG c.m.v.r.BasicResourcePool - A resource could not be refurbished for checkout. [com.mchange.v2.c3p0.impl.NewPooledConnection#5a886282]
java.sql.SQLException: Connection is invalid
...
23 Apr 2015 09:23:43.074 [EventProcessor-1] DEBUG c.m.v.r.BasicResourcePool - Resource [com.mchange.v2.c3p0.impl.NewPooledConnection#5a886282] could not be refurbished in preparation for checkout. Will try to find a better resource.
23 Apr 2015 09:23:43.074 [C3P0PooledConnectionPoolManager[identityToken->67oy4j981qzvkd716hgow4|4177fc5c]-HelperThread-#2] DEBUG c.m.v.r.BasicResourcePool - Preparing to destroy resource: com.mchange.v2.c3p0.impl.NewPooledConnection#5a886282
23 Apr 2015 09:23:43.074 [EventProcessor-1] DEBUG c.m.v.c.i.C3P0PooledConnectionPool - Testing PooledConnection [com.mchange.v2.c3p0.impl.NewPooledConnection#41318736] on CHECKOUT.
23 Apr 2015 09:23:43.074 [C3P0PooledConnectionPoolManager[identityToken->67oy4j981qzvkd716hgow4|4177fc5c]-HelperThread-#2] DEBUG c.m.v.c.i.C3P0PooledConnectionPool - Preparing to destroy PooledConnection: com.mchange.v2.c3p0.impl.NewPooledConnection#5a886282
23 Apr 2015 09:23:43.076 [C3P0PooledConnectionPoolManager[identityToken->67oy4j981qzvkd716hgow4|4177fc5c]-HelperThread-#2] DEBUG c.m.v.c3p0.impl.NewPooledConnection - Failed to close physical Connection: oracle.jdbc.driver.T4CConnection#25145762
java.sql.SQLRecoverableException: IO Error: Broken pipe
at oracle.jdbc.driver.T4CConnection.logoff(T4CConnection.java:612) ~[ojdbc6_g-11.2.0.1.0.jar:11.2.0.1.0]
at oracle.jdbc.driver.PhysicalConnection.close(PhysicalConnection.java:5094) ~[ojdbc6_g-11.2.0.1.0.jar:11.2.0.1.0]
at com.mchange.v2.c3p0.impl.NewPooledConnection.close(NewPooledConnection.java:642) [c3p0-0.9.5.jar:0.9.5]
c3p0 Configuration:
ComboPooledDataSource ods = new ComboPooledDataSource();
...
ods.setInitialPoolSize(5);               // start the pool with 5 connections
ods.setMinPoolSize(5);                   // never shrink below 5
ods.setMaxPoolSize(10);                  // grow to at most 10
ods.setMaxStatements(50);                // global PreparedStatement cache size
ods.setTestConnectionOnCheckout(true);   // validate every connection before handing it out
So nothing too exotic. I know connection loss is possible, hence testing connections on checkout. Any ideas why it takes so long to verify/fail a connection? We are using an Oracle database.
thanks.

First, I presume you've verified that there are no checkouts of that Connection in between your log messages. Obviously, you'd expect lots of messages like...
Testing PooledConnection [com.mchange.v2.c3p0.impl.NewPooledConnection#5a886282] on CHECKOUT.
...before the failure. Most of those messages would occur much earlier; only the final one, just prior to the failure, should ideally be much closer to the detection of the failure than the 15 minutes you are seeing.
Assuming that is indeed the final such message, the issue has to do with how your Connections die. c3p0 runs a test and then waits for either successful completion or an Exception. If your Connection dies in a way that makes the test simply hang for 15 minutes, then you might see exactly what you are seeing.
Here are a few suggestions.
Use c3p0's idleConnectionTestPeriod to detect these failures, ideally before client checkouts, so that clients are less likely to experience long hangs. (You might test on check-in as well.)
Figure out what kind of Connection test is getting run. You are using c3p0 0.9.5, so if your driver supports it, the default test is a call to Connection.isValid(), which should be fast. I don't see anywhere in the logs you've quoted a stack trace of the actual test failure (perhaps it is a truncated root-cause Exception? It would definitely be logged at FINER/DEBUG level by a logger called com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool). Verify from that stack trace that your driver is using the fast isValid() Connection test rather than c3p0's slow default Connection test. If it is not (presumably because your driver doesn't support it), consider setting a fast preferredTestQuery (see the configuration sketch after these suggestions).
You could try maxAdministrativeTaskTime, but that is only likely to help if whatever is hanging the Connection test responds to an interrupt() call.
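To make those suggestions concrete, here is a minimal configuration sketch applied to the ComboPooledDataSource from the question. The numeric values and the test query are illustrative assumptions, not tuned recommendations:
import com.mchange.v2.c3p0.ComboPooledDataSource;

ComboPooledDataSource ods = new ComboPooledDataSource();
// ... driverClass, jdbcUrl, user and password as in the question ...
ods.setTestConnectionOnCheckout(true);   // as already configured
ods.setTestConnectionOnCheckin(true);    // also test connections as they are returned
ods.setIdleConnectionTestPeriod(300);    // test idle connections every 300 seconds
// Only needed if the driver lacks a fast Connection.isValid() implementation:
ods.setPreferredTestQuery("SELECT 1 FROM DUAL");
// Bounds how long a hung administrative task may run; per the caveat above, it
// only helps if the hung call responds to interrupt().
ods.setMaxAdministrativeTaskTime(60);    // seconds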
Anyway, I hope this isn't entirely useless!

It looks like it is a situation where the connection is terminated by a firewall in a way that no response is sent back at all, not even a TCP ACK without data. In that case the query used to verify the connection will never return. This happens at the socket/JDBC driver level.
Solution:
find out the firewall's disconnect policy (in our case 1 hour)
set the c3p0 maxConnectionAge property to force c3p0 to recycle connections every X seconds, as sketched below
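A minimal sketch of that setting, assuming the 1 hour firewall disconnect policy observed above; the 45/30 minute values are illustrative, the only requirement is that connections are retired well inside the firewall's window. maxIdleTime is not part of the answer above, just a related, optional knob:
import com.mchange.v2.c3p0.ComboPooledDataSource;

ComboPooledDataSource ods = new ComboPooledDataSource();
// ... driverClass, jdbcUrl, user and password as before ...
ods.setMaxConnectionAge(45 * 60);   // retire connections after 45 minutes (value is in seconds)
// maxIdleTime retires connections by idle time rather than by age:
ods.setMaxIdleTime(30 * 60);        // optional, also in seconds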

Related

mqtt load test with (Jmeter, mosquitto): Publish failed for connection HiveMQTTConnection

I am using JMeter with the MQTT JMeter Plugin to do a load test.
Here is my use case:
8000 users (threads) started over a 30 minute ramp-up
Each user sends one MQTT connect message
Each user runs 720 loops that publish a message, with a 5 second timer
(Screenshots of the JMeter test plan, thread group, loop controller and timer omitted.)
After starting JMeter everything is good, but after 20 minutes I am getting many errors for my publish messages (screenshots of the errors omitted); the error message is shown in the logs below.
My MQTT server is up and there is no problem with it.
JMeter logs:
Aug 01, 2021 3:04:33 PM java.util.Optional ifPresent
INFO: MQTT client is not connected.
Aug 01, 2021 3:04:33 PM net.xmeter.samplers.PubSampler sample
INFO: ** [clientId: ps303411a2200c4e1ca4f34, topic: /test/, payload: 1627830273593ts Publish failed for connection HiveMQTTConnection{clientId='ps303411a2200c4e1ca4f34'}.
Aug 01, 2021 3:04:33 PM java.util.Optional ifPresent
INFO: MQTT client is not connected.
What is the problem? Is this related to the JMeter test plan, or to my local machine? I am using an EC2 x3 large machine to run JMeter in the background.
Since your ramp-up period is 1800 seconds, you have roughly 5.3k threads running by the 20th minute (8000 × 1200 / 1800 ≈ 5333), which I think is where your server starts to saturate. The 501 return code may indicate that some kind of fallback mechanism can give more details about the error, but I am not sure...
MQTT client is not connected.
It indicates that the connection is down. If you don't see anything suspicious in the JMeter logs, most probably it means that your server gets overloaded and cannot handle that many concurrent connections/messages.
Use a combination of listeners like Active Threads Over Time and Response Codes per Second to see the exact number of users at which the problems start occurring
Monitor resources usage like CPU, RAM, Network sockets, Disk IO, etc. to ensure that the MQTT server has enough space to operate, you can use JMeter PerfMon Plugin for this
Check your server logs
Increase JMeter logging verbosity for the MQTT plugin by adding the following line to the log4j2.xml file:
<Logger name="net.xmeter" level="debug" />

How to fix "C3P0: A PooledConnection that has already signalled a Connection error is still in use"

We have a web application with the stack Spring, Hibernate, C3P0 and the Oracle DB driver (having an Oracle DB behind it).
From time to time we experience blocking locks over a longer period of time which then get killed on the DB end. (We know this is caused by bad application design and we will fix it, but that's not the point of this question.)
After the DB session was killed by the DB, it seems that the connection pool reuses the now broken connection, which results in the error:
A PooledConnection that has already signalled a Connection error is still in use!
Another error has occurred [ java.sql.SQLRecoverableException: Closed Connection ] which will not be reported to listeners!
On the DataSource we configured
dataSource.setTestConnectionOnCheckin(true);
dataSource.setTestConnectionOnCheckout(true);
But it did not help. We expected that the connections fail these tests and then get renewed. But this does not happen.
Any hints for us how to recreate the broken connections?
This warning is given when a Connection that is already checked out experiences an Exception that causes c3p0 to treat it as invalid (so it will not be reincorporated into the pool on close()), but the Connection continues to be used and experiences an Exception again. These are not broken Connections in the pool; they are broken Connections in use by the application. So testing them on checkout (or check-in) doesn't do anything about them.
To get rid of this, you need to examine the Exception handling within your application code. Are there circumstances where an invalid Connection might have thrown an Exception, but that Exception was caught and the Connection reused?
The warning itself is harmless. It's just saying that c3p0 already knows the Connection is bad, and it won't emit an event to signal that again.
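To illustrate the kind of review suggested above, here is a minimal sketch; the class, method and table names are hypothetical placeholders, not taken from the question. The point is the structure of the error handling: let try-with-resources close the Connection unconditionally, and never catch an SQLException only to keep using the same Connection.
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.DataSource;

// Hypothetical DAO used only to show the pattern.
class OrderDao {
    private final DataSource dataSource;

    OrderDao(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    long countRows() throws SQLException {
        // try-with-resources guarantees close() even when an Exception is thrown,
        // so a Connection c3p0 has flagged as invalid goes back to the pool to be
        // destroyed instead of being reused by the application.
        try (Connection con = dataSource.getConnection();
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM some_table")) {
            rs.next();
            return rs.getLong(1);
        }
        // The anti-pattern to hunt for: catching SQLException, keeping the same
        // Connection and issuing further statements on it. That is exactly what
        // produces the "already signalled a Connection error is still in use" warning.
    }
}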

socketException broken pipe upon upgrading httpclient jar version to 4.5.3

I am getting a socket exception for broken pipe on my client side.
[write] I/O error: Connection has been shutdown: javax.net.ssl.SSLException: java.net.SocketException: Broken pipe (Write failed)
[LoggingManagedHttpClientConnection::shutdown] http-outgoing-278: Shutdown connection
1520546494584[20180308 23:01:34] [ConnectionHolder::abortConnection] Connection discarded
1520546494584[20180308 23:01:34] [BasicHttpClientConnectionManager::releaseConnection] Releasing connection [Not bound]
It seems that upgrading the httpclient jar is causing the issue.
The issue does not occur with httpclient-4.3.2.
The exception occurs every 2 minutes; the issue is intermittent at times.
After sending Expect: 100-continue, conn.flush is throwing the exception.
Client and server are Linux machines.
The client uses the httpclient jar to make requests to the server's REST API.
Please help me debug the issue.
Can the httpclient jar cause such an issue?
The persistent connections that are kept alive by the connection manager become stale. That is, the target server shuts down the connection on its end without HttpClient being able to react to that event while the connection is idle, thus rendering the connection half-closed or 'stale'.
This is a general limitation of the blocking I/O in Java. There is simply no way of finding out whether or not the opposite endpoint has closed connection other than by attempting to read from the socket.
If a stale connection is used to transmit a request message the request execution usually fails in the write operation with SocketException and gets automatically retried.
Apache HttpClient works around this problem by employing the so-called stale connection check, which is essentially a very brief read operation. However, the check can be, and often is, disabled. In fact it is often advisable to have it disabled due to the extra latency the check introduces.
The handling of stale connections was changed in version 4.4. Previously, the code would check every connection by default before re-using it. The code now only checks the connection if the elapsed time since the last use of the connection exceeds the timeout that has been set. The default timeout is 2000 ms.
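If stale connections after the 4.4 behaviour change are the suspect, one option is to tighten that inactivity threshold and to evict idle/expired connections proactively. A sketch for HttpClient 4.5.x follows; the logs above mention a BasicHttpClientConnectionManager, so the pooling manager and the numbers here are illustrative assumptions rather than a drop-in fix:
import java.util.concurrent.TimeUnit;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
// Re-validate a pooled connection that has been idle for more than 1 second
// before leasing it (the default since 4.4 is 2000 ms).
cm.setValidateAfterInactivity(1000);

CloseableHttpClient client = HttpClients.custom()
        .setConnectionManager(cm)
        // Background eviction of connections the server may have closed on its side.
        .evictExpiredConnections()
        .evictIdleConnections(30, TimeUnit.SECONDS)
        .build();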

java.sql.SQLRecoverableException: Connection has been administratively disabled by console/admin command. Try later

When our application tries to connect Oracle database, this exception is thrown:
java.sql.SQLRecoverableException: Connection has been administratively disabled by console/admin command. Try later. java.lang.Exception: It was disabled at Tue Oct 20 23:55:14 CEST 2015
But, from Weblogic console the connection test returns OK.
Weblogic version: 12.1.3.0.0
Any explanation is welcome. Thanks
The reason the test works is that it creates a connection and runs a test query; that is not what your code is doing when it uses the data source in an EJB. The code goes through the connection pool, and it is the pool that has been marked as bad. There is no solution provided by Oracle that I have found except to restart the server, which will re-enable the connection pool.
I suspect you have "Test connections on reserve" set, because that is when this usually arises. What WebLogic does is: before it returns a connection from the pool it runs a test query; if the test query fails, it waits and runs it one more time. If the query fails again, it marks the connection as unhealthy. If all the connections in the pool become unhealthy, it marks the pool as disabled and gives you the error message you see: 'Connection has been administratively disabled by console/admin command. Try later.'
Regarding the 'Try later' part of the error message, as far as I can tell Oracle is wrong about trying again later; I have never seen it recover.
I'd like to share this article that helped me understand my problem better:
https://www.techpaste.com/2012/09/connection-administratively-destroyed-reconnect-oracle-weblogic-server/
The “java.sql.SQLException: Connection has been administratively destroyed.” is expected: the DB was shut down; even if it was restarted later, the JDBC connections are pointing to DB processes that have been destroyed.
You need to restart your WebLogic Server to recreate new JDBC connections.
All the current transactions are lost, as the database was shut down.
High availability of your RDBMS is required to minimize this issue.
Is there any other error before that one?
Maybe in the log you can find that the connection has been closed.
You can avoid this by selecting "Test on Connection Reserve" in the datasource.

IHS/Websphere 6.1.31 on AIX - Heavy Load - Stops accepting requests after 170 concurrent users - Connection Refused on same server

I was wondering if someone could point me out into the right direction.
Right now our IHS / Websphere Server is unable to handle more than 170 concurrent users.
We have tuned the IHS, Websphere Thread Pools, Datasource properties, JVM Heap and kernel parameters.
On heavy load we are seeing this in the IHS plugin log
[Mon Jun 27 10:42:15 2011] 00e90070 00002f30 - ERROR: ws_common: websphereGetStream: Failed to connect to app server on host 'XXXXXX', OS err=79
[Mon Jun 27 10:42:15 2011] 00e90070 00002f30 - ERROR: ws_common: websphereExecute: Failed to create the stream
[Mon Jun 27 10:42:15 2011] 00e90070 00002f30 - ERROR: ws_common: websphereHandleRequest: Failed to execute the transaction to 'XXXXXXNode01_YYYYYY'on host 'XXXXXX'; will try another one
Error 79 is connection refused! The strange thing is that both the IHS and the Websphere are on the same server...
Looking at the thread pools in WAS, we don't see them reaching their maximum. Monitoring the heap, it seems OK...
Any ideas?
What is the maximum number of concurrent connections specified for the Web Containers in the WAS cluster?
Can you make a direct call to XXXXXXNode01_YYYYYY (bypassing the IHS) via your browser when this error occurs? If it still gives you errors, that simply validates the message provided by the plug-in.
HTH
Manglu
