VPN IKE Renegotiation causing dropped database connections in Glassfish - amazon-ec2

We have a Checkpoint VPN connection to the AWS cloud. Our database connections appear to be dropping during the IKE renegotiations that happen regularly over a 24-hour period. In general a renegotiation takes less than 60 seconds, but our database connections in Glassfish 4.1.1, our application server, are dropping. A failed connection validation always precedes the dropped connection. What can be done to keep these connections alive longer (no timeouts) so that they don't drop while the renegotiation is in progress?
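For context, the Glassfish-side knobs that usually matter here are the pool's idle timeout and its connection-validation settings. A rough asadmin sketch, assuming a pool named MyPool (a placeholder) and GlassFish 4.x attribute names, which should be verified against your own domain before applying:

asadmin set resources.jdbc-connection-pool.MyPool.idle-timeout-in-seconds=3600
asadmin set resources.jdbc-connection-pool.MyPool.is-connection-validation-required=true
asadmin set resources.jdbc-connection-pool.MyPool.connection-validation-method=table
asadmin set resources.jdbc-connection-pool.MyPool.validation-table-name=DUAL
asadmin set resources.jdbc-connection-pool.MyPool.fail-all-connections=false

Raising the idle timeout keeps Glassfish from discarding connections on its own schedule, and fail-all-connections=false prevents a single failed validation during the renegotiation window from flushing the whole pool.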

Related

Azure VM is closing the idle sessions in my postgres database

I am creating a VM in Azure to host a Postgres instance in Docker and connect to it from my local Spring backend. Once connected to the DB, after some time of inactivity, any request fails with the following: "HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection#f162126 (This connection has been closed.). Possibly consider using a shorter maxLifetime value." Digging around, I realized the VM behaves as if it closes connections once they go inactive, which causes the error above. The curious thing is that the sessions are not actually closed, as you can see in the attached image; even after shutting down my backend the sessions remain, and the only way to remove them is to restart the container that hosts the DB.
I have tried to reproduce this behavior locally, but it never happens: even if I leave the backend idle for an hour, a request to the DB works as if nothing had happened. It only happens with my VM in Azure.
I want to clarify that the sessions shown in the attached image no longer work: if I query the DB from Spring, the error I mentioned appears and Hikari automatically creates new sessions for its pool. I can repeat this until I reach 100 sessions, which after a while stop working again and which Spring never closes when the backend shuts down.
HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection#f162126 (This connection has been closed.). Possibly consider using a shorter maxLifetime value.
This error is thrown by the isConnectionDead method: while validating a connection, Hikari checks whether it is still alive and usable, and raises this error if the connection has already been closed.
You can adjust your maxLifetime setting to resolve this problem. The shortest permitted value is 30000 ms (30 seconds); the default is 1800000 ms (30 minutes).
A connection in the pool can only live for a certain amount of time, which is controlled by the maxLifetime property. Its value should be several seconds below any connection time limit imposed by your infrastructure or database.
Reference: HikariCP configuration on GitHub.
Well, after much research and reviewing various sources, it turns out that Azure applies certain security policies when you create a VM, as Pedro Perez explains in the following Stack Exchange post: Azure closing idle network connections
You're hitting a design feature of the software load balancer in front of your VMs. By default it will close any idle connections after 4 minutes, but you can configure the timeout to be anything between those 4 and 30 minutes
So, in order to override the policy that governs your VM, you must create a load balancer, do all the relevant configuration, and create a load-balancing rule for port 5432 (the default Postgres port) with the Idle timeout set somewhere in the 4 to 30 minute range according to your needs.
Finally, configure your VM so that its public IP points to the LB (Load Balancer) public IP, and everything will work normally.
It is true that if you simply want to live with Azure's default security policies on the VMs you create, you should set maxLifetime to a maximum of 4 minutes in your Spring application.properties or application.yml, as #PratikLad says.
In my case I prefer to keep the default Hikari configuration (maxLifetime of 30 minutes), so I need to create the LB; but if you prefer to change the property, setting it to a maximum of 4 minutes, you do not need any of the load-balancer work described above.
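For the property-only route, a minimal Spring Boot application.properties sketch; the values are illustrative, chosen to stay under the 4-minute idle cutoff:

# keep pooled connections from outliving Azure's 4-minute idle timeout (values in ms)
spring.datasource.hikari.max-lifetime=240000
# on HikariCP 4.0.1+ you can also send periodic keepalives instead of recycling as aggressively
spring.datasource.hikari.keepalive-time=120000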

Connection pooling with xorm and go-mysql

Just cross-posting this from github.
I am using xorm 0.4.3 with go-mysql. We are on Golang 1.4.
We have specified maxIdleConnections and maxOpenConnections in xorm as below:
var orm *xorm.Engine
...
orm.SetMaxOpenConns(50)
orm.SetMaxIdleConns(5)
And we are using the same single xorm instance to query Mysql.
But we are still seeing a lot of connections in the TCP ESTABLISHED state, way more than the numbers configured for maxIdleConnections and maxOpenConnections, when we run lsof:
app 8747 10568 sandeshsharma 16u IPv4 691032 0t0 TCP 127.0.0.1:57337->127.0.0.1:mysql (ESTABLISHED)
We have also observed that even if we stop MySQL, the number of connections stays the same, but they move to the CLOSE_WAIT state. If we shut down the app, all connections go away.
app 8747 10844 sandeshsharma 38u IPv4 505058 0t0 TCP 127.0.0.1:54160->127.0.0.1:mysql (CLOSE_WAIT)
However, the MySQL process list shows the correct number of connections, matching what I specified for maxIdleConnections and maxOpenConnections.
Can someone please explain this behaviour? Why are we observing so many TCP connections even though we have set maxIdleConnections and maxOpenConnections to 5 and 50 respectively?
First of all, Go 1.4 is too old. Use latest Go 1.6.
This answer is written with knowledge of Go 1.6. So some details may be different in your case.
A connection can be in one of four states: connecting, idle, in use, and closing.
MaxOpenConns limits the number of connections in the connecting, idle, and in-use states; connections that are being closed are not counted.
So if your application closes and reopens connections quickly, you can see more TCP sockets than the configured limits.
The CLOSE_WAIT state means the MySQL server side has closed the connection and your app has not yet read the EOF and closed the socket. I suppose your app is under very high load
and slow at reading the EOF and closing the connection. Until it does, the socket remains open on the client side, regardless of the TCP state on the server side.
I recommend updating Go and "go-sql-driver/mysql", and setting MaxIdleConns equal to MaxOpenConns to avoid a high reconnect rate.
Rather than relying on a low MaxIdleConns to trim idle connections, you can use SetConnMaxLifetime (a new API in Go 1.6) to close
connections when your app is idle, as in the sketch below.
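A minimal database/sql sketch of that advice (the DSN and values are placeholders; with xorm you would apply the same settings on the engine):

package main

import (
    "database/sql"
    "time"

    _ "github.com/go-sql-driver/mysql"
)

func main() {
    // Placeholder DSN; replace with your real credentials.
    db, err := sql.Open("mysql", "user:pass@tcp(127.0.0.1:3306)/app")
    if err != nil {
        panic(err)
    }
    defer db.Close()

    // Keep idle == open so connections are reused instead of being
    // closed and reopened under load, which is what produces the
    // extra sockets in closing states.
    db.SetMaxOpenConns(50)
    db.SetMaxIdleConns(50)

    // Go 1.6+: recycle connections periodically so idle ones are
    // closed cleanly instead of lingering.
    db.SetConnMaxLifetime(5 * time.Minute)

    if err := db.Ping(); err != nil {
        panic(err)
    }
}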

Connection pool opens more connections than maximum pool size

Hey I'm using Glassfish open source v4 and I'm having a weird problem.
I have defined a JDBC connection pool to Oracle 11g in the admin console and I've set:
Pool Settings
Initial and Minimum Pool Size: 500
Maximum Pool Size: 1000
Pool Resize Quantity: 750
And I've created a specific user for this connection pool. Yet sometimes when I inspect open connections in the database I see that there are more than 1000 (the maximum I've seen was 1440).
When this happens, query attempts fail: some with an OutOfMemory exception, some with HTTP thread interruptions, and some produce no logs at all and simply take a long time.
What I am wondering is how it is possible that Glassfish opens more connections than I've configured?
First, try to compare the output of netstat on the application server and on the DB server. You may have some "dangling" connections. Also look for documentation about DCD (Dead Connection Detection) in Oracle.
A few years ago I saw situations where the Java application server thought a connection was dead because it had not responded for a few minutes. That connection was put onto a dead-connection list and a new connection was created.
There can also be network issues, for example a firewall (FW) between the application and the DB server.
When a TCP connection is not active for one hour it gets cut on one side, but the DB server does not know about that.
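If DCD turns out to be relevant, it is enabled on the database server side with a single sqlnet.ora parameter; the 10-minute probe interval below is just an example:

[sqlnet.ora on the database server]
SQLNET.EXPIRE_TIME=10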
The usual way to investigate this is to:
compare the output of both netstat runs (appl./db)
identify the dangling TCP connections
translate each TCP connection to the Unix process ID (PID) of its Oracle server process
translate the PID to the Oracle session (SID and SERIAL#)
kill the session at the Oracle level (alter system kill session ...), as sketched below
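A sketch of the last three steps, once netstat/lsof has given you the OS PID of the Oracle server process (12345 is a placeholder):

-- map the OS process id to the Oracle session
SELECT s.sid, s.serial#
  FROM v$session s
  JOIN v$process p ON p.addr = s.paddr
 WHERE p.spid = '12345';

-- then kill that session
ALTER SYSTEM KILL SESSION '<sid>,<serial#>' IMMEDIATE;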

tomcat jdbc connection pooling with Oracle database

We are using tomcat jdbc connection pooling with Oracle database.
Recently we encountered a problem: there are too many inactive sessions in the Oracle database coming from JDBC THIN CLIENT.
Can anyone help us with this? Why are inactive sessions building up in the database, and what can be the solution?
Thank you.
Adjust settings of your pool, following the documentation. Setting maxIdle and minEvictableIdleTimeMillis to low values would ensure that idle connections are evicted fast, and that few idle connections stay open. Obviously, it will also make your pool less efficient, since connections will be closed and opened more frequently.
If you are getting SQLExceptions because connections have timed out, you should set a validationQuery (to something like SELECT 0 FROM DUAL) and your connections will be tested before they are checked-out of the pool. Any connection that has failed will be replaced by a working connection and then returned to your code.
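For reference, a minimal context.xml sketch of those settings for tomcat-jdbc; the resource name, credentials, and values are placeholders to adapt:

<Resource name="jdbc/MyOracleDS"
          auth="Container"
          type="javax.sql.DataSource"
          factory="org.apache.tomcat.jdbc.pool.DataSourceFactory"
          driverClassName="oracle.jdbc.OracleDriver"
          url="jdbc:oracle:thin:@//dbhost:1521/MYSERVICE"
          username="app" password="secret"
          maxActive="50" maxIdle="5" minIdle="0"
          minEvictableIdleTimeMillis="30000"
          timeBetweenEvictionRunsMillis="15000"
          validationQuery="SELECT 0 FROM DUAL"
          testOnBorrow="true"/>

A low maxIdle plus frequent eviction runs keeps the number of idle Oracle sessions small, and testOnBorrow with the validationQuery replaces timed-out connections before your code ever sees them.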

How can I set the timeout on OCILogon2?

When the Oracle 10 databases are up and running fine, OCILogon2() will connect immediately. When the databases are turned off or inaccessible due to network issues - it will fail immediately.
However, when our DBAs go into emergency maintenance and block incoming connections, it can take 5 to 10 minutes to time out.
This is problematic for me since I've found that OCILogon2 isn't thread safe and we can only use it serially, and I connect to quite a few Oracle DBs. 3 blocked servers X 5-10 minutes = 15 to 30 minutes of lockup time.
Does anyone know how to set the OCILogon2 connection timeout?
Thanks.
I'm currently playing with OCI and it seems to me that it's impossible.
The only way I can think of is to use non-blocking mode. You'll need OCIServerAttach() and OCISessionBegin() instead of OCILogon() in this case. But when I tried this, OCISessionBegin() constantly returns OCI_ERROR with the following error code:
ORA-03123 operation would block
Cause: The attempted operation cannot complete now.
Action: Retry the operation later.
It looks strange and I don't yet know how to deal with it.
A possible workaround is to run your logon in another process, which you can kill after a timeout...
We think we found the right file setting - but it's one of those problems where we have to wait until something rare and horrible occurs before we can verify it :-/
[sqlnet.ora]
SQLNET.OUTBOUND_CONNECT_TIMEOUT=60
From the Oracle docs..
http://download.oracle.com/docs/cd/B28359_01/network.111/b28317/sqlnet.htm#BIIFGFHI
5.2.35 SQLNET.OUTBOUND_CONNECT_TIMEOUT
Purpose
Use the SQLNET.OUTBOUND_CONNECT_TIMEOUT parameter to specify the time, in seconds, for a client to establish an Oracle Net connection to the database instance.
If an Oracle Net connection is not established in the time specified, the connect attempt is terminated. The client receives an ORA-12170: TNS:Connect timeout occurred error.
The outbound connect timeout interval is a superset of the TCP connect timeout interval, which specifies a limit on the time taken to establish a TCP connection. Additionally, the outbound connect timeout interval includes the time taken to be connected to an Oracle instance providing the requested service.
Without this parameter, a client connection request to the database server may block for the default TCP connect timeout duration (approximately 8 minutes on Linux) when the database server host system is unreachable.
The outbound connect timeout interval is only applicable for TCP, TCP with SSL, and IPC transport connections.
Default
None
Example
SQLNET.OUTBOUND_CONNECT_TIMEOUT=10
