Our application connects to an Oracle RAC cluster using the Tomcat connection pool.
One node in the cluster had an issue and went into a hung state.
To make sure the application is not impacted, we would like to get an in-depth view of the connection pool's status, e.g. whether all connections to a particular RAC node are having an issue. From the list of health-check parameters, we could not find a way to identify which group of connections, i.e. those going to a particular RAC node, is affected.
If that is possible, we can take corrective action based on it (either manually or by automating the replacement of pooled connections to point at a RAC node that is still working).
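For context, the kind of probe we have in mind is roughly the following (a sketch only, assuming the pool is exposed as a javax.sql.DataSource and the target is Oracle; the class name and the sample size are made up for illustration): borrow a handful of pooled connections, ask each one which instance it is attached to, then return them all.
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.sql.DataSource;

public class PoolInstanceProbe {
    // Borrow 'howMany' connections (holding them so the pool hands out distinct ones),
    // ask Oracle which RAC instance served each, and return a count per instance name.
    public static Map<String, Integer> sample(DataSource ds, int howMany) throws Exception {
        Map<String, Integer> perInstance = new HashMap<>();
        List<Connection> held = new ArrayList<>();
        try {
            for (int i = 0; i < howMany; i++) {
                Connection c = ds.getConnection();
                held.add(c);
                try (Statement st = c.createStatement();
                     ResultSet rs = st.executeQuery(
                             "select sys_context('USERENV','INSTANCE_NAME') from dual")) {
                    rs.next();
                    perInstance.merge(rs.getString(1), 1, Integer::sum);
                }
            }
        } finally {
            for (Connection c : held) {
                try { c.close(); } catch (Exception ignore) { }
            }
        }
        return perInstance; // e.g. {racnode1=5, racnode2=3}
    }
}
A connection that hangs or throws on that trivial query would point at the affected node, which is the signal we would then want to act on.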
Let us know if anybody has faced such an issue.
Thanks
We have 3 app servers behind a load balancer, and each app server connects to an Oracle database.
We have set up a connection pool with a minimum size of 8 and a maximum size of 32.
Every time I query the database to see active connections, I do not see more than 2 active connections per server.
I know that during peak business hours we have more than 50 users accessing the application (and hence the database).
So why do I not see more than 2 active connections on the database end?
Question 1) How can I find out whether the JDBC connections on the app server are live/active at any given moment?
Question 2) How can I find out whether some connections are lost and never reach the database?
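For what it's worth, a minimal sketch of the two checks, assuming the Tomcat JDBC pool is in use on the app server (the counter methods are from org.apache.tomcat.jdbc.pool.DataSource; other pools expose similar counters, often via JMX) and that the monitoring user can read v$session on the database side:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import org.apache.tomcat.jdbc.pool.DataSource;

public class PoolVsDbCheck {
    // Question 1: what the pool itself thinks is busy / idle right now.
    public static void printPoolCounters(DataSource pool) {
        System.out.println("busy=" + pool.getActive()
                + " idle=" + pool.getIdle()
                + " total=" + pool.getSize()
                + " waiting=" + pool.getWaitCount());
    }

    // Question 2: what the database actually sees coming from this app server;
    // a large gap between the two numbers suggests lost or stale connections.
    public static void printDbSideCount(Connection adminConn, String appHostName) throws Exception {
        String sql = "select status, count(*) from v$session "
                   + "where type = 'USER' and machine = ? group by status";
        try (PreparedStatement ps = adminConn.prepareStatement(sql)) {
            ps.setString(1, appHostName);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " = " + rs.getInt(2));
                }
            }
        }
    }
}
Keep in mind that STATUS in v$session is 'ACTIVE' only while a call is actually executing, so mostly-idle pooled connections will show up as 'INACTIVE' even though they are alive.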
We have configured Oracle TAF (Transparent Application Failover) for a Data Guard database so that the application can use the same service name to connect if there is an issue with the primary database and it has to switch to the standby. We are hitting a peculiar problem, though: application servers within the same datacenter can connect to the database, but servers in a different datacenter fail to connect using the TAF service. After the 90-second timeout interval the connection attempt moves on to the standby host and fails there as well.
Connections using the direct hostname and SID work perfectly fine, even across datacenters.
Error:
Caused by: java.io.IOException: Socket read timed out, socket connect lapse 3 ms. plx9852.xyz.com/135.167.30.103 1524 3 1 true
at oracle.net.nt.TcpNTAdapter.connect(TcpNTAdapter.java:209)
at oracle.net.nt.ConnOption.connect(ConnOption.java:161)
at oracle.net.nt.ConnStrategy.execute(ConnStrategy.java:470)
... 54 more
pcdrest_taf.db.xyz.com=
(description=(connect_timeout=90)(retry_count=30)(retry_delay=3)(transport_connect_timeout=3)(load_balance=off)(failover=on)(address_list=(address=(protocol=tcp)(host=plx9843.xyz.com)(port=1524))(address=(protocol=tcp)(host=plx9852.xyz.com)(port=1524)))(connect_data=(service_name=pcdrest_taf.db.xyz.com)(failover_mode=(type=select)(method=basic))))
Connection string on the application side, using LDAP:
spring.datasource.jdbcUrl=jdbc:oracle:thin:@ldap://polarx.xyz.com:3060/pcdrest_taf,cn=OracleContext,dc=db,dc=xyz,dc=com ldap://polarx1.xyz.com:3060/pcdrest_taf,cn=OracleContext,dc=db,dc=xyz,dc=com ldap://polarx2.sbc.com:3060/pcdrest_taf,cn=OracleContext,dc=db,dc=xyz,dc=com ldap://polarx3.sbc.com:3060/pcdrest_taf,cn=OracleContext,dc=db,dc=xyz,dc=com ldap://polarx4.sbc.com:3060/pcdrest_taf,cn=OracleContext,dc=db,dc=xyz,dc=com ldap://polarx5.sbc.com:3060/pcdrest_taf,cn=OracleContext,dc=db,dc=xyz,dc=com
Just beware: Oracle changed the meaning of transport_connect_timeout from seconds to milliseconds, without any warning, in release 12.1.
So if you use that version there is no way to tell whether 3 means seconds or milliseconds.
Since version 12.2, your value of 3 (milliseconds) is simply too low.
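If the driver on your side does interpret the value as milliseconds, the fix would amount to something along these lines in the descriptor (3000 is purely an illustrative value, i.e. the original 3 seconds expressed in milliseconds; everything else is left unchanged):
pcdrest_taf.db.xyz.com=
(description=(connect_timeout=90)(retry_count=30)(retry_delay=3)(transport_connect_timeout=3000)(load_balance=off)(failover=on)(address_list=(address=(protocol=tcp)(host=plx9843.xyz.com)(port=1524))(address=(protocol=tcp)(host=plx9852.xyz.com)(port=1524)))(connect_data=(service_name=pcdrest_taf.db.xyz.com)(failover_mode=(type=select)(method=basic))))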
Moreover, there were several bugs in the Oracle JDBC driver related to TAF. For example:
Bug 12998506 - RETRY_COUNT connection parameter is total number of connection attempts when using JDBC thin
Description:
The RETRY_COUNT connection parameter is the number of additional times
a connection attempt should be made after the initial attempt has
failed. Therefore if RETRY_COUNT is 2 a maximum of 3 connection
attempts will be made for each address in the ADDRESS_LIST. However
JDBC thin takes RETRY_COUNT to mean the total number of connection
attempts so, in the above example, JDBC thin will make a maximum of 2
attempts for each address instead of the expected 3.
This is a follow on from bug 12760352 where addresses in the
ADDRESS_LIST were being retried in the wrong order when using JDBC
thin (e.g. if the address list contained A and B JDBC thin would
attempt connections as A A ... B B ... instead of A B A B ...).
PS: the retry_delay parameter seems to be ignored by the JDBC drivers from version 12c onwards.
I have an Oracle RAC setup of 2 servers with a SCAN hostname pointing to both. My WebSphere application server is configured with a JDBC string like the one below and a connection pool of 50:
jdbc:oracle:thin:@//scan-hostname:port/dbname
Everything works fine and both DB servers receive requests as expected, except that when either node goes down (while the other node is healthy) my application gets all kinds of exceptions (connection reset / JDBC commit failed / connection is closed) during the first several minutes, and behaves normally afterwards.
My guess is that the pooled connections to the failing node do not retry or fail over and simply throw exceptions. Is this expected behavior for Oracle RAC, i.e. failover only works for new connections and not for existing ones, or am I missing something needed to enable failover?
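Not WebSphere-specific, but to illustrate the mechanism being asked about: Oracle's Universal Connection Pool (UCP) offers Fast Connection Failover, which subscribes to RAC ONS events and evicts pooled connections to an instance that has gone down instead of handing them back to the application. A minimal sketch, with host names, ports and credentials as placeholders:
import oracle.ucp.jdbc.PoolDataSource;
import oracle.ucp.jdbc.PoolDataSourceFactory;

public class UcpFcfExample {
    public static PoolDataSource build() throws Exception {
        PoolDataSource pds = PoolDataSourceFactory.getPoolDataSource();
        pds.setConnectionFactoryClassName("oracle.jdbc.pool.OracleDataSource");
        pds.setURL("jdbc:oracle:thin:@//scan-hostname:1521/dbname");   // placeholder URL
        pds.setUser("app_user");                                       // placeholder credentials
        pds.setPassword("app_password");
        pds.setMaxPoolSize(50);
        pds.setFastConnectionFailoverEnabled(true);            // evict connections to a dead node
        pds.setONSConfiguration("nodes=racnode1:6200,racnode2:6200"); // ONS daemons on the RAC nodes (placeholders)
        return pds;
    }
}
Without something equivalent (FCF, or at least a validation check on borrow), a pool will typically only discover that a cached connection is dead when the application tries to use it, which matches the burst of errors described above.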
We're testing the failover behaviour of the MariaDB JDBC connector's Aurora-specific features.
We've set the JDBC URL as the documentation suggests:
jdbc:mysql:aurora://cluster.cluster-xxxx.us-east-1.rds.amazonaws.com/db
The problem is that as soon as we add the aurora: part to the URL scheme, we see the number of connections to the database writer increase to the point that we have to roll back the change (it even reaches 3,000 connections).
Versions:
MariaDB connector: 2.0.1
HikariCP connection pool: 2.6.1
Play-Slick: 2.1.0
Slick: 3.2.0
Configuration:
master {
  profile = "slick.jdbc.MySQLProfile$"
  db {
    driver = "org.mariadb.jdbc.Driver"
    url = "jdbc:mysql:aurora://cluster-name.cluster-xxx.us-east-1.rds.amazonaws.com/db_name?characterEncoding=utf8mb4&rewriteBatchedStatements=true&usePipelineAuth=false"
    user = "rw_user"
    password = "rw_user_pass"
    numThreads = 20
    queueSize = 1000000
  }
}
slaves = [
  {
    profile = "slick.jdbc.MySQLProfile$"
    db {
      driver = "org.mariadb.jdbc.Driver"
      url = "jdbc:mysql:aurora://cluster-name.cluster-ro-xxx.us-east-1.rds.amazonaws.com/db_name?characterEncoding=utf8mb4&usePipelineAuth=false"
      user = "ro_user"
      password = "ro_user_pass"
      numThreads = 20
      queueSize = 1000000
    }
  }
]
We tried adding the aurora: part to the JDBC URL scheme again after upgrading the MariaDB connector version, but the number of connections to the reader started to increase again.
If we run show processlist on the read-only endpoint, we can see all the opened connections in the "cleaned up" state with the "Sleep" command.
We removed the aurora: part from the read-only endpoint just to stabilize the number of connections to it. Is it possible that the driver searches for the cluster master while opening connections? That would explain this kind of behaviour.
When using the "aurora" keyword, the driver, under the hood, creates 2 connections:
a connection to the primary server,
a connection to one of the replicas, if any.
The goal is always to save resources on the main server. Generally, only one pool is configured; the driver then uses the connection to the primary or to the replica according to Connection.setReadOnly().
When you have separate "write" / "read" pools, using the "failover" configuration will solve your issue: the driver will use only one real connection.
This way there will be no "wasted" connections.
Failover will then be handled differently, but with the same result (for example, a query outside a transaction that is to be sent to a replica that has just crashed will not directly fall back on the primary connection, as it would with the "aurora" configuration; instead the driver will create a new connection to another replica before executing the query).
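Concretely, in the configuration above that would mean swapping the aurora mode for failover in both url entries, along these lines (only the mode keyword changes; hosts and options stay as they are):
url = "jdbc:mysql:failover://cluster-name.cluster-xxx.us-east-1.rds.amazonaws.com/db_name?characterEncoding=utf8mb4&rewriteBatchedStatements=true&usePipelineAuth=false"
url = "jdbc:mysql:failover://cluster-name.cluster-ro-xxx.us-east-1.rds.amazonaws.com/db_name?characterEncoding=utf8mb4&usePipelineAuth=false"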
Once you get past several dozen active connections, the database starts stumbling over itself. It is better to throttle the connections in the client instead of assuming you have infinite bandwidth to accept connections in Aurora.
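In the Slick/HikariCP setup shown above, that throttling would amount to putting an explicit cap on the pool, roughly like this (assuming Slick's HikariCP-backed pool honours the maxConnections key; the value is illustrative):
db {
  numThreads     = 20
  maxConnections = 20   # hard upper bound on pooled connections per app instance
  queueSize      = 1000000
}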
Hey, I'm using GlassFish Open Source v4 and I'm having a weird problem.
I have defined a JDBC connection pool to Oracle 11g in the admin console and I've set:
Pool Settings
Initial and Minimum Pool Size: 500
Maximum Pool Size: 1000
Pool Resize Quantity: 750
I've also created a specific user for this connection pool. Yet sometimes when I inspect the open connections in the database I see that there are more than 1000 (the maximum I've seen was 1440).
When this happens, any query attempt fails: sometimes with an OutOfMemory exception, some show HTTP thread interruptions, and some don't log anything at all and simply take a very long time.
What I am wondering is: how is it possible that GlassFish opens more connections than I've configured it to?
First, try to compare the output of netstat on the application server and on the DB server side. You may have some "dangling" connections. Also try to find some documentation about DCD (Dead Connection Detection) in Oracle.
A few years ago I saw situations where the Java application server decided a connection was dead because it had not responded for a few minutes. That connection was put onto a dead-connection list and a new connection was created.
There can also be network issues; for example, a firewall between the application server and the DB server.
When a TCP connection is idle for an hour it gets cut on one side, but the DB server does not know about it.
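(For reference, the DCD mentioned above is enabled on the database server in sqlnet.ora, e.g. SQLNET.EXPIRE_TIME = 10, which makes the server probe idle client connections every 10 minutes so half-open ones eventually get cleaned up.)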
The usual way to investigate this is:
compare the output of both netstat runs (application/DB)
identify the dangling TCP connections
translate each TCP connection to the Unix process id (PID) of the Oracle server process
translate the PID to an Oracle session (SID and SERIAL#)
kill the session at the Oracle level (alter system kill session ...)
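For the last two steps, the mapping from the OS process to the Oracle session looks roughly like this (a sketch; the SPID placeholder is the server process id you identified via netstat/ps):
select s.sid, s.serial#
  from v$session s
  join v$process p on p.addr = s.paddr
 where p.spid = '<spid-from-netstat>';
alter system kill session '<sid>,<serial#>';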