We're testing failover behaviour using the MariaDB JDBC connector's Aurora-specific features.
We've set the JDBC URL as the documentation suggests:
jdbc:mysql:aurora://cluster.cluster-xxxx.us-east-1.rds.amazonaws.com/db
The problem is that as soon as we add the aurora: part to the URL scheme, connections to the database writer increase to the point that we have to roll back the change (it even reaches 3,000 connections).
Versions:
MariaDB connector: 2.0.1
HikariCP connection pool: 2.6.1
Play-Slick: 2.1.0
Slick: 3.2.0
Configuration:
master {
  profile = "slick.jdbc.MySQLProfile$"
  db {
    driver = "org.mariadb.jdbc.Driver"
    url = "jdbc:mysql:aurora://cluster-name.cluster-xxx.us-east-1.rds.amazonaws.com/db_name?characterEncoding=utf8mb4&rewriteBatchedStatements=true&usePipelineAuth=false"
    user = "rw_user"
    password = "rw_user_pass"
    numThreads = 20
    queueSize = 1000000
  }
}
slaves = [
  {
    profile = "slick.jdbc.MySQLProfile$"
    db {
      driver = "org.mariadb.jdbc.Driver"
      url = "jdbc:mysql:aurora://cluster-name.cluster-ro-xxx.us-east-1.rds.amazonaws.com/db_name?characterEncoding=utf8mb4&usePipelineAuth=false"
      user = "ro_user"
      password = "ro_user_pass"
      numThreads = 20
      queueSize = 1000000
    }
  }
]
We tried adding the aurora: part to the JDBC URL scheme again after upgrading the MariaDB connector version, but the number of connections to the reader started to increase again.
If we run SHOW PROCESSLIST on the read-only endpoint, we can see all the open connections in the "cleaned up" state with the "Sleep" command.
We removed the aurora: part from the read-only endpoint just to stabilize the number of connections to it. Is it possible that the driver searches for the cluster master while opening connections? That would explain this behaviour.
When using the "aurora" keyword, driver , under the hood, create 2 connections:
a connection to the primary server,
a connection to one of the replicas if any.
The goal is always to save resources on the main server. Generally, only one pool is configured. The driver then uses the connection to the primary / replica according to [Connection.setReadOnly] [1].
When you have separate "write" / "read" pools, using the configuration "failover" will solve your issue: Driver will use only one real connection.
This way, there will be no "wasted" connection.
Failover will then be handled differently, but with the same results (for example, a query not in a transaction that is to be sent to a replica that just crashed will not directly use the primary connection as when using the "aurora" configuration, the driver will recreate a new connection to another replicas before executing the query).
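For reference, here is a rough sketch of what the two pools could look like with the failover scheme, using HikariCP directly (Play-Slick uses HikariCP underneath). The endpoints, credentials and URL parameters are placeholders copied from the question's configuration, so treat this as an illustration rather than a drop-in answer; in the Play-Slick config itself the equivalent change is simply replacing aurora: with failover: in both url values.
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class AuroraPools {
    public static void main(String[] args) {
        // Writer pool: cluster (writer) endpoint with the failover scheme, so the
        // driver keeps a single real connection and reconnects on failure.
        HikariConfig writeConfig = new HikariConfig();
        writeConfig.setDriverClassName("org.mariadb.jdbc.Driver");
        writeConfig.setJdbcUrl("jdbc:mysql:failover://cluster-name.cluster-xxx.us-east-1.rds.amazonaws.com/db_name"
                + "?characterEncoding=utf8mb4&rewriteBatchedStatements=true&usePipelineAuth=false");
        writeConfig.setUsername("rw_user");
        writeConfig.setPassword("rw_user_pass");

        // Reader pool: read-only (reader) endpoint, also with the failover scheme,
        // instead of the aurora scheme that opens a primary + replica pair per connection.
        HikariConfig readConfig = new HikariConfig();
        readConfig.setDriverClassName("org.mariadb.jdbc.Driver");
        readConfig.setJdbcUrl("jdbc:mysql:failover://cluster-name.cluster-ro-xxx.us-east-1.rds.amazonaws.com/db_name"
                + "?characterEncoding=utf8mb4&usePipelineAuth=false");
        readConfig.setUsername("ro_user");
        readConfig.setPassword("ro_user_pass");

        try (HikariDataSource writeDs = new HikariDataSource(writeConfig);
             HikariDataSource readDs = new HikariDataSource(readConfig)) {
            // Hand writeDs / readDs to the application (e.g. the Slick databases) here.
        }
    }
}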
Once you get past several dozen active connections, the database starts stumbling over itself. It is better to throttle connections in the client than to assume Aurora can accept an unlimited number of them.
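For instance, with HikariCP that throttle is a single setting on the pool configuration; building on the writeConfig object from the sketch above (20 and 5 are arbitrary placeholders, not recommendations):
// Cap connections on the client side; a small, busy pool usually behaves better
// than thousands of mostly idle connections on the database side.
writeConfig.setMaximumPoolSize(20);
writeConfig.setMinimumIdle(5);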
Related
Currently, master-slave replication is set up, and SELECTs and INSERTs are split using MaxScale's readwritesplit router.
I was using Commons DBCP and validating connections with the testOnBorrow and validationQuery options in the datasource configuration, but because the query goes through MaxScale, the validationQuery's select 1 is only sent to a slave, so the validity of the master connection cannot be checked.
Since the master connection is never validated, if the WAS (application server) has been idle for a long time, a DB connection error occurs on the next use.
To solve this problem I tried master_accept_reads = true, but I don't want to use it because it sends more traffic to the master.
As another option I tried persistpoolmax and persistmaxtime, but I got the same error message.
I am wondering whether there is a way, in MaxScale or MariaDB, to send a connection validation query such as validationQuery without it being routed differently for master and slave.
Thank you for reading the long text.
You can use the hint filter to route queries to the master.
https://mariadb.com/kb/en/mariadb-maxscale-24-hintfilter/
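As a hedged sketch of how that could be combined with the DBCP setup from the question: add the hintfilter from the link above to the readwritesplit service's filters list, then make the validation query carry a routing hint as a trailing comment. Whether the comment actually reaches MaxScale depends on the driver and proxy not stripping it, so verify the exact hint syntax against the hintfilter documentation. The example uses Commons DBCP 2 (the older Commons DBCP has the same properties); host, port, schema and credentials are placeholders.
import org.apache.commons.dbcp2.BasicDataSource;

public class MaxScaleDataSource {
    public static BasicDataSource create() {
        BasicDataSource ds = new BasicDataSource();
        ds.setDriverClassName("org.mariadb.jdbc.Driver");
        ds.setUrl("jdbc:mariadb://maxscale-host:4006/db_name"); // placeholder MaxScale listener
        ds.setUsername("app_user");                             // placeholder
        ds.setPassword("app_pass");                             // placeholder
        ds.setTestOnBorrow(true);
        // Routing hint as a trailing comment, so the validation query is answered
        // by the master rather than a slave (requires the hintfilter on the service).
        ds.setValidationQuery("select 1 -- maxscale route to master");
        return ds;
    }
}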
I have an Oracle RAC setup with 2 servers and a SCAN hostname pointing to both. My WebSphere application server is configured with the JDBC string below and a connection pool of 50:
jdbc:oracle:thin:@//scan-hostname:port/dbname
Everything works fine and both DB servers get requests as expected, except that when either node is down (and the other node is healthy), my application gets all kinds of exceptions (connection reset / JDBC commit failed / connection is closed) within the first several minutes, and is normal afterwards.
My guess is that the pooled connections to the failing node do not retry or fail over and just throw exceptions. Is it expected behaviour for Oracle RAC that failover only works for new connections and not existing ones, or am I missing something to enable failover?
Running PostgreSQL 9.5 on a Windows Server 2012 R2 VM in Azure.
While running load tests on my application, I get errors about not being able to connect to the Postgres server. In the Postgres logs I get the following message:
could not receive data from client: No connection could be made
because the target machine actively refused it.
This only happens when the load test moves on to the next scenario and hits a different part of the code, so new connections to the database are required. But after 10-20 seconds the rest of the scenario works flawlessly without any other hiccups, so the problem seems to be the TCP connections. (My code retries a couple of times, but it is not feasible to let it retry for 20 seconds.)
I'm using the following settings in the config files
postgresql.conf
listen_addresses = '*'
max_connections = 500
shared_buffers = 1024MB
temp_buffers = 2MB
work_mem = 2MB
maintenance_work_mem = 128MB
pg_hba.conf
host all all 0.0.0.0/0 trust
host all all ::/0 trust
I know, I know... it is not safe to accept connections from everyone, but this is just for testing purposes and to make sure these settings are not blocking any connections. So that answer is ruled out.
I've been monitoring the number of connections on the server and under load it is stable at 75. Postgres is using around 350 MB of RAM, so given the config and the VM specs (7 GB RAM) there should be plenty of room to create more connections. However, when the next scenario spins up, the number of connections does not increase; it stays level, and the log messages about no connection being made start appearing.
What could be the problem here?
It does sound like this isn't really a Postgres problem (hence no changes in DB stats you're checking), rather that the traffic is being stopped by the server. Possibly because traffic on that port is saturated while handling your load testing queries?
It doesn't sound like you're hitting any of the Azure resource limits (including the database limits if that applies to your setup?), but without more detail on your load tests it's hard to say exactly what is needed.
Solutions from around the web and other SO answers suggest:
Disable TCP autotuning and tweak the TCP/IP registry keys on the server, e.g. set TcpAckFrequency - see this article for details
Make TCP setting adjustments (like WinsockListenBacklog) - which may be affected by whether connection pooling is in use or not - see this MS support article, which is for SQL Server 2005 but has some great tips on troubleshooting rejected TCP/IP connections (using Network Monitor, but applies to newer tools)
Faster request processing if you have enough control of the server - source
Disabling network proxying (in your load testing app): <defaultProxy> <proxy usesystemdefault="False"/> </defaultProxy> - source
The most likely reason is a firewall or anti-virus:
Software/Personal Firewall Settings
Multiple Software/Personal Firewalls
Anti-virus Software
LSP Layer
(Virtual) Router Firmware
Does your current Azure infrastructure include a firewall or anti-virus?
Additionally, after some further searching, it looks like this is a standard Windows "connection refused" message, which suggests that PostgreSQL is trying to connect to something and being refused.
It is also possible that a network element in your network - assuming you are still connected to the server - delays or drops some DB login/authentication packets (treating them, for example, as a fake auth replay).
You may also use a packet analyzer (like Wireshark) to record and inspect the network flow when the error appears.
Regards
I was facing the same issue in my ASP.NET Core application while trying to connect to PostgreSQL. The error was thrown in Program.cs when I called the Migrate function.
public static void Main(string[] args) {
    try {
        var host = BuildWebHost(args);
        using (var scope = host.Services.CreateScope()) {
            // Migrate once after the app is started.
            scope.ServiceProvider.GetService<MyDatabaseContext>().Migrate();
        }
        host.Run();
    }
    catch (Exception e) {
        // NLog: catch setup errors
        _logger?.Error(e, "Stopped program because of exception: ");
        throw;
    }
}
To fix this problem I took the following steps:
Checked whether the PostgreSQL service was running in services.msc.
Tried to log in to pgAdmin with the user and password I provided in the database context.
Everything was fine, and as you know 5432 is the default port of PostgreSQL; somehow I was using a different port in my application connection string, and changing it to 5432 fixed the issue for me.
"ConnectionString": "User Id=postgres;Password=mypwd;Host=localhost;Port=5432;Database=mydb;"
I came across a similar issue whilst trying to stress-test my API, where I was seeing Npgsql.NpgsqlException: No connection could be made because the target machine actively refused it.
However, my issue was down to the fact that I was re-creating my NpgsqlConnection for each query rather than reusing it and keeping it alive.
Does SQL Azure allow third-party connection pools like HikariCP or BoneCP?
We configured HikariCP and it works when we first run the app, but later the DB stops responding to requests. Is this a HikariCP issue, or is it a common connection pool issue, so there is no need to spend more time investigating?
HikariConfig config = new HikariConfig();
config.setMaximumPoolSize(50);
config.setDriverClassName(env.getProperty("jdbc.driverClassName"));
config.setJdbcUrl(env.getProperty("jdbc.url"));
config.setUsername(env.getProperty("jdbc.user"));
config.setPassword(env.getProperty("jdbc.pass"));
config.addDataSourceProperty("cachePrepStmts", env.getProperty("jdbc.cachePrepStmts"));
config.addDataSourceProperty("prepStmtCacheSize", env.getProperty("jdbc.prepStmtCacheSize"));
config.addDataSourceProperty("prepStmtCacheSqlLimit", env.getProperty("jdbc.prepStmtCacheSqlLimit"));
config.addDataSourceProperty("useServerPrepStmts", env.getProperty("jdbc.useServerPrepStmts"));
See this SQL Azure page re: Connection Constraints.
Maximum allowable durations are subject to change depending on the resource usage.
A logged-in session that has been idle for 30 minutes will be terminated
automatically. We strongly recommend that you use the connection pooling and
always close the connection when you are finished using it so that the unused
connection will be returned to the pool. For more information about connection
pooling, see Connection Pooling.
See if any of these errors match up to your logs. Search that page for "terminated" and "busy" to find error codes that might be relevant to your issue.
I would suggest setting the maxLifetime property in HikariCP to 15 minutes, and the idleTimeout to 2 minutes.
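A sketch of those two settings applied to the HikariConfig from the question (HikariCP takes both values in milliseconds; java.util.concurrent.TimeUnit is only used here for readability):
// Recycle every pooled connection after 15 minutes, well inside the 30-minute idle cut-off,
// and evict connections that have sat idle in the pool for 2 minutes.
config.setMaxLifetime(TimeUnit.MINUTES.toMillis(15));
config.setIdleTimeout(TimeUnit.MINUTES.toMillis(2));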
There is nothing on the SQL Azure side that would prohibit you from using a 3rd party connection pool. My guess is that the connection failed between the server and the client and the client didn't remove the connection from the pool.
Moving forward, I'd ensure that whichever third-party connection pool you end up using tests that a connection is still alive before taking it out of the pool for use.
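With HikariCP much of that is built in: the pool checks connections with JDBC4's Connection.isValid() before handing them out, and an explicit test query is only needed when the driver does not support isValid(). As a sketch, again extending the config from the question:
// Only set this if the JDBC driver does not implement Connection.isValid();
// otherwise HikariCP's built-in validation on borrow is preferable.
config.setConnectionTestQuery("SELECT 1");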
Hope that helps.
After rebooting the server, the Oracle connection from the Tomcat server times out every night. Prior to the reboot, the connection didn't time out. Now, in the morning, the application throws a JDBC connection error while accessing the DB. Restarting Tomcat corrects the issue; I'm assuming that's because the connections get re-established. I think this is due to the Oracle DB timing out the session. How can the session timeout be disabled in Oracle 11g?
Thanks!
Steve
Config.groovy with dev and test omitted.
dataSource {
    pooled = true
}
hibernate {
    cache.use_second_level_cache = true
    cache.use_query_cache = true
    cache.provider_class = 'net.sf.ehcache.hibernate.EhCacheProvider'
}
// environment specific settings
environments {
    production {
        dataSource {
            driverClassName = "oracle.jdbc.driver.OracleDriver"
            username = "XXXXX"
            password = "XXXXXX"
            dialect = "org.hibernate.dialect.Oracle10gDialect"
            dbCreate = "update" // one of 'create', 'create-drop', 'update'
            url = "jdbc:oracle:thin:@XXXXXX:1521:xxxx"
        }
    }
}
That's generally controlled by the profile associated with the user Tomcat is connecting as.
SQL> SELECT PROFILE, LIMIT FROM DBA_PROFILES WHERE RESOURCE_NAME = 'IDLE_TIME';
PROFILE LIMIT
------------------------------ ----------------------------------------
DEFAULT UNLIMITED
SQL> SELECT PROFILE FROM DBA_USERS WHERE USERNAME = USER;
PROFILE
------------------------------
DEFAULT
So the user I'm connected to has unlimited idle time - no time out.
Adam has already suggested database profiles.
You could check the SQLNET.ORA file. There's an EXPIRE_TIME parameter but this is for detecting lost connections, rather than terminating existing ones.
Given it happens overnight, it sounds more like an idle timeout, which could be down to a firewall between the app server and the database server. Setting EXPIRE_TIME may stop that happening (as there will then be a check every 10 minutes to verify the client is alive).
Or possibly the database is being shutdown and restarted and that is killing the connections.
Alternatively, you should be able to configure Tomcat with a validationQuery so that it re-establishes the connection automatically without a Tomcat restart.
This is likely caused by your application's connection pool, not an Oracle DBMS issue. Most connection pools can run a validation statement before giving you a connection; in Oracle you would want "select 1 from dual".
The reason it started occurring after you restarted the server is that the connection pool was probably added without a restart, and you are only now experiencing the use of the connection pool for the first time. What are the modification dates on your resource files that deal with database connections?
Validate Query example:
<Resource name="jdbc/EmployeeDB" auth="Container"
validationQuery="Select 1 from dual" type="javax.sql.DataSource" username="dbusername" password="dbpassword"
driverClassName="org.hsql.jdbcDriver" url="jdbc:HypersonicSQL:database"
maxActive="8" maxIdle="4"/>
EDIT:
In the case of Grails, there are similar configuration options for the Grails pool. Here is an example for Grails 1.2 (see the Grails 1.2 release notes):
dataSource {
    pooled = true
    dbCreate = "update"
    url = "jdbc:mysql://localhost/yourDB"
    driverClassName = "com.mysql.jdbc.Driver"
    username = "yourUser"
    password = "yourPassword"
    properties {
        maxActive = 50
        maxIdle = 25
        minIdle = 5
        initialSize = 5
        minEvictableIdleTimeMillis = 60000
        timeBetweenEvictionRunsMillis = 60000
        maxWait = 10000
    }
}
I came to this question looking for a way to expire pooled Oracle sessions based on total session lifetime instead of idle time.
Another goal is to avoid forced closes that the application does not expect.
It seems this is possible by setting the pool validation query to:
select 1 from V$SESSION
where AUDSID = userenv('SESSIONID') and sysdate-LOGON_TIME < 30/24/60
This would close sessions older than 30 minutes in a predictable manner that doesn't affect the application.
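As a rough sketch of where that query would go, here it is wired into a Commons DBCP BasicDataSource; any pool whose validation treats an empty result set as a failed check should behave the same way. The driver class, URL and credentials are placeholders, and the connecting user needs SELECT privilege on V$SESSION.
import org.apache.commons.dbcp2.BasicDataSource;

public class AgeLimitedPool {
    public static BasicDataSource create() {
        BasicDataSource ds = new BasicDataSource();
        ds.setDriverClassName("oracle.jdbc.OracleDriver");
        ds.setUrl("jdbc:oracle:thin:@//db-host:1521/service_name"); // placeholder
        ds.setUsername("app_user");                                 // placeholder
        ds.setPassword("app_pass");                                 // placeholder
        ds.setTestOnBorrow(true);
        // DBCP requires the validation query to return at least one row, so a session
        // logged on more than 30 minutes ago fails the check and is quietly replaced.
        ds.setValidationQuery(
            "select 1 from V$SESSION "
          + "where AUDSID = userenv('SESSIONID') and sysdate - LOGON_TIME < 30/24/60");
        return ds;
    }
}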
Does the DB know the connection has dropped, or is the session still listed in v$session? That would indicate, I think, that it's being dropped by the network. Do you know how long it can stay idle before encountering the problem, and if that bears any resemblance to the TCP idle values (net.ipv4.tcp_keepalive_time, tcp_keepalive_probes and tcp_keepalive_interval from sysctl if I recall correctly)? Can't remember whether sysctl changes persist by default, but that might be something that was modified and then reset by the reboot.
Also you might be able to reset your JDBC connections without bouncing the whole server; certainly can in WebLogic, which I realise doesn't help much, but I'm not familiar with the Tomcat equivalents.
Check the application's connection pool settings rather than altering any session timeout settings on the Oracle DB. It's normal that connections time out.
Have a look here:
http://grails.org/doc/1.0.x/guide/3.%20Configuration.html#3.3%20The%20DataSource
Are you sure that you have set the "pooled" parameter correctly?
Greetings,
Lars
EDIT:
Your config seems ok on first glimpse.
I came across this issue today. Maybe it is related to your pain:
"Infinite loop of exceptions if the application is started when the database is down for maintenance"