Why is Hikari Datasource leaking in this situation? - spring-boot

I use the Spring Framework (with JPA). An id from the request URL is parsed and used to dynamically create and use a DataSource. The DataSource is cached with Caffeine, but it is not released from memory even after the expiration time, which causes a memory leak. After the expiration time the entry is removed from the cache, but the DataSource itself is not released from memory. The leak occurred even when I removed the cache so that each DataSource was used only once. Is there a proper way to cache a DataSource?
Below is part of the code.
private HikariDataSource getDataSourceRdsLocation(RdsLocationEntity rdsLocationEntity) {
    HikariConfig config = new HikariConfig();
    config.setAllowPoolSuspension(true);
    config.setJdbcUrl("jdbc:mysql://" + rdsLocationEntity.getServerReadOnly() + "/" + rdsLocationEntity.getDatabaseName());
    config.setUsername("");
    config.setPassword("");
    config.setPoolName(rdsLocationEntity.getName());
    config.setMaximumPoolSize(10);
    config.setMinimumIdle(1);
    config.setIdleTimeout(3600000);
    config.setMaxLifetime(7200000);
    config.setConnectionTimeout(500);
    return new HikariDataSource(config);
}
new CaffeineCache(cache.getCacheName(), Caffeine.newBuilder().recordStats()
        .removalListener((key, value, cause) -> {
            if (cause.wasEvicted() && value instanceof HikariDataSource ds) {
                ds.close();
            }
        })
        .expireAfterWrite(cache.getExpiredAfter(), cache.getTimeUnit())
        .maximumSize(cache.getMaximumSize())
        .build())

for i in {1..1000}; do jmap -histo:live pid > out3.log ; grep org.apache.http.impl.conn.PoolingHttpClientConnectionManage out3.log; date;/bin/sleep 5; done >log5.txt | tail -f log5.txt
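For reference, here is a minimal, hypothetical sketch of the caching setup described above (the class name, TTL, and loader signature are my own, not from the question). Two details matter for the leak symptom: Caffeine only evicts expired entries during maintenance work (on a cache access or an explicit cleanUp() call) unless a Scheduler is attached, and an evicted pool can still stay in memory as long as something else holds a reference to it.

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.RemovalCause;
import com.github.benmanes.caffeine.cache.Scheduler;
import com.zaxxer.hikari.HikariDataSource;

import java.time.Duration;
import java.util.function.Function;

public class DataSourceCache {

    private final Cache<String, HikariDataSource> cache = Caffeine.newBuilder()
            .expireAfterWrite(Duration.ofMinutes(30))   // example TTL, not the value from the question
            .scheduler(Scheduler.systemScheduler())     // evict expired entries promptly, without waiting for an access
            .removalListener((String key, HikariDataSource ds, RemovalCause cause) -> {
                if (ds != null) {
                    ds.close();                         // shut the pool down so its connections and threads can be reclaimed
                }
            })
            .build();

    public HikariDataSource get(String key, Function<String, HikariDataSource> loader) {
        return cache.get(key, loader);
    }
}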

Related

Getting too many connections for role when using DataSource

I have a REST service, and when it receives a request it has to do some inserts and updates on almost 25 databases. So when I tried the code below, it was working on my localhost, but when I deployed to my staging server I was getting FATAL: too many connections for role "user123"
List<String> databaseUrls = null;
databaseUrls.forEach(databaseUrl -> {
    DataSource dataSource = DataSourceBuilder.create()
            .driverClassName("org.postgresql.Driver")
            .url(databaseUrl)
            .username("user123")
            .password("some-password")
            .build();
    JdbcTemplate jdbcTemplate = new JdbcTemplate(dataSource);
    jdbcTemplate.update("Some...Update...Query");
});
As per my understanding, a DataSource need not be closed because it is never opened.
Note:
A DataSource implementation need not be closed, because it is never
“opened”. A DataSource is not a resource, is not connected to the
database, so it is not holding networking connections nor resources on
the database server. A DataSource is simply information needed when
making a connection to the database, with the database server's
network name or address, the user name, user password, and various
options you want specified when a connection is eventually made.
Can someone tell me why I am getting this issue?
The problem is in DataSourceBuilder: it actually creates one of the connection pools below, which spawns some number of connections and keeps them running:
private static final String[] DATA_SOURCE_TYPE_NAMES = new String[] {
"org.apache.tomcat.jdbc.pool.DataSource",
"com.zaxxer.hikari.HikariDataSource",
"org.apache.commons.dbcp.BasicDataSource" };
Javadoc says:
/**
* Convenience class for building a {@link DataSource} with common implementations and
* properties. If Tomcat, HikariCP or Commons DBCP are on the classpath one of them will
* be selected (in that order with Tomcat first). In the interest of a uniform interface,
* and so that there can be a fallback to an embedded database if one can be detected on
* the classpath, only a small set of common configuration properties are supported. To
* inject additional properties into the result you can downcast it, or use
* <code>@ConfigurationProperties</code>.
*/
Try using e.g. SingleConnectionDataSource instead, then your problem will be gone:
List<String> databaseUrls = null;
Class.forName("org.postgresql.Driver");
databaseUrls.forEach(databaseUrl -> {
    SingleConnectionDataSource dataSource = null;
    try {
        dataSource = new SingleConnectionDataSource(
                databaseUrl, "user123", "some-password", true /*suppressClose*/);
        JdbcTemplate jdbcTemplate = new JdbcTemplate(dataSource);
        jdbcTemplate.update("Some...Update...Query");
    } catch (Exception e) {
        log.error("Failed to run queries for {}", databaseUrl, e);
    } finally {
        // release resources
        if (dataSource != null) {
            dataSource.destroy();
        }
    }
});
First things first, it is a very bad architecture decision to have a single application managing 50 databases. Anyway, instead of creating a DataSource in a loop, you should make use of the Factory design pattern to create a DataSource for each DB. You should also add a connection pooling mechanism to your system; HikariCP and Tomcat Pool are the most widely used. Analyse the logs of the failing thread for any further issues.
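As a rough illustration of that suggestion (class name, pool size, and credentials handling are mine, not from the question), a factory could lazily create one small pooled DataSource per database and reuse it instead of building a fresh DataSource on every request:

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

import javax.sql.DataSource;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PooledDataSourceFactory {

    private final Map<String, HikariDataSource> pools = new ConcurrentHashMap<>();

    public DataSource forUrl(String jdbcUrl) {
        // create the pool for this database once and reuse it afterwards
        return pools.computeIfAbsent(jdbcUrl, url -> {
            HikariConfig config = new HikariConfig();
            config.setJdbcUrl(url);
            config.setUsername("user123");          // credentials from the question; externalize in practice
            config.setPassword("some-password");
            config.setMaximumPoolSize(2);           // keep each pool small when there are ~25 databases
            return new HikariDataSource(config);
        });
    }

    public void closeAll() {
        pools.values().forEach(HikariDataSource::close);
    }
}

With a small fixed pool per database, the total number of open connections per role has a predictable upper bound, unlike ad-hoc DataSources created on every request.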

Include newly added data sources into route Data Source object without restarting the application server

Implemented Spring's AbstractRoutingDataSource by dynamically determining the actual DataSource based on the current context.
Referred to this article: https://www.baeldung.com/spring-abstract-routing-data-source.
Here, on Spring Boot application start-up, I created a map of contexts to DataSource objects to configure our AbstractRoutingDataSource. All these client context details are fetched from a database table.
@Bean
@DependsOn("dataSource")
@Primary
public DataSource routeDataSource() {
    RoutingDataSource routeDataSource = new RoutingDataSource();
    DataSource defaultDataSource = (DataSource) applicationContext.getBean("dataSource");
    List<EstCredentials> credentials = LocalDataSourcesDetailsLoader.getAllCredentails(defaultDataSource); // fetching from database table
    localDataSourceRegistrationBean.registerDataSourceBeans(credentials);
    routeDataSource.setDefaultTargetDataSource(defaultDataSource);
    Map<Object, Object> targetDataSources = new HashMap<>();
    for (EstCredentials credential : credentials) {
        targetDataSources.put(credential.getEstCode().toString(),
                (DataSource) applicationContext.getBean(credential.getEstCode().toString()));
    }
    routeDataSource.setTargetDataSources(targetDataSources);
    return routeDataSource;
}
The problem is that if I add new client details, I cannot get them into routeDataSource. The obvious reason is that these values are set on start-up.
How can I add a new client context without having to re-initialize the routeDataSource object by hand?
I am planning to write a service that fetches all the newly added client contexts and resets the routeDataSource object, so there is no need to restart the server each time the client details change.
A simple solution to this situation is adding @RefreshScope to the bean definition:
@Bean
@Primary
@RefreshScope
public DataSource routeDataSource() {
    RoutingDataSource routeDataSource = new RoutingDataSource();
    DataSource defaultDataSource = (DataSource) applicationContext.getBean("dataSource");
    List<EstCredentials> credentials = LocalDataSourcesDetailsLoader.getAllCredentails(defaultDataSource); // fetching from database table
    localDataSourceRegistrationBean.registerDataSourceBeans(credentials);
    routeDataSource.setDefaultTargetDataSource(defaultDataSource);
    Map<Object, Object> targetDataSources = new HashMap<>();
    for (EstCredentials credential : credentials) {
        targetDataSources.put(credential.getEstCode().toString(),
                (DataSource) applicationContext.getBean(credential.getEstCode().toString()));
    }
    routeDataSource.setTargetDataSources(targetDataSources);
    return routeDataSource;
}
Add Spring Boot Actuator as a dependency:
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
Then trigger the refresh endpoint, a POST to /actuator/refresh, to update the DataSource (actually every refresh-scoped bean).
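If you prefer to trigger the refresh from code rather than curl, a minimal sketch might look like this (the URL, port, and exposure of the refresh endpoint are assumptions about your setup):

import org.springframework.web.client.RestTemplate;

public class RefreshTrigger {

    public void refreshScopedBeans() {
        // POST with an empty body to the actuator refresh endpoint
        new RestTemplate().postForEntity("http://localhost:8080/actuator/refresh", null, Void.class);
    }
}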
So this will depend on how much you know about the datasources to be added, but you could set this up as a multi-tenant project. Another example of creating new datasources:
@Autowired
private Map<String, DataSource> mars2DataSources;

public void addDataSourceAtRuntime() {
    DataSourceBuilder<?> dataSourceBuilder = DataSourceBuilder.create(
            MultiTenantJPAConfiguration.class.getClassLoader())
            .driverClassName("org.postgresql.Driver")
            .username("postgres")
            .password("postgres")
            .url("jdbc:postgresql://localhost:5412/somedb");
    mars2DataSources.put("tenantX", dataSourceBuilder.build());
}
Given that you are using Oracle, you could also use its database change notification features.
Think of it as a listener in the JDBC driver that gets notified whenever something changes in your database table. So upon receiving a change, you could reinitialize/add datasources.
You can find a tutorial on how to do this here: https://docs.oracle.com/cd/E11882_01/java.112/e16548/dbchgnf.htm#JJDBC28820
Though, depending on your organization, database notifications may need some extra firewall settings for the communication to work.
Advantage: you do not need to manually call the REST endpoint when something changes (though Marcos Barberios' answer is perfectly valid!).
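A rough sketch of that approach, loosely following the Oracle tutorial linked above (the table and column names are hypothetical, and the exact driver classes should be checked against your ojdbc version):

import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

import oracle.jdbc.OracleConnection;
import oracle.jdbc.OracleStatement;
import oracle.jdbc.dcn.DatabaseChangeEvent;
import oracle.jdbc.dcn.DatabaseChangeListener;
import oracle.jdbc.dcn.DatabaseChangeRegistration;

public class ClientTableChangeWatcher {

    // conn is an already opened OracleConnection
    public void watchClientTable(OracleConnection conn) throws Exception {
        Properties props = new Properties();
        props.setProperty(OracleConnection.DCN_NOTIFY_ROWIDS, "true");

        DatabaseChangeRegistration dcr = conn.registerDatabaseChangeNotification(props);
        dcr.addListener(new DatabaseChangeListener() {
            @Override
            public void onDatabaseChangeNotification(DatabaseChangeEvent event) {
                // hypothetical hook: re-read the credentials table and rebuild the routing DataSource
                System.out.println("Client credentials changed: " + event);
            }
        });

        // run one query with the registration attached so the table is registered for notifications
        try (Statement stmt = conn.createStatement()) {
            ((OracleStatement) stmt).setDatabaseChangeRegistration(dcr);
            try (ResultSet rs = stmt.executeQuery("SELECT est_code FROM client_credentials")) {
                while (rs.next()) {
                    // nothing to do here; iterating registers the queried table
                }
            }
        }
    }
}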

Spring Boot - Reconnect to a database after its restart

I have a Spring Batch application which runs every 10 minutes. It gets some data from a REST API and then saves that data to a database.
Well, where is my problem now?
Sometimes the database (Oracle) may restart or go offline (no idea, really), but the application does not seem to reconnect to the database. It just stays in an idle mode.
Spring Boot: 2.1.2.RELEASE
The application.yml looks like this:
app:
  database:
    jdbc-url: jdbc:oracle:thin:@<host>:<port>:<db>
    username: <username>
    password: <password>
    driver-class-name: oracle.jdbc.OracleDriver
    options:
      show-sql: true
      ddl-auto: none
      dialect: org.hibernate.dialect.Oracle12cDialect
and then, I configure the DataSource like this:
public DataSource dataSource() {
    HikariConfig configuration = new HikariConfig();
    configuration.setJdbcUrl(properties.getJdbcUrl());
    configuration.setUsername(properties.getUsername());
    configuration.setPassword(properties.getPassword());
    configuration.setDriverClassName(properties.getDriverClassName());
    configuration.setLeakDetectionThreshold(60 * 1000);
    return new HikariDataSource(configuration);
}

public LocalContainerEntityManagerFactoryBean entityManagerFactory(DataSource dataSource) {
    LocalContainerEntityManagerFactoryBean em = new LocalContainerEntityManagerFactoryBean();
    em.setDataSource(dataSource);
    em.setPackagesToScan("xxx.xxx.xx");
    JpaVendorAdapter vendorAdapter = new HibernateJpaVendorAdapter();
    em.setJpaVendorAdapter(vendorAdapter);
    Properties additionalProperties = properties();
    em.setJpaProperties(additionalProperties);
    return em;
}

public PlatformTransactionManager transactionManager(EntityManagerFactory emf) {
    return new JpaTransactionManager(emf);
}

private Properties properties() {
    Properties additionalProperties = new Properties();
    additionalProperties.setProperty("hibernate.hbm2ddl.auto", properties.getOptions().getDdlAuto());
    additionalProperties.setProperty("hibernate.dialect", properties.getOptions().getDialect());
    additionalProperties.setProperty("hibernate.show_sql", properties.getOptions().getShowSql());
    return additionalProperties;
}
To be honest, I am not really sure, if I have done anything wrong here in the configuration.
Thank you!
You should configure maxLifetime via setMaxLifetime, for example 30 minutes:
configuration.setMaxLifetime(1800000); // 30 minutes
This property controls the maximum lifetime of a connection in the pool. When a connection reaches this timeout, even if recently used, it will be retired from the pool. An in-use connection will never be retired; only when it is idle will it be removed.
We strongly recommend setting this value, and it should be at least 30 seconds less than any database or infrastructure imposed connection time limit.
By default Oracle does not enforce a maximum lifetime for connections.
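Applied to the dataSource() method from the question (reusing the same properties holder), that suggestion might look like the sketch below; the connection timeout line is an extra assumption of mine so the batch run fails fast instead of hanging while the database is unreachable:

public DataSource dataSource() {
    HikariConfig configuration = new HikariConfig();
    configuration.setJdbcUrl(properties.getJdbcUrl());
    configuration.setUsername(properties.getUsername());
    configuration.setPassword(properties.getPassword());
    configuration.setDriverClassName(properties.getDriverClassName());
    configuration.setLeakDetectionThreshold(60 * 1000);
    // TimeUnit is java.util.concurrent.TimeUnit; retire pooled connections after 30 minutes (1,800,000 ms)
    configuration.setMaxLifetime(TimeUnit.MINUTES.toMillis(30));
    // assumption: fail fast while the database is down instead of blocking callers indefinitely
    configuration.setConnectionTimeout(TimeUnit.SECONDS.toMillis(10));
    return new HikariDataSource(configuration);
}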

Spring Batch Slow Write and Read

I have a batch job that reads records from SQL Server and writes them into MariaDB. Even though I have implemented partitioning in the batch process, the process is very slow.
Below is the DataSource configuration for the source and target systems.
#Bean(name = "sourceSqlServerDataSource")
public DataSource mysqlDataSource() {
HikariDataSource hikariDataSource = new HikariDataSource();
hikariDataSource.setMaximumPoolSize(100);
hikariDataSource.setUsername(username);
hikariDataSource.setPassword(password);
hikariDataSource.setJdbcUrl(jdbcUrl);
hikariDataSource.setDriverClassName(driverClassName);
hikariDataSource.setPoolName("Source-SQL-Server");
return hikariDataSource;
}
#Bean(name = "targetMySqlDataSource")
#Primary
public DataSource mysqlDataSource() {
HikariDataSource hikariDataSource = new HikariDataSource();
hikariDataSource.setMaximumPoolSize(100);
hikariDataSource.setUsername(username);
hikariDataSource.setPassword(password);
hikariDataSource.setJdbcUrl(jdbcUrl);
hikariDataSource.setDriverClassName(driverClassName);
hikariDataSource.setPoolName("Target-Myql-Server");
return hikariDataSource;
}
Below are my bean configuration and the thread pool task executor:
#Bean(name = "myBatchJobsThreadPollTaskExecutor")
public ThreadPoolTaskExecutor initializeThreadPoolTaskExecutor() {
ThreadPoolTaskExecutor threadPoolTaskExecutor = new ThreadPoolTaskExecutor();
threadPoolTaskExecutor.setCorePoolSize(100);
threadPoolTaskExecutor.setMaxPoolSize(200);
threadPoolTaskExecutor.setThreadNamePrefix("My-Batch-Jobs-TaskExecutor ");
threadPoolTaskExecutor.setWaitForTasksToCompleteOnShutdown(Boolean.TRUE);
threadPoolTaskExecutor.initialize();
log.info("Thread Pool Initialized with min {} and Max {} Pool Size",threadPoolTaskExecutor.getCorePoolSize(),threadPoolTaskExecutor.getMaxPoolSize() );
return threadPoolTaskExecutor;
}
Here are the step and partition step configured
#Bean(name = "myMainStep")
public Step myMainStep() throws Exception{
return stepBuilderFactory.get("myMainStep").chunk(500)
.reader(myJdbcReader(null,null))
.writer(myJpaWriter()).listener(chunkListener)
.build();
}
#Bean
public Step myPartitionStep() throws Exception {
return stepBuilderFactory.get("myPartitionStep").listener(myStepListener)
.partitioner(myMainStep()).partitioner("myPartition",myPartition)
.gridSize(50).taskExecutor(asyncTaskExecutor).build();
}
Updating the post with reader and writer
#Bean(name = "myJdbcReader")
#StepScope
public JdbcPagingItemReader myJdbcReader(#Value("#{stepExecutionContext[parameter1]}") Integer parameter1, #Value("#{stepExecutionContext[parameter2]}") Integer parameter2) throws Exception{
JdbcPagingItemReader jdbcPagingItemReader = new JdbcPagingItemReader();
jdbcPagingItemReader.setDataSource(myTargetDataSource);
jdbcPagingItemReader.setPageSize(500);
jdbcPagingItemReader.setRowMapper(myRowMapper());
Map<String,Object> paramaterMap=new HashMap<>();
paramaterMap.put("parameter1",parameter1);
paramaterMap.put("parameter2",parameter2);
jdbcPagingItemReader.setQueryProvider(myQueryProvider());
jdbcPagingItemReader.setParameterValues(paramaterMap);
return jdbcPagingItemReader;
}
#Bean(name = "myJpaWriter")
public ItemWriter myJpaWriter(){
JpaItemWriter<MyTargetTable> targetJpaWriter = new JpaItemWriter<>();
targetJpaWriter.setEntityManagerFactory(localContainerEntityManagerFactoryBean.getObject());
return targetJpaWriter;
}
Can someone shed light on how to increase the read/write performance using Spring Batch?
Improving the performance of such an application depends on multiple parameters (grid size, chunk size, page size, thread pool size, db connection pool size, latency between the db servers and your JVM, etc). So I can't give you a precise answer to your question, but I will try to provide some guidelines:
Before starting to improve performance, you need to clearly define a baseline + target. Saying "it is slow" makes no sense. Get yourself ready with at least a JVM profiler and a SQL client with a query execution plan analyser. Those are required to find the performance bottleneck, either in your JVM or in your database.
Setting the grid size to 50 and using a thread pool with core size = 100 means 50 threads will be created but not used. Make sure you are using the thread pool task executor in .taskExecutor(asyncTaskExecutor) and not a SimpleAsyncTaskExecutor, which does not reuse threads.
50 partitions for 250k records seems a lot to me. You will have 5000 records per partition, and each partition will yield 10 transactions (since chunkSize = 500). So you will have 10 transactions x 50 partitions = 500 transactions between the two database servers and your JVM. This can be a performance issue. I would recommend starting with fewer partitions, 5 or 10 for example. Increasing concurrency does not necessarily mean increasing performance. There is always a break-even point where your app will spend more time on context switching and dealing with concurrency than doing its business logic. Finding that point is an empirical process.
I would run any SQL query outside of any Spring Batch job first to see if there is a performance issue with the query itself (query grabbing too many columns, too many records, etc) or with the db schema (a missing index, for example).
I would not use JPA/Hibernate for such an ETL job. Mapping data to domain objects can be expensive, especially if the O/R mapping is not optimized. Raw JDBC is usually faster in these cases.
There are a lot of other tricks, like estimating an item's size in memory and making sure the total chunk size in memory is < heap size to avoid unnecessary GC within a chunk, choosing the right GC algorithm for batch apps, etc, but those are somewhat advanced. The list of guidelines above is a good starting point IMO.
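To illustrate the JPA-to-JDBC point above, here is a minimal sketch of a JDBC batch writer that could replace myJpaWriter (the table and column names are hypothetical, and the named parameters are assumed to match MyTargetTable getters):

import javax.sql.DataSource;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;

public JdbcBatchItemWriter<MyTargetTable> myJdbcWriter(DataSource targetMySqlDataSource) {
    return new JdbcBatchItemWriterBuilder<MyTargetTable>()
            .dataSource(targetMySqlDataSource)
            // hypothetical table/columns; :colA and :colB are bound from MyTargetTable getters
            .sql("INSERT INTO my_target_table (col_a, col_b) VALUES (:colA, :colB)")
            .beanMapped()
            .build();
}

This writes each chunk as a single JDBC batch statement against the target pool, skipping the entity mapping and persistence context overhead of JpaItemWriter.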
Hope this helps!

jdbc connection pool using ThreadpoolExecutor in spring boot

I have an application that iterates through multiple databases, and for each database runs a select query on all tables and dumps the results to Hadoop.
My design is to create one datasource connection at a time and use the connection pool obtained to run select queries in multiple threads. Once done with this datasource, close it and create a new one.
Here is the Async code
@Component
public class MySampleService {

    private final static Logger LOGGER = Logger.getLogger(MySampleService.class);

    @Async
    public Future<String> callAsync(JdbcTemplate template, String query) throws InterruptedException {
        try {
            List<Map<String, Object>> rows = template.queryForList(query);
            // process the results
            return new AsyncResult<String>("success");
        } catch (Exception ex) {
            return new AsyncResult<String>("failed");
        }
    }
}
Here is the caller
public String taskExecutor() throws InterruptedException, ExecutionException {
    Future<String> asyncResult1 = mySampleService.callAsync(jdbcTemplate, query1);
    Future<String> asyncResult2 = mySampleService.callAsync(jdbcTemplate, query2);
    Future<String> asyncResult3 = mySampleService.callAsync(jdbcTemplate, query3);
    Future<String> asyncResult4 = mySampleService.callAsync(jdbcTemplate, query4);
    LOGGER.info(asyncResult1.get());
    LOGGER.info(asyncResult2.get());
    LOGGER.info(asyncResult3.get());
    LOGGER.info(asyncResult4.get());
    // now all threads have finished, close the pool behind this datasource
    // (assumes a closeable pool implementation such as HikariDataSource)
    if (jdbcTemplate.getDataSource() instanceof HikariDataSource hikariDataSource) {
        hikariDataSource.close();
    }
    return "done";
}
I am wondering if this is the right way to do it, or whether there is an existing/optimized out-of-the-box solution that I am missing. I can't use spring-data-jpa since my queries are complex.
Thanks
Spring Boot docs:
Production database connections can also be auto-configured using a
pooling DataSource. Here’s the algorithm for choosing a specific
implementation:
We prefer the Tomcat pooling DataSource for its performance and concurrency, so if that is available we always choose it.
Otherwise, if HikariCP is available we will use it.
If neither the Tomcat pooling datasource nor HikariCP are available and if Commons DBCP is available we will use it, but we
don’t recommend it in production.
Lastly, if Commons DBCP2 is available we will use it.
If you use the spring-boot-starter-jdbc or
spring-boot-starter-data-jpa ‘starters’ you will automatically get a
dependency to tomcat-jdbc.
So you should be provided with sensible defaults.
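As a small sketch of what that means in practice (the class name is illustrative): if you let the starter auto-configure the pooled DataSource, you can inject a JdbcTemplate backed by it and never close connections yourself; each query borrows a connection from the pool and returns it automatically.

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Component;

@Component
public class TableDumper {

    // backed by the auto-configured, pooled DataSource
    private final JdbcTemplate jdbcTemplate;

    public TableDumper(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    public int rowCount(String table) {
        // borrows a pooled connection for the query and returns it when done;
        // the table name comes from trusted metadata here, not user input
        return jdbcTemplate.queryForObject("SELECT COUNT(*) FROM " + table, Integer.class);
    }
}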
