How to increase the performance of inserting data into the database? - performance

I use PostgreSQL 9.5 (with the newest JDBC driver, 9.4.1209), JPA 2.1 (Hibernate), EJB 3.2, CDI, JSF 2.2 and WildFly 10. I have to insert a lot of data into the database (about 1 million to 170 million entities). The number of entities depends on the file the user uploads via the form on the page.
What is the problem?
The problem is that inserting the data into the database is very slow, and the execution time grows with every call to the flush() method. I added a println(...) call to see how fast each flush executes. For the first ~4 batches (400,000 entities), I get the println(...) output every ~20 s. After that, each flush becomes incredibly slow, and the time keeps growing.
Of course, if I remove the flush() and clear() calls, I get the println(...) output every second, BUT once I approach 3 million entities I also get the exception:
java.lang.OutOfMemoryError: GC overhead limit exceeded
What have I done so far?
I've tried both Container-Managed Transactions and Bean-Managed Transactions (see the code below).
I don't use an auto_increment feature for the PK ID; I assign the IDs manually in the bean code.
I've also tried changing the number of entities per flush (currently 100,000).
I've tried making the flush interval equal to the hibernate.jdbc.batch_size property. It didn't help; the execution time was even slower.
I've experimented with the properties in the persistence.xml file. For example, I added the reWriteBatchedInserts property, but honestly I don't know whether it has any effect.
PostgreSQL itself runs on an SSD, but the data are stored on an HDD because they could get too big in the future. I've also tried moving my PostgreSQL data to the SSD, and the result is the same; nothing changed.
The question is: how can I increase the performance of inserting data into the database?
Here's the structure of my table:
column_name | udt_name | length | is_nullable | key
---------------+-------------+--------+-------------+--------
id | int8 | | NO | PK
id_user_table | int4 | | NO | FK
starttime | timestamptz | | NO |
time | float8 | | NO |
sip | varchar | 100 | NO |
dip | varchar | 100 | NO |
sport | int4 | | YES |
dport | int4 | | YES |
proto | varchar | 50 | NO |
totbytes | int8 | | YES |
info | text | | YES |
label | varchar | 10 | NO |
Here's part of the EJB bean (first version) where I insert the data into the database:
@Stateless
public class DataDaoImpl extends GenericDaoImpl<Data> implements DataDao {
/**
* This is the first method which is executed.
* The CDI bean (controller) calls this method.
* @param list - data from the file.
* @param idFK - foreign key.
*/
public void send(List<String> list, int idFK) {
if(handleCSV(list,idFK)){
//...
}
else{
//...
}
}
/**
* The method inserts data into the database.
*/
@TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
private boolean handleCSV(List<String> list, int idFK){
try{
long start=0;
Pattern patternRow=Pattern.compile(",");
for (String s : list) {
if(start!=0){
String[] data=patternRow.split(s);
//Preparing data...
DataStoreAll dataStore=new DataStoreAll();
DataStoreAllId dataId=new DataStoreAllId(start++, idFK);
dataStore.setId(dataId);
//Setting the other object fields...
entityManager.persist(dataStore);
if(start%100000==0){
System.out.println("Number of entities: "+start);
entityManager.flush();
entityManager.clear();
}
}
else start++;
}
} catch(Throwable t){
CustomExceptionHandler exception=new CustomExceptionHandler(t);
return exception.persist("DDI", "handleCSV");
}
return true;
}
@Inject
private EntityManager entityManager;
}
Instead of using Container-Managed Transactions, I've also tried Bean-Managed Transactions (second version):
@Stateless
@TransactionManagement(TransactionManagementType.BEAN)
public class DataDaoImpl extends GenericDaoImpl<Data> {
/**
* This is the first method which is executed.
* The CDI bean (controller) calls this method.
* @param list - data from the file.
* @param idFK - foreign key.
*/
public void send(List<String> list, int idFK) {
if(handleCSV(list,idFK)){
//...
}
else{
//...
}
}
/**
* The method inserts data into the linkedList collection.
*/
private boolean handleCSV(List<String> list, int idFK){
try{
long start=0;
Pattern patternRow=Pattern.compile(",");
List<DataStoreAll> entitiesAll=new LinkedList<>();
for (String s : list) {
if(start!=0){
String[] data=patternRow.split(s);
//Preparing data...
DataStoreAll dataStore=new DataStoreAll();
DataStoreAllId dataId=new DataStoreAllId(start++, idFK);
dataStore.setId(dataId);
//Setting the other object fields...
entitiesAll.add(dataStore);
if(start%100000==0){
System.out.println("Number of entities: "+start);
saveDataStoreAll(entitiesAll);
}
}
else start++;
}
} catch(Throwable t){
CustomExceptionHandler exception=new CustomExceptionHandler(t);
return exception.persist("DDI", "handleCSV");
}
return true;
}
/**
* The method commits the transaction.
*/
private void saveDataStoreAll(List<DataStoreAll> entities) throws EntityExistsException,IllegalArgumentException,TransactionRequiredException,PersistenceException,Throwable {
Iterator<DataStoreAll> iter=entities.iterator();
ut.begin();
while(iter.hasNext()){
entityManager.persist(iter.next());
iter.remove();
entityManager.flush();
entityManager.clear();
}
ut.commit();
}
@Inject
private EntityManager entityManager;
@Inject
private UserTransaction ut;
}
Here's my persistence.xml:
<?xml version="1.0" encoding="UTF-8"?>
<persistence version="2.1"
xmlns="http://xmlns.jcp.org/xml/ns/persistence" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="
http://xmlns.jcp.org/xml/ns/persistence
http://xmlns.jcp.org/xml/ns/persistence/persistence_2_1.xsd">
<persistence-unit name="primary">
<jta-data-source>java:/PostgresDS</jta-data-source>
<properties>
<property name="hibernate.show_sql" value="false" />
<property name="hibernate.jdbc.batch_size" value="50" />
<property name="hibernate.order_inserts" value="true" />
<property name="hibernate.order_updates" value="true" />
<property name="hibernate.jdbc.batch_versioned_data" value="true"/>
<property name="reWriteBatchedInserts" value="true"/>
</properties>
</persistence-unit>
</persistence>
If I forgot to add something, tell me about it and I'll update my post.
Update
Here's the controller which calls DataDaoImpl#send(...):
@Named
@ViewScoped
public class DataController implements Serializable {
@PostConstruct
private void init(){
//...
}
/**
* Handle of the uploaded file.
*/
public void handleFileUpload(FileUploadEvent event){
uploadFile=event.getFile();
try(InputStream input = uploadFile.getInputstream()){
Path folder=Paths.get(System.getProperty("jboss.server.data.dir"),"upload");
if(!folder.toFile().exists()){
if(!folder.toFile().mkdirs()){
folder=Paths.get(System.getProperty("jboss.server.data.dir"));
}
}
String filename = FilenameUtils.getBaseName(uploadFile.getFileName());
String extension = FilenameUtils.getExtension(uploadFile.getFileName());
filePath = Files.createTempFile(folder, filename + "-", "." + extension);
//Save the file on the server.
Files.copy(input, filePath, StandardCopyOption.REPLACE_EXISTING);
//Add reference to the unconfirmed uploaded files list.
userFileManager.addUnconfirmedUploadedFile(filePath.toFile());
FacesContext.getCurrentInstance().addMessage(null, new FacesMessage(FacesMessage.SEVERITY_INFO, "Success", uploadFile.getFileName() + " was uploaded."));
} catch (IOException e) {
//...
}
}
/**
* Sending data from file to the database.
*/
public void send(){
//int idFK=...
//The model includes the data from the file and other things which I transfer to the EJB bean.
AddDataModel addDataModel=new AddDataModel();
//Setting the addDataModel fields...
try{
if(uploadFile!=null){
//Each row of the file == 1 entity.
List<String> list=new ArrayList<String>();
Stream<String> stream=Files.lines(filePath);
list=stream.collect(Collectors.toList());
addDataModel.setList(list);
}
} catch (IOException e) {
//...
}
//Sending data to the DataDaoImpl EJB bean.
if(dataDao.send(addDataModel,idFK)){
userFileManager.confirmUploadedFile(filePath.toFile());
FacesContext.getCurrentInstance().addMessage(null, new FacesMessage(FacesMessage.SEVERITY_INFO, "The data was saved in the database.", ""));
}
}
private static final long serialVersionUID = -7202741739427929050L;
@Inject
private DataDao dataDao;
private UserFileManager userFileManager;
private UploadedFile uploadFile;
private Path filePath;
}
Update 2
Here's the updated EJB bean where I insert the data into the database:
@Stateless
@TransactionManagement(TransactionManagementType.BEAN)
public class DataDaoImpl extends GenericDaoImpl<Data> {
/**
* This is the first method which is executed.
* The CDI bean (controller) calls this method.
* @param addDataModel - object which includes the path to the uploaded file and other things which are needed.
*/
public void send(AddDataModel addDataModel){
if(handleCSV(addDataModel)){
//...
}
else{
//...
}
}
/**
* The method inserts data into the database.
*/
private boolean handleCSV(AddDataModel addDataModel){
PreparedStatement ps=null;
Connection con=null;
FileInputStream fileInputStream=null;
Scanner scanner=null;
try{
con=ds.getConnection();
con.setAutoCommit(false);
ps=con.prepareStatement("insert into data_store_all "
+ "(id,id_user_table,startTime,time,sIP,dIP,sPort,dPort,proto,totBytes,info) "
+ "values(?,?,?,?,?,?,?,?,?,?,?)");
long start=0;
fileInputStream=new FileInputStream(addDataModel.getPath().toFile());
scanner=new Scanner(fileInputStream, "UTF-8");
Pattern patternRow=Pattern.compile(",");
Pattern patternPort=Pattern.compile("\\d+");
while(scanner.hasNextLine()) {
if(start!=0){
//Loading a row from the file into table.
String[] data=patternRow.split(scanner.nextLine().replaceAll("[\"]",""));
//Preparing datetime.
SimpleDateFormat simpleDateFormat=new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
GregorianCalendar calendar=new GregorianCalendar();
calendar.setTime(simpleDateFormat.parse(data[1]));
calendar.set(Calendar.MILLISECOND, Integer.parseInt(Pattern.compile("\\.").split(data[1])[1])/1000);
//Preparing an entity
ps.setLong(1, start++); //id PK
ps.setInt(2, addDataModel.getIdFk()); //id FK
ps.setTimestamp(3, new Timestamp(calendar.getTime().getTime())); //datetime
ps.setDouble(4, Double.parseDouble(data[2])); //time
ps.setString(5, data[3]); //sip
ps.setString(6, data[4]); //dip
if(!data[5].equals("") && patternPort.matcher(data[5]).matches()) ps.setInt(7, Integer.parseInt(data[5])); //sport
else ps.setNull(7, java.sql.Types.INTEGER);
if(!data[6].equals("") && patternPort.matcher(data[6]).matches()) ps.setInt(8, Integer.parseInt(data[6])); //dport
else ps.setNull(8, java.sql.Types.INTEGER);
ps.setString(9, data[7]); //proto
if(!data[8].trim().equals("")) ps.setLong(10, Long.parseLong(data[8])); //len
else ps.setObject(10, null);
if(data.length==10 && !data[9].trim().equals("")) ps.setString(11, data[9]); //info
else ps.setString(11, null);
ps.addBatch();
if(start%100000==0){
System.out.println("Number of entity: "+start);
ps.executeBatch();
ps.clearParameters();
ps.clearBatch();
con.commit();
}
}
else{
start++;
scanner.nextLine();
}
}
if (scanner.ioException() != null) throw scanner.ioException();
} catch(Throwable t){
CustomExceptionHandler exception=new CustomExceptionHandler(t);
return exception.persist("DDI", "handleCSV");
} finally{
if (fileInputStream!=null)
try {
fileInputStream.close();
} catch (Throwable t2) {
CustomExceptionHandler exception=new CustomExceptionHandler(t2);
return exception.persist("DDI", "handleCSV.Finally");
}
if (scanner != null) scanner.close();
}
return true;
}
@Inject
private EntityManager entityManager;
@Resource(mappedName="java:/PostgresDS")
private DataSource ds;
}

Your problem is not necessarily the database or even Hibernate, but that you are loading far too much data into memory at once. That's why you get the out-of-memory error and why you see the JVM struggling on the way there.
You read the file from a stream, but then pull it all into memory when you create the list of strings. Then you map that list of strings into a linked list of entities!
Instead, use the stream to process your file in small chunks and insert each chunk into your database. A Scanner-based approach would look something like this:
FileInputStream inputStream = null;
Scanner sc = null;
try {
inputStream = new FileInputStream(path);
sc = new Scanner(inputStream, "UTF-8");
while (sc.hasNextLine()) {
String line = sc.nextLine();
// Talk to your database here!
}
// note that Scanner suppresses exceptions
if (sc.ioException() != null) {
throw sc.ioException();
}
} finally {
if (inputStream != null) {
inputStream.close();
}
if (sc != null) {
sc.close();
}
}
You'll probably find the Hibernate/EJB approach works well enough after you make this change, but I think you'll find plain JDBC to be significantly faster. People report a 3x to 4x speed-up, depending on the workload, and that would make a big difference with this much data.
If you are talking about truly huge amounts of data, then you should look into CopyManager, which lets you stream data directly into the database. You can use the streaming APIs to transform the data as it goes by.
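For illustration, here is a minimal sketch of streaming a CSV file into PostgreSQL's COPY command through the JDBC driver's CopyManager. The table and column names come from the question; the DataSource handling, the file path and the assumption that the CSV supplies every column in this order are mine, and the pooled connection must be unwrappable to the underlying PostgreSQL connection:

import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.Connection;
import javax.sql.DataSource;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class CopyLoader {

    /** Streams a CSV file straight into data_store_all using COPY and returns the row count. */
    public long load(DataSource ds, String csvPath) throws Exception {
        try (Connection con = ds.getConnection();
             BufferedReader reader = new BufferedReader(new FileReader(csvPath))) {
            // Unwrap the application-server wrapper to reach the PostgreSQL connection.
            CopyManager copyManager = con.unwrap(PGConnection.class).getCopyAPI();
            // COPY parses the CSV on the server side; no JDBC batching is involved.
            return copyManager.copyIn(
                "COPY data_store_all (id, id_user_table, starttime, time, sip, dip, "
                    + "sport, dport, proto, totbytes, info, label) FROM STDIN WITH (FORMAT csv)",
                reader);
        }
    }
}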

As you are using WildFly 10, you are in a Java EE 7 environment.
Therefore you should consider using JSR-352 Batch Processing for performing your file import.
Have a look at An Overview of Batch Processing in Java EE 7.0.
This should resolve all your memory consumption and transaction issues.
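To make that concrete, here is a rough sketch of a chunk-oriented import step using the JSR-352 API. The class names, the hard-coded file path and the idea of delegating the JDBC batching to the writer are placeholders of mine, not code from the question:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.Serializable;
import java.util.List;
import javax.batch.api.chunk.AbstractItemReader;
import javax.batch.api.chunk.AbstractItemWriter;
import javax.inject.Named;

// CsvLineReader.java - reads the CSV one line at a time, so only the current chunk is ever in memory.
@Named
public class CsvLineReader extends AbstractItemReader {

    private BufferedReader reader;

    @Override
    public void open(Serializable checkpoint) throws Exception {
        reader = new BufferedReader(new FileReader("/path/to/upload.csv")); // placeholder path
    }

    @Override
    public Object readItem() throws Exception {
        return reader.readLine(); // returning null ends the step
    }

    @Override
    public void close() throws Exception {
        reader.close();
    }
}

// CsvLineWriter.java - receives one chunk per transaction; the chunk size comes from item-count in the job XML.
@Named
public class CsvLineWriter extends AbstractItemWriter {

    @Override
    public void writeItems(List<Object> items) throws Exception {
        // Parse each line and add it to a PreparedStatement batch, as in "Update 2" of the question.
    }
}

The job itself would be described in META-INF/batch-jobs/, where a chunk step references these two beans, and it can be started with BatchRuntime.getJobOperator().start(...); the batch runtime then handles checkpointing and per-chunk transactions for you.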

Related

Order of processing REST API calls

I have a strange (for me) question to ask. I have created a synchronized service which is called by a controller:
@Controller
public class WebAppApiController {
private final WebAppService webApService;
@Autowired
WebAppApiController(WebAppService webApService){
this.webApService= webApService;
}
@Transactional
@PreAuthorize("hasAuthority('ROLE_API')")
@PostMapping(value = "/api/webapp/{projectId}")
public ResponseEntity<Status> getWebApp(@PathVariable(value = "projectId") Long id, @RequestBody WebAppRequestModel req) {
return webApService.processWebAppRequest(id, req);
}
}
The service layer just checks that the request is not a duplicate and stores it in the database. Because the client using this endpoint makes MANY requests continuously, it happened that before one request was validated against duplicates, another identical one was put into the database; that is why I am trying to use a synchronized block.
@Service
public class WebAppService {
private final static String UUID_PATTERN_TO = "[a-zA-Z0-9]{8}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{12}";
private final WebAppRepository waRepository;
@Autowired
public WebAppService(WebAppRepository waRepository){
this.waRepository= waRepository;
}
@Transactional(rollbackOn = Exception.class)
public ResponseEntity<Status> processScanWebAppRequest(Long id, WebAppScanModel webAppScanModel){
try{
synchronized (this){
Optional<WebApp> existing=verifyForDuplicates(webAppScanModel);
if(!existing.isPresent()){
WebApp webApp=new WebApp(webAppScanModel.getUrl());
webApp=waRepository.save(webApp);
processpropertiesOfWebApp(webApp);
return new ResponseEntity<>(HttpStatus.CREATED);
}
return new ResponseEntity<>(HttpStatus.CONFLICT);
}
} catch (NonUniqueResultException ex){
return new ResponseEntity<>(HttpStatus.PRECONDITION_FAILED);
} catch (IncorrectResultSizeDataAccessException ex){
return new ResponseEntity<>(HttpStatus.PRECONDITION_FAILED);
}
}
Optional<WebApp> verifyForDuplicates(WebAppScanModel webAppScanModel){
return waRepository.getWebAppByRegex(webAppScanModel.getUrl().replaceAll(UUID_PATTERN_TO,UUID_PATTERN_TO)+"$");
}
}
And JPA method:
@Query(value="select * from webapp wa where wa.url ~ :url", nativeQuery = true)
Optional<WebApp> getWebAppByRegex(@Param("url") String url);
processpropertiesOfWebApp method is doing further processing for given webapp which at this point should be unique.
Intended behaviour is:
when client post request contains multiple urls like:
https://testdomain.com/user/7e1c44e4-821b-4d05-bdc3-ebd43dfeae5f
https://testdomain.com/user/d398316e-fd60-45a3-b036-6d55049b44d8
https://testdomain.com/user/c604b551-101f-44c4-9eeb-d9adca2b2fe9
Only the first one should be stored in the database, but at the moment that is not what is happening. A select from my database:
select inserted,url from webapp where url ~ 'https://testdomain.com/users/[a-zA-Z0-9]{8}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{12}$';
2019-11-07 08:53:05 | https://testdomain.com/users/d398316e-fd60-45a3-b036-6d55049b44d8
2019-11-07 08:53:05 | https://testdomain.com/users/d398316e-fd60-45a3-b036-6d55049b44d8
2019-11-07 08:53:05 | https://testdomain.com/users/d398316e-fd60-45a3-b036-6d55049b44d8
(3 rows)
I will try to add a unique constraint on the url column, but I can't imagine that will solve the problem, because when the UUID changes each new url will be unique anyway.
Could anyone give me a hint about what I am doing wrong?
The question is related to one I asked before, but I didn't find a proper solution there, so I have simplified my method - still no success.

IllegalStateException: Cannot convert value of type 'java.sql.Timestamp' to required type 'java.time.LocalDateTime' for property

I'm working on a Spring Boot/JPA/MySQL project. So far everything has worked with date/time objects when fetching/storing entities through the repository.
The problem now occurs when I use the JdbcTemplate to execute a custom SQL query.
org.springframework.beans.ConversionNotSupportedException: Failed to convert property
value of type 'java.sql.Timestamp' to required type java.time.LocalDateTime' for
property 'from_time': no matching editors or conversion strategy found
The idea is to fetch time slots (each has a start time and a duration in minutes) that overlap with a new incoming entry.
To get back my objects I was first using a BeanPropertyMapper and then switched to a custom NestedRowMapper.
The resulting conflicting time slots I want to get look like this:
{
id: 1
comment: "i worked 60minutes"
from_time: "2018-06-16 13:00"
duration_minutes: 60
task: {
name: "My task"
...
}
}
This is the method where I run into the issue:
public List<TimeSlot> getOverlappingEntries(TimeSlot timeslot) throws SQLException {
String sql = "SELECT time_slot.comment, time_slot.from_time,"
+ "DATE_ADD(from_time, INTERVAL duration_minutes MINUTE) AS end_time, "
+ " task.name as `task.name`, task.category as `task.category` "
+ " FROM `time_slot` " + " INNER JOIN task on task.id = time_slot.task_id "
+ " WHERE person_id = ? "
+ " HAVING ? < end_time AND DATE_ADD(? ,INTERVAL ? MINUTE) > from_time;";
PreparedStatementCreator prepared = (con) -> {
PreparedStatement prep = con.prepareStatement(sql);
prep.setObject(1, timeslot.person.id);
prep.setObject(2, timeslot.from_time);
prep.setObject(3, timeslot.from_time);
prep.setObject(4, timeslot.durationMinutes);
logger.info(prep.toString());
return prep;
};
return this.connector.query(prepared, NestedRowMapper.get(TimeSlot.class));
}
Now I would imagine Spring is capable of converting those objects easily, and there is the simple timestamp.toLocalDateTime() call to do so. The problem seems to be more about how to register this as a converter service, or how to fix the Spring Boot configuration so it is picked up.
I already tried a custom converter service, but that didn't help:
@javax.persistence.Converter
public class SqlTimestampToLocalDateTimeConverter implements Converter<Timestamp,
LocalDateTime>, AttributeConverter<Timestamp, LocalDateTime> {
@Convert
@Override
public LocalDateTime convert(Timestamp source) {
return source.toLocalDateTime();
}
@Override
public LocalDateTime convertToDatabaseColumn(Timestamp attribute) {
return attribute.toLocalDateTime();
}
@Override
public Timestamp convertToEntityAttribute(LocalDateTime dbData) {
return Timestamp.valueOf(dbData);
}
}
Also many other answers on the internet mentioned that this was already implemented with spring framework 4.x.
The dependencies in the project look like this (build.gradle):
dependencies {
compile "org.springframework.boot:spring-boot-starter-thymeleaf:2.0.2.RELEASE"
compile "org.springframework.boot:spring-boot-starter-web:2.0.2.RELEASE"
compile "org.springframework.boot:spring-boot-starter-security:2.0.2.RELEASE"
compile "org.springframework.boot:spring-boot-starter-data-jpa:2.0.2.RELEASE"
compile "mysql:mysql-connector-java:5.1.46"
compileOnly "org.springframework.boot:spring-boot-devtools:2.0.2.RELEASE"
compile 'org.springframework.data:spring-data-rest-webmvc:3.0.7.RELEASE'
compile 'com.querydsl:querydsl-jpa:4.1.4'
compile 'com.querydsl:querydsl-apt:4.1.4:jpa'
testCompile("junit:junit")
testCompile("org.springframework.boot:spring-boot-starter-test")
testCompile("org.springframework.security:spring-security-test")
}
Thank you for any hints, how to solve this!
/edit:
I think I see a possible workaround now. I could just fetch the ids of all time slots and then use the repository to fetch the actual objects with their data (including their task objects).
But that feels definitely not like the optimal solution...
This is the NestedRowMapper I use:
import org.springframework.beans.*;
import org.springframework.jdbc.core.RowMapper;
import org.springframework.jdbc.support.JdbcUtils;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
public class NestedRowMapper<T> implements RowMapper<T> {
private Class<T> mappedClass;
public static <T> NestedRowMapper<T> get(Class<T> mappedClass) {
return new NestedRowMapper<>(mappedClass);
}
public NestedRowMapper(Class<T> mappedClass) {
this.mappedClass = mappedClass;
}
@Override
public T mapRow(ResultSet rs, int rowNum) throws SQLException {
try {
T mappedObject = this.mappedClass.newInstance();
BeanWrapper bw = PropertyAccessorFactory.forBeanPropertyAccess(mappedObject);
bw.setAutoGrowNestedPaths(true);
ResultSetMetaData meta_data = rs.getMetaData();
int columnCount = meta_data.getColumnCount();
for (int index = 1; index <= columnCount; index++) {
try {
String column = JdbcUtils.lookupColumnName(meta_data, index);
Object value = JdbcUtils.getResultSetValue(rs, index, Class.forName(meta_data
.getColumnClassName(index)));
bw.setPropertyValue(column, value);
} catch (TypeMismatchException | NotWritablePropertyException
| ClassNotFoundException e) {
e.printStackTrace();
}
}
return mappedObject;
} catch (InstantiationException | IllegalAccessException e1) {
throw new RuntimeException(e1);
}
}
}
You're on the right lines that you can define a RowMapper that tells your app what type of object each column needs to be mapped to. I would recommend trying to use JdbcTemplate.query method: https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/jdbc/core/JdbcTemplate.html#query-java.lang.String-java.lang.Object:A-org.springframework.jdbc.core.RowMapper-
You will need to define a RowMapper (not necessarily a NestedRowMapper, you could try ParameterizedRowMapper), then pass that into query with your SQL and WHERE conditions mapped as args.
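For instance, here is a hedged sketch of that call shape, reusing the SQL string and bind parameters already shown in the question (the inline lambda RowMapper and the fields it populates are illustrative, not the poster's mapping code):

// Maps each row by hand and converts the SQL timestamp before populating the bean.
List<TimeSlot> slots = this.connector.query(
    sql,
    (rs, rowNum) -> {
        TimeSlot slot = new TimeSlot();
        slot.comment = rs.getString("comment");
        slot.from_time = rs.getTimestamp("from_time").toLocalDateTime();
        return slot;
    },
    timeslot.person.id, timeslot.from_time, timeslot.from_time, timeslot.durationMinutes);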
I think the best way is to use BeanPropertyRowMapper.newInstance(TimeSlot.class) in your getOverlappingEntries method.
Try this in NestedRowMapper.mapRow:
if (value instanceof Timestamp) value = ((Timestamp) value).toLocalDateTime();
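In context, that check sits inside the existing column loop of mapRow from the question, just before the value is handed to the BeanWrapper (a sketch of the placement, assuming java.sql.Timestamp is imported):

String column = JdbcUtils.lookupColumnName(meta_data, index);
Object value = JdbcUtils.getResultSetValue(rs, index, Class.forName(meta_data.getColumnClassName(index)));
// Convert JDBC timestamps up front so the BeanWrapper never needs its own
// Timestamp -> LocalDateTime conversion strategy.
if (value instanceof Timestamp) {
    value = ((Timestamp) value).toLocalDateTime();
}
bw.setPropertyValue(column, value);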

java application crashed by suspicious jdbc memory leak

I have been working on a Java application which crawls pages from the Internet with HttpClient (version 4.3.3). It uses one fixedThreadPool with 5 threads, each running a loop. The pseudocode is as follows.
public class Spiderling implements Runnable {
@Override
public void run() {
while (true) {
T task = null;
try {
task = scheduler.poll();
if (task != null) {
if Ehcache contains task's config
taskConfig = Ehcache.getConfig;
else{
taskConfig = Query task config from db;//close the conn every time
put taskConfig into Ehcache
}
spider(task,taskConfig);
}
} catch (Exception e) {
e.printStackTrace();
}
}
LOG.error("spiderling is DEAD");
}
}
I am running it with the following arguments: -Duser.timezone=GMT+8 -server -Xms1536m -Xmx1536m -Xloggc:/home/datalord/logs/gc-2016-07-23-10-28-24.log -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintHeapAtGC on a server (2 CPUs, 2 GB memory), and it crashes pretty regularly, about once every two or three days, with no OutOfMemoryError and no JVM error log.
Here is my analysis;
I analysed the GC log with GC-EASY; the report is here. The weird thing is that the Old Gen grows slowly until it reaches the allocated max heap size, but a Full GC never happens, not even once.
I suspected a memory leak, so I dumped the heap with jmap -dump:format=b,file=soldier.bin and analysed the dump file with Eclipse MAT. Here is the problem suspect, an object which occupies 280+ MB.
The class "com.mysql.jdbc.NonRegisteringDriver",
loaded by "sun.misc.Launcher$AppClassLoader # 0xa0018490", occupies 281,118,144
(68.91%) bytes. The memory is accumulated in one instance of
"java.util.concurrent.ConcurrentHashMap$Segment[]" loaded by "".
Keywords
com.mysql.jdbc.NonRegisteringDriver
java.util.concurrent.ConcurrentHashMap$Segment[]
sun.misc.Launcher$AppClassLoader # 0xa0018490.
I use c3p0-0.9.1.2 as the MySQL connection pool, mysql-connector-java-5.1.34 as the JDBC connector and Ehcache-2.6.10 as the memory cache. I have read all the posts about the 'com.mysql.jdbc.NonRegisteringDriver memory leak' and still have no clue.
This problem has driven me crazy for several days, any advice or help will be appreciated!
**********************Supplementary description on 07-24****************
I use a Java web + ORM framework called JFinal (github.com/jfinal/jfinal), which is open source on GitHub.
Here are some core code for further description about the problem.
/**
* CacheKit. Useful tool box for EhCache.
*
*/
public class CacheKit {
private static CacheManager cacheManager;
private static final Logger log = Logger.getLogger(CacheKit.class);
static void init(CacheManager cacheManager) {
CacheKit.cacheManager = cacheManager;
}
public static CacheManager getCacheManager() {
return cacheManager;
}
static Cache getOrAddCache(String cacheName) {
Cache cache = cacheManager.getCache(cacheName);
if (cache == null) {
synchronized(cacheManager) {
cache = cacheManager.getCache(cacheName);
if (cache == null) {
log.warn("Could not find cache config [" + cacheName + "], using default.");
cacheManager.addCacheIfAbsent(cacheName);
cache = cacheManager.getCache(cacheName);
log.debug("Cache [" + cacheName + "] started.");
}
}
}
return cache;
}
public static void put(String cacheName, Object key, Object value) {
getOrAddCache(cacheName).put(new Element(key, value));
}
@SuppressWarnings("unchecked")
public static <T> T get(String cacheName, Object key) {
Element element = getOrAddCache(cacheName).get(key);
return element != null ? (T)element.getObjectValue() : null;
}
@SuppressWarnings("rawtypes")
public static List getKeys(String cacheName) {
return getOrAddCache(cacheName).getKeys();
}
public static void remove(String cacheName, Object key) {
getOrAddCache(cacheName).remove(key);
}
public static void removeAll(String cacheName) {
getOrAddCache(cacheName).removeAll();
}
@SuppressWarnings("unchecked")
public static <T> T get(String cacheName, Object key, IDataLoader dataLoader) {
Object data = get(cacheName, key);
if (data == null) {
data = dataLoader.load();
put(cacheName, key, data);
}
return (T)data;
}
@SuppressWarnings("unchecked")
public static <T> T get(String cacheName, Object key, Class<? extends IDataLoader> dataLoaderClass) {
Object data = get(cacheName, key);
if (data == null) {
try {
IDataLoader dataLoader = dataLoaderClass.newInstance();
data = dataLoader.load();
put(cacheName, key, data);
} catch (Exception e) {
throw new RuntimeException(e);
}
}
return (T)data;
}
}
I use CacheKit like CacheKit.get("cfg_extract_rule_tree", extractRootId, new ExtractRuleTreeDataloader(extractRootId)), and the class ExtractRuleTreeDataloader will be called if nothing is found in the cache for extractRootId.
public class ExtractRuleTreeDataloader implements IDataLoader {
public static final Logger LOG = LoggerFactory.getLogger(ExtractRuleTreeDataloader.class);
private int ruleTreeId;
public ExtractRuleTreeDataloader(int ruleTreeId) {
super();
this.ruleTreeId = ruleTreeId;
}
@Override
public Object load() {
List<Record> ruleTreeList = Db.find("SELECT * FROM cfg_extract_fule WHERE root_id=?", ruleTreeId);
TreeHelper<ExtractRuleNode> treeHelper = ExtractUtil.batchRecordConvertTree(ruleTreeList);//convert List<Record> to a tree
if (treeHelper.isValidTree()) {
return treeHelper.getRoot();
} else {
LOG.warn("rule tree id :{} is an error tree #end#", ruleTreeId);
return null;
}
}
}
As I said before, I use the JFinal ORM. The Db.find method code is:
public List<Record> find(String sql, Object... paras) {
Connection conn = null;
try {
conn = config.getConnection();
return find(config, conn, sql, paras);
} catch (Exception e) {
throw new ActiveRecordException(e);
} finally {
config.close(conn);
}
}
and the config close method code is
public final void close(Connection conn) {
if (threadLocal.get() == null) // in transaction if conn in threadlocal
if (conn != null)
try {conn.close();} catch (SQLException e) {throw new ActiveRecordException(e);}
}
There is no transaction in my code,so I am pretty sure the conn.close() will be called every time.
**********************more description on 07-28****************
First, I use Ehcache to store the taskConfigs in memory. The taskConfigs almost never change, so I want to keep them in memory eternally and overflow them to disk if memory cannot hold them all.
I use MAT to find out the GC Roots of NonRegisteringDriver, and the result is show in the following picture.
The Gc Roots of NonRegisteringDriver
But I still don't understand why the default behavior of Ehcache leads to a memory leak. TaskConfig is a class that extends the Model class.
public class TaskConfig extends Model<TaskConfig> {
private static final long serialVersionUID = 5000070716569861947L;
public static TaskConfig DAO = new TaskConfig();
}
and the source code of Model is in this page (github.com/jfinal/jfinal/blob/jfinal-2.0/src/com/jfinal/plugin/activerecord/Model.java). And I can't find any reference (either direct or indirect) to the connection object, as @Jeremiah guessed.
Then I read the source code of NonRegisteringDriver, and I don't understand why its map field connectionPhantomRefs holds more than 5000 entries of <ConnectionPhantomReference, ConnectionPhantomReference>, yet I find no ConnectionImpl in its refQueue field. I can see the cleanup code in the AbandonedConnectionCleanupThread class, which removes the reference from NonRegisteringDriver.connectionPhantomRefs whenever it takes an abandoned connection reference from NonRegisteringDriver.refQueue.
@Override
public void run() {
threadRef = this;
while (running) {
try {
Reference<? extends ConnectionImpl> ref = NonRegisteringDriver.refQueue.remove(100);
if (ref != null) {
try {
((ConnectionPhantomReference) ref).cleanup();
} finally {
NonRegisteringDriver.connectionPhantomRefs.remove(ref);
}
}
} catch (Exception ex) {
// no where to really log this if we're static
}
}
}
I appreciate the help offered by @Jeremiah!
From the comments above I'm almost certain your memory leak is actually memory usage from EhCache. The ConcurrentHashMap you're seeing is the one backing the MemoryStore, and I'm guessing that the taskConfig holds a reference (either directly or indirectly) to the connection object, which is why it's showing in your stack.
Having eternal="true" in the default cache makes it so the inserted objects are never allowed to expire. Even without that, the timeToLive and timeToIdle values default to an infinite lifetime!
Combine that with the fact that Ehcache's default behavior when retrieving elements is to copy them (last I checked) through serialization! You're just stacking up new object references each time the taskConfig is extracted and put back into Ehcache.
The best way to test this (in my opinion) is to change your default cache configuration. Change eternal to false, and set a timeToIdleSeconds value. timeToIdleSeconds is the time (in seconds) that a value may remain in the cache without being accessed.
<ehcache>
<diskStore path="java.io.tmpdir"/>
<defaultCache maxElementsInMemory="10000"
eternal="false"
timeToIdleSeconds="120"
overflowToDisk="true"
diskPersistent="false"
diskExpiryThreadIntervalSeconds="120"/>
</ehcache>
If that works, then you may want to look into further tweaking your ehcache configuration settings, or providing a more customized cache reference other than default for your class.
There are multiple performance considerations when tweaking the ehcache. I'm sure that there is a better configuration for your business model. The Ehcache documentation is good, but I found the site to be a bit scattered when I was trying to figure it out. I've listed some links that I found useful below.
http://www.ehcache.org/documentation/2.8/configuration/cache-size.html
http://www.ehcache.org/documentation/2.8/configuration/configuration.html
http://www.ehcache.org/documentation/2.8/apis/cache-eviction-algorithms.html#provided-memorystore-eviction-algorithms
Good luck!
To test your memory leak, try the following:
Insert a TaskConfig into Ehcache.
Immediately retrieve it back out of the cache.
Output the value of taskConfig1.equals(taskConfig2).
If it returns false, that is your memory leak. Override equals and hashCode in your TaskConfig object and rerun the test.
The root cause is that the Linux OS runs out of memory and the OOM Killer kills the process.
I found the following in /var/log/messages.
Aug 3 07:24:03 iZ233tupyzzZ kernel: Out of memory: Kill process 17308 (java) score 890 or sacrifice child
Aug 3 07:24:03 iZ233tupyzzZ kernel: Killed process 17308, UID 0, (java) total-vm:2925160kB, anon-rss:1764648kB, file-rss:248kB
Aug 3 07:24:03 iZ233tupyzzZ kernel: Thread (pooled) invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Aug 3 07:24:03 iZ233tupyzzZ kernel: Thread (pooled) cpuset=/ mems_allowed=0
Aug 3 07:24:03 iZ233tupyzzZ kernel: Pid: 6721, comm: Thread (pooled) Not tainted 2.6.32-431.23.3.el6.x86_64 #1
I also found that the default value of maxIdleTime is 20 seconds in the C3p0Plugin (the c3p0 plugin in JFinal), so I think this is why the NonRegisteringDriver object occupies 280+ MB as shown in the MAT report. I set maxIdleTime to 3600 seconds and the NonRegisteringDriver object is no longer a suspect in the MAT report.
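As a hedged illustration of that setting with the plain c3p0 API (the JDBC URL and credentials below are placeholders; JFinal's C3p0Plugin configures the same underlying pool, but its exact setter names are not shown in the question):

import com.mchange.v2.c3p0.ComboPooledDataSource;

public class PoolConfig {

    public static ComboPooledDataSource createPool() throws Exception {
        ComboPooledDataSource ds = new ComboPooledDataSource();
        ds.setDriverClass("com.mysql.jdbc.Driver");        // mysql-connector-java 5.1.x
        ds.setJdbcUrl("jdbc:mysql://localhost:3306/mydb"); // placeholder URL
        ds.setUser("user");                                // placeholder credentials
        ds.setPassword("secret");
        // Keep idle connections for an hour instead of discarding them after
        // 20 seconds, so the pool stops churning through physical connections.
        ds.setMaxIdleTime(3600);
        return ds;
    }
}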
I also reset the JVM arguments to -Xms512m -Xmx512m, and the Java program has been running well for several days. Full GC is triggered as expected when the Old Gen fills up.

Getting reasonable performance from a parameterized query in Spring JDBC template

I am trying to execute a very simple query through Spring's JdbcTemplate, retrieving one attribute from a record identified by primary key. The entirety of the code is shown below. When I do this with a query built by concatenation (dangerous and ugly, and currently uncommented) it executes in 0.1 seconds. When I swap the comments and use the parameterized query, it takes 50 seconds. I would much prefer the protection that comes with the parameterized query, but 50 seconds seems a steep price to pay. Any hints on how this could be made more reasonable?
public class JdbcEventDaoImpl {
private static JdbcTemplate jtemp;
private static PreparedStatement getJsonStatement;
private static final Logger logger = LoggerFactory.getLogger(JdbcEventDaoImpl.class);
@Autowired
public void setDataSource(DataSource dataSource) {
JdbcEventDaoImpl.jtemp = new JdbcTemplate(dataSource);
}
public String getJdbcForPosting(String aggregationId){
try {
return (String) JdbcEventDaoImpl.jtemp.queryForObject("select PostingJson from PostingCollection where AggregationId = '" + aggregationId + "'", String.class);
//return (String) JdbcEventDaoImpl.jtemp.queryForObject("select PostingJson from PostingCollection where AggregationId = ?", aggregationId, String.class);
} catch (EmptyResultDataAccessException e){
return "Not Available";
}
}
}

Read Application Object from GemFire using Spring Data GemFire. Data stored using SpringXD's gemfire-json-server

I'm using the gemfire-json-server module in SpringXD to populate a GemFire grid with json representation of “Order” objects. I understand the gemfire-json-server module saves data in Pdx form in GemFire. I’d like to read the contents of the GemFire grid into an “Order” object in my application. I get a ClassCastException that reads:
java.lang.ClassCastException: com.gemstone.gemfire.pdx.internal.PdxInstanceImpl cannot be cast to org.apache.geode.demo.cc.model.Order
I’m using the Spring Data GemFire libraries to read contents of the cluster. The code snippet to read the contents of the Grid follows:
public interface OrderRepository extends GemfireRepository<Order, String>{
Order findByTransactionId(String transactionId);
}
How can I use Spring Data GemFire to convert data read from the GemFire cluster into an Order object?
Note: The data was initially stored in GemFire using SpringXD's gemfire-json-server-module
Still waiting to hear back from the GemFire PDX engineering team, specifically on Region.get(key), but, interestingly enough if you annotate your application domain object with...
@JsonTypeInfo(use = JsonTypeInfo.Id.CLASS, include = JsonTypeInfo.As.PROPERTY, property = "@type")
public class Order ... {
...
}
This works!
Under-the-hood I knew the GemFire JSONFormatter class (see here) used Jackson's API to un/marshal (de/serialize) JSON data to and from PDX.
However, the orderRepository.findOne(ID) and ordersRegion.get(key) still do not function as I would expect. See updated test class below for more details.
Will report back again when I have more information.
@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(classes = GemFireConfiguration.class)
@SuppressWarnings("unused")
public class JsonToPdxToObjectDataAccessIntegrationTest {
protected static final AtomicLong ID_SEQUENCE = new AtomicLong(0l);
private Order amazon;
private Order bestBuy;
private Order target;
private Order walmart;
@Autowired
private OrderRepository orderRepository;
@Resource(name = "Orders")
private com.gemstone.gemfire.cache.Region<Long, Object> orders;
protected Order createOrder(String name) {
return createOrder(ID_SEQUENCE.incrementAndGet(), name);
}
protected Order createOrder(Long id, String name) {
return new Order(id, name);
}
protected <T> T fromPdx(Object pdxInstance, Class<T> toType) {
try {
if (pdxInstance == null) {
return null;
}
else if (toType.isInstance(pdxInstance)) {
return toType.cast(pdxInstance);
}
else if (pdxInstance instanceof PdxInstance) {
return new ObjectMapper().readValue(JSONFormatter.toJSON(((PdxInstance) pdxInstance)), toType);
}
else {
throw new IllegalArgumentException(String.format("Expected object of type PdxInstance; but was (%1$s)",
pdxInstance.getClass().getName()));
}
}
catch (IOException e) {
throw new RuntimeException(String.format("Failed to convert PDX to object of type (%1$s)", toType), e);
}
}
protected void log(Object value) {
System.out.printf("Object of Type (%1$s) has Value (%2$s)", ObjectUtils.nullSafeClassName(value), value);
}
protected Order put(Order order) {
Object existingOrder = orders.putIfAbsent(order.getTransactionId(), toPdx(order));
return (existingOrder != null ? fromPdx(existingOrder, Order.class) : order);
}
protected PdxInstance toPdx(Object obj) {
try {
return JSONFormatter.fromJSON(new ObjectMapper().writeValueAsString(obj));
}
catch (JsonProcessingException e) {
throw new RuntimeException(String.format("Failed to convert object (%1$s) to JSON", obj), e);
}
}
@Before
public void setup() {
amazon = put(createOrder("Amazon Order"));
bestBuy = put(createOrder("BestBuy Order"));
target = put(createOrder("Target Order"));
walmart = put(createOrder("Wal-Mart Order"));
}
@Test
public void regionGet() {
assertThat((Order) orders.get(amazon.getTransactionId()), is(equalTo(amazon)));
}
@Test
public void repositoryFindOneMethod() {
log(orderRepository.findOne(target.getTransactionId()));
assertThat(orderRepository.findOne(target.getTransactionId()), is(equalTo(target)));
}
@Test
public void repositoryQueryMethod() {
assertThat(orderRepository.findByTransactionId(amazon.getTransactionId()), is(equalTo(amazon)));
assertThat(orderRepository.findByTransactionId(bestBuy.getTransactionId()), is(equalTo(bestBuy)));
assertThat(orderRepository.findByTransactionId(target.getTransactionId()), is(equalTo(target)));
assertThat(orderRepository.findByTransactionId(walmart.getTransactionId()), is(equalTo(walmart)));
}
#Region("Orders")
#JsonTypeInfo(use = JsonTypeInfo.Id.CLASS, include = JsonTypeInfo.As.PROPERTY, property = "#type")
public static class Order implements PdxSerializable {
protected static final OrderPdxSerializer pdxSerializer = new OrderPdxSerializer();
@Id
private Long transactionId;
private String name;
public Order() {
}
public Order(Long transactionId) {
this.transactionId = transactionId;
}
public Order(Long transactionId, String name) {
this.transactionId = transactionId;
this.name = name;
}
public String getName() {
return name;
}
public void setName(final String name) {
this.name = name;
}
public Long getTransactionId() {
return transactionId;
}
public void setTransactionId(final Long transactionId) {
this.transactionId = transactionId;
}
@Override
public void fromData(PdxReader reader) {
Order order = (Order) pdxSerializer.fromData(Order.class, reader);
if (order != null) {
this.transactionId = order.getTransactionId();
this.name = order.getName();
}
}
@Override
public void toData(PdxWriter writer) {
pdxSerializer.toData(this, writer);
}
@Override
public boolean equals(Object obj) {
if (obj == this) {
return true;
}
if (!(obj instanceof Order)) {
return false;
}
Order that = (Order) obj;
return ObjectUtils.nullSafeEquals(this.getTransactionId(), that.getTransactionId());
}
@Override
public int hashCode() {
int hashValue = 17;
hashValue = 37 * hashValue + ObjectUtils.nullSafeHashCode(getTransactionId());
return hashValue;
}
@Override
public String toString() {
return String.format("{ #type = %1$s, id = %2$d, name = %3$s }",
getClass().getName(), getTransactionId(), getName());
}
}
public static class OrderPdxSerializer implements PdxSerializer {
@Override
public Object fromData(Class<?> type, PdxReader in) {
if (Order.class.equals(type)) {
return new Order(in.readLong("transactionId"), in.readString("name"));
}
return null;
}
@Override
public boolean toData(Object obj, PdxWriter out) {
if (obj instanceof Order) {
Order order = (Order) obj;
out.writeLong("transactionId", order.getTransactionId());
out.writeString("name", order.getName());
return true;
}
return false;
}
}
public interface OrderRepository extends GemfireRepository<Order, Long> {
Order findByTransactionId(Long transactionId);
}
@Configuration
protected static class GemFireConfiguration {
@Bean
public Properties gemfireProperties() {
Properties gemfireProperties = new Properties();
gemfireProperties.setProperty("name", JsonToPdxToObjectDataAccessIntegrationTest.class.getSimpleName());
gemfireProperties.setProperty("mcast-port", "0");
gemfireProperties.setProperty("log-level", "warning");
return gemfireProperties;
}
@Bean
public CacheFactoryBean gemfireCache(Properties gemfireProperties) {
CacheFactoryBean cacheFactoryBean = new CacheFactoryBean();
cacheFactoryBean.setProperties(gemfireProperties);
//cacheFactoryBean.setPdxSerializer(new MappingPdxSerializer());
cacheFactoryBean.setPdxSerializer(new OrderPdxSerializer());
cacheFactoryBean.setPdxReadSerialized(false);
return cacheFactoryBean;
}
@Bean(name = "Orders")
public PartitionedRegionFactoryBean ordersRegion(Cache gemfireCache) {
PartitionedRegionFactoryBean regionFactoryBean = new PartitionedRegionFactoryBean();
regionFactoryBean.setCache(gemfireCache);
regionFactoryBean.setName("Orders");
regionFactoryBean.setPersistent(false);
return regionFactoryBean;
}
@Bean
public GemfireRepositoryFactoryBean orderRepository() {
GemfireRepositoryFactoryBean<OrderRepository, Order, Long> repositoryFactoryBean =
new GemfireRepositoryFactoryBean<>();
repositoryFactoryBean.setRepositoryInterface(OrderRepository.class);
return repositoryFactoryBean;
}
}
}
So, as you are aware, GemFire (and by extension, Apache Geode) stores JSON in PDX format (as a PdxInstance). This is so GemFire can interoperate with many different language-based clients (native C++/C#, web-oriented clients (JavaScript, Python, Ruby, etc.) using the Developer REST API, in addition to Java) and also be able to use OQL to query the JSON data.
After a bit of experimentation, I am surprised GemFire is not behaving as I would expect. I created an example, self-contained test class (i.e. no Spring XD, of course) that simulates your use case... essentially storing JSON data in GemFire as PDX and then attempting to read the data back out as the Order application domain object type using the Repository abstraction, logical enough.
Given the use of the Repository abstraction and implementation from Spring Data GemFire, the infrastructure will attempt to access the application domain object based on the Repository generic type parameter (in this case "Order" from the "OrderRepository" definition).
However, the data is stored in PDX, so now what?
No matter, Spring Data GemFire provides the MappingPdxSerializer class to convert PDX instances back to application domain objects using the same "mapping meta-data" that the Repository infrastructure uses. Cool, so I plug that in...
@Bean
public CacheFactoryBean gemfireCache(Properties gemfireProperties) {
CacheFactoryBean cacheFactoryBean = new CacheFactoryBean();
cacheFactoryBean.setProperties(gemfireProperties);
cacheFactoryBean.setPdxSerializer(new MappingPdxSerializer());
cacheFactoryBean.setPdxReadSerialized(false);
return cacheFactoryBean;
}
You will also notice, I set the PDX 'read-serialized' property (cacheFactoryBean.setPdxReadSerialized(false);) to false in order to ensure data access operations return the domain object and not the PDX instance.
However, this had no effect on the query method. In fact, it had no effect on the following operations either...
orderRepository.findOne(amazonOrder.getTransactionId());
ordersRegion.get(amazonOrder.getTransactionId());
Both calls returned a PdxInstance. Note, the implementation of OrderRepository.findOne(..) is based on SimpleGemfireRepository.findOne(key), which uses GemfireTemplate.get(key), which just performs Region.get(key), and so is effectively the same as ordersRegion.get(amazonOrder.getTransactionId()). This should not be the outcome, especially with Region.get() and read-serialized set to false.
With the OQL query (SELECT * FROM /Orders WHERE transactionId = $1) generated from the findByTransactionId(String id), the Repository infrastructure has a bit less control over what the GemFire query engine will return based on what the caller (OrderRepository) expects (based on the generic type parameter), so running OQL statements could potentially behave differently than direct Region access using get.
Next, I went on to try modifying the Order type to implement PdxSerializable, to handle the conversion during data access operations (direct Region access with get, OQL, or otherwise). This had no effect.
So, I tried to implement a custom PdxSerializer for Order objects. This had no effect either.
The only thing I can conclude at this point is that something is getting lost in translation between Order -> JSON -> PDX and then from PDX -> Order. Seemingly, GemFire needs additional type meta-data required by PDX (something like @JsonTypeInfo(use = JsonTypeInfo.Id.CLASS, include = JsonTypeInfo.As.PROPERTY, property = "@type")) in the JSON data that the PDXFormatter recognizes, though I am not certain it does.
Note, in my test class, I used Jackson's ObjectMapper to serialize the Order to JSON and then GemFire's JSONFormatter to serialize the JSON to PDX, which I suspect Spring XD is doing similarly under-the-hood. In fact, Spring XD uses Spring Data GemFire and is most likely using the JSON Region Auto Proxy support. That is exactly what SDG's JSONRegionAdvice object does (see here).
Anyway, I have an inquiry out to the rest of the GemFire engineering team. There are also things that could be done in Spring Data GemFire to ensure the PDX data is converted, such as making use of the MappingPdxSerializer directly to convert the data automatically on behalf of the caller if the data is indeed of type PdxInstance. Similar to how JSON Region Auto Proxying works, you could write an AOP interceptor for the Orders Region to automagically convert PDX into an Order.
Though, I don't think any of this should be necessary as GemFire should be doing the right thing in this case. Sorry I don't have a better answer right now. Let's see what I find out.
Cheers and stay tuned!
See subsequent post for test code.
