Get region statistics in a Spring Data Geode client application - caching

I need to get region statistics in an Apache Geode client-cache application.
Setup:
1 locator
1 server
1 client-cache app
All the modules are created using Spring.
The cache server creates its regions based on cache.xml.
Cache.xml:
<?xml version="1.0" encoding="UTF-8"?>
<cache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://geode.apache.org/schema/cache"
xsi:schemaLocation="http://geode.apache.org/schema/cache http://geode.apache.org/schema/cache/cache-1.0.xsd"
version="1.0" lock-lease="120" lock-timeout="60" search-timeout="300"
is-server="true" copy-on-read="false">
<pdx read-serialized="true" persistent="true">
<pdx-serializer>
<class-name>
org.apache.geode.pdx.ReflectionBasedAutoSerializer
</class-name>
</pdx-serializer>
</pdx>
<region name="track" refid="PARTITION_PERSISTENT_OVERFLOW">
<region-attributes statistics-enabled="true">
<compressor>
<class-name>org.apache.geode.compression.SnappyCompressor</class-name>
</compressor>
</region-attributes>
<index name="trackKeyIndex" from-clause="/track" expression="key" key-index="true"/>
<index name="trackTransactionNameIndex" from-clause="/track" expression="transactions[*]"/>
</region>
</cache>
Cache-server application
@SpringBootApplication
@org.springframework.data.gemfire.config.annotation.CacheServerApplication(name = "cacheServer", locators = "localhost[10334]")
@EnableClusterAware
@EnableCompression
@EnableStatistics
@EnableGemFireProperties(cacheXmlFile = "cache.xml")
public class CacheServerApplication {
public static void main(String[] args) {
SpringApplication.run(CacheServerApplication.class, args);
}
}
Client-cache application
@SpringBootApplication
@ClientCacheApplication
@EnableClusterDefinedRegions //Fetch cluster defined regions for @Resource autowired prop
@EnableStatistics
public class GeodeClientApplication {
public static void main(String[] args) {
SpringApplication.run(GeodeClientApplication.class, args);
}
}
Component class in the client-cache app to fetch region statistics:
@Component
public class TrackedInsightsCacheService {
private static Logger logger = LoggerFactory.getLogger(TrackedInsightsCacheService.class);
@Autowired
@Resource(name = "track")
private Region trackRegion;
public Object getRegionStatistics(){
RegionAttributes attributes = trackRegion.getAttributes();
if(attributes.getStatisticsEnabled()) {
return trackRegion.getStatistics();
}
return null;
}
public Object get(String key) {
return trackRegion.get(key);
}
public void put(String key, String value){
trackRegion.put(key, value);
}
}
The autowired trackRegion is a LocalRegion. Whenever I do a get call, it first checks the local region and then checks for the key on the server region.
But when I call getStatistics, it says statistics are disabled for the region.
What am I missing here?
Is this the proper way to get region statistics?
I'm able to get the cluster statistics through the gfsh command line, and the output is something like this:
gfsh>show metrics
Cluster-wide Metrics
Category | Metric | Value
--------- | --------------------- | -----
cluster | totalHeapSize | 4846
cache | totalRegionEntryCount | 1
| totalRegionCount | 1
| totalMissCount | 81
| totalHitCount | 15
diskstore | totalDiskUsage | 0
| diskReadsRate | 0.0
| diskWritesRate | 0.0
| flushTimeAvgLatency | 0
| totalBackupInProgress | 0
query | activeCQCount | 0
| queryRequestRate | 0.0
I have multiple regions in this setup and the cluster-wide statistics are not sufficient, so I'm looking for region-wise metrics.

The getStatistics() method returns the statistics for the actual Region instance. Since you're executing this method on the client side, the actual statistics returned will be for the local client Region, which is not what you want.
The gfsh show metrics command actually retrieves the region statistics using JMX; you could check the source code and adapt it to your needs here.
Another option, if you don't want to use JMX, would be to write a custom Geode Function and manually retrieve the statistics you're looking for using the StatisticsManager.
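As a rough sketch of the Function approach: instead of the internal StatisticsManager, the example below uses the public management API (RegionMXBean) on the server side. The region path is passed as the function argument, and the attributes shown (entry count, gets/puts rate) are just examples of what the MBean exposes; treat this as a sketch, not a drop-in implementation.

import java.util.HashMap;
import java.util.Map;
import org.apache.geode.cache.execute.Function;
import org.apache.geode.cache.execute.FunctionContext;
import org.apache.geode.management.ManagementService;
import org.apache.geode.management.RegionMXBean;

public class RegionStatisticsFunction implements Function<String> {

    @Override
    public void execute(FunctionContext<String> context) {
        // The region path (e.g. "/track") is sent from the client as the function argument.
        String regionPath = context.getArguments();
        ManagementService service = ManagementService.getManagementService(context.getCache());
        RegionMXBean regionBean = service.getLocalRegionMBean(regionPath);
        Map<String, Object> result = new HashMap<>();
        result.put("entryCount", regionBean.getEntryCount());
        result.put("getsRate", regionBean.getGetsRate());
        result.put("putsRate", regionBean.getPutsRate());
        context.getResultSender().lastResult(result);
    }

    @Override
    public String getId() {
        return "RegionStatisticsFunction";
    }
}

Register the function on the server (for example with FunctionService.registerFunction(new RegionStatisticsFunction())) and invoke it from the client with something like FunctionService.onServer(clientCache).setArguments("/track").execute("RegionStatisticsFunction").getResult().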
Cheers.

Related

Spring configuration / properties different per request

I need to figure out if the following scenario is possible in Spring.
If we have different services / databases per region, can Spring facilitate directing calls to those services / databases per request from a single deployment? To give an example, all requests from user X will be directed to services / databases in the EAST region while all requests from user Y will be directed to services / databases in the WEST region.
Obviously connections to each database will use connection pooling, so the configuration will need to differ, not just properties. When other services are initialized, there is authentication done, so it's not just about databases connections.
This being Spring, I'd like to avoid having to pass implementations around. Can I direct Spring to use a specific configuration per request? Is there a better way to accomplish this?
-- Edit --
Technically it can be done like this, though this isn't exactly easily maintainable.
@Configuration
@PropertySource("classpath:region1.properties")
public class TestIndependentConfigurationRegion1Configuration {
@Bean
public String sampleServiceUrl(@Value("${sample.service.url}") String value) {
return value;
}
@Bean
public TestIndependentConfigurationSampleService testSampleService() {
return new TestIndependentConfigurationSampleService();
}
}
@Configuration
@PropertySource("classpath:region2.properties")
public class TestIndependentConfigurationRegion2Configuration {
@Bean
public String sampleServiceUrl(@Value("${sample.service.url}") String value) {
return value;
}
@Bean
public TestIndependentConfigurationSampleService testSampleService() {
return new TestIndependentConfigurationSampleService();
}
}
@Controller
public class TestIndependentConfigurationController {
protected ApplicationContext testRegion1ApplicationContext = new AnnotationConfigApplicationContext(TestIndependentConfigurationRegion1Configuration.class);
protected ApplicationContext testRegion2ApplicationContext = new AnnotationConfigApplicationContext(TestIndependentConfigurationRegion2Configuration.class);
@RequestMapping("/sample/service")
@ResponseBody
public String testSampleService() {
TestIndependentConfigurationSampleService testSampleService = null;
if(/* region 1 */) {
testSampleService = (TestIndependentConfigurationSampleService) testRegion1ApplicationContext.getBean("testSampleService");
}
if(/* region 2 */) {
testSampleService = (TestIndependentConfigurationSampleService) testRegion2ApplicationContext.getBean("testSampleService");
}
testSampleService.executeSampleService();
return "SUCCESS";
}
}
I don't think you can do that with properties. BUT, you should look at the (Netflix) Ribbon client that is integrated with Spring. Some of Ribbon's features allow you to load-balance requests between regions. You could customize the Ribbon client to do what you want.
Some reading here:
https://cloud.spring.io/spring-cloud-netflix/multi/multi_spring-cloud-ribbon.html
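If the per-request difference is mainly which database to talk to, a different (non-Ribbon) technique worth naming is Spring's AbstractRoutingDataSource. A minimal sketch, assuming eastDataSource and westDataSource beans are defined elsewhere and that the region is resolved per request (for example in a filter or interceptor):

import java.util.HashMap;
import java.util.Map;
import javax.sql.DataSource;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource;

public class RegionRoutingDataSource extends AbstractRoutingDataSource {
    private static final ThreadLocal<String> CURRENT_REGION = new ThreadLocal<>();

    public static void setRegion(String region) { CURRENT_REGION.set(region); }
    public static void clear() { CURRENT_REGION.remove(); }

    @Override
    protected Object determineCurrentLookupKey() {
        // The key ("EAST"/"WEST") picks the target DataSource for the current request's thread.
        return CURRENT_REGION.get();
    }
}

@Configuration
class RoutingDataSourceConfiguration {
    @Bean
    public DataSource dataSource(DataSource eastDataSource, DataSource westDataSource) {
        RegionRoutingDataSource routing = new RegionRoutingDataSource();
        Map<Object, Object> targets = new HashMap<>();
        targets.put("EAST", eastDataSource);
        targets.put("WEST", westDataSource);
        routing.setTargetDataSources(targets);
        routing.setDefaultTargetDataSource(eastDataSource);
        return routing;
    }
}

A servlet filter would call RegionRoutingDataSource.setRegion(...) based on the user before the request is handled and clear() afterwards. Note this only covers the DataSource part of the question; other per-region services would still need their own routing.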

Cucumber Spring Framework to automate Google Calculator query

I am creating a Spring framework project to automate the Google calculator.
I have a feature file with some values, as defined below:
Feature: Google Calculator
Calculator should calculate correct calculations
Scenario Outline: Add numbers
Given I am on google calculator page
When I add number "<number1>" to number "<number2>"
Then I should get an answer of "<answer>"
Examples:
| number1 | number2 | answer |
| 1 | 2 | 3 |
| 4 | 5 | 9 |
I am trying to use the Given, When, Then steps to create a test so that any numbers from this feature file can be used in the calculator.
My Steps are as follows:
@Scope("test")
@ContextConfiguration("classpath:spring-context/test-context.xml")
public class GivenSteps {
@Autowired
private WebDriver webDriver;
@Given("^I am on google calculator page$")
public void iAmOnGoogleCalculatorPage() throws Throwable {
webDriver.get("https://www.google.ie/search?q=calculator");
}
@When("^I add number \"([^\"]*)\" to number \"([^\"]*)\"$")
public void i_add_number_to_number(Integer number1, Integer number2) throws Throwable {
WebElement googleTextBox = webDriver.findElement(By.id("cwtltblr"));
googleTextBox.sendKeys(Keys.ENTER);
throw new PendingException();
}
@Then("^I should get the correct answer again$")
public void thecorrectanswertest2() throws Throwable{
WebElement calculatorTextBox = webDriver.findElement(By.id("cwtltblr"));
String result = calculatorTextBox.getText();
}
}
My question is: how do I code the piece where a number can be chosen and the answer verified from the table in the feature file?
Did you try using the @Then step as below to compare the answer from the table?
@Then("^I should get the correct answer \"([^\"]*)\" again$")
public void thecorrectanswertest2(String answer) throws Throwable{
WebElement calculatorTextBox = webDriver.findElement(By.id("cwtltblr"));
String result = calculatorTextBox.getText();
if(answer.equalsIgnoreCase(result))
System.out.println("Test Passed");
}
Try this once. It should work.
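A small variant, assuming JUnit's Assert is on the classpath, that matches the feature file's "I should get an answer of" wording and fails the scenario instead of only printing (the method goes in the same steps class as above):

import org.junit.Assert;

@Then("^I should get an answer of \"([^\"]*)\"$")
public void iShouldGetAnAnswerOf(String answer) throws Throwable {
    WebElement calculatorTextBox = webDriver.findElement(By.id("cwtltblr"));
    // Fail the scenario if the displayed result does not match the expected answer from the Examples table.
    Assert.assertEquals(answer, calculatorTextBox.getText());
}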

How do I connect a Spring Boot JAR to a remote Oracle database?

I have an application, using Spring 4.3.6 and Spring Boot 1.4.4, that is able to connect to an Oracle database via JNDI connection when deployed as a WAR to a WebLogic 12c server.
I now need to create a modified version of my existing project that can be exported as a standalone JAR with an embedded Tomcat server. How do I connect to the same database from within the JAR?
This is my current Eclipse directory for the current WAR application (with classpath src):
WAR Project
| src
| | main.java
| | | controllers
| | | | BasicController.java
| | | | CrudController.java
| | | Application.java
| | META-INF
| | | resources
| | | | form.html
| JRE System Library [JDK 1.7]
| Referenced Libraries
| lib
| | compile
| | runtime
| resources
| | application.properties
| target
| WEB-INF
| | classes
| | weblogic.xml
| build_war.xml
build_war.xml is the Ant build file for exporting the application to WAR. form.html is a static web page.
This is my existing code:
@SpringBootApplication
public class Application extends SpringBootServletInitializer implements WebApplicationInitializer {
public static void main(String[] args) {
SpringApplication.run(Application.class, args);
}
@Override
protected SpringApplicationBuilder configure(SpringApplicationBuilder application) {
return application.sources(Application.class);
}
}
@Controller
public class BasicController {
@RequestMapping("/")
public String goToForm() {
return "form.html";
}
}
@RestController
public class CrudController {
@Autowired
private JdbcTemplate jdbcTemplate;
@PostMapping("/result")
public String sampleQuery(@RequestParam String tableName, @RequestParam String colNameSet,
@RequestParam String valueSet) {
String query = "INSERT INTO " + tableName + " (" + colNameSet + ") VALUES " + valueSet;
try {
jdbcTemplate.update(query);
} catch (Exception e) {
e.printStackTrace();
query = e.toString();
}
return query;
}
}
There is only one line in application.properties:
spring.datasource.jndi-name=database.jndi.name
The database URL is jdbc:oracle:thin:@ip-address:port-number:orcl.
The application can connect to and update the database successfully as a WAR. What do I need to change to connect to the same database as a standalone JAR?
I cannot find any references or tutorials on Google mentioning anything relevant to my problem. Please kindly walk me through what exactly I need to modify and how. Thanks!
Edit:
To add more information about this application: My Oracle database contains a table PEOPLE with the following columns:
ID INT NOT NULL,
NAME VARCHAR(20) NOT NULL,
AGE INT NOT NULL,
PRIMARY KEY (ID)
form.html submits a POST request to sampleQuery() which then submits a database query from the form inputs.
When the application is deployed as WAR and connects to the database via JNDI, the database query is executed successfully.
However, after modifying application.properties as per shi's answer:
spring.datasource.url=jdbc:oracle:thin:#ip-address:port:orcl
spring.datasource.username=user-name
spring.datasource.password=password
The following error is thrown when I run it as a Java Application within Eclipse:
org.springframework.jdbc.BadSqlGrammarException: StatementCallback; bad SQL grammar [INSERT INTO PEOPLE (ID,NAME,AGE) VALUES ('2','Momo','21')]; nested exception is java.sql.SQLSyntaxErrorException: user lacks privilege or object not found: PEOPLE
at org.springframework.jdbc.support.SQLExceptionSubclassTranslator.doTranslate(SQLExceptionSubclassTranslator.java:91)
...
at java.lang.Thread.run(Thread.java:745)
Caused by: java.sql.SQLSyntaxErrorException: user lacks privilege or object not found: PEOPLE
...
at org.hsqldb.jdbc.JDBCUtil.sqlException(Unknown Source)
...
at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:408)
... 53 more
Caused by: org.hsqldb.HsqlException: user lacks privilege or object not found: PEOPLE
at org.hsqldb.error.Error.error(Unknown Source)
...
at org.hsqldb.Session.execute(Unknown Source)
... 58 more
What happened?
DataSource configuration is controlled by external configuration properties in spring.datasource.*. For example, you might declare the following section in application.properties:
spring.datasource.url=jdbc:oracle:thin:@//host:port/service
spring.datasource.username=dbuser
spring.datasource.password=dbpass
spring.datasource.driver-class-name=oracle.jdbc.OracleDriver
Reference: https://docs.spring.io/spring-boot/docs/current/reference/html/boot-features-sql.html
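Equivalently, a minimal programmatic sketch (assuming the Oracle JDBC driver jar, e.g. ojdbc, is on the classpath; the URL and credentials are placeholders):

import javax.sql.DataSource;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.datasource.DriverManagerDataSource;

@Configuration
public class DataSourceConfig {
    @Bean
    public DataSource dataSource() {
        // Plain, non-pooled DataSource; swap in a pooled implementation for production use.
        DriverManagerDataSource dataSource = new DriverManagerDataSource();
        dataSource.setDriverClassName("oracle.jdbc.OracleDriver");
        dataSource.setUrl("jdbc:oracle:thin:@//ip-address:port/orcl");
        dataSource.setUsername("user-name");
        dataSource.setPassword("password");
        return dataSource;
    }
}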

How to increase the performance of inserting data into the database?

I use PostgreSQL 9.5 (and the newest JDBC driver - 9.4.1209), JPA 2.1 (Hibernate), EJB 3.2, CDI, JSF 2.2 and WildFly 10. I have to insert a lot of data into the database (about 1 million to 170 million entities). The number of entities depends on the file the user uploads via the form on the page.
What is the problem?
The problem is the execution time of inserting the data into the database, which is very slow. The execution time grows with every call to the flush() method. I've added a println(...) call to see how fast the flush method executes. For the first ~4 calls (400,000 entities), I get the println(...) output every ~20 s. After that, the execution of the flush method becomes incredibly slow and keeps getting slower.
Of course, if I remove the flush() and clear() calls, I get the println(...) output every 1 s, BUT when I approach 3 million entities I also get the exception:
java.lang.OutOfMemoryError: GC overhead limit exceeded
What have I done so far?
I've tried both Container-Managed Transactions and Bean-Managed Transactions (look at the code below).
I don't use the auto_increment feature for the PK ID; I assign the IDs manually in the bean code.
I've also tried to change the number of entities to flush (at the moment 100,000).
I've tried to set the flush count to the same number as the hibernate.jdbc.batch_size property. It didn't help; the execution time was much slower.
I've tried to experiment with the properties in the persistence.xml file. For example, I added the reWriteBatchedInserts property, though I don't really know whether it helps.
PostgreSQL is running on an SSD but the data are stored on an HDD, because the data could be too big in the future. However, I've tried moving my PostgreSQL data to the SSD and the result is the same; nothing changed.
The question is: how can I increase the performance of inserting data into the database?
Here's the structure of my table:
column_name | udt_name | length | is_nullable | key
---------------+-------------+--------+-------------+--------
id | int8 | | NO | PK
id_user_table | int4 | | NO | FK
starttime | timestamptz | | NO |
time | float8 | | NO |
sip | varchar | 100 | NO |
dip | varchar | 100 | NO |
sport | int4 | | YES |
dport | int4 | | YES |
proto | varchar | 50 | NO |
totbytes | int8 | | YES |
info | text | | YES |
label | varchar | 10 | NO |
Here's part of the EJB bean (first version) where I insert the data into the database:
@Stateless
public class DataDaoImpl extends GenericDaoImpl<Data> implements DataDao {
/**
* This's the first method which is executed.
* The CDI bean (controller) calls this method.
* @param list - data from the file.
* @param idFK - foreign key.
*/
public void send(List<String> list, int idFK) {
if(handleCSV(list,idFK)){
//...
}
else{
//...
}
}
/**
* The method inserts data into the database.
*/
@TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
private boolean handleCSV(List<String> list, int idFK){
try{
long start=0;
Pattern patternRow=Pattern.compile(",");
for (String s : list) {
if(start!=0){
String[] data=patternRow.split(s);
//Preparing data...
DataStoreAll dataStore=new DataStoreAll();
DataStoreAllId dataId=new DataStoreAllId(start++, idFK);
dataStore.setId(dataId);
//Setting the other object fields...
entityManager.persist(dataStore);
if(start%100000==0){
System.out.println("Number of entities: "+start);
entityManager.flush();
entityManager.clear();
}
}
else start++;
}
} catch(Throwable t){
CustomExceptionHandler exception=new CustomExceptionHandler(t);
return exception.persist("DDI", "handleCSV");
}
return true;
}
@Inject
private EntityManager entityManager;
}
Instead of Container-Managed Transactions, I've also tried Bean-Managed Transactions (second version):
@Stateless
@TransactionManagement(TransactionManagementType.BEAN)
public class DataDaoImpl extends GenericDaoImpl<Data> {
/**
* This's the first method which is executed.
* The CDI bean (controller) calls this method.
* @param list - data from the file.
* @param idFK - foreign key.
*/
public void send(List<String> list, int idFK) {
if(handleCSV(list,idFK)){
//...
}
else{
//...
}
}
/**
* The method inserts data into the linkedList collection.
*/
private boolean handleCSV(List<String> list, int idFK){
try{
long start=0;
Pattern patternRow=Pattern.compile(",");
List<DataStoreAll> entitiesAll=new LinkedList<>();
for (String s : list) {
if(start!=0){
String[] data=patternRow.split(s);
//Preparing data...
DataStoreAll dataStore=new DataStoreAll();
DataStoreAllId dataId=new DataStoreAllId(start++, idFK);
dataStore.setId(dataId);
//Setting the other object fields...
entitiesAll.add(dataStore);
if(start%100000==0){
System.out.println("Number of entities: "+start);
saveDataStoreAll(entitiesAll);
}
}
else start++;
}
} catch(Throwable t){
CustomExceptionHandler exception=new CustomExceptionHandler(t);
return exception.persist("DDI", "handleCSV");
}
return true;
}
/**
* The method commits the transaction.
*/
private void saveDataStoreAll(List<DataStoreAll> entities) throws EntityExistsException,IllegalArgumentException,TransactionRequiredException,PersistenceException,Throwable {
Iterator<DataStoreAll> iter=entities.iterator();
ut.begin();
while(iter.hasNext()){
entityManager.persist(iter.next());
iter.remove();
entityManager.flush();
entityManager.clear();
}
ut.commit();
}
@Inject
private EntityManager entityManager;
@Inject
private UserTransaction ut;
}
Here's my persistence.xml:
<?xml version="1.0" encoding="UTF-8"?>
<persistence version="2.1"
xmlns="http://xmlns.jcp.org/xml/ns/persistence" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="
http://xmlns.jcp.org/xml/ns/persistence
http://xmlns.jcp.org/xml/ns/persistence/persistence_2_1.xsd">
<persistence-unit name="primary">
<jta-data-source>java:/PostgresDS</jta-data-source>
<properties>
<property name="hibernate.show_sql" value="false" />
<property name="hibernate.jdbc.batch_size" value="50" />
<property name="hibernate.order_inserts" value="true" />
<property name="hibernate.order_updates" value="true" />
<property name="hibernate.jdbc.batch_versioned_data" value="true"/>
<property name="reWriteBatchedInserts" value="true"/>
</properties>
</persistence-unit>
</persistence>
If I forgot to add something, tell me about it and I'll update my post.
Update
Here's the controller which calls DataDaoImpl#send(...):
@Named
@ViewScoped
public class DataController implements Serializable {
@PostConstruct
private void init(){
//...
}
/**
* Handle of the uploaded file.
*/
public void handleFileUpload(FileUploadEvent event){
uploadFile=event.getFile();
try(InputStream input = uploadFile.getInputstream()){
Path folder=Paths.get(System.getProperty("jboss.server.data.dir"),"upload");
if(!folder.toFile().exists()){
if(!folder.toFile().mkdirs()){
folder=Paths.get(System.getProperty("jboss.server.data.dir"));
}
}
String filename = FilenameUtils.getBaseName(uploadFile.getFileName());
String extension = FilenameUtils.getExtension(uploadFile.getFileName());
filePath = Files.createTempFile(folder, filename + "-", "." + extension);
//Save the file on the server.
Files.copy(input, filePath, StandardCopyOption.REPLACE_EXISTING);
//Add reference to the unconfirmed uploaded files list.
userFileManager.addUnconfirmedUploadedFile(filePath.toFile());
FacesContext.getCurrentInstance().addMessage(null, new FacesMessage(FacesMessage.SEVERITY_INFO, "Success", uploadFile.getFileName() + " was uploaded."));
} catch (IOException e) {
//...
}
}
/**
* Sending data from file to the database.
*/
public void send(){
//int idFK=...
//The model includes the data from the file and other things which I transfer to the EJB bean.
AddDataModel addDataModel=new AddDataModel();
//Setting the addDataModel fields...
try{
if(uploadFile!=null){
//Each row of the file == 1 entity.
List<String> list=new ArrayList<String>();
Stream<String> stream=Files.lines(filePath);
list=stream.collect(Collectors.toList());
addDataModel.setList(list);
}
} catch (IOException e) {
//...
}
//Sending data to the DataDaoImpl EJB bean.
if(dataDao.send(addDataModel,idFK)){
userFileManager.confirmUploadedFile(filePath.toFile());
FacesContext.getCurrentInstance().addMessage(null, new FacesMessage(FacesMessage.SEVERITY_INFO, "The data was saved in the database.", ""));
}
}
private static final long serialVersionUID = -7202741739427929050L;
@Inject
private DataDao dataDao;
private UserFileManager userFileManager;
private UploadedFile uploadFile;
private Path filePath;
}
Update 2
Here's the updated EJB bean where I insert the data into the database:
@Stateless
@TransactionManagement(TransactionManagementType.BEAN)
public class DataDaoImpl extends GenericDaoImpl<Data> {
/**
* This's the first method which is executed.
* The CDI bean (controller) calls this method.
* @param addDataModel - object which includes path to the uploaded file and other things which are needed.
*/
public void send(AddDataModel addDataModel){
if(handleCSV(addDataModel)){
//...
}
else{
//...
}
}
/**
* The method inserts data into the database.
*/
private boolean handleCSV(AddDataModel addDataModel){
PreparedStatement ps=null;
Connection con=null;
FileInputStream fileInputStream=null;
Scanner scanner=null;
try{
con=ds.getConnection();
con.setAutoCommit(false);
ps=con.prepareStatement("insert into data_store_all "
+ "(id,id_user_table,startTime,time,sIP,dIP,sPort,dPort,proto,totBytes,info) "
+ "values(?,?,?,?,?,?,?,?,?,?,?)");
long start=0;
fileInputStream=new FileInputStream(addDataModel.getPath().toFile());
scanner=new Scanner(fileInputStream, "UTF-8");
Pattern patternRow=Pattern.compile(",");
Pattern patternPort=Pattern.compile("\\d+");
while(scanner.hasNextLine()) {
if(start!=0){
//Loading a row from the file into table.
String[] data=patternRow.split(scanner.nextLine().replaceAll("[\"]",""));
//Preparing datetime.
SimpleDateFormat simpleDateFormat=new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
GregorianCalendar calendar=new GregorianCalendar();
calendar.setTime(simpleDateFormat.parse(data[1]));
calendar.set(Calendar.MILLISECOND, Integer.parseInt(Pattern.compile("\\.").split(data[1])[1])/1000);
//Preparing an entity
ps.setLong(1, start++); //id PK
ps.setInt(2, addDataModel.getIdFk()); //id FK
ps.setTimestamp(3, new Timestamp(calendar.getTime().getTime())); //datetime
ps.setDouble(4, Double.parseDouble(data[2])); //time
ps.setString(5, data[3]); //sip
ps.setString(6, data[4]); //dip
if(!data[5].equals("") && patternPort.matcher(data[5]).matches()) ps.setInt(7, Integer.parseInt(data[5])); //sport
else ps.setNull(7, java.sql.Types.INTEGER);
if(!data[6].equals("") && patternPort.matcher(data[6]).matches()) ps.setInt(8, Integer.parseInt(data[6])); //dport
else ps.setNull(8, java.sql.Types.INTEGER);
ps.setString(9, data[7]); //proto
if(!data[8].trim().equals("")) ps.setLong(10, Long.parseLong(data[8])); //len
else ps.setObject(10, null);
if(data.length==10 && !data[9].trim().equals("")) ps.setString(11, data[9]); //info
else ps.setString(11, null);
ps.addBatch();
if(start%100000==0){
System.out.println("Number of entity: "+start);
ps.executeBatch();
ps.clearParameters();
ps.clearBatch();
con.commit();
}
}
else{
start++;
scanner.nextLine();
}
}
if (scanner.ioException() != null) throw scanner.ioException();
} catch(Throwable t){
CustomExceptionHandler exception=new CustomExceptionHandler(t);
return exception.persist("DDI", "handleCSV");
} finally{
if (fileInputStream!=null)
try {
fileInputStream.close();
} catch (Throwable t2) {
CustomExceptionHandler exception=new CustomExceptionHandler(t2);
return exception.persist("DDI", "handleCSV.Finally");
}
if (scanner != null) scanner.close();
}
return true;
}
@Inject
private EntityManager entityManager;
@Resource(mappedName="java:/PostgresDS")
private DataSource ds;
}
Your problem is not necessarily the database or even Hibernate, but that you are loading way too much data into memory at once. That's why you get the out-of-memory message and why you see the JVM struggling on the way there.
You read the file from a stream, but then push it all into memory when you create the list of strings. Then you map that list of strings into a linked list of some sort of entity!
Instead, use the stream to process your file in small chunks and insert the chunks into your database. A Scanner-based approach would look something like this:
FileInputStream inputStream = null;
Scanner sc = null;
try {
inputStream = new FileInputStream(path);
sc = new Scanner(inputStream, "UTF-8");
while (sc.hasNextLine()) {
String line = sc.nextLine();
// Talk to your database here!
}
// note that Scanner suppresses exceptions
if (sc.ioException() != null) {
throw sc.ioException();
}
} finally {
if (inputStream != null) {
inputStream.close();
}
if (sc != null) {
sc.close();
}
}
You'll probably find the Hibernate/EJB stuff works well enough after you make this change. But I think you'll find plain JDBC to be significantly faster. They say you can expect a 3x to 4x speed bump, depending. That would make a big difference with a lot of data.
If you are talking about truly huge amounts of data then you should look into the CopyManager, which lets you load streams directly into the database. You can use the streaming APIs to transform the data as it goes by.
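For orientation, a minimal CopyManager sketch (the table and column names are taken from the question's INSERT statement; it assumes each line of the file is already, or has been transformed into, a CSV row matching that column order):

import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.Connection;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class CopyLoader {

    // Bulk-loads a CSV file into data_store_all using PostgreSQL's COPY protocol.
    public static long load(Connection connection, String csvPath) throws Exception {
        CopyManager copyManager = connection.unwrap(PGConnection.class).getCopyAPI();
        try (BufferedReader reader = new BufferedReader(new FileReader(csvPath))) {
            return copyManager.copyIn(
                "COPY data_store_all (id, id_user_table, starttime, time, sip, dip, sport, dport, proto, totbytes, info) "
                + "FROM STDIN WITH (FORMAT csv)",
                reader);
        }
    }
}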
As you are using WildFly 10, you are in a Java EE 7 environment.
Therefore you should consider using JSR-352 Batch Processing for performing your file import.
Have a look at An Overview of Batch Processing in Java EE 7.0.
This should resolve all your memory consumption and transaction issues.

Java application crashed by suspicious JDBC memory leak

I have been working on a Java application which crawls pages from the Internet with HttpClient (version 4.3.3). It uses one fixed thread pool with 5 threads, each running a loop. The pseudocode is as follows.
public class Spiderling implements Runnable {
@Override
public void run() {
while (true) {
T task = null;
try {
task = scheduler.poll();
if (task != null) {
if Ehcache contains task's config
taskConfig = Ehcache.getConfig;
else{
taskConfig = Query task config from db;//close the conn every time
put taskConfig into Ehcache
}
spider(task,taskConfig);
}
} catch (Exception e) {
e.printStackTrace();
}
}
LOG.error("spiderling is DEAD");
}
}
I am running it with the following arguments -Duser.timezone=GMT+8 -server -Xms1536m -Xmx1536m -Xloggc:/home/datalord/logs/gc-2016-07-23-10-28-24.log -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintHeapAtGC on a server (2 CPUs, 2 GB memory), and it crashes pretty regularly, about once every two or three days, with no OutOfMemoryError and no JVM error log.
Here is my analysis:
I analyzed the GC log with GCeasy; the report is here. The weird thing is that the Old Gen grows slowly until it reaches the allocated max heap size, but a Full GC never happens, not even once.
I suspect it might have a memory leak, so I dumped the heap with jmap -dump:format=b,file=soldier.bin and used Eclipse MAT to analyze the dump file. Here is the problem suspect, an object that occupies 280+ MB.
The class "com.mysql.jdbc.NonRegisteringDriver",
loaded by "sun.misc.Launcher$AppClassLoader # 0xa0018490", occupies 281,118,144
(68.91%) bytes. The memory is accumulated in one instance of
"java.util.concurrent.ConcurrentHashMap$Segment[]" loaded by "".
Keywords
com.mysql.jdbc.NonRegisteringDriver
java.util.concurrent.ConcurrentHashMap$Segment[]
sun.misc.Launcher$AppClassLoader @ 0xa0018490.
I use c3p0-0.9.1.2 as the MySQL connection pool, mysql-connector-java-5.1.34 as the JDBC connector, and Ehcache-2.6.10 as the memory cache. I have seen all the posts about the 'com.mysql.jdbc.NonRegisteringDriver memory leak' and still have no clue.
This problem has driven me crazy for several days; any advice or help will be appreciated!
**********************Supplementary description on 07-24****************
I use a Java web + ORM framework called JFinal (github.com/jfinal/jfinal), which is open source on GitHub.
Here is some core code to further describe the problem.
/**
* CacheKit. Useful tool box for EhCache.
*
*/
public class CacheKit {
private static CacheManager cacheManager;
private static final Logger log = Logger.getLogger(CacheKit.class);
static void init(CacheManager cacheManager) {
CacheKit.cacheManager = cacheManager;
}
public static CacheManager getCacheManager() {
return cacheManager;
}
static Cache getOrAddCache(String cacheName) {
Cache cache = cacheManager.getCache(cacheName);
if (cache == null) {
synchronized(cacheManager) {
cache = cacheManager.getCache(cacheName);
if (cache == null) {
log.warn("Could not find cache config [" + cacheName + "], using default.");
cacheManager.addCacheIfAbsent(cacheName);
cache = cacheManager.getCache(cacheName);
log.debug("Cache [" + cacheName + "] started.");
}
}
}
return cache;
}
public static void put(String cacheName, Object key, Object value) {
getOrAddCache(cacheName).put(new Element(key, value));
}
@SuppressWarnings("unchecked")
public static <T> T get(String cacheName, Object key) {
Element element = getOrAddCache(cacheName).get(key);
return element != null ? (T)element.getObjectValue() : null;
}
@SuppressWarnings("rawtypes")
public static List getKeys(String cacheName) {
return getOrAddCache(cacheName).getKeys();
}
public static void remove(String cacheName, Object key) {
getOrAddCache(cacheName).remove(key);
}
public static void removeAll(String cacheName) {
getOrAddCache(cacheName).removeAll();
}
@SuppressWarnings("unchecked")
public static <T> T get(String cacheName, Object key, IDataLoader dataLoader) {
Object data = get(cacheName, key);
if (data == null) {
data = dataLoader.load();
put(cacheName, key, data);
}
return (T)data;
}
@SuppressWarnings("unchecked")
public static <T> T get(String cacheName, Object key, Class<? extends IDataLoader> dataLoaderClass) {
Object data = get(cacheName, key);
if (data == null) {
try {
IDataLoader dataLoader = dataLoaderClass.newInstance();
data = dataLoader.load();
put(cacheName, key, data);
} catch (Exception e) {
throw new RuntimeException(e);
}
}
return (T)data;
}
}
I use CacheKit like CacheKit.get("cfg_extract_rule_tree", extractRootId, new ExtractRuleTreeDataloader(extractRootId)), and the class ExtractRuleTreeDataloader will be called if nothing is found in the cache for extractRootId.
public class ExtractRuleTreeDataloader implements IDataLoader {
public static final Logger LOG = LoggerFactory.getLogger(ExtractRuleTreeDataloader.class);
private int ruleTreeId;
public ExtractRuleTreeDataloader(int ruleTreeId) {
super();
this.ruleTreeId = ruleTreeId;
}
@Override
public Object load() {
List<Record> ruleTreeList = Db.find("SELECT * FROM cfg_extract_fule WHERE root_id=?", ruleTreeId);
TreeHelper<ExtractRuleNode> treeHelper = ExtractUtil.batchRecordConvertTree(ruleTreeList);//convert List<Record> to a tree
if (treeHelper.isValidTree()) {
return treeHelper.getRoot();
} else {
LOG.warn("rule tree id :{} is an error tree #end#", ruleTreeId);
return null;
}
}
}
As I said before, I use the JFinal ORM. The Db.find method code is:
public List<Record> find(String sql, Object... paras) {
Connection conn = null;
try {
conn = config.getConnection();
return find(config, conn, sql, paras);
} catch (Exception e) {
throw new ActiveRecordException(e);
} finally {
config.close(conn);
}
}
and the config close method code is
public final void close(Connection conn) {
if (threadLocal.get() == null) // in transaction if conn in threadlocal
if (conn != null)
try {conn.close();} catch (SQLException e) {throw new ActiveRecordException(e);}
}
There is no transaction in my code, so I am pretty sure conn.close() will be called every time.
**********************more description on 07-28****************
First, I use Ehcache to store the taskConfigs in memory. The taskConfigs almost never change, so I want to store them in memory eternally and overflow them to disk if memory cannot hold them all.
I used MAT to find the GC roots of NonRegisteringDriver, and the result is shown in the following picture.
The Gc Roots of NonRegisteringDriver
But I still don't understand why the default behavior of Ehcache leads to a memory leak. TaskConfig is a class that extends the Model class.
public class TaskConfig extends Model<TaskConfig> {
private static final long serialVersionUID = 5000070716569861947L;
public static TaskConfig DAO = new TaskConfig();
}
and the source code of Model is on this page (github.com/jfinal/jfinal/blob/jfinal-2.0/src/com/jfinal/plugin/activerecord/Model.java). I can't find any reference (either direct or indirect) to the connection object, as @Jeremiah guessed.
Then I read the source code of NonRegisteringDriver, and I don't understand why the connectionPhantomRefs map field of NonRegisteringDriver holds more than 5000 entries of <ConnectionPhantomReference, ConnectionPhantomReference>, yet I find no ConnectionImpl in the refQueue field of NonRegisteringDriver. I also see the cleanup code in the class AbandonedConnectionCleanupThread, which means it should remove the ref from NonRegisteringDriver.connectionPhantomRefs when it gets an abandoned connection ref from NonRegisteringDriver.refQueue.
@Override
public void run() {
threadRef = this;
while (running) {
try {
Reference<? extends ConnectionImpl> ref = NonRegisteringDriver.refQueue.remove(100);
if (ref != null) {
try {
((ConnectionPhantomReference) ref).cleanup();
} finally {
NonRegisteringDriver.connectionPhantomRefs.remove(ref);
}
}
} catch (Exception ex) {
// no where to really log this if we're static
}
}
}
I appreciate the help offered by @Jeremiah!
From the comments above I'm almost certain your memory leak is actually memory usage from EhCache. The ConcurrentHashMap you're seeing is the one backing the MemoryStore, and I'm guessing that the taskConfig holds a reference (either directly or indirectly) to the connection object, which is why it's showing in your stack.
Having eternal="true" in the default cache makes it so the inserted objects are never allowed to expire. Even without that, the timeToLive and timeToIdle values default to an infinite lifetime!
Combine that with the default behavior of Ehcache when retrieving elements, which (last I checked) is to copy them through serialization! You're just stacking up new object references each time the taskConfig is extracted and put back into Ehcache.
The best way to test this (in my opinion) is to change your default cache configuration. Change eternal to false, and implement a timeToIdle value. timeToIdle is a time (in seconds) that a value may exist in the cache without being accessed.
<ehcache>
<diskStore path="java.io.tmpdir"/>
<defaultCache maxElementsInMemory="10000" eternal="false" timeToIdle="120" overflowToDisk="true" diskPersistent="false" diskExpiryThreadIntervalSeconds="120"/>
</ehcache>
If that works, then you may want to look into further tweaking your ehcache configuration settings, or providing a more customized cache reference other than default for your class.
There are multiple performance considerations when tweaking the ehcache. I'm sure that there is a better configuration for your business model. The Ehcache documentation is good, but I found the site to be a bit scattered when I was trying to figure it out. I've listed some links that I found useful below.
http://www.ehcache.org/documentation/2.8/configuration/cache-size.html
http://www.ehcache.org/documentation/2.8/configuration/configuration.html
http://www.ehcache.org/documentation/2.8/apis/cache-eviction-algorithms.html#provided-memorystore-eviction-algorithms
Good luck!
To test your memory leak try the following:
Insert a TaskConfig into ehcache
Immediately retrieve it back out of the cache.
Output the value of TaskConfig1.equals(TaskConfig2).
If it returns false, that is your memory leak. Override equals and hashCode in your TaskConfig object and rerun the test.
The root cause of the Java program's crash is that the Linux OS runs out of memory and the OOM killer kills the process.
I found log entries like the following in /var/log/messages.
Aug 3 07:24:03 iZ233tupyzzZ kernel: Out of memory: Kill process 17308 (java) score 890 or sacrifice child
Aug 3 07:24:03 iZ233tupyzzZ kernel: Killed process 17308, UID 0, (java) total-vm:2925160kB, anon-rss:1764648kB, file-rss:248kB
Aug 3 07:24:03 iZ233tupyzzZ kernel: Thread (pooled) invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Aug 3 07:24:03 iZ233tupyzzZ kernel: Thread (pooled) cpuset=/ mems_allowed=0
Aug 3 07:24:03 iZ233tupyzzZ kernel: Pid: 6721, comm: Thread (pooled) Not tainted 2.6.32-431.23.3.el6.x86_64 #1
I also found that the default value of maxIdleTime is 20 seconds in the C3p0Plugin, which is the c3p0 plugin in JFinal, so I think this is why the NonRegisteringDriver object occupies 280+ MB as shown in the MAT report. I set maxIdleTime to 3600 seconds and the NonRegisteringDriver object is no longer suspicious in the MAT report.
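For illustration only, setting that timeout directly on a plain c3p0 ComboPooledDataSource (the pool that the JFinal plugin configures under the hood) would look roughly like this; the URL and credentials are placeholders:

import com.mchange.v2.c3p0.ComboPooledDataSource;

public class PoolConfig {
    public static ComboPooledDataSource createPool() {
        ComboPooledDataSource pool = new ComboPooledDataSource();
        pool.setJdbcUrl("jdbc:mysql://localhost:3306/mydb"); // placeholder URL
        pool.setUser("user");
        pool.setPassword("password");
        // Keep idle connections alive for an hour instead of expiring them after 20 seconds.
        pool.setMaxIdleTime(3600);
        return pool;
    }
}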
I also reset the JVM arguments to -Xms512m -Xmx512m. The Java program has now been running well for several days, and a Full GC is triggered as expected when the Old Gen fills up.
