Spring Boot + Scheduled Method + OutOfMemoryError: GC overhead limit exceeded - spring

I have a Spring Boot application which runs through a bat file. I use it for many background services which interact with the database, create txt files, etc. All methods are annotated with @Scheduled(fixedDelayString = "${fixedDelay}"), where fixedDelay=2000.
I have configured as many threads through the application.properties file as there are @Scheduled annotations. One method calls three MSSQL database procedures, and I have to wait for all three responses before proceeding further. For this I use Executors.newFixedThreadPool(3) and wait for the responses with future.get(). This is a long-running process that can take 2-3 hours or even more. Now I am getting OutOfMemoryError: GC overhead limit exceeded. First I tried increasing the heap size, but the issue still occurs. When I look at the heap dump, I see the OutOfMemoryError comes from this method's execution. Is there some limitation that prevents calling a SQL statement that takes 3 hours or more from a thread?
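For reference, this is roughly what that method looks like (the DAO and procedure names below are placeholders, not the actual code):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class ProcedureJob {

    private final ProcedureDao procedureDao;   // placeholder DAO wrapping the three MSSQL procedures

    public ProcedureJob(ProcedureDao procedureDao) {
        this.procedureDao = procedureDao;
    }

    @Scheduled(fixedDelayString = "${fixedDelay}")
    public void runProcedures() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            // the three procedures are submitted in parallel and joined with get()
            Future<?> f1 = pool.submit(() -> procedureDao.callProcedureOne());
            Future<?> f2 = pool.submit(() -> procedureDao.callProcedureTwo());
            Future<?> f3 = pool.submit(() -> procedureDao.callProcedureThree());

            // get() blocks until each procedure returns; with 2-3 hour procedures these
            // threads, and everything they reference, stay alive for the whole run
            f1.get();
            f2.get();
            f3.get();
            // ...proceed further once all three have completed
        } finally {
            pool.shutdown();
        }
    }
}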
I took a heap dump, but I cannot determine the root cause from it, because I am only calling the procedures here and not doing any other operation. Please help with this. The heap dump image is attached.
Thanks

Related

Heap size (RAM consumption) does not decrease after executing a save query (Spring Boot)

I noticed that after inserting 400,000 rows into MySQL using Spring Boot, the heap size and RAM consumption go up but never drop back down afterwards.
If a similar request is made a second time, the heap rises by an additional 5-10%.
What could the problem be?
Why is the heap not cleared after executing a save query?
Is there a problem with the garbage collector? If yes, how can it be fixed?
P.S.: I tried some of the solutions suggested in this article, but they did not help:
https://medium.com/quick-code/what-a-recurring-outofmemory-error-taught-me-11f2061063a1
Edited:
I tried to select 1.2 million records in order to fill the heap.
After 20-25 minutes, the heap started to decrease. Why is this happening? Isn't the garbage collector supposed to clear the heap faster? At this rate, if other requests come in during those 25 minutes, the server will just crash.
Edited 2:
It seems that when trying the same on EC2, the garbage collector does not work at all. It has already happened 3 times that the server simply runs out of memory and crashes.
Does anyone know the cause?
I ran 1 @Async thread which in turn called another 2 @Async threads from different beans. They finished execution, and after that the heap never went down.
I do not have any @Cacheable methods or similar. I am just reading some data from tables, processing it and updating it. There are no InputStreams etc.
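One way to check whether the collector is running at all (both locally and on EC2) is to start the app with GC logging enabled and watch the log during the test. A minimal example of the flags (Java 9+ unified logging syntax; on Java 8 use -XX:+PrintGCDetails -Xloggc:gc.log instead; the heap size and jar name here are only placeholders):

    java -Xmx2g -Xlog:gc*:file=gc.log:time,uptime -jar myapp.jar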

Design Pattern - Spring KafkaListener processing 1 million records in 1 hour

My Spring Boot application is going to consume 1 million records an hour from a Kafka broker. The entire processing logic for each message takes 1-1.5 seconds, including a database insert. The topic has 64 partitions, which is also the concurrency of my @KafkaListener.
My current code is only able to process 90 records a minute in a lower environment where I am consuming around 50k records an hour. Below is the code; all other config parameters such as max.poll.records are left at their default values:
@KafkaListener(id = "xyz-listener", concurrency = "64", topics = "my-topic")
public void listener(String record) {
    // processing logic
}
I do get "it is likely that the consumer was kicked out of the group" 7-8 times an hour. I think both of these issues could be solved by isolating the listener method and multithreading the processing of each message, but I am not sure how to do that.
There are a few points to consider here. First, 64 consumers seem like a bit too much for a single application to handle consistently.
Considering each poll fetches up to 500 records per consumer by default, your app might be getting overloaded, causing consumers to get kicked out of the group if a single batch takes longer than the 5-minute default of max.poll.interval.ms to process.
So first, I'd consider scaling the application horizontally so that each application instance handles a smaller number of partitions / threads.
A second way to increase throughput would be using a batch listener, and handling processing and DB insertions in batches as you can see in this answer.
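A minimal sketch of what such a batch listener could look like (assuming a Spring Kafka version where @KafkaListener supports batch = "true"; the entity mapping and repository below are placeholders, not from the question):

@KafkaListener(id = "xyz-batch-listener", topics = "my-topic", batch = "true")
public void listen(List<String> records) {
    // map the whole poll's worth of records first, then persist them in one call
    List<MyEntity> entities = records.stream()
            .map(this::toEntity)          // placeholder for the per-record processing logic
            .collect(Collectors.toList());
    repository.saveAll(entities);         // one batched round trip instead of one insert per record
}

On older Spring Kafka versions the same effect is achieved by pointing the listener at a container factory with batch listening enabled.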
Using both, you should be processing a sensible amount of work in parallel per app, and should be able to achieve your desired throughput.
Of course, you should load test each approach with different figures to have proper metrics.
EDIT: Addressing your comment, if you want to achieve this throughput, I wouldn't give up on batch processing just yet. If you do the DB operations row by row, you'll need a lot more resources for the same performance.
If your rule engine doesn't do any I/O, you can iterate each record of the batch through it without losing performance.
About data consistency, you can try a few strategies. For example, you can hold a lock to ensure that, even through a rebalance, only one instance processes a given batch of records at a given time - or perhaps there's a more idiomatic way of handling that in Kafka using the rebalance hooks.
With that in place, you can batch-load all the information you need to filter out duplicated / outdated records when you receive them, iterate each record through the rule engine in memory, then batch-persist all results and release the lock.
Of course, it's hard to come up with an ideal strategy without knowing more details about the process. The point is that by doing this you should be able to handle around 10x more records within each instance, so I'd definitely give it a shot.

Tomcat: unexpected maximum response time for a request when load testing is done using JMeter

I have a Spring Boot application with a POST endpoint that takes a request, sends it to another service, gets the response back, saves it to a MongoDB database, and returns the response to the user. The application is deployed on Spring Boot's embedded Tomcat. I am using JMeter to measure the max response time, throughput, etc.
When I ran a test from JMeter with 500 threads for 10 minutes, I got a maximum time of around 3500 ms.
When I repeat the test from JMeter, the maximum time drops to about 900 ms.
Again, if I run the test after a long pause, the maximum goes back up to 3500 ms.
I have not been able to find any information about this behavior of Tomcat.
Could you please help me understand this behavior of Tomcat?
What do you mean by "unexpected"? The lower response time when you repeat the test can be explained by your application implementation: when you start a load test against an application that has just been deployed, its performance may not be optimal yet, and when you repeat the test the cache is "warmed up", so you get better performance.
Another explanation could be JIT optimization: the JVM analyzes your application's usage pattern and optimizes the bytecode internally to better serve the given load pattern.
A third possible explanation is MongoDB caching: if 500 users send the same requests, the database may keep the result sets in memory, and when you repeat the test it doesn't actually access storage but returns the results directly from memory, which is fast and cheap. Consider properly parameterizing your JMeter test so each thread (virtual user) uses its own credentials and performs a different query than the other threads, but keep in mind that the test needs to be repeatable, so don't use unique data each time; it's better to have a sufficient set of pre-defined test data.

Using Quartz for a long-running job

I'm planning to use the Quartz scheduler to process one-time jobs.
My use case is that I need to migrate BLOBs from one storage to another, and a blob can be as large as 100 GB, so a single job can run for a very long time before the work is done.
The reason I'm using Quartz is its clustering support, fault tolerance and retry capabilities in case a job fails. The only thing I'm concerned about is that I might hit a lot of misfired-trigger scenarios and a lot of database locking, which could hamper live production traffic on those database hosts. I will probably be scheduling tens of thousands of jobs in one shot.
A few of the things I have figured out:
I can set a high value for org.quartz.jobStore.misfireThreshold so that misfires do not happen. I don't really care when a job gets picked up, since it's a background job with no real SLA. The only thing I care about is that the job eventually gets picked up and the work gets done.
I can also set the batch-mode properties org.quartz.scheduler.batchTriggerAcquisitionMaxCount and org.quartz.scheduler.batchTriggerAcquisitionFireAheadTimeWindow. I understand the batch max count property should roughly equal the thread pool size to give the biggest performance gain, but what should the value of the fire-ahead time window be?
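For illustration, the kind of quartz.properties I have in mind (the numbers are placeholders that would still need tuning against the thread pool size and the actual load):

# treat a trigger as misfired only if it is more than an hour late (milliseconds)
org.quartz.jobStore.misfireThreshold = 3600000

# worker threads, and roughly one acquired trigger per worker per poll
org.quartz.threadPool.threadCount = 20
org.quartz.scheduler.batchTriggerAcquisitionMaxCount = 20

# also pick up triggers that become due within the next 5 seconds in the same acquisition (milliseconds)
org.quartz.scheduler.batchTriggerAcquisitionFireAheadTimeWindow = 5000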
I'm using Quartz with Spring Boot and will be leveraging org.quartz.impl.jdbcjobstore.JobStoreCMT. My understanding is that the job's execute method gets wrapped in a transaction; will this cause any problem, given that the transaction will stay open for a long time because the job might take hours to complete? Is this OK? I will be using an Oracle database.
Am I missing something here? Can someone share their experience with a similar use case?
Thanks!

EJB transactions slowing down in one thread

I have a performance problem with my process. It's an asynchronous task launched in a CMT bean (on a JBoss server).
One iteration performs 1 update and 3 inserts into my DB via Hibernate. The process is split into a new transaction every 100 iterations.
Flush is called on the EntityManager after every update/insert.
While the performance of the first batch is satisfactory (around 5-8 s), it slows down drastically over time. The 30th batch takes around 30 s to finish, and later it grows to over 2 minutes per batch.
I tried switching FlushModeType to COMMIT, manually clearing/closing EntityManagers, and clearing the EntityManager cache; I looked for memory leaks and can't find the reason for this slowdown.
I measured the execution time of small pieces of code, and every piece of code involving the database connection slows down over time. I understand that a transaction slows down as more entities are processed, but why is a new transaction also slower?
The latest process consists of 250,000 iterations (2,500 transactions in 1 thread) and takes forever to finish.
If needed, I'll provide more information. Any help would be appreciated.
I tried simplifying this code to do just 1 Hibernate insert and no other operations, and it still slows down over time. This is an abstract pseudo-view of what's going on inside:
Bean1

@Asynchronous
@TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
public void mainTask() {
    while (...) {
        subTask();
    }
}

Bean2

@TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
public void subTask() {
    100.times {
        // 3 inserts
        // 1 update
    }
}
My suggestions may not be accurate, or you may have tried them already, but I'd like to give it a try. Since you mentioned the DB connection is the bottleneck, I'll go after that.
After reading the question, I find that the time taken by a transaction is proportional to the iteration number. So it looks like entities created in the first iteration are being sent to Hibernate again in later iterations.
For example, in the 4th iteration, entities created in the 1st, 2nd and 3rd iterations are also being sent for update, or are somehow being sent to Hibernate.
That could be the reason for the performance degradation as the iterations progress, since the number of records to be updated/inserted/selected increases with each iteration.
Off the top of my head, I can think of the following possibilities:
The Hibernate session used in the first iteration is being used until the end. Because of this, entities created in the first iteration are being updated in later iterations as well. I read that you tried closing entity managers etc., but still, please check where the session actually ends. You can create a new session for each transaction, or clear the entities created in the session after each transaction (see the sketch after this list).
The list that selects records for each iteration is also sending already processed records. That is, the list should send records from 300 to 399 in the 3rd iteration, but records from 0 to 399 are being sent to the transaction.
If you're using HQL, try using a named query. The last time I used Hibernate (about 8 months ago), I noticed that with HQL the number of objects loaded by Hibernate was much higher than with a named query.
Hibernate provides a way to print the actual SQL query/parameters sent to the DB, so you can check the actual query being sent. Link for this:
How to print a query string with parameter values when using Hibernate
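As a sketch of the first point above (clearing the persistence context per chunk so entities from earlier iterations are not carried along; the chunk size and names are placeholders):

// inside the transactional subTask(): flush and clear after each chunk so the
// persistence context does not keep, and re-dirty-check, entities from earlier iterations
int chunkSize = 100;
for (int i = 0; i < items.size(); i++) {
    processOne(items.get(i));       // the 3 inserts + 1 update per iteration
    if ((i + 1) % chunkSize == 0) {
        entityManager.flush();      // push the pending SQL to the database
        entityManager.clear();      // detach everything so the session stays small
    }
}
entityManager.flush();
entityManager.clear();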
We faced similar problems on a couple of occasions when executing batch programs with JPA.
The only solution we could find was to use the JDBC API for batch programs that involve a lot of processing.
It turns out it's a Java problem: we tested our application on different configurations, and it works fine when our JBoss runs on Java 1.7 (as opposed to 1.6). Every batch finishes in about 5 seconds. We are now faced with the choice of upgrading our Java to 1.7 or digging deeper to find what's wrong with our setup on Java 1.6.
