I have a Spring Batch massive-loading job that reads from a huge XML file (2 GB), processes the records, and writes them to an Oracle DB with a Hibernate persist. I use a chunk size of 100 elements for this.
The problem is that when I run this batch on the server, memory allocation keeps increasing until the process is killed with 'out of memory' (using the top command on the server, the process grows to 20 GB of memory!). I think that, for some reason, Spring does not release memory after each chunk is completed.
Can you help me understand what is happening?
Are you using JAXB/JAXB2 to unmarshal the XML data, by any chance? If so, the problem could be related to initializing the JAXBContext inside your method instead of initializing it once for the whole application. Creating a JAXBContext is an expensive operation and a frequent cause of memory leaks. More info related to this issue can be found here.
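For illustration, a minimal sketch of that pattern (Person is a placeholder type, not from your code): create the JAXBContext once and reuse it, creating only the cheap, non-thread-safe Unmarshaller per call.

import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Unmarshaller;

public class PersonUnmarshaller {

    // JAXBContext is expensive to create and thread-safe, so build it once.
    private static final JAXBContext CONTEXT;

    static {
        try {
            CONTEXT = JAXBContext.newInstance(Person.class);
        } catch (JAXBException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public Person unmarshal(java.io.InputStream in) throws JAXBException {
        // Unmarshaller instances are cheap but not thread-safe: create one per call.
        Unmarshaller unmarshaller = CONTEXT.createUnmarshaller();
        return (Person) unmarshaller.unmarshal(in);
    }
}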
I'm using StAX as follows:
public class MyClassReader<T> extends StaxEventItemReader<T> {

    public MyClassReader(Class<T> t) {
        super();
        // Map the "RECORD" fragment root element to the target type
        XStreamMarshaller unmarshaller = new XStreamMarshaller();
        HashMap<String, Object> aliases = new HashMap<String, Object>();
        aliases.put("RECORD", t);
        unmarshaller.setAliases(aliases);
        this.setFragmentRootElementName("RECORD");
        this.setUnmarshaller(unmarshaller);
    }
}
I don't think this is the problem, though.
I currently have a Spring Boot based application with no active cache. Our application is heavily dependent on key-value configurations which we maintain in an Oracle DB. Currently, without a cache, every time I want to get any value from that table, it is a database call. This, as expected, causes a lot of overhead due to the high number of transactions to the DB. Hence the need for a cache.
On searching for caching solutions for Spring Boot, I mostly found examples that cache objects as CRUD operations are performed through the application code itself, using annotations like @Cacheable, @CachePut, @CacheEvict, etc., but this is not applicable for me. I have master data of key-value pairs in the DB; any change needs approval, so users are not given direct access and the change is made directly in the DB once approved.
I want these key-values to be loaded at startup and kept in memory, so I tried to implement this using @PostConstruct and a ConcurrentHashMap, something like this:
public ConcurrentHashMap<String, String> cacheMap = new ConcurrentHashMap<>();

@PostConstruct
public void initialiseCacheMap() {
    List<MyEntity> list = myRepository.findAll();
    for (int i = 0; i < list.size(); i++) {
        cacheMap.put(list.get(i).getKey(), list.get(i).getValue());
    }
}
In my service class, whenever I want to get a value, I first check whether it is available in the map; if not, I check the DB.
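Roughly, the lookup looks like this (a minimal sketch; findValueByKey is a placeholder name for the actual repository query):

public String getConfigValue(String key) {
    // Serve from the in-memory map first; fall back to the DB on a cache miss.
    String cached = cacheMap.get(key);
    if (cached != null) {
        return cached;
    }
    String fromDb = myRepository.findValueByKey(key); // placeholder query method
    if (fromDb != null) {
        cacheMap.put(key, fromDb);
    }
    return fromDb;
}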
This fulfils my purpose, and I have been able to drastically improve the performance of the application: a certain set of transactions that previously took 6.28 seconds to complete now finishes in a mere 562 milliseconds! However, there is one problem which I am not able to figure out:
@PostConstruct is called once by Spring, on startup, after dependency injection. That means I have no way to re-trigger the cache build without a restart or application downtime, which is unfortunately not acceptable. Further, as of now, I do not have the liberty to use any existing caching frameworks or libraries like Ehcache or Redis.
How can I achieve periodic refreshing of this cache (say, every 30 minutes) with only plain old Java/Spring classes/libraries?
Thanks in advance for any ideas!
You can do this in several ways; one way to achieve it is to do something along these lines:
private const val everyThirtyMinutes = "0 0/30 * * * ?"

@Component
class TheAmazingPreloader {

    @Scheduled(cron = everyThirtyMinutes)
    @EventListener(ApplicationReadyEvent::class)
    fun refreshCachedEntries() {
        // the preloading happens here
    }
}
Then you have the preloading bits when the application has started, and also the refreshing mechanism in place that triggers, say, every 30 minutes.
You will need to add the following annotation to some @Configuration class or to the @SpringBootApplication class:
@EnableScheduling
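For example, in Java this would look roughly like the following (a minimal sketch; the class name is a placeholder):

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.scheduling.annotation.EnableScheduling;

@SpringBootApplication
@EnableScheduling
public class MyApplication {

    public static void main(String[] args) {
        SpringApplication.run(MyApplication.class, args);
    }
}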
Not quite sure if this is a Jackson question, a Spring Boot question, or a Jetty one:
My microservice became unresponsive in production, apparently due to excessive memory usage (judging from the OS RSS stat), but with no OOM.
I obtained a heap dump via jcmd [pid] GC.heap_dump and later opened it in the Eclipse Memory Analyzer Tool (MAT), installed via the Eclipse Marketplace.
I'm greeted by this finding:
I think this says the Jackson ObjectMapper ate 80% of my heap (395 MB out of the default 512 MB heap size).
What would cause this and how can I prevent it?
UPDATES
I started digging into Jackson's SerializerCache.
There was indeed a reproducible memory leak but it was fixed in 2.7.x: https://github.com/FasterXML/jackson-databind/issues/1049
This SO question also applied to pre-2.7 versions: Too Many objects in single instance of ObjectMapper SerializerCache causing memory leak
My version is 2.13.1, so the above shouldn't matter.
Found the culprit:
@PostMapping(value = "/headers", produces = MediaType.APPLICATION_JSON_UTF8_VALUE)
@ResponseBody
public ListingHeader[] head(@RequestBody ListingDetailPK[] parms) {
    final ListingInfo[] all = readerDao.getAll(Arrays.asList(parms));
    // A fresh ObjectMapper copy with MrBeanModule is created on every request
    ObjectMapper mapper = JSON.mapper().copy();
    mapper.registerModule(new MrBeanModule());
    try {
        return mapper.readValue(mapper.writerFor(ListingHeader[].class)
                .writeValueAsString(all), ListingHeader[].class);
    } catch (JsonProcessingException e) {
        throw new RuntimeException(e);
    }
}
A clue was provided in a comment by @Pawel Zieminski:
Is there a large number of (possibly generated) classes in your system? Some proxy classes maybe?
I suspect that the dynamic proxies generated by MrBeanModule are causing the problem.
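If that is the cause, one possible fix (a sketch only, reusing JSON.mapper(), readerDao and the types from the code above) is to build the MrBean-enabled mapper once and reuse it, instead of copying the mapper and re-registering the module on every request:

// Built once; ObjectMapper is safe to share across requests after configuration.
private static final ObjectMapper HEADER_MAPPER =
        JSON.mapper().copy().registerModule(new MrBeanModule());

@PostMapping(value = "/headers", produces = MediaType.APPLICATION_JSON_UTF8_VALUE)
@ResponseBody
public ListingHeader[] head(@RequestBody ListingDetailPK[] parms) {
    final ListingInfo[] all = readerDao.getAll(Arrays.asList(parms));
    try {
        return HEADER_MAPPER.readValue(
                HEADER_MAPPER.writerFor(ListingHeader[].class).writeValueAsString(all),
                ListingHeader[].class);
    } catch (JsonProcessingException e) {
        throw new RuntimeException(e);
    }
}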
We are using the OptaPlanner (8.2.0) library in Spring Boot to solve a knapsack problem using a construction heuristic algorithm.
While running the application, we observed that threads created by SolverManager are not released even after the problem is solved. Because of that, the performance of the application starts degrading after some time, and the solver manager starts responding slowly because of the increased thread count.
We also tried the latest version (8.17.0), but the issue still persists.
Termination conditions:
<termination>
    <millisecondsSpentLimit>200</millisecondsSpentLimit>
</termination>
optaplanner:
  solver:
    termination:
      best-score-limit: 0hard/*soft
Code:
@Component
@Slf4j
public class SolutionManager {

    private final SolverManager<Solution, String> solutionManager;

    public SolutionManager(SolverManager<Solution, String> solutionManager) {
        this.solutionManager = solutionManager;
    }

    public Solution getSolutionResponse(String solutionId, Solution unsolvedProblem)
            throws InterruptedException, ExecutionException {
        SolverJob<Solution, String> solve = solutionManager.solve(solutionId, unsolvedProblem);
        Solution finalBestSolution = solve.getFinalBestSolution();
        return finalBestSolution;
    }
}
Thread metrics:
I wasn't able to reproduce the problem; after a load represented by solving several datasets in parallel, the number of threads drops back to the same value as before the load started.
The chart you shared doesn't clearly suggest there is a thread leak either; if you take a look at ~12:40 PM and compare it with ~2:00 PM, the number of threads actually did decrease.
Let me also add that the getFinalBestSolution() method actually blocks the calling thread until the solver finishes. If you instead use solve(ProblemId_ problemId, Solution_ problem, Consumer<? super Solution_> finalBestSolutionConsumer), this method returns immediately and the Consumer you provide is called when the solver finishes.
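For illustration, a rough sketch of that non-blocking variant applied to the SolutionManager above (the consumer body is just a placeholder):

public void solveAsync(String solutionId, Solution unsolvedProblem) {
    // Returns immediately; the consumer is invoked on a solver thread once solving finishes.
    solutionManager.solve(solutionId, unsolvedProblem,
            finalBestSolution -> log.info("Solved {}: {}", solutionId, finalBestSolution));
}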
It looks like you might not be using OptaPlanner Spring Boot Starter.
If that's the case, upgrade to a recent version of OptaPlanner and add a dependency on optaplanner-spring-boot-starter. See the Spring Boot quickstart in the docs and the optaplanner-quickstarts repository (in the technology directory) for an example of how to use it.
I have a very specific scenario in which, during the execution of a query, specifically while fetching rows from the DB into my result set, I get an OutOfMemoryError.
The code is as simple as this:
public interface MyRepository extends Repository<MyEntity, Long> {

    @EntityGraph(value = "MyBigEntityGraphToFetchAllCollections", type = EntityGraphType.FETCH)
    @QueryHints({@QueryHint(name = "org.hibernate.readOnly", value = "true")})
    MyEntity getOneById(Long id);
}

public class MyService {
    ...
    public MyEntity someMethodCalledInLoop(Long id) {
        try {
            return repository.getOneById(id);
        } catch (OutOfMemoryError error) {
            // Here the connection is closed. How to reset HikariCP?
            System.gc();
            return null;
        }
    }
}
It seems weird that a getOne consumes all the memory, but due to eager fetching of about 80 collections and the resulting row multiplication, some cases are unsustainable.
I know I have the option to lazily load the collections, but I don't want to: hitting the database 1+N times on every load takes more time than my application has. It is a batch process over millions of records, and less than 0.001% of them have this impact on memory, so my strategy is simply to discard these few records and process the next ones.
Right after catching the OutOfMemoryError the memory is freed, as the troublesome entity becomes garbage. But because of this Error, HikariCP closes (or is forced to close) the connection.
On the next call of the method, HikariCP still hands me a closed connection. It seems that, due to the lack of memory, HikariCP did not finish the previous transaction correctly and stays stuck in this state forever.
My intention now is to reset or recover HikariCP. I don't need to care about other threads using the pool.
So, after all that, my simple question is: how can I programmatically restart or recover HikariCP to its original state, without rebooting the application?
Thanks a lot to whoever reads this.
Try adding this to your Hibernate configuration:
<property name="hibernate.hikari.connectionTestQuery">select 1</property>
This way HikariCP will test that the connection is still alive before giving it to Hibernate.
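If you also need to recover the pool programmatically after such a failure, one option (a sketch, not verified against your exact setup) is to evict the pool's connections through HikariCP's pool MXBean so that fresh connections are created on the next checkout:

import com.zaxxer.hikari.HikariDataSource;

public class PoolRecovery {

    private final HikariDataSource dataSource;

    public PoolRecovery(HikariDataSource dataSource) {
        this.dataSource = dataSource;
    }

    public void evictConnections() {
        // Evicts idle connections immediately and marks in-use connections
        // for eviction when they are returned to the pool.
        dataSource.getHikariPoolMXBean().softEvictConnections();
    }
}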
Nothing has worked so far.
I mitigated the problem by adding a 'query hint' to the method:
@QueryHints({@QueryHint(name = "org.hibernate.timeout", value = "10")})
MyEntity getOneById(Long id);
99% of the result sets are fetched in one second or less, but sometimes a result set is so big that it takes longer. This way JDBC stops fetching results before memory is compromised.
I have a business case of merging multiple CSV files (around 1000+, each containing 1000 records) into a single CSV using Spring Batch.
Please provide your guidance and solutions in terms of both approach and performance.
So far, I have tried two approaches.
Approach 1:
A chunk-oriented step with a MultiResourceItemReader to read the files from the directory and
a FlatFileItemWriter as the item writer.
The issue here is that processing is very slow since it is single-threaded, but the approach works as expected.
Approach 2:
Using a MultiResourcePartitioner and an AsyncTaskExecutor as the task executor.
The issue here is that, since it is async and multi-threaded, data gets overwritten/corrupted while merging into the final single file.
You can wrap your FlatFileItemWriter in an AsyncItemWriter and use it along with an AsyncItemProcessor. This will not corrupt your data and will increase performance, as processing and writing will go through several threads.
@Bean
public AsyncItemWriter<Customer> asyncItemWriter() throws Exception {
    AsyncItemWriter<Customer> asyncItemWriter = new AsyncItemWriter<>();
    asyncItemWriter.setDelegate(flatFileItemWriter);
    asyncItemWriter.afterPropertiesSet();
    return asyncItemWriter;
}

@Bean
public AsyncItemProcessor<Customer, Customer> asyncItemProcessor() throws Exception {
    AsyncItemProcessor<Customer, Customer> asyncItemProcessor = new AsyncItemProcessor<>();
    asyncItemProcessor.setDelegate(itemProcessor());
    asyncItemProcessor.setTaskExecutor(threadPoolTaskExecutor());
    asyncItemProcessor.afterPropertiesSet();
    return asyncItemProcessor;
}

@Bean
public TaskExecutor threadPoolTaskExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(10);
    executor.setMaxPoolSize(10);
    executor.setThreadNamePrefix("default_task_executor_thread");
    executor.initialize();
    return executor;
}
Since the headers are common between your source and destination files, I wouldn't recommend using the Spring Batch provided readers to convert lines into specific beans: column-level information is not needed and, CSV being a text format, you can work with line-level information only, without breaking it down to field level.
Also, partitioning per file is going to be very slow (if you have that many files); you should instead fix the number of partitions first (say 10 or 20) and group your files into that many partitions. Secondly, file writing being a disk-bound operation and not a CPU-bound one, multi-threading won't be very useful.
What I suggest instead is to write your own custom reader and writer in plain Java, along the lines suggested in this answer, where the reader returns a List<String> and the writer gets a List<List<String>> that you can write to the file.
If you have enough memory to hold the lines from all the files at once, you can read all the files in one go and keep returning chunk-size batches; otherwise, reading a small set of files at a time until you reach the chunk size limit should be good enough. Your reader returns null when there are no more files to read.
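A rough sketch of that idea (class names are mine; the write signature is the Spring Batch 4.x List-based one): a reader that returns one file's lines per read() call and a writer that appends each chunk to the merged file.

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Iterator;
import java.util.List;

import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;

// Returns one file's lines per read() call; null when all files are consumed, which ends the step.
class CsvLinesItemReader implements ItemReader<List<String>> {

    private final Iterator<Path> files;

    CsvLinesItemReader(List<Path> csvFiles) {
        this.files = csvFiles.iterator();
    }

    @Override
    public List<String> read() throws Exception {
        // Header handling (skip the first line of each file, write it once to the target)
        // is omitted for brevity.
        return files.hasNext() ? Files.readAllLines(files.next()) : null;
    }
}

// Appends each chunk of line lists to the single merged CSV file.
class CsvLinesItemWriter implements ItemWriter<List<String>> {

    private final Path target;

    CsvLinesItemWriter(Path target) {
        this.target = target;
    }

    @Override
    public void write(List<? extends List<String>> items) throws Exception {
        for (List<String> lines : items) {
            Files.write(target, lines, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
    }
}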