OutOfMemoryError in Gradle/SpringBoot/Kotlin Coroutines project

Recently I added a batch of code to my project, and the Gradle build started failing in GitHub Actions: some tests throw an OOM error when running the Gradle test task.
The project tech stack is Spring Boot 3/R2DBC + Kotlin 1.8/Kotlin Coroutines + Java 17 (Gradle Java language level set to 17).
The build tooling stack:
Local system: Windows 10 Pro (with 16 GB memory)/Oracle JDK 17/Gradle 7.6 (project Gradle wrapper)
GitHub Actions: Custom Ubuntu runner with 16 GB memory/Amazon JDK 17
After researching, we switched to a custom larger runner with 16 GB memory and increased the Gradle JVM heap size to 8 GB, but it did not help.
org.gradle.jvmargs=-Xmx8g -Xms4g
We still get the following errors when running tests. The test code itself is not the problem; the tests pass on my local machine.
*** java.lang.instrument ASSERTION FAILED ***: "!errorOutstanding" with message can't create name string at src/java.instrument/share/native/libinstrument/JPLISAgent.c line: 827
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "ClassGraph-worker-439"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "ClassGraph-worker-438"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "boundedElastic-evictor-1"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "ClassGraph-worker-435"
*** java.lang.instrument ASSERTION FAILED ***: "!errorOutstanding" with message can't create name string at src/java.instrument/share/native/libinstrument/JPLISAgent.c line: 827
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "ClassGraph-worker-433"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "ClassGraph-worker-436"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "ClassGraph-worker-432"
Update: After posting this question on the Spring Boot discussions and elsewhere, it is now confirmed that the OOM was caused by ClassGraph. ClassGraph is used by springdoc to scan and analyze the OpenAPI endpoints. If I remove springdoc from the project, the build works again.
The problem is that even after I set up a global springdoc.packageToScan to shrink the scan scope, it still failed with an OOM error.
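For reference, this is roughly what I tried in application.properties; in current springdoc versions the properties are spelled packages-to-scan and paths-to-match (the package name below is illustrative):

```properties
# Limit springdoc's ClassGraph scan to the API package (package name illustrative)
springdoc.packages-to-scan=com.example.api
springdoc.paths-to-match=/api/**
```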

It looks like the error happens in a Gradle test worker. Gradle forks separate JVM processes to run the tests, and their memory settings are independent of the main Gradle process; by default a test worker uses a 512 MB max heap.
You can do different things to solve this: either increase the heap for the workers, or reduce the number of tests executed in each worker. You can reduce that number in two ways: fork multiple parallel processes per module, or fork a new process after a fixed number of test classes in serial mode. Increasing the heap for the Gradle test worker is probably best, but if you have many modules executing tests in parallel, you might exhaust the total memory of your agent as well.
Please take a look at the Gradle testing documentation for more details on these options.
You control all of these settings with the code below (I do not recommend applying all of them together; this is just to illustrate the options).
tasks.withType<Test>().configureEach {
    maxHeapSize = "1g"     // max heap for each forked test worker JVM
    forkEvery = 100        // start a fresh worker after every 100 test classes
    maxParallelForks = 4   // run up to 4 workers concurrently
}
In any case, my recommendation would be to profile the build to figure out exactly which processes are exhausting the memory and what the most suitable memory settings are, and potentially to identify the leak that is producing this.
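One low-effort way to gather evidence from the failing worker is to have it write a heap dump on OOM; these are standard HotSpot flags, and the dump path below is just an example:

```kotlin
tasks.withType<Test>().configureEach {
    // Write an .hprof dump when a test worker dies with OOM, so it can be
    // inspected afterwards (e.g. with Eclipse MAT or VisualVM).
    jvmArgs("-XX:+HeapDumpOnOutOfMemoryError", "-XX:HeapDumpPath=build/heap-dumps")
}
```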

Related

Wildfly 11.0.0 final java.lang.OutOfMemoryError: Metaspace

I have been getting a java.lang.OutOfMemoryError: Metaspace exception since a new deployment on the production environment. (Before this change we used a separate jar for the scheduling; it worked fine, but due to a network issue it kept stopping, so we added the scheduler and included it in the WildFly server with the other WARs.) Basically, we are using a WildFly 11.0.0.Final server with 4 WAR files, one of which has an @Scheduled job that runs every 10 minutes. We normally stop the WildFly service and start it again after each new WAR deployment, but after a certain time (4 to 5 hours) the application starts slowing down, and in the server console I can see java.lang.OutOfMemoryError: Metaspace, as below:
WARN [org.jboss.modules] (default task-11) Failed to define class com.arjuna.ats.jta.cdi.TransactionScopeCleanup$1 in Module "org.jboss.jts" from local module loader #1e802ef9 (finder: local module finder #2b6faea6 (roots: E:\Data\wildfly-11.0.0.Final\modules,E:\Data\wildfly-11.0.0.Final\modules\system\layers\base)): java.lang.OutOfMemoryError: Metaspace
ERROR [org.jboss.as.ejb3.invocation] (default task-55) WFLYEJB0034: EJB Invocation failed on component AuditLoggerHandler for method public void com.banctec.caseware.server.logger.AuditLoggerHandlerBean.publishCaseAudit(java.lang.String,com.banctec.caseware.server.helpers.SessionHolder,com.banctec.caseware.resources.Resource[],java.lang.Long) throws com.banctec.caseware.exceptions.CaseWareException: javax.ejb.EJBTransactionRolledbackException: WFLYEJB0457: Unexpected Error
After that, every operation fails with similar errors containing java.lang.OutOfMemoryError: Metaspace.
So, as a first step, I removed the plain code from the @Scheduled job and used the Executor framework with a fixed thread pool of 5; we deployed again with this change, but the same issue keeps coming back.
I am not sure what is causing the server to go down again and again with this memory leak issue.
All 4 WARs use Spring Boot 2.0.2.
Any help appreciated.
You need to increase your Metaspace size, and check whether you have a memory leak. Please take a look at the following link: http://www.mastertheboss.com/java/solving-java-lang-outofmemoryerror-metaspace-error/
You can use tools like JProfiler to find the memory leak. It works like a charm. Check out the following link: https://www.youtube.com/watch?v=032aTGa-1XM
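As a stopgap while you hunt the leak, you can raise the Metaspace ceiling in WildFly's standalone.conf; the sizes below are illustrative, not a recommendation:

```shell
# standalone.conf (sketch): raise the Metaspace limits for the WildFly JVM.
# Tune the values to your workload; a leak will still hit the new ceiling.
JAVA_OPTS="$JAVA_OPTS -XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=512m"
```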

Parallel For-Each Mule 4.2.2

I am getting the below errors in the application running in production.
"org.mule.runtime.core.privileged.processor.chain.AbstractMessageProcessorChain - Unexpected state. Error handler should be invoked with either an Event instance or a MessagingException
java.util.ConcurrentModificationException: null"
I am seeing this error while the app is invoking services inside a parallel for-each.
Any idea when this kind of exception happens?
It looks like it could be caused by one of the issues fixed in Mule 4.3.0. Try testing with 4.3.0 to see if the issue goes away.
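For background on what the exception itself means (independently of Mule internals): a ConcurrentModificationException is thrown by a fail-fast iterator when the collection is structurally modified mid-iteration, which is exactly the kind of thing concurrent branches of a parallel for-each can do to a shared collection. A minimal single-threaded sketch:

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

public class CmeDemo {
    // Returns true if modifying the list during iteration triggered a CME.
    static boolean removeDuringIteration() {
        List<String> items = new ArrayList<>(List.of("a", "b", "c"));
        try {
            for (String item : items) {
                // Structural modification while the for-each iterator is live:
                // the iterator's next() detects the modCount change and throws.
                items.remove(item);
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("caught=" + removeDuringIteration()); // prints caught=true
    }
}
```

The fix in application code is to mutate via the iterator (or use a concurrent collection); in this case, though, the stack trace points inside the Mule runtime, so upgrading is the right move.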

Beam pipeline not moving in Google Dataflow while running ok on direct runner

I have a Beam pipeline that runs well locally with the DirectRunner. However, when switching to the DataflowRunner, the job starts and I can see the flow chart in the Google Dataflow web UI, but the job does not run; it just hangs there until I stop it. I am using Beam 2.10. I can see autoscaling adjusting the CPUs, and there are no exceptions in the log.
I think this has something to do with the way I create the jar file. I am using the Shadow plugin to create the jar in the Gradle build. The main reason for using ShadowJar is mergeServiceFiles(). Without mergeServiceFiles(), the job fails with exceptions like No FileSystem found for gs.
So I copied the word count example from the Google Dataflow template repo and packaged it as a jar file. It shows the same behavior: the job starts but does not move. The code was touched with minimal changes, for the service account credential: instead of its original PipelineOptions, I extend GcsOptions for the credential.
Tried Beam 2.12 and 2.10.
Digging around, I found the full log by clicking on Stackdriver in the upper-right corner of the log panel, and found the following:
Caused by: java.lang.IllegalStateException: Detected both log4j-over-slf4j.jar AND bound slf4j-log4j12.jar on the class path, preempting StackOverflowError. See also http://www.slf4j.org/codes.html#log4jDelegationLoop for more details. at org.slf4j.impl.Log4jLoggerFactory.<clinit>(Log4jLoggerFactory.java:54) ....
Then there is a
java failed with exit status 1
log entry a few rows below the log4j error. Basically, the Java program had already stopped, but the Dataflow UI still showed it as running on the flow chart.
Use the Gradle build script to exclude slf4j-log4j12, e.g.
compile ('org.apache.hadoop:hadoop-mapreduce-client-core:3.2.0') {exclude group: 'org.slf4j', module: 'slf4j-log4j12'}
and do the same for the other dependencies containing slf4j-log4j12; then the job starts moving.
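Instead of repeating the exclude block per dependency, a global exclusion covers transitive occurrences as well; a Groovy DSL sketch matching the build script above:

```groovy
// Excludes the slf4j-log4j12 binding from every configuration, so no
// per-dependency exclude blocks are needed.
configurations.all {
    exclude group: 'org.slf4j', module: 'slf4j-log4j12'
}
```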

optaplanner with aws lambda

I am using OptaPlanner to solve a scheduling problem. I want to invoke the scheduling code from AWS Lambda (I know that Lambda's max execution time is 5 minutes, and that's okay for this application).
To achieve this I have built a Maven project with two modules:
module-1: scheduling optimization code
module-2: AWS Lambda handler (calls the scheduling code from module-1)
When I run my tests in IntelliJ IDEA for module-1 (which has the OptaPlanner code), they run fine.
When I invoke the Lambda function, I get the following exception:
java.lang.ExceptionInInitializerError:
java.lang.ExceptionInInitializerError
java.lang.ExceptionInInitializerError
at org.kie.api.internal.utils.ServiceRegistry.getInstance(ServiceRegistry.java:27)
...
Caused by: java.lang.RuntimeException: Child services [org.kie.api.internal.assembler.KieAssemblers] have no parent
at org.kie.api.internal.utils.ServiceDiscoveryImpl.buildMap(ServiceDiscoveryImpl.java:191)
at org.kie.api.internal.utils.ServiceDiscoveryImpl.getServices(ServiceDiscoveryImpl.java:97)
...
I have included the following dependency in the Maven file: org.optaplanner optaplanner-core 7.7.0.Final
I also checked that the jar file contains drools-core, kie-api, kie-internal, and drools-compiler. Does anyone know what might be the issue?
Sounds like a bug in Drools when running in a restricted environment such as AWS Lambda. Please create a JIRA issue and link it here.
I was getting the same error when attempting to run a fat jar containing an example OptaPlanner project. A little debugging revealed that the problem was that services was empty when ServiceDiscoveryImpl::buildMap was invoked; the build kept only the first META-INF/kie.conf it encountered, and as a result services declared in the other kie.conf files were missing. Naturally your tests work properly, because there the class path contains all of the dependencies (that is, several distinct META-INF/kie.conf files), unlike the assembly you were attempting to execute on the Lambda.
Concatenating those files instead (using an appropriate merge strategy in the assembly) fixes the problem, and appears appropriate given how they are loaded by ServiceDiscoveryImpl. The updated jar runs properly as an AWS Lambda.
Note: I was using the default scoreDrl from the v7.12.0.Final Cloud Balancing example.
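Since the project is built with Maven, the concatenation can be done with the shade plugin's AppendingTransformer; a sketch (versions and surrounding executions omitted, adapt to your build):

```xml
<!-- maven-shade-plugin: append all META-INF/kie.conf files into one,
     instead of keeping only the first one found on the class path -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <transformers>
      <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
        <resource>META-INF/kie.conf</resource>
      </transformer>
    </transformers>
  </configuration>
</plugin>
```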

Subtle diff when running in intellij and executing jar

I'm running out of ideas...
My Spring Boot app behaves fine when I run it in IntelliJ with the Gradle idea plugin applied (apply plugin: 'idea').
Once I remove the plugin from build.gradle, it behaves like the app executed with java -jar app.jar; there is a subtle but important difference, described below.
I have the following scenario: the current tx fails due to some exception, the tx is marked as rollback-only, the exception is caught, and its handling consists of registering a post-tx recovery activity with TransactionSynchronizationManager.registerSynchronization (in a new tx).
The code works fine in IntelliJ with the idea plugin; when I remove the plugin declaration or run the Spring Boot jar with java -jar, the registering process (the post-tx-failure task) fails with this exception:
Caused by: java.lang.IllegalStateException: Transaction synchronization is not active
at org.springframework.transaction.support.TransactionSynchronizationManager.registerSynchronization(TransactionSynchronizationManager.java:291) ~[spring-tx-4.3.10.RELEASE.jar!/:4.3.10.RELEASE]
By the way, the code is in Kotlin, if that matters.
Any ideas?
UPDATE
I think there is some kind of race condition, because in debug mode, even without the idea plugin, the app behaves as expected (the registering process succeeds).
I solved my problem, and the root cause was quite surprising...
Apparently there was a problem with the processing of a custom Spring annotation.
The method that was supposed to open a new transaction was annotated not with the standard @Transactional annotation but with a custom, application-specific one (@Transactional with custom tx settings). A debugging session revealed that the new tx was not being opened. That's it! Inlining the custom annotation nearly solved the problem.
Another flaw I detected was a function that was not open, which was quite strange because the function was not the transaction entry point (it was some further call).
A Kotlin compiler bug?
Anyway, lessons learned: pay attention to the behaviour of custom annotations, and refresh your knowledge of the rules for final/open.
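On the final/open point: the usual fix in Spring + Kotlin projects is the kotlin-spring compiler plugin, which automatically opens classes and functions carrying Spring proxying annotations (including @Transactional), so CGLIB proxies can override them. A build.gradle.kts sketch (the version is illustrative):

```kotlin
plugins {
    kotlin("jvm") version "1.8.0"
    // all-open preconfigured for Spring: opens anything annotated with
    // @Component, @Transactional, @Async, etc.
    kotlin("plugin.spring") version "1.8.0"
}
```

Note that whether it also covers a custom annotation depends on whether that annotation is meta-annotated with one of the recognized Spring annotations.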
