Elasticsearch out of memory error

We are using Elasticsearch 0.90.0 and Java version 1.7.0_25.
We migrate data from an Oracle DB to Hadoop through an executable jar kept on the DB server. After 15-20 minutes of successful running, we get the following exception:
Caused by: java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:597)
at java.util.concurrent.ThreadPoolExecutor.addIfUnderMaximumPoolSize(ThreadPoolExecutor.java:727)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:657)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker.start(DeadLockProofWorker.java:38)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.openSelector(AbstractNioSelector.java:343)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.<init>(AbstractNioSelector.java:95)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.<init>(AbstractNioWorker.java:51)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.<init>(NioWorker.java:45)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorkerPool.createWorker(NioWorkerPool.java:45)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorkerPool.createWorker(NioWorkerPool.java:28)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorkerPool.newWorker(AbstractNioWorkerPool.java:99)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorkerPool.init(AbstractNioWorkerPool.java:69)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorkerPool.<init>(NioWorkerPool.java:39)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorkerPool.<init>(NioWorkerPool.java:33)
at org.elasticsearch.transport.netty.NettyTransport.doStart(NettyTransport.java:240)
at org.elasticsearch.common.component.AbstractLifecycleComponent.start(AbstractLifecycleComponent.java:85)
at org.elasticsearch.transport.TransportService.doStart(TransportService.java:90)
at org.elasticsearch.common.component.AbstractLifecycleComponent.start(AbstractLifecycleComponent.java:85)
at org.elasticsearch.client.transport.TransportClient.<init>(TransportClient.java:179)
at org.elasticsearch.client.transport.TransportClient.<init>(TransportClient.java:119)
No exception appears in either the namenode/datanode or Elasticsearch logs. The error is caught on the DB server, but I think it is related to Elasticsearch.

My guess is that you're creating too many Netty clients, which in turn is eating up all your threads. Perhaps wrap your Netty client behind a service that injects a single shared instance? See Helter Skelter's comment on this answer: https://stackoverflow.com/a/5253186/266531
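For illustration, here is a minimal sketch of sharing one client across the whole JVM, assuming the 0.90-era TransportClient API (cluster name and host below are placeholders):

// Share one TransportClient per JVM instead of creating one per request.
// Each new TransportClient spins up its own Netty worker threads, which is
// what the stack trace above shows failing.
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

public final class EsClientHolder {
    private static final TransportClient CLIENT = createClient();

    private static TransportClient createClient() {
        Settings settings = ImmutableSettings.settingsBuilder()
                .put("cluster.name", "my-cluster")   // placeholder
                .build();
        return new TransportClient(settings)
                .addTransportAddress(new InetSocketTransportAddress("es-host", 9300));
    }

    public static TransportClient client() {
        return CLIENT;
    }

    private EsClientHolder() {}
}

Reusing one instance keeps the Netty worker pool bounded no matter how many records the migration job pushes.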

Related

DNS resolution for Redis, Elasticsearch and Postgres hosts not working after introducing Spring WebFlux

Everything was working until I introduced WebFlux into our Spring Boot application, which runs on an external Tomcat inside a Docker container. This was needed when I wrote an Elasticsearch repository using ReactiveCrudRepository to implement asynchronous save/fetch operations. Now our Spring Boot apps hosted on Docker are not able to connect to any of the datastores.
For Redis the exception is:
Caused by: java.net.UnknownHostException: Failed to resolve 'redis1' after 2 queries
For Elasticsearch:
Caused by: org.springframework.data.elasticsearch.client.NoReachableHostException: Host 'es_host:9200' not reachable. Cluster state is offline.
The data nodes are all hosted on separate instances.
On researching around a bit, I came across several issues with Netty's DNS resolver, but there is no standard fix mentioned anywhere. Please advise.
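For reference, one commonly suggested workaround for Netty DNS resolver problems is to make the client fall back to the JVM's built-in resolution. This is a sketch under the assumption that Lettuce is the Redis driver, not a verified fix for this exact setup; the 'redis1' host name is taken from the exception above:

// Sketch: bypass Netty's DNS resolver in Lettuce and use JVM resolution.
import io.lettuce.core.resource.ClientResources;
import io.lettuce.core.resource.DefaultClientResources;
import io.lettuce.core.resource.DnsResolvers;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisStandaloneConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceClientConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory;

@Configuration
public class RedisConfig {

    @Bean(destroyMethod = "shutdown")
    public ClientResources clientResources() {
        return DefaultClientResources.builder()
                .dnsResolver(DnsResolvers.JVM_DEFAULT) // skip Netty's async resolver
                .build();
    }

    @Bean
    public LettuceConnectionFactory redisConnectionFactory(ClientResources resources) {
        LettuceClientConfiguration clientConfig = LettuceClientConfiguration.builder()
                .clientResources(resources)
                .build();
        return new LettuceConnectionFactory(
                new RedisStandaloneConfiguration("redis1", 6379), clientConfig);
    }
}

Whether a similar resolver override helps the reactive Elasticsearch and Postgres clients would need separate verification.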

Tracing a memory leak in Spring Azure Qpid JMS code

I'm trying to trace and identify the root cause of a memory leak in our very small and simple Spring Boot application.
It uses following:
- Spring Boot 2.2.4
- azure-servicebus-jms-spring-boot-starter 2.2.1
- MSSQL
Function:
The app only consumes from an Azure Service Bus queue, stores the data, and sends data on to another destination.
It is a small app, so it starts easily with 64 MB of memory, although I give it up to 256 MB via the -Xmx option. An important note: the queue is consumed using Spring's default transacted mode, with a dedicated JmsTransactionManager that is actually an inner TM of a ChainedTransactionManager, alongside the DB TM and an additional outbound JMS TM. Both JMS ConnectionFactory objects are created as CachingConnectionFactory. A sketch of this setup is below.
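Roughly, the described wiring might look like this (a sketch; bean wiring and names are placeholders, the real connection factories come from the Azure starter and the MSSQL DataSource):

// Sketch of the failing setup: three TMs chained together, both JMS
// factories wrapped in CachingConnectionFactory.
import javax.jms.ConnectionFactory;
import javax.sql.DataSource;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.transaction.ChainedTransactionManager;
import org.springframework.jdbc.datasource.DataSourceTransactionManager;
import org.springframework.jms.connection.CachingConnectionFactory;
import org.springframework.jms.connection.JmsTransactionManager;

@Configuration
public class TxConfig {

    @Bean
    public ChainedTransactionManager chainedTm(DataSource dataSource,
                                               ConnectionFactory inboundCf,
                                               ConnectionFactory outboundCf) {
        JmsTransactionManager inboundJmsTm =
                new JmsTransactionManager(new CachingConnectionFactory(inboundCf));
        JmsTransactionManager outboundJmsTm =
                new JmsTransactionManager(new CachingConnectionFactory(outboundCf));
        DataSourceTransactionManager dbTm = new DataSourceTransactionManager(dataSource);
        // Transactions start in declaration order and commit in reverse order.
        return new ChainedTransactionManager(inboundJmsTm, dbTm, outboundJmsTm);
    }
}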
Behavior:
Once the app is started it seems OK. There is no traffic, and I can see in the log that it is opening and closing transactions when checking the queue (jms:message-driven-channel-adapter).
However, after some time, still with no traffic and not a single message consumed, the memory starts climbing, as monitored via JVisualVM.
An error is thrown:
--2020-04-24 11:17:01.443 - WARN 39892 --- [er.container-10] o.s.j.l.DefaultMessageListenerContainer : Setup of JMS message listener invoker failed for destination 'MY QUEUE NAME HERE' - trying to recover. Cause: Heuristic completion: outcome state is rolled back; nested exception is org.springframework.transaction.TransactionSystemException: Could not commit JMS transaction; nested exception is javax.jms.IllegalStateException: The Session was closed due to an unrecoverable error.
... and after several minutes it reaches the max of the heap, and from that moment on it keeps failing with an OutOfMemoryError in the thread opening JMS connections.
--2020-04-24 11:20:04.564 - WARN 39892 --- [windows.net:-1]] i.n.u.concurrent.AbstractEventExecutor : A task raised an exception. Task: org.apache.qpid.jms.provider.amqp.AmqpProvider$$Lambda$871/0x000000080199f840#1ed8f2b9
java.lang.OutOfMemoryError: Java heap space
at java.base/java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:61)
at java.base/java.nio.ByteBuffer.allocate(ByteBuffer.java:348)
at org.apache.qpid.proton.engine.impl.ByteBufferUtils.newWriteableBuffer(ByteBufferUtils.java:99)
at org.apache.qpid.proton.engine.impl.TransportOutputAdaptor.init_buffers(TransportOutputAdaptor.java:108)
at org.apache.qpid.proton.engine.impl.TransportOutputAdaptor.pending(TransportOutputAdaptor.java:56)
at org.apache.qpid.proton.engine.impl.SaslImpl$SwitchingSaslTransportWrapper.pending(SaslImpl.java:842)
at org.apache.qpid.proton.engine.impl.HandshakeSniffingTransportWrapper.pending(HandshakeSniffingTransportWrapper.java:138)
at org.apache.qpid.proton.engine.impl.TransportImpl.pending(TransportImpl.java:1577)
at org.apache.qpid.proton.engine.impl.TransportImpl.getOutputBuffer(TransportImpl.java:1526)
at org.apache.qpid.jms.provider.amqp.AmqpProvider.pumpToProtonTransport(AmqpProvider.java:994)
at org.apache.qpid.jms.provider.amqp.AmqpProvider.pumpToProtonTransport(AmqpProvider.java:985)
at org.apache.qpid.jms.provider.amqp.AmqpProvider.lambda$close$3(AmqpProvider.java:351)
at org.apache.qpid.jms.provider.amqp.AmqpProvider$$Lambda$871/0x000000080199f840.run(Unknown Source)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:510)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:518)
at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1050)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at java.base/java.lang.Thread.run(Thread.java:835)
Heap dumps:
I took a couple of heap snapshots during this whole process and looked at what had increased.
I can see a suspicious number of ConcurrentHashMap/String/byte[] objects.
Does anyone have a clue or hint about what could be wrong in this setup and these libs: Spring Boot, Apache Qpid (used under the hood of the Azure JMS dependency), etc.? Many thanks.
Update #1
I have clear evidence that the problem is either in Spring or in the Azure Service Bus starter library, not necessarily in the Qpid client it uses. I would guess the library has the bug rather than Spring. This is what the failing setup looks like:
1. There are two JMS destinations and one DB, each with its own transaction manager.
2. There is a ChainedTransactionManager wrapping the three TMs above.
3. A Spring Integration app connects to the Azure Service Bus queue via jms:message-driven-channel-adapter, with the transaction manager from point 2 set on this component.
Start the app; no traffic on the queue is needed. After 10 minutes the app will crash with an OutOfMemoryError. Within those 10 minutes I watched the log on debug level, and the only thing happening was the opening and closing of transactions using the ChainedTransactionManager. Also, as written in the comments, another important condition is the third JMS TransactionManager: with 2 TMs it works and is stable; with 3 it crashes.
Additional research and the steps taken identified the most likely root cause: Spring's CachingConnectionFactory class. Once I removed it and used only the native factory types, the problem went away, and the memory consumption profile is very different and healthy.
I have to say I created the CachingConnectionFactory using the standard constructor and didn't configure its behavior further. However, these Spring defaults clearly lead to a memory leak, in my experience.
In the past I had a memory leak with ActiveMQ that had to be resolved by using CachingConnectionFactory, and now I have a memory leak with Azure Service Bus when using CachingConnectionFactory... strange :) In both cases I see these as bugs, because memory management should be correct regardless of whether caching is involved.
Marking this as my answer.
Tested case: the problem occurs when receiving and sending messages, each with its own TM, and both JMS connection factories are of type CachingConnectionFactory. In the end I also tested the app with an inbound connection factory of type CachingConnectionFactory and a native outbound one... no memory leak either.
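To illustrate, the fix boils down to returning the native factory instead of wrapping it. A sketch assuming the Qpid JMS ConnectionFactory used under the hood of the Azure starter (URI and credentials are placeholders):

// Use the native Qpid JMS factory directly; wrapping it in Spring's
// CachingConnectionFactory is what showed the leak in my tests.
import javax.jms.ConnectionFactory;
import org.apache.qpid.jms.JmsConnectionFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class JmsConfig {

    @Bean
    public ConnectionFactory inboundConnectionFactory() {
        // Leaking variant: return new CachingConnectionFactory(nativeFactory);
        return new JmsConnectionFactory(
                "sb-user", "sb-password",                      // placeholders
                "amqps://mynamespace.servicebus.windows.net"); // placeholder
    }
}

If connection churn without caching ever becomes a concern, DefaultMessageListenerContainer's own cache level (setCacheLevel) may be an alternative to CachingConnectionFactory.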

IBM WAS 9, MDB deployment fails the entire application

We have IBM WebSphere AS 9.0.0.7, and when we deploy an application containing an MDB - which listens to a remote WebSphere MQ server - while the MQ server is down, WAS reports an error:
Caused by: com.ibm.mq.connector.DetailedResourceAdapterInternalException: MQJCA1011: Failed to allocate a JMS connection., error code: MQJCA1011 An internal error caused an attempt to allocate a connection to fail. See the linked exception for details of the failure.
and stops the deployment, i.e. the application does not start. This is a big problem, as the application is a critical hub for other operations. We want to force WAS to start the application and retry the JMS connection later. Is that possible?
You can try setting the custom property WAS_EndpointInitialState to INACTIVE (see here and here), and you may also want to look through here.
We've found a solution here: Configuring properties for the IBM MQ resource adapter.
The trick was to set startupRetryCount and startupRetryInterval. When the MQ server is not available, the app starts, although it is reported as a "Partial start". All other parts of the application seem to be running just fine.

Unable to acquire JDBC Connection in Spring Boot app

I have a microservices-based application where each microservice is a Spring Boot 2.0.3.RELEASE app, but after my 4th microservice launches I get this error:
Unable to acquire JDBC Connection; nested exception is org.hibernate.exception.JDBCConnectionException: Unable to acquire JDBC Connection
..
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Too many connections
I would like to know how to reduce the maximumPoolSize, and whether there is a way to find out the current maximumPoolSize, because I haven't seen anything related to it when the app starts.
You can set the maximum pool size of the JDBC connection pool in your application.properties file like this:
spring.datasource.hikari.maximum-pool-size=5
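If you prefer code over properties, the same cap can be set programmatically with plain HikariCP (URL and credentials below are placeholders). Note that HikariCP's default maximum is 10 connections per pool, so four services can already hold 40 connections against MySQL's max_connections limit:

// Sketch: cap the HikariCP pool size in code rather than in properties.
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class DataSourceFactory {

    public static HikariDataSource create() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:mysql://db-host:3306/mydb"); // placeholder
        config.setUsername("user");                          // placeholder
        config.setPassword("secret");                        // placeholder
        config.setMaximumPoolSize(5); // same effect as the property above
        return new HikariDataSource(config);
    }
}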

Quarkus native with Kafka Streams and Schema Registry

Quarkus (1.5.0.Final) as native executable works fine with Kafka Streams and Avro Schema Registry.
But when a Kafka Streams topology consumes a topic with Avro Serdes and a new event is added, there is an exception.
The kafka-streams-avro-serde library tries to reach the Schema Registry (via its REST API) to register the added schema.
The exception below occurs (this works fine in Quarkus + JVM mode):
Caused by: org.apache.kafka.common.errors.SerializationException: Error registering Avro schema: {"type":"record","name":"siteconf","namespace":"test","fields":[{"name":"id","type":["null","string"],"default":null},{"name":"site","type":["null","string"],"default":null},{"name":"configuration","type":["null","string"],"default":null}],"connect.name":"siteconf"}
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Internal Server Error; error code: 500
I don't know how to work around this problem.
It's very annoying, because I think it's the only problem I've detected with Kafka Streams and Schema Registry, and I was interested in adopting Quarkus instead of Spring Boot/Cloud.
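One avenue worth checking (an assumption, not a confirmed fix): in native mode GraalVM strips reflection metadata, and the Schema Registry REST client (de)serializes its request/response entities reflectively, so registering those Confluent classes for reflection may help. A sketch:

// Sketch (assumption): register the Confluent REST entities for reflection
// so the native image can serialize the schema-registration request.
import io.quarkus.runtime.annotations.RegisterForReflection;

@RegisterForReflection(targets = {
        io.confluent.kafka.schemaregistry.client.rest.entities.ErrorMessage.class,
        io.confluent.kafka.schemaregistry.client.rest.entities.Schema.class,
        io.confluent.kafka.schemaregistry.client.rest.entities.requests.RegisterSchemaRequest.class,
        io.confluent.kafka.schemaregistry.client.rest.entities.requests.RegisterSchemaResponse.class
})
public class SchemaRegistryReflectionConfig {
}

If the 500 comes back even with reflection registered, capturing the registry's server-side log for the failed POST would show whether the request body is arriving malformed.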
