Tracing memory leak in Spring Azure qPID JMS code - spring-boot

I'm trying to trace and identify the root cause of a memory leak in our very small and simple Spring Boot application.
It uses the following:
- Spring Boot 2.2.4
- azure-servicebus-jms-spring-boot-starter 2.2.1
- MSSQL
Function:
The app only consumes from an Azure ServiceBus queue, stores the data, and sends data on to another destination.
It is a small app, so it starts easily with 64 MB of memory, although I give it up to 256 MB via the -Xmx option. An important note is that the queue is consumed using Spring's default transacted mode with a dedicated JmsTransactionManager, which is actually the inner TM of a ChainedTransactionManager, along with a DB TM and an additional outbound JMS TM. Both JMS ConnectionFactory objects are created as CachingConnectionFactory.
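For context, a rough sketch of the transaction wiring just described; the bean names, the injection of the native factories, and the use of the Spring Data Commons ChainedTransactionManager are my assumptions, not the actual application code:

import javax.jms.ConnectionFactory;
import javax.sql.DataSource;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.transaction.ChainedTransactionManager;
import org.springframework.jdbc.datasource.DataSourceTransactionManager;
import org.springframework.jms.connection.CachingConnectionFactory;
import org.springframework.jms.connection.JmsTransactionManager;

@Configuration
public class TxConfig {

    // Both JMS factories are wrapped in Spring's caching decorator,
    // with its default behavior (no further configuration).
    @Bean
    public CachingConnectionFactory inboundCf(ConnectionFactory nativeInboundCf) {
        return new CachingConnectionFactory(nativeInboundCf);
    }

    @Bean
    public CachingConnectionFactory outboundCf(ConnectionFactory nativeOutboundCf) {
        return new CachingConnectionFactory(nativeOutboundCf);
    }

    @Bean
    public JmsTransactionManager inboundJmsTm(CachingConnectionFactory inboundCf) {
        return new JmsTransactionManager(inboundCf);
    }

    @Bean
    public JmsTransactionManager outboundJmsTm(CachingConnectionFactory outboundCf) {
        return new JmsTransactionManager(outboundCf);
    }

    @Bean
    public DataSourceTransactionManager dbTm(DataSource dataSource) {
        return new DataSourceTransactionManager(dataSource);
    }

    // ChainedTransactionManager starts transactions in the given order
    // and commits them in reverse order.
    @Bean
    public ChainedTransactionManager chainedTm(JmsTransactionManager inboundJmsTm,
                                               DataSourceTransactionManager dbTm,
                                               JmsTransactionManager outboundJmsTm) {
        return new ChainedTransactionManager(inboundJmsTm, dbTm, outboundJmsTm);
    }
}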
Behavior:
Once the app is started it seems OK. There is no traffic, and I can see in the log that it is opening and closing transactions when checking the queue (jms:message-driven-channel-adapter).
However, after some time with still no traffic and not a single message consumed, the memory starts climbing, as monitored via JVisualVM.
There is an error thrown:
--2020-04-24 11:17:01.443 - WARN 39892 --- [er.container-10] o.s.j.l.DefaultMessageListenerContainer : Setup of JMS message listener invoker failed for destination 'MY QUEUE NAME HERE' - trying to recover. Cause: Heuristic completion: outcome state is rolled back; nested exception is org.springframework.transaction.TransactionSystemException: Could not commit JMS transaction; nested exception is javax.jms.IllegalStateException: The Session was closed due to an unrecoverable error.
... and after several minutes it reaches the max of the heap, and from that moment on it fails with an OutOfMemoryError in the thread opening JMS connections.
--2020-04-24 11:20:04.564 - WARN 39892 --- [windows.net:-1]] i.n.u.concurrent.AbstractEventExecutor : A task raised an exception. Task: org.apache.qpid.jms.provider.amqp.AmqpProvider$$Lambda$871/0x000000080199f840#1ed8f2b9
java.lang.OutOfMemoryError: Java heap space
at java.base/java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:61)
at java.base/java.nio.ByteBuffer.allocate(ByteBuffer.java:348)
at org.apache.qpid.proton.engine.impl.ByteBufferUtils.newWriteableBuffer(ByteBufferUtils.java:99)
at org.apache.qpid.proton.engine.impl.TransportOutputAdaptor.init_buffers(TransportOutputAdaptor.java:108)
at org.apache.qpid.proton.engine.impl.TransportOutputAdaptor.pending(TransportOutputAdaptor.java:56)
at org.apache.qpid.proton.engine.impl.SaslImpl$SwitchingSaslTransportWrapper.pending(SaslImpl.java:842)
at org.apache.qpid.proton.engine.impl.HandshakeSniffingTransportWrapper.pending(HandshakeSniffingTransportWrapper.java:138)
at org.apache.qpid.proton.engine.impl.TransportImpl.pending(TransportImpl.java:1577)
at org.apache.qpid.proton.engine.impl.TransportImpl.getOutputBuffer(TransportImpl.java:1526)
at org.apache.qpid.jms.provider.amqp.AmqpProvider.pumpToProtonTransport(AmqpProvider.java:994)
at org.apache.qpid.jms.provider.amqp.AmqpProvider.pumpToProtonTransport(AmqpProvider.java:985)
at org.apache.qpid.jms.provider.amqp.AmqpProvider.lambda$close$3(AmqpProvider.java:351)
at org.apache.qpid.jms.provider.amqp.AmqpProvider$$Lambda$871/0x000000080199f840.run(Unknown Source)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:510)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:518)
at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1050)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at java.base/java.lang.Thread.run(Thread.java:835)
Heap dumps:
I took a couple of heap snapshots during this whole process and looked at what was increasing.
I can see a suspicious number of ConcurrentHashMap/String/byte[] objects.
Does anyone have a clue/hint as to what can be wrong in this setup and these libraries: Spring Boot, Apache Qpid (used under the hood of the Azure JMS dependency), etc.? Many thanks.
Update #1
I have clear evidence that the problem is either in Spring or in the Azure Service Bus starter library, not necessarily in the Qpid client used underneath. I would guess the starter library has the bug rather than Spring, but that is just my guess. This is what the failing setup looks like:
1. There are two JMS destinations and one DB, each having its own transaction manager.
2. There is a ChainedTransactionManager wrapping the above three TMs.
3. A Spring Integration app connects to the Azure ServiceBus queue via jms:message-driven-channel-adapter, with the transaction manager created in point 2 set on this component.
4. Start the app; no traffic on the queue is needed. After 10 minutes the app will crash due to an OutOfMemoryError.
Within those 10 minutes I watched the log at debug level, and the only thing happening was the opening and closing of transactions using the ChainedTransactionManager. Also, as written in the comments, another important condition is the third JMS TransactionManager: with 2 TMs it works and is stable, with 3 it will crash. A rough sketch of the adapter wiring from point 3 follows.
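This is only how I believe the XML maps to code, not the actual application source: jms:message-driven-channel-adapter is backed by a DefaultMessageListenerContainer, so the equivalent programmatic wiring looks roughly like this (queue name and bean references are placeholders):

import javax.jms.ConnectionFactory;

import org.springframework.integration.jms.ChannelPublishingJmsMessageListener;
import org.springframework.integration.jms.JmsMessageDrivenEndpoint;
import org.springframework.jms.listener.DefaultMessageListenerContainer;
import org.springframework.messaging.MessageChannel;
import org.springframework.transaction.PlatformTransactionManager;

public class AdapterConfig {

    public JmsMessageDrivenEndpoint queueAdapter(ConnectionFactory inboundCf,
                                                 PlatformTransactionManager chainedTm,
                                                 MessageChannel inputChannel) {
        // Container polling the Service Bus queue inside chained transactions
        DefaultMessageListenerContainer container = new DefaultMessageListenerContainer();
        container.setConnectionFactory(inboundCf);      // the CachingConnectionFactory
        container.setDestinationName("MY_QUEUE_NAME_HERE");
        container.setSessionTransacted(true);           // Spring's transacted mode
        container.setTransactionManager(chainedTm);     // the 3-TM ChainedTransactionManager

        // Bridge received JMS messages onto a Spring Integration channel
        ChannelPublishingJmsMessageListener listener = new ChannelPublishingJmsMessageListener();
        listener.setRequestChannel(inputChannel);

        return new JmsMessageDrivenEndpoint(container, listener);
    }
}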

Additional research and the steps taken identified the most likely root cause: Spring's CachingConnectionFactory class. Once I removed it and used only the native connection factory types, the problem went away, and the memory consumption profile is very different and healthy.
I have to say I created the CachingConnectionFactory using the standard constructor and didn't configure its behavior any further. However, in my experience these Spring defaults clearly lead to a memory leak.
In the past I had a memory leak with ActiveMQ which had to be resolved by using CachingConnectionFactory, and now I have a memory leak with Azure ServiceBus when using CachingConnectionFactory .. strange :) In both cases I see them as bugs, because memory management should be correct regardless of whether caching is involved or not.
Marking this as my answer.
Tested case: the problem occurs when receiving and sending messages, each side with its own TM, and both JMS connection factories of type CachingConnectionFactory. In the end I also tested the app with an inbound connection factory of type CachingConnectionFactory and an outbound one of just the native type ... no memory leak either.
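For illustration only, the shape of the working configuration as I understand it: a minimal sketch assuming the native Qpid-based factory is provided by the Azure starter, handed directly to the transaction manager with no CachingConnectionFactory wrapper.

import javax.jms.ConnectionFactory;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jms.connection.JmsTransactionManager;

@Configuration
public class NoCachingJmsConfig {

    // The native factory from the starter goes straight to the TM;
    // no caching decorator in between.
    @Bean
    public JmsTransactionManager outboundJmsTm(ConnectionFactory nativeServiceBusFactory) {
        return new JmsTransactionManager(nativeServiceBusFactory);
    }
}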

Related

JMS message processing with spring integration in cloud environment

I'm currently trying to refactor the processing of JMS messages to work in a distributed/cloud environment. To allow better retry and error handling, the messages are first stored in the database with a JPA entity and then read by a Spring Integration JPA inbound adapter. This works fine as long as just a single instance of my service is running. However, when multiple instances are running, the instances try to process the same message, even after introducing a processing state on the persisted messages.
I have already tried saving the JMS messages in a JDBC message store; however, I would then have to define a group identifier by which an instance could select a message, which is not really possible since the number of instances is dynamic and I cannot assign a group id to each instance. Another possibility could be some kind of distributed lock with a LockRegistry, but I couldn't make that work.
Do you have any hint/advice on how I could best implement the following requirements with Spring Integration:
JMS message should be persisted
Any instance can pick up the message and process it
If the processing fails there will be a retry for x times (could also be retried by another instance)
If an instance crashes or gets killed during the processing the message must not be lost
Is there maybe some spring-cloud component which could be helpful?
I'm happy about any hint as to which direction I should go.
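On the LockRegistry route mentioned above, a minimal sketch using Spring Integration's JdbcLockRegistry, assuming a shared database initialized with the framework's standard INT_LOCK schema scripts; the class and method names here are made up for illustration:

import java.util.concurrent.locks.Lock;

import javax.sql.DataSource;

import org.springframework.integration.jdbc.lock.DefaultLockRepository;
import org.springframework.integration.jdbc.lock.JdbcLockRegistry;

public class MessageLocking {

    private final JdbcLockRegistry lockRegistry;

    public MessageLocking(DataSource dataSource) {
        DefaultLockRepository repository = new DefaultLockRepository(dataSource);
        repository.afterPropertiesSet();                 // initialize the repository
        this.lockRegistry = new JdbcLockRegistry(repository);
    }

    // Only one instance at a time can process a given persisted message;
    // other instances simply skip it and move on.
    public void processExclusively(String messageId, Runnable processing) {
        Lock lock = lockRegistry.obtain(messageId);
        if (lock.tryLock()) {
            try {
                processing.run();
            } finally {
                lock.unlock();
            }
        }
    }
}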

Wildfly 10 jms send Message to queue as part of XA transaction

I have recently had to support a colleague in verifying why some system tests are not passing in WildFly, system tests that pass consistently on WebLogic and GlassFish.
After analysing the log, it became clear that the reason is related to a JMS message sent by a background thread getting committed to a queue too soon, when the expectation was that the message would be committed when the entry-point Container Managed Transaction of an MDB commits. So the message is going out before the MDB that sends it is done running.
In WebLogic, to achieve the expected behaviour, you need to make sure that when you take the connection factory given by the container, which is XA configured, you call connection.createSession with
transacted = true and
acknowledgement = Session.SESSION_TRANSACTED.
In a process similar to the one depicted in this URL
http://www.mastertheboss.com/jboss-server/jboss-jms/sending-jms-messages-over-xa-with-wildfly-jboss-as
Except that in the snippet above auto-acknowledge is set and the first parameter is set to false.
In WildFly, when our WebLogic and GlassFish configuration is used, nothing is committed and the system behaves as if the sent JMS message were rolled back.
If a configuration as in the example above is used instead, what happens is that the JMS message is sent immediately and the consumer MDB is triggered before the producer transaction actually ends, causing the system test to fail.
According to the official JMS configuration documentation, by using a pooled connection factory with the transaction=XA attribute, the container should bind the commit of the JMS work to the lifecycle of the parent transaction.
See the official documentation below, in particular with respect to the java:/JmsXA connection factory.
https://docs.jboss.org/author/display/WFLY10/Messaging+configuration
My colleague was initially using a non-pooled connection factory, but the injected reference has since been fixed. I have tried all possible combinations of parameters on the send side, but my outcome is still:
either the message is sent too soon, or it is never sent.
To conclude, all the other resources are XA. Namely, the Oracle DB is using the XA driver.
Can anyone confirm whether in WildFly sending the JMS message only when the parent transaction commits actually works, and if so, how the session should be configured?
I will check if perhaps my colleague has made a mistake in the configuration of the connection factory used by the MDBs themselves to consume messages out of the queue. But if that one is also XA... then it is a big problem.
So the issue is fixed.
The commit of the JMS message to the queue at the end of the transaction works perfectly.
The issue was two fold:
(a) The first spot of code I was looking at to address the issue was not the correct one. Someone had decided to write his own send-telegram-to-queue API elsewhere and was not using the central API for sending telegrams, so any modification I made to the injected connection factory was not taking effect. The stale connection factories were still being used.
(b) Once the correct API was spotted, it was easy to make the mechanism work by using the WildFly XA pooled connection factory mentioned in the post above.
The one thing that had to be tweaked was the connection.createSession API call.
The API in Java EE 7 has been enlarged and is now better documented than in Java EE 6.
To send a JMS message in a container as part of an XA transaction one should do:
connection.createSession() without any parameters.
This can easily be seen in the connection javadoc:
https://docs.oracle.com/javaee/7/api/javax/jms/Connection.html
QUOTE 1:
This method has been superseded by the method createSession(int sessionMode) which specifies the same information using a single argument, and by the method createSession() which is for use in a Java EE JTA transaction. Applications should consider using those methods instead of this one.
QUOTE 2:
In a Java EE web or EJB container, when there is an active JTA transaction in progress:
Both arguments transacted and acknowledgeMode are ignored. The session will participate in the JTA transaction and will be committed or rolled back when that transaction is committed or rolled back, not by calling the session's commit or rollback methods. Since both arguments are ignored, developers are recommended to use createSession(), which has no arguments, instead of this method.
Which means, the code snippet in:
http://www.mastertheboss.com/jboss-server/jboss-jms/sending-jms-messages-over-xa-with-wildfly-jboss-as
is not appropriate. What one should be doing is creating the session without any parameters and letting the container handle the rest.
Which it does just fine.
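Put together, a sketch of the resulting pattern, assuming WildFly's default java:/JmsXA pooled connection factory; the queue JNDI name and class name are hypothetical:

import javax.annotation.Resource;
import javax.ejb.Stateless;
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.Queue;
import javax.jms.Session;

@Stateless // container-managed transaction (Required) by default
public class TelegramSender {

    @Resource(mappedName = "java:/JmsXA") // the XA pooled connection factory
    private ConnectionFactory connectionFactory;

    @Resource(mappedName = "java:/jms/queue/TelegramQueue") // hypothetical queue
    private Queue queue;

    public void send(String body) throws JMSException {
        try (Connection connection = connectionFactory.createConnection()) {
            // No arguments: the session enlists in the active JTA transaction,
            // so the message is only committed when the parent transaction commits.
            Session session = connection.createSession();
            session.createProducer(queue).send(session.createTextMessage(body));
        }
    }
}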

How to programmatically defer JMS topic message consumption using Spring

I have an application that consumes messages from a JMS topic. As part of the normal application flow it needs to periodically cease consumption of messages. While the application is in this state new messages are stored in the topic (note that my application is still running). Later the application resumes message consumption, also receiving those messages that were placed on the topic while the application wasn't listening.
This functionality is currently achieved by creating and disposing of connections from a ConnectionFactory. However, I now wish to migrate the application to Spring JMS. Although Spring rather neatly abstracts away much of the JMS boilerplate, I no longer appear to have fine-grained control over the underlying connection and hence cannot halt message consumption on demand.
Before I try to wade through Spring JMS internals, can anyone suggest a neat way of doing this?
Can you just avoid returning from onMessage()? How long do you want to stop consumption? Is your problem similar to https://stackoverflow.com/a/628337/20734
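One possible direction, sketched under the assumption of a durable subscription (so the broker retains messages while consumption is paused): Spring's listener container is a Lifecycle bean, so it can be stopped and restarted on demand. Names below are illustrative:

import javax.jms.ConnectionFactory;
import javax.jms.Message;
import javax.jms.MessageListener;

import org.springframework.jms.listener.DefaultMessageListenerContainer;

public class PausableTopicConsumer {

    private final DefaultMessageListenerContainer container;

    public PausableTopicConsumer(ConnectionFactory connectionFactory) {
        container = new DefaultMessageListenerContainer();
        container.setConnectionFactory(connectionFactory);
        container.setDestinationName("MY_TOPIC");      // illustrative topic name
        container.setPubSubDomain(true);
        container.setSubscriptionDurable(true);        // broker keeps messages while stopped
        container.setClientId("my-app");               // required for a durable subscription
        container.setDurableSubscriptionName("my-app-sub");
        container.setMessageListener((MessageListener) this::handle);
        container.afterPropertiesSet();
        container.start();
    }

    private void handle(Message message) {
        // process the message
    }

    public void pause()  { container.stop();  }        // cease consumption on demand
    public void resume() { container.start(); }        // resume and receive the backlog
}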

About Session.rollback() in JMS

All,
I am new to JMS and I have a question about the Session.rollback() method. AFAIK, this method is used to roll back all operations against the JMS server (sending/receiving) performed by the session when using the SESSION_TRANSACTED acknowledge mode. Now suppose I call this method in the catch block of a receiving/processing operation (is that reasonable?) to tell the JMS server to redeliver the message for processing. But even when it is redelivered, the processing still throws the same exception, which causes the JMS server to redeliver the message again, so it seems to be an infinite loop. How do I handle this problem? Or are there any other JMS features designed for it? Thanks in advance!
The rollback method in JMS will roll back any message sends and receives in that "transaction". Transaction here is local to the JMS session.
Whether a redelivery will cause a problem really depends on why the exception occurred. If it was due to some transitory issue, then a redelivery may work. But if you have the kind of problem that, once it occurs, will always occur (an example would be a JMS TextMessage whose body should contain XML but doesn't), redelivery alone will never succeed.
The JMS API doesn't provide any solution to this itself. This is typically taken care of by the JMS provider and how it behaves will depend on which one you use. WebSphere MQ for instance will redeliver up to a configurable maximum at which point it will move it off to a queue for bad messages. The Service Integration Bus in WebSphere Application Server has similar behaviour. I suggest you consult your JMS provider documentation to determine exactly how it behaves in this situation.
If you are running in an application server rollback typically doesn't do anything because the application server will be managing transactions for you.
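To make the redelivery cap concrete, a sketch that guards the rollback with the JMS-defined (but optional) JMSXDeliveryCount property, so a poison message is eventually consumed or diverted instead of looping forever; the threshold and class name are illustrative:

import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.Session;

public class BoundedRedeliveryListener implements MessageListener {

    private static final int MAX_DELIVERIES = 5; // illustrative threshold

    private final Session session; // created with SESSION_TRANSACTED

    public BoundedRedeliveryListener(Session session) {
        this.session = session;
    }

    @Override
    public void onMessage(Message message) {
        try {
            process(message);
            session.commit();
        } catch (Exception processingFailure) {
            try {
                // JMSXDeliveryCount is defined by JMS but optional; most
                // providers (WebSphere MQ included) set it on redelivery.
                int deliveries = message.propertyExists("JMSXDeliveryCount")
                        ? message.getIntProperty("JMSXDeliveryCount") : 1;
                if (deliveries >= MAX_DELIVERIES) {
                    // Poison message: consume it (or forward it to a
                    // bad-message queue first) instead of rolling back again.
                    session.commit();
                } else {
                    session.rollback(); // ask the provider to redeliver
                }
            } catch (JMSException jmsFailure) {
                throw new RuntimeException(jmsFailure);
            }
        }
    }

    private void process(Message message) throws Exception {
        // application-specific processing goes here
    }
}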

Spring JMS: Creating multiple connections to a queue

To process a large number of messages coming to a queue, I need a guarantee of at least one JMS connection being available at any time. I am using Spring, and Spring only allows multiple sessions on a single connection. In case that one and only connection fails, the application will come to a standstill until Spring reconnects to the JMS bridge.
So how can I create more than one connection to a queue in Spring, and how can I do connection pooling here?
The answer to this depends on whether you are using Spring inside a J2EE container (JBoss etc.) or in a standalone application.
Standalone - you'll find pooling connections to be a problem. Spring's SingleConnectionFactory can be set up to renew the connection on an exception, guaranteeing that at some point a connection will come online and start processing the queue again, but you'll still have the problem of waiting for that single connection to renew. Plus, depending on what messaging implementation you're dealing with and how it does load balancing, you may find yourself stuck with a connection to a single node in a cluster.
If you are running in a container you can rely on the container's connection factory, which will be much more robust. JBoss Messaging in the container, for instance, will fail over seamlessly to other nodes and handles pooling under the covers. But if you're working in the container, it's usually easier to bail on JmsTemplate, which kind of sucks, and use whatever the container provides.
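For the standalone case, a minimal sketch of the renew-on-exception setup mentioned above; the wrapping method and the provider factory passed in are assumptions:

import javax.jms.ConnectionFactory;

import org.springframework.jms.connection.SingleConnectionFactory;

public class ConnectionConfig {

    // Wrap the provider's native factory; after a JMSException the shared
    // connection is discarded and transparently re-created on next access.
    public ConnectionFactory renewingConnectionFactory(ConnectionFactory providerFactory) {
        SingleConnectionFactory scf = new SingleConnectionFactory(providerFactory);
        scf.setReconnectOnException(true); // renew the connection on exceptions
        return scf;
    }
}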
