Jco Adapter pooling performance deadlock? - spring-boot

We're running an enterprise-scale SAP application with front-end Spring Boot clients connecting via the JCo adapter 3.0 on an Oracle VM, using a connection pool of size 100. We're experiencing sporadic long-running requests (> 10 s) that are not visible in the SAP application server log, i.e. the bottleneck does not appear to be on the SAP side.
Looking at the trace files (level 4) for an example request, we can see that the time appears to be lost when the adapter thread tries to get a client from the pool (other threads continue execution; irrelevant threads removed for clarity):
[20:05:50:259]: [JCoAPI] JCoContext.isStateful(P-foo-CPIC0) in session ID Client-53-1 returns false
[20:05:50:259]: [JCoAPI] JCoContext.begin(P-foo-CPIC0) in session ID Client-53-1
[20:05:50:259]: [JCoAPI] Started context for session Client-53-1
[20:05:50:259]: [JCoAPI] JCoContext.begin() for destination PFOO_200 (P-foo-CPIC0) on context with id Client-53-1; current state counter is 1
[20:05:50:259]: [JCoAPI] destination PFOO_200 destinationID=P-foo-CPIC0 executes Z_foo sessionID=Client-53-1, threadID=0x35
[20:05:50:259]: [JCoAPI] Context.getConnection on destination PFOO_200 (state: destination = STATEFUL, default = STATELESS)
[20:05:50:259]: [JCoAPI] PoolingFactory.getClient() on pool P-foo-CPIC0
--> time lost here
[20:06:20:840]: [JCoAPI] PoolingFactory.getClient() returns handle [3/84977415]
[20:06:20:840]: [JCoAPI] Context.getConnection on destination PFOO_200 nothing found in the context - got client from ConnectionManager [3/84977415]
[20:06:20:840]: [JCoAPI] JCoClient before execute(Z_foo) on handle [3/84977415]
[20:06:20:840]: [JCoRFC] Executing function Z_foo on handle [3/84977415]
[20:06:20:866]: [JCoAPI] JCoClient after execute(Z_foo) on handle [3/84977415] returns after 26 ms
[20:06:20:866]: [JCoAPI] Context.releaseConnection on destination PFOO_200 [3/84977415]
[20:06:20:867]: [JCoAPI] JCoContext.end(P-foo-CPIC0) in session ID Client-53-1
[20:06:20:867]: [JCoAPI] PoolingFactory.releaseClient() handle [3/84977415] into pool P-foo-CPIC0 [pool size: 3, peak limit: 100, waiting threads: 0, currently used: 1]
[20:06:20:879]: [JCoAPI] Finished context for session Client-53-1
[20:06:20:879]: [JCoAPI] JCoContext.end() for destination PFOO_200 (P-foo-CPIC0) on context with id Client-53-1; current state counter is 0
For a typical request this step is handled in milliseconds.
Are there any known limitations or configuration settings regarding pool handling for the JCo adapter, either on the adapter or on the SAP side?
Update: we're on JCo adapter 3.0.16 and will double-check 3.0.17 now. DNS seems unlikely since we're monitoring dig/nslookup and they're running without delays.

Which JCo patch level do you use?
Did you try to update to the latest JCo patch level 3.0.17 first?
In your time gap the RFC connection will be opened and the RFC logon performed if the pool is empty at that time. Did you have a closer look with a higher trace level, or did you look into the RFC trace?
This can be anything from not having a free dialog work process on the ABAP side, to SAP system database issues (required for the RFC logon authentication checks), slow response times from the SAP message server (if using load-balanced logons), SNC handshake issues (if using SNC), or general network issues with DNS (try using the IP address instead of a hostname).

Another point worth checking: you say your connection pool has size 100. Is it possible that your program has more than 100 threads? Then it may happen from time to time that all connections are busy in other threads, and the current thread has to wait until a function call in another thread completes and a connection is returned to the pool.
(How long a thread waits on an empty pool can be customized via the "pool wait time" parameter.)
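For reference, here is a minimal sketch of where those pool parameters live when the destination is configured programmatically, assuming the standard JCo 3.0 DestinationDataProvider property keys; the host/client/user values are placeholders, and the exact constant for the pool wait time should be verified against your JCo javadoc:

import java.util.Properties;

import com.sap.conn.jco.ext.DestinationDataProvider;

// Sketch only: these properties would normally live in a PFOO_200.jcoDestination
// file or be served by a custom DestinationDataProvider implementation.
public class PooledDestinationProperties {

    public static Properties build() {
        Properties props = new Properties();

        // Connection parameters (placeholders)
        props.setProperty(DestinationDataProvider.JCO_ASHOST, "sap-host.example.com");
        props.setProperty(DestinationDataProvider.JCO_SYSNR, "00");
        props.setProperty(DestinationDataProvider.JCO_CLIENT, "200");
        props.setProperty(DestinationDataProvider.JCO_USER, "RFC_USER");
        props.setProperty(DestinationDataProvider.JCO_PASSWD, "secret");

        // Pool tuning:
        //   pool_capacity       - connections kept open in the pool
        //   peak_limit          - hard upper bound of simultaneously used connections
        //   max_get_client_time - ms a thread waits on an exhausted pool before JCo
        //                         gives up instead of blocking further
        props.setProperty(DestinationDataProvider.JCO_POOL_CAPACITY, "20");
        props.setProperty(DestinationDataProvider.JCO_PEAK_LIMIT, "100");
        props.setProperty(DestinationDataProvider.JCO_MAX_GET_TIME, "10000");

        return props;
    }
}

If pool exhaustion were the cause, the "waiting threads" counter in the PoolingFactory.releaseClient() trace lines (it is 0 in the excerpt above) would be a quick way to confirm it.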

Related

JMeter : Check JDBC connection open or close after each thread execution

I am trying to check whether the JDBC connection is open or closed after each thread execution.
In my thread group there are three elements:
JDBC connection configuration
JDBC request (select * from employee)
JSR223 PostProcessor
Script:
def connection = org.apache.jmeter.protocol.jdbc.config.DataSourceElement.getConnection('ConnectionString')
log.info('*************Connection closed: '+ connection.isClosed())
The above script logs the connection status after each thread execution when the loop count is 1. The problem is that as soon as I change the loop count to >= 2, it starts throwing the error below:
Problem in JSR223 script, JSR223 PostProcessor
javax.script.ScriptException: java.sql.SQLException: Cannot get a connection, pool error Timeout waiting for idle object
And when I remove the PostProcessor and increase the loop count, it works fine.
Logs:
2023-02-19 16:15:33,599 INFO o.a.j.t.JMeterThread: Thread started: DB Thread Group 2-1
2023-02-19 16:15:38,054 DEBUG o.a.j.p.j.AbstractJDBCTestElement: executing jdbc:SELECT * FROM EMPLOYEE
2023-02-19 16:15:38,623 INFO o.a.j.e.J.JSR223 PostProcessor: *************Connection closed: false
2023-02-19 16:15:58,637 ERROR o.a.j.e.JSR223PostProcessor: Problem in JSR223 script, JSR223 PostProcessor
javax.script.ScriptException: java.sql.SQLException: Cannot get a connection, pool error Timeout waiting for idle object, borrowMaxWaitDuration=PT10S
at org.codehaus.groovy.jsr223.GroovyScriptEngineImpl.eval(GroovyScriptEngineImpl.java:320) ~[groovy-jsr223-3.0.11.jar:3.0.11]
at org.codehaus.groovy.jsr223.GroovyCompiledScript.eval(GroovyCompiledScript.java:71) ~
In the JDBC configuration are you using a connection pool to request connections?
What your test shows is that the JSR223 script is closing the connection, which is probably a good thing from a coding perspective, but the next iteration of the loop then tries to execute a request with a closed Connection and blammo. If you switch from raw connections to a connection pool, then when the JSR223 script closes the connection it will be returned to the pool and remain open for the next iteration of the loop. You'll typically have to switch to the DataSource API for this, but it's a minor tweak to the script.
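For illustration, here is a minimal Java sketch of the general pattern that answer relies on, assuming a pooled javax.sql.DataSource (the class and parameter names are placeholders, not from the original question): with a pool in place, close() returns the connection rather than destroying it.

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.DataSource;

// Illustrative only: borrow a connection from a pooled DataSource, use it, and
// let try-with-resources close it. With a pooled DataSource, close() returns
// the connection to the pool instead of physically closing it, so the next
// loop iteration can borrow it again.
public class PooledQueryExample {

    public static void runQuery(DataSource pooledDataSource) throws SQLException {
        try (Connection connection = pooledDataSource.getConnection();
             Statement statement = connection.createStatement();
             ResultSet resultSet = statement.executeQuery("SELECT * FROM EMPLOYEE")) {
            while (resultSet.next()) {
                System.out.println(resultSet.getString(1)); // first column of EMPLOYEE
            }
        } // the connection handle is returned to the pool here
    }
}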
I can think of 2 possible reasons:
Either your database is down/overloaded/not reachable via JDBC
Or your connection pool settings need to be tweaked, i.e. the max number of connections and/or the wait time needs to be increased.
In general I don't think your approach is correct; as per the JavaDoc of Connection.isClosed():
This method generally cannot be called to determine whether a connection to a database is valid or invalid. A typical client can determine that a connection is invalid by catching any exceptions that might be thrown when an operation is attempted.
So you might want to increase debug logging verbosity for JMeter, your JDBC driver, and the java.sql namespace instead.
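If a programmatic health check is really needed, here is a hedged sketch of the two alternatives the JavaDoc points to (this helper class is illustrative, not from the question): Connection.isValid(timeoutSeconds), available since JDBC 4.0, or simply attempting an operation and treating any SQLException as a sign the connection is unusable.

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Illustrative helper: ways to judge whether a connection is usable, as opposed
// to isClosed(), which only reports whether close() was already called on it.
public class ConnectionHealth {

    // Since JDBC 4.0: isValid() asks the driver to run a validation check and
    // waits at most timeoutSeconds for the answer.
    public static boolean isUsable(Connection connection, int timeoutSeconds) {
        try {
            return connection != null && connection.isValid(timeoutSeconds);
        } catch (SQLException e) {
            return false;
        }
    }

    // Fallback in the spirit of the JavaDoc: attempt an operation and treat any
    // SQLException as "invalid". The probe statement is DB-specific.
    public static boolean isUsableByProbe(Connection connection) {
        try (Statement statement = connection.createStatement()) {
            statement.execute("SELECT 1"); // e.g. "SELECT 1 FROM DUAL" on Oracle
            return true;
        } catch (SQLException e) {
            return false;
        }
    }
}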

ConnectionPool: pool is empty - increase either maxPoolSize or borrowConnectionTimeout

I was facing this issue with my Spring Boot application that connects to a DB and MQ and uses the Atomikos transaction manager.
com.atomikos.jms.AtomikosJMSException|Connection pool exhausted - try increasing 'maxPoolSize' and/or 'borrowConnectionTimeout' on the AtomikosConnectionFactoryBean.
com.atomikos.datasource.pool.PoolExhaustedException: ConnectionPool: pool is empty - increase either maxPoolSize or borrowConnectionTimeout
at com.atomikos.datasource.pool.ConnectionPool.waitForAtLeastOneAvailableConnection(ConnectionPool.java:326)
at com.atomikos.datasource.pool.ConnectionPool.findOrWaitForAnAvailableConnection(ConnectionPool.java:144)
at com.atomikos.datasource.pool.ConnectionPool.borrowConnection(ConnectionPool.java:132)
at com.atomikos.datasource.pool.ConnectionPoolWithSynchronizedValidation.borrowConnection(ConnectionPoolWithSynchronizedValidation.java:23)
at com.atomikos.jms.AtomikosConnectionFactoryBean.createConnection(AtomikosConnectionFactoryBean.java:601)
at org.springframework.jms.support.JmsAccessor.createConnection(JmsAccessor.java:196)
at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.access$100(AbstractPollingMessageListenerContainer.java:77)
at org.springframework.jms.listener.AbstractPollingMessageListenerContainer$MessageListenerContainerResourceFactory.createConnection(AbstractPollingMessageListenerContainer.java:490)
at org.springframework.jms.connection.ConnectionFactoryUtils.doGetTransactionalSession(ConnectionFactoryUtils.java:325)
at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:281)
at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:245)
at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1189)
at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.executeOngoingLoop(DefaultMessageListenerContainer.java:1179)
at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:1076)
at java.lang.Thread.run(Thread.java:748)
I tried printing the maxPoolSize and found that it is 1. I came across this page (https://www.atomikos.com/Documentation/ConfiguringJms) and found the line where they increased the maxPoolSize to 5. I just tried setting it to 2 and it worked.
AtomikosConnectionFactoryBean xaConnectionFactory = new AtomikosConnectionFactoryBean();
xaConnectionFactory.setXaConnectionFactory(ibmMQXAConnectionFactory);
xaConnectionFactory.setMaxPoolSize(2);
Can someone help me understand what the ideal pool size should be, and what it is for?
In order to process messages Atomikos uses DB and JMS connections (in your case).
These connections are taken from pools of available connections. To get an idea of why connection pools are needed, please follow this link as a starting point - Connection_pool
To put it simply: in order to process one message at a time, Atomikos needs one DB and one JMS connection/session. So if you plan to process 10 messages in parallel, each connection pool size must be at least 10 (10 for the DB and 10 for the JMS connection pool respectively).
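As a hedged illustration of that sizing rule, the wiring below is hypothetical and only mirrors the setMaxPoolSize call from the question; the injected XA factories and resource names are placeholders.

import javax.jms.XAConnectionFactory;
import javax.sql.XADataSource;

import com.atomikos.jdbc.AtomikosDataSourceBean;
import com.atomikos.jms.AtomikosConnectionFactoryBean;

// Hypothetical wiring: both Atomikos pools sized for 10 messages processed in
// parallel, following the rule "one DB and one JMS connection per in-flight message".
public class AtomikosPoolSizing {

    public static AtomikosConnectionFactoryBean jmsPool(XAConnectionFactory ibmMQXAConnectionFactory) {
        AtomikosConnectionFactoryBean xaConnectionFactory = new AtomikosConnectionFactoryBean();
        xaConnectionFactory.setUniqueResourceName("xaJms");
        xaConnectionFactory.setXaConnectionFactory(ibmMQXAConnectionFactory);
        xaConnectionFactory.setMaxPoolSize(10); // >= expected number of parallel messages
        return xaConnectionFactory;
    }

    public static AtomikosDataSourceBean dbPool(XADataSource xaDataSource) {
        AtomikosDataSourceBean dataSource = new AtomikosDataSourceBean();
        dataSource.setUniqueResourceName("xaDb");
        dataSource.setXaDataSource(xaDataSource);
        dataSource.setMaxPoolSize(10); // keep in step with the JMS pool
        return dataSource;
    }
}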

WebSphere MQ Connection Tuning

I have an application which uses MDBs, activation specifications and queue connection factories to get/put messages from WMQ. The application expects a max load of 80 TPS. Both WebSphere Application Server and WMQ are clustered, and each application server connects to a separate host and channel. The application's onMessage method is implemented in such a way that both the session and the connection get closed after the message is consumed and the response is sent.
As per our configuration, we have WAS version 8.5, IBM MQ queue manager version 7, max server sessions for the activation specification set to 40 for each node, max connection count in the connection factory set to 40 for each node, and max sessions in the session pool of the connection factory set to 10.
Now at peak load we expect to create at most 80 MQ channel instances, but as per our investigation we can see it goes above 200, which is causing an issue as the max instance limit is reached.
Is this happening because the max sessions in the session pool of the connection factory is set to 10?
Is it possible that even though we are closing the session and connection in onMessage, one connection can still have more than one session? If that is the case, is it wise to set this property to 1?
Also, could there be some property set on the WMQ side which could cause this increase in MQ channel instances?
You don't mention specific versions of WAS or MQ, and there could be known problems at a specific version that would change the behavior, but in general it should work as described below.
IBM has a nice Technote "TCP/IP Connection usage between WebSphere Application Server V7 and V8, and WebSphere MQ V7 (and later) explained" which goes into detail on this subject.
You do not mention what the SVRCONN channel's SHARECNV attribute is set to; as shown below, this will impact the number of channel instances observed. I'll assume the default of 10 for the calculations.
Note that the block quotes below are from the Technote.
we have set max server sessions for act spec to 40 for each node
The link above states:
Maximum number of conversations = Maximum server sessions + 1
Maximum number of conversations = 40 + 1 = 41
The link also states:
Maximum number of TCP/IP channel instances = Maximum number of conversations / SHARECNV for the channel being used
Maximum number of TCP/IP channel instances = 41 / 10 = 5 (rounded up to nearest connection)
max connection count in Connection Factory to 40 for each node
max session in session pool of connection factory to 10.
Maximum number of conversations = Connection Pool Maximum Connections + (Connection Pool Maximum Connections * Session Pool Maximum Connections)
Maximum number of conversations = 40 + (40 * 10) = 440
Maximum number of TCP/IP channel instances = Maximum number of conversations / SHARECNV for the channel being used
Maximum number of TCP/IP channel instances = 440 / 10 = 44
If your MQ SVRCONN channel's SHARECNV was set to 10, then you should have no more than 49 channel instances for each channel based on each node connecting to a separate channel.
If you are hitting 200 channel instances I would suspect your SHARECNV is less than 10. If it were 1, the maximum number of channel instances WAS would try to create would go up to 481, which would be limited to 200 by the channel's MAXINST.
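For reference, the Technote formulas used above boil down to a few lines of arithmetic; here is a small sketch (the class is purely illustrative) that reproduces the 49 and 481 figures:

// Encodes the two Technote formulas used above: channel instances driven by the
// activation specification and by the connection factory, each divided by
// SHARECNV and rounded up.
public class ChannelInstanceEstimate {

    static int ceilDiv(int a, int b) {
        return (a + b - 1) / b; // integer division rounded up
    }

    public static int estimate(int maxServerSessions, int connPoolMax, int sessionPoolMax, int sharecnv) {
        int actSpecConversations = maxServerSessions + 1;                 // 40 + 1 = 41
        int cfConversations = connPoolMax + connPoolMax * sessionPoolMax; // 40 + 40*10 = 440
        return ceilDiv(actSpecConversations, sharecnv)                    // 41/10 -> 5
             + ceilDiv(cfConversations, sharecnv);                        // 440/10 -> 44
    }

    public static void main(String[] args) {
        System.out.println(estimate(40, 40, 10, 10)); // prints 49, matching the text
        System.out.println(estimate(40, 40, 10, 1));  // prints 481, the SHARECNV(1) case
    }
}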
After an application has finished with a JMS Connection and closed it off, it is moved from the Active Pool to the Free Pool, where it is available for reuse. The Connection Pool property Unused timeout defines how long a JMS Connection will stay in the Free Pool before it is disconnected. This property has the default value of 1800 seconds, which is 30 minutes.
Every JMS Connection that is created from a WebSphere MQ messaging provider Connection Factory has an associated JMS Session Pool, which work in the same way as Connection Pools. The maximum number of JMS Sessions that can be created from a single JMS Connection is determined by the Connection Factory Session Pool property Maximum connections. The default value of this property is 10.
A conversation is started when a JMS Session is first created, and will remain active until the JMS Session is closed because it has remained in the Free Pool for longer than the value of the Session Pool's Unused timeout property.
When your app closes the session and connection in onMessage, the connection and the session are each moved to their free pools for reuse; the MQ channel instance will not be closed until the respective unused timeout is hit.
If you want to keep your maximum channel count below 200, you could tune your Session Pool Maximum Connections down to 1, which combined with your activation specifications and SHARECNV(1) would max out at 121 channel instances.
You can also increase the SHARECNV value of the channel, which divides the resulting number of channel instances by that value.
It is possible that your connections or sessions are not getting closed properly and you have a "leak".

Can MAX_UTILIZATION for PROCESSES reached cause "Unable to get managed connection" Exception?

A JBoss 5.2 application server log was filled with thousands of the following exception:
Caused by: javax.resource.ResourceException: Unable to get managed connection for jdbc_TestDB
at org.jboss.resource.connectionmanager.BaseConnectionManager2.getManagedConnection(BaseConnectionManager2.java:441)
at org.jboss.resource.connectionmanager.TxConnectionManager.getManagedConnection(TxConnectionManager.java:424)
at org.jboss.resource.connectionmanager.BaseConnectionManager2.allocateConnection(BaseConnectionManager2.java:496)
at org.jboss.resource.connectionmanager.BaseConnectionManager2$ConnectionManagerProxy.allocateConnection(BaseConnectionManager2.java:941)
at org.jboss.resource.adapter.jdbc.WrapperDataSource.getConnection(WrapperDataSource.java:96)
... 9 more
Caused by: javax.resource.ResourceException: No ManagedConnections available within configured blocking timeout ( 30000 [ms] )
at org.jboss.resource.connectionmanager.InternalManagedConnectionPool.getConnection(InternalManagedConnectionPool.java:311)
at org.jboss.resource.connectionmanager.JBossManagedConnectionPool$BasePool.getConnection(JBossManagedConnectionPool.java:689)
at org.jboss.resource.connectionmanager.BaseConnectionManager2.getManagedConnection(BaseConnectionManager2.java:404)
... 13 more
I've stripped off the first part of the exception, which is basically our internal JDBC wrapper code which tries to get a DB connection from the pool.
Looking at the Oracle DB side I ran the query:
select resource_name, current_utilization, max_utilization, limit_value
from v$resource_limit
where resource_name in ('sessions', 'processes');
This produced the output:
RESOURCE_NAME CURRENT_UTILIZATION MAX_UTILIZATION LIMIT_VALUE
processes 1387 1500 1500
sessions 1434 1586 2272
Given the fact that the PROCESSES limit of 1500 was reached, would this cause the JBoss exceptions we experienced? I've also been investigating the possibility of connection leaks, but haven't found any evidence of that so far.
What is the recommended course of action here? Is simply increasing the limit a valid solution?
Usually, when max_utilization reaches the processes limit, the listener will refuse new connections to the database. You can see the related errors in the alert log. To solve this on the database side you should increase the processes parameter.
Hmm, strange. Is it possible that exception wrapping in JBoss hides the original error? You should get some SQL exception whose text starts with ORA-. Maybe your JDBC wrapper does not handle errors properly.
The recommended actions are to:
check the configured size of the connection pool against the processes and sessions Oracle startup parameters.
check Oracle's view v$session, especially the columns STATUS, LAST_CALL_ET, SQL_ID, PREV_SQL_ID.
translate sql_id (prev_sql_id) into sql_text via v$sql.
if your application has a connection leak, sql_id and prev_sql_id might point you to the place in your source code where a connection was last used (i.e. where it was leaked).

Websphere server that may be hung

I am getting the below error. Kindly help.
[8/5/14 21:06:54:277 GMT-08:00] 00000091 DiscoveryTx W DCSV1115W: DCS Stack DefaultCoreGroup at Member PT_STS_HK_CELL\PT_STS_HK_APP_Node02\PT_STS_QLCOMM_CL02: Member PT_STS_HK_CELL\PT_STS_HK_APP_Node02\nodeagent connection was closed. Member will be removed from view. DCS connection status is Discovery|Ptp, transmitter closed.
[8/5/14 21:07:23:562 GMT-08:00] 00000010 MbuRmmAdapter W DCSV1115W: DCS Stack DefaultCoreGroup at Member PT_STS_HK_CELL\PT_STS_HK_APP_Node02\PT_STS_QLCOMM_CL02: Member PT_STS_HK_CELL\PT_STS_HK_APP_Node02\PT_STS_PYMTCAPTURE_CL02 connection was closed. Member will be removed from view. DCS connection status is View|Gossip, this member is suspected by the other member.
[8/5/14 21:08:00:079 GMT-08:00] 00000091 DiscoveryTx W DCSV1115W: DCS Stack DefaultCoreGroup at Member PT_STS_HK_CELL\PT_STS_HK_APP_Node02\PT_STS_QLCOMM_CL02: Member PT_STS_HK_CELL\PT_STS_HK_APP_Node02\PT_STS_DOWNSTREAM_CL02 connection was closed. Member will be removed from view. DCS connection status is Discovery|Ptp, transmitter closed.
[8/5/14 21:08:16:296 GMT-08:00] 00000010 RmmPtpGroup W DCSV1112W: DCS Stack DefaultCoreGroup at Member PT_STS_HK_CELL\PT_STS_HK_APP_Node02\PT_STS_QLCOMM_CL02: Member PT_STS_HK_CELL\PT_STS_HK_APP_Node02\PT_STS_DOWNSTREAM_CL02 failed to respond to periodic heartbeats. Member will be removed from view. Configured Timeout is 180000 milliseconds. DCS logical channel is View|Ptp.
[8/5/14 21:08:29:236 GMT-08:00] 00000091 DiscoveryTx W DCSV1115W: DCS Stack DefaultCoreGroup at Member PT_STS_HK_CELL\PT_STS_HK_APP_Node02\PT_STS_QLCOMM_CL02: Member PT_STS_HK_CELL\PT_STS_HK_DMGR_Node\dmgr connection was closed. Member will be removed from view. DCS connection status is Discovery|Ptp, transmitter closed.
[8/5/14 21:10:20:892 GMT-08:00] 00000018 ApplicationMo W DCSV0004W: DCS Stack DefaultCoreGroup at Member PT_STS_HK_CELL\PT_STS_HK_APP_Node02\PT_STS_QLCOMM_CL02: Did not receive adequate CPU time slice. Last known CPU usage time at 21:03:08:272 GMT-08:00. Inactivity duration was 402 seconds.
[8/5/14 21:11:14:131 GMT-08:00] 00000043 ThreadMonitor W WSVR0605W: Thread "WMQJCAResourceAdapter : 5" (00000067) has been active for 657039 milliseconds and may be hung. There is/are 2 thread(s) in total in the server that may be hung.
at com.ibm.ejs.ras.TraceLogger.doLog(TraceLogger.java:332)
at com.ibm.ejs.ras.TraceLogger.processEvent(TraceLogger.java:319)
at com.ibm.ws.logging.WsHandlerWrapper.publish(WsHandlerWrapper.java:43)
at java.util.logging.Logger.log(Logger.java:1121)
at com.ibm.ejs.ras.Tr.logToJSR47Logger(Tr.java:1681)
at com.ibm.ejs.ras.Tr.fireEvent(Tr.java:1643)
at com.ibm.ejs.ras.Tr.fireTraceEvent(Tr.java:1565)
at com.ibm.ejs.ras.Tr.entry(Tr.java:816)
at com.ibm.ws.sib.utils.ras.SibTr.entry(SibTr.java:912)
at com.ibm.ws.wmqcsi.trace.TraceImpl.methodExit(TraceImpl.java:349)
at com.ibm.msg.client.commonservices.trace.Trace.methodExitInternal(Trace.java:715)
at com.ibm.msg.client.commonservices.trace.Trace.exit(Trace.java:628)
at com.ibm.msg.client.wmq.v6.jms.internal.JMSMessage._setJMSXObjectProperty(JMSMessage.java:3928)
at com.ibm.msg.client.wmq.v6.jms.internal.MQJMSMessage.write(MQJMSMessage.java:1223)
at com.ibm.msg.client.wmq.v6.jms.internal.MQMessageProducer.sendInternal(MQMessageProducer.java:1139)
at com.ibm.msg.client.wmq.v6.jms.internal.MQMessageProducer.send(MQMessageProducer.java:768)
at com.ibm.msg.client.wmq.v6.jms.internal.MQMessageProducer.send(MQMessageProducer.java:2713)
at com.ibm.msg.client.jms.internal.JmsMessageProducerImpl.sendMessage(JmsMessageProducerImpl.java:872)
at com.ibm.msg.client.jms.internal.JmsMessageProducerImpl.send_(JmsMessageProducerImpl.java:727)
at com.ibm.msg.client.jms.internal.JmsMessageProducerImpl.send(JmsMessageProducerImpl.java:398)
at com.ibm.mq.jms.MQMessageProducer.send(MQMessageProducer.java:281)
at com.ibm.ejs.jms.JMSQueueSenderHandle.send(JMSQueueSenderHandle.java:204)
You are receiving CPU starvation errors. This could be because you are thrashing the garbage collector, your heap is not big enough, or something else is taking up the CPU time. You need to find the process or processes that are taking up the CPU and examine why their usage is so high.
Regards,
Brian
The log entry starting with
ThreadMonitor W WSVR0605W: Thread "WMQJCAResourceAdapter : 5" (00000067) has been
active for 657039 milliseconds and may be hung.
indicates that this thread has been active for that period of time BUT the thread stack it generates is just the thread at the point in time that the log entry is generated. This means it could have been stuck for 90% of the time in one point in the code and the stack trace generated is just where it is now.
What that particular thread is doing at that point is appending an entry into the trace logs when the application is attempting to send an MQ JMS message. So there is no indication that that thread is hung at that point.
A couple of things to try:
Investigate the CPU usage as the CPU starvation messages indicate that is a problem.
Search the SystemOut.log for corresponding messages saying threads are no longer hung.
Take javacores at 2-minute intervals to see which threads are moving.
Turn off trace unless you need it.
This is a general error that might be encountered during the server start phase.
The basic idea is that when you start the server, threads are initialized for the process/job you want to run on the server.
A thread may be waiting for resources it needs to run that process/job, and at that point it can hang because those resources are unavailable.
One way to fix it: kill the background process that is causing the thread to hang.
Then start the server again.
Do the following steps:
- Ensure that Deployment manager is up and running
- verify that app server and node agent are stopped - no java processes related to node agent and app server running
- go to NODE_PROFILE\bin (not deployment manager profile)
- run syncNode.sh/bat
- run startNode.sh/bat
- if the node agent starts successfully you should be able to start the server from the command line or the web console
