hystrix many threads in waiting state - spring-boot

We have used hystrix - circuit breaker pattern [library] in our one of the module.
usecase is:- we are polling 16 number of messages from kafka and processing them using pararllel stream,so, for each message in workflow it takes 3 rest calls which are protected by hystric command. Now, issue is when I try to run my single instance then CPU shows spikes and thread dump shows many threads in waiting state for all the 3 commands. Like below:-
Omitted thread name but assume all all thread pools it shows same thing:-
Thread Pool-7" #82
Thread State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000004cee2312> (a java.util.concurrent.SynchronousQueue$TransferStack)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:458)
at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
at java.util.concurrent.SynchronousQueue.take(SynchronousQueue.java:924)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Could you please help me in fine tuning application and thread pool parameters?
what I am missing here?

The default isolation strategy of Hystrix is threadpool and its default size is just 10. It means that only 10 REST calls can be running at the same time in your case.
First, try to increase the below default property to big one.
hystrix.threadpool.default.coreSize=1000 # default is 10
If it works, adjust the value to the proper one.
default can be replaced with the proper HystrixThreadPoolKey for each thread pool.
If you are using Semaphore isolation strategy, try to increase the below one.
hystrix.command.default.execution.isolation.semaphore.maxConcurrentRequests=1000
Above one's default is also just 10. default can be replaced with HystrixCommandKey name for each semaphore.
Updated
To choose the isolation strategy, you can use the below property.
hystrix.command.default.execution.isolation.strategy=THREAD or SEMAPHORE
default can be replaced with HystrixCommandKey. It means that you can assign different strategy for each hystrix command.

Related

WTRN0124I: When the timeout occurred the thread

I am getting the below error.Kindly help
[8/8/14 21:14:56:939 GMT-08:00] 00000005 TimeoutManage I WTRN0006W: Transaction 00000147B92EFAE20000000100000012DF462C9E681BA3670A44A25FE1B0F6182303FB5C00000147B92EFAE20000000100000012DF462C9E681BA3670A44A25FE1B0F6182303FB5C00000001 has timed out after 120 seconds.
[8/8/14 21:14:56:967 GMT-08:00] 00000006 TimeoutManage I WTRN0124I: When the timeout occurred the thread with which the transaction is, or was most recently, associated was Thread[WMQJCAResourceAdapter : 4,5,main]. The stack trace of this thread when the timeout occurred was:
java.net.SocketOutputStream.socketWrite0(Native Method)
java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:103)
java.net.SocketOutputStream.write(SocketOutputStream.java:147)
com.ibm.mq.jmqi.remote.internal.RemoteTCPConnection.send(RemoteTCPConnection.java:1212)
com.ibm.mq.jmqi.remote.internal.system.RemoteConnection.sendTSH(RemoteConnection.java:2289)
com.ibm.mq.jmqi.remote.internal.RemoteHconn.sendTSH(RemoteHconn.java:954)
com.ibm.mq.jmqi.remote.internal.RemoteFAP.jmqiPut(RemoteFAP.java:5443)
com.ibm.mq.jmqi.remote.internal.RemoteFAP.MQPUT(RemoteFAP.java:5205)
com.ibm.msg.client.wmq.v6.base.internal.MQSESSION.MQPUT(MQSESSION.java:1252)
com.ibm.msg.client.wmq.v6.base.internal.MQQueue.putMsg2(MQQueue.java:2090)
com.ibm.msg.client.wmq.v6.jms.internal.MQMessageProducer.sendInternal(MQMessageProducer.java:1262)
com.ibm.msg.client.wmq.v6.jms.internal.MQMessageProducer.send(MQMessageProducer.java:768)
com.ibm.msg.client.wmq.v6.jms.internal.MQMessageProducer.send(MQMessageProducer.java:2713)
com.ibm.msg.client.jms.internal.JmsMessageProducerImpl.sendMessage(JmsMessageProducerImpl.java:872)
com.ibm.msg.client.jms.internal.JmsMessageProducerImpl.send_(JmsMessageProducerImpl.java:727)
com.ibm.msg.client.jms.internal.JmsMessageProducerImpl.send(JmsMessageProducerImpl.java:398)
com.ibm.mq.jms.MQMessageProducer.send(MQMessageProducer.java:281)
com.ibm.ejs.jms.JMSQueueSenderHandle.send(JMSQueueSenderHandle.java:204)
com.scb.sts.stsappserver.sender.MessageSender.sendRecords(Unknown Source)
com.scb.sts.services.PCSPPaymentSplitter.doExecute(Unknown Source)
com.scb.sts.stsappserver.eventhandler.SplitterEventHandler.handleEvent(Unknown Source)
com.scb.sts.services.PCSPPaymentReceiver.doProcess(Unknown Source)
com.scb.sts.services.PCSPPaymentReceiver.doExecute(Unknown Source)
com.scb.sts.controllers.OCWSServlet.doPost(Unknown Source)
com.scb.sts.qlcomm.QLCommBean.processXMLFile(Unknown Source)
com.scb.sts.qlcomm.QLCommBean.isDoOutput(Unknown Source)
com.scb.sts.qlcomm.QLCommBean.onMessage(Unknown Source)
com.ibm.ejs.container.MessageEndpointHandler.invokeMdbMethod(MessageEndpointHandler.java:1093)
com.ibm.ejs.container.MessageEndpointHandler.invoke(MessageEndpointHandler.java:778)
$Proxy32.onMessage(Unknown Source)
com.ibm.mq.connector.inbound.MessageEndpointWrapper.onMessage(MessageEndpointWrapper.java:131)
com.ibm.mq.jms.MQSession$FacadeMessageListener.onMessage(MQSession.java:147)
com.ibm.msg.client.jms.internal.JmsSessionImpl.run(JmsSessionImpl.java:2557)
com.ibm.mq.jms.MQSession.run(MQSession.java:860)
com.ibm.mq.connector.inbound.WorkImpl.run(WorkImpl.java:172)
com.ibm.ejs.j2c.work.WorkProxy.run(WorkProxy.java:399)
com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1604)
The transaction timeout simply means that the transaction not was committed before the timeout expired, in this case 120s elapsed without a commit.
The stack shows that you're in the onMessage() function of an MDB named QLCommBean. And that this MDB was sending some messages via MessageSender.sendRecords(), which in turn was called the MQ JMS API:
JMSQueueSenderHandle.send()
The top of the stack is:
java.net.SocketOutputStream.socketWrite0(Native Method)
This means that the active code within the MDB at the time of the transaction timeout, was a socket write (sending data over the network). In this case MQ was sending a message to the queue manager.
The transaction timeout itself is not a bug. You need to review the MDB logic and determine if 120s is an appropriate amount of time to be in the MDB. If it isn't, I suggest you add logging to your MDB to find out what it was doing for 120s. It may be that the MQ code has used up a lot of this time, but it may not be. The stack shown is just where the code happened to be 120s after onMessage() was invoked.
As MQ in the process of sending data over the network to the queue manager, you may want to look at your network to see if its performing adequately, or possibly your queue manager. It might be heavily loaded.
If this occurs regularly, one good option is to take a number of javacores over the course of the 120s. You can then see what the stack was at various points.
Otherwise I suggest:
1) Instrument your MDB, make sure you know which code was executed, and at what time. Only this will rule out your MDB logic.
2) Consider your network
3) Possibly trace your queue manager & the MQ JMS code - you may need IBM's help to determine if the time taken by the IBM code is appropriate
4) If 120s is an acceptable length of time for onMessage(), consider increasing the transaction timeout value to a value greater than the maximum time you consider to be acceptable for onMessage().

What is the meaning of thread count in process (wmic)

When I'm running wmic query via command line, I'm detected a line with ThreadCount value.
I don't know about the meaning of ThreadCount.
I'm running this wmic query:
wmic process where (Caption like '%explorer%') get * /format:list
Output of above query:
Caption=explorer.exe
CommandLine=C:\Windows\Explorer.EXE
CreationClassName=Win32_Process
CreationDate=20140725092933.908032+330
CSCreationClassName=Win32_ComputerSystem
CSName=DIGITALFOX
Description=explorer.exe
ExecutablePath=C:\Windows\Explorer.EXE
ExecutionState=
Handle=1820
HandleCount=856
InstallDate=
KernelModeTime=50388323
MaximumWorkingSetSize=1380
MinimumWorkingSetSize=200
Name=explorer.exe
OSCreationClassName=Win32_OperatingSystem
OSName=Microsoft Windows 7 Ultimate |C:\Windows|\Device\Harddisk0\Partition2
OtherOperationCount=90378
OtherTransferCount=2089300
PageFaults=63847
PageFileUsage=32724
ParentProcessId=1776
PeakPageFileUsage=70672
PeakVirtualSize=284794880
PeakWorkingSetSize=42564
Priority=8
PrivatePageCount=33509376
ProcessId=1820
QuotaNonPagedPoolUsage=48
QuotaPagedPoolUsage=388
QuotaPeakNonPagedPoolUsage=53
QuotaPeakPagedPoolUsage=490
ReadOperationCount=1543
ReadTransferCount=4529679
SessionId=1
Status=
TerminationDate=
ThreadCount=30
UserModeTime=34008218
VirtualSize=235257856
WindowsVersion=6.1.7600
WorkingSetSize=33030144
WriteOperationCount=6
WriteTransferCount=696
What is the meaning of ThreadCount in above data?
About Processes and Threads
Each process provides the resources needed to execute a program. A process has a virtual address space, executable code, open handles to system objects, a security context, a unique process identifier, environment variables, a priority class, minimum and maximum working set sizes, and at least one thread of execution. Each process is started with a single thread, often called the primary thread, but can create additional threads from any of its threads.
A thread is the entity within a process that can be scheduled for execution. All threads of a process share its virtual address space and system resources. In addition, each thread maintains exception handlers, a scheduling priority, thread local storage, a unique thread identifier, and a set of structures the system will use to save the thread context until it is scheduled. The thread context includes the thread's set of machine registers, the kernel stack, a thread environment block, and a user stack in the address space of the thread's process. Threads can also have their own security context, which can be used for impersonating clients.
ms-help://MS.MSSDK.1033/MS.WinSDK.1033/dllproc/base/about_processes_and_threads.htm
In this the Threadcount is the No of threads that process is currently using.
In your situation the process explorer is using 30 threads.
Thread count is used for avoiding orphan threads so before closing the process thread count should be zero.

Weblogic Stuck Thread on JDBC call

We frequently get a series of Stuck threads on our Weblogic servers. I've analyzed this over a period of time.
What I'd like to understand is whether this stuck thread block indicates it is still reading data from the open socket to the database since the queries are simple SELECT stuff?
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at oracle.net.ns.Packet.receive(Packet.java:239)
at oracle.net.ns.DataPacket.receive(DataPacket.java:92)
We've run netstat and other commands, the sockets from the Weblogic app server to the Database match the number of connections in the pool.
Any ideas what else we should be investigating here?
Stack trace of thread dump:
"[STUCK] ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=10 tid=0x61a5b000 nid=0x25f runnable [0x6147b000..0x6147eeb0]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at oracle.net.ns.Packet.receive(Packet.java:239)
at oracle.net.ns.DataPacket.receive(DataPacket.java:92)
at oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:172)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:117)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:92)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:77)
at oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1023)
at oracle.jdbc.driver.T4CMAREngine.unmarshalSB1(T4CMAREngine.java:999)
at oracle.jdbc.driver.T4C8Oall.receive(T4C8Oall.java:584)
at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:183)
at oracle.jdbc.driver.T4CStatement.fetch(T4CStatement.java:1000)
at oracle.jdbc.driver.OracleResultSetImpl.close_or_fetch_from_next(OracleResultSetImpl.java:314)
- locked <0x774546e0> (a oracle.jdbc.driver.T4CConnection)
at oracle.jdbc.driver.OracleResultSetImpl.next(OracleResultSetImpl.java:228)
- locked <0x774546e0> (a oracle.jdbc.driver.T4CConnection)
at weblogic.jdbc.wrapper.ResultSet_oracle_jdbc_driver_OracleResultSetImpl.next(Unknown Source)
The bit starting from weblogic.work.ExecuteThread.run to here has been omitted. We have 8 sets of thread dumps - and each show the thread waiting on the same line, and the same object locked
at oracle.jdbc.driver.OracleResultSetImpl.close_or_fetch_from_next(OracleResultSetImpl.java:314)
- locked <0x774546e0> (a oracle.jdbc.driver.T4CConnection)
At the time the stack was printed, it seems blocked waiting for more data from the server
at oracle.jdbc.driver.OracleResultSetImpl.next(OracleResultSetImpl.java:228)
Maybe it is just the query which is taking more than StuckThreadMaxTimeand WL issues a Warning.
If possible I would try:
Find which query or queries are getting the threads stuck and check execution time
Use Wireshark to analyze communication with database
Have a look at the driver source code (JD comes to mind) to understand stack trace
if you use weblogic debug flag -Dweblogic.debug.DebugJDBCSQL you will be able to trace the SQL which is actually being executed

Check if a Win32 thread is running or in a suspended state

How do I check to see if a Win32 thread is running or in suspended state?
I can't find any Win32 API which gives the state of a thread. So how do I get the thread state?
I think - originally - this information was not provided because any API that provided this info would be misleading and useless.
Consider two possible cases - the current thread has suspended the thread-of-interest. Code in the current thread knows about the suspended state and should be able to share it so theres no need for the kernel team to add an API.
The 2nd case, some other / a 3rd thread in the system has suspended the thread of interest (and theres no way to track which thread that was). Now you have a race condition - that other thread could, at any time - unsuspend the thread of interest and the information gleaned from the API is useless - you have a value indicating the thread is suspended when it is in fact, not.
Moral of the story - if you want to know that a thread is suspended - suspend it: The return value from SuspendThread is the previous suspend count of the thread. And now you DO know something useful - The thread WAS AND STILL IS suspended - which is useful. Or that it WASN't (but now is) suspended. Either way, the thread's state is now deterministically known so you can in theory make some intelligent choices based on that - whether to ResumeThread, or keep it suspended.
You can get this information by calling NtQuerySystemInformation() with the value for SystemProcessesAndThreadsInformation (integer value 5).
If you want an example of what you can do with this information take a look at Thread Status Monitor.
WMI's Win32_Thread class has a ThreadState property, where 5: "Suspended Blocked" and 6:Suspended Ready.
You will need the Thread's Id to get the right instance directly (the WMI object's Handle property is the thread id).
EDIT: Given this PowerShell query:
gwmi win32_thread | group ThreadState
gives
Count Name Group
----- ---- -----
6 2 {, , , ...}
966 5 {, , , ...}
WMI has a different definition of "Suspended" to Win32.
In Windows 7, you can use QueryUmsThreadInformation. (UMS stands for User mode scheduling).
See here for UmsThreadIsSuspended.
you could get thread suspend count with code like this:
DWORD GetThreadSuspendCount(HANDLE hThread) {
DWORD dwSuspendCount = SuspendThread(hThread);
ResumeThread(hThread);
return dwSuspendCount;
}
but, as already said - it is not accurate.
Moreover, suspending a thread is evil.
YES: it IS possible to get the thread state and determine if it is suspended.
And NO: You don't need Windows 7 to do that.
I published my working class here on Stackoverflow: How to get thread state (e.g. suspended), memory + CPU usage, start time, priority, etc
This class requires Windows 2000 or higher.
I think the state here is referred to as
If thread is in thread proc , doing some processing Or
Waiting for event
This can be taken care of by using variable which can tell that if a thread is actually running or waiting for event to happen.
These scenarios appear when considering threadpools, having some n threads and based on each thread running status , tasks can be assigned to idle threads.

com.ibm.websphere.jtaextensions.NotSupportedException thrown under load

I have an application containing 4 MDB's each of which receives SOAP messages over JMS from MQ. Once the messages have been received we process the XML into an object model and process accordingly which always involves either loading or saving messages to an Oracle database via Hibernate.
Additionally we have a quartz process with fires every minute that may or may not trigger so actions which could also read or write to the database using Hibernate.
When the system in under high load, i.e. processing large numbers 1k + and potentially performing some database read/writes triggered by our quartz process we keep seeing the following exception be thrown in our logs.
===============================================================================
at com.integrasp.iatrade.logic.MessageContextRouting.lookup(MessageContextRouting. java:150)
at com.integrasp.iatrade.logic.RequestResponseManager.findRequestDestination(Reque stResponseManager.java:153) at com.integrasp.iatrade.logic.RequestResponseManager.findRequestDestination(Reque stResponseManager.java:174)
at com.integrasp.iatrade.logic.IOLogic.processResponse(IOLogic.java:411)< br /> at com.integrasp.iatrade.logic.FxOrderQuoteManager.requestQuote(FxOrderQuoteManage r.java:119)
at com.integrasp.iatrade.logic.FxOrderQuoteManager.processRequest(FxOrderQuoteMana ger.java:682)
at com.integrasp.iatrade.logic.FxOrderSubmissionManager.processRequest(FxOrderSubm issionManager.java:408)
at com.integrasp.iatrade.eo.SubmitOrderRequest.process(SubmitOrderRequest.java:60)
at com.integrasp.iatrade.ejb.BusinessLogicRegister.perform(BusinessLogicRegister.j ava:85)
at com.integrasp.iatrade.ejb.mdb.OrderSubmissionBean.onMessage(OrderSubmissionBean .java:147)
at com.ibm.ejs.jms.listener.MDBWrapper$PriviledgedOnMessage.run(MDBWrapper.java:30 2)
at com.ibm.ws.security.util.AccessController.doPrivileged(AccessController.java:63 )
at com.ibm.ejs.jms.listener.MDBWrapper.callOnMessage(MDBWrapper.java:271)
at com.ibm.ejs.jms.listener.MDBWrapper.onMessage(MDBWrapper.java:240)
at com.ibm.mq.jms.MQSession.run(MQSession.java:1593)
at com.ibm.ejs.jms.JMSSessionHandle.run(JMSSessionHandle.java:970)
at com.ibm.ejs.jms.listener.ServerSession.connectionConsumerOnMessage(ServerSessio n.java:891)
at com.ibm.ejs.jms.listener.ServerSession.onMessage(ServerSession.java:656)
at com.ibm.ejs.jms.listener.ServerSession.dispatch(ServerSession.java:623)
at sun.reflect.GeneratedMethodAccessor79.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.ja va:43)
at java.lang.reflect.Method.invoke(Method.java:615)
at com.ibm.ejs.jms.listener.ServerSessionDispatcher.dispatch(ServerSessionDispatch er.java:37)
at com.ibm.ejs.container.MDBWrapper.onMessage(MDBWrapper.java:96)
at com.ibm.ejs.container.MDBWrapper.onMessage(MDBWrapper.java:132)
at com.ibm.ejs.jms.listener.ServerSession.run(ServerSession.java:481)
at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1473)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedMethodAccessor42.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.ja va:43)
at java.lang.reflect.Method.invoke(Method.java:615)
at org.hibernate.transaction.WebSphereExtendedJTATransactionLookup$TransactionMana gerAdapter$TransactionAdapter.registerSynchronization(WebSphereExtendedJTATransa ctionLookup.java:225)
... 30 more
Caused by: com.ibm.websphere.jtaextensions.NotSupportedException
at com.ibm.ws.jtaextensions.ExtendedJTATransactionImpl.registerSynchronizationCall backForCurrentTran(ExtendedJTATransactionImpl.java:247)
... 34 more
Could any body help to shed come light on what com.ibm.websphere.jtaextensions.NotSupportedException means. The IBM documentation says
"The exception is thrown by the transaction manager if an attempt is made to register a SynchronizationCallback in an environment or at a time when this function is not available. "
Which to me sounds like the container is rejecting hibernates call to start a transaction. If anybody has any idea on why the container could be throwing the message please let me know.
Thanks in advance
Karl
If you really need high load I would remove the Hibernate layer between your app and the database. Without Hibernate you have less moving parts and more control.
That is the only advice I can give you.
If anyone was interested it was a thread that was trying to sync the transaction when the transaction had timed out.
I had assumed that if the transaction timeout then the thread would have been killed however this was not the case.
karl

Resources