Chronicle write introducing 1-2 sec latency at file rollover time

We have a single-threaded Java process that writes messages into a Chronicle Queue. The queue (SingleChronicleQueue) is configured with RollCycle HOURLY. At the hourly mark, when the file roll happens, the Chronicle write takes more than a second (typically 1-2 sec), which seems to happen with bigger file sizes (~50-90 GB). We're using a 4.5.x chronicle-queue version. Any ideas on how to address this problem?
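For context, here is a minimal sketch of the writer setup described above (the queue path and message payload are placeholders, and acquireAppender() assumes a reasonably recent 4.x/5.x API):

import net.openhft.chronicle.queue.ExcerptAppender;
import net.openhft.chronicle.queue.RollCycles;
import net.openhft.chronicle.queue.impl.single.SingleChronicleQueue;
import net.openhft.chronicle.queue.impl.single.SingleChronicleQueueBuilder;

public class QueueWriter {
    public static void main(String[] args) {
        // HOURLY roll cycle: a new queue file is started at the top of each hour
        try (SingleChronicleQueue queue = SingleChronicleQueueBuilder
                .binary("/data/queues/orders")   // placeholder path
                .rollCycle(RollCycles.HOURLY)
                .build()) {
            ExcerptAppender appender = queue.acquireAppender();
            // The single writer thread appends here; the 1-2 sec stall shows up
            // on the first write after the hourly boundary, when the previous
            // memory-mapped file is unmapped and a new one is created.
            appender.writeText("example message");
        }
    }
}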
I took a thread dump using the jstack tool to see where the thread is stuck:
at sun.nio.ch.FileChannelImpl.unmap0(Native Method)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at net.openhft.chronicle.core.OS.unmap(OS.java:345)
at net.openhft.chronicle.core.OS$Unmapper.run(OS.java:434)
at sun.misc.Cleaner.clean(Cleaner.java:143)
at net.openhft.chronicle.bytes.NativeBytesStore.performRelease(NativeBytesStore.java:501)
at net.openhft.chronicle.bytes.NativeBytesStore$$Lambda$68/1804441305.run(Unknown Source)
at net.openhft.chronicle.core.ReferenceCounter.release(ReferenceCounter.java:81)
at net.openhft.chronicle.bytes.NativeBytesStore.release(NativeBytesStore.java:265)
at net.openhft.chronicle.bytes.MappedFile.performRelease(MappedFile.java:296)
at net.openhft.chronicle.bytes.MappedFile$$Lambda$63/90346768.run(Unknown Source)
at net.openhft.chronicle.core.ReferenceCounter.release(ReferenceCounter.java:81)
at net.openhft.chronicle.bytes.MappedFile.release(MappedFile.java:277)
at net.openhft.chronicle.bytes.MappedBytes.performRelease(MappedBytes.java:209)
at net.openhft.chronicle.bytes.AbstractBytes$$Lambda$65/18179709.run(Unknown Source)
at net.openhft.chronicle.core.ReferenceCounter.release(ReferenceCounter.java:81)
at net.openhft.chronicle.bytes.AbstractBytes.release(AbstractBytes.java:395)
at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreAppender.resetWires(SingleChronicleQueueExcerpts.java:233)
at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreAppender.setCycle2(SingleChronicleQueueExcerpts.java:210)
at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreAppender.rollCycleTo(SingleChronicleQueueExcerpts.java:579)
at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreAppender.writingDocument(SingleChronicleQueueExcerpts.java:273)
at net.openhft.chronicle.wire.MarshallableOut.writingDocument(MarshallableOut.java:55)
at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreAppender.writeBytes(SingleChronicleQueueExcerpts.java:117)

You're using an ancient version of Chronicle Queue (the last 4.5.x release was on 26/02/2017, over 3 years ago). Since then, Chronicle Queue has changed drastically, and one of the changes was to perform unmapping in a separate service thread so as not to stall writers (that change dates back to the end of 2017, by the way), precisely because it was identified that, as you point out, such unmapping can take quite a long time.
So the solution is to upgrade to one of the latest versions (5.19.x).

Related

Hystrix: many threads in waiting state

We have used Hystrix, a circuit-breaker pattern library, in one of our modules.
The use case is: we poll 16 messages from Kafka and process them using a parallel stream, and each message's workflow makes 3 REST calls, each protected by a Hystrix command. The issue is that when I run a single instance, the CPU shows spikes, and a thread dump shows many threads in the WAITING state for all 3 commands, like below.
I've omitted the thread names, but assume all the thread pools show the same thing:
"Thread Pool-7" #82
Thread State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000004cee2312> (a java.util.concurrent.SynchronousQueue$TransferStack)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:458)
at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
at java.util.concurrent.SynchronousQueue.take(SynchronousQueue.java:924)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Could you please help me fine-tune the application and thread pool parameters? What am I missing here?
The default isolation strategy of Hystrix is thread pool, and its default size is just 10. That means only 10 REST calls can be running at the same time in your case.
First, try increasing the default property below to a large value:
hystrix.threadpool.default.coreSize=1000 # default is 10
If it works, adjust the value to a proper one.
default can be replaced with the proper HystrixThreadPoolKey for each thread pool.
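If you build commands programmatically rather than through properties, the equivalent is a thread pool setter on the command. A sketch, where the group key "RestGroup", the core size of 50, and callRemoteService() are placeholder assumptions:

import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;
import com.netflix.hystrix.HystrixThreadPoolProperties;

public class RestCallCommand extends HystrixCommand<String> {
    protected RestCallCommand() {
        // withCoreSize sets the size of the thread pool shared by all
        // commands in this group (default 10)
        super(Setter.withGroupKey(HystrixCommandGroupKey.Factory.asKey("RestGroup"))
                .andThreadPoolPropertiesDefaults(
                        HystrixThreadPoolProperties.Setter().withCoreSize(50)));
    }

    @Override
    protected String run() throws Exception {
        return callRemoteService(); // placeholder for the actual REST call
    }

    private String callRemoteService() {
        return "response";
    }
}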
If you are using the semaphore isolation strategy, try increasing the one below instead:
hystrix.command.default.execution.isolation.semaphore.maxConcurrentRequests=1000
The default for the above is also just 10. default can be replaced with the HystrixCommandKey name for each semaphore.
Updated
To choose the isolation strategy, you can use the property below:
hystrix.command.default.execution.isolation.strategy=THREAD or SEMAPHORE
default can be replaced with a HystrixCommandKey. This means you can assign a different strategy to each Hystrix command.
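Programmatically, the same per-command choice looks like the sketch below (again, the group key and the limit of 1000 are placeholder assumptions):

import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;
import com.netflix.hystrix.HystrixCommandProperties;
import com.netflix.hystrix.HystrixCommandProperties.ExecutionIsolationStrategy;

public class SemaphoreRestCommand extends HystrixCommand<String> {
    protected SemaphoreRestCommand() {
        // SEMAPHORE isolation executes on the calling thread, bounded by
        // maxConcurrentRequests instead of a thread pool
        super(Setter.withGroupKey(HystrixCommandGroupKey.Factory.asKey("RestGroup"))
                .andCommandPropertiesDefaults(HystrixCommandProperties.Setter()
                        .withExecutionIsolationStrategy(ExecutionIsolationStrategy.SEMAPHORE)
                        .withExecutionIsolationSemaphoreMaxConcurrentRequests(1000)));
    }

    @Override
    protected String run() throws Exception {
        return "response"; // placeholder for the actual REST call
    }
}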

JMeter TCP Sampler

We are running JMeter, connecting to a TCP socket through BinaryTCPClientImpl, and we are getting response code 500.
Response message: org.apache.jmeter.protocol.tcp.sampler.ReadException
JMeter version: 2.9
Please help.
If this is the error
ERROR - jmeter.protocol.tcp.sampler.TCPSampler: org.apache.jmeter.protocol.tcp.sampler.ReadException:
at org.apache.jmeter.protocol.tcp.sampler.BinaryTCPClientImpl.read(BinaryTCPClientImpl.java:140)
at org.apache.jmeter.protocol.tcp.sampler.TCPSampler.sample(TCPSampler.java:414)
at org.apache.jmeter.threads.JMeterThread.process_sampler(JMeterThread.java:429)
at org.apache.jmeter.threads.JMeterThread.run(JMeterThread.java:257)
at java.lang.Thread.run(Unknown Source)
then you have 2 options. The first (and much easier, if it applies to you) is to use LengthPrefixedBinaryTCPClientImpl. If your responses are always prefixed with their length, you can simply set the tcp.binarylength.prefix.length property (the size of that prefix, in bytes) and go about your business.
If that is not the case, then your other option is to extend org.apache.jmeter.protocol.tcp.sampler.TCPClient. It may help to get in touch with the team behind the client for this proprietary protocol, because, after all, they have implemented something that works. You'll probably have to extend it to do something like LengthPrefixedBinaryTCPClientImpl does: read N bytes. This runs the risk of reading too many or too few bytes, though. If your application server ever miscalculates the size of its output, you suffer the consequences: either another timeout, or extra bytes left in the buffer that get read on the next iteration (and then cascading errors).
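To illustrate the framing logic such a custom TCPClient would implement, here is a minimal sketch in plain Java, independent of the JMeter interface; the 2-byte big-endian length prefix is an assumption about the protocol:

import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class LengthPrefixedReader {
    // Reads one length-prefixed message: a 2-byte big-endian length,
    // followed by exactly that many payload bytes.
    public static byte[] readMessage(InputStream in) throws IOException {
        DataInputStream din = new DataInputStream(in);
        int length = din.readUnsignedShort(); // the 2-byte length prefix
        byte[] payload = new byte[length];
        din.readFully(payload);               // blocks until all bytes arrive
        return payload;
    }
}

Reading exactly the advertised number of bytes avoids both over-reading (leaving the sampler blocked until timeout) and under-reading (leaving stray bytes for the next sample).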

Cascading SinkMode.UPDATE not working

I just started Cascading programming and have a Cascading job which needs to run a variable number of iterations. During each iteration, it reads from a file (Tap) generated by the previous iteration and writes calculated data to two separate sink Taps.
One Tap ("Tap final") is used to collect data from each iteration.
The other Tap ("Tap intermediate") is used to collect data that needs to be calculated in the next iteration.
I am using SinkMode.UPDATE for "Tap final" to make this happen. It works correctly in local mode, but fails in cluster mode, complaining that the file already exists ("Tap final").
I am running CDH4.4 and Cascading 2.5.2. It seems no one else has experienced the same problem.
If anyone knows a possible way to fix it, please let me know. Thanks.
Caused by: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://dv-db.machines:8020/tmp/xxxx/cluster/97916 already exists
at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:126)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:419)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:332)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1269)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1266)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:606)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:601)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:601)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:586)
at cascading.flow.hadoop.planner.HadoopFlowStepJob.internalNonBlockingStart(HadoopFlowStepJob.java:105)
at cascading.flow.planner.FlowStepJob.blockOnJob(FlowStepJob.java:196)
at cascading.flow.planner.FlowStepJob.start(FlowStepJob.java:149)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:124)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:43)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
It would help in understanding the issue better if you could add your Cascading flow code to the question.
It seems a job file with the same name is being used by different jobs in cluster mode. One simple solution, in case you are fine with not running steps concurrently, would be to set the max concurrent steps to 1 via FlowProps before creating the connector:
Properties jobProperties = new Properties();
FlowProps.setMaxConcurrentSteps(jobProperties, 1); // run flow steps one at a time
FlowConnector flowConnector = new HadoopFlowConnector(jobProperties);
Flow flow = flowConnector.connect("name", sources, sinks, outPipe1, outPipe2);
UPDATE only works with sinks (like databases) that support in-place updating.
If you're using Hfs (a file system sink) then you'll need to use SinkMode.REPLACE.
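A sketch of what that Tap definition could look like, assuming a delimited text scheme and a placeholder HDFS path:

import cascading.scheme.hadoop.TextDelimited;
import cascading.tap.SinkMode;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import cascading.tuple.Fields;

public class Taps {
    // "Tap final" recreated with REPLACE, so cluster mode deletes and
    // rewrites the existing output directory instead of failing
    public static Tap finalTap() {
        return new Hfs(new TextDelimited(Fields.ALL, "\t"),
                "hdfs://namenode:8020/data/final", SinkMode.REPLACE);
    }
}

Note that REPLACE deletes the existing resource on each run, so results collected across iterations would need to be written elsewhere (for example, one path per iteration, merged afterwards).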

WTRN0124I: When the timeout occurred the thread

I am getting the below error. Kindly help.
[8/8/14 21:14:56:939 GMT-08:00] 00000005 TimeoutManage I WTRN0006W: Transaction 00000147B92EFAE20000000100000012DF462C9E681BA3670A44A25FE1B0F6182303FB5C00000147B92EFAE20000000100000012DF462C9E681BA3670A44A25FE1B0F6182303FB5C00000001 has timed out after 120 seconds.
[8/8/14 21:14:56:967 GMT-08:00] 00000006 TimeoutManage I WTRN0124I: When the timeout occurred the thread with which the transaction is, or was most recently, associated was Thread[WMQJCAResourceAdapter : 4,5,main]. The stack trace of this thread when the timeout occurred was:
java.net.SocketOutputStream.socketWrite0(Native Method)
java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:103)
java.net.SocketOutputStream.write(SocketOutputStream.java:147)
com.ibm.mq.jmqi.remote.internal.RemoteTCPConnection.send(RemoteTCPConnection.java:1212)
com.ibm.mq.jmqi.remote.internal.system.RemoteConnection.sendTSH(RemoteConnection.java:2289)
com.ibm.mq.jmqi.remote.internal.RemoteHconn.sendTSH(RemoteHconn.java:954)
com.ibm.mq.jmqi.remote.internal.RemoteFAP.jmqiPut(RemoteFAP.java:5443)
com.ibm.mq.jmqi.remote.internal.RemoteFAP.MQPUT(RemoteFAP.java:5205)
com.ibm.msg.client.wmq.v6.base.internal.MQSESSION.MQPUT(MQSESSION.java:1252)
com.ibm.msg.client.wmq.v6.base.internal.MQQueue.putMsg2(MQQueue.java:2090)
com.ibm.msg.client.wmq.v6.jms.internal.MQMessageProducer.sendInternal(MQMessageProducer.java:1262)
com.ibm.msg.client.wmq.v6.jms.internal.MQMessageProducer.send(MQMessageProducer.java:768)
com.ibm.msg.client.wmq.v6.jms.internal.MQMessageProducer.send(MQMessageProducer.java:2713)
com.ibm.msg.client.jms.internal.JmsMessageProducerImpl.sendMessage(JmsMessageProducerImpl.java:872)
com.ibm.msg.client.jms.internal.JmsMessageProducerImpl.send_(JmsMessageProducerImpl.java:727)
com.ibm.msg.client.jms.internal.JmsMessageProducerImpl.send(JmsMessageProducerImpl.java:398)
com.ibm.mq.jms.MQMessageProducer.send(MQMessageProducer.java:281)
com.ibm.ejs.jms.JMSQueueSenderHandle.send(JMSQueueSenderHandle.java:204)
com.scb.sts.stsappserver.sender.MessageSender.sendRecords(Unknown Source)
com.scb.sts.services.PCSPPaymentSplitter.doExecute(Unknown Source)
com.scb.sts.stsappserver.eventhandler.SplitterEventHandler.handleEvent(Unknown Source)
com.scb.sts.services.PCSPPaymentReceiver.doProcess(Unknown Source)
com.scb.sts.services.PCSPPaymentReceiver.doExecute(Unknown Source)
com.scb.sts.controllers.OCWSServlet.doPost(Unknown Source)
com.scb.sts.qlcomm.QLCommBean.processXMLFile(Unknown Source)
com.scb.sts.qlcomm.QLCommBean.isDoOutput(Unknown Source)
com.scb.sts.qlcomm.QLCommBean.onMessage(Unknown Source)
com.ibm.ejs.container.MessageEndpointHandler.invokeMdbMethod(MessageEndpointHandler.java:1093)
com.ibm.ejs.container.MessageEndpointHandler.invoke(MessageEndpointHandler.java:778)
$Proxy32.onMessage(Unknown Source)
com.ibm.mq.connector.inbound.MessageEndpointWrapper.onMessage(MessageEndpointWrapper.java:131)
com.ibm.mq.jms.MQSession$FacadeMessageListener.onMessage(MQSession.java:147)
com.ibm.msg.client.jms.internal.JmsSessionImpl.run(JmsSessionImpl.java:2557)
com.ibm.mq.jms.MQSession.run(MQSession.java:860)
com.ibm.mq.connector.inbound.WorkImpl.run(WorkImpl.java:172)
com.ibm.ejs.j2c.work.WorkProxy.run(WorkProxy.java:399)
com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1604)
The transaction timeout simply means that the transaction was not committed before the timeout expired; in this case, 120s elapsed without a commit.
The stack shows that you're in the onMessage() method of an MDB named QLCommBean, and that this MDB was sending messages via MessageSender.sendRecords(), which in turn called the MQ JMS API:
JMSQueueSenderHandle.send()
The top of the stack is:
java.net.SocketOutputStream.socketWrite0(Native Method)
This means that the active code within the MDB at the time of the transaction timeout was a socket write (sending data over the network). In this case, MQ was sending a message to the queue manager.
The transaction timeout itself is not a bug. You need to review the MDB logic and determine whether 120s is an appropriate amount of time to be in the MDB. If it isn't, I suggest you add logging to your MDB to find out what it was doing for 120s. It may be that the MQ code has used up a lot of this time, but it may not be; the stack shown is just where the code happened to be 120s after onMessage() was invoked.
As MQ was in the process of sending data over the network to the queue manager, you may want to look at your network to see if it's performing adequately, or possibly at your queue manager; it might be heavily loaded.
If this occurs regularly, one good option is to take a number of javacores over the course of the 120s. You can then see what the stack was at various points.
Otherwise I suggest:
1) Instrument your MDB so you know which code was executed and at what time (see the sketch after this list). Only this will rule out your MDB logic.
2) Consider your network.
3) Possibly trace your queue manager and the MQ JMS code; you may need IBM's help to determine whether the time taken by the IBM code is appropriate.
4) If 120s is an acceptable length of time for onMessage(), consider increasing the transaction timeout to a value greater than the maximum time you consider acceptable for onMessage().
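As a sketch of suggestion 1, timing each step of onMessage() makes it obvious where the 120s goes; the step names below are placeholders for your actual logic:

import javax.jms.Message;

public abstract class InstrumentedMdb {
    public void onMessage(Message message) {
        long start = System.currentTimeMillis();
        try {
            long t = System.currentTimeMillis();
            processXmlFile(message); // placeholder step
            log("processXmlFile took " + (System.currentTimeMillis() - t) + " ms");

            t = System.currentTimeMillis();
            sendRecords();           // the MQ send seen in the stack trace
            log("sendRecords took " + (System.currentTimeMillis() - t) + " ms");
        } finally {
            log("onMessage total " + (System.currentTimeMillis() - start) + " ms");
        }
    }

    protected abstract void processXmlFile(Message message);
    protected abstract void sendRecords();

    private void log(String msg) {
        System.out.println(msg); // replace with your logging framework
    }
}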

com.ibm.websphere.jtaextensions.NotSupportedException thrown under load

I have an application containing 4 MDBs, each of which receives SOAP messages over JMS from MQ. Once the messages have been received, we process the XML into an object model and process it accordingly, which always involves either loading or saving messages to an Oracle database via Hibernate.
Additionally, we have a quartz process which fires every minute and may or may not trigger some actions that could also read from or write to the database using Hibernate.
When the system is under high load, i.e. processing large numbers of messages (1k+) and potentially performing some database reads/writes triggered by our quartz process, we keep seeing the following exception thrown in our logs:
===============================================================================
at com.integrasp.iatrade.logic.MessageContextRouting.lookup(MessageContextRouting.java:150)
at com.integrasp.iatrade.logic.RequestResponseManager.findRequestDestination(RequestResponseManager.java:153)
at com.integrasp.iatrade.logic.RequestResponseManager.findRequestDestination(RequestResponseManager.java:174)
at com.integrasp.iatrade.logic.IOLogic.processResponse(IOLogic.java:411)
at com.integrasp.iatrade.logic.FxOrderQuoteManager.requestQuote(FxOrderQuoteManager.java:119)
at com.integrasp.iatrade.logic.FxOrderQuoteManager.processRequest(FxOrderQuoteManager.java:682)
at com.integrasp.iatrade.logic.FxOrderSubmissionManager.processRequest(FxOrderSubmissionManager.java:408)
at com.integrasp.iatrade.eo.SubmitOrderRequest.process(SubmitOrderRequest.java:60)
at com.integrasp.iatrade.ejb.BusinessLogicRegister.perform(BusinessLogicRegister.java:85)
at com.integrasp.iatrade.ejb.mdb.OrderSubmissionBean.onMessage(OrderSubmissionBean.java:147)
at com.ibm.ejs.jms.listener.MDBWrapper$PriviledgedOnMessage.run(MDBWrapper.java:302)
at com.ibm.ws.security.util.AccessController.doPrivileged(AccessController.java:63)
at com.ibm.ejs.jms.listener.MDBWrapper.callOnMessage(MDBWrapper.java:271)
at com.ibm.ejs.jms.listener.MDBWrapper.onMessage(MDBWrapper.java:240)
at com.ibm.mq.jms.MQSession.run(MQSession.java:1593)
at com.ibm.ejs.jms.JMSSessionHandle.run(JMSSessionHandle.java:970)
at com.ibm.ejs.jms.listener.ServerSession.connectionConsumerOnMessage(ServerSession.java:891)
at com.ibm.ejs.jms.listener.ServerSession.onMessage(ServerSession.java:656)
at com.ibm.ejs.jms.listener.ServerSession.dispatch(ServerSession.java:623)
at sun.reflect.GeneratedMethodAccessor79.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:615)
at com.ibm.ejs.jms.listener.ServerSessionDispatcher.dispatch(ServerSessionDispatcher.java:37)
at com.ibm.ejs.container.MDBWrapper.onMessage(MDBWrapper.java:96)
at com.ibm.ejs.container.MDBWrapper.onMessage(MDBWrapper.java:132)
at com.ibm.ejs.jms.listener.ServerSession.run(ServerSession.java:481)
at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1473)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedMethodAccessor42.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:615)
at org.hibernate.transaction.WebSphereExtendedJTATransactionLookup$TransactionManagerAdapter$TransactionAdapter.registerSynchronization(WebSphereExtendedJTATransactionLookup.java:225)
... 30 more
Caused by: com.ibm.websphere.jtaextensions.NotSupportedException
at com.ibm.ws.jtaextensions.ExtendedJTATransactionImpl.registerSynchronizationCallbackForCurrentTran(ExtendedJTATransactionImpl.java:247)
... 34 more
Could anybody help shed some light on what com.ibm.websphere.jtaextensions.NotSupportedException means? The IBM documentation says:
"The exception is thrown by the transaction manager if an attempt is made to register a SynchronizationCallback in an environment or at a time when this function is not available."
Which to me sounds like the container is rejecting Hibernate's call to start a transaction. If anybody has any idea why the container could be throwing this, please let me know.
Thanks in advance
Karl
If you really need high load, I would remove the Hibernate layer between your app and the database. Without Hibernate, you have fewer moving parts and more control.
That is the only advice I can give you.
In case anyone is interested: it was a thread trying to register transaction synchronization after the transaction had timed out.
I had assumed that if the transaction timed out, the thread would have been killed; however, this was not the case.
Karl
