We have a Spring Boot microservice with several libraries as dependencies, e.g. Jest (Elasticsearch), Hikari, Spring-Rabbit, FasterXML, and many more.
After analyzing a thread dump we found that 2 unknown pools are being created. On a normal development machine these pools contain 8 to 10 threads, but in the prod environment we observed that each pool has 66 threads. The thread pool names are auto-generated, like pool-7, pool-2, etc.
We want to find out which Java class/library is creating these thread pools and spawning the threads. We tried Oracle Flight Recorder, but even there we could not see the origin of these threads.
Can someone please suggest a way to find out who is creating these threads?
Thanks,
Smita
It's unfortunate that the Thread Start event in Flight Recorder doesn't record the stack trace from the Thread#start method. I will see if it can be added to a future JDK release. You should, however, be able to see the thread that starts new threads.
If you can't find other tools to help you, the only way I can think of is to instrument the java.lang.Thread#start method yourself, either using bytecode instrumentation or by cloning OpenJDK, modifying the source file for java.lang.Thread, and building your own custom JDK. The last step may sound daunting, but it's not that hard if you are on JDK 8 or later.
hg clone http://hg.openjdk.java.net/jdk8/jdk8
cd jdk8
bash get_source.sh
bash configure
make images
When you clone, there is a README file in the root that will point you to further instructions if you run into problems.
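If rebuilding the JDK is more than you want to take on, here is a lighter sketch in the same spirit (my own suggestion, separate from the custom-JDK approach above): on JDK 8 every Thread constructor consults the installed SecurityManager's ThreadGroup access check, so a custom manager can print the creator's stack trace without modifying the JDK at all:

public class ThreadCreationTracer {
    public static void main(String[] args) {
        System.setSecurityManager(new SecurityManager() {
            @Override
            public void checkAccess(ThreadGroup g) {
                // Fires from the Thread constructor: dump who is creating the thread
                new Exception("Thread created in group " + g.getName()).printStackTrace();
            }
            @Override
            public void checkPermission(java.security.Permission perm) {
                // Deliberately permit everything else so the application keeps running
            }
        });
        new Thread(() -> { }).start(); // demo: prints a creation stack trace
    }
}

In a Spring Boot service you would install the manager at the top of main, before the libraries create their pools, then match the printed traces against the pools from your thread dump.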
Related
During distributed testing with JMeter 3.3 in non-GUI mode I'm getting the following error; how can I fix it?
I'm using the same version of JMeter and the JDK on the master as well as the slave machines.
The JVM should have exited but did not.
The following non-daemon threads are still running (DestroyJavaVM is OK):
Thread[main,5,main],
stackTrace:java.net.SocketInputStream#socketRead0
java.net.SocketInputStream#socketRead
java.net.SocketInputStream#read
java.net.SocketInputStream#read
java.io.BufferedInputStream#fill
java.io.BufferedInputStream#read
java.io.DataInputStream#readByte
sun.rmi.transport.StreamRemoteCall#executeCall
sun.rmi.server.UnicastRef#invoke
java.rmi.server.RemoteObjectInvocationHandler#invokeRemoteMethod
java.rmi.server.RemoteObjectInvocationHandler#invoke
com.sun.proxy.$Proxy19#rrunTest
org.apache.jmeter.engine.ClientJMeterEngine#runTest at line:149
org.apache.jmeter.engine.DistributedRunner#start at line:132
org.apache.jmeter.engine.DistributedRunner#start at line:149
org.apache.jmeter.JMeter#runNonGui at line:1005
org.apache.jmeter.JMeter#startNonGui at line:910
org.apache.jmeter.JMeter#start at line:538
sun.reflect.NativeMethodAccessorImpl#invoke0
sun.reflect.NativeMethodAccessorImpl#invoke
sun.reflect.DelegatingMethodAccessorImpl#invoke
java.lang.reflect.Method#invoke
org.apache.jmeter.NewDriver#main at line:248
I strongly recommend using this JMeter property:
jmeterengine.force.system.exit=true
It is documented here; some Chinese-language web pages tipped me off.
You can add -Jjmeterengine.force.system.exit=true on the command line when launching JMeter, or add jmeterengine.force.system.exit=true to JMETER_HOME/bin/jmeter.properties.
How I Confirmed This Fix
With JMeter 5.1 and java version "1.8.0_231" on MS-Win10, we're using a customized version of this JMeter InfluxDB backend Listener.
After my 60 second test run from the command line (jmeter.bat -n -t plan.jtl), the command line hung after displaying this output (very similar to the OP's):
Tidying up ... # Wed Jan 29 14:41:04 CST 2020 (1580330464874)
... end of run
The JVM should have exited but did not.
The following non-daemon threads are still running (DestroyJavaVM is OK):
Thread[DestroyJavaVM,5,main], stackTrace:
Thread[pool-2-thread-3,5,main], stackTrace:sun.misc.Unsafe#park
java.util.concurrent.locks.LockSupport#parkNanos
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject#awaitNanos
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue#take
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue#take
java.util.concurrent.ThreadPoolExecutor#getTask
java.util.concurrent.ThreadPoolExecutor#runWorker
java.util.concurrent.ThreadPoolExecutor$Worker#run
java.lang.Thread#run
Thread[pool-2-thread-4,5,main], stackTrace:sun.misc.Unsafe#park
java.util.concurrent.locks.LockSupport#parkNanos
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject#awaitNanos
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue#take
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue#take
java.util.concurrent.ThreadPoolExecutor#getTask
java.util.concurrent.ThreadPoolExecutor#runWorker
java.util.concurrent.ThreadPoolExecutor$Worker#run
java.lang.Thread#run
Thread[pool-2-thread-1,5,main], stackTrace:sun.misc.Unsafe#park
java.util.concurrent.locks.LockSupport#parkNanos
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject#awaitNanos
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue#take
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue#take
java.util.concurrent.ThreadPoolExecutor#getTask
java.util.concurrent.ThreadPoolExecutor#runWorker
java.util.concurrent.ThreadPoolExecutor$Worker#run
java.lang.Thread#run
After modifying my command line as follows, jmeter.bat cleanly exited instead of hanging, and all the ugly stack traces went away too:
jmeter.bat -n -Jjmeterengine.force.system.exit=true -t plan.jtl
To confirm that the problem was caused by our customized JMeter InfluxDB backend listener, I removed it from the .jmx and also removed jmeterengine.force.system.exit=true. No hang, no ugly stack trace (I actually love stack traces).
I have not taken the next step to discover whether the problem is with the official JMeter InfluxDB backend Listener or with our customized variant, which is not (and will never be) available publicly.
I should mention one gap in this story. I feel this test conclusively points to our customized backend listener (or JMeter's). However, it's odd that none of the threads in the above thread dump seem to belong to the backend listener. So I applaud JMeter for doing the right thing by dumping the stack trace -- few other apps go to the extent of auto-dumping when appropriate for troubleshooting. But in this case, perhaps that JMeter auto-dump code needs to be enhanced, because it did not point to the culprit backend listener code. Anyone over there at Apache listening in on this?
Good luck.
Most probably your JMeter engine(s) is (are) overloaded and therefore cannot gracefully shut down running threads when you request them to do so.
Make sure you follow JMeter Best Practices
The very first best practice states "Always use the latest version of JMeter", so consider migrating to JMeter 5.0 or whatever the latest version is on the JMeter Downloads page
Make sure your JMeter instances have enough headroom to operate in terms of CPU, RAM and so on. You can use JMeter PerfMon Plugin for this if you don't have other monitoring software in place/in mind.
Take a thread dump and examine it - this way you will know where exactly your test is stuck (see the jstack example after this list)
Introduce reasonable timeout values in HTTP Request Defaults so that when the server fails to respond, JMeter won't wait infinitely but will instead fail with an error
And finally (however, I wouldn't recommend this) you can suppress this check by adding the next line to the user.properties file:
jmeter.exit.check.pause=-1
If you go for this, keep in mind that you may run into a situation where the JMeter slaves are still trying to execute something even after your test ends, so you will need to kill and restart the processes manually or with a script.
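For the thread dump step above, the JDK's own jps and jstack utilities are sufficient; assuming the JMeter JVM's pid turns out to be 12345, something like:
jps -l
jstack -l 12345 > jmeter-threads.txt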
The following parameters should be set for the OBIEE Presentation Server, and only during load testing.
In OBIPS\instanceconfig.xml:
<ServerInstance>
[...]
<Cursors>
<NewCursorWaitSeconds>36000</NewCursorWaitSeconds>
<OldCursorWaitSeconds>36000</OldCursorWaitSeconds>
</Cursors>
[...]
</ServerInstance>
Save and exit the file, then restart the OBIEE processes using the OBIEE EM console.
You do know that this represents a value of 10 hours (36000 seconds), correct? Are you willing to lock resources for that length of time? This is counterintuitive for optimal application performance, as you would want to recover resources as fast as possible to support more sessions rather than locking a resource for an extended period of time.
I refer to the following performance "compass rose" as a guiding item (independent of tool)
If you need to amend the file on a remote server, you can do this either via the OS Process Sampler or via the SSH Command Sampler. The first one is part of the JMeter installation; the second one you can install using the JMeter Plugins Manager.
See How to Run External Commands and Programs Locally and Remotely from JMeter for more information, example configuration and sample commands.
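For example, a hypothetical one-liner (GNU sed; the file path is made up for illustration) that either sampler could run to set the cursor timeout shown above:
sed -i 's|<NewCursorWaitSeconds>[0-9]*</NewCursorWaitSeconds>|<NewCursorWaitSeconds>36000</NewCursorWaitSeconds>|' /path/to/OBIPS/instanceconfig.xml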
My goal is to run a load test using 4 Azure servers as load generators and 1 Azure server to initiate the test and gather results. I had the distributed test running and was getting good data. But today, when I remote-start the test, 3 of the 4 load generators fail, with all the HTTP transactions erroring. The failed transactions log the following error:
Non HTTP response message: java.lang.ClassNotFoundException: org.apache.commons.logging.impl.Log4jFactory (Caused by java.lang.ClassNotFoundException: org.apache.commons.logging.impl.Log4jFactory)
I confirmed the presence of commons-logging-1.2.jar in the jmeter\lib folder on each machine.
To try to narrow down the issue, I set up one Azure server to both initiate the load and run JMeter-server, but this fails too. However, if I start the test from the JMeter UI on that same server, the test runs OK. I think this rules out a problem in the script or a problem with the Azure machines talking to each other.
I also simplified my test plan down to where it only runs one simple http transaction and this still fails.
I've gone through all the basics: reinstalled JMeter, updated Java to the latest version (1.8.0_111), updated the JAVA_HOME environment variable, and backed out the most recent Microsoft security update on the server. Any advice on how to pick this problem apart would be greatly appreciated.
I'm using JMeter 3.0r1743807 and Java 1.8
The Azure servers are running Windows Server 2008 R2
I did get a resolution to this problem. It turned out to be a conflict between some extraneous code in a jar file and a component of JMeter. It was “spooky” because something influenced the load order of referenced jar files and JMeter components.
I had included a jar file in my JMeter script using the “Add directory or jar to classpath” function in the Test Plan. This jar file has a piece of code I needed for my test along with many other components, and one of those components, probably a similar logging function, conflicted with a logging function in JMeter. The problem was spooky: the test ran fine for months but started failing at the maximally inconvenient time. The problem was revealed by creating a very simple JMeter test that would load and run just fine. If I opened the simple test in JMeter and then, without closing JMeter, opened my problem test, my problem test would not fail. If I reversed the order, opening the problem test followed by the simple test, then the simple test would fail too. Given that the problem followed the order in which things loaded, I started looking at the jar files and found my suspect.
When I built the script I left the jar file alone, thinking that the functions I needed might have dependencies on other pieces within the jar. Now that things were broken I needed to find out whether that was true, and happily it was not. So, to fix the problem, I changed the extension on my jar file to .zip and edited it in 7-Zip, removing all the code except what I needed. I kept all the folders in the path to my needed code for two reasons: I did not have to update the code that called the functions, and when I tried changing the path the functions did not work.
Next I changed the extension on the file back to jar and changed the reference in JMeter’s “Add directory or jar to classpath” function to point to the revised jar. I haven’t seen the failure since.
Many thanks to the folks who looked at this. I hope the resolution will help someone out.
I am involved in a project which requires me to create a job scheduler using Quartz Scheduler to schedule various jobs, which in turn trigger Pentaho Kettle transformation(s). The Kettle transformations are essentially ETL scripts performing some mundane activities in our case. I am facing a critical issue while running the scheduler:
We have around 10 jobs scheduled using the job scheduler. For some 3 to 4 specific jobs it throws the following exception:
Unable to load the job from XML file [/home /transformations/jobs/TestJob.kjb] Unable to read file [file:///home /transformations/jobs/ TestJob.kjb] Could not read from "file:///home /transformations/jobs/TestJob.kjb" because it is a not a file.
org.pentaho.di.job.JobMeta.<init>(JobMeta.java:715)
org.pentaho.di.job.JobMeta.<init>(JobMeta.java:679)
com.XYZ.transformation.jobs.impl.JobBootstrapImpl.executeJob(JobBootstrapImpl.java:115)
com.XYZ.transformation.jobs.impl.JobBootstrapImpl.startJobsExecution(JobBootstrapImpl.java:100)
com.XYZ.transformation.jobs.impl.QuartzJobsScheduler.executeInternal(QuartzJobsScheduler.java:25)
org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86)
org.quartz.core.JobRunShell.run(JobRunShell.java:223)
org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549)
The weird thing is that, upon verifying the specified path, i.e. “/home /transformations/jobs/TestJob.kjb”, the file is present and I am able to read it. Moreover, the job runs successfully and does everything it is supposed to, yet it throws the exception detailed above.
After observing closely, I strongly suspect that Quartz is internally caching jobs and/or their parameters. We do load certain parameters required for the job to execute after it is triggered. Would it be possible to delete/purge the cache used by Quartz? I also tried killing all the Java processes running on the box (thinking that this might kill Quartz itself, as Quartz runs within a Java process) and restarting Quartz and its jobs afresh, but I couldn't make it work as expected. It still stores the old parameters somewhere, perhaps in some cache.
Versions used –
Spring Framework (spring-core & spring-beans) - 3.0.6.RELEASE
Quartz Scheduler - 1.8.6
Platform – Red Hat Linux - 2.6.18-308.el5
Pentaho Kettle – Spoon Stable Release – 4.3.0
I would do it this way:
Ensure that the Pentaho job can run standalone first, via a shell script, a Java service wrapper, or whatever
In the Quartz job, then use Quartz's NativeJob to call the same standalone script (rough sketch below)
Just my two cents
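A rough sketch of the NativeJob wiring against the Quartz 1.8 API (the wrapper script path and the job/trigger names are made-up placeholders):

import org.quartz.JobDetail;
import org.quartz.Scheduler;
import org.quartz.SimpleTrigger;
import org.quartz.impl.StdSchedulerFactory;
import org.quartz.jobs.NativeJob;

public class KettleNativeJobScheduler {
    public static void main(String[] args) throws Exception {
        Scheduler scheduler = new StdSchedulerFactory().getScheduler();

        // Run the standalone shell script that wraps the Kettle job
        JobDetail job = new JobDetail("testJob", "kettle", NativeJob.class);
        job.getJobDataMap().put(NativeJob.PROP_COMMAND, "/opt/pentaho/run-testjob.sh"); // hypothetical wrapper script
        job.getJobDataMap().put(NativeJob.PROP_WAIT_FOR_PROCESS, true);
        job.getJobDataMap().put(NativeJob.PROP_CONSUME_STREAMS, true);

        // Fire once immediately; swap in a CronTrigger for a real schedule
        scheduler.scheduleJob(job, new SimpleTrigger("testTrigger", "kettle"));
        scheduler.start();
    }
}

Because the Kettle job then runs in its own process, any parameters cached inside the scheduler's JVM should no longer leak between runs.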
Looks to me like you have an extra space in the path.
/home /transformations/jobs/TestJob.kjb
Between the e of home and the /
Remove that space; I can't possibly believe you actually have a home directory called "home "!
The following process on our Linux server is taking 100% of the CPU:
java -DMQJMS_LOG_DIR=/opt/hd/ca/mars/tmp/logs/log -DMQJMS_TRACE_DIR=/opt/hd/ca/mars/tmp/logs/trace -DMQJMS_INSTALL_PATH=/opt/isv/mqm/java com.ibm.mq.jms.admin.JMSAdmin -v -cfg /opt/hd/ca/mars/mqm/data/JMSAdmin.config
I forcibly killed the process and bounced MQ, and then I didn't see it any more. What might be the reason for this happening?
The java process com.ibm.mq.jms.admin.JMSAdmin is normally executed via the IBM MQ script /opt/mqm/java/bin/JMSAdmin.
The purpose of JMSAdmin is to create JNDI resources for connecting to IBM MQ. These are normally file based and stored in a file called .bindings; the location of the .bindings file is given in the configuration file that is passed to the command. In your output above the configuration file is /opt/hd/ca/mars/mqm/data/JMSAdmin.config.
JMSAdmin is an interactive process where you run commands such as:
DEFINE QCF(QueueConnectionFactory1) +
QMANAGER(XYZ) +
...
I can't tell you why it was taking 100% CPU, but the process itself does not directly interact with or connect to the queue manager, and it would have been safe to kill off the process without needing to restart the queue manager. The .bindings file that JMSAdmin generates is used by JMS applications in some configurations to find the details of how to connect to MQ and the names of the queues and topics to access.
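For context, a minimal sketch of how a JMS application typically consumes what JMSAdmin writes; the provider URL and factory name below are assumptions based on the config path and DEFINE command above, and the MQ JMS client and file-system JNDI context jars must be on the classpath:

import java.util.Hashtable;
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.naming.Context;
import javax.naming.InitialContext;

public class BindingLookup {
    public static void main(String[] args) throws Exception {
        Hashtable<String, String> env = new Hashtable<String, String>();
        // File-based JNDI context pointing at the directory that holds .bindings
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.fscontext.RefFSContextFactory");
        env.put(Context.PROVIDER_URL, "file:///opt/hd/ca/mars/mqm/data"); // assumed from JMSAdmin.config
        Context ctx = new InitialContext(env);

        // Name created earlier with DEFINE QCF(QueueConnectionFactory1)
        ConnectionFactory cf = (ConnectionFactory) ctx.lookup("QueueConnectionFactory1");
        Connection conn = cf.createConnection();
        conn.close();
        ctx.close();
    }
}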
In July 2011 you would have been using IBM MQ v7.0 or lower, all of which are out of support. If anyone comes across a similar issue with a recent, supported version of MQ, I would suggest taking a Java thread dump and opening a case with IBM to investigate why it is taking up 100% of the CPU.
PS: I know this is a 9-year-old question, but I thought an answer might be helpful to someone who finds this when searching for a similar problem.