Which method is the least obtrusive for generating thread dumps in Java?

I am aware of the following methods for generating thread dumps in Java:
kill -3
jstack
JMX from inside the JVM
JMX remote
JPDA (remote)
JVMTI (C API)
Of these methods, which is the least detrimental to the JVM's performance?

If you just need to dump all stack traces to stdout, kill -3 and jstack should be the cheapest. The functionality is implemented natively in JVM code. No intermediate structures are created - the VM prints everything itself while it walks through the stacks.
Both commands perform the same VM operation, except that the signal handler prints stack traces locally to the stdout of the Java process, while jstack receives the output from the target VM through IPC (a Unix domain socket on Linux or a named pipe on Windows).
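For example, both invocations look like this (`<pid>` is a placeholder for the target JVM's process id):

```shell
# Send SIGQUIT: the target JVM prints the dump to its own stdout
kill -3 <pid>

# jstack receives the same dump over the attach IPC and prints it locally
jstack <pid>
```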
jstack uses the Dynamic Attach mechanism under the hood. You can also use Dynamic Attach directly if you wish to receive the stack traces as a plain stream of bytes:
import com.sun.tools.attach.VirtualMachine;
import sun.tools.attach.HotSpotVirtualMachine;

import java.io.InputStream;

public class StackTrace {
    public static void main(String[] args) throws Exception {
        String pid = args[0];
        HotSpotVirtualMachine vm = (HotSpotVirtualMachine) VirtualMachine.attach(pid);
        try (InputStream in = vm.remoteDataDump()) {
            byte[] buf = new byte[8000];
            for (int bytes; (bytes = in.read(buf)) > 0; ) {
                System.out.write(buf, 0, bytes);
            }
        } finally {
            vm.detach();
        }
    }
}
Note that all of the mentioned methods operate in a VM safepoint anyway. This means that all Java threads are stopped while the stack traces are collected.

The most performant option is likely to be the use of the ThreadMXBean.dumpAllThreads() API rather than requesting a text thread dump written to disk:
http://docs.oracle.com/javase/7/docs/api/java/lang/management/ThreadMXBean.html#dumpAllThreads(boolean,%20boolean)
Of course, whether you can use that depends on whether you need a thread dump file, or just the data.
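A minimal in-process sketch of the dumpAllThreads() approach (the class name is just for illustration):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class InProcessDump {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // false, false: skip monitor and synchronizer info, which is cheaper to collect
        ThreadInfo[] infos = threads.dumpAllThreads(false, false);
        for (ThreadInfo info : infos) {
            System.out.println(info.getThreadName() + " (" + info.getThreadState() + ")");
            for (StackTraceElement frame : info.getStackTrace()) {
                System.out.println("\tat " + frame);
            }
        }
    }
}
```

Because this returns ThreadInfo objects rather than text, you can inspect the data programmatically instead of parsing a dump file.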

Akka round robin router doesn't respect number of instances

I have the following code:
import akka.actor.ActorContext;
import akka.actor.ActorRef;
import akka.actor.Props;
import akka.routing.RoundRobinPool;

class ARouter {
    public static ActorRef getRouter(ActorContext actorContext, Object param1, Object param2, String routerName) {
        ActorRef router;
        try {
            RoundRobinPool roundRobinPool = new RoundRobinPool(1);
            Props props = Props.create(MyActor.class, param1, param2);
            router = actorContext.actorOf(roundRobinPool.props(props), routerName);
        } catch (Exception e) {
            router = null;
        }
        return router;
    }
}
and somewhere in my code I do this
ActorRef router = ARouter.getRouter(actorContext, param1, param2, routerName);
anObject.getAListOfItems().forEach(listItem -> router.tell(listItem, getSelf()));
I would expect to see one thread because, although I send the messages to the router to dispatch them to the actors, the router was created with only one routee (if I understand it correctly).
I tried with different numbers of instances but I always get 8 threads. The only thing that had any effect (and of course "crashed") was setting new RoundRobinPool(0), which the application protested because no actors were available.
No custom configuration file is used. Is there something in the logic of routers that I don't understand?
It's not completely clear what you're asking (your code doesn't refer to threads anywhere), but in Akka, a dispatcher schedules an actor's message processing to run on a thread whenever that actor has a message to process. The standard implementation uses a thread pool: in Akka 2.6 the default pool size equals the number of cores (counting each hyperthread as a core), while 2.5 defaults to a larger pool to guard against inadvertent blocking starving system components. With that implementation, an actor's message processing can happen on any thread in the pool.
So if your actors are logging which thread they're running on, for instance, you may see that the actor is running on multiple threads. This is generally desirable: the actor's one-message-at-a-time processing model still ensures safety, and not being pinned to a particular thread in turn means that with n threads in the pool, any combination of n actors can be processing at the same time.
There are alternative dispatcher implementations which will pin an actor to a thread: if actors A and B are both pinned to thread T, then B cannot process a message if A is processing a message. In some scenarios, this reduces context-switch overhead and improves throughput at some cost to latency.
In general, an actor shouldn't care which particular thread it's running on.
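As a sketch of the pinned alternative: Akka's documented PinnedDispatcher can be declared in application.conf (the dispatcher name here is a made-up example) and attached to the routees:

```hocon
my-pinned-dispatcher {
  executor = "thread-pool-executor"
  type = PinnedDispatcher
}
```

You would then create the pool with `roundRobinPool.props(props.withDispatcher("my-pinned-dispatcher"))`, giving each routee its own dedicated thread instead of a slot in the shared pool.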

JDK 8: ConcurrentHashMap.compute seems occasionally to be allowing multiple calls to remapping function

I'm working on a highly concurrent application that uses an object cache based on a ConcurrentHashMap. My understanding of ConcurrentHashMap is that calls to the "compute" family of methods guarantee atomicity with respect to the remapping function. However, I've found what appears to be anomalous behavior: occasionally, the remapping function is called more than once.
The following snippet in my code shows how this can happen and what I have to do to work around it:
private ConcurrentMap<Integer, Object> cachedObjects
        = new ConcurrentHashMap<>(100000);
private ReadWriteLock externalLock = new ReentrantReadWriteLock();
private Lock visibilityLock = externalLock.readLock();
...
public void update(...) {
    ...
    Reference<Integer> lockCount = new Reference<>(0);
    try {
        newStats = cachedObjects.compute(objectId, (key, currentStats) -> {
            ...
            visibilityLock.lock();
            lockCount.set(lockCount.get() + 1);
            return updateFunction.apply(objectId, currentStats);
        });
    } finally {
        int count = lockCount.get();
        if (count > 1) {
            logger.debug("NOTE! visibilityLock acquired {} times!", count);
        }
        while (count-- > 0) {
            // if locked, then unlock. The unlock is outside the compute to
            // ensure the lock is released only after the modification is
            // visible to an iterator created from the active objects hashmap.
            visibilityLock.unlock();
        }
    }
    ...
}
Once in a great while, visibilityLock.lock() will be called more than once within the try block. The code in the finally block logs this, and I do see the log message when it happens. My remapping function is mostly idempotent, so, with the exception of visibilityLock.lock(), it's harmless to have it called more than once. When it is, the finally block handles it by unlocking as many times as needed.
visibilityLock is a read lock obtained from a ReentrantReadWriteLock. The point of this other lock is to ensure that another data structure outside this one cannot see the changes being made by the updateFunction until after compute returns.
Before we get side-tracked on non-issues, I'm already aware that the default implementation of ConcurrentMap.compute indicates that the remapping function can be called multiple times. However, the override (and corresponding documentation) in ConcurrentHashMap provides a guarantee of atomicity and the implementation shows this to be true (afaict).
Has anyone else run into this issue? Is it a JDK bug, or am I just doing something wrong?
I'm using:
$ java -version
openjdk version "1.8.0_181"
OpenJDK Runtime Environment (build 1.8.0_181-b13)
OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)
Eyeballing the JDK code:
https://github.com/bpupadhyaya/openjdk-8/blob/master/jdk/src/share/classes/java/util/concurrent/ConcurrentHashMap.java
It's apparent that compute will not call the remappingFunction twice. That leaves three possibilities:
You don't have a ConcurrentHashMap, but something else. Check the concrete type at runtime and dump it to the log file.
You are calling compute twice. Is there any flow control in the function which may not be doing what you expect? You've removed most of the function so it's impossible for me to say.
You are calling compute once, but your remappingFunction is locking twice. Is there any flow control in the lambda which might not be doing what you think? Again, you have removed most of the function so there is nothing I can do to help.
To debug, check the lock count, at the point of locking, and if it is nonzero, dump the stack to the logfile.
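A self-contained sketch of that debugging idea, using an AtomicInteger counter instead of the question's Reference holder (the class name and threshold are just for illustration):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class ComputeDebug {
    public static void main(String[] args) {
        ConcurrentHashMap<Integer, String> map = new ConcurrentHashMap<>();
        AtomicInteger invocations = new AtomicInteger();
        map.compute(42, (key, current) -> {
            // Count each entry into the remapping function; if it ever runs
            // more than once per compute call, dump the stack to the log
            if (invocations.incrementAndGet() > 1) {
                new Exception("remapping function entered more than once").printStackTrace();
            }
            return "value";
        });
        System.out.println(invocations.get()); // expected: 1
    }
}
```

If the stack trace ever fires, it will show exactly which caller re-entered compute, distinguishing possibilities 2 and 3 above.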

Releasing memory back to OS when JVM is idle

We have a simple microservice setup, based on Spring Boot and Java 8 on Windows servers.
Many of the services have a low load, since they serve as integrations to all kinds of external partners. So they are idle a lot of the time.
The problem is that the JVM only releases memory back to the OS, when a garbage collection is triggered. So a service might start using 32mb, then serve a single request and allocate 2GB of memory. If there is no other activity on that service, it will not GC and other services on the server will suffer.
Triggering a GC externally or internally with a System.gc works just fine and I have figured out how to use -XX:MaxHeapFreeRatio and -XX:MinHeapFreeRatio with -XX:+UseG1GC to control when the heap should expand and release memory to the OS.
My question is: What is the best way to ensure that memory is released back to the OS when the JVM is idle?
One idea would be to have the service monitor itself and trigger a System.gc after a period of idleness, but that might be tricky and error-prone, so I'm hoping for better suggestions.
You can reproduce by running X instances of this program. About 10 made my Windows machine with 8GB give up.
import java.util.*;

public class Load {
    public static void main(String[] args) throws Exception {
        alloc();
        Scanner s = new Scanner(System.in);
        System.out.println("enter to gc");
        s.nextLine();
        System.gc();
        System.out.println("enter to exit");
        s.nextLine();
    }

    private static void alloc() {
        ArrayList<String[]> strings = new ArrayList<>();
        int max = 1000000;
        for (int i = 0; i < max; i++) {
            strings.add(new String[500]);
        }
    }
}
c:\> java -server -XX:+UseG1GC -Xms32m -Xmx2048m Load
Edit: This was marked as a duplicate two times, but it is not a duplicate of the linked questions. The first is a 2010 version of the same question, but it asks why the GC does not release memory back to the OS (which was not possible at that time). The other question is about basic GC settings, which I already wrote that I understand. What I want is a discussion of how to trigger the garbage collector when the system is idle. Running System.gc every five seconds is not acceptable, because that would have a high risk of colliding with valid requests and ruining response times.
If calling System.gc() fulfills your needs, I would recommend using the Spring scheduler to run a periodic task every x seconds.
This is quite easy to implement; the annotations
@EnableAsync
@EnableScheduling
@Scheduled(cron = "...")
are all you need.
See spring scheduling for details.
Edit
Calling System.gc() only suggests starting a garbage collection; it's still up to the JVM to decide whether and when to do it.
To find out if your system is idle or not, you could use the Spring metrics.
There are some subclasses of
org.springframework.boot.actuate.endpoint.PublicMetrics
like TomcatPublicMetrics or SystemPublicMetrics that give you information about the system.
You can get them injected using @Autowired and call metrics() to get single values. Based on that, you might be able to decide if your system is idle or not.
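In Spring, the check itself would live in a @Scheduled method; here is the same idea as a plain-JDK sketch so it stands alone (the thresholds and the request-tracking field are assumptions, and a real service would update lastRequestMillis from a request filter):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class IdleGcTrigger {
    // Hypothetical: a request filter would update this on every incoming request
    static final AtomicLong lastRequestMillis = new AtomicLong(System.currentTimeMillis());

    public static void start() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Check every 5 minutes; only suggest a GC after 10 idle minutes,
        // so the collection is unlikely to collide with active requests
        scheduler.scheduleWithFixedDelay(() -> {
            long idleFor = System.currentTimeMillis() - lastRequestMillis.get();
            if (idleFor > TimeUnit.MINUTES.toMillis(10)) {
                System.gc();
            }
        }, 5, 5, TimeUnit.MINUTES);
    }
}
```

The same body, annotated with @Scheduled(fixedDelay = 300000), would let Spring own the scheduling thread instead.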

Testing connection to HDFS

In order to test the connection to HDFS from a Java program, is it sufficient to rely on FileSystem.get(configuration), or should additional sanity checks be done (for example, file-based operations like list, copy, delete)?
FileSystem.get(Configuration) creates a DistributedFileSystem object, which in turn relies on a DFSClient to talk to the NameNode. Buried deep down in the source (1.0.2 is the version I'm looking through) is a call to create an RPC for the NameNode, which in turn creates a Proxy for the ClientProtocol interface.
When this proxy is created (org.apache.hadoop.ipc.RPC.getProxy(Class<? extends VersionedProtocol>, long, InetSocketAddress, UserGroupInformation, Configuration, SocketFactory, int)), a call is made to ensure the server and client both talk the same 'version', so this confirmation affirms that a NameNode is running at the configured address:
VersionedProtocol proxy =
        (VersionedProtocol) Proxy.newProxyInstance(
            protocol.getClassLoader(), new Class[] { protocol },
            new Invoker(protocol, addr, ticket, conf, factory, rpcTimeout));
long serverVersion = proxy.getProtocolVersion(protocol.getName(),
                                              clientVersion);
if (serverVersion == clientVersion) {
    return proxy;
} else {
    throw new VersionMismatch(protocol.getName(), clientVersion,
                              serverVersion);
}
Of course, whether the NameNode has sufficient datanodes running to perform some actions (such as create / open files) is not reported by this version match check.
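If you do want to confirm that datanodes are live as well, a follow-up sanity check is to perform real filesystem operations. A sketch against a running cluster (the probe path is an arbitrary assumption; this cannot run without HDFS on the classpath and a configured NameNode):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Only verifies the client can build a proxy to the NameNode
        FileSystem fs = FileSystem.get(conf);
        // Listing the root exercises an actual NameNode RPC
        fs.listStatus(new Path("/"));
        // A small create/delete round trip additionally requires live datanodes
        Path probe = new Path("/tmp/hdfs-connectivity-probe");
        fs.create(probe).close();
        fs.delete(probe, false);
        System.out.println("HDFS reachable");
    }
}
```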

Would a multithreaded Java application exploit a multi-core machine very well?

If I write a multi-threaded java application, will the JVM take care of utilizing all available cores? Do I have to do some work?
Unless you use a JVM with so-called "green" threads (which very few do these days), Java threads are run by OS threads, so multiple threads get run on different cores by default.
To follow up, I see 100% usage on both cores when I run this code on my dual core. If I bring the number of threads from two to one, one core goes to 100% and another about 4%.
package test;

public class ThreadTest {
    public void startCPUHungryThread() {
        Runnable runnable = new Runnable() {
            public void run() {
                // Busy-wait forever to keep one core fully occupied
                while (true) {
                }
            }
        };
        Thread thread = new Thread(runnable);
        thread.start();
    }

    public static void main(String[] args) {
        ThreadTest thread = new ThreadTest();
        for (int i = 0; i < 2; i++) {
            thread.startCPUHungryThread();
        }
    }
}
All modern JVMs will take advantage of as many cores as your hardware has. An easy way to illustrate this is to download and run the DaCapo benchmark on your machine. The lusearch benchmark uses 32 threads. If you run this on your desktop or server, you should see all of your CPUs hit 100% utilization during the test.
On the flip side, it is sometimes useful to bind (set affinity for) a Java process to only a subset of cores/sockets, though this is done via OS semantics. As previously answered, most runtimes do use all CPUs, and highly threaded apps can eat up more resources than you might expect.
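To see how many cores the JVM can schedule across, and to size a thread pool to match, a small sketch (the class name is just for illustration):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CoreCount {
    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("available processors: " + cores);

        // A fixed pool of one thread per core lets CPU-bound tasks occupy every core
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        for (int i = 0; i < cores; i++) {
            final int id = i;
            pool.submit(() ->
                System.out.println("task " + id + " on " + Thread.currentThread().getName()));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

Note that availableProcessors() reflects what the OS exposes to the process, so it also respects any affinity mask set via OS tools.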
