Do Docker's debug and experimental flags have a performance impact?

If I turn off the Docker daemon's debug and experimental flags, will that increase performance on macOS?

I'm afraid this won't be a clear yes-or-no answer, but generally, I don't believe you will see a noticeable change in Docker's performance with these flags on or off.
The debug flag in Docker mainly controls log output. It does not add extra codepaths to the daemon, so unless log writing in that environment is significantly slowed (e.g. by disk performance), the debug flag should have no noticeable impact.
The experimental flag could potentially have an impact, but what that impact is will change with each release, or even point release, depending on what is "behind" the experimental feature flag. Usually it gates a new feature (command, option, configuration) that, when enabled, should not affect the rest of the daemon's operations. That said, it isn't impossible for an experimental feature to change the codepaths of other parts of the daemon; I don't believe that is currently the case, but it is hard to predict, as future features may well impact performance.
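For reference, both settings live in the daemon configuration. A minimal sketch of a daemon.json with both turned off (on Docker Desktop for Mac this file is typically edited through the preferences UI; the exact location and UI vary by version):

    {
      "debug": false,
      "experimental": false
    }

You can confirm the active values afterwards with docker info, which reports both the debug and experimental state of the daemon.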

Related

JDK7 application is getting slow after some uptime

We have a large JDK7 application deployed on JBoss using several libraries like Hibernate, Spring and so on. After the initial startup of the server, the application runs as expected, but after some uptime it becomes very slow.
Using a profiler, we have seen that each time certain aspects of our application slow down, but not always the same aspects. While in one run it might be the Hibernate flush that slows to a crawl, in another run it might be some DI code from Spring.
What's going on there?
There is a bug in JDK7 regarding the CodeCache memory area which hit us very, very hard.
Explanation
Basically, Java starts up and uses just-in-time (JIT) compilation to compile just the required parts of the bytecode at runtime. This enables the JVM to de- and recompile certain code fragments during execution, which it does if it determines that the initial compilation of a fragment was suboptimal. Oracle introduced a feature named tiered compilation in JDK 7 which allows the VM to do just that.
Compiled code in the JVM is stored in the CodeCache memory area. Up to JDK6, the default was that this area would fill up, and once it was at 100% the JIT would stop compiling and an error would be printed to the console; however, the application would keep running as before: everything already compiled stayed compiled, and everything not yet compiled was executed in interpreted mode (which is roughly 100x slower).
JDK7 introduced an option named CodeCacheFlushing, enabled by default since JDK7u4. The idea is that once the CodeCache is full, the least-used parts of compiled code are flushed from memory to make room for other code fragments. That would make the JDK6 default behaviour (stopping compilation altogether) obsolete. It also allowed for a much smaller CodeCache area (in JDK7 the CodeCache is 48M by default, 96M if tiered compilation is enabled).
Here comes the bug. In JDK7, once the CodeCache gets full, the JIT is stopped. Next comes the flushing of the CodeCache area. That's it. The JIT should be re-enabled after flushing completes, but that doesn't happen. Also, no warning is printed to the console. Worse: before disabling the JIT, roughly half of the already-compiled code is thrown out.
In contrast to JDK6, where everything that was fast stays fast and only new code is interpreted, in JDK7 you actually lose already compiled and optimized code! All of a sudden, parts of your application that performed well stop doing so. Which parts slow down is left to chance, which makes tracking that bugger with a profiler nearly impossible: at times the Hibernate flush code slows down, at other times it's the Spring DI code or your own app code.
Are you affected?
You can use a profiler (JProfiler/YourKit) or JConsole (JVisualVM won't do) to monitor the memory consumption of the CodeCache memory area. Typically the committed amount of CodeCache stays very close to the used amount (say, committed is 23M, used is 22M). While your application runs, committed and used go up until committed reaches max. At that point, used drops sharply to 1/2 - 2/3 of max. After that, used no longer grows. That's where the bug hits you.
Why me and not all the others?
Chances are, you are using JBoss. Oracle quickly found out that things were not as they should be and disabled tiered compilation by default; yet Red Hat in its infinite wisdom decided it knew better and re-enabled it. Basically, our webapp runs fine on WebLogic, and only JBoss is affected, because without tiered compilation (not enabled in WebLogic) the growth of the CodeCache is so small that we never actually hit the 48M threshold even after weeks of operation.
What can I do?
First of all, determine whether this bug hits you. Second, make it harder for the bug to damage you: if you disable CodeCacheFlushing, at least hitting the bug won't make things worse than they were before. Stopping tiered compilation makes it less probable that the bug hits you, as does increasing the amount of CodeCache memory available.
You can always try switching to JDK8, which seems unaffected, and you could also implement monitoring in your software to warn you if the CodeCache is running full.
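For that last point, here is a minimal sketch of in-process monitoring via the standard MemoryPoolMXBean API. The pool name "Code Cache" matches HotSpot 7; the 90% threshold and the idea of polling from a timer are assumptions for illustration:

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryPoolMXBean;
    import java.lang.management.MemoryUsage;

    public class CodeCacheMonitor {
        // Call this periodically, e.g. from a ScheduledExecutorService.
        public static void checkCodeCache() {
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                // On HotSpot 7 the pool is named "Code Cache".
                if (pool.getName().contains("Code Cache")) {
                    MemoryUsage usage = pool.getUsage();
                    long used = usage.getUsed();
                    long max = usage.getMax();
                    // 90% is an arbitrary warning threshold, not a magic number.
                    if (max > 0 && used > max * 0.9) {
                        System.err.println("WARNING: CodeCache nearly full: "
                                + (used >> 20) + "M of " + (max >> 20) + "M used");
                    }
                }
            }
        }
    }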
TL;DR
In JDK 7, never enable tiered compilation (disabled by default, enabled in JBoss)
In JBoss 7, always set PRESERVE_JAVA_OPTS=true in standalone.conf
Always disable CodeCacheFlushing (-XX:-UseCodeCacheFlushing)
Always pack a sufficient amount of memory into the CodeCache (-XX:ReservedCodeCacheSize=xxM); an example command line combining these flags follows below
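To illustrate, a JDK7 launch line with the above applied (the 256M cache size and the jar name are placeholder values, not recommendations):

    java -XX:-TieredCompilation -XX:-UseCodeCacheFlushing -XX:ReservedCodeCacheSize=256M -jar myapp.jar

-XX:-TieredCompilation is redundant on a stock JDK7 (it is off by default) but guards against an app server's startup scripts switching it back on.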

Windows Performance Recorder: record a specific process

Using the Windows Performance Recorder, is it possible to generate an ETL file based on the tracing of a single process? Tracing all of the processes in the system results in ETL files measured in GBs for intervals as small as a couple of minutes.
ETW (kernel event) tracing is system-wide and captures all processes.
I don't think it is possible to record ETW traces that capture just one process (at least not with xperf or wpr). If your traces are too big, then the best tactic is to make sure that the rest of the system is as quiet as possible so that it doesn't contribute too much data.
If the rest of the system is already quiet then the traces are probably big because ETW traces tend to be big. You can use trace compression to make them smaller on disk; see UIforETW for how this works: https://randomascii.wordpress.com/2015/09/24/etw-central/.
If the rest of the system is not already quiet then yes, it probably is contributing to bloat in the traces. Note that it may also be affecting performance, so that data is not irrelevant.
And if you really do need single-process profiling, consider using a different profiler. The Visual Studio profiler does per-process profiling.
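For reference, a typical system-wide capture with wpr looks something like the sketch below (the built-in CPU profile and the output file name are example choices; run wpr -profiles to see what your version offers):

    wpr -start CPU -filemode
    rem ... reproduce the scenario you want to measure ...
    wpr -stop trace.etl

-filemode logs to a file rather than an in-memory ring buffer, which suits captures you intend to keep.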

Side Effects of running the JVM in debug mode

I'd like to release a Java application in debug mode to allow for easier debugging when random or hard-to-reproduce problems occur on the customer side.
However, I want a heads-up on the potential side effects of doing this. From the Java HotSpot documentation, it seems that there should be no performance penalty.
From the link:
Full Speed Debugging
The Java HotSpot VM now uses full-speed debugging. In previous versions of the VM, when debugging was enabled, the program executed using only the interpreter. Now, the full performance advantage of HotSpot technology is available to programs, even with compiled code. The improved performance allows long-running programs to be more easily debugged. It also allows testing to proceed at full speed. Once there is an exception, the debugger launches with full visibility to code sources.
Is this accurate, or are there hidden caveats? What about the memory footprint, and are there any other gotchas when using debug mode?
PS: I found this article from AMD which confirmed my initial suspicion that the original article from Oracle doesn't show the full story.
I can't speak for HotSpot, and won't officially for IBM, but I will say there are certainly legal kinds of optimization that aren't possible to undo fully should a decompilation be required in the middle of them, and thus aren't enabled when debug is being asked for in the production JVMs you are likely to use.
Imagine a situation where the optimizer discovers that a part of the program is provably not required and, by the various language rules (including JSR 133), is legal to remove; the JVM will want to get rid of it. The one wrinkle is debug: removing the code will look odd to the human stepping through it (variables not updating, possibly not stopping on lines when stepping), so the choice is to disable said optimizations in those cases. The same might also be true for opts like stack-allocated objects, etc., so while the JVM says it's "full speed", it's actually closer to "nearly full speed, with some of the funkier opts that can't quite be undone removed".
This question is old but came up while I was searching for any performance impact if you just leave -agentlib:jdwp... on but are not actively debugging.
Summary: Starting with debugging options but not connecting shouldn't impact the speed now (Java 7+).
Before Java 6 (ish), you used -Xdebug, and this had a definite impact: it shut off the JIT!
In Java 6 they changed it to -agentlib and made it better. There were some bugs, though, that did cause a performance penalty. Here is one of the bugs that was filed against OpenJDK; my guess is that there were similar problems with the Oracle/Sun version: https://bugs.openjdk.java.net/browse/JDK-6902182
Note however that the stated goal is that simply enabling debugging by opening the port should not cause any performance penalty.
It looks like, at least in OpenJDK, the bugs were cleaned up by Java 7. I didn't see anything about performance impacts after that.
If you research this further and find negative results, take note of the Java version the testing was done under; everything I saw was referring to versions before 7.
I'd love to hear if anyone has encountered performance problems in a recent VM from just leaving the port enabled.
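For reference, the usual modern form of the option discussed here looks like this (port 5005 is just a conventional choice, and myapp.jar is a placeholder):

    java -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005 -jar myapp.jar

Note that from Java 9 onward the agent binds to localhost by default; to accept remote connections you must write address=*:5005.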
If you plan to run the app with remote debugging enabled, it can also affect security. Remote debugging leaves a port open on your machine, and by connecting to it, I can do all sorts of fun things with your application.
The program definitely does a lot more than simply run when in debugging mode, so it is obvious that performance cannot be identical. However, if you read the statement carefully, it says that the new release can run fully optimized code even in debugging mode, which was not possible earlier. Thus the new JVM is much faster than the previous one, which could only run in interpreted mode with no optimization.

Testing perceived performance

I recently got a shiny new development workstation. The only disadvantage of this is that the desktop apps I'm developing now run very, very fast, and so I fear that parts of the code that would be annoyingly slow on end users' machines will go unnoticed during my testing.
Is there a good way to slow down an application for testing? I've tried searching around, but all of the results I've been able to find seem pretty fiddly to set up (e.g., manually setting up a high-priority CPU-bound task on the same CPU core as the target app, or running a background process that rapidly interrupts and resumes the target app), and I don't know if the end result is actually a good representation of running on a slower computer (with its slower CPU, slower RAM, slower disk I/O...).
I don't think that this is a job for a profiler; I'm interested in the user's perception of end-to-end performance rather than in where the time goes for particular operations.
Set up a virtual machine, give it as little RAM as needed, and you can also have it use 1, 2 or more CPUs. I like VirtualBox myself. Install your app and test with different RAM configs.
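As a sketch of that approach with VirtualBox's command-line tool (the VM name "TestVM" is a placeholder, and the VM must be powered off when you change these settings):

    VBoxManage modifyvm "TestVM" --memory 512 --cpus 1 --cpuexecutioncap 50

--memory is in MB, --cpus sets the core count, and --cpuexecutioncap limits how much of each host core the guest may use, here 50%.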
Personally, I'd get an old used crappy computer that is typical of what the users have and test on that. It should be cheap and you will see pretty fast how bad things are.
I think the only way to deal with this is through proper end-user testing, i.e. get yourself a "typical" system for testing and use that to identify any perceptible performance bottlenecks.
You can try out either Virtual PC or VMware Player/Workstation, load an OS onto it, and then throttle back the resources. I know that with any of those tools you can reduce the memory to whatever you'd like. You can also specify the number of cores you want to use. You might even be able to adjust the clock speed in VMware Workstation... I'm not sure.
I upvoted SQLMenace's answer, but I also think that profiling needs to be mentioned: no matter how quickly the code is executing, you'll still see what's taking the most time. If you find yourself with some free time, I think profiling and investigating the results is a good way to spend it.

How can I deliberately slow Windows?

How do I reversibly slow down a PC running XP?
I want to achieve this without using visible CPU-cycles, so I'm guessing some hardware settings might do.
I don't want my app to run slow, I want the whole OS to be slow. I know some network lookups, especially out of a trusted environment (think Active Directory), slow a PC way down. This is the effect I want.
Disclaimer: this is not for a bad/evil/illegal cause!
We use a 'crippled' server we call doofus for load testing. It is an old P3/500 box with limited RAM.
Another option is setting up a VM with very limited resources.
Use powercfg.exe to force the computer onto a power plan you've created that locks the CPU at a lower frequency to conserve power. Which states are available depends on your platform (most desktops only have a couple).
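For example, on XP the tool works with named schemes (the names vary by machine; "Max Battery" is a stock scheme on many XP installs and is only an illustration here):

    powercfg /list
    powercfg /setactive "Max Battery"

On later Windows versions the same tool addresses plans by GUID instead of by name.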
If you think your hardware setup can handle it, some motherboards let you manually specify a clockspeed multiplier or other speed settings in the BIOS. Often there'll be an option for a slower speed or a field where you can manually enter a lower multiplier.
If not, you might consider setting up a virtual machine and making sure it isn't using hardware-assisted virtualization: a VM that has to translate guest instructions in software runs slower because of the extra work done in the virtualization layer.
The open source Bochs emulator is pretty easy to slow down by editing its config file. Windows XP will run in it. It is not as powerful as VMware, but there are many more configuration options.
Look at the documentation for the config file, bochsrc, and particularly the IPS entry (instructions per second).
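As a sketch, the relevant bochsrc line in recent Bochs versions looks like this (exact syntax varies between versions, and 1000000 is just an illustratively low value):

    # fewer emulated instructions per second => a slower guest
    cpu: count=1, ips=1000000

Older Bochs versions accept a standalone ips: directive instead.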
Remove the thermal paste and put some dust on the CPU :-) Also, remove some RAM.
You may want to take a look at a full-system simulator such as Simics. Simics allows you to deterministically simulate an entire system (including networks, if you want). Not only can you tweak the CPU frequency, you can study the system in detail to see how it behaves.
Unfortunately, Simics has quite a price tag.
If you want to see really dramatic effects very easily, set the /MAXMEM switch in boot.ini (or use msconfig). This will limit the amount of memory used by the system; dropping it to 256MB or lower will make things very, very slow.
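For illustration, a boot.ini entry with the switch applied might read as follows (the ARC path shown is the common single-disk default; keep whatever path your existing entry already uses, and note the value is in megabytes):

    [operating systems]
    multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows XP (256MB)" /fastdetect /MAXMEM=256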
You have lots of options. Things I can think of:
Change your disks to good old-fashioned IDE. None of that high-speed DMA stuff, just good old-fashioned PIO.
Remove RAM (or disable some of it in the BIOS)
Switch to generic video drivers (I mean "Generic SVGA" type, that are un-accelerated)
Disable core(s) in the BIOS
Slow the CPU down in the BIOS (if possible)
We keep an old laptop around for this reason. Helped me to find a subtle timing issue in some splash screen code that was absolutely unreproducible on decent quad-core dev boxes.
Install Norton 360. It makes the mouse cursor lag during updates and constantly nags for restarts.
Disable the L2 cache in the BIOS
Two Windows applications: Mo'Slo and Cpukiller.
I remember hearing of one that grabs large chunks of RAM to reduce your available RAM, but I forget what it is called.
