I am currently using JDK Flight Recorder with JDK 11 and ran into some trouble on our CI/CD platform. Unfortunately, there is not much documentation on the new Flight Recorder; most of what exists covers the older, commercial version from before it was open-sourced in JDK 11.
When I try to start tests directly from the IDE, everything works fine and I get my recording files.
When I try to do the same thing automatically on the CI/CD platform, it times out and fails in various inconsistent ways: trouble creating the file, the file not being written at all, etc.
The JVM commands I used are the following (I put extra spaces for better readability):
-XX:+FlightRecorder
-XX:StartFlightRecording= name="UiTestServer", settings="profile", dumponexit=true, filename=""+System.getenv("CI_PROJECT_DIR") + "flightRecording/javaFlightRecorder.jfr"
These are the same flags the IDE generates automatically when starting a flight recording via right-click on the specified test.
Does anybody know whether the Flight Recorder has problems with such systems, or with specific services that might run in parallel to it? I have heard of some profiling tools that cannot run on CI platforms.
If you need more detail, just ask, though I may not be able to share project-specific information.
Bit late as an answer, but JFR can definitely run in CI/CD environments. I have successfully attached JFR to our JMH microbenchmarks and published the results as artifacts in Atlassian Bamboo. Our Bamboo agents are running on AWS, so JFR itself should be good for most cloud environments.
JFR has been built to work in production systems, but if you want guarantees of low overhead (<1%), you should use the default settings, not profile.
'profile' is meant for a shorter period of time, e.g. 10 minutes, where the additional overhead may be acceptable in exchange for more insight.
This is what I would recommend, for JDK 11 and later:
$ java -XX:StartFlightRecording=filename=/path
There is no need to set dumponexit=true if a filename has been specified.
-XX:+FlightRecorder is only needed before JDK 8u40.
You can set a name if you like, but it's typically not needed. If you want to use jcmd to dump a recording, the name can be omitted, as shown below.
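For example, dumping from a running JVM with jcmd might look like this (the PID and output path are placeholders):

$ jcmd <pid> JFR.dump filename=/tmp/recording.jfr

With no name argument, JFR.dump writes the data from the active recordings, which is usually what you want.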
The title really says it all, but just in case, here's some context:
Each time you change your configuration in NixOS, you need to run nixos-rebuild to create a new boot image, which will be listed in GRUB when you start the computer. A new configuration might require a new kernel. If it does, and you build it, will your old configurations continue to work?
In Ubuntu it appears that one can indeed host multiple kernels on the same machine. And I read somewhere that the Linux kernel can be pretty small, around 60 MB. Those two facts lead me to expect NixOS will retain the old kernels, but I haven't found anything online that makes that explicit.
I am currently building a configuration that uses Musnix. If you ask for it, Musnix will build you a realtime kernel. I'm building such a configuration now, hoping I'll still be able to boot my computer afterward. I worry because GitHub user magnetophon, who is involved in Musnix's development, said the Musnix realtime kernel is broken.
This is one of the cool features of NixOS. When you run nixos-rebuild boot (or nixos-rebuild switch too for that matter), it will create new boot entries alongside the old ones. These entries have the right kernel and system configuration in them. So if your experimental kernel doesn't work, you can just reboot and start a previous version of your system, knowing that it will work, even if your kernel also came with userland changes.
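As a quick sketch, these are the two common invocations (both add a new GRUB entry alongside the old generations):

# Build the new configuration and make it the default for the next boot,
# leaving the currently running system untouched:
$ sudo nixos-rebuild boot

# Or build it and activate it immediately, also adding a boot entry:
$ sudo nixos-rebuild switch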
The nixos-rebuild command is documented here in the NixOS manual: https://nixos.org/nixos/manual/#sec-changing-config
Intel VTune Amplifier can profile a parallel application executed on a remote machine.
Intel Advisor doesn't have such an option. According to this document, you have to use the command-line version of Intel Advisor:
This makes it possible to automate many tasks as well as analyze an application running on remote hosts
However, the GUI version has many features not offered by the command-line version (like suggestions on how to solve vectorization/multi-threading inefficiencies, etc.).
I tried to run advixe-cl on the remote machine and then copy the project (and the produced results) locally. It works, but some features are lost. As a last resort I tried to ssh -X into the remote machine and use advixe-gui, but it seems that the main core of my Xeon Phi KNL is too weak to run such a graphical application properly.
What is the correct/best use of Intel Advisor in such a scenario?
The recommended way is the one you describe: "run advixe-cl on the remote machine and then copy locally the project".
But you mentioned that "some features were lost". What exactly did you lose?
The key deficiency of this command-line + GUI approach is that you may not initially see your source code in the "Source View" tabs. To overcome this limitation, adjust the Project Properties of your local project copy: set the "Source Search" (and sometimes the "Binaries/Symbol Search") directories to the locations of the original source code and, sometimes, the executable binary plus DWARF/PDB debug info files.
In case you used the "-no-auto-finalize" option on the command line (a more advanced scenario), you may also need to use the Re-Finalize feature (available only starting from the 2017 Update 2 release) or, for older versions, make sure that you provide the Binary/Symbol/Source Search paths after opening the local project copy but before the "Show My Result" upload data action.
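A minimal sketch of the basic command-line + GUI workflow (the project directory, binary name, and host are placeholders; Survey is just the most basic analysis type):

# On the remote machine: collect analysis data into a project directory
$ advixe-cl -collect survey -project-dir ./advi_proj -- ./my_app

# Locally: copy the project back and open it in the GUI
$ scp -r remote-host:advi_proj .
$ advixe-gui ./advi_proj

After opening it, set the Binary/Symbol/Source Search directories as described above.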
Spark-shell can be used to interact with data in distributed storage, so what is the essential difference between coding in spark-shell and packaging an independent application with sbt and submitting it to the cluster? (One difference I found: a job submitted via sbt can be seen in the cluster management interface, while the shell cannot.) After all, sbt is quite troublesome, and the shell is very convenient.
Thanks a lot!
Spark-shell gives you a bare console-like interface in which you can run your code as individual commands. This can be very useful if you're still experimenting with packages or debugging your code.
I found a difference: a job submitted via sbt can be seen in the cluster management interface, and the shell cannot
Actually, spark-shell also comes up in the job UI as "Spark-Shell" itself, and you can monitor the jobs you are running through that.
Building Spark applications using SBT gives your development process some organization and iterative compilation, which helps in day-to-day development and avoids a lot of manual work. If there is a constant set of things you always run, you can simply submit the same package again instead of going through the trouble of re-running everything as individual commands, as sketched below. SBT does take some time getting used to if you are new to the Java style of development, but it can help maintain applications in the long run.
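As a rough sketch of that workflow (the class name, master URL, and jar path are placeholders for your own project):

# Package the application, then submit it to the cluster;
# it will show up in the cluster management UI under its application name
$ sbt package
$ spark-submit --class com.example.Main --master spark://master:7077 \
      target/scala-2.12/my-app_2.12-0.1.jar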
On a Windows Server 2012 R2 (x64) TEST server we are running a Tomcat 8 installation, and the CPU usage is disconcerting in the regularity with which it hits peak usage.
The behavior is happening after an installation of our application but before anyone is accessing it. I have accessed a few pages and tested some features, but nothing that I know of could create this behavior.
There are 2 virtual processors on the server, and every ~20 seconds the CPU usage spikes (on the one processor that is running Tomcat) to 100% for roughly 10 seconds. See below:
The regularity of the pattern indicates to me that something is incorrect in either the installation or the settings of Tomcat 8.
I have installed the YourKit Java Profiler (on an SO recommendation), which I was hoping could shed some light on what is causing these spikes, but I haven't been able to see why the threads are starting -- at least in part due to my newness to YourKit. I did attach it to the Tomcat launch file, and it seems to be tracking the behavior.
The catalina logs are silent during the spiking occurrences (as are my application logs), but when I stopped Tomcat there were some messages about ThreadLocals that had been started but could not be removed, and then: "...Threads are going to be renewed over time to try and avoid a probable memory leak."
I left the server running over the weekend and the pattern has continued until today so I don't think my symptoms are going away. Whatever is starting up has now consumed all available RAM on the system just from starting up these threads (and/or YourKit) every 20 seconds.
What is a possible approach to isolate this aberrant Tomcat activity and hopefully stop or rectify it?
There are many graphs and tabs in YourKit so I hesitate to list everything that might be helpful. Thanks for helping me narrow down the problem with what YourKit (or other tools) could offer me.
Info from catalina log regarding start-up:
Apache Tomcat/8.0.23
Architecture: amd64
Java Home: C:\Program Files\Java\jre1.8.0_65
CATALINA_BASE: C:\Program Files\Apache Software Foundation\Tomcat 8.0
2015-12-08 Update
At Gergely's request: the application is a local installation of DSpace, a Java application with a PostgreSQL database backend. We are customizing an open-source version of it from here: http://www.dspace.org/introducing. I'm not exactly sure what else would be helpful, and I think the stack trace is more revealing as to what is (and isn't) running -- see below.
By turning on Stack Telemetry in YourKit, "CPU Estimation" became available by dragging the cursor across a period of profiler history. To me, it looks like all the CPU is doing is spinning idly. Are the Java files pictured below Tomcat routines? They don't strike me as DSpace-related (although I'm not an expert), nor does it look like any work is being done while the CPU is peaking.
Of note: the stack trace is identical during the quiet periods -- the only difference being CPU Time (ms) is in the hundreds rather than thousands of milliseconds. For a more direct comparison than what is below, the hump represents ~8,000 ms in Thread.run() and the quiet periods consume ~125 ms of cpu time (although covering approximately the same amount of time).
Lastly, when pages of the application are being requested, a subsequent branch of code appears in the Call Tree. If it happened during the time of a spike it may only take 400 ms of CPU time to load a whole page. The code branch that appears is ApplicationFilterChain.java as a whole separate branch alongside PooledExecutor$Worker.run() -- both underneath java.lang.Thread.run() in the hierarchy.
When trying to interpret the stack trace: Is EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run() responsible?
Processor spikes with no known, associated activity
2015-12-08 Update #2
YourKit comes pre-configured to hide certain Java class name patterns, which obscured drilling down on java.lang.Thread. Clearing the filters enabled the following screenshots, which show that the vast majority of processing time during a spike event goes to calling the following 3 methods:
java.io.WinNTFileSystem.canonicalize0
java.io.WinNTFileSystem.getBooleanAttributes (in File.exists())
StandardRoot.java
My apologies for not yet knowing enough about Tomcat or DSpace to know what is launching these tasks. (In case it matters, the line directly above the first line is java.lang.Thread.run(), and above that <All threads>.)
Thank you to those who have viewed and responded to this inquiry. As various individuals surmised, the problem was related to our settings and use of Tomcat -- most likely not a problem with Tomcat itself.
This is an attempt to answer the question without perfect knowledge of installing the DSpace application and Tomcat, but I think I know enough to be dangerous and potentially helpful to follow-up users.
When installing DSpace, there are some properties in Tomcat's configuration directories that determine whether changes to code files are reflected immediately, without a Tomcat restart. For us these settings were previously in the directory [tomcat]/conf/Catalina/localhost/, and each of the three files there contained a small XML context file like the following (e.g. oai.xml):
<?xml version='1.0'?>
<Context docBase="E:/dspace/webapps/oai"
         reloadable="true"
         cachingAllowed="false"/>
You can find documentation on these properties at the following link:
https://wiki.duraspace.org/display/DSDOC5x/Installing+DSpace
Within that documentation is a recommendation about the reloadable and cachingAllowed properties. Search for "Tomcat Context Settings in Production". Here is an excerpt (emphasis mine):
These settings are extremely useful to have when you are first getting started with DSpace, as they let you tweak the DSpace XMLUI (XSLTs or CSS) or JSPUI (JSPs) and see your changes get automatically reloaded by Tomcat (without having to restart Tomcat). However, it is worth noting that the Apache Tomcat documentation recommends Production sites leave the default values in place (reloadable="false" cachingAllowed="true"), as allowing Tomcat to automatically reload all changes may result in "significant runtime overhead".
It is entirely up to you whether to keep these Tomcat settings in place. We just recommend beginning with them, so that you can more easily customize your site without having to require a Tomcat restart. Smaller DSpace sites may not notice any performance issues with keeping these settings in place in Production. Larger DSpace sites may wish to ensure that Tomcat performance is more streamlined.
When I switched these boolean flags to reloadable="false" and cachingAllowed="true" the spiked CPU experience stopped immediately. I don't know if the warning about "Larger sites" applies to us or whether "streamlined performance" could refer to the negative activity I observed.
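For reference, the adjusted context file then looked like this (same docBase as above):

<?xml version='1.0'?>
<Context docBase="E:/dspace/webapps/oai"
         reloadable="false"
         cachingAllowed="true"/>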
I presume there may be other problems with our installation that allowed this particular manifestation; one ominous clue is that our production server seems to be operating fine with reloadable="true". Java, Tomcat, Windows, AND DSpace are ALL getting new versions at the same time, so it is fairly difficult to pinpoint why similar Tomcat <Context> settings produce such different results.
I am at least content for now to have new behavior and that the system has calmed down. I'll post more if I learn more but will be focusing next on other quandaries.
Update
FWIW, these attributes are settings that directly control Tomcat, and they have changed between versions. E.g., cachingAllowed was removed in version 8, which means it can be removed from the Context elements. Compare:
https://tomcat.apache.org/tomcat-8.0-doc/config/context.html#Attributes
https://tomcat.apache.org/tomcat-7.0-doc/config/context.html#Attributes
And for good measure, here is the help text for reloadable in the Tomcat 8 documentation:
Set to true if you want Catalina to monitor classes in /WEB-INF/classes/ and /WEB-INF/lib for changes, and automatically reload the web application if a change is detected. This feature is very useful during application development, but it requires significant runtime overhead and is not recommended for use on deployed production applications. That's why the default setting for this attribute is false. You can use the Manager web application, however, to trigger reloads of deployed applications on demand.
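As that quote notes, reloads can still be triggered on demand; with the Manager's text interface it might look like this (the credentials and context path are placeholders, and the account needs the manager-script role):

$ curl -u admin:secret "http://localhost:8080/manager/text/reload?path=/oai"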
So it would seem that the ultimate answer is that Tomcat 8 on Windows 2012 R2 with reloadable='true' polls WEB-INF/lib and WEB-INF/classes for changes. The volume of folders and files to examine may very well be the cause of these intense CPU spikes. For now I will be relying on reloadable='false', which definitely removes the symptom for us.
Not an explicit answer, but way too long for a comment
After reviewing the update on this question and reading a bit, I suspect that this recurring issue is caused by a CuratorTask. Reasons being:
The stacktrace you acquired clearly shows that a WorkerThread managed by the DSpace library (so Tomcat is not to be blamed) is using the processor at those times.
After reading a bit about DSpace itself, it looks like it has a feature that allows users to define curator tasks to be executed periodically.
On top of this, there is at least one task that, according to the documentation, is activated by default, so theoretically any number of tasks could be active by default.
Moreover, this conversation reveals at least one curation task that is activated every 10 seconds.
All these together point to the same direction. I would suggest using the UI of DSpace (probably in Admin mode) to look around and find the active curation tasks and verify if their scheduling corresponds to what you have observed.
Our Jenkins server (a Linux machine) slows down over time and becomes unresponsive. All jobs take unexpectedly long, even though they run on slaves, which are different machines from the server. One thing I have observed is an increase in the number of open files, which keeps growing as shown in the image below. Does anyone have a solution to keep this in check without restarting the server? Also, are there any configurations/tweaks that could improve the performance of the Jenkins server?
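For anyone wanting to check the same numbers on their own server, the count can be read like this (<jenkins-pid> is a placeholder for the Jenkins master process):

# Count the file descriptors currently held by Jenkins
$ lsof -p <jenkins-pid> | wc -l
# Compare against the per-process limit
$ grep 'open files' /proc/<jenkins-pid>/limits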
We have been using Jenkins for four years and have tried to keep it up to date (Jenkins + plug-ins).
Like you, we experienced some inconvenience, depending on new versions of Jenkins or plug-ins...
So we decided to stop this "continuous" upgrading.
Here are humble tips:
Avoid technical debt. Update Jenkins as much as you can, but use only "Long Term Support" versions (latest is 2.138.2)
Backup your entire jenkins_home before any upgrade!
Restart Jenkins every night
Add RAM to your server. Jenkins uses the file system a lot, and more RAM will improve caching
Define JVM min/max memory parameters with the same value to avoid dynamic reallocation, for example -Xms4G -Xmx4G (see the sketch after this list)
Add slaves and execute jobs only on slaves
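A sketch of where such JVM options can live (this path is the Debian/Ubuntu packaging default; adjust for your installation):

# /etc/default/jenkins -- read by the Jenkins init script
JAVA_ARGS="-Xms4G -Xmx4G"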
In addition to above, you can also try:
Discarding old builds
Distribute the builds on multiple slaves, if possible.