It seems to me that SBT is taking an excessive amount of CPU time just watching for file changes. I'm aware of this post, but its author was conflating IDE CPU time with SBT CPU time; I don't have an IDE running.
I'm developing a Play project in Scala with about 370 Scala files divided into 5 modules.
Running sbt with ~run consumes about 70-90% CPU on my MacBook Pro (mid 2012). I don't have an IDE running, I'm not editing any files, no browser accessing the server... Just idling with ~run (and watching the file system) takes 70-90% of a CPU according to Activity Monitor.
After watching it idle for a while I fired up VisualVM; it shows itself and one other JVM process:
Details on the xsbt process (per VisualVM):
PID: 56661
Host: localhost
Main class: xsbt.boot.Boot
Arguments: -Dhttp.port=9001 -Dhttps.port=9443 -Djsse.enableSNIExtension=false -Dhazelcast.config=conf/hazelcast-dev.xml -Dlogger.file=conf/logback-dev.xml -Dconfig.file=conf/passwords/local-dev.conf
JVM: Java HotSpot(TM) 64-Bit Server VM (25.121-b13, mixed mode)
Java: version 1.8.0_121, vendor Oracle Corporation
Java Home: /Library/Java/JavaVirtualMachines/jdk1.8.0_121.jdk/Contents/Home/jre
JVM Flags: <none>
Heap dump on OOME: disabled
JVM arguments:
-Xms1024m
-Xmx1024m
-XX:ReservedCodeCacheSize=128m
-XX:MaxMetaspaceSize=256m
-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005
Here's a bit more detail from VisualVM on where the time goes:
Is 80% CPU usage -- and fans running most of the time -- reasonable while sbt is idling? Seems really high. Any strategies for bringing it down?
PS: When sbt is just at the command prompt (not watching for file changes) Activity Monitor shows about 5% CPU usage.
PPS: I normally have concurrentRestrictions in Global += Tags.limit(Tags.Test, 4) in my build file to prevent too many concurrent database connections during tests. I'm not sure if this actually does what I think it does. The stats, above, are with that line commented out. When I restore it Activity Monitor still reports ~80% CPU usage but in the CPU samples idleAwaitWork doesn't show up at all and ConcurrentRestrictions$$anon$4.take takes the top spot with SourceModificationWatch$.watch right behind it.
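As a rough mental model (an analogy, not sbt's implementation), `Tags.limit(Tags.Test, 4)` asks sbt's task engine to keep at most four tasks tagged `Test` running at once, much like a counting semaphore with four permits: tasks over the limit wait until one finishes. The hypothetical `TaggedLimitDemo` below shows that effect with `java.util.concurrent.Semaphore`. How many tests actually run concurrently also depends on other settings (for example `parallelExecution`), so treat this only as a sketch of the limiting idea.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration: a semaphore with N permits caps how many "test" jobs
// run at once, similar in spirit to Tags.limit(Tags.Test, N). Not sbt code.
class TaggedLimitDemo {

    /** Runs `jobs` simulated test jobs on a large pool, at most `limit` at a time.
     *  Returns the highest concurrency actually observed. */
    public static int runWithLimit(int jobs, int limit) throws InterruptedException {
        Semaphore permits = new Semaphore(limit);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger maxSeen = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(jobs); // more threads than permits
        for (int i = 0; i < jobs; i++) {
            pool.execute(() -> {
                try {
                    permits.acquire();               // blocks once `limit` jobs are running
                    try {
                        int now = running.incrementAndGet();
                        maxSeen.accumulateAndGet(now, Math::max);
                        Thread.sleep(20);            // stand-in for a test holding a DB connection
                    } finally {
                        running.decrementAndGet();
                        permits.release();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return maxSeen.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("max concurrent jobs: " + runWithLimit(16, 4));
    }
}
```

Even with 16 threads available, the observed concurrency never exceeds the number of permits.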
Assuming you're using sbt 1, this might be due to https://github.com/sbt/sbt/issues/3860, as pointed out by Waldemar Wosiński in a comment.
Since we've adopted Close Watch, the problem should now be fixed. Please try sbt 1.1.6 or later and see if the problem still persists.
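For contrast with polling, here is a minimal sketch of event-driven watching with the JDK's `java.nio.file.WatchService`: the thread blocks until a change is reported instead of re-scanning files on a timer. It only illustrates the idea and is not how sbt or Close Watch is implemented; the `DirWatchDemo` class is hypothetical, and it watches a single directory non-recursively. Note that on macOS, JDK 8's `WatchService` is itself implemented by polling, so this stock API alone would not avoid the scanning cost on the asker's machine.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Minimal event-driven watcher for ONE directory (not recursive). Hypothetical demo.
class DirWatchDemo {

    /** Registers a watch on `dir`, runs `trigger`, then waits up to `timeoutSec`
     *  for the first batch of events; returns the names of the changed entries. */
    public static List<String> awaitChanges(Path dir, Runnable trigger, long timeoutSec)
            throws IOException, InterruptedException {
        List<String> changed = new ArrayList<>();
        try (WatchService ws = dir.getFileSystem().newWatchService()) {
            dir.register(ws, StandardWatchEventKinds.ENTRY_CREATE,
                             StandardWatchEventKinds.ENTRY_MODIFY,
                             StandardWatchEventKinds.ENTRY_DELETE);
            trigger.run();
            WatchKey key = ws.poll(timeoutSec, TimeUnit.SECONDS); // blocks; no busy loop
            if (key != null) {
                for (WatchEvent<?> ev : key.pollEvents()) {
                    if (ev.kind() == StandardWatchEventKinds.OVERFLOW) continue; // events were lost
                    changed.add(ev.context().toString());  // path relative to `dir`
                }
                key.reset();
            }
        }
        return changed;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("watch-demo");
        List<String> names = awaitChanges(dir, () -> {
            try { Files.createFile(dir.resolve("Foo.scala")); }
            catch (IOException e) { throw new UncheckedIOException(e); }
        }, 15);
        System.out.println("changed: " + names);
    }
}
```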
Recently I started facing an issue on a few servers where the CPU consumes more resources than the usual trend. I am trying to find the root cause, so I took a dump of the w3wp process from Task Manager (right-click the process and take the dump).
The .dmp file is 14 GB, and when I try to analyze it in WinDbg the tool does not work and I get this message:
I also took a few minidumps; some of them open fine while a few do not, so it's not a 32-bit vs. 64-bit mix-up (the collected dump is 64-bit).
I am trying to find out what is causing this. Is it the file size, or am I not taking the dump properly?
I checked the link, but it wasn't helpful.
WinDbg is not the right tool for this job. Dumps are only snapshots, so you have no idea what happened before them. Use ETW instead, specifically CPU Sampling, which aggregates all calls and shows the CPU usage in detail.
Install the Windows Performance Toolkit, which is part of the Windows 10 SDK (V1607 works on Win8/8.1 (Server 2012/R2) and Win10; use the V1511 SDK if you are on Windows 7/Server 2008 R2). Run WPRUi.exe, select CPU Usage and press Start.
Capture 1-2 minutes of the high CPU usage, then click Save. Open the generated ETL with WPA.exe (Performance Analyzer) and drag and drop the CPU Usage (Sampled) graph to the analysis pane.
Then load the debug symbols. Now select your process in the graph, zoom in and expand the stack; here you see the weight of the CPU usage of all calls.
In this sample, most of Internet Explorer's CPU usage comes from HTML-related code.
For .NET applications, WPA shows .NET-related groupings such as GC or JIT:
Expand the stack of the w3wp process to see what it is doing. The function names should give you a clue about what is happening.
On a Windows Server 2012 R2 (x64) TEST server we are running a Tomcat 8 installation, and its CPU usage hits its peak with disconcerting regularity.
The behavior occurs after installing our application, before anyone is accessing it. I have accessed a few pages and tested some features, but nothing that, as far as I know, could cause this behavior.
There are 2 virtual processors on the server and every ~20 seconds, the CPU usage will spike (on the one processor that is running Tomcat) to 100% for 10 seconds (give or take). See below:
The regularity of the pattern indicates to me that something is incorrect in either the installation or the settings of Tomcat 8.
I have installed the YourKit Java Profiler (on a recommendation from SO), hoping it could shed some light on what is causing these spikes, but I haven't been able to see why the threads are starting, at least partly because I'm new to YourKit. I did attach it to the Tomcat launch file, and it seems to be tracking the behavior.
The catalina logs are silent during the spikes (as are my application logs), but when I stopped Tomcat there were some messages about ThreadLocals that were created but could not be removed, followed by: "...Threads are going to be renewed over time to try and avoid a probable memory leak."
I left the server running over the weekend and the pattern has continued until today so I don't think my symptoms are going away. Whatever is starting up has now consumed all available RAM on the system just from starting up these threads (and/or YourKit) every 20 seconds.
What is a possible approach to isolate this aberrant Tomcat activity and hopefully stop or rectify it?
There are many graphs and tabs in YourKit, so I hesitate to list everything that might be helpful. Thanks for helping me narrow down the problem with whatever YourKit (or other tools) can offer.
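One way to see which thread is burning CPU during a spike, without relying on a profiler's filtered call tree, is to sample per-thread CPU time through the standard `java.lang.management.ThreadMXBean`. The sketch below (a hypothetical `HotThreads` class) reads every thread's CPU time, waits, reads again, and ranks threads by the difference. It runs inside the JVM it inspects, so against Tomcat it would have to be deployed there or adapted to a remote JMX connection; profilers such as YourKit present similar per-thread data.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: sample per-thread CPU time twice and rank threads by
// how much CPU they used in between (a rough version of a profiler's thread view).
class HotThreads {

    /** Returns "threadName #id" -> CPU nanoseconds used during intervalMs, highest first. */
    public static Map<String, Long> sample(long intervalMs) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, Long> deltas = new HashMap<>();
        if (!mx.isThreadCpuTimeSupported()) return deltas;   // nothing to measure on this JVM
        if (!mx.isThreadCpuTimeEnabled()) mx.setThreadCpuTimeEnabled(true);

        long[] ids = mx.getAllThreadIds();
        long[] before = new long[ids.length];
        for (int i = 0; i < ids.length; i++) {
            before[i] = mx.getThreadCpuTime(ids[i]);          // -1 if the thread is not alive
        }
        Thread.sleep(intervalMs);

        for (int i = 0; i < ids.length; i++) {
            long after = mx.getThreadCpuTime(ids[i]);
            ThreadInfo info = mx.getThreadInfo(ids[i]);       // null if the thread has died
            if (before[i] < 0 || after < 0 || info == null) continue;
            deltas.put(info.getThreadName() + " #" + ids[i], after - before[i]);
        }

        // Copy into a LinkedHashMap ordered by CPU used, descending.
        Map<String, Long> ranked = new LinkedHashMap<>();
        deltas.entrySet().stream()
              .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
              .forEach(e -> ranked.put(e.getKey(), e.getValue()));
        return ranked;
    }

    public static void main(String[] args) throws InterruptedException {
        // Print the five busiest threads over a 2-second window.
        sample(2000).entrySet().stream().limit(5)
            .forEach(e -> System.out.printf("%8.1f ms  %s%n", e.getValue() / 1e6, e.getKey()));
    }
}
```

Taking one sample during a spike and one during a quiet period, and comparing the thread names at the top, would show which pool (Tomcat's or the application's) owns the busy thread.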
Info from catalina log regarding start-up:
Apache Tomcat/8.0.23
Architecture: amd64
Java Home: C:\Program Files\Java\jre1.8.0_65
CATALINA_BASE: C:\Program Files\Apache Software Foundation\Tomcat 8.0
2015-12-08 Update
At Gergely's request: the application is a local installation of DSpace, a Java application with a PostgreSQL database backend. We are customizing an open-source version of it from here: http://www.dspace.org/introducing. I'm not sure what else would be helpful, and I think the stack trace is more revealing as to what is (and isn't) running; see below.
By turning on Stack Telemetry in YourKit, "CPU Estimation" was made available by dragging the cursor across a period of profiler history. To me, it looks like all the CPU is doing is spinning idly. Are the Java files pictured below Tomcat routines? They don't strike me as DSpace related (although I'm not an expert) nor does it look like any work is being done while the CPU is peaking.
Of note: the stack trace is identical during the quiet periods -- the only difference being CPU Time (ms) is in the hundreds rather than thousands of milliseconds. For a more direct comparison than what is below, the hump represents ~8,000 ms in Thread.run() and the quiet periods consume ~125 ms of cpu time (although covering approximately the same amount of time).
Lastly, when pages of the application are requested, an additional branch of code appears in the Call Tree. If a request happens during a spike, it may take only 400 ms of CPU time to load a whole page. The branch that appears is ApplicationFilterChain.java, as a separate branch alongside PooledExecutor$Worker.run(); both sit underneath java.lang.Thread.run() in the hierarchy.
When trying to interpret the stack trace: Is EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run() responsible?
Processor spikes with no known, associated activity
2015-12-08 Update #2
YourKit comes pre-configured to hide certain java class name patterns which obscured drilling down on java.lang.Thread. Clearing the filters enabled the following screenshots showing that the vast majority of processing time during a spike event is through calling the following 3 methods:
java.io.WinNTFileSystem.canonicalize0
java.io.WinNTFileSystem.getBooleanAttributes (in File.exists())
StandardRoot.java
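For orientation, these frames are what ordinary `java.io.File` calls look like on Windows in JDK 8: `File.getCanonicalPath()` can end in the native `WinNTFileSystem.canonicalize0`, and `File.exists()` calls `getBooleanAttributes`. Each is a filesystem call, so code that repeats them for every resource of a large webapp on each pass adds up. The hypothetical `StatCostDemo` below (not Tomcat code) makes both calls for every entry under a directory to give a rough feel for the cost of one pass; timings vary widely with the OS, the disk and filesystem caching.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical demo (not Tomcat code): one "pass" that canonicalizes and stats every
// entry under a directory, i.e. the kind of calls listed in the profiler frames.
class StatCostDemo {

    /** Visits root and everything below it; returns the number of existing entries checked.
     *  Note: follows directory symlinks and does not guard against cycles. */
    public static int pass(File root) throws IOException {
        root.getCanonicalPath();            // native canonicalize0 on Windows (may hit a cache)
        int count = root.exists() ? 1 : 0;  // getBooleanAttributes under the hood
        File[] children = root.listFiles(); // null for plain files
        if (children != null) {
            for (File child : children) {
                count += pass(child);
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        File root = new File(args.length > 0 ? args[0] : ".");
        long start = System.nanoTime();
        int n = pass(root);
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println(n + " entries canonicalized and checked in " + ms + " ms");
    }
}
```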
My apologies for not yet knowing enough about Tomcat or DSpace to tell who is launching these tasks. (In case it matters, the line directly above the first line is java.lang.Thread.run(), and above that <All threads>.)
Thank you to those who have viewed and responded to this inquiry. As several people surmised, the problem was related to our settings and use of Tomcat, not (most likely) a problem with Tomcat itself.
This is an attempt to answer the question without perfect knowledge of installing DSpace and Tomcat, but I think I know enough to be dangerous and potentially helpful to future readers.
When installing DSpace, there are some installation properties in Tomcat's configuration directories that determine whether changes to code files are reflected immediately, without a Tomcat restart. For us, these settings were in the directory [tomcat]/conf/Catalina/localhost/, and each of the three files there was a small XML file like this one (oai.xml):
<?xml version='1.0'?>
<Context docBase="E:/dspace/webapps/oai"
reloadable="false"
cachingAllowed="true"/>
You can find documentation on these properties at the following link:
https://wiki.duraspace.org/display/DSDOC5x/Installing+DSpace
Within that documentation is a recommendation about the reloadable and cachingAllowed properties. Search for "Tomcat Context Settings in Production". Here is an excerpt (emphasis mine):
These settings are extremely useful to have when you are first getting started with DSpace, as they let you tweak the DSpace XMLUI (XSLTs or CSS) or JSPUI (JSPs) and see your changes get automatically reloaded by Tomcat (without having to restart Tomcat). However, it is worth noting that the Apache Tomcat documentation recommends Production sites leave the default values in place (reloadable="false" cachingAllowed="true"), as allowing Tomcat to automatically reload all changes may result in "significant runtime overhead".
It is entirely up to you whether to keep these Tomcat settings in place. We just recommend beginning with them, so that you can more easily customize your site without having to require a Tomcat restart. Smaller DSpace sites may not notice any performance issues with keeping these settings in place in Production. Larger DSpace sites may wish to ensure that Tomcat performance is more streamlined.
When I switched these boolean flags to reloadable="false" and cachingAllowed="true" the spiked CPU experience stopped immediately. I don't know if the warning about "Larger sites" applies to us or whether "streamlined performance" could refer to the negative activity I observed.
I presume there may be other problems with our installation that allowed this particular manifestation; one ominous clue is that our production server seems to be operating with these flags in the reloadable="true" configuration. Java, Tomcat, Windows, AND DSpace are ALL getting new versions at the same time so it is fairly difficult to pinpoint why similar Tomcat <context> settings produce such different results.
I am at least content for now to have new behavior and that the system has calmed down. I'll post more if I learn more but will be focusing next on other quandaries.
Update
FWIW, these attributes directly control Tomcat, and they have changed between versions. For example, cachingAllowed was removed in version 8, which means it can be removed from the Context elements. Compare:
https://tomcat.apache.org/tomcat-8.0-doc/config/context.html#Attributes
https://tomcat.apache.org/tomcat-7.0-doc/config/context.html#Attributes
And for good measure, here is the help text for reloadable in the Tomcat 8 documentation:
Set to true if you want Catalina to monitor classes in /WEB-INF/classes/ and /WEB-INF/lib for changes, and automatically reload the web application if a change is detected. This feature is very useful during application development, but it requires significant runtime overhead and is not recommended for use on deployed production applications. That's why the default setting for this attribute is false. You can use the Manager web application, however, to trigger reloads of deployed applications on demand.
So it would seem that the ultimate answer is that Tomcat 8 on Windows 2012-R2 with the flag reloadable='true' polls for changes to WEB-INF/lib and WEB-INF/classes. The volume of the folders and files to peruse may very well be the cause of these intense, spiked CPU events. For now I will be relying on reloadable='false' which definitely removes the symptom for us.
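To make that cost concrete, here is a minimal sketch of what polling for changes involves. It is an illustration, not Tomcat's implementation; `ReloadPoller` and the 10-second interval are assumptions. On every tick the poller has to look at every file under `WEB-INF/classes` and `WEB-INF/lib` and compare modification times, so the work grows with the number of files times the polling rate, even when nothing has changed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.stream.Stream;

// Hypothetical polling change detector (an illustration, NOT Tomcat's implementation):
// record the last-modified time of every file under the given roots, then compare snapshots.
class ReloadPoller {

    /** Walks each existing root and records every regular file's last-modified time. */
    public static Map<Path, Long> snapshot(Path... roots) throws IOException {
        Map<Path, Long> times = new HashMap<>();
        for (Path root : roots) {
            if (!Files.isDirectory(root)) continue;
            try (Stream<Path> walk = Files.walk(root)) {
                Iterator<Path> it = walk.filter(Files::isRegularFile).iterator();
                while (it.hasNext()) {
                    Path p = it.next();
                    // One metadata lookup per file on every tick, whether or not anything changed.
                    times.put(p, Files.getLastModifiedTime(p).toMillis());
                }
            }
        }
        return times;
    }

    /** True if any file was added, removed or modified between two snapshots. */
    public static boolean changed(Map<Path, Long> before, Map<Path, Long> after) {
        return !before.equals(after);
    }

    public static void main(String[] args) throws Exception {
        Path webInf = Paths.get(args.length > 0 ? args[0] : "WEB-INF");
        Path classes = webInf.resolve("classes");
        Path lib = webInf.resolve("lib");
        Map<Path, Long> last = snapshot(classes, lib);
        while (true) {                      // runs until killed, like a background checker
            Thread.sleep(10_000);           // assumed interval for the sketch
            Map<Path, Long> now = snapshot(classes, lib);
            if (changed(last, now)) System.out.println("change detected: a reloadable context would reload");
            last = now;
        }
    }
}
```

With thousands of class files and jars, each tick means thousands of filesystem metadata lookups, which is consistent with (though not proof of) the filesystem-heavy frames seen in the profiler.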
Not an explicit answer, but way too long for a comment
After reviewing the update to this question and reading a bit, I suspect that this recurring issue is caused by a CuratorTask, for these reasons:
The stacktrace you acquired clearly shows that a WorkerThread managed by the DSpace library (so Tomcat is not to be blamed) is using the processor at those times.
After reading a bit about DSpace itself, it looks like it has a feature that lets users define curator tasks that are executed periodically.
On top of this, there is at least one task that, according to the documentation, is activated by default, so in theory any number of tasks could be activated by default.
Moreover, this conversation reveals at least 1 curation task that is activated every 10 seconds.
All these together point to the same direction. I would suggest using the UI of DSpace (probably in Admin mode) to look around and find the active curation tasks and verify if their scheduling corresponds to what you have observed.
We are running TeamCity with 20-ish .NET build configurations, all of which use MSBuild 4.0. We recently moved one of our build agents from a machine running Windows Server 2008 to a new physical server running Windows Server 2012, and since this migration all builds take almost twice as long as they did on the old server! The new server is more powerful than the old one in terms of CPU, RAM and disk. We have run benchmarks on both servers, and all of them confirm that the new server should be more capable than the old one.
According to the build logs, every stage of the build is slower, so it's not just one build step.
First thing we did was to check the CPU utilization during builds, and it is suspiciously low! Only 3-6% CPU usage on some of the cores. RAM usage was also very low. Could it be some config for the build agent that is slowing down the builds, and that we have overlooked?
Note: The new server is running as a virtual machine. At first we thought that was the reason, but wouldn't that have been reflected in the benchmarks? This is the only virtual machine on this physical server, and almost all hardware resources are dedicated to it. It would be interesting to hear whether any of you have had a similar bad experience running a build server on a virtual machine. We have also tried booting "natively" from the VHD image, with no difference in build times.
I know this could be a very tricky one to debug for "external" people, but I was hoping that someone could maybe give some good suggestion on where to look for the issue, as we are kind of stuck right now.
Edit: We tried activating the Performance Monitor tool in TeamCity, and it shows CPU, RAM and disk usage all running comfortably below 10% (disk access peaked at 35% a couple of times during the build).
In a nutshell: my JBoss instance runs fine, but after some days its performance slowly degrades.
Detailed:
I've got a setup with JBoss 5.1.0-GA and Java 1.6.0_18-b07 (x64) running on a 64 bits RHEL 4 box. The hardware is a virtual machine with 8 core Xeon X5550 / 20G ram.
The product deployed in JBoss contains a web service on which an endurance test is performed.
No database is involved in the process.
The tests are performed using soapui with 4 threads and the tests are configured to create 20% cpu usage.
Let's say that at first the average response time is 300 ms. After 2 days, it is 600 ms, which I don't understand.
Of course I did some checks:
There are no memory leaks (confirmed with jprofiler)
Heap mem is always around 25-50%, perm space usage is 50%
GC is almost never busy
All threads are idle after inspecting a thread dump
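Checks like these can also be logged from inside the running JVM over several days with the standard management beans, which makes it easy to line them up with the response-time trend. A minimal sketch, assuming a hypothetical `GcReport` helper; `getCollectionTime()` may return -1 for a collector that doesn't report it, and the code skips such values.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: heap and GC figures from the standard management beans.
class GcReport {

    /** Fraction of the maximum heap currently used, or -1 if no maximum is defined. */
    public static double heapFraction() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getMax() > 0 ? (double) heap.getUsed() / heap.getMax() : -1;
    }

    /** Total milliseconds spent in GC since JVM start, summed over all collectors. */
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();  // -1 if this collector doesn't report it
            if (t > 0) total += t;
        }
        return total;
    }

    /** GC time as a fraction of JVM uptime (0 if uptime is not yet measurable). */
    public static double gcShareOfUptime() {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptime > 0 ? (double) totalGcMillis() / uptime : 0;
    }

    public static void main(String[] args) {
        System.out.printf("heap used: %.0f%% of max%n", heapFraction() * 100);
        System.out.printf("GC time: %d ms (%.2f%% of uptime)%n", totalGcMillis(), gcShareOfUptime() * 100);
    }
}
```

A GC share that stays near zero while response times double would support the "GC is almost never busy" conclusion with numbers rather than impressions.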
While doing some further investigation, I ran a CPU profile with JProfiler at the beginning (when it's still fast) and at the (slow) end. What I see is that every single call is simply 100% slower!
Even calls to a simple Map#put() (the number of invocations and the contents of these maps are the same).
When running a profiler, there are no signs of blocked threads, just running threads.
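One way to check the "every call got slower" observation independently of the profiler (which adds its own overhead) is to run the same fixed workload periodically and log the result; if the identical harness reports roughly double the time on day 2, that would suggest the slowdown sits below the application code (in the JVM or the host) rather than in a particular code path. Below is a deliberately naive sketch; `PutTiming` and its workload are hypothetical, and JMH is the proper tool for rigorous microbenchmarks. To mirror the profiler comparison, `nanosPerPut` would need to be called from inside the long-running server JVM rather than from a fresh one.

```java
import java.util.HashMap;
import java.util.Map;

// Naive, hypothetical timing harness; use JMH for rigorous microbenchmarks.
class PutTiming {

    static volatile long blackhole;          // keeps results "used" so the JIT can't drop the work

    /** Average nanoseconds per HashMap.put over `rounds` rounds of `n` puts, after a warm-up. */
    public static double nanosPerPut(int n, int rounds) {
        for (int w = 0; w < 5; w++) blackhole += fill(n);   // warm-up: give the JIT a chance to compile fill()
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) blackhole += fill(n);
        long elapsed = System.nanoTime() - start;
        return (double) elapsed / ((long) n * rounds);
    }

    private static int fill(int n) {
        Map<Integer, Integer> m = new HashMap<>();
        for (int i = 0; i < n; i++) m.put(i, i);
        return m.size();
    }

    public static void main(String[] args) {
        // Log this periodically (e.g. hourly) and compare day 1 with day 2.
        System.out.printf("%.1f ns per put%n", nanosPerPut(100_000, 20));
    }
}
```

The figure includes `Integer` boxing and map allocation, but since the workload is identical every time, only the trend across days matters.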
Does anyone have a clue what's causing the performance degradation?
Thanks!
Update: solved the performance degradation by upgrading the Java version to 1.6.0_24 !
While out of options, I scanned through all the release notes of the Java VM and discovered a performance and reliability fix in 1.6.0_23. See also the 1.6.0_23 release notes.
After the jvm upgrade, the performance stays the same and does not degrade over days.
A couple of times recently I have noticed that 'something' is causing the Windows System Process to sit at 50+% and it will not quit until the PC is rebooted. Happening on Win2k and Win XP so far.
This is particularly troublesome because it currently appears to be triggered by MSVC 2005/Incredibuild and rebooting the build servers is not a nice thing.
At the same time, the 'System Idle Process' is holding the rest of the CPU and the build steps themselves seem to be starved, i.e. a module that normally takes <5 minutes to compile is currently taking 20+.
My guesses would be the virus checker or TortoiseSVN, but I'd desperately like some other suggestions.
Edit:
I've been experiencing this as something that is triggered, and the culprit may not be ongoing. That's not to say that some other ongoing process hasn't done something 'stupid' and is keeping System actively locked up while appearing to be idle itself.
System (100% of 1 core), and System Idle Process are sharing 98-100% of the total CPU.
Occasionally mt.exe, link.exe and buildservice get a look in at 1-2%.
I'm running VNC to view the machine, so it's getting a look in on occasion.
Edit 2:
When I left the previous evening, the build process seemed to be progressing, albeit slowly, but after waiting another 13 hours the 1-hour build process still hasn't completed. System is still hogging the 1 core.
My understanding is that the "System" process is the time spent in the kernel (so performing disk I/O, network I/O (you did mention Incredibuild) and the like) -- I'd check for disk fragmentation, virus checkers and possibly look at these on other machines in your Incredibuild cluster.
As the System Idle Process runs at "Low" priority, it's a red herring that it would be "taking up CPU time"; if anything, it just shows that CPU time is available. The fact that the processing is stuck on a single processor shows that the process is doing something that is not multi-core aware, or that someone has set its thread affinity to 1.
I've noticed that the virus-checking software I use can radically slow down compilation, but the effect does not extend beyond the end of the build. Turning off advanced and heuristic checking improves this to the extent that I do not have to disable the scanner entirely. I have changed my scanning strategy so that I now rely more on scheduled full scans than on advanced on-the-fly scanning, as the latter hurts the performance of a number of apps. (N.B. I am using the latest version of Kaspersky.) I'm also using an automated backup tool (AJCBackup) that also needs to be restrained when compiling.
You may also want to consider disabling the Windows Indexing service on drives that are used to create a lot of temporary and object files, as it doesn't provide much value in this context for the amount of performance it costs.
Edit: Have you checked which processes are actually hogging the CPU core and traced them back to a given app?
We've encountered issues with Kaspersky and Incredibuild in our offices - compiles and sometimes links will just hang and never finish.
It only seems to affect some machines, which is weird, and only Windows XP (Vista seems immune from what I've seen).
Only solution I've found so far is to turn Kaspersky off entirely - so if you find a solution then let me know!
RE: smacl, work from the Windows Search/Indexing Service (WSearch) won't be attributed to the System process's CPU time, it should come from the SearchIndexer.exe/SearchFilterHost.exe services (Vista+).
The majority of activity from System you will see will be in disk activity from the lazy writer and other disk accesses. CPU activity from System will be because of kernel activity such as drivers (ISRs/DPCs) and other kernel-level filters (which could include AV file and process filters).
Process Explorer (http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx) can aid in viewing CPU usage across processes, including System. You can use the public Microsoft Symbol Server and this resource to get you started.
If you can take a trace with Xperf (http://msdn.microsoft.com/en-us/performance/cc825801.aspx), I can help you analyze where the CPU time is being spent in the System (kernel) context. Xperf isn't officially supported on XP, but you can take a trace on XP and analyze it on other systems.
Xperf and Process Explorer should be able to shine a spotlight on exactly the module(s) that are causing the runaway CPU usage. Symbols may not even be necessary to diagnose the problem; simply the module name can often point to the component in question that is slowing down your system. For example, high CPU usage from ndis.sys can point to network interrupts, or activity from modules such as aavmker4.sys can point to AV software (Avast! in this case).
And as always, check if there are any updated drivers and AV software for your system.
In my office, a conflict between Incredibuild and Spyware Doctor's Immunize feature caused similar issues. Turning off Immunize solved it for us.
What anti-virus/malware do you use?
I'm having the same hangs when compiling with IncrediBuild in VS2003 on a clean Windows 7 install without any anti-virus. It worked fine on the same box under XP and Vista.