I have a Go application that randomly starts using 100% CPU after it has been running for 2-3 days. The CPU usage stays at 100% until the application is killed. I haven't been able to figure out which goroutine is causing the issue. I have pprof installed, and it works as expected while the application is running normally, but once the CPU goes to 100% even pprof stops responding.
I was able to get the following stack trace logs by running kill -3 1 (SIGQUIT):
https://gist.github.com/lokesh-wise/1804b3dc4fe2df2d31f97fae70e1d6f8
(There are 2 log files: one for the application running in the bad state (100% CPU) and the other for the application running in the good state.)
Any idea which goroutine is the culprit, or how I can find out what's causing the 100% CPU?
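One way to keep a diagnostics path open even when the HTTP pprof endpoint stops responding is to dump all goroutine stacks on a signal, without killing the process. Below is a minimal sketch, not the original application's code: it assumes the standard net/http/pprof and runtime/pprof packages, a hypothetical listen address of localhost:6060, and SIGUSR1 as the trigger signal.

package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
    "os"
    "os/signal"
    "runtime/pprof"
    "syscall"
)

func main() {
    // Serve pprof on its own listener so it does not depend on the
    // application's normal request path.
    go func() {
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()

    // On SIGUSR1 (Unix only), write every goroutine's stack to stderr --
    // the same kind of dump that kill -3 (SIGQUIT) produces, but without
    // terminating the process.
    sigs := make(chan os.Signal, 1)
    signal.Notify(sigs, syscall.SIGUSR1)
    go func() {
        for range sigs {
            pprof.Lookup("goroutine").WriteTo(os.Stderr, 2)
        }
    }()

    // ... the rest of the application would run here ...
    select {}
}

If the pprof listener does stay reachable while the process is spinning, a 30-second CPU profile taken with go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30 (or a goroutine dump from /debug/pprof/goroutine?debug=2) will usually point straight at the hot goroutine.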
Related
I need to diagnose a server that is unable to reach peak performance. CPU usage drops to zero for around 500 ms and then spikes to 100% while the server tries to process the queued requests; this pattern repeats for a number of hours, after which operation becomes smooth again. (Operation had been smooth for years.)
This suggests to me that the worker threads are idling while waiting for an external event to occur. The application is complex and we haven't been able to pinpoint the culprit.
Can Process Monitor be configured to log every time a thread sleeps waiting for some event?
If possible, can the event be related to a particular stack trace?
If the above is possible, perhaps I could correlate the CPU drops with wait events and pinpoint the culprit.
I have successfully used WinDbg before to diagnose these kinds of problems; however, in this case the wait is very brief and I'm not confident that I can make the debugger break exactly while the processor is idling.
WinDbg and ProcMon are not the right tools for this job. Install the Windows Performance Toolkit, which is part of the Windows 10 SDK, on your developer machine.
Now xcopy the folder C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit to the server, open cmd.exe as admin and run wpr.exe -start CPU && timeout -1 && wpr.exe -stop C:\Hang.etl, then minimize the cmd window.
Once you hit the hang, switch back to the cmd window and press a key to stop logging.
Move the Hang.etl file plus the NGENPDB folder to the dev PC, open Hang.etl with Windows Performance Analyzer (WPA.exe), load debug symbols, and start looking for the hang by adding the CPU Usage (Precise) graph to the analysis pane, making sure you can see the columns NewProcess, NewThreadId, NewStack, ReadyingProcess, ReadyingThreadId, ReadyingStack and Waits (us). Click on Waits (us) to sort the longest waits to the top. Then look for long wait times with a small Count (i.e. a few operations that take a long time, not many short ones) and inspect the call stack for clues about what is happening.
As soon as I start a Spring Boot 2.2 application from Eclipse on my Windows 10 laptop, I notice around 10% of permanent CPU usage, and the clock frequency also sits barely below 2 GHz, even though the application is just sitting there idle.
When I stop the application, CPU usage drops to 1-2% and the clock goes below 1 GHz.
The application does one SQL query every minute, but apart from that does no processing while idle. It basically sits there waiting for requests.
How can I figure out what is causing this usage, which seems to prevent the CPU (an i7-5600U with 8 GB RAM) from throttling down?
I use Java 1.8.0_221.
Edit 1
I tried running the compiled jar from cmd.exe, and then the idle CPU usage is low.
Edit 2
I tried disabling spring-boot-devtools in my pom.xml, and the idle CPU usage is low as well.
Edit 3
This is probably the issue: https://github.com/spring-projects/spring-boot/issues/9882 It looks like a FileWatcher is set up even for excluded folders and files, which in my case means bower_components with thousands of files.
Recently I had to run a few heavy one-time queries on our MSSQL 2008 R2 64-bit server and faced a problem: executing them made SQL Server consume 100% CPU, which eventually (in about 20 seconds) made the server absolutely unresponsive.
Thus I was forced to reboot it or wait until execution completed, which took a long time depending on the query.
What I noticed is that setting the CPU affinity for SQL Server to 7 cores instead of the 8 available, via Task Manager, would keep the server responsive, so I could cancel my query if it took too long (and proceed with query optimization without having to reboot).
But is it a good idea to limit the CPU affinity of SQL Server?
Please share your thoughts. The server is used for web applications.
It turns out to be a bad idea.
After a few days with CPU affinity at 7/8 I noticed that SQL Server would continuously load 1-2 cores up to 100% while other cores were available.
It is probably true that the SQL scheduler cannot distribute the workload correctly when CPU affinity is limited.
It's years later, but in case anyone finds this in a search: your assumption is correct that the schedulers become locked to a core. However, there is a trace flag you can turn on to put this back: 8002.
I noticed this issue today when I started my computer. My fan is making a lot of noise, as if it's overworking, and my CPU usage is at 100%, even though I am not running any application other than Google Chrome. I went to Task Manager and checked which process is taking most of my CPU, and found this "System" process consuming almost 80% of my CPU. When I clicked on the file location, it turned out to be in the Windows\System32 folder. Now I want to reduce my CPU usage. What should I do?
Press Windows key + R and type devmgmt.msc, then look for errors, broken devices, unknown devices, etc., and fix/update drivers where necessary.
ntoskrnl.exe is your kernel.
Increased CPU usage attributed to it is often caused by driver-related issues.
I've written a program that (among other things) downloads multiple large files from a server on the LAN, using TCP. This program runs fine under Linux, MacOS/X, and generally under Windows as well (it uses Qt for the GUI and straight socket calls for networking), but on certain Windows machines the download appears to be too much for the machine to handle, and I'm wondering if anyone has any ideas as to why that is and what can be done about it.
When downloading files, my program spawns a separate I/O thread that basically just sits in a loop, downloading data over TCP and writing it to a file, writing 128KB per call to QFile::write(). Each file is typically several hundred megabytes long, and a typical download session writes out several dozen of these files. Note that the I/O thread runs independently of the GUI thread, so I wouldn't expect it to affect the GUI's performance much, if at all -- especially not when running on a multicore PC.
The PC in question is a Core 2 Quad Q6600 running at 2.40GHz, with 4GB of RAM. It's running Windows 7 Ultimate SP1, 32-bit. It is receiving data over a Gigabit Ethernet connection and writing it to files on the NTFS-formatted boot partition of the 232GB internal Hitachi ATA drive.
The symptom is that sometimes during a download (seemingly at random) the program's GUI will become non-responsive for 10 to 30 seconds at a time, and often the title bar of the window will have "(not responding)" appended to it. The symptom will then clear up and the download will proceed normally again. Another symptom is that the desktop is extremely sluggish during the download... for example, if I click on the "Start" button, the Start menu takes ~30 seconds to populate, instead of being populated near-instantaneously as I would expect.
Note that Task Manager shows plenty of free memory, but it does show short spikes of CPU usage to 100% on one of the 4 cores at the same time the problems are seen.
The data is arriving over Gigabit Ethernet, and if I have my program just receive the data and throw it away (without writing it to the hard drive), the machine can maintain a constant download rate of about 96MB/sec without breaking a sweat. If I write the received data to a file, however, the download rate decreases to about 37MB/sec, and the symptoms described above start to appear.
The interesting thing is that just for curiosity's sake I added this call to my I/O thread's entry function, just before the beginning of its event loop:
SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_BELOW_NORMAL);
When I did that, the "(not responding)" symptoms cleared up, but the download speed was reduced to only ~25MB/sec.
So my questions are:
Does anyone know what might be causing the sporadic hangups of the GUI when the hard drive is under a heavy write-load?
Why does lowering the I/O thread's priority cause the download rate to drop so much, given that there are three idle cores on the machine? I would think that even a lower-priority thread would have plenty of CPU available in this situation.
Is there any way to get a maximum download rate without causing Windows' desktop responsiveness and/or my app's GUI responsiveness to suffer problems?
Without seeing any code it's hard to answer, but this seems to be related to the processors and the fact that your download thread is not leaving any room for other threads to perform other operations.
It seems it never waits, and that the network card's driver may not be well written.
Are you sure your thread enters an idle state when there is no data incoming?
On an OS with a single processor, a for (;;) {} loop will consume 100% CPU, and if it talks to the kernel continuously it may stall other processes or other threads while doing so, especially if there is a bug or some very bad behaviour in a network card driver, as may be the case here.
By putting the thread priority below normal you are probably asking the OS to schedule your thread less often, which by some combination of factors keeps things from hanging too much.
Check the code; maybe you are forgetting something?
Check whether adding a sleep(0) to force the OS to yield to another thread occasionally makes things better, but that is a temporary fix; you should find out why your thread is consuming 100% CPU, if indeed it is.