dtrace impact, monitoring processes (OS X)? - macos

I'm trying to find a efficient way to programmatically monitor, from user mode, what processes are started on my computer (OS X Yosemite). Since NSWorkspaceDidLaunchApplicationNotification only works for apps and kqueues (NOTE_EXIT) only allows one to monitor a specific process, dtrace probes seemed to be the way to go. I've played around with both /usr/bin/execsnoop and /usr/bin/newproc.d (and stripped down versions, that just install a single probe (syscall::posix_spawn:return) and do nothing else (e.g. no prints)).
These do great job getting me the info I need, but when I start an app that kicks off multiple processes/quickly execs multiple commands (e.g. VMWare Fusion) - the probe(s) seem to noticeably impact the system. Specifically kernel_task consistently spikes to 50%+ CPU usage for a few seconds and the OS UI (mouse, etc) noticeable slows/lags...if the dtrace probes are not installed, this behavior is never observed.
So a few questions:
1) any way to avoid this perf issue? (dtrace #pragmas?)
2) are dtrace probes cumulative? (if I install dtrace probes do I need to manually uninstall them, or does ctl+C clear/disable them?)
3) any way to see what dtrace probes are currently installed?
I'm not attached to using dtrace - but am not aware of another (non-polling) way to get the pid/process name of things that are started on OS X :/

I am extremely surprised to see a measurable impact after enabling a single probe; does even
dtrace -n syscall::posix_spawn:return
cause a problem? If so, are you running short of memory? DTrace does require a (by default) modest amount and its initialisation may be pushing you over the edge. Do you see the problem with anything besides Fusion? It appears to suffer from performance problems of its own on Yosemite.
Probes are shared between consumers. If there is only one running consumer (e.g. dtrace) then all DTrace probes will be removed when it exits. If two consumers have enabled the same probe then it will remain active until the last consumer exits.
Maybe. Someone with access to the OS X source code could modify this script.

Related

OS X Server 10.6.8 keeps crashing

I have an OS X Server 10.4 that started Kernel Panicking on me late december. It had Mail, Web and SQL services turned on.
I switched to another Mac, problem persisted. Updated to 10.6.8, no difference. Switched to a third Mac, no difference.
In short; It gets 1-3 Kernel Panics per day.
I have noticed that the CPU load slowly rises until it panics.
So during this weekend I turned off all services except Mail. It worked very good during the weekend. Today 1500 (15pm) I turned on Web services, and after that the CPU load started to rise...
Among the processes I found more than 300 perl processes owned by the _www user. As well as the clamavd (owned by amisvd if I remember correctly) eating a lot of memory.
Clamavd is the anti virus and spam processes, right? I have tried to turn them off in the OS X Server admin app. It won't turn off.
Regarding the Perl processes I really don't know where that is used. The info panel doesn't give me any idea. I think it means that one of the sites (I have about 20) is using perl, but I don't know which one.
So, some questions:
- Is it safe to upgrade to OS X El Capitan Server? Should I?
- How can I kill all Perl processes at one time? (I have googled this but couldn't find any working tips)
- Any idea of what's going on?
As you may tell I'm not very skilled with Apache, but I still think it's funny and educating to do this.
Graph
Perl processes
How can you kill all Perl processes at one time? killall is available on OS X. Try
killall -9 perl
HTH
btw - my apologies to ThisSuitIsBlackNot -- I see he referred to this in his comments.

Windows 7: poor GUI response in my program while downloading data; is there some way to improve this?

I've written a program that (among other things) downloads multiple large files from a server on the LAN, using TCP. This program runs fine under Linux, MacOS/X, and generally under Windows as well (it uses Qt for the GUI and straight sockets calls for networking), but on certain Windows machines the download appears to be too much for the machine to handle, and I'm wondering if anyone has any ideas as to why that is and what can be done about it.
When downloading files, my program spawns a separate I/O thread that basically just sits in a loop, downloading data over TCP and writing it to a file, writing 128KB per call to QFile:write(). Each file is typically several hundred megabytes long, and a typical download session writes out several dozen of these files. Note that the I/O thread runs independently of the GUI thread, so I wouldn't expect it to affect GUI's performance much if at all -- especially not when running on a multicore PC.
The PC in question is a Core-2Duo Quad Q6600 running at 2.40GHz, with 4GB of RAM. It's running Windows 7 Ultimate SP1, 32-bit. It is receiving data over a Gigabit Ethernet connection and writing it to files on the NTFS-formatted boot partition of the 232GB internal Hitachi ATA drive.
The symptom is that sometimes during a download (seemingly at random) the program's GUI will become non-responsive for 10 to 30 seconds at a time, and often the title bar of the window will have "(not responding)" appended to it. The symptom will then clear up again and the download will proceed normally again. Another symptom is that the desktop is extremely sluggish during the download... for example, if I click on the "Start" button, the Start menu will take ~30 seconds to populate, instead of being populated near-instantaneously as I would expect.
Note that Task Manager shows plenty of free memory, but it does show short spikes of CPU usage to 100% one one of the 4 cores, at the same time the problems are seen.
The data is arriving over Gigabit Ethernet, and if I have my program just receive the data and throw it away (without writing it to the hard drive), the machine can maintain a constant download rate of about 96MB/sec without breaking a sweat. If I write the received data to a file, however, the download rate decreases to about 37MB/sec, and the symptoms described above start to appear.
The interesting thing is that just for curiosity's sake I added this call to my I/O thread's entry function, just before the beginning of its event loop:
SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_BELOW_NORMAL);
When I did that, the "(not responding)" symptoms cleared, but then download speed was reduced to only ~25MB/sec.
So my questions are:
Does anyone know what might be causing the sporadic hangups of the GUI when the hard drive is under a heavy write-load?
Why does lowering the I/O thread's priority cause the download rate to drop so much, given that there are three idle cores on the machine? I would think that even a lower-priority thread would have plenty of CPU available in this situation.
Is there any way to get a maximum download rate without causing Windows' desktop responsiveness and/or my app's GUI responsiveness to suffer problems?
Without seeing any code is hard to answer but this seems to be something related to processors and the fact that your download thread is not leaving any space for other threads to performs other operations.
It seems it never waits and that the driver of the network card is not well written.
Are you sure your thread is entering in an idle state when there is no data incoming?
In OS with a single processor a for (;;) {} will consume 100% cpu and if it talks continuously with the kernel it may stops other processes or other threads for doing that, especially if there is a bug or a very bad behaviour in some network card driver in your case.
Probably putting the thread priority below normal you are asking the OS to use your thread less often, this gives by a magical combination of things that allow things to not hang too much.
Check the code, maybe you are forgetting something?
Check if adding a sleep(0) to force the OS to yield to another thread sometime will make things better, but this is a temporary fix, you should find why your thread is consuming 100% cpu, if it is.

Temporarily suspend the PC operating system

How does one programmatically cause the OS to switch off, go away and stop doing anything at all so that a program may have complete control of a PC system?
I'm interested in doing this from both an MS Windows and Linux environments. Any languages or APIs considered.
I want the OS to stop preempting my program, stop its virtual memory management, stop its device drivers and interrupt service routines from running and basically just go away. Then, when my program has had its evil way with the bare metal, I want the OS to come back again without a reboot.
Is this even possible?
With Linux, you could use kexec jump to transfer control completely to another kernel (ie, your program). Of course, with great power comes great responsibility - it is entirely up to you to service interrupts, and avoid corrupting the old kernel's memory. You'll end up having to write your own OS kernel to do this. Also, the transfer of control takes quite some time, as the kernel has to de-initialize all hardware, then reinitialize it when it's time to resume. Since kexec jump was originally designed for hibernation support, this isn't a problem in its original context, but depending on what you're doing, it might be a problem.
You may want to consider instead working within the framework given to you by the OS - just write a normal driver for whatever you're doing.
Finally, one more option would be using the linux Real-Time patchset. This lets you assign static priorities to everything, even interrupt handlers; by running a process with higher priority than anything else, you could suspend /nearly/ everything - the system will still service a small stub for interrupts, as well as certain interrupts that can't be deferred, like timing interrupts, but for the most part the heavy work will be deferred until you relinquish control of the CPU.
Note that the RT patchset won't stop virtual memory and the like - mlockall will prevent page faults on valid pages though, if that's enough for you.
Also, keep in mind that whatever you do, the system BIOS can still cause SMM traps, which cannot be disabled, except by motherboard-model-specific methods.
There are lots of really ugly ways to do this. You could modify the running kernel by writing some trampoline code to /dev/kmem that passes control to your application. But I wouldn't recommend attempting something like that!
Basically, you would need to have your application act as its own operating system. If you want to read data from a file, you would have to figure out where the data lives on disk, and generate your own SCSI requests to talk to the disk drive. You would have to implement your own interrupt handler to get notified when the data is ready. Likewise you would have to handle page faults, memory allocation, etc. Most users feel that this isn't worth the effort...
Why do you want to do this?
Is there something that your application needs to do that the OS won't let it do? Are you concerned with the OS impact on performance? Something else?
If you don't mind shelling out some cash, you could use IntervalZero's RTX to do this for a Windows system. It's a hard realtime subsystem that gets installed on a Windows box as sort of a hack into the HAL and takes over the machine, letting Windows have whatever CPU cycles are left over.
It has its own scheduler and device drivers, but if you run your program at the top RTX priority, don't install any RTX device drivers (or disable interrupts for the duration), then nothing will interrupt it.
It also supports a small amount of interaction with programs on the Windows side.
We use it as a nice way to get a hard realtime box that runs Windows.
coLinux loads CoLinuxDriver into the NT kernel or a colinux.ko into the Linux kernel. It does exactly what you asked – it "unschedules" the host OS, and runs its own code, with its own memory management, interrupts, etc. Then, when it's done, it "reschedules" the host OS, allowing it to continue from where it left off. coLinux uses this to run a modified Linux kernel parallel to the host OS.
Unlike more common virtualization techniques, there are no barriers between coLinux and the bare metal hardware at all. However, hardware and the host OS tend to get confused if the coLinux guest touches anything without restoring it before returning to the host OS.
Not really. Operating Systems are a foundation, and your program runs on top of them. The OS handles memory access, disk writing operations, communications, etc. when your application makes requests, and asking the OS to move out of the way would mean that your program would have to do the OS's job instead.
Not as such, no.
What you want is basically an application that becomes an OS; a severely stripped down Linux kernel coupled with some highly customized and minimized tools might be the way to go for this.
if you were devious, and wanted to avoid alot of the operating system housekeeping you could probably hook yourself into a driver routine. Thinking out aloud, verging on hacking. google how to write root kits.
Yeah dude, you can totally do that, you can also write a program to tell my bank to give you all my money and send you a hot Russian.

how to controls the CPU usage of an app on OS X?

I'm running an application right now which seems to be running at full throttle, but even though the fan seems to be spinning at it's max and the activity monitor reports that the application is using 100% of the processor, I'm suspecting that at the most it is using 100% only of a single of the two cores on my machine.
How can I tell OS X to allow an application use 100%, or as much as the OS can allow, of the processing power of my computer? I have tried some terminal commands like "nice" and "renice" to set up the priority of this process but still can't get it to run at full throttle.
I also would like to know how to do the opposite, set a limit of the processor usage of an app, example set app X to run at 20%.
Is this possible to do without modifying the code of the app?
The answer to this depends upon whether your application is multi-threaded or not. If this is a single-threaded application, which it is unless you have specifically made it multi-threaded then the process will run on one core of your multi-core hardware. There is nothing you can do about this it's a function of the underlying operating system.
If your program is multi-threaded then it is possible to have different threads executing on separate cores. This will increase the overall usage of the process and allow figures greater than 100%.
You can not however force the machine to use 'all' of the processing power available, but you can influence it with nice.
In order to reduce the amount of processor used then you can use nice to lower the priority of the process. If you are root you can also use nice to increase the priority of your process

What causes the MS Windows 'System' Process to go nuts when compiling?

A couple of times recently I have noticed that 'something' is causing the Windows System Process to sit at 50+% and it will not quit until the PC is rebooted. Happening on Win2k and Win XP so far.
This is particularly troublesome because it currently appears to be triggered by MSVC 2005/Incredibuild and rebooting the build servers is not a nice thing.
At the same time the 'System Idle Process' process is holding the rest of the CPU and the build steps themselves seem to be starved. ie. a module that normally takes <5 minutes to compile is currently taking 20+.
I'd take a few guesses at maybe being virus checker or tortoise svn but would desperatly like some other suggestions.
Edit:
I've been experiencing this as something that is triggered, and the culprit may not be ongoing. Thats not to say that some other ongoing process hasn't done something 'stupid' and is managing an active lock up of System while appearing to be idle itself.
System (100% of 1 core), and System Idle Process are sharing 98-100% of the total CPU.
Occasionaly mt.exe, link.exe, buildservice would get a look in at 1-2%.
I'm running VNC to view the machine, so it's getting a look in on occasion.
Edit 2:
When left the previous evening the build process seemed to be progressing all be it slowly, but after waiting another 13 hours the 1 hour build process hasn't completed. System is still hogging the 1 core.
My understanding is that the "System" process is the time spent in the kernel (so performing disk I/O, network I/O (you did mention Incredibuild) and the like) -- I'd check for disk fragmentation, virus checkers and possibly look at these on other machines in your Incredibuild cluster.
As the System Idle process runs at "Low" priority, it's a red herring that it'd be "taking up CPU time" -- if anything it's just showing that there is available CPU time available. The fact the processing is stuck to a single processor shows that the process is doing something that is not multi-core aware, or someone has set it's thread affinity to 1.
I've noticed the virus checking software that I use can radically slow down compilation but it does not extend beyond the end of the build. Turning off advanced and heuristic checking improves this to the extent that I do not have to disable the scanner entirely. I have changed my scanning strategy such that I use scheduled full scans now more than advanced on the fly scanning, as it hurts the perfromance of a number of apps. (n.b. I am using the latest cut of Kaspersky). I'm also using an automated backup tool (AJCBackup) that also needs to be restrained when compiling.
You may also want to consider disableing the Windows Indexing service on drives that are be used to create a lot of temporary and object files, as it doesn't provide much value in this context for the amount of performance it draws.
Edit: Have checked which processes are actually hogging the CPU core and traced them back to a given app?
We've encountered issues with Kaspersky and Incredibuild in our offices - compiles and sometimes links will just hang and never finish.
Only seems to affect some machines though which is wierd, and only Windows XP (Vista seems immune from what I've seen).
Only solution I've found so far is to turn Kaspersky off entirely - so if you find a solution then let me know!
RE: smacl, work from the Windows Search/Indexing Service (WSearch) won't be attributed to the System process's CPU time, it should come from the SearchIndexer.exe/SearchFilterHost.exe services (Vista+).
The majority of activity from System you will see will be in disk activity from the lazy writer and other disk accesses. CPU activity from System will be because of kernel activity such as drivers (ISRs/DPCs) and other kernel-level filters (which could include AV file and process filters).
Process Explorer (http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx) can aid in viewing CPU usage across processes, including System. You can use the public Microsoft Symbol Server and this resource to get you started.
If you can take a trace with Xperf (http://msdn.microsoft.com/en-us/performance/cc825801.aspx), I can help you analyze where the CPU time is being spent in the System (kernel) context. Xperf isn't officially supported on XP, but you can take a trace on XP and analyze it on other systems.
Xperf and Process Explorer should be able to shine a spotlight on exactly the module(s) that are causing the runaway CPU usage. Symbols may not even be necessary to diagnose the problem; simply the module name can often point to the component in question that is slowing down your system. For example, high CPU usage from ndis.sys can point to network interrupts, or activity from modules such as aavmker4.sys can point to AV software (Avast! in this case).
And as always, check if there are any updated drivers and AV software for your system.
In my office, a conflict between Incredibuild and Spyware Doctor's Immunize feature caused similar issues. Turning off Immunize solved it for us.
What anti-virus/malware do you use?
I'm having same hangs when compiling using IncrediBuild in VS2003, on clean Windows 7 without any anti-virus. It worked fine on same box in XP and Vista.

Resources