How to measure L1, L2, L3 cache hits & misses in OS X

I have a C++ program and I would like to quantify its performance by counting the number of hits and misses against the CPU cache.
What's the best way to do it?
I tried using Intel's Performance Counter Monitor, but it relies on an unsigned kernel extension, and unsigned kexts are disabled on Yosemite. I could obviously disable the check so that unsigned kexts load, but I'd rather not go down that path.
Is there any other possible way that I'm unaware of?

You can enable unsigned kernel extensions on OS X (a reboot is required afterwards):
sudo nvram boot-args=kext-dev-mode=1
This enables kext developer mode on your machine, and you can run Intel Performance Counter Monitor as long as it supports Mac OS X 10.10 (Yosemite) in general.
Don't forget to disable it again after you are done testing (it's a security issue otherwise):
sudo nvram boot-args=kext-dev-mode=0
As far as I know, Intel's tool is far better than cachegrind because it reads actual hardware counters instead of simulating a CPU and its cache characteristics in software.
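If you do get the kext loaded, you don't even have to use the bundled command-line tools: PCM also ships a C++ API (cpucounters.h) that you can link into your program and wrap around exactly the code you want to measure. A rough sketch, based on the PCM headers as I remember them (exact function names may vary between PCM versions, it needs the MSR driver loaded, and it has to run as root); the workload() function here is just a made-up placeholder:

    #include <cstdio>
    #include <vector>
    #include "cpucounters.h"   // from the Intel PCM source tree

    // Placeholder workload: stride through a buffer larger than L2 to generate some misses.
    static void workload()
    {
        std::vector<char> buf(64 * 1024 * 1024);
        for (std::size_t i = 0; i < buf.size(); i += 64)
            buf[i]++;
    }

    int main()
    {
        PCM *pcm = PCM::getInstance();
        if (pcm->program() != PCM::Success) {          // programs the PMU; needs the MSR kext and root
            std::fprintf(stderr, "PCM could not access the performance counters\n");
            return 1;
        }

        SystemCounterState before = getSystemCounterState();
        workload();                                    // the code you actually want to measure
        SystemCounterState after  = getSystemCounterState();

        std::printf("L2 misses: %llu  L3 misses: %llu\n",
                    (unsigned long long)getL2CacheMisses(before, after),
                    (unsigned long long)getL3CacheMisses(before, after));
        std::printf("L2 hit ratio: %.2f  L3 hit ratio: %.2f\n",
                    getL2CacheHitRatio(before, after),
                    getL3CacheHitRatio(before, after));

        pcm->cleanup();
        return 0;
    }

Build it together with the PCM sources (or against the PCM library) and run it with sudo.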

You could, in principle, apply for a kext signing certificate if you're an Apple Developer Programme member, and sign the kext yourself. But Apple generally doesn't hand those out for internal use, and recommends that you enable kext-dev-mode or disable SIP (depending on the OS version) instead. Another good path would be to ask Intel to provide a signed version of their kext!

Related

Immediate and latent effects of modifying my own kernel binary at runtime?

I'm more of a web developer and database guy, but severely inconvenient performance issues relating to kernel_task and temperature on my personal machine have made me interested in digging into the details of my Mac OS (I noticed some processes would trigger long-lasting spikes in kernel_task, despite consistently low CPU temperature and a freshly re-imaged machine).
I am a root user on my own OS X machine. I can read /System/Library/Kernels/kernel. My understanding is that this is the Mach/XNU kernel of this machine (although I don't know a lot about those; I'm surprised that it's only 13 MB).
What happens if I modify or delete /System/Library/Kernels/kernel?
I imagine that since the kernel is already loaded at runtime, things might be okay until I try to reboot. If that's the case, would carefully modifying this file change the behavior of my OS only on the next reboot, presuming it didn't cause a kernel panic? (Or is a kernel panic only a Linux thing?)
What happens if I modify or delete /System/Library/Kernels/kernel?
First off, you'll need to disable SIP (system integrity protection) in order to be able to modify or edit this file, as it's protected even from the root user by default for security reasons.
If you delete it, your system will no longer boot. If you replace it with a different xnu kernel, that kernel will in theory boot next time, assuming it's sufficiently matched to both the installed device drivers and other kexts, and the OS userland.
Note that you don't need to delete/replace the kernel file to boot a different one, you can have more than one installed at a time. For details, see the documentation that comes with Apple's Kernel Debug Kits (KDKs) which you can download from the Apple Developer Downloads Area.
I imagine since it's at run-time, things might be okay until I try to reboot.
Yes, the kernel is loaded into memory by the bootloader early on during the boot process; the file isn't used past that, except for producing prelinked kernels when your device drivers change.
Finally, I feel like I should explain a little about what you actually seem to be trying to diagnose/fix:
but severely inconvenient performance issues relating to kernel_task and temperature on my personal machine have made me interested in digging into the details of my Mac OS
kernel_task runs more code than just the kernel core itself. Specifically, any loaded kexts (see the kextstat command), and there are a lot of those on a modern macOS system, run in kernel space, meaning their CPU time is counted under kernel_task.
Long-running spikes of kernel CPU usage sound like they might be caused by file system self-maintenance, or volume encryption/decryption activity. They are almost certainly not basic programming errors in the xnu kernel itself. (Although I suppose stupid mistakes are easy to make.)
Other possible culprits are device drivers; GPU drivers especially are incredibly complex pieces of software, and can of course be busy even when your system seems idle.
The first step to dealing with this problem, if there indeed is one, would be to find out what the kernel is actually doing with those CPU cycles. For that you'd want to do some profiling and/or tracing. Doing this on the running kernel most likely again requires SIP to be disabled. The Instruments.app that ships with Xcode can profile processes; I'm not sure whether it's still possible to profile kernel_task with it, but I think it was possible in earlier versions at least. Another possible option is DTrace. (There are entire books written on this topic.)

Record values of Performance Monitor Counters (PM events) on OS X without Instruments

In Xcode's Instruments, there is a tool called Counters that exposes low-level counter information provided by the CPU, such as the number of instructions executed or the number of cache misses.
This is similar to the Linux syscall perf_event_open introduced in Linux 2.6.32. On Linux, I can use perf_event_open then start/stop profiling around the section of my code I'm interested in. I'd like to record the same type of stats on OS X: counting the instructions (etc.) that a certain piece of code takes, and getting the result in an automated fashion. (I don't want to use the Instruments GUI to analyze the data.)
Are there any APIs that allow this (e.g. using DTrace or similar)? From some searching it sounds like the private AppleProfileFamily.framework might have the necessary hooks, but it's unclear how to go about linking to or using it.
On GNU/Linux I use Intel's PCM to monitor CPU utilization. I'm not sure whether it works fine on OS X, but as far as I know the source code includes a MacMSRDriver directory. I don't have an OS X device, so I've never tested it there.
If the source compiles on your device, just run:
pcm.x -r -- your_program your_program_parameter
or, if you want more advanced profiling, use pcm-core.x instead, or build your own code based on pcm-core.cpp.
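If you want the results in an automated fashion around just one section of your code, another option (assuming the MacMSRDriver kext is loaded and you run as root) is to link against the PCM library directly instead of invoking pcm.x, which is roughly what pcm-core.cpp does internally. A rough sketch, with function names taken from PCM's cpucounters.h as I remember them, so double-check them against the headers you actually build:

    #include <cstdio>
    #include "cpucounters.h"   // Intel PCM C++ API

    int main()
    {
        PCM *pcm = PCM::getInstance();
        if (pcm->program() != PCM::Success)            // needs the MSR driver and root privileges
            return 1;

        SystemCounterState start = getSystemCounterState();

        // ... the section of code you want to measure goes here ...

        SystemCounterState end = getSystemCounterState();

        std::printf("instructions: %llu  cycles: %llu  IPC: %.2f\n",
                    (unsigned long long)getInstructionsRetired(start, end),
                    (unsigned long long)getCycles(start, end),
                    getIPC(start, end));

        pcm->cleanup();
        return 0;
    }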

Do I need two machines to develop IOKit Mac drivers?

I'm building an IOKit CFPlugin driver for OS X. I'll be working with incoming network data that will be translated to MIDI data. No hardware is involved other than the built-in AirPort. I have experience with drivers on Windows machines and firmware, but this is my first dip into doing it on the Mac. So far things are going pretty well, but the Apple documentation says: "For safety reasons, you should not load your driver on your development machine."
I only have one Mac. I really don't want two Macs (sorry, Apple). Should I take this warning seriously? Are there things I need to know?
Thanks, Tom Jeffries
You could also consider running OS X inside a VM as your testbed. It would surely be much more convenient than having a separate boot volume.
The warning is rather poorly worded; what you should consider doing is using a separate boot volume (partition) for trying out your driver, since it's possible to arbitrarily hose your system with your driver.
If you're doing kernel development on any OS that isn't isolated from your main system (via a VM, alternate boot disk, etc.), you're crazy!
What may be a bigger issue is that you can't do any kernel debugging, because the only option for that is to use GDB on a remote OS X system. For this, you may want to consider running OS X in virtualization.
You DEFINITELY want to have some way to recover from a fubar'd kext installation: a bootable external drive or something you can quickly restore from. This is the main reason for Apple's warning against running in-development kernel extensions on your production machine.
Nicholas is right that in order to debug using gdb (the only way in kernel space) you do need two machines. I've never tried using a VM as Coxy suggests, but I guess it's feasible (assuming that you run your kext on the virtual machine and use the real host machine to run gdb).
My preferred method for tracing and debugging in the kernel is kprintf() routed over FireWire (a.k.a. FireWire kprintf; see man fwkpfv). For this you do need two machines with FireWire ports.
Finally, being an old computer musician myself, I wonder why you want to program a MIDI synthesizer (or transformer) at the network stack level. My guess is that you would have a much more gratifying experience working in userland (where you can use floating-point math, for one thing); see the sketch below.
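To make the userland suggestion concrete, here is a minimal, hypothetical sketch of a virtual MIDI source created entirely from a normal process with CoreMIDI; the client and endpoint names are made up, and in a real program the packet would be built from your incoming network data. Compile with -framework CoreMIDI -framework CoreFoundation:

    #include <CoreMIDI/CoreMIDI.h>
    #include <cstdio>

    int main()
    {
        MIDIClientRef   client = 0;
        MIDIEndpointRef source = 0;

        // Create a client and a virtual source that other apps (DAWs, softsynths) can connect to.
        MIDIClientCreate(CFSTR("Net MIDI"), NULL, NULL, &client);
        MIDISourceCreate(client, CFSTR("Net MIDI Out"), &source);

        // Build a packet list containing one note-on (channel 1, middle C, velocity 100).
        Byte noteOn[3] = { 0x90, 60, 100 };
        Byte buffer[256];
        MIDIPacketList *list = (MIDIPacketList *)buffer;
        MIDIPacket     *pkt  = MIDIPacketListInit(list);
        pkt = MIDIPacketListAdd(list, sizeof(buffer), pkt, 0 /* timestamp: now */,
                                sizeof(noteOn), noteOn);

        // Hand the packet to CoreMIDI; in your driver this would be fed from the network data instead.
        MIDIReceived(source, list);

        std::printf("sent a note-on from userland\n");
        return 0;
    }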
If you need some hints or tips, feel free to get in touch...
|K<
From the ADC Kernel Programming Guide:
Kernel programming is a black art that should be avoided if at all possible. Fortunately, kernel programming is usually unnecessary. You can write most software entirely in user space. Even most device drivers (FireWire and USB, for example) can be written as applications, rather than as kernel code. A few low-level drivers must be resident in the kernel's address space, however, and this document might be marginally useful if you are writing drivers that fall into this category.

Disable Turbo Boost on Core i7 Mac?

Is there any way to programmatically disable Turbo Boost on a Core i7 Mac running Mac OS X? I need to be able to do this for benchmarking purposes during code optimisation, etc. Failing that, any kind of utility which can disable/enable Turbo Boost, even if it requires a reboot, would be useful.
There is a related question (not Mac-specific) on SO: How to turn off Turbo Boost temporarily? But even for PCs it seems that there may be no way to do this programmatically/on the fly.
I wrote a kernel extension that lets you disable TB, have fun:
https://github.com/nanoant/DisableTurboBoost.kext
If you want to disable TB on Linux, here is another recipe: http://luisjdominguezp.tumblr.com/post/19610447111/disabling-turbo-boost-in-linux
I've just coded an app that lets you load/unload the kernel extension mentioned above, and helps you track system behaviour by displaying the CPU temperature and the current fan speed.
You can check it out here https://github.com/rugarciap/Turbo-Boost-Switcher
Here is a screenshot of what it looks like: http://i.stack.imgur.com/tsKaG.png
You can't. Certain features need to be configured from the BIOS, such as Turbo Boost or VT.
In particular, this is done with the IA32_FEATURE_CONTROL MSR. On a PC, at boot time the MSR is unlocked and the BIOS sets the correct bits to enable or disable features. Once configuration is complete, the BIOS locks the MSR for the changes to take effect and prevent future modification.
I don't know if it's possible to unlock the MSR again before the PC is brought into protected mode, and I don't know how this works on a MacBook where EFI is used instead of BIOS. You'll probably be able to pull it off with an EFI extension of sorts.
CPUID.com's Tmonitor utility can disable/enable Turbo Boost on-the-fly from within Windows, not at boot! There must be a way to do the same thing from within OSX.
Finally there seems to be a good solution to this problem, which I tested today with Mac OS X Lion on a Core i7 MacBook Pro, and it appears to work well. Adam Strzelecki, a researcher in parallel computing at Jagiellonian University in Krakow, Poland, has written DisableTurboBoost.kext: a small kext which can be loaded and unloaded at will (via the command line) to disable/enable Turbo Boost.
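For the curious: as far as I can tell, what a kext like this has to do is flip the "turbo mode disable" bit (bit 38) of the IA32_MISC_ENABLE MSR (0x1A0), which is documented in the Intel SDM and can only be written from kernel mode, which is why this can't be done from a normal process. A stripped-down sketch of the idea, not a complete loadable kext, and based on my reading of the SDM rather than on the kext's actual source:

    #include <stdint.h>
    #include <i386/proc_reg.h>                    // rdmsr64()/wrmsr64(); kernel-only header (Kernel.framework)

    #define MSR_IA32_MISC_ENABLE   0x1A0
    #define TURBO_DISENGAGE_BIT    (1ULL << 38)   // "Turbo Mode Disable" per the Intel SDM

    // Kernel-space helper: call from a kext's start routine to turn Turbo Boost off,
    // and from its stop routine to turn it back on.
    static void set_turbo_boost(bool enable)
    {
        uint64_t misc = rdmsr64(MSR_IA32_MISC_ENABLE);
        if (enable)
            misc &= ~TURBO_DISENGAGE_BIT;         // clear the bit: Turbo Boost allowed again
        else
            misc |=  TURBO_DISENGAGE_BIT;         // set the bit: Turbo Boost off
        wrmsr64(MSR_IA32_MISC_ENABLE, misc);
    }

Presumably the kext's start/stop routines do something along these lines, which is why simply loading and unloading it is enough to switch Turbo Boost off and on.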

ARM Cortex-A8: How to measure cache utilization?

I have a Freescale i.MX515EVK, an ARM Cortex-A8/Ubuntu platform. Unfortunately, the Linux kernel on the board does not support some of the well-known profilers such as OProfile or Zoom Profiler (Zoom supports ARM processors, but internally it uses the OProfile driver) which give very detailed reports about cache utilization.
The Cortex-A8 has 32 KB instruction and data caches and a 256 KB L2 cache. Currently, when my image-processing algorithm is running, I'm completely blind to how they are being used.
Are there any methods, other than using profilers, to find out cache hits and misses?
Install Valgrind (it supports ARM nowadays) and use the cachegrind tool to check cache utilization. If you are running Ubuntu on the device, installing it should be as simple as sudo apt-get install valgrind. Valgrind can also help you simulate what would happen with different cache sizes.
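If you also want per-function numbers and a cache model that matches the A8, cachegrind lets you override the simulated cache geometry. Assuming the A8's 32 KB L1 caches are 4-way set associative with 64-byte lines and the 256 KB L2 is 8-way (check your chip's TRM), something like valgrind --tool=cachegrind --I1=32768,4,64 --D1=32768,4,64 --LL=262144,8,64 ./your_program should be close; running cg_annotate on the resulting cachegrind.out file then breaks the hits and misses down per function and per source line.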
