I'd like my Windows C++ program to be able to read the number of hard page faults it has caused. The program isn't running as administrator. Edited to add: To be clear, I'm not as interested in the aggregate page fault count of the whole system.
It looks like ETW might export counters for this, but I'm having a lot of difficulty figuring out the API, and it's not clear what's accessible by regular users as compared to administrators.
Does anyone have an example of this functionality lying around? Or is it simply not possible on Windows?
(OT, but isn't it sad how much easier this is on *nix? getrusage() and you're done.)
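For comparison, here is a minimal sketch of the getrusage() approach mentioned in the question, assuming a POSIX system such as Linux. The FaultCounts struct and read_fault_counts() are names made up for illustration; ru_minflt and ru_majflt are the standard rusage fields for soft and hard faults.

```cpp
#include <sys/resource.h>  // getrusage, struct rusage (POSIX)

// Hypothetical helper type: fault counts for the calling process.
struct FaultCounts {
    long minor;  // soft faults: resolved without disk I/O
    long major;  // hard faults: required disk I/O
};

// Returns the cumulative fault counts of the current process,
// or {-1, -1} if getrusage() fails.
FaultCounts read_fault_counts() {
    struct rusage usage = {};
    if (getrusage(RUSAGE_SELF, &usage) != 0) {
        return FaultCounts{-1, -1};
    }
    return FaultCounts{usage.ru_minflt, usage.ru_majflt};
}
```

Calling read_fault_counts() before and after a piece of work and subtracting the two results gives the faults caused by that work.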
As far as I can tell, the only way to do this would be to use ETW (Event Tracing for Windows) to monitor kernel Hard Page Faults. The event payload has a thread ID that you might be able to correlate with an existing process (this is going to be non-trivial, by the way) to produce a running per-process count. I don't see any way to get historical information per process.
I can guarantee you that this is A Hard Problem because Process Explorer supports only Page Faults (soft or hard) in its per-process display.
http://msdn.microsoft.com/en-us/magazine/ee412263.aspx
A page fault occurs when a sought-out page table entry is invalid. If the requested page needs to be brought in from disk, it is called a hard page fault (a very expensive operation), and all other types are considered soft page faults (a less expensive operation). A Page Fault event payload contains the virtual memory address for which a page fault happened and the instruction pointer that caused it. A hard page fault requires disk access to occur, which could be the first access to contents in a file or accesses to memory blocks that were paged out. Enabling Page Fault events causes a hard page fault to be logged as a page fault with a type Hard Page Fault. However, a hard fault typically has a considerably larger impact on performance, so a separate event is available just for a hard fault that can be enabled independently. A Hard Fault event payload has more data, such as file key, offset and thread ID, compared with a Page Fault event.
I think you can use GetProcessMemoryInfo() - Please refer to http://msdn.microsoft.com/en-us/library/ms683219(v=vs.85).aspx for more information.
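A rough sketch of that call follows. The wrapper name total_page_faults() is invented for illustration, and the non-Windows branch is only a stub so the snippet compiles elsewhere. Note that PageFaultCount in PROCESS_MEMORY_COUNTERS counts soft and hard faults together, so this does not isolate hard faults.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <psapi.h>  // GetProcessMemoryInfo; may need Psapi.lib on older SDKs
#endif

// Hypothetical wrapper: writes the cumulative page fault count
// (soft + hard combined) of the current process to `out`.
// Returns false if the call fails or the platform is not Windows.
bool total_page_faults(unsigned long& out) {
#ifdef _WIN32
    PROCESS_MEMORY_COUNTERS pmc = {};
    pmc.cb = sizeof(pmc);
    if (!GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc))) {
        return false;
    }
    out = pmc.PageFaultCount;
    return true;
#else
    (void)out;  // stub: nothing to query on this platform
    return false;
#endif
}
```

Querying the current process this way should not need administrator rights, since GetCurrentProcess() returns a pseudo-handle with full access to the caller's own process.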
Yes, quite sad. Or you could just not assume Windows is so gimp that it doesn't even provide a page fault counter and look it up: Win32_PerfFormattedData_PerfOS_Memory.
There is a C/C++ sample on Microsoft's site that explains how to read performance counters: INFO: PDH Sample Code to Enumerate Performance Counters and Instances
You can copy/paste it, and I think you're interested in the "Memory" / "Page Reads/sec" counters, as stated in this interesting article: The Basics of Page Faults
This is done with performance counters in Windows. It's been a while since I've done anything with them. I don't recall whether or not you need to run as administrator to query them.
[Edit]
I don't have example code to provide but according to this page, you can get this information for a particular process:
Process : Page Faults/sec. This is an indication of the number of page faults that occurred due to requests from this particular process. Excessive page faults from a particular process are an indication usually of bad coding practices. Either the functions and DLLs are not organized correctly, or the data set that the application is using is being called in a less than efficient manner.
I don't think you need administrative credentials to enumerate the performance counters. There is a sample on CodeProject: Performance Counters Enumerator
Related
I feel very clear on what happens with a segmentation fault and a major page fault, but I'm a bit more curious on the subtleties of minor page faults, and maybe an example would be with dynamically linked libraries. Wikipedia says, for instance:
If the page is loaded in memory at the time the fault is generated, but is not marked in the memory management unit as being loaded in memory, then it is called a minor or soft page fault. The page fault handler in the operating system merely needs to make the entry for that page in the memory management unit point to the page in memory and indicate that the page is loaded in memory; it does not need to read the page into memory. This could happen if the memory is shared by different programs and the page is already brought into memory for other programs.
The line, "The page fault handler in the operating system merely needs to make the entry for that page in the memory management unit point to the page in memory" confuses me. Each process has its own page table. So if I try to map in, say, libc, what's the process that the kernel goes through to figure out that it's already been mapped? How does it know that another process is using it or that there's already a frame associated with it? Does this happen with the page cache? I was reading a bit about it here, but some clarification on the steps that occur in the kernel to identify and resolve a minor page fault would be helpful.
Edit: It looks like a radix tree is used to keep track? Although I'm not quite sure I'm understanding this correctly.
At first, the kernel has no idea if the page is in memory or not. Presumably, the process does have a handle open to the file however, so the kernel performs an operation through the kernel-side file descriptor entry. This involves calling into the filesystem which, of course, knows what pages of the file are resident in memory since it's the code that would load the page were one needed.
I'm doing this as a personal project. I want to make a visualizer for this data, but the first step is getting the data.
My current plan is to:
make my program debug the target process and single-step through it
at each step, record the EIP from every thread's context within the target process
construct the memory address the instruction uses from the context and store it.
Is there an easier or built in way to do this?
Have a look at Intel PIN for dynamic binary instrumentation / running a hook for every load / store instruction.
Instead of actually single-stepping in a debugger (extremely slow), it does binary-to-binary JIT to add calls to your hooks.
https://software.intel.com/sites/landingpage/pintool/docs/81205/Pin/html/index.html
Honestly the best way to do this is probably instrumentation like Peter suggested, depending on your goals. Have you ever run a script that stepped through code in a debugger? Even automated it's incredibly slow. The only other alternative I see is page faults, which would also be incredibly slow but should still be faster than single step. Basically you make every page not in the currently executing section inaccessible. Any RW access outside of executing code will trigger an exception where you can log details and handle it. Of course this has a lot of flaws -- you can't detect RW in the current page, it's still going to be slow, it can get complicated such as handling page execution transfers, multiple threads, etc. The final possible solution I have would be to have a timer interrupt that checks RW access for each page. This would be incredibly fast and, although it would provide no specific addresses, it would give you an aggregate of pages written to and read from. I'm actually not entirely sure off the top of my head if Windows exposes that information already and I'm also not sure if there's a reliable way to guarantee your timers would get hit before the kernel clears those bits.
Is there a way, in Windows, to check if a page is in memory or on disk (swap space)?
The reason I want to know this is to avoid causing a page fault if the page is on disk, by not accessing that page.
There is no documented way that I am aware of for accomplishing this in user mode.
That said, it is possible to determine this in kernel mode, but this would involve inspecting the Page Table Entries, which belong to the Memory Manager - not something that you would really want to do in any sort of production code.
What is the real problem you're trying to solve?
The whole point of Virtual Memory is to abstract this sort of thing away. If you are storing your own data and in user-land, put it in a data-structure that supports caching and don't think about pages.
If you are writing code in kernel-space, I know in Linux you need to convert a memory address from a user-land one to a kernel-space one, then there are API calls in the VMM to get at the page_table_entry, and subsequently the page struct from the address. Once that is done, you use logical operators to check for flags, one of which is "swapped". If you are trying to make something fast though, traversing and messing with memory at the page level might not be the most efficient (or safest) thing to do.
More information is needed in order to provide a more complete answer.
I'm trying to write a toy working set estimator, by keeping track of page faults over a period of time. Whenever a page is faulted in, I want to record that it was touched. The scheme breaks down when I try to keep track of accesses to already-present pages. If a page is read from or written to without triggering a fault, I have no way of tracking the access.
So then, I want to be able to cause a "lightweight" fault to occur on a page access. I've heard of some method at some point, but I didn't understand why it worked so it didn't stick in my mind. Dirty bit maybe?
You can use mprotect with PROT_NONE ("Page cannot be accessed"). Then any access to the given page will cause a fault.
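A minimal sketch of this idea, assuming Linux and a single-threaded program: protect an anonymous mapping with PROT_NONE, catch the resulting SIGSEGV, record the touch, and re-enable access to just that page. The function count_touch_faults() and the globals are names invented for illustration. Calling mprotect from a signal handler is common practice for this trick but is not formally guaranteed to be async-signal-safe.

```cpp
#include <signal.h>    // sigaction, siginfo_t, SIGSEGV
#include <sys/mman.h>  // mmap, mprotect, munmap
#include <unistd.h>    // sysconf
#include <cstddef>

namespace {
volatile sig_atomic_t g_fault_count = 0;  // pages touched so far
char* g_region = nullptr;                 // start of the tracked region
std::size_t g_region_len = 0;
std::size_t g_page = 0;

// On a fault inside the tracked region: count it and unprotect that page,
// so the faulting instruction succeeds when it is restarted.
void on_segv(int, siginfo_t* info, void*) {
    char* addr = static_cast<char*>(info->si_addr);
    if (g_region && addr >= g_region && addr < g_region + g_region_len) {
        char* page_start = g_region + ((addr - g_region) / g_page) * g_page;
        mprotect(page_start, g_page, PROT_READ | PROT_WRITE);
        g_fault_count = g_fault_count + 1;
        return;
    }
    signal(SIGSEGV, SIG_DFL);  // not our region: let the default action run
}
}  // namespace

// Maps `pages` pages, protects them, touches each page twice, and returns
// how many "lightweight" faults were recorded (-1 on setup failure).
int count_touch_faults(std::size_t pages) {
    g_page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    g_region_len = pages * g_page;
    void* p = mmap(nullptr, g_region_len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    g_region = static_cast<char*>(p);

    struct sigaction sa = {};
    struct sigaction old = {};
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0 ||
        mprotect(g_region, g_region_len, PROT_NONE) != 0) {
        munmap(g_region, g_region_len);
        g_region = nullptr;
        return -1;
    }

    g_fault_count = 0;
    for (std::size_t i = 0; i < pages; ++i) {
        volatile char* q = g_region + i * g_page;
        *q = 1;  // first touch faults and is recorded
        *q = 2;  // second touch hits the now-accessible page, no fault
    }
    int result = g_fault_count;

    sigaction(SIGSEGV, &old, nullptr);  // restore the previous handler
    munmap(g_region, g_region_len);
    g_region = nullptr;
    return result;
}
```

For a working set estimator you would periodically re-apply PROT_NONE to the region, so that each sampling interval records which pages were touched again.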
The usual way to do this is to simply clear the "present" bit for the page, while leaving the page in memory and the necessary kernel data structures in place so that the kernel knows this.
However, depending on the architecture in question you may have better options - for example, on x86 there is an "Accessed" flag (bit 5 in the PTE) that is set whenever the PTE is used in a linear address translation. You can simply clear this bit whenever you like, and the hardware will set it to record that the page was touched.
Using either of these methods you will need to clear the cached translation for that page out of the TLB - on x86 you can use the INVLPG instruction.
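To make the bit positions concrete, here is a small sketch of the Accessed-flag approach. It operates on a plain integer standing in for an x86 PTE, not on a real page table; the constant and function names are made up for illustration. A real kernel would update the entry atomically and then invalidate the TLB entry (INVLPG) as described above.

```cpp
#include <cstdint>

// x86 PTE flag bits: Present is bit 0, Accessed is bit 5.
constexpr std::uint64_t kPtePresent  = 1ull << 0;
constexpr std::uint64_t kPteAccessed = 1ull << 5;

// Reports whether the page was touched since the last check and clears
// the Accessed flag so that the next touch can be detected again.
bool test_and_clear_accessed(std::uint64_t& pte) {
    const bool was_accessed = (pte & kPteAccessed) != 0;
    pte &= ~kPteAccessed;
    return was_accessed;
}

// The other method: clear Present so that the next access faults,
// while the rest of the entry (frame address, other flags) is kept.
void clear_present(std::uint64_t& pte) {
    pte &= ~kPtePresent;
}
```

The Accessed-flag variant avoids taking a fault at all, at the cost of only learning that a page was touched at some point between two scans.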
I'm trying to profile a system with XPerf.
I see that performance problems occur when there is activity in HardFaults.
But I can't figure out, or find on Google, what these Hard Faults that xperf shows are.
What are they related to?
What do they indicate?
Is there any universal remedy for such situations?
Hard faults table
Indeed.
"First of all, a "hard fault" was previously called a "page fault" in earlier versions of Windows. Perhaps page faults were more easily understood from the name, too. A hard fault happens when the address in memory of part of a program is no longer in main memory, but has been instead swapped out to the paging file, making the system go looking for it on the hard disk. When this happens a lot, it causes slowdowns and increased hard disk activity. When it happens an awful lot, the possibility of hard disk thrashing arises. That's when a program stops responding, but the hard drive continues to run for an extended period. This has historically been referred to as "getting into the page file."
Here is the article.
http://www.brighthub.com/computing/windows-platform/articles/52249.aspx
But be careful about following this article's suggestions, because doing so is not quite correct:
http://player.microsoftpdc.com/Session/1689962d-dea2-48bd-80d8-96e954fa5329
http://player.microsoftpdc.com/Session/1c97b279-d7e3-4a3e-9a76-0dac23dfddb5
A hard fault occurs when the requested process-private page or file-backed page is not in RAM. Hard faults occur for allocations that have been paged out, but also for accesses to data files and executable images.
The type of page determines where the data will be read from. Most hard faults are not for data from the page file, but for data files (your Word doc, for example).
I vaguely remember that a hard fault is when the requested virtual memory block is not in memory anymore and needs to be paged in from the swapfile.