What are Hard Faults in XPerf? - performance

I'm trying to profile a system with XPerf, and I see that performance problems occur whenever there is activity under Hard Faults!
But what I can't figure out, or find on Google, is what these Hard Faults that xperf shows actually are.
What are they related to?
What do they indicate?
Is there any universal remedy for such situations?
[Screenshot: Hard Faults table]

Indeed.
"First of all, a "hard fault" was previously called a "page fault" in earlier versions of Windows. Perhaps page faults were more easily understood from the name, too. A hard fault happens when the address in memory of part of a program is no longer in main memory, but has been instead swapped out to the paging file, making the system go looking for it on the hard disk. When this happens a lot, it causes slowdowns and increased hard disk activity. When it happens an awful lot, the possibility of hard disk thrashing arises. That's when a program stops responding, but the hard drive continues to run for an extended period. This has historically been referred to as "getting into the page file."
Here is the article.
http://www.brighthub.com/computing/windows-platform/articles/52249.aspx
But be careful with following this article's suggestions, because it is not quite correct to do so. See these PDC sessions:
http://player.microsoftpdc.com/Session/1689962d-dea2-48bd-80d8-96e954fa5329
http://player.microsoftpdc.com/Session/1c97b279-d7e3-4a3e-9a76-0dac23dfddb5

A hard fault is when the requested process-private page or file-backed page is not in RAM. Hard faults occur for allocations that have been paged out, but also for accesses to data files and executable images.
The type of page determines where the data will be read from. Most hard faults are not for data from the page file, but for data files (your Word doc, for example).

Vaguely, I remember that a hard fault is when the requested virtual memory block is no longer in memory and needs to be paged in from the swap file.

Related

How exactly do minor page faults get identified/resolved?

I feel very clear on what happens with a segmentation fault and a major page fault, but I'm a bit more curious about the subtleties of minor page faults; an example might be dynamically linked libraries. Wikipedia says, for instance:
If the page is loaded in memory at the time the fault is generated, but is not marked in the memory management unit as being loaded in memory, then it is called a minor or soft page fault. The page fault handler in the operating system merely needs to make the entry for that page in the memory management unit point to the page in memory and indicate that the page is loaded in memory; it does not need to read the page into memory. This could happen if the memory is shared by different programs and the page is already brought into memory for other programs.
The line, "The page fault handler in the operating system merely needs to make the entry for that page in the memory management unit point to the page in memory" confuses me. Each process has its own page table. So if I try to map in, say, libc, what's the process that the kernel goes through to figure out that it's already been mapped? How does it know that another process is using it or that there's already a frame associated with it? Does this happen with the page cache? I was reading a bit about it here, but I think some clarification would be nice on the steps that occur in the kernel to identify and resolve a minor page fault would be helpful.
Edit: it looks like a radix tree is used to keep track, although I'm not quite sure I'm understanding this correctly.
At first, the kernel has no idea whether the page is in memory or not. Presumably, however, the process does have a handle open to the file, so the kernel performs the lookup through the kernel-side file descriptor entry. This involves calling into the filesystem, which, of course, knows which pages of the file are resident in memory, since it is the code that would load a page were one needed.
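If it helps to see the two flavors from user space, here is a minimal sketch of my own (not from the answer above, and assuming a Linux system) that watches the minor/major fault counters getrusage() reports around the first touch of a mapped page; the library path is only an example and may differ on your machine:

    // Minimal sketch (Linux): watch minor/major fault counters around the
    // first touch of a mapped page. The file path is illustrative only.
    #include <sys/resource.h>
    #include <sys/mman.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <cstdio>

    static void print_faults(const char* label) {
        rusage ru;
        getrusage(RUSAGE_SELF, &ru);
        // ru_minflt: faults served without I/O; ru_majflt: faults that hit disk.
        std::printf("%s: minor=%ld major=%ld\n", label, ru.ru_minflt, ru.ru_majflt);
    }

    int main() {
        int fd = open("/usr/lib/libc.so.6", O_RDONLY);   // example path, adjust
        if (fd < 0) return 1;
        void* p = mmap(nullptr, 4096, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) return 1;

        print_faults("before touch");
        volatile char c = *static_cast<char*>(p);        // first access faults
        (void)c;
        print_faults("after touch");

        munmap(p, 4096);
        close(fd);
        return 0;
    }

If the page is already in the page cache (likely for a hot file like libc), the touch shows up as a minor fault; if it has to be read from disk, it shows up as a major fault.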

Why are memory-mapped files always mapped at page boundaries?

This is my first question here; I'm not sure if it is off-topic.
While self-studying, I have found the following statement regarding Operating Systems:
Operating systems that allow memory-mapped files always require files to be mapped at page boundaries. For example, with 4-KB page, a file can be mapped in starting at virtual address 4096, but not starting at virtual address 5000.
This statement is explained in the following way:
If a file could be mapped into the middle of a page, a single virtual page would need two partial pages on disk to map it. The first page, in particular, would be mapped onto a scratch page and also onto a file page. Handling a page fault for it would be a complex and expensive operation, requiring copying of data. Also, there would be no way to trap references to unused parts of pages. For these reasons, it is avoided.
I would like to ask for help to understand this answer. Particularly, what does it mean to say that "a single virtual page would need two partial pages on disk to map it"? From what I found about memory-mapped files, virtual pages are mapped to files on disk, and not to a paging file. Is this what is meant by "partial page"?
Also, what is meant by "scratch page" here? I've tried to look up this term on books (Tanenbaum's "Modern Operating Systems" and "Structured Computer Organization") and on the Web, but haven't found it.
First of all, when reading books and documentation, always try to look critically at what you see. Sometimes authors tend to use language like "there is no other way" just to promote the solution they are describing. Other ways are always possible.
Now to the matter. Modern operating systems always have a disk location for every allocated memory page. This makes sense: once it becomes necessary to evict a page from memory, it is already clear where to put the page if it is 'dirty', or it can simply be discarded if it is not modified. This strategy is widely accepted, although alternative policies are also possible.
The disk location can be either the paging file or a memory-mapped file. The most common use of memory-mapped files is executables and DLLs. They are (almost) never modified. If a page of code is not used for some time, discard it; if control comes back there, reread it from the file.
In the passage you quoted, they say it "would need two partial pages on disk to map it. The first page, in particular, would be mapped onto a scratch page." They present the situation as if there were only one solution. In fact, it is possible to allocate a page in the paging file for such a combined page and handle the appropriate data copying. It is also possible to have nothing in the paging file for such a page and to assemble the page from the files using a transient page: in 99% of cases the disk controller can read/write only from/to a page boundary, which means you would read from the first file into the memory page and from the second file into the transient page, then copy the data out of the transient page and immediately discard it.
As you see, it is perfectly possible to combine several files in one page; there is no fundamental problem here, although the algorithms for handling this solution would be more complex and would consume more CPU cycles. Reconstructing such a page (if it is discarded) would require reading from several different files. These days 4 KB is a rather small quantity, and saving 2 KB is not a huge gain. In my opinion, weighing the benefits against the cost, the benefits are not significant enough.
Virtual address pages (on every machine I've ever heard of) are aligned on page-sized boundaries. This is simply because it makes the math very easy. On x86, the page size is 4096 bytes; that is exactly 12 bits. To figure out which virtual page an address refers to, you simply shift the address right by 12. If you were to map a disk block (assume 4096 bytes) at address 5000, it would start on page #1 (5000 >> 12 == 1) and end on page #2 (9095 >> 12 == 2).
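To make that arithmetic concrete, here is a small sketch of my own (assuming the same 4096-byte pages) that derives the page number and offset with shifts and masks:

    // Sketch: page number / offset arithmetic for 4096-byte (1 << 12) pages.
    #include <cstdint>
    #include <cstdio>

    constexpr std::uint64_t kPageShift = 12;               // 4096 == 1 << 12
    constexpr std::uint64_t kPageMask  = (1ull << kPageShift) - 1;

    int main() {
        std::uint64_t addr = 5000;                         // unaligned start
        std::uint64_t last = addr + 4096 - 1;              // last byte: 9095

        std::printf("start page %llu, offset %llu\n",
                    (unsigned long long)(addr >> kPageShift),  // 5000 >> 12 == 1
                    (unsigned long long)(addr & kPageMask));   // 904 bytes into page 1
        std::printf("end page %llu\n",
                    (unsigned long long)(last >> kPageShift)); // 9095 >> 12 == 2
        return 0;
    }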
Memory-mapped files work by mapping a chunk of virtual address space to the file, but the data is loaded on demand (indeed, the file could be far larger than physical memory and may not fit). When you first access the virtual address and the data isn't there (i.e. it's not in physical memory), the processor will fault and the OS has to fetch the data. When you fetch the data, you need to fetch all of the data for the page, or else you wouldn't be able to turn off the fault. If you don't have the addresses aligned, then you'd have to bring in multiple disk blocks to fill the page. You can certainly do this; it's just messy and inefficient.
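For what it's worth, the same alignment rule is visible in the Windows API: MapViewOfFile only accepts view offsets that are a multiple of the system allocation granularity. A hedged sketch (error handling trimmed; "input.dat" is a made-up file name):

    // Sketch (Windows): view offsets must be multiples of the allocation
    // granularity (itself a multiple of the page size). "input.dat" is made up.
    #include <windows.h>
    #include <cstdio>

    int main() {
        SYSTEM_INFO si;
        GetSystemInfo(&si);
        std::printf("page size: %lu, allocation granularity: %lu\n",
                    si.dwPageSize, si.dwAllocationGranularity);

        HANDLE file = CreateFileA("input.dat", GENERIC_READ, FILE_SHARE_READ,
                                  nullptr, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
        if (file == INVALID_HANDLE_VALUE) return 1;
        HANDLE mapping = CreateFileMappingA(file, nullptr, PAGE_READONLY, 0, 0, nullptr);
        if (!mapping) { CloseHandle(file); return 1; }

        // Offset 0 is valid; an offset of 5000 would make MapViewOfFile fail
        // because it is not a multiple of si.dwAllocationGranularity.
        void* view = MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0);
        if (view) UnmapViewOfFile(view);

        CloseHandle(mapping);
        CloseHandle(file);
        return 0;
    }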

How do you see the specific methods responsible for memory allocation in Xcode Instruments?

I've been asked to try to reduce memory usage in an app's code I have been given. The app runs fine in the simulator, but on the device it is terminated or something: when debugging, it enters a 'Paused' state and the app closes on the device.
When running Instruments I discovered leaks and fixed them; however, there is a large amount of allocation going on. Within a few seconds of launch, the Instruments allocation trace shows 1,021 KB for 'Malloc 16 Bytes'. This is essentially useless information as-is; I need to see where the memory is being allocated, but I can't seem to find anything useful. All I can get from a deeper inspection is that 'dyld', 'libsystem_c.dylib', 'libCGFreetype.A.dylib', etc. are allocating a lot, but the responsible caller is never a recognizable method from the app source.
How can I see which methods are causing the most allocations here? I need to get this usage down! Thank you.
Opening the extended detail view will show the call stack for memory allocations. Choose View > Extended Detail to open the extended detail view.
Switching to the call tree view will help you find where you are allocating memory in your code. Use the jump bar to switch to the call tree view.
1 MB is no big deal. You can't do much in terms of putting up a full view without using 1 MB.
There's a good video from WWDC 2010 (http://developer.apple.com/videos/wwdc/2010/) that covers using instruments to analyze memory use. Title is Advanced Memory Analysis with Instruments. There may be an updated video from 2011.

Windows Memory Workings - page tables, and data

I was trying to understand the following:
I know that page tables are built by the virtual memory manager at some point for translation between virtual memory and physical memory. Since there are many processes running on a system, even though only one process is active at a time, I was wondering whether page tables for inactive processes are moved to the page file at any point in time. Given the fact that the lower 2 GB area is reserved for Windows, it would make sense that Windows would keep page tables for all processes on the system, although it would also make sense that they are moved to the page file when the current process is switched out.
The same goes for writable (data) pages. Will Windows keep all the data pages for all processes in memory, or move them to the page file at some point? On my machine, Task Manager says 1.5 GB of RAM is being utilized out of 3 GB, and 1.5 GB is system cache in the Performance tab, so my understanding is that data stays in physical memory for all applications. But would there be a time when it needs to be moved to the paging file?
I was wondering whether page tables for inactive processes are moved to the page file at any point in time?
Yes, page tables are pageable.
Will Windows keep all the data pages for all processes in memory, or move them to the page file at some point?
As far as the Windows paging policy is concerned, there are two kinds of memory: pageable and non-pageable. It doesn't really matter which process it belongs to, or even whether it belongs to the O/S itself; if it's pageable, then it's subject to being paged out. So, yes, Windows will page out process data pages if necessary.
I suggest reading the memory management chapter in the Windows Internals book, it should cover all of this.
You are actually asking two questions here:
1. What's the paging policy regarding page tables?
2. What's the paging policy for "writable data" pages (i.e. virtual memory with R/W permissions)?
First I'll correct you a little.
Given the fact that the lower 2 GB area is reserved for Windows, it would make sense that Windows would keep page tables for all processes on the system
To be exact, it's the upper 2 GB that is reserved for Windows; more correctly, that range may be accessed in kernel mode only, by the Windows kernel and drivers.
Now, this may surprise you, but kernel memory may be pageable too! So technically it is not important at all which portion of the 32-bit address space is visible in user or kernel mode; it's not related to paging.
Another correction: virtual memory may be in physical memory and saved to the page file at the same time. There's a common belief that the OS frees physical storage by saving pages to the page file on demand. Wrong.
Actually, Windows saves memory pages to the page file before they need to be freed. In fact, it dumps all memory pages to the page file (besides those that are backed by other files, such as mapped sections) in the background. There are two reasons for this:
During high load the OS can free memory pages more quickly (since they're already saved).
In kernel mode, paging is not always possible. Drivers that run at high IRQL (i.e. serve the most time-critical events) may not access the physical storage drivers, hence paging is not possible.
So, the answers to your questions are:
1. I don't know for sure; it depends on the OS implementation details. I see no reason why a per-process page table may not be paged out: it's needed during a context switch and when modifying the process's virtual memory, and neither situation belongs to the time-critical events.
2. Definitely, "writable data" memory pages are saved to the page file. Are they removed from physical memory? On demand only, under memory load, in least-recently-used order.

Programmatically read a program's page fault count on Windows

I'd like my Windows C++ program to be able to read the number of hard page faults it has caused. The program isn't running as administrator. Edited to add: to be clear, I'm not as interested in the aggregate page fault count of the whole system.
It looks like ETW might export counters for this, but I'm having a lot of difficulty figuring out the API, and it's not clear what's accessible by regular users as compared to administrators.
Does anyone have an example of this functionality lying around? Or is it simply not possible on Windows?
(OT, but isn't it sad how much easier this is on *nix? getrusage() and you're done.)
As far as I can tell, the only way to do this would be to use ETW (Event Tracing for Windows) to monitor kernel Hard Page Faults. The event payload has a thread ID that you might be able to correlate with an existing process (this is going to be non-trivial, btw) to produce a running per-process count. I don't see any way to get historical information per process.
I can guarantee you that this is A Hard Problem, because Process Explorer supports only Page Faults (soft or hard) in its per-process display.
http://msdn.microsoft.com/en-us/magazine/ee412263.aspx
A page fault occurs when a sought-out page table entry is invalid. If the requested page needs to be brought in from disk, it is called a hard page fault (a very expensive operation), and all other types are considered soft page faults (a less expensive operation). A Page Fault event payload contains the virtual memory address for which a page fault happened and the instruction pointer that caused it. A hard page fault requires disk access to occur, which could be the first access to contents in a file or accesses to memory blocks that were paged out. Enabling Page Fault events causes a hard page fault to be logged as a page fault with a type Hard Page Fault. However, a hard fault typically has a considerably larger impact on performance, so a separate event is available just for a hard fault that can be enabled independently. A Hard Fault event payload has more data, such as file key, offset and thread ID, compared with a Page Fault event.
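For the ETW route, here is a hedged sketch of just the first step: starting the NT Kernel Logger with hard-fault events enabled. This requires elevation, and consuming the events (OpenTrace/ProcessTrace) and correlating thread IDs to your process, as the answer above notes, is the hard part; that portion is omitted here.

    // Hedged sketch (Windows, needs elevation): start the NT Kernel Logger
    // with hard-fault events enabled. Link with advapi32.lib.
    #define INITGUID            // so SystemTraceControlGuid gets defined here
    #include <windows.h>
    #include <wmistr.h>
    #include <evntrace.h>
    #include <cstdio>
    #include <vector>

    int main() {
        // EVENT_TRACE_PROPERTIES must be followed by room for the session name.
        std::vector<char> buf(sizeof(EVENT_TRACE_PROPERTIES) + sizeof(KERNEL_LOGGER_NAMEA), 0);
        auto* props = reinterpret_cast<EVENT_TRACE_PROPERTIES*>(buf.data());
        props->Wnode.BufferSize = static_cast<ULONG>(buf.size());
        props->Wnode.Guid = SystemTraceControlGuid;      // mandatory for the kernel logger
        props->Wnode.Flags = WNODE_FLAG_TRACED_GUID;
        props->Wnode.ClientContext = 1;                  // QPC timestamps
        props->EnableFlags = EVENT_TRACE_FLAG_MEMORY_HARD_FAULTS;
        props->LogFileMode = EVENT_TRACE_REAL_TIME_MODE; // deliver live, no log file
        props->LoggerNameOffset = sizeof(EVENT_TRACE_PROPERTIES);

        TRACEHANDLE session = 0;
        ULONG rc = StartTraceA(&session, KERNEL_LOGGER_NAMEA, props);
        if (rc != ERROR_SUCCESS) {
            std::printf("StartTrace failed: %lu\n", rc);  // 5 == access denied
            return 1;
        }
        std::printf("Hard-fault events are now being traced.\n");
        // ...OpenTrace/ProcessTrace would go here, filtering hard-fault events
        // by the thread IDs that belong to this process...
        ControlTraceA(session, KERNEL_LOGGER_NAMEA, props, EVENT_TRACE_CONTROL_STOP);
        return 0;
    }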
I think you can use GetProcessMemoryInfo(). Please refer to http://msdn.microsoft.com/en-us/library/ms683219(v=vs.85).aspx for more information.
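A minimal sketch of that call; note that PROCESS_MEMORY_COUNTERS.PageFaultCount reports the total of soft and hard faults, so it only approximates what the question asks for:

    // Sketch: read this process's page fault count via psapi.
    // Link with psapi.lib. PageFaultCount mixes soft and hard faults.
    #include <windows.h>
    #include <psapi.h>
    #include <cstdio>

    int main() {
        PROCESS_MEMORY_COUNTERS pmc = {};
        pmc.cb = sizeof(pmc);
        if (GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc))) {
            std::printf("page faults so far: %lu\n", pmc.PageFaultCount);
        }
        return 0;
    }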
Yes, quite sad. Or you could just not assume Windows is so limited that it doesn't even provide a page fault counter, and look it up: Win32_PerfFormattedData_PerfOS_Memory.
There is a C/C++ sample on Microsoft's site that explain how to read performance counters: INFO: PDH Sample Code to Enumerate Performance Counters and Instances
You can copy/paste it, and I think you're interested in the "Memory" / "Page Reads/sec" counters, as stated in this interesting article: The Basics of Page Faults
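For the counter route, a hedged PDH sketch along the lines of that sample (system-wide, not per-process; the counter path is the one the article points at):

    // Sketch: sample the system-wide "\Memory\Page Reads/sec" counter with PDH.
    // Link with pdh.lib. Page Reads/sec roughly tracks hard faults resolved from disk.
    #include <windows.h>
    #include <pdh.h>
    #include <cstdio>

    int main() {
        PDH_HQUERY query = nullptr;
        PDH_HCOUNTER counter = nullptr;
        if (PdhOpenQueryA(nullptr, 0, &query) != ERROR_SUCCESS) return 1;
        if (PdhAddEnglishCounterA(query, "\\Memory\\Page Reads/sec", 0, &counter)
                != ERROR_SUCCESS) return 1;

        PdhCollectQueryData(query);       // rate counters need two samples
        Sleep(1000);
        PdhCollectQueryData(query);

        PDH_FMT_COUNTERVALUE value;
        if (PdhGetFormattedCounterValue(counter, PDH_FMT_DOUBLE, nullptr, &value)
                == ERROR_SUCCESS) {
            std::printf("Page Reads/sec: %.2f\n", value.doubleValue);
        }
        PdhCloseQuery(query);
        return 0;
    }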
This is done with performance counters in Windows. It's been a while since I've done anything with them; I don't recall whether or not you need to run as administrator to query them.
[Edit]
I don't have example code to provide, but according to this page, you can get this information for a particular process:
Process: Page Faults/sec. This is an indication of the number of page faults that occurred due to requests from this particular process. Excessive page faults from a particular process are an indication usually of bad coding practices. Either the functions and DLLs are not organized correctly, or the data set that the application is using is being called in a less than efficient manner.
I don't think you need administrative credentials to enumerate the performance counters. There is a sample at CodeProject: Performance Counters Enumerator.
