While implementing a stack walker for a debugger I am working on I reached the point to extract the arguments to a function call and display them. To make it simple I started with the cdecl convention in pure 32-bit (both debugger and debuggee), and a function that takes 3 parameters. However, I cannot understand why the arguments in the stack trace are out of order compared to what cdecl defines (right-to-left, nothing in registers), despite trying to figure it out for a few days now.
Here is a representation of the function call I am trying to stack trace:
void Function(unsigned long long a, const void * b, unsigned int c) {
printf("a=0x%llX, b=%p, c=0x%X\n", a, b, c);
_asm { int 3 }; /* Because I don't have stepping or dynamic breakpoints implemented yet */
}
int main(int argc, char* argv[]) {
Function(2, (void*)0x7A3FE8, 0x2004);
return 0;
}
This is what the function (unsurprisingly) printed to the console:
a=0x2, c=0x7a3fe8, c=0x2004
This is the stack trace generated at the breakpoint (the debugger catches the breakpoint and there I try to walk the stack):
0x3EF5E0: 0x10004286 /* previous pc */
0x3EF5DC: 0x3EF60C /* previous fp */
0x3EF5D8: 0x7A3FE8 /* arg b --> Wait... why is b _above_ c here? */
0x3EF5D4: 0x2004 /* arg c */
0x3EF5D0: 0x0 /* arg a, upper 32 bit */
0x3EF5CC: 0x2 /* arg a, lower 32 bit */
The code that's responsible for dumping the stack frames (implemented using the DIA SDK, though, I don't think that is relevant to my problem) looks like this:
ULONGLONG stackframe_top = 0;
m_frame->get_base(&stackframe_top); /* IDiaStackFrame */
/* dump 30 * 4 bytes */
for (DWORD i = 0; i < 30; i++)
{
ULONGLONG address = stackframe_top - (i * 4);
DWORD value;
SIZE_T read_bytes;
if (ReadProcessMemory(m_process, reinterpret_cast<LPVOID>(address), &value, sizeof(value), &read_bytes) == TRUE)
{
debugprintf(L"0x%llX: 0x%X\n", address, value); /* wrapper around OutputDebugString */
}
}
I am compiling the test program without any optimization in vs2015 update 3.
I have validated that I am indeed compiling it as cdecl by looking in the pdb with the dia2dump sample application.
I do not understand what is causing the stack to look like this, it doesn't match anything I learned, nor does it match the documentation provided by Microsoft.
I also checked google a whole lot (including osdev wiki pages, msdn blog posts, and so on), and checked my (by now probably outdated) books on 32-bit x86 assembly programming (that were released before 64-bit CPUs existed).
Thank you very much in advance for any explanations or links!
I had somehow misunderstood where the arguments to a function call end up in memory compared to the base of the stack frame, as pointed out by Raymond. This is the fixed code snippet:
ULONGLONG stackframe_top = 0;
m_frame->get_base(&stackframe_top); /* IDiaStackFrame */
/* dump 30 * 4 bytes */
for (DWORD i = 0; i < 30; i++)
{
ULONGLONG address = stackframe_top + (i * 4); /* <-- Read before the stack frame */
DWORD value;
SIZE_T read_bytes;
if (ReadProcessMemory(m_process, reinterpret_cast<LPVOID>(address), &value, sizeof(value), &read_bytes) == TRUE)
{
debugprintf(L"0x%llX: 0x%X\n", address, value); /* wrapper around OutputDebugString */
}
}
Related
So my problem sounds like this.
I have some platform dependent code (embedded system) which writes to some MMIO locations that are hardcoded at specific addresses.
I compile this code with some management code inside a standard executable (mainly for testing) but also for simulation (because it takes longer to find basic bugs inside the actual HW platform).
To alleviate the hardcoded pointers, i just redefine them to some variables inside the memory pool. And this works really well.
The problem is that there is specific hardware behavior on some of the MMIO locations (w1c for example) which makes "correct" testing hard to impossible.
These are the solutions i thought of:
1 - Somehow redefine the accesses to those registers and try to insert some immediate function to simulate the dynamic behavior. This is not really usable since there are various ways to write to the MMIO locations (pointers and stuff).
2 - Somehow leave the addresses hardcoded and trap the illegal access through a seg fault, find the location that triggered, extract exactly where the access was made, handle and return. I am not really sure how this would work (and even if it's possible).
3 - Use some sort of emulation. This will surely work, but it will void the whole purpose of running fast and native on a standard computer.
4 - Virtualization ?? Probably will take a lot of time to implement. Not really sure if the gain is justifiable.
Does anyone have any idea if this can be accomplished without going too deep? Maybe is there a way to manipulate the compiler in some way to define a memory area for which every access will generate a callback. Not really an expert in x86/gcc stuff.
Edit: It seems that it's not really possible to do this in a platform independent way, and since it will be only windows, i will use the available API (which seems to work as expected). Found this Q here:
Is set single step trap available on win 7?
I will put the whole "simulated" register file inside a number of pages, guard them, and trigger a callback from which i will extract all the necessary info, do my stuff then continue execution.
Thanks all for responding.
I think #2 is the best approach. I routinely use approach #4, but I use it to test code that is running in the kernel, so I need a layer below the kernel to trap and emulate the accesses. Since you have already put your code into a user-mode application, #2 should be simpler.
The answers to this question may provide help in implementing #2. How to write a signal handler to catch SIGSEGV?
What you really want to do, though, is to emulate the memory access and then have the segv handler return to the instruction after the access. This sample code works on Linux. I'm not sure if the behavior it is taking advantage of is undefined, though.
#include <stdint.h>
#include <stdio.h>
#include <signal.h>
#define REG_ADDR ((volatile uint32_t *)0x12340000f000ULL)
static uint32_t read_reg(volatile uint32_t *reg_addr)
{
uint32_t r;
asm("mov (%1), %0" : "=a"(r) : "r"(reg_addr));
return r;
}
static void segv_handler(int, siginfo_t *, void *);
int main()
{
struct sigaction action = { 0, };
action.sa_sigaction = segv_handler;
action.sa_flags = SA_SIGINFO;
sigaction(SIGSEGV, &action, NULL);
// force sigsegv
uint32_t a = read_reg(REG_ADDR);
printf("after segv, a = %d\n", a);
return 0;
}
static void segv_handler(int, siginfo_t *info, void *ucontext_arg)
{
ucontext_t *ucontext = static_cast<ucontext_t *>(ucontext_arg);
ucontext->uc_mcontext.gregs[REG_RAX] = 1234;
ucontext->uc_mcontext.gregs[REG_RIP] += 2;
}
The code to read the register is written in assembly to ensure that both the destination register and the length of the instruction are known.
This is how the Windows version of prl's answer could look like:
#include <stdint.h>
#include <stdio.h>
#include <windows.h>
#define REG_ADDR ((volatile uint32_t *)0x12340000f000ULL)
static uint32_t read_reg(volatile uint32_t *reg_addr)
{
uint32_t r;
asm("mov (%1), %0" : "=a"(r) : "r"(reg_addr));
return r;
}
static LONG WINAPI segv_handler(EXCEPTION_POINTERS *);
int main()
{
SetUnhandledExceptionFilter(segv_handler);
// force sigsegv
uint32_t a = read_reg(REG_ADDR);
printf("after segv, a = %d\n", a);
return 0;
}
static LONG WINAPI segv_handler(EXCEPTION_POINTERS *ep)
{
// only handle read access violation of REG_ADDR
if (ep->ExceptionRecord->ExceptionCode != EXCEPTION_ACCESS_VIOLATION ||
ep->ExceptionRecord->ExceptionInformation[0] != 0 ||
ep->ExceptionRecord->ExceptionInformation[1] != (ULONG_PTR)REG_ADDR)
return EXCEPTION_CONTINUE_SEARCH;
ep->ContextRecord->Rax = 1234;
ep->ContextRecord->Rip += 2;
return EXCEPTION_CONTINUE_EXECUTION;
}
So, the solution (code snippet) is as follows:
First of all, i have a variable:
__attribute__ ((aligned (4096))) int g_test;
Second, inside my main function, i do the following:
AddVectoredExceptionHandler(1, VectoredHandler);
DWORD old;
VirtualProtect(&g_test, 4096, PAGE_READWRITE | PAGE_GUARD, &old);
The handler looks like this:
LONG WINAPI VectoredHandler(struct _EXCEPTION_POINTERS *ExceptionInfo)
{
static DWORD last_addr;
if (ExceptionInfo->ExceptionRecord->ExceptionCode == STATUS_GUARD_PAGE_VIOLATION) {
last_addr = ExceptionInfo->ExceptionRecord->ExceptionInformation[1];
ExceptionInfo->ContextRecord->EFlags |= 0x100; /* Single step to trigger the next one */
return EXCEPTION_CONTINUE_EXECUTION;
}
if (ExceptionInfo->ExceptionRecord->ExceptionCode == STATUS_SINGLE_STEP) {
DWORD old;
VirtualProtect((PVOID)(last_addr & ~PAGE_MASK), 4096, PAGE_READWRITE | PAGE_GUARD, &old);
return EXCEPTION_CONTINUE_EXECUTION;
}
return EXCEPTION_CONTINUE_SEARCH;
}
This is only a basic skeleton for the functionality. Basically I guard the page on which the variable resides, i have some linked lists in which i hold pointers to the function and values for the address in question. I check that the fault generating address is inside my list then i trigger the callback.
On first guard hit, the page protection will be disabled by the system, but i can call my PRE_WRITE callback where i can save the variable state. Because a single step is issued through the EFlags, it will be followed immediately by a single step exception (which means that the variable was written), and i can trigger a WRITE callback. All the data required for the operation is contained inside the ExceptionInformation array.
When someone tries to write to that variable:
*(int *)&g_test = 1;
A PRE_WRITE followed by a WRITE will be triggered,
When i do:
int x = *(int *)&g_test;
A READ will be issued.
In this way i can manipulate the data flow in a way that does not require modifications of the original source code.
Note: This is intended to be used as part of a test framework and any penalty hit is deemed acceptable.
For example, W1C (Write 1 to clear) operation can be accomplished:
void MYREG_hook(reg_cbk_t type)
{
/** We need to save the pre-write state
* This is safe since we are assured to be called with
* both PRE_WRITE and WRITE in the correct order
*/
static int pre;
switch (type) {
case REG_READ: /* Called pre-read */
break;
case REG_PRE_WRITE: /* Called pre-write */
pre = g_test;
break;
case REG_WRITE: /* Called after write */
g_test = pre & ~g_test; /* W1C */
break;
default:
break;
}
}
This was possible also with seg-faults on illegal addresses, but i had to issue one for each R/W, and keep track of a "virtual register file" so a bigger penalty hit. In this way i can only guard specific areas of memory or none, depending on the registered monitors.
I compile with GCC 5.3 2016q1 for STM32 microcontroller.
Right at the beginning of main I placed a small routine to fill stack with a pattern. Later I search the highest address that still holds this pattern to find out about stack usage, you surely know this. Here is my routine:
uint32_t* Stack_ptr = 0;
uint32_t Stack_bot;
uint32_t n = 0;
asm volatile ("str sp, [%0]" :: "r" (&Stack_ptr));
Stack_bot = (uint32_t)(&_estack - &_Min_Stack_Size);
//*
n = 0;
while ((uint32_t)(Stack_ptr) > Stack_bot)
{
Stack_ptr--;
n++;
*Stack_ptr = 0xAA55A55A;
} // */
After that I initialize hardware, also a UART and print out values of Stack_ptr, Stack_bot and n and then stack contents.
The results are 0x20007FD8 0x20007C00 0
Stack_bot is the expected value because I have 0x400 Bytes in 32k RAM starting at 0x20000000. But I would expect Stack_ptr to be 0x20008000 and n somewhat under 0x400 after the loop is finished. Also stack contents shows no entries of 0xAA55A55A. This means the loop is not executed.
I could only manage to get it executed by creating a small function that holds the above routine and disable optimization for this function.
Anybody knows why that is? And the strangest thing about it is that I could swear it worked a few days ago. I saw a lot of 0xAA55A55A in the stack dump.
Thanks a lot
Martin
Probably problem is with the assembler function, In my code I use this:
// defined by linker script, pointing to end of static allocation
extern unsigned _heap_start;
void fill_heap(unsigned fill=0x55555555) {
unsigned *dst = &_heap_start;
register unsigned *msp_reg;
__asm__("mrs %0, msp\n" : "=r" (msp_reg) );
while (dst < msp_reg) {
*dst++ = fill;
}
}
it will fill memory between _heap_start and current SP.
Why does host_statistics64() in OS X 10.6.8 (I don't know if other versions have this problem) return counts for free, active, inactive, and wired memory that don't add up to the total amount of ram? And why is it missing an inconsistent number of pages?
The following output represents the number of pages not classified as free, active, inactive, or wired over ten seconds (sampled roughly once per second).
458
243
153
199
357
140
304
93
181
224
The code that produces the numbers above is:
#include <stdio.h>
#include <mach/mach.h>
#include <mach/vm_statistics.h>
#include <sys/types.h>
#include <sys/sysctl.h>
#include <unistd.h>
#include <string.h>
int main(int argc, char** argv) {
struct vm_statistics64 stats;
mach_port_t host = mach_host_self();
natural_t count = HOST_VM_INFO64_COUNT;
natural_t missing = 0;
int debug = argc == 2 ? !strcmp(argv[1], "-v") : 0;
kern_return_t ret;
int mib[2];
long ram;
natural_t pages;
size_t length;
int i;
mib[0] = CTL_HW;
mib[1] = HW_MEMSIZE;
length = sizeof(long);
sysctl(mib, 2, &ram, &length, NULL, 0);
pages = ram / getpagesize();
for (i = 0; i < 10; i++) {
if ((ret = host_statistics64(host, HOST_VM_INFO64, (host_info64_t)&stats, &count)) != KERN_SUCCESS) {
printf("oops\n");
return 1;
}
/* updated for 10.9 */
missing = pages - (
stats.free_count +
stats.active_count +
stats.inactive_count +
stats.wire_count +
stats.compressor_page_count
);
if (debug) {
printf(
"%11d pages (# of pages)\n"
"%11d free_count (# of pages free) \n"
"%11d active_count (# of pages active) \n"
"%11d inactive_count (# of pages inactive) \n"
"%11d wire_count (# of pages wired down) \n"
"%11lld zero_fill_count (# of zero fill pages) \n"
"%11lld reactivations (# of pages reactivated) \n"
"%11lld pageins (# of pageins) \n"
"%11lld pageouts (# of pageouts) \n"
"%11lld faults (# of faults) \n"
"%11lld cow_faults (# of copy-on-writes) \n"
"%11lld lookups (object cache lookups) \n"
"%11lld hits (object cache hits) \n"
"%11lld purges (# of pages purged) \n"
"%11d purgeable_count (# of pages purgeable) \n"
"%11d speculative_count (# of pages speculative (also counted in free_count)) \n"
"%11lld decompressions (# of pages decompressed) \n"
"%11lld compressions (# of pages compressed) \n"
"%11lld swapins (# of pages swapped in (via compression segments)) \n"
"%11lld swapouts (# of pages swapped out (via compression segments)) \n"
"%11d compressor_page_count (# of pages used by the compressed pager to hold all the compressed data) \n"
"%11d throttled_count (# of pages throttled) \n"
"%11d external_page_count (# of pages that are file-backed (non-swap)) \n"
"%11d internal_page_count (# of pages that are anonymous) \n"
"%11lld total_uncompressed_pages_in_compressor (# of pages (uncompressed) held within the compressor.) \n",
pages, stats.free_count, stats.active_count, stats.inactive_count,
stats.wire_count, stats.zero_fill_count, stats.reactivations,
stats.pageins, stats.pageouts, stats.faults, stats.cow_faults,
stats.lookups, stats.hits, stats.purges, stats.purgeable_count,
stats.speculative_count, stats.decompressions, stats.compressions,
stats.swapins, stats.swapouts, stats.compressor_page_count,
stats.throttled_count, stats.external_page_count,
stats.internal_page_count, stats.total_uncompressed_pages_in_compressor
);
}
printf("%i\n", missing);
sleep(1);
}
return 0;
}
TL;DR:
host_statistics64() get information from different sources which might cost time and could produce inconsistent results.
host_statistics64() gets some information by variables with names like vm_page_foo_count. But not all of these variables are taken into account, e.g. vm_page_stolen_count is not.
The well known /usr/bin/top adds stolen pages to the number of wired pages. This is an indicator that these pages should be taken into account when counting pages.
Notes
I'm working on a macOS 10.12 with Darwin Kernel Version 16.5.0 xnu-3789.51.2~3/RELEASE_X86_64 x86_64 but all behaviour is completly reproducable.
I'm going to link a lot a source code of the XNU Version I use on my machine. It can be found here: xnu-3789.51.2.
The program you have written is basically the same as /usr/bin/vm_stat which is just a wrapper for host_statistics64() (and host_statistics()). The corressponding source code can be found here: system_cmds-496/vm_stat.tproj/vm_stat.c.
How does host_statistics64() fit into XNU and how does it work?
As widley know the OS X kernel is called XNU (XNU IS NOT UNIX) and "is a hybrid kernel combining the Mach kernel developed at Carnegie Mellon University with components from FreeBSD and C++ API for writing drivers called IOKit." (https://github.com/opensource-apple/xnu/blob/10.12/README.md)
The virtual memory management (VM) is part of Mach therefore host_statistics64() is located here. Let's have a closer look at the its implementation which is contained in xnu-3789.51.2/osfmk/kern/host.c.
The function signature is
kern_return_t
host_statistics64(host_t host, host_flavor_t flavor, host_info64_t info, mach_msg_type_number_t * count);
The first relevant lines are
[...]
processor_t processor;
vm_statistics64_t stat;
vm_statistics64_data_t host_vm_stat;
mach_msg_type_number_t original_count;
unsigned int local_q_internal_count;
unsigned int local_q_external_count;
[...]
processor = processor_list;
stat = &PROCESSOR_DATA(processor, vm_stat);
host_vm_stat = *stat;
if (processor_count > 1) {
simple_lock(&processor_list_lock);
while ((processor = processor->processor_list) != NULL) {
stat = &PROCESSOR_DATA(processor, vm_stat);
host_vm_stat.zero_fill_count += stat->zero_fill_count;
host_vm_stat.reactivations += stat->reactivations;
host_vm_stat.pageins += stat->pageins;
host_vm_stat.pageouts += stat->pageouts;
host_vm_stat.faults += stat->faults;
host_vm_stat.cow_faults += stat->cow_faults;
host_vm_stat.lookups += stat->lookups;
host_vm_stat.hits += stat->hits;
host_vm_stat.compressions += stat->compressions;
host_vm_stat.decompressions += stat->decompressions;
host_vm_stat.swapins += stat->swapins;
host_vm_stat.swapouts += stat->swapouts;
}
simple_unlock(&processor_list_lock);
}
[...]
We get host_vm_stat which is of type vm_statistics64_data_t. This is just a typedef struct vm_statistics64 as you can see in xnu-3789.51.2/osfmk/mach/vm_statistics.h. And we get processor information from the makro PROCESSOR_DATA() defined in xnu-3789.51.2/osfmk/kern/processor_data.h. We fill host_vm_stat while looping through all of our processors by simply adding up the relevant numbers.
As you can see we find some well known stats like zero_fill_count or compressions but not all covered by host_statistics64().
The next relevant lines are:
stat = (vm_statistics64_t)info;
stat->free_count = vm_page_free_count + vm_page_speculative_count;
stat->active_count = vm_page_active_count;
[...]
stat->inactive_count = vm_page_inactive_count;
stat->wire_count = vm_page_wire_count + vm_page_throttled_count + vm_lopage_free_count;
stat->zero_fill_count = host_vm_stat.zero_fill_count;
stat->reactivations = host_vm_stat.reactivations;
stat->pageins = host_vm_stat.pageins;
stat->pageouts = host_vm_stat.pageouts;
stat->faults = host_vm_stat.faults;
stat->cow_faults = host_vm_stat.cow_faults;
stat->lookups = host_vm_stat.lookups;
stat->hits = host_vm_stat.hits;
stat->purgeable_count = vm_page_purgeable_count;
stat->purges = vm_page_purged_count;
stat->speculative_count = vm_page_speculative_count;
We reuse stat and make it our output struct. We then fill free_count with the sum of two unsigned long called vm_page_free_count and vm_page_speculative_count. We collect the other remaining data in the same manner (by using variables named vm_page_foo_count) or by taking the stats from host_vm_stat which we filled up above.
1. Conclusion We collect data from different sources. Either from processor informations or from variables called vm_page_foo_count. This costs time and might end in some inconsitency matter the fact VM is a very fast and continous process.
Let's take a closer look at the already mentioned variables vm_page_foo_count. They are defined in xnu-3789.51.2/osfmk/vm/vm_page.h as follows:
extern
unsigned int vm_page_free_count; /* How many pages are free? (sum of all colors) */
extern
unsigned int vm_page_active_count; /* How many pages are active? */
extern
unsigned int vm_page_inactive_count; /* How many pages are inactive? */
#if CONFIG_SECLUDED_MEMORY
extern
unsigned int vm_page_secluded_count; /* How many pages are secluded? */
extern
unsigned int vm_page_secluded_count_free;
extern
unsigned int vm_page_secluded_count_inuse;
#endif /* CONFIG_SECLUDED_MEMORY */
extern
unsigned int vm_page_cleaned_count; /* How many pages are in the clean queue? */
extern
unsigned int vm_page_throttled_count;/* How many inactives are throttled */
extern
unsigned int vm_page_speculative_count; /* How many speculative pages are unclaimed? */
extern unsigned int vm_page_pageable_internal_count;
extern unsigned int vm_page_pageable_external_count;
extern
unsigned int vm_page_xpmapped_external_count; /* How many pages are mapped executable? */
extern
unsigned int vm_page_external_count; /* How many pages are file-backed? */
extern
unsigned int vm_page_internal_count; /* How many pages are anonymous? */
extern
unsigned int vm_page_wire_count; /* How many pages are wired? */
extern
unsigned int vm_page_wire_count_initial; /* How many pages wired at startup */
extern
unsigned int vm_page_free_target; /* How many do we want free? */
extern
unsigned int vm_page_free_min; /* When to wakeup pageout */
extern
unsigned int vm_page_throttle_limit; /* When to throttle new page creation */
extern
uint32_t vm_page_creation_throttle; /* When to throttle new page creation */
extern
unsigned int vm_page_inactive_target;/* How many do we want inactive? */
#if CONFIG_SECLUDED_MEMORY
extern
unsigned int vm_page_secluded_target;/* How many do we want secluded? */
#endif /* CONFIG_SECLUDED_MEMORY */
extern
unsigned int vm_page_anonymous_min; /* When it's ok to pre-clean */
extern
unsigned int vm_page_inactive_min; /* When to wakeup pageout */
extern
unsigned int vm_page_free_reserved; /* How many pages reserved to do pageout */
extern
unsigned int vm_page_throttle_count; /* Count of page allocations throttled */
extern
unsigned int vm_page_gobble_count;
extern
unsigned int vm_page_stolen_count; /* Count of stolen pages not acccounted in zones */
[...]
extern
unsigned int vm_page_purgeable_count;/* How many pages are purgeable now ? */
extern
unsigned int vm_page_purgeable_wired_count;/* How many purgeable pages are wired now ? */
extern
uint64_t vm_page_purged_count; /* How many pages got purged so far ? */
That's a lot of statistics regarding we only get access to a very limited number using host_statistics64(). The most of these stats are updated in xnu-3789.51.2/osfmk/vm/vm_resident.c. For example this function releases pages to the list of free pages:
/*
* vm_page_release:
*
* Return a page to the free list.
*/
void
vm_page_release(
vm_page_t mem,
boolean_t page_queues_locked)
{
[...]
vm_page_free_count++;
[...]
}
Very interesting is extern unsigned int vm_page_stolen_count; /* Count of stolen pages not acccounted in zones */. What are stolen pages? It seems like there are mechanisms to take a page out of some lists even though it wouldn't usually be paged out. One of these mechanisms is the age of a page in the speculative page list. xnu-3789.51.2/osfmk/vm/vm_page.h tells us
* VM_PAGE_MAX_SPECULATIVE_AGE_Q * VM_PAGE_SPECULATIVE_Q_AGE_MS
* defines the amount of time a speculative page is normally
* allowed to live in the 'protected' state (i.e. not available
* to be stolen if vm_pageout_scan is running and looking for
* pages)... however, if the total number of speculative pages
* in the protected state exceeds our limit (defined in vm_pageout.c)
* and there are none available in VM_PAGE_SPECULATIVE_AGED_Q, then
* vm_pageout_scan is allowed to steal pages from the protected
* bucket even if they are underage.
*
* vm_pageout_scan is also allowed to pull pages from a protected
* bin if the bin has reached the "age of consent" we've set
It is indeed void vm_pageout_scan(void) that increments vm_page_stolen_count. You find the corresponding source code in xnu-3789.51.2/osfmk/vm/vm_pageout.c.
I think stolen pages are not taken into account while calculating VM stats a host_statistics64() does.
Evidence that I'm right
The best way to prove this would be to compile XNU with an customized version of host_statistics64() by hand. I had no opportunity do this but will try soon.
Fortunately we are not the only ones interested in correct VM statistics. Therefore we should have a look at the implementation of well know /usr/bin/top (not contained in XNU) which is completely available here: top-108 (I just picked the macOS 10.12.4 release).
Let's have a look at top-108/libtop.c where we find the following:
static int
libtop_tsamp_update_vm_stats(libtop_tsamp_t* tsamp) {
kern_return_t kr;
tsamp->p_vm_stat = tsamp->vm_stat;
mach_msg_type_number_t count = sizeof(tsamp->vm_stat) / sizeof(natural_t);
kr = host_statistics64(libtop_port, HOST_VM_INFO64, (host_info64_t)&tsamp->vm_stat, &count);
if (kr != KERN_SUCCESS) {
return kr;
}
if (tsamp->pages_stolen > 0) {
tsamp->vm_stat.wire_count += tsamp->pages_stolen;
}
[...]
return kr;
}
tsamp is of type libtop_tsamp_t which is a struct defined in top-108/libtop.h. It contains amongst other things vm_statistics64_data_t vm_stat and uint64_t pages_stolen.
As you can see, static int libtop_tsamp_update_vm_stats(libtop_tsamp_t* tsamp) gets tsamp->vm_stat filled by host_statistics64() as we know it. Afterwards it checks if tsamp->pages_stolen > 0 and adds it up to the wire_count field of tsamp->vm_stat.
2. Conclusion We won't get the number of these stolen pages if we just use host_statistics64() as in /usr/bin/vm_stat or your example code!
Why is host_statistics64() implemented as it is?
Honestly, I don't know. Paging is a complex process and therefore a real time observation a challenging task. We have to notice that there seems to be no bug in its implementation. I think that we wouldn't even get a 100% accurate number of pages if we could get access to vm_page_stolen_count. The implementation of /usr/bin/top doesn't count stolen pages if their number is not very big.
An additional interesting thing is a comment above the function static void update_pages_stolen(libtop_tsamp_t *tsamp) which is /* This is for <rdar://problem/6410098>. */. Open Radar is a bug reporting site for Apple software and usually classifies bugs in the format given in the comment. I was unable to find the related bug; maybe it was about missing pages.
I hope these information could help you a bit. If I manage to compile the latest (and customized) Version of XNU on my machine I will let you know. Maybe this brings interesting insights.
Just noticed that if you add compressor_page_count into the mix you get much closer to the actual amount of RAM in the machine.
This is an observation, not an explanation, and links to where this was properly documented would be nice to have!
I have written the following user-level code snippet to test two sub functions, atomic inc and xchg (refer to Linux code).
What I need is just try to perform operations on 32-bit integer, and that's why I explicitly use int32_t.
I assume global_counter will be raced by different threads, while tmp_counter is fine.
#include <stdio.h>
#include <stdint.h>
int32_t global_counter = 10;
/* Increment the value pointed by ptr */
void atomic_inc(int32_t *ptr)
{
__asm__("incl %0;\n"
: "+m"(*ptr));
}
/*
* Atomically exchange the val with *ptr.
* Return the value previously stored in *ptr before the exchange
*/
int32_t atomic_xchg(uint32_t *ptr, uint32_t val)
{
uint32_t tmp = val;
__asm__(
"xchgl %0, %1;\n"
: "=r"(tmp), "+m"(*ptr)
: "0"(tmp)
:"memory");
return tmp;
}
int main()
{
int32_t tmp_counter = 0;
printf("Init global=%d, tmp=%d\n", global_counter, tmp_counter);
atomic_inc(&tmp_counter);
atomic_inc(&global_counter);
printf("After inc, global=%d, tmp=%d\n", global_counter, tmp_counter);
tmp_counter = atomic_xchg(&global_counter, tmp_counter);
printf("After xchg, global=%d, tmp=%d\n", global_counter, tmp_counter);
return 0;
}
My 2 questions are:
Are these two subfunctions written properly?
Will this behave the same when I compile this on 32-bit or
64-bit platform? For example, could the pointer address have a different
length. or could incl and xchgl will conflict with the operand?
My understanding of this question is below, please correct me if I'm wrong.
All the read-modify-write instructions (ex: incl, add, xchg) need a lock prefix. The lock instruction is to lock the memory accessed by other CPUs by asserting LOCK# signal on the memory bus.
The __xchg function in Linux kernel implies no "lock" prefix because xchg always implies lock anyway. http://lxr.linux.no/linux+v2.6.38/arch/x86/include/asm/cmpxchg_64.h#L15
However, the incl used in atomic_inc does not have this assumption so a lock_prefix is needed.
http://lxr.linux.no/linux+v2.6.38/arch/x86/include/asm/atomic.h#L105
btw, I think you need to copy the *ptr to a volatile variable to avoid gcc optimization.
William
I - an embedded beginner - am fighting my way through the black magic world of embedded programming. So far I won already a bunch of fights, but this new bug seems to be a hard one.
First, my embedded setup:
Olimex STM32-P207 (STM32F207)
Olimex ARM-USB-OCD-H JTAG
OpenOCD
Eclipse (with CDT and GDB hardware debugging)
Codesourcery Toolchain
Startup file and linker script (adapted memory map for the STM32F207) for RIDE (what uses GCC)
STM32F2xx_StdPeriph_Lib_V1.1.0
Using the many tutorials and Q&As out there I was able to set-up makefile, linker and startup code and got some simple examples running using STM's standard library (classic blinky, using buttons and interrupts etc.). However, once I started playing around with the SysTick interrupts, things got messy.
If add the SysTick_Config() call to my code (even with an empty SysTick_Handler), ...
int main(void)
{
/*!< At this stage the microcontroller clock setting is already configured,
[...]
*/
if (SysTick_Config(SystemCoreClock / 120000))
{
// Catch config error
while(1);
}
[...]
... then my debugger starts at the (inline) function NVIC_SetPriority() and once I hit "run" I end up in the HardFault_Handler().
This only happens, when using the debugger. Otherwise the code runs normal (telling from the blinking LEDs).
I already read a lot and tried a lot (modifying compiler options, linker, startup, trying other code with SysTick_Config() calls), but nothing solved this problem.
One thing could be a hint:
The compiler starts in both cases (with and without the SysTick_Config call) at 0x00000184. In the case without the SysTick_Config call this points at the beginnig of main(). Using SysTick_Config this pionts at NVIC_SetPriority().
Does anybody have a clue what's going on? Any hint about where I could continue my search for a solution?
I don't know what further information would be helpful to solve this riddle. Please let me know and I will be happy to provide the missing pieces.
Thanks a lot!
/edit1: Added results of arm-none-eabi-readelf, -objdump and -size.
/edit2: I removed the code info to make space for the actual code. With this new simplified version debugging starts at
08000184: stmdaeq r0, {r4, r6, r8, r9, r10, r11, sp}
readelf:
[ 2] .text PROGBITS 08000184 008184 002dcc 00 AX 0 0 4
...
2: 08000184 0 SECTION LOCAL DEFAULT 2
...
46: 08000184 0 NOTYPE LOCAL DEFAULT 2 $d
main.c
/* Includes ------------------------------------------------------------------*/
#include "stm32f2xx.h"
/* Private typedef -----------------------------------------------------------*/
/* Private define ------------------------------------------------------------*/
/* Private macro -------------------------------------------------------------*/
/* Private variables ---------------------------------------------------------*/
uint16_t counter = 0;
/* Private function prototypes -----------------------------------------------*/
/* Private functions ---------------------------------------------------------*/
void assert_failed(uint8_t* file, uint32_t line);
void Delay(__IO uint32_t nCount);
void SysTick_Handler(void);
/**
* #brief Main program
* #param None
* #retval None
*/
int main(void)
{
if (SysTick_Config(SystemCoreClock / 12000))
{
// Catch config error
while(1);
}
/*
* Configure the LEDs
*/
RCC_AHB1PeriphClockCmd(RCC_AHB1Periph_GPIOF, ENABLE); // LEDs
GPIO_InitTypeDef GPIO_InitStructure;
GPIO_InitStructure.GPIO_Pin = GPIO_Pin_6 | GPIO_Pin_7 | GPIO_Pin_8 | GPIO_Pin_9;
GPIO_InitStructure.GPIO_Mode = GPIO_Mode_OUT;
GPIO_InitStructure.GPIO_OType = GPIO_OType_PP;
GPIO_InitStructure.GPIO_Speed = GPIO_Speed_2MHz;
GPIO_InitStructure.GPIO_PuPd = GPIO_PuPd_NOPULL;
GPIO_Init(GPIOF, &GPIO_InitStructure);
GPIOF->BSRRL = GPIO_Pin_6;
while (1)
{
if (GPIO_ReadInputDataBit(GPIOF, GPIO_Pin_9) == SET)
{
GPIOF->BSRRH = GPIO_Pin_9;
}
else
{
GPIOF->BSRRL = GPIO_Pin_9;
}
Delay(500000);
}
return 0;
}
#ifdef USE_FULL_ASSERT
/**
* #brief Reports the name of the source file and the source line number
* where the assert_param error has occurred.
* #param file: pointer to the source file name
* #param line: assert_param error line source number
* #retval None
*/
void assert_failed(uint8_t* file, uint32_t line)
{
/* User can add his own implementation to report the file name and line number,
ex: printf("Wrong parameters value: file %s on line %d\r\n", file, line) */
/* Infinite loop */
while (1)
{
}
}
#endif
/**
* #brief Delay Function.
* #param nCount:specifies the Delay time length.
* #retval None
*/
void Delay(__IO uint32_t nCount)
{
while (nCount > 0)
{
nCount--;
}
}
/**
* #brief This function handles Hard Fault exception.
* #param None
* #retval None
*/
void HardFault_Handler(void)
{
/* Go to infinite loop when Hard Fault exception occurs */
while (1)
{
}
}
/**
* #brief This function handles SysTick Handler.
* #param None
* #retval None
*/
void SysTick_Handler(void)
{
if (counter > 10000 )
{
if (GPIO_ReadInputDataBit(GPIOF, GPIO_Pin_8) == SET)
{
GPIOF->BSRRH = GPIO_Pin_8;
}
else
{
GPIOF->BSRRL = GPIO_Pin_8;
}
counter = 0;
}
else
{
counter++;
}
}
/edit3:
Soultion:
Because the solution is burrowed in a comment, I put it up here:
My linker file was missing ENTRY(your_function_of_choice); (e.g. Reset_Handler). Adding this made my debugger work again (it now starts at the right point).
Thanks everybody!
I have a strong feeling that the entry point isn't specified correctly in the compiler options or linker script...
And whatever code comes first at link time in the ".text" section, it gets to have the entry point.
The entry point, however, should point to a special "init" function that would set up the stack, initialize the ".bss" section (and maybe some other sections as well), initialize any hardware that's necessary for basic operation (e.g. the interrupt controller and maybe some system timer) and any remaining portions of the and standard libraries before actually transferring control to main().
I'm not familiar with the tools and your hardware, so I can't say exactly what that special "init" function is, but that's pretty much the problem. It's not being pointed to by the entry point of the compiled program. NVIC_SetPriority() doesn't make any sense there.