Why does host_statistics64() return inconsistent results? - macos

Why does host_statistics64() in OS X 10.6.8 (I don't know if other versions have this problem) return counts for free, active, inactive, and wired memory that don't add up to the total amount of ram? And why is it missing an inconsistent number of pages?
The following output represents the number of pages not classified as free, active, inactive, or wired over ten seconds (sampled roughly once per second).
458
243
153
199
357
140
304
93
181
224
The code that produces the numbers above is:
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/sysctl.h>
#include <mach/mach.h>
#include <mach/vm_statistics.h>
int main(int argc, char** argv) {
    struct vm_statistics64 stats;
    mach_port_t host = mach_host_self();
    mach_msg_type_number_t count;
    int64_t missing;  /* signed, so an over-count shows up as a negative number */
    int debug = argc == 2 ? !strcmp(argv[1], "-v") : 0;
    kern_return_t ret;
    int mib[2];
    uint64_t ram;     /* HW_MEMSIZE is a 64-bit value */
    uint64_t pages;
    size_t length;
    int i;
    mib[0] = CTL_HW;
    mib[1] = HW_MEMSIZE;
    length = sizeof(ram);
    if (sysctl(mib, 2, &ram, &length, NULL, 0) != 0) {
        perror("sysctl");
        return 1;
    }
    pages = ram / (uint64_t)getpagesize();
    for (i = 0; i < 10; i++) {
        /* count is an in/out parameter, so reset it before every call */
        count = HOST_VM_INFO64_COUNT;
        if ((ret = host_statistics64(host, HOST_VM_INFO64, (host_info64_t)&stats, &count)) != KERN_SUCCESS) {
            printf("oops\n");
            return 1;
        }
        /* updated for 10.9 */
        missing = (int64_t)pages - (int64_t)(
            (uint64_t)stats.free_count +
            stats.active_count +
            stats.inactive_count +
            stats.wire_count +
            stats.compressor_page_count
        );
        if (debug) {
            printf(
                "%11llu pages (# of pages)\n"
                "%11u free_count (# of pages free) \n"
                "%11u active_count (# of pages active) \n"
                "%11u inactive_count (# of pages inactive) \n"
                "%11u wire_count (# of pages wired down) \n"
                "%11llu zero_fill_count (# of zero fill pages) \n"
                "%11llu reactivations (# of pages reactivated) \n"
                "%11llu pageins (# of pageins) \n"
                "%11llu pageouts (# of pageouts) \n"
                "%11llu faults (# of faults) \n"
                "%11llu cow_faults (# of copy-on-writes) \n"
                "%11llu lookups (object cache lookups) \n"
                "%11llu hits (object cache hits) \n"
                "%11llu purges (# of pages purged) \n"
                "%11u purgeable_count (# of pages purgeable) \n"
                "%11u speculative_count (# of pages speculative (also counted in free_count)) \n"
                "%11llu decompressions (# of pages decompressed) \n"
                "%11llu compressions (# of pages compressed) \n"
                "%11llu swapins (# of pages swapped in (via compression segments)) \n"
                "%11llu swapouts (# of pages swapped out (via compression segments)) \n"
                "%11u compressor_page_count (# of pages used by the compressed pager to hold all the compressed data) \n"
                "%11u throttled_count (# of pages throttled) \n"
                "%11u external_page_count (# of pages that are file-backed (non-swap)) \n"
                "%11u internal_page_count (# of pages that are anonymous) \n"
                "%11llu total_uncompressed_pages_in_compressor (# of pages (uncompressed) held within the compressor.) \n",
                (unsigned long long)pages, stats.free_count, stats.active_count, stats.inactive_count,
                stats.wire_count, stats.zero_fill_count, stats.reactivations,
                stats.pageins, stats.pageouts, stats.faults, stats.cow_faults,
                stats.lookups, stats.hits, stats.purges, stats.purgeable_count,
                stats.speculative_count, stats.decompressions, stats.compressions,
                stats.swapins, stats.swapouts, stats.compressor_page_count,
                stats.throttled_count, stats.external_page_count,
                stats.internal_page_count, stats.total_uncompressed_pages_in_compressor
            );
        }
        printf("%lld\n", (long long)missing);
        sleep(1);
    }
    return 0;
}

TL;DR:
host_statistics64() gathers information from different sources. This takes time and can produce inconsistent results.
host_statistics64() gets some of its information from variables with names like vm_page_foo_count. But not all of these variables are taken into account; vm_page_stolen_count, for example, is not.
The well-known /usr/bin/top adds stolen pages to the number of wired pages. This suggests that these pages should be taken into account when counting pages.
Notes
I'm working on macOS 10.12 with Darwin Kernel Version 16.5.0 xnu-3789.51.2~3/RELEASE_X86_64 x86_64, but all of the behaviour is completely reproducible.
I'm going to link a lot of source code from the XNU version I use on my machine. It can be found here: xnu-3789.51.2.
The program you have written is basically the same as /usr/bin/vm_stat, which is just a wrapper for host_statistics64() (and host_statistics()). The corresponding source code can be found here: system_cmds-496/vm_stat.tproj/vm_stat.c.
How does host_statistics64() fit into XNU and how does it work?
As is widely known, the OS X kernel is called XNU (X is Not Unix) and "is a hybrid kernel combining the Mach kernel developed at Carnegie Mellon University with components from FreeBSD and C++ API for writing drivers called IOKit." (https://github.com/opensource-apple/xnu/blob/10.12/README.md)
Virtual memory management (VM) is part of Mach, so that is where host_statistics64() is located. Let's take a closer look at its implementation, which is in xnu-3789.51.2/osfmk/kern/host.c.
The function signature is
kern_return_t
host_statistics64(host_t host, host_flavor_t flavor, host_info64_t info, mach_msg_type_number_t * count);
The first relevant lines are:
[...]
processor_t processor;
vm_statistics64_t stat;
vm_statistics64_data_t host_vm_stat;
mach_msg_type_number_t original_count;
unsigned int local_q_internal_count;
unsigned int local_q_external_count;
[...]
processor = processor_list;
stat = &PROCESSOR_DATA(processor, vm_stat);
host_vm_stat = *stat;
if (processor_count > 1) {
simple_lock(&processor_list_lock);
while ((processor = processor->processor_list) != NULL) {
stat = &PROCESSOR_DATA(processor, vm_stat);
host_vm_stat.zero_fill_count += stat->zero_fill_count;
host_vm_stat.reactivations += stat->reactivations;
host_vm_stat.pageins += stat->pageins;
host_vm_stat.pageouts += stat->pageouts;
host_vm_stat.faults += stat->faults;
host_vm_stat.cow_faults += stat->cow_faults;
host_vm_stat.lookups += stat->lookups;
host_vm_stat.hits += stat->hits;
host_vm_stat.compressions += stat->compressions;
host_vm_stat.decompressions += stat->decompressions;
host_vm_stat.swapins += stat->swapins;
host_vm_stat.swapouts += stat->swapouts;
}
simple_unlock(&processor_list_lock);
}
[...]
We get host_vm_stat, which is of type vm_statistics64_data_t. This is just a typedef of struct vm_statistics64, as you can see in xnu-3789.51.2/osfmk/mach/vm_statistics.h. We get per-processor information from the macro PROCESSOR_DATA() defined in xnu-3789.51.2/osfmk/kern/processor_data.h. We fill host_vm_stat while looping through all of our processors, simply adding up the relevant numbers.
As you can see, some well-known stats such as zero_fill_count or compressions are summed up here, but not all of the stats returned by host_statistics64() are gathered this way.
The next relevant lines are:
stat = (vm_statistics64_t)info;
stat->free_count = vm_page_free_count + vm_page_speculative_count;
stat->active_count = vm_page_active_count;
[...]
stat->inactive_count = vm_page_inactive_count;
stat->wire_count = vm_page_wire_count + vm_page_throttled_count + vm_lopage_free_count;
stat->zero_fill_count = host_vm_stat.zero_fill_count;
stat->reactivations = host_vm_stat.reactivations;
stat->pageins = host_vm_stat.pageins;
stat->pageouts = host_vm_stat.pageouts;
stat->faults = host_vm_stat.faults;
stat->cow_faults = host_vm_stat.cow_faults;
stat->lookups = host_vm_stat.lookups;
stat->hits = host_vm_stat.hits;
stat->purgeable_count = vm_page_purgeable_count;
stat->purges = vm_page_purged_count;
stat->speculative_count = vm_page_speculative_count;
We reuse stat as our output struct. We then fill free_count with the sum of two unsigned int variables, vm_page_free_count and vm_page_speculative_count. The remaining data is collected in the same manner (from variables named vm_page_foo_count) or taken from host_vm_stat, which we filled above.
Conclusion 1: We collect data from different sources, either from per-processor information or from global variables called vm_page_foo_count. This takes time and can produce inconsistent results, given that VM is a very fast and continuous process.
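The following is a minimal, deterministic model of that kind of inconsistency, not XNU code: struct counters, activate_one_page() and sum_with_interleaving() are hypothetical names. Two counters are read one after the other without a lock, and a page moves between queues in between the two reads.

```c
/* Hypothetical model, not XNU code: two global counters in the spirit of
   vm_page_free_count and vm_page_active_count, read without a lock. */
struct counters {
    unsigned int free_count;
    unsigned int active_count;
};

/* Simulates another CPU moving one page from the free to the active queue. */
static void activate_one_page(struct counters *c)
{
    c->free_count--;
    c->active_count++;
}

/* Sums the counters one field at a time, the way an unlocked snapshot
   would; here the page move lands between the two reads. */
static unsigned int sum_with_interleaving(struct counters *c)
{
    unsigned int free_seen = c->free_count;     /* read #1 */
    activate_one_page(c);                       /* concurrent update */
    unsigned int active_seen = c->active_count; /* read #2 */
    return free_seen + active_seen;
}
```

Starting from {100, 50}, the true total is 150 at every instant, yet sum_with_interleaving() returns 151 because the moved page is seen in both reads. A move in the other direction (from the queue read second into the queue already read) would be missed instead, which is one way pages can go "missing" from a snapshot.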
Let's take a closer look at the already mentioned variables vm_page_foo_count. They are defined in xnu-3789.51.2/osfmk/vm/vm_page.h as follows:
extern
unsigned int vm_page_free_count; /* How many pages are free? (sum of all colors) */
extern
unsigned int vm_page_active_count; /* How many pages are active? */
extern
unsigned int vm_page_inactive_count; /* How many pages are inactive? */
#if CONFIG_SECLUDED_MEMORY
extern
unsigned int vm_page_secluded_count; /* How many pages are secluded? */
extern
unsigned int vm_page_secluded_count_free;
extern
unsigned int vm_page_secluded_count_inuse;
#endif /* CONFIG_SECLUDED_MEMORY */
extern
unsigned int vm_page_cleaned_count; /* How many pages are in the clean queue? */
extern
unsigned int vm_page_throttled_count;/* How many inactives are throttled */
extern
unsigned int vm_page_speculative_count; /* How many speculative pages are unclaimed? */
extern unsigned int vm_page_pageable_internal_count;
extern unsigned int vm_page_pageable_external_count;
extern
unsigned int vm_page_xpmapped_external_count; /* How many pages are mapped executable? */
extern
unsigned int vm_page_external_count; /* How many pages are file-backed? */
extern
unsigned int vm_page_internal_count; /* How many pages are anonymous? */
extern
unsigned int vm_page_wire_count; /* How many pages are wired? */
extern
unsigned int vm_page_wire_count_initial; /* How many pages wired at startup */
extern
unsigned int vm_page_free_target; /* How many do we want free? */
extern
unsigned int vm_page_free_min; /* When to wakeup pageout */
extern
unsigned int vm_page_throttle_limit; /* When to throttle new page creation */
extern
uint32_t vm_page_creation_throttle; /* When to throttle new page creation */
extern
unsigned int vm_page_inactive_target;/* How many do we want inactive? */
#if CONFIG_SECLUDED_MEMORY
extern
unsigned int vm_page_secluded_target;/* How many do we want secluded? */
#endif /* CONFIG_SECLUDED_MEMORY */
extern
unsigned int vm_page_anonymous_min; /* When it's ok to pre-clean */
extern
unsigned int vm_page_inactive_min; /* When to wakeup pageout */
extern
unsigned int vm_page_free_reserved; /* How many pages reserved to do pageout */
extern
unsigned int vm_page_throttle_count; /* Count of page allocations throttled */
extern
unsigned int vm_page_gobble_count;
extern
unsigned int vm_page_stolen_count; /* Count of stolen pages not acccounted in zones */
[...]
extern
unsigned int vm_page_purgeable_count;/* How many pages are purgeable now ? */
extern
unsigned int vm_page_purgeable_wired_count;/* How many purgeable pages are wired now ? */
extern
uint64_t vm_page_purged_count; /* How many pages got purged so far ? */
That's a lot of statistics, and we only get access to a very limited number of them through host_statistics64(). Most of these stats are updated in xnu-3789.51.2/osfmk/vm/vm_resident.c. For example, this function releases pages to the list of free pages:
/*
* vm_page_release:
*
* Return a page to the free list.
*/
void
vm_page_release(
vm_page_t mem,
boolean_t page_queues_locked)
{
[...]
vm_page_free_count++;
[...]
}
Particularly interesting is extern unsigned int vm_page_stolen_count; /* Count of stolen pages not acccounted in zones */. What are stolen pages? It seems there are mechanisms that take a page out of some lists even though it wouldn't usually be paged out. One of these mechanisms is the age of a page in the speculative page list. xnu-3789.51.2/osfmk/vm/vm_page.h tells us:
* VM_PAGE_MAX_SPECULATIVE_AGE_Q * VM_PAGE_SPECULATIVE_Q_AGE_MS
* defines the amount of time a speculative page is normally
* allowed to live in the 'protected' state (i.e. not available
* to be stolen if vm_pageout_scan is running and looking for
* pages)... however, if the total number of speculative pages
* in the protected state exceeds our limit (defined in vm_pageout.c)
* and there are none available in VM_PAGE_SPECULATIVE_AGED_Q, then
* vm_pageout_scan is allowed to steal pages from the protected
* bucket even if they are underage.
*
* vm_pageout_scan is also allowed to pull pages from a protected
* bin if the bin has reached the "age of consent" we've set
It is indeed void vm_pageout_scan(void) that increments vm_page_stolen_count. You can find the corresponding source code in xnu-3789.51.2/osfmk/vm/vm_pageout.c.
I think stolen pages are not taken into account when host_statistics64() calculates the VM stats.
Evidence that I'm right
The best way to prove this would be to compile XNU by hand with a customized version of host_statistics64(). I haven't had the opportunity to do this yet but will try soon.
Fortunately, we are not the only ones interested in correct VM statistics, so let's have a look at the implementation of the well-known /usr/bin/top (not contained in XNU), which is fully available here: top-108 (I just picked the macOS 10.12.4 release).
Let's have a look at top-108/libtop.c where we find the following:
static int
libtop_tsamp_update_vm_stats(libtop_tsamp_t* tsamp) {
kern_return_t kr;
tsamp->p_vm_stat = tsamp->vm_stat;
mach_msg_type_number_t count = sizeof(tsamp->vm_stat) / sizeof(natural_t);
kr = host_statistics64(libtop_port, HOST_VM_INFO64, (host_info64_t)&tsamp->vm_stat, &count);
if (kr != KERN_SUCCESS) {
return kr;
}
if (tsamp->pages_stolen > 0) {
tsamp->vm_stat.wire_count += tsamp->pages_stolen;
}
[...]
return kr;
}
tsamp is of type libtop_tsamp_t, a struct defined in top-108/libtop.h. Among other things, it contains vm_statistics64_data_t vm_stat and uint64_t pages_stolen.
As you can see, static int libtop_tsamp_update_vm_stats(libtop_tsamp_t* tsamp) gets tsamp->vm_stat filled by host_statistics64() as we know it. Afterwards, it checks whether tsamp->pages_stolen > 0 and, if so, adds it to the wire_count field of tsamp->vm_stat.
Conclusion 2: We won't get the number of these stolen pages if we just use host_statistics64(), as /usr/bin/vm_stat and your example code do!
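The accounting top performs can be written as a small pure function. This is a sketch, not top's code: struct page_counts and missing_pages() are hypothetical names, the count fields mirror the vm_statistics64 fields, and pages_stolen stands for however the stolen-page count is obtained (top computes it in update_pages_stolen(), which is not shown here).

```c
#include <stdint.h>

/* Subset of the vm_statistics64 page counts, plus a stolen-page count
   that host_statistics64() does not report (hypothetical input). */
struct page_counts {
    uint64_t free_count, active_count, inactive_count, wire_count;
    uint64_t compressor_page_count; /* 0 before OS X 10.9 */
    uint64_t pages_stolen;
};

/* Pages not accounted for; negative if the snapshot over-counts. */
static int64_t missing_pages(uint64_t total_pages, const struct page_counts *pc)
{
    /* Like top, fold stolen pages into the wired count. */
    uint64_t wired = pc->wire_count + pc->pages_stolen;
    uint64_t accounted = pc->free_count + pc->active_count + pc->inactive_count
                       + wired + pc->compressor_page_count;
    return (int64_t)total_pages - (int64_t)accounted;
}
```

With pages_stolen set to 0 this reduces to the calculation in the question's program; any stolen pages then show up as "missing".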
Why is host_statistics64() implemented as it is?
Honestly, I don't know. Paging is a complex process, so observing it in real time is a challenging task. Note that there seems to be no bug in its implementation. I don't think we would get a 100% accurate page count even if we had access to vm_page_stolen_count. The implementation of /usr/bin/top doesn't count stolen pages if their number is not very big.
Another interesting detail is the comment /* This is for <rdar://problem/6410098>. */ above the function static void update_pages_stolen(libtop_tsamp_t *tsamp). rdar:// identifiers refer to Apple's internal bug tracker, Radar; Open Radar is a community site where developers mirror their Radar reports. I was unable to find the related bug; maybe it was about missing pages.
I hope this information helps a bit. If I manage to compile the latest (customized) version of XNU on my machine, I will let you know. Maybe this will bring interesting insights.

Just noticed that if you add compressor_page_count into the mix you get much closer to the actual amount of RAM in the machine.
This is an observation, not an explanation, and links to where this was properly documented would be nice to have!

Related

Run a process at the same physical memory location

For a research project, I have a long-running process that uses various buffers and stack variables. I'd like to be able to launch this process multiple times such that the physical addresses backing its heap, stack, code, and static variables are equal each time. I know the exact size of all of these variables, and the size of the heap and stack stay constant during execution. To help with this, I use some helper code to translate arbitrary virtual addresses in my program to their corresponding physical addresses (sourced from here):
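#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
/* Layout of one 64-bit /proc/<pid>/pagemap entry (least significant bit first). */
struct pagemap
{
    union status
    {
        struct present
        {
            uint64_t pfn : 55;        /* bits 0-54: page frame number */
            uint64_t soft_dirty : 1;  /* bit 55 */
            uint64_t exclusive : 1;   /* bit 56: page exclusively mapped */
            uint64_t zeroes : 4;      /* bits 57-60 */
            uint64_t type : 1;        /* bit 61: file page or shared anonymous */
            uint64_t swapped : 1;     /* bit 62 */
            uint64_t present : 1;     /* bit 63 */
        } present;
        struct swapped
        {
            uint64_t swaptype : 5;    /* bits 0-4 */
            uint64_t offset : 50;     /* bits 5-54 */
            uint64_t soft_dirty : 1;
            uint64_t exclusive : 1;
            uint64_t zeroes : 4;
            uint64_t type : 1;
            uint64_t swapped : 1;
            uint64_t present : 1;
        } swapped;
    } status;
} __attribute__ ((packed));
uint64_t get_pfn_for_addr(void *addr)
{
    struct pagemap pagemap;
    long offset;
    FILE *pagemap_file = fopen("/proc/self/pagemap", "rb");
    if (pagemap_file == NULL)
    {
        perror("failed to open /proc/self/pagemap");
        exit(1);
    }
    /* one 8-byte entry per virtual page */
    offset = (long)((uintptr_t)addr / getpagesize() * sizeof(uint64_t));
    if (fseek(pagemap_file, offset, SEEK_SET) != 0)
    {
        fprintf(stderr, "failed to seek pagemap to offset\n");
        exit(1);
    }
    if (fread(&pagemap, sizeof(struct pagemap), 1, pagemap_file) != 1)
    {
        fprintf(stderr, "failed to read pagemap entry\n");
        exit(1);
    }
    fclose(pagemap_file);
    /* The PFN is meaningless for a non-present page; since Linux 4.2 it also reads as 0 without CAP_SYS_ADMIN. */
    if (!pagemap.status.present.present)
    {
        fprintf(stderr, "page not present\n");
        exit(1);
    }
    return pagemap.status.present.pfn;
}
uint64_t virt_to_phys(void *addr)
{
    uint64_t page_size = (uint64_t)getpagesize();
    uint64_t pfn = get_pfn_for_addr(addr);
    uint64_t page_offset = (uintptr_t)addr % page_size;
    /* PAGE_SHIFT is a kernel macro; multiplying by the page size is equivalent in user space */
    return pfn * page_size + page_offset;
}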
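Because the layout of bit-fields inside a packed struct is implementation-defined, the same entry can also be decoded with plain shifts and masks. This is a sketch based on the field positions documented for /proc/<pid>/pagemap; the macro and function names are hypothetical.

```c
#include <stdint.h>

/* Field positions of a /proc/<pid>/pagemap entry. */
#define PM_PFN_MASK   ((UINT64_C(1) << 55) - 1) /* bits 0-54 */
#define PM_SWAPPED    (UINT64_C(1) << 62)
#define PM_PRESENT    (UINT64_C(1) << 63)

/* PFN of a present page, or 0 if the page is not present. */
static uint64_t pagemap_entry_pfn(uint64_t entry)
{
    if (!(entry & PM_PRESENT))
        return 0;
    return entry & PM_PFN_MASK;
}

/* Swap type (bits 0-4) of a swapped-out page. */
static unsigned int pagemap_entry_swap_type(uint64_t entry)
{
    return (unsigned int)(entry & 0x1f);
}

/* Swap offset (bits 5-54) of a swapped-out page. */
static uint64_t pagemap_entry_swap_offset(uint64_t entry)
{
    return (entry & PM_PFN_MASK) >> 5;
}
```

The raw entry would be read from /proc/self/pagemap exactly as in get_pfn_for_addr(), just into a uint64_t instead of struct pagemap.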
So far, my methodology has only required that a specific buffer in my program is located at the same physical address for each run. For this, I was just able to exit and relaunch the process whenever the physical address for that buffer was wrong, and I would end up with the correct location relatively quickly each time. However, I'd like to extend my experiment to ensure that my process is loaded identically in physical memory between runs, and this try-and-restart method does not seem to work well for this. Ideally, I would like to be able to set apart some small number of physical page frames that can't be allocated to another process, or to the kernel itself. Then, I would pass a flag down to do_fork that notifies the kernel that this is my special process and to allocate specific page frames to it.
My questions are:
Is there any sort of isolation mechanism already built into the kernel that would let me set aside an exclusive physical memory space that I could launch my process in?
If not, what would be a starting point for modifying the kernel to support behavior like this?
Is there any other solution (not involving either of the two above) that I could use for my desired behavior?
This is something that the kernel, using virtual memory, is tasked to abstract from you, so I'm not sure it is even possible to do (without insane amounts of work).
May I ask what experiment requires this? Perhaps if you describe what you want to achieve, it is easier to offer advice.

How can I know the minor number during Linux module initialisation

I am writing a Linux kernel module.
Here is what I've done in the module's init function:
register_chrdev(300 /* major */, "mydev", &fops);
It works fine, but I need to know the minor number.
I have read that we cannot set this minor number; it is the kernel that gives us this number. If so, how can I know it in the module's init function?
Thanks
register_chrdev calls __register_chrdev internally:
static inline int register_chrdev(unsigned int major, const char *name,
const struct file_operations *fops)
{
return __register_chrdev(major, 0, 256, name, fops);
}
If you look at the signature of __register_chrdev, it is:
int __register_chrdev(unsigned int major, unsigned int baseminor,
unsigned int count, const char *name,
const struct file_operations *fops)
register_chrdev passes your major number (300) and a base minor number of 0 with a count of 256, so it reserves the minor number range 0-255 for your device.
Also, in the definition of __register_chrdev, a dev_t value (a single integer encoding both the major and the minor number) is created for your device:
err = cdev_add(cdev, MKDEV(cd->major, baseminor), count);
MKDEV(cd->major, baseminor) creates it. So the first device number (dev_t) has 0 as its minor number, and count (256) is the number of consecutive minor numbers that can be used.
You can also get the major number dynamically with alloc_chrdev_region. Pass it a pointer to a dev_t variable together with the base minor number and count you want; it allocates a free major number and stores the resulting first device number in your dev_t. To get the major and minor numbers in your module, you can then use:
major = MAJOR(dev);
minor = MINOR(dev);
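To see what MKDEV, MAJOR and MINOR do with the numbers above, the kernel-internal macros can be reproduced in user space. This sketch copies their definitions from include/linux/kdev_t.h, where dev_t is a 32-bit value with a 20-bit minor field; it is for illustration only, since a real module includes <linux/kdev_t.h>, and user space normally uses makedev()/major()/minor(), whose encoding differs. reserved_range() is a hypothetical helper.

```c
/* Copies of the kernel-internal macros from include/linux/kdev_t.h. */
#define MINORBITS 20
#define MINORMASK ((1U << MINORBITS) - 1)
#define MAJOR(dev) ((unsigned int) ((dev) >> MINORBITS))
#define MINOR(dev) ((unsigned int) ((dev) & MINORMASK))
#define MKDEV(ma, mi) (((ma) << MINORBITS) | (mi))

/* Hypothetical helper: first and last device numbers reserved by
   __register_chrdev(major, baseminor, count, ...). */
static void reserved_range(unsigned int major, unsigned int baseminor,
                           unsigned int count,
                           unsigned int *first, unsigned int *last)
{
    *first = MKDEV(major, baseminor);
    *last = MKDEV(major, baseminor + count - 1);
}
```

register_chrdev(300, ...) corresponds to reserved_range(300, 0, 256, ...): minors 0 through 255 under major 300.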

Need help understanding stack frame layout

While implementing a stack walker for a debugger I am working on I reached the point to extract the arguments to a function call and display them. To make it simple I started with the cdecl convention in pure 32-bit (both debugger and debuggee), and a function that takes 3 parameters. However, I cannot understand why the arguments in the stack trace are out of order compared to what cdecl defines (right-to-left, nothing in registers), despite trying to figure it out for a few days now.
Here is a representation of the function call I am trying to stack trace:
void Function(unsigned long long a, const void * b, unsigned int c) {
printf("a=0x%llX, b=%p, c=0x%X\n", a, b, c);
_asm { int 3 }; /* Because I don't have stepping or dynamic breakpoints implemented yet */
}
int main(int argc, char* argv[]) {
Function(2, (void*)0x7A3FE8, 0x2004);
return 0;
}
This is what the function (unsurprisingly) printed to the console:
a=0x2, b=0x7a3fe8, c=0x2004
This is the stack trace generated at the breakpoint (the debugger catches the breakpoint and there I try to walk the stack):
0x3EF5E0: 0x10004286 /* previous pc */
0x3EF5DC: 0x3EF60C /* previous fp */
0x3EF5D8: 0x7A3FE8 /* arg b --> Wait... why is b _above_ c here? */
0x3EF5D4: 0x2004 /* arg c */
0x3EF5D0: 0x0 /* arg a, upper 32 bit */
0x3EF5CC: 0x2 /* arg a, lower 32 bit */
The code that's responsible for dumping the stack frames (implemented using the DIA SDK, though, I don't think that is relevant to my problem) looks like this:
ULONGLONG stackframe_top = 0;
m_frame->get_base(&stackframe_top); /* IDiaStackFrame */
/* dump 30 * 4 bytes */
for (DWORD i = 0; i < 30; i++)
{
ULONGLONG address = stackframe_top - (i * 4);
DWORD value;
SIZE_T read_bytes;
if (ReadProcessMemory(m_process, reinterpret_cast<LPVOID>(address), &value, sizeof(value), &read_bytes) == TRUE)
{
debugprintf(L"0x%llX: 0x%X\n", address, value); /* wrapper around OutputDebugString */
}
}
I am compiling the test program without any optimization in Visual Studio 2015 Update 3.
I have validated that I am indeed compiling it as cdecl by looking in the pdb with the dia2dump sample application.
I do not understand what is causing the stack to look like this, it doesn't match anything I learned, nor does it match the documentation provided by Microsoft.
I also checked google a whole lot (including osdev wiki pages, msdn blog posts, and so on), and checked my (by now probably outdated) books on 32-bit x86 assembly programming (that were released before 64-bit CPUs existed).
Thank you very much in advance for any explanations or links!
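As Raymond pointed out, I had misunderstood where the arguments to a function call end up in memory relative to the base of the stack frame. With cdecl, the arguments sit at higher addresses than the saved frame pointer and the return address (with a standard EBP frame, the first argument is at [ebp+8]), so the stack has to be read upwards from the base. This is the fixed code snippet: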
ULONGLONG stackframe_top = 0;
m_frame->get_base(&stackframe_top); /* IDiaStackFrame */
/* dump 30 * 4 bytes */
for (DWORD i = 0; i < 30; i++)
{
ULONGLONG address = stackframe_top + (i * 4); /* <-- Read before the stack frame */
DWORD value;
SIZE_T read_bytes;
if (ReadProcessMemory(m_process, reinterpret_cast<LPVOID>(address), &value, sizeof(value), &read_bytes) == TRUE)
{
debugprintf(L"0x%llX: 0x%X\n", address, value); /* wrapper around OutputDebugString */
}
}
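The corrected reading direction follows from the cdecl layout. As a sketch, assuming 32-bit x86, cdecl and a standard push ebp; mov ebp, esp prologue (whether IDiaStackFrame::get_base() returns exactly EBP is not assumed here), the offset of each argument from EBP can be computed as follows; cdecl_arg_offsets() is a hypothetical helper.

```c
#include <stddef.h>

/* Byte offset of each argument from EBP for 32-bit x86 cdecl:
     [ebp+0] saved EBP, [ebp+4] return address, [ebp+8] first argument.
   Arguments are pushed right to left, so they appear in declaration
   order at increasing addresses, each rounded up to 4-byte slots. */
static void cdecl_arg_offsets(const size_t *arg_sizes, size_t n, size_t *offsets)
{
    size_t off = 8;
    for (size_t i = 0; i < n; i++) {
        offsets[i] = off;
        off += (arg_sizes[i] + 3) & ~(size_t)3;
    }
}
```

For Function(unsigned long long a, const void *b, unsigned int c), the sizes {8, 4, 4} give a at [ebp+8] (low dword at +8, high dword at +12), b at [ebp+16] and c at [ebp+20].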

atomic_inc and atomic_xchg in gcc assembly

I have written the following user-level code snippet to test two helper functions, atomic_inc and atomic_xchg (modelled on the Linux code).
I only need to operate on 32-bit integers, which is why I explicitly use int32_t.
I assume global_counter will be raced on by different threads, while tmp_counter will not.
#include <stdio.h>
#include <stdint.h>
int32_t global_counter = 10;
/* Increment the value pointed by ptr */
void atomic_inc(int32_t *ptr)
{
__asm__("incl %0;\n"
: "+m"(*ptr));
}
/*
* Atomically exchange the val with *ptr.
* Return the value previously stored in *ptr before the exchange
*/
int32_t atomic_xchg(uint32_t *ptr, uint32_t val)
{
uint32_t tmp = val;
__asm__(
"xchgl %0, %1;\n"
: "=r"(tmp), "+m"(*ptr)
: "0"(tmp)
:"memory");
return tmp;
}
int main()
{
int32_t tmp_counter = 0;
printf("Init global=%d, tmp=%d\n", global_counter, tmp_counter);
atomic_inc(&tmp_counter);
atomic_inc(&global_counter);
printf("After inc, global=%d, tmp=%d\n", global_counter, tmp_counter);
tmp_counter = atomic_xchg(&global_counter, tmp_counter);
printf("After xchg, global=%d, tmp=%d\n", global_counter, tmp_counter);
return 0;
}
My 2 questions are:
Are these two subfunctions written properly?
Will this behave the same when I compile it on a 32-bit or
64-bit platform? For example, could the pointer address have a different
length, or could incl and xchgl conflict with the operand?
My understanding of this question is below, please correct me if I'm wrong.
All read-modify-write instructions (e.g. incl, add, xchg) need a lock prefix to be atomic with respect to other CPUs. The lock prefix makes the memory access exclusive, historically by asserting the LOCK# signal on the memory bus (modern CPUs usually lock only the affected cache line).
The __xchg function in the Linux kernel uses no lock prefix because xchg with a memory operand always implies lock anyway. http://lxr.linux.no/linux+v2.6.38/arch/x86/include/asm/cmpxchg_64.h#L15
However, incl carries no implicit lock, so atomic_inc needs LOCK_PREFIX.
http://lxr.linux.no/linux+v2.6.38/arch/x86/include/asm/atomic.h#L105
By the way, I think you need to copy *ptr to a volatile variable to avoid gcc optimization.
William
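Putting the answer's points together, here is a sketch of the two helpers with the lock prefix added to incl, no explicit prefix on xchg (it is implicitly locked), and int32_t used consistently, which also removes the int32_t * / uint32_t * mismatch in the question's call to atomic_xchg. The l suffix fixes the operand size at 32 bits on both 32-bit and 64-bit x86; the pointer width only affects the addressing mode, which the compiler chooses for the "m" operand. The fallback to GCC's __atomic builtins on non-x86 targets is my own addition.

```c
#include <stdint.h>

/* Atomically increment *ptr. */
static inline void atomic_inc(int32_t *ptr)
{
#if defined(__i386__) || defined(__x86_64__)
    /* lock makes the read-modify-write of incl atomic across CPUs */
    __asm__ __volatile__("lock; incl %0" : "+m"(*ptr) : : "memory");
#else
    __atomic_fetch_add(ptr, 1, __ATOMIC_SEQ_CST);
#endif
}

/* Atomically store val into *ptr and return the previous value. */
static inline int32_t atomic_xchg(int32_t *ptr, int32_t val)
{
#if defined(__i386__) || defined(__x86_64__)
    /* xchg with a memory operand is implicitly locked */
    __asm__ __volatile__("xchgl %0, %1" : "+r"(val), "+m"(*ptr) : : "memory");
    return val;
#else
    return __atomic_exchange_n(ptr, val, __ATOMIC_SEQ_CST);
#endif
}
```

The "memory" clobber also stops the compiler from caching *ptr in a register across the asm statement.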

OpenSSL and multi-threads

I've been reading about the requirement that if OpenSSL is used in a multi-threaded application, you have to register a thread identification function (and also a mutex creation function) with OpenSSL.
On Linux, according to the example provided by OpenSSL, a thread is normally identified by registering a function like this:
static unsigned long id_function(void){
return (unsigned long)pthread_self();
}
pthread_self() returns a pthread_t, and this works on Linux since pthread_t is just a typedef of unsigned long.
On Windows pthreads, FreeBSD, and other operating systems, pthread_t is a struct, with the following structure:
struct {
void * p; /* Pointer to actual object */
unsigned int x; /* Extra information - reuse count etc */
};
This can't be simply cast to an unsigned long, and when I try to do so, it throws a compile error. I tried taking the void *p and casting that to an unsigned long, on the theory that the memory pointer should be consistent and unique across threads, but this just causes my program to crash a lot.
What can I register with OpenSSL as the thread identification function when using Windows pthreads or FreeBSD or any of the other operating systems like this?
Also, as an additional question:
Does anyone know whether this also needs to be done if OpenSSL is compiled into and used with Qt, and if so, how to register QThreads with OpenSSL? Surprisingly, I can't seem to find the answer in Qt's documentation.
I will just put this code here. It is not a panacea, as it doesn't deal with FreeBSD, but it is helpful in most cases where all you need is to support Windows and, say, Debian. Of course, the clean solution is to use the recently introduced CRYPTO_THREADID_* family (to give an idea, it provides CRYPTO_THREADID_cmp, which plays the role that pthread_equal plays for pthread_t).
/* Locking and ID callbacks for OpenSSL 1.0.x and earlier; OpenSSL 1.1.0 and later do their own locking. */
#include <stdlib.h>
#include <openssl/crypto.h>
#include <openssl/err.h>
#if defined(WIN32)
#include <windows.h>
#define MUTEX_TYPE HANDLE
#define MUTEX_SETUP(x) (x) = CreateMutex(NULL, FALSE, NULL)
#define MUTEX_CLEANUP(x) CloseHandle(x)
#define MUTEX_LOCK(x) WaitForSingleObject((x), INFINITE)
#define MUTEX_UNLOCK(x) ReleaseMutex(x)
#define THREAD_ID GetCurrentThreadId()
#else
#include <pthread.h>
#define MUTEX_TYPE pthread_mutex_t
#define MUTEX_SETUP(x) pthread_mutex_init(&(x), NULL)
#define MUTEX_CLEANUP(x) pthread_mutex_destroy(&(x))
#define MUTEX_LOCK(x) pthread_mutex_lock(&(x))
#define MUTEX_UNLOCK(x) pthread_mutex_unlock(&(x))
/* Casting this to unsigned long only compiles where pthread_t is an integer or pointer type. */
#define THREAD_ID pthread_self()
#endif
/* This array will store all of the mutexes available to OpenSSL. */
static MUTEX_TYPE *mutex_buf=NULL;
static void locking_function(int mode, int n, const char * file, int line)
{
if (mode & CRYPTO_LOCK)
MUTEX_LOCK(mutex_buf[n]);
else
MUTEX_UNLOCK(mutex_buf[n]);
}
static unsigned long id_function(void)
{
return ((unsigned long)THREAD_ID);
}
int thread_setup(void)
{
int i;
mutex_buf = malloc(CRYPTO_num_locks() * sizeof(MUTEX_TYPE));
if (!mutex_buf)
return 0;
for (i = 0; i < CRYPTO_num_locks( ); i++)
MUTEX_SETUP(mutex_buf[i]);
CRYPTO_set_id_callback(id_function);
CRYPTO_set_locking_callback(locking_function);
return 1;
}
int thread_cleanup(void)
{
int i;
if (!mutex_buf)
return 0;
CRYPTO_set_id_callback(NULL);
CRYPTO_set_locking_callback(NULL);
for (i = 0; i < CRYPTO_num_locks( ); i++)
MUTEX_CLEANUP(mutex_buf[i]);
free(mutex_buf);
mutex_buf = NULL;
return 1;
}
I can only answer the Qt part. Use QThread::currentThreadId(), or even QThread::currentThread(), as the pointer value should be unique.
From the OpenSSL doc you linked:
threadid_func(CRYPTO_THREADID *id) is needed to record the currently-executing thread's identifier into id. The implementation of this callback should not fill in id directly, but should use CRYPTO_THREADID_set_numeric() if thread IDs are numeric, or CRYPTO_THREADID_set_pointer() if they are pointer-based. If the application does not register such a callback using CRYPTO_THREADID_set_callback(), then a default implementation is used - on Windows and BeOS this uses the system's default thread identifying APIs, and on all other platforms it uses the address of errno. The latter is satisfactory for thread-safety if and only if the platform has a thread-local error number facility.
As shown, providing your own ID is really only useful if you can provide a better ID than OpenSSL's default implementation.
The only fail-safe way to provide IDs, when you don't know whether pthread_t is a pointer or an integer, is to maintain your own per-thread IDs stored as a thread-local value.
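A minimal sketch of that per-thread ID approach, assuming a C11 compiler (_Thread_local and <stdatomic.h>); my_thread_id() and the variable names are hypothetical. Each thread draws a number from a shared atomic counter the first time it asks and caches it in thread-local storage, so the ID never depends on what pthread_t looks like.

```c
#include <stdatomic.h>

/* Next ID to hand out; 0 is reserved to mean "not assigned yet". */
static atomic_ulong next_thread_id = 1;

/* Per-thread cached ID. */
static _Thread_local unsigned long this_thread_id = 0;

/* Returns a unique, stable numeric ID for the calling thread. */
unsigned long my_thread_id(void)
{
    if (this_thread_id == 0)
        this_thread_id = atomic_fetch_add(&next_thread_id, 1);
    return this_thread_id;
}
```

With OpenSSL 1.0.x, a threadid_func callback could then record it via CRYPTO_THREADID_set_numeric(id, my_thread_id()) and be registered with CRYPTO_THREADID_set_callback(). IDs are never reused, even after a thread exits, so they stay unique for the life of the process.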
