Why use SysV or POSIX shared memory vs mmap()? - macos

Needing to use IPC to pass large-ish amounts of data (200kb+) from a child process to a parent on OS X 10.4 and above, I read up on shared memory on Unix, specifically System V and POSIX shared memory mechanisms. Then I realized that mmap() can be used with the MAP_ANON and MAP_SHARED flags to do a similar thing (or just with the MAP_SHARED flag, if I don't mind a regular file being created).
My question is, is there any reason not to just use mmap()? It seems much simpler, the memory is still shared, and it doesn't have to create a real file if I use MAP_ANON. I can create the file in the parent process then fork() and exec() the child and use it in the child process.
Second part of the question is, what would be reasons that this approach is not sufficient, and one would have to use SysV or POSIX shared memory mechanisms?
Note that I was planning on doing synchronization using the pipes I already need for other communication, i.e. the parent asks for data over the pipe, the child writes it to shared memory and responds over the pipe that it's ready. No multiple readers or writers are involved. Portability is not a priority.

If you have a parent/child relationship, it's perfectly fine to use mmap.
sysv_shm is the original Unix mechanism, and it allows both related and unrelated processes to share memory. posix_shm standardized shared memory later.
If you're on a POSIX system without mmap, use posix_shm. If you're on a Unix without posix_shm, use sysv_shm. If you only need to share memory between a parent and child, use mmap if it's available.
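For what it's worth, here's a minimal sketch of that parent/child pattern (my own illustration, not code from the question; error handling kept to a minimum): the parent maps an anonymous shared region, forks, and the child writes into it. A pipe would normally carry the "data is ready" message, as the question describes; waitpid stands in for it here.

/* Sketch of the mmap + fork approach: the parent creates an anonymous
   shared mapping, forks, and the child writes into it. Some systems
   spell MAP_ANON as MAP_ANONYMOUS. */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    const size_t len = 200 * 1024;              /* ~200 KB, as in the question */
    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_ANON | MAP_SHARED, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    pid_t pid = fork();
    if (pid == 0) {                              /* child: produce data */
        strcpy(buf, "hello from the child");
        _exit(0);
    }
    waitpid(pid, NULL, 0);                       /* stand-in for the pipe handshake */
    printf("parent read: %s\n", buf);            /* parent sees the child's write */
    munmap(buf, len);
    return 0;
}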

If memory serves, the only reason to use SysV/POSIX over mmap is portability. In particular, older Unix systems don't support MAP_ANON. Solaris, Linux, the BSDs, and OS X do, however, so in practice there's little reason not to use mmap.

shm on Linux is typically implemented as a file under /dev/shm that gets mmap'd, so performance should be equivalent. I'd go with mmap (with MAP_ANON and MAP_SHARED, as you mention) for simplicity, given that portability is not an issue in your case.

As far as I can tell from the documentation, you must use SysV shared memory if you want to use Xlib/XCB shared-memory images (the MIT-SHM extension).
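Roughly, the MIT-SHM setup looks like this (a hedged sketch of the shmget/shmat/XShmAttach dance; error checking and the completion event are omitted):

/* Rough sketch of why SysV is required for Xlib shared-memory images:
   XShmCreateImage/XShmAttach expect a SysV segment ID from shmget/shmat.
   Link with -lX11 -lXext. */
#include <X11/Xlib.h>
#include <X11/extensions/XShm.h>
#include <sys/ipc.h>
#include <sys/shm.h>

XImage *create_shm_image(Display *dpy, int width, int height,
                         XShmSegmentInfo *shminfo) {
    int screen = DefaultScreen(dpy);
    XImage *img = XShmCreateImage(dpy, DefaultVisual(dpy, screen),
                                  DefaultDepth(dpy, screen), ZPixmap,
                                  NULL, shminfo, width, height);
    shminfo->shmid = shmget(IPC_PRIVATE,
                            img->bytes_per_line * img->height,
                            IPC_CREAT | 0600);               /* SysV, not mmap */
    shminfo->shmaddr = img->data = shmat(shminfo->shmid, NULL, 0);
    shminfo->readOnly = False;
    XShmAttach(dpy, shminfo);                                 /* X server attaches too */
    return img;
}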

Related

OS X Shared Memory Cleanup

I'm sharing memory between a parent process and multiple child processes by allocating shared memory segments with shm_open/mmap on OS X. Either the parent or a child may create a segment and then communicate its identifying name to the other. My understanding is that the parent has to call shm_unlink on each of these segments when it quits in order to clean up; otherwise the shared memory is permanently leaked.
What I had initially thought from reading the documentation is that shared segments are cleaned up once no process that has them mapped is still alive. Experiments show this isn't the case, however: something has to call shm_unlink explicitly.
Is there any way in OS X to list all the currently existing shared memory segments? The problem is that the parent may crash and so not have a chance to call shm_unlink. In Linux my solution is to clean out /dev/shm, but in OS X I would need some way of listing open shared segments.
The answer seems to be: you can't.
First, see this question, which quotes a comment in the kernel:
TODO:
(2) Need to export data to a userland tool via a
sysctl. Should ipcs(1) and ipcrm(1) be expanded or should new
tools to manage both POSIX kernel semaphores and POSIX shared
memory be written?
Also see this post on the Apple mailing list unix-porting:
There is no "picps"/"picprm" utility, you are expected to remember what
you create and clean up afterward, or clean up first thing on
restart if you crash a lot, there is nothing exposed directly
in the filesystem namespace, and you are expected to do
the shm_unlink because it is a rendezvous for potentially a
lot of unrelated programs.
Hope you figure out your problem. You can use ipcs -a and look under the Shared Memory heading at NATTCH; that value tells you how many processes are attached to a particular segment ID.
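If that's not enough, one mitigation (my own sketch, not something suggested in the thread, and the segment name is made up) is to shm_unlink the name as soon as every process that needs the segment has opened it; once the name is gone there is nothing left to leak if the parent later crashes:

/* Sketch: create a named POSIX segment, map it, and unlink the name as soon
   as everyone who needs it has called shm_open. The memory is then reclaimed
   when the last mapping goes away, so a later crash cannot leak it. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    const char *name = "/myapp-segment";         /* hypothetical name */
    const size_t len = 200 * 1024;

    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd < 0) { perror("shm_open"); return 1; }
    ftruncate(fd, len);                          /* size the segment */

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }
    close(fd);                                   /* mapping stays valid */

    /* ...fork children / let them shm_open(name) here, then: */
    shm_unlink(name);                            /* no name left to leak */

    /* use p ... */
    munmap(p, len);
    return 0;
}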

Does Go support volatile / non-volatile variables?

I'm new to the language so bear with me.
I am curious how Go handles data storage available to threads, in the sense that non-local variables can also be non-volatile, as in Java for instance.
Go has the concept of a channel, which, by its nature as an inter-thread communication mechanism, bypasses the processor cache and reads/writes the heap directly.
Also, I have not found any reference to volatile in the Go documentation.
TL;DR: Go does not have a keyword to make a variable safe for multiple goroutines to read and write. Use the sync/atomic package for that, or better yet: "Do not communicate by sharing memory; instead, share memory by communicating."
Two answers for the two meanings of volatile
.NET/Java concurrency
Some excerpts from the Go Memory Model.
If the effects of a goroutine must be observed by another goroutine,
use a synchronization mechanism such as a lock or channel
communication to establish a relative ordering.
One of the examples from the Incorrect Synchronization section is an example of busy waiting on a value.
Worse, there is no guarantee that the write to done will ever be
observed by main, since there are no synchronization events between
the two threads. The loop in main is not guaranteed to finish.
Indeed, this code (play.golang.org/p/K8ndH7DUzq) never exits.
C/C++ non-standard memory
Go's memory model does not provide a way to address non-standard memory. If you have raw access to a device's I/O bus you'll need to use assembly or C to safely write values to the memory locations. I have only ever needed to do this in a device driver which generally precludes use of Go.
The simple answer is that volatile is not supported by the current Go specification, period.
If you do have one of the use cases where volatile is necessary, such as low-level atomic memory access that is unsupported by existing packages in the standard library, or unbuffered access to hardware mapped memory, you'll need to link in a C or assembly file.
Note that if you do use C or assembly as understood by the GC compiler suite, you don't even need cgo for that, since the [568]c C/asm compilers are also able to handle it.
You can find examples of that in Go's source code. For example:
http://golang.org/src/pkg/runtime/sema.goc
http://golang.org/src/pkg/runtime/atomic_arm.c
Grep for many other instances.
For how memory access in Go does work, check out The Go Memory Model.
No, Go does not support the volatile or register keyword.
See this post for more information.
This is also noted in the Go for C++ Programmers guide.
The Go Memory Model documentation explains why the concept of 'volatile' has no application in Go.
Loosely: among other things, goroutines are free to keep goroutine-local changes cached in registers, so those changes are not observable by other goroutines. To "flush" those changes to memory, synchronization must be performed, either by using locks or by communicating (a channel send or receive).

free mem as function of command 'purge'

One of my apps needs a function that frees inactive/used/wired memory, just like the 'purge' command does.
I've checked and googled a lot, but cannot find anything.
Any comments are welcome.
Purge doesn't do what you seem to think it does. It doesn't "free inactive/used/wired memory". As the manpage says:
It does not affect anonymous memory that has been allocated through malloc, vm_allocate, etc.
All it does is purge the disk cache. This is only useful if you're running performance tests and want to simulate the effects of "first run after cold boot" without actually cold booting. Again, from the manpage:
Purge can be used to approximate initial boot conditions with a cold disk buffer cache for performance analysis.
There is no public API for this, although a quick scan of the symbols shows that it seems to call a function CPOSXPurgeAllDiskBuffers from the private CoreProfile framework. I believe the underlying kernel and userland disk cache code is all or mostly available on http://www.opensource.apple.com, so you could probably implement the same thing yourself, if you really want to.
As iMysak says, you can just exec (or NSTask, etc.) the tool if you want to.
As a side note, if you could free used/wired memory, presumably that memory is used by something; even if you don't have pointers into it in your own data structures, malloc probably does. Are you trying to segfault your code?
Freeing inactive memory is a different story. Just freeing something up to malloc doesn't necessarily make malloc return it to the OS. And there's no way you can force it to. If you think about the way traditional UNIX works, it makes sense: When you ask it to allocate more memory, it uses sbrk to expand your data segment; if you free up memory at the top, it can sbrk back down, but if you free up memory in the middle, there's no way it can do that. Of course modern UNIX systems don't work that way, but the POSIX and C APIs are all designed to be compatible with systems that do. So, if you want to make sure memory gets freed, you have to handle memory allocation directly.
The simplest and most portable way to do this is to create and mmap a temporary backing file, or just MAP_ANON, and explicitly unmap pages when you're done with them. (This works on all POSIX systems—and, with a pretty simple wrapper, even Windows.) If you need even more control (e.g., to manually handle flushing pages to disk, etc.), you can use the mach/mach_vm.h APIs.
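For illustration, a minimal sketch of that approach (my own example, not from the original answer): allocate the buffer with an anonymous mmap instead of malloc and munmap it when done, so the pages really do go back to the OS.

/* Sketch of allocating outside malloc so the memory really is returned:
   mmap an anonymous region, use it, munmap it. */
#include <stdio.h>
#include <sys/mman.h>

int main(void) {
    size_t len = 16 * 1024 * 1024;               /* 16 MB scratch buffer */
    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_ANON | MAP_PRIVATE, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    buf[0] = 42;                                 /* ...work with the buffer... */

    munmap(buf, len);                            /* pages go straight back to the OS */
    return 0;
}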
You can run it directly from the OS with the exec() family of functions.
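Something like this, roughly (a sketch that assumes the purge tool is installed and on the PATH, and that the process has whatever privileges purge requires):

/* Sketch: run the purge(8) tool from your app via fork + exec. */
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();
    if (pid == 0) {
        execlp("purge", "purge", (char *)NULL);  /* search PATH for the tool */
        perror("execlp");                        /* only reached on failure */
        _exit(127);
    }
    int status;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : 1;
}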

Difference between pthread and fork on gnu/Linux

What is the basic difference between a pthread and fork on Linux, in terms of implementation differences and how the scheduling varies (does it vary?)?
I ran strace on two similar programs, one using pthreads and another using fork. Both in the end make the clone() syscall with different arguments, so I am guessing the two are essentially the same on a Linux system, with pthreads just being easier to handle in code.
Can someone give a deep explanation?
EDIT : see also a related question
In C there are some differences, however (a minimal side-by-side sketch follows the comparison):
fork()
Purpose is to create a new process, which becomes the child process of the caller
Both processes will execute the next instruction following the fork() system call
Two identical copies of the process's address space, code, and stack are created, one for the parent and one for the child.
Thinking of the fork as if it were a person: forking creates a clone of your program (process) that runs the code it copied.
pthread_create()
Purpose is to create a new thread in the program, which runs within the same process as the caller
Threads within the same process can communicate using shared memory. (Be careful!)
The second thread will share data, open files, signal handlers and signal dispositions, the current working directory, and user and group IDs. The new thread gets its own stack, thread ID, and registers, though.
Continuing the analogy: your program (process) grows a second arm when it creates a new thread, connected to the same brain.
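Here is the minimal side-by-side sketch mentioned above (my own illustration, compiled with -pthread; no error handling). The forked child's write to a global is invisible to the parent, while the thread's write is visible, because only the thread shares the caller's address space.

/* fork() gives the child its own copy-on-write address space; the thread
   created by pthread_create() shares the caller's address space, so only
   the thread's write to `shared` is visible afterwards. */
#include <pthread.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

static int shared = 0;

static void *thread_fn(void *arg) {
    (void)arg;
    shared = 1;                       /* same address space: caller sees this */
    return NULL;
}

int main(void) {
    pid_t pid = fork();
    if (pid == 0) {
        shared = 2;                   /* child's private COW copy: caller never sees this */
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    printf("after fork:   shared = %d\n", shared);   /* prints 0 */

    pthread_t t;
    pthread_create(&t, NULL, thread_fn, NULL);
    pthread_join(t, NULL);
    printf("after thread: shared = %d\n", shared);   /* prints 1 */
    return 0;
}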
On Linux, the system call clone clones a task, with a configurable level of sharing.
fork() calls clone(least sharing) and pthread_create() calls clone(most sharing).
Forking costs a tiny bit more than pthread_create because of copying page tables and creating COW mappings for memory.
You should look at the clone manpage.
In particular, it lists all the possible clone modes and how they affect the process/thread, virtual memory space etc...
You say "threads easier to handle in code": that's very debatable. Writing bug-free, deadlock-free multi-thread code can be quite a challenge. Sometimes having two separate processes makes things much simpler.

Are Interlocked* functions useful on shared memory?

Two Windows processes have memory mapped the same shared file. If the file consists of counters, is it appropriate to use the Interlocked* functions (like InterlockedIncrement) to update those counters? Will those synchronize access across processes? Or do I need to use something heavier, like a mutex? Or perhaps the shared-memory mechanism itself ensures consistent views.
The interlocked functions are intended for exactly that type of use.
From http://msdn.microsoft.com/en-us/library/ms684122.aspx:
The threads of different processes can use these functions if the variable is in shared memory.
Of course, if you need to have more than one item updated atomically, you'll need to go with a mutex or some other synchronization object that works across processes. There's nothing built into the shared-memory mechanism itself to synchronize access to the shared memory; you'll need to use the interlocked functions or a synchronization object.
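To make that concrete, here's a small sketch (my own, with a made-up section name): each cooperating process opens the same named, pagefile-backed mapping and bumps a counter in it with InterlockedIncrement.

/* Sketch: cooperating processes map the same named section and update a
   counter with InterlockedIncrement. The object name is invented; error
   handling is minimal. */
#include <windows.h>
#include <stdio.h>

int main(void) {
    HANDLE h = CreateFileMappingW(INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE,
                                  0, sizeof(LONG), L"Local\\MyCounters");
    if (!h) return 1;

    volatile LONG *counter = MapViewOfFile(h, FILE_MAP_ALL_ACCESS,
                                           0, 0, sizeof(LONG));
    if (!counter) { CloseHandle(h); return 1; }

    LONG value = InterlockedIncrement(counter);   /* atomic across processes */
    printf("counter is now %ld\n", (long)value);

    UnmapViewOfFile((LPCVOID)counter);
    CloseHandle(h);
    return 0;
}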
From MSDN, under "The Interlocked API":
The interlocked functions provide a simple mechanism for synchronizing access to a variable that is shared by multiple threads. They also perform operations on variables in an atomic manner. The threads of different processes can use these functions if the variable is in shared memory.
So, yes, it is safe with your shared memory approach.