Force malloc to pre-fault/MAP_POPULATE/MADV_WILLNEED all allocations for an entire program/process - memory-management

For the sake of some user-space performance profiling, I'd like to cleanly separate the costs of allocating memory from operations that access it. The application does no over-allocation, so every page that gets mapped will be faulted in, probably in code that runs shortly after its allocation.
What I'd like to do is set some flag, environment variable, something, to tell malloc that it should uniformly do the equivalent of calling mmap(..., MAP_POPULATE) or madvise(..., MADV_WILLNEED) or just touching every page of whatever it allocated itself. I haven't found any documentation, on any platform(!), that describes a way to do this. Is there some existing technique that's utterly undocumented, up to my ability to search? Is this a fundamentally misguided or bad idea?
If I wanted to implement this myself, I'm thinking of an LD_PRELOAD including just a reimplementation of malloc that calls the underlying malloc and then does the madvise thing (to be at least somewhat agnostic to huge pages behavior). Any reason that shouldn't work?

malloc is one of the most used, yet relatively slow functions in common use. As a result, it has received a lot of optimization attention over the years. I seriously doubt that any serious implementation of malloc does anything so slow as the string parsing that would be required to check an environment variable at every call.
LD_PRELOAD is not a bad idea, considering what you're doing, you wouldn't even need to recompile to switch between profile and release builds. If you're open to recompiling, I would suggest doing a #define malloc(size) { malloc(size); mmap(...);}. You could even do this at the compile command line via -Dmalloc=... (so long as the system malloc is not itself a define, which would overwrite the cli one).
Another option would be to find/implement a program that uses the debug interface to intercept and redirect calls to malloc. You could theoretically do this by messing with the post-compiled (or post-load) program's import section to point to your dll/so file.
Edit: On second thought, the define might not work on every allocation, since it is often implied by the compiler (e.g. new).

Related

Go destructors?

I know there are no destructors in Go since technically there are no classes. As such, I use initClass to perform the same functions as a constructor. However, is there any way to create something to mimic a destructor in the event of a termination, for the use of, say, closing files? Right now I just call defer deinitClass, but this is rather hackish and I think a poor design. What would be the proper way?
In the Go ecosystem, there exists a ubiquitous idiom for dealing with objects which wrap precious (and/or external) resources: a special method designated for freeing that resource, called explicitly — typically via the defer mechanism.
This special method is typically named Close(), and the user of the object has to call it explicitly when they're done with the resource the object represents. The io standard package does even have a special interface, io.Closer, declaring that single method. Objects implementing I/O on various resources such as TCP sockets, UDP endpoints and files all satisfy io.Closer, and are expected to be explicitly Closed after use.
Calling such a cleanup method is typically done via the defer mechanism which guarantees the method will run no matter if some code which executes after resource acquisition will panic() or not.
You might also notice that not having implicit "destructors" quite balances not having implicit "constructors" in Go. This actually has nothing to do with not having "classes" in Go: the language designers just avoid magic as much as practically possible.
Note that Go's approach to this problem might appear to be somewhat low-tech but in fact it's the only workable solution for the runtime featuring garbage-collection. In a language with objects but without GC, say C++, destructing an object is a well-defined operation because an object is destroyed either when it goes out of scope or when delete is called on its memory block. In a runtime with GC, the object will be destroyed at some mostly indeterminate point in the future by the GC scan, and may not be destroyed at all. So if the object wraps some precious resource, that resource might get reclaimed way past the moment in time the last live reference to the enclosing object was lost, and it might even not get reclaimed at all—as has been well explained by #twotwotwo in their respective answer.
Another interesting aspect to consider is that the Go's GC is fully concurrent (with the regular program execution). This means a GC thread which is about to collect a dead object might (and usually will) be not the thread(s) which executed that object's code when it was alive. In turn, this means that if the Go types could have destructors then the programmer would need to make sure whatever code the destructor executes is properly synchronized with the rest of the program—if the object's state affects some data structures external to it. This actually might force the programmer to add such synchronization even if the object does not need it for its normal operation (and most objects fall into such category). And think about what happens of those exernal data strucrures happened to be destroyed before the object's destructor was called (the GC collects dead objects in a non-deterministic way). In other words, it's much easier to control — and to reason about — object destruction when it is explicitly coded into the program's flow: both for specifying when the object has to be destroyed, and for guaranteeing proper ordering of its destruction with regard to destroying of the data structures external to it.
If you're familiar with .NET, it deals with resource cleanup in a way which resembles that of Go quite closely: your objects which wrap some precious resource have to implement the IDisposable interface, and a method, Dispose(), exported by that interface, must be called explicitly when you're done with such an object. C# provides some syntactic sugar for this use case via the using statement which makes the compiler arrange for calling Dispose() on the object when it goes out of the scope declared by the said statement. In Go, you'll typically defer calls to cleanup methods.
One more note of caution. Go wants you to treat errors very seriously (unlike most mainstream programming language with their "just throw an exception and don't give a fsck about what happens due to it elsewhere and what state the program will be in" attitude) and so you might consider checking error returns of at least some calls to cleanup methods.
A good example is instances of the os.File type representing files on a filesystem. The fun stuff is that calling Close() on an open file might fail due to legitimate reasons, and if you were writing to that file this might indicate that not all the data you wrote to that file had actually landed in it on the file system. For an explanation, please read the "Notes" section in the close(2) manual.
In other words, just doing something like
fd, err := os.Open("foo.txt")
defer fd.Close()
is okay for read-only files in the 99.9% of cases, but for files opening for writing, you might want to implement more involved error checking and some strategy for dealing with them (mere reporting, wait-then-retry, ask-then-maybe-retry or whatever).
runtime.SetFinalizer(ptr, finalizerFunc) sets a finalizer--not a destructor but another mechanism to maybe eventually free up resources. Read the documentation there for details, including downsides. They might not run until long after the object is actually unreachable, and they might not run at all if the program exits first. They also postpone freeing memory for another GC cycle.
If you're acquiring some limited resource that doesn't already have a finalizer, and the program would eventually be unable to continue if it kept leaking, you should consider setting a finalizer. It can mitigate leaks. Unreachable files and network connections are already cleaned up by finalizers in the stdlib, so it's only other sorts of resources where custom ones can be useful. The most obvious class is system resources you acquire through syscall or cgo, but I can imagine others.
Finalizers can help get a resource freed eventually even if the code using it omits a Close() or similar cleanup, but they're too unpredictable to be the main way to free resources. They don't run until GC does. Because the program could exit before next GC, you can't rely on them for things that must be done, like flushing buffered output to the filesystem. If GC does happen, it might not happen soon enough: if a finalizer is responsible for closing network connections, maybe a remote host hits its limit on open connections to you before GC, or your process hits its file-descriptor limit, or you run out of ephemeral ports, or something else. So it's much better to defer and do cleanup right when it's necessary than to use a finalizer and hope it's done soon enough.
You don't see many SetFinalizer calls in everyday Go programming, partly because the most important ones are in the standard library and mostly because of their limited range of applicability in general.
In short, finalizers can help by freeing forgotten resources in long-running programs, but because not much about their behavior is guaranteed, they aren't fit to be your main resource-management mechanism.
There are Finalizers in Go. I wrote a little blog post about it. They are even used for closing files in the standard library as you can see here.
However, I think using defer is more preferable because it's more readable and less magical.

what's the memory allocation functions can be called from the interrupt environment in AIX?

xmalloc can be used in the process environment only when I write a AIX kernel extension.
what's the memory allocation functions can be called from the interrupt environment in AIX?
thanks.
The network memory allocation routines. Look in /usr/include/net/net_malloc.h. The lowest level is net_malloc and net_free.
I don't see much documentation in IBM's pubs nor the internet. There are a few examples in various header files.
There is public no prototype that I can find for these.
If you look in net_malloc.h, you will see MALLOC and NET_MALLOC macros defined that call it. Then if you grep in all the files under /usr/include, you will see uses of these macros. From these uses, you can deduce the arguments to the macros and thus deduce the arguments to net_malloc itself. I would make one routine that is a pass through to net_malloc that you controlled the interface to.
On your target system, do "netstat -m". The last bucket size you see will be the largest size you can call net_malloc with the M_NOWAIT flag. M_WAIT can be used only at process time and waits for netm to allocate more memory if necessary. M_NOWAIT returns with a 0 if there is not enough memory pinned. At interrupt time, you must use M_NOWAIT.
There is no real checking for the "type" but it is good to pick an appropriate type for debugging purposes later on. The netm output from kdb shows the type.
In a similar fashion, you can figure out how to call net_free.
Its sad IBM has chosen not to document this. An alternative to get this information officially is to pay for an "ISV" question. If you are doing serious AIX development, you want to become an ISV / Partner. It will save you lots of heart break. I don't know the cost but it is within reach of small companies and even individuals.
This book is nice to have too.

free mem as function of command 'purge'

one of my app needs the function that free inactive/used/wired memory just like command 'purge'.
Check and google a lot, but can not get any hit
Welcome any comment
Purge doesn't do what you seem to think it does. It doesn't "free inactive/used/wired memory". As the manpage says:
It does not affect anonymous memory that has been allocated through malloc, vm_allocate, etc.
All it does is purge the disk cache. This is only useful if you're running performance tests and want to simulate the effects of "first run after cold boot" without actually cold booting. Again, from the manpage:
Purge can be used to approximate initial boot conditions with a cold disk buffer cache for performance analysis.
There is no public API for this, although a quick scan of the symbols shows that it seems to call a function CPOSXPurgeAllDiskBuffers from the CoreProfile private framework. I believe the underlying kernel and userland disk cache code is all or mostly available on http://www.opensource.apple.com, so you could do probably implement the same thing yourself, if you really want.
As iMysak says, you can just exec (or NSTask, etc.) the tool if you want to.
As a side note, it you could free used/wired memory, presumably that memory is used by something—even if you don't have pointers into it in your own data structures, malloc probably does. Are you trying to segfault your code?
Freeing inactive memory is a different story. Just freeing something up to malloc doesn't necessarily make malloc return it to the OS. And there's no way you can force it to. If you think about the way traditional UNIX works, it makes sense: When you ask it to allocate more memory, it uses sbrk to expand your data segment; if you free up memory at the top, it can sbrk back down, but if you free up memory in the middle, there's no way it can do that. Of course modern UNIX systems don't work that way, but the POSIX and C APIs are all designed to be compatible with systems that do. So, if you want to make sure memory gets freed, you have to handle memory allocation directly.
The simplest and most portable way to do this is to create and mmap a temporary backing file, or just MAP_ANON, and explicitly unmap pages when you're done with them. (This works on all POSIX systems—and, with a pretty simple wrapper, even Windows.) If you need even more control (e.g., to manually handle flushing pages to disk, etc.), you can use the mach/mach_vm.h APIs.
You can directly run it from OS // with exec() function

Executable sections marked as "execute" AND "read"?

I've noticed (on Win32 at least) that in executables, code sections (.text) have the "read" access bit set, as well as the "execute" access bit. Are there any bonafide legit reasons for code to be reading itself instead of executing itself? I thought this was what other sections were for (such as .rdata).
(Specifically, I'm talking about IMAGE_SCN_MEM_READ.)
IMAGE_SCN_MEM_EXECUTE |IMAGE_SCN_MEM_READ are mapped into memory as PAGE_EXECUTE_READ, which is equivalent to PAGE_EXECUTE_WRITECOPY. This is needed to enable copy-on-write access. Copy-on-write means that any attempts to modify the page results in a new, process-private copy of the page being created.
There are a few different reasons for needing write-copy:
Code that needs to be relocated by the loader must have this set so that the loader can do the fix-ups. This is very common.
Sections that have code and data in single section would need this as well, to enable modifying process globals. Code & data in a single section can save space, and possibly improve locality by having code and the globals the code uses being on the same page.
Code that attempts to modify itself. I believe this is fairly rare.
Compile-time constants, particularly for long long or double values, are often loaded with a mov register, address statement from the code segment.
The one example I can think of for a reason to read code is to allow for self modifying code. Code must necessarily be able to read itself in order to be self modifying.
Also consider the opposite side. What advantage is gained from disallowing code from reading itself? I struggled for a bit on this one but I can see no advantage gained from doing so.

Question about g++ generated code

Dear g++ hackers, I have the following question.
When some data of an object is overwritten by a faulty program, why does the program eventually fail on destruction of that object with a double free error? How does it know if the data is corrupted or not? And why does it cause double free?
It's usually not that the object's memory is overwritten, but some part of the memory outside of the object. If this hits malloc's control structures, free will freak out once it accesses them and tries to do weird things based on the corrupted structure.
If you'd really only overwrite object memory with silly stuff, there's no way malloc/free would know. Your program might crash, but for other reasons.
Take a look at valgrind. It's a tool that emulates the CPU and watches every memory access for anomalies (like trying to overwrite malloc's control structures). It's really easy to use, most of the time you just start your program inside valgrind by prepending valgrind on the shell, and it saves you a lot of pain.
Regarding C++: always make sure that you use new in conjunction with delete and, respectively, new[] in conjunction with delete[]. Never mix them up. Bad things will happen, often similar to what you are describing (but valgrind would warn you).

Resources