Smart pointers and cache/memory locality - c++11

I am trying to understand smart pointers in context of locality. I have looked at several questions here on S/O regarding having a std::vector of MyObjects and a std::vector of smart pointers (shared_ptr/unique_ptr) to MyObjects.
What I haven't found the answer to is, if I have a vector of 1000 smart pointers to MyObjects (using make_shared/make_unique), are the objects themselves stored in random locations in memory and the smart pointers stored in a contiguous block of memory in the vector, or are the pointers and the object both stored in the contiguous block of memory in the vector?

At least the elements in a vector are guaranteed to be stored contiguously. This is per the recent C++ standards.
The objects pointed to by the smart pointer are stored somewhere else. It will depend on the memory manager where your object will end up.
If you have a choice, it might be better (and faster) to store the objects directly into the vector. A pointer adds an extra layer of indirection which could slow your program down.

Related

Forcing (encouraging) Golang to GC long-lived structures?

I am very new to Golang, but have lots of Java and C experience.
My application has a fairly complex,hierarchical object model built from a database, ORM-style, which is intended to be long-lived. The structures are built up using pointers created with new and/or make.
There are situations when I need to flush the current objects from memory and rebuild them from the DB. Is it possible to encourage Golang to reclaim the memory by traversing the structures from the lower levels, setting references to nil?
Is there a better way to do this?
(I am starting to realize that this architecture is not really typical for Go, which prefers to keep things on the stack. However, I am not free to discard it and start over, or to change to a different programming language.)
Thanks in advance for any guidance.
The point of garbage collected languages is that you don't have to worry about memory management, you don't have to worry about memory allocations and deallocations.
The general principle is: as long as you have a "reference" to some object, it will not be garbage collected; and objects that no one has a reference of, will be garbage collected.
This principle is of course "recursive". If you have an A object holding a pointer to a B object, and there's only a single reference to A (and no one else has a reference to B), once that reference "goes away", both A and B will be garbage collected.
So if you want some complex object hierarchy to be freed, just make sure you don't keep pointers to it, and it will be freed automatically. You do not have to traverse it recursively to zero pointers.
There is one important thing to keep in mind though: If you have a pointer to some "part" of an object, that will keep the whole object in memory. For example if you create a struct, and take the address of one of its fields and store it somewhere, that pointer will keep the whole struct in memory. Similarly, if you create a slice and you take the address of one of its elements (and store it), that will keep the whole slice (or more precisely its backing array) in memory. This also applies if you reslice the slice to "hide" elements, e.g.:
s := make([]*int, 10)
s[8] = new(int)
s = s[:2]
Here we created a slice of 10 int pointers, we assigned a pointer to the index 8, then resliced the slice to only hold the first 2 elements. There is a backing array in the background with storage for 10 pointers, which is not cleared by the above slicing operation, so the pointed int (at index 8 in the original slice) will remain in memory (as long as you have a reference to the slice or its backing array). For details, see Does go garbage collect parts of slices?
See related questions:
Freeing unused memory?
Cannot free memory once occupied by bytes.Buffer

Overwriting data in memory

I've written a password manager in Ocaml. In order to make it as secure as possible, I'd like to store a string (an encryption key) in memory in such a way that it can be overwritten. Since Ocaml is pass by value , and there's a garbage collector, this has proven difficult. I encrypt all buffers and variables I can, but I still need a "session key" stored to do this. To prevent this from being detected by automated key searching programs or put into swap, it's assembled from a bunch of random data in a buffer using a random increment. So really, all I need is a single variable that can be overwritten for the assembled key for a few seconds before it's passed into the Nocrypto library... Would a reference work for this?
According to this cornell "Refs and Arrays" page, refs are mutable and work similarly to pointers in C. That being said, I also found a stack overflow answer discussing Ocaml refs, in which the answer mentions "they act like pointers to new allocated memory". Does this mean every time, it just allocates a new thing in memory instead of actually mutating the stuff in memory? If so, you couldn't really "overwrite" a ref.
Other possible solutions I've come across are Bigarrays, and "custom blocks". I'm not entirely sure if "custom blocks" are actually allocated outside of the scope of garbage collection or not. They seem like they're used to access external C code. Are they copied around by the garbage collector? Could they be "overwritten?" There's also this idea of "opaque bytes" and opaque objects in memory. I'm having a pretty hard time wrapping my head around how this all fits together. A useful but confusing (to me) discussion of custom blocks in memory on stack overflow is here: Are custom blocks ever copied in memory? Answer says they can be moved around. Even so, could they be overwritten?
The last possible solution is to store it using a Cstruct like the Nocrypto library seems to do. They discuss it in this github issue: Secret material erasure The asker states:
"Granted, most key material is Cstruct.t, which is a Bigarray.Array1.t, which is allocated outside of the GC heap"
Is this even correct? If so, I can't seem to find a source file that actually does this. I'm pretty new to Ocaml and functional programming in general. If you're curious, my program is located on github here: ocaml-pass
TL;DR;
You shall not store any secret information in OCaml heap. Thus you must never copy your secret into any OCaml heap-allocated value, consequently, neither Bytes, nor Strings or Arrays could be used, even temporary.
Introduction to the OCaml Memory Model
OCaml values are uniformly represented as tagged machine words. The least significant bit of a word is used as a tag, that distinguishes between pointers (tag=0) and immediate values (tag=1). Thus a value has always a fixed size, and is a pointer or an immediate.
Immediate values store their data in the most significant part of the word, that is 31-bits in 32 bit systems, and 63 bits in 64-bit systems. Pointers store their data in blocks, that are located in a so-called OCaml Heap. The OCaml Heap is a set of blocks managed by the Garbage Collector (GC). A block is a chunk of data prefixed with a header. The header specifies the size of data, and some other meta information, used by the GC. Block can contain OCaml values (pointers or immediate values) or opaque data.
To summarize. All OCaml values are represented as machine words, that either store data directly in the word or are pointers to heap-allocated blocks. Each pointer points to one and only one block. Several pointers may point to the same block. Such values are considered physically equal. Some blocks are not pointed by any pointers. Such blocks are called dead and are reclaimed by the GC.
Introduction to the OCaml Garbage Collector
The GC manages blocks, by allocating, moving, and deallocating them. The GC itself uses an arena, that is either obtained from the C memory allocator (malloc) or directly from a kernel via the memmap syscall (depends on a particular system and runtime).
The GC is generational, that means that values are first allocated in a special region of a heap called minor heap. The minor heap is a contiguous region of memory of fixed size, represented in the runtime with three pointers: the pointer beg to the beginning of the minor heap, a pointer end to the end of the minor heap, and the pointer cur to the beginning of the free area of the minor heap. When a block is allocated, cur is increased by the size of the block. Then the block is initialized with data. When there is no more free space in the minor heap (i.e., then end - cur is less than the required block size), then a minor GC cycle is triggered. The GC analyzes all blocks stored in the Minor Heap and copies all blocks that are referenced by at least one pointer to the Major Heap. After that, the cur pointer is set to beg.
In the Major Heap, a block may also be copied several times during a process called compaction. The compactor may try to rearrange blocks in its arena in order to achieve more compact representation of the heap.
Security Consequences
As the OCaml GC is a moving GC, it may copy the heap-allocated data arbitrarily. Although it is called moving, it is still in fact just copying. I.e., when a block is moved from the minor heap to the major heap, it is in fact just bit-copied, and thus is duplicated. The block phantom in the minor heap may live for an arbitrary amount of time until it is overwritten by some newly allocated value. When an object is moved during the compaction, it is also copied, and may or may not be overwritten during the process. And, of course, it goes without saying, that once a block becomes dead, it still may survive in a heap for an arbitrary amount of time until reused by the GC.
That all means, that if a secret ends up in the OCaml heap, it will go wild, as the GC can replicate it multiple times in an arbitrary and rather unpredictable ways. Thus, we can only store secrets either in immediate values or in regions that are not controlled by the GC. As it was said before, all OCaml values that are pointers, always point to a block in the OCaml heap. A block may contain data directly, or it could contain a pointer itself, that will point outside the memory heap. The so-called custom blocks, may or may not store their information in the OCaml heap, it depeds on a particular representation of each custom block. For example, the Bigarray library provides custom blocks that store their payload outside of the OCaml heap. Thus a Bigarray is a custom block, that has two fields: a pointer and size. It is an opaque block, i.e., the GC will never treat these two values as OCaml values, and will never follow neither the size nor the pointer. The data pointed by a pointer is located outside of the OCaml heap, and is either allocated by malloc or by memmap (in fact, it could be arbitrary integer, and even point to stack, or static data, it doesn't really matter, as long as we treat bigarrays just as a ptr,len pair).
This all makes Bigarrays ideal for storing secrets. We can be sure, that they are not moved by the GC, we can overwrite them to prevent the information leakage once they are freed.
Further considerations
We should be careful, and never allow a secret to be copied into the OCaml heap from our safe place. That means, that even if our main storage is a safe bigarray the information will still leak if we will copy its contents to an OCaml string. Consequently, if we first read the information into OCaml string, and then copy it into bigarray, the information will still leak. Thus, any interface that uses OCaml heap-allocated values is unsafe and shall not be used. For example, we can't use OCaml channels to read or write secrets (we should rely on memory mapping or unbuffered IO provided by the Unix module). And again, whenever you get a string data type from a Bigarray, you get your data copied, with all the ramifications.
I would use a value of type bytes, essentially a mutable array of bytes:
# let buffer = Bytes.make 16 'x';;
val buffer : bytes = "xxxxxxxxxxxxxxxx"
# Bytes.set buffer 0 'T';;
- : unit = ()
# buffer;;
- : bytes = "Txxxxxxxxxxxxxxx"
# Bytes.fill buffer 0 16 ' ';;
- : unit = ()
# buffer;;
- : bytes = " "
You can overwrite with Bytes.fill after you're done.

Heap Type Implementation

I was implementing a heap sort and I start wondering about the different implementations of heaps. When you don need to access the elements by index(like in a heap sort) what are the pros and cons of implementing a heap with an array or doing it like any other linked data structure.
I think it's important to take into account the memory wasted by the nodes and pointers vs the memory wasted by empty spaces in an array, as well as the time it takes to add or remove elements when you have to resize the array.
When I should use each one and why?
As far as space is concerned, there's very little issue with using arrays if you know how much is going into the heap ahead of time -- your values in the heap can always be pointers to the larger structures. This may afford for better cache localization on the heap itself, but you're still going to have to go out someplace to memory for extra data. Ideally, if your comparison is based on a small morsel of data (often just a 4 byte float or integer) you can store that as the key with a pointer to the full data and achieve good cache coherency.
Heap sorts are already not particularly good on cache hits throughout traversing the heap structure itself, however. For small heaps that fit entirely in L1/L2 cache, it's not really so bad. However, as you start hitting main memory performance will dive bomb. Usually this isn't an issue, but if it is, merge sort is your savior.
The larger problem comes in when you want a heap of undetermined size. However, this still isn't so bad, even with arrays. Anymore, in non-embedded environments with nice, pretty memory systems growing an array with some calls (e.g. realloc, please forgive my C background) really isn't all that slow because the data may not need to physically move in memory -- just some address pointer magic for most of it. Added to the fact that if you use a array-size-doubling strategy (array is too small, double the size in a realloc call) you're still ending up with an O(n) amortized cost with relatively few reallocs and at most double wasted space -- but hey, you'd get that with linked lists anyways if you're using a 32-bit key and 32-bit pointer.
So, in short, I'd stick with arrays for the smaller base data structures. When the heap goes away, so do the pointers I don't need anymore with a single deallocation. However, it's easier to read pointer-based code for heaps in my opinion since dealing with the indexing magic isn't quite as straightforward. If performance and memory aren't a concern, I'd recommend that to anyone in a heartbeat.

Is it possible to free memory using an offset pointer?

Let's say I have an allocation in memory containing a string, "ABCDEFG", but I only have a pointer to the 'E'. Is it possible, on win32, to free that block, given a pointer that is within the block, but not at the start? Any allocation method would work, but a Heap* function would be the path of least resistance.
If not a native solution, have there been any custom memory managers written which offer this feature?
EDIT: This isn't an excuse to be sloppy. I'm developing an automatic memory management system using 100% compile-time metadata. This odd requirement seems to be the only thing standing in the way of getting it working, and even then it's needed only for data types based on arrays (which are slicable).
It would be possible for the memory allocation routines in the runtime library to check a given memory address against the beginning and end of every allocated block. That search accomplished, it would be easy to release the block from the beginning.
Even with clever algorithms behind it, this would incur some kind of search with each memory deallocation. And why? Just to support erroneous programs too stupid to keep track of the beginning of the blocks of memory they allocated?
The standard C idiom thrives on treating blocks of allocated memory like arrays. The pointer returned from *alloc is a pointer to the beginning of an array, and the pointer can be used with subscripts to access any element of that array, subscripts starting at 0. This has worked well enough for 40 years that I can't think of a sensible reason to introduce a change here.
I suppose if you know what the malloc() guard blocks look like, you could write a function that backs up from the pointer you pass it until it finds a 'best guess' of the original memory address and then calls free(). Why not just keep a copy of the base pointer around?
If you use VirtualAlloc to allocate memory, you can use VirtualQuery to figure out which block a pointer belongs to. Once you have the base address, you can pass this to VirtualFree to free the entire block.

in-place realloc with gcc/linux

Is there such a thing? I mean some function that would reallocate memory without moving it if possible or do nothing if not possible. In Visual C there is _expand which does what I want. Does anybody know about equivalents for other platforms, gcc/linux in particular? I'm mostly interested in shrinking memory in-place when possible (and standard realloc may move memory even when its size decreases, in case somebody asks).
I know there is no standard way to do this, and I'm explicitly asking for implementation-dependent dirty hackish tricks. List anything you know that works somewhere.
Aside from using mmap and munmap to eliminate the excess you don't need (or mremap, which could do the same but is non-standard), there is no way to reduce the size of an allocated block of memory. And mmap has page granularity (normally 4k) so unless you're dealing with very large objects, using it would be worse than just leaving the over-sized objects and not shrinking them at all.
With that said, shrinking memory in-place is probably not a good idea, since the freed memory will be badly fragmented. A good realloc implementation will want to move blocks when significantly shrinking them as an opportunity to defragment memory.
I would guess your situation is that you have an allocated block of memory with lots of other structures holding pointers into it, and you don't want to invalidate those pointers. If this is the case, here is a possible general solution:
Break your resizable object up into two allocations, a "head" object of fixed size which points to the second variable-sized object.
For other objects which need to point into the variable-size object, store a pointer to the head object and an integer offset (size_t or ptrdiff_t) into the variable-size object.
Now, even if the variable-size object moves to a new address, none of the references to it are invalidated.
If you're using these objects from multiple threads, you should put a read-write lock in the head object, read-locking it whenever you need to access the variable-sized object, and write-locking it whenever resizing the variable-sized object.
A similar question was asked on another forum. One of the more reasonable answers I saw involved using mmap for the initial allocation (using the MAP_ANONYMOUS flag) and calling mremap without the MREMAP_MAYMOVE flag. A limitation of this approach, though, is that the allocation sizes must be exact multiples to the system's page size.

Resources