Best heuristic for malloc - algorithm

Consider using malloc() to allocate x bytes of memory in a fragmented heap. Assume the heap has multiple contiguous free locations of size greater than x bytes.
Which of the following heuristics for choosing a location is best, i.e. leads to the least heap wastage?
Select the smallest location that is bigger than x bytes.
Select the largest location that is bigger than x bytes.
My intuition is the smallest location that is bigger than x bytes, but I am not sure which is best in practice.
No, this is not an assignment question. I was reading How do malloc() and free() work? and this looked like a good follow-up question to ask.

In a generic heap where allocations of different sizes are mixed, of the two I'd go for putting the allocation in the smallest block that can accommodate it (to avoid reducing the size of the largest block we can allocate before we need to).
There are, however, other ways of implementing a heap that make this question less relevant (such as the popular dlmalloc by Doug Lea, which pools blocks of similar sizes to improve speed and reduce overall fragmentation).
Which solution is best always comes down to how the application performs its memory allocations. If you know an application's allocation pattern in advance, you should be able to beat the generic heaps in both size and speed.
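As a rough illustration of that last point (entirely my own sketch, not taken from dlmalloc or any particular allocator): if you know the application only ever asks for one chunk size, a trivial fixed-size pool beats any placement heuristic, because there is no placement decision left to make.

```cpp
#include <cstddef>
#include <vector>

// Toy fixed-size pool: one chunk size, so there is no placement heuristic at
// all and no external fragmentation between chunks. Free chunks are threaded
// through an intrusive singly linked free list. Assumes chunk_size is at
// least sizeof(void*) and a multiple of alignof(void*).
class FixedPool {
public:
    FixedPool(std::size_t chunk_size, std::size_t chunk_count)
        : storage_(chunk_size * chunk_count), chunk_size_(chunk_size) {
        for (std::size_t i = 0; i < chunk_count; ++i)   // link every chunk
            push(storage_.data() + i * chunk_size_);
    }

    void* allocate() {                     // O(1): pop the free-list head
        if (!free_list_) return nullptr;   // pool exhausted
        Node* n = free_list_;
        free_list_ = n->next;
        return n;
    }

    void deallocate(void* p) { push(static_cast<char*>(p)); }   // O(1)

private:
    struct Node { Node* next; };

    void push(char* raw) {
        Node* n = reinterpret_cast<Node*>(raw);
        n->next = free_list_;
        free_list_ = n;
    }

    std::vector<char> storage_;
    std::size_t chunk_size_;
    Node* free_list_ = nullptr;
};
```

A pool like this trades flexibility for speed: it only serves one size class, which is essentially the trade that size-class pooling inside general-purpose allocators makes internally.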

It's better to select the smallest location. Think about future malloc requests. You don't know what they'll be, and you want to satisfy as many requests as you can. So it's better to find a location that exactly fits your needs, so that bigger requests can be satisfied in the future. In other words, selecting the smallest location reduces fragmentation.

The heuristics you listed are used in the Best Fit and Worst Fit algorithms, respectively. There is also the First Fit algorithm which simply takes the first space it finds that is large enough. It is approximately as good as Best Fit, and much faster.
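For concreteness, here is a hedged sketch of how the three policies choose a block; the free list is modelled as a plain vector of region sizes, and splitting, block headers, and coalescing are all omitted.

```cpp
#include <cstddef>
#include <vector>

// Toy free list: each entry is the size of one free region. These helpers
// only show how each policy *chooses* a block; a real allocator would also
// split the chosen block and keep per-block metadata.
using FreeList = std::vector<std::size_t>;
constexpr std::size_t kNone = static_cast<std::size_t>(-1);

std::size_t first_fit(const FreeList& free, std::size_t x) {
    for (std::size_t i = 0; i < free.size(); ++i)
        if (free[i] >= x) return i;          // stop at the first block that fits
    return kNone;
}

std::size_t best_fit(const FreeList& free, std::size_t x) {
    std::size_t best = kNone;
    for (std::size_t i = 0; i < free.size(); ++i)
        if (free[i] >= x && (best == kNone || free[i] < free[best]))
            best = i;                        // smallest block that still fits
    return best;
}

std::size_t worst_fit(const FreeList& free, std::size_t x) {
    std::size_t worst = kNone;
    for (std::size_t i = 0; i < free.size(); ++i)
        if (free[i] >= x && (worst == kNone || free[i] > free[worst]))
            worst = i;                       // largest block that fits
    return worst;
}
```

First fit's speed advantage is visible here: it returns as soon as a candidate appears, while best fit and worst fit must examine every free block unless the free list is indexed by size.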

Related

Algorithms for memory allocation that produce low fragmentation

I've read Modern Operating Systems, 4th Edition, by Andrew Tanenbaum, which presents some ways to handle memory management (with bitmaps, with linked lists) and some of the algorithms that can be used to allocate memory (first fit, next fit, best fit, worst fit, quick fit). The algorithms differ, but no single one is the best.
I'm trying to write my own memory allocator that prioritizes keeping external fragmentation (blocks of memory too small to be used) as low as possible, and then the speed of allocation/deallocation (low fragmentation first, speed second). I implemented worst fit, thinking it would produce as little external fragmentation as possible, because it always chooses the biggest contiguous space of memory when allocating, so the remainder of that space is still large enough to be used later for another allocation. I implemented it using a descending sorted list for free spaces and a set of allocated spaces sorted by address. The complexity for allocation is O(1) plus the cost of keeping the list sorted, and for deallocation it is O(log n1) to find the address plus O(n2) to walk the list of free spaces and insert the block found (n1 = elements of the set, n2 = elements of the list).
I have multiple questions. First, how can I improve the algorithm? Second, what other memory allocation algorithms exist that prioritize low fragmentation? Third, are there improved versions of the algorithms I listed that prioritize low fragmentation? I want to know as many algorithms, or ways of improving the algorithms I already know, as possible that will reduce external fragmentation.
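For reference, a minimal sketch of the worst-fit scheme described above, with my own simplification of tracking only sizes and start offsets: keeping the free blocks in a size-ordered multimap makes the worst-fit pick the first element and makes re-inserting the split remainder O(log n) rather than a linear scan.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <optional>

// Free blocks keyed by size, largest first; the mapped value is the block's
// start offset. The multimap keeps the descending order for us.
using FreeBySize = std::multimap<std::size_t, std::size_t, std::greater<>>;

std::optional<std::size_t> worst_fit_alloc(FreeBySize& free, std::size_t x) {
    if (free.empty() || free.begin()->first < x)
        return std::nullopt;                    // even the largest block is too small
    auto [size, addr] = *free.begin();          // worst fit = the largest block
    free.erase(free.begin());
    if (size > x)                               // split: keep the tail as a new free block
        free.emplace(size - x, addr + x);
    return addr;                                // allocation starts at the block's base
}
```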

Memory Management - WorstFit vs. BestFit algorithms

I understand the idea behind both BestFit and WorstFit memory schedulers.
I am wondering which approach yields the lowest time in the job queue.
Since WorstFit slows the rate at which small holes in memory are made, does that mean that it will result in a lower average job queue wait time?
I have discovered the answer. For future viewers: Worst Fit maintains, on average, a lower job queue time. This is a direct result of how Worst Fit behaves.
With a minimal memory compaction approach (only compacting empty frames that are adjacent both in memory and in the linked list), Worst Fit postpones creating slivers of empty memory.
However, with a more complete memory compaction algorithm (compacting adjacent frames in memory irrespective of their location in the linked list), Worst Fit and Best Fit operate almost identically. While they choose their frames differently, the OS works harder in either case to compact memory and create empty spaces to allocate to incoming processes.
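The fuller compaction described here amounts to coalescing free regions that are adjacent in memory, regardless of how they happen to be ordered in the allocator's bookkeeping. A minimal sketch, using an address-ordered map of start offset to size (my own structure, not any specific OS's):

```cpp
#include <cstddef>
#include <iterator>
#include <map>

// Free regions keyed by start offset; keeping them address-ordered makes it
// easy to find a neighbour that ends exactly where a new free region begins
// (and vice versa) and merge them.
using FreeByAddr = std::map<std::size_t, std::size_t>;  // start -> size

void free_and_coalesce(FreeByAddr& free, std::size_t start, std::size_t size) {
    auto it = free.emplace(start, size).first;
    // Merge with the following region if this one runs right up to it.
    auto next = std::next(it);
    if (next != free.end() && it->first + it->second == next->first) {
        it->second += next->second;
        free.erase(next);
    }
    // Merge with the preceding region if it runs right up to this one.
    if (it != free.begin()) {
        auto prev = std::prev(it);
        if (prev->first + prev->second == it->first) {
            prev->second += it->second;
            free.erase(it);
        }
    }
}
```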

Hashtable for a system with memory constraints

I have read about the variants of hashtables, but it is not clear to me which one is more appropriate for a system that is low on memory (we have a memory constraint).
Linear/Quadratic probing works well for sparse tables.
I think Double hashing is the same as Quadratic in this aspect.
External chaining does not have issues with clustering.
Most textbooks I have checked seem to assume that extra space will always be available, but in practice most example implementations I have seen take up much more space than is really needed, since the hashtable is never halved.
So which variant of a hashtable is most efficient when we want to make the best use of memory?
Update:
So my question is not only about the size of the buckets. My understanding is that both the size of the buckets and the performance under load matter, because if the buckets are small but the table degrades at 50% load, then we need to resize to a larger table often.
See this variant of Cuckoo Hashing.
It will require more hash functions from you, but it makes sense: you have to pay something for the memory savings.
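The linked variant isn't reproduced here, so as an illustration of the underlying idea only, this is a minimal sketch of plain cuckoo hashing with two hash functions and one table; the class name, the kick limit, and the second hash function are placeholders of mine. Bucketized variants pack several slots per bucket to reach higher load factors, but the displacement idea is the same.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Minimal cuckoo hash set: each key has exactly two candidate slots, so a
// lookup probes at most two places. Inserting into an occupied slot evicts
// the resident key, which is then pushed to *its* other slot, and so on.
class CuckooSet {
public:
    explicit CuckooSet(std::size_t capacity) : slots_(capacity) {}

    bool contains(std::uint64_t key) const {
        return occupied_by(h1(key), key) || occupied_by(h2(key), key);
    }

    // Returns false when the displacement chain gets too long; a real table
    // would grow and/or pick new hash functions and rehash at that point.
    bool insert(std::uint64_t key) {
        if (contains(key)) return true;
        std::size_t idx = h1(key);
        for (int kicks = 0; kicks < kMaxKicks; ++kicks) {
            if (!slots_[idx]) { slots_[idx] = key; return true; }
            std::swap(key, *slots_[idx]);                  // evict the resident key
            idx = (idx == h1(key)) ? h2(key) : h1(key);    // send it to its other slot
        }
        return false;
    }

private:
    static constexpr int kMaxKicks = 32;   // arbitrary displacement limit

    bool occupied_by(std::size_t idx, std::uint64_t key) const {
        return slots_[idx] && *slots_[idx] == key;
    }
    // Two cheap, not truly independent hash functions; fine for a sketch only.
    std::size_t h1(std::uint64_t k) const {
        return std::hash<std::uint64_t>{}(k) % slots_.size();
    }
    std::size_t h2(std::uint64_t k) const {
        return std::hash<std::uint64_t>{}(k ^ 0x9e3779b97f4a7c15ULL) % slots_.size();
    }

    std::vector<std::optional<std::uint64_t>> slots_;
};
```

A lookup touches at most two slots, so the table stays fast even at high load factors, which is where the memory saving comes from; the price is the occasional chain of displacements, plus a rehash or grow when an insert gives up.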

Efficient way to grow a large std::vector

I'm filling a large (~100 to 1000MiB) std::vector<char> with data using std::ifstream::read(). I know the size of this read in advance, so I can construct the vector to that size.
Afterwards however, I keep reading from the file until I find a particular delimiter. Any data up to that delimiter is to be added to the vector. It's 500KiB at worst, usually much less.
Considering the vector's size, I'm wondering if this causes an expensive growth (and reallocation). Memory is an issue here, as the vector's size is to remain fairly close to that of its construction.
Is it a good solution to extend the vector's capacity slightly beyond its initial size using std::vector::reserve, so that the small amount of extra data doesn't require it to grow? If so, it's probably best to construct the vector empty, reserve the capacity and then resize it to its initial size, right?
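Something along these lines should work; it is only a sketch of the construct-empty / reserve / resize idea from the question, with a made-up slack constant and a placeholder delimiter:

```cpp
#include <cstddef>
#include <fstream>
#include <vector>

// Sketch of the construct-empty / reserve / resize approach. kSlack is a
// guess at the worst-case extra data (~500 KiB per the question); the
// delimiter '\0' is a placeholder for the real one.
std::vector<char> read_with_slack(std::ifstream& in, std::size_t known_size) {
    constexpr std::size_t kSlack = 512 * 1024;

    std::vector<char> buf;                      // start empty to control capacity
    buf.reserve(known_size + kSlack);           // one allocation covers the extra data
    buf.resize(known_size);                     // give read() real elements to fill
    in.read(buf.data(), static_cast<std::streamsize>(known_size));

    // Appending at most kSlack bytes now stays within the reserved capacity,
    // so the vector never reallocates (and never copies the huge payload).
    char c;
    while (in.get(c) && c != '\0')
        buf.push_back(c);
    return buf;
}
```

Reserving slightly more than the known size means the later push_back calls stay within capacity, so the ~100 to 1000 MiB payload is never reallocated or copied; the cost is carrying the unused slack capacity for the vector's lifetime.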

Why would anyone use best fit memory allocation?

I'm reading Modern Operating Systems by Andrew Tanenbaum, and he writes that best fit is a widely used memory allocation algorithm.
He also writes that it's slower than first fit/next fit since it has to search the entire memory list, and that it tends to waste more memory since it leaves behind a lot of small, useless gaps in memory.
Why is it then widely used? Is there some obvious advantage I have overlooked?
First, it is not that widely used (like all sequential fits), except, perhaps, in homework ;). In my opinion, the widely used strategy is segregated fits (which can very closely approximate best fit).
Second, the best fit strategy can be implemented using a tree of free lists of various sizes (a sketch follows the references below).
Third, it is considered one of the best policies with regard to memory fragmentation.
See
Dynamic Storage Allocation: A Survey and Critical Review
The Memory Fragmentation Problem: Solved?
for information about memory management, not Tanenbaum.
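As a rough sketch of that tree-of-free-lists idea (my own simplification, with each tree node standing in for the list of blocks of one size): an ordered map keyed by block size lets best fit jump to the smallest adequate size class with lower_bound in O(log n), instead of scanning the whole free list.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <vector>

// Each tree node holds the free blocks of one size (here, just their start
// offsets). lower_bound(x) jumps straight to the smallest size >= x, which
// is exactly the best-fit choice, without scanning the whole free list.
using SizeTree = std::map<std::size_t, std::vector<std::size_t>>;

std::optional<std::size_t> best_fit_alloc(SizeTree& tree, std::size_t x) {
    auto it = tree.lower_bound(x);          // smallest size class that can hold x
    if (it == tree.end()) return std::nullopt;
    std::size_t addr = it->second.back();
    it->second.pop_back();
    if (it->second.empty()) tree.erase(it); // drop the now-empty size class
    return addr;                            // splitting off the remainder is omitted here
}
```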
I think it's a mischaracterisation to say that it wastes more memory than first fit. Best fit maximizes available space compared to first fit, particularly when it comes to conserving space available for large allocations. This blog post gives a good example.
Space efficiency and versatility are really the answer. Large blocks can fit unknown future needs better than small blocks, so a best-fit algorithm tries to use the smallest blocks first.
First-fit and next-fit algorithms (which can also cut up blocks) may end up using pieces of the larger blocks first, which increases the risk that a large malloc() will fail. This is essentially the harm external fragmentation does to large allocations.
A best-fit algorithm will often find fits that are only a few bytes larger than the request, leading to fragmentation of only a few bytes, while also saving the large blocks for when they're needed. Also, leaving the large blocks untouched as long as possible helps cache locality and minimizes the load on the MMU, minimizing costly page faults and saving memory pages for other programs.
A good best-fit algorithm will maintain its speed even when it's managing a large number of small fragments, either by accepting some internal fragmentation (which is hard to reclaim) or by using good lookup tables and search trees.
First-fit and next-fit still face their own searching problems. Without good size indexing, they still have to spend time searching through blocks for one that fits. Since their "standards are lower," they may find a fit faster using a straightforward search, but as soon as you add intelligent indexing, the speeds of all the algorithms become much closer.
The one I've been using and tweaking for the last 6 years can find the best-fit block in O(1) time for >90% of all allocs. It uses a handful of strategies to jump straight to the right block, or to start very close, so searching is minimized. It has, on more than one occasion, replaced existing block-pool or first-fit allocators due to its performance and ability to pack allocations more efficiently.
Best fit is not the best allocation strategy, but it is better than first fit and next fit. The reason is that it suffers from fewer fragmentation problems than the other two.
Consider a micro heap of 64 bytes. First we fill it by allocating one 32-byte and two 16-byte blocks, in that order. Then we free all the blocks. There are now three free blocks in the heap: one 32-byte block and two 16-byte ones.
Using first fit, we allocate one 16-byte block. We do it using the 32-byte block (because it is first in the heap!), and the remaining 16 bytes of that block are split off into a new free block. So there is one allocated 16-byte block at the beginning of the heap, followed by three free 16-byte blocks.
What happens if we now want to allocate a 32-byte block? We can't! There are still 48 bytes free in the heap, but fragmentation has screwed us over.
What would have happened if we had used best fit? When searching for a free block for our 16-byte allocation, we would have skipped over the 32-byte block at the beginning of the heap and picked the 16-byte block after it instead. That would have preserved the 32-byte block for larger allocations.
I suggest you draw it on paper; that makes it very easy to see what goes on in the heap during allocation and freeing.
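If you would rather run it than draw it, here is a small sketch (sizes only, no real addresses, splitting kept deliberately naive) that replays the example under both policies:

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// Free blocks after freeing everything: one 32-byte block followed by two
// 16-byte blocks. alloc() picks a block by the given policy and, if the
// block is larger than the request, leaves the remainder as a free block.
bool alloc(std::vector<std::size_t>& free, std::size_t x, bool best_fit) {
    std::size_t pick = free.size();
    for (std::size_t i = 0; i < free.size(); ++i) {
        if (free[i] < x) continue;
        if (!best_fit) { pick = i; break; }                   // first fit: take it
        if (pick == free.size() || free[i] < free[pick]) pick = i;
    }
    if (pick == free.size()) return false;                    // nothing fits
    std::size_t rest = free[pick] - x;
    if (rest) free[pick] = rest; else free.erase(free.begin() + pick);
    return true;
}

int main() {
    for (bool best_fit : {false, true}) {
        std::vector<std::size_t> free{32, 16, 16};
        alloc(free, 16, best_fit);                             // the 16-byte request
        std::cout << (best_fit ? "best fit:  " : "first fit: ")
                  << (alloc(free, 32, best_fit) ? "32-byte alloc succeeds\n"
                                                : "32-byte alloc fails\n");
    }
}
```

It prints that the 32-byte request fails under first fit but succeeds under best fit, matching the walkthrough above.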
