Memory allocation algorithm - algorithm

How can we implement a data structure that allocate & tracks memory with the following constraints
Allocate and Free Memory in O(1)
Min fragmentation.
Lets say you have 1 KB unit of memory.
You need to allocate between 2kB - 64 KB memory
For eg
A- 1
B-1
C-4
D-2
0 1 2 3 4 5 6 7 8 9 10
x A x B C C C C D D x
If we allocate memory at the min address available when we free (shown as x above) memory we will have fragmentation. So in above example even if 3units are free we cant allocate memory of 3 continuous units.

Search for "buddy system" or "buddy memory allocation". That's probably the best solution you will find. Though, it is not purely O(1), it suffers from some internal fragmentation, and external fragmentation can also happen...
You can avoid internal fragmentation completely by using an augmented tree, but then operations take O(log N) time. And you still have external fragmentation.

Related

Memory access time in deep cache hierarchy

I am trying to solve an exercise question from Computer architecture textbook. The book includes the equation for calculating memory access time (MAT) for up to L2 cache (eq. below), however the exercise has upto L4 cache and off chip memory access for which I don't understand how to use the equation to calculate Avg MAT.
So, Average memory access time = Hit time_L1 + Miss rate_L1 x (Hit time_L2 + Miss rate_L2xMiss penalty_L2)
In exercise question it mentioned a cache hierarchy as -- >[32 KB L1; 128 KB L2; 2 MB L3; 8 MB L4; off-chip memory] for which the memory access time needs to be calculated.
Given, cache/latency/Miss per thousand instructions values: 32 KB/1/100, 128 KB/2/80, 512 KB/4/50, 2 MB/8/40, 8 MB/16/10. And off chip memory access requires 200 cycles on average. Also, 1000 instructions of a program, an average of 20 memory accesses may exhibit low enough locality and can't be serviced bty 2MB cache which has 20 Miss per thousand instruction.
Could anyone help me to solve the problem?

times of the Fortran matmul function with different multiplication sizes

I have calculated the time spent by the Fortran's MATMUL function with different multiplication sizes (32 × 32, 64 × 64, ...) and I have questions about the results.
These are the results:
SIZE ----- TIME IN SECONDS
32 ----- 0,000071
64 ----- 0,000032
128 ----- 0,001889
256 ----- 0,010866
512 ----- 0,043
1024 ----- 0,336
2048 ----- 2,878
4096 ----- 51,932
8192 ----- 405,921856
I guess the times should increase by a factor of 8 (m * 2 * n * 2 * k * 2). I do not know if it should be like that. If so, who can tell why it is not like that?
In addition, we see an increase of a factor of 18 with multiplications of 2048 a
4096. Could someone tell me why?
I have measured the times with CALL CPU_TIME() from Fortran and with CALL DATE_AND_TIME() from Fortran and both give very similar results.
My processor is an AMD Phenom (tm) II X4 945 Processor with 4 cores
#Steve is correct, there are many factors that affect performance especially when data sizes are small. Thats why all of your results at and below 2048 are pretty much semi-random and essentially irrelevant. All or most of the data is likely in several layers of CPU cache. So flushing CPU threads and other hardware related events are making these results very skewed. If you run these tests again you will find different results at these small sizes.
So, when you go from 2048 to 4096 you get a major jump. All the data no longer fits into the CPU caches. The computer needs to load blocks of data from RAM into the CPU caches. This explains the large jump in time.
It is at these sizes and larger that the computer has to do more typical operations (load data, perform operations, save data to RAM) and this is the performance you will get as data gets even larger. This is also where performance becomes very consistent as data grows larger. Notice that going from 4096 to 8192 is very close to exactly 8 times longer. At this point, going to 16384 will take almost exactly 8 times 406 seconds.
Any size smaller than 4096 is not giving your computer enough work to accurately measure the performance.
There should be a factor 8 between each timing, and deviations are generally due to memory management like cache alignment and cache- vs array-size. For small arrays there might be a calling overhead to matmul(). A triple do-loop can be faster, at least with some optimization (try -O3 -march=native), and should work equally well for small sizes.

Optimum memory consumption program

Write a program to find the processes which utilize the memory optimally, given the list of the processes with their memory usage and the total memory available.
Example:-
Total memory :- 10
First column denotes process id's and 2nd column is the memory consumption of respective process.
1 2
2 3
3 4
4 5
Answer should be processes {1,2,4} with memory consumption {2,3,5} as 2+3+5=10
This question is Knapsack problem
I believe you can find many sample code on Google

Implementing Stack and Queue with O(1/B)

This is an exercise from this text book (page 77):
Exercise 48 (External memory stacks and queues). Design a stack data structure that needs O(1/B) I/Os per operation in the I/O model
from Section 2.2. It suffices to keep two blocks in internal memory.
What can happen in a naive implementation with only one block in
memory? Adapt your data structure to implement FIFOs, again using two
blocks of internal buffer memory. Implement deques using four buffer
blocks.
I don't want the code. Can anyone explain me what the question needs, and how can i do operations in O(1/B)?
As the book goes, quoting Section 2.2 on page 27:
External Memory: <...> There are special I/O operations that transfer B consecutive words between slow and fast memory. For
example, the external memory could be a hard disk, M would then be the
main memory size and B would be a block size that is a good compromise
between low latency and high bandwidth. On current technology, M = 1
GByte and B = 1 MByte are realistic values. One I/O step would then be
around 10ms which is 107 clock cycles of a 1GHz machine. With another
setting of the parameters M and B, we could model the smaller access
time difference between a hardware cache and main memory.
So, doing things in O(1/B) most likely means, in other words, using a constant number of these I/O operations for each B stack/queue operations.

managed heap fragmentation

I am trying to understand how heap fragmenation works. What does the following output tell me?
Is this heap overly fragmented?
I have 243010 "free objects" with a total of 53304764 bytes. Are those "free object" spaces in the heap that once contained object but that are now garabage collected?
How can I force a fragmented heap to clean up?
!dumpheap -type Free -stat
total 243233 objects
Statistics:
MT Count TotalSize Class Name
0017d8b0 243010 53304764 Free
It depends on how your heap is organized. You should have a look at how much memory in Gen 0,1,2 is allocated and how much free memory you have there compared to the total used memory.
If you have 500 MB managed heap used but and 50 MB is free then you are doing pretty well. If you do memory intensive operations like creating many WPF controls and releasing them you need a lot more memory for a short time but .NET does not give the memory back to the OS once you allocated it. The GC tries to recognize allocation patterns and tends to keep your memory footprint high although your current heap size is way too big until your machine is running low on physical memory.
I found it much easier to use psscor2 for .NET 3.5 which has some cool commands like ListNearObj where you can find out which objects are around your memory holes (pinned objects?). With the commands from psscor2 you have much better chances to find out what is really going on in your heaps. Most commands are also available in SOS.dll in .NET 4 as well.
To answer your original question: Yes free objects are gaps on the managed heap which can simply be the free memory block after your last allocated object on a GC segement. Or if you do !DumpHeap with the start address of a GC segment you see the objects allocated in that managed heap segment along with your free objects which are GC collected objects.
This memory holes do normally happen in Gen2. The object addresses before and after the free object do tell you what potentially pinned objects are around your hole. From this you should be able to determine your allocation history and optimize it if you need to.
You can find the addresses of the GC Heaps with
0:021> !EEHeap -gc
Number of GC Heaps: 1
generation 0 starts at 0x101da9cc
generation 1 starts at 0x10061000
generation 2 starts at 0x02aa1000
ephemeral segment allocation context: none
segment begin allocated size
02aa0000 02aa1000** 03836a30 0xd95a30(14244400)
10060000 10061000** 103b8ff4 0x357ff4(3506164)
Large object heap starts at 0x03aa1000
segment begin allocated size
03aa0000 03aa1000 03b096f8 0x686f8(427768)
Total Size: Size: 0x115611c (18178332) bytes.
------------------------------
GC Heap Size: Size: 0x115611c (18178332) bytes.
There you see that you have heaps at 02aa1000 and 10061000.
With !DumpHeap 02aa1000 03836a30 you can dump the GC Heap segment.
!DumpHeap 02aa1000 03836a30
Address MT Size
...
037b7b88 5b408350 56
037b7bc0 60876d60 32
037b7be0 5b40838c 20
037b7bf4 5b408350 56
037b7c2c 5b408728 20
037b7c40 5fe4506c 16
037b7c50 60876d60 32
037b7c70 5b408728 20
037b7c84 5fe4506c 16
037b7c94 00135de8 519112 Free
0383685c 5b408728 20
03836870 5fe4506c 16
03836880 608c55b4 96
....
There you find your free memory blocks which was an object which was already GCed. You can dump the surrounding objects (the output is sorted address wise) to find out if they are pinned or have other unusual properties.
You have 50MB of RAM as Free space. This is not good.
Having .NET allocating blocks of 16MB from process, we have a fragmentation issue indeed.
There are plenty of reasons to fragmentation to occure in .NET.
Have a look here and here.
In your case it is possibly a pinning. As 53304764 / 243010 makes 219.35 bytes per object - much lower then LOH objects.

Resources