slab classes and memory allocation in memcached

I recently started going through the memcached source code and I came across this structure. Based on my understanding, there are approximately 64 slab classes, and each slab class represents a unique chunk size. If we take the first slab class (size 80, say), then the pages that belong to this slab class will have their memory broken into 80-byte chunks.
typedef struct {
    unsigned int size;      // sizes of items
    unsigned int perslab;   // how many items per slab
    void *slots;            // list of item ptrs
    unsigned int sl_curr;   // total free items in list
    unsigned int slabs;     // how many slabs were allocated for this class
    void **slab_list;       // array of slab pointers
    unsigned int list_size; // size of prev array
    size_t requested;       // The number of requested bytes
} slabclass_t;
I do not understand this line:
unsigned int slabs; // how many slabs were allocated for this class
What does the author mean by "how many slabs were allocated for this class"? Every slab class must be unique, right? Why would there be multiple slabs within one slab class? Am I missing something?

An allocated slab of class slabclass_t is basically a chunk of memory that hosts perslab items of size size. If all the items in that slab are used, memcached allocates another chunk of memory and adds it to the slab_list. These chunks of memory are also referred to as pages or slab pages.
So if you start a new memcached server and store one item for a slab class (say size=80), then for this slab class slabs=1. Once you store perslab+1 items in that class, you will have slabs=2 and the slab_list will contain 2 pointers.
Basically, you have slab_list, slabs is its length, and list_size is its capacity.
I derived most of this from slabs.c, so correct me if I got something wrong.
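To make the growth concrete, here is a simplified sketch in C-style C++ of what "allocate another page and add it to slab_list" might look like. This is not the actual memcached code (slabs.c differs in details); the 1 MB page size, the helper name add_page, and the doubling growth policy are assumptions for illustration.
#include <cstdlib>

#define PAGE_SIZE (1024 * 1024) /* assumed page size, for illustration */

typedef struct {
    unsigned int size;      /* chunk size served by this class, e.g. 80 */
    unsigned int perslab;   /* PAGE_SIZE / size */
    unsigned int slabs;     /* how many pages were allocated so far */
    void **slab_list;       /* pointers to those pages */
    unsigned int list_size; /* capacity of slab_list */
} slabclass_sketch;

/* Grab one more page for this class and append it to slab_list. */
static bool add_page(slabclass_sketch *c) {
    if (c->slabs == c->list_size) { /* slab_list full: grow its capacity */
        unsigned int cap = c->list_size ? c->list_size * 2 : 16;
        void **tmp = (void **)realloc(c->slab_list, cap * sizeof(void *));
        if (tmp == nullptr) return false;
        c->slab_list = tmp;
        c->list_size = cap;
    }
    void *page = malloc((size_t)c->perslab * c->size);
    if (page == nullptr) return false;
    c->slab_list[c->slabs++] = page; /* slabs counts pages actually in use */
    return true;
}
Every page appended here gets carved into perslab chunks of size bytes, which is why one slab class ends up owning many slabs/pages.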

Related

Efficiently select an available number within a range

I need to use and recycle IDs (ints) within a range, say from 1 to 20 million.
What is the most efficient way to do this?
Some things I tried:
Generate a running sequence of numbers from 1 to k and store them in a map. After k numbers, if, say, ID 2 becomes free, we delete it from the map and continue handing out IDs from k+1. (It would be better if I could reuse the freed ID (2) instead of k+1; how can I do this?)
Generate random numbers between 1 and 20 million and check with a map lookup whether each one is already used; if it is, pick another random number, or keep trying number+1 until the lookup fails.
Store all numbers from 1 to 20 million in a set, take them out one by one for use, and add them back when freed. (This has a bigger memory footprint and I don't want to do it.)
What is the most efficient way to solve this problem if, say, around 50% of the IDs are in use at any point in time?
A space-efficient solution is to use a bit-mask to keep track of free entries. 20M bits is only 2.5MB.
If about half of them will be free, then when you need to allocate a new ID, you can just start at a random spot and walk forward until you find an entry with a free bit.
If you need a guaranteed time bound, then you can use an array of 64-bit words for your bit mask, and a bit mask of 1/64 the size to keep track of which words have free entries. Recurse until you get to one or two words.
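As a sketch of that layered idea in C++ (the names, the GCC/Clang __builtin_ctzll intrinsic, and the decision to scan the summary level linearly rather than recurse further are my assumptions; tail bits past the top of the range are not masked off here):
#include <cstdint>
#include <vector>

struct TwoLevelBitmap {
    std::vector<uint64_t> words;   // one bit per ID; 1 = in use
    std::vector<uint64_t> summary; // one bit per word; 1 = word has a free slot

    explicit TwoLevelBitmap(size_t n)
        : words(n / 64 + 1, 0), summary((n / 64 + 1) / 64 + 1, 0) {
        for (size_t w = 0; w < words.size(); ++w)
            summary[w / 64] |= 1ULL << (w % 64); // every word starts free
    }

    long allocate() { // returns -1 when exhausted
        for (size_t s = 0; s < summary.size(); ++s) {
            if (summary[s] == 0) continue;                   // no free words here
            size_t w = s * 64 + __builtin_ctzll(summary[s]); // gcc/clang builtin
            size_t b = __builtin_ctzll(~words[w]);           // first free bit
            words[w] |= 1ULL << b;
            if (~words[w] == 0)                              // word is now full
                summary[s] &= ~(1ULL << (w % 64));
            return (long)(w * 64 + b);
        }
        return -1;
    }

    void release(size_t id) {
        words[id / 64] &= ~(1ULL << (id % 64));
        summary[id / 64 / 64] |= 1ULL << ((id / 64) % 64); // word has a free slot again
    }
};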
If space isn't a problem, then the simplest fast way is to keep free IDs in a free list. That requires an array of up to 20M integers. You remember the last entry freed, and for every free node x, array[x] is the index of the preceding freed node, or -1.
If your IDs actually point to something, then often you can use the very same array for the free list and the pointers, so the free list takes no extra memory at all.
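For illustration, a minimal C++ sketch of that intrusive free list (names are mine; the payload here is just an int, so the same slot holds either your value or the link to the previously freed ID):
#include <vector>

struct IdPool {
    std::vector<int> arr; // for a free id x, arr[x] = previously freed id
    int head;             // most recently freed id, or -1

    explicit IdPool(int n) : arr(n), head(-1) {
        for (int i = n - 1; i >= 0; --i) { arr[i] = head; head = i; } // all free
    }

    int allocate() {                   // O(1); returns -1 when exhausted
        if (head == -1) return -1;
        int id = head;
        head = arr[id];                // follow the link to the next free ID
        return id;                     // caller may now store data in arr[id]
    }

    void release(int id) {             // O(1)
        arr[id] = head;
        head = id;
    }
};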
20M integers is about 80 MB of RAM. If we are talking about Java, then according to this article, a HashSet<Integer> can take up to 22 times more space, so that's about 1.7 GB, wow.
You can implement your own bitset that supports fast selection of the next free ID. The bitset should take only about 2.4 MB of RAM and we can find the next free ID in O(1). I haven't checked the code, it's mostly an idea:
int range = 20_000_000;
long[] bitset = new long[range / 64 + 1]; // About 2.4 MB of RAM, array length is 312501
Stack<Integer> hasFreeIds = new Stack<Integer>(); // Slots in bitset with free IDs
for (int i = 0; i < bitset.length; ++i) { // All slots have free IDs in the beginning
    hasFreeIds.push(i);
}
// Now `hasFreeIds` is about (8 + 4) * 312_000 bytes = ~4 MB of RAM
// Our structure should be ~6.4 MB of RAM in total
// Complexity is O(1), so should be fast
int getNextFreeId() {
    // Select the first slot with free IDs
    int freeSlotPos = hasFreeIds.pop();
    long slot = bitset[freeSlotPos];
    // Find the first free ID
    long lowestZeroBit = Long.lowestOneBit(~slot);
    int lowestZeroBitPosition = Long.numberOfTrailingZeros(lowestZeroBit);
    int freeId = 64 * freeSlotPos + lowestZeroBitPosition;
    // Update the slot, flip the bit to mark it as used
    slot |= lowestZeroBit;
    bitset[freeSlotPos] = slot;
    // If the slot still has free IDs, then push it back to our stack
    if (~slot != 0) {
        hasFreeIds.push(freeSlotPos);
    }
    return freeId;
}
// Complexity is also O(1)
void returnId(int id) {
    // Find the slot that contains this id
    long slot = bitset[id / 64];
    boolean slotIsFull = (~slot == 0L); // True if the slot does not have free IDs
    // Flip the bit in the slot to mark it as free
    // (note the long literal: 1 << bitPosition would overflow int for positions >= 31)
    int bitPosition = id % 64;
    slot &= ~(1L << bitPosition);
    bitset[id / 64] = slot;
    // If this slot was full before, we need to push it to the stack
    if (slotIsFull) {
        hasFreeIds.push(id / 64);
    }
}
Theoretically speaking, the fastest would be storing all free IDs in a linked list.
That is, push 20M sequential numbers into a linked list. To allocate an ID, pop it from the front. And when an ID is freed, push it at either the front or the back depending on your preferred strategy (i.e. would you reuse freed IDs first, or only after each preallocated one has been used).
This way both allocating an ID and freeing it are O(1).
Now, as an optimization, you don't really need to preallocate all your IDs. You only need to store the highest ID allocated so far. When you need to allocate an ID and the list of free IDs is empty, just increment the highest-ID variable and return it.
This way your list never grows large unless that many IDs were really allocated and returned.
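A minimal C++ sketch of that optimization (a std::vector used as a LIFO stack stands in for the linked list; the names and the 1-based ID scheme are illustrative):
#include <vector>

struct LazyIdPool {
    std::vector<int> freed; // IDs returned so far; reused LIFO
    int highest = 0;        // highest ID ever handed out
    int limit;              // e.g. 20 million

    explicit LazyIdPool(int limit) : limit(limit) {}

    int allocate() { // O(1)
        if (!freed.empty()) {
            int id = freed.back();
            freed.pop_back();
            return id;
        }
        return highest < limit ? ++highest : -1; // lazily extend the range
    }

    void release(int id) { freed.push_back(id); } // O(1)
};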

How to design a data structure to store unlimited (you could say infinite) data that is always in increasing order

I have to design a data structure that supports billions of insertions of data that always arrive in increasing order, while keeping the complexity of insertion, deletion, and search operations to a minimum.
Suppose the data is defined as
typedef struct data
{
    int a;
    int b;
} data;
where b is always increasing.
I have implemented the DS as a dynamic list of blocks, each holding 1000 data elements, with a next-block pointer. E.g.:
typedef struct block
{
    int size;            // curr element count.
    int capacity;        // 1000 elements.
    data* pArr;          // malloc 1000 data elements.
    struct block* pNext;
} block;
My data structure is good at searching and insertion, but after deleting data it is not reusable, and rebuilding it is costly.
I am looking for a better DS implementation.

C++ :: two-dimensional matrix, dynamic in one dimension, using unique_ptr?

I have a large genetic dataset (X, Y coordinates), of which I can easily know one dimension (X) during runtime.
I drafted the following code for a matrix class which allows specifying the size of one dimension while leaving the other one dynamic by using std::vector. Each vector is new'd and owned by a unique_ptr, and the vectors sit in a C-style array, also allocated with new and owned by a unique_ptr.
class Matrix
{
private:
    typedef std::vector<Genotype> GenVec;
    typedef std::unique_ptr<GenVec> upGenVec;
    std::unique_ptr<upGenVec[]> m;
    unsigned long size_;
public:
    // ...
    // construct
    Matrix(unsigned long _size): m(new upGenVec[_size]), size_(_size)
    {
        for (unsigned long i = 0; i < this->size_; ++i)
            this->m[i] = upGenVec(new GenVec);
    }
};
My question:
Does it make sense to use this instead of std::vector< std::vector<Genotype> > ?
My reasoning behind this implementation is that I only require one dimension to be dynamic, while the other should be fixed. Using std::vectors could imply more memory allocation than needed. As I am working with data that would fill up an estimated ~50GB of RAM, I would like to control memory allocation as much as I can.
Or, are there better solutions?
I won't cite any paragraphs from the specification, but I'm pretty sure that std::vector's memory overhead is fixed, i.e. it doesn't depend on the number of elements it contains. So I'd say your solution with a C-style array is actually worse memory-wise, because what you allocate, excluding the actual data, is:
N * pointer_size (first dimension array)
N * vector_fixed_size (second dimension vectors)
In vector<vector<...>> solution what you allocate is:
1 * vector_fixed_size (first dimension vector)
N * vector_fixed_size (second dimension vectors)
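You can check the fixed-overhead claim directly: sizeof a vector does not depend on its element count (the exact value is implementation-defined, commonly 24 bytes on 64-bit implementations for begin/end/capacity pointers). A quick sanity check, not a guarantee:
#include <iostream>
#include <vector>

int main() {
    std::vector<int> empty;
    std::vector<int> million(1000000);
    // Both print the same value: a vector's header does not grow with its
    // contents; only the separately allocated buffer does.
    std::cout << sizeof(empty) << " " << sizeof(million) << "\n";
}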

Memory allocation and structuring

How can we use a contiguous block of memory in such a way that one part of it links to the remaining part? For example, if I allocate a contiguous block of bytes using malloc, I now want to structure it so that the initial part of the block holds pointers that point into the remaining part. That means the pointers and the pointed-to objects would be contiguous...??
The question doesn't make much sense to me. Let's assume you wanted nItems of size sizeBytes (meaning they are all the same size); then you would not need to store the pointers at all, because you can compute the offset into the allocated memory whenever you need it. So you are probably missing some criteria in your question. Here is how you'd do that:
void *block = malloc(nItems * sizeBytes);
Then to reach the n-th object, you'd simply do:
void *myMemory = (char *)block + n * sizeBytes;
(The cast matters: pointer arithmetic on a void * is not valid standard C, so do the math on a char *.)
You'd want to possibly do some bounds checking there...
But that's too easy, so I'm guessing you really have structures of different sizes that you want to allocate in a single malloc and get access to. So it's not just a question of figuring out what the address of a "sub block of memory" is, but you'd want to know how to cast it so that you can later make sense of the object (assuming it's a C structure). So I guess I'd have to say I remain confused by the question overall.
You'll probably want / need something like the pointer, the size, and the type of structure each "sub block" of memory is supposed to be. This would then indicate what your header information should probably look like. Roughly, you'd want to compute the storage required for your 'meta data' and then the 'payload data', and malloc those things together.
But it's not a trivial thing to implement, because you have to figure out how you tell your function that allocates / initializes the memory block what the mix of objects is going to be (and the sequence of the layout of each sub-object).
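For the simple case of n equally sized objects, a sketch of that "metadata up front, payload after" layout might look like the following (illustrative names; it also assumes the objects need no alignment stricter than a pointer):
#include <cstdlib>

/* One malloc holds nItems pointers up front, followed by the nItems objects
   they point to, so the pointers and the pointed-to storage stay contiguous. */
void **alloc_with_header(size_t nItems, size_t sizeBytes) {
    size_t headerBytes = nItems * sizeof(void *);
    char *block = (char *)malloc(headerBytes + nItems * sizeBytes);
    if (block == nullptr) return nullptr;
    void **header = (void **)block;       /* pointer table at the front */
    char *payload = block + headerBytes;  /* objects follow the table */
    for (size_t i = 0; i < nItems; ++i)
        header[i] = payload + i * sizeBytes;
    return header; /* a single free(header) releases everything */
}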
I'm afraid this question is woefully underspecified.
If you want a 2D array of one type of object, you can do it like this:
int entries = xSize * ySize; // create a 2D array of xSize by ySize dimensions
size_t buffSize = entries * objectSize; // objectSize is number of bytes for your object
void *block = malloc(buffSize);
Now to access any entry in your 2D array:
void *thingie = (char *)block + (y * xSize + x) * objectSize;
Now thingie points to the entry of block that corresponds to (x, y); note that the index has to be scaled by objectSize. If you wanted to, you can also change the layout of your memory object. Above I did row-major. You could do:
void *thing = (char *)block + (x * ySize + y) * objectSize;
That would be column major. The above can extend to n-dimensions:
int entries = xSize * ySize * zSize; // create a 3D array of xSize, ySize, zSize dimensions
size_t buffSize = entries * objectSize; // objectSize is number of bytes for your object
void *block = malloc(buffSize);
And then:
void *thingie = (char *)block + (z * ySize * xSize + y * xSize + x) * objectSize;
to get to your record in the 3D cube. You can extend this to any number of dimensions; of course, you'll blow up your memory sooner rather than later if you're dealing with large objects in high-dimensional spaces.
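If the element type is known at compile time, you can let the compiler do the objectSize scaling for you by indexing a typed pointer; a small illustrative variant:
#include <cstdlib>

struct Record { int payload; };

int main() {
    size_t xSize = 10, ySize = 20, zSize = 30;
    Record *block = (Record *)malloc(xSize * ySize * zSize * sizeof(Record));
    size_t x = 1, y = 2, z = 3;
    // Indexing a typed pointer scales by sizeof(Record) automatically.
    block[z * ySize * xSize + y * xSize + x].payload = 42;
    free(block);
    return 0;
}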

Using a LinkedList or ArrayList for iteration

If I am adding an unknown number of elements to a List, and that list is only going to be iterated through, would a LinkedList be better than an ArrayList in this particular instance? (Using Java, if that has any relevance.)
The performance trade-offs between ArrayList and LinkedList have been discussed before, but in short: ArrayList tends to be faster for most real-life usage scenarios. ArrayList will cause less memory fragmentation and will play nicer with the Garbage Collector, it will use up less memory and allow for faster iteration, and it will be faster for insertions that occur at the end of the list.
So, as long as the insertions in the list always occur at the last position, there's no reason to pick LinkedList - ArrayList is the clear winner.
Okay, it's been answered already, but I will still try to make my point.
ArrayList is faster at iteration than LinkedList, and the reason is the same: an ArrayList is backed by an array. Let's try to understand why iterating an array is faster than iterating a LinkedList.
There are 2 factors that work in its favor:
An array is stored as contiguous memory locations (you may say: so what?)
The CPU cache is much faster than main memory
But you may ask how the cache fits in here. The CPU tries to take advantage of the cache by keeping data in it, exploiting locality of reference. There are two kinds of locality:
Temporal locality
If at one point a particular memory location is referenced, then it is likely that the same location will be referenced again in the near future. There is a temporal proximity between the adjacent references to the same memory location. In this case it is common to make efforts to store a copy of the referenced data in special memory storage, which can be accessed faster. Temporal locality is a special case of spatial locality, namely when the prospective location is identical to the present location.
Spatial locality
If a particular storage location is referenced at a particular time, then it is likely that nearby memory locations will be referenced in the near future. In this case it is common to attempt to guess the size and shape of the area around the current reference for which it is worthwhile to prepare faster access.
So if one array location is accessed, the adjacent memory locations are loaded into the cache too. But wait, not all of them; how much is loaded depends on the cache line. A cache line defines how many bytes can be loaded into the cache at a time.
So before diving further, let's recap what we know:
An array occupies contiguous memory locations
When one memory location of an array is accessed, adjacent locations are loaded into the cache as well
How many array locations are loaded is determined by the cache line size
So whenever the CPU tries to access a memory location, it checks whether that memory is already in the cache. If it's present, it's a cache hit; otherwise it's a cache miss.
So from what we know, with an array there will be fewer cache misses than with the random memory locations of a linked list. So it makes sense.
And finally, the Array data structure article on Wikipedia says:
In an array with element size k and on a machine with a cache line size of B bytes, iterating through an array of n elements requires the minimum of ceiling(nk/B) cache misses, because its elements occupy contiguous memory locations. This is roughly a factor of B/k better than the number of cache misses needed to access n elements at random memory locations. As a consequence, sequential iteration over an array is noticeably faster in practice than iteration over many other data structures, a property called locality of reference.
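To put numbers on that formula: iterating over an array of n = 1,000,000 Java ints (k = 4 bytes) with B = 64-byte cache lines costs at most ceiling(nk/B) = 62,500 cache misses, versus up to 1,000,000 misses if the same elements were scattered at random addresses; that's the B/k = 16x factor the quote describes. A LinkedList is closer to the scattered case, since each element lives in its own separately allocated node object.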
I guess that answers your question.
For iterating, both have the same O(n) complexity; ArrayList will take less memory, by the way.
public List<Integer> generateArrayList(int n) {
    long start = System.nanoTime();
    List<Integer> result = new ArrayList<>();
    for (int i = 0; i < n; i++) {
        result.add(i);
    }
    System.out.println("generateArrayList time: " + (System.nanoTime() - start));
    return result;
}

public List<Integer> generateLinkedList(int n) {
    long start = System.nanoTime();
    List<Integer> result = new LinkedList<>();
    for (int i = 0; i < n; i++) {
        result.add(i);
    }
    System.out.println("generateLinkedList time: " + (System.nanoTime() - start));
    return result;
}

public void iteratorAndRemove(List<Integer> list) {
    String type = list instanceof ArrayList ? "ArrayList" : "LinkedList";
    long start = System.nanoTime();
    Iterator<Integer> ite = list.iterator();
    while (ite.hasNext()) {
        int getDataToDo = ite.next();
        ite.remove();
    }
    System.out.println("iteratorAndRemove with " + type + " time: " + (System.nanoTime() - start));
}
@org.junit.Test
public void benchMark() {
    final int n = 500_000;
    List<Integer> arr = generateArrayList(n);
    List<Integer> linked = generateLinkedList(n);
    iteratorAndRemove(linked);
    iteratorAndRemove(arr);
}
ArrayList is useful for random access by position; LinkedList is useful for insert and remove operations. The code above shows LinkedList being far faster than ArrayList: in the remove test, LinkedList beats ArrayList by a factor of about 1000, OMG!!! That's because removing through an ArrayList's iterator shifts every element after the removal point, making the whole loop O(n^2), while a LinkedList just unlinks the current node in O(1).
generateArrayList time: 15997000
generateLinkedList time: 15912000
iteratorAndRemove with LinkedList time: 14188500
iteratorAndRemove with ArrayList time: 13558249400
