How many instances of vm_area_struct can there be? - linux-kernel

I got this passage from Robert Love's Linux Kernel Development:
The vm_area_struct structure describes a single memory area over a contiguous
interval in a given address space.
Question: so does this mean there is only one vm_area_struct referring to a single contiguous memory area, or does each user/kernel thread have its own vm_area_struct instance?

A vm_area_struct covers a single contiguous range of virtual memory, and one process may have many vm_area_struct instances: one per mapping (code, data, heap, stack, each mmap()ed region, and so on). Threads of the same process share one mm_struct and therefore share the same set of vm_area_structs.
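For reference, here is a simplified excerpt of the structure (a subset of the fields from <linux/mm_types.h>; the exact layout varies between kernel versions):

struct vm_area_struct {
    struct mm_struct *vm_mm;         /* the address space this VMA belongs to */
    unsigned long vm_start;          /* first address of the area */
    unsigned long vm_end;            /* first address after the area */
    struct vm_area_struct *vm_next;  /* next VMA of this mm (older kernels) */
    pgprot_t vm_page_prot;           /* access permissions */
    unsigned long vm_flags;          /* VM_READ, VM_WRITE, VM_EXEC, ... */
};

You can see the VMAs of a running process in /proc/<pid>/maps: each line there corresponds to one vm_area_struct.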

Related

How is the heap divided up among processes?

I understand that each process has its own separate heap, unlike threads, which share a common heap (which thus slows heap memory allocation down, since functions like malloc need to use locks for synchronization). However, how is it decided where, and how much, memory is given to each process, and how is it ensured that this does not conflict with the memory allocated to other processes?
I have not been able to find a definitive answer on this through searching, but if one exists, please provide a link as I would greatly appreciate it. Thank you!
In order to answer the question, you need to understand virtual memory. With virtual memory, the memory a user process sees is contiguous. The heap is given a very large amount of this virtual memory, limited only by the amount of physical RAM and swap space available to back the allocations; in itself, the process only sees a contiguous virtual address space. On Linux, the physical memory allocations are done using the buddy algorithm, and the kernel keeps a struct page for every physical page frame. The page structs, along with the memory map of each process in its task_struct, allow the Linux kernel to track which pages are free and which are not, so the pages backing one process's heap are never simultaneously handed to another (unless the memory is deliberately shared).
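As a small user-space sketch of this (the exact addresses printed will vary), a forked child and its parent can each call malloc() and may even see the same pointer value, because each sees only its own virtual address space:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    pid_t pid = fork();
    char *p = malloc(100);   /* each process allocates from its own heap */

    /* Both processes may print the same virtual address, yet the two
     * allocations are backed by different physical pages. */
    printf("pid %d: p = %p\n", (int)getpid(), (void *)p);

    if (pid > 0)
        wait(NULL);
    free(p);
    return 0;
}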

linux kernel, struct bio : how pages are read/written

I'm reading LDD3 and messing around with the kernel source code. Currently, I'm trying to fully understand the struct bio and its usage.
What I have read so far:
https://lwn.net/images/pdf/LDD3/ch16.pdf
http://www.makelinux.net/books/lkd2/ch13lev1sec3
https://lwn.net/Articles/26404/
(a part of) https://www.kernel.org/doc/Documentation/block/biodoc.txt
If I understand correctly, a struct bio describes a request for some blocks to be transferred between a block device and system memory. The rules are that a single struct bio can only refer to a contiguous set of disk sectors, but the system memory can be non-contiguous and is represented by a vector of <page, len, offset> tuples, right? That is, a single struct bio requests the reading/writing of bio_sectors(bio) (possibly many) sectors, starting with sector bio->bi_sector. The size of the data transferred is limited by the actual device, the device driver, and/or the host adapter. I can get that limit with queue_max_hw_sectors(request_queue), right? So, if I keep submitting bios that turn out to be contiguous in disk sectors, the I/O scheduler/elevator will merge these bios into a single one, until that limit is reached, right?
Also, bio->bi_size must be a multiple of 512 (or the equivalent sector size) so that bio_sectors(bio) is a whole number, right?
Moreover, these bio_sectors(bio) sectors will be moved to/from system memory, and by memory we mean struct pages. Since there is no explicit mapping between <page, len, offset> tuples and disk sectors, I assume that the bio->bi_io_vec entries are implicitly serviced in order of appearance. That is, the first disk sectors (starting at sector bio->bi_sector) will be written from / read to bio->bi_io_vec[0].bv_page, then bio->bi_io_vec[1].bv_page, etc. Is that right? If so, should bio_vec->bv_len always be a multiple of the sector size, i.e. 512? Since a page is usually 4096 bytes, should bv_offset be exactly one of {0, 512, 1024, 1536, ..., 3584}? I mean, does it make sense, for example, to request 100 bytes to be written on a page starting at offset 200?
Also, what is the meaning of bio.bi_phys_segments, and why does it differ from bio.bi_vcnt? bi_phys_segments is defined as "the number of physical segments contained within this BIO". Isn't a <page, len, offset> triple what we call a 'physical segment'?
Lastly, if a struct bio is so complex and powerful, why do we create lists of struct bio, name them struct request, and queue those requests in the request_queue? Why not have a bio_queue for the block device where each struct bio is stored until it is serviced?
I'm a bit confused, so any answers or pointers to documentation will be more than useful! Thank you in advance :)
what is the meaning of bio.bi_phys_segments?
The generic block layer can merge different segments. When the page frames in memory are contiguous and the corresponding chunks of data are adjacent on the disk, the resulting merge operation creates a larger memory area, which is called a physical segment.
Then what is bi_hw_segments?
Yet another merge operation is allowed on architectures that handle the mapping between bus addresses and physical addresses through dedicated bus circuitry. The memory area resulting from this kind of merge operation is called a hardware segment. On the x86 architecture, which has no such dynamic mapping between bus addresses and physical addresses, hardware segments always coincide with physical segments.
That is, the first disk sectors (starting at sector bio->bi_sector) will be written from / read to bio->bi_io_vec[0].bv_page, then bio->bi_io_vec[1].bv_page, etc.
Is that right? If so, should bio_vec->bv_len always be a multiple of the sector size, i.e. 512? Since a page is usually 4096 bytes, should bv_offset be exactly one of {0, 512, 1024, 1536, ..., 3584}? I mean, does it make sense, for example, to request 100 bytes to be written on a page starting at offset 200?
The bi_io_vec array holds the page frames for the I/O; bv_offset is the offset within the page frame. Before anything is actually read from or written to the disk, everything is mapped to sectors, since the disk deals in sectors. This does not mean the length has to be a multiple of the sector size; such requests simply result in unaligned reads/writes, which are taken care of by the underlying device driver.
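For illustration, this is roughly how a driver walks the vector in order, using bio_for_each_segment() from the 2.6-era kernels that LDD3 covers (newer kernels iterate with a struct bvec_iter instead of an integer index); a sketch, not a complete driver:

#include <linux/bio.h>
#include <linux/highmem.h>

static void walk_bio(struct bio *bio)
{
    struct bio_vec *bvec;
    int i;

    /* visits bi_io_vec[0], bi_io_vec[1], ... in order of appearance */
    bio_for_each_segment(bvec, bio, i) {
        char *buf = (char *)kmap(bvec->bv_page) + bvec->bv_offset;
        /* transfer bvec->bv_len bytes between buf and the device here */
        kunmap(bvec->bv_page);
    }
}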
if a struct bio is so complex and powerful, why do we create lists of struct bio, name them struct request, and queue those requests in the request_queue? Why not have a bio_queue for the block device where each struct bio is stored until it is serviced?
The request queue is a per-device structure and takes care of things like flushing: every block device has its own request queue. The bio structure, on the other hand, is the generic entity for I/O. If you folded the request_queue features into the bio, you would end up with a single global bio_queue, and a very heavy structure at that; not a good idea. So the two structures serve different purposes in the context of an I/O operation.
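To make the division of labour concrete, here is a sketch of how a single bio is built and handed to the block layer, after which merging into requests is the elevator's job. This uses the 2.6-era interface to match the LDD3 material above (the signatures have changed in recent kernels), and the caller is assumed to supply the device, page, sector, and completion callback:

#include <linux/bio.h>

static void my_submit_read(struct block_device *bdev, struct page *page,
                           sector_t start_sector, bio_end_io_t *my_end_io)
{
    struct bio *bio = bio_alloc(GFP_KERNEL, 1);  /* room for one bio_vec */

    bio->bi_bdev = bdev;                    /* target block device */
    bio->bi_sector = start_sector;          /* first disk sector to touch */
    bio_add_page(bio, page, PAGE_SIZE, 0);  /* one <page, len, offset> segment */
    bio->bi_end_io = my_end_io;             /* completion callback */

    submit_bio(READ, bio);  /* the block layer may merge it into a request */
}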
Hope it helps.

Understanding the Buddy Allocator

I have a conceptual doubt about the way the Linux kernel manages free blocks. Here is what I have gathered from my reading so far.
The buddy allocator is an allocation scheme that manages memory in power-of-2 sized blocks.
When we need a block of a size that is not available, it divides a larger block into two. Those two blocks are buddies, which is probably why it is called the buddy allocator.
Through a source I learnt that an array of free_area_t structs is maintained, one for each order, each pointing to a linked list of blocks of pages that are free.
I found this in <linux/mm.h>:
typedef struct free_area_struct {
    struct list_head free_list;
    unsigned long *map;
} free_area_t;
The free_list appears to be a linked list of page blocks. My question is: is it a list of free pages or of used pages?
And map appears to be a bitmap that represents the state of a pair of buddies.
My question is: how can a single bit hold the state of a pair of buddies? If I use one block of a buddy pair for an allocation and the other is left free, what is the state then, and how can that be stored in a single bit? Does the bit represent the entire power-of-two block, which can be divided into two halves when a block of the needed size is not available, so that the allocated half is the buddy of the free half? If so, what is the value of map when one half is allocated and the other is free? What if both are free? What if both are allocated? How can a binary value represent three states of a block?
Edit: after further reading, the first doubt is cleared by this passage: "If a free block cannot be found of the requested order, a higher order block is split into two buddies. One is allocated and the other is placed on the free list for the lower order." So free_list is a linked list of free pages.
map is a bitmap in which each bit represents the state of one pair of buddies: the bit is toggled every time either buddy of the pair is allocated or freed. So the bit does not need to encode three states; it only records whether the two buddies are in the same state (bit 0: both free or both allocated) or in different states (bit 1: exactly one allocated). That is all the free path needs: after toggling the bit on a free, a value of 0 means the buddy is also free and the two halves can be coalesced into a block of the next higher order.
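A toy sketch of that toggle trick (plain user-space C to show the idea, not kernel code; the pair index and printouts are made up for illustration):

#include <stdio.h>

static unsigned long map;   /* bit i covers buddy pair i of some order */

static void toggle(int pair) { map ^= 1UL << pair; }

/* allocating either buddy of a pair just flips the bit */
static void alloc_block(int pair)
{
    toggle(pair);
}

/* freeing flips it back; if the bit ends up 0, the buddy is free too */
static void free_block(int pair)
{
    toggle(pair);
    if (!(map & (1UL << pair)))
        printf("pair %d: buddy also free, coalesce to higher order\n", pair);
    else
        printf("pair %d: buddy still in use, stay on this free list\n", pair);
}

int main(void)
{
    alloc_block(0);   /* one buddy allocated      -> bit = 1 */
    alloc_block(0);   /* the other allocated too  -> bit = 0 */
    free_block(0);    /* bit -> 1: buddy still in use */
    free_block(0);    /* bit -> 0: coalesce */
    return 0;
}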

SLAB memory management

I'm confused as to the structuring of the SLAB memory management mechanism.
I get that there are multiple 'caches' that are specific to commonly used data objects, but why does each cache contain multiple 'slabs'?
What differentiates each slab within a cache? Why not simply have the cache filled with the data objects themselves? Why does there need to be this extra layer?
The slab allocator is an abstraction layer that makes it easier to allocate numerous objects of the same type.
The interface offers the function:

struct kmem_cache *kmem_cache_create(const char *name,
                                     size_t size, size_t align,
                                     unsigned long flags,
                                     void (*ctor)(void *));
This function creates a new slab cache that can handle allocations of size-byte objects. If the creation is successful, you get back a pointer to the related struct kmem_cache. This structure holds information about the slabs it manages.
As implied by the Wikipedia description, the purpose of such an extra layer is to prevent the memory fragmentation issues that would arise if allocation were done in a simple and intuitive manner. To do so, it introduces the notion of a slab through the following data structure:
struct slab {
    struct list_head list;   /* embedded list structure */
    unsigned long colouroff;
    void *s_mem;             /* first object in the slab */
    unsigned int inuse;      /* allocated objects in the slab */
    kmem_bufctl_t free;      /* first free object (if any) */
};
Thus, the kmem_cache object holds three lists of its slabs, in three flavours:
Empty slabs: these slabs contain no in-use objects.
Partial slabs: these slabs contain objects currently in use, but still have room for new objects.
Full slabs: these slabs contain only in-use objects and cannot host new ones.
When an object is requested through the slab allocator, it tries to carve the required memory area out of a partial slab; if it cannot, it takes it from an empty slab.
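A minimal usage sketch of that interface (kmem_cache_create/alloc/free/destroy are the real slab functions; struct my_object and the module boilerplate around them are made up for illustration):

#include <linux/module.h>
#include <linux/slab.h>

struct my_object {               /* hypothetical object type */
    int id;
    char payload[48];
};

static struct kmem_cache *my_cache;

static int __init my_init(void)
{
    my_cache = kmem_cache_create("my_object_cache",
                                 sizeof(struct my_object), 0,
                                 SLAB_HWCACHE_ALIGN, NULL);
    if (!my_cache)
        return -ENOMEM;

    /* objects are handed out from a partial slab when one exists */
    struct my_object *obj = kmem_cache_alloc(my_cache, GFP_KERNEL);
    if (obj)
        kmem_cache_free(my_cache, obj);
    return 0;
}

static void __exit my_exit(void)
{
    kmem_cache_destroy(my_cache);
}

module_init(my_init);
module_exit(my_exit);
MODULE_LICENSE("GPL");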
If you are eager to collect more information about this, you should take a look at Robert Love's Linux Kernel Development.
I may be too late to answer this, but this may help others as well.
As I see from Understanding the Linux Virtual Memory Manager, having slabs has three major benefits.
Reduced internal fragmentation compared to the buddy system alone, because there are caches that best suit smaller objects.
Better hardware cache usage: by aligning objects to start at different offsets in different slabs, interference between cache lines can be reduced. This is based on the assumption that the cache is physically indexed.
A slab is the primary unit of a cache and is acquired/relinquished all at once, which reduces external fragmentation as well.
See section 3.2 of The Slab Allocator: An Object-Caching Kernel Memory Allocator (Bonwick, 1994).

dynamic memory allocation

When we dynamically allocate memory, does that memory occupy a contiguous memory segment?
Yes, the allocation is virtually contiguous (if you got it with one malloc() call). It may not be physically contiguous, but from an application perspective, you don't usually care.
It depends on what exactly you are asking. For example, let's say you have this C code:
char* a = malloc(100);
char* b = malloc(100);
The pointers a and b each have 100 bytes allocated to them. However, you cannot assume that the 100 bytes allocated to b come right after the 100 bytes allocated to a, or vice versa; in fact, you cannot assume anything about their positions relative to each other. So in that sense, no, they are not contiguous.
Within each block of 100 bytes, however, those 100 bytes are contiguous from the viewpoint of your program. That is, a[1] is one byte away from a[0] and a[2].
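A quick sketch that makes both points observable (the printed addresses and the gap will vary between runs and allocators):

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(void)
{
    char *a = malloc(100);
    char *b = malloc(100);

    /* within one block, the bytes are contiguous... */
    printf("&a[0] = %p, &a[1] = %p\n", (void *)&a[0], (void *)&a[1]);

    /* ...but the distance between the two blocks is up to the
     * allocator: metadata or alignment padding may sit in between */
    printf("a = %p, b = %p, gap = %ld bytes\n",
           (void *)a, (void *)b,
           (long)((uintptr_t)b - (uintptr_t)a));

    free(a);
    free(b);
    return 0;
}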
You should separate the concept of virtual memory from the one of physical memory.
While every allocated chunk (either a single object or an array of objects) occupies a contiguous range of virtual addresses (starting from the address the dynamic memory allocator gives you), it can be split across physical memory according to how the underlying operating system manages memory.
Of course, if virtual memory is not present, the two coincide; otherwise the chunk is contiguous for the program that uses it, but not necessarily in the physical layout of memory.
Not necessarily, and usually not; there may be different allocation mechanisms in play.
Many allocators store metadata between allocated chunks, split heaps according to object sizes, and do other such things. You cannot rely on the contiguity of the returned pointers.
