Ext2 File system Block bitmap - linux-kernel

I was reading the Ext2 file system details, and I am not clear on why the number of blocks in a block group is (b x 8), where b is the block size.
How did they arrive at this figure? What is the significance of 8?

For each block group in an ext2 filesystem there is a block bitmap, which keeps track of which blocks are used (bit equals 1) and which are still free (bit equals 0). This structure is designed to occupy exactly one block. Hence, the number of bits in the block bitmap is b x 8, where b is the block size expressed in bytes (each byte holds 8 bits).
The blocks in a group must not outnumber the bits in the block bitmap, otherwise we could not track their availability. At the same time we want each group to manage as many blocks as possible, to limit the space occupied by metadata. Therefore, the number of blocks in a group is the maximum the bitmap can describe: b x 8.
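A minimal sketch of that arithmetic (the block sizes below are just the common ext2 ones, used here as examples):

    # Blocks per group in ext2: the bitmap occupies exactly one block,
    # and each byte of the bitmap describes 8 blocks (one per bit).
    def blocks_per_group(block_size_bytes):
        return block_size_bytes * 8

    for b in (1024, 2048, 4096):          # common ext2 block sizes
        print(f"block size {b} B -> {blocks_per_group(b)} blocks per group")
    # block size 1024 B -> 8192 blocks per group
    # block size 2048 B -> 16384 blocks per group
    # block size 4096 B -> 32768 blocks per group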

Related

How to calculate cache block size from its overhead?

I've been looking into this for a while now (over 3 days) without luck. Maybe one of you can tell me how to solve it.
Consider a computer with a 16-bit address and byte-addressable memory. The cache is 2-way set-associative, uses a write-back policy and a perfect LRU replacement strategy, and has an overhead of 4352 bits. What is the block size?
Very few resources talk about overhead, and the ones I've found only relate it to total cache size. The problem is that I only know how to calculate cache size from the number of blocks, or at least with the fields of the address properly defined (which I have not been able to do for this problem, since I can't calculate the size of the tag).
Any help would be appreciated.
So, here's how I read this question:
Overhead bits are the bits that don't count toward the actual data being cached. They are bits that track the maintenance state of the cache and help it implement hits, write-back, and the eviction policy. One way of looking at it: if one byte is being cached (8 bits), how many non-data bits does the cache keep to manage it (or, for all the data bits, how many non-data/overhead bits are there)?
This is mathematical, so I hope I haven't made an error, but even if I have maybe you can see your way through the reasoning.
Let's derive some additional information:
A write-back policy means the cache needs to store "dirty" information for each data block: dirty is 1-bit: yes, dirty -or- no, clean.
For a 2-way set-associative cache, a "perfect" LRU algorithm also costs 1 bit (yes: first block, or no: second block), but this 1 bit is per index position (i.e. per set), not per block, since there are two blocks per index.
What we don't know is if there is a valid bit, which would also be per data block, but most caches I see in coursework have the valid bits, so we might assume they have it.
And lastly, there are the tag bits: however many bits of the address are left over after accounting for the index bits and the block offset bits.
So, a formula for overhead might be:
overhead in bits = index positions * (1 x LRU bit + block overhead bits)
where block overhead bits = 2 [ways] * (1 x Dirty bit + 1 x Valid bit + tag bits)
We also know that tag bits = address space bits - index bits - block bits
So, we have:
4352 [overhead in bits] = index positions * (1 + 2 * (2 + tag bits))
-and-
tag bits = address space bits - index bits - block offset bits
-and-
index positions = 2^(index bits)
-and-
We also know that the number of tag, index, and block offset bits has to be an integer (no fractions of bits).
So, we can begin to reduce those two formulas by substituting:
4352 = index positions * (1 + 2 * (2 + address space bits - index bits - block bits))
by reduction also then:
4352 = 2^(index bits) * (1 + 2 * (2 + 16 - index bits - block bits))
Solving for block bits we have:
block bits = 18 - index bits - (4352 / 2^(index bits) - 1) / 2
I don't know how to solve this directly mathematically, given the constraint that the variables must be integers, so, instead of solving directly, simply try/search different values:
If index bits is 7 then by this formula, block bits is fractional, so that doesn't work.
If index bits is 9 then by this formula, block bits is fractional, so that doesn't work.
No other values between 0 and 16 result in an integer number of bits, except:
If index bits is 8 then by this formula, block bits is 2, so:
16 = tag bits + 8 + 2, meaning tag bits is 6, index bits is 8, and block offset is 2.
Since the block offset is 2 bits, the block size is 2^2 = 4 bytes.
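Here is a small brute-force version of that search, as a sketch (the variable names are mine; the 4352-bit overhead, 16-bit address, 2 ways, and the assumed valid bit all come from the reasoning above):

    # Search for integer (index bits, block offset bits) satisfying
    #   overhead = 2^index * (1 LRU bit + 2 ways * (dirty + valid + tag bits))
    # with tag bits = 16 - index bits - block offset bits.
    OVERHEAD, ADDRESS_BITS, WAYS = 4352, 16, 2

    for index_bits in range(ADDRESS_BITS + 1):
        for block_bits in range(ADDRESS_BITS - index_bits + 1):
            tag_bits = ADDRESS_BITS - index_bits - block_bits
            overhead = 2**index_bits * (1 + WAYS * (1 + 1 + tag_bits))
            if overhead == OVERHEAD:
                print(f"index={index_bits}, block offset={block_bits}, "
                      f"tag={tag_bits}, block size={2**block_bits} bytes")
    # -> index=8, block offset=2, tag=6, block size=4 bytes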

How to count # of cache misses in theory for a matrix in memory exceeding cache size?

I'm currently considering an n x n matrix M of 64-bit integer elements stored in main memory in row-major order. I have an L1 data cache of 16KB split in 64B blocks (no L2 or L3). My code is meant to print out each element of the array one at a time, by either traversing the matrix in row-first order or column-first order.
In the case where n = 16 (i.e. 16 x 16 matrix), I've counted 0 cache misses using both row-first order and column-first order since the matrix M fits entirely in the 16KB cache (it never needs to jump to main memory to fetch an element). How would I deal with the case of, say, n = 256 (256 x 256 matrix of 64-bit ints); i.e. when M doesn't fully fit in the cache? Do I count all the ints that don't fit as misses, or can spatial locality be leveraged somehow? Assume the cache is initially empty.
The "0 cache misses" seems to assume you start out with M already in cache. That's already a bit suspicious, but OK.
For the 256x256 case, you need to simulate how the cache behaves. You must take cache misses to bring in the missing entries. Each cache miss brings in not just the requested int but also 7 adjacent ones, since a 64 B block holds eight 8-byte integers.
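A rough simulation sketch along those lines; the question does not state the cache organization, so this assumes a direct-mapped cache, and all names are mine:

    def count_misses(n, row_major, cache_bytes=16 * 1024, block_bytes=64, elem_bytes=8):
        """Count misses for reading every element of an n x n row-major matrix of
        8-byte ints, assuming a direct-mapped cache that starts out empty."""
        num_lines = cache_bytes // block_bytes
        lines = {}                                   # index -> tag currently cached
        misses = 0
        for i in range(n):
            for j in range(n):
                r, c = (i, j) if row_major else (j, i)
                block = (r * n + c) * elem_bytes // block_bytes
                index, tag = block % num_lines, block // num_lines
                if lines.get(index) != tag:          # miss: fetch the whole 64 B block
                    misses += 1
                    lines[index] = tag
        return misses

    print(count_misses(256, row_major=True))    # 8192  (one miss per 8 consecutive ints)
    print(count_misses(256, row_major=False))   # 65536 (every access misses)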

How do I map a memory address to a block when there is an offset in a direct-mapped cache?

To start off, the first cache has 16 one-word blocks. As an example I will use the memory reference 0x03. The index has 4 bits (0011). It is clear that these bits equal 3 mod 16 (0011 = 0x03 = 3). However, I am getting confused using this mod equation to determine the block location in a cache with offset bits.
The second cache has a total size of eight two-word blocks. This means that there is 1 offset bit. Since there are now 8 blocks, there are only 3 index bits. As an example, I will take the same memory reference of 0x03. However, now I am having trouble mapping to the block using the mod equation I used before. I try 3 mod 8, which is 3, but in this case, since there is an offset bit, the index bits are 001. 001 is not equal to 3, so what did I do wrong? Does mod not work when there are offset bits? I was under the impression that the mod result would always equal the index bits.
It's all in the address. You take the address and mask off a number of bits from the end, for the following reasons.
The number of words in the cache line: if you've got a 2-word cache line, take one bit out (a 4-word line, 2 bits, etc.).
Then, how many cache-line entries you have: if it is a 1024-line cache, you take out 10 bits. These 10 bits are your index; the remaining bits are your tag.
Now, you also need to consider the number of ways. If it's a direct-mapped cache, the above applies. If it's a 2-way set-associative cache, you don't have 1024 independent lines; you have 512 sets, each holding 2 lines, which means you only need 9 bits to determine the index of the set. If it's 4-way, you've got 256 sets with 4 lines each, meaning you only need 8 bits for your index.
In a set-associative cache, the index bits choose a set; once a set is chosen, you can use a policy like LRU to pick which entry to fill on a cache miss. Hits are determined by comparing the tags in the selected set.
Bottom line: the data's exact location is not determined by the address alone; the address only selects a set, and a tag comparison then finds the data within it.
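A small sketch of that address split, using the asker's two caches (this treats 0x03 as a word address, as the question seems to; the function name is mine):

    # A word address is split as [ tag | index | block offset ].
    # The index is (word address // words per block) % number of blocks,
    # i.e. the mod is applied to the *block number*, not to the raw address.
    def split_address(addr, words_per_block, num_blocks):
        offset = addr % words_per_block
        block_number = addr // words_per_block
        index = block_number % num_blocks
        tag = block_number // num_blocks
        return tag, index, offset

    # First cache from the question: 16 one-word blocks.
    print(split_address(0x03, words_per_block=1, num_blocks=16))  # (0, 3, 0)

    # Second cache: 8 two-word blocks -> 1 offset bit, 3 index bits.
    print(split_address(0x03, words_per_block=2, num_blocks=8))   # (0, 1, 1) -> index 001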

Cache Memory Blocks Organization

I am not able to understand how exactly the cache is organized in the following scenario.
The cache size is 256 bytes. The cache line size is 8 bytes. All variables are 4 bytes. Assume that an array A[1024] is stored in memory locations 0-4095. Suppose we are using a fully associative mapping technique; how is the array mapped to this particular cache? Consider that the cache is initially empty and that we use an LRU algorithm for replacement. During each replacement, an entire line of the cache is replaced.
Initial analysis:
There will be 32 cache blocks, each 8 bytes long. But the variables to be stored in these locations are only 4 bytes long. I am not able to take this analysis any further to see how these array elements are mapped to the 32 cache blocks.
Let's assume it's accessed sequentially:
for (int i = 0; i < 1024; ++i)
    read(A[i]);
In that case, you'll fill the first 64 elements (A[0] through A[63]) into the 32 cache blocks in adjacent pairs like MSalters said.
The next access, A[64], then misses and has to bring in a new block. It has to pick a victim to kick out, and since you're using LRU and accessed the array in sequential order, that victim is the first block filled (the one holding A[0] and A[1]). You therefore replace A[0] and A[1] with A[64] and A[65], and so on; in general, element i ends up in block floor(i/2) % 32.
Now computing the hit rate requires an additional assumption: each memory fetch is the size of a full block (8 bytes), since you can't fill half blocks (actually there are ways using mask bits, but let's assume the simple case). We therefore get each second element "for free": fetching A[0] also fetches A[1], and so on. In theory this means the hit rate could be 50% (even-indexed elements miss, odd-indexed ones hit; in reality most CPUs would perform the accesses in parallel, so you won't really see that hit rate, but let's say the accesses are serialized here).
Note that each new block fetched after the first 64 elements has to evict a block from the cache; if processing the elements also modifies them, you'll have to write them back too.
Elements A[0] and A[1] are stored in adjacent memory locations, 0-3 and 4-7. That means they share the first cache block. The other elements are similarly mapped pairwise to a cache line. Which pair goes where?
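A small simulation sketch of the sequential-access case described above (fully associative with LRU, 32 lines of 8 bytes, 4-byte elements; the variable names are mine):

    from collections import OrderedDict

    LINE_BYTES, NUM_LINES, ELEM_BYTES = 8, 32, 4
    cache = OrderedDict()          # line address -> True, kept in LRU order
    hits = misses = 0

    for i in range(1024):                          # sequential read of A[0..1023]
        line = (i * ELEM_BYTES) // LINE_BYTES      # element i lives in this 8-byte line
        if line in cache:
            hits += 1
            cache.move_to_end(line)
        else:
            misses += 1
            cache[line] = True
            if len(cache) > NUM_LINES:
                cache.popitem(last=False)          # evict the least recently used line

    print(hits, misses)   # 512 512 -> 50% hit rate: even indices miss, odd indices hit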

understanding the basic concepts in memory organisation and applying those effectively in solving questions

(Well, before proceeding to the question, I want to confess that this is a homework question; please do consider it and help me improve my understanding a bit more.)
I have recently started learning computer organisation and architecture. I have gained a fair understanding of how caches are organised, how the mapping between cache and main memory takes place (direct, fully associative and set-associative mapping), and what a page table is (pages, blocks, etc.). I can say that I have basic knowledge of segmentation, paging, and virtual and physical addresses (at a basic level, of course).
Well, I have come across this question:
A computer has a 46-bit virtual address, a 32-bit physical address, and a three-level
page table organisation. The page table base register stores the base address of the
first-level table (T1), which occupies exactly one page. Each entry of T1 stores the
base address of a page of the second-level table (T2). Each entry of T2 stores the
base address of a page of the third-level table (T3). Each entry of T3 stores a page
table entry (PTE). The PTE is 32 bits in size. The processor used in the computer has
a 1 MB 16-way set-associative virtually indexed, physically tagged cache. The cache
block size is 64 bytes.
First of all, I am facing difficulty just imagining such a computer.
Can anyone help me with simple steps for picturing such a computer on paper, or just for understanding what is given in the question? What is really being asked?
How would one represent a computer having a 46-bit virtual address and a three-level page table?
What is a virtually indexed, physically tagged cache?
After reading what is given above, I feel that I know the terms, but I am unable to relate them to one another to solve problems.
I would be glad if someone could explain what my thought process should be to understand and apply these concepts practically to such problems.
Some questions based on the above paragraph:
1) What is the size of a page, in KB, in this computer?
2) What is the minimum number of page colours needed to guarantee that no two synonyms map to different sets in the processor cache of this computer?
A good resource where one is actually taught to solve such problems would be appreciated.
Good articles and views are most welcome.
Thank you in advance!
We know that the page tables at all levels except the outermost must completely fill their pages; the outermost page table may occupy a whole page or less. But in this question it is given that the outermost page table occupies a whole page as well.
Now let the page size be 2^p bytes.
Given that PTE = 32 bits = 4 bytes = 2^2 bytes.
Number of entries in any page of any page table = page size / PTE size = 2^p / 2^2 = 2^(p-2).
Therefore the logical address splits as:
|---------------------|---------------------|---------------------|-------------|
|  L1 index: p-2 bits |  L2 index: p-2 bits |  L3 index: p-2 bits |  offset: p  |
|---------------------|---------------------|---------------------|-------------|
The logical address space is given as 46 bits.
Hence the equation becomes
(p-2) + (p-2) + (p-2) + p = 46
⇒ p = 13.
Therefore the page size is 2^13 bytes = 8 KB.
I can help you with the first question.
Let the page size be 2^X bytes. Each entry of T1 is 32 bits, i.e. 4 bytes. The total size of T1 is 2^X bytes (1 page), so T1 contains 2^X / 4 = 2^(X-2) entries.
So we use the first X-2 bits of the 46-bit virtual address to index into one entry of T1. It gives the address of a page of T2.
T2 also contains 2^(X-2) entries (the same way as T1), so we use the next X-2 bits to index into T2 and get the address of a page of T3.
Each entry of T3 is 32 bits, i.e. 4 bytes (including flags and all). The total size of one page is 2^X bytes, so the number of entries is 2^(X-2). So again we use X-2 bits to index into T3 and get the starting address of the frame.
Then we need the offset. Since the page size is 2^X bytes, the offset is X bits long.
The first (X-2) bits give the address of a T2 page.
The next (X-2) bits give the address of a T3 page.
The next (X-2) bits give the address of the frame, from T3.
The remaining X bits give the offset within the frame.
The total is 46 bits:
(X-2) + (X-2) + (X-2) + X = 46
4X - 6 = 46
X = 13
Page size = 2^13 bytes = 8 KB.
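A quick check of that arithmetic, as a sketch (the names are mine; the 46-bit virtual address and 4-byte entries come from the question):

    # Solve (p-2) + (p-2) + (p-2) + p = 46 for the page-size exponent p:
    # each of the three page-table levels is indexed by p-2 bits, because a
    # page of 2^p bytes holds 2^p / 2^2 = 2^(p-2) four-byte entries.
    VIRTUAL_BITS = 46

    for p in range(3, VIRTUAL_BITS + 1):
        if 3 * (p - 2) + p == VIRTUAL_BITS:
            print(f"p = {p}: page size = {2**p} bytes = {2**p // 1024} KB, "
                  f"{2**(p - 2)} entries per page-table page")
    # p = 13: page size = 8192 bytes = 8 KB, 2048 entries per page-table page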
