We are given a direct-mapped cache with 8 frames. The following access sequence on main memory blocks has been observed:
2 5 0 13 2 5 10 8 0 4 5 2
Compute the hit rate of this cache.
Solution:
I understand how and why the numbers are placed in the table like that. But I don't understand why 2 and 5 have been bold-printed and why we got hit rate of 17%.
This has been solved by our professor but I don't understand it completely.
As #Margaret Bloom mentioned in the comments, the numbers in bold are cache hits and the non-bold numbers are cache misses.
You might understand it better by using this simulator: cachesimulator.com
The simulator works with WORD instructions only, so a little conversion of your assignment needs to be made in order to simulate it:
cache-size: 32 bytes (8 rows)
block-size: 4 bytes (one word per row)
associativity: 1 (direct-mapped cache)
replacement algorithm: LRU
memory size: (any number larger than (14*4) works) for example: 1024
Since the simulator works with WORD instructions, you need to convert your access sequence by multiplying each number by 4. The simulator also expects addresses in hexadecimal, so after multiplying by 4 you convert each result to hexadecimal, which gives:
8 14 0 34 8 14 28 20 0 10 14 8
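The conversion described above can be reproduced with a short Python sketch. The names `blocks`, `WORD_SIZE`, and `hex_addresses` are chosen here for illustration only; the 4-byte word size is the block size configured above.

```python
# Block numbers from the assignment's access sequence.
blocks = [2, 5, 0, 13, 2, 5, 10, 8, 0, 4, 5, 2]
WORD_SIZE = 4  # bytes per word, matching the 4-byte block size configured above

# Multiply by the word size, then format as hexadecimal without the "0x" prefix.
hex_addresses = [format(b * WORD_SIZE, "x") for b in blocks]
print(" ".join(hex_addresses))  # 8 14 0 34 8 14 28 20 0 10 14 8
```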
In the simulator you enter instructions on the form:
<operationtype><space><register><space><address>
In your case the operationtype is LOAD and the register doesn't matter, so you can use any register, for example:
LOAD 1 8
LOAD 1 14
LOAD 1 0
LOAD 1 34
LOAD 1 8
LOAD 1 14
LOAD 1 28
LOAD 1 20
LOAD 1 0
LOAD 1 10
LOAD 1 14
LOAD 1 8
Enter the instructions above in the text area of the simulator and click run. You can then see the cache hits and misses in real time, and when the simulation is finished you can analyze the results by looking at the content of the cache memory and the list of instruction results. You can view the main-memory address that each element in the cache refers to by hovering over it.
I understand how and why the numbers are placed in the table like that.
So you understand how addresses map to cache lines, and that the vertical axis is time.
But I don't understand why 2 and 5 have been bold-printed and why we got hit rate of 17%.
The table entries are bold (cache hit) when the previous access to the same cache line was to the same address. A different address that maps to the same cache line causes a cache miss (evicting the old contents).
Visually / graphically: look vertically upwards in the same column to see which data is currently hot in the cache line.
Obviously once you know how many cache hits there were, calculating the hit rate is easy.
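The rule above (a hit only when the line still holds the same block) can be checked with a minimal Python sketch. The function name `simulate_direct_mapped` is made up for this example, and it assumes the usual direct mapping of block number modulo the number of frames.

```python
def simulate_direct_mapped(accesses, num_frames):
    """Return a list of booleans: True for a hit, False for a miss."""
    frames = [None] * num_frames      # block currently held by each cache line
    results = []
    for block in accesses:
        line = block % num_frames     # direct mapping: block number modulo frame count
        hit = frames[line] == block
        results.append(hit)
        frames[line] = block          # on a miss the old block is evicted
    return results

accesses = [2, 5, 0, 13, 2, 5, 10, 8, 0, 4, 5, 2]
results = simulate_direct_mapped(accesses, num_frames=8)
hits = sum(results)
print(hits, len(accesses), round(100 * hits / len(accesses)))  # 2 12 17
```

The two hits are the second access to block 2 and the last access to block 5, which gives 2/12, or about 17%.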
Normally you should just ask your professor extremely basic questions like this. However, your diagram was really easy to understand, so it made this trivial question easy to understand and answer.
Question:
Consider a computer system that has a cache with 4096 blocks. Each block can store 16 bytes. What will be the value stored in the TAG field of the cache block that holds the memory block containing the address 0xABCDEF?
a. if it is a direct-mapped cache
b. if it is a 16-way set-associative cache
c. if it is a fully associative cache
Here is my work/logic below:
We know that each block can store 16 bytes. That is 2^4, meaning our block offset is 4 bits.
ABCDEF is 24 bits, because each hex digit is 4 bits
4096 blocks is 2^12
a. if it is direct mapped, then 24 - 20 - 4 --> 0
b. if it is 16-way, then our calculation is 24 - 16 (index) - 4 (offset) --> 4
c. if it is fully associative, then we don't have an index and it's just 24 - 4 --> 20
I am not sure if I am approaching the question the right way. Any help would be much appreciated!
I am using this illustration as my reference for how cache is represented:
http://csillustrated.berkeley.edu/PDFs/handouts/cache-3-associativity-handout.pdf
Ok, so I figured it out.
a. Since our cache is direct mapped, we have 2^12 blocks. That means the index is 12 bits, and with an offset of 4 bits the TAG is 24 - 12 - 4 ==> 8 bits.
b. Now it is 16-way set associative, and 2^4 = 16. So we do 2^12 / 2^4 ==> 2^8 sets.
This means the index is 8 bits and the TAG is 24 - 8 - 4 ==> 12 bits.
c. If it is fully associative, we don't have to account for the index slot. So the TAG is just 24 - 4 ==> 20 bits.
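Since the question asks for the value in the TAG field rather than its width, here is a minimal Python sketch that extracts it. The helper `tag_of` is hypothetical, and it assumes a 24-bit address (six hex digits) and power-of-two sizes, as in the work above.

```python
def tag_of(address, num_blocks, block_size, ways, address_bits=24):
    """Return (tag value, tag width in bits) for the given cache geometry."""
    offset_bits = block_size.bit_length() - 1   # log2 of a power of two
    num_sets = num_blocks // ways
    index_bits = num_sets.bit_length() - 1      # 0 when there is only one set
    tag = address >> (offset_bits + index_bits)
    return tag, address_bits - offset_bits - index_bits

tags = {}
for name, ways in [("direct mapped", 1), ("16-way", 16), ("fully associative", 4096)]:
    tags[name] = tag_of(0xABCDEF, num_blocks=4096, block_size=16, ways=ways)
    tag, bits = tags[name]
    print(f"{name}: {bits}-bit tag = {tag:#x}")
# direct mapped: 8-bit tag = 0xab
# 16-way: 12-bit tag = 0xabc
# fully associative: 20-bit tag = 0xabcde
```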
Hope this helps anyone who bumps into this
I was reading here and at the problem 3, exercise 2 he states:
The top-level page table should not assume that 2nd level page tables are page-aligned. So, we store full physical addresses there. Fortunately, we do not need control bits. So, each entry is at least 44 bits (6 bytes for byte-aligned, 8 bytes for word-aligned). Each top-level page table is therefore 256*6 = 1536 bytes (256 * 8 = 2048 bytes).
So, why can't the first level assume that the 2nd-level tables are page-aligned?
Since this presumes that memory is not allocated at page granularity, wouldn't that make the allocation significantly more complicated?
By the way, I have tried to read the lectures of the course, but they aren't that comprehensive.
First of all, sorry for bad English since my English skill is not that good...
Before the question, I want to explain my situation to help understanding.
I want to use EEPROM as a kind of counter.
The value of that counter will be increased very frequently, so I have to consider the endurance problem.
My idea is to write the counter value to multiple addresses in rotation, so that the wear on each cell is reduced by a factor of N.
For example, if I use a 5x area for counting:
Count 1 -> 1 0 0 0 0
Count 2 -> 1 2 0 0 0
Count 3 -> 1 2 3 0 0
Count 4 -> 1 2 3 4 0
Count 5 -> 1 2 3 4 5
Count 6 -> 6 2 3 4 5
...
So cell endurance can be extended by a factor of N.
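The rotation scheme above can be sketched in Python to show that each slot receives only every N-th write. This is only a model: the class `RotatingCounter` and its lists are hypothetical stand-ins for EEPROM locations, and it ignores counter overflow and any erase/write block granularity of the real part.

```python
class RotatingCounter:
    """Counter spread over N slots; each increment writes exactly one slot."""

    def __init__(self, num_slots):
        self.slots = [0] * num_slots    # stands in for N EEPROM locations (hypothetical)
        self.writes = [0] * num_slots   # per-slot write count, for illustration only

    def read(self):
        # The current count is the largest value stored in any slot.
        return max(self.slots)

    def increment(self):
        count = self.read() + 1
        slot = (count - 1) % len(self.slots)  # count 1 -> slot 0, count 6 -> slot 0 again
        self.slots[slot] = count
        self.writes[slot] += 1
        return count

counter = RotatingCounter(5)
for _ in range(6):
    counter.increment()
print(counter.slots)   # [6, 2, 3, 4, 5]
print(counter.writes)  # [2, 1, 1, 1, 1]
```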
However, AFAIK, in current NAND flash, data is erased/written in groups of bytes called blocks. So if all the bytes are within a single write/erase block, my method would not work.
So, my main question: is the erase/write operation of a PIC's EEPROM done on a group of bytes, or on a single word or byte?
For example, if it is done on a group of 8 bytes, then I should put an 8-byte offset between the counter values to make my method work properly.
Otherwise, if it is done on a byte or a word, I don't have to worry about spacing/offset.
From datasheet PIC24FJ256GB110 section 5.0:
The user may write program memory data in blocks of 64 instructions
(192 bytes) at a time, and erase program memory in blocks of 512
instructions (1536 bytes) at a time.
However, you can overwrite an individual block several times if you leave the rest of the block erased (bits are one) and the previous content stays the same. Remember: you can clear each bit in a block only once.
I don't know how much the data retention will decrease after 8 writes into a single FLASH block!
On my assignment we have 2 questions about a 2-way set-associative cache. The cache has four sets in total. Main memory consists of 4K blocks of 8 words each, and word addressing is used.
Part a) asks to demonstrate the address format, which I've solved to be word = 3 bits, set = 2 bits, and field = 7 bits. The problem I'm having is in part b):
Compute the hit ratio for a program that loops 3 times from location 8 to location 51. In other words, think of this as an assembly language program that runs from the opcode at location 8 to the opcode at location 51, then loops back to location 8. It does three such iterations in total.
From the research I've done, my understanding is that normally some sort of speed or hit rate is given. How do I calculate the hit ratio if I don't know a miss rate, a miss penalty, a cache speed, or anything else?
I think we're in the same class lol I have the exact same question on assignment due tonight.. Anyway I did some research and found this answer to a similar question on chegg:
a. Given that memory contains 2K blocks of eight words.
2K blocks of 8 words gives 2K * 2^3 = 2^11 * 2^3 = 2^14 words, so we have 14-bit addresses with 9 bits
in the tag field, 2 bits in the set field and 3 in the word field
b. First iteration of the loop:
→ Address 8 is a miss, then entire block brought into Set 1. 9-15 are then hits.
→ 16 is a miss, entire block brought into Set 2, 17-23 are hits.
→ 24 is a miss, entire block brought into Set 3, 25-31 are hits.
→ 32 is a miss, entire block brought into Set 0, 33-39 are then hits.
→ 40 is a miss, entire block brought into Set 1, 41-47 are hits.
→ 48 is a miss, entire block brought into Set 2, 49-51 are hits.
For the first iteration of the loop, we have 6 misses, and 5*7 + 3 hits, or 38 hits.
On the remaining iterations, we have 5*8+4 hits, or 44 hits each,for 88 more hits.
Therefore, we have 6 misses and 126 hits, for a hit ratio of 126/132, or 95.45%.
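The count above can be double-checked with a small Python simulation. This is a sketch under assumptions: the function name `simulate_set_associative` is made up here, and LRU replacement is assumed because the assignment does not state a policy (no block is actually evicted in this trace, so the policy does not affect the result).

```python
from collections import OrderedDict

def simulate_set_associative(addresses, num_sets, ways, words_per_block):
    """Count hits and misses for a word-addressed, LRU, set-associative cache."""
    sets = [OrderedDict() for _ in range(num_sets)]  # block -> None, oldest first
    hits = misses = 0
    for addr in addresses:
        block = addr // words_per_block
        lines = sets[block % num_sets]
        if block in lines:
            hits += 1
            lines.move_to_end(block)           # mark as most recently used
        else:
            misses += 1
            if len(lines) == ways:
                lines.popitem(last=False)      # evict the least recently used block
            lines[block] = None
    return hits, misses

trace = list(range(8, 52)) * 3                 # locations 8..51, three iterations
hits, misses = simulate_set_associative(trace, num_sets=4, ways=2, words_per_block=8)
print(hits, misses, f"{100 * hits / (hits + misses):.2f}%")  # 126 6 95.45%
```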
Hope this helps, good luck!
Hey all, I have a question and an answer, but I can't understand the second part of the answer!
Could anyone here please help me out?
Here it is:
Question;
A computer has 32-bit virtual addresses and 4-KB pages. The program and data together fit in the lowest page (0-4095). The stack fits in the highest page. How many entries are needed in the page table if traditional (one-level) paging is used? How many page table entries are needed for two-level paging, with 10 bits in each part?
Answer;
For a one-level page table, there are 2^32 /2^12 or 1M pages needed. Thus the
page table must have 1M entries. For two-level paging, the main page table
has 1K entries, each of which points to a second page table. Only two of
these are used. Thus in total only three page table entries are needed, one in
the top-level table and one in each of the lower-level tables.
I can't understand the bolded part. For example, I can't understand where this 1K comes from.
Thanks for your time,
Cheers!
I think, the answer to the first part of the question is wrong because you are not using the context of the question: The program and data together fit in the lowest page (0-4095) The stack fits in the highest page. So, while the total number of page table entries is 1048576, of those you only use 2 entries, one for each of those 2 pages (entry 0 points at the code/data page and entry 1048575 points at the stack page).
For the second part of the question you're given an extremely useful hint: two-level paging, with 10 bits in each part. But first, let's go back to the above, simpler case...
In case 1 with one page table, virtual addresses:
have 32 bits (given as A computer has 32-bit virtual addresses)
their 12 least significant bits indicate a location within a page (given as A computer has ... 4-K.B pages, also as fit in the lowest page (0-4095))
The remaining 20 most significant bits obviously select an entry in the page table. The selected page table entry contains the physical address of the page.
So, the virtual addresses look like this:
most significant bits least significant bits
| 20 bits = index into the page table | 12 bits = index into the page |
Hence, the CPU uses this formula to access memory:
PhysicalAddress = PageTable[VirtualAddress / 4096] + VirtualAddress modulo 4096
Now, let's get back to case 2.
You still have the 12 LSB bits to select a byte in the page.
But what's new? It's two-level paging, with 10 bits in each part.
Those 10 bits are the lengths of the page table indices, of which you now have two.
With this we arrive at the following break-down for virtual addresses:
most significant bits least significant bits
| 10 bits = PT index | 10 bits = PT index | 12 bits = page index |
And the address translating formula, naturally, is:
PhysAddr = PageTable[VirtAddr / (1024*4096)][(VirtAddr / 4096) modulo 1024] + VirtAddr modulo 4096
Now, we still have the same program that occupies 2 pages.
The virtual addresses pointing at the code/data page are (in binary):
0000000000|0000000000|xxxxxxxxxxxx
And the virtual addresses pointing at the stack page are (in binary as well):
1111111111|1111111111|xxxxxxxxxxxx
From this you can see that you are using 2 different page table entries at level 1 (selected by indices 0000000000 and 1111111111) and similarly 2 different page table entries at level 2.
So, in case 2 the total is 2+2=4 page table entries needed for the program to operate.
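The breakdown above can be reproduced with a short Python sketch that splits the two virtual page addresses into their indices and counts the distinct entries touched. The helper `split` and the constant names are chosen here for illustration, and any address inside each page would give the same indices.

```python
PAGE_BITS = 12    # 4 KB pages
INDEX_BITS = 10   # bits per page-table level

def split(vaddr):
    """Return (level-1 index, level-2 index, offset) of a 32-bit virtual address."""
    offset = vaddr & ((1 << PAGE_BITS) - 1)
    level2 = (vaddr >> PAGE_BITS) & ((1 << INDEX_BITS) - 1)
    level1 = vaddr >> (PAGE_BITS + INDEX_BITS)
    return level1, level2, offset

# One address in the code/data page and one in the stack page.
used_pages = [0x00000000, 0xFFFFF000]

level1_entries = {split(a)[0] for a in used_pages}    # entries used in the top-level table
level2_entries = {split(a)[:2] for a in used_pages}   # (table, entry) pairs at level 2
print(sorted(level1_entries))                          # [0, 1023]
print(len(level1_entries) + len(level2_entries))       # 4
```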
P.S. in case you don't remember: 2^10 = 1024, 2^12 = 4096, 2^20 = 1048576.