Determining the tag of a cache

Given a cache with 256 blocks and 16 bytes per block, how can I determine the value of the tag field of a cache block that holds the 24-bit address 0x3CFBCF? Would there be any differences depending on whether the cache was direct-mapped, fully associative, or n-way set-associative?

This gives a great overview of finding the SET, TAG, and OFFSET of an address.
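For this specific cache: 16 bytes per block gives 4 offset bits; direct-mapped with 256 blocks means 8 index bits, leaving 24 - 8 - 4 = 12 tag bits; fully associative has no index field, so the tag is the upper 20 bits; an n-way cache sits in between. A small Python sketch of that arithmetic (my own illustration, not from the overview referenced above):

    # Splitting the 24-bit address 0x3CFBCF for a cache with
    # 256 blocks of 16 bytes each (all values from the question).
    ADDR = 0x3CFBCF
    OFFSET_BITS = 4                        # log2(16 bytes per block)

    def split(addr, index_bits):
        offset = addr & ((1 << OFFSET_BITS) - 1)
        index = (addr >> OFFSET_BITS) & ((1 << index_bits) - 1)
        tag = addr >> (OFFSET_BITS + index_bits)
        return hex(tag), hex(index), hex(offset)

    print(split(ADDR, 8))   # direct-mapped, 256 sets: ('0x3cf', '0xbc', '0xf')
    print(split(ADDR, 0))   # fully associative:       ('0x3cfbc', '0x0', '0xf')
    print(split(ADDR, 6))   # 4-way, 64 sets:          ('0xf3e', '0x3c', '0xf')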


Where are tag bits stored in direct mapped cache?

From my understanding, a direct-mapped cache compares tag bits. But where are the tag bits stored? Are they inside the cache? If so, are they stored inside the cache block itself, making the actual block size bigger?
The tag bits come from the address: which bits of an address (from the perspective of the CPU) serve as the tag depends on the size and geometry of the cache. And yes, they are stored inside the cache: each line keeps its tag alongside the data (conceptually a separate tag array), so the cache really does hold more bits per line than the data block alone.
Let us assume a very simple direct-mapped cache with eight 64-byte lines.
The 6 least significant bits locate a byte within a 64-byte line (the offset). The next 3 bits select which of the 8 lines an address maps to (the index), and everything above that is the tag, which is stored alongside the line and compared on each lookup.
bits in address:
... tttt ttti iixx xxxx
Addresses 0x86 and 0x10080 have the same index in this example, so they map to the same line; their differing tags (0x0 vs. 0x80) are what tell them apart.
This is an oversimplified example, and there are many nuances to caches, so I would recommend reading some more in-depth material on the topic, or reading about an actual implementation (e.g. a CPU manual) to get a much better feel for how this works.
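To make the split concrete, here is a quick Python sketch under the same assumptions (eight 64-byte lines, direct-mapped):

    # Eight 64-byte lines: 6 offset bits, 3 index bits, tag above that.
    def split(addr):
        offset = addr & 0x3F          # bits [5:0], byte within the line
        index = (addr >> 6) & 0x7     # bits [8:6], which of the 8 lines
        tag = addr >> 9               # stored with the line, compared on lookup
        return tag, index, offset

    print(split(0x86))      # (0, 2, 6)
    print(split(0x10080))   # (128, 2, 0) -- same index, different tag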

Cache memory design - address decoding

I was given the following problem:
A CPU generates 32 bit addresses for a byte addressable memory. Design an 8 KB cache memory for this CPU (8 KB is the cache size only for the data; it does not include the tag). The block size is 32 bytes. Show the block diagram, and the address decoding for direct mapped cache memory.
I determined that:
8 bits are needed for indexing
5 bits are needed for block offset
19 bits for the tag.
Is my solution correct? How should I do the decoding?
The numbers seem correct; however, it is always worth pointing out in your solution that you're taking cache associativity into account. Specifically, 32 - 8 - 5 = 19 is only valid when the cache is direct-mapped.
The decoding part is nicely illustrated in your drawing – it's simply the act of splitting the 32-bit address, as used by the CPU, into the tag, index, and offset fields.
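As a sanity check, the same field widths fall out of a few lines of Python (variable names are mine, not part of the problem statement):

    import math

    cache_bytes, block_bytes, addr_bits = 8 * 1024, 32, 32

    offset_bits = int(math.log2(block_bytes))                 # 5
    index_bits = int(math.log2(cache_bytes // block_bytes))   # 8 (256 lines)
    tag_bits = addr_bits - index_bits - offset_bits           # 19
    print(offset_bits, index_bits, tag_bits)                  # 5 8 19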

Minimum associativity for a PIPT L1 cache to also be VIPT, accessing a set without translating the index to physical

This question comes in context of a section on virtual memory in an undergraduate computer architecture course. Neither the teaching assistants nor the professor were able to answer it sufficiently, and online resources are limited.
Question:
Suppose a processor with the following specifications:
8KB pages
32-bit virtual addresses
28-bit physical addresses
a two-level page table, with a 1KB page table at the first level, and 8KB page tables at the second level
4-byte page table entries
a 16-entry 8-way set associative TLB
in addition to the physical frame (page) number, page table entries contain a valid bit, a readable bit, a writeable bit, an executable bit, and a kernel-only bit.
Now suppose this processor has a 32KB L1 cache whose tags are computed based on physical addresses. What is the minimum associativity that cache must have to allow the appropriate cache set to be accessed before computing the physical address that corresponds to a virtual address?
Intuition:
My intuition is that if the number of indices in the cache and the number of virtual pages (aka page table entries) are evenly divisible by each other, then we could retrieve the bytes contained within the physical page directly from the cache without ever computing that physical page, thus providing a small speed-up. However, I am unsure if this is the correct intuition, and I definitely don't know how to follow through with it. Could someone please explain this?
Note: I have computed the number of page table entries to be 2^19, if that helps anyone.
What is the minimum associativity that cache must have to allow the appropriate cache set to be accessed before computing the physical address that corresponds to a virtual address?
They only specified that the cache is physically tagged.
You can always build a virtually indexed cache, no minimum associativity. Even direct-mapped (1 way per set) works. See Cache Addressing Methods Confusion for details on VIPT vs. PIPT (and VIVT, and even the unusual PIVT).
For this question not to be trivial, I assume they also meant "without creating aliasing problems", so VIPT is just a speedup over PIPT (physically indexed, physically tagged). You get the benefit of allowing TLB lookup in parallel with fetching tags (and data) for the ways of the indexed set, without any downsides.
My intuition is that if the number of indices in the cache and the number of virtual pages (aka page table entries) are evenly divisible by each other, then we could retrieve the bytes contained within the physical page directly from the cache without ever computing that physical page
You need the physical address to check against the tags; remember your cache is physically tagged. (Virtually tagged caches do exist, but typically have to get flushed on context switches to a process with different page tables = different virtual address space. This used to be used for small L1 caches on old CPUs.)
Having both numbers be a power of 2 is normally assumed, so they're always evenly divisible.
Page sizes are always a power of 2 so you can split an address into page number and offset-within-page by just taking different ranges of bits in the address.
Small/fast cache sizes also always have a power of 2 number of sets so the index "function" is just taking a range of bits from the address. For a virtually-indexed cache: from the virtual address. For a physically-indexed cache: from the physical address. (Outer caches like a big shared L3 cache may have a fancier indexing function, like a hash of more address bits, to avoid aliasing for addresses offset from each other by a large power of 2.)
The cache size might not be a power of 2, but you'd achieve that by having a non-power-of-2 associativity (e.g. 10 or 12 ways is not rare) rather than a non-power-of-2 line size or number of sets. After indexing a set, the cache fetches the tags for all the ways of that set and compares them in parallel. (Fast L1 caches often fetch the data selected by the line-offset bits in parallel, too; the comparators then just mux that data into the output, or raise a flag for no match.)
Requirements for VIPT without aliasing (like PIPT)
For that case, you need all index bits to come from below the page offset. They translate "for free" from virtual to physical so a VIPT cache (that indexes a set before TLB lookup) has no homonym/synonym problems. Other than performance, it's PIPT.
My detailed answer on Why is the size of L1 cache smaller than that of the L2 cache in most of the processors? includes a section on that speed hack.
Virtually indexed physically tagged cache Synonym shows a case where the cache does not have that property, and needs page coloring by the OS to avoid synonym problems.
How to compute cache bit widths for tags, indices and offsets in a set-associative cache and TLB has some more notes about cache size / associativity that give that property.
Formula:
min associativity = cache size / page size
e.g. a system with 8kiB pages needs a 32kiB L1 cache to be at least 4-way associative so that index bits only come from the low 13.
A direct-mapped cache (1 way per set) can only be as large as 1 page: byte-within-line and index bits total up to the byte-within-page offset. Every byte within a direct-mapped (1-way) cache must have a unique index:offset address, and those bits come from contiguous low bits of the full address.
To put it another way, 2^(idx_bits + within_line_bits) is the total cache size with only one way per set. 2^N is the page size, for a page offset of N (the number of byte-within-page address bits that translate for free).
The actual number of sets (in this case = lines) depends on the line size and page size. Using smaller / larger lines would just shift the divide between offset and index bits.
From there, the only way to make the cache bigger without indexing from higher address bits is to add more ways per set, not more sets.
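A tiny Python sketch of that formula (my own illustration):

    # Minimum ways for a VIPT cache to behave like PIPT: each way can be
    # at most one page, so ways >= cache size / page size.
    def min_ways(cache_bytes, page_bytes):
        return max(1, -(-cache_bytes // page_bytes))   # ceiling division

    print(min_ways(32 * 1024, 8 * 1024))   # 4, the case in the question
    print(min_ways(8 * 1024, 8 * 1024))    # 1: direct-mapped works up to one page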

Fully Associative cache offset bits

When dealing with a fully associative cache, why is it necessary to use an offset (or word) field? If the entire cache is being searched anyway, what will the offset do for you?
The tag search only identifies which block holds your data; a block contains more than one byte, so the offset bits are still needed to select the byte (or word) you want within the matching block. Offset bits also allow more addresses to share the same block: when you increase the block size, bits move from the tag into the offset, and each block stores more words.
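A minimal sketch of that split, assuming 64-byte blocks for illustration (the block size is not given in the question):

    BLOCK_BYTES = 64                       # assumed for illustration
    OFFSET_BITS = 6                        # log2(BLOCK_BYTES)

    def fa_split(addr):
        tag = addr >> OFFSET_BITS          # compared against every stored tag
        offset = addr & (BLOCK_BYTES - 1)  # byte within the matching block
        return hex(tag), hex(offset)

    # Same tag -> same block; the offset picks the byte inside it.
    print(fa_split(0x1234))   # ('0x48', '0x34')
    print(fa_split(0x1238))   # ('0x48', '0x38')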

How does the CPU look up the L1 and L2 caches?

Recently I was reading some material on CPU caches. I am wondering how the CPU looks up the L1 and L2 caches, and in what format the data in the CPU cache is stored.
I think a linear scan of the cache would be inefficient; are there any better solutions?
Thanks.
It uses index bits and tags extracted from the address it is looking up.
Say you are accessing some 32-bit address ADDR. ADDR will have bits:

31 ---------------------------- 0
[ ------- tag | index | offset ]
Then depending on the size of your cache:
Let's say you have a 32 KB, direct-mapped cache with 32 bytes per block.
The low 5 bits of the address are the offset: they locate your data within a line, because a fill always brings in the full 32-byte block, and your data sits somewhere inside those 32 bytes.
That gives a cache with 1024 lines (sets), each line holding 32 bytes. To index 1024 sets you need 10 bits, so the next 10 bits of the address are used as an index into the cache. The remaining 17 bits are the tag, which is matched against the address you are looking up, since two or more addresses will map onto the same line of the cache.
Makes sense?
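In code form, a lookup in that 32 KB direct-mapped example is one array access plus one tag compare, not a linear scan (a toy Python model, not how the hardware is literally built):

    NUM_SETS, OFFSET_BITS, INDEX_BITS = 1024, 5, 10
    tags = [None] * NUM_SETS               # stored tag per line (None = invalid)

    def lookup(addr):
        index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
        tag = addr >> (OFFSET_BITS + INDEX_BITS)
        return tags[index] == tag          # hit iff the stored tag matches

    # Simulate filling the line that 0xDEADBEEF maps to:
    tags[(0xDEADBEEF >> 5) & 1023] = 0xDEADBEEF >> 15
    print(lookup(0xDEADBEEF))   # True
    print(lookup(0x0000BEEF))   # False: same index bits [14:5], different tag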
I do not know the answer, but I can recommend a good book that might lead you to one - The Essentials of Computer Organization and Architecture.
