Behind Windows x64's 44-bit virtual memory address limit - windows

http://www.alex-ionescu.com/?p=50.
I read the above post. The author explains why Windows x64 supports only 44-bit virtual memory address with singly linked list example.
struct { // 8-byte header
ULONGLONG Depth:16;
ULONGLONG Sequence:9;
ULONGLONG NextEntry:39;
} Header8;
The first sacrifice to make was to reduce the space for the sequence
number to 9 bits instead of 16 bits, reducing the maximum sequence
number the list could achieve. This still only left 39 bits for the
pointer — a mediocre improvement over 32 bits. By forcing the
structure to be 16-byte aligned when allocated, 4 more bits could be
won, since the bottom bits could now always be assumed to be 0.
Oh, I can't understand.
What "By forcing the structure to be 16-byte aligned when allocated, 4 more bits could be won, since the bottom bits could now always be assumed to be 0." means?

16 is 0010000 in binary
32 is 0100000 in binary
64 is 1000000 in binary
etc
You can see that for all numbers that are multiples of 16, the last four bits are always zero.
So, instead of storing these bits you can leave them out and add them back in when it's time to use the pointer.

For a 2^N-byte aligned pointer, its address is always divisible by 2^N - which means that lower N bits are always zero. You can store additional information in them:
encode ptr payload = ptr | payload
decode_ptr data = data & ~mask
decode_payload data = data & mask
where mask is (1 << N) - 1 - i.e. a number with low N bits set.
This trick is often used to save space in low-level code (payload can be GC flags, type tag, etc.)
In effect you're not storing a pointer, but a number from which a pointer can be extracted. Of course, care should be taken to not dereference the number as a pointer without decoding.

Related

How to calculate cache block size from its overhead?

I've being looking for this for a lot of time now (over 3 days) without luck. Maybe one of you guys can tell me how can I solve it.
Consider you have a computer with a 16-bit size address and a byte addressable memory. The cache is 2-way set-associative mapped, write-back policy and a perfect LRU replacement strategy. Cache has an overhead of 4352 bits. What's the size of the block?
Very few resources talk about overhead and the ones I've found only relate it to total cache size. The problem is I only know how to calculate cache size with #blocks or at least with the fields of the address properly defined (which I have not being able to do for this problem since I can't calculate the size of the tag.).
Any help would be appreciated.
So, here's how I read this question:
Overhead bits are the bits that don't count toward the actual data that is being cached.  They are bits that track maintenance state of the cache, and help the cache implement hits, write back, and eviction policy.  To some way of looking at it, if one byte is being cached (8 bits) how many non-data bits are in the cache to help manage that (or at least for all the actual data bits how many non-data/overhead bits are there).
This is mathematical, so I hope I haven't made an error, but even if I have maybe you can see your way through the reasoning.
Let's derive some additional information:
A write-back policy means the cache needs to store "dirty" information for each data block: dirty is 1-bit: yes, dirty -or- no, clean.
For 2-way set associative cache, a "perfect" LRU algorithm is also 1 bit (yes: first block -or- no: second block) but this 1 bit costs per index position (i.e. per line) — not per block as there are two blocks per index.
What we don't know is if there is a valid bit, which would also be per data block, but most caches I see in coursework have the valid bits, so we might assume they have it.
And lastly, there's the tag bits where tag bits are: however many bits are leftover in the address space bits after accounting for index bits and block offset bits.
So, a formula for overhead might be:
overhead in bits = index positions * (1 x LRU bit + block overhead bits)
where block overhead bits = 2 [ways] * (1 x Dirty bit + 1 x Valid bit + tag bits)
We also know that tag bits = address space bits - index bits - block bits
So, we have:
4352 [overhead in bits] = index positions * (1 + 2 * (2 + tag bits))
-and-
tag bits = address space bits - index bits - block offset bits
-and-
index positions = 2index bits
-and-
We also know that the number of tag, index, and block offset bits has to be an integer (no fractions of bits).
So, we can begin to reduce those two formulas by substituting:
4352 = index positions * (1 + 2 * (2 + address space bits - index bits - block bits)
by reduction also then:
4352 = 2index bits * (1 + 2 * (2 + 16 - index bits - block bits)
Solving for block bits we have:
-((4352/2index bits - 1)/2 - 18 + index bits) = block bits
I don't know how to solve this directly mathematically, given the constraint that the variables must be integers, so, instead of solving directly, simply try/search different values:
If index bits is 7 then by this formula, block bits is fractional, so that doesn't work.
If index bits is 9 then by this formula, block bits is fractional, so that doesn't work.
No other values between 0 and 16 result in an integer number of bits, except:
If index bits is 8 then by this formula, block bits is 2, so:
16 = tag bits + 8 + 2, meaning tag bits is 6, index bits is 8, and block offset is 2.
Since block offset is 2 then block size is 22.

Addressing Size Regarding Bytes

Just to make sure, does every single address contain one byte? So say you had theoretical addresses FFF0 and FFFF: there are 16 values between these two addresses, which means between them they contain 16 bytes, or 8 x 16 bits? Every individual address is linked to a single byte?
Just to make sure, does every single address contain one byte?
...which means between them they contain 16 bytes, or 8 x 16 bits?
Every individual address is linked to a single byte?
Yes to all three questions.
Which is why the limitation with 32-bit addressing, you can only access 2^32 bytes == 4,294,967,296 bytes == 4 GiB. Each addressable memory location gives access to 1 byte.
If we could access 2 bytes with one address, then that limit would have been 8 GiB. And the architecture of modern chips and all software would have to be modified to determine whether they want both bytes or just the first or the second. So you'd need, say, 1 more bit to determine that. Guess what, if you had 33-bit machines, that's what we'd get...max address-able space of 8 GiB. Which is still effectively 1-byte-containing addresses. Workarounds do exist but that's not related to your questions.
* GiB = Binary GigaBytes.
Note that this is not related to "types" where a char is 1 byte and an int is 4 bytes. Programming languages compensate for that when trying to access the value of a stored variable/data stored at a location(s). And they are actually calculated as total bits rather than total bytes. So an int is considered as 32 bits rather than 4 bytes. When C fetches an int's value from memory, it will fetch all 4 bytes even though the address of the int refers to just one, the address of the first byte.
Yes. Addresses map to bytes 1 to 1, even if they expect you to work with a word size of two or four bytes at a time.

Cache calculating block offset and index

I've read several topics about this theme but I could not get the answer. So my question is:
1) How is the block offset calculated?
I want to know not the formula but the concept of it. As I know it is quantity of cases which a block can store the address. For example If there is a block with 8 byte storage and has to store 2 byte addresses. Does its block offset is 2 bit?(So there is 4 cases to store the address (the diagram below might make easier to see what I am saying).
The block offset is simply calculated as log2 cache_line_size.
The reason is that all system that I know of are byte addressable. So you need enough bits to index any byte in the block. Although most systems have a word size that is larger than a single byte, they still support offsets of a single byte gradulatrity, even if that is not the common case.
So for the example you mentioned of an 8-byte block size with 2-byte word, you would still need 3 bits in order to allow accessing any byte. If you had a system that was not byte addressable then you could use just 2 bits for the block offset. But in practice all systems that I know of are byte addressable.

VS_VERSIONINFO structure - unnecessary padding

I have taken the VS_VERSIONINFO structure from a file and the Value (VS_FIXEDFILEINFO) is padded with 32 bits.
According to MSDN, Value should be padded to fall on a 32 bit boundary.
Padding1
Type: WORD
Contains as many zero words as necessary to align the Value member on a 32-bit boundary.
But value is already on a 32 bit boundary.
Why is VS_FIXEDFILEINFO padded with 32 bits on a 32 bit boundary, anyway?
To align data on a 32 bit boundary, only padding with less than 32 bits would make sense.
I'm asking this because I need to parse an RC script and generate this resource.
Padding is added to structures and their members so that the CPU can access the memory holding those members using addresses that are aligned to the CPU's word width.
Back in the dark days some CPUs could be persuaded to generate a bus error if you did a non-aligned access but these days it's just slower, particularly if you miss the onboard caches.
VS_FIXEDFILEINFO is arbitrary data of arbitrary length therefore some padding may appear after it to bring the subsequent VS_VERSIONINFO structure members back into alignment.
The wording of MS's documentation for the wLength member of VS_VERSIONINFO implies that you shouldn't consider padding between the VS_VERSIONINFO that you're looking at and the next one in memory. i.e. do not subtract the address of the next structure from the first one and use that as wLength because you may bring in some padding bytes between the two structures that you don't want.

I don't understand something in memory addressing

I have a very simple (n00b) question.
A 20-bit external address bus gave a 1 MB physical address space (2^20
= 1,048,576).(Wikipedia)
Why 1 MByte?
2^20 = 1,048,576 bit = 1Mbit = 128KByte not 1MB
I misunderstood something.
When you have 20 bits you can address up to 2^20. This is your range, not the number of bits.
I.e. if you have 8 bits your range is up to 255 (unsigned) not 2^8 bits.
So with 20 bits you can address up to 2^20 bytes i.e. 1MB
I.e. with 20 bits you can represent addresses from 0 up to 2^20 = 1,048,576. I.e. you can reference up to 1MB of memory.
1 << 20 addresses, that is 1,048,576 bytes addressable. Hence, 1 MB physical address space.
Because the smallest addressable unit of memory (in general - some architectures have small bit-addressable pieces of memory) is the byte, not the bit. That is, each address refers to a byte, rather than to a bit.
Why, you ask? Direct access to individual bits is almost never needed - and if you need it, you can still load the surrounding byte and get the bit with bit masks and shifts. Increasing the bits per address allows you to address more memory with the same address range.
Note that a byte doesn't have to be 8 bit, strictly speaking, though it's ubiquitous by now. But regardless of the byte size, you're grouping bits together to be able to handle larger quantities of them.

Resources