Relation between computer architecture and cache block size - caching

Suppose memory is byte addressable and cache block size is 4 bytes . So in one cache access 1 block is accessed. Does it means computer architecture is of 32 bit. My question is what derivation you can make about computer architecture if you are given about cache block size

No, usually cache block size is larger than the register width, to take advantage of spatial locality between nearby full-register-width loads / stores which is typical. Making cache as fine-grained a 4-byte chunks costs a large amount of overhead (tags and so on) compared to the amount of storage needed for the actual data. e.g. 20 tag bits, plus "dirty" and other MESI state per 32-bit cache line, might mean that a 32 kiB (usable space) cache needs more like 56 kiB of raw SRAM storage, and that's without considering ECC or parity.
If a CPU has a floating-point unit, it can often do 64-bit loads/stores, even if the integer register width is only 32-bit. (Or even wider with SIMD, or load-pair / store-pair instructions.)
Typical real-world cache sizes are 64 bytes on modern systems, and formerly 32 bytes on earlier CPUs like Pentium III. 64 bytes is the DDR SDRAM burst size, so it's a good choice for the size of off-chip memory accesses. (Recent Intel systems with AVX-512 SIMD can load/store a whole 64-byte (512-bit) cache line with a single instruction, though. SIMD vector width has caught up to cache line size. But integer accesses are still at most 8 bytes wide.)
There's no relationship between cache block size and architecture bitness. You definitely want the block size to be at least as wide as a normal load / store, but it would be possible to build a 64-bit machine with 32-bit cache blocks. That would mean 64-bit loads take two cache accesses to do it, so it would be a really bad idea unless your usual workload consisted of using 64-bit addresses in registers to access scattered 32-bit values, and you wanted to optimize for that without caring about efficiency of anything else.
Most 64-bit ISAs can work with 32 or 64-bit data equally efficiently. Some, notably x86-64, don't even have what you'd call a "word size". There's no one native access size that's most efficient on x86-64, and instructions are an unaligned byte stream, not like ISAs with aligned 32-bit instruction words like RISC-V or AArch64.
So if you knew that the cache block size was 32-bit, it would be a good guess that the register width was at most 32-bit, but could be 8 or 16-bit. (Or 4-bit or possibly even 6-bit or something? With sizes smaller than 32-bit, for historical CPUs it often becomes a question of what one means by bitness: ALU, register, bus, fixed-width instruction? Notice that in earlier parts of the answer, I just talked about register width, not "32-bit CPU".)
If this was a real commercial design instead of a computer science example, an 8-bit machine would be the most likely; a normal 32-bit machine would use larger cache blocks but you could plausibly imagine finer granularity on a machine that could only load 1 byte at a time. (Of course, being an 8-bit machine doesn't imply that restriction; you could have a load-pair instruction, or FP registers that allow 32-bit or 64-bit loads/stores.)

Related

Why 4-level paging can only cover 64 TiB of physical address

There are the words in linux/Documentation/x86/x86_64/5level-paging.rst
Original x86-64 was limited by 4-level paging to 256 TiB of virtual address space and 64 TiB of physical address space.
I know that the limit of virtual address is 256TB because 2^48 = 256TB. But I don't know why its limit of physical is only 64TB.
Suppose we set the size of each page to 4k. Thus a linear address has 12 bits of offset, 9 bits indicate the index in each four level, which means 512 entries per level. A linear address can cover 512^4 pages, 512^4 * 4k = 256TB of space.
This is my understanding of the calculation of space limit. I'm wondering what's wrong with it.
The x86-64 ISA's physical address space limit is unchanged by PML5, remaining at 52-bit. Real CPUs implement some narrower number of physical address bits, saving bits in cache tags and TLB entries, among other places.
The 64 TiB limit is not imposed by x86-64 itself, but by the way Linux requires more virtual address space than physical for its own convenience and efficiency. See x86_64/mm.txt for the actual layout of Linux's virtual address space on x86-64 with PML4 paging, and note the 64 TB "direct mapping of all physical memory (page_offset_base)"
x86-64 Linux doesn't do HIGHMEM / LOWMEM
Linux can't actually use more than phys mem = 1/4 virtual address space, without nasty HIGHMEM / LOWMEM stuff like in the bad old days of 32-bit kernels on machines with more than 1 GiB of RAM (vm/highmem.html). (With a 3:1 user:kernel split of address space, letting user-space have 3GiB, but with the kernel having to map pages in/out of its own space if not accessing them via the current process's user-space addresses.)
Linus's rant about 32-bit PAE expands on why it's nice for an OS to have enough virtual address space to keep everything mapped, with the usual assertion that people who don't agree with him are morons. :P I tend to agree with him on this, that there are obvious efficiency advantages and that PAE is a huge PITA for the kernel. Probably even moreso on an SMP system.
If anyone had proposed a patch to add highmem support for x86-64 to allow using more than 64 TiB of physical memory with the existing PML4 format, I'd expect Linus would tell them 1995 called and wants its bad idea back. He wouldn't consider merging such a patch unless much RAM became common for servers, but hardware vendors still hadn't provided an extension for wider virtual addresses.
Fortunately that didn't happen: probably no CPU has supported wider than 46-bit phys addrs without supporting PML5. Vendors know that supporting more RAM than mainline Linux can use wouldn't be a selling point. But as the doc said, commercial systems were getting up to a max capacity of 64 TiB.
x86-64's page-table format has room for 52-bit physical addresses
The x86-64 page-table format itself has always had that much room: Why in x86-64 the virtual address are 4 bits shorter than physical (48 bits vs. 52 long)? has diagrams from AMD's manuals. Of course early CPUs had narrower physical addresses so you couldn't for example have a PCIe device put its device memory way up high in physical address space.
Your calculation has nothing to do with physical address limits, which is set by the number of bits in each page-table entry that can be used for that.
In x86-64 (and PAE), the page table format reserves bits up to bit #51 for use as physical-address bits, so OSes must zero them for forward compatibility with future CPUs. The low 12 bits are used for other things, but the physical address is formed by zeroing out the bits other than the phys-address bits in the PTE, so those low 12 bits become the low zero bits in an aligned physical-page address.
x86 terminology note: logical addresses are seg:off, and segment_base + offset gives you a linear address. With paging enabled (as required in long mode), linear addresses are virtual, and are what's used as a search key for the page tables (effectively a radix tree cached by the TLB).
Your calculation is just correctly reiterating the 256 TiB size of virtual address space, based on 4-level page tables with 4k pages. That's how much memory can be simultaneously mapped with PML4.
A physical page has to be the same size as a virtual page, and in x86-64 yes that's 4 KiB. (Or 2M largepage or 1G hugepage).
Fun fact: the x86-64 page-table-entry format is the same as PAE, so modern CPUs can also access large amounts of memory 32-bit mode. But of course not map it all at once. It's probably not a coincidence that AMD chose to use an existing well-designed format when making AMD64, so their CPUs would only need two different modes for hardware page-table walker: legacy x86 with 4-byte PTEs (10 bits per level) and PAE/AMD64 with 8-byte PTEs (9 bits per level).

Why is the register length static in any CPU

Why is the register length (in bits) that a CPU operates on not dynamically/manually/arbitrarily adjustable? Would it make the computer slower if it was adjustable this way?
Imagine you had an 8-bit integer. If you could adjust the CPU register length to 8 bits, the CPU would only have to go through the first 8 bits instead of extending the 8-bit integer to 64 bits and then going through all 64 bits.
At first I thought you were asking if it was possible to have a CPU with no definitive register size. That make no sense since the number and size of the registers is a physical property of the hardware and cannot be changed.
However some architecture let the programmer work on a smaller part of a register or to pair registers.
The x86 does both for example, with add al, 9 (uses only 8 bits of the 64-bit rax) and div rbx (pairs rdx:rax to form a 128-bit register).
The reason this scheme is not so diffuse is that it comes with a lot of trade-offs.
More registers means more bits needed to address them, simply put: longer instructions.
Longer instructions mean less code density, more complex decoders and less performance.
Furthermore most elementary operations, like the logic ones, addition and subtraction are already implemented as operating on a full register in a single cycle.
Finally, one execution unit can handle only one instruction at a time, we cannot issue eight 8-bit additions in a 64-bit ALU at the same time.
So there wouldn't be any improvement, nor in the latency nor in the throughput.
Accessing partial registers is useful for the programmer to fan-out the number of available registers, so for example if an algorithm works with 16-bit data, the programmer could use a single physical 64-bit register to store four items and operate on them independently (but not in parallel).
The ISAs that have variable length instructions can also benefit from using partial register because that usually means smaller immediate values, for example and instruction that set a register to a specific value usually have an immediate operand that matches the size of register being loaded (though RISC usually sign-extends or zero-extends it).
Architectures like ARM (presumably others as well) supports half precision floats. The idea is to do what you were speculating and #Margaret explained. With half precision floats, you can pack two float values in a single register, thereby introducing less bandwidth at a cost of reduced accuracy.
Reference:
[1] ARM
[2] GCC

8 and 16 bit architecture

I'm a bit confused about bit architectures. I just cant find a good article that answers my questions, so I figured I'd ask SO.
Question 1:
When speaking of a 16 bit architecture, does it mean each ram address is 16 bits long? So if I create an int (32 bit) in C++ the variable would take up 2 addresses?
Question 2:
in a 16 bit architecture there are only 2^16 (65536) amount of addresses inside the RAM. Why can't they add more? Is this because 16 bit can't represent a higher value and therefore can't reference to adresses above 65535?
When speaking of a 16 bit architecture, does it mean each ram address is 16 bits long? So if I create an int (32 bit) in C++ the variable would take up 2 addresses?
You'd have to ask whoever was speaking of a 16-bit architecture what they meant by it. They could mean addresses are 16-bits long. They could mean general-purpose CPU registers are 16-bits long. They could mean something else. But there's no way we could know what some hypothetical person might mean. There is no universal definition of what makes something a "16-bit architecture".
For example, the 8032 is an 8-bit architecture with 8-bit general purpose registers. But it has a 16-bit pointer register that can be used to address 65,536 bytes of storage.
Regardless of bitness, almost all systems use byte addresses. So a 32-bit variable will take up 4 addresses on a machine of any bitness.
in a 16 bit architecture there are only 2^16 (65536) amount of addresses inside the RAM. Why can't they add more? Is this because 16 bit can't represent a higher value and therefore can't reference to adresses above 65535?
With 16-bits, there are only 65,536 possible ways those bits can be set. So a 16-bit register has 65,536 possible values.
Yes. Note, though that int on 16-bit architectures is usually just 16 bits wide.
Also note that it doesn't make sense to say that a variable "takes up" two addresses. The correct thing to say is that a 32-bit variable is as wide as two pointers on a 16-bit platform.
It will still occupy four bytes of space, no matter what architecture.
Yes; that's exactly what 16-bit addresses mean.
Note that each of these addresses points to a single byte of memory.
Depends on your definitions of 8-bit and 16-bit architecture.
The 6502 was considered an 8-bit CPU, because it operated on 8-bit values (the register size), yet had 16-bit addresses.
The 68000 was considered a 16-bit CPU, yet had 32-bit registers and addresses.
With x86, it is generally the address size that defines the architecture.
Also, '64-bit' CPUs don't always have a full 64-bit external address bus. They might internally handle addresses of that size, so the virtual address space can be large, but it doesn't mean they can have that much external memory.
Example From Wikipedia - All internal registers, as well as internal and external data buses, were 16 bits wide, firmly establishing the "16-bit microprocessor" identity of the 8086. A 20-bit external address bus gave a 1 MB physical address space (2^20 = 1,048,576). This address space was addressed by means of internal 'segmentation'. The data bus was multiplexed with the address bus in order to fit a standard 40-pin dual in-line package. 16-bit I/O addresses meant 64 KB of separate I/O space (2^16 = 65,536). The maximum linear address space was limited to 64 KB, simply because internal registers were only 16 bits wide. Programming over 64 KB boundaries involved adjusting segment registers (see below) and remained so until the 80386 introduced wider (32 bits) registers (and more advanced memory management hardware).
So you can see that there are no fixed rules that a 16 bit architecture will have 16 address lines only. Don't mix up two things, though it's intuitive to believe so.

Why does the 8086 use an extra register to address 1MB of memory?

I heard that the 8086 has 16-bit registers which allow it to only address 64K of memory. Yet it is still able to address 1MB of memory which would require 20-bit registers. It does this by using another register to to hold another 16 bits, and then adds the value in the 16-bit registers to the value in this other register to be able to generate numbers which can address up to 1MB of memory. Is that right?
Why is it done this way? It seems that there are 32-bit registers, which is more than sufficient to address 1MB of memory.
Actually this has nothing to do with number of registers. Its the size of the register which matters. A 16 bit register can hold up to 2^16 values so it can address 64K bytes of memory.
To address 1M, you need 20 bits (2^20 = 1M), so you need to use another register for the the additional 4 bits.
The segment registers in an 8086 are also sixteen bits wide. However, the segment number is shifted left by four bits before being added to the base address. This gives you the 20 bits.
the 8088 (and by extension, 8086) is instruction compatible with its ancestor, the 8008, including the way it uses its registers and handles memory addressing. the 8008 was a purely 16 bit architecture, which really couldn't address more than 64K of ram. At the time the 8008 was created, that was adequate for most of its intended uses, but by the time the 8088 was being designed, it was clear that more was needed.
Instead of making a new way for addressing more ram, intel chose to keep the 8088 as similar as possible to the 8008, and that included using 16 bit addressing. To allow newer programs to take advantage of more ram, intel devised a scheme using some additional registers that were not present on the 8008 that would be combined with the normal registers. these "segment" registers would not affect programs that were targeted at the 8008; they just wouldn't use those extra registers, and would only 'see' 16 addres bits, the 64k of ram. Applications targeting the newer 8088 could instead 'see' 20 address bits, which gave them access to 1MB of ram
I heard that the 8086 has 16 registers which allow it to only address 64K of memory. Yet it is still able to address 1MB of memory which would require 20 registers.
You're misunderstanding the number of registers and the registers' width. 8086 has eight 16-bit "general purpose" registers (that can be used for addressing) along with four segment registers. 16-bit addressing means that it can only support 216 B = 64 KB of memory. By getting 4 more bits from the segment registers we'll have 20 bits that can be used to address a total of 24*64KB = 1MB of memory
Why is it done this way? It seems that there are 32 registers, which is more than sufficient to address 1MB of memory.
As said, the 8086 doesn't have 32 registers. Even x86-64 nowadays don't have 32 general purpose registers. And the number of registers isn't relevant to how much memory a machine can address. Only the address bus width determines the amount of addressable memory
At the time of 8086, memory is extremely expensive and 640 KB is an enormous amount that people didn't think that would be reached in the near future. Even with a lot of money one may not be able to get that large amount of RAM. So there's no need to use the full 32-bit address
Besides, it's not easy to produce a 32-bit CPU with the contemporary technology. Even 64-bit CPUs today aren't designed to use all 64-bit address lines
Why can't OS use entire 64-bits for addressing? Why only the 48-bits?
Why do x86-64 systems have only a 48 bit virtual address space?
It'll takes more wires, registers, silicons... and much more human effort to design, debug... a CPU with wider address space. With the limited transistor size of the technology in the 70s-80s that may not even come into reality.
8086 doesn't have any 32-bit integer registers; that came years later in 386 which had a much higher transistor budget.
8086's segmentation design made sense for a 16-bit-only CPU that wanted to be able to use 20-bit linear addresses.
Segment registers could have been only 8-bit or something with a larger shift, but apparently there are some advantages to fine-grained segmentation where a segment start address can be any 16-byte aligned linear address. (A linear address is computed from (seg << 4) + off.)

How do we determine if a processor is 8-bit; 16-bit or 32-bit

Is it determined by size of the address buss; if yes then was 8086 a 20-bit processor? If no what is criteria for assigning a bit number like 8-bit, 16-bit, 32-bit to processor?
It's not well defined. Broadly, as xtofl points out, it's the size of the atomic unit of computation (in early computers, this wasn't always synonymous with "register"). So the PDP-10 was a 36 bit machine, a 8080 was 8 bit, and a IBM 360 or Intel 80386 is "32 bits".
But there are exceptions. The Motorola 68000 and 68010 CPUs implemented a 32 bit register set, but did it via microcode on top of a mostly 16 bit internal architecture. They were usually marketed as "16 bit" CPUs at the time.
The size of the address bus is almost never the defining factor. All successful "8 bit" CPUs implemented 16 bit addressing, for example (often via odd hacks to make up for the lack of a single address register, c.f. 6502's indirect addressing modes or the Z80's H/L registers). And the 8086, as you mention, used its segment register addressing to get 20 address lines to work (the 80286 extended this trick to 24 bits of physical address). And in the other direction, many "32 bit" CPUs had smaller address buses to save logic that wouldn't be used on a machine that would never have more than a few megabytes of memory: the 68000 limited addressing to 24 bits, even though the pointers themselves were 32. Likewise modern 64 bit CPUs universally implement less than 64 bits of physical address.
As far as i know the bit width of the processor is determined by how many bits the internal data processing circuits accept at once. Like if the adders, multipliers etc in the ALU accept 16 bit operands then the CPU is 16 bit, and if it accepts 32 bits then it is 32 bit. It does not matter what is the width of the data bus or the address bus. In general the bit length of the Accumulator would determine the bit length of the processor.
I guess normally you label it by the size of it's accumulators/registers.
With respect to a CPU, I'd say that it's the width of a register. You can do an operation on only 8 bits, 16-bits, 32-bits, etc. at a time.
The bit size (8-bit, 16-bit, 32-bit) of a microprocecessor is determined by the hardware, specifically the width of the data bus. The Intel 8086 is a 16-bit processor because it can move 16 bits at a time over the data bus. The Intel 8088 is an 8-bit processor even though it has an identical instruction set. This is similar to the Motorola 68000 and 68008 processors. The bit size is not determined by the programmer's view (the register width and the address range).
I think the first number of Integrated chip refers to the type of the processor.
If it is IC 8085 means it is a 8 bit processor.
any processor can be designated by its' two attributes
instruction set architecture &
no. of bits it can handle in single clock cycle.
take for example Intel's IA-32 architecture, also called x86-32 , here x86 indicates the architecture and 32 indicates 32-bit processor
X-Architecture
there are a number of architectures
Pre-x86 x86
-Intel's IA-32 architecture, also called x86-32 -x86-64
- -with AMD's AMD64 and Intel's Intel 64 version of it
- Motorola's 6800 and 68000 a
rchitectures ARM7
Y-bit processor
: simply- its the data handling capability of cpu/processor in a single clock cycle.
suppose it is an 8 bit processor then in a single clock cycle, the ALU can perform operation on 8 bit data only.(note that this operation may be an internal operation like add/sub as well as transferring data to other IO device)
classification Based on Registers-
Processor in addition to ALU and CU contains some memory locations as well, called as registers. depending on the processor, a register may typically store 8, 16, 32 or 64 bits. The register size of a particular processor allows us to classify the processor. Processors with a register size of n-bits are called n-bit processors, so that processors with 8-bit registers are called 8-bit processors.
classification Based on databus width-
since the alu can handle only 8 bit data in a single clock cycle it won't make sense to have data bus width more than that & 8 bit processor will have 8 bit wide databus, hence databus width can also be an alternate way to find out the bit processing capability of processor.for a processor with n bit databus means that the CPU can transfer n-bits to another device in a single operation.
for the question:
"suppose we have a 32 bit ALU i.e. it can take 32 bits at a time and
our data bus size is 16 bit i.e. it can hold 16 bit of data at a time
thn wht will be the ans. In this case...?"
the example of such processor would be intel 8088 & Moto 68000
Using bus width classification, the Intel 8088 microprocessor is an 8-bit processor since it uses an 8-bit data bus, although its CPU registers are in fact 16-bit registers.
Similarly the Motorola 68000 is classified as a 16-bit processor, even though its CPU registers are 32-bit registers.
Sometimes a combination of the two classifications is used where the 8088 might be described as 8/16-bit processor and the Motorola 68000 as a 16/32-bit processor.
The word size(8-bits, 16-bits or 32-bits) of a microprocessor is the size of the data path in the execution unit. Typically, this is the size of the accumulator.
This is the execution unit size. An example where this matters is the 8088, which is a 16 bit computer running on an 8 bit bus. The 8085 is 8-bits. The 8086/8088 is 16-bits. The 80386 is 32-bits. Mordern Intel Processors are 64-bits.

Resources