How does computer really request data in a computer? - memory-management

I was wondering how exactly does a CPU request data in a computer. In a 32 Bits architecture, I thought that a computer would put a destination on the address bus and would receive 4 Bytes on the data bus. I recently read on the memory alignment in computer and it confused me. I read that the CPU has to read two times the memory to access a not multiple 4 address. Why is so? The address bus lets it access not multiple 4 address.

The address bus itself, even in a 32-bit architecture, is usually not 32 bits in size. E.g. the Pentium's address bus was 29 bits. Given that it has a full 32-bit range, in the Pentium's case that means each slot in memory is eight bytes wide. So reading a value that straddles two of those slots means two reads rather than one, and alignment prevents that from happening.
Other processors (including other implementations of the 32-bit Intel architecture) have different memory access word sizes but the point is generic.

Related

Does memory-mapped I/O work by using RAM addresses?

Imagine a processor capable of addressing an 8-bit range (I know this is ridiculously small in reality) with a 128 byte RAM. And there is some 8-bit device register mapped to address 100. In order to store a value to it, does the CPU need to store a value at address 100 or does it specifically need to store a value at address 100 within RAM? In pseudo-assembly:
STI 100, value
VS
STI RAM_start+100, value
Usually, the address of a device is specified relative to the start of the address space it lives in.
The datasheet has surely more context and will clarify if the address is relative to something else.
However, before using it you have to translate that address as the CPU would see it.
For example, if your 8-bit address range accessible with the sti instruction is split in half:
0-127 => RAM
128-255 => IO
Because the hardware is wired this way, then, as seen from the CPU, the IO address range starts at 128, so an IO address of x is accessible at 128 + x.
The CPU datasheet usually establishes the convention used to give the addresses of the devices and the memory map of the CPU.
Address spaces can be hierarchical (e.g. as in PCI) or windowed (e.g. like the legacy PCI config space on x86), can have aliases, they may require special instructions or overlaps (e.g. reads to ROM, writes to RAM).
Always refers to the CPU manual/datasheet to understand the CPU memory map and how its address range(s) is (are) routed.

How is byte addressing implemented in modern computers?

I have trouble understanding how in say a 32-bit computer byte addressing is achieved:
Is the ram itself byte addressable meaning the first byte has address 0 and the second 1 etc? In this case, wouldn't is take 4 read cycles to read a 32-bit word and waste the width of the data bus?
Or does the ram consist of 32-bit words meaning address 0 points to the first 4 bytes and address 2 points to bytes 5 to 8? In this case I would expect the ram interface to make byte addressing possible (from the cpu's point of view)
Think of RAM as 8 bit wide structure with N entries. N is often the size quoted when referring to memory (256 MB - 256M entries, 2GB - 2G entries etc, B is for bytes). When you access this memory, the smallest unit you can address is one of these entries which is 8 bits (1 byte). Since you can only access it at byte level, we call it byte addressable memory.
Now coming to your question about accessing this memory, we do not just access a byte. Most of the time, memory accesses are sent through caches which are there to reduce memory access latency. Caches store data at a higher granularity than a byte or word, normally it is multiple of words. In doing so, caches explore a property called "locality". Locality means, there is a high chance that we either access this data item or a near by data item very soon. So fetching not just the byte, but all the adjacent bytes is not a waste. Think of it as an investment for future, saves you multiple data fetches that you would have done otherwise.
Memory addresses in RAM start with 0th address and they are accessed using the registers with capacity of 8 bit register or 32 bit registers. Based on these registers the value from specific address is accessed by the CPU. If you really need to understand how it works, you will need to run couple of programs using Assembly language to navigate in the physical memory by reading the values directly using registers and register move commands.

8 and 16 bit architecture

I'm a bit confused about bit architectures. I just cant find a good article that answers my questions, so I figured I'd ask SO.
Question 1:
When speaking of a 16 bit architecture, does it mean each ram address is 16 bits long? So if I create an int (32 bit) in C++ the variable would take up 2 addresses?
Question 2:
in a 16 bit architecture there are only 2^16 (65536) amount of addresses inside the RAM. Why can't they add more? Is this because 16 bit can't represent a higher value and therefore can't reference to adresses above 65535?
When speaking of a 16 bit architecture, does it mean each ram address is 16 bits long? So if I create an int (32 bit) in C++ the variable would take up 2 addresses?
You'd have to ask whoever was speaking of a 16-bit architecture what they meant by it. They could mean addresses are 16-bits long. They could mean general-purpose CPU registers are 16-bits long. They could mean something else. But there's no way we could know what some hypothetical person might mean. There is no universal definition of what makes something a "16-bit architecture".
For example, the 8032 is an 8-bit architecture with 8-bit general purpose registers. But it has a 16-bit pointer register that can be used to address 65,536 bytes of storage.
Regardless of bitness, almost all systems use byte addresses. So a 32-bit variable will take up 4 addresses on a machine of any bitness.
in a 16 bit architecture there are only 2^16 (65536) amount of addresses inside the RAM. Why can't they add more? Is this because 16 bit can't represent a higher value and therefore can't reference to adresses above 65535?
With 16-bits, there are only 65,536 possible ways those bits can be set. So a 16-bit register has 65,536 possible values.
Yes. Note, though that int on 16-bit architectures is usually just 16 bits wide.
Also note that it doesn't make sense to say that a variable "takes up" two addresses. The correct thing to say is that a 32-bit variable is as wide as two pointers on a 16-bit platform.
It will still occupy four bytes of space, no matter what architecture.
Yes; that's exactly what 16-bit addresses mean.
Note that each of these addresses points to a single byte of memory.
Depends on your definitions of 8-bit and 16-bit architecture.
The 6502 was considered an 8-bit CPU, because it operated on 8-bit values (the register size), yet had 16-bit addresses.
The 68000 was considered a 16-bit CPU, yet had 32-bit registers and addresses.
With x86, it is generally the address size that defines the architecture.
Also, '64-bit' CPUs don't always have a full 64-bit external address bus. They might internally handle addresses of that size, so the virtual address space can be large, but it doesn't mean they can have that much external memory.
Example From Wikipedia - All internal registers, as well as internal and external data buses, were 16 bits wide, firmly establishing the "16-bit microprocessor" identity of the 8086. A 20-bit external address bus gave a 1 MB physical address space (2^20 = 1,048,576). This address space was addressed by means of internal 'segmentation'. The data bus was multiplexed with the address bus in order to fit a standard 40-pin dual in-line package. 16-bit I/O addresses meant 64 KB of separate I/O space (2^16 = 65,536). The maximum linear address space was limited to 64 KB, simply because internal registers were only 16 bits wide. Programming over 64 KB boundaries involved adjusting segment registers (see below) and remained so until the 80386 introduced wider (32 bits) registers (and more advanced memory management hardware).
So you can see that there are no fixed rules that a 16 bit architecture will have 16 address lines only. Don't mix up two things, though it's intuitive to believe so.

Why does the 8086 use an extra register to address 1MB of memory?

I heard that the 8086 has 16-bit registers which allow it to only address 64K of memory. Yet it is still able to address 1MB of memory which would require 20-bit registers. It does this by using another register to to hold another 16 bits, and then adds the value in the 16-bit registers to the value in this other register to be able to generate numbers which can address up to 1MB of memory. Is that right?
Why is it done this way? It seems that there are 32-bit registers, which is more than sufficient to address 1MB of memory.
Actually this has nothing to do with number of registers. Its the size of the register which matters. A 16 bit register can hold up to 2^16 values so it can address 64K bytes of memory.
To address 1M, you need 20 bits (2^20 = 1M), so you need to use another register for the the additional 4 bits.
The segment registers in an 8086 are also sixteen bits wide. However, the segment number is shifted left by four bits before being added to the base address. This gives you the 20 bits.
the 8088 (and by extension, 8086) is instruction compatible with its ancestor, the 8008, including the way it uses its registers and handles memory addressing. the 8008 was a purely 16 bit architecture, which really couldn't address more than 64K of ram. At the time the 8008 was created, that was adequate for most of its intended uses, but by the time the 8088 was being designed, it was clear that more was needed.
Instead of making a new way for addressing more ram, intel chose to keep the 8088 as similar as possible to the 8008, and that included using 16 bit addressing. To allow newer programs to take advantage of more ram, intel devised a scheme using some additional registers that were not present on the 8008 that would be combined with the normal registers. these "segment" registers would not affect programs that were targeted at the 8008; they just wouldn't use those extra registers, and would only 'see' 16 addres bits, the 64k of ram. Applications targeting the newer 8088 could instead 'see' 20 address bits, which gave them access to 1MB of ram
I heard that the 8086 has 16 registers which allow it to only address 64K of memory. Yet it is still able to address 1MB of memory which would require 20 registers.
You're misunderstanding the number of registers and the registers' width. 8086 has eight 16-bit "general purpose" registers (that can be used for addressing) along with four segment registers. 16-bit addressing means that it can only support 216 B = 64 KB of memory. By getting 4 more bits from the segment registers we'll have 20 bits that can be used to address a total of 24*64KB = 1MB of memory
Why is it done this way? It seems that there are 32 registers, which is more than sufficient to address 1MB of memory.
As said, the 8086 doesn't have 32 registers. Even x86-64 nowadays don't have 32 general purpose registers. And the number of registers isn't relevant to how much memory a machine can address. Only the address bus width determines the amount of addressable memory
At the time of 8086, memory is extremely expensive and 640 KB is an enormous amount that people didn't think that would be reached in the near future. Even with a lot of money one may not be able to get that large amount of RAM. So there's no need to use the full 32-bit address
Besides, it's not easy to produce a 32-bit CPU with the contemporary technology. Even 64-bit CPUs today aren't designed to use all 64-bit address lines
Why can't OS use entire 64-bits for addressing? Why only the 48-bits?
Why do x86-64 systems have only a 48 bit virtual address space?
It'll takes more wires, registers, silicons... and much more human effort to design, debug... a CPU with wider address space. With the limited transistor size of the technology in the 70s-80s that may not even come into reality.
8086 doesn't have any 32-bit integer registers; that came years later in 386 which had a much higher transistor budget.
8086's segmentation design made sense for a 16-bit-only CPU that wanted to be able to use 20-bit linear addresses.
Segment registers could have been only 8-bit or something with a larger shift, but apparently there are some advantages to fine-grained segmentation where a segment start address can be any 16-byte aligned linear address. (A linear address is computed from (seg << 4) + off.)

Understanding word alignment

I understand what it means to access memory such that it is aligned but I don’t understand why this is necessary. For instance, why can I access a single byte from an address 0x…1 but I cannot access a half word (two bytes) from the same address.
Again, I understand that if you have an address A and an object of size s that the access is aligned if A mod s = 0. But I just don’t understand why this is important at the hardware level.
Hardware is complex; this is a simplified explanation.
A typical modern computer might have a 32-bit data bus. This means that any fetch that the CPU needs to do will fetch all 32 bits of a particular memory address. Since the data bus can't fetch anything smaller than 32 bits, the lowest two address bits aren't even used on the address bus, so it's as if RAM is organised into a sequence of 32-bit words instead of 8-bit bytes.
When the CPU does a fetch for a single byte, the read cycle on the bus will fetch 32 bits and then the CPU will discard 24 of those bits, loading the remaining 8 bits into whatever register. If the CPU wants to fetch a 32 bit value that is not aligned on a 32-bit boundary, it has several general choices:
execute two separate read cycles on the bus to load the appropriate parts of the data word and reassemble them
read the 32-bit word at the address determined by throwing away the low two bits of the address
read some unexpected combination of bytes assembled into a 32-bit word, probably not the one you wanted
throw an exception
Various CPUs I have worked with have taken all four of those paths. In general, for maximum compatibility it is safest to align all n-bit reads to an n-bit boundary. However, you can certainly take shortcuts if you are sure that your software will run on some particular CPU family with known unaligned read behaviour. And even if unaligned reads are possible (such as on x86 family CPUs), they will be slower.
The computer always reads in some fixed size chunks which are aligned.
So, if you don't align your data in memory, you will have to probably read more than once.
Example
word size is 8 bytes
your structure is also 8 bytes
if you align it, you'll have to read one chunk
if you don't align it, you'll have to read two chunks
So, it's basically to speed up.
The reason for all alignment rules are the various widths of the Cache Lines (Instruction-Cache do have 16 Byte lines for the Core2 Architecture, and the Data-Cache do have 64-Byte Lines for L1 and 128-Byte Lines for L2).
So if you want to store/load data that crosses a Cahce-Line Boundary you need to load and store both Cache-lines, which hits the performance.
So you just don't do it because of the performance hit, its that simple.
Try reading a serial port. The data is 8 bits wide.
Nice hardware designers ensure it lies on a least significant byte of the word.
If you have a C structure that has elements not word aligned ( from backwards compatibility or conservation of memory say )
then the address of any byte within the structure is not word aligned.

Resources