Memory, Stack and 64 bit

Memory, Stack and 64 bit - macos

On a x86 system a memory location can hold 4 bytes (32 / 8) of data, therefore a single memory address in a 64 bit system can hold 8 bytes per memory address. When examining the stack in GDB though this doesn't appear to be the case, example:
0x7fff5fbffa20: 0x00007fff5fbffa48 0x0000000000000000
0x7fff5fbffa30: 0x00007fff5fbffa48 0x00007fff857917e1
If I have this right then each hexadecimal pair (48) is a byte, thus the first memory address
0x7fff5fbffa20: is actually holding 16 bytes of data and not 8.
This has had me really confused and has for a while, so absolutely any input is vastly appreciated.

Short answer: on both x86 and x64 the minimum addressable entity is a byte: each "memory location" contains one byte, in each case. What you are seeing from GDB is only formatting: it is dumping 16 contiguous bytes, as the address increasing from ....20 to ....30, (on the left) indicates.
Long answer: 32bit or 64bit is used to indicate many things, in an architecture: almost always, is the addressable size (how many bits are in an address = how much memory you can directly address - again, bytes of memory). It also usually indicates the dimension of registers, and also (but not always) the native word size.
That means that usually, even if you can address a single byte, the machine works "better" using data of different (longer) size. What "better" means is beyond the question; a little background, however, is good to understand some misconceptions about word size in the question.

Related

How is byte addressing implemented in modern computers?

I have trouble understanding how in say a 32-bit computer byte addressing is achieved:
Is the ram itself byte addressable meaning the first byte has address 0 and the second 1 etc? In this case, wouldn't is take 4 read cycles to read a 32-bit word and waste the width of the data bus?
Or does the ram consist of 32-bit words meaning address 0 points to the first 4 bytes and address 2 points to bytes 5 to 8? In this case I would expect the ram interface to make byte addressing possible (from the cpu's point of view)

Think of RAM as 8 bit wide structure with N entries. N is often the size quoted when referring to memory (256 MB - 256M entries, 2GB - 2G entries etc, B is for bytes). When you access this memory, the smallest unit you can address is one of these entries which is 8 bits (1 byte). Since you can only access it at byte level, we call it byte addressable memory.
Now coming to your question about accessing this memory, we do not just access a byte. Most of the time, memory accesses are sent through caches which are there to reduce memory access latency. Caches store data at a higher granularity than a byte or word, normally it is multiple of words. In doing so, caches explore a property called "locality". Locality means, there is a high chance that we either access this data item or a near by data item very soon. So fetching not just the byte, but all the adjacent bytes is not a waste. Think of it as an investment for future, saves you multiple data fetches that you would have done otherwise.

Memory addresses in RAM start with 0th address and they are accessed using the registers with capacity of 8 bit register or 32 bit registers. Based on these registers the value from specific address is accessed by the CPU. If you really need to understand how it works, you will need to run couple of programs using Assembly language to navigate in the physical memory by reading the values directly using registers and register move commands.

Why do bytes exist? Why don't we just use bits?

A byte consists of 8 bits on most systems.
A byte typically represents the smallest data type a programmer may use. Depending on language, the data types might be called char or byte.
There are some types of data (booleans, small integers, etc) that could be stored in fewer bits than a byte. Yet using less than a byte is not supported by any programming language I know of (natively).
Why does this minimum of using 8 bits to store data exist? Why do we even need bytes? Why don't computers just use increments of bits (1 or more bits) rather than increments of bytes (multiples of 8 bits)?
Just in case anyone asks: I'm not worried about it. I do not have any specific needs. I'm just curious.

because at the hardware level memory is naturally organized into addressable chunks. Small chunks means that you can have fine grained things like 4 bit numbers; large chunks allow for more efficient operation (typically a CPU moves things around in 'chunks' or multiple thereof). IN particular larger addressable chunks make for bigger address spaces. If I have chunks that are 1 bit then an address range of 1 - 500 only covers 500 bits whereas 500 8 bit chunks cover 4000 bits.
Note - it was not always 8 bits. I worked on a machine that thought in 6 bits. (good old octal)

Paper tape (~1950's) was 5 or 6 holes (bits) wide, maybe other widths.
Punched cards (the newer kind) were 12 rows of 80 columns.
1960s:
B-5000 - 48-bit "words" with 6-bit characters
CDC-6600 -- 60-bit words with 6-bit characters
IBM 7090 -- 36-bit words with 6-bit characters
There were 12-bit machines; etc.
1970-1980s, "micros" enter the picture:
Intel 4004 - 4-bit chunks
8008, 8086, Z80, 6502, etc - 8 bit chunks
68000 - 16-bit words, but still 8-bit bytes
486 - 32-bit words, but still 8-bit bytes
today - 64-bit words, but still 8-bit bytes
future - 128, etc, but still 8-bit bytes
Get the picture? Americans figured that characters could be stored in only 6 bits.
Then we discovered that there was more in the world than just English.
So we floundered around with 7-bit ascii and 8-bit EBCDIC.
Eventually, we decided that 8 bits was good enough for all the characters we would ever need. ("We" were not Chinese.)
The IBM-360 came out as the dominant machine in the '60s-70's; it was based on an 8-bit byte. (It sort of had 32-bit words, but that became less important than the all-mighty byte.
It seemed such a waste to use 8 bits when all you really needed 7 bits to store all the characters you ever needed.
IBM, in the mid-20th century "owned" the computer market with 70% of the hardware and software sales. With the 360 being their main machine, 8-bit bytes was the thing for all the competitors to copy.
Eventually, we realized that other languages existed and came up with Unicode/utf8 and its variants. But that's another story.

Good way for me to write something late on night!
Your points are perfectly valid, however, history will always be that insane intruder how would have ruined your plans long before you were born.
For the purposes of explanation, let's imagine a ficticious machine with an architecture of the name of Bitel(TM) Inside or something of the like. The Bitel specifications mandate that the Central Processing Unit (CPU, i.e, microprocessor) shall access memory in one-bit units. Now, let's say a given instance of a Bitel-operated machine has a memory unit holding 32 billion bits (our ficticious equivalent of a 4GB RAM unit).
Now, let's see why Bitel, Inc. got into bankruptcy:
The binary code of any given program would be gigantic (the compiler would have to manipulate every single bit!)
32-bit addresses would be (even more) limited to hold just 512MB of memory. 64-bit systems would be safe (for now...)
Memory accesses would be literally a deadlock. When the CPU has got all of those 48 bits it needs to process a single ADD instruction, the floppy would have already spinned for too long, and you know what happens next...
Who the **** really needs to optimize a single bit? (See previous bankruptcy justification).
If you need to handle single bits, learn to use bitwise operators!
Programmers would go crazy as both coffee and RAM get too expensive. At the moment, this is a perfect synonym of apocalypse.
The C standard is holy and sacred, and it mandates that the minimum addressable unit (i.e, char) shall be at least 8 bits wide.
8 is a perfect power of 2. (1 is another one, but meh...)

In my opinion, it's an issue of addressing. To access individual bits of data, you would need eight times as many addresses (adding 3 bits to each address) compared to using accessing individual bytes. The byte is generally going to be the smallest practical unit to hold a number in a program (with only 256 possible values).

Some CPUs use words to address memory instead of bytes. That's their natural data type, so 16 or 32 bits. If Intel CPUs did that it would be 64 bits.
8 bit bytes are traditional because the first popular home computers used 8 bits. 256 values are enough to do a lot of useful things, while 16 (4 bits) are not quite enough.
And, once a thing goes on for long enough it becomes terribly hard to change. This is also why your hard drive or SSD likely still pretends to use 512 byte blocks. Even though the disk hardware does not use a 512 byte block and the OS doesn't either. (Advanced Format drives have a software switch to disable 512 byte emulation but generally only servers with RAID controllers turn it off.)
Also, Intel/AMD CPUs have so much extra silicon doing so much extra decoding work that the slight difference in 8 bit vs 64 bit addressing does not add any noticeable overhead. The CPU's memory controller is certainly not using 8 bits. It pulls data into cache in long streams and the minimum size is the cache line, often 64 bytes aka 512 bits. Often RAM hardware is slow to start but fast to stream so the CPU reads kilobytes into L3 cache, much like how hard drives read an entire track into their caches because the drive head is already there so why not?

First of all, C and C++ do have native support for bit-fields.
#include <iostream>
struct S {
// will usually occupy 2 bytes:
// 3 bits: value of b1
// 2 bits: unused
// 6 bits: value of b2
// 2 bits: value of b3
// 3 bits: unused
unsigned char b1 : 3, : 2, b2 : 6, b3 : 2;
};
int main()
{
std::cout << sizeof(S) << '\n'; // usually prints 2
}
Probably an answer lies in performance and memory alignment, and the fact that (I reckon partly because byte is called char in C) byte is the smallest part of machine word that can hold a 7-bit ASCII. Text operations are common, so special type for plain text have its gain for programming language.

Why bytes?
What is so special about 8 bits that it deserves its own name?
Computers do process all data as bits, but they prefer to process bits in byte-sized groupings. Or to put it another way: a byte is how much a computer likes to "bite" at once.
The byte is also the smallest addressable unit of memory in most modern computers. A computer with byte-addressable memory can not store an individual piece of data that is smaller than a byte.
What's in a byte?
A byte represents different types of information depending on the context. It might represent a number, a letter, or a program instruction. It might even represent part of an audio recording or a pixel in an image.
Source

8 and 16 bit architecture

I'm a bit confused about bit architectures. I just cant find a good article that answers my questions, so I figured I'd ask SO.
Question 1:
When speaking of a 16 bit architecture, does it mean each ram address is 16 bits long? So if I create an int (32 bit) in C++ the variable would take up 2 addresses?
Question 2:
in a 16 bit architecture there are only 2^16 (65536) amount of addresses inside the RAM. Why can't they add more? Is this because 16 bit can't represent a higher value and therefore can't reference to adresses above 65535?

When speaking of a 16 bit architecture, does it mean each ram address is 16 bits long? So if I create an int (32 bit) in C++ the variable would take up 2 addresses?
You'd have to ask whoever was speaking of a 16-bit architecture what they meant by it. They could mean addresses are 16-bits long. They could mean general-purpose CPU registers are 16-bits long. They could mean something else. But there's no way we could know what some hypothetical person might mean. There is no universal definition of what makes something a "16-bit architecture".
For example, the 8032 is an 8-bit architecture with 8-bit general purpose registers. But it has a 16-bit pointer register that can be used to address 65,536 bytes of storage.
Regardless of bitness, almost all systems use byte addresses. So a 32-bit variable will take up 4 addresses on a machine of any bitness.
in a 16 bit architecture there are only 2^16 (65536) amount of addresses inside the RAM. Why can't they add more? Is this because 16 bit can't represent a higher value and therefore can't reference to adresses above 65535?
With 16-bits, there are only 65,536 possible ways those bits can be set. So a 16-bit register has 65,536 possible values.

Yes. Note, though that int on 16-bit architectures is usually just 16 bits wide.
Also note that it doesn't make sense to say that a variable "takes up" two addresses. The correct thing to say is that a 32-bit variable is as wide as two pointers on a 16-bit platform.
It will still occupy four bytes of space, no matter what architecture.
Yes; that's exactly what 16-bit addresses mean.
Note that each of these addresses points to a single byte of memory.

Depends on your definitions of 8-bit and 16-bit architecture.
The 6502 was considered an 8-bit CPU, because it operated on 8-bit values (the register size), yet had 16-bit addresses.
The 68000 was considered a 16-bit CPU, yet had 32-bit registers and addresses.
With x86, it is generally the address size that defines the architecture.
Also, '64-bit' CPUs don't always have a full 64-bit external address bus. They might internally handle addresses of that size, so the virtual address space can be large, but it doesn't mean they can have that much external memory.

Example From Wikipedia - All internal registers, as well as internal and external data buses, were 16 bits wide, firmly establishing the "16-bit microprocessor" identity of the 8086. A 20-bit external address bus gave a 1 MB physical address space (2^20 = 1,048,576). This address space was addressed by means of internal 'segmentation'. The data bus was multiplexed with the address bus in order to fit a standard 40-pin dual in-line package. 16-bit I/O addresses meant 64 KB of separate I/O space (2^16 = 65,536). The maximum linear address space was limited to 64 KB, simply because internal registers were only 16 bits wide. Programming over 64 KB boundaries involved adjusting segment registers (see below) and remained so until the 80386 introduced wider (32 bits) registers (and more advanced memory management hardware).
So you can see that there are no fixed rules that a 16 bit architecture will have 16 address lines only. Don't mix up two things, though it's intuitive to believe so.

Why is alignment imporant?

I know that some processors fail with misaligned data, and others like the oh-so-common x86, would just be slower with that.
My question is why? Why is it harder for an x86 processor to get the data from the pointer 0x12345679 than it is from the pointer 0x12345678? Just to be clear, I'm aware that page faults may happen if the data is in multiple pages, and I understand that more data may need to be fetched from memory (one part for the start of the value and one for the end), but that isn't always true and this isn't what my question is about. I'm asking, why is it always slower?
Suppose the memory starts at 0x10000000. Why is it harder for the processor to get a 2-byte short from 0x10000001 than it is from 0x10000002? Why is it harder to get a 4-byte int from 0x10000001 than it is from 0x10000000? And so forth.

Because the data bus is wider than eight bits.
Let assume that the data bus is 32 bits. To get 16 bits from address 0x10000001, it has to get the four bytes that starts at 0x10000000 and shift the value to get the two bytes in the middle.
To get 16 bits from the address 0x10000003, it has to get the words that start at 0x10000000 and 0x10000004, and use one byte from each value.

The processor can only access memory in an aligned fashion. This is a consequence of how the interconnect between the processor and memory functions.
When a processor supports unaligned reads, what's really happening is the processor issuing two separate reads (or one read of larger size) and stitching the parts together, which is why it's slower than an aligned read.

One example: if the databus is 32 bits and a 32 bit value is not on a 32 bit boundary, the bytes will have to be fetched in more than one operation and moved around to load the value properly into a processor register.

Understanding word alignment

I understand what it means to access memory such that it is aligned but I don’t understand why this is necessary. For instance, why can I access a single byte from an address 0x…1 but I cannot access a half word (two bytes) from the same address.
Again, I understand that if you have an address A and an object of size s that the access is aligned if A mod s = 0. But I just don’t understand why this is important at the hardware level.

Hardware is complex; this is a simplified explanation.
A typical modern computer might have a 32-bit data bus. This means that any fetch that the CPU needs to do will fetch all 32 bits of a particular memory address. Since the data bus can't fetch anything smaller than 32 bits, the lowest two address bits aren't even used on the address bus, so it's as if RAM is organised into a sequence of 32-bit words instead of 8-bit bytes.
When the CPU does a fetch for a single byte, the read cycle on the bus will fetch 32 bits and then the CPU will discard 24 of those bits, loading the remaining 8 bits into whatever register. If the CPU wants to fetch a 32 bit value that is not aligned on a 32-bit boundary, it has several general choices:
execute two separate read cycles on the bus to load the appropriate parts of the data word and reassemble them
read the 32-bit word at the address determined by throwing away the low two bits of the address
read some unexpected combination of bytes assembled into a 32-bit word, probably not the one you wanted
throw an exception
Various CPUs I have worked with have taken all four of those paths. In general, for maximum compatibility it is safest to align all n-bit reads to an n-bit boundary. However, you can certainly take shortcuts if you are sure that your software will run on some particular CPU family with known unaligned read behaviour. And even if unaligned reads are possible (such as on x86 family CPUs), they will be slower.

The computer always reads in some fixed size chunks which are aligned.
So, if you don't align your data in memory, you will have to probably read more than once.
Example
word size is 8 bytes
your structure is also 8 bytes
if you align it, you'll have to read one chunk
if you don't align it, you'll have to read two chunks
So, it's basically to speed up.

The reason for all alignment rules are the various widths of the Cache Lines (Instruction-Cache do have 16 Byte lines for the Core2 Architecture, and the Data-Cache do have 64-Byte Lines for L1 and 128-Byte Lines for L2).
So if you want to store/load data that crosses a Cahce-Line Boundary you need to load and store both Cache-lines, which hits the performance.
So you just don't do it because of the performance hit, its that simple.

Try reading a serial port. The data is 8 bits wide.
Nice hardware designers ensure it lies on a least significant byte of the word.
If you have a C structure that has elements not word aligned ( from backwards compatibility or conservation of memory say )
then the address of any byte within the structure is not word aligned.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio