Memory Pool Returned Memory Alignment - c++11

In my application I have a memory pool. I allocate all the memory at startup as an array of uint64 (on a 64-bit machine), then construct objects in this array using placement new: object 1 starts at position pool[0], object 2 at position pool[1], and so on. Each object therefore spans at least 64 bits, or a multiple of sizeof(uint64) if it needs more slots.
Am I correct to assume that all memory returned from the pool will be aligned correctly, since each uint64 in the array will be properly aligned by the compiler? If so, would using uint32 on a 32-bit machine in the same manner also work?

You are correct to assume that there will not be any padding between the uint64 elements. Compilers mostly pack on 2-byte or 4-byte boundaries (and this can be controlled).
You should verify this on your specific target using alignof (standard since C++11) or the GCC extension __alignof__.
alignof lets you inquire how an object is aligned, or the minimum alignment usually required by a type. Its syntax is just like sizeof.
However, an allocation of 8 bytes might not be aligned to 64 bits if the array itself starts at an address with only 32-bit alignment.
You could use aligned_alloc(8, size) to allocate the memory and then cast the result to an array of uint64.

Related

Why does an empty slice have 24 bytes?

I want to understand what happens when an empty slice is created with make([]int, 0). I wrote this code to test:
emptySlice := make([]int, 0)
fmt.Println(len(emptySlice))
fmt.Println(cap(emptySlice))
fmt.Println(unsafe.Sizeof(emptySlice))
The length and capacity returned are obviously both 0, but the size of the slice is 24 bytes. Why?
24 bytes should be 3 int64s, right? An internal array of 24 bytes would be something like [3]int64{}, so why does an empty slice take 24 bytes?
If you read the documentation for unsafe.Sizeof, it explains what's going on here:
The size does not include any memory possibly referenced by x. For instance, if x is a slice, Sizeof returns the size of the slice descriptor, not the size of the memory referenced by the slice.
All data types in Go are statically sized. Even though slices have a dynamic number of elements, that cannot be reflected in the size of the data type, because then it would not be static.
The "slice descriptor", as implied by the name, is all the data that describes a slice. That is what is actually stored in a slice variable.
Slices in Go have 3 attributes: The underlying array (memory address), the length of the slice (memory offset), and the capacity of the slice (memory offset). In a 64-bit application, memory addresses and offsets tend to be stored in 64-bit (8-byte) values. That is why you see a size of 24 (= 3 * 8 ) bytes.
unsafe.Sizeof is the size of the object in memory, exactly the same as sizeof in C and C++. See How to get memory size of variable?
A slice has a size, but also the ability to resize, so the maximum it can be resized to (the capacity) must also be stored somewhere. Being resizable also means it cannot be a static array; it needs to store a pointer to some other (possibly dynamically allocated) array.
The whole thing means it needs to store { begin, end, last valid index } or { begin, length, capacity }. That is a tuple of 3 values, so its in-memory representation is at least 3 × 8 bytes on 64-bit platforms, unless you want to limit the maximum size and capacity to much less than 2^64 bytes.
It is exactly the same situation in many C++ types with the same dynamic-sizing capability: std::string or std::vector is also a 24-byte type, although on some implementations 8 extra bytes are added for alignment and small-string storage, resulting in a 32-byte string type. See:
C++ sizeof Vector is 24?
Why is sizeof array(type string) 24 bytes with a single space element?
Why is sizeof(string) == 32?
In fact, Go's strings.Builder, which is the closest equivalent to C++'s std::string, has a size of 32 bytes. See demo

Accessing more Window Extra Bytes than a LongPtr

I'm trying to figure out how to correctly work with the extra bytes that you can have Windows allocate for your windows and window classes.
If I'm reading the docs correctly, you can tell Windows to allocate a specified amount of memory for your window or window class.
But there are only two methods I could find to access and modify that data: SetWindowLongPtr and GetWindowLongPtr.
The problem is that with those methods you can only set a whole LongPtr of data, so 64 or 32 bits depending on your system.
Can somebody explain this to me? Is there a method I am missing, or is this as it should be?
(Get|Set)WindowLong() accesses a value at the specified nIndex as a whole LONG.
(Get|Set)WindowLongPtr() accesses a value at the specified nIndex as a whole LONG_PTR.
So yes, this does mean that (Get|Set)WindowLongPtr() will access a different number of bytes depending on whether you are compiling your project for 32-bit or 64-bit. As such, if you want to read/write a smaller number of bytes, you will have to read/write a whole LONG/LONG_PTR and do some bit shifting as needed.
Even though you can specify an arbitrary byte count for the WNDCLASS/EX::cbWndExtra field, in reality it needs to be large enough to hold at least sizeof(LONG/LONG_PTR) bytes at the last byte offset you intend to pass in the nIndex parameter.
This is stated in the GetWindowLong()/SetWindowLong() and GetWindowLongPtr()/SetWindowLongPtr() documentation:
nIndex
Type: int
The zero-based offset to the value to be retrieved. Valid values are in the range zero through the number of bytes of extra window memory, minus four; for example, if you specified 12 or more bytes of extra memory, a value of 8 would be an index to the third 32-bit integer.
nIndex
Type: int
The zero-based offset to the value to be retrieved. Valid values are in the range zero through the number of bytes of extra window memory, minus the size of a LONG_PTR.
From the documentation for both Extra Class Memory and Extra Window Memory:
Because extra memory is allocated from the system's local heap, an application should use extra [class or window] memory sparingly. The RegisterClassEx function fails if the amount of extra [class or window] memory requested is greater than 40 bytes. If an application requires more than 40 bytes, it should allocate its own memory and store a pointer to the memory in the extra [class or window] memory.

How is byte addressing implemented in modern computers?

I have trouble understanding how byte addressing is achieved in, say, a 32-bit computer:
Is the RAM itself byte addressable, meaning the first byte has address 0 and the second address 1, etc.? In that case, wouldn't it take 4 read cycles to read a 32-bit word and waste the width of the data bus?
Or does the RAM consist of 32-bit words, meaning address 0 points to the first 4 bytes and address 1 points to the next 4 bytes? In that case I would expect the RAM interface to make byte addressing possible (from the CPU's point of view).
Think of RAM as an 8-bit-wide structure with N entries. N is often the size quoted when referring to memory (256 MB is 256M entries, 2 GB is 2G entries, etc.; B is for bytes). When you access this memory, the smallest unit you can address is one of these entries, which is 8 bits (1 byte). Since you can only access it at byte level, we call it byte-addressable memory.
Now, coming to your question about accessing this memory: we do not usually access just a byte. Most of the time, memory accesses go through caches, which are there to reduce memory access latency. Caches store data at a higher granularity than a byte or word, normally a multiple of words. In doing so, caches exploit a property called "locality": there is a high chance that we will access either this data item or a nearby data item very soon. So fetching not just the byte but all the adjacent bytes is not a waste. Think of it as an investment in the future that saves you multiple fetches you would otherwise have done.
Memory addresses in RAM start at address 0, and the CPU accesses values at specific addresses through its registers (8-bit, 32-bit, and so on, depending on the architecture). If you really want to understand how this works, try writing a couple of assembly-language programs that navigate memory by reading values directly into registers with move instructions.

Memory, Stack and 64 bit

On an x86 system a memory location can hold 4 bytes (32 / 8) of data; therefore, a single memory address on a 64-bit system should hold 8 bytes. When examining the stack in GDB, though, this doesn't appear to be the case. Example:
0x7fff5fbffa20: 0x00007fff5fbffa48 0x0000000000000000
0x7fff5fbffa30: 0x00007fff5fbffa48 0x00007fff857917e1
If I have this right, then each hexadecimal pair (e.g. 48) is a byte, so the first memory address, 0x7fff5fbffa20, is actually holding 16 bytes of data and not 8.
This has had me really confused for a while, so any input is greatly appreciated.
Short answer: on both x86 and x64 the minimum addressable entity is a byte: each "memory location" contains one byte in either case. What you are seeing from GDB is only formatting: it is dumping 16 contiguous bytes per line, as the addresses on the left, increasing from ...20 to ...30, indicate.
Long answer: "32-bit" or "64-bit" is used to indicate many things in an architecture. Almost always it is the address size (how many bits are in an address, which determines how much memory you can directly address, again in bytes). It also usually indicates the size of the registers, and also (but not always) the native word size.
That means that usually, even if you can address a single byte, the machine works "better" with data of a different (longer) size. What "better" means is beyond the scope of the question; a little background, however, helps clear up some misconceptions about word size.

In a byte addressed space with 32bit addressing, it takes up 32bits of memory to reference 8 bits?

In a byte-addressed space with 32-bit addressing, does it take 32 bits of memory to reference 8 bits?
So is the addressing the major portion of memory?
Am I conceptualizing this correctly?
No. It takes 32 bits to reference a contiguous region of any size. If you have a 1 megabyte buffer, you're not going to store a pointer to every byte inside it, you'll just store a pointer to the beginning of it.