What are the alignas and alignof keywords used for? - c++11

I'm having trouble understand what the purpose of the alignas and alignof keywords are, and I'm not quite sure I fully understand what alignment is.
As I understand it, a memory address is aligned to n bytes if it is divisible by n, that is, it can be got to by counting 'n' bytes at a time (from 0? or some default value?). Also, the alignas keyword, when prefixing a variable declaration, specifies how the address at which the variable is stored is to be aligned, and the alignof returns how a variable's address is aligned.
However, I am not confident that this is a correct understanding of alignment or the alignof/alignas keywords - please correct me on any of the points I got wrong. I also don't see what use these keywords serve, so I would appreciate it if anyone could point out what their purpose is.

Some special types must be aligned at more bytes than usual- for example, matrices must be aligned at 16bytes on x86 for the most efficient copying to the GPU. SSE vector types can behave this way too. As such, if you want to make a container type, then you must know the alignment requirements of the type you're trying to contain or allocate.

Related

Why is non-zeroed memory only a problem with big data usage?

I was doing a graded programming assignment — an implementation of Rope data structure. The grader fed it an initial string and a series of edit operations. I did my development in C++ on a Linux machine. After testing my solution locally with small inputs (a string of ca 10 chars) I posted it to the grader, but got Segmentation Fault on one of the test cases.
I have generated a random input data with the maximum size given in the assignment specs (the string of 300k characters). I also got the Segmentation Fault locally. After a short debugging I found out that the leaves of my tree had random left and right pointers instead of NULL. After replacing the new Vertex calls with new Vertex() (the latter calls the default constructor, unlike the former which leaves the memory as-is) the code worked fine and got accepted by the grader.
This however makes me wonder — why did my code work correctly with a small input, both locally and on the grader’s machine? Is some amount of heap guaranteed to be zeroed when I run a process? Is this an artifact of some previously run program? What exactly is happening here?
Uninitialised objects can have any value. Uninitialised pointers can contain null, they can contain valid pointers by coincidence, or contain invalid pointers. It is completely undefined. Your program will behave accordingly. And it’s quite possible that memory is filled with some amount of zeroes followed by some amount of rubbish.
There may be a compiler option that will fill uninitialised variables with data that is likely to lead to a crash. More likely, there may be compiler options warning you when you use an uninitialised variable.

Efficient Algorithm for Parsing OpCodes

Let's say I'm writing a virtual machine. I read in the program data into an array of bytes. Now I need to loop through those bytes (instructions are two bytes) and instantiate a little class representing each instruction and it's arguments.
What would be a fast parsing approach? Here are the two way's I've thought of:
Logically branching by inspecting each bit from the left to the right until I narrowed it down to a particular op code. This would be like a binary search.
Inspecting some programs to come up with a list of opcodes ordered by frequency of use, and then checking the for the full opcode in that order.
Note: I will be using bit shifting and masking in C to check, not regexes or string comps or anything high-level like that.
You don't need to parse anything. If this is in C, you make a table of function pointers which has 256 entries in it, one for each possible byte value, then jump to the appropriate function based on the first byte value. If the second byte is significant then a switch statement can be used within the function to handle the second byte. This is how the original Visual Basic interpreter (versions 1-6) worked.

Is it fastest to access a byte than a bit? Why?

The question is very straight: is it fastest to access a byte than a bit? If I store 8 booleans in a byte will it be slower when I have to compare them than if I used 8 bytes? Why?
Chances are no. The smallest addressable unit of memory in most machines today is a byte. In most cases, you can't address or access by bit.
In fact, accessing a specific bit might be even more expensive because you have to build a mask and use some logic.
EDIT:
Your question mentions "compare", I'm not sure exactly what you mean by that. But in some cases, you perform logic very efficiently on multiple booleans using bitwise operators if your booleans are densely packed into larger integer types.
As for which to use: array of bytes (with one boolean per byte), or a densely packed structure with one boolean per bit is a space-effiicency trade-off. For some applications that need to store a massive amount of bools, dense packing is better since it saves memory.
The underlying hardware that your code runs on is built to access bytes (or longer words) from memory. To read a bit, you have to read the entire byte, and then mask off the bits you don't care about, and possibly also shift to get the bit into the ones position. So the instructions to access a bit are a superset of the instructions to access a byte.
It may be faster to store the data as bits for a different reason - if you need to traverse and access many 8-bit sets of flags in a row. You will perform more ops per boolean flag, but you will traverse less memory by having it packed in fewer bytes. You will also be able to test multiple flags in a single operation, although you may be able to do this with bools to some extent as well, as long as they lie within a single machine word.
The memory latency penalty is far higher than register bit twiddling. In the end, only profiling the code on the hardware on which it will actually run will tell you which way is best.
From a hardware point of view, I would say that in general all the bit masking and other operations in the best case might occur within a single clock (resulting in no different), but that entirely depends on hardware layer that you likely won't ever know the specifics of, and as such you cannot bank on it.
It's worth pointing out that things like the .NET system.collections.bitarray uses a 32bit integer array underneath to store it's bit data. There is likely a performance reason behind this implementation (even if only in a general case that 32bit words perform above average), I would suggest reading up about the inner workings of that might be revealing.
From a coding point of view, it really depends what you're going to do with the bits afterwards. That is to say if you're going to store your data in booleans such as:
bool a0, a1, a2, a3, a4, a5, a6, a7;
And then in your code you compare them one by one (and most of them together):
if ( a0 && a1 && !a2 && a3 && !a4 && (!a5 || a6) || a7) {
...
}
Then you will find that it will be faster (and likely neater in code) to use a bit mask. But really the only time this would matter is if you're going to be running this code millions of times in a high performance or time critical environment.
I guess what I'm getting at here is that you should do whatever your coding standards say (and if you don't have any or they don't consider such details then just do what looks neatest for your application and need).
But I highly suggest trying to look around and read a blog or two explaining the inner workings of the .NET system.collections.bitarray.
This depends on the kind of processor and motherboard data bus, i.e. 32 bit data bus will compare your data faster if you collect them into "word"s rather than "bool"s or "byte"s....
This is only valid when you are writing in assembly language when you can compare each instruction how many cycles it takes .... but since you are using compiler then it is almost the same.
However, collecting booleans into words or integers will be useful in saving memory required for variables.
Computers tend to access things in words. Accessing a bit is slower because it requires more effort:
Imagine I said something to you, then said "oh change my second word to instead".
Now imagine my edit instead was "oh, change the third letter in the second word to 's'".
Which requires more thinking on your part?

Mapping Untyped Lisp data into a typed binary format for use in compiled functions

Background: I'm writing a toy Lisp (Scheme) interpreter in Haskell. I'm at the point where I would like to be able to compile code using LLVM. I've spent a couple days dreaming up various ways of feeding untyped Lisp values into compiled functions that expect to know the format of the data coming at them. It occurs to me that I am not the first person to need to solve this problem.
Question: What are some historically successful ways of mapping untyped data into an efficient binary format.
Addendum: In point of fact, I do know which of about a dozen different types the data is, I just don't know which one might be sent to the function at compile time. The function itself needs a way to determine what it got.
Do you mean, "I just don't know which [type] might be sent to the function at runtime"? It's not that the data isn't typed; certainly 1 and '() have different types. Rather, the data is not statically typed, i.e., it's not known at compile time what the type of a given variable will be. This is called dynamic typing.
You're right that you're not the first person to need to solve this problem. The canonical solution is to tag each runtime value with its type. For example, if you have a dozen types, number them like so:
0 = integer
1 = cons pair
2 = vector
etc.
Once you've done this, reserve the first four bits of each word for the tag. Then, every time two objects get passed in to +, first you perform a simple bit mask to verify that both objects' first four bits are 0b0000, i.e., that they are both integers. If they are not, you jump to an error message; otherwise, you proceed with the addition, and make sure that the result is also tagged accordingly.
This technique essentially makes each runtime value a manually-tagged union, which should be familiar to you if you've used C. In fact, it's also just like a Haskell data type, except that in Haskell the taggedness is much more abstract.
I'm guessing that you're familiar with pointers if you're trying to write a Scheme compiler. To avoid limiting your usable memory space, it may be more sensical to use the bottom (least significant) four bits, rather than the top ones. Better yet, because aligned dword pointers already have three meaningless bits at the bottom, you can simply co-opt those bits for your tag, as long as you dereference the actual address, rather than the tagged one.
Does that help?
Your default solution should be a simple tagged union. If you want to narrow your typing down to more specific types, you can do it - but it won't be that "toy" any more. A thing to look at is called abstract interpretation.
There are few successful implementations of such an optimisation, with V8 being probably the most widespread. In the Scheme world, the most aggressively optimising implementation is Stalin.

Hashing of pointer values

Sometimes you need to take a hash function of a pointer; not the object the pointer points to, but the pointer itself. Lots of the time, folks just punt and use the pointer value as an integer, chop off some high bits to make it fit, maybe shift out known-zero bits at the bottom. Thing is, pointer values aren't necessarily well-distributed in the code space; in fact, if your allocator is doing its job, there's an excellent chance they're all clustered close together.
So, my question is, has anyone developed hash functions that are good for this? Take a 32- or 64-bit value that's maybe got 12 bits of entropy in it somewhere and spread it evenly across a 32-bit number space.
This page lists several methods that might be of use. One of them, due to Knuth, is a simple as multiplying (in 32 bits) by 2654435761, but "Bad hash results are produced if the keys vary in the upper bits." In the case of pointers, that's a rare enough situation.
Here are some more algorithms, including performance tests.
It seems that the magic words are "integer hashing".
They'll likely exhibit locality, yes - but in the lower bits, which means objects will be distributed through the hashtable. You'll only see collisions if a pointer's address is a multiple of the hashtable's length from another pointer.
If you know the lowest possible pointer address (which is often the case if you're working within a large buffer), just convert the pointer to an integer by subtracting the lowest possible pointer value; eg. that could be the buffer's base address.
-Remember: pointer subtracted from pointer equals an offset (integer).
So: Don't "chop off" bits; it's much better to convert to an offset.
This will result in that the offset value is much smaller than a pointer value.
It may help further to shift the pointer value right twice (eg. divide by 4) in some cases as well, before hashing it.
The problem with pointers is often that small blocks of memory is likely to be allocated on the same address (eg. a block being freed and another block is taking the freed block's place).
Why not just use an existing hash function?

Resources