Are bytecode commands aligned? - bytecode

I know that compilers perform data structure alignment and padding according to 4-byte (for 32-bit systems) or 8-byte (for 64-bit systems) boundaries.
But do interpreters align bytecode commands when they generate bytecode? If a command is encoded in 1 byte and its operands in 1, 2, 4 or 8 bytes, it seems this would be bad for the processor when it fetches the operands, if the bytecode is interpreted in a looped switch. What do you think?
P.S. I'm not asking about interpreters that perform JIT compilation.

In general, the answer is no, but the JVM does require 32-bit alignment for the data portions of the lookupswitch and tableswitch instructions. Up to 3 bytes of padding (zeros) must be encoded to ensure proper alignment.
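The padding rule can be sketched in C++ as below. This is a minimal illustration, assuming `pc` is the offset of the switch opcode from the start of the method's bytecode (the JVM specification measures the 4-byte alignment from there, not from absolute memory addresses). The helper names are hypothetical and not part of any JVM API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Zero bytes of padding the JVM requires after a tableswitch/lookupswitch
// opcode at offset `pc` (counted from the start of the method's bytecode),
// so that the 4-byte operands that follow start at a multiple of 4.
constexpr std::size_t switch_padding(std::size_t pc) {
    return (4 - (pc + 1) % 4) % 4;
}

// Offset of the first 4-byte operand (the default branch offset).
constexpr std::size_t switch_operands_offset(std::size_t pc) {
    return pc + 1 + switch_padding(pc);
}

// JVM operands are big-endian; building the value byte by byte avoids any
// unaligned multi-byte load, whether or not `pos` is aligned.
inline std::int32_t read_s4(const std::vector<std::uint8_t>& code, std::size_t pos) {
    assert(pos + 4 <= code.size());
    const std::uint32_t v = (std::uint32_t{code[pos]} << 24) | (std::uint32_t{code[pos + 1]} << 16) |
                            (std::uint32_t{code[pos + 2]} << 8) | std::uint32_t{code[pos + 3]};
    return static_cast<std::int32_t>(v);
}
```

For every other instruction, an interpreter typically just assembles operands byte by byte like `read_s4`, which is one reason the format can get away with no padding.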

Related

x86 store when data is in 2 different blocks

Suppose 32-bit Linux: the alignment rules say, for example, that doubles (8 bytes) must be aligned to 4 bytes. This means that, if we assume 64-byte cache blocks (a typical value for modern processors), a double can be placed at offset 60 of a block, so that it spans 2 different cache blocks.
It could even happen that the two halves of the double lie in 2 different cache blocks located in 2 different 4KB pages.
After this brief introduction to put the question in context, I have a couple of doubts:
1- When programming in assembly and seeking maximum performance, is it recommended to prevent this from happening by using alignment directives? Or, for some reason I don't know, does aligning the double so that it sits in only 1 block make no difference to performance?
2- How will the store instruction be decoded in this case (assuming a modern Intel microarchitecture)? I know that a normal x86 store instruction is decoded into a micro-fused pair of store-address and store-data uops, but in this case, where 2 different cache blocks (and maybe even 2 different 4KB pages) are involved, will it be decoded into 2 micro-fused store-address/store-data pairs (one for the first 4 bytes of the double and another for the last 4 bytes)? Or will it be decoded into a single micro-fused pair whose store-address and store-data uops each have to do twice the work before they can leave the execution port?
Yes, of course you should align a double whenever possible, like compilers do except when forced by ABI struct-layout rules to misalign them. (The ABI was designed when i386 was current so a double always required 2 loads anyway.)
The current version of the i386 System V ABI requires 16-byte stack alignment, so local doubles (the ones that get spilled to the stack at all instead of being kept in registers) can be aligned. malloc has to return memory suitable for any type, and alignof(max_align_t) = 16 on 32-bit Linux (8 on 32-bit Windows), so 32-bit malloc will always give you memory that is at least 16-byte (or 8-byte) aligned. And of course, in static storage you control the alignment with align (NASM) or .p2align (GAS) directives.
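In C++ the counterpart of `align` / `.p2align` is `alignas`. The sketch below assumes a GCC/Clang-style x86 target; the struct names `Packed` and `Aligned` and the array `table` are hypothetical. Under the i386 System V ABI (e.g. building with `-m32`) `Packed::d` typically lands at offset 4, while on x86-64 it is at offset 8; `alignas(8)` removes that ABI dependence.

```cpp
#include <cstddef>
#include <cstdint>

// Under the i386 System V ABI a double struct member only needs 4-byte
// alignment, so `d` may sit at offset 4 here (offset 8 on x86-64).
struct Packed {
    std::int32_t i;
    double d;
};

// alignas plays the role of NASM's `align 8` or GAS's `.p2align 3`:
// it forces the 8-byte alignment that the 32-bit ABI does not guarantee.
struct Aligned {
    std::int32_t i;
    alignas(8) double d;
};

static_assert(offsetof(Aligned, d) % 8 == 0, "d is 8-byte aligned");
static_assert(alignof(Aligned) >= 8, "the whole struct inherits the alignment");

// Static storage can be aligned to a full (assumed 64-byte) cache line.
alignas(64) double table[8];
```

An 8-byte-aligned double can never straddle a 64-byte line, because 64 is a multiple of 8.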
For the performance downsides of cache-line splits and page splits, see How can I accurately benchmark unaligned access speed on x86_64.
Re: decoding: the address isn't known at decode time, so obviously any effects of a line split or page split are resolved later. For stores, there is probably no effect until the store-buffer entry has to commit to L1d cache. (See Are two store buffer entries needed for split line/page stores on recent Intel?: probably not, since allocating a 2nd entry after executing the store-address uop is implausible.)
For loads, the load is re-run through the execution unit to get the other half (or whatever uneven split there is), and internal line-split buffers combine the data. (This is not re-dispatching from the RS; it is handled internally in the load port. But the RS does aggressively replay uops waiting for the result of a load.)
Re-running the store-data uop for a misaligned store seems unlikely, too. I don't think we see extra counts for uops_dispatched_port.port_4 perf events.
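The arithmetic behind the question's scenario fits in a few lines: an access is a split when its first and last bytes fall in different blocks. This is a sketch with hypothetical helper names; the 64-byte line and 4096-byte page sizes are assumptions (typical for current x86 CPUs).

```cpp
#include <cstdint>

// Does an access of `size` bytes starting at byte address `addr` straddle a
// boundary between `block`-byte blocks?
constexpr bool crosses_boundary(std::uint64_t addr, std::uint64_t size, std::uint64_t block) {
    return size != 0 && (addr / block) != ((addr + size - 1) / block);
}

constexpr bool is_cache_line_split(std::uint64_t addr, std::uint64_t size) {
    return crosses_boundary(addr, size, 64);    // assumed 64-byte lines
}

constexpr bool is_page_split(std::uint64_t addr, std::uint64_t size) {
    return crosses_boundary(addr, size, 4096);  // assumed 4 KiB pages
}

// The double at offset 60 of a line-aligned block, as in the question:
static_assert(60 % 4 == 0, "satisfies the i386 4-byte rule");
static_assert(is_cache_line_split(60, 8), "bytes 60..67 span two lines");
static_assert(!is_cache_line_split(56, 8), "8-byte aligned: no split");
static_assert(is_page_split(4092, 8), "bytes 4092..4099 span two pages");
```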

Why is the register length static in any CPU

Why is the register length (in bits) that a CPU operates on not dynamically/manually/arbitrarily adjustable? Would it make the computer slower if it was adjustable this way?
Imagine you had an 8-bit integer. If you could adjust the CPU register length to 8 bits, the CPU would only have to go through the first 8 bits instead of extending the 8-bit integer to 64 bits and then going through all 64 bits.
At first I thought you were asking if it was possible to have a CPU with no definitive register size. That makes no sense, since the number and size of the registers is a physical property of the hardware and cannot be changed.
However, some architectures let the programmer work on a smaller part of a register or pair registers.
The x86 does both for example, with add al, 9 (uses only 8 bits of the 64-bit rax) and div rbx (pairs rdx:rax to form a 128-bit register).
The reason this scheme is not more widespread is that it comes with a lot of trade-offs.
More registers means more bits needed to address them, simply put: longer instructions.
Longer instructions mean less code density, more complex decoders and less performance.
Furthermore most elementary operations, like the logic ones, addition and subtraction are already implemented as operating on a full register in a single cycle.
Finally, one execution unit can handle only one instruction at a time; we cannot issue eight 8-bit additions to a 64-bit ALU at the same time.
So there wouldn't be any improvement, neither in latency nor in throughput.
Accessing partial registers is useful for the programmer to multiply the number of available registers: for example, if an algorithm works with 16-bit data, the programmer could use a single physical 64-bit register to store four items and operate on them independently (but not in parallel).
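A rough C++ analogue of "four 16-bit items in one 64-bit register", with hypothetical helper names: each item lives in a 16-bit slot (slot 0 is the lowest 16 bits) and is reached with a shift and a mask. Each operation touches one slot at a time, which matches the answer's "independently, but not in parallel".

```cpp
#include <cstdint>

// `slot` must be 0..3; a shift by 64 or more would be undefined.
constexpr std::uint16_t get16(std::uint64_t reg, unsigned slot) {
    return static_cast<std::uint16_t>(reg >> (16 * slot));
}

constexpr std::uint64_t set16(std::uint64_t reg, unsigned slot, std::uint16_t v) {
    const std::uint64_t mask = std::uint64_t{0xFFFF} << (16 * slot);
    return (reg & ~mask) | (std::uint64_t{v} << (16 * slot));
}

// Add to one slot only: the sum wraps within 16 bits instead of carrying
// into the neighbouring slot.
constexpr std::uint64_t add16(std::uint64_t reg, unsigned slot, std::uint16_t v) {
    return set16(reg, slot, static_cast<std::uint16_t>(get16(reg, slot) + v));
}
```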
ISAs that have variable-length instructions can also benefit from partial registers, because using them usually means smaller immediate values: for example, an instruction that sets a register to a specific value usually has an immediate operand that matches the size of the register being loaded (though RISC ISAs usually sign-extend or zero-extend it).
Architectures like ARM (and presumably others as well) support half-precision floats. The idea is to do what you were speculating about and what Margaret explained: with half-precision floats, you can pack two float values into a single register, thereby using less bandwidth at the cost of reduced accuracy.

Significance of Bytes as 8 bits

I was just wondering why a byte is 8 bits. Specifically, if we talk about the ASCII character set, all its symbols can be represented in just 7 bits, leaving one spare bit (given that in reality 1 byte is 8 bits). Suppose there is a big company where everyone has agreed to use only the ASCII character set and nothing else (and this company doesn't have to deal with the outside world). Couldn't the developers in this company write software that treats 7 bits as 1 byte and hence saves one precious bit? They would save, for instance, 10 bits for every 10 bytes (here 1 byte is 7 bits again), and ultimately lots and lots of precious space. The hardware (hard disk, processor, memory) used in this company would specifically know that it needs to store and group 7 bits together as 1 byte. If this were done globally, couldn't it revolutionise the future of computers? Can such a system be developed in reality?
Won't this be efficient?
A byte is not necessarily 8 bits. A byte is a unit of digital information whose size is processor-dependent. Historically, the size of a byte is equal to the size of a character as specified by the character encoding supported by the processor. For example, a processor that supports Binary-Coded Decimal (BCD) characters defines a byte to be 4 bits. A processor that supports ASCII defines a byte to be 7 bits. The reason for using the character size to define the size of a byte is to make programming easier, considering that a byte has always (as far as I know) been used as the smallest addressable unit of data storage. If you think about it, you'll find that this is indeed very convenient.
A byte is defined to be 8 bits in the extremely successful IBM S/360 computer family, which used an 8-bit character encoding called EBCDIC. IBM, through its S/360 computers, introduced several crucially important computing techniques that became the foundation of all later processors, including the ones we are using today. In fact, the term byte was coined by Werner Buchholz, a computer scientist at IBM.
When Intel introduced its first 8-bit processor (the 8008), a byte was defined to be 8 bits even though the instruction set didn't directly support any character encoding, thereby breaking the pattern. The processor did, however, provide numerous instructions that operate on packed (4-bit) and unpacked (8-bit) BCD-encoded digits. In fact, the whole x86 instruction set was designed around 8-bit bytes. The fact that 7-bit ASCII characters fit in 8-bit bytes was a free, additional advantage. As usual, a byte is the smallest addressable unit of storage. I would like to mention here that in digital circuit design, it's convenient to have the number of wires or pins be a power of 2, so that every possible value that appears as input or output has a use.
Later processors continued to use 8-bit bytes because it makes it much easier to develop newer designs based on older ones. It also helps make newer processors compatible with older ones. Therefore, instead of changing the size of a byte, the register, data bus and address bus sizes were doubled every time (we have now reached 64 bits). This doubling enabled us to reuse existing digital circuit designs easily, significantly reducing processor design costs.
The main reason why it's 8 bits and not 7 is that it needs to be a power of 2.
Also: imagine what nibbles would look like in 7-bit bytes..
Also ideal (and fast) for conversion to and from hexadecimal.
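To see why 8-bit bytes suit hexadecimal: every byte is exactly two 4-bit nibbles, so each hex digit comes from one shift or one mask, with no division. A minimal sketch (the function names are hypothetical):

```cpp
#include <cassert>
#include <cstdint>
#include <string>

inline std::string to_hex(std::uint8_t byte) {
    const char digits[] = "0123456789abcdef";
    return {digits[byte >> 4], digits[byte & 0x0F]};  // high nibble, low nibble
}

// Returns 0xFF for a character that is not a hex digit.
inline std::uint8_t from_hex_digit(char c) {
    if (c >= '0' && c <= '9') return static_cast<std::uint8_t>(c - '0');
    if (c >= 'a' && c <= 'f') return static_cast<std::uint8_t>(c - 'a' + 10);
    if (c >= 'A' && c <= 'F') return static_cast<std::uint8_t>(c - 'A' + 10);
    return 0xFF;
}

inline std::uint8_t from_hex(char hi, char lo) {
    assert(from_hex_digit(hi) != 0xFF && from_hex_digit(lo) != 0xFF);
    return static_cast<std::uint8_t>((from_hex_digit(hi) << 4) | from_hex_digit(lo));
}
```

With 7-bit bytes, a byte would be one and three-quarter hex digits, so this clean one-byte-to-two-digits mapping would disappear.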
Update:
What advantage do we get if we have a power of 2... Please explain
First, let's distinguish between a BYTE and an ASCII character. Those are 2 different things.
A byte is used to store and process digital information (numbers) in an optimized way, whereas a character is (or should be) only meant to interact with us humans, because we find it hard to read binary (although in these days of big data, big internet speeds and big clouds, even servers have started talking to each other in text (XML, JSON), but that's a whole different story).
As for a byte being a power of 2, the short answer:
The advantage of having powers of 2 is that data can easily be aligned efficiently on byte or integer boundaries: for a single byte, that means fields of 1, 2, 4 and 8 bits, and it gets better with higher powers of 2.
Compare that to a 7-bit ASCII (or 7-bit byte): 7 is a prime number, which means only 1-bit and 7-bit values could be stored in an aligned form.
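The claim can be checked mechanically: the field widths that pack into a byte without any field straddling a byte boundary are exactly the divisors of the byte size. A small sketch with a hypothetical function name:

```cpp
#include <cassert>
#include <vector>

// Field widths (in bits) that tile a byte of `byte_bits` bits exactly.
inline std::vector<int> aligned_field_widths(int byte_bits) {
    assert(byte_bits > 0);
    std::vector<int> widths;
    for (int w = 1; w <= byte_bits; ++w)
        if (byte_bits % w == 0) widths.push_back(w);
    return widths;
}
```

For 8 this yields 1, 2, 4 and 8; for 7 only 1 and 7.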
Of course there are a lot more reasons one could think of (for example, the layout and structure of the logic gates and multiplexers inside CPUs/MCUs).
Say you want to control the input or output pins on a multiplexer: with 2 control lines (bits) you can address 4 pins, with 3 lines 8 pins, with 4 lines 16, and so on; the same goes for address lines. So the more you look at it, the more sense it makes to use powers of 2. It seems to be the most efficient model.
As for optimized 7-bit ASCII:
Even on a system with 8-bit bytes, 7-bit ASCII can easily be compacted with some bit-shifting. A class with an operator[] could be created, without the need for 7-bit bytes (and of course, a simple compression scheme would do even better).
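A minimal sketch of such a class, assuming the input is pure 7-bit ASCII. `Packed7` is a hypothetical name, and it copies one bit at a time for clarity rather than speed; a tuned version would shift whole bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Stores 7-bit characters back to back in 8-bit bytes:
// 8 characters occupy 7 bytes instead of 8.
class Packed7 {
public:
    explicit Packed7(const std::string& text) {
        for (char c : text) push_back(c);
    }

    void push_back(char c) {
        const auto v = static_cast<std::uint8_t>(c);
        assert(v < 128 && "only 7-bit ASCII can be stored");
        const std::size_t bit = size_ * 7;        // first bit of this character
        bytes_.resize((bit + 7 + 7) / 8, 0);       // enough bytes for bits [bit, bit + 7)
        for (int i = 0; i < 7; ++i)
            if (v & (1u << i))
                bytes_[(bit + i) / 8] |= static_cast<std::uint8_t>(1u << ((bit + i) % 8));
        ++size_;
    }

    char operator[](std::size_t index) const {
        assert(index < size_);
        const std::size_t bit = index * 7;
        unsigned v = 0;
        for (int i = 0; i < 7; ++i)
            if (bytes_[(bit + i) / 8] & (1u << ((bit + i) % 8)))
                v |= 1u << i;
        return static_cast<char>(v);
    }

    std::size_t size() const { return size_; }
    std::size_t byte_count() const { return bytes_.size(); }

private:
    std::vector<std::uint8_t> bytes_;
    std::size_t size_ = 0;
};
```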

What's the reason behind ZigZag encoding in Protocol Buffers and Avro?

ZigZag requires a lot of overhead to write/read numbers. Actually I was stunned to see that it doesn't just write int/long values as they are, but does a lot of additional scrambling. There's even a loop involved:
https://github.com/mardambey/mypipe/blob/master/avro/lang/java/avro/src/main/java/org/apache/avro/io/DirectBinaryEncoder.java#L90
I can't find in the Protocol Buffers docs or the Avro docs, or work out myself, what the advantage of scrambling numbers like that is. Why is it better to have positive and negative numbers alternate after encoding?
Why aren't they just written in little-endian, big-endian or network order, which would only require reading them into memory and possibly reversing the byte order? What do we buy by paying with performance?
It is a variable-length 7-bit encoding. Each byte carries 7 bits of the value; the high bit is set to 1 on every byte except the last one, where it is 0. That is how the decoder can tell how many bytes were used to encode the value. The 7-bit groups are always stored least-significant first (little-endian), regardless of the machine architecture.
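A minimal sketch of the base-128 varint layout used by Protocol Buffers and Avro: 7 value bits per byte, least-significant group first, with the high bit set on every byte except the last. The function names are hypothetical, not either library's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

inline void write_varint(std::uint64_t v, std::vector<std::uint8_t>& out) {
    while (v >= 0x80) {
        out.push_back(static_cast<std::uint8_t>(v | 0x80));  // low 7 bits + continuation bit
        v >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(v));             // last byte: high bit clear
}

// Decodes one varint starting at `pos` and advances `pos` past it.
inline std::uint64_t read_varint(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    std::uint64_t result = 0;
    for (int shift = 0;; shift += 7) {
        assert(shift < 64 && pos < in.size() && "malformed varint");
        const std::uint8_t b = in[pos++];
        result |= std::uint64_t{b & 0x7Fu} << shift;
        if ((b & 0x80) == 0) return result;
    }
}
```

For example, 300 encodes as the two bytes 0xAC 0x02.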
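It is an encoding trick that permits writing as few bytes as needed to encode the value. ZigZag is what makes this work for signed numbers: it maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ..., so values of small magnitude get small codes whether they are negative or positive. Without it, a negative number in two's complement has its high bits set and would always take the maximum number of bytes. So an 8-byte long with a value between -64 and 63 takes only one byte. That is common; the full range provided by long is very rarely used in practice.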
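The ZigZag step itself is a shift and an XOR. The sketch below (hypothetical function names) also counts varint bytes, to show that -64..63 fit in one byte after ZigZag while a plain two's-complement -1 would need 10.

```cpp
#include <cstdint>

// 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, 2 -> 4, ...
constexpr std::uint64_t zigzag_encode(std::int64_t n) {
    // n >> 63 is all ones for negative n and zero otherwise (arithmetic
    // shift; guaranteed since C++20, universal in practice before that).
    return (static_cast<std::uint64_t>(n) << 1) ^ static_cast<std::uint64_t>(n >> 63);
}

constexpr std::int64_t zigzag_decode(std::uint64_t z) {
    // (0 - (z & 1)) is all ones when the low bit is set, undoing the XOR.
    return static_cast<std::int64_t>((z >> 1) ^ (0 - (z & 1)));
}

// Bytes needed by a base-128 varint holding `v` (7 value bits per byte).
constexpr int varint_size(std::uint64_t v) {
    int bytes = 1;
    while (v >= 0x80) {
        v >>= 7;
        ++bytes;
    }
    return bytes;
}
```

Protocol Buffers applies this mapping to its `sint32`/`sint64` fields, and Avro applies it to `int` and `long`.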
Packing the data tightly without the overhead of a gzip-style compression method was the design goal. This encoding is also used in the .NET Framework. The processor overhead needed to encode/decode the value is inconsequential: it is already much lower than that of a compression scheme, and it is a very small fraction of the I/O cost.

Why do bytes exist? Why don't we just use bits?

A byte consists of 8 bits on most systems.
A byte typically represents the smallest data type a programmer may use. Depending on language, the data types might be called char or byte.
There are some types of data (booleans, small integers, etc.) that could be stored in fewer bits than a byte. Yet using less than a byte is not natively supported by any programming language I know of.
Why does this minimum of using 8 bits to store data exist? Why do we even need bytes? Why don't computers just use increments of bits (1 or more bits) rather than increments of bytes (multiples of 8 bits)?
Just in case anyone asks: I'm not worried about it. I do not have any specific needs. I'm just curious.
Because at the hardware level, memory is naturally organized into addressable chunks. Small chunks mean that you can have fine-grained things like 4-bit numbers; large chunks allow for more efficient operation (typically a CPU moves things around in 'chunks' or multiples thereof). In particular, larger addressable chunks make for bigger address spaces. If I have chunks that are 1 bit, then an address range of 1 - 500 only covers 500 bits, whereas 500 8-bit chunks cover 4000 bits.
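The same trade-off, stated for a fixed address width: with `a`-bit addresses and `c`-bit chunks you can reach 2^a × c bits. A small sketch (the function name is hypothetical) comparing bit-addressed and byte-addressed memory for 32-bit addresses:

```cpp
#include <cstdint>

// Bits reachable with `address_bits`-bit addresses when every address names
// a chunk of `chunk_bits` bits. `address_bits` must be below 64.
constexpr std::uint64_t addressable_bits(unsigned address_bits, std::uint64_t chunk_bits) {
    return (std::uint64_t{1} << address_bits) * chunk_bits;
}

constexpr std::uint64_t MiB = 1024ull * 1024;

// 32-bit addresses: bit-addressed memory tops out at 512 MiB,
// byte-addressed memory at 4 GiB.
static_assert(addressable_bits(32, 1) / 8 == 512 * MiB, "1-bit chunks");
static_assert(addressable_bits(32, 8) / 8 == 4096 * MiB, "8-bit chunks");
```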
Note - it was not always 8 bits. I worked on a machine that thought in 6 bits. (good old octal)
Paper tape (~1950's) was 5 or 6 holes (bits) wide, maybe other widths.
Punched cards (the newer kind) were 12 rows of 80 columns.
1960s:
B-5000 - 48-bit "words" with 6-bit characters
CDC-6600 -- 60-bit words with 6-bit characters
IBM 7090 -- 36-bit words with 6-bit characters
There were 12-bit machines; etc.
1970-1980s, "micros" enter the picture:
Intel 4004 - 4-bit chunks
8008, 8086, Z80, 6502, etc - 8 bit chunks
68000 - 16-bit words, but still 8-bit bytes
486 - 32-bit words, but still 8-bit bytes
today - 64-bit words, but still 8-bit bytes
future - 128, etc, but still 8-bit bytes
Get the picture? Americans figured that characters could be stored in only 6 bits.
Then we discovered that there was more in the world than just English.
So we floundered around with 7-bit ASCII and 8-bit EBCDIC.
Eventually, we decided that 8 bits was good enough for all the characters we would ever need. ("We" were not Chinese.)
The IBM 360 came out as the dominant machine of the '60s and '70s; it was based on an 8-bit byte. (It sort of had 32-bit words, but that became less important than the almighty byte.)
It seemed such a waste to use 8 bits when all you really needed 7 bits to store all the characters you ever needed.
IBM, in the mid-20th century, "owned" the computer market with 70% of the hardware and software sales. With the 360 being their main machine, 8-bit bytes were the thing for all the competitors to copy.
Eventually, we realized that other languages existed and came up with Unicode/UTF-8 and its variants. But that's another story.
Good way for me to write something late at night!
Your points are perfectly valid; however, history will always be that insane intruder who would have ruined your plans long before you were born.
For the purposes of explanation, let's imagine a fictitious machine with an architecture named Bitel(TM) Inside or something of the like. The Bitel specifications mandate that the Central Processing Unit (CPU, i.e., microprocessor) shall access memory in one-bit units. Now, let's say a given instance of a Bitel-operated machine has a memory unit holding 32 billion bits (our fictitious equivalent of a 4GB RAM unit).
Now, let's see why Bitel, Inc. got into bankruptcy:
The binary code of any given program would be gigantic (the compiler would have to manipulate every single bit!)
32-bit addresses would be limited (even more than they are now) to just 512MB of memory. 64-bit systems would be safe (for now...)
Memory accesses would literally grind to a halt. By the time the CPU has got all of the 48 bits it needs to process a single ADD instruction, the floppy would have already spun for too long, and you know what happens next...
Who the **** really needs to optimize a single bit? (See previous bankruptcy justification).
If you need to handle single bits, learn to use bitwise operators!
Programmers would go crazy as both coffee and RAM get too expensive. At the moment, this is a perfect synonym for apocalypse.
The C standard is holy and sacred, and it mandates that the minimum addressable unit (i.e., char) shall be at least 8 bits wide.
8 is a perfect power of 2. (1 is another one, but meh...)
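Two points from the list above in code: the standard's guarantee that a char has at least `CHAR_BIT >= 8` bits, and handling single bits with bitwise operators instead of bit-addressable memory. The helper names are hypothetical; this is a sketch, not a library API.

```cpp
#include <climits>
#include <cstdint>

// C and C++ guarantee that a char, the smallest addressable unit, has at
// least 8 bits; CHAR_BIT gives the exact number.
static_assert(CHAR_BIT >= 8, "char must be at least 8 bits wide");

// Single bits inside an ordinary byte; `i` must be in the range 0..7.
constexpr std::uint8_t set_bit(std::uint8_t v, unsigned i) {
    return static_cast<std::uint8_t>(v | (1u << i));
}
constexpr std::uint8_t clear_bit(std::uint8_t v, unsigned i) {
    return static_cast<std::uint8_t>(v & ~(1u << i));
}
constexpr std::uint8_t toggle_bit(std::uint8_t v, unsigned i) {
    return static_cast<std::uint8_t>(v ^ (1u << i));
}
constexpr bool test_bit(std::uint8_t v, unsigned i) {
    return ((v >> i) & 1u) != 0;
}
```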
In my opinion, it's an issue of addressing. To access individual bits of data, you would need eight times as many addresses (adding 3 bits to each address) compared to accessing individual bytes. The byte is generally going to be the smallest practical unit to hold a number in a program (with only 256 possible values).
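To make the "3 extra bits per address" point concrete, here is how a hypothetical bit address would decompose into a byte address plus a bit index, assuming 8-bit bytes. `BitLocation` and both functions are illustrative names only.

```cpp
#include <cstdint>

struct BitLocation {
    std::uint64_t byte_address;
    unsigned bit_in_byte;  // 0..7
};

// The low 3 bits select the bit; the remaining bits are the byte address.
constexpr BitLocation split_bit_address(std::uint64_t bit_address) {
    return {bit_address >> 3, static_cast<unsigned>(bit_address & 7)};
}

constexpr std::uint64_t make_bit_address(std::uint64_t byte_address, unsigned bit_in_byte) {
    return (byte_address << 3) | (bit_in_byte & 7u);
}
```

Every bit address is thus 3 bits wider than the byte address it refers to, which is exactly the factor-of-eight cost in address space.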
Some CPUs use words to address memory instead of bytes. That's their natural data type, so 16 or 32 bits. If Intel CPUs did that it would be 64 bits.
8 bit bytes are traditional because the first popular home computers used 8 bits. 256 values are enough to do a lot of useful things, while 16 (4 bits) are not quite enough.
And once a thing goes on for long enough, it becomes terribly hard to change. This is also why your hard drive or SSD likely still pretends to use 512-byte blocks, even though the disk hardware does not use a 512-byte block and the OS doesn't either. (Advanced Format drives have a software switch to disable 512-byte emulation, but generally only servers with RAID controllers turn it off.)
Also, Intel/AMD CPUs have so much extra silicon doing so much extra decoding work that the slight difference between 8-bit and 64-bit addressing does not add any noticeable overhead. The CPU's memory controller is certainly not using 8 bits. It pulls data into cache in long streams, and the minimum size is the cache line, often 64 bytes, aka 512 bits. RAM hardware is often slow to start but fast to stream, so the CPU reads kilobytes into L3 cache, much like hard drives read an entire track into their caches: the drive head is already there, so why not?
First of all, C and C++ do have native support for bit-fields.
```cpp
#include <iostream>

struct S {
    // will usually occupy 2 bytes:
    // 3 bits: value of b1
    // 2 bits: unused (unnamed bit-field)
    // 3 bits: unused (b2 does not fit in the rest of the 1st byte, so it starts the 2nd)
    // 6 bits: value of b2
    // 2 bits: value of b3
    unsigned char b1 : 3, : 2, b2 : 6, b3 : 2;
};

int main()
{
    std::cout << sizeof(S) << '\n'; // usually prints 2
}
```
Probably the answer lies in performance and memory alignment, and in the fact that a byte is the smallest part of a machine word that can hold a 7-bit ASCII character (I reckon that is partly why the byte type is called char in C). Text operations are common, so a dedicated type for plain text is a real gain for a programming language.
Why bytes?
What is so special about 8 bits that it deserves its own name?
Computers do process all data as bits, but they prefer to process bits in byte-sized groupings. Or to put it another way: a byte is how much a computer likes to "bite" at once.
The byte is also the smallest addressable unit of memory in most modern computers. A computer with byte-addressable memory cannot address an individual piece of data that is smaller than a byte.
What's in a byte?
A byte represents different types of information depending on the context. It might represent a number, a letter, or a program instruction. It might even represent part of an audio recording or a pixel in an image.
Source
