While looking at the atmel 8-bit AVR instruction set ( http://www.atmel.com/Images/doc0856.pdf ) I found the instruction format quite complex. A lot of instructions have different bit fields, where bits of operands/opcode are at different places in the instruction: why is that so? Isn't it more difficult for the decode unit to actually decode the opcode and operands with this format?
I think it largely may come from considerations of compatibility.
On some stage one instruction gets more powerful and the additional option is encoded in the bits that are free, so that the old instruction word still invokes the old behaviour.
Related
According to the SPIR-V 1.3 specification, as I understand it, all instructions are segmented in 4 byte words. This is different from traditional byte code IR languages like Pythons byte code and Java's JVM bytecode, which if I understand correctly literally segment operations as individual bytes. I've even heard people offhandedly talk about how this is specifically a good thing ("word code"), but never elaborate on why. Why did Khronos group decide to go down this route for its SPIR-V definition, instead of using a single byte format, or some other arbitrary size?
There are many reasons, but probably the biggest is that it solves the endian issue in a simple, efficient, and extensible fashion.
SPIR-V is defined to be little-endian-encoded. So on a big-endian system, step 1 has to be converting the data to the proper endian. In a more byte-oriented assembly style, you would have to read and interpret each opcode, then figure out which operands have to be byte-swapped. Basically, endian-swapping requires interpreting the data. With SPIR-V, you just transpose every 4 bytes: a simple and efficient loop.
Plus, SPIR-V is extensible; extensions can add to the set of opcodes. In a byte-oriented assembly, you would need to be able to understand every extension opcode in order to do endian conversion. With SPIR-V, you don't care if you can understand all of the opcodes; just transpose every 4 bytes and you're done. So you can write a tool that can at least read SPIR-V assembly even if it doesn't understand all of the opcodes.
This makes SPIR-V a bit bulkier compared to other byte-codes, but it's a lot easier to chew.
Extensive searching has sent me in a loop over the course of 3 days, so I'm depending on you guys to help me catch a break.
Why exactly does one 8-bit sequence of high's and low's perform this action, and 8-bit sequence performs that action.
My intuition tells me that the CPU's circuitry hard-wired one binary sequence to do one thing, and another to do another thing. That would mean different Processor's with potentially different chip circuitry wouldn't define one particular binary sequence as the same action as another?
Is this why we have assembly? I need someone to confirm and/or correct my hypothesis!
Opcodes are not always 8 bits but yes, it is hardcoded/wired in the logic to isolate the opcode and then send you down a course of action based on that. Think about how you would do it in an instruction set simulator, why would logic be any different? Logic is simpler than software languages, there is no magic there. ONE, ZERO, AND, OR, NOT thats as complicated as it gets.
Along the same lines if I was given an instruction set document and you were given an instruction set document and told to create a processor or write an instruction set simulator. Would we produce the exact same code? Even if the variable names were different? No. Ideally we would have programs that are functionally the same, they both parse the instruction and execute it. Logic is no different you give the spec to two engineers you might get two different processors that functionally are the same, one might perform better, etc. Look at the long running processor families, x86 in particular, they re-invent that every couple-three years being instruction set compatible for the legacy instructions while sometimes adding new instructions. Same for ARM and others.
And there are different instruction sets ARM is different from x86 is different from MIPS, the opcodes and/or bits you examine in the instruction vary, for none of these can you simply look at 8 bits, each you have some bits then if that is not enough to uniquely identify the instruction/operation then you need to examine some more bits, where those bits are what the rules are are very specific to each architecture. Otherwise what would be the point of having different names for them if they were the same.
And this information was out there you just didnt look in the right places, there are countless open online courses on the topic, books that google should hit some pages on, as well as open source processor cores you can look at and countless instruction set simulators with source code.
I would appreciate it a lot for your patience to explain on a seemingly naive question?
An Arduino Uno with 8-bit MCU (ATmega328), yet we program it with 32 bit C program customs? why?
Arduino Uno(for example), uses the 8-bit AVR MCU (ATmega328), which I understand the addressing mode and basic arithmetic operations are on 8-bit operations,
while when I program in the Arduino IDE, by default I am programming like it is a 32-bit C/C++ program (for example, I can define uint32_t,....or, )
so is this all done by the compiler in Arduino IDE ? (who's that ? avr-gcc? )
and... the compile does more work to translate 32-bit arithmetic operations to 8-bit arithmetic operations ?
Each processor / micro controller operates on a specific instruction set. Essentially it is the compilers job to compile your source code into machine code, thus the compiler has to know the (8-bit) instruction set of the processor. So if you take a uint32_t addition for example, it has to "compile" it into several add instructions, because the 8-bit AVR is only able to add two 8 bit values. This is a simplified example but I hope you get the idea.
As was previously said, 32-bit arithmetic operations get broken down into multiple 8-bit operations which your 8-bit MCU can handle. I haven't tried it myself, but I would suspect that doing anything more complicated than simple arithmetic with larger variable types will consume a lot more of your hardware resources, and should probably be avoided if possible.
I was wondering, why some architectures use little-endian and others big-endian. I remember I read somewhere that it has to do with performance, however, I don't understand how can endianness influence it. Also I know that:
The little-endian system has the property that the same value can be read from memory at different lengths without using different addresses.
Which seems a nice feature, but, even so, many systems use big-endian, which probably means big-endian has some advantages too (if so, which?).
I'm sure there's more to it, most probably digging down to the hardware level. Would love to know the details.
I've looked around the net a bit for more information on this question and there is a quite a range of answers and reasonings to explain why big or little endian ordering may be preferable. I'll do my best to explain here what I found:
Little-endian
The obvious advantage to little-endianness is what you mentioned already in your question... the fact that a given number can be read as a number of a varying number of bits from the same memory address. As the Wikipedia article on the topic states:
Although this little-endian property is rarely used directly by high-level programmers, it is often employed by code optimizers as well as by assembly language programmers.
Because of this, mathematical functions involving multiple precisions are easier to write because the byte significance will always correspond to the memory address, whereas with big-endian numbers this is not the case. This seems to be the argument for little-endianness that is quoted over and over again... because of its prevalence I would have to assume that the benefits of this ordering are relatively significant.
Another interesting explanation that I found concerns addition and subtraction. When adding or subtracting multi-byte numbers, the least significant byte must be fetched first to see if there is a carryover to more significant bytes. Because the least-significant byte is read first in little-endian numbers, the system can parallelize and begin calculation on this byte while fetching the following byte(s).
Big-endian
Going back to the Wikipedia article, the stated advantage of big-endian numbers is that the size of the number can be more easily estimated because the most significant digit comes first. Related to this fact is that it is simple to tell whether a number is positive or negative by simply examining the bit at offset 0 in the lowest order byte.
What is also stated when discussing the benefits of big-endianness is that the binary digits are ordered as most people order base-10 digits. This is advantageous performance-wise when converting from binary to decimal.
While all these arguments are interesting (at least I think so), their applicability to modern processors is another matter. In particular, the addition/subtraction argument was most valid on 8 bit systems...
For my money, little-endianness seems to make the most sense and is by far the most common when looking at all the devices which use it. I think that the reason why big-endianness is still used, is more for reasons of legacy than performance. Perhaps at one time the designers of a given architecture decided that big-endianness was preferable to little-endianness, and as the architecture evolved over the years the endianness stayed the same.
The parallel I draw here is with JPEG (which is big-endian). JPEG is big-endian format, despite the fact that virtually all the machines that consume it are little-endian. While one can ask what are the benefits to JPEG being big-endian, I would venture out and say that for all intents and purposes the performance arguments mentioned above don't make a shred of difference. The fact is that JPEG was designed that way, and so long as it remains in use, that way it shall stay.
I would assume that it once were the hardware designers of the first processors who decided which endianness would best integrate with their preferred/existing/planned micro-architecture for the chips they were developing from scratch.
Once established, and for compatibility reasons, the endianness was more or less carried on to later generations of hardware; which would support the 'legacy' argument for why still both kinds exist today.
I have recently begun working with logic level design as an amateur hobbyist but have now found myself running up against software, where I am much less competent. I have completed designing a custom 4 bit CPU in Logisim loosely based on the paper "A Very Simple Microprocessor" by Etienne Sicard. Now that it does the very limited functions that I've built into it (addition, logical AND, OR, and XOR) without any more detectable bugs (crossing fingers) I am running into the problem of writing programs for it. Logisim has the functionality of importing a script of Hex numbers into a RAM or ROM module so I can write programs for it using my own microinstruction code, but where do I start? I'm quite literally at the most basic possible level of software design and don't really know where to go from here. Any good suggestions on resources for learning about this low level of programming or suggestions on what I should try from here? Thanks much in advance, I know this probably isn't the most directly applicable question ever asked on this forum.
I'm not aware of the paper you mention. But if you have designed your own custom CPU, then if you want to write software for it, you have two choices: a) write it in machine code, or b) write your own assembler.
Obviously I'd go with b. This will require that you shift gear a bit and do some high-level programming. What you are aiming to write is an assembler program that runs on a PC, and converts some simple assembly language into your custom machine code. The assembler itself will be a high-level program and as such, I would recommend writing it in a high-level programming language that is good at both string manipulation and binary manipulation. I would recommend Python.
You basically want your assembler to be able to read in a text file like this:
mov a, 7
foo:
mov b, 20
add a, b
cmp a, b
jg foo
(I just made this program up; it's nonsense.)
And convert each line of code into the binary pattern for that instruction, outputting a binary file (or perhaps a hex file, since you said your microcontroller can read in hex values). From there, you will be able to load the program up onto the CPU.
So, I suggest you:
Come up with (on paper) an assembly language that is a simple written representation for each of the opcodes your machine supports (you may have already done this),
Learn simple Python,
Write a Python script that reads one line at a time (sys.stdin.readline()), figures out which opcode it is and what values it takes, and outputs the corresponding machine code to stdout.
Write some assembly code in your assembly language that will run on your CPU.
Sounds like a fun project.
I have done something similar that you might find interesting. I also have created from scratch my own CPU design. It is an 8-bit multi-cycle RISC CPU based on Harvard architecture with variable length instructions.
I started in Logisim, then coded everything in Verilog, and I have synthesized it in an FPGA.
To answer your question, I have done a simple and rudimentary assembler that translates a program (instructions, ie. mnemonics + data) to the corresponding machine language that can then be loaded into the PROG memory. I've written it in shell script and I use awk, which is what I was confortable with.
I basically do two passes: first translate mnemonics to their corresponding opcode and translate data (operands) into hex, here I keep track of all the labels addresses. second pass will replace all labels with their corresponding address.
(labels and addresses are for jumps)
You can see all the project, including the assembler, documented here: https://github.com/adumont/hrm-cpu
Because your instruction set is so small, and based on the thread from the mguica answer, I would say the next step is to continue and/or fully test your instruction set. do you have flags? do you have branch instructions. For now just hand generate the machine code. Flags are tricky, in particular the overflow (V) bit. You have to examine carry in and carry out on the msbit adder to get it right. Because the instruction set is small enough you can try the various combinations of back to back instructions and followed by or and followed by xor and followed by add, or followed by and or followed by xor, etc. And mix in the branches. back to flags, if the xor and or for example do not touch carry and overflow then make sure you see carry and overflow being a zero and not touched by logical instructions and carry and overflow being one and not touched, and also independently show carry and overflow are separate, one on one off, not touched by logical, etc. make sure all the conditional branches only operate on that one condition, lead into the various conditional branches with flag bits that are ignored in both states insuring that the conditional branch ignores them. Also verify that if the conditional branch is not supposed to modify them that it doesnt. likewise if the condition doesnt cause a branch that the conditional flags are not touched...
I like to use randomization but it may be more work than you are after. I like to independently develop a software simulator of the instruction set, which I find easier to use that the logic also sometimes easier to use in batch testing. you can then randomize some short list of instructions, varying the instruction and the registers, naturally test the tester by hand computing some of the results, both state of registers after test complete and state of flag bits. Then make that randomized list longer, at some point you can take a long instruction list and run it on the logic simulator and see if the logic comes up with the same register results and flag bits as the instruction set simulator, if they vary figure out why. If the do not try another random sequence, and another. Filling registers with prime numbers before starting the test is a very good idea.
back to individual instruction testing and flags go through all the corner cases 0xFFFF + 0x0000 0xFFFF+1, things like that places just to the either side of and right on operands and results that are one count away from where a flag changes at the point where the flag changes and just the other side of that. for the logicals for example if they use the zero flag, then have various data patterns that test results that are on either side of and at zero 0x0000, 0xFFFF 0xFFFE 0x0001 0x0002, etc. Probably a walking ones result as well 0x0001 result 0x0002, ox0004, etc.
hopefully I understood your question and have not pointed out the obvious or what you have already done thus far.