Building a custom machine code from the ground up

I have recently begun working with logic-level design as an amateur hobbyist, but have now found myself running up against software, where I am much less competent. I have finished designing a custom 4-bit CPU in Logisim, loosely based on the paper "A Very Simple Microprocessor" by Etienne Sicard. Now that it performs the very limited functions I've built into it (addition, logical AND, OR, and XOR) without any more detectable bugs (fingers crossed), I am running into the problem of writing programs for it. Logisim can import a script of hex numbers into a RAM or ROM module, so I can write programs for it using my own microinstruction code, but where do I start? I'm quite literally at the most basic possible level of software design and don't really know where to go from here. Any good suggestions on resources for learning about this low level of programming, or suggestions on what I should try from here? Thanks much in advance; I know this probably isn't the most directly applicable question ever asked on this forum.

I'm not aware of the paper you mention. But if you have designed your own custom CPU and you want to write software for it, you have two choices: a) write it in machine code, or b) write your own assembler.
Obviously I'd go with b. This will require that you shift gears a bit and do some high-level programming. What you are aiming to write is an assembler program that runs on a PC and converts some simple assembly language into your custom machine code. The assembler itself will be a high-level program, and as such I would recommend writing it in a high-level programming language that is good at both string manipulation and binary manipulation. I would recommend Python.
You basically want your assembler to be able to read in a text file like this:
mov a, 7
foo:
mov b, 20
add a, b
cmp a, b
jg foo
(I just made this program up; it's nonsense.)
And convert each line of code into the binary pattern for that instruction, outputting a binary file (or perhaps a hex file, since you said your microcontroller can read in hex values). From there, you will be able to load the program up onto the CPU.
So, I suggest you:
Come up with (on paper) an assembly language that is a simple written representation for each of the opcodes your machine supports (you may have already done this),
Learn simple Python,
Write a Python script that reads one line at a time (sys.stdin.readline()), figures out which opcode it is and what operands it takes, and outputs the corresponding machine code to stdout (a minimal sketch follows after this list).
Write some assembly code in your assembly language that will run on your CPU.
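For step 3, here is a minimal sketch of such an assembler in Python. The mnemonics, register names, and one-byte encoding below are invented purely for illustration; substitute the opcodes and operand format of your own CPU (Logisim's memory image format typically also expects a "v2.0 raw" header line before the hex values).
import sys

# Invented encoding: high nibble = opcode, low nibble = two 2-bit register fields.
OPCODES = {"add": 0x1, "and": 0x2, "or": 0x3, "xor": 0x4}
REGISTERS = {"a": 0x0, "b": 0x1}

print("v2.0 raw")                                   # Logisim image header
for line in sys.stdin:
    line = line.split(";")[0].strip().lower()       # drop comments and blank lines
    if not line:
        continue
    mnemonic, *operands = line.replace(",", " ").split()
    rd, rs = (REGISTERS[r] for r in operands)
    word = (OPCODES[mnemonic] << 4) | (rd << 2) | rs
    print(f"{word:02x}")                            # one hex byte per instruction
Run it as python assemble.py < program.asm > program.hex and load the resulting file into the Logisim RAM/ROM module.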
Sounds like a fun project.

I have done something similar that you might find interesting: I have also created my own CPU design from scratch. It is an 8-bit, multi-cycle RISC CPU based on a Harvard architecture with variable-length instructions.
I started in Logisim, then coded everything in Verilog, and I have synthesized it in an FPGA.
To answer your question, I have written a simple and rudimentary assembler that translates a program (instructions, i.e. mnemonics + data) into the corresponding machine language, which can then be loaded into the PROG memory. I've written it in shell script and I use awk, which is what I was comfortable with.
I basically do two passes: the first pass translates mnemonics to their corresponding opcodes and translates data (operands) into hex, and here I keep track of all the label addresses. The second pass replaces all labels with their corresponding addresses.
(Labels and addresses are for jumps.)
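The two-pass idea is easy to prototype; here is a rough sketch in Python (Python rather than awk only for brevity; the mnemonics are placeholders and the final encoding to hex is not shown):
def assemble(lines):
    labels, program = {}, []
    # Pass 1: collect instructions and record the address of every label.
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            labels[line[:-1]] = len(program)   # label -> address of next instruction
        else:
            program.append(line)
    # Pass 2: replace label operands with their addresses.
    resolved = []
    for line in program:
        mnemonic, *ops = line.replace(",", " ").split()
        ops = [str(labels.get(op, op)) for op in ops]
        resolved.append((mnemonic, ops))
    return resolved

print(assemble(["start:", "load 7, R1", "add R1, R2", "jmp start"]))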
You can see all the project, including the assembler, documented here: https://github.com/adumont/hrm-cpu

Because your instruction set is so small, and based on the thread from the mguica answer, I would say the next step is to continue and/or fully test your instruction set. Do you have flags? Do you have branch instructions? For now, just hand-generate the machine code. Flags are tricky, in particular the overflow (V) bit; you have to examine the carry in and carry out of the most significant bit's adder to get it right. Because the instruction set is small enough, you can try the various combinations of back-to-back instructions: AND followed by OR, AND followed by XOR, AND followed by ADD, OR followed by AND, OR followed by XOR, etc., and mix in the branches. Back to flags: if XOR and OR, for example, do not touch carry and overflow, then make sure carry and overflow stay zero and are not touched by the logical instructions, that they stay one and are not touched, and also, independently, that carry and overflow are separate (one on, one off, not touched by logicals, and so on). Make sure each conditional branch operates on only its one condition; lead into the various conditional branches with flag bits that should be ignored, in both states, ensuring that the conditional branch really does ignore them. Also verify that if a conditional branch is not supposed to modify the flags, it doesn't, and likewise that if the condition doesn't cause a branch, the flags are not touched...
I like to use randomization, but it may be more work than you are after. I like to independently develop a software simulator of the instruction set, which I find easier to work with than the logic, and sometimes easier to use for batch testing. You can then randomize a short list of instructions, varying the instruction and the registers, and naturally test the tester by hand-computing some of the results: both the state of the registers after the test completes and the state of the flag bits. Then make that randomized list longer. At some point you can take a long instruction list, run it on the logic simulator, and see if the logic comes up with the same register results and flag bits as the instruction set simulator; if they differ, figure out why. If they do not, try another random sequence, and another. Filling the registers with prime numbers before starting the test is a very good idea.
Back to individual instruction testing and flags: go through all the corner cases, 0xFFFF + 0x0000, 0xFFFF + 1, and the like; exercise operands and results that sit just on either side of, and exactly at, the points where a flag changes. For the logicals, for example, if they use the zero flag, then use various data patterns whose results are on either side of and at zero: 0x0000, 0xFFFF, 0xFFFE, 0x0001, 0x0002, etc. Probably a walking-ones result as well: 0x0001, 0x0002, 0x0004, etc.
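As a sketch of the simulator-plus-randomization idea (the 4-bit width, opcode behavior, and flag rules below are assumptions for illustration, not your design), the reference model in Python might look like this; you would run the same random sequence through Logisim and compare the final registers and flags:
import random

MASK = 0xF   # 4-bit registers

def simulate(program, regs):
    # Reference model: program is a list of (op, rd, rs) tuples.
    flags = {"C": 0, "Z": 0}
    for op, rd, rs in program:
        if op == "add":
            total = regs[rd] + regs[rs]
            flags["C"] = 1 if total > MASK else 0   # assumed: only add touches carry
            regs[rd] = total & MASK
        elif op == "and":
            regs[rd] &= regs[rs]
        elif op == "or":
            regs[rd] |= regs[rs]
        else:  # "xor"
            regs[rd] ^= regs[rs]
        flags["Z"] = 1 if regs[rd] == 0 else 0      # assumed: every op sets Z
    return regs, flags

ops = ["add", "and", "or", "xor"]
program = [(random.choice(ops), random.randrange(2), random.randrange(2))
           for _ in range(20)]
print(program)
print(simulate(program, [0xB, 0x7]))   # seed registers with small primes, as above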
Hopefully I understood your question and have not pointed out the obvious, or what you have already done thus far.

Related

How are Opcodes & Operands Defined in a CPU?

Extensive searching has sent me in a loop over the course of 3 days, so I'm depending on you guys to help me catch a break.
Why exactly does one 8-bit sequence of highs and lows perform this action, while another 8-bit sequence performs that action?
My intuition tells me that the CPU's circuitry hard-wires one binary sequence to do one thing, and another to do another thing. That would mean different processors with potentially different chip circuitry wouldn't necessarily define one particular binary sequence as the same action?
Is this why we have assembly? I need someone to confirm and/or correct my hypothesis!
Opcodes are not always 8 bits, but yes, it is hardcoded/wired into the logic to isolate the opcode and then send you down a course of action based on it. Think about how you would do it in an instruction set simulator; why would logic be any different? Logic is simpler than software languages, there is no magic there. ONE, ZERO, AND, OR, NOT: that's as complicated as it gets.
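For example, the fetch/decode step of a toy instruction set simulator might look like the Python below; the 8-bit encoding (2 opcode bits, two 3-bit register fields) is invented purely to show the idea of isolating opcode bits and dispatching on them. Hardware does the same isolation with wires and gates instead of shifts and masks.
def step(memory, regs, pc):
    instr = memory[pc]             # fetch one 8-bit instruction
    opcode = (instr >> 6) & 0b11   # top two bits select the operation
    rd = (instr >> 3) & 0b111      # next three bits: destination register
    rs = instr & 0b111             # low three bits: source register
    if opcode == 0b00:             # ADD
        regs[rd] = (regs[rd] + regs[rs]) & 0xFF
    elif opcode == 0b01:           # AND
        regs[rd] &= regs[rs]
    elif opcode == 0b10:           # XOR
        regs[rd] ^= regs[rs]
    else:                          # JMP to the address held in rs
        return regs[rs]
    return pc + 1                  # next instruction

regs = [3, 5, 0, 0, 0, 0, 0, 0]
print(step([0b00001000], regs, 0), regs)   # ADD r1, r0 -> regs[1] == 8, next pc == 1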
Along the same lines: if I were given an instruction set document and you were given the same document, and we were told to create a processor or write an instruction set simulator, would we produce the exact same code? Even allowing for different variable names? No. Ideally we would produce programs that are functionally the same; they both parse the instruction and execute it. Logic is no different: give the spec to two engineers and you might get two different processors that are functionally the same, one might perform better, etc. Look at the long-running processor families, x86 in particular: they re-invent it every couple of years, staying instruction-set compatible for the legacy instructions while sometimes adding new ones. Same for ARM and others.
And there are different instruction sets: ARM is different from x86 is different from MIPS. The opcodes and/or the bits you examine in the instruction vary; for none of these can you simply look at 8 bits. In each you have some bits, and if those are not enough to uniquely identify the instruction/operation, then you need to examine some more bits; where those bits are and what the rules are is very specific to each architecture. Otherwise, what would be the point of having different names for them if they were the same?
And this information is out there, you just didn't look in the right places: there are countless open online courses on the topic, books that Google should hit some pages of, as well as open-source processor cores you can look at, and countless instruction set simulators with source code.

Computer programming

I have a question concerning computer programming. Let's say I have only one computer with no OS running, and I would like to start to "develop" an OS. Basically, what I have is a blank sheet and a pen to do so, and a couple of electronic devices. How do I put my instructions into that computer?
Because today we use an interpreter or compiler that "turns" a programming language into what they call "machine code". But my question is how to generate machine code from nothing.
Thank you for your replies; a link to learn how to do that will be most welcome.
The first computers were programmed by writing the "machine code" directly, just punching ones and zeros onto cards (well, in fact they punched octal digits).
This was done that way until somebody thought it would be a nice idea to have an assembler, which translated machine code instructions into those ones and zeros.
After that, another person thought it would be a very nice idea to have a programming language, which would translate "top level" instructions into machine code.
And after that, or probably at the same time, some "internal procedures" were created to ease the programming: to open or close a file, the only thing you have to do is call an internal subroutine in the machine instead of programming all the open-file and close-file subroutines yourself. The seed for operating systems was planted.
The cross-compiling approach mentioned here is how an operating system for a new computer is created nowadays: you use a working computer as a "lever" to create an operating system for a new computer.
It depends on how far back you want to go. On the earliest machines, "programming" meant moving wires from one essentially analog ALU to another.
The women doing the computation at that point were themselves called computers and used pencil and paper.
Later, you would use a pencil and paper and the datasheet/documentation for the instruction set, and assemble by hand, basically. There were no compilers, or even the concept of a programming language, at this point; that still had to evolve. You wrote down the ones and zeros in whatever form you preferred (binary or octal).
One way to enter code at this point was with switches. Computers certainly predated it, but look for a picture of the front panel of a PDP-8 or the Altair, etc. You set the switches for the data value and address, and you manually strobe a write. You load the bootstrap in this way and/or the whole program, set the start address, and switch to run mode.
Over time they developed card and tape readers, for which you loaded the bootstrap in by hand (switches); then you could use a reader to load larger programs more easily. Cards could be punched on a typewriter-like thing: literally a keyboard, but instead of striking through a ribbon onto paper, it cut slots in a card.
OSes and programming languages started to evolve at this point. Until you had bootstrapped your compiler, you had to write the first compiler for a new language in some other language (no different than today). So the first assembler had to be written in machine code; then from assembler you could create some other language, and so on.
If you wanted to repeat something like this today, you would have to build a computer with some sort of manual input. You certainly could, but you would have to design it that way; as back then, you would need to debounce the switches. You could, for example, have a processor with an external flash, be it parallel or serial, mux the lines to the switches (with one switch controlling the mux) and either address/data/write your program, or for fun use a SPI flash and serially load the program into it. It is much better to just use one of the PDP or Altair online simulators to get a feel for the experience.
There is no magic here; there is no chicken-and-egg problem at all. Humans had to do it by hand before the computer could do it. A smaller/simpler program had to generate more complicated programs, and so on. This long, slow evolution is well documented all over the Internet and in books in libraries everywhere.
Computers are based on a physical processor that was designed to accept instructions (e.g. in assembly code) allowing only primitive operations like shift, move, copy, add. This processor decided how it spoke (e.g. how big the words were (8-bit) and other specs such as speed and standards). Using some type of storage, we could store the instructions (punch cards, disk) and execute huge streams of these instructions.
If instructions were repeated over and over, you could move to an address and execute what was at that location, and create loops and other constructs (branches, context switches, recursion).
Since you would have peripherals, you would have some kind of way to interface with them (draw, print dots), and you could create routines that build on that to draw letters, fonts, boxes, and lines. Then you could run a subroutine to print the letter 'a' on screen.
An OS is basically a high-level implementation of all those lower level instructions. It is really a collection of all the instructions to interface with different areas (i/o, computations etc). Unix is a perfect example of different folks working on different areas and plugging them all into a single OS.

AVR's Program memory

I've written code in C for an ATmega128 and I'd like to know how the changes I make in the code influence the program memory.
To be more specific, let's consider code similar to this:
d=fun1(a,b);
c=fun2(c,d);
The change I make in the code is that I call the same functions more times, e.g.:
d=fun1(a,b);
c=fun2(c,d);
h=fun1(k,l);
n=fun2(p,m);
etc...
I build the solution in Atmel Studio 6.1 and I see the changes in the program memory.
Is there any way to foresee, without building the solution, how the changes in the code will affect the program memory?
Thanks!!
Generally speaking, this is next to impossible with C/C++ (meaning the effort does not pay off).
In your simple case (the number of calls increases), you can determine the number of instructions for each call and multiply by that number. This will only be correct if the compiler does not inline the calls and does not apply optimizations at a higher level.
These calculations might be wrong if you upgrade to a newer gcc version.
So normally you only get exact numbers when you compare two builds (same compiler version, same optimization settings). avr-size and avr-nm give you all the information you need, for example to compare functions by size. You can automate this task (by converting the output into .csv files) and use a spreadsheet or diff to look for changes.
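For example, a small script along these lines can diff per-function sizes between two builds (this is a sketch: it assumes the GNU binutils avr-nm with its standard --print-size/--size-sort output, and the .elf file names are placeholders):
import subprocess

def sizes(elf):
    # Map symbol name -> size in bytes, as reported by avr-nm.
    out = subprocess.run(["avr-nm", "--print-size", "--size-sort", elf],
                         capture_output=True, text=True, check=True).stdout
    result = {}
    for line in out.splitlines():
        parts = line.split()
        if len(parts) >= 4:
            addr, size, kind, name = parts[:4]
            result[name] = int(size, 16)
    return result

old, new = sizes("old_build.elf"), sizes("new_build.elf")
for name in sorted(set(old) | set(new)):
    delta = new.get(name, 0) - old.get(name, 0)
    if delta:
        print(f"{name}: {delta:+d} bytes")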
This method normally only pays off, if you have to squeeze a program into a smaller device (from 4k flash into 2k for example - you already have 128k flash, that's quite a lot).
This process is frustrating, because applying the same design pattern in C with small differences can lead to different sizes; so from C/C++, you cannot really predict what's going to happen.

Minimal instruction set to solve any problem with a computer program

Years ago, I heard that someone had set out to demonstrate that every computer program could be written with just three instructions:
Assignment
Conditional
Loop
Please, I would like to hear your opinion; I mean representing any algorithm as a computer program. Do you agree with this?
No need. The minimal theoretical computer needs just one instruction. They are called One Instruction Set Computers (OISC for short, kinda like the ultimate RISC).
There are two types. The first is a theoretically "pure" one instruction machine in which the instruction really works like a regular instruction in normal CPUs. The instruction is usually:
subtract and branch if less than zero
or variations thereof. The Wikipedia article has examples of how this single instruction can be used to write code that emulates other instructions.
The second type is not theoretically pure: it is the transfer triggered architecture (Wikipedia again, sorry). This family of architectures is also known as move machines, and I have designed and built some myself.
Some consider move machines cheating, since the machine actually has all the regular instructions; they are just memory-mapped instead of being part of the opcode. But move machines are not merely theoretical, they are practical (like I said, I've built some myself). There is even a commercially available family of CPUs built by Maxim: the MAXQ. If you look at the MAXQ instruction set (they call it a transfer set, since there is really only one instruction; I usually call it a register set) you will see that MAXQ assembly looks rather like a standard accumulator-based architecture.
This is a consequence of Turing Completeness, which is something that was established many decades ago.
Alan Turing, the famous computer scientist, proved that any computable function could be computed using a Turing Machine. A Turing machine is a very simple theoretical device which can do only a few things. It can read and write to a tape (i.e. memory), maintain an internal state which is altered by the contents read from memory, and use the internal state and the last read memory cell to determine which direction to move the tape before reading the next memory cell.
The operations of assignment, conditional, and loop are sufficient to simulate a Turing Machine. Reading and writing memory and maintaining state requires assignment. Changing the direction of the tape based on state and memory contents require conditionals and loops. "Loops" in fact are a bit more high-level than what is actually required. All that is really required is that program flow can jump backwards somehow. This implies that you can create loops if you want to, but the language does not need to have an explicit loop construct.
Since these three operations allow simulation of a Turing Machine, and a Turing Machine has been proven to be able to compute any computable function, it follows that any language which provides these operations is also able to compute any computable function.
Edit: And, as other answerers pointed out, these operations do not need to be discrete. You can craft a single instruction which does all three of these things (assign, compare, and branch) in such a way that it can simulate a Turing machine all by itself.
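As a tiny illustration that the three constructs are enough, here is a factorial written with nothing but assignment, a conditional, and a loop (i.e. a backwards jump):
n = 5
result = 1
while True:               # loop: jump backwards
    if n <= 1:            # conditional
        break
    result = result * n   # assignment
    n = n - 1
print(result)             # 120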
The minimal set is a single command, but you have to choose a fitting one, for example - One instruction set computer
When I studied, we used such a "computer" to calculate factorial, using just a single instruction:
SBN - Subtract and Branch if Negative:
SBN A, B, C
Meaning:
if((Memory[A] -= Memory[B]) < 0) goto C
// (Wikipedia has a slightly different definition)
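A complete interpreter for such a machine is only a few lines. Here is a rough Python sketch of the SBN idea as defined above (the memory layout and the convention that branching to a negative address halts are choices made for this example):
def run_sbn(mem, pc=0):
    # Each instruction is three cells A, B, C: mem[A] -= mem[B]; if the result < 0, jump to C.
    while pc >= 0:
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[a] -= mem[b]
        pc = c if mem[a] < 0 else pc + 3   # branching to a negative C halts
    return mem

# Toy program: compute mem[6] - mem[7] (10 - 3), then force a halt by driving
# the scratch cell 8 negative and branching to -1.
demo = [6, 7, 3,
        8, 9, -1,
        10, 3,     # cell 6: operand 10, cell 7: operand 3
        0, 1]      # cell 8: scratch,    cell 9: constant 1
print(run_sbn(demo)[6])   # 7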
Notable one instruction set computer (OISC) implementations
This answer will focus on interesting implementations of single instruction set CPUs, compilers and assemblers.
movfuscator
https://github.com/xoreaxeaxeax/movfuscator
Compiles C code using only mov x86 instructions, showing in a very concrete way that a single instruction suffices.
The Turing completeness seems to have been proven in a paper: https://www.cl.cam.ac.uk/~sd601/papers/mov.pdf
subleq
https://esolangs.org/wiki/Subleq:
https://github.com/hasithvm/subleq-verilog Verilog, Xilinx ISE.
https://github.com/purisc-group/purisc Verilog and VHDL, Altera. Maybe that project has a clang backend, but I can't use it: https://github.com/purisc-group/purisc/issues/5
http://mazonka.com/subleq/sqasm.cpp | http://mazonka.com/subleq/sqrun.cpp C++-based assembler and emulator.
See also
What is the minimum instruction set required for any Assembly language to be considered useful?
https://softwareengineering.stackexchange.com/questions/230538/what-is-the-absolute-minimum-set-of-instructions-required-to-build-a-turing-comp/325501
In 1966, Böhm and Jacopini published a paper in which they demonstrated that all programs could be written in terms of only three control structures:
the sequence structure,
the selection structure
and the repetition structure.
Programmers using Haskell might argue that you only need the conditional and the loop, because assignment and mutable state don't exist in Haskell.

VM Design: More opcodes or less opcodes? What is better?

Don't be shocked. This is a lot of text, but I'm afraid that without giving some detailed information I cannot really show what this is all about (and might get a lot of answers that don't really address my question). And this is definitely not an assignment (as someone ridiculously claimed in a comment).
Prerequisites
Since this question can probably not be answered at all unless at least some prerequisites are set, here are the prerequisites:
The Virtual Machine code shall be interpreted. It is not forbidden that there may be a JIT compiler, but the design should target an interpreter.
The VM shall be register based, not stack based.
The answer may neither assume that there is a fixed set of registers nor that there is an unlimited number of them, either one may be the case.
Further we need a better definition of "better". There are a couple of properties that must be considered:
1. The storage space for the VM code on disk. Of course you could always scrap all optimizations here and just compress the code, but this has a negative effect on (2).
2. Decoding speed. The best way to store the code is useless if it takes too long to transform it into something that can be directly executed.
3. The storage space in memory. This code must be directly executable, either with or without further decoding, but if there is further decoding involved, it happens during execution, each time the instruction is executed (decoding done only once when loading the code counts towards item 2).
4. The execution speed of the code (taking common interpreter techniques into account).
5. The VM's complexity and how hard it is to write an interpreter for it.
6. The amount of resources the VM needs for itself. (It is not a good design if the code the VM runs is 2 KB in size and executes in the blink of an eye, yet the VM needs 150 MB to do this and its start-up time far exceeds the run time of the code it executes.)
Now for some examples of what I actually mean by more or fewer opcodes. It may look like the number of opcodes is simply fixed, as you need one opcode per operation. However, it's not that easy.
Multiple Opcodes for the Same Operation
You can have an operation like
ADD R1, R2, R3
adding the values of R1 and R2, writing the result to R3. Now consider the following special cases:
ADD R1, R2, R2
ADD R1, 1, R1
These are common operations you'll find in a lot of applications. You can express them with the already existing opcode (unless you need a different one because the last one has an int value instead of a register). However, you could also create special opcodes for these:
ADD2 R1, R2
INC R1
Same as before. Where's the advantage? ADD2 only needs two arguments instead of three, and INC needs only a single one. So these could be encoded more compactly on disk and/or in memory. Since it is also easy to transform either form into the other, the decoding step could convert between both ways of expressing these statements. I'm not sure how much either form influences execution speed, though.
Combining Two Opcodes Into a Single One
Now let's assume you have an ADD_RRR (R for register) and a LOAD to load data into a register.
LOAD value, R2
ADD_RRR R1, R2, R3
You can have these two opcodes and always use constructs like this throughout your code... or you can combine them into a single new opcode, named ADD_RMR (M for memory)
ADD_RMR R1, value, R3
Data Types vs Opcodes
Assume you have 16-bit integers and 32-bit integers as native types. Registers are 32 bits, so either data type fits. Now when you add two registers, you could make the data type a parameter:
ADD int16, R1, R2, R3
ADD int32, R1, R2, R3
The same is true for signed and unsigned integers, for example. That way, ADD can be a short opcode, one byte, and then you have another byte (or maybe just 4 bits) telling the VM how to interpret the registers (do they hold 16-bit or 32-bit values). Or you can scrap the type encoding and instead have two opcodes:
ADD16 R1, R2, R3
ADD32 R1, R2, R3
Some may say both are exactly the same; just treating the first form's opcode and type byte together as a 16-bit opcode would work. Yes, but a very naive interpreter might look quite different for each. E.g., if it has one function per opcode and dispatches using a switch statement (not the best way of doing it, with function-call overhead, and the switch statement may not be optimal either, I know), the two opcodes could look like this:
case ADD16: add16(p1, p2, p3); break; // pX pointer to register
case ADD32: add32(p1, p2, p3); break;
and each function is centered around a certain kind of add. The second one though may look like this:
case ADD: add(type, p1, p2, p3); break;
// ...
// and the function
void add(enum Type type, Register p1, Register p2, Register p3)
{
    switch (type) {
    case INT16: /* ... */ break;
    case INT32: /* ... */ break;
    }
}
This adds a sub-switch to the main switch, or a sub dispatch table to the main dispatch table. Of course an interpreter can be written either way, regardless of whether types are explicit or not, but one way or the other will feel more natural to developers depending on the opcode design.
Meta Opcodes
For lack of a better name, I'll call them that. These opcodes have no meaning at all on their own; they just change the meaning of the opcode that follows. Like the famous WIDE operator:
ADD R1, R2, R3
WIDE
ADD R1, R2, R3
E.g., in the second case the registers are 16 bits (so you can address more of them), in the first case only 8. Alternatively, you can do without such a meta opcode and have both an ADD and an ADD_WIDE opcode. Meta opcodes like WIDE avoid having a SUB_WIDE, MUL_WIDE, etc., as you can always prefix any other normal opcode with WIDE (always just one opcode). The disadvantage is that an opcode alone becomes meaningless; you always have to check whether the opcode before it was a meta opcode or not. Further, the VM must store extra state per thread (e.g. whether we are currently in wide mode) and clear that state again after the next instruction. Even CPUs have such opcodes (e.g. the x86 LOCK prefix).
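In an interpreter loop, such a prefix could be handled roughly as in this Python sketch (the opcode values, the three-register ADD, and the 8-vs-16-bit register numbers are invented for illustration):
WIDE, ADD = 0x00, 0x01           # invented opcode values

def execute(bytecode, regs):
    pc, wide = 0, False
    while pc < len(bytecode):
        op = bytecode[pc]
        pc += 1
        if op == WIDE:
            wide = True          # only changes how the next instruction is decoded
            continue
        width = 2 if wide else 1 # register numbers take 16 or 8 bits
        wide = False             # the prefix applies to exactly one instruction
        if op == ADD:
            r1, r2, r3 = (int.from_bytes(bytecode[pc + i*width : pc + (i+1)*width], "big")
                          for i in range(3))
            regs[r3] = regs[r1] + regs[r2]
            pc += 3 * width
        else:
            raise ValueError(f"unknown opcode {op:#x}")
    return regs

# WIDE ADD with 16-bit register indices 1, 2, 3:  regs[3] = regs[1] + regs[2]
print(execute(bytes([WIDE, ADD, 0, 1, 0, 2, 0, 3]), [0, 10, 20, 0]))   # [0, 10, 20, 30]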
How to Find a Good Trade-Off???
Of course, the more opcodes you have, the bigger the switches/dispatch tables become and the more bits you need to express these codes on disk or in memory (though you can perhaps store them more efficiently on disk, where the data doesn't have to be directly executable by the VM); the VM also becomes more complicated and has more lines of code. On the other hand, the opcodes become more powerful: you get closer to the point where every expression, even a complex one, ends up as a single opcode.
Choosing few opcodes makes the VM easy to code and, I guess, leads to very compact opcode encodings. On the other hand, it means you may need a very large number of instructions to perform a simple task, and every expression that is not used extremely often has to become a (native) function call of some kind, as no opcode exists for it.
I have read a lot about all kinds of VMs on the Internet, but no source really made a good and fair case for either way. Designing a VM is like designing a CPU: there are CPUs with few opcodes, which are fast, but you also need many instructions; and there are CPUs with many opcodes, some of them very slow, but you need far fewer of them to express the same piece of code. It looks like the "more opcodes are better" CPUs have won the consumer market, and the "fewer opcodes are better" ones survive only in parts of the server market or the supercomputer business. What about VMs?
To be honest, I think it's largely a matter of the purpose of the VM, similar to how the processor design is largely determined by how the processor is primarily meant to be used.
In other words, you'll preferably be able to determine common use case scenarios for your VM, so that you can establish features that are likely going to be required, and also establish those that are unlikely to be very commonly required.
Of course I do understand that you are probably envisioning an abstract, very generic virtual machine that can be used as the internal/backend implementation for other programming languages.
However, I feel, it's important to realize and to emphasize that there really is no such thing as a "generic ideal" implementation of anything, i.e. once you keep things generic and abstract you will inevitably face a situation where you need to make compromises.
Ideally, these compromises will be based on real life use scenarios for your code, so that these compromises are actually based on well-informed assumptions and simplifications that you can make without going out on a limb.
In other words, I would think about what are the goals for your VM?
How is it primarily going to be used in your vision?
What are the goals you want to achieve?
This will help you come up with requirements and make simplifications, so that you can design your instruction set based on reasonable assumptions.
If you expect your VM to be used primarily by programming languages for number crunching, you'll probably want a fairly powerful foundation of maths operations, providing lots of low-level primitives with support for wide data types.
If, on the other hand, your VM will serve as the backend for OO languages, you will want to look into optimizing the corresponding low-level operations (e.g. hashes/dictionaries).
In general, I would recommend keeping the instruction set as simple and intuitive as possible in the beginning, and only adding special instructions once you have proven that having them in place is indeed useful (i.e. via profiling and opcode dumps) and does yield a performance gain. So this will be largely determined by the very first "customers" your VM will have.
If you are really eager to research more involved approaches, you could even look into dynamically optimizing the instruction set at runtime, using pattern matching to find common sequences of opcodes in your bytecode, in order to derive more abstract implementations, so that you can transform your bytecode dynamically with custom, runtime-generated opcodes.
For software performance it's easier if all opcodes are the same length, so you can have one gigantic switch statement and not have to examine various option bits that might have been set by preceding modifier opcodes.
Two matters that I think you didn't ask about are ease of writing compilers that translate programming languages to your VM code and ease of writing interpreters that execute your VM code. Both of these are easier with fewer opcodes. (But not too few. For example if you omit a divide opcode then you get an opportunity to learn how to code good division functions. Good ones are far harder than simple ones.)
I prefer minimalistic instruction sets because they can be combined into one opcode. For example, an opcode consisting of two 4-bit instruction fields can be dispatched with a 256-entry jump table. Since dispatch overhead is the main bottleneck in interpretation, performance increases by a factor of roughly two, because only every second instruction needs to be dispatched. One way to implement a minimalistic but effective instruction set would be an accumulator/store design.
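Here is a rough Python sketch of that pairing idea (the 4-bit operations and the accumulator model are invented for illustration); every entry of the 256-entry table runs two primitive operations per dispatch:
# Primitive 4-bit operations of a hypothetical accumulator machine.
def nop(vm): pass
def inc(vm): vm["acc"] += 1
def dec(vm): vm["acc"] -= 1
def clr(vm): vm["acc"] = 0

PRIMS = [nop, inc, dec, clr] + [nop] * 12          # 16 possible 4-bit operations

def make_pair(hi, lo):
    def pair(vm):                                  # run primitive hi, then primitive lo
        hi(vm)
        lo(vm)
    return pair

DISPATCH = [make_pair(PRIMS[b >> 4], PRIMS[b & 0xF]) for b in range(256)]

def run(bytecode, vm):
    for byte in bytecode:                          # one dispatch per two operations
        DISPATCH[byte](vm)
    return vm

print(run(bytes([0x11, 0x12]), {"acc": 0}))        # inc,inc then inc,dec -> acc == 2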
Fewer opcodes, atomic in nature.
But if a combination of some opcodes is used frequently, add it as a single instruction.
For example, a lot of higher-level programming languages have the simpler "if" and "goto" constructs, yet they also have the composed "while", "for", "do-while" or "repeat-until" constructs, built from the simpler ones.
