How are Opcodes & Operands Defined in a CPU? - cpu

Extensive searching has sent me in a loop over the course of 3 days, so I'm depending on you guys to help me catch a break.
Why exactly does one 8-bit sequence of high's and low's perform this action, and 8-bit sequence performs that action.
My intuition tells me that the CPU's circuitry hard-wired one binary sequence to do one thing, and another to do another thing. That would mean different Processor's with potentially different chip circuitry wouldn't define one particular binary sequence as the same action as another?
Is this why we have assembly? I need someone to confirm and/or correct my hypothesis!

Opcodes are not always 8 bits but yes, it is hardcoded/wired in the logic to isolate the opcode and then send you down a course of action based on that. Think about how you would do it in an instruction set simulator, why would logic be any different? Logic is simpler than software languages, there is no magic there. ONE, ZERO, AND, OR, NOT thats as complicated as it gets.
Along the same lines if I was given an instruction set document and you were given an instruction set document and told to create a processor or write an instruction set simulator. Would we produce the exact same code? Even if the variable names were different? No. Ideally we would have programs that are functionally the same, they both parse the instruction and execute it. Logic is no different you give the spec to two engineers you might get two different processors that functionally are the same, one might perform better, etc. Look at the long running processor families, x86 in particular, they re-invent that every couple-three years being instruction set compatible for the legacy instructions while sometimes adding new instructions. Same for ARM and others.
And there are different instruction sets ARM is different from x86 is different from MIPS, the opcodes and/or bits you examine in the instruction vary, for none of these can you simply look at 8 bits, each you have some bits then if that is not enough to uniquely identify the instruction/operation then you need to examine some more bits, where those bits are what the rules are are very specific to each architecture. Otherwise what would be the point of having different names for them if they were the same.
And this information was out there you just didnt look in the right places, there are countless open online courses on the topic, books that google should hit some pages on, as well as open source processor cores you can look at and countless instruction set simulators with source code.

Related

Does it cost significant resources for a modern CPU to keep flags updated?

As I understand it, on a modern out of order CPU, one of the most expensive things is state, because that state has to be tracked in multiple versions, kept up-to-date across many instructions etc.
Some instruction sets like x86 and ARM make extensive use of flags, which were introduced when the cost model was not what what it is today, and the flags only cost a few logic gates. Things like every arithmetic instruction setting flags to detect zero, carry and overflow.
Are these particularly expensive to keep updated on a modern out of order implementation? Such that e.g. an ADD instruction updates the carry flag, and this must be tracked because although it will probably never be used, it is possible that some other instruction could use it N instructions later, with no fixed upper bound on N?
Are integer operations like addition and subtraction cheaper on instruction set architectures like MIPS that do not have these flags?
Various aspects of this are not very publicly known, so I will try to separate definitely known things from reasonable guesses and conjecture.
An approach has been to extend the (physical) integer registers (whether they take the form of a physical register file [eg P4 and SandyBridge+] or of results-in-ROB [eg P3]) with the flags that were produced by the operation that also produced the associated integer result. That's only about the arithmetic flags (sometimes AFLAGS, not to be confused with EFLAGS), but I don't think the "weird flags" are the focus of this question. Interestingly there is a patent[1] that hints at storing more than just the 6 AFLAGS themselves, putting some "combination flags" in there as well, but who know whether that was really done - most sources say the registers are extended by 6 bits, but AFAIK we (the public) don't really know. Lumping the integer result and associated flags together is described in for example this patent[2], which is primarily about preventing a certain situation where the flags might accidentally no longer be backed by any physical register. Aside from such quirks, during normal operation it has the nice effect of only needing to allocate 1 register for an arithmetic operation, rather than a separate main-result and flags-result, so renaming is normally not made much worse by the existence of the flags. Additionally, either the register alias table needs at least one more slot to keep track of which integer register contains the latest flags, or a separate flag-renaming-state buffer keeps track of the latest speculative flag state ([2] suggests Intel chose to separate them, which may simplify the main RAT but they don't go into such details). More slots may be used[3] to efficiently implement instructions which only update a subset of the flags (NetBurst™ famously lacked this, resulting in the now-stale advice to favour add over inc). Similarly, the non-speculative architectural state (whether it would be part of the retirement register file or be separate-but-similar again is not clear) needs at least one such slot.
A separate issue is computing the flags in the first place. [1] suggests separating flag generation from the main ALU simplifies the design. It's not clear to what degree they would be separated: the main ALU has to compute the Adjust and Sign flags anyway, and having an adder output a carry out the top is not much to ask (less than recomputing it from nothing). The overflow flag only takes an extra XOR gate to combine the carry into the top bit with the carry out of the top bit. The Zero flag and Parity flag are not for free though (and they depend on the result, not on the calculation of the result), if there is partial separation it would make sense that those would be computed separately. Perhaps it really is all separate. In NetBurst™, flag calculation took an extra half-cycle (the ALU was double-pumped and staggered)[4], but whether that means all flags are computed separately or a subset of them (or even a superset as [1] hinted) is not clear - the flags result is treated as monolithic so latency tests cannot distinguish whether a flag is computed in the third half-cycle by the flags unit or just handed to the flags unit by the ALU. In any case, typical ALU operations could be executed back-to-back, even if dependent (meaning that the high half of the first operation and the low half of the second operation ran in parallel), the delayed computation of the flags did not stand in the way of that. As you might expect though, ADC and SBB were not so efficient on NetBurst, but there may be other reasons for that too (for some reason a lot of µops are involved).
Overall I would conclude that the existence of arithmetic flags costs significant engineering resources to prevent them from having a significant performance impact, but that effort is also effective, so a significant impact is avoided.

Do computer/cpu really understand(binary)?

I have read and heard from many people,books, sites computer understand nothing but only binary!But they dont tell how computer/cpu understand binary.So I was thinking how can computer/cpu can understand?Cause as of my little knowledge and thinking, to think or understand something one must need a brain and off-course a life but cpu lack both.
*Additionally as cpu run by electricity, so my guess is cpu understand nothing,not even binary rather there are some natural rules for electricity or something like that and we human*(or who invented computer) found it(may be if we flow current in a certain combination or in certain number of circuits we get a row light or like so, who know!) and also a way to manipulate the current flow/straight light to make with it, what we need i.e different letters(with straight three light or magnetic wave occurred from the electricity with the help of manipulation we can have letter 'A') means computer/cpu dont understanad anything.
Its just my wild guess. I hope someone could help me to have a clear idea about if cpu really understand anything(binary)?And if, then how. Anyone detailed answer,article or book would be great.Thanks in advance.
From HashNode article "How does a computer machine understand 0s and 1s?"
A computer doesn't actually "understand" anything. It merely provides you with a way of information flow — input to output. The decisions to transform a given set of inputs to an output (computations) are made using boolean expressions (expressed using specific arrangements of logic gates).
At the hardware level we have bunch of elements called transistors (modern computers have billions of them and we are soon heading towards an era where they would become obsolete). These transistors are basically switching devices. Turning ON and OFF based on supply of voltage given to its input terminal. If you translate the presence of voltage at the input of the transistor as 1 and absence of voltage as 0 (you can do it other way too). There!! You have the digital language.
"understand" no. Computers don't understand anything, they're just machines that operate according to fixed rules for moving from one state to another.
But all these states are encoded in binary.
So if you anthropomorphise the logical (architectural) or physical (out-of-order execution, etc. etc.) operation of a computer, you might use the word "understand" as a metaphor for "process" / "operate in".
Taking this metaphor to the extreme, one toy architecture is called the Little Man Computer, LMC, named for the conceit / joke idea that there is a little man inside the vastly simplified CPU actually doing the binary operations.
The LMC model is based on the concept of a little man shut in a closed mail room (analogous to a computer in this scenario). At one end of the room, there are 100 mailboxes (memory), numbered 0 to 99, that can each contain a 3 digit instruction or data (ranging from 000 to 999).
So actually, LMC is based around a CPU that "understands" decimal, unlike a normal computer.
The LMC toy architecture is terrible to program for except for the very simplest of programs. It doesn't support left/right bit-shifts or bitwise binary operations, which makes sense because it's based on decimal not binary. (You can of course double a number = left shift by adding to itself, but right shift needs other tricks.)

Can Registers inside a CPU do Arithmetics

I read in many detailed articles that Data from the Registers are used as Operands for the ALU to add two 32-bit integers, and this is only one small part of what the ALU can actually do.
However I also read the Register can even do arithmetic too? The difference between the two is quite blurred to me, what is the clear cut difference between a Register and the actual ALU component?
I know ALU doesn't store values, rather it receives it, and is instructed to simply do the Logic part, but the Register can both store and do general purpose stuff?
If the latter is true, then when does one use the ALU, and when does when use the General Purpose Registers?
Registers can't do arithmetic. A "register" is just a term for a place where you can stick a value. You can do arithmetic on the values stored in registers and have the results saved back into the register. This arithmetic would be done by the "ALU," which is the generic term for the portion of the processor that does number-crunching.
If you're still confused by something specific that you read, please quote it here or post a citation and someone can try to clarify. Note that "register" and "ALU" are very generic terms and are implemented and used differently in every architecture.
Although it is true that registers, properly speaking, don't do arithmetic, it IS possible to construct a circuit which both stores a number and, when a particular input line is set, increments that number. One could interpret this as a register which does a (very limited amount of) arithmetic. As it happens, there is generally no use for such a circuit in a general-purpose CPU (except maybe as the Program Counter), but such circuits could be useful in very simple digital controllers.
[note: historically, the first such circuits were made from vacuum tubes and used in Geiger Counters to count how many radioactive decay events occurred over short periods of time]
Registers don't do arithmetic. Modern cores have several “execution units” or “functional units” rather than an “ALU“. A reasonable rule of thumb is to discard any text (online or hardcopy) that speaks in terms of ALUs in the context of mainstream CPUs. “ALU” is still meaningful in the context of embedded systems, where a µC might actually have an ALU, but otherwise it's essentially an anachronism. If you see it, usually all it tells you is that the material is seriously out-of-date.

Building a custom machine code from the ground up

I have recently begun working with logic level design as an amateur hobbyist but have now found myself running up against software, where I am much less competent. I have completed designing a custom 4 bit CPU in Logisim loosely based on the paper "A Very Simple Microprocessor" by Etienne Sicard. Now that it does the very limited functions that I've built into it (addition, logical AND, OR, and XOR) without any more detectable bugs (crossing fingers) I am running into the problem of writing programs for it. Logisim has the functionality of importing a script of Hex numbers into a RAM or ROM module so I can write programs for it using my own microinstruction code, but where do I start? I'm quite literally at the most basic possible level of software design and don't really know where to go from here. Any good suggestions on resources for learning about this low level of programming or suggestions on what I should try from here? Thanks much in advance, I know this probably isn't the most directly applicable question ever asked on this forum.
I'm not aware of the paper you mention. But if you have designed your own custom CPU, then if you want to write software for it, you have two choices: a) write it in machine code, or b) write your own assembler.
Obviously I'd go with b. This will require that you shift gear a bit and do some high-level programming. What you are aiming to write is an assembler program that runs on a PC, and converts some simple assembly language into your custom machine code. The assembler itself will be a high-level program and as such, I would recommend writing it in a high-level programming language that is good at both string manipulation and binary manipulation. I would recommend Python.
You basically want your assembler to be able to read in a text file like this:
mov a, 7
foo:
mov b, 20
add a, b
cmp a, b
jg foo
(I just made this program up; it's nonsense.)
And convert each line of code into the binary pattern for that instruction, outputting a binary file (or perhaps a hex file, since you said your microcontroller can read in hex values). From there, you will be able to load the program up onto the CPU.
So, I suggest you:
Come up with (on paper) an assembly language that is a simple written representation for each of the opcodes your machine supports (you may have already done this),
Learn simple Python,
Write a Python script that reads one line at a time (sys.stdin.readline()), figures out which opcode it is and what values it takes, and outputs the corresponding machine code to stdout.
Write some assembly code in your assembly language that will run on your CPU.
Sounds like a fun project.
I have done something similar that you might find interesting. I also have created from scratch my own CPU design. It is an 8-bit multi-cycle RISC CPU based on Harvard architecture with variable length instructions.
I started in Logisim, then coded everything in Verilog, and I have synthesized it in an FPGA.
To answer your question, I have done a simple and rudimentary assembler that translates a program (instructions, ie. mnemonics + data) to the corresponding machine language that can then be loaded into the PROG memory. I've written it in shell script and I use awk, which is what I was confortable with.
I basically do two passes: first translate mnemonics to their corresponding opcode and translate data (operands) into hex, here I keep track of all the labels addresses. second pass will replace all labels with their corresponding address.
(labels and addresses are for jumps)
You can see all the project, including the assembler, documented here: https://github.com/adumont/hrm-cpu
Because your instruction set is so small, and based on the thread from the mguica answer, I would say the next step is to continue and/or fully test your instruction set. do you have flags? do you have branch instructions. For now just hand generate the machine code. Flags are tricky, in particular the overflow (V) bit. You have to examine carry in and carry out on the msbit adder to get it right. Because the instruction set is small enough you can try the various combinations of back to back instructions and followed by or and followed by xor and followed by add, or followed by and or followed by xor, etc. And mix in the branches. back to flags, if the xor and or for example do not touch carry and overflow then make sure you see carry and overflow being a zero and not touched by logical instructions and carry and overflow being one and not touched, and also independently show carry and overflow are separate, one on one off, not touched by logical, etc. make sure all the conditional branches only operate on that one condition, lead into the various conditional branches with flag bits that are ignored in both states insuring that the conditional branch ignores them. Also verify that if the conditional branch is not supposed to modify them that it doesnt. likewise if the condition doesnt cause a branch that the conditional flags are not touched...
I like to use randomization but it may be more work than you are after. I like to independently develop a software simulator of the instruction set, which I find easier to use that the logic also sometimes easier to use in batch testing. you can then randomize some short list of instructions, varying the instruction and the registers, naturally test the tester by hand computing some of the results, both state of registers after test complete and state of flag bits. Then make that randomized list longer, at some point you can take a long instruction list and run it on the logic simulator and see if the logic comes up with the same register results and flag bits as the instruction set simulator, if they vary figure out why. If the do not try another random sequence, and another. Filling registers with prime numbers before starting the test is a very good idea.
back to individual instruction testing and flags go through all the corner cases 0xFFFF + 0x0000 0xFFFF+1, things like that places just to the either side of and right on operands and results that are one count away from where a flag changes at the point where the flag changes and just the other side of that. for the logicals for example if they use the zero flag, then have various data patterns that test results that are on either side of and at zero 0x0000, 0xFFFF 0xFFFE 0x0001 0x0002, etc. Probably a walking ones result as well 0x0001 result 0x0002, ox0004, etc.
hopefully I understood your question and have not pointed out the obvious or what you have already done thus far.

Minimal instruction set to solve any problem with a computer program

Years ago, I have heard that someone was about to demonstrate that every computer program could be solved with just three instructions:
Assignment
Conditional
Loop
Please I would like to hear your opinion. I mean representing any algorithm as a computer program. Do you agree with this?
No need. The minimal theoretical computer needs just one instruction. They are called One Instruction Set Computers (OISC for short, kinda like the ultimate RISC).
There are two types. The first is a theoretically "pure" one instruction machine in which the instruction really works like a regular instruction in normal CPUs. The instruction is usually:
subtract and branch if less than zero
or variations thereof. The wikipedia article have examples of how this single instruction can be used to write code that emulates other instructions.
The second type is not theoretically pure. It is the transfer triggered architecture (wikipedia again, sorry). This family of architectures are also known as move machines and I have designed and built some myself.
Some consider move machines cheating since the machine actually have all the regular instructions only that they are memory mapped instead of being part of the opcode. But move machines are not merely theoretical, they are practical (like I said, I've built some myself). There is even a commercially available family of CPUs built by Maxim: the MAXQ. If you look at the MAXQ instruction set (they call it transfer set since there is really only one instruction, I usually call it register set) you will see that MAXQ assembly looks rather like a standard accumulator based architecture.
This is a consequence of Turing Completeness, which is something that was established many decades ago.
Alan Turing, the famous computer scientist, proved that any computable function could be computed using a Turing Machine. A Turing machine is a very simple theoretical device which can do only a few things. It can read and write to a tape (i.e. memory), maintain an internal state which is altered by the contents read from memory, and use the internal state and the last read memory cell to determine which direction to move the tape before reading the next memory cell.
The operations of assignment, conditional, and loop are sufficient to simulate a Turing Machine. Reading and writing memory and maintaining state requires assignment. Changing the direction of the tape based on state and memory contents require conditionals and loops. "Loops" in fact are a bit more high-level than what is actually required. All that is really required is that program flow can jump backwards somehow. This implies that you can create loops if you want to, but the language does not need to have an explicit loop construct.
Since these three operations allow simulation of a Turing Machine, and a Turing Machine has been proven to be able to compute any computable function, it follows that any language which provides these operations is also able to compute any computable function.
Edit: And, as other answerers pointed out, these operations do not need to be discrete. You can craft a single instruction which does all three of these things (assign, compare, and branch) in such a way that it can simulate a Turing machine all by itself.
The minimal set is a single command, but you have to choose a fitting one, for example - One instruction set computer
When I studied, we used such a "computer" to calculate factorial, using just a single instruction:
SBN - Subtract and Branch if Negative:
SBN A, B, C
Meaning:
if((Memory[A] -= Memory[B]) < 0) goto C
// (Wikipedia has a slightly different definition)
Notable one instruction set computer (OSIC) implementations
This answer will focus on interesting implementations of single instruction set CPUs, compilers and assemblers.
movfuscator
https://github.com/xoreaxeaxeax/movfuscator
Compiles C code using only mov x86 instructions, showing in a very concrete way that a single instruction suffices.
The Turing completeness seems to have been proven in a paper: https://www.cl.cam.ac.uk/~sd601/papers/mov.pdf
subleq
https://esolangs.org/wiki/Subleq:
https://github.com/hasithvm/subleq-verilog Verilog, Xilinx ISE.
https://github.com/purisc-group/purisc Verilog and VHDL, Altera. Maybe that project has a clang backend, but I can't use it: https://github.com/purisc-group/purisc/issues/5
http://mazonka.com/subleq/sqasm.cpp | http://mazonka.com/subleq/sqrun.cpp C++-based assembler and emulator.
See also
What is the minimum instruction set required for any Assembly language to be considered useful?
https://softwareengineering.stackexchange.com/questions/230538/what-is-the-absolute-minimum-set-of-instructions-required-to-build-a-turing-comp/325501
In 1964, Bohm and Jacopini published a paper in which they demonstrated that all programs could be written in terms of only three control structures:
the sequence structure,
the selection structure
and the repetition structure.
Programmers using Haskell might argue that you only need the Contional and Loop because assignments, and mutable state, don't exist in Haskell.

Resources