VM Design: More opcodes or less opcodes? What is better? - performance

Don't be shocked. This is a lot of text, but I'm afraid that without giving some detailed information I cannot really show what this is all about (and might get a lot of answers that don't really address my question). And this is definitely not an assignment (as someone ridiculously claimed in a comment).
Prerequisites
Since this question can probably not be answered at all unless at least some prerequisites are set, here are the prerequisites:
The Virtual Machine code shall be interpreted. It is not forbidden that there may be a JIT compiler, but the design should target an interpreter.
The VM shall be register based, not stack based.
The answer may neither assume that there is a fixed set of registers nor that there is an unlimited number of them, either one may be the case.
Further we need a better definition of "better". There are a couple of properties that must be considered:
1. The storage space for the VM code on disk. Of course you could always scrap all optimizations here and just compress the code, but this has a negative effect on (2).
2. Decoding speed. The best way to store the code is useless if it takes too long to transform it into something that can be directly executed.
3. The storage space in memory. This code must be directly executable, either with or without further decoding; but if there is further decoding involved, it is done during execution, each time the instruction is executed (decoding done only once when loading the code counts towards item 2).
4. The execution speed of the code (taking common interpreter techniques into account).
5. The VM complexity and how hard it is to write an interpreter for it.
6. The amount of resources the VM needs for itself. (It is not a good design if the code the VM runs is 2 KB in size and executes faster than the wink of an eye, yet the VM needs 150 MB to do this and its start-up time far exceeds the run time of the code it executes.)
Now some examples of what I actually mean by more or fewer opcodes. It may look like the number of opcodes is fixed, as you need one opcode per operation. However, it's not that easy.
Multiple Opcodes for the Same Operation
You can have an operation like
ADD R1, R2, R3
adding the values of R1 and R2, writing the result to R3. Now consider the following special cases:
ADD R1, R2, R2
ADD R1, 1, R1
These are common operations you'll find in a lot of applications. You can express them with the already existing opcode (unless you need a different one because the last one has an int value instead of a register). However, you could also create special opcodes for these:
ADD2 R1, R2
INC R1
Same as before. Where's the advantage? ADD2 only needs two arguments instead of 3; INC even needs only a single one. So this could be encoded more compactly on disk and/or in memory. Since it is also easy to transform either form into the other, the decoding step could translate between both ways of expressing these statements. I'm not sure how much either form influences execution speed, though.
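The expansion described above is easy to sketch. Here is a minimal Python illustration (all opcode names and encodings are invented for this example, not taken from any real VM) of a loader canonicalizing the compact forms back into the generic three-operand ADD:

```python
# Hypothetical compact opcodes (names/encodings invented for illustration).
ADD, ADD2, INC = 0x01, 0x02, 0x03

def expand(instr):
    """Expand a compact instruction tuple into the generic (ADD, src1, src2, dst) form."""
    op = instr[0]
    if op == ADD:                 # ADD r1, r2, r3 -> already generic
        return instr
    if op == ADD2:                # ADD2 r1, r2  ==  ADD r1, r2, r2
        _, r1, r2 = instr
        return (ADD, r1, r2, r2)
    if op == INC:                 # INC r1  ==  ADD r1, #1, r1 (immediate 1; a real VM
        _, r1 = instr             # would need a separate register/immediate ADD form)
        return (ADD, r1, 1, r1)
    raise ValueError("unknown opcode")
```

Going the other direction, contracting generic instructions into the compact forms whenever the operand pattern matches, is the mirror image of this function and could run once when writing the code to disk.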
Combining Two Opcodes Into a Single One
Now let's assume you have an ADD_RRR (R for register) and a LOAD to load data into a register.
LOAD value, R2
ADD_RRR R1, R2, R3
You can have these two opcodes and always use constructs like this throughout your code... or you can combine them into a single new opcode, named ADD_RMR (M for memory)
ADD_RMR R1, value, R3
Data Types vs Opcodes
Assume you have 16 Bit integer and 32 Bit integer as native types. Registers are 32 Bit so either data type fits. Now when you add two registers, you could make the data type a parameter:
ADD int16, R1, R2, R3
ADD int32, R1, R2, R3
The same is true for signed and unsigned integers, for example. That way ADD can be a short opcode, one byte, and then you have another byte (or maybe just 4 bits) telling the VM how to interpret the registers (do they hold 16-bit or 32-bit values?). Or you can scrap type encoding and instead have two opcodes:
ADD16 R1, R2, R3
ADD32 R1, R2, R3
Some may say both are exactly the same: just interpreting the first way as 16-bit opcodes would work. Yes, but a very naive interpreter might look quite different. E.g. if it has one function per opcode and dispatches using a switch statement (not the best way to do it: function call overhead, and the switch statement may not be optimal either, I know), the two opcodes could look like this:
case ADD16: add16(p1, p2, p3); break; // pX pointer to register
case ADD32: add32(p1, p2, p3); break;
and each function is centered around a certain kind of add. The second one though may look like this:
case ADD: add(type, p1, p2, p3); break;
// ...
// and the function
void add(enum Type type, Register p1, Register p2, Register p3)
{
    switch (type) {
    case INT16: // ...
    case INT32: // ...
    }
}
This adds a sub-switch to the main switch, or a sub-dispatch table to the main dispatch table. Of course an interpreter can be written either way regardless of whether types are explicit, but one of the two will feel more natural to developers depending on the opcode design.
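To make the contrast concrete, here is a minimal Python sketch of both dispatch styles (opcodes, register model, and type names all invented for illustration), with dicts standing in for the switch statements or dispatch tables:

```python
# Sketch (invented opcodes): two dispatch styles for a typed ADD.
regs = [0] * 8  # toy register file

# Style 1: one handler per typed opcode (flat dispatch).
def add16(a, b, c): regs[c] = (regs[a] + regs[b]) & 0xFFFF
def add32(a, b, c): regs[c] = (regs[a] + regs[b]) & 0xFFFFFFFF
dispatch1 = {"ADD16": add16, "ADD32": add32}

# Style 2: one generic ADD handler with a type sub-dispatch.
MASKS = {"int16": 0xFFFF, "int32": 0xFFFFFFFF}
def add(ty, a, b, c): regs[c] = (regs[a] + regs[b]) & MASKS[ty]
dispatch2 = {"ADD": add}
```

Both compute the same results; the difference is purely in whether the type distinction is resolved in the outer table (more opcodes) or in an inner lookup (fewer, fatter opcodes).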
Meta Opcodes
For lack of a better name, that's what I'll call them. These opcodes have no meaning at all on their own; they just change the meaning of the opcode that follows. Like the famous WIDE opcode:
ADD R1, R2, R3
WIDE
ADD R1, R2, R3
E.g. in the second case the registers are 16 bits (so you can address more of them), in the first one only 8. Alternatively, you could have no such meta opcode and instead have an ADD and an ADD_WIDE opcode. Meta opcodes like WIDE avoid the need for a SUB_WIDE, MUL_WIDE, etc., as you can prepend WIDE to any other normal opcode (always just the one meta opcode). The disadvantage is that an opcode alone becomes meaningless: you must always check whether the opcode before it was a meta opcode. Further, the VM must store extra state per thread (e.g. whether we are currently in wide mode) and clear that state again after the next instruction. Even CPUs have something like this (e.g. the x86 LOCK prefix).
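A sketch of how an interpreter loop might handle such a prefix (opcode values and operand encoding invented for illustration): the prefix only arms a flag that the very next instruction consumes, and the flag is cleared afterwards.

```python
# Sketch: interpreter loop with a WIDE prefix (opcode values invented).
WIDE, ADD, HALT = 0xFF, 0x01, 0x00

def run(code, regs):
    pc, wide = 0, False
    while True:
        op = code[pc]
        if op == WIDE:        # meta opcode: only arms the flag for the next instruction
            wide = True
            pc += 1
            continue
        if op == HALT:
            return regs
        if op == ADD:
            if wide:          # 16-bit register indices, two bytes each
                a = code[pc + 1] << 8 | code[pc + 2]
                b = code[pc + 3] << 8 | code[pc + 4]
                c = code[pc + 5] << 8 | code[pc + 6]
                pc += 7
            else:             # 8-bit register indices, one byte each
                a, b, c = code[pc + 1], code[pc + 2], code[pc + 3]
                pc += 4
            regs[c] = regs[a] + regs[b]
        else:
            raise ValueError("unknown opcode")
        wide = False          # the prefix applies to exactly one instruction
```

Note how `wide` is exactly the extra per-thread state mentioned above, and how the ADD handler alone cannot know its own operand width without it.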
How to Find a Good Trade-Off?
Of course, the more opcodes you have, the bigger the switches/dispatch tables become and the more bits you need to express the codes on disk or in memory (though you can perhaps store them more efficiently on disk, where the data doesn't have to be directly executable by the VM). The VM also becomes more complicated, with more lines of code. On the other hand, the more powerful the opcodes are, the closer you get to the point where every expression, even a complex one, ends up as a single opcode.
Choosing few opcodes makes the VM easy to code and, I guess, leads to very compact opcodes. On the other hand, it means you may need a very large number of instructions to perform a simple task, and every expression that isn't used extremely often will have to become a (native) function call of some kind, as no opcode exists for it.
I read a lot about all kinds of VMs on the Internet, but no source really made a good and fair trade-off going either way. Designing a VM is like designing a CPU: there are CPUs with few opcodes; they are fast, but you also need many instructions. And there are CPUs with many opcodes, some very slow, but you need far fewer of them to express the same piece of code. It looks like the "more opcodes are better" CPUs have totally won the consumer market and the "less opcodes are better" ones can only survive in some parts of the server market or supercomputer business. What about VMs?

To be honest, I think it's largely a matter of the purpose of the VM, similar to how the processor design is largely determined by how the processor is primarily meant to be used.
In other words, you'll preferably be able to determine common use case scenarios for your VM, so that you can establish features that are likely going to be required, and also establish those that are unlikely to be very commonly required.
Of course, I understand that you are probably envisioning an abstract, very generic virtual machine that can be used as the internal/backend implementation for other programming languages?
However, I feel it's important to realize and to emphasize that there really is no such thing as a "generic ideal" implementation of anything, i.e. once you keep things generic and abstract you will inevitably face a situation where you need to make compromises.
Ideally, these compromises will be based on real life use scenarios for your code, so that these compromises are actually based on well-informed assumptions and simplifications that you can make without going out on a limb.
In other words, I would think about what are the goals for your VM?
How is it primarily going to be used in your vision?
What are the goals you want to achieve?
This will help you come up with requirements and help you make simplifications, so that you can design your instruction set based on reasonable assumptions.
If you expect your VM to be primarily used by programming languages for number crunching, you'll probably want a fairly powerful foundation of maths operations, providing lots of low-level primitives with support for wide data types.
If, on the other hand, your VM will serve as the backend for OO languages, you will want to look into optimizing the corresponding low-level operations (i.e. hashes/dictionaries).
In general, I would recommend keeping the instruction set as simple and intuitive as possible in the beginning, and only adding special instructions once you have proven that having them in place is indeed useful (i.e. via profiling & opcode dumps) and does cause a performance gain. So this will be largely determined by the very first "customers" your VM will have.
If you are really eager to research more involved approaches, you could even look into dynamically optimizing the instruction set at runtime, using pattern matching to find common sequences of opcodes in your bytecode, in order to derive more abstract implementations, so that you can transform your bytecode dynamically with custom, runtime-generated opcodes.

For software performance it's easier if all opcodes are the same length, so you can have one gigantic switch statement and not have to examine various option bits that might have been set by preceding modifier opcodes.
Two matters that I think you didn't ask about are ease of writing compilers that translate programming languages to your VM code and ease of writing interpreters that execute your VM code. Both of these are easier with fewer opcodes. (But not too few. For example if you omit a divide opcode then you get an opportunity to learn how to code good division functions. Good ones are far harder than simple ones.)

I prefer minimalistic instruction sets because instructions can be combined into one opcode. For example, an opcode consisting of two 4-bit instruction fields can be dispatched with a 256-entry jump table. As dispatch overhead is the main bottleneck in interpretation, performance increases by a factor of roughly two, because only every second instruction needs to be dispatched. One way to implement a minimalistic but effective instruction set would be an accumulator/store design.
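The packing scheme described here could look roughly like this in Python (handlers and opcode assignments are invented for illustration; a real interpreter would use computed goto or function pointers in C, but the structure is the same):

```python
# Sketch: two 4-bit instructions packed per byte, dispatched through one
# 256-entry table, so only one lookup is needed per instruction pair.
def nop(vm): pass
def inc(vm): vm["acc"] += 1
def dbl(vm): vm["acc"] *= 2

HANDLERS = [nop, inc, dbl] + [nop] * 13          # 16 possible 4-bit opcodes
TABLE = [(HANDLERS[b >> 4], HANDLERS[b & 0x0F])  # 256 pre-decoded handler pairs
         for b in range(256)]

def run(code):
    vm = {"acc": 0}
    for byte in code:
        first, second = TABLE[byte]              # one dispatch per two instructions
        first(vm)
        second(vm)
    return vm["acc"]
```

The table is built once; at run time the high and low nibbles never have to be extracted or dispatched separately, which is where the claimed factor-of-two saving comes from.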

Fewer opcodes, atomic in nature.
But if a combination of some opcodes is used frequently, add it as a single instruction.
For example, a lot of higher-level PLs have the simpler "if" and "goto" instructions, yet they also have the composed "while", "for", "do-while" or "repeat-until" constructs, built from the previous instructions.

Related

Does it cost significant resources for a modern CPU to keep flags updated?

As I understand it, on a modern out of order CPU, one of the most expensive things is state, because that state has to be tracked in multiple versions, kept up-to-date across many instructions etc.
Some instruction sets, like x86 and ARM, make extensive use of flags, which were introduced when the cost model was not what it is today and the flags only cost a few logic gates. Things like every arithmetic instruction setting flags to detect zero, carry and overflow.
Are these particularly expensive to keep updated on a modern out of order implementation? Such that e.g. an ADD instruction updates the carry flag, and this must be tracked because although it will probably never be used, it is possible that some other instruction could use it N instructions later, with no fixed upper bound on N?
Are integer operations like addition and subtraction cheaper on instruction set architectures like MIPS that do not have these flags?
Various aspects of this are not very publicly known, so I will try to separate definitely known things from reasonable guesses and conjecture.
An approach has been to extend the (physical) integer registers (whether they take the form of a physical register file [eg P4 and SandyBridge+] or of results-in-ROB [eg P3]) with the flags that were produced by the operation that also produced the associated integer result. That's only about the arithmetic flags (sometimes AFLAGS, not to be confused with EFLAGS), but I don't think the "weird flags" are the focus of this question. Interestingly there is a patent[1] that hints at storing more than just the 6 AFLAGS themselves, putting some "combination flags" in there as well, but who knows whether that was really done - most sources say the registers are extended by 6 bits, but AFAIK we (the public) don't really know.
Lumping the integer result and associated flags together is described in for example this patent[2], which is primarily about preventing a certain situation where the flags might accidentally no longer be backed by any physical register. Aside from such quirks, during normal operation it has the nice effect of only needing to allocate 1 register for an arithmetic operation, rather than a separate main-result and flags-result, so renaming is normally not made much worse by the existence of the flags.
Additionally, either the register alias table needs at least one more slot to keep track of which integer register contains the latest flags, or a separate flag-renaming-state buffer keeps track of the latest speculative flag state ([2] suggests Intel chose to separate them, which may simplify the main RAT, but they don't go into such details). More slots may be used[3] to efficiently implement instructions which only update a subset of the flags (NetBurst™ famously lacked this, resulting in the now-stale advice to favour add over inc). Similarly, the non-speculative architectural state (whether it would be part of the retirement register file or be separate-but-similar again is not clear) needs at least one such slot.
A separate issue is computing the flags in the first place. [1] suggests separating flag generation from the main ALU simplifies the design. It's not clear to what degree they would be separated: the main ALU has to compute the Adjust and Sign flags anyway, and having an adder output a carry out the top is not much to ask (less than recomputing it from nothing). The overflow flag only takes an extra XOR gate to combine the carry into the top bit with the carry out of the top bit. The Zero flag and Parity flag are not for free though (and they depend on the result, not on the calculation of the result), if there is partial separation it would make sense that those would be computed separately. Perhaps it really is all separate. In NetBurst™, flag calculation took an extra half-cycle (the ALU was double-pumped and staggered)[4], but whether that means all flags are computed separately or a subset of them (or even a superset as [1] hinted) is not clear - the flags result is treated as monolithic so latency tests cannot distinguish whether a flag is computed in the third half-cycle by the flags unit or just handed to the flags unit by the ALU. In any case, typical ALU operations could be executed back-to-back, even if dependent (meaning that the high half of the first operation and the low half of the second operation ran in parallel), the delayed computation of the flags did not stand in the way of that. As you might expect though, ADC and SBB were not so efficient on NetBurst, but there may be other reasons for that too (for some reason a lot of µops are involved).
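The flag relationships mentioned above (carry out of the top bit, overflow as the XOR of the carry into and out of the top bit, zero depending on the final result) can be checked with a small model of an 8-bit add. This is purely illustrative, not a description of any particular CPU's circuitry:

```python
# Model of 8-bit add flag generation (illustrative only).
def add8(a, b):
    full = a + b
    result = full & 0xFF
    carry = full >> 8                               # carry out of bit 7
    carry_in_top = ((a & 0x7F) + (b & 0x7F)) >> 7   # carry into bit 7
    overflow = carry ^ carry_in_top                 # signed overflow = XOR of the two carries
    zero = int(result == 0)                         # depends on the result, not the carries
    return result, carry, overflow, zero
```

For instance, 0x7F + 1 sets overflow but not carry (signed 127 + 1 wraps), while 0xFF + 1 sets carry and zero but not overflow (signed -1 + 1 = 0), matching the "extra XOR gate" description above.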
Overall I would conclude that the existence of arithmetic flags costs significant engineering resources to prevent them from having a significant performance impact, but that effort is also effective, so a significant impact is avoided.

How are Opcodes & Operands Defined in a CPU?

Extensive searching has sent me in a loop over the course of 3 days, so I'm depending on you guys to help me catch a break.
Why exactly does one 8-bit sequence of highs and lows perform this action, while another 8-bit sequence performs that action?
My intuition tells me that the CPU's circuitry hard-wires one binary sequence to do one thing and another to do another thing. That would mean different processors with potentially different chip circuitry wouldn't necessarily define one particular binary sequence as the same action.
Is this why we have assembly? I need someone to confirm and/or correct my hypothesis!
Opcodes are not always 8 bits, but yes, it is hardcoded/wired in the logic to isolate the opcode and then send you down a course of action based on that. Think about how you would do it in an instruction set simulator; why would logic be any different? Logic is simpler than software languages; there is no magic there. ONE, ZERO, AND, OR, NOT: that's as complicated as it gets.
Along the same lines: if I was given an instruction set document and you were given an instruction set document and we were told to create a processor or write an instruction set simulator, would we produce the exact same code, even if the variable names were different? No. Ideally we would have programs that are functionally the same: they both parse the instruction and execute it. Logic is no different; give the spec to two engineers and you might get two different processors that are functionally the same, one might perform better, etc. Look at the long-running processor families, x86 in particular: they re-invent it every couple-three years, staying instruction-set compatible for the legacy instructions while sometimes adding new ones. Same for ARM and others.
And there are different instruction sets: ARM is different from x86, which is different from MIPS. The opcodes and/or the bits you examine in the instruction vary; for none of these can you simply look at 8 bits. Each has some bits, and if those are not enough to uniquely identify the instruction/operation, you need to examine some more bits; where those bits are and what the rules are is very specific to each architecture. Otherwise, what would be the point of having different names for them if they were the same?
And this information is out there; you just didn't look in the right places. There are countless open online courses on the topic, books that Google should turn up pages on, as well as open source processor cores you can look at and countless instruction set simulators with source code.

Can Registers inside a CPU do Arithmetics

I read in many detailed articles that Data from the Registers are used as Operands for the ALU to add two 32-bit integers, and this is only one small part of what the ALU can actually do.
However, I also read that the register can even do arithmetic too? The difference between the two is quite blurred to me; what is the clear-cut difference between a register and the actual ALU component?
I know the ALU doesn't store values; rather, it receives them and is instructed to simply do the logic part, while the register can both store and do general purpose stuff?
If the latter is true, then when does one use the ALU, and when does one use the general purpose registers?
Registers can't do arithmetic. A "register" is just a term for a place where you can stick a value. You can do arithmetic on the values stored in registers and have the results saved back into the register. This arithmetic would be done by the "ALU," which is the generic term for the portion of the processor that does number-crunching.
If you're still confused by something specific that you read, please quote it here or post a citation and someone can try to clarify. Note that "register" and "ALU" are very generic terms and are implemented and used differently in every architecture.
Although it is true that registers, properly speaking, don't do arithmetic, it IS possible to construct a circuit which both stores a number and, when a particular input line is set, increments that number. One could interpret this as a register which does a (very limited amount of) arithmetic. As it happens, there is generally no use for such a circuit in a general-purpose CPU (except maybe as the Program Counter), but such circuits could be useful in very simple digital controllers.
[note: historically, the first such circuits were made from vacuum tubes and used in Geiger Counters to count how many radioactive decay events occurred over short periods of time]
Registers don't do arithmetic. Modern cores have several “execution units” or “functional units” rather than an “ALU“. A reasonable rule of thumb is to discard any text (online or hardcopy) that speaks in terms of ALUs in the context of mainstream CPUs. “ALU” is still meaningful in the context of embedded systems, where a µC might actually have an ALU, but otherwise it's essentially an anachronism. If you see it, usually all it tells you is that the material is seriously out-of-date.

Building a custom machine code from the ground up

I have recently begun working with logic level design as an amateur hobbyist but have now found myself running up against software, where I am much less competent. I have completed designing a custom 4 bit CPU in Logisim loosely based on the paper "A Very Simple Microprocessor" by Etienne Sicard. Now that it does the very limited functions that I've built into it (addition, logical AND, OR, and XOR) without any more detectable bugs (crossing fingers) I am running into the problem of writing programs for it. Logisim has the functionality of importing a script of Hex numbers into a RAM or ROM module so I can write programs for it using my own microinstruction code, but where do I start? I'm quite literally at the most basic possible level of software design and don't really know where to go from here. Any good suggestions on resources for learning about this low level of programming or suggestions on what I should try from here? Thanks much in advance, I know this probably isn't the most directly applicable question ever asked on this forum.
I'm not aware of the paper you mention. But if you have designed your own custom CPU, then if you want to write software for it, you have two choices: a) write it in machine code, or b) write your own assembler.
Obviously I'd go with b. This will require that you shift gear a bit and do some high-level programming. What you are aiming to write is an assembler program that runs on a PC, and converts some simple assembly language into your custom machine code. The assembler itself will be a high-level program and as such, I would recommend writing it in a high-level programming language that is good at both string manipulation and binary manipulation. I would recommend Python.
You basically want your assembler to be able to read in a text file like this:
mov a, 7
foo:
mov b, 20
add a, b
cmp a, b
jg foo
(I just made this program up; it's nonsense.)
And convert each line of code into the binary pattern for that instruction, outputting a binary file (or perhaps a hex file, since you said your microcontroller can read in hex values). From there, you will be able to load the program up onto the CPU.
So, I suggest you:
Come up with (on paper) an assembly language that is a simple written representation for each of the opcodes your machine supports (you may have already done this),
Learn simple Python,
Write a Python script that reads one line at a time (sys.stdin.readline()), figures out which opcode it is and what values it takes, and outputs the corresponding machine code to stdout.
Write some assembly code in your assembly language that will run on your CPU.
Sounds like a fun project.
I have done something similar that you might find interesting. I also have created from scratch my own CPU design. It is an 8-bit multi-cycle RISC CPU based on Harvard architecture with variable length instructions.
I started in Logisim, then coded everything in Verilog, and I have synthesized it in an FPGA.
To answer your question, I have written a simple and rudimentary assembler that translates a program (instructions, i.e. mnemonics + data) to the corresponding machine language that can then be loaded into the PROG memory. I've written it in shell script and I use awk, which is what I was comfortable with.
I basically do two passes: the first translates mnemonics to their corresponding opcodes and translates data (operands) into hex; here I also keep track of all the label addresses. The second pass replaces all labels with their corresponding addresses.
(Labels and addresses are for jumps.)
You can see all the project, including the assembler, documented here: https://github.com/adumont/hrm-cpu
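A two-pass assembler of the kind described is only a few lines in any scripting language. Here is a hedged Python sketch (the mnemonics, encodings, and space-separated syntax are all invented for illustration, not taken from the project above):

```python
# Minimal two-pass assembler sketch (mnemonics and encodings invented).
OPCODES = {"mov": 0x1, "add": 0x2, "jmp": 0x3}

def assemble(lines):
    labels, instrs = {}, []
    # Pass 1: record label addresses, collect instruction tokens.
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            labels[line[:-1]] = len(instrs)   # label points at the next instruction
        else:
            instrs.append(line.split())
    # Pass 2: emit opcodes, resolving labels to addresses.
    out = []
    for parts in instrs:
        op, args = parts[0], parts[1:]
        encoded = [OPCODES[op]]
        for a in args:
            encoded.append(labels[a] if a in labels else int(a))
        out.append(encoded)
    return out
```

The first pass only has to count instructions to learn each label's address; the second pass can then encode forward and backward jumps alike.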
Because your instruction set is so small, and based on the thread from the mguica answer, I would say the next step is to continue and/or fully test your instruction set. Do you have flags? Do you have branch instructions? For now, just hand-generate the machine code. Flags are tricky, in particular the overflow (V) bit: you have to examine the carry in and carry out on the msbit adder to get it right. Because the instruction set is small enough, you can try the various combinations of back-to-back instructions: and followed by or, and followed by xor, and followed by add, or followed by and, or followed by xor, etc. And mix in the branches. Back to flags: if xor and or, for example, do not touch carry and overflow, then make sure you see carry and overflow being zero and not touched by logical instructions, and carry and overflow being one and not touched; also independently show that carry and overflow are separate (one on, one off) and not touched by logicals, etc. Make sure all the conditional branches only operate on that one condition; lead into the various conditional branches with flag bits that are ignored, in both states, ensuring that the conditional branch ignores them. Also verify that if the conditional branch is not supposed to modify the flags, it doesn't; likewise, if the condition doesn't cause a branch, verify that the condition flags are not touched...
I like to use randomization, but it may be more work than you are after. I like to independently develop a software simulator of the instruction set, which I find easier to use than the logic, and also sometimes easier to use in batch testing. You can then randomize some short list of instructions, varying the instruction and the registers, and naturally test the tester by hand-computing some of the results: both the state of the registers after the test completes and the state of the flag bits. Then make that randomized list longer. At some point you can take a long instruction list, run it on the logic simulator, and see if the logic comes up with the same register results and flag bits as the instruction set simulator; if they vary, figure out why. If they don't, try another random sequence, and another. Filling registers with prime numbers before starting the test is a very good idea.
Back to individual instruction testing and flags: go through all the corner cases, 0xFFFF + 0x0000, 0xFFFF + 1, things like that; hit operands and results that are one count away from where a flag changes, right at the point where the flag changes, and just the other side of that. For the logicals, for example, if they use the zero flag, then have various data patterns that test results on either side of and at zero: 0x0000, 0xFFFF, 0xFFFE, 0x0001, 0x0002, etc. Probably a walking-ones result as well: 0x0001, 0x0002, 0x0004, etc.
Hopefully I understood your question and have not pointed out the obvious or what you have already done thus far.

Minimal instruction set to solve any problem with a computer program

Years ago, I heard that someone was about to demonstrate that every computer program could be written with just three instructions:
Assignment
Conditional
Loop
I would like to hear your opinion. I mean representing any algorithm as a computer program. Do you agree with this?
No need. The minimal theoretical computer needs just one instruction. They are called One Instruction Set Computers (OISC for short, kinda like the ultimate RISC).
There are two types. The first is a theoretically "pure" one instruction machine in which the instruction really works like a regular instruction in normal CPUs. The instruction is usually:
subtract and branch if less than zero
or variations thereof. The Wikipedia article has examples of how this single instruction can be used to write code that emulates other instructions.
The second type is not theoretically pure. It is the transfer triggered architecture (wikipedia again, sorry). This family of architectures is also known as move machines, and I have designed and built some myself.
Some consider move machines cheating since the machine actually has all the regular instructions; they are just memory mapped instead of being part of the opcode. But move machines are not merely theoretical, they are practical (like I said, I've built some myself). There is even a commercially available family of CPUs built by Maxim: the MAXQ. If you look at the MAXQ instruction set (they call it a transfer set since there is really only one instruction; I usually call it a register set) you will see that MAXQ assembly looks rather like a standard accumulator-based architecture.
This is a consequence of Turing Completeness, which is something that was established many decades ago.
Alan Turing, the famous computer scientist, proved that any computable function could be computed using a Turing Machine. A Turing machine is a very simple theoretical device which can do only a few things. It can read and write to a tape (i.e. memory), maintain an internal state which is altered by the contents read from memory, and use the internal state and the last read memory cell to determine which direction to move the tape before reading the next memory cell.
The operations of assignment, conditional, and loop are sufficient to simulate a Turing Machine. Reading and writing memory and maintaining state requires assignment. Changing the direction of the tape based on state and memory contents require conditionals and loops. "Loops" in fact are a bit more high-level than what is actually required. All that is really required is that program flow can jump backwards somehow. This implies that you can create loops if you want to, but the language does not need to have an explicit loop construct.
Since these three operations allow simulation of a Turing Machine, and a Turing Machine has been proven to be able to compute any computable function, it follows that any language which provides these operations is also able to compute any computable function.
Edit: And, as other answerers pointed out, these operations do not need to be discrete. You can craft a single instruction which does all three of these things (assign, compare, and branch) in such a way that it can simulate a Turing machine all by itself.
The minimal set is a single command, but you have to choose a fitting one, for example - One instruction set computer
When I studied, we used such a "computer" to calculate factorial, using just a single instruction:
SBN - Subtract and Branch if Negative:
SBN A, B, C
Meaning:
if((Memory[A] -= Memory[B]) < 0) goto C
// (Wikipedia has a slightly different definition)
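An emulator for this machine fits in a few lines. The sketch below follows the SBN definition given above; representing the program as a list of (A, B, C) triples and halting when the program counter runs off the end are just one possible set of conventions:

```python
# Sketch: emulator for the SBN (subtract and branch if negative) one-instruction machine.
def run_sbn(mem, program, start=0):
    """program is a list of (A, B, C) triples; returns final memory."""
    pc = start
    while 0 <= pc < len(program):
        a, b, c = program[pc]
        mem[a] -= mem[b]              # Memory[A] -= Memory[B]
        pc = c if mem[a] < 0 else pc + 1   # branch to C if the result went negative
    return mem
```

For example, addition can be synthesized from two subtractions: first negate the addend into a scratch cell, then subtract the scratch cell from the destination.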
Notable one instruction set computer (OISC) implementations
This answer will focus on interesting implementations of single instruction set CPUs, compilers and assemblers.
movfuscator
https://github.com/xoreaxeaxeax/movfuscator
Compiles C code using only mov x86 instructions, showing in a very concrete way that a single instruction suffices.
The Turing completeness seems to have been proven in a paper: https://www.cl.cam.ac.uk/~sd601/papers/mov.pdf
subleq
https://esolangs.org/wiki/Subleq:
https://github.com/hasithvm/subleq-verilog Verilog, Xilinx ISE.
https://github.com/purisc-group/purisc Verilog and VHDL, Altera. Maybe that project has a clang backend, but I can't use it: https://github.com/purisc-group/purisc/issues/5
http://mazonka.com/subleq/sqasm.cpp | http://mazonka.com/subleq/sqrun.cpp C++-based assembler and emulator.
See also
What is the minimum instruction set required for any Assembly language to be considered useful?
https://softwareengineering.stackexchange.com/questions/230538/what-is-the-absolute-minimum-set-of-instructions-required-to-build-a-turing-comp/325501
In 1964, Bohm and Jacopini published a paper in which they demonstrated that all programs could be written in terms of only three control structures:
the sequence structure,
the selection structure
and the repetition structure.
Programmers using Haskell might argue that you only need the Conditional and the Loop, because assignments and mutable state don't exist in Haskell.
