What instructions can modern CPUs execute? - instructions

I tried searching for instruction sets of modern CPUs and did not find an answer to the question. I am interested in how modern computers compare to abstractions such as Turing machines (and showing them equivalent), so this is naturally the first question to ask. By modern CPU I mean for example a stock AMD/Intel CPU.

Here is an example of a modern computer instruction set:
https://en.wikipedia.org/wiki/X86_instruction_listings
(Or at least, that is a summary. For a complete description, look at the Intel or AMD manuals (links in the x86 tag wiki), or HTML extracts like https://www.felixcloutier.com/x86/.)
And, yes, if you compile a C++ program for an x86 CPU, you will get native machine instructions. Matt Godbolt's CppCon2017 talk “What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid” might be a good introduction.
A Turing Machine is not and never has been a practical computer, and it doesn't have an instruction set. So comparing instruction sets doesn't make sense. (It is possible implement a physical Turing Machine, but given the way that they work, it would have no practical use as a computation device.)
You could prove that a modern computer is "Turing complete" is by creating a Turing machine emulator. You would probably write it in a high level language and compile it and run it on the hardware of your choice. Doing this is the proof.
But in practice nobody would bother because it is rather tedious ... and it has been done before. (If you want to find an example, Google for "turing machine simulator".)

Related

ARM Cortex-M compiler differences

I'm about to develop some firmwares for Cortex-M cores on STM32 processors using C for my projects, and searching on the web I've found a lot of different compilers:
Keil, IAR, Linaro, Yagarto and GNU Tools for ARM Embedded Processors.
I was wondering, what functional differences are there between these compilers that might influence my choice? For example as an enthusiast I don't need support or assistance from the vendor, and a limitation on the code size is OK for the moment. Also the ease of use is not a main concern since I like to learn (and for the moment I have both Keil Lite and Eclipse with GNU ARM configured and working).
Is the generated code so different in terms of size/speed between these compilers? Are there any comparison table? (I've found only stale infos on the web)
benchmarking is an artform in and of itself, usually easy to manipulate the results to show whatever you want. I would not expect the compilers to generate the same results except for very small test cases, and sometimes in those small test cases their results are either identical or sometimes vastly different as your test has exposed an optimization that one compiler knows/uses and one the other doesnt.
I used to keep track of such things (compiler performance numbers) with dhrystone for example, but in the case of known benchmarks (not that dhrystone means much anymore, but others) you may find that some compilers are tuning themselves to look good under benchmarks perhaps at the expense of something else.
There is no right answer, there is no universal "best", it is all in the eye of the beholder, you. Which tool is easier for you to use, which do you like better be it for the gui or pretty colors or sound card sounds or whatever. And go from there.
The gnu compiler generally for applications I have tested does not produce code as "fast" which is my benchmark, compared to the others, but there are way more people using the free gnu tools so the support for it is considerably wider due to the number of web pages and forums and examples. gnu wont have a size restriction either, but it may require more learning or whatever to get up and running...
The cortex-ms are split into the armv6m and armv7m families, the v6m (cortex-m0) only have a small number of thumb2 extensions, the armv7m have about 150 thumbv2 extensions to thumb, so you need to know what your tools support and not use the wrong stuff on the wrong chip. Then the compilers if they know all of this may and will produce different instruction mixes from the same source code. Further within the same compiler or family using different command line options you can/will get vastly different code. And then beyond that with a cortex-m4 with cache on if you have one with such a thing, depending on how the code lies in the cache lines you may get vastly different performance, so benchmarking is a research project in itself for each blob of C code you want to benchmark. The performance range within a single compiler may shadow another compiler or the overlap may be enough to not matter.
If you have access to the tools you add value to yourself professionally by learning to use the competing tools and being able to walk into a job and or within your job choose what you see as the right tool for the job or walk into a Kiel house and be able to work right away or a gnu house and work right away. Where you might lose a job if you are gnu only and the job is for a Kiel house.
We have done some comparisons; IAR and Keil typically outperform GCC with default settings. But with some compiler flags you can make GCC come pretty close to the result of IAR and Keil.
Some of the compilers you mention are integrated development environments. Others are just plain compilers.
Some people prefer a integrated environment with compiler, editor and debugger nicely packaged for you. Others prefer to set up their own environment. It is a matter of taste.
In addition to Yagarto, there is also the "Code Sourcery" distribution of GCC for ARM.
Performance should not be your first concern unless when it becomes so in a production environment. The reason is that first, most ARM compilers are plenty good enough, and really you are down to GCC based, Keil, and IAR. Second, most ARM MCU are "blazingly fast" and have "so much memory" (these are comparing to 8-bit MCU like AVR/PIC but also to older PC). A decent Cortex-M4 MCU runs up to 100MHz and has 256K of flash. Again, to put it in perspective, that's more memory and 10x faster clock rate than the original Macintosh etc. We went to the Moon with much less ;-)
Now the performance of the tools itself, in particular, the IDE and the debuggers, differ greatly. For example, the popular Eclipse is written in Java, might be a bit sluggish to slower or memory-starved PCs. The best thing to do is to install GCC+Eclipse, and the vendors' demos and see for yourself.

Byte code stack versus three address

When designing a byte code interpreter, is there a consensus these days on whether stack or three address format (or something else?) is better? I'm looking at these considerations:
The objective language is a dynamic language fairly similar to Javascript.
Performance is important, but development speed and portability are more so for the moment.
Therefore the implementation will be strictly an interpreter for the time being; a JIT compiler may come later, resources permitting.
The interpreter will be written in C.
Read The evolution of Lua and The implementation of Lua 5.0 for how Lua changed from a stack-based virtual machine to a register-based virtual machine and why it gained performance doing it.
Experiments done by David Gregg and Roberto Ierusalimschy have shown that a register-based bytecode works better than a stack-based bytecode because fewer bytecode instructions (and therefore less decoding overhead) are required to do the same tasks. So three-address format is a clear winner.
I don't have much (not really any) experience in this area, so you might want to verify some of the following for yourself (or maybe someone else can correct me where necessary?).
The two languages I work with most nowadays are C# and Java, so I am naturally inclined to their methodologies. As most people know, both are compiled to byte code, and both platforms (the CLR and the JVM) utilize JIT (at least in the mainstream implementations). Also, I would guess that the jitters for each platform are written in C/C++, but I really don't know for sure.
All-in-all, these languages and their respective platforms are pretty similar to your situation (aside from the dynamic part, but I'm not sure if this matters). Also, since they are such mainstream languages, I'm sure their implementations can serve as a pretty good guide for your design.
With that out of the way, I know for sure that both the CLR and the JVM are stack-based architectures. Some of the advantages which I remember for stack-based vs register-based are
Smaller generated code
Simpler interpreters
Simpler compilers
etc.
Also, I find stack-based to be a little more intuitive and readable, but that's a subjective thing, and like I said before, I haven't seen too much byte code yet.
Some advantages of the register-based architecture are
Less instructions must be executed
Faster interpreters (follows from #1)
Can more readily be translated to machine code, since most commonplace hardwares are register based
etc.
Of course, there are always ways to offset the disadvantages for each, but I think these describe the obvious things to consider.
Take a look at the OCaml bytecode interpreter - it's one of the fastest of its kind. It is pretty much a stack machine, translated into a threaded code on loading (using the GNU computed goto extension). You can generate a Forth-like threaded code as well, should be relatively easy to do.
But if you're keeping a future JIT compilation in mind, make sure that your stack machine is not really a full-featured stack machine, but an expression tree serialisation form instead (like .NET CLI) - this way you'd be able to translate your "stack" bytecode into a 3-address form and then into an SSA.
If you have JIT in your mind then bytecodes is the only option.
Just in case you can take a look on my TIScript: http://www.codeproject.com/KB/recipes/TIScript.aspx
and sources: http://code.google.com/p/tiscript/

Unified assembly language

I wonder if there exists some kind of universal and easy-to-code opcode (or assembly) language which provides basic set of instructions available in most of today's CPUs (not some fancy CISC, register-only computer, just common one). With possibility to "compile", micro-optimize and "interpret" on any mentioned CPUs?
I'm thinking about something like MARS MIPS simulator (rather simple and easy to read code), with possibility to make real programs. No libraries necessary (but nice thing if that possible), just to make things (libraries or UNIX-like tools) faster in uniform way.
Sorry if that's silly question, I'm new to assembler. I just don't find NASM or UNIX assembly language neither extremely cross-platform nor easy to read and code.
The JVM bytecode is sort of like assembly language, but without pointer arithmetic. However, it's quite object-oriented. On the positive side, it's totally cross-platform.
You might want to look at LLVM bytecode - but bear in mind this warning: http://llvm.org/docs/FAQ.html#can-i-compile-c-or-c-code-to-platform-independent-llvm-bitcode
First thing: writing in Assembly does not guarantee a speed increase. Using the correct algorithm for the job at hand has the greatest impact on speed. By the time you need to go down to Assembly to squeeze the last few drops out you can only really do that by adapting the algorithm to the specific architecture of the hardware in question. A generic HLA (High Level Assembler) pretty much defeats the purpose of writing your code in Assembly. Note that I am not knocking Randall Hyde’s HLA, which is a great product, I’m just saying that you don’t gain anything from writing Assembly the way a compiler generates machine code. Most C and C++ compilers have very good optimizers, and can produce machine code superior to almost any naïve implementation in ASM.
See if you can find these books (2nd hand, they are out of print) by Michael Abrash: "Zen of Assembly Language", and "Zen of Code Optimization". Or look if you can find his articles on DDJ. They will give you an insight into optimization second to none,
Related stuff, so I hope might be useful :
There is
flat assembler
with an approach of a kind of portable assembler.
Interesting project of operating system with graphical user interface written in assembler, and great assembly API :
Menuet OS
LLVM IR provides quite portable assembly, backed with powerful compiler, backing many projects including Clang

every language eventually compiled into low-level computer language?

Isn't every language compiled into low-level computer language?
If so, shouldn't all languages have the same performance?
Just wondering...
As pointed out by others, not every language is translated into machine language; some are translated into some form (bytecode, reverse Polish, AST) that is interpreted.
But even among languages that are translated to machine code,
Some translators are better than others
Some language features are easier to translate to high-performance code than others
An example of a translator that is better than some others is the GCC C compiler. It has had many years' work invested in producing good code, and its translations outperform those of the simpler compilers lcc and tcc, for example.
An example of a feature that is hard to translate to high-performance code is C's ability to do pointer arithmetic and to dereference pointers: when a program stores through a pointer, it is very difficult for the compiler to know what memory locations are affected. Similarly, when an unknown function is called, the compiler must make very pessimistic assumptions about what might happen to the contents of objects allocated on the heap. In a language like Java, the compiler can do a better job translating because the type system enforces greater separation between pointers of different types. In a language like ML or Haskell, the compiler can do better still, because in these languages, most data allocated in memory cannot be changed by a function call. But of course object-oriented languages and functional languages present their own translation challenges.
Finally, translation of a Turing-complete language is itself a hard problem: in general, finding the best translation of a program is an NP-hard problem, which means that the only solutions known potentially take time exponential in the size of the program. This would be unacceptable in a compiler (can't wait forever to compile a mere few thousand lines), and so compilers use heuristics. There is always room for improvement in these heuristics.
It is easier and more efficient to map some languages into machine language than others. There is no easy analogy that I can think of for this. The closest I can come to is translating Italian to Spanish vs. translating a Khoisan language into Hawaiian.
Another analogy is saying "Well, the laws of physics are what govern how every animal moves, so why do some animals move so much faster than others? Shouldn't they all just move at the same speed?".
No, some languages are simply interpreted. They never actually get turned into machine code. So those languages will generally run slower than low-level languages like C.
Even for the languages which are compiled into machine code, sometimes what comes out of the compiler is not the most efficient possible way to write that given program. So it's often possible to write programs in, say, assembly language that run faster than their C equivalents, and C programs that run faster than their JIT-compiled Java equivalents, etc. (Modern compilers are pretty good, though, so that's not so much of an issue these days)
Yes, all programs get eventually translated into machine code. BUT:
Some programs get translated during compilation, while others are translated on-the-fly by an interpreter (e.g. Perl) or a virtual machine (e.g. original Java)
Obviously, the latter is MUCH slower as you spend time on translation during running.
Different languages can be translated into DIFFERENT machine code. Even when the same programming task is done. So that machine code might be faster or slower depending on the language.
You should understand the difference between compiling (which is translating) and interpreting (which is simulating). You should also understand the concept of a universal basis for computation.
A language or instruction set is universal if it can be used to write an interpreter (or simulator) for any other language or instruction set. Most computers are electronic, but they can be made in many other ways, such as by fluidics, or mechanical parts, or even by people following directions. A good teaching exercise is to write a small program in BASIC and then have a classroom of students "execute" the program by following its steps. Since BASIC is universal (to a first approximation) you can use it to write a program that simulates the instruction set for any other computer.
So you could take a program in your favorite language, compile (translate) it into machine language for your favorite machine, have an interpreter for that machine written in BASIC, and then (in principle) have a class full of students "execute" it. In this way, it is first being reduced to an instruction set for a "fast" machine, and then being executed by a very very very slow "computer". It will still get the same answer, only about a trillion times slower.
Point being, the concept of universality makes all computers equivalent to each other, even though some are very fast and others are very slow.
No, some languages are run by a 'software interpreter' as byte code.
Also, it depends on what the language does in the background as well, so 2 identically functioning programs in different languages may have different mechanics behind the scenes and hence be actually running different instructions resulting in differing performance.

Fortran's performance

Fortran's performances on Computer Language Benchmark Game are surprisingly bad. Today's result puts Fortran 14th and 11th on the two quad-core tests, 7th and 10th on the single cores.
Now, I know benchmarks are never perfect, but still, Fortran was (is?) often considered THE language for high performance computing and it seems like the type of problems used in this benchmark should be to Fortran's advantage. In an recent article on computational physics, Landau (2008) wrote:
However, [Java] is not as efficient or
as well supported for HPC and parallel
processing as are FORTRAN and C, the
latter two having highly developed
compilers and many more scientific
subroutine libraries available.
FORTRAN, in turn, is still the
dominant language for HPC, with
FORTRAN 90/95 being a surprisingly
nice, modern, and effective language;
but alas, it is hardly taught by any
CS departments, and compilers can be
expensive.
Is it only because of the compiler used by the language shootout (Intel's free compiler for Linux) ?
No, this isn't just because of the compiler.
What benchmarks like this -- where the program differs from benchmark to benchmark -- is largely the amount of effort (and quality of effort) that the programmer put into writing any given program. I suspect that Fortran is at a significant disadvantage in that particular metric -- unlike C and C++, the pool of programmers who'd want to try their hand at making the benchmark program better is pretty small, and unlike most anything else, they likely don't feel like they have something to prove either. So, there's no motivation for someone to spend a few days poring over generated assembly code and profiling the program to make it go faster.
This is fairly clear from the results that were obtained. In general, with sufficient programming effort and a decent compiler, neither C, C++, nor Fortran will be significantly slower than assembly code -- certainly not more than 5-10%, at worst, except for pathological cases. The fact that the actual results obtained here are more variant than that indicates to me that "sufficient programming effort" has not been expended.
There are exceptions when you allow the assembly to use vector instructions, but don't allow the C/C++/Fortran to use corresponding compiler intrinsics -- automatic vectorization is not even a close approximation of perfect and probably never will be. I don't know how much those are likely to apply here.
Similarly, an exception is in things like string handling, where you depend heavily on the runtime library (which may be of varying quality; Fortran is rarely a case where a fast string library will make money for the compiler vendor!), and on the basic definition of a "string" and how that's represented in memory.
Some random thoughts:
Fortran used to do very well because it was easier to identify loop invariants which made some optimizations easier for the compiler. Since then
Compilers have gotten much more sophisticated. Enormous effort has been put into c and c++ compilers in particular. Have the fortran compilers kept up? I suppose the gfortran uses the same back end of gcc and g++, but what of the intel compiler? It used to be good, but is it still?
Some languages have gotten a lot specialized keywords and syntax to help the compiler (restricted and const int const *p in c, and inline in c++). Not knowing fortran 90 or 95 I can't say if these have kept pace.
I've looked at these tests. It's not like the compiler is wrong or something. In most tests Fortran is comparable to C++ except some where it gets beaten by a factor of 10. These tests just reflect what one should know from the beggining - that Fortran is simply NOT an all-around interoperable programming language - it is suited for efficient computation, has good list operations & stuff but for example IO sucks unless you are doing it with specific Fortran-like methods - like e.g. 'unformatted' IO.
Let me give you an example - the 'reverse-complement' program that is supposed to read a large (of order of 10^8 B) file from stdin line-by-line, does something with it & prints the resulting large file to stdout. The pretty straighforward Fortran program is about 10 times slower on a single core (~10s) than a HEAVILY optimized C++ (~1s). When you try to play with the program, you'll see that only simple formatted read & write take more than 8 seconds. In a Fortran way, if you care for efficiency, you'd just write an unformatted structure to a file & read it back in no time (which is totally non-portable & stuff but who cares anyway - an efficient code is supposed to be fast & optimized for a specific machine, not able to run everywhere).
So the short answer is - don't worry, just do your job - and if you want to write a super-efficient operating system, than sorry - Fortran is just not the way for that kind of performance.
This benchmark is stupid at all.
For example, they measure CPU-time for the whole program to run. As mcmint stated (and it might be actually true) Fortran I/O sucks*. But who cares? In real-world tasks one read input for some seconds than do calculations for hours/days/months and finally write output for the seconds. Thats why in most benchmarks I/O operations are excluded from time measurements (if you of course do not benchmark I/O by itself).
Norber Wiener in his book God & Golem, Inc. wrote
Render unto man the things which are man’s and unto the computer the things which are the computer’s.
In my opinion the usage of this principle while implementing algorithm in any programming language means:
Write as readable and simple code as you can and let compiler do the optimizations.
Especially it is important in real-world (huge) applications. Dirty tricks (so heavily used in many benchmarks) even if they might improve the efficiency to some extent (5%, maybe 10%) are not for the real-world projects.
/* C/C++ uses stream I/O, but Fortran traditionally uses record-based I/O. Further reading. Anyway I/O in that benchmarks are so surprising. The usage of stdin/stdout redirection might also be the source of problem. Why not simply use the ability of reading/writing files provided by the language or standard library? Once again this woud be more real-world situation.
I would like to say that even if the benchmark do not bring up the best results for FORTRAN, this language will still be used and for a long time. Reasons of use are not just performance but also some kind of thing called easyness of programmability. Lots of people that learnt to use it in the 60's and 70's are now too old for getting into new stuff and they know how to use FORTRAN pretty well. I mean, there are a lot of human factors for a language to be used. The programmer also matters.
Considering they did not publish the exact compiler options they used for the Intel Fortran Compiler, I have little faith in their benchmark.
I would also remark that both Intel's math library, MKL, and AMD's math library, ACML, use the Intel Fortran Compiler.
Edit:
I did find the compilation options when you click on the benchmark's name. The result is surprising since the optimization level seems reasonable. It may come down to the efficiency of the algorithm.

Resources