Language Engines VS. Runtimes VS. Process Virtual Machines - compilation

Since the rise in popularity of JavaScript, I have found it intriguing to hear (even within the same talk) about a JavaScript engine, a JavaScript virtual machine, or a JavaScript interpreter, all in the same context and referring to the same thing.
Therefore I've tried to do some research into why all those buzzwords coexist (there must be a reason) and whether there are real (although slight) differences between them.
I'm trying to summarize what I've learnt so far from many different sources (which I've quoted in the following lines). Feel free to correct me if I'm wrong or misleading about something.
Although I've found that the runtime duties of all of them are basically the same, I've also found some main differences:
Process virtual machines
are the most complex (they implement even the I/O, a virtual instruction set, etc.)
always include an interpreter plus an intermediate language
may also include compilation (just-in-time and/or dynamic recompilation)
applications run only inside the VM process, as a thread
coupled to a runtime environment
examples: Java Virtual Machine, Dalvik Virtual Machine
Runtimes
do not abstract away from native code
use solely compilation techniques (either just-in-time or ahead-of-time)
lack a VM process / sandboxed applications
tightly coupled to / dependent on the underlying operating system
examples: Common Language Runtime, Android Runtime
Language Engines
more lightweight
may use either an interpreter or compilation (just-in-time and/or dynamic recompilation)
decoupled from the underlying environment/operating system
examples: all JavaScript engines, Zend Engine
Questions:
is the list above accurate, or is it just a byproduct of totally coincidental similarities between the most common runtime systems, which happen to use the same naming?
are there also other noteworthy differences?

TL;DR:
The virtual machine is a concept that represents how one could decouple the actual instructions that your CPU will use to execute your program from your human-readable source code. You, the programmer, are able to write less code that does more.
Longer answer
I'm pretty sure that you're looking for an answer that's much simpler than the classifications that you outlined in your question. Typically, a virtual machine simply stands for the abstraction that is inserted between the source code that you've written and how the physical hardware receives those instructions.
All of those nitty-gritty details that are needed to talk to the processor are handled by a "virtual machine", which itself automates many of the repetitive commands that might be required to do a very simple operation, like manipulating a string of UTF-8 encoded characters and printing them to the command line. Python uses one, Java uses one, and the class of "Language Engine" that you outlined above is just a fancier name for the same concept of a virtual machine. Any VM can be as fast and lightweight as the VM programmer designs it to be, which will play into both the usability and reliability of the applications that are developed for it.
As is the case when using Java and Python, you're able to write programs using verbose languages that have very little in common with the "language" that the physical processor needs in order to execute commands. Thankfully, people that are far smarter than I am have created programs using those processor-specific languages, called "assembler" or "assembly" languages, which differ between instruction set architectures (e.g. RISC-V, x86/x86-64, ARMv8). These programs are what can translate what you've written into that processor-specific language procedurally, essentially acting as a translation layer.
What you could potentially end up with is a far simpler interface between you, the developer, and the hardware that you're trying to leverage. To boot, the only thing that you need in order to run the program across different OS environments is for the VM to have been implemented in whatever assembly language is used by the physical host processor.
Note: I say can because the appreciation of any given language's syntax is a very subjective matter.
To give a high-level example, the Python "interpreter" that we can use both to execute pre-written scripts and interact with in a command-line application relies on a compiler that creates the Python bytecode from your code, which is then fed into the so-called Python Virtual Machine for processor execution.
Also, there's no real difference between a virtual machine and a "language engine". The JavaScript language engines, such as Google's V8, are simply implementations of this virtual machine concept that allow developers to rely on platform agnosticism and portability when developing dynamic, web-based applications, without worrying about breaking the usability of their programs.
If you're interested, look into the various implementations of Python (Jython, IronPython, etc.). These use the conceptual virtual machines implemented by other languages/frameworks, like Java and .NET, to create an implementation of Python that uses the same syntax, but gets cross-compiled into Java bytecode or Common Intermediate Language bytecode instructions to achieve the same results. The "python interpreter" that is almost universally referred to is the C-implemented interpreter that is officially maintained by the Python Software Foundation, colloquially named "CPython".
Just to really drive it home, here's a prime-checking algorithm written in Python, in C++ using C libraries, and in x86 assembly language (adapted from xmdi on YouTube). You'll be able to see how the Python VM enables the kind of syntax that allows people to easily get into programming without losing their minds trying to juggle register manipulation in assembly language:
Python:
def isPrime(n):
    for i in range(2, n//2 + 1):
        if not (n % i):
            return 0
    return 1

numPrimes = 0
for i in range(2, 250001):
    numPrimes += isPrime(i)
print(str(numPrimes))
C++:
#include <stdio.h>

int isPrime(int n){
    for (int i = 2; i <= n/2; i++){
        if (!(n % i)){
            return 0;
        }
    }
    return 1;
}

int main(){
    int numPrimes = 0;
    for (int i = 2; i <= 250000; i++){
        numPrimes += isPrime(i);
    }
    printf("%d\n", numPrimes);
    return 0;
}
x86 ASM:
.section .data
f: .string "%d\n"
.section .text
.globl main
main:
    push %rbp                 # keep the stack 16-byte aligned for the printf call
    movl $2, %eax             # candidate number
    xor %r8d,%r8d             # prime counter
loop:
    cmpl $250000,%eax
    jg end_loop
    movl $2, %r10d            # trial divisor
    movl %eax,%r11d
    shr $1,%r11d              # only test divisors up to n/2
prime_loop:
    cmpl %r11d,%r10d
    jg prime
    push %rax
    xor %edx,%edx
    div %r10d                 # edx = eax % r10d
    test %edx,%edx
    pop %rax
    je not_prime
    inc %r10d
    jmp prime_loop
prime:
    inc %r8d
not_prime:
    inc %eax
    jmp loop
end_loop:
    lea f(%rip),%rdi
    mov %r8d,%esi
    xor %eax,%eax             # no vector args for the variadic call
    call printf
    pop %rbp
    xor %eax,%eax
    ret

Related

Proper way to manipulate registers (PUT32 vs GPIO->ODR)

I'm learning how to use microcontrollers without a bunch of abstractions. I've read somewhere that it's better to use PUT32() and GET32() instead of volatile pointers and stuff. Why is that?
With a basic pin wiggle "benchmark," the performance of GPIO->ODR=0xFFFFFFFF seems to be about four times faster than PUT32(GPIO_ODR, 0xFFFFFFFF), as shown by the scope:
(The one with lower frequency is PUT32)
This is my code using PUT32
PUT32(0x40021034, 0x00000002); // RCC IOPENR B
PUT32(0x50000400, 0x00555555); // PB MODER
while (1) {
    PUT32(0x50000414, 0x0000FFFF); // PB ODR
    PUT32(0x50000414, 0x00000000);
}
This is my code using the arrow thing
* (volatile uint32_t *) 0x40021034 = 0x00000002; // RCC IOPENR B
GPIOB->MODER = 0x00555555; // PB MODER
while (1) {
    GPIOB->ODR = 0x00000000; // PB ODR
    GPIOB->ODR = 0x0000FFFF;
}
I shamelessly adapted the assembly for PUT32 from somewhere
PUT32   PROC
        EXPORT PUT32
        STR R1,[R0]
        BX LR
        ENDP
My questions are:
Why is one method slower when it looks like they're doing the same thing?
What's the proper or best way to interact with GPIO? (Or rather what are the pros and cons of different methods?)
Additional information:
Chip is STM32G031G8Ux, using Keil uVision IDE.
I didn't configure the clock to go as fast as it can, but it should be consistent for the two tests.
Here's my hardware setup: (Scope probe connected to the LEDs. The extra wires should have no effect here)
Thank you for your time, sorry for any misunderstandings
PUT32 is a totally non-standard method that the poster in that other question made up. They have done this to avoid the complication and possible mistakes in defining the register access methods.
When you use the standard CMSIS header files and assign to the registers in the standard way, then all the complication has already been taken care of for you by someone who has specific knowledge of the target that you are using. They have designed it in a way that makes it hard for you to make the mistakes that the PUT32 is trying to avoid, and in a way that makes the final syntax look cleaner.
The reason that writing to the registers directly is quicker is that a register write can take as little as a single cycle of the processor clock, whereas calling a function, writing to the register, and then returning takes four times as long in the context of your experiment.
By using this generic access method you also risk introducing bugs that are not possible if you used the manufacturer provided header files: for example using a 32 bit access when the register is 16 or 8 bits.
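To make the overhead concrete, here is a minimal C sketch (mine, not from the original answer) contrasting the two styles. The address 0x50000414 is the GPIOB ODR address already used in the question, and PUT32 is assumed to be the assembly routine shown there; with optimisation enabled, the direct write compiles down to a single store, while each PUT32 call adds a branch to the function, the store, and a return branch:

#include <stdint.h>

extern void PUT32(uint32_t addr, uint32_t value);      /* assembly routine from the question */

#define GPIOB_ODR (*(volatile uint32_t *)0x50000414u)  /* address taken from the question */

void toggle_direct(void)
{
    /* One store per line; the address load is typically hoisted out of loops. */
    GPIOB_ODR = 0x0000FFFFu;
    GPIOB_ODR = 0x00000000u;
}

void toggle_put32(void)
{
    /* Each write pays for a call, the store inside PUT32, and the return. */
    PUT32(0x50000414u, 0x0000FFFFu);
    PUT32(0x50000414u, 0x00000000u);
}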

change instruction set in GCC

I want to test some architecture changes on an already existing architecture (x86) using simulators. However, to properly test them and run benchmarks, I might have to make some changes to the instruction set. Is there a way to add these changes to GCC or any other compiler?
Simple solution:
One common approach is to add inline assembly, and encode the instruction bytes directly.
For example:
int main()
{
    asm __volatile__ (".byte 0x90\n");
    return 0;
}
compiles (gcc -O3) into:
00000000004005a0 <main>:
4005a0: 90 nop
4005a1: 31 c0 xor %eax,%eax
4005a3: c3 retq
So just replace 0x90 with your instruction bytes. Of course you won't see the actual instruction in a regular objdump, and the program would likely not run on your system (unless you use one of the NOP combinations), but the simulator should recognize it if it's properly implemented there.
Note that you can't expect the compiler to optimize well for you when it doesn't know this instruction, and you should take care and work with inline assembly clobber/input/output options if it changes state (registers, memory), to ensure correctness. Use optimizations only if you must.
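As a hedged illustration of those clobber/input/output options, here is one way the inline assembly could be wrapped so the compiler knows what the new instruction reads and writes. The byte sequence (which currently decodes as a multi-byte NOP), the choice of RAX/RBX, and the my_custom_op name are all placeholder assumptions, not part of any real instruction set:

#include <stdint.h>

static inline uint64_t my_custom_op(uint64_t a, uint64_t b)
{
    uint64_t result;
    asm volatile (
        ".byte 0x0f, 0x1f, 0xc0\n"   /* placeholder encoding; a multi-byte NOP today */
        : "=a" (result)              /* output: assume the instruction leaves its result in RAX */
        : "a" (a), "b" (b)           /* inputs: assume it reads RAX and RBX */
        : "cc", "memory");           /* clobbers: flags, plus memory if it has side effects */
    return result;
}

Until the simulator implements the real semantics, calling my_custom_op(x, y) simply returns x, since the placeholder NOP leaves RAX untouched.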
Complicated solution
The alternative approach is to implement this in your compiler. It can be done in GCC, but as stated in the comments, LLVM is probably one of the best ones to play with, as it's designed as a compiler development platform. It's still very complicated, though: LLVM is best suited for IR optimization stages and is somewhat less friendly when you try to modify the target-specific backends.
Still, it's doable, and you have to do it if you also plan to have your compiler decide when to issue this instruction. I'd suggest starting with the first option, though, to see if your simulator even works with this addition, and only then spending time on the compiler side.
If and when you do decide to implement this in LLVM, your best bet is to define it as an intrinsic function; there's relatively more documentation about this here: http://llvm.org/docs/ExtendingLLVM.html
You can add new instructions, or change existing ones, by modifying the group of files in GCC called the "machine description": instruction patterns in the <target>.md file, some code in the <target>.c file, predicates, constraints, and so on. All of these live in the $GCCHOME/gcc/config/<target>/ folder. They are used during the step that generates assembly code from RTL. You can also change how and when instructions are emitted by changing other, more general GCC source files (SSA tree generation, RTL generation), but all of that is a little bit more complicated.
A simple explanation of what happens:
https://www.cse.iitb.ac.in/grc/slides/cgotut-gcc/topic5-md-intro.pdf
It's doable, and I've done it, but it's tedious. It is basically the process of porting the compiler to a new platform, using an existing platform as a model. Somewhere in GCC there is a file that defines the instruction set, and it goes through various processes during compilation that generate further code and data. It's been 20+ years since I did it, so I have forgotten all the details, sorry.

What exactly happened with the Lisp REPL on JPL's DS-1?

I've heard the Google talk (http://www.youtube.com/watch?v=_gZK0tW8EhQ) by Ron Garret and read the paper (http://www.flownet.com/gat/jpl-lisp.html), but I'm not understanding how it worked to "correct" supposedly running code with a REPL. Was the DS-1's Lisp code running in some sort of virtual machine? Or was it "live" in the REPL's actual world? Or was the Lisp code an executable that got replaced? What exactly happened/happens when you dynamically change running Lisp code through a REPL?
Whereas most programs are built and distributed as an executable that contains only the necessary components to run the program, Lisp can be distributed as an image that contains not just the components for the specific program, but also much or all of the Lisp runtime and development environment.
The REPL is the quintessential mechanism for providing interactive access to a running Lisp environment. The two key components of the REPL, Read and Eval, expose much of the Lisp runtime system. For example, many Lisp systems today implement Eval by compiling the provided form (which is read by the Reader) to machine code and then executing the result. This is in contrast to interpreting the form. Some systems, especially in the past, contained both an interpreter, which is quick to invoke and suitable for interactive access, and a compiler, which produces better code. But modern systems are fast enough that the compiler phase isn't noticeable, and they simply forgo the interpreter.
Of course, you can do very similar things today. A simple example is running SSH to your Linux box that's hosting PHP. Your PHP server is up and running and live, serving pages and requests. But you login through SSH, go over and fix a PHP file, and as soon as you save that file, all of your users see the new result in real time -- the system updated itself on the fly.
The fact that PHP is running on a Linux runtime vs Lisp running on a Lisp runtime is a detail. The effect is the same. The fact that PHP isn't compiled is also a detail. For example, you can do the same thing on a Java server: modify a JSP, save it, and the JSP is converted into a Servlet as Java source code, then compiled on the fly by the Java runtime, then loaded into the executing container, replacing the old code.
Lisp's capability to do this is very nice, and it was very interesting back in the day. Today, it's less so, as there are different systems providing similar capabilities.
Addenda:
No, Lisp is not a virtual machine, there's no need for it to be that complicated.
The key to the concept is dynamic dispatch. With dynamic dispatch there is some lookup involved before a function is invoked.
In a static language like C, locations of things are pretty much set in stone once the linker and loader have finished processing the executable in preparation to start executing.
So, in C if you have something simple like:
int add(int i) {
    return i + 1;
}

int main() {
    add(1);
    return 0;
}
After compiling, linking, and loading of the program, the address of the add function will be set in stone, and thus anything referring to that function will know exactly where to find it.
So, in assembly language: (note this is a contrived assembly language)
add: pop r1 ; pop R1 from the stack, loading the i parameter
add r1, 1; Add 1 to the parameter.
push r1 ; push result of function call
rts ; return from subroutine
main: push 1 ; Push parameter to function
call add ; call function
pop r1 ; gather (and ignore) the result
So, you can see here that add is fixed in place.
In something like Lisp, functions are referred to indirectly.
int add(int i) {
    return i + 1;
}

int (*add_ptr)(int) = &add;

int main() {
    (*add_ptr)(1);
    return 0;
}
In assembly you get:
add: pop r1 ; pop R1 from the stack, loading the i parameter
add r1, 1; Add 1 to the parameter.
push r1 ; push result of function call
rts ; return from subroutine
add_ptr: dw add ; put the address of the add routine in add_ptr
main: push 1 ; Push parameter to function
mov r1, add_ptr ; Put the contents of add_ptr into R1
call (r1) ; call function indirectly through R1
pop r1 ; gather (and ignore) the result
Now, you can see here that rather than calling add directly, it is called indirectly through add_ptr. A Lisp runtime has the capability of compiling new code, and when that happens, add_ptr would be overwritten to point to the newly compiled code. You can see how the code in main never has to change; it will call whatever function add_ptr is pointing to.
Since almost all of the functions in Lisp are indirectly referenced through their symbols, a lot can change "behind the back" of a running system, and the system will continue to run.
When a function is recompiled, the old function code (assuming there are no other references) becomes eligible for garbage collection and will, typically, eventually go away.
You can also see that when the system is garbage collected, any code that is moved (such as the code for the add function) can be relocated by the runtime and its new location updated in add_ptr, so the system will continue operating even after code has been relocated by the garbage collector.
So, the key to it all, is to have your functions invoked through some lookup mechanism. Having this gives you a lot of flexibility.
Note, you can also do this in a running C system, for example. You can put code in a dynamic library, load the library, execute the code, and if you want, you can build a new dynamic library, close the old one, open the new one, and invoke the new code - all in a "running" system. The dynamic library interface provides the lookup mechanism that isolates the code.
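Here is a minimal POSIX sketch (my own illustration, with hypothetical names such as libplugin.so and plugin_add) of that dynamic-library lookup mechanism in C: the function is reached through dlsym rather than a fixed address, so rebuilding the library and reopening it swaps the code under a running host:

#include <dlfcn.h>
#include <stdio.h>

typedef int (*add_fn)(int);

int main(void)
{
    void *lib = dlopen("./libplugin.so", RTLD_NOW);      /* load the current implementation */
    if (!lib) { fprintf(stderr, "%s\n", dlerror()); return 1; }

    add_fn add = (add_fn)dlsym(lib, "plugin_add");       /* indirect lookup, like add_ptr above */
    printf("%d\n", add(1));

    dlclose(lib);                                        /* drop the old code ...              */
    lib = dlopen("./libplugin.so", RTLD_NOW);            /* ... and load a freshly rebuilt one */
    add = (add_fn)dlsym(lib, "plugin_add");
    printf("%d\n", add(1));

    dlclose(lib);
    return 0;
}

Build the plugin separately (for example gcc -shared -fPIC plugin.c -o libplugin.so) and link the host with -ldl; rebuilding the plugin between the two dlopen calls changes the behaviour without restarting the host program.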

Windows initial execution context

Once Windows has loaded an executable into memory and transferred execution to the entry point, are the values in the registers and on the stack meaningful? If so, where can I find more information about this?
Officially, the registers at the entry point of a PE file do not have defined values. You're supposed to use APIs, such as GetCommandLine, to retrieve the information you need. However, since the kernel function that eventually transfers control to the entry point has not changed much since the old days, some PE packers and malware started to rely on its peculiarities. The two more or less reliable registers are:
EAX points to the entry point of the application (because the kernel function uses call eax to jump to it)
EBX points to the Process Environment Block (PEB).
Chapter 5 of Windows Internals Fifth Edition covers the mechanism of Windows creating a process in detail. That would give you more information about Windows loading an executable in memory and transferring execution to the entry point.
I found this up-to-date reference that covers how registers are used in various calling conventions on various operating systems and by various compilers. It's quite detailed, and seems comprehensive:
Agner Fog's Calling Conventions document

How to read / write .exe machine code manually?

I am not well acquainted with compiler magic. The act of transforming human-readable code (or the not-really-readable assembly instructions) into machine code is, for me, rocket science combined with sorcery.
I will narrow down the subject of this question to Win32 executables (.exe). When I open these files in a specialized viewer, I can find strings (usually 16 bits per character) scattered at various places, but the rest is just garbage. I suppose the unreadable part (the majority) is the machine code (or maybe resources, such as images etc...).
Is there any straightforward way of reading the machine code? Opening the exe as a file stream and reading it byte by byte, how could one turn these individual bytes into assembly? Is there a straightforward mapping between these instruction bytes and the assembly instructions?
How is the .exe written? Four bytes per instruction? More? Less? I have noticed some applications can create executable files just like that: for example, in ACD See you can export a series of images into a slideshow. But this does not necessarily have to be an SWF slideshow; ACD See is also capable of producing EXEcutable presentations. How is that done?
How can I understand what goes on inside an EXE file?
OllyDbg is an awesome tool that disassembles an EXE into readable instructions and allows you to execute the instructions one-by-one. It also tells you what API functions the program uses and if possible, the arguments that it provides (as long as the arguments are found on the stack).
Generally speaking, CPU instructions are of variable length, some are one byte, others are two, some three, some four etc. It mostly depends on the kind of data that the instruction expects. Some instructions are generalised, like "mov" which tells the CPU to move data from a CPU register to a place in memory, or vice versa. In reality, there are many different "mov" instructions, ones for handling 8-bit, 16-bit, 32-bit data, ones for moving data from different registers and so on.
You could pick up Dr. Paul Carter's PC Assembly Language Tutorial which is a free entry level book that talks about assembly and how the Intel 386 CPU operates. Most of it is applicable even to modern day consumer Intel CPUs.
The EXE format is specific to Windows. The entry-point (i.e. the first executable instruction) is usually found at the same place within the EXE file. It's all kind of difficult to explain all at once, but the resources I've provided should help cure at least some of your curiosity! :)
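If you want to poke at the container by hand, here is a small C sketch (mine, not from the answer above) that walks the documented PE layout just far enough to find the entry point: e_lfanew lives at file offset 0x3C in the DOS header, and AddressOfEntryPoint sits 0x28 bytes past the "PE\0\0" signature. Note the value is an RVA; turning it into a file offset would additionally require walking the section headers, which is omitted here:

#include <stdio.h>
#include <stdint.h>

int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: %s file.exe\n", argv[0]); return 1; }

    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }

    uint32_t e_lfanew = 0, entry_rva = 0;
    char sig[5] = {0};

    fseek(f, 0x3C, SEEK_SET);                 /* DOS header: offset of the NT headers     */
    fread(&e_lfanew, sizeof e_lfanew, 1, f);

    fseek(f, e_lfanew, SEEK_SET);             /* should read the "PE\0\0" signature       */
    fread(sig, 1, 4, f);

    fseek(f, e_lfanew + 0x28, SEEK_SET);      /* optional header: AddressOfEntryPoint RVA */
    fread(&entry_rva, sizeof entry_rva, 1, f);

    printf("signature: %s, entry point RVA: 0x%08X\n", sig, entry_rva);
    fclose(f);
    return 0;
}

(This sketch assumes a little-endian host, which matches the machines the PE format targets in practice.)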
You need a disassembler which will turn the machine code into assembly language. This Wikipedia link describes the process and provides links to free disassemblers. Of course, as you say you don't understand assembly language, this may not be very informative - what exactly are you trying to do here?
You can use debug from the command line, but that's hard.
C:\WINDOWS>debug taskman.exe
-u
0D69:0000 0E PUSH CS
0D69:0001 1F POP DS
0D69:0002 BA0E00 MOV DX,000E
0D69:0005 B409 MOV AH,09
0D69:0007 CD21 INT 21
0D69:0009 B8014C MOV AX,4C01
0D69:000C CD21 INT 21
0D69:000E 54 PUSH SP
0D69:000F 68 DB 68
0D69:0010 69 DB 69
0D69:0011 7320 JNB 0033
0D69:0013 7072 JO 0087
0D69:0015 6F DB 6F
0D69:0016 67 DB 67
0D69:0017 7261 JB 007A
0D69:0019 6D DB 6D
0D69:001A 206361 AND [BP+DI+61],AH
0D69:001D 6E DB 6E
0D69:001E 6E DB 6E
0D69:001F 6F DB 6F
The executable file you see is Microsoft's PE (Portable Executable) format. It is essentially a container which holds some operating-system-specific data about a program, plus the program data itself split into several sections. For example, code, resources, and static data are stored in separate sections.
The format of a section depends on what is in it. The code section holds the machine code for the executable's target architecture. In the most common cases this is Intel x86 or AMD64 (same as EM64T) for Microsoft PE binaries. The machine code format is CISC and dates back to the 8086 and earlier. The important aspect of CISC is that its instruction size is not constant; you have to start reading at the right place to get something valuable out of it. Intel publishes good manuals on the x86/x64 instruction set.
You can use a disassembler to view the machine code directly. In combination with the manuals, you can guess the source code most of the time.
And then there are MSIL EXEs: .NET executables holding Microsoft's Intermediate Language. These do not contain machine-specific code, but .NET CIL code. The specifications for that are available online from ECMA.
These can be viewed with a tool such as Reflector.
The contents of the EXE file are described in Portable Executable. It contains code, data, and instructions to OS on how to load the file.
There is a 1:1 mapping between machine code and assembly. A disassembler program will perform the reverse operation.
There isn't a fixed number of bytes per instruction on i386. Some are a single byte, some are much longer.
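If you want to see that variable length for yourself, here is a small, hedged C sketch that hex-dumps the first bytes of its own main function so you can line them up against a disassembler's output (objdump -d, or a debugger's disassembly view). Reading code bytes through a data pointer like this is technically implementation-defined, but it works on typical x86/x86-64 systems:

#include <stdio.h>

int main(void)
{
    const unsigned char *code = (const unsigned char *)&main;   /* main's machine code bytes */
    for (int i = 0; i < 16; i++)
        printf("%02X ", code[i]);            /* compare against the disassembly of main */
    printf("\n");
    return 0;
}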
Just relating to this question, does anyone still read things like
CD 21?
I remember Sandra Bullock in one show actually reading a screenful of hex numbers and figuring out what the program does. Sort of like the current version of reading Matrix code.
If you do read stuff like CD 21, how do you remember all the various combinations?
Win32 exe format on MSDN
I'd suggest taking a bit of Windows C source code, building it, and starting to debug it in Visual Studio. Switch to the disassembly view and step over the commands. You can see how the C code has been compiled into machine code - and watch it run step by step.
If it's as foreign to you as it seems, I don't think a debugger or disassembler is going to help - you need to learn assembler programming first; study the architecture of the processor (plenty of documentation is downloadable from Intel). And then, since most machine code is generated by compilers, you'll need to understand how compilers generate code - the simplest way is to write lots of small programs and then disassemble them to see what your C/C++ is turned into.
A couple of books that'll help you understand:
Reversing
Hacking: The Art of Exploitation
To get an idea, set a breakpoint on some interesting code, and then go to the CPU window.
If you are interested in more, it is easier to compile short fragments with Free Pascal using the -al parameter.
FPC allows you to output the generated assembler in a multitude of assembler formats (TASM, MASM, GAS) using the -A parameter, and you can have the original Pascal code interleaved in comments (and more) for easy cross-reference.
Because it is compiler-generated assembler, as opposed to assembler from a disassembled .exe, it is more symbolic and easier to follow.
Familiarity with low level assembly (and I mean low level assembly, not "macros" and that bull) is probably a must. If you really want to read the raw machine code itself directly, usually you would use a hex editor for that. In order to understand what the instructions do, however, most people would use a disassembler to convert that into the appropriate assembly instructions. If you're one of the minority who wants to understand the machine language itself, I think you'd want the Intel® 64 and IA-32 Architectures Software Developer's Manuals. Volume 2 specifically covers the instruction set, which relates to your query about how to read machine code itself and how assembly relates to it.
Both your curiosity and your level of understanding is exactly where I was at one point. I highly recommend Code: The Hidden Language of Computer Hardware and Software. This will not answer all of the questions you ask here but it will shed light on some of the utterly black magic aspects of computers. It's a thick book but highly readable.
ACD See is probably taking advantage of the fact that .EXE files do no error checking on file length or anything beyond the length of the expected portion of the file. Because of this, you can make a .EXE file that will open itself and load everything beyond a given point as data. This is useful because you can then make a .EXE that works on a given set of data by just tacking that data onto the end of a suitably written .EXE.
(I have no idea what exactly ACD See is, so take that with a big grain of salt, but I do know that some programs are generated that way.)
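For what it's worth, here is a hedged C sketch of how such a "data tacked onto the end" program can read itself back. The assumed layout is [normal .EXE][payload][4-byte payload size], produced by plain file concatenation when the presentation is exported; all names and the footer format are made up for illustration, and argv[0] is only an approximation of the real executable path:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(int argc, char **argv)
{
    (void)argc;
    FILE *self = fopen(argv[0], "rb");        /* open our own executable file           */
    if (!self) { perror("fopen"); return 1; }

    uint32_t payload_size = 0;
    fseek(self, -4L, SEEK_END);               /* footer: the last 4 bytes hold the size */
    fread(&payload_size, sizeof payload_size, 1, self);

    fseek(self, -(long)(payload_size + 4), SEEK_END);    /* seek back to the appended data */
    char *payload = malloc(payload_size);
    fread(payload, 1, payload_size, self);

    fwrite(payload, 1, payload_size, stdout); /* "act on" the embedded data              */

    free(payload);
    fclose(self);
    return 0;
}

The loader never complains because the PE headers describe only the sections it needs; any extra bytes after the last section are simply ignored.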
Every instruction's machine code is kept in a special memory area within the CPU. Early Intel books gave the machine code for their instructions, so one should try to obtain such books so as to understand this. Obviously, today machine code is not easily available. What would be nice is a program which can reverse hex to machine code. Or do it manually (tedious!).
