So I write a program in some language, LanguageX, using a simple text pad.
Then I put that text into a compiler. The compiler outputs machine code (or assembly, which is then assembled into machine code).
My question is: who actually executes the compiled program?
Does the compiler execute it? Or do I need another "executor app" to execute it?
Or does the hardware execute the program directly? But who orders the hardware to do that?
I'm confused because the concepts of compiling a program, and executing a program, seem to be used interchangeably.
An example is HTML. I can write HTML code in a text file, save it as .html, open it with Firefox, and it will run. Is Firefox a compiler, an executor, both, or neither?
Another example is a commercial app I buy and install. Whenever I click on the .exe, is the app compiled or executed? Both?
A program is data that describes how to execute things. You can read the program yourself and get a sense of what it should be doing, or give it to another program that can execute it. When a program is executed directly from its source, it is said to be "interpreted".
For example, your browser is interpreting HTML to render a page. When there is Javascript associated with a page, this is loaded and executed by a Javascript interpreter that has access to your page and its elements. The Javascript interpreter is part of your browser program, and your browser is a program that is executed by a processor.
A compiler is a program that transforms the source code into another language: typically instructions that can be decoded by your CPU, but possibly bytecode that is not directly executable by a processor and is instead run by a virtual machine (another program that knows how to interpret the bytecode).
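To make the "bytecode run by a virtual machine" idea concrete, here is a minimal sketch in Go of a made-up three-instruction stack machine (purely illustrative; no real VM is this simple). The "compiled" program is just a slice of bytes, and the VM is an ordinary program that loops over them, decoding and executing each instruction:

```go
package main

import "fmt"

// Opcodes for a made-up, three-instruction stack machine.
const (
	opPush byte = iota // push the next byte onto the stack
	opAdd              // pop two values, push their sum
	opPrint            // pop a value and print it
)

// run is the "virtual machine": an ordinary program that decodes
// and executes bytecode one instruction at a time.
func run(code []byte) {
	var stack []int
	for pc := 0; pc < len(code); pc++ {
		switch code[pc] {
		case opPush:
			pc++
			stack = append(stack, int(code[pc]))
		case opAdd:
			n := len(stack)
			stack = append(stack[:n-2], stack[n-2]+stack[n-1])
		case opPrint:
			fmt.Println(stack[len(stack)-1])
			stack = stack[:len(stack)-1]
		}
	}
}

func main() {
	// "Compiled" program: push 2, push 3, add, print.
	run([]byte{opPush, 2, opPush, 3, opAdd, opPrint})
}
```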
For some languages the compilation phase also involves a step called linking, but the resulting file is basically a bit of metadata plus a sequence of instructions your processor can understand.
In order to execute a program, you ask (through a shell or the graphical interface) your operating system to load the program: the kernel allocates resources for your process and puts the code in a portion of memory that is flagged as executable (there are many more details to it).
The kernel keeps track of processes and executes them on one or more processors. The code is fed directly to a processor, which can decode the instructions resulting from compilation. Periodically, a process is interrupted by the kernel to let other processes run (when the process is waiting for something, or due to something called an "interrupt"). When you have multiple processors, multiple programs can execute truly in parallel.
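As a hedged illustration of "asking the operating system to run a program": in the Go sketch below, the parent program never executes the child's instructions itself; it merely asks the kernel, via the process-creation system calls wrapped by os/exec, to load, schedule, and run it ("go version" is just an example command):

```go
package main

import (
	"fmt"
	"os/exec"
)

func main() {
	// Ask the OS to load and execute another program. os/exec wraps
	// the platform's process-creation system calls (fork/execve on
	// Linux, CreateProcess on Windows).
	out, err := exec.Command("go", "version").Output()
	if err != nil {
		fmt.Println("failed to run:", err)
		return
	}
	// The kernel scheduled the child on a processor, ran it, and
	// handed back its output and exit status.
	fmt.Printf("child said: %s", out)
}
```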
See for example Linux Kernel Teaching for a lot more details.
Related
My understanding is that executables of applications written in Go can stand alone without Go being installed on the machine.
Normally my understanding is that the GC (Garbage Collection) is handled by a VM. In this case, if the application is running independently without such a runtime, how is GC handled?
Any help on this, and pointers to the relevant documentation, would be nice.
my understanding is that the GC (Garbage Collection) is handled by a VM.
In the case of a typical VM-supported programming language featuring GC, (the compiled form of) a program written in that language is literally managed by the VM: the VM runs the code of the program and intervenes periodically to perform the GC tasks.
The crucial point is that each program running in such a VM may consider its VM as a part of its execution environment.
Another crucial point is that such a VM represents the so-called runtime system for the so-called execution model of that programming language.
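A toy sketch in Go of the "the VM runs the code and intervenes periodically for GC" idea (purely illustrative; real collectors are triggered by allocation pressure, not instruction counts):

```go
package main

import "fmt"

// runManaged is a toy "VM" loop: it executes one instruction at a
// time and, every few steps, pauses the program to run a (stubbed)
// GC task on the program's behalf.
func runManaged(program []func()) {
	for i, instr := range program {
		instr()
		if i%2 == 1 { // hypothetical GC trigger policy
			fmt.Println("vm: pausing program, running GC tasks")
		}
	}
}

func main() {
	say := func(s string) func() { return func() { fmt.Println(s) } }
	runManaged([]func(){say("a"), say("b"), say("c"), say("d")})
}
```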
In this case, if the application is running independently without such a runtime how is GC handled?
Quite similar to the VM case.
Each Go program compiled by the stock toolchain (which can be downloaded from the official site) contains the Go runtime linked with the program itself.
Each compiled Go program is created in such a way that when the program runs, the program's entry point executes the runtime first, which is responsible for initializing itself, then the program; once this is finished, execution is transferred to the program's main().
Among other things, the initialized Go runtime continuously runs one or more pieces of code of its own, which include the goroutine scheduler and the GC (they are tightly coupled, FWIW).
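A small illustration of the fact that the collector lives inside the compiled binary itself: the standard runtime package lets a Go program talk to its own linked-in runtime, with no external VM process involved (a minimal sketch):

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	var before, after runtime.MemStats

	// Allocate some garbage for the collector to find.
	for i := 0; i < 1000; i++ {
		_ = make([]byte, 1<<10)
	}

	runtime.ReadMemStats(&before)
	runtime.GC() // explicitly invoke the collector linked into this binary
	runtime.ReadMemStats(&after)

	fmt.Printf("completed GC cycles: %d -> %d\n", before.NumGC, after.NumGC)
}
```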
As you can see, the difference from the VM case is that there the runtime is "external" to the running program, while in the (typical) case of Go programs it is "alongside" the running program.
Nothing in the Go language specification mandates the precise way the runtime must be made available to the running program.
For instance, Go 1.11 can be compiled to WASM, and the runtime is partially provided by the linked-in code of the Go runtime and partially by the WASM host (typically a browser).
As another example, GCC features a Go frontend, and, contrary to the "stock" Go toolchain, on those platforms where it's possible, GCC supports building Go programs so that their compiled forms dynamically link against a shared library containing most of the Go runtime code (and the code of the standard library). In this case, a compiled Go program does not contain the runtime code, but it gets linked in when the program is loaded and then also works in-process with the program itself.
It's perfectly possible to implement an execution model for Go programs which would use a VM.
Binaries in golang have no external dependencies, as everything is compiled in directly. Unlike a C/C++ binary, which typically requires dynamic linking, golang binaries do not require this by default.
https://golang.org/doc/faq#runtime
This allows you to copy, scp, rsync, etc. your binary across to any machine of the same architecture type. For example, if you have a compiled binary on Ubuntu, then you can copy that binary across to any other Ubuntu machine. You would have to cross-compile your binary for MacOS to run it there, but again, you can build it on any operating system.
https://golang.org/doc/install/source#environment
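As a minimal, hedged aside: a compiled Go binary can report the platform it was built for via the runtime package. The values are baked in at compile time, not detected at run time, which is exactly why a binary only runs on machines of the architecture it was compiled for:

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	// GOOS and GOARCH are fixed when the binary is built; a binary
	// cross-compiled for another platform reports that platform.
	fmt.Println("built for:", runtime.GOOS+"/"+runtime.GOARCH)
	fmt.Println("toolchain:", runtime.Version())
}
```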
I'm new to "Operating Systems", so please just don't blast me.
I studied the User Mode and the Kernel Mode of the CPU. I've just discovered that some CPUs have intermediate modes between user and kernel mode.
But why is this necessary? Is it dangerous to always use kernel mode for privileged instructions? Or is it a matter of performance?
The VAX/VMS system is one that used four modes. VMS works quite differently from Eunuchs. In Eunuchs variants you have a shell process. Each time you run a program you create a new process. In fact, the shell in Eunuchs variants is just a program with nothing special about it.
In VMS, the command interpreter exists in the same process as the running program. Programs can (and often do) interact with the command interpreter. When your program ends, the command interpreter takes back control. Run another program and you remain in the same process with a new executable loaded.
The command interpreter runs in "supervisor mode", which is one level higher than user mode. It is thus protected from user-mode access messing with it. At the same time, any bugs in the command interpreter will not cause the system to crash.
Also the debugger exists in supervisor mode within the process it is debugging.
People brought up under Windoze and Eunuchs cannot appreciate how primitive their file handling is. VMS, like most real non-toy operating systems, has different file structures. It supports stream files like Eunuchs and Windows. However, it also supports sequential file structures, fixed-record file structures, and files indexed on keys. The system services for managing these run in executive mode (above supervisor and below kernel). Again, that allows having protected system services that will not crash the entire operating system.
I should also mention that non-toy operating systems support file versions. If you open a document, edit it, and save it, you create a new version of the file with the same name. If you make a misteak or eror you can go back and fix it.
The general answer to your question is that these other modes give the operating system a means to provide interfaces to services that are protected from users messing with them, yet will not affect the entire operating system when there are problems.
Ideally, an operating system would do as little as possible in kernel mode. When you have operating systems that are quick and dirty and do very little, they just use kernel mode.
I know that software breakpoints in an executable file can work by replacing some assembler instruction at the desired place with another one that causes an interrupt. So the debugger can stop execution exactly at this place, replace the instruction with the original one, and ask the user what to do next, run some commands, etc.
But the code of such an executable file is not used by other programs and has only one copy in memory. How can software breakpoints work with shared libraries? For instance, how do software breakpoints work if I set one at some internal function of the C library (as I understand it, there is only one copy for all the applications, so we cannot just replace some instruction in it)? Are there any "software breakpoint" techniques for that purpose?
The answer for Linux is that the Linux kernel implements COW (copy-on-write): if the code of a shared library is written to, the kernel first makes a private duplicate of the shared page, internally remaps the virtual memory of just that process to the copy, and allows the application to continue. This is completely invisible to userland applications and done entirely in the kernel.
Thus, until the first time a software breakpoint is put into the shared library, its code is indeed shared; but afterwards it is not. The process thereafter operates with a dirty but private copy.
This kernel magic is what allows the debugger to not cause every other application to suddenly stop.
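For a rough idea of the mechanics on Linux, here is a hedged Go sketch (Linux/x86 only, simplified, assuming the target is already stopped under ptrace) of the write that triggers the copy-on-write described above: the debugger pokes the 0xCC (INT3) trap opcode into the target's code, and the kernel silently gives that process a private copy of the page.

```go
//go:build linux

package main

import (
	"fmt"
	"syscall"
)

// setBreakpoint overwrites one byte of the target's code with 0xCC
// (the x86 INT3 trap instruction), saving the original byte so it
// can be restored later. The pid must already be stopped under ptrace.
func setBreakpoint(pid int, addr uintptr) (orig byte, err error) {
	buf := make([]byte, 1)

	// Read the original instruction byte.
	if _, err = syscall.PtracePeekData(pid, addr, buf); err != nil {
		return 0, err
	}
	orig = buf[0]

	// Writing to a shared library's page triggers copy-on-write: the
	// kernel gives this process a private copy, so other processes
	// mapping the same library are unaffected.
	_, err = syscall.PtracePokeData(pid, addr, []byte{0xCC})
	return orig, err
}

func main() {
	fmt.Println("sketch only: attach with syscall.PtraceAttach(pid)," +
		" wait for the stop, then call setBreakpoint")
}
```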
On OSes such as VxWorks, however, this is not possible. From personal experience, when I was implementing a GDB remote debug server for VxWorks, I had to forbid my users from ever single-stepping within semTake() and semGive() (the OS semaphore functions), since a) GDB uses software breakpoints in its source-level single-step implementation and b) VxWorks uses a semaphore to protect its breakpoints list...
The unpleasant consequence was an interrupt storm in which a breakpoint would cause an interrupt, and within this interrupt there would be another interrupt, and another and another in an inescapable chain resistant even to Ctrl-Z. The only way out was to power off the machine.
Every now and then I use the mdb debugger to examine core dumps on Solaris. One nice article that I looked at to get up to speed on the possibilities with mdb is http://blogs.oracle.com/ace/entry/mdb_an_introduction_drilling_sigsegv where the author performs a step-by-step examination of a SIGSEGV crash. In the article the author uses "walkers" which is a kind of add-on to mdb that can perform specific tasks.
My problem is that I don't have any of those walkers in my mdb. By using the "::walkers" command, all available walkers can be listed, and my list is empty. So the question is: how can I install/add/load walkers such as the ones used in the above article? I don't really know where they are supposed to be loaded from, whether you have to download and add them from somewhere, or whether it's a configuration step when installing Solaris.
mdb automatically loads walkers and dcmds appropriate for what you're debugging, usually from /usr/lib/mdb and similar directories (see mdb(1) for details). If you just run "mdb" by itself, you'll get almost nothing. If you run "mdb" on a userland process or core dump (e.g., "mdb $$"), you'll get walkers and dcmds appropriate for userland debugging. If you run "mdb" on the kernel (e.g., "mdb -k"), you'll get walkers and dcmds for kernel debugging.
I would like to write a C++ function, on Microsoft Windows, that spawns a process and returns, in addition to the process's termination status, a list of all the files the process read or wrote. It should not require any cooperation from the spawned application.
For example, if the program spawned is the Visual Studio C++ compiler, the function would produce a list containing the source file the compiler opened, all header files it read, and the .OBJ file it created. If it also included things like .DLL files the program loaded, that would be fine. But again, it should work regardless of the program spawned; the compiler is just an example.
A twist: if the process creates subprocesses, I need to monitor their file accesses as well.
A second twist: if the process tries to open a file, I would like to be able to make it wait until I can create that file—and only then let it resume and open the file. (I think this rules out ETW.)
I know this probably sounds like an ingredient for some horrible kludge. But if I can get this working, the end result will be really cool.
A second twist: if the process tries to open a file, I would like to be able to make it wait until I can create that file—and only then let it resume and open the file
You just put yourself into Hack City with that requirement - you're right that ETW would've been a far easier solution, but it also has no way to block the file call.
Basically, here's what you're going to have to do:
Create the process suspended (see the sketch at the end of this answer)
Create two named pipes in opposite directions whose names are well known (perhaps containing the PID of the process)
Hook LoadModule, and the hook will watch for Kernel32 to get loaded
When Kernel32 gets loaded, hook CreateFileW and CreateFileA - also hook CreateProcessEx and ShellExecute
When your CreateFile hook hits, you write the name to one of the named pipes, then execute a ReadFile on the other one until the parent process signals you to continue.
When your CreateProcessEx hook hits, you get to do the same process all over again from inside the current process (remember that you can't have the parent process do the CreateProcess'ing because it'll mess up inherited handles).
Start the child process.
Keep in mind that you'll be injecting code and making fixups to an in-memory image that may be a different bitness than yours (i.e. your app is 64-bit, but it's starting a 32-bit process), so you'll have to have both x86 and amd64 versions of your shim code to inject. I hope that by writing this lengthy diatribe I have convinced you that this is actually an awful idea that is very difficult to get right, and that people who hook Win32 functions make Windows OS developers sad.
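For the very first step above (creating the process suspended so hooks can be installed before any of its code runs), here is a hedged sketch. It uses Go's syscall wrappers rather than C++, but the underlying Win32 calls (CreateProcess with the documented CREATE_SUSPENDED flag, then ResumeThread) are the same ones you would make from C++; the notepad.exe path is just a hypothetical target:

```go
//go:build windows

package main

import (
	"fmt"
	"syscall"
	"unsafe"
)

// Documented Win32 creation flag CREATE_SUSPENDED: the new process's
// primary thread is created suspended and runs nothing until
// ResumeThread is called.
const createSuspended = 0x00000004

func main() {
	// Hypothetical target; substitute the program you want to monitor.
	exe, err := syscall.UTF16PtrFromString(`C:\Windows\System32\notepad.exe`)
	if err != nil {
		panic(err)
	}

	var si syscall.StartupInfo
	si.Cb = uint32(unsafe.Sizeof(si))
	var pi syscall.ProcessInformation

	// Step 1 of the recipe above: create the process suspended, so
	// hooks and pipes can be set up before any of its code executes.
	if err := syscall.CreateProcess(exe, nil, nil, nil, false,
		createSuspended, nil, nil, &si, &pi); err != nil {
		panic(err)
	}
	fmt.Println("created suspended, pid:", pi.ProcessId)

	// ... create the named pipes and inject the hooks here ...

	// Last step: let the child start running.
	if _, err := syscall.ResumeThread(pi.Thread); err != nil {
		panic(err)
	}
	syscall.CloseHandle(pi.Thread)
	syscall.CloseHandle(pi.Process)
}
```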