Why aren't executables written in a form that the OS understands instead of machine code? - compilation

We know that computer programs are either AOT compiled, JIT compiled or interpreted.
We also know that AOT-compiled programs usually get compiled from their high-level source code into machine code.
Now the question is: if machine code is so hard to understand and write, why weren't compilers designed to translate programs into a simpler language understood by the operating system, instead of translating directly into machine code?
If such an operating-system-dependent language existed, the OS would read the executables written in it and translate them into the corresponding machine code understood by the CPU.
In other words, wouldn't compiling have been easier if OSs had some kind of JIT compiler (VM?) that translated a specific kind of bytecode (which would have to exist) into machine code?
Are there any disadvantages of all this?

Related

Designing a virtual machine with JIT compilation

I'm writing a virtual machine for a dynamically typed interpreted language that I'm creating, and I'm wondering whether adding a JIT would be worth it. The design I have in mind is to embed it in the normal VM, because JITs aren't portable: I might not have every backend, so I would like to keep reusing the normal VM and be able to disable the JIT with a simple #define flag. I just need some design advice on how this would go. Do I have to JIT-compile all instructions to machine code? What I had in mind was something like a "partial JIT" that compiles only a few performance-critical instructions, but I don't know how that would work or how it would interact with the instructions that aren't JIT-compiled. Another question is how I would even represent types at the assembly level. Overall I'm a total noob at JIT compilers and need someone to point me down the right path. What do other VMs like the JVM and LuaJIT do in this case?

Does the compiler actually produce Machine Code?

I've been reading that in most cases (like gcc) the compiler reads the source code in a high level language and spits out the corresponding machine code. Now, machine code by definition is the code that a processor can understand directly. So, machine code should be only machine (processor) dependent and OS independent. But this is not the case. Even if 2 different operating systems are running on the same processor, I can not run the same compiled file (.exe for Windows or .out for Linux) on both the Operating Systems.
So, what am I missing? Is the output of a gcc compiler (and most compilers) not machine code? Or is machine code not the lowest level of code, and the OS translates it further into a set of instructions that the processor can execute?
You are confusing a few things. A retargetable compiler like gcc, like other generic compilers, compiles files to objects; the linker then links those objects with other libraries as needed to make a so-called binary that the operating system can read, parse, load the loadable blocks from, and start executing.
A sane compiler author will use assembly language as the output of the compiler; then the compiler, or the user in their makefile, calls the assembler, which creates the object. This is how gcc works. It is sort of how clang works too, but llc can now make objects directly, not just assembly that gets assembled.
It makes far more sense to generate debuggable assembly language than to produce raw machine code. You really need a good reason, like JIT, to skip that step. I would avoid toolchains that go straight to machine code just because they can; they are harder to maintain and more likely to have bugs or take longer to get bugs fixed.
If the architecture is the same, there is no reason why you can't have a generic toolchain generate code for incompatible operating systems. The GNU tools, for example, can do this. Operating system differences are not by definition at the machine code level; most are at the high-level language level. The C libraries that you call to create GUI windows, etc. have nothing to do with the machine code or the processor architecture. For some operating systems the same operating-system-specific C code can be used on MIPS, ARM, PowerPC or x86. Where the architecture becomes specific is the mechanism by which actual system calls are invoked. A specific instruction is often used, and machine code is eventually used, yes, but there is no reason why this can't be coded in real or inline assembly.
And this leads to libraries. Even fopen and printf, which are generic C calls, eventually have to make a system call. Much of the library support code can be written in a high-level language that is compatible across systems, but there will need to be a system- and architecture-specific bit of code for the last mile. You can see this in the glibc sources, or in the hooks into newlib, as examples from different library solutions.
Same is true for other languages like C++ as it is for C. Interpreted languages have additional layers but their virtual machines are just programs that sit on similar layers.
Low-level programming doesn't mean machine or assembly language; it just means that whatever programming language you are using accesses things at a lower level, below the application or below the operating system, etc.
Compilers produce assembly code, which is a human-readable version of machine code (eg, instead of 1's and 0's you have actual commands). However, the correct assembly/machine code needed to make your program run correctly is different depending on the operating system. So the language the processors use is the same, but your program needs to talk to the operating system, which is different.
For example, say you're writing a Hello World program. You need to print the phrase "Hello, World" onto the screen. Your program will need to go through the OS to actually do that, and different OSes have different interfaces.
I'm deliberately avoiding technical terms here to keep the answer understandable for beginners. To be more precise, your program needs to go through the operating system to interact with the other hardware on your computer (e.g. keyboard, display). This is done through system calls that are different for each family of OS.
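To make the "go through the OS" point concrete, here is a minimal sketch in Python (chosen only for brevity; the helper name say_hello is made up for illustration). The call os.write is a thin wrapper over the platform's low-level write routine, so the same source line ends up as a different system call on each OS family, and the runtime hides that difference from the program.

```python
import os


def say_hello(fd: int) -> int:
    """Write the greeting to an already-open file descriptor.

    os.write hands the bytes to the operating system. How that request
    is actually issued (which system call, which CPU instruction) is
    decided by the platform's C library and kernel, not by this code.
    Returns the number of bytes written.
    """
    return os.write(fd, b"Hello, World\n")


if __name__ == "__main__":
    say_hello(1)  # file descriptor 1 is standard output
```

The source is portable only because a Python runtime built for each OS does the OS-specific last step; a native executable has that step baked in.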
The machine code that is generated can run on any processor of the same type it was generated for. The challenge is that your code will interact with other modules or programs on the system, and to do that you need conventions for calling and returning. The generated code assumes a runtime environment (OS) as well as library support (calling conventions). Those are not consistent across operating systems.
So, things break when they need to transition to and depend on other modules using conventions defined by the operating system's calling conventions.
Even if the machine code instructions are identical for the compiled program on two different operating systems (not at all likely, since different operating systems provide different services in different ways), the machine code needs to be stored in a format that the host OS can load into a process for execution. And those formats are frequently different between different operating systems.
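As a rough illustration of the file-format point, the sketch below guesses an executable's container format from its first few bytes. The function name detect_format and the returned labels are made up for this example; the magic numbers are the well-known ones for ELF, PE (which starts with a DOS "MZ" stub) and thin Mach-O files. Universal ("fat") Mach-O files are deliberately left out to keep the sketch short.

```python
# Leading magic numbers of thin Mach-O files (32/64-bit, both byte orders).
MACHO_MAGICS = {
    b"\xfe\xed\xfa\xce",
    b"\xfe\xed\xfa\xcf",
    b"\xce\xfa\xed\xfe",
    b"\xcf\xfa\xed\xfe",
}


def detect_format(header: bytes) -> str:
    """Guess the executable container format from the file's first bytes."""
    if header.startswith(b"\x7fELF"):
        return "ELF"      # Linux and most other Unix-like systems
    if header.startswith(b"MZ"):
        return "PE"       # Windows executables begin with a DOS MZ stub
    if header[:4] in MACHO_MAGICS:
        return "Mach-O"   # macOS
    return "unknown"
```

Reading the first four bytes of any program on your system and passing them to this function shows which format that OS's loader expects, regardless of which CPU the code inside was compiled for.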

How can a program on my CPU run the same way on another CPU?

Let's say that I write a program using the Windows API and then compile it. The code is compiled to machine code for the CPU to execute. Now, my question is: if I share the executable file with someone else with another instruction set in their CPU, how can their CPU run the code the same way and not give errors or run different code?
someone else with another instruction set in their CPU
...
How can their CPU run the code the same way
The code won't run. The CPUs, simply put, speak different languages.
You have two options:
Recompile your code for the target CPU (assuming you can use the same source language and no platform-specific API, so you're left with C/C++ and the standard library).
Write a script or bytecode and use a runtime available for both platforms to interpret it.
That's why there are runtimes such as the JVM (for Java and Scala) and interpreters for scripting languages (Python, Lua, JavaScript, etc.), where the code is distributed as a script or as platform-independent bytecode.
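To show what "platform-independent code" looks like in practice, here is a small sketch using CPython's standard dis module. The function add is just a placeholder. The bytecode it is compiled to targets the Python virtual machine, not any physical CPU, although the exact opcodes do change between CPython versions.

```python
import dis


def add(a, b):
    return a + b


# The compiled form of add(): a bytes object holding CPython bytecode,
# not instructions for any particular processor.
bytecode = add.__code__.co_code

# Human-readable opcode names; the exact list varies between CPython versions.
opnames = [ins.opname for ins in dis.get_instructions(add)]

if __name__ == "__main__":
    print(len(bytecode), "bytes of bytecode")
    print(opnames)
```

The same source runs unchanged on any machine with a compatible interpreter, because only the interpreter itself has to be compiled for that CPU and OS.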
And now the next step. If you're using the Windows API, well, as the name suggests, it's an API (a set of services) provided by the Windows system. So even on the same CPU, without the Windows system (e.g. on a Linux system) the application won't run. (OK, there is often a way to expose the Windows API on Linux, but it can be tricky sometimes.)
Conclusion: binaries are not portable between instruction sets, and if you're using any high-level API (Win32, ...), you're pretty much tied to the operating system too.
When some high-level languages (Java and C#, for example) are compiled, the result is intermediate code rather than a native executable. This is a representation of the source code that is closer to assembly language but is not specific to any CPU instruction set. It is up to a runtime on the machine running the program to interpret this intermediate code, or to translate it into the CPU's native instruction set.
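A toy sketch of how such intermediate code can be executed: the instruction set below (PUSH, ADD, MUL) is invented for this example and does not correspond to any real VM. The program is plain data, and only the run function would need to exist for each CPU and OS.

```python
from typing import List, Optional, Tuple

Instruction = Tuple[str, Optional[int]]


def run(program: List[Instruction]) -> int:
    """Interpret a tiny stack-based bytecode and return the final result."""
    stack: List[int] = []
    for op, arg in program:
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()


# Intermediate code for (2 + 3) * 4; nothing here is CPU-specific.
PROGRAM: List[Instruction] = [
    ("PUSH", 2),
    ("PUSH", 3),
    ("ADD", None),
    ("PUSH", 4),
    ("MUL", None),
]

if __name__ == "__main__":
    print(run(PROGRAM))
```

A JIT compiler replaces the if/elif dispatch with generated native instructions for hot code, which is faster but ties that part of the runtime to one instruction set.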

On which platform is the C language coded?

I would just like to know in which language operating systems are coded.
As per my knowledge:
The Windows kernel is written in the C language.
The Linux kernel is also written in the C language.
But what are the remaining operating systems written in?
And on which platform is the C language itself written?
Yes, the Windows and Linux kernels are written in C. Most operating systems tend to be.
There are operating systems written in other languages though, the Chorus kernel for example is written in C++.
Most C compilers are also written in C. That has the advantage that once you managed to get the compiler running on the machine (generally by compiling it on another machine that already has a working compiler/cross compiler), the machine itself can compile updates to its own compiler without maintaining yet another compiler.
Most parts of a C compiler (like gcc) are themselves written in C. Of course you would need something to bootstrap your compiler so that it can compile itself. That would then be a lower-level language like assembler.
The C language is one of many languages that are considered to be Self Hosting - that is to say that the compiler can compile its own source code, which is written in the same language that the compiler is designed to compile.
You might also want to look into the process of Bootstrapping, which is the process used to get the first compiler for a particular language to run on a given platform - as others have noted, this can be by way of cross-compiling, or by writing the original compiler in a different language, though other techniques are possible.
First off, you might want to improve your question with actual sentences.
Second,
C is not written in a platform, it is written in another programming language.
The earliest compilers were written in assembler, a somewhat readable version of the actual machine code executed by the processor.
I don't know if there are other compilers, written in some intermediary language but eventually, everything boils down to assembler code, which compiles to machine code.

How Does An Operating System Introduce The C Language To Write The Kernel

Does this introduction occur at the NTLDR stage? It must be introduced somewhere; I mean, isn't the kernel written in C? I thought a computer's only "known-before" programming language was assembly language, which is hard-coded in the microcode of the processor?
The first operating systems were all written in assembly. The C language was created with UNIX as its first major use case. A C compiler was written to handle this code and produce the assembly that the system understands (the compiler was written in assembly, of course). The effect snowballs from there. We now have a more powerful system for writing code, so we can of course write better compilers and better software with a more high-level approach and let the compiler do the work for us.
As far as Microsoft is concerned, MS-DOS started from an operating system called QDOS (86-DOS), which was written in assembly; the Windows NT kernel is written mostly in C.
Sidenote: operating systems still require assembly code to function, as there are many hardware-dependent pieces of information required (for example, reading CR2 on a page fault on x86). Bootloaders and BIOSes (older ones) are written in assembly because they are very specific to the hardware and are required to set up things such as interrupts and the stack pointer.
C is a compiled language, as opposed to an interpreted language. C programs as well as the C runtime library are compiled into machine code, so they don't need any kind of runtime environment such as an interpreter or virtual machine to be loaded in order to execute.
The entry point of a compiled program (including a kernel) will call into its runtime library and perform any initialization required before executing the program, but this is all machine code.

Resources