What is meant by "developers must optimise their apps to run on ARM-based processors"?

This is a subject that I am not very knowledgeable about, and I was hoping to get a better understanding of the topic.
I was going through articles about Apple's transition to Apple Silicon and at some point I read "Apple is going to ship Rosetta 2, an emulation layer that lets you run old apps on new Macs."
As far as I know, an application is written in a high-level language (e.g. C/C++, Java, etc.). Then the compiler (let's assume interpreters don't exist for a moment) reads that code and translates it to assembly code. Then the assembler converts the assembly code to machine code, which is readable by the processor.
My question is, assuming the above is correct, why is Rosetta 2 required, since a CPU is supposed to translate high-level code into readable machine code anyway? Why would developers need to "optimise" their applications (or care about what processor they run on), since they are written (mostly) in a high-level language (which the processor can compile)? I don't get why programmers would care, if the CPU is supposed to handle compiling and assembling.
This question is probably rather trivial but I couldn't find what I was looking for just by reading about compilers or CPU architecture.

a CPU is supposed to translate high-level code into readable machine code anyway?
No, the CPU doesn't do that itself; it happens via software running on the CPU (a JIT or an ahead-of-time compiler).
With an ahead-of-time compiler (e.g. normal C++ implementations), closed-source software ships only x86 machine code, not source, so you can't just recompile it yourself. Open-source software is usually easily portable by recompiling.
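To see why that matters, take a one-line function and compile it for each architecture; the resulting machine code (standard instruction encodings, shown as hex bytes) is completely different:

    int answer(void) { return 42; }

    /* x86-64:   mov eax, 42 ; ret   ->  B8 2A 00 00 00 C3        */
    /* AArch64:  mov w0, #42 ; ret   ->  40 05 80 52 C0 03 5F D6  */

An ARM CPU can't execute the x86-64 bytes (and vice versa); translating the x86-64 bytes into something the M1 can run is exactly Rosetta 2's job.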
"Rewritten" is an overstatement for most apps; most can just be recompiled.
But if you have custom x86-specific code, like manually vectorized SIMD loops using SSE / AVX intrinsics or hand-written asm, you'd have to port those to NEON / AArch64 SIMD.
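As a sketch of what such a port looks like (a hypothetical function, not code from any particular app), here is the same loop written once with SSE intrinsics and once with their NEON equivalents; in a real codebase the two versions would sit in separate files or behind an #ifdef:

    #include <stddef.h>
    #include <immintrin.h>   /* SSE intrinsics (x86 build) */

    /* Add two float arrays four elements at a time using SSE.
       (Leftover tail elements would be handled by scalar code, omitted.) */
    void add_floats_sse(float *dst, const float *a, const float *b, size_t n) {
        for (size_t i = 0; i + 4 <= n; i += 4) {
            __m128 va = _mm_loadu_ps(a + i);
            __m128 vb = _mm_loadu_ps(b + i);
            _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
        }
    }

    #include <arm_neon.h>    /* NEON intrinsics (AArch64 build) */

    /* The same loop after porting to NEON. */
    void add_floats_neon(float *dst, const float *a, const float *b, size_t n) {
        for (size_t i = 0; i + 4 <= n; i += 4) {
            float32x4_t va = vld1q_f32(a + i);
            float32x4_t vb = vld1q_f32(b + i);
            vst1q_f32(dst + i, vaddq_f32(va, vb));
        }
    }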

Related

What does M1 mac optimization process for an application mean?

You know the ARM-based M1 chips that are used in modern Mac computers. On those Macs, some software runs through a translation layer called Rosetta (Discord, Steam), some runs natively, directly on the M1 (Slack, IntelliJ), and some doesn't work either way (VirtualBox). A huge list tracking the status can be found here.
Apps that can run only through Rosetta are not yet M1-optimized; their developers have to optimize them, and that takes some time. But what does it mean to optimize an app? What does the process look like? I'm quite sure they don't rewrite the whole application in another language (like Swift), because JetBrains was able to M1-optimize their apps quite quickly. On the other hand, Discord is not yet optimized, and the same goes for the Unity game engine (it's in beta, though).
At bottom, it just means that the compiler's backend was configured to emit ARM64 instructions for the program instead of (or in addition to) x86-64 instructions.
This means that certain x86-64-specific instructions can no longer be used, unless equivalent ARM instructions are used instead.
This usually isn't much of a problem though, because most macOS software is typically written at a higher level of abstraction, using system-provided frameworks.
For example, using CoreImage to manipulate images abstracts you from the details of the CPU and GPU. In such cases, Apple does the heavy lifting of porting over their frameworks. All you have to do as an application developer is to check a box that says "target ARM64".
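Outside of Xcode, that checkbox corresponds to passing multiple -arch flags to Apple's clang (main.c here is a hypothetical source file with no architecture-specific code):

    clang -arch x86_64 -arch arm64 main.c -o main   # emit both kinds of machine code
    lipo -info main                                 # confirm both slices are present

The result is a universal binary: the x86-64 slice runs on Intel Macs, and the arm64 slice runs natively on the M1.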

Does the compiler actually produce Machine Code?

I've been reading that in most cases (like gcc) the compiler reads the source code in a high-level language and spits out the corresponding machine code. Now, machine code by definition is code that a processor can understand directly. So machine code should be only machine- (processor-) dependent and OS-independent. But this is not the case: even if two different operating systems are running on the same processor, I cannot run the same compiled file (.exe for Windows or .out for Linux) on both operating systems.
So, what am I missing? Is the output of gcc (and most compilers) not machine code? Or is machine code not the lowest level of code, and the OS translates it further into a set of instructions that the processor can execute?
You are confusing a few things. A retargetable compiler like gcc (and other generic compilers) compiles files to objects; then the linker links the objects with other libraries as needed to make a so-called binary, which the operating system can read, parse, load the loadable blocks from, and start executing.
A sane compiler author will use assembly language as the output of the compiler; then the compiler (or the user, in their makefile) calls the assembler, which creates the object. This is how gcc works, and roughly how clang works too, although llc can now produce objects directly, not just assembly that gets assembled.
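You can watch these stages yourself with gcc (a sketch, assuming a hypothetical hello.c):

    gcc -S hello.c -o hello.s   # compile: C -> assembly text
    as hello.s -o hello.o       # assemble: assembly -> object file (machine code)
    gcc hello.o -o hello        # link: object + libraries -> executable the OS can load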
It makes far more sense to generate debuggable assembly language than to produce raw machine code; you really need a good reason, like a JIT, to skip that step. I would avoid toolchains that go straight to machine code just because they can: they are harder to maintain and more likely to have bugs, or to take longer to get bugs fixed.
If the architecture is the same, there is no reason a generic toolchain can't generate code for incompatible operating systems; the GNU tools, for example, can do this. Operating system differences are not, by definition, at the machine-code level; most are at the high-level-language level. The C libraries you call to create GUI windows, etc. have nothing to do with the machine code or the processor architecture, and for some operating systems the same OS-specific C code can be used on MIPS, ARM, PowerPC, or x86. Where the architecture becomes specific is the mechanism by which actual system calls are invoked: a specific instruction is often used, and machine code is eventually involved, but there is no reason that bit can't be written in real or inline assembly.
And that leads to libraries. Even fopen and printf, which are generic C calls, eventually have to make a system call, so while much of the library support code can be written in a high-level language that is portable across systems, there will need to be a system- and architecture-specific bit of code for the last mile. You can see this in the glibc sources, or in the system hooks of newlib, as examples.
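As a minimal sketch of that last mile (assuming x86-64 Linux; the syscall number, registers, and trap instruction all differ per OS and architecture), a write() wrapper bottoms out in something like:

    #include <stddef.h>

    /* Invoke the Linux x86-64 `write` system call directly.
       rax = syscall number (1 = write); rdi, rsi, rdx = arguments;
       the kernel clobbers rcx and r11. A libc hides all of this. */
    long my_write(int fd, const void *buf, size_t len) {
        long ret;
        __asm__ volatile ("syscall"
                          : "=a"(ret)
                          : "a"(1L), "D"((long)fd), "S"(buf), "d"(len)
                          : "rcx", "r11", "memory");
        return ret;
    }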
The same is true for other languages like C++ as it is for C. Interpreted languages have additional layers, but their virtual machines are just programs that sit on these same layers.
Low-level programming doesn't mean machine or assembly language; it just means that whatever programming language you are using gives access at a lower level, below the application or below the operating system, etc.
Compilers produce assembly code, which is a human-readable version of machine code (e.g., instead of 1s and 0s you have actual mnemonics). However, the assembly/machine code needed to make your program run correctly differs depending on the operating system. So the language the processor uses is the same, but your program also needs to talk to the operating system, which is different.
For example, say you're writing a Hello World program. You need to print the phrase "Hello, World" onto the screen. Your program will need to go through the OS to actually do that, and different OSes have different interfaces.
I'm deliberately avoiding technical terms here to keep the answer understandable for beginners. To be more precise, your program needs to go through the operating system to interact with the other hardware on your computer (e.g., keyboard, display). This is done through system calls, which are different for each family of OS.
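To be concrete (a sketch; the exact call chains below vary by libc and OS version), the same portable C source ends up taking different paths on each OS:

    #include <stdio.h>

    int main(void) {
        /* Identical C everywhere, but underneath (roughly):
           Linux:   printf -> write() -> `syscall` instruction
           macOS:   printf -> write() -> BSD system call trap
           Windows: printf -> WriteFile() -> NtWriteFile() */
        printf("Hello, World\n");
        return 0;
    }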
The machine code that is generated can run on any processor of the same type it was generated for. The challenge is that your code will interact with other modules or programs on the system, and to do that you need conventions for calling and returning. The code generated assumes a runtime environment (OS) as well as library support (calling conventions), and those are not consistent across operating systems.
So things break when they need to transition to, and depend on, other modules using conventions defined by the operating system.
Even if the machine code instructions were identical for the compiled program on two different operating systems (not at all likely, since different operating systems provide different services in different ways), the machine code needs to be stored in a format that the host OS can use to load it into a process for execution, and those formats frequently differ between operating systems (e.g. ELF on Linux, Mach-O on macOS, PE on Windows).

Why is bytecode JIT compiled at execution time and not at installation time?

Compiling a program to bytecode instead of native code enables a certain level of portability, as long as a fitting virtual machine exists.
But I'm kind of wondering: why delay the compilation? Why not simply compile the bytecode when installing an application?
And if that is done, why not adapt it for languages that compile directly to native code? Compile them to an intermediate format, distribute a "JIT" compiler with the installer, and compile on the target machine.
The only thing I can think of is runtime optimization. That's about the only major thing that can't be done at installation time. Thoughts?
Often it is precompiled. Consider, for example, precompiling .NET code with NGEN.
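For instance, with the .NET Framework on Windows, the ngen tool compiles an assembly's bytecode to native images at install time (MyApp.exe is a hypothetical assembly name):

    ngen install MyApp.exe   # precompile the assembly and its dependencies to native code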
One reason for not precompiling everything would be extensibility. Consider those languages which allow use of reflection to load additional code at runtime.
Some JIT Compilers (Java HotSpot, for example) use type feedback based inlining. They track which types are actually used in the program, and inline function calls based on the assumption that what they saw earlier is what they will see later. In order for this to work, they need to run the program through a number of iterations of its "hot loop" in order to know what types are used.
This optimization is totally unavailable at install time.
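Here is a rough C analogy of why that is (a sketch of the idea, not of how HotSpot actually works): through an indirect call, the callee is only knowable while the program runs.

    #include <stdio.h>

    typedef int (*binop)(int, int);

    static int add(int a, int b) { return a + b; }
    static int mul(int a, int b) { return a * b; }

    /* Indirect call: an install-time compiler cannot know which function
       `op` points to, so it cannot inline the call. A JIT that observes
       `op` always being `add` while the loop is hot can inline it, with a
       guard in case the assumption later breaks. */
    static int apply(binop op, int a, int b) { return op(a, b); }

    int main(void) {
        binop op = (getchar() == '+') ? add : mul;  /* decided at run time */
        long long sum = 0;
        for (int i = 0; i < 1000000; i++)           /* the "hot loop" */
            sum += apply(op, i, 2);
        printf("%lld\n", sum);
        return 0;
    }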
The bytecode has already been compiled, just as the C++ code has been compiled.
Also, the JIT runtimes (.NET's and Java's, for example) are massive and dynamic, and you can't foresee which parts of the runtime a program will use, so you would need to ship and precompile the entire runtime.
Also one has to realize that a language targeted to a virtual machine has very different design goals than a language targeted to bare metal.
Take C++ vs. Java.
C++ wouldn't work on a VM; in particular, a lot of the C++ language design is geared towards RAII.
Java wouldn't work on bare metal, for many reasons; primitive types, for one.
EDIT: As delnan correctly points out, JIT and similar technologies, though hugely beneficial to bytecode performance, would likely not be available at install time. Also, compiling for a VM is very different from compiling to native code.

Assembly Programming on Mac

I am on a Mac with Snow Leopard (10.6.3). I hear that the assembly language you work with has to match the chipset you use. I am completely new to this; I have a basic background in C and Objective-C programming and a fairly strong background in PHP. I have always wanted to see what assembly is all about.
The tutorial I'll be looking at is by VTC [link].
What I want to know is: are the tutorials that I'm about to do compatible with the assembly version on the Mac that I have?
I am completely new to this language, although I do recall studying some of it way, way back in the day. I do have Xcode, and what I'm wondering is: what kind of document would I open in Xcode to work with assembly, and does the Mac have a built-in hex editor (for when it comes time to need one)?
The assembly language you use depends not on your OS but rather on your CPU's instruction set. Judging by your Mac version, I'd say you are using an Intel processor, so you would want to learn x86 or amd64 assembly.
A good way to pick up assembly is to get yourself an embedded device to play with.
TI has some nice, inexpensive dev kits to play with. I've poked around with the Chronos kit ($50), which is a digital watch with a programmable MSP430 microcontroller and a wireless link to your computer. It's pretty sweet.
Update: I forgot to mention the Arduino. It's a pretty nifty open platform with tons of interesting peripherals and projects online.
An assembly language is specific to an instruction set architecture. Chips are instantiations of an instruction set architecture.
In my opinion, you are best served by getting TextWrangler and directly compiling with gcc.
The file extension you are looking for is .s.
Assembly, for any processor, will be more or less the same in concept. However, the complexity varies between processors. From what I see on that site, you'd be doing x86 assembly (x86 being the instruction set all consumer-line Intel processors use, which recent Macs and all PCs use), which can turn out to be fairly complex, but not overwhelming if you learn it step by step.
Xcode works with plain text files, I believe. Hex Fiend for your hex-editing needs, if you come across them.
Do keep in mind, assembly is extremely low-level. No ifs, whiles, or in fact any control structures save for "do operation and GOTO if result is (not) zero/equal" (unless your assembler provides them as syntactic sugar, which kind of defeats the purpose, in my opinion). PHP knowledge will be at most tangentially useful. Your C knowledge should serve you well, though.
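In C terms, the mental model is roughly this (a sketch, not real assembly): every loop becomes an operation, a decrement or compare, and a conditional GOTO.

    #include <stdio.h>

    int main(void) {
        /* What `for (i = 10; i != 0; i--)` looks like assembly-style:
           do the operation, then jump back if the result is not zero. */
        int i = 10;
    top:
        printf("%d\n", i);   /* the loop body        */
        i--;                 /* the "operation"      */
        if (i != 0)          /* compare...           */
            goto top;        /* ...GOTO if not zero  */
        return 0;
    }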
The linked tutorials look like they use NASM, which is included with Macs. However, system calls are usually different on different platforms (they're very different between Mac and Linux), and without seeing the tutorials, it's hard to know whether they'll target different platforms (I'd guess not, though). A better bet might be to install SPIM and learn MIPS assembly, which is more straightforward than x86 anyway.

Basic questions about Assembly and Macs

Okay. I want to learn how to assemble programs on my Mac (Early 2009 MBP, Intel Core 2 Duo). So far, I understand only that assembly languages consist of direct one-to-one mnemonics for CPU instructions. After some Googling, I've seen a lot of terms, mostly "x86" and "x86_64". I've also seen MASM, NASM, and GAS, among others.
Correct me if I'm wrong:
x86 and x86_64 are instruction sets. If I write something using these instruction sets (as raw machine code), I'm fine so long as my program stays on the processor it was designed for.
NASM, MASM, and GAS are all different assemblers.
There are different Assembly languages. There's the AT&T syntax and the Intel syntax, for example. Support for these syntaxes differ across assemblers.
Now, questions:
As a Mac user, which instruction sets should I be concerned about?
Xcode uses GCC. Does this mean it also uses GAS?
If it does use GAS, then should I be learning the AT&T syntax?
Is there a book I can get on this? Not a tutorial, not a reference manual on the web. Those things assume too much about me; for example, as far as I know, a register is just a little bit of memory on the CPU. That's how little I really know.
Thanks for your help.
If you want to learn assembly language, start with the x86 instruction set. That's the basic set.
A good book on the subject is Randall Hyde's The Art of Assembly Language, which is also available on his website. He uses a high-level assembler to make things easy to grasp and to get going, but deep down it uses GAS.
I don't believe that Xcode comes with any assembler, but you can, for example, find GAS in MacPorts' binutils package.
If you just want to make programs on your Mac and you're not that interested in the life of all the bits in the CPU, you're much better off with a more high-level language like Python or Ruby.
"I'm fine so long as my program stays on the processor it was designed for." Not really. In many cases, assembler programs will also make assumptions about the operating system they run on (e.g. when they call library functions or make system calls). Otherwise, your assumpptions are correct.
Onto questions:
Current Macs support both x86 and x86-64 (aka AMD64, aka EM64T, aka Intel64). Both 32-bit and 64-bit binaries can be run on recent systems; Apple itself ships its libraries in "fat" (aka "universal") mode, i.e. with machine code for multiple architectures (see the sketch after this list).
Use "as -v" to find out what precise assembler you have; mine reports as "Apple Inc version cctools-698.1~1, GNU assembler version 1.38". So yes, it's GAS.
Yes.
https://stackoverflow.com/questions/4845/good-x86-assembly-book
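As a sketch of how such a fat binary is put together on a Snow Leopard-era toolchain (main.c is a hypothetical source file):

    gcc -arch i386 main.c -o main32      # 32-bit x86 slice
    gcc -arch x86_64 main.c -o main64    # 64-bit x86-64 slice
    lipo -create main32 main64 -o main   # combine into one "fat"/universal binary
    lipo -info main                      # verify both architectures are present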
I'll answer the first question:
Macs use Intel chips now, and modern processors are 64-bit.
