How does Go compile so quickly?

I've Googled and poked around the Go website, but I can't find an explanation for Go's extraordinary build times. Are they products of the language features (or lack thereof), a highly optimized compiler, or something else? I'm not trying to promote Go; I'm just curious.

Dependency analysis.
The Go FAQ used to contain the following sentence:
Go provides a model for software construction that makes dependency analysis easy and avoids much of the overhead of C-style include files and libraries.
While the phrase is not in the FAQ anymore, this topic is elaborated upon in the talk Go at Google, which compares the dependency analysis approach of C/C++ and Go.
That is the main reason for fast compilation. And this is by design.

I think it's not that Go compilers are fast, it's that other compilers are slow.
C and C++ compilers have to parse enormous amounts of headers - for example, compiling C++ "hello world" requires compiling 18k lines of code, which is almost half a megabyte of sources!
$ cpp hello.cpp | wc
18364 40513 433334
Java and C# compilers run in a VM, which means that before they can compile anything, the operating system has to load the whole VM, then they have to be JIT-compiled from bytecode to native code, all of which takes some time.
Speed of compilation depends on several factors.
Some languages are designed to be compiled fast. For example, Pascal was designed to be compiled using a single-pass compiler.
Compilers themselves can be optimized too. For example, the Turbo Pascal compiler was written in hand-optimized assembler, which, combined with the language design, resulted in a really fast compiler running on 286-class hardware. I think that even now, modern Pascal compilers (e.g. FreePascal) are faster than Go compilers.

There are multiple reasons why the Go compiler is much faster than most C/C++ compilers:
Top reason: Most C/C++ compilers have designs that are exceptionally bad from a compilation-speed perspective. Likewise, some parts of the C/C++ ecosystem (such as the editors in which programmers write their code) aren't designed with speed of compilation in mind.
Top reason: Fast compilation speed was a conscious choice in the Go compiler and also in the Go language
The Go compiler has a simpler optimizer than C/C++ compilers
Unlike C++, Go has no templates and no inline functions. This means that Go doesn't need to perform any template or function instantiation.
The Go compiler generates low-level assembly code sooner and the optimizer works on the assembly code, while in a typical C/C++ compiler the optimization passes work on an internal representation of the original source code. The extra overhead in the C/C++ compiler comes from the fact that the internal representation needs to be generated.
Final linking (5l/6l/8l) of a Go program can be slower than linking a C/C++ program, because the Go linker goes through all of the used assembly code, and it may also be doing other extra work that C/C++ linkers aren't doing
Some C/C++ compilers (GCC) generate instructions in text form (to be passed to the assembler), while the Go compiler generates instructions in binary form. Extra work (but not much) needs to be done in order to transform the text into binary.
The Go compiler targets only a small number of CPU architectures, while the GCC compiler targets a large number of CPUs
Compilers which were designed with the goal of high compilation speed, such as Jikes, are fast. On a 2GHz CPU, Jikes can compile 20000+ lines of Java code per second (and the incremental mode of compilation is even more efficient).

Compilation efficiency was a major design goal:
Finally, it is intended to be fast: it should take at most a few seconds to build a large executable on a single computer. To meet these goals required addressing a number of linguistic issues: an expressive but lightweight type system; concurrency and garbage collection; rigid dependency specification; and so on. (FAQ)
The language FAQ is pretty interesting with regard to specific language features related to parsing:
Second, the language has been designed to be easy to analyze and can be parsed without a symbol table.

While most of the above is true, there is one very important point that was not really mentioned: dependency management.
Go only needs to include the packages that you are importing directly (as those already imported what they need). This is in stark contrast to C/C++, where every single file starts including x headers, which include y headers, etc. Bottom line: Go's compile time is linear with respect to the number of imported packages, whereas C/C++ compile times grow much faster than that, because every translation unit re-reads its entire transitive header closure.
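To make that concrete, here is a minimal sketch (the module path example.com/deps and all names are hypothetical): main imports only package b, and compiling it requires reading just b's compiled export data, which already summarizes what b needed from a.
// a/a.go - the lowest-level package (all paths and names here are hypothetical)
package a

func Answer() int { return 42 }

// b/b.go - imports a
package b

import "example.com/deps/a"

func Double() int { return 2 * a.Answer() }

// main.go - imports only b; to compile it, the compiler reads b's
// compiled export data, which already records what b needed from a,
// so a's sources are never re-read.
package main

import (
    "fmt"

    "example.com/deps/b"
)

func main() { fmt.Println(b.Double()) }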

A good test for the translation efficiency of a compiler is self-compilation: how long does it take a given compiler to compile itself? For C++ it takes a very long time (hours?). By comparison, a Pascal/Modula-2/Oberon compiler would compile itself in less than one second on a modern machine [1].
Go has been inspired by these languages, but some of the main reasons for this efficiency include:
A clearly defined syntax that is mathematically sound, for efficient scanning and parsing.
A type-safe and statically-compiled language that uses separate compilation with dependency and type checking across module boundaries, to avoid unnecessary re-reading of header files and re-compiling of other modules - as opposed to independent compilation like in C/C++ where no such cross-module checks are performed by the compiler (hence the need to re-read all those header files over and over again, even for a simple one-line "hello world" program).
An efficient compiler implementation (e.g. single-pass, recursive-descent top-down parsing) - which of course is greatly helped by points 1 and 2 above.
These principles have already been known and fully implemented in the 1970s and 1980s in languages like Mesa, Ada, Modula-2/Oberon and several others, and are only now (in the 2010s) finding their way into modern languages like Go (Google), Swift (Apple), C# (Microsoft) and several others.
Let's hope that this will soon be the norm and not the exception. To get there, two things need to happen:
First, software platform providers such as Google, Microsoft and Apple should start by encouraging application developers to use the new compilation methodology, while enabling them to re-use their existing code base. This is what Apple is now trying to do with the Swift programming language, which can co-exist with Objective-C (since it uses the same runtime environment).
Second, the underlying software platforms themselves should eventually be re-written over time using these principles, while simultaneously redesigning the module hierarchy in the process to make them less monolithic. This is of course a mammoth task and may well take the better part of a decade (if they are courageous enough to actually do it - which I am not at all sure in the case of Google).
In any case, it's the platform that drives language adoption, and not the other way around.
References:
[1] http://www.inf.ethz.ch/personal/wirth/ProjectOberon/PO.System.pdf, page 6: "The compiler compiles itself in about 3 seconds". This quote is for a low cost Xilinx Spartan-3 FPGA development board running at a clock frequency of 25 MHz and featuring 1 MByte of main memory. From this one can easily extrapolate to "less than 1 second" for a modern processor running at a clock frequency well above 1 GHz and several GBytes of main memory (i.e. several orders of magnitude more powerful than the Xilinx Spartan-3 FPGA board), even when taking I/O speeds into account. Already back in 1990 when Oberon was run on a 25MHz NS32X32 processor with 2-4 MBytes of main memory, the compiler compiled itself in just a few seconds. The notion of actually waiting for the compiler to finish a compilation cycle was completely unknown to Oberon programmers even back then. For typical programs, it always took more time to remove the finger from the mouse button that triggered the compile command than to wait for the compiler to complete the compilation just triggered. It was truly instant gratification, with near-zero wait times. And the quality of the produced code, even though not always completely on par with the best compilers available back then, was remarkably good for most tasks and quite acceptable in general.

Go was designed to be fast, and it shows.
Dependency management: no header files; you just need to look at the packages that are directly imported (no need to worry about what they import), so dependencies are linear.
Grammar: the grammar of the language is simple and thus easily parsed. The number of features is also reduced, so the compiler code itself is tight (few paths).
No overloading allowed: you see a symbol, you know which function it refers to.
It's trivially possible to compile Go in parallel because each package can be compiled independently.
Note that Go isn't the only language with such features (modules are the norm in modern languages), but they did it well.

Quoting from the book "The Go Programming Language" by Alan Donovan and Brian Kernighan:
Go compilation is notably faster than most other compiled languages, even when building from scratch. There are three main reasons for the compiler’s speed. First, all imports must be explicitly listed at the beginning of each source file, so the compiler does not have to read and process an entire file to determine its dependencies. Second, the dependencies of a package form a directed acyclic graph, and because there are no cycles, packages can be compiled separately and perhaps in parallel. Finally, the object file for a compiled Go package records export information not just for the package itself, but for its dependencies too. When compiling a package, the compiler must read one object file for each import but need not look beyond these files.

The basic idea of compilation is actually very simple. A recursive-descent parser, in principle, can run at I/O-bound speed. Code generation is basically a very simple process. A symbol table and basic type system are not things that require a lot of computation.
However, it is not hard to slow down a compiler.
If there is a preprocessor phase, with multi-level include directives, macro definitions, and conditional compilation, as useful as those things are, it is not hard to load it down. (For one example, I'm thinking of the Windows and MFC header files.) That is why precompiled headers are necessary.
In terms of optimizing the generated code, there is no limit to how much processing can be added to that phase.
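As a rough illustration of that point (not any particular compiler's code), here is a minimal recursive-descent parser in Go for integer expressions with + and *: each character is examined once, the AST falls directly out of the grammar, and no symbol table is involved.
package main

import "fmt"

// A tiny expression AST: a node is either a literal or a binary op.
type node interface{}

type lit struct{ val int }

type binop struct {
    op          byte
    left, right node
}

// parser scans an in-memory byte slice; no symbol table is needed.
type parser struct {
    src []byte
    pos int
}

func (p *parser) peek() byte {
    if p.pos < len(p.src) {
        return p.src[p.pos]
    }
    return 0
}

// Grammar (one function per rule, each character read exactly once):
//   expr   = term   { '+' term }
//   term   = factor { '*' factor }
//   factor = digit  { digit }
func (p *parser) expr() node {
    n := p.term()
    for p.peek() == '+' {
        p.pos++
        n = binop{'+', n, p.term()}
    }
    return n
}

func (p *parser) term() node {
    n := p.factor()
    for p.peek() == '*' {
        p.pos++
        n = binop{'*', n, p.factor()}
    }
    return n
}

func (p *parser) factor() node {
    v := 0
    for c := p.peek(); c >= '0' && c <= '9'; c = p.peek() {
        v = v*10 + int(c-'0')
        p.pos++
    }
    return lit{v}
}

func main() {
    p := &parser{src: []byte("2+3*4")}
    fmt.Printf("%+v\n", p.expr()) // prints the AST for 2 + (3 * 4)
}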

Simply (in my own words): because the syntax is very easy to analyze and parse.
For instance, having no type inheritance means there's no problematic analysis to find out whether a new type follows the rules imposed by a base type.
Another instance: with interfaces, the compiler doesn't go and check whether a type implements a given interface while analyzing that type. Only when (and if) the type is actually used that way is the check performed, as sketched below.
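A small illustration of that (the type and interface names are made up): nothing in the definition of file mentions Reader, so the compiler does no interface checking when it compiles the type; the structural check happens only at the line where a file value is actually used as a Reader.
package main

import "fmt"

// Reader is a local interface; any type with a matching Read method
// satisfies it implicitly.
type Reader interface {
    Read(p []byte) (int, error)
}

// file never mentions Reader, so the compiler performs no interface
// check at all while compiling this type.
type file struct{ name string }

func (f file) Read(p []byte) (int, error) { return 0, nil }

func main() {
    // Only here, where a file is assigned to a Reader, does the
    // compiler verify that file has the required method set.
    var r Reader = file{name: "example"}
    fmt.Println(r)
}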
Another example: the compiler tells you if you're declaring a variable and not using it (or if you're supposed to hold a return value and you don't).
The following doesn't compile:
package main
func main() {
    var a int
    a = 0
}
notused.go:3: a declared and not used
These kinds of enforcements and principles make the resulting code safer, and the compiler doesn't have to perform extra validations that the programmer can do.
By and large, all these details make the language easier to parse, which results in fast compilation.
Again, in my own words.

Go imports each dependency once for all files, so import time doesn't grow exponentially with project size.
Simpler language semantics mean that analyzing them takes less computation.
What else?

Related

Difference between compiled and interpreted languages?

What are the relative strengths and weaknesses of compiled and interpreted languages?
Neither approach has a clear advantage over the other - if one approach was always better, chances are that we'd start using it everywhere!
Generally speaking, compilers offer the following advantages:
Because they can see all the code up-front, they can perform a number of analyses and optimizations when generating code, which makes the final code execute faster than interpreting each line individually.
Compilers can often generate low-level code that implements high-level ideas like "dynamic dispatch" or "inheritance" in terms of memory lookups inside tables (a sketch of such a dispatch table follows this list). This means that the resulting programs need to remember less information about the original code, lowering the memory usage of the generated program.
Compiled code is generally faster than interpreted code because the instructions executed are usually just for the program itself, rather than the program itself plus the overhead from an interpreter.
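As a rough sketch of that idea (the shapes and numbers are invented, and a real compiler does this in generated machine code rather than in source), a "virtual call" boils down to an indirect call through a per-type table of function pointers:
package main

import "fmt"

// A hand-rolled "vtable" - roughly what a compiler lowers dynamic
// dispatch to: a table of function pointers consulted at run time.
type vtable struct {
    area func(data float64) float64
}

// One shared table per concrete "class"; objects carry only their
// data and a pointer to the table.
var circleTable = vtable{area: func(r float64) float64 { return 3.14159 * r * r }}
var squareTable = vtable{area: func(s float64) float64 { return s * s }}

type object struct {
    vt   *vtable
    data float64
}

func main() {
    objs := []object{
        {vt: &circleTable, data: 2}, // a "circle" of radius 2
        {vt: &squareTable, data: 3}, // a "square" of side 3
    }
    for _, o := range objs {
        // The "virtual call" is just an indirect call through the table.
        fmt.Println(o.vt.area(o.data))
    }
}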
Generally speaking, compilers have the following drawbacks:
Some language features, such as dynamic typing, are difficult to compile efficiently because the compiler can't predict what's going to happen until the program is actually run. This means that the compiler might not generate very good code.
Compilers generally have a long "start-up" time because of the cost of all the analysis they do. This means that in settings like web browsers, where it's important to load code fast, compilers might be slower, because they spend time optimizing short code that won't be run many times.
Generally speaking, interpreters have the following advantages:
Because they can read the code as written and don't have to do expensive operations to generate or optimize code, they tend to start up faster than compilers.
Because interpreters can see what the program does as it's running, they can use a number of dynamic optimizations that compilers might not be able to see.
Generally speaking, interpreters have the following disadvantages:
Interpreters typically have higher memory usage than compilers because the interpreter needs to keep more information about the program available at runtime.
Interpreters typically spend some CPU time inside of the code for the interpreter, which can slow down the program being run.
Because interpreters and compilers have complementary strengths and weaknesses, it's becoming increasingly common for language runtimes to combine elements of both. Java's JVM is a good example of this - the Java code itself is compiled, and initially it's interpreted. The JVM can then find code that's run many, many times and compile it directly to machine code, meaning that "hot" code gets the benefits of compilation while "cold" code does not. The JVM can also perform a number of dynamic optimizations like inline caching to speed up performance in ways that compilers typically don't.
Many modern JavaScript implementations use similar tricks. Most JavaScript code is short and doesn't do all that much, so they typically start off using an interpreter. However, if it becomes clear that the code is being run repeatedly, many JS engines will compile the code - or at least, compile bits and pieces of it - and optimize it using standard techniques. The net result is that the code is fast at startup (useful for loading web pages quickly) but gets faster the more that it runs.
One last detail is that languages are not compiled or interpreted. Usually, C code is compiled, but there are C interpreters available that make it easier to debug or visualize the code that's being run (they're often used in introductory programming classes - or at least, they used to be.) JavaScript used to be thought of as an interpreted language until some JS engines started compiling it. Some Python implementations are purely interpreters, but you can get Python compilers that generate native code. Now, some languages are easier to compile or interpret than others, but there's nothing stopping you from making a compiler or interpreter for any particular programming language. There's a theoretical result called the Futamura projections that shows that anything that can be interpreted can be compiled, for example.
Because the start-up time is cheaper and you can read an interpreted program as-is, the interpreter is a no-brainer for me.

What levels should static analyzers analyze?

I've noticed that some static analyzers operate on source code, while others operate on bytecode (e.g., FindBugs). I'm sure there are even some that work on object code.
My question is a simple one, what are the advantages and disadvantages of writing different kinds of static analyzers for different levels of analysis?
Under "static analyzers" I'm including linters, bug finders, and even full-blown verifiers.
And by levels of analysis I would include source code, high-level IRs, low-level IRs, bytecode, object code, and compiler plugins that have access to all phases.
These different facets can influence the level at which an analyzer may decide to work:
Designing a static analyzer is a lot of work. It would be a shame not to factor this work for several languages compiled to the same bytecode, especially when the bytecode retains most of the structure of the source program: Java (FindBugs), .NET (various tools related to Code Contracts). In some cases, the common target language was made up for the purpose of analysis although the compilation scheme wasn't following this path.
Related to 1, you may hope that your static analyzer will be a little less costly to write if it works on a normalized version of the program with a minimum number of constructs. When authoring static analyzers, having to write the treatment for repeat until when you have already written while do is a bother. You may structure your analyzer so that several functions are shared for these two cases, but the care-free way to handle this is to translate one to the other, or to translate the source to an intermediate language that only has one of them.
On the other hand as already pointed out in Flash Sheridan's answer, source code contains the most information. For instance, in languages with fuzzy semantics, bugs at the source level may be removed by compilation. C and C++ have numerous "undefined behaviors" where the compiler is allowed to do anything, including generating a program that works accidentally. Fine, you might think, if the bug is not in the executable it's not a problematic bug. But when you ever re-compile the program for another architecture or with the next version of the compiler, the bug may appear again. This is one reason for not doing the analysis after any phase that might potentially remove bugs.
Some properties can only be checked with reasonable precision on compiled code. That includes absence of compiler-introduced bugs as pointed out again by Flash Sheridan, but also worst-case execution time. Similarly, many languages do not let you know what floating-point code does precisely unless you look at the assembly generated by the compiler (this is because existing hardware does not make it convenient for them to guarantee more). The choice is then to write an imprecise source-level analyzer that takes into account all possibilities, or to analyze precisely one particular compilation of a floating-point program, as long as it is understood that it is that precise assembly code that will be executed.
Source code analysis is the most generally useful, of course; sometimes heuristics even need to analyze comments or formatting. But you’re right that even object code analysis can be necessary, e.g., to detect bugs introduced by GCC misfeatures. Thomas Reps, head of GrammaTech and a Wisconsin professor, gave a good talk on this at Stanford a couple of years ago: http://pages.cs.wisc.edu/~reps/#TOPLAS-WYSINWYX.

Questions about possible java(or other memory managed language) optimizations

From what I have read, the Java compiler (usually) seems to compile Java to not-very-optimised (if at all?) bytecode, leaving it to the JIT to optimise. Is this true? And if it is, has there been any exploration (possibly in alternative implementations) of getting the compiler to optimise the code so the JIT has less work to do (is this possible)?
Also, many people seem to dislike native code generation (sometimes referred to as ahead-of-time compilation) for Java (and many other high-level memory-managed languages), for many reasons such as loss of portability (etc.), but also partially because (at least for languages that have a just-in-time compiler) the thinking goes that ahead-of-time compilation to machine code will miss the optimisations a JIT compiler could do, and may therefore be slower in the long run.
This leads me to wonder whether anyone has ever tried to implement http://en.wikipedia.org/wiki/Profile-guided_optimization (compiling to a binary + some extras, then running the program and analysing the runtime information of the test run to generate a hopefully more optimised binary for real-world usage) for Java or other memory-managed languages, and how this would compare to JIT code? Anyone have a clue?
Personally, I think the big difference is not between JIT compiling and AOT compiling, but between class-compilation and whole-program optimization.
When you run javac, it only looks at a single .java file, compiling it into a single .class file. All the interface implementations and virtual methods and overrides are checked for validity but left unresolved (because it's impossible to know the true method invocation targets without analyzing the whole program).
The JVM uses "runtime loading and linking" to assemble all of your classes into a coherent program (and any class in your program can invoke specialized behavior to change the default loading/linking behavior).
But then, at runtime, the JVM can remove the vast majority of virtual methods. It can inline all of your getters and setters, turning them into raw fields. And when those raw fields are inlined, it can perform constant-propagation to further optimize the code. (At runtime, there's no such thing as a private field.) And if there's only one thread running, the JVM can eliminate all synchronization primitives.
To make a long story short, there are a lot of optimizations that aren't possible without analyzing the whole program, and the best time for doing whole program analysis is at runtime.
Profile-guided optimization has some caveats, one of them mentioned even in the Wiki article you linked. Its results are valid:
for the given samples, representing how your code is actually used by the user or other code.
for the given platform (CPU, memory + other hardware, OS, whatever).
From the performance point of view there are quite big differences even among platforms that are usually considered (more or less) the same (e.g. compare a single core, old Athlon with 512M with a 6 core Intel with 8G, running on Linux, but with very different kernel versions).
for the given JVM and its config.
If any of these change then your profiling results (and the optimizations based on them) are not necessarily valid any more. Most likely some of the optimizations will still have a beneficial effect, but some of them may turn out to be suboptimal (or even degrade performance).
As mentioned, JIT JVMs do something very similar to profiling, but they do it on the fly. It's also called 'HotSpot', because the VM constantly monitors the executed code, looks for hot spots that are executed frequently, and tries to optimize only those parts. At that point it can exploit more knowledge about the code (knowing its context, how it is used by other classes, etc.), so - as mentioned by you and the other answers - it can do better optimizations than a static compiler. It keeps monitoring and, if needed, does another round of optimization later, this time trying even harder (looking for more, and more expensive, optimizations).
Working on real-life data (usage statistics + platform + config), it can avoid the caveats mentioned before.
The price is some additional time spent on "profiling" + JIT-ing. Most of the time it's spent quite well.
I guess a profile-guided optimizer could still compete with it (or even beat it), but only in some special cases, if you can avoid the caveats:
you are quite sure that your samples represent the real life scenario well and they won't change too much during execution.
you know your target platform quite precisely and can do the profiling on it.
and of course you know/control the JVM and its config.
That will happen rarely, and I guess in general the JIT will give you better results, but I have no evidence for it.
Another possibility for getting value from profile-guided optimization is if you target a JVM that can't do JIT optimization (I think most small devices have such a JVM).
BTW one disadvantage mentioned in other answers would be quite easy to avoid: if static/profile guided optimization is slow (which is probably the case) then do it only for releases (or RCs going to testers) or during nightly builds (where time does not matter so much).
I think the much bigger problem would be to have good sample test cases. Creating and maintaining them is usually not easy and takes a lot of time. Especially if you want to be able to execute them automatically, which would be quite essential in this case.
The official Java Hot Spot compiler does "adaptive optimisation" at runtime, which is essentially the same as the profile-guided optimisation you mentioned. This has been a feature of at least this particular Java implementation for a long time.
The trade-off to performing more static analysis or optimisation passes up-front at compile time is essentially the (ever-diminishing) returns you get from this extra effort against the time it takes for the compiler to run. A compiler like MLton (for Standard ML) is a whole-program optimising compiler with a lot of static checks. It produces very good code, but becomes very, very slow on medium-to-large programs, even on a fast system.
So the Java approach seems to be to use JIT and adaptive optimisation as much as possible, with the initial compilation pass just producing an acceptable valid binary. The absolute opposite end is an approach like that of MLKit, which does a lot of static inference of regions and memory behaviour.

Assembly language and compiled languages

How is assembly faster than compiled languages if both are translated to machine code?
I'm talking about truly compiled languages which are translated to machine code - not C# or Java, which are compiled to an intermediate language first and then compiled to native code at runtime, etc.
On Wikipedia, I found something which I'm not sure if it's in any way related to this. Is it because that translation from a higher level language generates extra machine code? Or is my understanding wrong?
A utility program called an assembler is used to translate assembly language statements into the target computer's machine code. The assembler performs a more or less isomorphic translation (a one-to-one mapping) from mnemonic statements into machine instructions and data. This is in contrast with high-level languages, in which a single statement generally results in many machine instructions.
Well, it relates a bit to your question, indeed. The point is that compilers produce inefficient machine code at times for various reasons, such as not being able to completely analyze your code, inserting automatic range checks, automatic checks for objects being null, etc.
On the other hand, if you write assembler code by hand and know what you're doing, then you can probably write some things much more efficiently than the compiler, although the compiler's behavior can be tweaked and you can usually tell it not to do range checking, for example.
Most people, however, will not write better assembler code than a compiler, simply because compilers are written by people who know a good deal of really weird but really cool optimizations. Also, things like loop unrolling are usually a pain to write yourself, yet they make the resulting code faster in many cases.
While it's generally true that everything that a computer executes is machine code, the code that runs differs greatly depending on how many abstraction levels you put between the machine and the programmer. For Assembler that's one level, for Java there are a few more ...
Also many people mistakenly believe that certain optimizations at a higher abstraction layer pay off at a lower one. This is not necessarily the case and the compiler may just have trouble understanding what you are trying to do and fail to properly optimize it.
Assembly may sometimes be faster than a compiled language if an assembly programmer writes better assembly than that generated by the compiler.
A compiled language is often faster than assembly because programmers who write compilers usually know the CPU architecture better than programmers who are utilizing assembly in a one-off, limited-case, situation.
An assembly expert may be able to write assembly code that is more effective (fewer instructions, more efficient instructions, SIMD, ...) than what a compiler generates automatically.
However, most of the time, you're better off trusting the optimizer of your compiler.
Learn what your compiler does. Then let the compiler do it.
My standard answer when questions about assembly vs. high-level come up is to take a look at Michael Abrash's Graphics Programming Black Book.
The first couple of chapters give a good idea of what you can optimise effectively using assembly, and what you can't.
You can download it from GameDev - Jeff's links seem to be broken now unfortunately.
All good answers. My only additional point is that programmers tend to write a certain number of lines of code per day, regardless of language. Since the advantage of a high-level language is that it lets you get more done with less code, it takes incredible programmer discipline to actually write less code.
This is especially an issue for performance because it matters almost nowhere except in a tiny part of the code. It only matters in your hotspots - the code you write that consumes a significant fraction of execution time without calling out to other functions.
First of all, compilers generate very good (fast) assembly code.
It's true that compilers can add extra code, since higher-level languages have mechanisms like virtual methods and exceptions in C++, so the compiler has to produce more code. There are cases where raw assembly could speed up the code, but that's rare nowadays.
First, assembler should be used only in small pieces of code that eat most of the CPU time in a program - some kind of calculation, for example - in the "bottleneck" of the algorithm.
Second, it depends on the assembler experience of whoever implements that code. If the experience is high enough, the assembler implementation of the bottleneck code will be faster and give a significant profit. If the experience is low, it will be slower and will contain a lot of bugs.
How is assembly faster than compiled languages if both are translated to machine code?
The implicit assumption is hand-written assembly code. Of course, most compilers (e.g. GCC for C, C++, Fortran, Go, D etc...) are generating some assembler code; for example you might compile your foo.cc C++ source code with g++ -fverbose-asm -Wall -S -O2 -march=native foo.cc and look into the generated foo.s assembler code.
However, efficient assembler code is so difficult to write that, today, compilers can optimize better than humans do. See this.
So practically speaking, it is not worth coding in assembler (also take into account that development effort very often costs much more than the hardware running the compiled code). Even when performance matters a lot and is worth spending a lot of money on, it is better to hand-code only a very few routines in assembler, or even to embed some assembler code in some of your C routines.
Look into the CppCon 2017 talk: Matt Godbolt “What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid”

how to minimize a programming language compile time?

I was thinking more about the programming language I am designing, and I was wondering: what are ways I could minimize its compile time?
Your main problem today is I/O. Your CPU is many times faster than main memory and memory is about 1000 times faster than accessing the hard disk.
So unless you do extensive optimizations to the source code, the CPU will spend most of the time waiting for data to be read or written.
Try these rules:
Design your compiler to work in several, independent steps. The goal is to be able to run each step in a different thread so you can utilize multi-core CPUs. It will also help to parallelize the whole compile process (i.e. compile more than one file at the same time; see the sketch after this list).
It will also allow you to load many source files in advance and preprocess them so the actual compile step can work faster.
Try to allow files to be compiled independently. For example, create a "missing symbol pool" for the project. Missing symbols should not cause compile failures as such. If you find a missing symbol somewhere, remove it from the pool. When all files have been compiled, check that the pool is empty.
Create a cache with important information. For example: File X uses symbols from file Y. This way, you can skip compiling file Z (which doesn't reference anything in Y) when Y changes. If you want to go one step further, put all symbols which are defined anywhere in a pool. If a file changes in such a way that symbols are added/removed, you will know immediately which files are affected (without even opening them).
Compile in the background. Start a compiler process which checks the project directory for changes and compiles them as soon as the user saves a file. This way, you will only have to compile a few files each time instead of everything. In the long run, you will compile much more, but for the user, turnaround times will be much shorter (= the time the user has to wait until she can run the compiled result after a change).
Use a "Just in time" compiler (i.e. compile a file when it is used, for example in an import statement). Projects are then distributed in source form and compiled when run for the first time. Python does this. To make this perform, you can precompile the library during the installation of your compiler.
Don't use header files. Keep all information in a single place and generate header files from the source if you have to. Maybe keep the header files just in memory and never save them to disk.
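Here is a hedged sketch of the first point in Go (compileFile and the file names are placeholders, not a real compiler): a fixed pool of worker goroutines compiles independent source files concurrently, one file per task, so all available cores stay busy.
package main

import (
    "fmt"
    "runtime"
    "sync"
)

// compileFile stands in for a real per-file compile step
// (lex, parse, type-check, emit); it is only a placeholder.
func compileFile(path string) error {
    fmt.Println("compiled", path)
    return nil
}

// compileAll feeds files to a fixed pool of workers, one goroutine
// per CPU core, so independent files compile concurrently.
func compileAll(paths []string) {
    jobs := make(chan string)
    var wg sync.WaitGroup

    for i := 0; i < runtime.NumCPU(); i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for p := range jobs {
                if err := compileFile(p); err != nil {
                    fmt.Println("error:", err)
                }
            }
        }()
    }

    for _, p := range paths {
        jobs <- p
    }
    close(jobs)
    wg.Wait()
}

func main() {
    compileAll([]string{"a.src", "b.src", "c.src"})
}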
what are ways i could minimize its compile time?
No compilation (interpreted language)
Delayed (just in time) compilation
Incremental compilation
Precompiled header files
I've implemented a compiler myself, and ended up having to look at this once people started batch-feeding it hundreds of source files. I was quite surprised by what I found out.
It turns out that the most important thing you can optimize is not your grammar. It's not your lexical analyzer or your parser either. Instead, the most important thing in terms of speed is the code that reads in your source files from disk. I/O's to disk are slow. Really slow. You can pretty much measure your compiler's speed by the number of disk I/Os it performs.
So it turns out that the absolute best thing you can do to speed up a compiler is to read the entire file into memory in one big I/O, do all your lexing, parsing, etc. from RAM, and then write out the result to disk in one big I/O.
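A minimal sketch of that strategy in Go (the file name is just an example): one call pulls the whole source file into memory, and everything after that works on the in-memory byte slice instead of issuing many small reads.
package main

import (
    "fmt"
    "os"
)

func main() {
    // One big read: the whole source file lands in memory at once.
    // "main.src" is just an example file name.
    src, err := os.ReadFile("main.src")
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }

    // All further work (lexing, parsing, ...) scans the in-memory
    // buffer; no more disk I/O until the output is written, ideally
    // also in one piece.
    lines := 0
    for _, c := range src {
        if c == '\n' {
            lines++
        }
    }
    fmt.Println("read", len(src), "bytes,", lines, "lines")
}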
I talked with one of the head guys maintaining GNAT (GCC's Ada compiler) about this, and he told me that he actually used to put everything he could onto RAM disks so that even his file I/O was really just RAM reads and writes.
In most languages (pretty well everything other than C++), compiling individual compilation units is quite fast.
Binding/linking is often what's slow - the linker has to reference the whole program rather than just a single unit.
C++ suffers as - unless you use the pImpl idiom - it requires the implementation details of every object and all inline functions to compile client code.
Java (source to bytecode) suffers because the grammar doesn't differentiate objects and classes - you have to load the Foo class to see if Foo.Bar.Baz is the Baz field of object referenced by the Bar static field of the Foo class, or a static field of the Foo.Bar class. You can make the change in the source of the Foo class between the two, and not change the source of the client code, but still have to recompile the client code, as the bytecode differentiates between the two forms even though the syntax doesn't. AFAIK Python bytecode doesn't differentiate between the two - modules are true members of their parents.
C++ and C suffer if you include more headers than are required, as the preprocessor has to process each header many times, and the compiler compile them. Minimizing header size and complexity helps, suggesting better modularity would improve compilation time. It's not always possible to cache header compilation, as what definitions are present when the header is preprocessed can alter its semantics, and even syntax.
C suffers if you use the preprocessor a lot, but the actual compilation is fast; much C code uses typedef struct _X* X_ptr to hide implementation better than C++ does - a C header can easily consist of typedefs and function declarations, giving better encapsulation.
So I'd suggest making your language hide implementation details from client code, and if you are an OO language with both instance members and namespaces, make the syntax for accessing the two unambiguous. Allow true modules, so client code only has to be aware of the interface rather than implementation details. Don't allow preprocessor macros or other variation mechanism to alter the semantics of referenced modules.
Here are some performance tricks that we've learned by measuring compilation speed and what affects it:
Write a two-pass compiler: characters to IR, IR to code. (It's easier to write a three-pass compiler that goes characters -> AST -> IR -> code, but it's not as fast.)
As a corollary, don't have an optimizer; it's hard to write a fast optimizer.
Consider generating bytecode instead of native machine code. The virtual machine for Lua is a good model.
Try a linear-scan register allocator or the simple register allocator that Fraser and Hanson used in lcc.
In a simple compiler, lexical analysis is often the greatest performance bottleneck. If you are writing C or C++ code, use re2c. If you're using another language (which you will find much more pleasant), read the paper about re2c and apply the lessons learned.
Generate code using maximal munch, or possibly iburg.
Surprisingly, the GNU assembler is a bottleneck in many compilers. If you can generate binary directly, do so. Or check out the New Jersey Machine-Code Toolkit.
As noted above, design your language to avoid anything like #include. Either use no interface files or precompile your interface files. This tactic dramatically reduces the burden on the lexer, which as I said is often the biggest bottleneck.
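To make the lexer point concrete, here is a hedged sketch of a hand-written scanner (the token set is invented; this illustrates the idea rather than being re2c output): it walks an in-memory byte slice once, always taking the longest possible match.
package main

import "fmt"

type token struct {
    kind string
    text string
}

// scan tokenizes an in-memory byte slice in a single pass, always
// taking the longest possible match ("maximal munch"), so "<=" is
// one token rather than '<' followed by '='.
func scan(src []byte) []token {
    var toks []token
    i := 0
    for i < len(src) {
        c := src[i]
        switch {
        case c == ' ' || c == '\t' || c == '\n':
            i++ // skip whitespace
        case c >= '0' && c <= '9':
            j := i
            for j < len(src) && src[j] >= '0' && src[j] <= '9' {
                j++
            }
            toks = append(toks, token{"INT", string(src[i:j])})
            i = j
        case c == '<' || c == '>' || c == '=':
            // Prefer the two-character operator if it is there.
            if i+1 < len(src) && src[i+1] == '=' {
                toks = append(toks, token{"OP", string(src[i : i+2])})
                i += 2
            } else {
                toks = append(toks, token{"OP", string(src[i : i+1])})
                i++
            }
        default:
            toks = append(toks, token{"IDENT", string(src[i : i+1])})
            i++
        }
    }
    return toks
}

func main() {
    fmt.Println(scan([]byte("x <= 42")))
}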
Here's a shot..
Use incremental compilation if your toolchain supports it.
(make, visual studio, etc).
For example, in GCC/make, if you have many files to compile, but only make changes in one file, then only that one file is compiled.
Eiffel had an idea of different states of frozen, and recompiling didn't necessarily mean that the whole class was recompiled.
How much can you break up the compilable modules, and how much do you care to keep track of them?
Make the grammar simple and unambiguous, and therefore quick and easy to parse.
Place strong restrictions on file inclusion.
Allow compilation without full information whenever possible (eg. predeclaration in C and C++).
One-pass compilation, if possible.
One thing surprisingly missing in the answers so far: make sure you're using a context-free grammar, etc. Have a good hard look at languages designed by Wirth, such as Pascal and Modula-2. You don't have to reimplement Pascal, but the grammar design is custom-made for fast compiling. Then see if you can find any old articles about the tricks Anders pulled implementing Turbo Pascal. Hint: table-driven.
It depends on what language/platform you're programming for. For .NET development, minimise the number of projects that you have in your solution.
In the old days you could get dramatic speedups by setting up a RAM drive and compiling there. Don't know if this still holds true, though.
In C++ you could use distributed compilation with tools like Incredibuild
A simple one: make sure the compiler can natively take advantage of multi-core CPUs.
Make sure that everything can be compiled the first time you try to compile it. E.g. ban forward references.
Use a context-free grammar so that you can find the correct parse tree without a symbol table.
Make sure that the semantics can be deduced from the syntax so you can construct the correct AST directly rather than by mucking with a parse tree and symbol table.
How serious a compiler is this?
Unless the syntax is pretty convoluted, the parser should be able to run no more than 10-100 times slower than just indexing through the input file characters.
Similarly, code generation should be limited by output formatting.
You shouldn't be hitting any performance issues unless you're doing a big, serious compiler, capable of handling mega-line apps with lots of header files.
Then you need to worry about precompiled headers, optimization passes, and linking.
I haven't seen much work done on minimizing compile time. But some ideas do come to mind:
Keep the grammar simple. A convoluted grammar will increase your compile time.
Try to make use of parallelism by compiling on multiple CPU cores.
Benchmark a modern compiler, see where the bottlenecks are, and see what you can do in your compiler/language to avoid them.
Unless you are writing a highly specialized language, compile time is not really an issue..
Make a build system that doesn't suck!
There are a huge number of programs out there with maybe 3 source files that take under a second to compile, but before you get that far you have to sit through an automake script that takes about 2 minutes checking things like the size of an int. And if you go to compile something else a minute later, it makes you sit through almost exactly the same set of tests.
So unless your compiler is doing awful things to the user like changing the size of its ints or changing basic function implementations between runs, just dump that info out to a file and let them get it in a second instead of 2 minutes.
