What languages still start fast with larger amounts of code? - performance

tl;dr I want a rapid edit-compile-run workflow, but prefixing every single function call with "somenamespace_" is annoying.
One (debatable) advantage of C is that you can have separate compilation units. Large amounts of code can be compiled into objects and libraries, which are much faster to link together than parsing any amount of C source. The result can be slower to run, since inlining optimizations can't be done across compilation units, but it is very fast to link, especially with the gold linker (ld.gold).
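A minimal sketch of what that workflow looks like (the file names and the somenamespace_ prefix are just illustrative):

/* somenamespace.h -- the small, quick-to-parse interface */
#ifndef SOMENAMESPACE_H
#define SOMENAMESPACE_H
int somenamespace_frobnicate(int x);
#endif

/* somenamespace.c -- compiled once into somenamespace.o, then left alone */
#include "somenamespace.h"
int somenamespace_frobnicate(int x) { return 2 * x + 1; }

/* main.c -- the only file recompiled while iterating */
#include <stdio.h>
#include "somenamespace.h"
int main(void) { printf("%d\n", somenamespace_frobnicate(20)); return 0; }

/* cc -c somenamespace.c                               (the slow part, done once)        */
/* cc -c main.c && cc main.o somenamespace.o -o app    (the fast edit-compile-run loop)  */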
Problem being, it's C. Lack of namespaces, pretty much.
But even C++ doesn't (in practice) use separate compilation units. Sure you can, but for the most part it's about including megabytes of templates in header files. The whole philosophy behind "private:" is to pretend you have separated interfaces without actually having them at all. So the standard practice in a language matters too, because even if I make my own isolated binary interfaces, if each implementation has to #include the same tons of code from third parties, the time saved by isolating them never adds up. C++ is kind of... featureful for me, anyway. Really, I just want namespaces. (And modules... sigh)
Languages like Python, Racket and Java compile to bytecode, which seems fast enough, but you still get startup slowdowns in large projects, since the runtime has to load (and, in JITted runtimes, re-translate) all that bytecode every time. There's no option (outside of writing a C interface) to isolate code in a way that's fast to combine with the code you are actually working on.
I'd just like to know which languages let large amounts of code be concealed behind small, quick-to-load interfaces, so that compiling them initially might be slow, but then I get a rapid edit-compile-run cycle as I work on parts of it. Instead of this Python hack, where I change something in the progress displayer, and then have to sit there staring as it loads the standard library, and then the database code, and then the web server code, and then the image processing code, and then sits there for another 20 seconds figuring out the gobject-introspection thing for some GUI code.
It's not always obvious. I'm staring at D trying to figure out when it re-parses the code from dependencies when it recompiles things, without a clue. Go seems to just slap all the code together (including dependencies!) into a single compilation unit, but maybe I'm wrong there? And I don't think Nim regenerates all the generated C on every compile, but maybe it does? Rust uses separate compilation units (I think), but it's still slow as heck to compile! And Python really does compile fast, so it's only once my projects start getting big and successful that I start getting tripped up by it.

I haven't learned the other languages you mentioned, such as Rust, D, or Python. When you compile a Nim program, it creates a folder called nimcache that contains all of the generated .c and .o files. If you make a change to the Nim program and recompile, it tries to reuse the files in that nimcache folder.

Related

How to link a program in GCC to prelinked library?

OK, here is my problem: I don't know the correct terms to find what I am looking for on Google, so I hope someone here can help me out.
When developing real-time programs on embedded devices you might have to iterate a few hundred or thousand times until you get the desired result. When using e.g. ARM devices you wear out the internal flash quite quickly, so typically you build your programs to reside in the RAM of the device and all is OK. This is done using GCC's ability to split the code into various sections.
Unfortunately, the RAM of most devices is much smaller than the flash. So at some point your program gets too big to fit in RAM with all its variables, etc. (You pick the size of the device on the assumption that the whole program will fit in flash later.)
Classical shared objects do not work, as there is nothing like a dynamic linker in my environment. There is no OS or anything like that.
My idea was the following: for the controller it is no problem to execute code from both RAM and flash, and with the right attributes on the functions it is also no big problem for the compiler to put part of the program in RAM and part in flash.
Once I have some functionality running successfully, I build it into a library and put that in flash. The main development then happens in the 'volatile' part of the program in RAM, so the flash gets preserved.
The problem: I need to make sure that the library always gets linked to exactly the same location as long as I do not reflash. So a given function must always be at the same flash address on every compile cycle. If something in flash is missing, it must be placed in RAM or a linking error must be thrown.
I thought about putting together a real library and linking against that. Here I am a bit lost. I need to tell GCC/LD to link against a prelinked file (and create such a prelinked file).
It should be possible to put all the library objects together and link this together in the flash. Then the addresses could be extracted and the main program (for use in RAM) can link against it. But: How to do these steps?
On the internet there is the term prelink, as well as a matching program for Linux intended to speed up loading times. I do not know if that program might help me as a side effect; I doubt it, but I do not understand the internals of how it works.
Do you have a good idea how to reach the goal?
You are solving a non-problem. Embedded flash usually has a minimum endurance of 10,000 write cycles. So even if you flash it 20 times a day, it will last a year and a half. An ST Nucleo board is $13, so that's less than 3 pennies a day :-). The typical endurance is even longer, at about 100,000 cycles. It will be a long time before you wear them out.
Now if you are using them for dynamic storage, that might be a concern, depending on the usage patterns.
But to answer your question: you can build your code into a library .a file easily enough. However, GCC does not guarantee that it links the object code in any particular order, as that depends on the optimization level. Furthermore, only the functions that are actually referenced are pulled in from a library file, so if your function calls change, it may pull in more or fewer library functions.
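A rough sketch of the section-attribute side of this, assuming GCC and a linker script that defines the sections used below (the section names, function names and the --just-symbols step are illustrative, not a verified recipe):

/* flash_lib.c -- "finished" code pinned into a flash section */
__attribute__((section(".flash_lib")))
unsigned char lib_checksum(const unsigned char *p, unsigned len)
{
    unsigned char sum = 0;            /* placeholder implementation */
    while (len--)
        sum += *p++;
    return sum;
}

/* wip.c -- code under active development, placed in RAM */
__attribute__((section(".ram_code")))
void work_in_progress(void)
{
    /* edit, rebuild and reload this part freely; flash stays untouched */
}

/* One possible flow (unverified, adapt to your toolchain):
 *   1. Link the library objects once against a linker script that fixes
 *      .flash_lib at its flash address, producing flash_lib.elf.
 *   2. Flash that image once.
 *   3. In every later build, resolve the library symbols at their frozen
 *      addresses with GNU ld's --just-symbols=flash_lib.elf instead of
 *      relinking the library, so only the RAM part changes. */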

How to know what code has been optimized out?

Is there a way to get information, during or after compilation, about what parts of the code have been optimized out, without looking at the assembly or executing the code?
It'd be nice to know immediately if a big code chunk gets optimized away.
Sorry, but your expectations do not match what compilers actually do. Whether you're trying to find dead code or to find bugs that cause code that should run to be skipped, it is not information that a compiler can provide in an easy-to-read form.
With a compiler that translates each line of source code into a sequence of machine instructions, the compiler could easily tell you that it didn't include anything corresponding to a particular line. Of course it couldn't tell you if a line was translated to machine instructions but those machine instructions in fact won't ever be executed — code reachability is undecidable — but I don't think that's what you're after anyway.
The problem is that modern optimizing compilers are a lot more complex than that. A piece of code is often copied around and compiled multiple times under different assumptions (specialization, partial evaluation, loop unrolling, …). Or, conversely, pieces of code can be merged together (function inlining, …). There isn't a simple correspondence between source code and machine code. (That's why debuggers sometimes have trouble reporting the exact source code location of a binary instruction.)
If a big chunk of code gets optimized away, that may simply be because it's one of many specialized copies and that particular specialization never happens (e.g. there's separate code for x==0 and x!=0, and separate code for y==0 and y!=0, and x and y are never 0 together, so the x==0 && y==0 branch is eventually dropped). Or it may be something generated by a compile-time conditional, such as a C macro, that the compiler folds away; this happens so often in C code that if compilers reported all such instances, it would create a lot of false positives.
Getting useful reports of potentially unused code or suspicious-looking program code that could indicate a bug requires a rather different kind of static analysis than what compilers do. There are tools that can do that, but they're typically not the same tools that convert source code to optimized machine code. Making static analysis tools that both detect potential problems often enough to be useful and don't produce so many false positives that they're practically unusable is not easy.
gcc -S
will output the assembly code that would have been passed to the assembler (and eventually been linked into the executable). If you squint the right way (and are patient), you can work backwards from that to confirm whether a given bit of code has actually been included in the executable, or was optimized away.
Obviously not something you'd do unless you have a suspicion that something was going on, given the time and effort required...
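A tiny example of how that works in practice (hypothetical file name; build with gcc -O2 -S dead.c, then inspect dead.s):

/* dead.c */
#include <stdio.h>

#define ENABLE_TRACE 0

void work(int x)
{
    if (ENABLE_TRACE)                  /* constant-false condition...               */
        printf("tracing %d\n", x);     /* ...so at -O2 this call (and its string)
                                          never shows up in dead.s                  */
    printf("result %d\n", x * 2);
}

int main(void)
{
    work(21);
    return 0;
}

Searching dead.s for "tracing" then tells you immediately whether that block survived.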

good programming practice

I'm reading Randall Hyde's Write great code (volume 2) and I found this:
[...] it’s not good programming practice to create monolithic applications, where all the source code appears in one source file (or is processed by a single compilation) [...]
I was wondering, why is this so bad?
Thanks everyone for your answers. I really wish I could accept more of them, but I've chosen the most concise one, so that whoever reads this question finds the essentials immediately.
Thanks guys ;)
The main reason, more important in my opinion than the sheer size of the file in question, is the lack of modularity. Software becomes easy to maintain if you have it broken down into small pieces whose interactions with each other are through a few well defined places, such as a few public methods or a published API. If you put everything in one big file, the tendency is to have everything dependent on the internals of everything else, which leads to maintenance nightmares.
Early in my career, I worked on a Geographic Information System that consisted of over a million lines of C. The only thing that made it maintainable, the only thing that made it work in the first place, is that we had a sharp dividing line between everything "above" and everything "below". The "above" code implemented user interfaces, application-specific processing, etc., and everything "below" implemented the spatial database. And the dividing line was a published API. If you were working "above", you didn't need to know how the "below" code worked, as long as it followed the published API. If you were working "below", you didn't care how your code was being used, as long as you implemented the published API. At one point, we even replaced a huge chunk of the "below" side that had stored stuff in proprietary files with tables in SQL databases, and the "above" code didn't have to know or care.
Because everything gets so crammed.
If you have separate files for separate things, then you'll be able to find and edit them much faster.
Also, if there's an error, you can locate it more easily.
Because you quickly get (tens of) thousands of lines of unmaintainable code.
Impossible to read, impossible to maintain, impossible to extend.
Also, it's good for team work, since there are a lot of small files which can be edited in parallel by different developers.
Some reasons:
It's difficult to understand something that's all jammed together like that. Breaking the logic into functions makes it easier to follow.
Factoring the logic into discrete components allows you to re-use those components elsewhere. Sometimes "elsewhere" means "elsewhere in the same program."
You cannot possibly unit-test something that's built in a monolithic fashion. Breaking a huge function (or program) into pieces allows you to test those pieces individually.
While I understand that HTML and CSS aren't 'programming languages,' I suppose you could look at it in terms of having all of the .css, .js and .html all on one page: It makes it difficult to recycle or debug code.
Including all the source code in a single file makes the code hard to manage. A better approach is to divide your program into separate modules, each with its own source file, and link all the modules together at the end to create the executable. This helps a lot in maintaining code and makes it easy to find module-specific bugs.
Moreover, if you have 2 or more people working simultaneously on the same project, there is a high probability that you will be wasting lots of your time in resolving code conflicts.
Also if you have to test individual components (or modules), you can do it easily without bothering about the other independent modules (if any).
Because decomposition is a computer science fundamental: Solve large problems by decomposing them into smaller pieces. Not only are the smaller problems more manageable, but they make it easier to keep a picture of the entire problem in your head.
When you talk about a monolithic program in a single file, all I can think of is the huge spaghetti code that found its way into COBOL and FORTRAN way back in the day. It encourages a cut-and-paste mentality that only gets worse with time.
Object-oriented languages are an attempt to help with this. OO languages decompose problems into software components, with data and functions encapsulated together, that map to our mental models of how the world works.
Think about it in terms of the Single Responsibility Principle. If that one large file, or one monolithic application, or one "big something" is responsible for everything concerning that system, it's a catch-all bucket of functionality. That functionality should be separated into its discrete components for maintainability, re-usability, etc.
Programmers can get thousands of lines of code right if things are kept in separate files; that way you can easily manage and locate your files.
Apart from all the reasons mentioned above: typically all of the code in a compilation unit ends up in the final program, but if you split things up and it turns out that all the code in a particular file is unused, it will not be linked in. Even if it does get linked, it is easy to turn such a file into a DLL should you decide to use the functionality at run time. This facilitates dependency management and reduces build times by compiling only the modified source files, leading to greater maintainability and productivity.
From some really bad experience I can tell you that it's unreadable. Function after function of code, you can never understand what the programmer meant or how to find your way out of it.
You cannot even scroll in the file because a little movement of the scroll bar scrolls two pages of code.
Right now I'm rewriting an entire application because the original programmers thought it a great idea to create 3000-line code files. It just cannot be maintained. And of course it's totally untestable.
I think you have never worked in a big company that has 6,000 lines of code in one file... where every couple of thousand lines you see comments like /** Do not modify this block, it's critical */ and you think the bug assigned to you is coming from that block. And your manager says, "Yeah, look into that file, it's somewhere in it."
And the code breaks all those beautiful OOP concepts, has dirty switches. And like fifty contributors.
Lucky you.

How does Go compile so quickly?

I've Googled and poked around the Go website, but I can't find an explanation for Go's extraordinary build times. Are they products of the language features (or lack thereof), a highly optimized compiler, or something else? I'm not trying to promote Go; I'm just curious.
Dependency analysis.
The Go FAQ used to contain the following sentence:
Go provides a model for software construction that makes dependency analysis easy and avoids much of the overhead of C-style include files and libraries.
While the phrase is not in the FAQ anymore, this topic is elaborated upon in the talk Go at Google, which compares the dependency analysis approach of C/C++ and Go.
That is the main reason for fast compilation. And this is by design.
I think it's not that Go compilers are fast, it's that other compilers are slow.
C and C++ compilers have to parse enormous amounts of headers - for example, compiling C++ "hello world" requires compiling 18k lines of code, which is almost half a megabyte of sources!
$ cpp hello.cpp | wc
18364 40513 433334
Java and C# compilers run in a VM, which means that before they can compile anything, the operating system has to load the whole VM, then they have to be JIT-compiled from bytecode to native code, all of which takes some time.
Speed of compilation depends on several factors.
Some languages are designed to be compiled fast. For example, Pascal was designed to be compiled using a single-pass compiler.
Compilers themselves can be optimized too. For example, the Turbo Pascal compiler was written in hand-optimized assembler, which, combined with the language design, resulted in a really fast compiler running on 286-class hardware. I think that even now, modern Pascal compilers (e.g. FreePascal) are faster than Go compilers.
There are multiple reasons why the Go compiler is much faster than most C/C++ compilers:
Top reason: Most C/C++ compilers exhibit exceptionally bad designs (from a compilation-speed perspective). Also, some parts of the C/C++ ecosystem (such as the editors in which programmers write their code) aren't designed with speed of compilation in mind.
Top reason: Fast compilation speed was a conscious choice in the Go compiler and also in the Go language
The Go compiler has a simpler optimizer than C/C++ compilers
Unlike C++, Go has no templates and no inline functions. This means that Go doesn't need to perform any template or function instantiation.
The Go compiler generates low-level assembly code sooner and the optimizer works on the assembly code, while in a typical C/C++ compiler the optimization passes work on an internal representation of the original source code. The extra overhead in the C/C++ compiler comes from the fact that the internal representation needs to be generated.
Final linking (5l/6l/8l) of a Go program can be slower than linking a C/C++ program, because the Go compiler is going through all of the used assembly code and maybe it is also doing other extra actions that C/C++ linkers aren't doing
Some C/C++ compilers (GCC) generate instructions in text form (to be passed to the assembler), while the Go compiler generates instructions in binary form. Extra work (but not much) needs to be done in order to transform the text into binary.
The Go compiler targets only a small number of CPU architectures, while the GCC compiler targets a large number of CPUs
Compilers which were designed with the goal of high compilation speed, such as Jikes, are fast. On a 2GHz CPU, Jikes can compile 20000+ lines of Java code per second (and the incremental mode of compilation is even more efficient).
Compilation efficiency was a major design goal:
Finally, it is intended to be fast: it should take at most a few seconds to build a large executable on a single computer. To meet these goals required addressing a number of linguistic issues: an expressive but lightweight type system; concurrency and garbage collection; rigid dependency specification; and so on. (Go FAQ)
The language FAQ is pretty interesting in regards to specific language features relating to parsing:
Second, the language has been designed to be easy to analyze and can be parsed without a symbol table.
While most of the above is true, there is one very important point that was not really mentioned: dependency management.
Go only needs to read the packages that you are importing directly (as those have already compiled in what they need). This is in stark contrast to C/C++, where every single source file starts by including x headers, which include y headers, and so on. Bottom line: Go's compile time scales roughly linearly with the number of imported packages, whereas in C/C++ the same transitive header set gets re-read for every translation unit, so the work grows far faster than that.
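To make the contrast concrete, here is the C side of it (illustrative file names); every translation unit that includes a.h drags in the whole chain again, while a compiled Go package is a single object file read once per importer:

/* c.h */
int c_func(void);

/* b.h */
#include "c.h"
int b_func(void);

/* a.h */
#include "b.h"
int a_func(void);

/* user1.c, user2.c, ... userN.c each contain: */
#include "a.h"
/* ...so the preprocessor re-reads a.h, b.h and c.h once per file --
   with N translation units the same header text is parsed N times. */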
A good test for the translation efficiency of a compiler is self-compilation: how long does it take a given compiler to compile itself? For C++ it takes a very long time (hours?). By comparison, a Pascal/Modula-2/Oberon compiler would compile itself in less than one second on a modern machine [1].
Go has been inspired by these languages, but some of the main reasons for this efficiency include:
A clearly defined syntax that is mathematically sound, for efficient scanning and parsing.
A type-safe and statically-compiled language that uses separate compilation with dependency and type checking across module boundaries, to avoid unnecessary re-reading of header files and re-compiling of other modules - as opposed to independent compilation like in C/C++ where no such cross-module checks are performed by the compiler (hence the need to re-read all those header files over and over again, even for a simple one-line "hello world" program).
An efficient compiler implementation (e.g. single-pass, recursive-descent top-down parsing) - which of course is greatly helped by points 1 and 2 above.
These principles have already been known and fully implemented in the 1970s and 1980s in languages like Mesa, Ada, Modula-2/Oberon and several others, and are only now (in the 2010s) finding their way into modern languages like Go (Google), Swift (Apple), C# (Microsoft) and several others.
Let's hope that this will soon be the norm and not the exception. To get there, two things need to happen:
First, software platform providers such as Google, Microsoft and Apple should start by encouraging application developers to use the new compilation methodology, while enabling them to re-use their existing code base. This is what Apple is now trying to do with the Swift programming language, which can co-exist with Objective-C (since it uses the same runtime environment).
Second, the underlying software platforms themselves should eventually be re-written over time using these principles, while simultaneously redesigning the module hierarchy in the process to make them less monolithic. This is of course a mammoth task and may well take the better part of a decade (if they are courageous enough to actually do it - which I am not at all sure in the case of Google).
In any case, it's the platform that drives language adoption, and not the other way around.
References:
[1] http://www.inf.ethz.ch/personal/wirth/ProjectOberon/PO.System.pdf, page 6: "The compiler compiles itself in about 3 seconds". This quote is for a low cost Xilinx Spartan-3 FPGA development board running at a clock frequency of 25 MHz and featuring 1 MByte of main memory. From this one can easily extrapolate to "less than 1 second" for a modern processor running at a clock frequency well above 1 GHz and several GBytes of main memory (i.e. several orders of magnitude more powerful than the Xilinx Spartan-3 FPGA board), even when taking I/O speeds into account. Already back in 1990 when Oberon was run on a 25MHz NS32X32 processor with 2-4 MBytes of main memory, the compiler compiled itself in just a few seconds. The notion of actually waiting for the compiler to finish a compilation cycle was completely unknown to Oberon programmers even back then. For typical programs, it always took more time to remove the finger from the mouse button that triggered the compile command than to wait for the compiler to complete the compilation just triggered. It was truly instant gratification, with near-zero wait times. And the quality of the produced code, even though not always completely on par with the best compilers available back then, was remarkably good for most tasks and quite acceptable in general.
Go was designed to be fast, and it shows.
Dependency Management: no header file, you just need to look at the packages that are directly imported (no need to worry about what they import) thus you have linear dependencies.
Grammar: the grammar of the language is simple, and thus easily parsed. The number of features is also small, so the compiler code itself is tight (few paths).
No overloading allowed: you see a symbol, you know which method it refers to.
It's trivially possible to compile Go in parallel because each package can be compiled independently.
Note that Go isn't the only language with such features (modules are the norm in modern languages), but they did it well.
Quoting from the book "The Go Programming Language" by Alan Donovan and Brian Kernighan:
Go compilation is notably faster than most other compiled languages, even when building from scratch. There are three main reasons for the compiler’s speed. First, all imports must be explicitly listed at the beginning of each source file, so the compiler does not have to read and process an entire file to determine its dependencies. Second, the dependencies of a package form a directed acyclic graph, and because there are no cycles, packages can be compiled separately and perhaps in parallel. Finally, the object file for a compiled Go package records export information not just for the package itself, but for its dependencies too. When compiling a package, the compiler must read one object file for each import but need not look beyond these files.
The basic idea of compilation is actually very simple. A recursive-descent parser, in principle, can run at I/O bound speed. Code generation is basically a very simple process. A symbol table and basic type system is not something that requires a lot of computation.
However, it is not hard to slow down a compiler.
If there is a preprocessor phase, with multi-level include directives, macro definitions, and conditional compilation, as useful as those things are, it is not hard to bog it down. (For one example, I'm thinking of the Windows and MFC header files.) That is why precompiled headers are necessary.
In terms of optimizing the generated code, there is no limit to how much processing can be added to that phase.
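To ground the first point, here is a toy recursive-descent parser/evaluator for integer expressions (not tied to any particular compiler): one token of lookahead, no tables, no separate passes, so it runs at essentially I/O speed.

/* expr.c -- toy recursive-descent parser/evaluator for + - * / and parentheses */
#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>

static const char *p;                                   /* cursor into the source text */

static void skip(void) { while (isspace((unsigned char)*p)) p++; }

static long expr(void);                                 /* forward declaration */

static long factor(void)
{
    skip();
    if (*p == '(') {
        p++;
        long v = expr();
        skip();
        if (*p == ')')
            p++;
        return v;
    }
    return strtol(p, (char **)&p, 10);                  /* a plain integer literal */
}

static long term(void)
{
    long v = factor();
    for (;;) {
        skip();
        if (*p == '*')      { p++; v *= factor(); }
        else if (*p == '/') { p++; v /= factor(); }
        else return v;
    }
}

static long expr(void)
{
    long v = term();
    for (;;) {
        skip();
        if (*p == '+')      { p++; v += term(); }
        else if (*p == '-') { p++; v -= term(); }
        else return v;
    }
}

int main(void)
{
    p = "2 * (3 + 4) - 5";
    printf("%ld\n", expr());                            /* prints 9 */
    return 0;
}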
Simply (in my own words): because the syntax is very easy to analyze and to parse.
For instance, having no type inheritance means there is no complicated analysis needed to find out whether a new type follows the rules imposed by a base type.
For instance, with interfaces the compiler doesn't go and check whether a given type implements an interface while analyzing that type. The check is performed only when (and only if) the interface is actually used.
Another example: the compiler tells you if you declare a variable and don't use it (or if you are supposed to capture a return value and don't).
The following doesn't compile:
package main
func main() {
    var a int
    a = 0
}
notused.go:3: a declared and not used
These kinds of enforcements and principles make the resulting code safer, and the compiler doesn't have to perform extra validations that the programmer can do.
At large, all these details make the language easier to parse, which results in fast compilation.
Again, in my own words.
Go resolves each imported package once, so the import cost doesn't blow up with project size.
A simpler language means less computation is needed to parse and analyze it.
What else?

How to minimize a programming language's compile time?

I was thinking more about the programming language I am designing, and I was wondering: what are ways I could minimize its compile time?
Your main problem today is I/O. Your CPU is many times faster than main memory and memory is about 1000 times faster than accessing the hard disk.
So unless you do extensive optimizations to the source code, the CPU will spend most of the time waiting for data to be read or written.
Try these rules:
Design your compiler to work in several, independent steps. The goal is to be able to run each step in a different thread so you can utilize multi-core CPUs. It will also help to parallelize the whole compile process (i.e. compile more than one file at the same time)
It will also allow you to load many source files in advance and preprocess them so the actual compile step can work faster.
Try to allow files to be compiled independently. For example, create a "missing symbol pool" for the project: a reference to a symbol that hasn't been defined yet should not cause a compile failure as such; instead it goes into the pool, and when a definition for it turns up somewhere, it is removed from the pool again. When all files have been compiled, check that the pool is empty (a small sketch of this idea follows after these rules).
Create a cache with important information. For example: File X uses symbols from file Y. This way, you can skip compiling file Z (which doesn't reference anything in Y) when Y changes. If you want to go one step further, put all symbols which are defined anywhere in a pool. If a file changes in such a way that symbols are added/removed, you will know immediately which files are affected (without even opening them).
Compile in the background. Start a compiler process which checks the project directory for changes and compiles them as soon as the user saves a file. This way, you will only have to compile a few files each time instead of everything. In the long run you will compile much more, but for the user, turnaround times will be much shorter (= the time the user has to wait until she can run the compiled result after a change).
Use a "Just in time" compiler (i.e. compile a file when it is used, for example in an import statement). Projects are then distributed in source form and compiled when run for the first time. Python does this. To make this perform, you can precompile the library during the installation of your compiler.
Don't use header files. Keep all information in a single place and generate header files from the source if you have to. Maybe keep the header files just in memory and never save them to disk.
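A minimal sketch of the "missing symbol pool" idea mentioned above (all names invented; a real compiler would use a hash table, but the principle is the same):

/* symbol_pool.c -- references to not-yet-defined symbols go into the pool
   instead of aborting the compile; definitions remove them again. An empty
   pool at the end of the build means every reference was satisfied. */
#include <stdio.h>
#include <string.h>

#define MAX_MISSING 1024
static const char *pool[MAX_MISSING];
static int npool;

static int find(const char *name)
{
    for (int i = 0; i < npool; i++)
        if (strcmp(pool[i], name) == 0)
            return i;
    return -1;
}

void reference_symbol(const char *name, int is_defined)
{
    if (!is_defined && find(name) < 0)
        pool[npool++] = name;              /* remember the unresolved reference */
}

void define_symbol(const char *name)
{
    int i = find(name);
    if (i >= 0)
        pool[i] = pool[--npool];           /* definition found: drop it from the pool */
}

int pool_is_empty(void) { return npool == 0; }

int main(void)
{
    reference_symbol("frobnicate", 0);     /* file A uses frobnicate before it exists */
    define_symbol("frobnicate");           /* file B later defines it */
    printf(pool_is_empty() ? "build ok\n" : "unresolved symbols remain\n");
    return 0;
}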
what are ways i could minimize its compile time?
No compilation (interpreted language)
Delayed (just in time) compilation
Incremental compilation
Precompiled header files
I've implemented a compiler myself, and ended up having to look at this once people started batch-feeding it hundreds of source files. I was quite surprised at what I found out.
It turns out that the most important thing you can optimize is not your grammar. It's not your lexical analyzer or your parser either. Instead, the most important thing in terms of speed is the code that reads in your source files from disk. I/O's to disk are slow. Really slow. You can pretty much measure your compiler's speed by the number of disk I/Os it performs.
So it turns out that the absolute best thing you can do to speed up a compiler is to read the entire file into memory in one big I/O, do all your lexing, parsing, etc. from RAM, and then write out the result to disk in one big I/O.
I talked with one of the head guys maintaining Gnat (GCC's Ada compiler) about this, and he told me that he actually used to put everything he could onto RAM disks so that even his file I/O was really just RAM reads and writes.
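A minimal sketch of that "one big read" in C (error handling kept short):

#include <stdio.h>
#include <stdlib.h>

/* Read an entire source file into one malloc'd, NUL-terminated buffer so the
   lexer and parser can run from RAM instead of issuing many small reads. */
char *slurp(const char *path, long *out_len)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return NULL;
    fseek(f, 0, SEEK_END);
    long len = ftell(f);
    fseek(f, 0, SEEK_SET);
    char *buf = malloc(len + 1);
    if (!buf || fread(buf, 1, len, f) != (size_t)len) {
        free(buf);
        fclose(f);
        return NULL;
    }
    buf[len] = '\0';
    fclose(f);
    if (out_len)
        *out_len = len;
    return buf;
}

int main(int argc, char **argv)
{
    long n = 0;
    char *src = argc > 1 ? slurp(argv[1], &n) : NULL;
    if (src) {
        printf("read %ld bytes in one I/O\n", n);
        free(src);
    }
    return 0;
}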
In most languages (pretty well everything other than C++), compiling individual compilation units is quite fast.
Binding/linking is often what's slow - the linker has to reference the whole program rather than just a single unit.
C++ suffers as - unless you use the pImpl idiom - it requires the implementation details of every object and all inline functions to compile client code.
Java (source to bytecode) suffers because the grammar doesn't differentiate objects and classes - you have to load the Foo class to see if Foo.Bar.Baz is the Baz field of the object referenced by the Bar static field of the Foo class, or a static field of the Foo.Bar class. You can make the change in the source of the Foo class between the two, and not change the source of the client code, but still have to recompile the client code, as the bytecode differentiates between the two forms even though the syntax doesn't. AFAIK Python bytecode doesn't differentiate between the two - modules are true members of their parents.
C++ and C suffer if you include more headers than are required, as the preprocessor has to process each header many times and the compiler has to compile it each time, which suggests that better modularity (smaller, simpler headers) would improve compilation time. It's not always possible to cache header compilation, as the definitions present when the header is preprocessed can alter its semantics, and even its syntax.
C suffers if you use the preprocessor a lot, but the actual compilation is fast; much of C code uses typedef struct _X* X_ptr to hide implementation better than C++ does - a C header can easily consist of typedefs and function declarations, giving better encapsulation.
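For example, that opaque-pointer style might look like this (widget is just an illustrative name):

/* widget.h -- the public interface; clients see only an opaque handle */
#ifndef WIDGET_H
#define WIDGET_H
typedef struct widget *widget_ptr;    /* layout is hidden from client code */
widget_ptr widget_create(int size);
void       widget_resize(widget_ptr w, int size);
void       widget_destroy(widget_ptr w);
#endif

/* widget.c -- the only file that knows the struct layout; changing it
   never forces clients of widget.h to recompile */
#include <stdlib.h>
#include "widget.h"
struct widget { int size; };
widget_ptr widget_create(int size)
{
    widget_ptr w = malloc(sizeof *w);
    if (w)
        w->size = size;
    return w;
}
void widget_resize(widget_ptr w, int size) { w->size = size; }
void widget_destroy(widget_ptr w) { free(w); }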
So I'd suggest making your language hide implementation details from client code, and if you are an OO language with both instance members and namespaces, make the syntax for accessing the two unambiguous. Allow true modules, so client code only has to be aware of the interface rather than implementation details. Don't allow preprocessor macros or other variation mechanism to alter the semantics of referenced modules.
Here are some performance tricks that we've learned by measuring compilation speed and what affects it:
Write a two-pass compiler: characters to IR, IR to code. (It's easier to write a three-pass compiler that goes characters -> AST -> IR -> code, but it's not as fast.)
As a corollary, don't have an optimizer; it's hard to write a fast optimizer.
Consider generating bytecode instead of native machine code. The virtual machine for Lua is a good model.
Try a linear-scan register allocator or the simple register allocator that Fraser and Hanson used in lcc.
In a simple compiler, lexical analysis is often the greatest performance bottleneck. If you are writing C or C++ code, use re2c. If you're using another language (which you will find much more pleasant), read the paper about re2c and apply the lessons learned.
Generate code using maximal munch, or possibly iburg.
Surprisingly, the GNU assembler is a bottleneck in many compilers. If you can generate binary directly, do so. Or check out the New Jersey Machine-Code Toolkit.
As noted above, design your language to avoid anything like #include. Either use no interface files or precompile your interface files. This tactic dramatically reduces the burden on the lexer, which as I said is often the biggest bottleneck.
Here's a shot..
Use incremental compilation if your toolchain supports it.
(make, visual studio, etc).
For example, in GCC/make, if you have many files to compile, but only make changes in one file, then only that one file is compiled.
Eiffel had an idea of different states of frozen, and recompiling didn't necessarily mean that the whole class was recompiled.
How much can you break up the compilable modules, and how much do you care to keep track of them?
Make the grammar simple and unambiguous, and therefore quick and easy to parse.
Place strong restrictions on file inclusion.
Allow compilation without full information whenever possible (eg. predeclaration in C and C++).
One-pass compilation, if possible.
One thing surprisingly missing in the answers so far: make sure you're using a context-free grammar, etc. Have a good hard look at languages designed by Wirth, such as Pascal and Modula-2. You don't have to reimplement Pascal, but the grammar design is custom-made for fast compiling. Then see if you can find any old articles about the tricks Anders pulled implementing Turbo Pascal. Hint: table driven.
It depends on what language/platform you're programming for. For .NET development, minimise the number of projects that you have in your solution.
In the old days you could get dramatic speedups by setting up a RAM drive and compiling there. Don't know if this still holds true, though.
In C++ you could use distributed compilation with tools like Incredibuild
A simple one: make sure the compiler can natively take advantage of multi-core CPUs.
Make sure that everything can be compiled the first time you try to compile it. E.g. ban forward references.
Use a context free grammar so that you can find the correct parse tree without a symbol table.
Make sure that the semantics can be deduced from the syntax so you can construct the correct AST directly rather than by mucking with a parse tree and symbol table.
How serious a compiler is this?
Unless the syntax is pretty convoluted, the parser should be able to run no more than 10-100 times slower than just indexing through the input file characters.
Similarly, code generation should be limited by output formatting.
You shouldn't be hitting any performance issues unless you're doing a big, serious compiler, capable of handling mega-line apps with lots of header files.
Then you need to worry about precompiled headers, optimization passes, and linking.
I haven't seen much work done for minimizing the compile time. But some ideas do come to mind:
Keep the grammar simple. Convoluted grammar will increase your compile time.
Try making use of parallelism, either on multi-core CPUs or on the GPU.
Benchmark a modern compiler, see where the bottlenecks are, and see what you can do in your compiler/language to avoid them.
Unless you are writing a highly specialized language, compile time is not really an issue..
Make a build system that doesn't suck!
There are a huge number of programs out there with maybe 3 source files that take under a second to compile, but before you get that far you have to sit through an automake script that takes about 2 minutes checking things like the size of an int. And if you go to compile something else a minute later, it makes you sit through almost exactly the same set of tests.
So unless your compiler is doing awful things to the user like changing the size of its ints or changing basic function implementations between runs, just dump that info out to a file and let them get it in a second instead of 2 minutes.
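A toy illustration of "dump that info out to a file": probe the platform once, cache the answers in a generated header, and reuse it on later builds (file names are made up):

/* genconfig.c -- run once per toolchain, not once per build */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("config.h", "w");
    if (!f)
        return 1;
    fprintf(f, "#define SIZEOF_INT  %zu\n", sizeof(int));
    fprintf(f, "#define SIZEOF_LONG %zu\n", sizeof(long));
    fprintf(f, "#define SIZEOF_PTR  %zu\n", sizeof(void *));
    fclose(f);
    return 0;
}

/* Later builds just #include "config.h"; delete the file (or detect a
   toolchain change) to force the probes to run again. */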
