Maybe a newbie's question, but if you never ask you'll never know.
Can using Stripe's Sorbet (https://sorbet.org/) on a RoR app potentially improve the app's performance?
(performance meaning response times, not robustness / runtime error rate)
I did some reading on dynamically typed languages (particularly JavaScript in this case) and found that if we keep sending some function (foo, for example) objects of the same type, the engine does some optimization work on that function, so that when it is invoked again with the same types, the interpreting work is quicker.
I thought maybe the Ruby interpreter does similar work, which could potentially mean that type checking may increase interpreting speed.
I thought maybe the Ruby interpreter does similar work, which could potentially mean that type checking may increase interpreting speed.
It doesn't yet, but one could potentially build this one day.
The goal of Sorbet was to build a type system for people, as opposed to a type system for computers (the compiler). It can introduce some performance overhead, but since Stripe runs it in production, we keep that overhead in check. Internally, we page ourselves if the overhead exceeds 7% of CPU time.
I did some reading on dynamically typed languages (particularly JavaScript in this case) and found that if we keep sending some function (foo, for example) objects of the same type, the engine does some optimization work on that function, so that when it is invoked again with the same types, the interpreting work is quicker.
Yes, this can be done. What you're describing is a common optimization in just-in-time (JIT) compilers. The technique you seem to be referring to uses runtime profiling, and it is in fact a common alternative that achieves this result in the absence of a type system. It's also worth noting that well-built JITs can apply it more aggressively than a type system would allow, because a type system encodes what could happen, while profiling and JITs can optimize for what actually happens in practice.
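For intuition, here's a toy Python sketch of the inline-caching idea behind that optimization (hypothetical names; no real engine is written this way, but the shape is the same): a call site remembers the last receiver type it saw and skips the generic method lookup while the types stay stable.

```python
# Toy model of a monomorphic inline cache: each call site caches the
# method resolved for the last-seen receiver type, so repeated calls
# with the same type skip the slow, generic lookup path.
class CallSite:
    def __init__(self, method_name):
        self.method_name = method_name
        self.cached_type = None      # last receiver type seen at this site
        self.cached_method = None    # method resolved for that type

    def call(self, receiver, *args):
        t = type(receiver)
        if t is not self.cached_type:        # cache miss: do the full lookup
            self.cached_type = t
            self.cached_method = getattr(t, self.method_name)
        return self.cached_method(receiver, *args)   # cache hit: direct call

site = CallSite("upper")
print(site.call("foo"))   # first call: slow lookup, then caches str.upper
print(site.call("bar"))   # same type as last time: fast path
```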
That said, building a JIT is frequently much more work than building an online compiler; thus, depending on the amount of investment one wants to put into speeding up Ruby, either building a JIT or using types can prove better under different real-world constraints.
I thought maybe the Ruby interpreter does similar work, which could potentially mean that type checking may increase interpreting speed.
Summarizing the previous paragraph: the Sorbet type system does not currently speed up Ruby, but it doesn't slow it down much either.
Type systems can indeed be used to speed up languages, but they aren't your only tool; profiling and JIT compilation are the major competitor.
The optimizations you are talking about apply more to the JIT that is being worked on for the Ruby runtime.
In general, Sorbet aims at type safety by introducing type interfaces or method signatures. They enable static type checks that are applied before deploying the application, in order to get rid of "type errors".
Sorbet also comes with a runtime component that can enforce type checks at runtime in your running application, but those will decrease the application's performance, as they wrap method calls in order to check for correct types: https://sorbet.org/docs/runtime#runtime-checked-sig-s
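To see where that overhead comes from, here's a minimal Python analog of a runtime-checked signature (this is not Sorbet's actual implementation, which is Ruby; runtime_sig and the checked function are hypothetical): every call now passes through an extra wrapper that validates argument types before the real body runs.

```python
import functools
import inspect

def runtime_sig(**expected):
    """Toy analog of a runtime-checked sig: wrap the function and
    validate argument types on every single call (that's the overhead)."""
    def decorate(fn):
        sig = inspect.signature(fn)
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            for name, value in bound.arguments.items():
                t = expected.get(name)
                if t is not None and not isinstance(value, t):
                    raise TypeError(f"{name}: expected {t.__name__}, "
                                    f"got {type(value).__name__}")
            return fn(*args, **kwargs)
        return wrapper
    return decorate

@runtime_sig(n=int)
def repeat_greeting(n):
    return "hello " * n

repeat_greeting(3)       # passes the check, then runs
# repeat_greeting("3")   # would raise TypeError at the call boundary
```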
tl;dr I want a rapid edit-compile-run workflow, but prefixing every single function name with "somenamespace_" is annoying.
One (debatable) advantage of C is that you can have separate compilation units. Large amounts of code can be compiled into objects and libraries, which are much faster to link together than parsing any amount of C code. The result can be slower to run, since inlining optimizations can't be done between compilation units, but it is very fast to link, especially with the gold linker (ld.gold).
Problem being, it's C. Lack of namespaces, pretty much.
But even C++ doesn't (in practice) use separate compilation units. Sure, you can, but for the most part it's about including megabytes of templates in header files. The whole philosophy behind "private:" is to pretend you have separated interfaces without actually having them at all. So the standard practice in a language is important too, because even if I make my own isolated binary interfaces, if each implementation has to #include the same tons of code from third parties, the time saved in isolating them doesn't add up. C++ is kind of... featureful for me, anyway. Really, I just want namespaces. (And modules... sigh)
Languages like Python, Racket, and Java use partial compilation, which seems fast enough, but you still get startup slowdowns in large projects, as they have to load and process all that bytecode every time. There's no option (outside of writing a C interface) to isolate code in a way that's fast to combine with the code you are actively working on.
I'd just like to know which languages let large amounts of code be concealed behind small, quick-to-load interfaces, so that compiling them initially might be slow, but then I get a rapid edit-compile-run cycle as I work on parts of the project. Instead of this Python hack, where I change something in the progress displayer, and then have to sit there staring at it as it loads the standard library, and then loads the database code, and then loads the web server code, and then loads the image processing code, and then sits there for another 20 seconds figuring out the gobject-introspection thing for some GUI code.
It's not always obvious. I'm staring at D trying to figure out when it parses the code from dependencies as it recompiles, without a clue. Go seems to just slap all the code together (including dependencies!) into a single compilation unit, but maybe I'm wrong there? And I don't think Nim regenerates all the generated C on every compile, but maybe it does? Rust uses separate compilation units (I think), but it's still slow as heck to compile! And Python really does compile fast, so it's only once my projects start getting big and successful that I start getting tripped up by it.
I haven't learned the other languages you mentioned, such as Rust, D, or Python. When you compile a Nim program, it creates a folder called nimcache containing all the generated .c and .o files. If you make a change to the Nim program and recompile, it tries to reuse the files in that nimcache folder.
I've been looking over a few tutorials on JIT compilation and allocating heaps of executable memory at runtime. This is mainly a conceptual question, so please correct me if I got something wrong.
If I understand it correctly, a JIT takes advantage of a runtime interpreter/compiler that outputs native executable code and places it in an executable region of memory, allocated in an OS-specific way (e.g. VirtualAlloc() on Windows, mmap() on Linux).
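That understanding is right. As a concrete sketch (Linux-only, x86-64; hardened systems may refuse writable-and-executable pages), here's the classic trick in Python: map a read/write/execute page, copy machine code into it, and call it through ctypes. A real JIT does the same dance with the code it has just generated.

```python
import ctypes
import mmap

# x86-64 machine code for: mov eax, 42; ret
code = b"\xb8\x2a\x00\x00\x00\xc3"

# Ask the OS for a page that is writable AND executable (mmap on Linux;
# a Windows JIT would use VirtualAlloc with PAGE_EXECUTE_READWRITE).
buf = mmap.mmap(-1, mmap.PAGESIZE,
                prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC)
buf.write(code)

# Treat the start of that page as a C function returning int, and call it.
addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
make_42 = ctypes.CFUNCTYPE(ctypes.c_int)(addr)
print(make_42())  # 42
```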
Additionally, some languages like Erlang can have a distributed system such that each part is separated from the others, meaning that if one fails, the rest can account for it in a modular way, and that modules can be switched in and out at will, if managed correctly, without disturbing overall execution.
With a runtime compiler or some sort of code delivery mechanism, wouldn't it be feasible to load code at runtime arbitrarily to replace modules of code that could be updated?
Example
Say I have a sort(T, T) function that operates on T or T. Now, suppose I have a merge_sort(T,T) function that I have loaded at runtime. If I implement a sort of ledger or register system such that users of the first sort(T,T) can reassign themselves to use the new merge_sort(T,T) function and detect when all users have adjusted themselves, I should then be able to deallocate and delete sort(T,T) from memory.
This basically sounds a lot like a JIT, but the attractive part, to me, was the aspect where you can swap out code for modules arbitrarily at runtime. That way, while a system is not under full load such that every module is in use, modules could be switched over to new code automatically, if needed. Theoretically, wouldn't this be a way to implement patches such that a user should never have to "restart" the program, if the program can swap out code silently in the individual modules? I'd imagine much larger distributed systems can make use of this, but what about smaller ones?
Additionally, some languages like Erlang can have a distributed system such that each part is separated from the others, meaning that if one fails, the rest can account for it in a modular way, and that modules can be switched in and out at will, if managed correctly, without disturbing overall execution.
You're describing how to make a fault-tolerant system, which is entirely different from replacing code at run-time (known as Dynamic Software Update, or DSU). Indeed, in Erlang you can have one process monitoring other processes, and if one fails, it will migrate the work to another process to keep the system running as expected. Note that DSU is not used to implement fault tolerance. They are different features with different purposes.
Say I have a sort(T, T) function that operates on T or T. Now, suppose I have a merge_sort(T,T) function that I have loaded at runtime. If I implement a sort of ledger or register system such that users of the first sort(T,T) can reassign themselves to use the new merge_sort(T,T) function and detect when all users have adjusted themselves, I should then be able to deallocate and delete sort(T,T) from memory.
This is called DSU, and it is used to perform any of the following tasks without the need to take the system down:
Fix one or more bugs in a piece of code.
Patch security holes.
Deploy more efficient code.
Deploy new features.
Therefore, any app or system can use DSU so that it can perform these tasks without requiring a restart.
Erlang enables you to perform DSU in addition to facilitating fault tolerance, as discussed above. For more information, refer to this Erlang white paper.
There are numerous ways to implement DSU. Since you're interested in JIT compilers, and assuming that by "JIT compiler" you mean the component that not only compiles IL code but also allocates executable memory and patches function calls with binary code addresses, I'll discuss how to implement DSU in JIT environments. The JIT compiler has to support the following two features:
The ability to obtain or create new binary code at run-time. If you have IL code, there is no need to allocate executable memory yet, since it first has to be compiled.
The ability to replace a piece of IL code (which might have already been JITted) or binary code with the new piece of code.
Clearly, with these two features, you can perform DSU on a single function. Swapping a whole module or library requires swapping all the functions and global variables exported by that module.
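Here's a minimal sketch of the "ledger" idea from the question (hypothetical names, plain Python instead of a JIT for brevity): as long as every caller reaches the function through one level of indirection, the implementation behind it can be swapped atomically at runtime. A JIT achieves the same effect by patching the call's target address.

```python
import threading

# One level of indirection: callers never hold a direct reference to the
# implementation, so it can be replaced while the system keeps running.
_dispatch = {}
_lock = threading.Lock()

def register(name, fn):
    with _lock:
        _dispatch[name] = fn          # this assignment is the hot-swap point

def call(name, *args, **kwargs):
    with _lock:
        fn = _dispatch[name]
    return fn(*args, **kwargs)

def insertion_sort(xs):   # stand-in "old" implementation
    return sorted(xs)

def merge_sort(xs):       # stand-in "new" implementation loaded later
    return sorted(xs)

register("sort", insertion_sort)
call("sort", [3, 1, 2])              # runs the old code
register("sort", merge_sort)         # swapped without a restart
call("sort", [3, 1, 2])              # runs the new code; old one can be freed
```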
Inspired by this question, which started out innocently but is turning into a major flame war.
Let's say you need a utility method: reasonably straightforward, but not a one-liner. The quoted question was how to repeat a string X times. How do you decide whether to use a 3rd-party implementation or write your own?
The obvious downside to the 3rd-party approach is that you're adding a dependency to your code.
But if you're writing your own, you need to code it, test it, and (maybe) profile it, so you'll likely end up spending more time.
I know the decision itself is subjective, but the criteria you use to arrive at it should not be.
So, what criteria do you use to decide when to write your own code?
General Decision
Before deciding on what to use, I will create a list of criteria that must be met by the library. This could include size, simplicity, integration points, speed, problem complexity, dependencies, external constraints, and license. Depending on the situation the factors involved in making the decision will differ.
Generally, I will hunt for a suitable library that solves the problem before writing my own implementation. If I have to write my own, I will read up on appropriate algorithms and seek ideas from other implementations (e.g., in a different language).
If, after all the aspects described below, I can find no suitable library or source code, and I have searched (and asked on suitable forums), then I will develop my own implementation.
Complexity
If the task is relatively simple (e.g., a MultiValueMap class), then:
Find an existing open-source implementation.
Integrate the code.
Rewrite it, or trim it down, if it is excessive.
If the task is complex (e.g., a flexible object-oriented graphing library), then:
Find an open-source implementation that compiles (out-of-the-box).
Execute its "Hello, world!" equivalent.
Perform any other evaluations as required.
Determine its suitability based on the problem domain criteria.
Speed
If the library is too slow, then:
Profile it.
Optimize it.
Contribute the results back to the community.
If the code is too complex to be optimized, and speed is a factor, discuss it with the community and provide profiling details. Otherwise, look for an equivalent, but faster (possibly less feature-rich) library.
API
If the API is not simple, then:
Write a facade and contribute it back to the community.
Or find a simpler API.
Size
If the compiled library is too large, then:
Compile only the necessary source files.
Or find a smaller library.
Bugs
If the library does not compile out of the box, seek alternatives.
Dependencies
If the library depends on scores of external libraries, seek alternatives.
Documentation
If there is insufficient documentation (e.g., user manuals, installation guides, examples, source code comments), seek alternatives.
Time Constraints
If there is ample time to find an optimal solution, then do so. Often there is not sufficient time to write from scratch. And usually there are a number of similar libraries to evaluate. Keep in mind that, by meticulous loose coupling, you can always swap one library for another. Find what works, initially, and if it later becomes a burden, replace it.
Development Environment
If the library is tied to a specific development environment, seek alternatives.
License
Open source.
10 questions ...
+++ (use library) ... --- (write own library)
Is the library exactly what I need? Customizable in a few steps? +++
Does it provide almost all functionality? Easily extensible? +++
No time? +++
It covers one half of the problem and plays well with another library? ++
Hard to extend, but excellent documentation? ++
Hard to extend, yet most of the functionality? +
Functionality ok, but outdated? -
Functionality ok, .. but weird (crazy interface, not robust, ...)? --
Library works, but the person who gets to decide is in a state of hubris? ---
Library works, manageable code size, portfolio needs update? ---
Some thoughts ...
If it is something that is small but useful, probably for others too, then why not write a library and put it on the web? The cost of publishing these kinds of small libraries has decreased, as has the hurdle for others to tune in (see Bitbucket or GitHub). So what's the criterion?
Maybe it should not exactly replicate an existing, already-known library. If it replicates something existing, it should approach the problem from a new angle, or better, it should provide a shorter or more condensed* solution.
*or more fun
If it's a trivial function, it's not worth pulling in an entire library.
If it's a non-trivial function, then it may be worth it.
If it's multiple functions which can all be handled by pulling in a single library, it's almost definitely worth it.
Keep it in balance
You should keep several criteria in balance. I'd consider a few topics and ask a few questions.
Development time VS maintenance time
Can I develop what I need in a few hours? If yes, why do I need a library? If I get a lib, am I sure it won't cost me hours of debugging and documentation reading? The answer: if I need something obvious and straightforward, I don't need an extra-flexible lib.
Simplicity VS flexibility
If I need just an error wrapper, do I need a lib with flexible types and stack tracing and color prints and...? Nope! Using even beautifully designed but flexible, multipurpose libs can slow your code. If you plan to use 2% of the functionality, you don't need it.
Big task VS small task
Am I facing a huge task that needs external code to solve it? AMQP or SQL operations are definitely too big to develop from scratch, but tiny logging can be solved in place. Don't use external libs to solve small tasks.
My own libs VS external libs
Sometimes it is better to grow your own library, because it is 100% used, it is 100% appropriate to your goals, you know it best, and it is always up to date with your applications. Don't build your own lib just to be cool, though, and keep in mind that a lot of the libs in your vendor directory were developed "just to be cool".
For me this would be a fairly easy answer.
If you need to be cost effective, then it would probably be best to try and find a library/framework that does what you want. If you can't find it, then you will be forced to write it or find a different approach.
If you have the time and find it fun, write one. You will learn a lot along the way, and you can give back to the open source community with your killer new bundle of code. If you don't, well, then don't. But if you can't find one, then you have to write it anyway ;)
Personally, if I can justify writing a library, I always opt for that. It's fun, you learn a lot about what you are directing your focus towards, and you have another tool to add to your arsenal and put on your CV.
If the functionality is only a small part of the app, or if your needs are the same as everyone else's, then a library is probably the way to go. If you need to consume and output JSON, for example, you can probably knock something together in five minutes to handle your immediate needs. But then you start adding to it, bit by bit. Eventually, you have all the functionality you would find in any library, but 1) you had to write it yourself and 2) it isn't as robust and well documented as what you would find in a library.
If the functionality is a big part of the app, and if your needs aren't exactly the same as everyone else's, then think much more carefully. For example, if you are doing machine learning, you might consider using a package like Weka or Mahout, but these are two very different beasts, and this component is likely to be a significant part of your application. A library in this case could be a hindrance, because your needs might not fit the design parameters of the original authors, and if you attempt to modify it, you will need to worry about a much larger and more complex system than the minimum that you would build yourself.
There's a good article out there talking about sanitizing HTML, and how it was a big part of the app, and something that would need to be heavily tuned, so using an outside library wasn't the best solution, in spite of the fact that there were many libraries out that did exactly what seemed to be called for.
Another consideration is security.
If a black-hat hacker finds a bug in your code, they can create an exploit and sell it for money. The more popular the library is, the more the exploit is worth. Think about OpenSSL or WordPress exploits. If you re-implement the code, chances are your code is not vulnerable in exactly the same way the popular library is. And if your lib is not popular, then a zero-day exploit of your code probably wouldn't be worth much, and there is a good chance your code is not targeted by bounty hunters.
Another consideration is language safety. C can be very fast, but from a security standpoint it's asking for trouble. If you reimplement the lib in some scripting language, the chances of arbitrary-code-execution exploits are low (as long as you know the possible attack vectors, like serialization or evals).
I was thinking more about the programming language I am designing, and I was wondering: what are ways I could minimize its compile time?
Your main problem today is I/O. Your CPU is many times faster than main memory, and memory is about 1000 times faster than the hard disk.
So unless you do extensive optimizations to the source code, the CPU will spend most of the time waiting for data to be read or written.
Try these rules:
Design your compiler to work in several independent steps. The goal is to be able to run each step in a different thread, so you can utilize multi-core CPUs. It will also help to parallelize the whole compile process (i.e. compile more than one file at the same time).
It will also allow you to load many source files in advance and preprocess them so the actual compile step can work faster.
Try to allow files to be compiled independently. For example, create a "missing symbol pool" for the project. Missing symbols should not cause compile failures as such; when a missing symbol's definition turns up somewhere, remove it from the pool. When all files have been compiled, check that the pool is empty. (A small sketch of this pool follows the list.)
Create a cache with important information. For example: file X uses symbols from file Y. This way, you can skip compiling file Z (which doesn't reference anything in Y) when Y changes. If you want to go one step further, put all symbols which are defined anywhere into a pool. If a file changes in such a way that symbols are added/removed, you will know immediately which files are affected (without even opening them).
Compile in the background. Start a compiler process which checks the project directory for changes and compiles files as soon as the user saves them. This way, you will only have to compile a few files each time instead of everything. In the long run, you will compile much more, but for the user, turnaround times will be much shorter (= the time the user has to wait until she can run the compiled result after a change).
Use a "Just in time" compiler (i.e. compile a file when it is used, for example in an import statement). Projects are then distributed in source form and compiled when run for the first time. Python does this. To make this perform, you can precompile the library during the installation of your compiler.
Don't use header files. Keep all information in a single place and generate header files from the source if you have to. Maybe keep the header files just in memory and never save them to disk.
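Here's a small Python sketch of the "missing symbol pool" rule above (hypothetical structure, not taken from any particular compiler): each file compiles on its own, unknown references are parked in the pool, and the pool only has to be empty once the whole project has been compiled.

```python
defined = set()   # symbols defined by files compiled so far
missing = set()   # symbols referenced somewhere but not yet defined

def compile_file(defines, references):
    """Compile one file in isolation; unknown symbols don't fail the build."""
    defined.update(defines)
    missing.difference_update(defines)     # new definitions resolve old gaps
    missing.update(references - defined)   # park still-unknown references

def finish_build():
    if missing:
        raise NameError(f"unresolved symbols: {sorted(missing)}")

compile_file(defines={"main"}, references={"helper"})   # helper unknown: fine
compile_file(defines={"helper"}, references=set())      # gap resolved
finish_build()                                          # pool empty: success
```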
what are ways I could minimize its compile time?
No compilation (interpreted language)
Delayed (just in time) compilation
Incremental compilation
Precompiled header files
I've implemented a compiler myself, and ended up having to look at this once people started batch-feeding it hundreds of source files. I was quite surprised by what I found out.
It turns out that the most important thing you can optimize is not your grammar. It's not your lexical analyzer or your parser either. Instead, the most important thing in terms of speed is the code that reads your source files from disk. I/Os to disk are slow. Really slow. You can pretty much measure your compiler's speed by the number of disk I/Os it performs.
So it turns out that the absolute best thing you can do to speed up a compiler is to read the entire file into memory in one big I/O, do all your lexing, parsing, etc. from RAM, and then write out the result to disk in one big I/O.
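In code, the shape of that advice is simply this (a sketch; lex, parse, and emit are stand-ins for whatever your front end and back end actually do): one big read at the start, one big write at the end, and everything in between working from RAM.

```python
def compile_one(src_path, out_path, lex, parse, emit):
    # One big read: the whole source file lands in RAM in a single I/O.
    with open(src_path, "rb") as f:
        source = f.read()

    # All the real work happens against memory, never the disk.
    output = emit(parse(lex(source)))

    # One big write for the result.
    with open(out_path, "wb") as f:
        f.write(output)
```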
I talked with one of the head guys maintaining GNAT (GCC's Ada compiler) about this, and he told me that he actually used to put everything he could onto RAM disks so that even his file I/O was really just RAM reads and writes.
In most languages (pretty much everything other than C++), compiling individual compilation units is quite fast.
Binding/linking is often what's slow - the linker has to reference the whole program rather than just a single unit.
C++ suffers because - unless you use the pImpl idiom - it requires the implementation details of every object and all inline functions in order to compile client code.
Java (source to bytecode) suffers because the grammar doesn't differentiate objects and classes - you have to load the Foo class to see whether Foo.Bar.Baz is the Baz field of the object referenced by the static field Bar of the Foo class, or a static field of the Foo.Bar class. You can make the change in the source of the Foo class between the two forms without changing the source of the client code, but you still have to recompile the client code, as the bytecode differentiates between the two forms even though the syntax doesn't. AFAIK Python bytecode doesn't differentiate between the two - modules are true members of their parents.
C++ and C suffer if you include more headers than are required, as the preprocessor has to process each header many times, and the compiler has to compile them each time. Minimizing header size and complexity helps, suggesting that better modularity would improve compilation time. It's not always possible to cache header compilation, as the definitions present when the header is preprocessed can alter its semantics, and even its syntax.
C suffers if you use the preprocessor a lot, but the actual compilation is fast; much C code uses typedef struct _X* X_ptr to hide implementation better than C++ does - a C header can easily consist of just typedefs and function declarations, giving better encapsulation.
So I'd suggest making your language hide implementation details from client code, and if yours is an OO language with both instance members and namespaces, make the syntax for accessing the two unambiguous. Allow true modules, so client code only has to be aware of the interface rather than implementation details. Don't allow preprocessor macros or other variation mechanisms to alter the semantics of referenced modules.
Here are some performance tricks that we've learned by measuring compilation speed and what affects it:
Write a two-pass compiler: characters to IR, IR to code. (It's easier to write a three-pass compiler that goes characters -> AST -> IR -> code, but it's not as fast.)
As a corollary, don't have an optimizer; it's hard to write a fast optimizer.
Consider generating bytecode instead of native machine code. The virtual machine for Lua is a good model.
Try a linear-scan register allocator or the simple register allocator that Fraser and Hanson used in lcc.
In a simple compiler, lexical analysis is often the greatest performance bottleneck. If you are writing C or C++ code, use re2c. If you're using another language (which you will find much more pleasant), read the paper about re2c and apply the lessons learned. (A toy longest-match lexer is sketched after this list.)
Generate code using maximal munch, or possibly iburg.
Surprisingly, the GNU assembler is a bottleneck in many compilers. If you can generate binary directly, do so. Or check out the New Jersey Machine-Code Toolkit.
As noted above, design your language to avoid anything like #include. Either use no interface files or precompile your interface files. This tactic dramatically reduces the burden on the lexer, which as I said is often the biggest bottleneck.
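Since lexical analysis keeps coming up as the bottleneck, here's a toy Python illustration of the longest-match rule that a generated lexer like re2c compiles down to a fast DFA (the token set here is made up for the example; a real re2c lexer is generated C, not regexes):

```python
import re

# Two-character operators are listed before one-character ones, so '=='
# wins over '=' at the same position, approximating the longest-match
# (maximal munch) rule a real generated lexer guarantees.
TOKEN = re.compile(r"""
      (?P<NUM>\d+)
    | (?P<ID>[A-Za-z_]\w*)
    | (?P<OP>==|<=|>=|[-+*/<>=])
    | (?P<WS>\s+)
""", re.VERBOSE)

def lex(src):
    pos, tokens = 0, []
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        if m.lastgroup != "WS":                 # skip whitespace tokens
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(lex("total <= n + 42"))
# [('ID', 'total'), ('OP', '<='), ('ID', 'n'), ('OP', '+'), ('NUM', '42')]
```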
Here's a shot..
Use incremental compilation if your toolchain supports it.
(make, Visual Studio, etc.).
For example, in GCC/make, if you have many files to compile, but only make changes in one file, then only that one file is compiled.
Eiffel had a notion of different frozen states, and recompiling didn't necessarily mean that the whole class was recompiled.
How much can you break up the compilable modules, and how much do you care to keep track of them?
Make the grammar simple and unambiguous, and therefore quick and easy to parse.
Place strong restrictions on file inclusion.
Allow compilation without full information whenever possible (e.g. predeclaration in C and C++).
One-pass compilation, if possible.
One thing surprisingly missing in the answers so far: make sure you're using a context-free grammar, etc. Have a good hard look at languages designed by Wirth, such as Pascal and Modula-2. You don't have to reimplement Pascal, but the grammar design is custom-made for fast compiling. Then see if you can find any old articles about the tricks Anders pulled implementing Turbo Pascal. Hint: table-driven.
It depends on what language/platform you're programming for. For .NET development, minimise the number of projects that you have in your solution.
In the old days you could get dramatic speedups by setting up a RAM drive and compiling there. Don't know if this still holds true, though.
In C++ you could use distributed compilation with tools like Incredibuild.
A simple one: make sure the compiler can natively take advantage of multi-core CPUs.
Make sure that everything can be compiled the first time you try to compile it. E.g. ban forward references.
Use a context free grammar so that you can find the correct parse tree without a symbol table.
Make sure that the semantics can be deduced from the syntax so you can construct the correct AST directly rather than by mucking with a parse tree and symbol table.
How serious a compiler is this?
Unless the syntax is pretty convoluted, the parser should be able to run no more than 10-100 times slower than just indexing through the input file characters.
Similarly, code generation should be limited by output formatting.
You shouldn't be hitting any performance issues unless you're doing a big, serious compiler, capable of handling mega-line apps with lots of header files.
Then you need to worry about precompiled headers, optimization passes, and linking.
I haven't seen much work done on minimizing compile time. But some ideas do come to mind:
Keep the grammar simple. A convoluted grammar will increase your compile time.
Try making use of parallelism, either on multi-core CPUs or GPUs.
Benchmark a modern compiler, see what the bottlenecks are, and see what you can do in your compiler/language to avoid them.
Unless you are writing a highly specialized language, compile time is not really an issue..
Make a build system that doesn't suck!
There's a huge number of programs out there with maybe 3 source files that take under a second to compile, but before you get that far you have to sit through an automake script that takes about 2 minutes checking things like the size of an int. And if you go to compile something else a minute later, it makes you sit through almost exactly the same set of tests.
So unless your compiler is doing awful things to the user, like changing the size of its ints or changing basic function implementations between runs, just dump that info out to a file and let them get it in a second instead of 2 minutes.