Profiling the OCaml compiler - performance

Background information (you do not need to repeat these steps to answer the question, this just gives some background):
I am trying to compile a rather large set of generated modules. These files are the output of a prototype Modelica to OCaml compiler and reflect the Modelica class structure of the Modelica Standard Library.
The main feature is the use of polymorphic, open recursion: Every method takes a this argument which contains the final superclass hierarchy. So for instance the model:
model A
  type T = Real
  type S = T
end A;
is translated into
let m_A = object
  method m_T this = m_Modelica_Real
  method m_S this = this#m_T this
end
and has to be closed before usage:
let _ = m_A#m_T m_A
This seems to postpone a lot of typechecking until the superclass hierarchy is actually fixed, which in turn makes it impossible to compile the final linkage module (try ocamlbuild Linkage.cmo after editing the comments in the corresponding file to see what I mean).
Unfortunately, since the code base is rather large and uses a lot of objects, the type structure might not be the root cause after all; it might just as well be some optimization pass or a flaw in the code generation (although I strongly suspect the typechecker). So my question is: is there any way to profile the OCaml compiler such that it signals when a certain phase (typechecking, intermediate code generation, optimization) is over and how long it took? Any further insights into my particular use case are also welcome.

As of right now, there isn't.
You can do it yourself, though: the compiler sources are open, so you can get them and modify them to fit your needs.
Depending on whether you use ocamlc or ocamlopt, you'll need to modify either driver/compile.ml or driver/optcompile.ml to add timers to the compilation process.
Fortunately, this has already been done for you here. Just compile with the option -dtimings or with the environment variable OCAMLPARAM=timings=1,_.
Even more easily, you can install the Flambda opam switch:
opam switch install 4.03.0+pr132
ocamlopt -dtimings myfile.ml
Note: Flambda itself changes the compilation time (mostly in what happens after typing), and its integration into the OCaml compiler has not been confirmed yet.

The OCaml compiler is an ordinary OCaml program in that regard. For a quick inspection I would use a poor man's profiler, e.g. the pmp script.
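For instance, a minimal poor man's profiler might look like the sketch below, assuming Linux, gdb installed, and an ocamlopt.opt process running long enough to sample (all names and sample counts are illustrative):

for i in $(seq 1 20); do
  # take one stack sample of the running compiler
  gdb -ex "set pagination 0" -ex "bt" -batch -p "$(pgrep -n ocamlopt.opt)"
  sleep 0.5
done | grep '^#' | sort | uniq -c | sort -rn | head

If most samples land in the typechecker's frames, that supports the suspicion voiced in the question.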

Related

Will Go compilers ignore unused functions

If there is a function from an external package that is not used at all in my project, will the compiler remove the function from the generated machine code?
This question could be targeted at any language compiler in general, but I think the behaviour may vary from language to language, so I am interested in knowing what the Go compiler does.
I would appreciate any help on understanding this.
The language spec does not mention this anywhere, and from a correctness point of view this is irrelevant.
But know that the current version does remove certain constructs that the compiler can prove are not used and whose removal will not change the runtime behaviour of the app.
Quoting from The Go Blog: Smaller Go 1.7 binaries:
The second change is method pruning. Until 1.6, all methods on all used types were kept, even if some of the methods were never called. This is because they might be called through an interface, or called dynamically using the reflect package. Now the compiler discards any unexported methods that do not match an interface. Similarly the linker can discard other exported methods, those that are only accessible through reflection, if the corresponding reflection features are not used anywhere in the program. That change shrinks binaries by 5–20%.
Methods are a "harder" case than functions because methods can be listed and called via reflection (unlike functions), but the Go tools do what they can to remove unused methods too.
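As a sketch (hypothetical, not from the blog post): in the program below, the unexported method that matches no interface and is never called is a candidate for pruning, while the one actually called is kept.

package main

import "fmt"

type point struct{ x, y int }

// sum is called below, so it is always kept.
func (p point) sum() int { return p.x + p.y }

// product is unexported, matches no interface, and is never called,
// so the compiler/linker are free to drop it from the binary.
func (p point) product() int { return p.x * p.y }

func main() {
    fmt.Println(point{2, 3}.sum())
}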
You can see examples and proof of removed / unlinked code in this answer:
How to remove unused code at compile time?
Also see other relevant questions:
Splitting client/server code
Call all functions with special prefix or suffix in Golang

How many times does a Common Lisp compiler recompile?

While not all Common Lisp implementations do compilation to machine code, some of them do, including SBCL and CCL.
In C/C++, if the source files don't change, the binary output of a C/C++ compiler will also not change, assuming the underlying system remains the same.
In a Common Lisp compiler, compilation is not under the user's direct control, unlike C/C++. My question is: if the Lisp source files haven't changed, under what circumstances will a CL compiler compile the code more than once, and why? If possible, a simple illustrative example would be helpful.
I think that the question is based on some misconceptions. The compiler doesn't compile files, and it's not something that the user has no control over. The compiler is quite readily available through the compile function. The compiler operates on code, not on files. E.g., you can type at the REPL
CL-USER> (compile nil (list 'lambda (list 'x) (list '+ 'x 'x)))
#<FUNCTION (LAMBDA (X)) {100460E24B}>
NIL
NIL
There's no file involved at all. However, there is also a compile-file function, but notice that its description is:
compile-file transforms the contents of the file specified by
input-file into implementation-dependent binary data which are placed
in the file specified by output-file.
The contents of the file are compiled. Then that compiled file can be loaded. (You can load uncompiled source files, too.) I think your question might boil down to asking under what circumstances compile-file would generate a file with different contents. I think that's really implementation-dependent, and it's not really predictable. I don't know that your characterization of compilers for other languages necessarily holds, either:
In C/C++, if the source files don't change, the binary output of a
C/C++ compiler will also not change, assuming the underlying system
remains the same.
What if the compiler happens to include a timestamp into the output in some data segment? Then you'd get different binary output every time. It's true that some common scripted compilation/build systems (e.g., make and similar) will check whether previous output can be reused based on whether the input files have changed in the meantime. That doesn't really say what the compiler does, though.
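For reference, a typical compile-then-load session looks like this (file name hypothetical; the exact fasl path and extension are implementation-dependent, the sketch follows SBCL conventions). compile-file returns the output truename plus two status flags (warnings-p and failure-p):

CL-USER> (compile-file "my-code.lisp")
#P"my-code.fasl"
NIL
NIL
CL-USER> (load "my-code.fasl")
T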
The rules are pretty much the same, but in Common Lisp it is not the practice to separate declarations from implementations, so usually you must recompile every dependency to be sure. This is a practical consequence shared by dynamic environments.
Imagining there were such a separation in place, the following are blatant examples (clearly not exhaustive) of changes that require recompiling specific dependent files, as the output may differ:
A changed package definition
A changed macro character or a change in its code
A changed macro
Adding or removing an inline or notinline declaration
A change in a global type or function type declaration
A changed function used in #., defvar, defparameter, defconstant, load-time-value, eql specializer, make-load-form generated code, defmacro et al (e.g. setf expanders)...
A change in the Lisp compiler, or in the base image
I mean, you can see it's not trivial to determine which files need to be recompiled. Sometimes the answer is "all subsequent files", e.g. after changing the " (double-quote) macro character, which might affect every literal string, or after the compiler evolved in a non-backwards-compatible way. In essence, we end where we started: you can only be sure with a full recompile, not reusing fasls across compilations. And sometimes that's faster than determining the minimum set of files that need to be recompiled.
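To make the "changed macro" item above concrete, a minimal sketch (hypothetical files):

;; a.lisp
(defmacro twice (x) `(+ ,x ,x))

;; b.lisp
(defun four () (twice 2))

;; Macros are expanded at compile time, so the compiled FOUR contains (+ 2 2).
;; If TWICE is later redefined as `(* 3 ,x), a stale fasl of b.lisp still
;; returns 4 until b.lisp is recompiled against the new macro.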
In practice, you end up compiling single definitions a lot in development (e.g. with Slime) and not recompiling files when there's a fasl as old or younger than the source file. Many times, you reuse files from e.g. Quicklisp. But for testing and deployment, I advise clearing all fasls and recompiling everything.
There have been efforts to automate minimum-dependency compilation with SBCL, but I think it's too slow when you change the interim projects more often than not (it involves a lot of forking, so on Windows it's either infeasible or very slow). However, it may be a time saver for base libraries that rarely change, if at all.
Another approach is to make custom base images with the base libraries built in, i.e. those you always load. That will save both compilation and load times.

Windows DLL & Dynamic Initialization Ordering

I have some questions regarding dynamic initialization (i.e. constructors before main) and DLL link ordering, for both Windows and POSIX.
To make it easier to talk about, I'll define a couple terms:
Load-Time Libraries: libraries which have been "linked" at compile
time such that, when the system loads my application, they get loaded
in automatically. (i.e. ones put in CMake's target_link_libraries
command).
Run-Time Libraries: libraries which I load manually by dlopen or
equivalents. For the purposes of this discussion, I'll say that I only
ever manually load libraries using dlopen in main, so this should
simplify things.
Dynamic Initialization: if you're not familiar with the C++ spec's
definition of this, please don't try to answer this question.
Ok, so let's say I have an application (MyAwesomeApp) and it links against a dynamic library (MyLib1), which in turn links against another library (MyLib2). So the dependency tree is:
MyAwesomeApp -> MyLib1 -> MyLib2
For this example, let's say MyLib1 and MyLib2 are both Load-Time Libraries.
What's the initialization order of the above? It is obvious that all static initialization, including the linking of exported/imported functions (Windows-only), will occur first... but what happens with Dynamic Initialization? I'd expect the overall ordering:
ALL import/export symbol linking
ALL Static Initialization
ALL of MyLib2's Dynamic Initialization
ALL of MyLib1's Dynamic Initialization
ALL of MyAwesomeApp's Dynamic Initialization
MyAwesomeApp's main() function
But I can't find anything in the specs that mandates this. I DID see something in the ELF spec that hinted at it, but I need guarantees from the specs before I can do what I'm trying to do.
Just to make sure my thinking is clear: I'd expect library loading to work much like import in Python, in that if a library hasn't been loaded yet, it is loaded fully (including any initialization) before I do anything with it... and if it has already been loaded, I just link to it.
To give a more complex example, to make sure there isn't another reading of my first example that yields a different response:
MyAwesomeApp depends on MyLib1 & MyLib2
MyLib1 depends on MyLib2
I'd expect the following initialization:
ALL import/export symbol linking
ALL Static Initialization
ALL of MyLib2's Dynamic Initialization
ALL of MyLib1's Dynamic Initialization
ALL of MyAwesomeApp's Dynamic Initialization
MyAwesomeApp's main() function
I'd love any help pointing out specs that say this is how it is. Or, if this is wrong, any spec saying what REALLY happens!
Thanks in advance!
-Christopher
Nothing in the C++ standard mandates how dynamic linking works.
Having said that, Visual Studio ships with the C Runtime (aka CRT) source, and you can see where static initializers get run in dllcrt0.c.
You can also derive the relative ordering of operations if you think about what constraints need to be satisfied to run each stage:
Import/export resolution just needs .dlls.
Static initialization just needs .dlls.
Dynamic initialization requires all imports to be resolved for the .dll.
Steps 1 & 2 do not depend on each other, so they can happen independently. Step 3 requires steps 1 & 2 to be done for each .dll, so it has to happen after both.
So any specific loading order that satisfies the constraints above will be a valid loading order.
In other words, if you need to care about the specific ordering of specific steps, you probably are doing something dangerous that relies on implementation specific details that will not be preserved across major or minor revisions of the OS. For example, the way the loader lock works for .dlls has changed significantly over the various releases of Windows.
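If you only want to observe the order empirically, a minimal sketch (names hypothetical; this demonstrates typical loader behaviour, not a standardized guarantee) is a logging object with a dynamic initializer compiled into each module:

// announce.h (hypothetical)
#include <cstdio>
struct Announce {
    explicit Announce(const char* who) { std::printf("%s dynamic init\n", who); }
};

// in MyLib2's sources:       static Announce announce{"MyLib2"};
// in MyLib1's sources:       static Announce announce{"MyLib1"};
// in MyAwesomeApp's sources: static Announce announce{"MyAwesomeApp"};

With the dependency chain MyAwesomeApp -> MyLib1 -> MyLib2, you would typically see MyLib2, then MyLib1, then MyAwesomeApp printed before main() runs, but, as stated above, no spec promises this ordering.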

Is there a way to perform compile time type-check in Ruby?

I know Ruby is dynamically and strongly typed, but AFAIK, current syntax doesn't allow checking the type of arguments at compile time due to lack of explicit type notation (or contract) for each argument.
If I want to perform compile-time type check, what (practically matured) options do I have?
Update
By type-check I mean something like what a typical statically typed language, such as C, does.
For example, a C function declares the type of each argument, and the compiler checks whether each passed-in argument has the correct type.
void func1(struct AAA aaa)
{
    struct BBB bbb;
    func1(bbb); // Wrong type. Compile-time error.
}
As another example, Objective-C does this by means of explicit type information.
- (id)method1:(AAA*)aaa
{
    BBB* bbb = [[AAA alloc] init]; // Though we actually use a correctly typed object...
    [self method1:bbb]; // Compile-time warning or error due to the type contract mismatch.
}
I want something like that.
Update 2
Also, by compile-time I mean: before running the script. I don't have a better word to describe it…
There was a project to develop a type system, a type inferencer, a type checker and a syntax for type annotations for (a subset of) Ruby, called Diamondback Ruby. It was abandoned four years ago; you can find its source on GitHub.
But, basically, that language would no longer be Ruby. If static types are so important to you, you should probably just use a statically typed language such as Haskell, Scala, ML, Agda, Coq, ATS etc. That's what they're here for, after all.
RDL is a library for static type checking of Ruby/Rails programs. It has type annotations included for the standard library and (I think) for Rails. It lets you add types to methods/variables/etc. like so:
file.rb:
require 'rdl'
type '(Fixnum) -> Fixnum', typecheck: :now
def id(x)
  "forty-two"
end
And then running file.rb will perform static type checking:
$ ruby file.rb
.../lib/rdl/typecheck.rb:32:in `error': (RDL::Typecheck::StaticTypeError)
.../file.rb:5:5: error: got type `String' where return type `Fixnum' expected
.../file.rb:5: "forty-two"
.../file.rb:5: ^~~~~~~~~~~
It seems to be pretty well documented!
While you can't check this statically, you can use conditionals in your methods so code runs only after checking the object.
Here the #is_a? and #kind_of? methods come in handy...
def method(variable)
  if variable.is_a? String
    ...
  else
    ...
  end
end
You would then have the choice of returning specified error values or raising an exception. Hopefully this is close to what you are looking for.
You are asking for a "compile-time" type check, but in Ruby, there is no "compile" phase. Static analysis of Ruby code is almost impossible, since any method, even from the built-in classes, can be redefined at runtime. Classes can also be dynamically created and instantiated at runtime. How would you do type-checking for a class which doesn't even exist when the program starts?
Surely, your real goal is not just to "type-check your code". Your goal is to "write code that works", right? Type-checking is just a tool which can help you "write code that works". However, while type-checking is helpful, it has its limits. It can catch some simple bugs, but not most bugs, and not the most difficult bugs.
When you choose to use Ruby, you are giving up the benefits of type-checking. However, Ruby may allow you to get things done with much less code than other languages you are used to. Writing programs using less code, means that generally there are less bugs for you to fix. If you use Ruby skillfully, I believe the tradeoff is worth it.
Although you can't type-check your code in Ruby, there is great value in using assertions which check method arguments. In some cases, those assertions might check the type of an argument. More frequently, they will check other properties of the arguments. Then you need some tests which exercise the code. You will find that with a relatively small number of tests, you will catch more bugs than your C/C++ compiler can do.
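For instance, a hypothetical assertion-style check might look like:

def transfer(amount)
  # Fail fast at the call boundary; note this checks more than the type alone.
  raise TypeError, "amount must be Numeric" unless amount.is_a?(Numeric)
  raise ArgumentError, "amount must be positive" unless amount > 0
  # ...
end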
It seems you want static types. There is not an effective way to do this in Ruby due to the language's dynamic nature.
A naive approach I can think of is to make a "contract" like this:
def up(name)
  # name(string)
  name.upcase
end
So the first line of each method will be a comment declaring what type each argument must have.
Then implement a tool that statically scans and analyzes the source, catching such errors by scanning the call sites of the above method and checking the type of the passed argument wherever possible.
For example this would be easy to check:
x = "George"
up(x)
but how would you check this one:
x = rand(2).zero? ? "George" : 5
up(x)
In other words, most of the time the types are impossible to deduce before runtime.
However, if you do not care about the "type checking" happening statically, you could also do:
def up(name)
  raise "TypeError etc." unless name.is_a? String
  # ...
end
In any case, I don't think you will benefit from the above. I would recommend making use of duck typing instead.
You might be interested in the idea of a "pluggable type system": adding a static type system to a dynamic language, where the programmer decides what should be typed and what is left untyped. The type checker stands apart from the core language and is usually implemented as a library. It can either check statically or check types at runtime, in a special "checked" mode that should be used during development and to execute tests.
The type checker for Ruby I found is called Rtc (Ruby Type Checker); see its GitHub repository and the academic paper. The motivation is to make the requirements on the type of a function or method parameter explicit, to move those requirements out of the tests and into type annotations, and to turn the type annotations into "executable documentation". Source.

GCC hidden/little-known features

This is my attempt to start a collection of GCC special features which one does not usually encounter. This comes after #jlebedev mentioned the "Effective C++" option for g++ in another question:
-Weffc++
This option warns about C++ code which breaks some of the programming guidelines given in the books "Effective C++" and "More Effective C++" by Scott Meyers. For example, a warning will be given if a class which uses dynamically allocated memory does not define a copy constructor and an assignment operator. Note that the standard library header files do not follow these guidelines, so you may wish to use this option as an occasional test for possible problems in your own code rather than compiling with it all the time.
What other cool features are there?
From time to time I go through the current GCC/G++ command line parameter documentation and update my compiler script to be even more paranoid about any kind of coding error. Here it is if you are interested.
Unfortunately I didn't document them, so I have forgotten most of them, but -pedantic, -Wall, -Wextra, -Weffc++, -Wshadow, -Wnon-virtual-dtor, -Wold-style-cast, -Woverloaded-virtual, and a few others are always useful, warning me of potentially dangerous situations. I like this aspect of customizability: it forces me to write clean, correct code. It has served me well.
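For reference, a representative invocation with those flags (file name hypothetical):

g++ -pedantic -Wall -Wextra -Weffc++ -Wshadow -Wnon-virtual-dtor -Wold-style-cast -Woverloaded-virtual -c myfile.cpp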
However they are not without headaches, especially -Weffc++. Just a few examples:
It requires me to provide a custom copy constructor and assignment operator if there are pointer members in my class, which are useless since I use garbage collection. So I need to declare empty private versions of them.
My NonInstantiable class (which prevents instantiation of any subclass) had to implement a dummy private friend class so G++ didn't whine about "only private constructors and no friends"
My Final<T> class (which prevents subclassing of T if T derived from it virtually) had to wrap T in a private wrapper class to declare it as friend, since the standard flat out forbids befriending a template parameter.
G++ recognizes functions that never return a value but throw an exception instead, and whines about them not being declared with the noreturn attribute. Hiding the throw behind always-true conditions didn't work; G++ was too clever and recognized them. It took me a while to come up with declaring a volatile variable and comparing it against its own value so I could throw that exception unmolested.
Floating point comparison warnings. Oh god. I have to work around them by writing x <= y && x >= y instead of x == y where it is acceptable (see the snippet after this list).
Shadowing virtuals. Okay, this is clearly useful to prevent stupid shadowing/overloading problems in subclasses but still annoying.
No previous declaration for functions. This kinda lost its importance as soon as I started copy-pasting the function declaration right above it.
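To illustrate the floating point item above, a minimal sketch:

// -Wfloat-equal flags the direct comparison...
bool eq_direct(double x, double y) { return x == y; }
// ...while the equivalent pair of inequalities compiles without a warning.
bool eq_bounds(double x, double y) { return x <= y && x >= y; }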
It might sound a bit masochistic, but as a whole, these are very cool features that increased my understanding of C++ and of programming in general.
What other cool features does G++ have? Well, it's free, it's open, it's one of the most widely used and modern compilers, it consistently outperforms its competitors, it can eat almost anything people throw at it, it's available on virtually every platform, customizable to hell, continuously improved, and it has a wide community - what's not to like?
A function that returns a value (for example an int) will return a garbage value if a code path is followed that reaches the end of the function without a return statement. Not paying attention to this can result in exceptions and out-of-range memory writes or reads.
For example, if a function is used to obtain an index into an array and the faulty code path is taken (the one that doesn't end with a return statement), then a garbage value will be returned which might be too big as an index into the array, resulting in all sorts of headaches as you wrongly mess up the stack or heap.
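A minimal example (hypothetical function); GCC diagnoses the bad path when -Wreturn-type, which -Wall enables, is on:

/* gcc -Wall example.c */
int index_for(int flag)
{
    if (flag)
        return 3;
    /* control can fall off the end here without returning a value;
       the caller then reads garbage, e.g. an out-of-range array index */
}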
