LLVM Interoperability (like JVM or .NET) - Is it possible to do?

I recently played around a bit with different LLVM front ends such as Clang (C family), LDC2 (D), Terra, ...
All these languages can be compiled into LLVM IR (somewhat readable) and LLVM bitcode. So at this stage they are all on the same "level", right?
My question is: is there some way to get language interoperability "at the language level", as with the .NET or JVM languages, or is it only possible by editing the IR?
I already searched for this on Google but didn't find what I was looking for.
If it is possible, how can I do it, and does it work with all front ends or only with specific ones?

For language X to be able to call language Y, it must be able to:
Call Y's functions (that is, know Y's calling conventions)
Convert the data passed to Y into the form Y expects (called marshalling)
Most of this has to be done at the front-end level, not in the middle end, which is where LLVM sits. C can serve as common ground for interop: if two languages can both call C and export their own functions to C, they can talk to each other.
Haskell and C++ can serve as an example. C++ can export code as C using an
extern "C" {
}
block, and Haskell can export its functions with the foreign export ccall keywords. Haskell also provides marshalling functions to convert Haskell strings to C strings and back.
As you can see, LLVM plays a minor role here, but you were right to mention that with LLVM you can, in theory, make any languages that compile to LLVM interoperate by manually editing the resulting IR.
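To make the "C as common ground" idea concrete, here is a minimal sketch of the C++ side. The function names (make_greeting, greet_c) and the buffer-based protocol are made up for illustration; the point is only that the exported symbol has C linkage and that strings are marshalled to and from plain C types at the boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Ordinary C++ implementation. Its symbol name is mangled by the compiler,
// so other languages cannot easily refer to it.
static std::string make_greeting(const std::string& name) {
    return "Hello, " + name + "!";
}

// C-linkage wrapper (hypothetical name): unmangled symbol, C-compatible
// types only. Returns the length written, or -1 if the buffer is too small.
extern "C" int greet_c(const char* name, char* out, std::size_t out_size) {
    assert(name != nullptr && out != nullptr);
    // Marshal in: C string -> std::string.
    const std::string result = make_greeting(name);
    // Marshal out: std::string -> caller-owned, NUL-terminated buffer.
    if (result.size() + 1 > out_size) {
        return -1;
    }
    std::memcpy(out, result.c_str(), result.size() + 1);
    return static_cast<int>(result.size());
}
```

Any language with a C foreign function interface could then declare and call greet_c, passing a C string and a buffer it owns.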

Related

Where is __builtin_va_start defined?

I'm trying to locate where __builtin_va_start is defined in GCC's source code, and see how it is implemented. (I was looking for where va_start is defined and then found that this macro is defined as __builtin_va_start.) I used cscope -r in GCC 9.1's source code directory to search for the definition but haven't found it. Can anyone point out where this function is defined?
That __builtin_va_start is not defined anywhere. It is a GCC compiler builtin (a bit like sizeof is a compile-time operator). It is an implementation detail related to the <stdarg.h> standard header (provided by the compiler, not the C standard library implementation libc). What really matters are the calling conventions and ABI followed by the generated assembler.
GCC has special code to deal with compiler builtins. That code does not define the builtin; it implements its ad hoc behavior inside the compiler. __builtin_va_start is expanded into some compiler-specific internal representation of your compiled C/C++ code, specific to GCC (some GIMPLE, perhaps).
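For reference, this is what the user-facing side looks like: a small variadic function that uses the standard macros. The function name sum_ints is just an illustration. With GCC's own stdarg.h, the va_start call below is the macro that expands to __builtin_va_start, which the compiler then handles internally.

```cpp
#include <cassert>
#include <cstdarg>

// Sums `count` int arguments passed after `count`.
int sum_ints(int count, ...) {
    assert(count >= 0);
    va_list args;
    va_start(args, count);  // with GCC, a macro for __builtin_va_start
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += va_arg(args, int);  // fetch the next argument as an int
    }
    va_end(args);
    return total;
}
```

Running the preprocessor alone (for example with g++ -E) on such a file is one way to see the macro expansion without digging through the GCC sources.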
From one of your comments, I infer that you are interested in implementation details. That should be stated in your question.
If you study the GCC 9.1 source code, look inside gcc-9.1.0/gcc/builtins.c (the expand_builtin_va_start function there), and for other builtins inside gcc-9.1.0/gcc/c-family/c-cppbuiltin.c, gcc-9.1.0/gcc/cppbuiltin.c, and gcc-9.1.0/gcc/jit/jit-builtins.c.
You could write your own GCC plugin (as of 2Q2019, for GCC 9; the C++ code of your plugin might have to change for the future GCC 10) to add your own GCC builtins. BTW, you might even override the behavior of the existing __builtin_va_start with your own code, and/or you might have, at least for research purposes, your own stdarg.h header with #define va_start(v,l) __my_builtin_va_start(v,l) and have your GCC plugin understand your plugin-specific __my_builtin_va_start builtin. However, be aware of the GCC runtime library exception and read its rationale: I am not a lawyer, but I tend to believe that you should (and that this legal document requires you to) publish your GCC plugin under some open source license.
You first need to read a textbook on compilers, such as the Dragon book, to understand that an optimizing compiler is mostly transforming internal representations of your compiled code.
You further need to spend months studying the many internal representations of GCC. Remember, GCC is a very complex program (about ten million lines of code). Don't expect to understand it after only a few days of work. Look at the GCC resource center website.
My dead GCC MELT project had references and slides explaining more of GCC (the design philosophy and architecture of GCC changes slowly; so the concepts are still relevant, even if individual details changed). It took me almost ten years full time to partly understand some of the middle-end layers of GCC. I cannot transmit that knowledge in a StackOverflow answer.
My draft Bismon report (work in progress, funded by H2020, so a lot of bureaucracy) has a dozen pages (in its sections §1.3 and §1.4) introducing the internal representations of GCC.

I'm using Antlr4 to create a language which I then want to generate LLVM IR with. Do I need to hand-write LLVM IR in response to my visitor events?

While learning Antlr4, I used Golang as a target language, so a statement in my toy language like:
$myVar = 10
$myVar + 5
Would translate to some Golang code that generates "15" for the result
However, as far as I can see, there isn't an LLVM IR target for ANTLR, so the question is: what are my options?
1) Generate C/C++ and then use it to emit LLVM IR?
2) Try to find a Golang LLVM IR emitter?
3) Keep using the generated Go lexer/parser but hand-write LLVM IR?
I tried to go through the LLVM documentation and watched a few videos on LLVM, but they all seem to generate C/C++ and then communicate with the API that way. I'm not sure whether they do that because that's what they know or because it's the only way.
Thanks in advance for any insights!
> While learning Antlr4, I used Golang as a target language, so a statement in my toy language like:
> $myVar = 10
> $myVar + 5
> Would translate to some Golang code that generates "15" for the result
That's not accurate. Your grammar is translated into Go code that parses your language. Your own code can then use that generated parser to translate the above into whatever you want.
> there isn't an LLVM IR target for ANTLR
Nor would it help you if there were one. All that would do is create a parser written in LLVM IR instead of Go. You'd still have to write the code that translates your language into LLVM IR yourself (just as you'd have to write your own code to translate your language to Go).
As to whether to use the LLVM API to generate LLVM IR or to generate it as strings, either option would work. There are Go bindings for LLVM, but it's also perfectly possible to just write LLVM assembly into an .ll file and then run that through llc.
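As a rough illustration of the "generate it as strings" option, here is a sketch that emits textual IR for the two-line toy program from the question. It is written in C++ only for the sake of the example; the same string building works from Go visitor code. The function name emit_toy_program is hypothetical, and the IR uses the opaque ptr type, which assumes a reasonably recent LLVM (15 or later); older versions would need i32* instead.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Emits textual LLVM IR for the toy program
//   $<var> = <initial>
//   $<var> + <addend>
// as a function @main that returns the sum.
// Assumes `var` is a valid IR identifier other than "loaded" or "sum".
std::string emit_toy_program(const std::string& var, int initial, int addend) {
    assert(!var.empty());
    std::ostringstream ir;
    ir << "define i32 @main() {\n"
       << "entry:\n"
       << "  %" << var << " = alloca i32\n"                    // stack slot
       << "  store i32 " << initial << ", ptr %" << var << "\n"  // assignment
       << "  %loaded = load i32, ptr %" << var << "\n"         // read variable
       << "  %sum = add i32 %loaded, " << addend << "\n"       // the addition
       << "  ret i32 %sum\n"
       << "}\n";
    return ir.str();
}
```

Writing the returned string to a file such as toy.ll gives something that can be handed to llc as described above. In a real compiler the visitor would append one or more such lines per AST node rather than emit a fixed template.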

How to bolt on ANTLR 4 front to GCC Generic/GIMBLE?

I'm writing a DSL front end using ANTLR v4 that I'd like to bolt onto the GCC framework. The goal is to have a C language AST to leverage the rest of the GCC framework.
I haven't found any info or preexisting work to use as an example of how to proceed. What I'm looking for is how to move the ANTLR 4 AST to GCC Generic/GIMBLE.
ANTLR 4 does not support a C language target, so I'll have to kludge the C++ target onto the GCC C language framework.
Help is appreciated.
Gluing a C++ implementation of ANTLR into GCC so that GCC will call it is likely to be the easy step.
[Don't expect it to be easy; GCC wants to be GCC, not your pet. You might get some help from GCC MELT, a package for interfacing to GCC machinery.]
The AST produced for an arbitrary (e.g., your custom DSL) language doesn't "just move (easily)" to a C AST or to the GCC Gimple (not GIMBLE) framework.
You will have to build, in essence, your DSL-AST to C-AST translator, or your DSL-AST to Gimple translator. There is no a priori reason to believe that building such a translator is easy; for example, you didn't tell us your DSL was "just like C except ...". In the absence of evidence that this is easy, you'll have to translate your DSL concepts to C concepts. The better (more "non C-like") your DSL is, the harder this is going to be.
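To give a feel for what such a translator involves at the smallest possible scale, here is a sketch that lowers a tiny, hypothetical expression AST to C source text. The Expr type and the helper names are invented for this example; they are not ANTLR or GCC types, and a real DSL would need statements, types, scoping and so on.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical DSL expression node: a number, a variable, or a binary op.
struct Expr {
    enum class Kind { Number, Variable, Binary };
    Kind kind = Kind::Number;
    int value = 0;              // used by Number
    std::string name;           // used by Variable
    char op = '+';              // used by Binary
    std::unique_ptr<Expr> lhs;  // used by Binary
    std::unique_ptr<Expr> rhs;  // used by Binary
};

std::unique_ptr<Expr> number(int value) {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Kind::Number;
    e->value = value;
    return e;
}

std::unique_ptr<Expr> variable(std::string name) {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Kind::Variable;
    e->name = std::move(name);
    return e;
}

std::unique_ptr<Expr> binary(char op, std::unique_ptr<Expr> lhs,
                             std::unique_ptr<Expr> rhs) {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Kind::Binary;
    e->op = op;
    e->lhs = std::move(lhs);
    e->rhs = std::move(rhs);
    return e;
}

// Translates a DSL expression into a fully parenthesized C expression.
// Assumes variable names are already valid C identifiers.
std::string to_c(const Expr& e) {
    switch (e.kind) {
        case Expr::Kind::Number:
            return std::to_string(e.value);
        case Expr::Kind::Variable:
            return e.name;
        case Expr::Kind::Binary:
            assert(e.lhs && e.rhs);
            return "(" + to_c(*e.lhs) + " " + e.op + " " + to_c(*e.rhs) + ")";
    }
    return "";  // not reached for valid nodes
}
```

Expressions map almost one to one here because the toy DSL is already C-like. The hard part the answer describes begins where a DSL concept has no direct C counterpart.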
This SO link discusses the issues behind translation in more detail: What kinds of patterns could I enforce on the code to make it easier to translate to another programming language?

Compiling Pharo to C?

It is said that Pharo's VM (CogVM) is developed, tested, profiled, etc. in Smalltalk, but then the Smalltalk code is transcompiled to C, which is then compiled alongside some OS abstraction C code using the default system C compiler.
Well, I'd like to do a similar thing: I want to develop, test and profile code using Pharo, but then compile it to C. How can I do it? How does the compilation to C work? Does Pharo come with a Smalltalk-to-C transcompiler? How can I invoke it? Does it compile full Smalltalk, or do I have to use some kind of Smalltalk subset? Is there any good documentation about it?
The Pharo VM is hosted on GitHub.
Follow the steps to build it and you'll get a Smalltalk image called "generator.image", which you can run (it's a regular image). Inside that image you'll find the VMMaker package, which is responsible for generating the C code from the special Smalltalk dialect used for this (it is called Slang and is a subset of Smalltalk). Look at the code in the generator image to get a feel for what it does. There's also some information in the workspaces that are open when you first open the image.
As soon as you have the C sources, it's basically straightforward C compilation (which we do with CMake + gcc / clang).
As for documentation: you should probably read the Blue Book.
Clarification
As @Leandro Caniglia points out in the comments, the purpose of Slang is to generate C source code for the VM. It has been designed to ease translation to C. That does not mean that:
arbitrary Smalltalk code can be translated to C using the generating mechanism
arbitrary Smalltalk code can be rewritten in Slang (at least not "easily")

GCC technical details

I don't know if this is the right place for things like this, but I am curious about a few aspects of the GCC front-end/back-end architecture:
I know I can compile .o files from C code and link them to C++ code, and I think I can do it the other way round, too. Does this work because the two languages are similar, or because the GCC back-end is really language-independent? Would this work with Ada code too? (I don't even know if that makes sense, since I don't know Ada or if it even has "functions", but the question is understood. If it makes no sense, think "Pascal" or even "my own custom language front-end")
Where would garbage collection be implemented? Take a Java front-end, for example. As I understand it, when compiling to a JVM back-end, the "platform" takes care of the GC, so the front-end need not do anything about it. But when compiling to native code, would the front-end send garbage-collecting GENERIC code to the back-end, or does it turn on some flag telling the back-end to produce garbage-collecting code? The first makes more sense to me, but that would mean the front-end produces different output based on the target, which seems to miss the point of GCC's front-end/back-end architecture.
Where would language-specific libraries go? For instance, the standard Java classes or standard C headers. If they are linked in at the end, then could a C program theoretically call functions from the Java library or something like that, since it is just another linked library?
Yes, the backend is at least reasonably language independent. Yes, it works with Ada.
GCJ generates native code which uses a runtime library. The garbage collector is part of the runtime library.
GCJ implements the CNI, which allows you to write code in C++ that can be used as native methods by Java code -- but being able to do this is a consequence of them having designed it in, not just an accidental byproduct of using the same back-end.
Linking C and C++ objects is possible because the calling conventions are compatible, but name mangling differs (C does no mangling). To call a C function from C++, declare it with extern "C". To call a C++ function from C, you have to declare it under its mangled name (and maybe with additional or different argument types). Calling Fortran code is possible in some cases too, but the argument-passing convention is different (Fortran passes by reference).
There were actually converters from C++ to C (cfront) and from Fortran to C (f2c), and some of their solutions are still in use.
Garbage collection is implemented in a run-time library, e.g. Boehm. The back end has to generate objects compatible with the selected GC library.
The compiler driver (g++, gfortran, ...) adds the language-specific libraries to the linking step.
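The pass-by-reference point about Fortran can be sketched from the C++ side as follows. The routine name scale_add_ is hypothetical, and the trailing underscore is an assumption about the symbol naming a Fortran compiler such as gfortran typically uses for external procedures; check your compiler's documentation before relying on it.

```cpp
#include <cassert>

// A routine shaped so that Fortran code could call it roughly as
//   call scale_add(n, a, x, y)
// Every parameter is a pointer because Fortran passes arguments by
// reference. extern "C" keeps the symbol name unmangled.
extern "C" void scale_add_(const int* n, const double* a,
                           const double* x, double* y) {
    assert(n != nullptr && *n >= 0);
    for (int i = 0; i < *n; ++i) {
        y[i] += (*a) * x[i];  // y = y + a * x, element by element
    }
}
```

Even scalars such as n and a arrive as addresses, which is the main difference from the by-value convention a C caller would normally expect.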
