Can I compile code from multiple languages together with LLVM?

Can I write a program in two different languages and compile them together into one LLVM executable?
For example, part of my program is in C++, and part of it is in D.

Not in the general case. Only if the languages are ABI-compatible. This is true for C and C++ to a very limited degree (extern "C" code from the C++ side), and much less so for other languages.
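As a minimal sketch of what that C/C++ ABI compatibility looks like in practice (the function name here is hypothetical; the D side would declare the same symbol with its own extern(C) linkage attribute):

    // add_numbers.cpp -- expose a C-ABI entry point from C++.
    // extern "C" disables C++ name mangling, so the emitted symbol is a
    // plain "add_numbers" that a C (or D) declaration can match at link time.
    extern "C" int add_numbers(int a, int b) {
        return a + b;
    }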

Related

Do all compiled programs have the same speed, no matter what language they were written in?

Suppose I write a program in both Python and C++ and I turn these into executables. Now, will both executables have the same speed or will it vary? (I guess it shouldn't, since both should now be in machine code form.)
Of course usually not (assuming both programs implement the same algorithm). And the runtime speed depends a lot on the compiler itself (e.g. tinycc for C, versus GCC or Clang) and even on its version and compilation flags (e.g. -Os vs -O2 with g++). BTW, Python is compiled to some bytecode, not to machine code.
Of course, some software spends most of its CPU time elsewhere (e.g. in some relational database manager such as PostgreSQL). Rewriting it in C++ instead of Python won't gain a lot of performance then. And some software is mostly IO-bound (e.g. tar(1) used without compression).
Finally, some C++ programs can generate machine code at runtime (e.g. using AsmJit...) through partial evaluation techniques, which may give a huge speedup.
On Linux, you could generate some C or C++ code at runtime, compile it as a temporary plugin, then dlopen(3) that temporary plugin, fetching new function pointers with dlsym(3). Adapt the manydl.c example to your needs.
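Here is a rough sketch of that generate-compile-dlopen cycle, with hypothetical file names and a hypothetical generated function (link with -ldl on older glibc):

    // genplugin.cpp -- emit C code at runtime, build it as a shared
    // object, and load the result as a temporary plugin.
    #include <cstdio>
    #include <cstdlib>
    #include <dlfcn.h>

    int main() {
        // 1. Write some generated C code to a file.
        std::FILE* f = std::fopen("/tmp/gen.c", "w");
        if (!f) return 1;
        std::fprintf(f, "int generated(int x) { return x * 2; }\n");
        std::fclose(f);

        // 2. Compile it into a shared object (the "temporary plugin").
        if (std::system("cc -O2 -fPIC -shared /tmp/gen.c -o /tmp/gen.so") != 0)
            return 1;

        // 3. Load the plugin and fetch the new function pointer.
        void* h = dlopen("/tmp/gen.so", RTLD_NOW);
        if (!h) return 1;
        auto fn = reinterpret_cast<int (*)(int)>(dlsym(h, "generated"));
        if (!fn) return 1;
        std::printf("generated(21) = %d\n", fn(21));
        dlclose(h);
    }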
Also, C++ is a very difficult language to learn; read some good book about it. And of course read the Dragon book, since an entire book is needed to answer your question!

Can a compiler that generates source code from simple text be considered a source-to-source compiler?

I'm writing a compiler to generate JSON code from random simple text, but I don't understand what type this kind of compiler is. Can I consider it a source-to-source compiler?
A source-to-source compiler is a compiler that takes the source code of a program written in one programming language as its input and produces the equivalent source code in another programming language.
(The problem is that the input is plain text, not the source code of a program written in a programming language.)
Or is there another type for this kind of compiler? Thank you.
Generally, a source-to-source compiler is understood to translate between programming languages that operate at approximately the same level of abstraction (wiki).
Thus, I would argue that if by "random simple text" you mean a simple English phrase, you are just writing a regular old compiler.
I.e., I would consider English a "high-level language" and JSON a relatively "low-level language," meaning that you are compiling from a higher level of abstraction to a lower one -- just like a regular compiler.
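To make that concrete, here is a deliberately tiny, hypothetical example of such a compiler: it takes a simple English phrase as its source and emits a JSON array of the words.

    // phrase2json.cpp -- a toy "text -> JSON compiler": the source language
    // is a whitespace-separated English phrase, the target is JSON.
    #include <iostream>
    #include <sstream>
    #include <string>

    int main() {
        std::string line, word;
        std::getline(std::cin, line);        // e.g. "turn the light on"
        std::istringstream in(line);
        std::cout << "[";
        bool first = true;
        while (in >> word) {
            if (!first) std::cout << ", ";
            std::cout << '"' << word << '"'; // naive: no escaping of quotes
            first = false;
        }
        std::cout << "]\n";                  // -> ["turn", "the", "light", "on"]
    }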

How can I detect the architecture with Fortran while compiling?

I am working on a scientific project in Fortran, and the set of employed functions is divided into 64-bit and 32-bit versions. In addition, some variables are defined with different properties for the same function on the two architectures. For example, in 32-bit a variable is INTEGER*4 while in 64-bit it is INTEGER*8.
Now, I saw that in C++ it is possible to check this using #ifndef at the beginning of the file, as explained in this post. Is there something available in Fortran? Which possible solutions would you suggest?
Keep in mind that the project should run on Windows and Linux, on a large variety of architectures. But still, any suggestion would be appreciated!
Edit: to reply to some comments, imagine you want to employ the PARDISO solver, part of the MKL libraries. There are two subroutines we can call: pardiso and pardiso_64. Pardiso requires a variable, called PT in the manual (page 6, here), that allows pardiso to work with data. In the 32-bit version it is an INTEGER*4, while in the 64-bit one it is an INTEGER*8. Basically, I do not want to allocate memory for both and then select the right variable with an IF statement.
I imagine now that preprocessing would do the job, but does it have to be a C preprocessor even if I am working in Fortran? For example, would Intel Fortran call the C preprocessor as gcc/gfortran does?
You can test the properties of variables with Fortran intrinsic functions, such as range. There is no need to use preprocessor directives for this. The intrinsics, as part of the language, would be standard and portable.
As already answered, most Fortran compilers do support preprocessor directives.
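The preprocessor route looks the same whether the source is C++ or Fortran fed through a preprocessor (e.g. .F90 files, or gfortran -cpp / ifort -fpp). A minimal sketch in C++, where the pardiso_handle alias and the array size are purely illustrative stand-ins for the PT handle:

    // arch_check.cpp -- pick a pointer-sized integer type at compile time.
    // The same #if/#else/#endif structure works in preprocessed Fortran.
    #include <cstdint>

    #if INTPTR_MAX == INT64_MAX
    // 64-bit target: the handle would be INTEGER*8 on the Fortran side.
    using pardiso_handle = std::int64_t;
    #else
    // 32-bit target: INTEGER*4 on the Fortran side.
    using pardiso_handle = std::int32_t;
    #endif

    int main() {
        pardiso_handle pt[64] = {};  // PT-style handle array; size illustrative
        (void)pt;
    }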

Does a compiler always produce assembly code?

From Thinking in C++ - Vol 1:
In the second pass, the code generator walks through the parse tree
and generates either assembly language code or machine code for the
nodes of the tree.
Well, at least in GCC, if we pass the option to generate assembly code, the compiler obeys by creating a file containing assembly code. But when we simply run the command gcc without any options, does it not produce the assembly code internally?
If yes, then why does it need to first produce assembly code and then translate it to machine language?
TL:DR different object file formats / easier portability to new Unix platforms (historically) is one of the main reasons for gcc keeping the assembler separate from the compiler, I think. Outside of gcc, the mainstream x86 C and C++ compilers (clang/LLVM, MSVC, ICC) go straight to machine code, with the option of printing asm text if you ask them to.
LLVM and MSVC are / come with complete toolchains, not just compilers (they also come with an assembler and linker). LLVM already has object-file handling as a library function, so it can use that instead of writing out asm text to feed to a separate program.
Smaller projects often choose to leave object-file format details to the assembler. e.g. FreePascal can go straight to an object file on a few of its target platforms, but otherwise only to asm. There are many claims (1, 2, 3, 4) that almost all compilers go through asm text, but that's not true for many of the biggest most-widely-used compilers (except GCC) that have lots of developers working on them.
C compilers tend to either target a single platform only (like a vendor's compiler for a microcontroller) and were written as "the/a C implementation for this platform", or be very large projects like LLVM where including machine code generation isn't a big fraction of the compiler's own code size. Compilers for less widely used languages are more usually portable, but without wanting to write their own machine-code / object-file handling. (Many compilers these days are front-ends for LLVM, so get .o output for free, like rustc, but older compilers didn't have that option.)
Out of all compilers ever written, most do go to asm. But if you weight by how often each one is used every day, going straight to a relocatable object file (.o / .obj) accounts for a significant fraction of the total builds done on any given day worldwide. I.e., the compiler you care about, if you're reading this, might well work this way.
Also, compilers like javac that target a portable bytecode format have less reason to use asm; the same output file and bytecode format work across every platform they have to run on.
Related:
https://retrocomputing.stackexchange.com/questions/14927/when-and-why-did-high-level-language-compilers-start-targeting-assembly-language on retrocomputing has some other answers about advantages of keeping as separate.
What is the need to generate ASM code in gcc, g++
What do C and Assembler actually compile to? - even compilers that go straight to machine code don't produce linked executables directly, they produce relocatable object files (.o or .obj). Except for tcc, the Tiny C Compiler, intended for use on the fly for one-file C programs.
Semi-related: Why do we even need assembler when we have compiler? asm is useful for humans to look at machine code, not as a necessary part of C -> machine code.
Why GCC does what it does
Yes, as is a separate program that the gcc front-end actually runs separately from cc1 (the C preprocessor+compiler that produces text asm).
This makes gcc slightly more modular, making the compiler itself a text -> text program.
GCC internally uses some binary data structures for GIMPLE and RTL internal representations, but it doesn't write (text representations of) those IR formats to files unless you use a special option for debugging.
So why stop at assembly? This means GCC doesn't need to know about different object file formats for the same target. For example, different x86-64 OSes use ELF, PE/COFF, MachO64 object files, and historically a.out. as assembles the same text asm into the same machine code surrounded by different object file metadata on different targets. (There are minor differences gcc has to know about, like whether to prepend an _ to symbol names or not, and whether 32-bit absolute addresses can be used, and whether code has to be PIC.)
Any platform-specific quirks can be left to GNU binutils as (aka GAS), or gcc can use the vendor-supplied assembler that comes with a system.
Historically, there were many different Unix systems with different CPUs, or especially the same CPU but different quirks in their object file formats. More importantly, they shared a fairly compatible set of assembler directives, like .globl main, .asciiz "Hello World!\n", and similar. GAS syntax comes from Unix assemblers.
It really was possible in the past to port GCC to a new Unix platform without porting as, just using the assembler that comes with the OS.
Nobody has ever gotten around to integrating an assembler as a library into GCC's cc1 compiler. That's been done for the C preprocessor (which historically was also done in a separate process), but not the assembler.
Most other compilers do produce object files directly from the compiler, without a text asm temporary file / pipe. Often because the compiler was only designed for one or a couple targets, like MSVC or ICC or various compilers that started out as x86-only, or many vendor-supplied compilers for embedded chips.
clang/LLVM was designed much more recently than GCC. It was designed to work as an optimizing JIT back-end, so it needed a built-in assembler to make it fast to generate machine code. To work as an ahead-of-time compiler, adding support for different object-file formats was presumably a minor thing since the internal software architecture was there to go straight to binary machine code.
LLVM of course uses LLVM-IR internally for target-independent optimizations before looking for back-end-specific optimizations, but again it only writes out this format as text if you ask it to.
The assembler stage can be justified by two reasons:
it allows C/C++ code to be translated to a machine-independent abstract assembler, from which there exist easy conversions to a multitude of different instruction set architectures
it takes the burden of validating correct opcode, prefix, r/m, etc. instruction encodings on CISC architectures off the compiler, since one can utilize an existing software component for that.
The 1st edition of that book is from 2000, but it may as well be talking about the early 90's, when C++ itself was translated to C and when the GNU/free-software idea (including source code for compilers) was not really well known.
EDIT: One of several abstract machine-independent languages used by GCC is RTL -- Register Transfer Language.
It's a matter of compiler implementation. Assembly code is an intermediate step between the higher-level language (the one being compiled) and the resulting binary output. In general it's easier to convert to assembly first and then translate that to binary code than to create the binary code directly.
Gcc does create the assembly code as a temporary file, calls the assembler, and maybe the linker, depending on what you do or don't add on the command line. That makes an object file and then, if enabled, the binary; afterwards all the temporary files are cleaned up. Use -save-temps to see what is really going on (there are a number of temporary files).
Running gcc without any options absolutely creates an asm file.
There is no "need" for this; it is simply how they happened to design it. I assume for multiple reasons: you will already want/need an assembler and linker before you start on a compiler (cart before the horse -- asm on a processor comes before some other language). "The Unix way" is to not re-invent tools or libraries, but just add a little on top, and that implies going to asm and then letting the assembler and linker do the rest. You don't have to re-invent so much of the assembler's job that way (multiple passes, resolving labels, etc.), and it is easier for a developer to debug ASCII asm than bits.
Folks have been doing it this way for generations of compilers. Just-in-time compilers are the primary exception to this habit: by definition they have to be able to go to machine code, so they do or can. Only recently did llvm provide a way for the command-line tools (llc) to go straight to object without stopping at asm (or at least it appears that way to the user).
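If you want to watch GCC's separate stages yourself, a minimal program plus the -save-temps option mentioned above makes each intermediate file visible (the file names below are g++'s defaults, stated here as an assumption):

    // hello.cpp -- a minimal program for inspecting GCC's pipeline.
    // Hypothetical commands (run in a shell, not part of the source):
    //   g++ -save-temps hello.cpp  # keeps hello.ii (preprocessed),
    //                              # hello.s (asm), hello.o (object)
    //   g++ -S hello.cpp           # stop after compilation proper,
    //                              # emitting only hello.s
    #include <cstdio>

    int main() {
        std::puts("Hello, World!");
    }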

Can you link D object files with C object files?

Let's say I have two source files, one written in the D programming language and the other written in the C programming language. I just compile them both, the D source with DMD (the Digital Mars D compiler) and the C source with GCC.
The result will be two .o (object) files which originated from different sources. Is it possible to link these two files into one executable?
That depends on lots of things. There are different ways of handling arguments: the caller sets them up and the callee cleans up (Pascal-style, used in Windows; more compact), or the caller sets up and cleans up (C style; uses more space, as the cleanup is repeated for each call site). Arguments can be passed by value or by reference. Data (particularly arrays and structures) can be laid out differently in memory.
From a rapid look at D's homepage, it has features like immutable data and native associative arrays that would have to be matched in C (and that probably requires linking in D's runtime; unless that one builds on your system's C library, you'll be in a lot of pain). And so on.
If you know the details of how things are done, you can certainly provide the necessary glue and missing compiler support functions, but easy it won't be. In the case of GCC compilers there are guarantees and commonalities that help; for unrelated compilers it is probably more of a gamble. There is an LLVM-based D compiler, which I'd guess has a better chance of working with gcc, as one of clang's objectives is GCC compatibility.
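For the straightforward case the glue is small, since D can speak the C ABI directly through extern(C). A hedged sketch of the C half (names and build commands are illustrative, not a verified recipe):

    /* c_side.c -- the C half of a mixed C/D link.
       Build sketch:  gcc -c c_side.c       -> c_side.o
                      dmd  app.d c_side.o   -> links both, pulling in D's runtime
       The matching D declaration would be: extern(C) int c_add(int, int); */
    int c_add(int a, int b) {
        return a + b;
    }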
