Compiling Pharo to C? - compilation

It is said that Pharo's VM (CogVM) is developed, tested, profiled and etc in Smalltalk, but then the Smalltalk code is transcompiled to C, which is then compiled along side with some OS abstraction C code using the default system C compiler.
Well, I'd like to do a similar thing, I wan't to develop, test and profile code using Pharo, but then compile it to C. How can I do it? How the compilation to C works? Does Pharo comes with a Smalltalk to C transcompiler? How can I invoke it? Does it compile the full Smalltalk, or I have to use some kind of a Smalltalk subset? Is there any good documentation about it?

The Pharo VM is hosted on github.
Follow the steps to build it and you'll get a Smalltalk image called "generator.image" which you can run (it's a regular image). Inside of that image you'll find the VMMaker package which is responsible for generating the C code from the special Smalltalk dialect used for this (which is called Slang; it's a subset of Smalltalk). Look at the code in the generator image to get a feel for what it does. There's also some information contained in the workspaces that are open when you first open the image.
As soon as you have the C sources it's basically straight forward C compilation (which we do with Cmake + gcc / clang).
As for documentation: you should probably read the Blue Book.
clarifiation
As #Leandro Caniglia points out in the comment, the purpose of Slang is to generate C source code for the VM. It has been designed to ease translation to C. That does not mean that:
arbitrary Smalltalk code can be translated to C using the generating mechanism
arbitrary Smalltalk code can be rewritten in Slang (at least not "easily")

Related

Are there any good extensible language cross compilers?

I am working on a project right now, and I would greatly enjoy being able to extend a cross compiler to convert some code into other languages. For example, I might have an AST of some code, and I would like to pass that off to a cross compiler with the intended language and receive some code in the language specified in return.
So to sum it up: is there any extensible cross compiler that I can just give an AST or equivalent and receive code in return?
(I know about Haxe, but the compiler is not very extensible and I would prefer to not transpile)
I have made the decision to use LLVM as the native compiler, and will write my own custom transpilers to other languages, as I could find no other decent option. If you would like to follow my project, head over to Provalang.

What is a programming language called if it is not interpreted or compiled

If I had a "host" application that was executed at some point and knew the location of some code. What would it be called if it then read that code real-time and then did the proper response such as creating a window in this code:
int main()
{
create magical mystical window()
}
I know that if a language compiled the code directly into binary it would be called a compiled language and that if a language converted the code into another language it would be called a interpreted language.
I know programming languages that read the code and convert it to another language and then compiled it would be called an interpreted language.
No, that's a compiled language with an extra step.
What you are describing is an interpreted language where an interpreter figured out what each line of code means while it is running.
Actually, you're wrong in what you know.
A piece of software that takes code and converts it to binary that
can run on the machine's OS and/or hardware directly is often called
compiler.
If the compiler produces code for a different platform than the one on which it itself executes then this is sometimes called a cross compiler.
A piece of software that takes code and executes it
directly, converting its high-level structures into low level
structures that run on the machine's OS and/or hardware is usually called an
interpreter.
A piece of software that takes code and converts it into
another set of code that can then be compiled or interpreted is
sometimes called a transpiler.
However, things aren't quite as simple as that. For example, java is interpreted but it dynamically compiles some of the code it runs, but, we still call it an interpreted language. C is called a compiled language, but a lot of compilers will turn C into assembler, then assemble that into bytecode that the processor will run. So, C is a transpiled language in reality, but we call it a compiled language both by convention, and by the fact that some modern compilers (unfortunately) bypass the assembler step.
So, for a lot of languages, what they are is determined by convention and how it's being used. But, as David Schwarz has just said in his comment to his own answer to this question:
Really, it's not a particularly good idea to describe the execution process as an attribute of the language.

GCC technical details

I don't know if this is the right place for things like this, but I am curious about a few aspects of the GCC front-end/back-end architecture:
I know I can compile .o files from C code and link them to C++ code, and I think I can do it the other way round, too. Does this work because the two languages are similar, or because the GCC back-end is really language-independent? Would this work with ADA code too? (I don't even know if that makes sense, since I don't know ADA or if it even has "functions", but the question is understood. If it makes no sense, think "Pascal" or even "my own custom language front-end")
Where would garbage-collection be implemented? For example, a Java front-end. The way I understand, if compiling to a JVM back-end, the "platform" will take care of the GC, and so the front-end needs not do anything about it, but if compiling to native code, would the front-end send garbage-collecting GENERIC code to the back-end, or does it turn on some flag telling the back-end to produce garbage-collecting code? The first makes more sense to me, but that would mean the front-end produces different output based on the target, which seems to miss the point of the GCC's front-end/back-end architecture.
Where would language-specific libraries go? For instance, the standard Java classes or standard C headers. If they are linked in at the end, then could a C program theoretically call functions from the Java library or something like that, since it is just another linked library?
Yes, the backend is at least reasonably language independent. Yes, it works with Ada.
GCJ generates native code which uses a runtime library. The garbage collector is part of the runtime library.
GCJ implements the CNI, which allows you to write code in C++ that can be used as native methods by Java code -- but being able to do this is a consequence of them having designed it in, not just an accidental byproduct of using the same back-end.
It is possible because calling convention is compatible, but name mangling is different (no mangling in C). To call C function from C++ you should declare it with extern "C". And to call C++ function from C you should declare it with mangled name (and may be with additional or different type args). The calling Fortran code is possible in some cases too, but argument passing convention is different (pass by ref in Fortran).
There were actually a converters from C++ to C (cfront) and from fortran to c (f2c) and some solutions from them are still used.
garbage-collection is implemented in run-time library, e.g. boehm. Backend should generate objects compatible with selected GC library.
Compiler driver (g++, gfortran, ..) will add language-specific libraries to linking step.

Extracting the Control Flow Graph from the gcc output

I am trying to extract the Control Flow Graph from the assembly code that gcc produces. I have manage to dump the CFG of several IRs (rtl phases) into .vcg files using the arguments -fdump-rtl-* and -dv. Is there any way to do the same thing but for the final assembly code? I would like a generic, target-independent and easy to be parsed representation (like vcg representation). My source code is in C language (in case that it plays any important role).
Best regards,
Michalis.
Intel PTU and VTune will do it if you can run the app for profiling... not sure if it can generate the graph without having run the code though. Otherwise you might be looking at something like this: http://compilers.cs.ucla.edu/avrora/cfg.html.

What tools for migrating programs from a platform A to B

As a pet project, I was thinking about writing a program to migrate applications written in a language A into a language B.
A and B would be object-oriented languages. I suppose it is a very hard task : mapping language constructs that are alike is doable, but mapping libraries concepts will be a very long task.
I was wondering what tools to use, I know this has to do with compilation, but I'm a bit afraid to use Lex and Yacc and all that stuff.
I was thinking of maybe using the Eclipse Modeling Framework, which would help me write models (of application code) transformations in a readable form.
But first I would have to write parsers for creating the models (and also create the metamodel from the language grammar).
Are there tools that exist that would make my task easier?
You can use special transformation tools/languages for that TXL or Stratego/XT.
Also you can have a look and easily try Java to Python and Java to Tcl migrating projects made by me with TXL.
You are right about mapping library concepts. It is rather hard and long task. There are two ways here:
Fully migrate the class library from language A to B
Migrate classes/functions from language A to the corresponding concepts in language B
The approach you will choose depends on your goals and time/resources available. Also in many cases you wont be doing a general A->B migration which will cover all possible cases, you will need just to convert some project/library/etc. so you will see in your particular cases what is better to do with classes/libraries.
I think this is almost impossibly hard, especially as a personal project. But if you are going to do it, don't make life even more difficult for yourself by trying to come up with a general solution. Choose two specific real-life programming languages ind investigate the possibities of converting between them. I think you will be shocked by the number of problems and issues this will expose.
There are some tools for direct migration for some combinations of A and B.
There are a variety of reverse engineering and code generation tools for different languages and platforms. It's fairly rare to see reverse engineering tools which capture all the semantics of the source language, and the semantics of UML are not well defined ( since it's designed to map to different implementation languages, it itself doesn't define a complete execution model for its behavioural representations ), so you're unlikely to be able to reverse engineer and generate code between tools. You may find one tool that does full reverse engineering and full code generation for your A and B languages, and so may be able to get somewhere.
In general you don't use the same idioms on different platforms, so you're more likely to get something which emulates A code on B rather than something which corresponds to a native B solution.
If you want to use Java as the source language(that language you try to convert) than you might use Checkstyle AST(its used to write Rules). It gives you tree structure with every operation done in the source code. This will be much more easier than writing your own paser or using regex.
You can run com.puppycrawl.tools.checkstyle.gui.Main from checkstyle-4.4.jar to launch Swing GUI that parse Java Source Code.
Based on your comment
I'm not sure yet, but I think the source language/framework would be Java/Swing and the target some RIA language like Flex or a Javascript/Ajax framework. – Alain Michel 3 hours ago
Google Web Toolkit might be worth a look.
See this answer: What kinds of patterns could I enforce on the code to make it easier to translate to another programming language?

Resources