How do you write a compiler for a language in that language? [duplicate] - ruby

Possible Duplicates:
How can a language's compiler be written in that language?
implementing a compiler in “itself”
I was looking at Rubinius, a Ruby implementation that compiles to bytecode using a compiler written in Ruby. I cannot get my head around this. How do you write a compiler for a language in the language itself? It seems like it would be just text without anything to compile it into an executable that could then compile the future code written in Ruby. I get confused just typing that sentence. Can anyone help explain this?

To simplify: you first write a compiler for the compiler, in a different language. Then, you compile the compiler, and voila!
So, you need some sort of language which already has a compiler - but since there are many such, you can write the Ruby compiler compiler (!) e.g. in C, which will then compile the Ruby compiler, which can then compile Ruby programs, even further versions of itself.
Of course, the original compilers (and assemblers) were written in machine code. Those compiled compilers written in assembly, which in turn compiled compilers for e.g. C or Fortran, which compiled compilers for... pretty much everything. Iterative development in action.
The process is called bootstrapping - possibly named after Baron Munchhausen's story in which he pulled himself out of a swamp by his own bootstraps :)
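The chain described above (machine code, then assembler, then C, then a Ruby compiler written in Ruby) can be modeled as a small reachability problem. This is a minimal Ruby sketch under simplifying assumptions: each compiler is described only by the language it reads, the language it emits, and the language its own source is written in, every compiler emits native "machine" code, and the compiler names are made up for illustration.

```ruby
# Each compiler is described by the language it reads (source), the language it
# emits (target), and the language its own source code is written in.
Compiler = Struct.new(:name, :source, :target, :written_in)

# Returns the names of the compilers we can get running, in the order they
# become available. Only "machine" code runs without any help.
def bootstrap(compilers, native = "machine")
  running = []
  loop do
    ready = compilers.find do |c|
      next false if running.include?(c)
      # Runnable if already native, or if some running compiler turns its
      # implementation language into native code.
      c.written_in == native ||
        running.any? { |r| r.source == c.written_in && r.target == native }
    end
    break unless ready
    running << ready
  end
  running.map(&:name)
end

COMPILERS = [
  Compiler.new("ruby-in-ruby", "ruby", "machine", "ruby"),
  Compiler.new("ruby-in-c", "ruby", "machine", "c"),
  Compiler.new("c-in-asm", "c", "machine", "asm"),
  Compiler.new("asm-in-machine", "asm", "machine", "machine"),
]

p bootstrap(COMPILERS)
# prints ["asm-in-machine", "c-in-asm", "ruby-in-c", "ruby-in-ruby"]
p bootstrap(COMPILERS.reject { |c| c.name == "ruby-in-c" })
# prints ["asm-in-machine", "c-in-asm"]
```

Dropping the compiler written in C shows the Catch-22: the Ruby compiler written in Ruby never becomes runnable unless something else compiles (or interprets) it first.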

Regarding the bootstrapping of a compiler it's worth reading about this devilishly clever hack.
http://catb.org/jargon/html/B/back-door.html

I get confused just reading that sentence.
It may help to think of the compiler as a translator, which is what compilers are often called. Its purpose is to take source code that humans can read and translate it into code that computers can execute. In the case of Rubinius, the code it reads happens to be Ruby code, and the code it converts that into is machine code (more precisely, Rubinius bytecode, which the VM can further compile into native Intel machine code via LLVM, but that's just a background detail). Rubinius itself could have been written in just about any programming language. It just happened to have been written in the same language that it compiles.
Of course, you need something to run Rubinius in the first place, and this is most likely a regular Ruby interpreter. Note, however, that once you are able to run Rubinius on an interpreter, you can pass it its own source code, and it will create and run a compiled version of itself. This is called bootstrapping, from the old phrase "pulling yourself up by your bootstraps".
One final note: Ruby programs can't invoke arbitrary machine code. That part of Rubinius is actually written in C++.
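The "pass it its own source code" step can be tried in miniature. In this sketch the "compiler" is a deliberately trivial source-to-source translator (it only drops blank and comment-only lines), an assumption made to keep it short; the Ruby interpreter running the script plays the role of the regular Ruby interpreter that runs the compiler the first time.

```ruby
# Source code of a toy "compiler" written in Ruby. Its only translation step is
# removing blank lines and comment-only lines from Ruby source.
COMPILER_SOURCE = <<~'RUBY'
  # toy compiler: keep only lines that contain code
  lambda do |src|
    src.lines.reject { |l| l.strip.empty? || l.strip.start_with?("#") }.join
  end
RUBY

# Stage 1: the existing Ruby interpreter runs the compiler from source.
stage1 = eval(COMPILER_SOURCE)

# The compiler compiles its own source code...
compiled = stage1.call(COMPILER_SOURCE)

# ...and the result is itself a working compiler (stage 2).
stage2 = eval(compiled)

# Fixed point: stage 2 produces exactly the same output as stage 1.
puts stage2.call(COMPILER_SOURCE) == compiled # prints true
puts compiled
```

Real self-hosting compilers use the same check: build the compiler with itself a second time and confirm the output no longer changes.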

Well it is possible to do it in the following order:
1. Write a compiler for your Ruby code in any language, say C.
2. Now that you can compile Ruby code, you can write a compiler in Ruby that compiles Ruby code, and compile this compiler with the C compiler you wrote in step 1. Whew, this sentence is strange!
3. From now on you can compile all your Ruby code with the compiler written in step 2. :)
Have fun! :)

A compiler is just something that transforms source code into an executable. So it doesn't matter what it is written in: it can be the same language it is compiling or any other language of sufficient power.
The fun comes when you are writing a compiler for a language, in that same language, for a platform that doesn't yet have a compiler for that language. Your choices then are to compile on another platform for which you do have a compiler (cross-compiling), or to write a compiler in another language and use that to compile the "real" compiler.
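The cross-compiling route can be written down with T-diagrams. This is a hypothetical Ruby model, not any real toolchain (the platform names `x86` and `arm` are placeholders): compiling a compiler's source with another compiler only changes the language the compiler itself is expressed in, so two passes turn a Ruby-to-ARM compiler written in Ruby into one that runs on ARM.

```ruby
# A compiler as a T-diagram: it reads `source`, emits `target`, and is itself
# expressed in `written_in` (a source language, or a platform's machine code).
TDiagram = Struct.new(:source, :target, :written_in)

# Feeding the text of compiler `program` to compiler `tool` re-expresses the
# program in the tool's output language; what it reads and emits is unchanged.
def compile(program, tool)
  unless tool.source == program.written_in
    raise ArgumentError, "a #{tool.source} compiler cannot read #{program.written_in}"
  end
  TDiagram.new(program.source, program.target, tool.target)
end

# Existing tool: a Ruby compiler that runs on x86 and produces x86 code.
host = TDiagram.new("ruby", "x86", "x86")
# New compiler for the ARM platform, written in Ruby.
arm_source = TDiagram.new("ruby", "arm", "ruby")

cross = compile(arm_source, host)   # runs on x86, emits ARM code (a cross compiler)
native = compile(arm_source, cross) # runs on ARM, emits ARM code
p cross.to_a  # prints ["ruby", "arm", "x86"]
p native.to_a # prints ["ruby", "arm", "arm"]
```

The first pass happens on the platform that already has a compiler; only the second pass produces something that runs on the new platform.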

It's a 2 step process:
write a Ruby compiler in some other language like C, assuming a Ruby compiler doesn't exist yet
since you now have a Ruby compiler, you can write a Ruby program that is a (new) Ruby compiler
Since somebody already wrote a Ruby compiler (Matz), you "only" have to do the second part. Easier said than done.

All of the answers so far have explained how to bootstrap the compiler by using a different compiler. However, there is an alternative: compiling the compiler by hand. There's no reason why the compiler has to be executed by a machine, it can just as well be executed by a human.

Related

How to compile P4 from source code?

I am a student in Computer Science, and I am learning programming with Pascal.
I have found an interesting Pascal compiler, P4 (http://homepages.cwi.nl/~steven/pascal/).
To learn more about Pascal, I am trying to compile their source code, but it fails.
In this web page, they said:
Compile pcom.p and pint.p with a Pascal compiler. You obviously have to have a Pascal compiler already. This gives you a Pascal compiler (pcom) that produces P4 code, and an interpreter (pint) that runs P4 code.
To use the compiler, run pcom with the Pascal program as standard input. This produces any diagnostics on standard output, and its code on a Pascal file that is called prr. Check with your Pascal compiler how this gets assigned to a file in the filestore. You may have to change the lines 'rewrite(prr)' in pcom.p and pint.p and 'reset(prd)' in pint.p for your compiler, for instance to "rewrite(prr, 'prr')" etc.
To run the resulting code, run pint with the prr output produced by pcom as input for the file 'prd', and input for the compiled Pascal program on standard input.
I tried compiling it with Free Pascal (on https://ideone.com/), but that failed too:
Free Pascal Compiler version 2.6.4+dfsg-4 [2014/10/14] for i386
Copyright (c) 1993-2014 by Florian Klaempfl and others
Target OS: Linux for i386
Compiling pcom.p
pcom.p(1,3) Warning: Unsupported switch "$L"
pcom.p(88,23) Fatal: Syntax error, ":" expected but ")" found
Fatal: Compilation aborted
Error: /usr/bin/ppc386 returned an error exitcode (normal if you did not specify a source file to be compiled)
I don't know how to compile this source code on a Windows machine, because Pascal is the only language I know.
Can I compile it with Turbo Pascal (with no other requirements) on Windows XP? Or can you remove the parts of the source that need anything beyond a plain Pascal compiler?
Free Pascal's Florian has been working on getting Scott Moore's P5 compiler (which is a P4 compiler accepting a larger subset of Pascal) to work with FPC's ISO mode for old sources. However, it will (mostly) work only in development versions (including the upcoming "stable" 3.0.x branch).
I tried last summer and it compiled and generally worked with FPC 3.x and the -Miso parameter (to select ISO style dialects). IIRC the last thing fixed was ISO style parameter transfer.
I quickly tried the referenced P4 compiler version, and it seems to stumble on a few spots with "comment this" comments related to switching back and forth from ISO mode. If I comment those out, pint compiles (and then you could run the original bytecode if necessary).
pcom then still stumbles on taking ord() of a pointer, which is obviously not very portable either; unfortunately there are 20+ occurrences that have to be replaced with ord(ptrint()).
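Doing those 20+ replacements by hand is tedious, and a small script can do them. This is a hypothetical Ruby sketch, not part of the fixed sources: it assumes you already know which identifiers in pcom.p are pointers (the names `lcp` and `fcp` below are invented for illustration) and only rewrites calls whose whole argument is one of those names.

```ruby
# Rewrite ord(p) to ord(ptrint(p)) for the given pointer variables only,
# leaving ord() of chars, enums and other ordinals untouched.
def wrap_pointer_ords(pascal_source, pointer_names)
  names = pointer_names.map { |n| Regexp.escape(n) }.join("|")
  # \s* allows "ord( p )"; /i because Pascal identifiers are case-insensitive.
  pattern = /\bord\(\s*(#{names})\s*\)/i
  pascal_source.gsub(pattern) { "ord(ptrint(#{$1}))" }
end

line = "k := ord(lcp) + ord(ch); if ord(FCP) > 0 then"
puts wrap_pointer_ords(line, %w[lcp fcp])
# prints: k := ord(ptrint(lcp)) + ord(ch); if ord(ptrint(FCP)) > 0 then
```

A regex cannot tell whether an expression is a pointer, which is why the names have to be supplied; review the resulting diff before compiling.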
pcom still doesn't compile after that: FPC doesn't like passing union (variant record) fields to VAR parameters. After working around that with a temporary variable, the source compiles. 15 minutes total.
The fixed source code with extra mode statements is at http://www.stack.nl/~marcov/files/p4fixed.zip but it requires FPC 3.0 or newer (as yet unreleased).
The resulting EXE binary can compile the original pcom source to bootstrap itself to bytecode.
You want an ISO 7185 compliant compiler to compile that. It is true that Pascal-P4 (the proper name) was written prior to the ISO 7185 standard. However, adapting it to the standard is generally a smaller change set than adapting it to a dialect.
You will find that work already done and documented at:
http://sourceforge.net/projects/pascalp4/
It specifies use of GPC. However, as Marco said, it is possible with more work to adapt to FPC, and I believe the FPC folks are improving the ISO 7185 capability of their compiler.
Having said that, I'm not sure why Pascal-P4 would be an interesting target. Pascal-P4 was a subset compiler, meaning an incomplete implementation of Pascal. You will find a complete implementation as Pascal-P5:
http://sourceforge.net/projects/pascalp5/
And I believe it has fewer portability issues as well.
Good luck.

In what language is C itself coded?

I'd just like to know which languages operating systems are coded in.
As far as I know:
The Windows kernel is written in C.
The Linux kernel is also written in C.
But what are the remaining operating systems written in?
And what is C itself written in?
Yes, the Windows kernel and Linux kernels are written in C. Most operating systems tend to be.
There are operating systems written in other languages though, the Chorus kernel for example is written in C++.
Most C compilers are also written in C. That has the advantage that once you managed to get the compiler running on the machine (generally by compiling it on another machine that already has a working compiler/cross compiler), the machine itself can compile updates to its own compiler without maintaining yet another compiler.
Most parts of C compilers (like GCC) are themselves written in C. Of course, you need something to bootstrap your compiler so that it can compile itself; that would then be a lower-level language like assembler.
The C language is one of many languages that are considered to be self-hosting, that is to say the compiler can compile its own source code, which is written in the same language that the compiler is designed to compile.
You might also want to look into the process of Bootstrapping, which is the process used to get the first compiler for a particular language to run on a given platform - as others have noted, this can be by way of cross-compiling, or by writing the original compiler in a different language, though other techniques are possible.
First off, you might want to improve your question with actual sentences.
Second,
C is not written in a platform, it is written in another programming language.
Early compilers were written in assembler, a somewhat readable version of the actual machine codes sent to the processor.
Many compilers today are written in higher-level languages (often the very language they compile), but eventually everything boils down to machine code, which assembler represents almost one-to-one.

Configuring GCC with FreeRTOS and OpenOCD

I'm pretty sure this is possible but I'm not sure how to go about it. I'm very new to building with GCC in general and I have never used FreeRTOS, but I'd like to try getting the OS up and running on a TI ARM Cortex MCU but with a slight twist: I'd like to get it up and running with Pascal. I'm curious:
Is it even possible to get this working? If not, the next issues are kind of moot points.
From my Delphi days, I vaguely recall the ability to access functions in C libraries. I'm wondering if I would have access to the C routines in FreeRTOS.
If I use the GCC version (preferable) would I be able to debug using OpenOCD on the target? I'm not quite sure how debug symbols work and if it's more or less language agnostic (hopefully, in this case).
As kind of a bonus question a bit outside the scope of the original query, can I simulate FreeRTOS on an x86 processor (e.g. my development PC) for easier debugging during development? (With a Pascal program, of course..)
I haven't found any documentation on achieving this, so hopefully someone here can shed some light! Any resources would be most helpful. Like I said, I'm very new to this kind of development. I'm also open to suggestions if you think there is a better alternative.
FYI, my preferred host configuration would be something similar to:
Linux (Ubuntu/Debian)
Eclipse IDE for development, unit testing, and hopefully simulation / debugging
OpenOCD for target debugging
GNU Pascal + FreeRTOS on target
FreeRTOS is C source code, so like you say you would have to have some mechanism for linking C with your Pascal programs. Also, FreeRTOS relies on certain registers to be used for things like passing a parameter into a task (as a hypothetical example, the task might always expect the parameter to be in register R0) so you would have to ensure the ABI for the C compiler and the Pascal compiler was the same - or have your task entry in C then have it call a Pascal function (very nasty). Then there is the issue of interrupts, calling inline macros, etc. I would say this would be extremely difficult to achieve.
Both GNU Pascal and Free Pascal support linking to C (gcc) and ARM, as well as calling Pascal code from C, etc. Writing a header and declaring the prototypes with cdecl is all there is to it.
Macros are a bigger problem. Usually I just rewrite them as inline functions (what they should have been anyway). Apart from the macro/header issue, the problems are mostly compiler-specific functionality (which you would also have trouble with when porting from one C compiler to the next).
If you prefer TP/Delphi dialect, Free Pascal is the better choice.
I run my old Delphi code fine on my SheevaPlug.
There is already an example for FreeRTOS/GCC/OpenOCD on a TI Cortex-M3 (was Luminary Micro Cortex-M3). Be aware though that this is a really old example and both the Eclipse and OpenOCD versions used are out of date.
Although there is an Eclipse project provided, the project is configured as a standard make (as opposed to a managed make) project, so there is a standard makefile that can be just as easily executed from the command line as from within Eclipse.
http://www.freertos.org/portLM3Sxxxx_Eclipse.html

What is the state of Ruby as a compiled language?

Ruby has been around for a while now, so I was wondering whether any work is being done on a compiler for it. I know that compiler design is hindered by things like eval, so I would not expect implementations to be 100 percent accurate. My own searches have turned up sparse results.
MacRuby offers Ahead-of-Time Compilation as of v0.5. It uses LLVM to compile binaries that will run on the Objective-C runtime.
Rubinius is a JIT compiler for Ruby. A pure compiler will never exist for Ruby because the language is far too dynamic for a static compiler to work. Whatever it did internally would be incredibly ugly and would evolve towards a JIT as they tried to optimize it anyway.
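To see why eval gets in the way, here is a small Ruby sketch in which method bodies are built as strings while the program runs (the `Calculator` class and the operator table are made up for illustration). An ahead-of-time compiler translating this file never sees the code of `add` or `mul`; only something present at run time, such as an interpreter or a JIT, can compile it.

```ruby
class Calculator; end

# Build Ruby source text at run time and compile it into new methods.
def install_operations(ops)
  ops.each do |name, operator|
    Calculator.class_eval("def #{name}(a, b); a #{operator} b; end")
  end
end

# In a real program this table could come from a config file or user input.
install_operations({ "add" => "+", "mul" => "*" })

calc = Calculator.new
puts calc.add(2, 3) # prints 5
puts calc.mul(2, 3) # prints 6
```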
There's Mirah, for compiling Ruby code into Java bytecode:
http://www.mirah.org/
I believe you could obfuscate your code this way.

Is it possible to compile Ruby to byte code as with Python?

In Python, if I want to give out an application without sources I can compile it into bytecode .pyc, is there a way to do something like it in Ruby?
I wrote a much more detailed answer to this question in the question "Can Ruby, PHP, or Perl create a pre-compiled file for the code like Python?"
The answer is: it depends. The Ruby Language has no provisions for compiling to bytecode and/or running bytecode. It also has no specification of a bytecode format. The reason for this is simple: it would be much too restricting for language implementors if they were forced to use a specific bytecode format, or even bytecodes at all. For example, XRuby and JRuby compile to JVM bytecode, Ruby.NET and IronRuby compile to CIL bytecode, Cardinal compiles to PAST, SmallRuby compiles to Smalltalk/X bytecode, MagLev compiles to GemStone/S bytecode. For all of these implementations it would be plain stupid to use any other bytecode format than the one they currently use, since their whole point is interoperating with other language implementations that use the same bytecode format.
Similar for MacRuby: it compiles to native code, not bytecode. Again, using bytecode would be stupid, since one of the goals is to run Ruby on the iPhone, which pretty much requires native code.
And of course there is MRI, which is a pure AST-walking script interpreter and thus doesn't have a bytecode format.
That being said, there are some Ruby Implementations which allow compiling to and loading from bytecode. Rubinius allows that, for example. (Indeed, it has to have that functionality since its Ruby compiler is written in Ruby, and thus the compiler must be compiled to Rubinius bytecode first, in order to solve the Catch-22.)
YARV also can save and load bytecode, although the loading functionality is currently disabled until a bytecode verifier is implemented that prevents users from loading manipulated bytecode that could crash or otherwise subvert the interpreter.
But, of course, both of these have their own bytecode formats and don't understand each other's (nor tinyrb's or RubyGoLightly's or ...) Also, neither of those formats is understood by a JVM or a CLR and vice versa.
However, the whole point is irrelevant because, as Mark points out, you can always reverse engineer the byte code anyway, especially in cases like CPython, PyPy, Rubinius, YARV, tinyrb, RubyGoLightly, where the bytecode format was specifically designed to be very close to the source language.
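Later versions of CRuby (2.3 and up, newer than the Ruby 1.9 discussed here) do let Ruby code save and load YARV bytecode through `RubyVM::InstructionSequence`. A minimal sketch, assuming CRuby specifically (this class is implementation-specific, so JRuby and others won't have it) and that the bytecode is loaded by the same Ruby version that wrote it. The disassembly at the end also illustrates the point about reverse engineering.

```ruby
require "tmpdir"

source = "def add(a, b); a + b; end; add(2, 3)"

# Compile Ruby source to a YARV instruction sequence and serialize it.
iseq = RubyVM::InstructionSequence.compile(source)
bytecode = iseq.to_binary

# Write the bytecode to a file and load it back without the original source.
loaded = Dir.mktmpdir do |dir|
  path = File.join(dir, "add.yarb")
  File.binwrite(path, bytecode)
  RubyVM::InstructionSequence.load_from_binary(File.binread(path))
end

result = loaded.eval
puts result # prints 5

# The bytecode reads back easily, method names and all.
puts loaded.disasm
```

Shipping only the serialized bytecode hides the source text, not the logic: `disasm` recovers a readable listing in one call.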
In general it is simply impossible to protect code that way. The reason is simple: you want the machine to be able to execute the code. (Otherwise what's the point in writing it in the first place?) However, in order to execute the code, the machine must understand the code. Since machines are much dumber than humans, it follows that any code that can be understood by a machine can just as well be understood by a human, no matter whether that code happens to be in source form, bytecode, assembly, native code or a deck of punch cards.
There is only one workable technical solution: if you control the entire execution pipeline, i.e. build your own CPU, your own computer, your own operating system, your own compiler, your own interpreter, and so forth and use strong cryptography to protect all of those, then and only then might you be able to protect your code. However, as e.g. Microsoft found out the hard way with the Xbox 360, even doing all of that and hiring some of the smartest cryptographers and mathematicians on the planet doesn't guarantee success.
The only real solution is not a technical but a social one: as soon as you have written your code, it is automatically fully protected by copyright law, without you having to do one single thing. That's it. Your code is protected.
The short answer is "YES",
check rubini.us
It will solve your problem.
Here is how to compile ruby code:
http://rubini.us/2011/03/17/running-ruby-with-no-ruby/
Although Ruby 1.9's YARV VM is a bytecode compiler, I don't believe it can dump the bytecode to disk. You might want to look at the alternative compiler, Rubinius; I believe it has this ability. You should note, though, that bytecode .pyc files (and, I imagine, the Ruby equivalent) can be "decompiled" pretty easily.
Not with the MRI interpreter, no.
Some newer VMs are being worked on where this is on the table, but these aren't widely used (or even ready to be used) at this point.
If you use JRuby, you can compile your Ruby code into Java .class files (including your Rails stuff) and execute them with (Open)JDK out of the box!
You can even package your complete app into a .war file to deploy it on Apache Tomcat or JBoss with a tool called "warbler":
https://rubygems.org/gems/warbler/
Depends on your ruby.
JRuby - https://github.com/jruby/jruby/wiki/JRubyCompiler
MRuby - http://mruby.org/docs/articles/executing-ruby-code-with-mruby.html
MRI (C)Ruby - https://devtechnica.com/ruby-language/compile-ruby-code-to-binary-and-execute-it
