Trying to understand Wirth's Pascal PL/0 compiler code

Is there a simple explanation of Wirth's source code or even a version with a little more commenting so that I can figure out how it works?
Wirth's PL/0 compiler is here: http://www.moorecad.com/standardpascal/plzero.pas
My main goal is to modify it to work with integer arrays, similarly to Oberon, while touching the code as little as possible.
Oberon referenced here: http://www.ethoberon.ethz.ch/WirthPubl/CBEAll.pdf

The code is described in detail in Wirth's book, Algorithms + Data Structures = Programs. I'm looking at the 1976 edition, which contains about 70 pages about the program.
As far as I can tell, the 1976 version of the book is not online, but he later ported the code to Modula-2 and then Oberon. The Oberon edition is available as a free PDF, but the PL/0 chapter was removed and expanded into a second book (also free online), Compiler Construction.
This expanded book uses a more robust language called Oberon-0, which includes arrays, records, types, etc. He discusses in detail how to implement each of these things.
The entire compiler is different, since it's written in Oberon and targets a different machine, but all of Wirth's compilers share the same basic structure, so you should be able to map ideas between them.
Alternatively, he also wrote another expanded compiler in Pascal (the "p4" reference implementation for ISO Pascal). That compiler has been extensively studied and documented in the book Pascal Implementation, now transformed into a nice website with hypertext cross-references to the source.
Finally, there is also a Python port of the PL/0 compiler by Samuel G Williams. My fork of his PL/0 Languages Tools includes a couple of additional back-ends, as well as a copy of Wirth's original code (the program you linked), modified slightly to run under Free Pascal.

Related

Is the OCR Computer Science GCSE wrong about compilers and interpreters?

I'm a secondary school student currently taking the OCR Computer Science GCSE (J276). I taught myself to program and was recently surprised by the context of a question in one of OCR's specimen papers (this one) as it goes against my knowledge of programming.
In question 5b, a question goes on to ask for a description of the differences between compilers and interpreters:
Harry can use either a compiler or an interpreter to translate the code [that he has created].
This confused me, as it seemed to suggest that the code written could be either interpreted or compiled in order to run. That would be odd, as it was my understanding that languages fit into one of two boxes, interpreted (Python, JavaScript) or compiled (C++, Java), rather than fitting into both.
Is it true that a single programming language can be either compiled or interpreted based on the programmer's desire, or is this another case of OCR simplifying the course to make it easier to understand?
C is a language that is usually compiled, but interpreted implementations exist.
According to @delnan in this answer,
First off, interpreted/compiled is not a property of the language but a property of the implementation. For most languages, most if not all implementations fall in one category, so one might save a few words saying the language is interpreted/compiled too, but it's still an important distinction, both because it aids understanding and because there are quite a few languages with usable implementations of both kinds (mostly in the realm of functional languages, see Haskell and ML). In addition, there are C interpreters and projects that attempt to compile a subset of Python to C or C++ code (and subsequently to machine code).
In reality, it looks like the designers of your course said something that is true in theory, but in practice tends to be more restricted. This is found all over programming and, in fact, the world in general. Could you write a JavaScript compiler for the Commodore 64? Sure: the C64 is a full, general-purpose computer system, and JavaScript is Turing complete. Just because something is possible doesn't mean that a lot of people actually do it, though, or that it is easy.
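As a small illustration of the point that "interpreted" and "compiled" describe an implementation rather than a language, here is a minimal Python sketch. It runs the same expression two ways: through a hand-written tree-walking interpreter (the `interpret` function and `OPS` table are names made up for this example) and through the built-in `compile`, which translates the tree to a bytecode object ahead of evaluation. CPython's bytecode is itself executed by a virtual machine, so this only loosely mirrors the distinction; it is not a native-code compiler.

```python
import ast
import operator

# Hypothetical helper table: maps AST operator node types to functions.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

def interpret(node):
    """Walk the syntax tree and compute the value directly."""
    if isinstance(node, ast.Expression):
        return interpret(node.body)
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.BinOp):
        return OPS[type(node.op)](interpret(node.left), interpret(node.right))
    raise ValueError(f"unsupported node: {type(node).__name__}")

source = "3 + 2 * 6"
tree = ast.parse(source, mode="eval")

# Strategy 1: interpret the tree as-is.
interpreted = interpret(tree)

# Strategy 2: translate the tree to a code object first, then run that.
code_object = compile(tree, "<expr>", "eval")
compiled = eval(code_object)

print(interpreted, compiled)
```

Both strategies accept the same source text; only the way it is executed differs.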

GCC and LLVM backend guides to make reading the source code a little easier?

I am starting to get acquainted with how code generation and optimization algorithms are implemented in GCC and LLVM. Can anyone advise where to find materials, articles, or lectures about how these are organized in the two compilers? I was looking for something that describes the implementation of optimization and code generation algorithms in fairly simple language, or simply explains it in detail, but I didn't find anything. Maybe there is an exhaustive guide where I can find which exact classes and methods are called, which files these algorithms live in, and the basic structures they operate on (symbol tables and their entries, graphs, the AST, struct tree and RTL in GCC, etc.). I'm familiar with Steven Muchnick's "Advanced Compiler Design and Implementation", but without some additional guidance it is quite hard to match the algorithms given there in ICAN notation to anything in the GCC and LLVM source code.
Summary:
My goal is to get acquainted with the implementation of optimization algorithms and code generation, using GCC and LLVM as examples. So I would like to find materials that somehow simplify reading the source code of GCC or LLVM. I hope that these materials exist.
Your question is off-topic here (since it is about finding resources and books).
However, for GCC, I did collect several references and wrote hundreds of slides, see the documentation page of GCC MELT (and many web pages pointed from it).
For LLVM, you need to find equivalent documentation (there is a lot of it too).
GCC MELT is now (in November 2017) an inactive project, so my slides cover older GCC versions. I could be funded to work on something similar.
Maybe there is an exhaustive guide
You won't find anything exhaustive and up to date, because both GCC and Clang are evolving significantly and continuously. The most exhaustive reference is still the source code (millions of lines, growing by a few percent each year) and the community behind it. You'll need several years of full-time work to comprehend these monster free software projects, and you should also follow their evolution.
Once you have spent several weeks reading about GCC and looking inside the source code, you can ask some precise questions on gcc@gcc.gnu.org. If you experiment with a GCC plugin or work on your own fork of GCC, be sure to make it free software and to publish your alpha-quality (even buggy and incomplete) source code somewhere, perhaps on GitHub, under a GPL license before asking.
BTW, real-life compilers are much more complex than what is taught in textbooks, even ones as good as the Dragon Book. Nobody can understand GCC (or LLVM) completely (it is too complex for a single brain, and is evolving too fast), and that also holds for any multi-million-line software project.
So I would like to find materials that somehow simplify reading of source code of gcc or llvm
Most of what I have written on GCC MELT (notably the slides that are not MELT-specific, and all the references I have collected) fits that goal. However, the authoritative material is the continuously changing source code of GCC.
NB: My gcc-melt.org domain will be lost in April 2018 (and I probably won't renew it). So look on http://starynkevitch.net/Basile/gcc-melt which should be kept longer.

Are custom-made programming language compilers based on existing languages?

I'm trying to figure out how creating a simple programming language works, both the syntax and the compiler itself. I've done some research on this topic, but I don't really know what my true question is.
I would think that compilers for existing programming languages are built on already existing programming languages, and therefore it would only make sense to base my compiler on one of these languages too.
However, since in theory the very first language with a compiler had no other language to be based on, this can't be entirely true, and it must have been based on something else, like the core computer system language.
Which is the best way to go, and how do I get to my goal of creating a simple programming language (with room to expand)?
Any answer is appreciated!
The very first compilers were based on assembler coding. Where did the assemblers come from?
The very first assemblers were based on painfully entered raw binary machine code instructions.
Hardly anybody enters binary anymore; at the very least, some kind of debugger program is used to do this. Hardly anybody codes compilers in assembler anymore either; in many cases, a first compiler for a language is coded in C.
If you want to build a programming language, your first step is to get a compiler book (google "compiler book") and read it from cover to cover. If you try to avoid this step, you'll spend a huge amount of energy to try and invent what you need, and you'll likely fail.
Key tools for building compilers are parser generators, and program transformation systems. The former is the classic answer. The latter is a high-tech answer, and isn't very common, but can produce language processing tools much more quickly than classic answers. You need the compiler book background to understand these tools.
Which way is the best way to create a simple programming language?
Unlike a majority of people, I don't believe that creating a language is about using a compiler or interpreter. While you will most likely need a compiler or interpreter to implement your new language, they are tools, just as a pencil and paper are. Don't start by using a tool and think you have accomplished something. It would be like using a wrench to make an engine that doesn't work, and claiming you made an engine because you used a wrench.
To create a good programming language you have to have a goal for your language.
Since you mention a programming language as opposed to some other type of language such as SQL, or a markup language such as HTML, I will take it that you want a Turing complete language.
Since most Turing complete languages support arithmetic, I would start with a simple arithmetic expression language and build on that. There are a huge number of examples of these on the Internet, but be forewarned that many have problems.
Next, learn how to build Abstract Syntax Trees (ASTs) for arithmetic expressions. For example, 3 + 2 * 6 becomes:
    +
   / \
  3   *
     / \
    2   6
Do not use a compiler to build the AST; build the trees by hand in the language you are using to write your programming language. For example, if you are using Java to create a C++ compiler, then create the AST using Java.
Then write an evaluator for the AST that will walk the tree.
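A minimal sketch of these two steps in Python, assuming a language with only integer literals and the binary operators `+`, `-` and `*`. The class names `Num` and `BinOp` and the function `evaluate` are invented for this illustration; they are not from any particular compiler.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Num:
    """Leaf node: an integer literal."""
    value: int

@dataclass
class BinOp:
    """Interior node: a binary operator with two subtrees."""
    op: str
    left: "Node"
    right: "Node"

Node = Union[Num, BinOp]

def evaluate(node: Node) -> int:
    """Walk the tree bottom-up and return the value of the expression."""
    if isinstance(node, Num):
        return node.value
    left = evaluate(node.left)
    right = evaluate(node.right)
    if node.op == "+":
        return left + right
    if node.op == "-":
        return left - right
    if node.op == "*":
        return left * right
    raise ValueError(f"unknown operator: {node.op!r}")

# The tree for 3 + 2 * 6, built by hand (no parser involved yet).
tree = BinOp("+", Num(3), BinOp("*", Num(2), Num(6)))
print(evaluate(tree))  # 15
```

Operator precedence does not appear anywhere in the evaluator: it is encoded entirely in the shape of the tree.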
Once you are able to correctly build an AST and evaluate it, add the lexer/parser, which translates human-readable source code into an AST. This is where you will need to get a good compiler design book.
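As a rough idea of what that lexer/parser step can look like, here is a small recursive-descent sketch in Python. It is one common technique among several, not the only way to do it. For brevity it represents the AST as nested tuples `(op, left, right)` with plain integers as leaves; the names `tokenize` and `parse` are made up for this example.

```python
import re

# One token is a run of digits or a single operator/parenthesis character.
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(text):
    """Split source text into a list of token strings."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse(text):
    """Parse an arithmetic expression into a nested-tuple AST."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def expr():    # expr := term (('+' | '-') term)*
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():    # term := factor (('*' | '/') factor)*
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():  # factor := NUMBER | '(' expr ')'
        tok = advance()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok.isdigit():
            return int(tok)
        if tok == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse("3 + 2 * 6"))  # ('+', 3, ('*', 2, 6))
```

Each grammar rule becomes one function, and precedence falls out of which rule calls which: `expr` calls `term`, so `*` binds tighter than `+`.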
Now you can compile the AST into assembly or byte code or just continue using an evaluator.
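To make the "byte code" option concrete, here is a hedged Python sketch that compiles an AST into instructions for a tiny stack machine and then runs them. The instruction names (`PUSH`, `ADD`, `SUB`, `MUL`), the nested-tuple AST shape `(op, left, right)`, and the functions `compile_expr` and `run` are all assumptions made for this example, not a real instruction set.

```python
# Hypothetical mapping from source operators to stack-machine opcodes.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}

def compile_expr(node, code=None):
    """Emit stack-machine instructions for an AST in post-order."""
    if code is None:
        code = []
    if isinstance(node, int):
        code.append(("PUSH", node))
    else:
        op, left, right = node
        compile_expr(left, code)    # operands first...
        compile_expr(right, code)
        code.append((OPCODES[op],)) # ...then the operator
    return code

def run(code):
    """Execute the instruction list on a simple operand stack."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
            continue
        right = stack.pop()
        left = stack.pop()
        if instr[0] == "ADD":
            stack.append(left + right)
        elif instr[0] == "SUB":
            stack.append(left - right)
        elif instr[0] == "MUL":
            stack.append(left * right)
        else:
            raise ValueError(f"unknown instruction: {instr[0]}")
    return stack.pop()

# AST for 3 + 2 * 6
tree = ("+", 3, ("*", 2, 6))
program = compile_expr(tree)
print(program)
# [('PUSH', 3), ('PUSH', 2), ('PUSH', 6), ('MUL',), ('ADD',)]
print(run(program))  # 15
```

The compiler is just another tree walk; the only difference from an evaluator is that it records what to do instead of doing it.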
From this point on you just add features to your language, again starting with the AST and then modifying the parser and the code generator, if you implemented one.
How to create a simple (with room to expand) programming language?
As I noted: start with an arithmetic evaluator and add language concepts one at a time. Since you are new at this, you may find that a concept you want to add is actually better expressed as a composition of simpler concepts. In that case, add the simpler concepts first, one at a time, and build up to the higher-level concept.
Because your question is so general I can't give more specific answers. I see that you already have a few close votes noting such.
If you want to build unlimited extensibility into your language, consider implementing a simple metaprogramming system in it.
This way you can start with a very simple and small language, and then build an arbitrarily complex language or a set of different languages by extending it with its own macros. Such a language can be trivially turned into any other language.
Take a look at Forth and Lisp - both can be built upon some extremely trivial core which is then extended to a fully capable language. You don't even need any other high level language to implement such a chain: a simple Forth can be bootstrapped in about a couple of hundred lines of x86 assembly.
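To give a feel for the "tiny core, then extend it in itself" idea, here is a toy Forth-like interpreter sketched in Python. It is only an illustration of the principle: it supports three built-in words and colon definitions, has no control flow, and is nowhere near a real Forth. The function name `make_forth` and the choice of built-ins are assumptions made for this example.

```python
def make_forth():
    """Return a run(source) function with its own stack and dictionary."""
    stack = []
    words = {
        # The entire built-in core: two arithmetic words and one stack word.
        "+": lambda: stack.append(stack.pop() + stack.pop()),
        "*": lambda: stack.append(stack.pop() * stack.pop()),
        "dup": lambda: stack.append(stack[-1]),
    }

    def run(source):
        tokens = source.split()
        i = 0
        while i < len(tokens):
            tok = tokens[i]
            if tok == ":":
                # ": name body... ;" defines a new word in terms of old ones.
                end = tokens.index(";", i)
                name, body = tokens[i + 1], tokens[i + 2:end]
                words[name] = lambda body=body: run(" ".join(body))
                i = end + 1
                continue
            if tok in words:
                words[tok]()
            else:
                stack.append(int(tok))  # anything else is a number literal
            i += 1
        return stack

    return run

forth = make_forth()
forth(": square dup * ;")
forth(": cube dup square * ;")  # built on a user-defined word
print(forth("3 cube"))  # [27]
```

Here `square` and `cube` are written in the language itself, so the host language only ever had to supply the three core words.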
If you're determined enough, you can even skip assembler and write in machine code straight away, for something of this scale it's quite manageable in a reasonable time and might give you some indispensable experience.
Well, inventing a language is inventing a language. To implement it, you usually use an existing language. At some point, assuming your new language is capable of expressing a compiler, you write a compiler in your new language and use the binary built with the existing language to compile that new compiler. Then you do it once more, using the binary produced by the new compiler. If that all works, you are self-hosting: you have a compiler that can compile its own source.
If you have never made a language or compiler, then you are a long, long way from that. You might try one of the many examples online of a simple C-like compiler that can only do some simple things (and can never self-compile), and get your feet wet with something like that.
At the end of the day, to be useful, a programming language has to compile down to something, ideally machine code, be it for a real machine or a virtual one as with Python, Java, or old Pascal. But sometimes one language compiles down to another known language, C++ for example, and then you use existing tools for that language to take it down to something that can execute.
This has been asked and answered a number of times now. If you go far enough back, or want to get as pure as you can, you start with machine code and a way to enter it (see the many computers for this, the DEC PDP series, the Altair, etc., where the entry method is manual address, data and clock switches). The "compiler", or in the case of assembly/machine code the "assembler", is the human with paper and pencil, or pen if you are that good. You manually write out your assembly language, then manually convert that to machine code, then manually flip switches to enter the program into RAM, then manually push the run button.
The first assemblers, and later the first compilers, were written this way: you make an assembler in machine code using a human assembler, then self-host that. Then you use either the human assembler or the software assembler to write your first compiler for your first ever non-assembly language, then you rewrite the compiler in the new language, then you self-host that. Repeat until it is the present day and there are more compilers and languages than you could ever master, and a myriad of choices of editors and languages on which to build a compiler for a new language.

Cuda Source to Source translation using Rose compiler

I would like to know about the extent of support for CUDA in the ROSE compiler. I am trying to build a source-to-source translator for CUDA. Is it possible using the ROSE compiler? Which distribution of the ROSE compiler should I use?
I know this has been discussed earlier (support for CUDA in the ROSE compiler), but I cannot tell whether CUDA support is there or not. The ROSE user manual does not have much information either.
Rose has a C++ front end and a Fortran front end that seem reasonably well integrated. The Rose system design IMHO is not amenable to easy integration of other front end parsers (such as you would need presumably to parse Cuda), although with enough effort you could do it. (Rose originally only had C++, and Fortran was grafted on).
If you don't see explicit mention of Cuda in the Rose manuals, it's pretty likely because it simply isn't there.
If you want to process Cuda using source-to-source transformations, you'll need both a Cuda parser and an appropriate set of transformation machinery, something like what Rose has.
I cannot offer you a Cuda parser, but my company does provide industrial-strength source-to-source program transformation systems in the form of the DMS Software Reengineering Toolkit.
DMS has been used to carry out massive transformations on large C++ systems, so I think it quite reasonable to say it is at least as competent as Rose for that purpose. DMS has also been used to process extremely large C and Fortran systems, and other codes in Java, C#, ECMAScript, PHP, and many other languages, so I think it safe to say it is considerably easier to integrate a different front end into DMS.
Cuda, as I understand it, is a C99 derivative. DMS has a C front end, with explicit support for building various C dialects. Most of C99 is already built using the dialect mechanism. That might be a pretty good starting point.
You can try other tools such as ANTLR as an alternative, but I think it will soon become obvious that ANTLR, and Rose and DMS are in very different leagues in terms of their ability to parse, analyze and transform complex systems of real code.

Unified assembly language

I wonder if there exists some kind of universal and easy-to-code opcode (or assembly) language which provides a basic set of instructions available in most of today's CPUs (not some fancy CISC or register-only computer, just a common one), with the possibility to "compile", micro-optimize and "interpret" it on any of those CPUs?
I'm thinking about something like the MARS MIPS simulator (rather simple and easy-to-read code), but with the possibility of making real programs. No libraries are necessary (though they would be nice if possible), just a way to make things (libraries or UNIX-like tools) faster in a uniform way.
Sorry if that's a silly question, I'm new to assembler. I just don't find NASM or UNIX assembly language either particularly cross-platform or easy to read and code.
The JVM bytecode is sort of like assembly language, but without pointer arithmetic. However, it's quite object-oriented. On the positive side, it's totally cross-platform.
You might want to look at LLVM bytecode - but bear in mind this warning: http://llvm.org/docs/FAQ.html#can-i-compile-c-or-c-code-to-platform-independent-llvm-bitcode
First thing: writing in Assembly does not guarantee a speed increase. Using the correct algorithm for the job at hand has the greatest impact on speed. By the time you need to go down to Assembly to squeeze the last few drops out you can only really do that by adapting the algorithm to the specific architecture of the hardware in question. A generic HLA (High Level Assembler) pretty much defeats the purpose of writing your code in Assembly. Note that I am not knocking Randall Hyde’s HLA, which is a great product, I’m just saying that you don’t gain anything from writing Assembly the way a compiler generates machine code. Most C and C++ compilers have very good optimizers, and can produce machine code superior to almost any naïve implementation in ASM.
See if you can find these books (second-hand, they are out of print) by Michael Abrash: "Zen of Assembly Language" and "Zen of Code Optimization". Or see if you can find his articles in DDJ. They will give you an insight into optimization second to none.
Some related material that I hope might be useful:
There is flat assembler, which takes the approach of a kind of portable assembler.
Menuet OS is an interesting project: an operating system with a graphical user interface written in assembler, with a great assembly API.
LLVM IR provides a quite portable assembly, backed by a powerful compiler infrastructure that underlies many projects, including Clang.
