Is the OCR Computer Science GCSE wrong about compilers and interpreters? - compilation

I'm a secondary school student currently taking the OCR Computer Science GCSE (J276). I taught myself to program and was recently surprised by the context of a question in one of OCR's specimen papers (this one) as it goes against my knowledge of programming.
In question 5b, a question goes on to ask for a description of the differences between compilers and interpreters:
Harry can use either a compiler or an interpreter to translate the code [that he has created].
This confused me, as it seemed to suggest that the code written could either be interpreted or compiled in order to run, which would be odd as it was my understanding that languages fit into one of two boxes, interpreted (python, javascript) or compiled (c++, java), rather than fitting into both.
Is it true that a single programming language can be either compiled or interpreted based on the programmer's desire, or is this another case of OCR simplifying the course to make it easier to understand?

C is a language that is usually compiled, but interpreted implementations exist.
According to #delnan in this answer,
First off, interpreted/compiled is not a property of the language but a property of the implementation. For most languages, most if not all implementations fall in one category, so one might save a few words saying the language is interpreted/compiled too, but it's still an important distinction, both because it aids understanding and because there are quite a few languages with usable implementations of both kinds (mostly in the realm of functional languages, see Haskell and ML). In addition, there are C interpreters and projects that attempt to compile a subset of Python to C or C++ code (and subsequently to machine code).
In reality, it looks like the designers of your course said something that was true in theory, but in practice tends to be more restricted. This is found all over programming and, in fact, the world in general. Could you write a JavaScript compiler for Commodore 64? Sure, the C64 implements a full, general purpose computer system, and JavaScript is Turing Complete. Just because something is possible doesn't mean that a lot of people actually do it, though, or that it is easy.

Related

Are programming languages converted in machine code by compilers?

If so, why different programs written in different languages have different execution speeds?
Simple answer: they don't produce the same machine code. They might produce different machine code which still produces the same side effects (same end result), but via different machine instructions.
Imagine you have two interpreters (let's say male and female just to distinguish them) to translate what you say into some other language. Both of them may translate what you say properly into the desired language, but they won't necessarily be equally efficient. One of them might feel the need to explain more of what you meant, one might be very terse and translate what you say in a very short and sweet way.
Performance doesn't just vary between languages. They vary between compilers for the same programming language.
For example, with C, the performance difference between GCC and Tiny-C can be about 2 to 3x, with Tiny-C being roughly 2-3 times slower.
And it's because even within the same programming language (C), GCC and Tiny-C don't produce identical machine instructions. In the case of Tiny-C, it was optimized to compile quickly, not to produce code that runs as quickly. For example, it doesn't make the best use of the fastest form of memory available to the machine (registers) and spills more data into the stack (which uses anything from L1 to DRAM depending on the access patterns). Because it doesn't bother to get so fancy with register allocation, Tiny-C can compile code quite quickly, but the resulting code isn't as efficient.
If you want a more in-depth answer, then you should study compiler design starting with the Dragon Book.
Though programs written in different languages are converted into machine code at the end of the day, different languages have different implementation to say same thing.
You can take analogy from human languages e.g the English statement I am coming home. is translated to Chinese as 我未来的家。, as you can see the Chinese one is more concise though it is not always true; same concept applies to programming languages.
So in the case of programming languages a machine code X can be written in programming language A as 2X-X, programming language B as X/2 + X/2...but executing machine code X and 2X-X will result same result though their performance wont same ( this is hypothetical example but hope it makes sense.)
Basically it is not guaranteed that a program with same output written in different programming languages results in same machine code, but is converted into a machine code that gives same output, that where the difference comes.
But this will give you thorough info
Because 1) the compilers are written by different people so the machine code they generate is not the same, and 2) they make use of preexisting run-time libraries of routines to do math, input-output, memory management, and more, and those libraries are also not the same, for the same reason.
Some compilers do not generate machine code, because then the resulting code would not be portable to different machines, so instead they generate code for a fictitious general computer.
Then on any particular machine that code is either interpreted directly by an interpreter program, or it is translated into that machine's code, or a combination of these (look up just-in-time(JIT) compiler).

Are custom-made programming language- compilers, based on existing languages?

I'm trying to start, figuring out, how creating a simple programming language work. Both with the syntax and the compiler itself. I've done some research on this topic, but I really don't get what my true question is all about.
I would think, that existing programming languages- compilers, is built on already existing programming languages, and therefore it would only make sense, to also base my compiler, on one of these languages.
Altho, since this in theory, the very first language with a compiler, didn't have another language to be based on, this can't be a true fact, and really must be based on something else, like the core Computer System language.
Which way is the best way to go, aswell as how, to get to my goal, which is creating a simple (With room to expanding) programming language?
Any answer is appreciated!
The very first compilers were based on assembler coding. Where did the assemblers come from?
The very first assemblers were based on painfully entered raw binary machine code instructions.
Hardly anybody enters binary; at very least, some kind of debugger program is used to do this. Hardly anybody codes compilers using assemblers anymore either; in many cases, a first compiler for a language is coded in C.
If you want to build a programming language, your first step is to get a compiler book (google "compiler book") and read it from cover to cover. If you try to avoid this step, you'll spend a huge amount of energy to try and invent what you need, and you'll likely fail.
Key tools for building compilers are parser generators, and program transformation systems. The former is the classic answer. The latter is a high-tech answer, and isn't very common, but can produce language processing tools much more quickly than classic answers. You need the compiler book background to understand these tools.
Which way is the best way to creating a simple programming language?
Unlike a majority of people I don't believe that creating a language is about using a compiler or interpreter. While you will most likely need a compiler or interpreter to implement your new language, they are tools just as is a pencil and paper. Don't start by using a tool and think you have accomplished something. It would be like using a wrench to make an engine that doesn't work, but you claim you made an engine because use used a wrench.
To create a good programming language you have to have goal for your language.
Since you mention programming language as opposed to some other type of language such as SQL, or a markup language such as HTML, I will take it that you want a Turing complete language.
Since most Turing complete languages support arithmetic I would start with a simple arithmetic expression language and build on that. There are a huge amount of examples of these on the Internet, but be fore warned that many have problems.
Next learn how to build Abstract Syntax Trees (AST) for arithmetic expressions. i.e.
3 + 2 * 6
+
/ \
3 *
/ \
2 6
Do not use a compiler to build the AST, but build them by hand in the language you are using to write your programming language. i.e. If you are using Java to create a C++ compiler, then create the AST using Java.
Then write an evaluator for the AST that will walk the tree.
Once you are able to correctly build an AST and evaluate then add the lexer/parser which translates human readable source code into an AST. This is were you will need to get a good compiler design book.
Now you can compile the AST into assembly or byte code or just continue using an evaluator.
From this point on you just add features to your language, again starting by with the AST and then modifying the parser and code generator if you implemented one.
How to create a simple (with room to expanding) programming language?
As I noted: start with an arithmetic evaluator and add language concepts one at a time. Since you are new at this, you may find that a concept you add is actually better as a composition of simpler concepts and that you should add one of the simpler concepts first before adding the other concept finally reaching the higher concept.
Because your question is so general I can't give more specific answers. I see that you already have a few close votes noting such.
If you want to build an unlimited extensibility into your language, consider implementing a simple metaprogramming system in it.
This way you can start with some very simple and small language, and then build an arbitrary complex language or a set of different languages by extending it with its own macros. Such language can be trivially turned into any other language.
Take a look at Forth and Lisp - both can be built upon some extremely trivial core which is then extended to a fully capable language. You don't even need any other high level language to implement such a chain: a simple Forth can be bootstrapped in about a couple of hundred lines of x86 assembly.
If you're determined enough, you can even skip assembler and write in machine code straight away, for something of this scale it's quite manageable in a reasonable time and might give you some indispensable experience.
well inventing a language is inventing a language...how you implement it you usually use an existing language and then at some point assuming your new language is such that it can be used as a compiler, then you write a compiler in your new language and you use the binary from the current language to compile the same language compiler, then you do it once more with the binary from the same language compiler if that all works you are self-hosting. a compiler that can compile its own language compiler.
If you have never made a language or compiler then you are a long long way from that, you might try one of the many examples on line of a simple C like compiler that can only do some simple things (and can never self-compile), get your feet wet with something like that.
At the end of the day the programming language to be useful has to compile down to something, ideally machine code be it a real machine or virtual like python or java or old pascal. But sometimes one language compiles down to another known language, C++ for example, and then you use existing tools for that language to take down to something can execute.
It has been asked and answered a number of times now. If you go far enough back or want to get as pure as you can you start with machine code and a way to enter it (see the many computers for this, dec pdp series, altair, etc, the entry method being address, data and clock manual switches). The "compiler" or in the case of assembly/machine code the "assembler" is the human with paper and pencil or pen if you are that good. You manually write out your assembly language, you then manually convert that to machine code, then you manually flip switches to enter the program into ram then you manually push the run button.
The first assemblers and then later compilers were written this way, you make an assembler using machine code using a human assembler, then self host that. Then you either use the human assembler or software assembler to the write your first compiler for your first ever non-assembly language, then you re-write the compiler in the new language, then you self-host that. Repeat until it is present day and there are more compilers and languages that you could ever master and a myriad of choices of editors and languages to build a compiler for a new language upon.

Is there any scripting language that's fast, easy to embed, and well-suited for high-level game-programming?

First off, I'm aware that there are many questions related to this, but none of them seemed to help my specific situation. In particular, lua and python don't fit my needs as well as I could hope. It may be that no language with my requirements exists, but before coming to that conclusion it'd be nice to hear a few more opinions. :)
As you may have guessed, I need such a language for a game engine I'm trying to create. The purpose of this game engine is to provide a user with the basic tools for building a game, while still giving her the freedom of creating many different types of games.
For this reason, the scripting language should be able to handle game concepts intuitively. Among other things, it should be easy to define a variety of types, sub-type them with slightly different properties, query and modify objects dynamically, and so on.
Furthermore, it should be possible for the game developer to handle every situation they come across in the scripting language. While basic components like the renderer and networking would be implemented in C++, game-specific mechanisms such as rotating a few hundred objects around a planet will be handled in the scripting language. This means that the scripting language has to be insanely fast, 1/10 C speed is probably the minimum.
Then there's the problem of debugging. Information about the function, stack trace and variable states that the error occurred in should be accessible.
Last but not least, this is a project done by a single person. Even if I wanted to, I simply don't have the resources to spend weeks on just the glue code. Integrating the language with my project shouldn't be much harder than integrating lua.
Examining the two suggested languages, lua and python, lua is fast(luajit) and easy to integrate, but its standard debugging facilities seem to be lacking. What's even worse, lua by default has no type-system at all. Of course you can implement that on your own, but the syntax will always be weird and unintuitive.
Python, on the other hand, is very comfortable to use and has a basic class system. However, it's not that easy to integrate, it's paradigm doesn't really involve type-checking and it's definitely not fast enough for more complex games. I'd again like to point out that everything would be done in python. I'm well aware that python would likely be fast enough for 90% of the code.
There's also Scala, which I haven't seen suggested so far. Scala seems to actually fulfill most of the requirements, but embedding the Java VM with C doesn't seem very easy, and it generally seems like java expects you to build your application around java rather than the other way around. I'm also not sure if Scala's functional paradigm would be good for intuitive game-development.
EDIT: Please note that this question isn't about finding a solution at any cost. If there isn't any language better than lua, I will simply compromise and use that(I actually already have the thing linked into my program). I just want to make sure I'm not missing something that'd be more suitable before doing so, seeing as lua is far from the perfect solution for me.
You might consider mono. I only know of one success story for this approach, but it is a big one: C++ engine with mono scripting is the approach taken in Unity.
Try the Ring programming language
http://ring-lang.net
It's general-purpose multi-paradigm scripting language that can be embedded in C/C++ projects, extended using C/C++ code and/or used as standalone language. The supported programming paradigms are Imperative, Procedural, Object-Oriented, Functional, Meta programming, Declarative programming using nested structures, and Natural programming.
The language is simple, trying to be natural, encourage organization and comes with transparent implementation. It comes with compact syntax and a group of features that enable the programmer to create natural interfaces and declarative domain-specific languages in a fraction of time. It is very small, fast and comes with smart garbage collector that puts the memory under the programmer control. It supports many programming paradigms, comes with useful and practical libraries. The language is designed for productivity and developing high quality solutions that can scale.
The compiler + The Virtual Machine are 15,000 lines of C code
Embedding Ring Interpreter in C/C++ Programs
https://en.wikibooks.org/wiki/Ring/Lessons/Embedding_Ring_Interpreter_in_C/C%2B%2B_Programs
For embeddability, you might look into Tcl, or if you're into Scheme, check out SIOD or Guile. I would suggest Lua or Python in general, of course, but your question precludes them.
Since noone seems to know a combination better than lua/luajit, I think I will leave it at that. Thanks for everyone's input on this. I personally find lua to be very lacking as a high-level language for game-programming, but it's probably the best choice out there. So to whomever finds this question and has the same requirements(fast, easy to use, easy to embed), you'll either have to use lua/luajit or make your own. :)

every language eventually compiled into low-level computer language?

Isn't every language compiled into low-level computer language?
If so, shouldn't all languages have the same performance?
Just wondering...
As pointed out by others, not every language is translated into machine language; some are translated into some form (bytecode, reverse Polish, AST) that is interpreted.
But even among languages that are translated to machine code,
Some translators are better than others
Some language features are easier to translate to high-performance code than others
An example of a translator that is better than some others is the GCC C compiler. It has had many years' work invested in producing good code, and its translations outperform those of the simpler compilers lcc and tcc, for example.
An example of a feature that is hard to translate to high-performance code is C's ability to do pointer arithmetic and to dereference pointers: when a program stores through a pointer, it is very difficult for the compiler to know what memory locations are affected. Similarly, when an unknown function is called, the compiler must make very pessimistic assumptions about what might happen to the contents of objects allocated on the heap. In a language like Java, the compiler can do a better job translating because the type system enforces greater separation between pointers of different types. In a language like ML or Haskell, the compiler can do better still, because in these languages, most data allocated in memory cannot be changed by a function call. But of course object-oriented languages and functional languages present their own translation challenges.
Finally, translation of a Turing-complete language is itself a hard problem: in general, finding the best translation of a program is an NP-hard problem, which means that the only solutions known potentially take time exponential in the size of the program. This would be unacceptable in a compiler (can't wait forever to compile a mere few thousand lines), and so compilers use heuristics. There is always room for improvement in these heuristics.
It is easier and more efficient to map some languages into machine language than others. There is no easy analogy that I can think of for this. The closest I can come to is translating Italian to Spanish vs. translating a Khoisan language into Hawaiian.
Another analogy is saying "Well, the laws of physics are what govern how every animal moves, so why do some animals move so much faster than others? Shouldn't they all just move at the same speed?".
No, some languages are simply interpreted. They never actually get turned into machine code. So those languages will generally run slower than low-level languages like C.
Even for the languages which are compiled into machine code, sometimes what comes out of the compiler is not the most efficient possible way to write that given program. So it's often possible to write programs in, say, assembly language that run faster than their C equivalents, and C programs that run faster than their JIT-compiled Java equivalents, etc. (Modern compilers are pretty good, though, so that's not so much of an issue these days)
Yes, all programs get eventually translated into machine code. BUT:
Some programs get translated during compilation, while others are translated on-the-fly by an interpreter (e.g. Perl) or a virtual machine (e.g. original Java)
Obviously, the latter is MUCH slower as you spend time on translation during running.
Different languages can be translated into DIFFERENT machine code. Even when the same programming task is done. So that machine code might be faster or slower depending on the language.
You should understand the difference between compiling (which is translating) and interpreting (which is simulating). You should also understand the concept of a universal basis for computation.
A language or instruction set is universal if it can be used to write an interpreter (or simulator) for any other language or instruction set. Most computers are electronic, but they can be made in many other ways, such as by fluidics, or mechanical parts, or even by people following directions. A good teaching exercise is to write a small program in BASIC and then have a classroom of students "execute" the program by following its steps. Since BASIC is universal (to a first approximation) you can use it to write a program that simulates the instruction set for any other computer.
So you could take a program in your favorite language, compile (translate) it into machine language for your favorite machine, have an interpreter for that machine written in BASIC, and then (in principle) have a class full of students "execute" it. In this way, it is first being reduced to an instruction set for a "fast" machine, and then being executed by a very very very slow "computer". It will still get the same answer, only about a trillion times slower.
Point being, the concept of universality makes all computers equivalent to each other, even though some are very fast and others are very slow.
No, some languages are run by a 'software interpreter' as byte code.
Also, it depends on what the language does in the background as well, so 2 identically functioning programs in different languages may have different mechanics behind the scenes and hence be actually running different instructions resulting in differing performance.

(When) Should I learn compilers?

According to this http://steve-yegge.blogspot.com/2007/06/rich-programmer-food.html article, I defnitely should.
Quote Gentle, yet insistent executive
summary: If you don't know how
compilers work, then you don't know
how computers work. If you're not 100%
sure whether you know how compilers
work, then you don't know how they
work.
I thought that it was a very interesting article, and the field of application is very useful (do yourself a favour and read it)
But then again, I have seen successful senior sw engineers that didn’t know compilers very well, or internal machine architecture for that matter,
but did know a thing or two of each item in the following list :
A programming paradigm (OO, functional,…)
A programming language API (C#, Java..) and at least 2 very different some say! (Java / Haskell)
A programming framework (Java, .NET)
An IDE to make you more productive (Eclipse, VisualStudio, Emacs,….)
Programming best practices (see fxcop rules for example)
Programming Principles (DRY, High Cohesion, Low Coupling, ….)
Programming methodologies (TDD, MDE)
Design patterns (Structural, Behavioural,….)
Architectural Basics (Tiers, Layers, Process Models (Waterfall, Agile,…)
A Testing Tool (Unit Testing, Model Testing, …)
A GUI technique (WPF, Swing)
A documenting tool (Javadoc, Sandcastle..)
A modelling languague (and tool maybe) (UML, VisualParadigm, Rational)
(undoubtedly forgetting very important stuff here)
Not all of these tools are necessary to be a good programmer (like a GUI when you just don’t need it)
but most of them are. Where do compilers come in, and are they really that important, since, as I mentioned,
lots of programmers seems to be doing fine without knowing them and especially, becoming a good programmer is seen the multitude of knowledge domains almost a lifetime achievement :-) , so even if compilers are extremely important, isn't there always stuff still more important?
Or should i order 'The Unleashed Compilers Unlimited Bible (in 24H..))) today?
For those who have read the article, and want to start studying right away :
Learning Resources on Parsers, Interpreters, and Compilers
If you just want to be a run-of-the-mill coder, and write stuff... you don't need to take compilers.
If you want to learn computer science and appreciate and really become a computer scientist, you MUST take compilers.
Compilers is a microcosm of computer science! It contains every single problem, including (but not limited to) AI (greedy algorithms & heuristic search), algorithms, theory (formal languages, automata), systems, architecture, etc.
You get to see a lot of computer science come together in an amazing way. Not only will you understand more about why programming languages work the way that they do, but you will become a better coder for having that understanding. You will learn to understand the low level, which helps at the high level.
As programmers, we very often like to talk about things being a "black box"... but things are a lot smoother when you understand a little bit about what's in the box. Even if you don't build a whole compiler, you will surely learn a lot. You will get to see the formalisms behind parsing (and realize it's not just a bunch of special cases hacked together), and a bunch of NP complete problems. You will see why the theory of computer science is so important to understand for practical things. (After all, compilers are extremely practical... and we wouldn't have the compilers we have today without formalisms).
I really hope you consider learning about them... it will help you get to the next level as a computer scientist :-).
You should learn about compilers, for the simple reason that implementing a compiler makes you a better programmer. The compiler will surely suck, but you will have learned a lot during the way. It is a great way of improving (or practising) your programming skill.
You do not need to understand compilers to be a good programmer, but it can help. One of the things I realized when learning about them, is that compiling is simply a translation.
If you have ever translated from one language to another, you have just done compiling.
So when should you learn about compilers?
When you want to, or need it to solve a problem.
Compiler theory is useful, but not essential.
Although there are some techniques which come in handy, like lexical analysis and parsing.
Another one is error handling. Compilers need a lot of these. User input can contain anything, even the unexpected. And you need to deal with all of these.
If you're going to be working at a high-enough level where you're worrying over UML and self-describing code, you could easily go your entire career without wanting or needing intimate details of how the compiler works.
But, if you're an in-the-trenches coder and have no aspirations to manage your friends, it's likely that one day, you'll realize you're waging war with your compiler. It could be a random bug that comes along or a hallway conversation about while-verses-for loops. You'll realize the assembly (or IL, likely, in the coming years) is just a bit to the left of what you were needing and another universe will unfold.
So, I suppose my answer is, just be aware of the compiler for now, that it's doing quite a lot, but don't worry over it too much.
The compilers courses usually focus on how the high-level code is analyzed and translated into machine code. That's very interesting, but not crucial. It's more important to understand what is this machine code that is generated by the compiler so that you understand how a computer works and what is the cost of each language construct.
So I'd rather say that you should know an assembly language (I mean a limited subset of assembly language for one architecture) to understand how a computer works and the latter is definitely required for a competent programmer so that he understands what segmenation fault is, when to optimize and when not and other similar low-level things.
If you intend to write extremely time-critical real-time code, you will benefit from understanding how the compiler optimises your code. However, you will actually benefit more from understanding the underlying architecture of your hardware.
From my experience, if you understand how the hardware works, and how the compiler interprets your code, you will be able to write code that does exactly what you intend it to do. I have been caught on several occasions, writing code that got optimised away by the compiler and made the hardware do something that I did not intend.
All in all, understanding the entire software-hardware stack is not essential to write good algorithms and code, but it will most certainly help!
From a practical perspective, general compiler theory is less of concern than a assembler, linker and loader to a specific platform. For example, I just consider the GCC compiler as a translator from my high-level C language to the low-level assembly language on a x86 platform. And more often than not, I manually refine ;) the code generated by the compiler.
From a scientific perspective, I would strongly suggest you learning the compiler theory, it will help you understand the great idea that computer is built upon. And even more, you will have a different eye upon the world.
Just my opinion, but I believe compilers is not given enough attention in CS courses, not in mine, and not in any others afaik. I think any CS major should do 2 things after a sabbatical or finishing their major: Re-learn if necessary finite automata and maybe a formal methods language. Apply it.
Write a simple compiler with this knowledge. Alex Aiken has a very useful online tutorial on writing a compiler for the COOL (Classroom Object Oriented Language) which is a subset of Scala as of 2013 ver. At least at time of writing.

Resources