I have a question about Machine Languages.
If I write a program that prints "Hello World" using machine language, will my syntax be different depending on the machine I am writing my code on?
I'm assuming that by "machine code" you mean human readable assembly language code and not binary CPU instruction code.
I would say that machine languages do not have a syntax; that is part of what separates low-level languages from high-level languages. Or, to the extent that machine languages have a syntax, the syntax is all the same.
Compiling a high-level language program requires a parser to determine which items are variables, control structures, and function definitions. Machine languages have no such constructs (except for control structures), so there is nothing we would think of as a syntax.
In a machine language, every line you write instructs the machine (or CPU) to perform a task. Tasks can be storing information, retrieving information, comparing values, or other things.
If you are writing a "Hello, World" program for an x86 machine and an SH4 machine, the code will be different. The code will be more different than if you wrote the "Hello, World" program in C or Python. But the "syntax" of the program - the particular layout or format of your machine code and the ability to read your machine code - will be the same for both x86 and SH4 machine code.
Related
Are all interpreted languages not eventually machine code? I'm curious if the reason is because companies don't think it's worth the effort, if there is an inherent conflict that makes it impossible, or some other reason. Is there a way to capture the script's executed machine code myself?
Edit
I was speaking loosely because I didn't want the title to be too long. I understand there are no "interpreted languages". I'm talking about languages that are generally interpreted (not C++, not Rust, etc.).
"Are all interpreted languages not eventually machine code?" - this was a rhetorical question. The answer is a simple "yes". Because that's how computers work.
I am curious why most companies who create a language with an interpreter don't also supplement it with a compiler (that compiles to native machine code). And I'm curious if I can record the executed machine code myself.
Also, Jörg W Mittag's answer is misleading (and arrogant). Out of "all" these compilers I don't see one that compiles to native machine code. I don't think one of them even exists anymore (go to the Rubinius website). I've also worked with some of them and they have limitations (I can't arbitrarily take a non-trivial script that works with the standard ruby interpreter and compile it).
Why don't most interpreted languages like ruby provide an optional compiler?
There is no such thing as an "interpreted language". Interpretation and compilation are traits of the interpreter or compiler (duh!) not the language. A language is just a set of abstract mathematical rules and restrictions. It is neither interpreted nor compiled. It just is.
Those two terms belong to two completely different levels of abstraction. If English were a typed language, the term "interpreted language" would be a Type Error. The term "interpreted language" is not even wrong, it is nonsensical.
Every language can be implemented by a compiler, and every language can be implemented by an interpreter. Most languages have both interpreted and compiled implementations. Many modern high-performance language implementations combine compilers and interpreters.
Are all interpreted languages not eventually machine code?
In some sense, every language is machine code for an abstract machine corresponding to that language, yes. I.e. Ruby is machine language for the "Ruby Abstract Machine" which is the machine whose execution semantics match exactly the execution semantics of Ruby and whose machine language syntax matches exactly the syntax of Ruby.
if there is an inherent conflict that makes it impossible
All currently existing Ruby implementations (with one caveat) have at least one compiler. Most have more than one. At least one has no interpreter at all.
Opal is purely compiled. It never interprets. There is no interpreter in Opal, only a compiler.
YARV compiles Ruby to YARV byte code. This byte code then gets interpreted by the YARV VM. Code that has been executed more than a certain number of times then gets compiled to native machine code for the underlying architecture (i.e. when running the AMD64 version of YARV, it gets compiled to AMD64 machine code, when running the ARM version, it gets compiled to ARM machine code, and so on).
Artichoke is … somewhat complicated, but suffice to say, it does not interpret Ruby.
MRuby compiles Ruby to MRuby byte code. This byte code then gets interpreted by the MRuby VM.
Rubinius compiles Ruby to Rubinius byte code. This byte code then gets interpreted by the Rubinius VM. Code that has been executed more than a certain number of times then gets compiled to native machine code for the underlying architecture (i.e. when running the AMD64 version of Rubinius, it gets compiled to AMD64 machine code, when running the ARM version, it gets compiled to ARM machine code, and so on). [Note: there are a couple of different versions of Rubinius. The original version had a native code compiler. This was then removed, and is in the process of being rewritten.]
JRuby compiles Ruby to JRuby IR. This IR then gets interpreted by the JRuby IR interpreter. Code that has been executed more than a certain number of times then gets compiled to JRuby compiler IR. This compiler IR then gets further compiled to JVM byte code. What happens to this JVM byte code depends on the JVM. On the HotSpot JVM, the JVM byte code will be interpreted by the HotSpot interpreter, which will profile the code, and then compile the code that is executed often to native machine code.
TruffleRuby parses Ruby to Truffle AST. This AST then gets interpreted by the Truffle AST interpreter framework. The Truffle AST interpreter framework will then specialize the AST nodes while it is interpreting them, including possibly compiling them to native machine code using Graal.
The last major, mainstream Ruby implementation that was purely interpreted and didn't have a compiler, was the original MRI, which was abandoned years ago.
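If you want to see for yourself what YARV's compiler produces for a given snippet, MRI exposes its instruction sequences from Ruby. A minimal sketch (MRI-specific; this shows the intermediate bytecode, not the native code a JIT may later emit for hot paths):

    # MRI-specific: inspect the bytecode YARV's compiler produces for a snippet.
    iseq = RubyVM::InstructionSequence.compile("puts 'Hello, World'")
    puts iseq.disasm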
I know about assembly language and machine code.
A computer can usually execute programs written in its native machine language. Each instruction in this language is simple enough to be executed using a relatively small number of electronic circuits. For simplicity, we will call this language L0. Programmers would have a difficult time writing programs in L0 because it is enormously detailed and consists purely of numbers. If a new language, L1, could be constructed that was easier to use, programs could be written in L1.
But I just want to know: is there a single example of what machine code actually is?
I mean, is there anything I can write, save, and just run (without compiling it with any compiler)?
Assembly instructions have a one-to-one relationship with the underlying machine instructions. This means that essentially you can convert assembly instructions into machine instructions with a look-up table.
look here: x86 instructions
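As a toy illustration of that look-up-table idea, here is a rough Ruby sketch that "assembles" a handful of real single-byte x86 instructions (nop, ret and int3 really are the one-byte opcodes 0x90, 0xC3 and 0xCC). A real assembler also has to encode operands, addressing modes and prefixes, which is where the actual work is:

    # Toy "assembler": a look-up table from mnemonics to single-byte x86 opcodes.
    OPCODES = {
      "nop"  => 0x90,  # no operation
      "ret"  => 0xC3,  # near return
      "int3" => 0xCC,  # breakpoint trap
    }.freeze

    def assemble(mnemonics)
      mnemonics.map { |m| OPCODES.fetch(m) }.pack("C*")  # raw machine-code bytes
    end

    puts assemble(%w[nop nop ret]).unpack1("H*")  # => "9090c3"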
If so, why do different programs written in different languages have different execution speeds?
Simple answer: they don't produce the same machine code. They might produce different machine code which still produces the same side effects (same end result), but via different machine instructions.
Imagine you have two interpreters (let's say male and female just to distinguish them) to translate what you say into some other language. Both of them may translate what you say properly into the desired language, but they won't necessarily be equally efficient. One of them might feel the need to explain more of what you meant, one might be very terse and translate what you say in a very short and sweet way.
Performance doesn't just vary between languages. It varies between compilers for the same programming language.
For example, with C, the performance difference between GCC and Tiny-C can be about 2 to 3x, with Tiny-C producing the slower code.
And it's because even within the same programming language (C), GCC and Tiny-C don't produce identical machine instructions. In the case of Tiny-C, it was optimized to compile quickly, not to produce code that runs as quickly. For example, it doesn't make the best use of the fastest form of memory available to the machine (registers) and spills more data into the stack (which uses anything from L1 to DRAM depending on the access patterns). Because it doesn't bother to get so fancy with register allocation, Tiny-C can compile code quite quickly, but the resulting code isn't as efficient.
If you want a more in-depth answer, then you should study compiler design starting with the Dragon Book.
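For a concrete (if bytecode-level rather than native-code-level) taste of the "same result, different instructions" point, you can ask MRI to disassemble two equivalent expressions. A small sketch, assuming a current MRI where RubyVM::InstructionSequence is available:

    # Two expressions with the same result compile to different YARV
    # instructions (on current MRI, roughly opt_mult vs. opt_plus).
    puts RubyVM::InstructionSequence.compile("x = 3; x * 2").disasm
    puts RubyVM::InstructionSequence.compile("x = 3; x + x").disasm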
Though programs written in different languages are converted into machine code at the end of the day, different languages have different ways of expressing the same thing.
You can draw an analogy with human languages, e.g. the English sentence "I am coming home." translates into Chinese as "我回家了。". As you can see, the Chinese version is more concise, though that is not always the case; the same concept applies to programming languages.
So, in the case of programming languages, a piece of machine code X might be expressed in programming language A as 2X - X and in programming language B as X/2 + X/2. Executing X and 2X - X will produce the same result, but their performance won't be the same (this is a hypothetical example, but I hope it makes sense).
Basically, it is not guaranteed that programs with the same output written in different programming languages result in the same machine code; each is only guaranteed to be converted into machine code that gives the same output, and that is where the difference comes from.
But this will give you thorough info
Because 1) the compilers are written by different people so the machine code they generate is not the same, and 2) they make use of preexisting run-time libraries of routines to do math, input-output, memory management, and more, and those libraries are also not the same, for the same reason.
Some compilers do not generate machine code, because then the resulting code would not be portable to different machines, so instead they generate code for a fictitious general computer.
Then on any particular machine that code is either interpreted directly by an interpreter program, or it is translated into that machine's code, or a combination of these (look up just-in-time (JIT) compilers).
Supposedly Forth programs can be "compiled" but I don't see how that is true if they have words that are only evaluated at runtime. For example, there is the word DOES> which stores words for evaluation at runtime. If those words include an EVALUATE or INTERPRET word then there will be a runtime need for the dictionary.
To support such statements it would mean the entire word list (dictionary) would have to be embedded inside the program, essentially what interpreted programs (not compiled programs) do.
This would seem to prevent you from compiling small programs using Forth because the entire dictionary would have to be embedded in the program, even if you used only a fraction of the words in the dictionary.
Is this correct, or is there some way to compile Forth programs without embedding the dictionary? (maybe by not using runtime words at all ??)
Forth programs can be compiled with or without word headers. The headers include the word names (called "name space").
In the scenario you describe, where the program may include run-time evaluation calls such as EVALUATE, the headers will be needed.
The dictionary can be divided into three logically distinct parts: name space, code space, and data space. Code and data are needed for program execution, names are usually not.
A normal Forth program will usually not do any runtime evaluation. So in most cases, the names aren't needed in a compiled program.
The code after DOES> is compiled, so it's not evaluated at run time. It's executed at run time.
Even though names are included, they usually don't add much to program size.
Many Forths do have a way to leave out the names from a program. Some have a switch to remove word headers (the names). Others have cross compilers which keep the names in the host system during compile time, but generate target code without names.
No, the entire dictionary need not be embedded, nor compiled. All that needs to remain is the list of words actually used, and their parent words (& grandparents, etc.). And even the names of the words aren't necessary; the word locations are enough. Forth code compiled by such methods can be about as compact as it gets, rivaling or even surpassing assembly language in executable size.
Proof by example: Tom Almy's ForthCMP, an '80s-'90s MSDOS compiler that shrunk executable code way down. Its README says:
. Compiles Forth into machine code -- not interpreted.
. ForthCMP is written in Forth so that Forth code can be executed
during compilation, as is customary in Forth applications.
. Very fast -- ForthCMP compiles Forth code into an executable file
in a single pass.
. Generated code is extremely compact. Over 110 Forth "primitives"
are compiled in-line. ForthCMP performs constant expression
folding, strength reduction, register optimization, DO...LOOP
optimization, tail recursion, and various "peephole"
optimizations.
. Built-in assembler.
4C.COM runs under emulators like dosemu or dosbox.
A "Hello World" compiles into a 117 byte .COM file, a wc program compiles to a 3K .COM file (from 5K of source code). No dictionary or external libraries, (aside from standard MSDOS calls, i.e. the OS it runs on).
Forth can be a bear to get your head around from the outside because there is NO standard implementation of the language. Much of what people see is from the early days of Forth, when the author (Charles Moore) was still massaging his own thoughts. Or worse, homemade systems that people call Forth because they have a stack but are really not Forth.
So is Forth Interpreted or Compiled?
Short answer: both
Early years:
Forth had a text interpreter facing the programmer. So Interpreted: Check
But... The ':' character enabled the compiler which "compiled" the addresses of the words in the language so it was "compiled" but not as native machine code. It was lists of addresses where the code was in memory. The clever part was that those addresses could be run with a list "interpreter" that was only 2 or 3 instructions on most machines and a few more on an old 8 bit CPU. That meant it was still pretty fast and quite space efficient.
These systems are more image-based, so yes, the system goes along with your program, but some of those system kernels were 8K bytes for the entire run-time, including the compiler and interpreter. Not heavy lifting.
This is what most people think of as Forth. See JonesForth for a literate example. (This was called "threaded code" at the time, not to be confused with multi-threading)
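A rough Ruby sketch of that "list of addresses" idea, purely for illustration (Ruby lambdas stand in for machine-code primitives and an Array stands in for the address list; a real Forth inner interpreter is the two- or three-instruction routine mentioned above):

    # Threaded-code sketch: a "compiled" word is just a list of references to
    # other words; the inner interpreter walks that list.
    stack = []

    # Primitive words: plain chunks of executable code.
    dup_ = -> { stack.push(stack.last) }
    star = -> { b, a = stack.pop, stack.pop; stack.push(a * b) }
    dot  = -> { print "#{stack.pop} " }

    # The compiled word SQUARE is nothing but the list [dup_, star].
    square = [dup_, star]

    # Inner interpreter: run primitives directly, descend into word lists.
    run = lambda do |word|
      if word.is_a?(Array)
        word.each { |w| run.call(w) }  # a word list: walk it
      else
        word.call                      # a primitive: execute it
      end
    end

    stack.push(7)
    run.call([square, dot])  # prints "49"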
1990ish
Forth gurus and Chuck Moore began to realize that a Forth language primitive could be as little as one machine instruction on modern machines, so why not just compile the instruction rather than the address. This became very useful with 32-bit machines, since the address was sometimes bigger than the instruction. They could then replace the little 3-instruction interpreter with the native CALL/Return instructions of the processor. This was called subroutine threading. The front-end interpreter did not disappear; it simply kicked off native-code subroutines.
Today
Commercial Forth systems generate native code, inline many/most primitives and do many of the other optimization tricks you see in modern compilers.
They still have an interpreter facing the programmer. :-)
You can also buy (or build) Forth cross-compilers that create standalone executables for different CPUs that include multi-tasking, TCP/IP stacks and guess what, that text interpreter can be compiled into the executable as an option for remote debugging and configuration if you want it.
So is Forth Interpreted or Compiled? Still both.
You are right that a program that executes INTERPRET (EVALUATE, LOAD, INCLUDE etc.) is obliged to have a dictionary. That is hardly a disadvantage, because even a 64-bit executable is merely 50 K for Linux or MS-Windows. A modern single-board computer like the MSP430 can have the whole dictionary in flash memory. See ciforth and noforth respectively. Then there is scripting. If you use Forth as a scripting language, it is similar to Perl or Python: the script is small, and doesn't contain the whole language. It does require, though, that the language is installed on your computer.
In the case of really small computers you can resort to cross compiling, or to using an umbilical Forth where the dictionary sits on a host computer that communicates with and programs the target over a serial line. These are special techniques that are normally not needed. You can't use INTERPRETing code in those cases on the SBC, because obviously there is no dictionary there.
Note: mentioning the DOES> instruction doesn't serve to make the question clearer. I recommend that you edit this out.
In the chosen answer for this question about Blue Ruby, Chuck says:
All of the current Ruby implementations are compiled to bytecode. Contrary to SAP's claims, as of Ruby 1.9, MRI itself includes a bytecode compiler, though the ability to save the compiled bytecode to disk disappeared somewhere in the process of merging the YARV virtual machine. JRuby is compiled into Java .class files. I don't have a lot of details on MagLev, but it seems safe to say it will take that road as well.
I'm confused about this compilation/interpretation issue with respect to Ruby.
I learned that Ruby is an interpreted language and that's why when I save changes to my Ruby files I don't need to re-build the project.
But if all of the Ruby implementations now are compiled, is it still fair to say that Ruby is an interpreted language? Or am I misunderstanding something?
Nearly every language is "compiled" nowadays, if you count bytecode as being compiled. Even Emacs Lisp is compiled. Ruby was a special case because until recently, it wasn't compiled into bytecode.
I think you're right to question the utility of characterizing languages as "compiled" vs. "interpreted." One useful distinction, though, is whether the language creates machine code (e.g. x86 assembler) directly from user code. C, C++, many Lisps, and Java with JIT enabled do, but Ruby, Python, and Perl do not.
People who don't know better will call any language that has a separate manual compilation step "compiled" and ones that don't "interpreted."
Yes, Ruby's still an interpreted language, or more precisely, Matz's Ruby Interpreter (MRI), which is what people usually talk about when they talk about Ruby, is still an interpreter. The compilation step is simply there to reduce the code to something that's faster to execute than interpreting and reinterpreting the same code time after time.
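As a side note on the "ability to save the compiled bytecode to disk" point in the quote above: on reasonably recent MRI versions (2.3 and later, if I remember correctly), the instruction sequences can be serialized and reloaded. A minimal sketch, assuming RubyVM::InstructionSequence is available:

    # MRI-specific: serialize YARV bytecode and load it back, which is as close
    # to "saving the compiled form" as stock MRI gets.
    iseq = RubyVM::InstructionSequence.compile("puts 1 + 2")
    blob = iseq.to_binary                                    # String; could be written to disk
    RubyVM::InstructionSequence.load_from_binary(blob).eval  # prints 3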
A subtle question indeed...
It used to be that "interpreted" languages were parsed and transformed into an intermediate form which was faster to execute, but the "machine" executing them was a pretty language-specific program. "Compiled" languages were translated instead into the machine code instructions supported by the computer on which they were run. An early distinction was very basic--static vs. dynamic scope. In a statically scoped language, a variable reference could pretty much be resolved to a memory address in a few machine instructions--you knew exactly where in the calling frame the variable was. In dynamically scoped languages you had to search (up an A-list or up a calling frame) for the reference. With the advent of object-oriented programming, the non-immediate nature of a reference expanded to many more concepts--classes (types), methods (functions), even syntactical interpretation (embedded DSLs like regex).
The distinction, in fact, going back to maybe the late 70's was not so much between compiled and interpreted languages, but whether they were run in a compiled or interpreted environment.
For example, Pascal (the first high-level language I studied) ran at UC Berkeley first on Bill Joy's pxp interpreter, and later on the compiler he wrote pcc. Same language, available in both compiled and interpreted environments.
Some languages are more dynamic than others, the meaning of something--a type, a method, a variable--is dependent on the run-time environment. This means that compiled or not there is substantial run-time mechanism associated with executing a program. Forth, Smalltalk, NeWs, Lisp, all were examples of this. Initially, these languages required so much mechanism to execute (versus a C or a Fortran) that they were a natural for interpretation.
Even before Java, there were attempts to speed up execution of complex, dynamic languages with tricks and techniques which became threaded compilation, just-in-time compilation, and so on.
I think it was Java, though, which was the first widespread language that really muddied the compiler/interpreter gap, ironically not so that it would run faster (though, that too) but so that it would run everywhere. By defining their own machine language and "machine" (the Java bytecode and VM), Java attempted to become a language compiled into something close to any basic machine, but not actually any real machine.
Modern languages marry all these innovations. Some have the dynamic, open-ended, you-don't-know-what-you-get-until-runtime nature of traditional "interpreted" languages (Ruby, Lisp, Smalltalk, Python, Perl(!)), some try to have the rigor of specification allowing deep type-based static error detection of traditional compiled languages (Java, Scala). All compile to actual machine-independent representations (JVM) to get write-once, run-anywhere semantics.
So, compiled vs. interpreted? Best of both, I'd say. All the code's around in source (with documentation), change anything and the effect is immediate, simple operations run almost as fast as the hardware can do them, complex ones are supported and fast enough, hardware and memory models are consistent across platforms.
The bigger polemic in languages today is probably whether they are statically or dynamically typed, which is to say not how fast will they run, but will the errors be found by the compiler beforehand (at the cost of the programmer having to specify pretty complex typing information) or will the errors come up in testing and production.
You can run Ruby programs interactively using irb, the Interactive Ruby Shell. While it may generate intermediate bytecode, it's certainly not a "compiler" in the traditional sense.
A compiled language is generally compiled into machine code, as opposed to just byte code. Some byte code generators can actually further compile the byte code into machine code though.
Byte code itself is just an intermediate step between the literal code written by the user and the virtual machine, it still needs to be interpreted by the virtual machine though (as it's done with Java in a JVM and PHP with an opcode cache).
This is possibly a little off topic but...
IronRuby is a .NET-based implementation of Ruby and is therefore usually compiled to byte code and then JIT-compiled to machine language at runtime (i.e. not interpreted). Also (at least with other .NET languages, so I assume with Ruby) ngen can be used to generate a compiled native binary ahead of time, so that's effectively a machine-code-compiled version of Ruby code.
According to information I got from RubyConf 2011 in Shanghai, Matz is developing 'MRuby' (which stands for Matz's Ruby) targeting embedded devices. Matz said that MRuby will provide the ability to compile Ruby code into machine code to boost speed and decrease the usage of the (limited) resources on embedded devices. So there are various kinds of Ruby implementations, and definitely not all of them are just interpreted at runtime.