How is ruby code executed - ruby

I have recently started to learn Ruby. I know that Ruby is a interpreted language(even though "every" language is since it is interpreted by the CPU as machine code). But how does the ruby interpreter convert the code written in Ruby to machine code? I have read that the interpreter do not read the source code, but byte code, however I do never have to compile as I do in Java. So, is this yet another thing that Ruby does for you? And if it does, does it generate the byte code at runtime? Because you never get a .class file as you do in Java. And on top of it all I read about Just-In-Time compilators that obviously does something to the byte code so it runs faster.
If the above is the case does the interpreter first scan through all of the source code, convert it into byte code and then compiles it another time with JIT at runtime?
And last I AM NOT looking for an answer with the performance aspect of this, I just want to know how it is processed, which stages it goes through and in what time it does so.
Thanks for your time.
I am using this interpeter http://www.ruby-lang.org/en/

But how does the ruby interpreter convert the code written in Ruby to machine code?
It doesn't, at least not all the implementations.
Afaik only Rubinius is trying to do what you describe, that's compiling to machine code.
I have read that the interpreter do not read the source code, but byte code, however I do never have to compile as I do in Java. So, is this yet another thing that Ruby does for you?
Yes
And if it does, does it generate the byte code at runtime?
Yeap, pretty much. And keeps it in memory. The tradeof is the next time it has to read->translate->execute all over again.
If the above is the case does the interpreter first scan through all of the source code, convert it into byte code and then compiles it another time with JIT at runtime?
Not all the source code, just what it needs. Then yes, create a bytecode representation keeps it in memory, and not necessarily that is compiled to machine code.

The standard implementation of Ruby1.8 uses an interpreter called MRI (Matz's Ruby Interpreter). This is a program that is compiled to machine code that:
Reads the text files into a data structure.
Follows the instructions in the data structure to decide what to do

Related

running external program in TCL

After developing an elaborate TCL code to do smoothing based on Gabriel Taubin's smoothing without shape shrinkage, the code runs extremely slow. This is likely due to the size of unstructured grid I am smoothing. I have to use TCL because the grid generator I am using is Pointwise and Pointwise's "macro language" is TCL based. I'm still a bit new to this, but is there a way to run an external code from TCL where TCL sends the data to the software, the software runs the smoothing operation, and output is sent back to TCL to update the internal data inside the Pointwise grid generation tool? I will be writing the smoothing tool in another language which is significantly faster.
There are a number of options to deal with code that "runs extremely show". I would start with determining how fast it must run. Are we talking milliseconds, seconds, minutes, hours or days. Next it is necessary to determine which part is slow. The time command is useful here.
But assuming you have decided that more performance is necessary and you have some metrics for your current program so you will know if you are improving, here are some things to try:
Try to improve the existing code. If you are using the expr command, make sure your expressions are given to the command as a single argument enclosed in braces. Beginners sometimes forget this and the improvement can be substantial.
Use the critcl package to code parts of the program in "C". Critcl allows you to put "C" code directly into your Tcl program and have that code pulled out, compiled and loaded into your program.
Write a traditional "C" based Tcl extension. Tcl is very extensible and has a clean API for building extensions. There is sample code for extensions and source to many extensions is readily available.
Write a program to do the time consuming part of the job and execute it as a separate process and obtain the output back into your Tcl script. This is where the exec command comes in useful. Presumably you will have to write data out to some where the program can get it and read the output of the program back into your Tcl script. If you want to get fancy you can do two-way communications across a localhost TCP port. The set up in Tcl is quite simple. The "C" code in a program to do it is a bit more tedious, but many examples exist out on the Internet.
Which option to choose depends very much on how much improvement is required and the amount of code that must be improved. You haven't given us much idea what those things are in your case, so all I can offer is rather vague general solutions.
For a loadable module, you can write a Tcl extension. An example is here:
File Last Modified Time with Milliseconds Precision
Alternatively, just write your program to take input from a file. Have Tcl write the input data to the file, run the program, then collect the output from the external program.

How does an interpreter translate a for loop?

I understand that interpreter translates your source code into machine code line by line, and stops when it encounters an error.
I am wondering, what does an interpreter do when you give it for loops.
E.g. I have the following (MATLAB) code:
for i = 1:10000
pi*pi
end
Does it really run through and translate the for loop line by line 10000 times?
With compilers is the machine code shorter, consisting only of a set of statements that include a go to control statement that is in effect for 10000 iterations.
I'm sorry if this doesn't make sense, I don't have very good knowledge of the underlying bolts and nuts of programming but I want to quickly understand.
I understand that interpreter translates your source code into machine code line by line, and stops when it encounters an error.
This is wrong. There are many different types of interpreters, but very few execute code line-by-line and those that do (mostly shells) don't generate machine code at all.
In general there are four more-or-less common ways that code can be interpreted:
Statement-by-statement execution
This is presumably what you meant by line-by-line, except that usually semicolons can be used as an alternative for line breaks. As I said, this is pretty much only done by shells.
How this works is that a single statement is parsed at a time. That is the parser reads tokens until the statement is finished. For simple statements that's until a statement terminator is reached, e.g. the end of line or a semicolon. For other statements (such as if-statements, for-loops or while-loops) it's until the corresponding terminator (endif, fi, etc.) is found. Either way the parser returns some kind of representation of the statement (some type of AST usually), which is then executed. No machine code is generated at any point.
This approach has the unusual property that syntax error at the end of the file won't prevent the beginning of the file from being executed. However everything is still parsed at most once and the bodies of if-statements etc. will still be parsed even if the condition is false (so a syntax error inside an if false will still abort the script).
AST-walking interpretation
Here the whole file is parsed at once and an AST is generated from it. The interpreter then simply walks the AST to execute the program. In principle this is the same as above, except that the entire program is parsed first.
Bytecode interpretation
Again the entire file is parsed at once, but instead of an AST-type structure the parser generates some bytecode format. This bytecode is then executed instruction-by-instruction.
JIT-compilation
This is the only variation that actually generates machine code. There are two variations of this:
Generate machine code for all functions before they're called. This might mean translating the whole file as soon as it's loaded or translating each function right before it's called. After the code has been generated, execute it.
Start by interpreting the bytecode and then JIT-compile specific functions or code paths individually once they have been executed a number of times. This allows us to make certain optimizations based on usage data that has been collected during interpretation. It also means we don't pay the overhead of compilation on functions that aren't called a lot. Some implementations (specifically some JavaScript engines) also recompile already-JITed code to optimize based on newly gathered usage data.
So in summary: The overlap between implementations that execute the code line-by-line (or rather statement-by-statement) and implementations that generate machine code should be pretty close to zero. And those implementations that do go statement-by-statement still only parse the code once.

Is compiling code really faster than interpreting code?

Firstly, I want you to check the best answer over here.
Compiled vs. Interpreted Languages
As you can see, it says, compiled languages are faster. However, what I know for granted is that compilers take the whole source code, compiles it to machine code, then executes it. Interpreter takes one statement at a time, translates it to machine code or virtual machine code, then executes it immediately. So we get the output on the fly, during the run-time.
Then aren't interpreted languages faster than compiled languages?
You are trying to compare "Code Compiling" vs "Code Interpreting"
"Code Compiling" doesn't execute the code it only creates a binary or platform independent code which can be run over and over again with no need of re-compilation or minimal compilation which has much less overhead than interpreting like in Java
"Code Interpreting" - compiles the code line by line in memory and executes it on the fly
So compiled languages are faster in execution as at the time of execution no compilation is required but in interpreted languages each execution step is preceded by a compilation step every time, making it slow.

How can I have different states of expansion in LuaTex/LuaLaTex for debugging for instance?

I am preparing LaTex/Tex fragments with lua programs getting information from SQL request to a database (LuaSQL).
I wish I could see intermediate states of expansion for debugging purpose but also to control what has been brought from SQL requests and lua processings.
My dream would be for instance to see the code of my Latex page as if I had typed it myself manually with all the information given by the SQL requests and the lua processing.
I would then have a first processing with my lua programs and SQL request to build a valid and readable luaLatex code I could amend if necessary ; then I would compile again that file to obtain the wanted pdf document.
Today, I use a lua development environment, ZeroBrane Studio, to execute and test the lua chunk before I integrate it in my luaLatex code. For instance:
my lua chunk :
for k,v in pairs(data.param) do
print('\\def\\'..k..'{'..data.param[k]..'}')
end
lua print out:
\gdef\pB{0.7}
\gdef\pAinterB{0.5}
\gdef\pA{0.4}
\gdef\pAuB{0.6}
luaLaTex code :
nothing visible ! except that now I can use \pA for instance in the code
my dream would be to have, in the luaLatex code :
\gdef\pB{0.7}
\gdef\pAinterB{0.5}
\gdef\pA{0.4}
\gdef\pAuB{0.6}
May be a solution would stand in the use of the expl3 extension ? But since I am not familiar with it nor with the precise Tex expansion process, I prefer to ask you experts before I invest heavily in the understanding of this module.
Addition :
Pushing forward the reflection, a consequence would be that from a Latex code I get a Latex code instead of a pdf file for instance. This implies that we use only the first steps of the four TeX processors as described by Veijkhout in "TeX by Topic" : the input processor, the expansion processor (but with a controlled depth of expansion), not the execution processor nor the visual processor. Moreover, there would be need to show the intermediate state, that means a new processor able to show tokens back into readable strings and correct Tex/Latex code that can be processed later.
Unless somebody has already done or seen something like that, I feel that my wish may be unfeasible in the short and middle terms. What is your feeling, should I abandon any hope ?

Debugging a program without source code (Unix / GDB)

This is homework. Tips only, no exact answers please.
I have a compiled program (no source code) that takes in command line arguments. There is a correct sequence of a given number of command line arguments that will make the program print out "Success." Given the wrong arguments it will print out "Failure."
One thing that is confusing me is that the instructions mention two system tools (doesn't name them) which will help in figuring out the correct arguments. The only tool I'm familiar with (unless I'm overlooking something) is GDB so I believe I am missing a critical component of this challenge.
The challenge is to figure out the correct arguments. So far I've run the program in GDB and set a breakpoint at main but I really don't know where to go from there. Any pro tips?
Are you sure you have to debug it? It would be easier to disassemble it. When you disassemble it look for cmp
There exists not only tools to decompile X86 binaries to Assembler code listings, but also some which attempt to show a more high level or readable listing. Try googling and see what you find. I'd be specific, but then, that would be counterproductive if your job is to learn some reverse engineering skills.
It is possible that the code is something like this: If Arg(1)='FOO' then print "Success". So you might not need to disassemble at all. Instead you only might need to find a tool which dumps out all strings in the executable that look like sequences of ASCII characters. If the sequence you are supposed to input is not in the set of characters easily input from the keyboard, there exist many utilities that will do this. If the program has been very carefully constructed, the author won't have left "FOO" if that was the "password" in plain sight, but will have tried to obscure it somewhat.
Personally I would start with an ltrace of the program with any arbitrary set of arguments. I'd then use the strings command and guess from that what some of the hidden argument literals might be. (Let's assume, for the moment, that the professor hasn't encrypted or obfuscated the strings and that they appear in the binary as literals). Then try again with one or two (or the requisite number, if number).
If you're lucky the program was compiled and provided to you without running strip. In that case you might have the symbol table to help. Then you could try single stepping through the program (read the gdb manuals). It might be tedious but there are ways to set a breakpoint and tell the debugger to run through some function call (such as any from the standard libraries) and stop upon return. Doing this repeatedly (identify where it's calling into standard or external libraries, set a breakpoint for the next instruction after the return, let gdb run the process through the call, and then inspect what the code is doing besides that.
Coupled with the ltrace it should be fairly easy to see the sequencing of the strcmp() (or similar) calls. As you see the string against which your input is being compared you can break out of the whole process and re-invoke the gdb and the program with that one argument, trace through 'til the next one and so on. Or you might learn some more advanced gdb tricks and actually modify your argument vector and restart main() from scratch.
It actually sounds like fun and I might have my wife whip up a simple binary for me to try this on. It might also create a little program to generate binaries of this sort. I'm thinking of a little #INCLUDE in the sources which provides the "passphrase" of arguments, and a make file that selects three to five words from /usr/dict/words, generates that #INCLUDE file from a template, then compiles the binary using that sequence.

Resources