I'm going through Pat Shaughnessy's Ruby Under a Microscope and trying to supplement it with up-to-date knowledge of how Ruby executes your program. Full disclosure: I have not completed the book yet, but I am trying to understand as I go along. At a high level, this is what I understand:
Ruby Code -> Lexical Analysis -> Tokens
Tokens -> Parser -> AST Nodes
AST Node -> Compiler -> YARV Instructions (bytecode)
YARV Instructions -> YARV Interpreter -> ???
??? -> ??? -> Machine Language
My question is what is the output (???) of the YARV Interpreter? Where do these instructions live? As well, what then are the steps necessary to then get that into Machine Language?
If someone could help point me in the right direction or if I have missed anything I would appreciate it!
I've tried reading numerous articles online, but they don't really expand on what happens after the YARV instructions are compiled. I understand that the grammar rules used to produce YARV instructions live in the main Ruby repo, but then what comes next?
My question is what is the output (???) of the YARV Interpreter?
There is no output. The interpreter doesn't generate output. It interprets (another word is "executes") the code.
More precisely: the output of the interpreter is the output of the program that is being run by the interpreter. So, if you write a program that is supposed to print "Hello, World" to the console, then the output of the interpreter running that program will be to print "Hello, World" to the console.
Where do these instructions live?
In RAM.
As well, what then are the steps necessary to then get that into Machine Language?
There are none. An interpreter interprets. It doesn't generate code. Something that translates code from one language to another is called a "compiler".
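To make the stages concrete, here is a minimal sketch that walks one line of Ruby through them from inside Ruby itself. It assumes MRI (CRuby), since RubyVM::InstructionSequence is specific to that implementation, and it uses the Ripper library from the standard library for the lexing and parsing stages. The source string is just an illustration.

```ruby
require "ripper"

source = "puts 1 + 2"

# Stage 1: lexical analysis turns the source text into tokens.
tokens = Ripper.lex(source)

# Stage 2: parsing turns the tokens into a tree (shown here as an S-expression).
tree = Ripper.sexp(source)

# Stage 3: compilation turns the program into YARV instructions.
# The resulting object lives in memory; nothing is written to disk.
iseq = RubyVM::InstructionSequence.compile(source)
listing = iseq.disasm # human-readable listing of the instructions

# Stage 4: the interpreter executes the instructions. The only "output"
# is whatever the program itself does, which here is printing to the console.
iseq.eval
```

Printing tokens, tree and listing with pp or puts shows what each stage produced. Note that no step here emits machine code for the Ruby program.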
Firstly, I want you to check the best answer over here.
Compiled vs. Interpreted Languages
As you can see, it says compiled languages are faster. However, what I take for granted is that a compiler takes the whole source code and compiles it to machine code, which is then executed. An interpreter takes one statement at a time, translates it to machine code or virtual machine code, then executes it immediately. So we get the output on the fly, at run time.
Then aren't interpreted languages faster than compiled languages?
You are trying to compare "code compiling" with "code interpreting".
"Code compiling" doesn't execute the code. It only creates a binary, or platform-independent code, which can be run over and over again with no recompilation, or, as in Java, with a minimal compilation step that has much less overhead than interpreting.
"Code interpreting" translates the code line by line in memory and executes it on the fly.
So compiled languages are faster in execution, because no compilation is required at execution time, whereas in interpreted languages every execution step is preceded by a translation step, every time, which makes them slow.
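As a rough illustration of "translate once, run many times" versus "translate before every run", here is a small sketch in Ruby. It assumes MRI, because it relies on RubyVM::InstructionSequence, and the expression being evaluated is made up for the example. It is not a benchmark; it only shows where the translation step happens.

```ruby
source = "(1..10).reduce(:+)"

# Translate once up front, then execute the stored instructions repeatedly.
iseq = RubyVM::InstructionSequence.compile(source)
compiled_results = 3.times.map { iseq.eval }

# Translate the source text again before every single execution.
reparsed_results = 3.times.map { eval(source) }
```

Both approaches compute the same values. The second one simply repeats the parsing and translation work on every run, which is the overhead the answer above refers to.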
I'm trying to dump a backtrace of a passenger process in gdb. I know I should just execute
attach <PID>
call rb_backtrace()
after starting gdb, but I can't figure out where the output is going. I've looked at the Rails production logs (set to info) and the nginx logs in /var/logs/nginx, but I can't find the output. Any ideas?
I don't know the answer on the ruby end -- I'd guess it is going to the ruby process' stdout or stderr -- but gdb recently got a new feature that is designed to help with this scenario.
The new feature is called "frame filters" and it lets you change how stack traces are presented by writing simple Python scripts that examine the state of the inferior process. For example, you could write such a script that understands the Ruby interpreter, and then have gdb's "bt" automatically interleave interpreted (Ruby) frames with C frames.
For more information, start here and read the next few nodes: http://sourceware.org/gdb/current/onlinedocs/gdb/Frame-Filter-API.html#Frame-Filter-API
I'd like to see this feature be adopted by the various interpreter projects. There's been pretty good adoption of pretty printing, and I think this is a logical next step.
This is homework. Tips only, no exact answers please.
I have a compiled program (no source code) that takes in command line arguments. There is a correct sequence of a given number of command line arguments that will make the program print out "Success." Given the wrong arguments it will print out "Failure."
One thing that is confusing me is that the instructions mention two system tools (doesn't name them) which will help in figuring out the correct arguments. The only tool I'm familiar with (unless I'm overlooking something) is GDB so I believe I am missing a critical component of this challenge.
The challenge is to figure out the correct arguments. So far I've run the program in GDB and set a breakpoint at main but I really don't know where to go from there. Any pro tips?
Are you sure you have to debug it? It would be easier to disassemble it. When you disassemble it, look for cmp instructions.
There exist not only tools that disassemble x86 binaries into assembly listings, but also some that attempt to show a more high-level or readable listing. Try googling and see what you find. I'd be specific, but that would be counterproductive if your job is to learn some reverse engineering skills.
It is possible that the code is something like this: If Arg(1)='FOO' then print "Success". So you might not need to disassemble at all. Instead you might only need a tool that dumps out every string in the executable that looks like a sequence of ASCII characters. Many utilities exist that will do this, even if the sequence you are supposed to input is not made of characters easily typed on a keyboard. If the program has been very carefully constructed, the author won't have left "FOO" in plain sight if that was the "password", but will have tried to obscure it somewhat.
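To show what such a string-dumping tool does, without giving away anything about your particular binary, here is a minimal sketch in Ruby. The function name printable_strings and the sample bytes are made up for illustration; the sample is a hypothetical stand-in for a binary's contents, not real program data.

```ruby
# Return every run of at least min_len printable ASCII characters in data.
def printable_strings(data, min_len = 4)
  # Work on a binary copy so arbitrary bytes never raise encoding errors.
  data.b.scan(/[\x20-\x7E]{#{min_len},}/n)
end

# Hypothetical stand-in for the bytes of an executable.
sample = "\x7FELF\x00\x01FOO\x00Success.\x00\x02ab\x00Failure.\x00".b

found = printable_strings(sample, 3)
# found is ["ELF", "FOO", "Success.", "Failure."]
```

For a real file you would read the bytes with File.binread and pass them in. Runs shorter than min_len (like "ab" above) are dropped, which keeps most random noise out of the listing.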
Personally I would start with an ltrace of the program with any arbitrary set of arguments. I'd then use the strings command and guess from that what some of the hidden argument literals might be. (Let's assume, for the moment, that the professor hasn't encrypted or obfuscated the strings and that they appear in the binary as literals.) Then try again with one or two of them (or the requisite number, if you know it).
If you're lucky the program was compiled and provided to you without running strip. In that case you might have the symbol table to help. Then you could try single stepping through the program (read the gdb manuals). It might be tedious, but there are ways to set a breakpoint and tell the debugger to run through some function call (such as any from the standard libraries) and stop upon return. Do this repeatedly: identify where it's calling into standard or external libraries, set a breakpoint for the next instruction after the return, let gdb run the process through the call, and then inspect what the code is doing besides that.
Coupled with the ltrace it should be fairly easy to see the sequencing of the strcmp() (or similar) calls. As you see the string against which your input is being compared you can break out of the whole process and re-invoke the gdb and the program with that one argument, trace through 'til the next one and so on. Or you might learn some more advanced gdb tricks and actually modify your argument vector and restart main() from scratch.
It actually sounds like fun and I might have my wife whip up a simple binary for me to try this on. I might also create a little program to generate binaries of this sort. I'm thinking of a little #include in the sources which provides the "passphrase" of arguments, and a makefile that selects three to five words from /usr/dict/words, generates that #include file from a template, then compiles the binary using that sequence.
UPDATE: Problem located in my related question - Nokogiri performance problem
I am having a serious problem with my program. After the program reaches its last statement, Aptana Studio shows the program is still running even though the last line has been evaluated. The Ruby process (after the last line of the script) keeps running with 100% CPU usage and ends after several seconds (15-30 maybe). I am trying to at least see where the problem is, but after a long time I am still at the beginning. So the question is: what could cause this problem, and how can I at least see where the problem is? What are my options? Some additional information:
Aptana debug mode: After the last line, this will show in the Debug window:
<terminated, exit value: 0>path/to/ruby
But the Ruby process is still running and using 100% CPU.
I was trying to use gdb to profile Ruby process itself, but ended up with nothing using the method described here: Profiling using gdb. I am using Debian squeeze 64-bit and I tried both versions of the script (8,12 > 16,24). When I tried to get some stack info I just got this:
Program received signal SIGSEGV, Segmentation fault.
0x00007f20539a80b8 in ?? () from /lib/libc.so.6
/home/giron/programovani/gdb_init.sh:1: Error in sourced command file:
The program being debugged was signaled while in a function called from GDB.
GDB remains in the frame where the signal was received.
To change this behavior use "set unwindonsignal on".
Evaluation of the expression containing the function
(backtrace) will be abandoned.
When the function is done executing, GDB will silently stop.
After I quit gdb, the following output shows up in the Aptana console (but this may be absolutely useless; gdb probably caused it, I don't know):
/home/giron/Aptana Studio 3 Workspace/RedisXmlConcept/bin/main.rb: [BUG] Segmentation fault
ruby 1.9.2p290 (2011-07-09 revision 32553) [x86_64-linux]
-- control frame ----------
c:0001 p:0000 s:0002 b:0002 l:000f68 d:000f68 TOP
---------------------------
-- C level backtrace information -------------------------------------------
/home/giron/.rvm/rubies/ruby-1.9.2-p290/lib/libruby.so.1.9(rb_vm_bugreport+0x5f)[0x7f205488216f]
/home/giron/.rvm/rubies/ruby-1.9.2-p290/lib/libruby.so.1.9(+0x63274) [0x7f205476a274]
/home/giron/.rvm/rubies/ruby-1.9.2-p290/lib/libruby.so.1.9(rb_bug+0xb3) [0x7f205476a413]
/home/giron/.rvm/rubies/ruby-1.9.2-p290/lib/libruby.so.1.9(+0x10c215) [0x7f2054813215]
/lib/libpthread.so.0(+0xeff0) [0x7f20544f9ff0]
/lib/libc.so.6(+0xe40b8) [0x7f20539a80b8]
/lib/libgcc_s.so.1(_Unwind_Backtrace+0x49) [0x7f2050d5b599]
/lib/libc.so.6(backtrace+0x4e) [0x7f20539a81ae]
/home/giron/.rvm/rubies/ruby-1.9.2-p290/bin/ruby(_start+0) [0x400890]
[NOTE]
You may have encountered a bug in the Ruby interpreter or extension libraries.
Bug reports are welcome.
For details: http://www.ruby-lang.org/bugreport.html
Just to be sure that I have described the problem well, here is the last line of code (before this, the Nokogiri parsing and the work with the Redis database are done):
puts "End"
End is printed out, and after this the Ruby process consumes 100% CPU for several seconds.
This question is related to my previous one here: Nokogiri performance problem, where there are some more code snippets, but since I am focusing on a different approach here (profiling Ruby), I have created a new question.
Thank you in advance for any tips, I am pretty much clueless right now.
I was trying to use gdb to profile Ruby process itself
Don't do that. Calling backtrace may not be safe in the context you are executing in, and (apparently) causes your program to SIGSEGV.
Instead, just attach gdb to the Ruby process and execute the thread apply all where command. Update your question with the output, and you may get a better answer.
I have recently started to learn Ruby. I know that Ruby is an interpreted language (even though "every" language is, since it is interpreted by the CPU as machine code). But how does the ruby interpreter convert the code written in Ruby to machine code? I have read that the interpreter does not read the source code, but byte code, however I never have to compile as I do in Java. So, is this yet another thing that Ruby does for you? And if it does, does it generate the byte code at runtime? Because you never get a .class file as you do in Java. And on top of it all I read about Just-In-Time compilers that obviously do something to the byte code so it runs faster.
If the above is the case does the interpreter first scan through all of the source code, convert it into byte code and then compiles it another time with JIT at runtime?
And last I AM NOT looking for an answer with the performance aspect of this, I just want to know how it is processed, which stages it goes through and in what time it does so.
Thanks for your time.
I am using this interpreter http://www.ruby-lang.org/en/
But how does the ruby interpreter convert the code written in Ruby to machine code?
It doesn't, at least not all the implementations.
AFAIK only Rubinius is trying to do what you describe, that is, compiling to machine code.
I have read that the interpreter does not read the source code, but byte code, however I never have to compile as I do in Java. So, is this yet another thing that Ruby does for you?
Yes
And if it does, does it generate the byte code at runtime?
Yep, pretty much. And it keeps it in memory. The tradeoff is that the next time the program runs, it has to read -> translate -> execute all over again.
If the above is the case does the interpreter first scan through all of the source code, convert it into byte code and then compiles it another time with JIT at runtime?
Not all the source code, just what it needs. Then yes, it creates a bytecode representation and keeps it in memory, but that is not necessarily compiled to machine code.
The standard implementation of Ruby 1.8 uses an interpreter called MRI (Matz's Ruby Interpreter). MRI is itself a program, compiled to machine code, that:
Reads the text files into a data structure.
Follows the instructions in the data structure to decide what to do.
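Those two steps can be sketched with a toy interpreter. To be clear, this is a made-up mini language (push, add, mul, print) and not how MRI represents Ruby programs internally; the method names read_program and run are hypothetical. It only illustrates the "read text into a data structure, then follow the data structure" idea, with no machine code generated for the interpreted program.

```ruby
# Step 1: read the program text into a data structure
# (here, an array of [operation, argument] pairs).
def read_program(text)
  text.lines.map(&:strip).reject(&:empty?).map do |line|
    op, arg = line.split
    [op.to_sym, arg && Integer(arg)]
  end
end

# Step 2: follow the instructions in that data structure.
def run(program)
  stack = []
  output = []
  program.each do |op, arg|
    case op
    when :push  then stack.push(arg)
    when :add   then stack.push(stack.pop + stack.pop)
    when :mul   then stack.push(stack.pop * stack.pop)
    when :print then output << stack.last
    else raise ArgumentError, "unknown instruction: #{op}"
    end
  end
  output
end

program = read_program("push 2\npush 3\nadd\npush 4\nmul\nprint\n")
result = run(program)
# result is [20], i.e. (2 + 3) * 4
```

The interpreter (the run method) is what the CPU actually executes. The toy program is only ever data that run inspects, which is why nothing is produced as "output" beyond what the program itself asks for.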