Trace Ruby calls to "optimized" method calls with TracePoint - ruby

I am trying to trace all method calls in a Ruby program using TracePoint. It works well until I hit a method call to an "optimized" method call.
Ruby has operators that are "optimized": instead of a generic method-call instruction, the compiler emits specialized YARV instructions that speed up calls such as greater than and less than. One of these optimized instructions is opt_div, which handles the / operator.
You can see this by running the following directly in Ruby:
code = <<END
1 / 1
END
puts RubyVM::InstructionSequence.compile(code).disasm
# == disasm: <RubyVM::InstructionSequence:<compiled>@<compiled>>==========
# 0000 trace 1 ( 1)
# 0002 putobject_OP_INT2FIX_O_1_C_
# 0003 putobject_OP_INT2FIX_O_1_C_
# 0004 opt_div <callinfo!mid:/, argc:1, ARGS_SIMPLE>
# 0006 leave
Here you see opt_div is used rather than opt_send_without_block.
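A minimal sketch for checking this yourself, assuming MRI (CRuby), where RubyVM::InstructionSequence exists: compile the same source with and without the specialized_instruction compile option (the option used in the answer below) and look for the opt_div mnemonic in the disassembly. The helper name uses_opt_div? is made up for this illustration, and the exact disassembly text varies between Ruby versions, so only the presence of opt_div is checked.

```ruby
# Hypothetical helper: does the bytecode compiled from `source` contain opt_div?
def uses_opt_div?(source, options = {})
  iseq = RubyVM::InstructionSequence.compile(source, "<sketch>", "<sketch>", 1, options)
  iseq.disasm.include?("opt_div")
end

# Variables instead of literals, so nothing can be folded at compile time.
source = "a = 6\nb = 3\na / b\n"

puts uses_opt_div?(source)                                 # default options
puts uses_opt_div?(source, specialized_instruction: false) # generic method call
```

With the option turned off, the division is compiled as an ordinary method call, which is what makes it visible to TracePoint.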
It appears you cannot trace these optimized method calls. For example:
trace = TracePoint.trace(:call, :c_call) do |tp|
tp.disable
puts "calling #{tp.defined_class}##{tp.method_id}"
tp.enable
end
trace.enable
1.div(1)
1 / 1
1.div(2)
You can see that 1.div is traced, but 1 / 1 is not:
calling TracePoint#enable
calling Fixnum#div
calling Fixnum#div
So my question is this: How can I trace all method calls including "optimized" method calls in Ruby (MRI)?

Per Koichi, you can disable the optimizations using:
RubyVM::InstructionSequence.compile_option = { specialized_instruction: false }
This will work for my cases, but I imagine that it will slow down execution.
One other caveat if you are trying this at home: you cannot set that compile_option in the same file, because by the time that line executes, the file has already been compiled. Instead, you need to execute this code before loading or requiring the file you are trying to trace.
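The ordering caveat can be sketched as follows. Assumptions: MRI, a throwaway file written with Tempfile stands in for the program you actually want to trace, and both :call and :c_call are traced as in the question. Because the option is global and affects every file compiled afterwards, the sketch changes it only around the load and then restores the previous settings.

```ruby
require "tempfile"

# Stand-in for the file you really want to trace (assumption for this sketch).
target = Tempfile.new(["target", ".rb"])
target.write("x = 6\ny = 3\nx / y\n")
target.close

seen = []
tracer = TracePoint.new(:call, :c_call) { |tp| seen << tp.method_id }

saved = RubyVM::InstructionSequence.compile_option
begin
  # Must run BEFORE the target file is compiled, i.e. before load/require.
  RubyVM::InstructionSequence.compile_option = { specialized_instruction: false }
  tracer.enable { load target.path }
ensure
  # The option is global, so put the previous settings back.
  RubyVM::InstructionSequence.compile_option = saved
  target.unlink
end

puts seen.include?(:/) ? "Integer#/ was traced" : "Integer#/ was not traced"
```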
You can also use this option in eval'd code:
iseq = RubyVM::InstructionSequence.compile(<<EOS, nil, nil, 1, specialized_instruction: false)
1 / 1
EOS
trace = TracePoint.trace(:call, :c_call) do |tp|
tp.disable
puts "calling #{tp.defined_class}##{tp.method_id}"
tp.enable
end
iseq.eval

Related

Is there an equivalent of shell scripting's xtrace option for Ruby?

When debugging shell scripts, I find it helpful to run with xtrace on:
-x xtrace Print commands and parameter assignments when they are executed, preceded by the value of PS4.
For instance:
$ set -x
$ s='Say again?'
+ s='Say again?'
# Other commands that might mess with the value of $s
$ echo $s
+ echo Say 'again?'
Say again?
I know that Ruby has interactive debuggers such as pry and byebug, but I'm looking for something that will be easy to turn on for logging automated scripts.
I did find an xtrace gem, but it has something to do with a PHP format.
I also see there is a Tracer class and a TracePoint class which do seem to provide a way to print statements as they are executed. But I haven't found any way to print the value of variables (rather than just the variable name):
$ ruby -r tracer trace.rb
#0:/usr/local/Cellar/ruby/2.4.1_1/lib/ruby/2.4.0/rubygems/core_ext/kernel_require.rb:55:Kernel:<: return gem_original_require(path)
#0:trace.rb:1::-: s='Say again?'
#0:trace.rb:2::-: puts s
Say again?
I'd like to have the penultimate line read:
#0:trace.rb:2::-: puts 'Say again?'
Is this possible? Or is there a better way with Ruby?
I was able to build a module that more or less does what I'm looking for:
require 'pry'
=begin
We are always one line behind because the value of assignment comes
_after_ the trace is called. Without this, every assignment would look
like:
x = 1 #=> {:x=>nil}
It would be nice if the trace happened after assignment, but what can
you do?
=end
module Xtrace
  # Only run the trace on the main file, not on require'd files.
  # Possible room for expansion later.
  @files = {$0 => Pry::Code.from_file($0)}
  def Xtrace.print_trace
    if @prev_line then
      if @files[@path] then
        line = @files[@path].around(@prev_line, 0).chomp
        # When calling a method, don't make it look like it's being defined.
        line.gsub!(/^\s*def\s*\b/, '') if @event == :call
        values = []
        @bind.local_variables.each do |v|
          values << {v => @bind.local_variable_get(v)} if line =~ /\b#{v}\b/
        end
        STDERR.printf "%5s: %s", @prev_line, line
        STDERR.printf " #=> %s", values.join(', ') unless values.empty?
        STDERR.printf "\n"
      end
    end
  end
  @xtrace = TracePoint.trace(:line, :call) do |tp|
    tp.disable
    @bind = tp.binding
    Xtrace.print_trace
    # Other than the binding, everything we need to print comes from the
    # previous trace call.
    @prev_line = tp.lineno
    @event = tp.event
    @path = tp.path
    tp.enable
  end
  # Need to print the trace one last time after the last line of code.
  at_exit do
    # It doesn't matter what we do in this last line. Any statement works.
    # Also, it's a bit inconvenient that the disable command is itself traced.
    @xtrace.disable
  end
end
If you put it in a file named xtrace.rb and put it in your library load path, you can begin tracing by adding require 'xtrace'. It prints the line number of each line and method call executed, the actual code, and the values of any local variables in the line. For a simple factorial function, the output might look like:
3: def factorial(n)
8: puts factorial(3)
3: factorial(n) #=> {:n=>3}
4: return 1 if n <= 1 #=> {:n=>3}
5: return n*factorial(n-1) #=> {:n=>2}
3: factorial(n) #=> {:n=>2}
4: return 1 if n <= 1 #=> {:n=>2}
5: return n*factorial(n-1) #=> {:n=>1}
3: factorial(n) #=> {:n=>1}
4: return 1 if n <= 1
6
For the moment, it only looks at local variables. It also only traces the executed file and not any loaded files. There's no way to enable or disable traces just yet. The trace begins when you require the module and ends when the execution does. Trace output goes to STDERR and the format is hardcoded.
If you use this module, watch out that you don't leak sensitive information such as passwords, API keys or PII.
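A dependency-free variation on the same idea, sketched under assumptions: it uses only TracePoint and Binding (no pry), traces only :line events in the file that calls it, and reads source text from disk when the file exists. The helper name xtrace is hypothetical and is not the module above. Like that module it inspects the binding, but it prints the state just before each line runs, so a variable assigned on a line shows up from the following line onwards.

```ruby
require "stringio"

# Hypothetical helper (not part of Ruby or of any gem): print every line run
# inside the block, followed by the local variables visible at that point.
def xtrace(io = $stderr, &block)
  target_path = caller_locations(1, 1).first.path
  sources = Hash.new do |cache, path|
    cache[path] = File.file?(path) ? File.readlines(path, chomp: true) : []
  end

  tracer = TracePoint.new(:line) do |tp|
    next unless tp.path == target_path # skip lines from other files

    code = sources[tp.path][tp.lineno - 1].to_s.strip
    b = tp.binding
    # :line fires *before* the line runs, so a variable assigned on this
    # line still shows its previous value (nil if not yet assigned).
    values = b.local_variables.to_h { |v| [v, b.local_variable_get(v)] }
    io.puts format("%5d: %s  #=> %p", tp.lineno, code, values)
  end
  tracer.enable(&block)
end

log = StringIO.new
xtrace(log) do
  s = "Say again?"
  s.upcase
end
$stderr.print log.string
```

The same warning about leaking secrets applies: every local value in scope is printed.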

Why does the Ruby debugger return different values than the code at run time?

See this simple Ruby class:
require 'byebug'
class Foo
def run
byebug
puts defined?(bar)
puts bar.inspect
bar = 'local string'
puts defined?(bar)
puts bar.inspect
end
def bar
'string from method'
end
end
Foo.new.run
When running this class the following behavior can be observed in the debugger's console:
$ ruby byebug.rb
[2, 11] in /../test.rb
2:
3: class Foo
4: def run
5: byebug
6:
=> 7: puts defined?(bar)
8: puts bar.inspect
9:
10: bar = 'local string'
11:
At the breakpoint the debugger returns the following values:
(byebug) defined?(bar)
"local-variable"
(byebug) bar.inspect
"nil"
Note that, although the debugger's breakpoint is at line #5, it already knows that a local variable bar will be defined in line #10, which shadows the method bar, so the debugger is no longer able to call the bar method. What it doesn't know at this point is that the string 'local string' will be assigned to bar, so the debugger returns nil for bar.
Let's continue with the original code in the Ruby file and look at its output:
(byebug) continue
method
"string from method"
local-variable
"local string"
At run time, in line #7 Ruby still knows that bar is a method, and it is still able to call it in line #8. Then line #10 actually defines the local variable that shadows the method with the same name, and therefore Ruby returns the expected values in lines #12 and #13.
Questions: Why does the debugger return different values than the original code? It seems like it is able to look into the future. Is this considered a feature or a bug? Is this behavior documented?
Whenever you drop into a debugging session, you're effectively executing an eval against the binding at that spot in the code. Here's a simpler bit of code that recreates the behavior that's driving you nuts:
def make_head_explode
puts "== Proof bar isn't defined"
puts defined?(bar) # => nil
puts "== But WTF?! It shows up in eval"
eval(<<~RUBY)
puts defined?(bar) # => 'local-variable'
puts bar.inspect # => nil
RUBY
bar = 1
puts "\n== Proof bar is now defined"
puts defined?(bar) # => 'local-variable'
puts bar.inspect # => 1
end
When the method make_head_explode is fed to the interpreter, it's compiled into YARV instructions, a local table (which stores information about the method's arguments and all local variables in the method), and a catch table (which holds information about any rescues within the method).
The root cause of this issue is that, because eval compiles code dynamically at runtime, Ruby hands it the method's existing local table, which already includes an entry for the not-yet-assigned variable.
To start, let's use a very simple method that demonstrates the behavior we'd expect.
def foo_boom
foo # => NameError
foo = 1 # => 1
foo # => 1
end
We can inspect this by extracting the YARV bytecode for the existing method with RubyVM::InstructionSequence.disasm(method). Note that I'm going to omit the trace instructions to keep the listing tidy.
Output for RubyVM::InstructionSequence.disasm(method(:foo_boom)) less trace:
== disasm: #<ISeq:foo_boom@(irb)>=======================================
local table (size: 2, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 2] foo
0004 putself
0005 opt_send_without_block <callinfo!mid:foo, argc:0, FCALL|VCALL|ARGS_SIMPLE>, <callcache>
0008 pop
0011 putobject_OP_INT2FIX_O_1_C_
0012 setlocal_OP__WC__0 2
0016 getlocal_OP__WC__0 2
0020 leave ( 253)
Now let's walk through the trace.
local table (size: 2, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 2] foo
We can see here that YARV has identified we have the local variable foo, and stored it in our local table at index [2]. If we had other local variables and arguments, they'd also appear in this table.
Next we have the instructions generated when we try to call foo before it's assigned:
0004 putself
0005 opt_send_without_block <callinfo!mid:foo, argc:0, FCALL|VCALL|ARGS_SIMPLE>, <callcache>
0008 pop
Let's dissect what happens here. Ruby compiles function calls for YARV according to the following pattern:
Push receiver: putself, referring to top-level scope of function
Push arguments: none here
Call the method/function: function call (FCALL) to foo
Next we have the instructions for setting and getting foo once it becomes a local variable:
0008 pop
0011 putobject_OP_INT2FIX_O_1_C_
0012 setlocal_OP__WC__0 2
0016 getlocal_OP__WC__0 2
0020 leave ( 253)
Key takeaway: when YARV has the entire source code at hand, it knows when locals are defined and treats premature calls to local variables as FCALLs just as you'd expect.
Now let's look at a "misbehaving" version that uses eval
def bar_boom
eval 'bar' # => nil, but we'd expect an error
bar = 1 # => 1
bar
end
Output for RubyVM::InstructionSequence.disasm(method(:bar_boom)) less trace:
== disasm: #<ISeq:bar_boom@(irb)>=======================================
local table (size: 2, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
[ 2] bar
0004 putself
0005 putstring "bar"
0007 opt_send_without_block <callinfo!mid:eval, argc:1, FCALL|ARGS_SIMPLE>, <callcache>
0010 pop
0013 putobject_OP_INT2FIX_O_1_C_
0014 setlocal_OP__WC__0 2
0018 getlocal_OP__WC__0 2
0022 leave ( 264)
Again we see a local variable, bar, in the locals table at index 2. We also have the following instructions for eval:
0004 putself
0005 putstring "bar"
0007 opt_send_without_block <callinfo!mid:eval, argc:1, FCALL|ARGS_SIMPLE>, <callcache>
0010 pop
Let's dissect what happens here:
Push receiver: again putself, referring to top-level scope of function
Push arguments: "bar"
Call the method/function: function call (FCALL) to eval
Afterward, we have the standard assignment to bar that we'd expect.
0013 putobject_OP_INT2FIX_O_1_C_
0014 setlocal_OP__WC__0 2
0018 getlocal_OP__WC__0 2
0022 leave ( 264)
Had we not had eval here, Ruby would have known to treat the call to bar as a function call, which would have blown up as it did in our previous example. However, since eval is dynamically evaluated and the instructions for its code won't be generated until runtime, the evaluation occurs in the context of the already determined instructions and local table, which holds the phantom bar that you see. Unfortunately, at this stage, Ruby is unaware that bar was initialized "below" the eval statement.
For a deeper dive, I'd recommend reading Ruby Under a Microscope and the Ruby Hacking Guide's section on Evaluation.
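A small sketch that confirms this without a debugger, assuming MRI: Binding#local_variables exposes the same local table, so bar is listed before the line that assigns it, and eval resolves bar to that still-unset local. The method name peek_at_local_table is made up for illustration.

```ruby
def peek_at_local_table
  parsed_as = defined?(bar)       # the parser still treats bar as a method call here
  known = binding.local_variables # but the local table already lists :bar
  via_eval = eval("bar")          # eval resolves bar to the unassigned local
  bar = 1
  [parsed_as, known, via_eval, bar]
end

parsed_as, known, via_eval, bar = peek_at_local_table
p parsed_as            # nil
p known.include?(:bar) # true
p via_eval             # nil
p bar                  # 1
```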

What happens when you use string interpolation in ruby?

I thought that Ruby just calls the to_s method, but I can't explain how this works:
class Fake
def to_s
self
end
end
"#{Fake.new}"
By that logic this should raise "stack level too deep" because of infinite recursion. But it works fine and seems to call #to_s from Object.
=> "#<Fake:0x137029f8>"
But why?
ADDED:
class Fake
def to_s
Fake2.new
end
end
class Fake2
def to_s
"Fake2#to_s"
end
end
This code works differently in two cases:
puts "#{Fake.new}" => "#<Fake:0x137d5ac4>"
But:
puts Fake.new.to_s => "Fake2#to_s"
I think it's abnormal. Can somebody explain where in the Ruby interpreter this happens internally?
Short version
Ruby does call to_s, but it checks that to_s returns a string. If it doesn't, ruby calls the default implementation of to_s instead. Calling to_s recursively wouldn't be a good idea (no guarantee of termination) - you could crash the VM and ruby code shouldn't be able to crash the whole VM.
You get different output from puts Fake.new.to_s because to_s is then called a second time: puts calls to_s on the Fake2 object that Fake#to_s returned, and that call does return a String.
Long version
To answer "what happens when ruby does x", a good place to start is to look at what instructions get generated for the VM (this is all MRI specific). For your example:
puts RubyVM::InstructionSequence.compile('"#{Foo.new}"').disasm
outputs
0000 trace 1 ( 1)
0002 getinlinecache 9, <is:0>
0005 getconstant :Foo
0007 setinlinecache <is:0>
0009 opt_send_simple <callinfo!mid:new, argc:0, ARGS_SKIP>
0011 tostring
0012 concatstrings 1
0014 leave
There's some messing around with the inline cache, and you'll always get trace and leave, but in a nutshell this says:
get the constant Foo
call its new method
execute the tostring instruction
execute the concatstrings instruction with the result of the tostring instruction, i.e. the last value on the stack. (If you do this with multiple #{} sequences, you can see it building up all the individual strings and then calling concatstrings once, consuming all of those strings.)
The instructions in this dump are defined in insns.def: this maps these instructions to their implementation. You can see that tostring just calls rb_obj_as_string.
If you search for rb_obj_as_string through the ruby codebase (I find http://rxr.whitequark.org useful for this) you can see it's defined here as
VALUE
rb_obj_as_string(VALUE obj)
{
VALUE str;
if (RB_TYPE_P(obj, T_STRING)) {
return obj;
}
str = rb_funcall(obj, id_to_s, 0);
if (!RB_TYPE_P(str, T_STRING))
return rb_any_to_s(obj);
if (OBJ_TAINTED(obj)) OBJ_TAINT(str);
return str;
}
In brief, if we already have a string, return that. If not, call the object's to_s method. Then (and this is what is crucial for your question) it checks the type of the result. If it's not a string, it returns rb_any_to_s(obj) instead, which is the function that implements the default to_s.
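A runnable sketch of both cases from the question (MRI behavior; Wrapper is a renamed copy of the second Fake so that both versions can coexist in one file). It also contrasts interpolation with Kernel#String, which does not fall back: it raises a TypeError when to_s does not return a String.

```ruby
class Fake
  def to_s
    self # not a String, so interpolation ignores it
  end
end

class Fake2
  def to_s
    "Fake2#to_s"
  end
end

# Renamed copy of the second Fake from the question.
class Wrapper
  def to_s
    Fake2.new
  end
end

plain   = "#{Fake.new}"         # falls back to the default #<Fake:0x...> format
wrapped = "#{Wrapper.new}"      # the Fake2 object is not a String either
direct  = "#{Wrapper.new.to_s}" # interpolating the Fake2 object itself

strict =
  begin
    String(Fake.new)
  rescue TypeError => e
    e.class
  end

p plain, wrapped, direct, strict
```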

Is it possible to see the ruby code in a proc?

p = Proc.new{ puts 'ok' }
Is it possible to see the Ruby code in the proc?
inspect returns the memory location:
puts p.inspect
#<Proc:0x007f9e42980b88@(irb):2>
Ruby 1.9.3
Take a look at the sourcify gem:
proc { x + y }.to_source
# >> "proc { (x + y) }"
Do you mean the original source code or its bytecode representation?
For the former, you may use the standard Proc#source_location method
p.source_location
=> ["test.rb", 21]
and read the appropriate lines of code.
For the latter, RubyVM::InstructionSequence and its class method disassemble may come in handy:
irb> puts RubyVM::InstructionSequence.disasm(p)
== disasm: <RubyVM::InstructionSequence:block in irb_binding@(irb)>=====
== catch table
| catch type: redo st: 0000 ed: 0011 sp: 0000 cont: 0000
| catch type: next st: 0000 ed: 0011 sp: 0000 cont: 0011
|------------------------------------------------------------------------
0000 trace 1 ( 1)
0002 putself
0003 putstring "ok"
0005 send :puts, 1, nil, 8, <ic:0>
0011 leave
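A best-effort sketch combining both approaches, assuming a modern MRI (RubyVM::InstructionSequence.of is not available on every version, including older ones like the 1.9.3 in the question). describe_proc is a hypothetical helper: it reads the defining line through source_location when that file still exists, and also grabs the bytecode, which is nil for procs implemented in C. Neither part is reliable for multi-line or eval'd procs.

```ruby
# Hypothetical helper: collect whatever can be recovered about a proc.
def describe_proc(pr)
  path, line = pr.source_location
  # Only possible when the proc came from a real file that still exists.
  source_line =
    if path && File.file?(path)
      File.readlines(path)[line - 1]&.strip
    end
  bytecode = RubyVM::InstructionSequence.of(pr)&.disasm
  { path: path, line: line, source_line: source_line, bytecode: bytecode }
end

pr = Proc.new { puts 'ok' }
info = describe_proc(pr)
puts info[:source_line] || "(source file not available)"
puts info[:bytecode]
```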
No, there is no way to do that in Ruby.
Some Ruby implementations may or may not have implementation-specific ways of getting the source code.
You can also try to use Proc#source_location to find the file that the Proc was defined in, and then parse that file to find the source code. But that won't work if the Proc wasn't defined in a file (e.g. if it was defined dynamically with eval) or if the source file no longer exists, e.g. because you are running an AOT-compiled version of your program.
So, the short answer is: no, there is no way.
The long answer is: there are some ways that may or may not sometimes work depending on way too many factors to even begin to make this work reliably.
That's not even taking into account Procs which don't even have a Ruby source code because they were defined in native code.
If the proc is defined in a file, you can get its file location, serialize that, and after deserializing use the location to get back to the proc again:
# Before serializing: remember where the proc was defined.
proc_location_array = proc.source_location
After deserializing:
file_name = proc_location_array[0]
line_number = proc_location_array[1]
proc_line_code = File.readlines(file_name)[line_number - 1]
# Assumes the whole block sits on that one line and uses { } braces.
proc_hash_string = proc_line_code[proc_line_code.index("{")..proc_line_code.length]
proc = eval("lambda #{proc_hash_string}")
Although this is an old question, I wanted to share my thoughts.
You can use Pry gem and end up with something like this:
[11] pry> p = Proc.new{ puts 'ok' }
=> #<Proc:0x007febe00e6360@(pry):23>
[12] pry> show-source p
From: (pry)
Number of lines: 1
p = Proc.new{ puts 'ok' }
Also, if you use it in a Rails context, you can put:
::Kernel.binding.pry
in your controllers or models, and
- require 'pry'; binding.pry
in your views, where you want to start debugging.
And in the tests, I use a combination, first require 'pry' at the top, and then ::Kernel.binding.pry where needed.
References:
http://pryrepl.org
https://github.com/pry/pry
https://github.com/pry/pry/wiki/Source-browsing

Ruby Block statements and Implicit Returns

I always thought that Rubyists choose to make returns in Ruby implicit because of a style preference (fewer words = more concise). However, can someone confirm with me that in the following example you actually have to make the returns implicit or else the intended functionality won't work? (The intended functionality is to be able to split a sentence into words and return either "Begins with a vowel" or "Begins with a consonant" for each word.)
# With Implicit Returns
def begins_with_vowel_or_consonant(words)
words_array = words.split(" ").map do |word|
if "aeiou".include?(word[0,1])
"Begins with a vowel" # => This is an implicit return
else
"Begins with a consonant" # => This is another implicit return
end
end
end
# With Explicit Returns
def begins_with_vowel_or_consonant(words)
words_array = words.split(" ").map do |word|
if "aeiou".include?(word[0,1])
return "Begins with a vowel" # => This is an explicit return
else
return "Begins with a consonant" # => This is another explicit return
end
end
end
Now, I know there are definitely many ways to make this code more efficient and better, but the reason I've laid it out like this is to illustrate the need for the implicit returns. Can someone confirm with me that implicit returns are indeed needed and not just a stylistic choice?
EDIT:
Here's an example to illustrate what I'm trying to show:
# Implicit Return
begins_with_vowel_or_consonant("hello world") # => ["Begins with a consonant", "Begins with a consonant"]
# Explicit Return
begins_with_vowel_or_consonant("hello world") # => "Begins with a consonant"
The implicit return value of a method is the last expression evaluated in the method.
In your case, neither of the two lines you annotated is the last expression. The last expression that gets evaluated is the assignment to words_array (which, by the way, is completely useless: because it is the last expression, there is no way to use that variable afterwards).
Now, what is the value of an assignment expression? It is the value being assigned, in this particular case, the return value of the map method, which is an Array. So, that is what the method returns.
In the second example, at the very first iteration of the map, you will hit one of the two returns and thus immediately return from the method. In the first example, however, you will always iterate through the entire words array.
The problem is not that implicit and explicit returns are different, the problem is that the two lines you claim are implicit returns aren't.
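To see the point about the assignment being the last expression, here is a trimmed-down method (the name last_expression_is_assignment is made up): its value is the value of the assignment, i.e. the Array built by map.

```ruby
def last_expression_is_assignment(words)
  # The assignment is the last expression, so its value (the Array from map)
  # becomes the method's return value; words_array itself is never read.
  words_array = words.split(" ").map { |word| word.upcase }
end

result = last_expression_is_assignment("hello world")
p result # ["HELLO", "WORLD"]
```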
The reason this is happening is that the return statement is inside a block. If the return statement were directly inside the method, execution flow would be as you expect. Blocks and returns in Ruby (and break statements, for that matter) are a weird beast. Let's take a simpler example to capture what you're asking:
def no_return()
(1..10).each do |i|
puts i
end
puts 'end'
end
def yes_return()
(1..10).each do |i|
puts i
return
end
puts 'end'
end
If you run both of these, you'll see that the first will print out the numbers 1-10 and the word 'end' as you would expect, but the second function only prints out 1. This is because ruby internally implements returns (and break statements) as exceptions. When these exceptions are thrown, a lookup table called a "catch table" is used to try and find where the flow of execution should continue. If none is found, ruby internally will search down through the stacks of rb_control_frame structures looking for the pointer that points to the code that we are executing. So in your case, the exception is thrown and the closest pointer (in terms of stack frames) is at the end of the method, essentially causing a termination of the entire method. That's why you won't even see the 'end' being printed.
no_return instructions:
0000 trace 1 ( 3)
0002 putnil
0003 getdynamic i, 0
0006 send :puts, 1, nil, 8, <ic:0>
0012 leave
yes_return instructions:
0000 trace 1 ( 3)
0002 putnil
0003 getdynamic i, 0
0006 send :puts, 1, nil, 8, <ic:0>
0012 pop
0013 trace 1 ( 4)
0015 putnil
0016 throw 1
0018 leave
The important bit is in the yes_return instructions, there's that 'throw 1' that is actually the return statement implemented as a thrown (and caught) exception. Breaks work in the same way except they 'throw 2'.
To see the actual instructions with all of the frames and catch table yourself, check out this app I put together not too long ago:
http://rubybytes.herokuapp.com/
For more information, check out Pat Shaughnessy's blog, in particular this entry for more information:
http://patshaughnessy.net/2012/6/29/how-ruby-executes-your-code
And another shameless plug, if you're into java, check out my java version of rubybytes at
javabytes.herokuapp.com to see disassembled java bytecode (sorry for it being linkless, you'll have to copy and paste the url into your browser).
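return inside a block returns from the method in which the block was defined (here begins_with_vowel_or_consonant), not just from the block, so you either need to rely on the block's implicit value or use next.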
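A minimal sketch of the difference, reusing the asker's vowel check (the method names are made up): next ends only the current block invocation and supplies that element's value, while return leaves the enclosing method as soon as it is reached.

```ruby
def labels_with_next(words)
  words.split(" ").map do |word|
    next "Begins with a vowel" if "aeiou".include?(word[0, 1])
    "Begins with a consonant"
  end
end

def labels_with_return(words)
  words.split(" ").map do |word|
    # return exits labels_with_return itself, abandoning the rest of map
    return "Begins with a vowel" if "aeiou".include?(word[0, 1])
    "Begins with a consonant"
  end
end

p labels_with_next("hello apple")   # ["Begins with a consonant", "Begins with a vowel"]
p labels_with_return("hello apple") # "Begins with a vowel"
p labels_with_return("hi there")    # ["Begins with a consonant", "Begins with a consonant"]
```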
