Caller/Backtrace beyond a thread - Ruby

As far as I know, it is only possible to get the portion of the caller/backtrace information that is within the current thread; anything prior to that (in the thread that created the current thread) is cut off. The following exemplifies this: the fact that a called b, which called c, which created the thread that called d, is cut off:
def a; b end
def b; c end
def c; Thread.new{d}.join end
def d; e end
def e; puts caller end
a
# => this_file:4:in `d'
# this_file:3:in `block in c'
What is the reason for this feature?
Is there a way to get the caller/backtrace information beyond the current thread?

I think I came up with my answer.
Creating a thread is not the only thing that can be done to it from the outside. Besides creating it, you can wake it up, etc. So it is not clear which operations should be attributed as part of the caller. For example, suppose there is a thread:
1: t = Thread.new{
2: Thread.stop
3: puts caller
4: }
5: t.wakeup
The thread t is created at line 1, but it puts itself to sleep at line 2, and is then woken up by line 5. So, when we stand at the caller call on line 3 and consider the caller portion outside of the thread, it is not clear whether Thread.new on line 1 should be part of it, or t.wakeup on line 5 should be part of it. Therefore, there is no clear notion of callers beyond the current thread.
However, if we define a clear notion, then caller information beyond a thread can make sense. For example, always including the callers up to the creation of the thread may make sense. Alternatively, including the callers leading to the most recent wakeup or creation may make sense. It is up to the definition.
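For instance, here is a minimal sketch of the "callers up to the creation of the thread" definition, capturing the spawning thread's backtrace by hand at creation time (the helper name spawn_with_parent_backtrace and the thread-local key are made up for illustration):
def spawn_with_parent_backtrace(&block)
  parent_backtrace = caller  # snapshot of the spawning thread's stack, taken right now
  Thread.new do
    # expose the snapshot inside the new thread via a thread-local variable
    Thread.current[:parent_backtrace] = parent_backtrace
    block.call
  end
end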

The answer to both your questions is really the same. Consider a slightly more involved main thread. Instead of simply waiting for the spawned thread to end in c, the main thread goes on calling other functions, perhaps even returning from c and going about its business while the spawned thread goes on about its business.
This means that the stack in the main thread has changed since the thread starting in d was spawned. In other words, by the time you call puts caller, the stack in the main thread is no longer in the state it was in when the secondary thread was created. There is no way to safely walk back up the stack beyond this point.
So in short:
The stack of the spawning thread will not remain in the state it was in when the thread was spawned, so walking back beyond the start of a thread's own stack is not safe.
No, since the entire idea behind threads is that they are (pseudo) parallel, their stacks are completely unrelated.
Update:
As suggested in the comments, the stack of the current thread could be copied to the new thread at creation time. This would preserve the information that led up to the thread being created, but the solution is not without its own set of problems.
Thread creation will be slower. That could be OK if there were anything to gain from it, but in this case, is there?
What would it mean to return from the thread entry function?
It could return to the function that created the thread and keep running as if it was just a function call - only that it now runs in the second thread, not the original one. Would we want that?
There could be some magic that ensures that the thread terminates even if it's not at the top of the call stack. This would make the information in the call stack above the thread entry function incorrect anyway.
On systems with limits on the stack size for each thread, you could run into problems where the thread runs out of stack even if it's not using very much on its own.
There are probably other scenarios and peculiarities that could be thought up too, but the way threads are created with their own empty stack to start with makes the model both simple and predictable, without keeping any potentially stale information in the call stack.
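That said, if the "callers up to creation" definition from the first answer is all you need, a helper like the spawn_with_parent_backtrace sketched above sidesteps these problems entirely, because the snapshot is plain data rather than a live stack:
t = spawn_with_parent_backtrace do
  # caller only reaches back to the start of this thread's own stack,
  # so append the snapshot taken when the thread was created
  puts caller + Thread.current[:parent_backtrace]
end
t.join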

Related

Understanding Celluloid Pool

I guess my understanding of Celluloid Pool is somewhat broken. I will try to explain below, but first a quick note.
Note: Our system is running against a very fast client passing messages over ZeroMQ.
With the following vanilla Celluloid app:
class VanillaClient
  include Celluloid::ZMQ

  def read
    loop { async.evaluate_response(socket.read_multipart) }
  end

  def evaluate_response(data)
    ## the reason for using defer can be found over here.
    Celluloid.defer do
      ExternalService.execute(data)
    end
  end
end
Our system fails after some time with the error 'Can't spawn more thread' (or something like it).
So we intended to use a Celluloid Pool (to avoid the above-mentioned problem) so that we can limit the number of threads that are spawned.
My understanding of Celluloid Pool is:
Celluloid Pool maintains a pool of actors for you so that you can distribute your tasks in parallel.
Hence, I decided to test it, but according to my test cases, it seems to behave serially (i.e. things never get distributed or happen in parallel).
Example to replicate this.
sender-1.rb
## Sends message `1` to the_client.rb
sender-2.rb
## Sends message `2` to the_client.rb
the_client.rb
## Takes messages from sender-1 and sender-2 and returns them to receiver.rb
## heads up: the `sleep` is introduced to test/replicate the IO blocking that happens in the actual code.
receiver.rb
## Prints the message obtained from the_client.rb
If sender-2.rb is run before sender-1.rb, it appears that the pool gets blocked for 20 sec (the sleep time in the_client.rb, which can be seen over here) before consuming the data sent by sender-1.rb.
It behaves the same under Ruby 2.2.2 and JRuby 9.0.5.0. What could be the possible causes for the Pool to act in such a manner?
Your pool call is not asynchronous.
Execution of evaluate on #pool still needs to be .async, as in your original example without pools. You still want asynchronous behavior, but you also want multiple handler actors.
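A minimal sketch of what that looks like, using the question's VanillaClient (the pool size of 5 is an assumption, and data stands in for a received message):
pool = VanillaClient.pool(size: 5)  # 5 actors sharing the work; size is an assumption
pool.async.evaluate_response(data)  # .async returns immediately; a free actor handles it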
Next you will likely hit the Pool.async bug.
https://github.com/celluloid/celluloid-pool/issues/6
This means that after 5 hits to evaluate, your pool will become unresponsive until at least one actor in the pool has finished. Worst-case scenario: if you get 6+ requests in rapid succession, the 6th will take 120 seconds, because it will wait 5*20 seconds before it starts executing, then take 20 seconds to execute itself.
Depending on what your actual operation is that's causing the delays, you might need to adjust your pool size down the line.

Ruby threads and mutex

Why does the following ruby code not work?
2 | require 'thread'
3 |
4 | $mutex = Mutex.new
5 | $mutex.lock
6 |
7 | t = Thread.new {
8 | sleep 10
9 | $mutex.unlock
10 | }
11 |
12 | $mutex.lock
13 | puts "Delayed hello"
When I'm running it, I get an error:
./test.rb:13:in `lock': thread 0x7f4557856378 tried to join itself (ThreadError)
from ./test.rb:13
What is the right way to synchronize two threads without joining them (both threads must continue running after synchronization)?
This is old, but I'm contributing since it's a bit scary that none of the other answers (at the time of writing) seem to be correct. The original code is clearly attempting to:
Create a mutex in the main thread and lock it.
Start a new thread, which may begin running at any time and after any delay subject to the whims of the Ruby runtime.
Have this thread unlock the mutex only once it's finished doing its work.
Have the main thread then deliberately re-lock the mutex, with the intention that the thread it spawned will unlock it. The main thread waits for that.
Then the main thread continues running.
#user2413915: Your solution omits the step of locking again in the main thread, so it won't wait for the spawned thread as intended.
#Paul Rubel: Your code assumes that the spawned thread acquires its lock on the mutex before the main thread does. This is a race condition. If the main thread continues to execute and locks first, the spawned thread will be blocked until after the main thread has printed "Delayed hello", which is the exact opposite of the desired outcome. You probably ran it by pasting into the IRB prompt; if you try your example modified so that the end and the mutex lock are on the same line (i.e. "end; $mutex.lock"), it'll fail, printing the message too early. Either way, it's relying on behaviour of the Ruby runtime that works only by chance.
The original code should actually work fine in principle, albeit arguably lacking in elegance. In practice, the Ruby 1.9+ runtime won't allow it, as it "sees" two consecutive locks in the main thread without an unlock and doesn't "realise" that there's a spawned thread which is going to do the unlocking. Ruby (in this case technically erroneously) raises a ThreadError deadlock exception.
Instead, make cunning use of the Ruby Queue. When you try to pop something off a Queue, the call blocks until an item is available. So:
require 'thread' # provides Queue; there is no separate 'queue' library to require
queue = Queue.new
t = Thread.new {
sleep 10
queue.push( nil ) # Push any object you like - here, it's a NilClass instance
}
queue.pop() # Blocks until thread 't' pushes onto the queue
puts "Delayed hello"
If the spawned thread runs first and pushes onto the queue, then the main thread will just pop the item and keep going. If the main thread tries to pop before the spawned thread pushes, it'll wait for the spawned thread.
[Edit: Note that the object pushed onto the queue could be the result of the spawned thread's processing task, so the main thread gets to wait until processing is complete and receive the processing result in one go.]
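For example, here is a hypothetical variant of the code above where the worker pushes its result (the sleep stands in for real work):
require 'thread'

queue = Queue.new
t = Thread.new do
  sleep 1            # stand-in for the real processing task
  queue.push(6 * 7)  # push the result instead of a placeholder nil
end
result = queue.pop   # blocks until the worker pushes its result
puts "Delayed hello, result = #{result}"
t.join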
I've tested this on Ruby 1.8.7-p375 and Ruby 2.1.2 via rbenv with success, so it's reasonable to assume that the standard library Queue class works across all common major Ruby versions.
You do not need to lock the mutex on line 12 again.
require 'thread'
$mutex = Mutex.new
$mutex.lock
t = Thread.new {
sleep 10
$mutex.unlock
}
puts "Delayed hello"
This will work.

Problem with Ruby threads

I am writing a simple bot using "SimpleMUCClient", but got this error:
app.rb:73:in `stop': deadlock detected (fatal)
from app.rb:73:in `'
How can I fix it?
Most likely the code you're running is executed in another thread. That particular thread is then joined (meaning Ruby waits for it to finish upon exiting the script) using Thread.join(). Calling Thread.stop() while also calling .join() is most likely the cause of the deadlock. Having said that, you should follow StackOverflow's guidelines on how to ask questions properly; since you haven't done so, I've downvoted your question.
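For reference, a minimal reproduction of that kind of deadlock might look like this (hypothetical, mirroring the description; the exact error message varies between Ruby versions):
th = Thread.new { Thread.stop } # the thread parks itself and nothing ever wakes it
sleep 0.1                       # give it time to reach Thread.stop
th.join                         # raises: deadlock detected - no thread can make progress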
Joining a thread while still calling Thread.stop can be done as follows:
th = Thread.new do
  Thread.stop # parks this thread until something calls th.run
end

if th.status == 'sleep' # the thread has already parked itself
  th.run                # wake it so it can run to completion
else
  th.join               # note: if the thread has not reached Thread.stop yet, this join can deadlock
end
It's not the cleanest way, but it works. Also, if you want to actually terminate a thread, you'll have to call exit on it (Thread#exit) instead.
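A minimal sketch of terminating a thread from the outside, assuming you hold a reference to it (Thread#exit is an alias of Thread#kill):
th = Thread.new { sleep } # a thread that would otherwise never finish
th.exit                   # ask th to terminate from the outside
th.join                   # returns promptly now that th is dead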

What happens at the lower levels after a fork system call?

I know what fork() does at the higher level. What I'd like to know is this:
As soon as there is a fork call, a trap instruction follows and control jumps to execute the fork "handler". Now, how does this handler, which creates the child process by duplicating the parent process (creating another address space and process control block), return 2 values, one to each process?
At what point of execution does fork return 2 values?
To put it in short, can anybody please explain the step-by-step events that take place at the lower level after a fork call?
It's not so hard, right? The kernel half of the fork() syscall can tell the difference between the two processes via the process control block, as you mentioned, but you don't even need to do that. So the pseudocode looks like:
int fork()
{
    int orig_pid = getpid();
    int new_pid = kernel_do_fork(); // Now there are two processes

    // Remember, orig_pid is the same in both procs
    if (orig_pid == getpid()) {
        return new_pid; // Parent: getpid() still matches, so return the child's PID
    }

    // Must be the child: its getpid() no longer matches the saved orig_pid
    return 0;
}
Edit:
The naive version does just as you describe: it creates a new process context, copies all of the associated thread contexts, copies all of the pages and file mappings, and the new process is put on the "ready to run" list.
I think the part you're getting confused on is that when these processes resume (i.e. when the parent returns from kernel_do_fork, and the child is scheduled for the first time), execution starts in the middle of the function (i.e. executing that first 'if'). It's an exact copy: both processes will execute the second half of the function.
The value returned to each process is different. The parent/original thread gets the PID of the child process and the child process gets 0.
The Linux kernel achieves this on x86 by changing the value in the eax register as it copies the current thread in the parent process.
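For comparison, Ruby exposes the same two-return-values behavior directly through Process.fork, where the parent receives the child's PID and the child receives nil (the 0 at the C level):
pid = Process.fork
if pid
  puts "parent #{Process.pid}: fork returned child pid #{pid}"
  Process.wait(pid)  # reap the child
else
  puts "child #{Process.pid}: fork returned nil"
end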

Windows: Under what circumstances might SetEvent() not return immediately?

I have a thread that, when its function exits its loop (the exit is triggered by an event), does some cleanup and then sets a different event to let a master thread know that it is done.
However, under some circumstances, SetEvent() seems not to return after it sets the thread's 'I'm done' event.
This thread is part of a DLL and the problem seems to occur after the DLL has been loaded/attached, the thread started, the thread ended and the DLL detached/unloaded a number of times without the application shutting down in between. The number of times this sequence has to be repeated before this problem happens is variable.
In case you are skeptical that I know what I'm talking about: I have determined what's happening by bracketing the SetEvent() call with calls to OutputDebugString(). The output before SetEvent() appears. Then the waiting thread produces output that indicates that the event has been set.
However, the second call to OutputDebugString() in the exiting thread (the one AFTER SetEvent() ) never occurs, or at least its string never shows up. If this happens, the application crashes a few moments later.
(Note that the calls to OutputDebugString() were added after the problem started occurring, so it's unlikely to be hanging there, rather than in SetEvent().)
I'm not entirely sure what causes the crash, but it occurs in the same thread in which SetEvent() didn't return immediately (I've been tracking/outputting the thread IDs). I suppose it's possible that SetEvent() is finally returning, by which point the context to which it is returning is gone/invalid, but what could cause such a delay?
It turns out that I've been blinded by looking at this code for so long, and it didn't even occur to me to check the return code. I'm done looking at it for today, so I'll know what it's returning (if it's returning) on Monday and I'll edit this question with that info then.
Update: I changed the (master) code to wait for the thread to exit rather than for it to set the event, and removed the SetEvent() call from the slave thread. This changed the nature of the bug: now, instead of failing to return from SetEvent(), it doesn't exit the thread at all and the whole thing hangs.
This indicates that the problem is not with SetEvent(), but something deeper. No idea what, yet, but it's good not to be chasing down that blind alley.
Update (Feb 13/09):
It turned out that the problem was deeper than I thought when I asked this question. jdigital (and probably others) has pretty much nailed the underlying problem: we were trying to unload a thread as part of the process of detaching a DLL.
This, as I didn't realize at the time, but have since found out through research here and elsewhere (Raymond Chen's blog, for example), is a Very Bad Thing.
The problem was that, because of the way it was coded and the way it was behaving, it was not obvious that that was the underlying problem; it was camouflaged as all sorts of other Bad Behaviours that I had to wade through.
Some of the suggestions here helped me do that, so I'm grateful to everyone who contributed. Thank you!
Who is unloading the DLL and at what time is the unload done? I am wondering if there is a timing issue here where the DLL is unloaded before the thread has run to completion.
Are you dereferencing a HANDLE* to pass to SetEvent? It's more likely that the event handle reference is invalid and the crash is an access violation (i.e., accessing garbage).
You might want to use WinDbg to catch the crash and examine the stack.
Why do you need to set an event in the slave thread to signal to the master thread that the thread is done? Just exit the thread; the calling master thread should wait for the worker thread to exit. Example pseudo-code:
Master
{
    TerminateEvent = CreateEvent ( ... ) ;
    ThreadHandle = BeginThread ( Slave, (LPVOID) TerminateEvent ) ;

    ...
    Do some work
    ...

    SetEvent ( TerminateEvent ) ;
    WaitForSingleObject ( ThreadHandle, SOME_TIME_OUT ) ;
    CloseHandle ( TerminateEvent ) ;
    CloseHandle ( ThreadHandle ) ;
}

Slave ( LPVOID ThreadParam )
{
    TerminateEvent = (HANDLE) ThreadParam ;

    while ( WaitForSingleObject ( TerminateEvent, SOME_SHORT_TIME_OUT ) == WAIT_TIMEOUT )
    {
        ...
        Do some work
        ...
    }
}
There are lots of error conditions and states to check for, but this is the essence of how I normally do it.
If you can get hold of it, get this book; it changed my life with respect to Windows development when I first read it many, many years ago:
Advanced Windows: The Developer's Guide to the Win32 API for Windows NT 3.5 and Windows 95 (Paperback), by Jeffrey Richter
