ruby multithreading - stop and resume specific thread

I want to be able to stop and resume a specific thread in Ruby in the following context:
thread_hash = Hash.new()
loop do
  Thread.start(call.function) do |execute|
    operation = execute.extract(some_value_from_incoming_message)
    if thread_hash.has_key? operation
      thread_hash[operation].run
    elsif !thread_hash.has_key? operation
      thread_hash[operation] = Thread.current
      do_something_else_1
      Thread.stop
      do_something_else_2
      Thread.stop
      do_something_else_3
      thread_hash.delete(operation)
    else
      exit
    end
  end
end
In human language, the script above acts as a server: it receives a message and extracts some parameter from it. If that parameter is already in thread_hash, the suspended thread stored there should be resumed.
If the parameter is not present in thread_hash, the parameter is stored in thread_hash along with the current thread, some function is executed, and the current thread suspends itself. It is resumed on a later loop iteration, and so on, until do_something_else_3 has run and the operation serviced by that thread is removed from the hash.
Can a thread be resumed in Ruby based on its thread id, or does a new thread have to be given a name when it is started, like
thr = Thread.start
and then be resumed only through that name, like:
thr.run
Is the solution described above realistic? Could it cause some sort of leak or deadlock due to old threads being resumed from new threads, or are redundant threads automatically taken care of by Ruby?

It sounds to me like you're trying to do everything in every thread: read input, run existing threads, store new threads, delete old threads. Why not break up the problem?
hash = {}
loop do
  operation = get_value_from_message
  if hash[operation] and hash[operation].alive?
    hash[operation].wakeup
  else
    hash[operation] = Thread.new do
      do_something1
      Thread.stop
      do_something2
      Thread.stop
      do_something3
    end
  end
end
Instead of wrapping the whole contents of the loop in a thread, only thread the message processing code. That lets it run in the background while the loop goes back to waiting for a message. This solves any sort of race/deadlock problem since all of the thread management occurs in the main thread.
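One loose end, given the question's worry about leaks: in the snippet above, an entry stays in the hash even after its thread has finished do_something3. A minimal sketch of the same loop that also prunes dead threads (get_value_from_message and do_something1..3 are the placeholder names from above):
hash = {}
loop do
  operation = get_value_from_message

  # Drop entries whose threads have already run to completion,
  # so the hash doesn't grow without bound.
  hash.delete_if { |_op, thread| !thread.alive? }

  if hash[operation] && hash[operation].alive?
    hash[operation].wakeup
  else
    hash[operation] = Thread.new do
      do_something1
      Thread.stop
      do_something2
      Thread.stop
      do_something3
    end
  end
end
As before, all hash manipulation stays on the main thread, so no extra locking is needed for the hash itself.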

Related

C# - Worker thread with slots, items being dynamically added

I have a Windows service that polls a web service for new items every 30 seconds. If it finds any new items, it checks to see if they need to be "processed" and then puts them in a list to process. I spawn off different threads to process 5 at a time, and when one finishes, another one fills the empty slot. Once everything has finished, the program sleeps for 30 seconds and then polls again.
My issue is that while the items are being processed (which could take up to 15 minutes), new items are being created which may also need to be processed. My problem is that the main thread gets held up waiting for every last thread to finish before it sleeps and starts the process all over.
What I'm looking to do is have the main thread continue to poll the web service every 30 seconds, however instead of getting held up, add any new items it finds to a list, which would be processed in a separate worker thread. In that worker thread, it would still have say only 5 slots available, but they would essentially always all be filled, assuming the main thread continues to find new items to process.
I hope that makes sense. Thanks!
EDIT: updated code sample
I put together this as a worker thread that operates on a ConcurrentQueue. Any way to improve this?
private void ThreadWorker() {
    DateTime dtStart = DateTime.Now;
    int iNumOfConcurrentSlots = 6;
    Thread[] threads = new Thread[iNumOfConcurrentSlots];
    while (true) {
        for (int i = 0; i < iNumOfConcurrentSlots; i++) {
            if (m_tAssetQueue.TryDequeue(out Asset aa)) {
                threads[i] = new Thread(() => ProcessAsset(aa));
                threads[i].Start();
                Thread.Sleep(500);
            }
        }
    }
}
EDIT: Ahh yeah that won't work above. I need a way of being able to not hard code the number of ConcurrentSlots, but have each thread basically waiting and looking for something in the Queue and if it finds it, process it. But then I also need a way of signalling that the ProcessAsset() function has completed to release the thread and allow another thread to be created....
One simple way to do it is to have 5 threads reading from a concurrent queue. The main thread queues items and the worker threads do blocking reads from the queue.
Note: The workers are in an infinite loop. They call TryDequeue, process the item if they got one or sleep one second if they fail to get something. They can also check for an exit flag.
To keep your service properly behaved, you might have an independent polling thread that queues the items. The main thread is then free to respond to start, stop, and pause requests.
Pseudo code for worker thread:
  While true
    If TryDequeue then
      process data
    If exit flag is true, break
    While pause flag, sleep
    Sleep

Pseudo code for polling thread:
  While true
    Poll web service
    Queue items in concurrent queue
    If exit flag true, break
    While pause flag, sleep
    Sleep

Pseudo code for main thread:
  Start polling thread
  Start n worker threads with above code
  Handle stop:
    set exit flag to true
  Handle pause:
    set pause flag to true
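The pattern above isn't specific to C#; since the rest of this page is about Ruby, here is the same worker/poller split sketched with Ruby's thread-safe Queue. process_asset and poll_web_service are hypothetical stand-ins for the real calls, so treat this as an illustration rather than a drop-in implementation.
require 'thread'

queue = Queue.new

# Worker threads: block on the queue, process one item at a time, exit on nil.
workers = 5.times.map do
  Thread.new do
    while (item = queue.pop)
      process_asset(item)   # hypothetical: whatever "process data" means here
    end
  end
end

# Polling thread: check the service every 30 seconds and queue any new items.
poller = Thread.new do
  loop do
    poll_web_service.each { |item| queue.push(item) }   # hypothetical call
    sleep 30
  end
end

# To shut down: push one nil per worker, then workers.each(&:join).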

How to pass a block to a yielding thread in Ruby

I am trying to wrap my head around Threads and Yielding in Ruby, and I have a question about how to pass a block to a yielding thread.
Specifically, I have a thread that is sleeping, and waiting to be told to do something, and I would like that Thread to execute a different block if told to (ie, it is sleeping, and if a user presses a button, do something besides sleep).
Say I have code like this:
window = Thread.new do
  @thread1 = Thread.new do
    # Do some cool stuff
    # Decide it is time to sleep
    until @told_to_wakeup
      if block_given?
        yield
      end
      sleep(1)
    end
  end
  # At some point after @thread1 starts sleeping,
  # a user might do something, so I want to execute
  # some code in @thread1 (unfortunately spawning a new thread
  # won't work correctly in my case)
end
Is it possible to do that?
I tried using @thread1.send(), but send was looking for a method name.
Thanks for taking the time to look at this!
Here's a simple worker thread:
queue = Queue.new
worker = Thread.new do
  # Fetch an item from the work queue, or wait until one is available
  while (work = queue.pop)
    # ... Do something with work
  end
end
queue.push(thing: 'to do')
The pop method will block until something is pushed into the queue.
When you're done you can push in a deliberately empty job:
queue.push(nil)
That will make the worker thread exit.
You can always expand on that functionality to do more things, or to handle more conditions.
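To tie this back to the original question (running a different block when, say, a button is pressed), the items pushed onto the queue can themselves be callable. A sketch along those lines, not part of the answer above:
queue = Queue.new

worker = Thread.new do
  # Each item is expected to respond to #call (a Proc or lambda); nil means exit.
  while (job = queue.pop)
    job.call
  end
end

# Whatever event you like (a button press, a timer) just pushes the block to run.
queue.push(-> { puts "button pressed, doing something" })
queue.push(-> { puts "another piece of work" })

queue.push(nil)   # deliberately empty job: tells the worker to exit
worker.join
Procs capture their surrounding scope, so the pushed block can reference whatever state it needs when it finally runs on the worker thread.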

Multiprocessing gets stuck on join in Windows

I have a script that collects data from a database, filters it, and puts it into a list for further processing. I've split the entries in the database between several processes to make the filtering faster. Here's the snippet:
def get_entry(pN, q, entries_indicies):
    ## collecting and filtering data
    q.put((address, page_text,))
    print("Process %d finished!" % pN)

def main():
    # getting entries
    data = []
    procs = []
    for i in range(MAX_PROCESSES):
        q = Queue()
        p = Process(target=get_entry, args=(i, q, entries_indicies[i::MAX_PROCESSES],))
        procs += [(p, q,)]
        p.start()
    for i in procs:
        i[0].join()
        while not i[1].empty():
            # process returns a tuple (address, full data,)
            data += [i[1].get()]
    print("Finished processing database!")
    # More tasks
    # ................
I've run it on Linux (Ubuntu 14.04) and it went totally fine. The problems start when I run it on Windows 7. The script gets stuck on i[0].join() for the 11th process out of 16 (which looks totally random to me). No error messages, nothing, it just freezes there. At the same time, print("Process %d finished!" % pN) is displayed for all processes, which means they all come to an end, so there should be no problem with the code of get_entry.
I tried commenting out the q.put line in the process function, and it all went through fine (well, of course, data ended up empty).
Does it mean that Queue here is to blame? Why does it make join() get stuck? Is it because of an internal Lock within Queue? And if so, and if Queue renders my script unusable on Windows, is there some other way to pass the data collected by the processes to the data list in the main process?
Came up with an answer to my last question.
I used a Manager instead:
def get_entry(pN, q, entries_indicies):
    # processing
    # assignment to a manager list in another process doesn't work, but appending does.
    q += result

def main():
    # blahbalh
    # getting entries
    data = []
    procs = []
    for i in range(MAX_PROCESSES):
        manager = Manager()
        q = manager.list()
        p = Process(target=get_entry, args=(i, q, entries_indicies[i::MAX_PROCESSES],))
        procs += [(p, q,)]
        p.start()
    # input("Press enter when all processes finish")
    for i in procs:
        i[0].join()
        data += i[1]
    print("data", data)  # debug
    print("Finished processing database!")
    # more stuff
The nature of the freezing on join() in Windows due to the presence of Queue still remains a mystery, so the question is still open.
As the docs say:
Warning As mentioned above, if a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread), then that process will not terminate until all buffered items have been flushed to the pipe.
This means that if you try joining that process you may get a deadlock unless you are sure that all items which have been put on the queue have been consumed. Similarly, if the child process is non-daemonic then the parent process may hang on exit when it tries to join all its non-daemonic children.
Note that a queue created using a manager does not have this issue. See Programming guidelines.
So, since multiprocessing.Queue is a kind of Pipe, when you call .join() there may still be items in the queue, and you should consume them (or simply .get() them) to make it empty. Then call .close() and .join_thread() for each queue.
You can also refer to this answer.

Background thread in Rails can't see instance variables

I need to gather up some data from a rails application, aggregate it, and send it off to a remote server periodically. I instantiate my aggregation class in a global variable (I know, I know) in application.rb.
Inside my aggregation class, I fire up a thread that sleeps for 10 seconds, then looks at the queue, processes the data, and sends it. The queue is a hash stored in an instance variable of the class.
From the Rails controller, I call a method in the aggregator class to queue the data in the hash. Of course this is on a different thread than the background task that reads the queue. The problem is that the background task never sees any data in the hash. In my log, I print out the object_id of the hash both when I write to it (from the controller's thread) and when I read from it (from the background thread). The hash's object_id matches from both threads, but the background thread never sees the data.
What's killing me is that this works fine outside of Rails. I've set up tests with many threads that really pound on it, and it works fine (there is some thread protection that I am not showing for clarity). Does anyone know how the object_ids can match, but the contents are not consistent?
class Aggregator
  def initialize
    @q = {}
    @timer = nil
  end

  def start
    @timer = Thread.new do
      loop do
        sleep(10)
        flush_q
      end
    end
  end

  def flush_q
    logger.debug "flush: q.object_id = #{@q.object_id}" # matches what I get below
    logger.debug "flush: q.length = #{@q.length}" # always zero!
    @q.each_pair do |k,v|
      # pack it up and send it
    end
    @q.clear
  end

  def add(item)
    logger.debug "add: q.object_id = #{@q.object_id}" # matches what I get above
    @q[item.name] ||= item
    logger.debug "add: q.length = #{@q.length}" # increases with each add
    # not actually that simple, but not relevant
  end
end
I'm going to go out on a limb and assume that your code is deployed using a forking app server (e.g. Unicorn or Passenger).
This means that your app is loaded once and then new instances are forked from that master instance. Forking is cheap, so new instances of the app can be started up and shut down really quickly.
I believe that your aggregator instance is getting created/started in this master process. When the process forks, its entire memory space is copied (so there is an instance of Aggregator in the new process, with the same object_id and so on).
However, when forking, only the current thread is copied, so the aggregator flushing is only happening in the master process, while all the appending is happening in the child processes. You could confirm this by adding Process.pid to what you log - you should see that your logging is coming from 2 different processes.
One way of fixing this would be to start/restart your thread after the child process has forked. How you do this depends on how the app is being served. With Unicorn you can do this in your unicorn config via the after_fork method. With Passenger you do:
PhusionPassenger.on_event(:starting_worker_process) do |forked|
  if forked
    ...
  end
end
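For completeness, a sketch of what those hooks might contain, assuming the aggregator instance created in application.rb is reachable through a global such as $aggregator (the question doesn't show the variable name, so that part is hypothetical):
# config/unicorn.rb
after_fork do |server, worker|
  $aggregator.start   # restart the flush thread inside the forked worker
end

# Passenger, e.g. in an initializer
if defined?(PhusionPassenger)
  PhusionPassenger.on_event(:starting_worker_process) do |forked|
    $aggregator.start if forked   # only when Passenger actually forked a worker
  end
end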

Ruby threads and mutex

Why does the following ruby code not work?
2 | require 'thread'
3 |
4 | $mutex = Mutex.new
5 | $mutex.lock
6 |
7 | t = Thread.new {
8 |   sleep 10
9 |   $mutex.unlock
10 | }
11 |
12 | $mutex.lock
13 | puts "Delayed hello"
When I'm running it, I get an error:
./test.rb:13:in `lock': thread 0x7f4557856378 tried to join itself (ThreadError)
from ./test.rb:13
What is the right way to synchronize two threads without joining them (both threads must continue running after synchronization)?
This is old but I'm contributing since it's a bit scary that none of the other answers (at time of writing) seem to be correct. The original code is clearly attempting to:
Create a mutex in the main thread and lock it.
Start a new thread, which may begin running at any time and after any delay subject to the whims of the Ruby runtime.
Have this thread unlock the mutex only once it's finished doing its work.
Have the main thread then deliberately re-lock the mutex, with the intention that it's spawned a thread which will unlock it. The main thread waits for that.
Then the main thread continues running.
@user2413915: Your solution omits the step of locking again in the main thread, so it won't wait for the spawned thread as intended.
@Paul Rubel: Your code assumes that the spawned thread gets as far as its lock of the mutex before the main thread does. This is a race condition. If the main thread continues to execute and locks first, the spawned thread will be blocked until after the main thread has printed "Delayed hello", which is the exact opposite of the desired outcome. You probably ran it by pasting into the IRB prompt; if you try with your example modified so that the end and Mutex lock are on the same line, it'll fail, printing the message too early (i.e. "end; $mutex.lock"). Either way, it's relying on behaviour of the Ruby runtime that's working by chance.
The original code should actually work fine in principle, albeit arguably lacking in elegance - in practice the Ruby 1.9+ runtime won't allow it as it "sees" two consecutive locks in the main thread without an unlock and doesn't "realise" that there's a spawned thread which is going to do the unlocking. Ruby (in this case technically erroneously) raises a ThreadError deadlock exception.
Instead, make cunning use of Ruby's Queue. When you try to pull something off a Queue, the call will block until an item is available. So:
require 'thread' # provides Queue on older Rubies; it's built in on newer ones
queue = Queue.new
t = Thread.new {
  sleep 10
  queue.push( nil ) # Push any object you like - here, it's a NilClass instance
}
queue.pop() # Blocks until thread 't' pushes onto the queue
puts "Delayed hello"
If the spawned thread runs first and pushes onto the queue, then the main thread will just pop the item and keep going. If the main thread tries to pop before the spawned thread pushes, it'll wait for the spawned thread.
[Edit: Note that the object pushed onto the queue could be the result of the spawned thread's processing task, so the main thread waits until processing is complete and gets the processing result in one go].
I've tested this on Ruby 1.8.7-p375 and Ruby 2.1.2 via rbenv with success, so it's reasonable to assume that the standard library Queue class is functional across all common major Ruby versions.
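As a concrete illustration of that edit note (this sketch is not part of the original answer), the object pushed onto the queue can carry the worker's result back to the main thread:
require 'thread'

queue = Queue.new

t = Thread.new do
  result = 40 + 2       # stand-in for the real processing work
  queue.push(result)    # hand the result back to the main thread
end

value = queue.pop       # blocks until the worker pushes its result
puts "Worker produced #{value}"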
You do not need to lock the mutex on line 12 again.
require 'thread'
$mutex = Mutex.new
$mutex.lock
t = Thread.new {
  sleep 10
  $mutex.unlock
}
puts "Delayed hello"
This will work.
