I'm new to Ruby, and a quick search did not turn up an answer. For this case it is relatively easy to write a test, but it seemed worth asking here to get an authoritative answer.
Consider this scenario: in a Ruby application invoked from the command line, the main thread creates and starts worker threads. The worker threads perform long computations. The main thread does not wait for anything and simply finishes after spawning the workers.
Will the process exit and the worker threads be terminated after the main thread exits?
Is there documentation describing this behavior?
Yes. Ruby threads behave like daemon threads: when the main thread exits, any threads that have not been joined are killed, and the process exits along with your program. There is documentation regarding the exiting of threads (although it's short) here. However, if you're looking for processes to stay around after being spawned by another process (and after that process ends), then you should look at a multi-processing gem or library suited to the task.
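As the question suggests, this is easy to check yourself: run two tiny scripts in fresh interpreters and compare their output. This is a minimal sketch; `run_ruby` is a hypothetical helper, and the sleep durations are arbitrary values chosen so that the worker outlives the main thread.

```ruby
require "open3"
require "rbconfig"

# Runs a small Ruby script in a fresh interpreter and returns its stdout.
def run_ruby(script)
  out, _status = Open3.capture2(RbConfig.ruby, "-e", script)
  out
end

# The worker sleeps longer than the main thread lives, and nobody joins it.
unjoined = run_ruby(<<~RUBY)
  Thread.new { sleep 0.5; puts "worker finished" }
  puts "main done"
RUBY

# Same idea, but the main thread joins the worker before it exits.
joined = run_ruby(<<~RUBY)
  worker = Thread.new { sleep 0.1; puts "worker finished" }
  puts "main done"
  worker.join
RUBY

p unjoined # => "main done\n"
p joined   # => "main done\nworker finished\n"
```

The unjoined worker never prints, because it is killed when the main thread reaches the end of the script; calling `join` (or `value`) on each worker is what keeps the process alive until they finish.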
Let's say I want to create a thread. I want the necessary space allocated for the thread, but I'd like to defer launching it.
I'm working on a thread pool, so I'd like to have some threads ready (but not running) before I start the pool.
Is there a way to do so in C++11?
You could have all the threads wait on a semaphore as soon as they start up. And then you can just signal them when it's time for them to actually start running.
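The same start-gate idea, sketched in Ruby (the language used elsewhere on this page) rather than C++11. C++11 itself has no semaphore type (`std::counting_semaphore` only arrived in C++20), so there the gate would typically be built from `std::mutex` and `std::condition_variable`; this sketch has the same shape using Ruby's `Mutex` and `ConditionVariable`. `StartGate` is a hypothetical name, not a library class.

```ruby
# A one-shot gate: threads created early block in #wait until #open! is called.
class StartGate
  def initialize
    @mutex = Mutex.new
    @cond = ConditionVariable.new
    @open = false
  end

  # Blocks the calling thread until open! has been called.
  def wait
    @mutex.synchronize do
      @cond.wait(@mutex) until @open # re-check the flag after every wakeup
    end
  end

  # Releases every thread waiting now, and lets later callers pass straight through.
  def open!
    @mutex.synchronize do
      @open = true
      @cond.broadcast
    end
  end
end

gate = StartGate.new
log = Queue.new

# The threads (and their stacks) are created now...
workers = Array.new(3) do |i|
  Thread.new do
    gate.wait
    log << "worker #{i} running"
  end
end

sleep 0.1
before_open = log.size # every worker is still parked in gate.wait
gate.open!             # ...but they only start doing work after this
workers.each(&:join)
after_open = log.size
```

Checking the flag in a loop, rather than waiting once, also covers the case where `open!` runs before a worker has reached `wait`.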
This sounds similar to the "Thread Pool / Task" behavior present in a number of languages (and probably in several C++ libraries such as Boost). A thread pool has one or more threads and can queue tasks. When it has no tasks, it simply waits for input; and, as implied, it can also queue up tasks while its threads are busy.
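A minimal fixed-size pool along those lines, again in Ruby: the workers are created up front and simply block on a shared `Queue` until tasks arrive. `TinyPool`, `post`, and `shutdown` are hypothetical names for illustration, and using a `nil` task as the stop signal is one design choice among several.

```ruby
# Fixed-size pool: workers block on a shared queue until tasks arrive.
class TinyPool
  def initialize(size)
    @tasks = Queue.new
    @workers = Array.new(size) do
      Thread.new do
        # Queue#pop blocks while the queue is empty; a nil task means "stop".
        while (task = @tasks.pop)
          task.call
        end
      end
    end
  end

  # Queues a task; it runs as soon as a worker is free.
  def post(&block)
    @tasks << block
  end

  # Lets already-queued tasks finish, then stops every worker.
  def shutdown
    @workers.size.times { @tasks << nil }
    @workers.each(&:join)
  end
end

results = Queue.new
pool = TinyPool.new(2)
5.times { |n| pool.post { results << n * n } }
pool.shutdown
squares = Array.new(results.size) { results.pop }.sort
p squares # => [0, 1, 4, 9, 16]
```

Because `Queue` is FIFO, the `nil` stop markers are only popped after every queued task has been taken, so `shutdown` lets pending work complete before the threads exit.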
We have been using Resque in most of our projects, and we have been happy with it.
In a recent project, we were connecting to Twitter's live streaming API. Since we have to keep that connection open, we were dumping each line from the streaming API onto a Resque queue so that the connection would not be lost, and processing the queue afterwards.
We ran into a situation where items were being pushed onto the queue at around 30-40 per second but popped at only 3-5 per second, so the queue kept growing. When we looked for the reason, we found that Resque has a parent process that forks a child process for each job in the queue, and the child process does the actual work. Our Rails environment was quite heavy, and forking the child processes was taking time.
So, for the time being, we implemented another rake task like this:
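task :process_queue => :environment do
  # Pop and process items in this one long-running process,
  # instead of forking a child per job the way a Resque worker does.
  loop do
    begin
      interaction = Resque.pop("process_twitter_resque")
      if interaction
        ProcessTwitterResque.perform(interaction)
      else
        sleep 0.1 # queue is empty: back off instead of hammering Redis
      end
    rescue => e
      # Log the error and keep going; failed items are not retried.
      puts e.message
      puts e.backtrace.join("\n")
    end
  end
end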
and started the task like this:
nohup bundle exec rake process_queue --trace >> log/workers/process_queue/worker.log 2>&1 &
This does not handle failed jobs, among other things.
But my question is: why does Resque fork a child process to process the jobs from the queue? The jobs definitely do not need to be processed in parallel (it is a queue, and we expect it to process them one after the other, sequentially; I believe Resque also forks only one child process at a time).
I am sure Resque has done it with some purpose in mind. What is the exact purpose behind this parent/child process architecture?
The Ruby process that sits and listens for jobs in Redis is not the process that ultimately runs the job code written in the perform method. It is the “master” process, and its only responsibility is to listen for jobs. When it receives a job, it forks yet another process to run the code. This other “child” process is managed entirely by its master. The user is not responsible for starting or interacting with it using rake tasks. When the child process finishes running the job code, it exits and returns control to its master. The master now continues listening to Redis for its next job.
The advantage of this master-child process organization – and the advantage of Resque processes over threads – is the isolation of job code. Resque assumes that your code is flawed, and that it contains memory leaks or other errors that will cause abnormal behavior. Any memory claimed by the child process will be released when it exits. This eliminates the possibility of unmanaged memory growth over time. It also provides the master process with the ability to recover from any error in the child, no matter how severe. For example, if the child process needs to be terminated using kill -9, it will not affect the master’s ability to continue processing jobs from the Redis queue.
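A toy version of that master/child arrangement, assuming a platform with `fork` (Linux or macOS, not Windows). This is not Resque's code; `run_in_child` is a hypothetical helper that runs one job in a forked child and reports only whether the child exited cleanly, so neither an exception nor a `kill -9` inside the job can take down the master loop.

```ruby
# Runs one job in a forked child; the master only inspects the exit status.
def run_in_child(job)
  pid = fork do
    begin
      job.call
      exit!(0) # clean exit; skips at_exit hooks inherited from the master
    rescue StandardError
      exit!(1) # the job raised: report failure through the exit status
    end
  end
  _pid, status = Process.wait2(pid)
  # exited? is false when the child was killed by a signal such as SIGKILL
  status.exited? && status.success?
end

jobs = [
  -> { :ok },
  -> { raise "buggy job" },
  -> { Process.kill(:KILL, Process.pid) }, # like `kill -9` on the child
  -> { :ok }
]

# The master keeps going no matter how each child ended.
statuses = jobs.map { |job| run_in_child(job) }
p statuses # => [true, false, false, true]
```

Any memory the job allocates lives in the child and is returned to the OS when the child exits, which is the isolation the answer describes; the cost is paying for a `fork` on every job.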
In earlier versions of Ruby, Resque’s main criticism was its potential to consume a lot of memory. Creating new processes means creating a separate memory space for each one. Some of this overhead was mitigated with the release of Ruby 2.0 thanks to copy-on-write. However, Resque will always require more memory than a solution that uses threads because the master process is not forked. It’s created manually using a rake task, and therefore must load whatever it needs into memory from the start. Of course, manually managing each worker process in a production application with a potentially large number of jobs quickly becomes untenable. Thankfully, we have pool managers for that.
Resque uses #fork for two reasons (among others): the ability to prevent zombie workers (just kill them) and the ability to use multiple cores (since each job runs in a separate process).
Maybe this will help you with your fast-executing jobs: http://thewebfellas.com/blog/2012/12/28/resque-worker-performance
In my experience, when the main thread is ready to exit, it should wait until the other threads exit normally.
But according to this link http://msdn.microsoft.com/en-us/library/ms686722(v=VS.85).aspx, when a process is terminated all of its related resources are freed, and if a worker thread is doing heavy work, waiting may take a little longer. Can I skip the waiting?
Also, the above link says:
Do not terminate a process unless its threads are in known states. If a thread is waiting on a kernel object, it will not be terminated until the wait has completed. This can cause the application to hang.
This is too brief for me to understand why killing threads in unknown states when the process exits is wrong.
Can someone give me more detail about the problem?
Thanks
So, when a thread is waiting on an object in the kernel, it will not exit until the waiting is over.
So, let's say you have an application with 3 threads, in the following states:
1. Main thread, currently idle
2. UI handling thread, currently idle
3. A thread waiting on a kernel object
If you kill the process, thread #2 will die, taking the UI input handlers with it and therefore making the application appear unresponsive (hung). Until thread #3 finishes waiting on the kernel object, the main thread won't exit, so the process remains running and its resources don't get released.
So, I think it's basically saying that it's better to ask a process to exit normally, instead of sending it a kill signal, because you can get yourself into a situation like the one described if any of the process threads are waiting on kernel objects.
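The same "ask, don't kill" advice applies to threads inside one process, which is what the question was about. A minimal Ruby sketch, assuming a worker whose loop can check for a stop request between units of work: the main thread posts a request and then joins with a timeout, rather than calling `Thread#kill` on a thread in an unknown state. The `stop` queue and the one-second timeout are illustrative choices, not a prescribed API.

```ruby
stop = Queue.new # a thread-safe "please stop" signal
processed = []

worker = Thread.new do
  loop do
    break unless stop.empty? # check for a stop request between units of work
    processed << :item       # stands in for one unit of real work
    sleep 0.01
  end
  :clean_exit                # reached only when the worker stops on its own
end

sleep 0.05
stop << :stop                # ask the worker to finish
finished = worker.join(1)    # nil if it did not stop within 1 second
result = finished && finished.value
p result # => :clean_exit
```

If `join` does return `nil`, the worker is stuck somewhere it cannot check the flag, which is exactly the situation where a forced kill is risky and worth investigating rather than hiding.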
Yesterday I read somewhere that NSTask isn't thread-safe, and that bothers me a lot, because I'm running an NSTask within an NSThread and so far I'm not experiencing any threading issues with it.
My code is organized like this:
A: main thread -> B: worker thread -> C: worker task
C: The worker task is a commandline program.
B: The worker thread can start/stop the worker task and send it commands.
A: The main thread can send commands to the worker thread.
If NSTask is supposed to be used only within the main thread, then I'm considering moving the NSTask start/stop code to the main thread, just to prevent possible threading issues.
Can NSTask be used outside the main thread?
And if not then what may be the threading issues with NSTask?
I read somewhere that NSTask isn't thread safe…
That's not what that page says. It says that you'll get the process-terminated notification on the same thread you launched it from, which suggests that NSTask is aware of threads and tries to do the right thing.
The problem one of the editors of that page encountered was that they started their process from a thread, then let the thread die. That caused a crash because the framework was no longer able to deliver the process-terminated notification to the correct thread.
The Thread Safety Summary (bookmark that) says something similar, listing NSTask among the classes about which it says:
In most cases, you can use these classes from any thread as long as you use them from only one thread at a time. Check the class documentation for additional details.
The NSTask documentation doesn't say anything additional about threads, so it sounds like NSTask is one of the “most cases”: You can use a task from the thread you created it on. Don't use the same task on another thread, and (as noted above) make sure the thread lasts at least as long as the task process.
I will note, however, that in most cases, there is no need to run a task on a separate thread. Separate processes tend to run on other processors just as other threads in your process do, and the run loop does a good job of multiplexing many small events and keeping the UI responsive. You can use NSFileHandle's readInBackgroundAndNotify method if you need to read output from the task. You may be able to cut out your worker threads entirely.
The alternative is, as Eimantas suggested, to use NSOperation: Have an operation that simply starts a particular task and waits for that task to exit (perhaps synchronously reading output from it). The operation is complete when the task has exited.
Yes, it can, but I suggest using NSOperation. It's KVO-agnostic (unlike a threaded NSTask). You may also want to look into the Receptionist design pattern for KVO in a threaded environment (in case you need KVO).
I've read the documentation for ReadDirectoryChangesW() and also looked at the CDirectoryChangeWatcher project, but neither says why one would want to call it asynchronously. I understand that the current thread will not block, but, at least in the CDirectoryChangeWatcher code that uses a completion port, the thread blocks anyway when it calls GetQueuedCompletionStatus() (if there are no changes).
So if I call ReadDirectoryChangesW() synchronously in a separate thread in the first place, where I don't care if it blocks, why would I ever want to call it asynchronously?
When you call it asynchronously, you have more control over which thread does the waiting. It also allows you to have a single thread wait for multiple things, such as a directory change, an event, and a message. Finally, even if you're doing the waiting in the same thread that set up the watch in the first place, it gives you control over how long you're willing to wait. GetQueuedCompletionStatus has a timeout parameter that ReadDirectoryChangesW doesn't offer by itself.
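ReadDirectoryChangesW and GetQueuedCompletionStatus are Windows-only, but the shape described here (one thread waiting on several sources at once, with a timeout) can be sketched in Ruby with `IO.select`. In this sketch two pipes are stand-ins, not real directory handles: one carries change notifications, the other a shutdown request that lets the waiting thread exit on its own instead of being terminated.

```ruby
changes_r, changes_w = IO.pipe   # stand-in for directory change notifications
shutdown_r, shutdown_w = IO.pipe # stand-in for an event asking the thread to quit
seen = []

watcher = Thread.new do
  loop do
    # Wait until either pipe is readable, for at most 0.5 seconds.
    ready, = IO.select([changes_r, shutdown_r], nil, nil, 0.5)
    next if ready.nil? # timed out: a chance to do periodic work
    seen << changes_r.gets.chomp if ready.include?(changes_r)
    break if ready.include?(shutdown_r)
  end
end

changes_w.puts "file.txt modified" # a "directory changed" notification
shutdown_w.puts "stop"             # wakes the watcher so it can exit cleanly
watcher.join
p seen # => ["file.txt modified"]
```

Handling pending changes before checking for shutdown means a notification that arrives together with the stop request is not lost.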
You would call ReadDirectoryChangesW such that it returns its results asynchronously if you ever needed the calling thread to not block. A tautology, but the truth.
Candidates for such threads: the UI thread & any thread that is solely responsible for servicing a number of resources (Sockets, any sort of IPC, independent files, etc.).
Not being familiar with the project, I'd guess the CDirectoryChangeWatcher doesn't care if its worker thread blocks. Generally, that's the nature of worker threads.
I tried calling ReadDirectoryChangesW synchronously in a worker thread and, sure enough, it blocked, so the thread would not exit by itself when the program exited.
So if you don't want to use evil things like TerminateThread, you should use asynchronous calls.