Can I use OpenMP to do simple threaded tasks?

I am looking for a cross-platform C thread library to do things like:
launch a long-running thread, and in the thread, wait for an event to happen; when this event happens, do something, and then wait for the next event.
So, I don't need to do parallel computations, just run a simple task in another thread.
Is it appropriate to use OpenMP for this?

Yes, OpenMP is up to the task and is probably your most portable solution in C, albeit not the most elegant one in your particular use case.
OpenMP is based on the fork-join model. You will thus need to fork somewhere in your code (using the #pragma omp parallel directive, which launches threads) and then use an if to perform a particular task on a given thread. What you want to achieve could be done by something along the lines of:
#include <omp.h>
...
#pragma omp parallel num_threads(2)
{
    if (omp_get_thread_num() == 0) {
        [do something]
    } else {
        [do something else]
    }
}
Given the description of your problem, you could, for example, place this in your main() and replace [do something] with calls to various functions. One of these functions would contain the loop you described in your question, and the other would be the rest of your program.
Depending on the nature of the event you mention, you may have to declare a shared variable (via the shared() clause on the #pragma omp parallel directive) to perform basic inter-thread communication; remember to protect accesses to it with atomic or critical constructs.
The way you described your problem seems a bit twisted though. Wouldn't it be simpler to develop two separate programs? (Depending on the nature of the event.)

Related

How to Synchronize with Julia CUDArt?

I'm just starting to use Julia's CUDArt package to manage GPU computing. I am wondering how to ensure that if I pull data from the GPU (e.g. using to_host()), I don't do so before all of the necessary computations have been performed on it.
Through some experimentation, it seems that to_host(CudaArray) will lag while the particular CudaArray is being updated. So, perhaps just using this is enough to ensure safety? But it seems a bit chancy.
Right now, I am using the launch() function to run my kernels, as depicted in the package documentation.
The CUDArt documentation gives an example using Julia's @sync macro, which seems like it could be lovely. But for the purposes of @sync, I am done with my "work" and ready to move on as soon as the kernel gets launched with launch(), not once it finishes. As far as I understand the operation of launch(), there isn't a way to change this behavior (e.g. to make it wait to receive the output of the function it "launches").
How can I accomplish such synchronization?
OK, so there isn't a ton of documentation on the CUDArt package, but I looked at the source code, and doing this seems straightforward. In particular, there is a device_synchronize() function that will block until all of the work on the currently active device has finished. Thus, the following in particular seems to work:
using CUDArt
md = CuModule("/path/to/module.ptx", false)
MyFunc = CuFunction(md,"MyFunc")
GridDim = 2*2496
BlockDim = 64
launch(MyFunc, GridDim, BlockDim, (arg1, arg2, ...));
device_synchronize()
res = to_host(arg2)
I'd love to hear from anyone with more expertise though if there is anything more to be aware of here.
I think the more canonical way is to make a stream for each device:
streams = [(device(dev); Stream()) for dev in devlist]
and then inside the @async block, after you tell it to do the computations, use the wait(stream) function to tell it to wait for that stream to finish its computations. See the Streams example in the README.

Ruby Semaphores?

I'm working on an implementation of the "Fair Barbershop" problem in Ruby. This is for a class assignment, but I'm not looking for any handouts. I've been searching like crazy, but I cannot seem to find a Ruby implementation of Semaphores that mirror those found in C.
I know there is Mutex, and that's great. Single implementation, does exactly what that kind of semaphore should do.
Then there are Condition Variables. I thought this was going to work out great, but these require a Mutex for every wait call, and it looks to me like I can't attach numerical values to the semaphore (as in, I have seven barbershops, 3 barbers, etc.).
I think I need a Counting Semaphore, but I think it's a little bizarre that Ruby doesn't (from what I can find) contain such a class in its core. Can anyone help point me in the right direction?
If you are using JRuby, you can import semaphores from Java as shown in this article.
require 'java'
java_import 'java.util.concurrent.Semaphore'
SEM = Semaphore.new(limit_of_simultaneous_threads)
SEM.acquire # to decrement the number available
SEM.release # to increment the number available
There's http://sysvipc.rubyforge.org/SysVIPC.html which gives you SysV semaphores. Ruby is perfect for smoothing over the API blemishes of SysV semaphores, and SysV semaphores are the best around: they are interprocess semaphores, you can use SEM_UNDO so that even SIGKILLs won't mess up your global state (POSIX interprocess semaphores don't have this), and with SysV semaphores you can perform atomic operations on several semaphores at once, as long as they're in the same semaphore set.
As for inter-thread semaphores, those should be perfectly emulatable with Condition Variables and Mutexes. (See Bernardo Martinez's link for how it can be done.)
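To make the Mutex-plus-ConditionVariable emulation concrete, here is a minimal sketch of a counting semaphore built from those two primitives. The class name CountingSemaphore and its API are illustrative, not from any standard library:

```ruby
# Minimal counting semaphore built from a Mutex and a ConditionVariable.
# The class name and API here are illustrative, not a stdlib class.
class CountingSemaphore
  def initialize(permits)
    @permits = permits
    @mutex = Mutex.new
    @cond = ConditionVariable.new
  end

  # Block until a permit is available, then take it.
  def acquire
    @mutex.synchronize do
      @cond.wait(@mutex) while @permits.zero?
      @permits -= 1
    end
  end

  # Return a permit and wake one waiting thread.
  def release
    @mutex.synchronize do
      @permits += 1
      @cond.signal
    end
  end
end

# Example: 7 "customers" compete for 3 permits; at most 3 run at once.
sem = CountingSemaphore.new(3)
active = []
lock = Mutex.new
max_seen = 0

threads = 7.times.map do
  Thread.new do
    sem.acquire
    lock.synchronize { active << 1; max_seen = [max_seen, active.size].max }
    sleep 0.01
    lock.synchronize { active.pop }
    sem.release
  end
end
threads.each(&:join)
puts max_seen  # never exceeds 3
```

Note the `while @permits.zero?` loop around the wait: condition variables can wake spuriously, so the condition must be rechecked after every wakeup.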
I also found this code:
https://gist.github.com/pettyjamesm/3746457
Perhaps someone will find this other option useful.
Since concurrent-ruby is stable (past 1.0) and widely used, the best (and portable across Ruby implementations) solution is to use its Concurrent::Semaphore class.
Thanks to @x3ro for his link. That pointed me in the right direction. However, with the implementation that Fukumoto gave (at least for rb1.9.2), Thread.critical isn't available. Furthermore, my attempts to replace the Thread.critical calls with Thread.exclusive{} simply resulted in deadlocks. It turns out that there is a proposed Semaphore patch for Ruby (linked below) that solves the problem by replacing Thread.exclusive{} with Mutex#synchronize, among a few other tweaks.
http://redmine.ruby-lang.org/attachments/1109/final-semaphore.patch
Since the other links here aren't working for me, I decided to quickly hack something together. I have not tested this, so input and corrections are welcome. It's based simply on the idea that a Mutex is a binary Semaphore, thus a Semaphore is a set of Mutexes.
https://gist.github.com/3439373
I think it might be useful to mention the Thread::Queue in this context for others arriving at this question.
The Queue is a thread-safe tool (implemented with some behind-the-scenes synchronization primitives) that can be used like a traditional multiprocessing semaphore with just a hint of imagination. And it comes preloaded by default, at least in Ruby v3:
#!/usr/bin/ruby
# hold_your_horses.rb
q = Queue.new
wait_thread = Thread.new {
  puts "Wait for it ..."
  q.pop
  puts "... BOOM!"
}
sleep 1
puts "... click, click ..."
q.push nil
wait_thread.join
And can be demonstrated simply enough:
user#host:~/scripts$ ruby hold_your_horses.rb
Wait for it ...
... click, click ...
... BOOM!
The docs for Ruby v3.1 say a Queue can be initialized with an enumerable object to set its initial contents, but that wasn't available in my v3.0. If you want a semaphore with, say, 7 permits, it's easy to stuff the box with something like:
q = Queue.new
7.times{ q.push nil }
I used the Queue to implement baton-passing between some worker-threads:
class WaitForBaton
  def initialize
    @q = Queue.new
  end

  def pass_baton
    @q.push nil
    sleep 0.0
  end

  def wait_for_baton
    @q.pop
  end
end
So that thread task_master could perform steps one and three with thread little_helper stepping in at the appropriate time to handle step two:
baton = WaitForBaton.new
task_master = Thread.new {
  step_one(ARGV[0])
  baton.pass_baton
  baton.wait_for_baton
  step_three(logfile)
}
little_helper = Thread.new {
  baton.wait_for_baton
  step_two(ARGV[1])
  baton.pass_baton
}
task_master.join
little_helper.join
Note that the sleep 0.0 in the pass_baton method of my WaitForBaton class is necessary to prevent task_master from passing the baton to itself: unless thread scheduling happens to jump away from task_master right after baton.pass_baton, the very next thing that happens is task_master's baton.wait_for_baton, which takes the baton right back again. sleep 0.0 explicitly cedes execution to any other threads that might be waiting to run (in this case, blocking on the underlying Queue).
Ceding execution is not the default behavior because this is a somewhat unusual use of semaphore technology: imagine that task_master could be generating many tasks for little_helpers to do; without the sleep, task_master can efficiently get right back to generating tasks immediately after handing a task off through the Queue's push method.

Coding Style: lock/unlock internal or external?

Another possibly inane style question:
How should concurrency be locked? Should the executor or caller be responsible for locking the thread?
e.g. in no particular language...
Caller::callAnotherThread() {
    _executor.method();
}

Executor::method() {
    _lock();
    doSomething();
    _unlock();
}
OR
Caller::callAnotherThread() {
    _executor.lock()
    _executor.method();
    _executor.unlock()
}

Executor::method() {
    doSomething();
}
I know little about threading and locking, so I want to make sure the code is robust. The second method allows thread-unsafe calls: you could technically call _executor.method() without taking any kind of lock.
Help?
Thanks,
The callee, not the caller, should do the locking. The callee is the only one who knows what needs to be synchronized and the only one who can ensure that it is. If you leave locking up to callers, you do three bad things:
You increase the burden on users of your function/class, increasing design viscosity.
You make it possible for callers to update shared state without taking the lock.
You introduce the possibility of deadlocks if different functions take multiple locks in different order.
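The callee-locking approach can be sketched briefly; Ruby is used here just for brevity, and the class and method names are mine, purely illustrative:

```ruby
# Internal (callee) locking: the object protects its own state, so every
# caller goes through the lock automatically and cannot forget to take it.
# Class and method names are illustrative.
class Counter
  def initialize
    @lock = Mutex.new
    @value = 0
  end

  # The callee takes the lock; callers never touch it.
  def increment
    @lock.synchronize { @value += 1 }
  end

  def value
    @lock.synchronize { @value }
  end
end

c = Counter.new
threads = 4.times.map { Thread.new { 1000.times { c.increment } } }
threads.each(&:join)
puts c.value  # => 4000, no matter how the threads interleave
```

Because the lock lives inside Counter, no caller can update the shared state without it, which addresses the second point above.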
If you use locks internally, you have to note it in the documentation. Otherwise your code can become a bottleneck for parallel execution, and it will be hard for users to discover why.
External locking offers advantages if you need to do several interrelated granular operations at once, or to work with a reference to an internal structure: you can hold a lock as long as you need your set of work to be safe from other threads.
An example: A container that manages a list of items might want to provide an api to get a mutable reference to one item. Without external locking, as soon as the function call finishes, another thread could potentially lock and mutate data. A plausible solution is to return a copy of the one item, but this is inefficient.
That being said, for some cases, internal locking can have a cleaner api, provided you can be sure that you won't want to preserve a lock longer than one function call.
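For the container example, external locking might be sketched like this; again Ruby for brevity, with class and method names that are purely illustrative:

```ruby
# External locking: the container exposes its lock so a caller can hold it
# across several granular operations on an internal reference.
# Class and method names are illustrative, not a library API.
class ItemList
  attr_reader :lock

  def initialize(items)
    @lock = Mutex.new
    @items = items
  end

  # Returns a mutable reference to an internal item; the caller is
  # responsible for holding the lock while using it.
  def at(i)
    @items[i]
  end
end

list = ItemList.new([{ count: 0 }])

# The caller holds the lock across the whole set of related operations,
# so no other thread can mutate the item in between.
list.lock.synchronize do
  item = list.at(0)
  item[:count] += 1
end

puts list.at(0)[:count]  # => 1
```

The trade-off is exactly the one described above: the caller gains control over the lock's extent, but also gains the ability to forget it.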

How to prevent cocoa application from freezing?

- (void)test
{
    int i;
    for (i = 0; i < 1000000; i++)
    {
        // do lengthy operation
    }
}
How do I prevent its GUI from freezing?
Bottom line; don't block the main thread and, thus, don't block the main event loop.
Now, you could spawn a thread. But that isn't actually the correct way to write concurrent programs on Mac OS X.
Instead, use NSOperation and NSOperationQueue. They are specifically designed to support your concurrent programming needs, they scale well, and NSOperationQueue is tightly integrated into the system such that it will control concurrency based on the system resources available (number of cores, CPU load from other applications, etc.) more efficiently than any direct use of threads.
See also the Threaded Programming Guide.
I would do the lengthy operation in a separate thread, using NSThread

How can I implement a blocking process in a single slot without freezing the GUI?

Let's say I have an event and the corresponding function is called. This function interacts with the outside world, so it can sometimes have long delays. If the function waits or hangs, my UI will freeze, and this is not desirable. On the other hand, breaking my function into many parts and re-emitting signals is tedious: it fragments the code a lot, which makes it harder to debug and less readable, and slows down the development process. Is there a special feature in event-driven programming that would let me write the process as one function call and still let the main thread do its job while it waits? For example, the compiler could recognize a keyword, then insert a return and re-emit signals connected to new slots automatically. Why do I think this would be a great idea ;) I'm working with Qt.
Your two options are threading, or breaking your function up somehow.
With threading, it sounds like your ideal solution would be QtConcurrent. If all of your processing is already in one function, and the function is pretty self-contained (doesn't reference member variables of the class), this would be easy to do. If not, things might get a little more complicated.
For breaking your function up, you can either do it as you suggested, splitting it into different functions that are called one after another, or do it in a looser way by scattering calls that allow other processing inside your function. I believe calling QCoreApplication::processEvents() would do what you want, but I haven't used it in a long time. Of course, you can run into other problems with that unless you understand that it might cause other parts of your class to run again (in response to other events), so you have to treat the code almost as multi-threaded, protecting variables that are in an indeterminate state while you are computing.
"Is there a special feature in event driven programming which would enable me to just write the process in one function call and be able to let the mainThread do its job when its waiting?"
That would be a non-blocking process.
But your original query was, "How can I implement a blocking process in a single slot without freezing the GUI?"
Perhaps what you're looking for is a way to stop other processing when some process, any process, decides it's time to block? There are typically ways to do this, by calling a method on one of the parent objects, which, of course, will depend on the specific objects you are using (e.g. a frame).
Look at the parent objects and see what methods they have that you'd like to use. You may need to override one of them to get exactly the results you want.
If you want to handle a GUI event by beginning a long-running task, and don't want the GUI to wait for the task to finish, you need to do it concurrently, by creating either a thread or a new process to perform the task.
You may be able to avoid creating a thread or process if the task is I/O-bound and occasional callbacks to handle I/O would suffice. I'm not familiar with Qt's main loop, but I know that GTK's supports adding event sources that can integrate into a select() or poll()-style loop, running handlers after either a timeout or when a file descriptor becomes ready. If that's the sort of task you have, you could make your event handler add such an event source to the application's main loop.
