Timing consecutive events in CUDA

If you have multiple consecutive CUDA events (in a single stream) that you'd like to time (e.g. cudaMemcpy followed by a kernel launch followed by another cudaMemcpy), is it safe/proper/accurate to synchronize only on the last event? For example:
cudaEventRecord(event1_start);
// do something
cudaEventRecord(event1_stop);
cudaEventRecord(event2_start);
// do something else
cudaEventRecord(event2_stop);
cudaEventSynchronize(event2_stop);
cudaEventElapsedTime(&time1, event1_start, event1_stop);
cudaEventElapsedTime(&time2, event2_start, event2_stop);
My understanding is that these events and the actual CUDA calls are placed into a FIFO queue, so the CPU only needs to wait until the last event is recorded before it reads the timings for all of them. Is this correct?

If they are all issued to the same stream (or all to the default stream), they execute sequentially, so I'd say yes: if you synchronize only on the last event, the others will already have completed. I haven't tested it myself, though, so I suggest checking with a simple case: synchronize on both stop events in one run and only on the last one in another, then compare the times.
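To see why synchronizing on the last event is enough, here is a CPU-only analogy in C++. It does not use CUDA at all, and `InOrderStream`, `Timings` and `timeTwoStages` are hypothetical names: a single worker thread executes submitted work strictly in FIFO order, like operations issued to one stream, and the future returned by `submit` plays the role of an event. Waiting only on the final marker is sufficient because every earlier marker ran before it.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>

// CPU-only analogy of one CUDA stream: a single worker thread runs submitted
// work strictly in FIFO order. This is NOT CUDA; it only models in-order issue.
class InOrderStream {
public:
    InOrderStream() : worker_([this] { run(); }) {}
    ~InOrderStream() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join();  // the worker drains the queue before exiting
    }
    // Enqueue work; the returned future becomes ready once this item has run.
    std::future<void> submit(std::function<void()> work) {
        auto task = std::make_shared<std::packaged_task<void()>>(std::move(work));
        std::future<void> done = task->get_future();
        {
            std::lock_guard<std::mutex> lock(m_);
            queue_.push([task] { (*task)(); });
        }
        cv_.notify_one();
        return done;
    }

private:
    void run() {
        for (;;) {
            std::function<void()> item;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
                if (queue_.empty()) return;  // stopping and nothing left to run
                item = std::move(queue_.front());
                queue_.pop();
            }
            item();
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> queue_;
    bool stopping_ = false;
    std::thread worker_;  // declared last: starts after the members above exist
};

using Clock = std::chrono::steady_clock;

// Host-side "event" timestamps for two stages.
struct Timings {
    Clock::time_point e1_start, e1_stop, e2_start, e2_stop;
};

// Records four markers around two stages but waits only on the last one,
// mirroring cudaEventSynchronize(event2_stop) in the question.
Timings timeTwoStages(InOrderStream& s, std::function<void()> stage1,
                      std::function<void()> stage2) {
    Timings t;
    s.submit([&t] { t.e1_start = Clock::now(); });
    s.submit(std::move(stage1));
    s.submit([&t] { t.e1_stop = Clock::now(); });
    s.submit([&t] { t.e2_start = Clock::now(); });
    s.submit(std::move(stage2));
    s.submit([&t] { t.e2_stop = Clock::now(); }).wait();
    // FIFO execution means every earlier marker has already been written.
    return t;
}
```

On a real GPU the same ordering argument only holds for work issued to the same stream; if the two stages run in different streams, each stop event needs its own synchronization.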

Related

Waiting for a periodic event with wait_event_interruptible

I am writing a kernel module that performs timing functions using an external clock. Basically, the module counts pulses from the clock, rolling over the count every so often. User processes can use an ioctl to ask to be woken up at a specific count; they then perform some task and invoke the same ioctl to wait until the next time the same count comes up. In this way they can execute periodically using this external timing.
I have created an array of wait_queue_head_t structures, one for each available schedule slot (i.e. each "count", as described above). When a user process invokes the ioctl, I simply call sleep_on() with the ioctl argument specifying the schedule slot and thus the wait queue. When the kernel module receives a clock pulse and increments the count, it wakes up the wait queue corresponding to that count.
I know that it is considered bad practice to use sleep_on(), because there is potential for state to change between a test to see if a process should sleep, and the corresponding call to sleep_on(). But in this case I do not perform such a test before sleeping because the waking event is periodic. It doesn't matter if I "just miss" a waking event because another will come shortly (in fact, if the ioctl is invoked very close to the specified schedule slot, then something went wrong and I would prefer to wait until the next slot anyway).
I have looked at using wait_event_interruptible(), which is considered safer, but I do not know what to put for the condition argument that wait_event_interruptible requires. wait_event_interruptible will check this condition before sleeping, but I want it to always sleep when the ioctl is invoked. I could use a flag that I clear before sleeping and set before waking up, but I'm worried this might not work in the case that there are multiple processes in the wait queue - one process might finish and clear the flag before the next is woken up.
Am I right to be worried about this? Or are all processes in a wait_queue guaranteed to be woken up before any of them run (and could therefore clear the flag)? Is there a better way to go about implementing a system such as this one? Is it actually okay to just use sleep_on()? (If so, is there a version of sleep_on() that is interruptible?)
The interruptible version of sleep_on() is interruptible_sleep_on(). Note that the sleep_on() family of functions was removed in kernel 3.15.
As for wait_event_interruptible(), your requirement "I want it to always sleep when the ioctl is invoked" is uncommon for it. You may use a flag, but the flag should be per-process (or per schedule slot). Alternatively, you can change the count to wait for so that it is at least current_count + 1.
In such an uncommon scenario, instead of the wait_event_interruptible() macro you can use the lower-level primitives it is built from (for example, the manual prepare_to_wait()/schedule()/finish_wait() sequence) and arrange them the way you need. In general, any kind of waiting can be achieved that way.

Slow down a for-in loop

I have a function that loops through a list of items by sending them to a server and grabbing the response. The problem I'm having is the loop is going faster than the server can handle. I need to figure out a way to slow the loop down without freezing the application. Is there a way to delay the loop from moving to the next item for a brief moment? In other languages, I'd use something like sleep(interval).
Don't slow the process down. Add the network calls to an operation queue with a limited number of concurrent operations. You may need to rewrite your network code as an NSOperation subclass but that's fairly straightforward. You can see some examples in this tutorial.
There is a built-in limit to the number of simultaneous network connections that can be made anyway, but it sounds like your server's limit is lower than that, or that you're saturating the network connections and your later calls are timing out before they've been able to start.
Instead of a sleep interval it sounds like you want a completion block that calls the same code again until the list is empty. So once it finishes the request, it goes onto the next one.
Also, I don't think you should be trying to sleep, since that will block the main thread, which results in a poor user experience.
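The "limited number of concurrent operations" idea can be sketched in C++ as follows. This is not the NSOperationQueue API: `sendAll` and `SendFn` are hypothetical, and `send` stands in for your blocking network call. A fixed pool of workers pulls the next item from a shared counter, so at most `maxConcurrent` requests are ever in flight. The call blocks until every item is done, so in an app it belongs on a background thread, never the main one.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the real request: sends one item, returns the response.
using SendFn = std::function<std::string(const std::string&)>;

// Sends every item but never runs more than `maxConcurrent` requests at once,
// similar in spirit to an operation queue with a concurrency limit.
// Responses are returned in the same order as `items`.
std::vector<std::string> sendAll(const std::vector<std::string>& items,
                                 const SendFn& send, std::size_t maxConcurrent) {
    std::vector<std::string> responses(items.size());
    std::atomic<std::size_t> next{0};
    auto worker = [&] {
        // Each worker claims the next unsent index until the list is exhausted.
        for (std::size_t i = next++; i < items.size(); i = next++) {
            responses[i] = send(items[i]);  // each index is written by one worker only
        }
    };
    const std::size_t n = std::max<std::size_t>(1, std::min(maxConcurrent, items.size()));
    std::vector<std::thread> pool;
    for (std::size_t k = 0; k < n; ++k) pool.emplace_back(worker);
    for (auto& t : pool) t.join();
    return responses;
}
```

With `maxConcurrent` set to 1 this becomes the "send the next item when the previous one finishes" approach, with no sleep involved.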

Continuously running code in Win32 app

I have a working GUI and now need to add some code that will run continuously and update the GUI with data. Where should this code go? I know that it should not go into the message loop, because it might block incoming messages to the window, but I'm confused about where in my window procedure this code could run.
You have a choice: you can use a thread and post messages back to the main thread to update the GUI (or update the GUI directly, but don't try this if you used MFC), or you can use a timer that will post you messages periodically, you then simply implement a handler for the timer and do whatever you need to there.
The thread is best for a complicated, slow process that might block. If the process of getting data is quick (and/or can be set to timeout on error) then a timer is simpler.
Have you looked into threading at all?
Typically, you would create one thread that performs the background task (in this case, reading the voltage data) and stores the result in a shared buffer. The GUI thread simply reads that buffer every so often (on redraw, every 30 seconds, when the user clicks refresh, etc.) and displays the data.
Your background thread runs on its own schedule, getting CPU time from the OS, and is not bound to the UI or message pump. It can use some type of timer to monitor the data source and read things in as necessary.
Now, since the threads run separately and may run at the same time, you need to make them aware of one another. This can be done with locks (look into mutexes). For example:
The monitor reads the current voltage and stores it in its internal buffer.
The background/monitor thread locks the shared buffer holding the latest sample.
The monitor copies the internal buffer to the shared one.
The monitor unlocks the buffer.
Simultaneously, but separately, the UI thread:
Gets a redraw call.
Waits for the buffer to be unlocked, then reads the value.
Draws the UI with the buffer value.
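The monitor/UI steps above could look roughly like this in portable C++, using std::thread and std::mutex in place of CreateThread and a Win32 mutex (`VoltageMonitor` and `readSensor` are hypothetical names). The worker reads outside the lock, then holds it only long enough to copy the sample into the shared buffer; the UI thread takes the same lock in `latest()`. `start()` and `stop()` model the lifetime rule: start the worker after the UI exists, and stop and join it before the UI is torn down.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Portable sketch of a background monitor thread plus a mutex-protected
// shared buffer. readSensor is a hypothetical data source.
class VoltageMonitor {
public:
    VoltageMonitor(std::function<double()> readSensor, std::chrono::milliseconds period)
        : read_(std::move(readSensor)), period_(period) {}
    ~VoltageMonitor() { stop(); }

    // Start after the UI has been created (call once).
    void start() {
        running_ = true;
        worker_ = std::thread([this] { loop(); });
    }
    // Stop and join before the UI is shut down; safe to call more than once.
    void stop() {
        running_ = false;
        if (worker_.joinable()) worker_.join();
    }

    // Called from the UI thread, e.g. on redraw or when the user clicks refresh.
    double latest() {
        std::lock_guard<std::mutex> lock(m_);  // waits while the monitor is copying
        return shared_;
    }

private:
    void loop() {
        while (running_) {
            const double sample = read_();  // read into a local ("internal") buffer
            {
                std::lock_guard<std::mutex> lock(m_);  // lock the shared buffer
                shared_ = sample;                      // copy the latest sample
            }                                          // unlock
            std::this_thread::sleep_for(period_);
        }
    }

    std::function<double()> read_;
    std::chrono::milliseconds period_;
    std::atomic<bool> running_{false};
    std::mutex m_;
    double shared_ = 0.0;
    std::thread worker_;
};
```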
Setting up a new thread and using it is pretty simple in most languages used for Windows GUIs. C/C++ and C# both have very simple APIs for creating a new thread and having it work on some task; you usually just need to provide a function for the thread to run. See the MSDN docs on CreateThread for a C example.
The concept of threading and locking is for the most part language-agnostic, and similar in most C-inspired languages. You'll need to have your main (in this case, probably UI) thread control the lifetime of the worker: start the worker after the UI is created, and stop it before the UI is shut down.
This approach has a little overhead up front, which is most noticeable if your data fetch is very simple. The payoff is that if your data source changes (a network request, some blocking data source, reading over actual wires from a physical sensor, etc.), you only need to change the monitor thread, and the UI doesn't need to know.

How is wait_for_completion different from wakeup_interruptible

How is wait_for_completion different from wakeup_interruptible?
Actually, the question is: how are completions different from wait queues?
They look like the same concept to me.
The completion structure internally uses a wait queue and a lock.
It was introduced to address a very common scenario in which threads wait for some event, typically another thread finishing a piece of work. Once that event happens, complete() lets only one of the waiting threads start running (complete_all() releases all of them).
The key point is that kernel developers don't have to implement and maintain the wait queue themselves, which makes their lives easier.
Adding to Harman's answer, I would also say that those two functions are called in different contexts: wake_up_interruptible() wakes up all threads waiting on a wait queue, whereas wait_for_completion() waits until a specific task completes. Those are two different things to me.

Clarification on Threads and Run Loops In Cocoa

I'm trying to learn about threading and I'm thoroughly confused. I'm sure all the answers are in the Apple docs, but I found them really hard to break down and digest. Maybe somebody could clear a thing or two up for me.
1) performSelectorOnMainThread
Does the above simply register an event in the main run loop or is it somehow a new thread even though the method says "mainThread"? If the purpose of threads is to relieve processing on the main thread how does this help?
2) RunLoops
Is it true that if I want to create a completely separate thread I use "detachNewThreadSelector"? Does calling start on this initiate a default run loop for the thread that has been created? If so, where do run loops come into it?
3) And finally, I've seen examples using NSOperationQueue. Is it true to say that if you use performSelectorOnMainThread the threads are in a queue anyway, so NSOperation is not needed?
4) Should I forget about all of this and just use the Grand Central Dispatch instead?
Run Loops
You can think of a run loop as an event-processing for-loop associated with a thread. The system provides one for every thread, but it is only run automatically for the main thread.
Note that running a run loop and executing a thread are two distinct concepts. You can execute a thread without running a run loop when you're just performing long calculations and don't have to respond to various events.
If you want to respond to various events from a secondary thread, retrieve the run loop associated with the thread by calling
[NSRunLoop currentRunLoop]
and run it. The events a run loop can handle are called input sources, and you can add input sources to a run loop.
PerformSelector
performSelectorOnMainThread: adds the target and the selector to a special input source called performSelector input source. The run loop of the main thread dequeues that input source and handles the method call one by one, as part of its event processing loop.
NSOperation/NSOperationQueue
I think of NSOperation as a way to explicitly declare the various tasks inside an app that take some time but can run mostly independently. It's also easier than detaching new threads and maintaining everything yourself. An NSOperationQueue automatically maintains a set of background threads, which it reuses, and runs NSOperations in parallel.
So yes, if you just need to queue up operations in the main thread, you can do away with NSOperationQueue and just use performSelectorOnMainThread:, but that's not the main point of NSOperation.
GCD
GCD is a new infrastructure introduced in Snow Leopard. NSOperationQueue is now implemented on top of it.
It works at the level of functions / blocks. Feeding blocks to dispatch_async is extremely handy, but for a larger chunk of operations I prefer to use NSOperation, especially when that chunk is used from various places in an app.
Summary
You need to read the official Apple documentation! There are many informative blog posts on this topic, too.
1) performSelectorOnMainThread
Does the above simply register an event in the main run loop …
You're asking about implementation details. Don't worry about how it works.
What it does is perform that selector on the main thread.
… or is it somehow a new thread even though the method says "mainThread"?
No.
If the purpose of threads is to relieve processing on the main thread how does this help?
It helps you when you need to do something on the main thread. A common example is updating your UI, which you should always do on the main thread.
There are other methods for doing things on new secondary threads, although NSOperationQueue and GCD are generally easier ways to do it.
2) RunLoops
Is it true that if I want to create a completely separate thread I use "detachNewThreadSelector"?
That has nothing to do with run loops.
Yes, that is one way to start a new thread.
Does calling start on this initiate a default run loop for the thread that has been created?
No.
I don't know what you're “calling start on” here, anyway. detachNewThreadSelector: doesn't return anything, and it starts the thread immediately. I think you mixed this up with NSOperations (which you also don't start yourself—that's the queue's job).
If so where do run loops come into it?
Run loops just exist, one per thread. On the implementation side, they're probably lazily created upon demand.
3) And finally, I've seen examples using NSOperationQueue. Is it true to say that if you use performSelectorOnMainThread the threads are in a queue anyway, so NSOperation is not needed?
These two things are unrelated.
performSelectorOnMainThread: does exactly that: Performs the selector on the main thread.
NSOperations run on secondary threads, one per operation.
An operation queue determines the order in which the operations (and their threads) are started.
Threads themselves are not queued (except maybe by the scheduler, but that's part of the kernel, not your application). The operations are queued, and they are started in that order. Once started, their threads run in parallel.
4) Should I forget about all of this and just use the Grand Central Dispatch instead?
GCD is more or less the same set of concepts as operation queues. You won't understand one as long as you don't understand the other.
So what are all these things good for?
Run loops
Within a thread, a way to schedule things to happen. Some may be scheduled at a specific date (timers), others simply “whenever you get around to it” (sources). Most of these are zero-cost when idle, only consuming any CPU time when the thing happens (timer fires or source is signaled), which makes run loops a very efficient way to have several things going on at once without any threads.
You generally don't handle a run loop yourself when you create a scheduled timer; the timer adds itself to the run loop for you.
Threads
Threads enable multiple things to happen at the exact same time on different processors. Thing 1 can happen on thread A (on processor 1) while thing 2 happens on thread B (on processor 0).
This can be a problem. Multithreaded programming is a dance, and when two threads try to step in the same place, pain ensues. This is called contention, and most discussion of threaded programming is on the topic of how to avoid it.
NSOperationQueue and GCD
You have a thing you need done. That's an operation. You can't have it done on the main thread, or you'd simply send a message like normal; you need to run it in the background, on a secondary thread.
To achieve this, express it as either an NSOperation object (you create a subclass of NSOperation and instantiate it) or a block (or both), then add it to either an NSOperationQueue (NSOperations, including NSBlockOperation) or a dispatch queue (bare block).
GCD can be used to make things happen on the main thread, as well; you can create serial queues and add blocks to them. A serial queue, as its name suggests, will run exactly one block at a time, rather than running a bunch of them in parallel.
So what should I do?
I would not recommend creating threads directly. Use NSOperationQueue or GCD instead; they force you into better thinking habits that will reduce the risk of your threaded code inducing headaches.
For things that run periodically, not fitting into the “thing I need done” model of NSOperations and GCD blocks, consider just using the run loop on the main thread. Chances are, you don't need to put it on a thread after all. A rendering loop in a 3D game, for example, can be a simple timer.

Resources