How does one efficiently implement file region locking using Grand Central Dispatch? - parallel-processing

For my project, I am reading from and writing to a file from multiple threads, so I need to implement file locking. I have tried fcntl(), but it seems that function only locks between processes, not between threads. As such, I am looking for another solution. The solution I came up with (which is probably not the best) is to have a byte in each record in my file indicating whether the record is locked, and then use a busy loop to read and check the byte.
So, I have two questions. First, what is the most efficient way to implement file region locking? Second, if I go with the busy loop approach, how can I optimize that with Grand Central Dispatch? I was thinking that I could make all of the busy loops occur in blocks sent to dispatch_sync(). But I don't know whether or not that would even work efficiently.
Thanks.

How about a dispatch semaphore? You can use dispatch semaphores to get exclusive access to resources. For instance, create one semaphore per file region:
for (int i = 0; i < regions; ++i)
    sema_[i] = dispatch_semaphore_create(1);
Then guard access to a region with wait and signal:
dispatch_semaphore_wait(sema_[region], DISPATCH_TIME_FOREVER);
/* access the region */
dispatch_semaphore_signal(sema_[region]);

Related

Why does Go use channels to send and receive data between goroutines instead of using normal variables?

I could not find anything about this question except this explanation on Wikipedia: https://en.wikipedia.org/wiki/Channel_(programming). But I'm not satisfied with the explanation.
What problem do channels solve?
Why don't we just use normal variables to send and receive data instead?
If by "normal variables" you mean, for example, a slice that multiple goroutines write to and read from, then this is a guaranteed way to get data races (and you don't want data races). You can avoid the concurrent access by using some kind of synchronization, such as a sync.Mutex or sync.RWMutex.
At this point, you have:
- reinvented channels (which are basically that: a slice under a mutex)
- spent more time than you needed to, and your solution is still inferior (there's no syntax support, you can't use your slices in select, etc.)
Channels solve the problem of concurrent reads and writes: they prevent the situation where one goroutine reads a variable while another writes to it.
Channels can also be buffered, so you can write several values before a send blocks.
Of course, you don't have to use channels. There are other ways to share data between goroutines. For example, you can use atomic operations when reading or writing a shared variable, or take a mutex whenever you access it.

What is the use-case for TryEnterCriticalSection?

I've been using Windows CRITICAL_SECTION since the 1990s and I've been aware of the TryEnterCriticalSection function since it first appeared. I understand that it's supposed to help me avoid a context switch and all that.
But it just occurred to me that I have never used it. Not once.
Nor have I ever felt I needed to use it. In fact, I can't think of a situation in which I would.
Generally when I need to get an exclusive lock on something, I need that lock and I need it now. I can't put it off until later. I certainly can't just say, "oh well, I won't update that data after all". So I need EnterCriticalSection, not TryEnterCriticalSection.
So what exactly is the use case for TryEnterCriticalSection?
I've Googled this, of course. I've found plenty of quick descriptions of how to use it, but almost no real-world examples of why. I did find this example from Intel that, frankly, doesn't help much:
CRITICAL_SECTION cs;

void threadfoo()
{
    while (TryEnterCriticalSection(&cs) == FALSE)
    {
        // some useful work
    }
    // Critical Section of Code
    LeaveCriticalSection(&cs);
    // other work
}
What exactly is a scenario in which I can do "some useful work" while I'm waiting for my lock? I'd love to avoid thread-contention but in my code, by the time I need the critical section, I've already been forced to do all that "useful work" in order to get the values that I'm updating in shared data (for which I need the critical section in the first place).
Does anyone have a real-world example?
As an example you might have multiple threads that each produce a high volume of messages (events of some sort) that all need to go on a shared queue.
Since there's going to be frequent contention on the shared queue's lock, each thread can keep a local queue; whenever its TryEnterCriticalSection call succeeds, it copies everything from its local queue to the shared one and releases the CS again.
In C++11 there is std::lock, which employs a deadlock-avoidance algorithm.
In C++17 this has been elaborated into the std::scoped_lock class.
The algorithm tries to lock the mutexes in one order, and then in another, until it succeeds. It requires try_lock to implement this approach.
Having a try_lock method is part of the Lockable named requirement in C++, whereas mutexes with only lock and unlock are merely BasicLockable.
So if you build a C++ mutex on top of CRITICAL_SECTION and you want it to satisfy Lockable, or you want to implement deadlock avoidance directly on CRITICAL_SECTION, you'll need TryEnterCriticalSection.
Additionally, you can implement a timed mutex on top of TryEnterCriticalSection: do a few iterations of TryEnterCriticalSection, then call Sleep with an increasing delay, until TryEnterCriticalSection succeeds or the deadline has expired. It is not a very good idea, though; real timed mutexes based on user-space Windows synchronization objects are implemented on SleepConditionVariableSRW, SleepConditionVariableCS, or WaitOnAddress.
Because Windows critical sections are recursive, TryEnterCriticalSection also allows a thread to check whether it already owns a CS without any risk of stalling.
Another case: if you have a thread that occasionally needs to perform some locked work but usually does something else, you could use TryEnterCriticalSection and only perform the locked work if you actually got the lock.

Is it safe to read concurrently from a pointer?

I'm working on an image uploader and want to concurrently resize the image to different sizes. Once I've read the file as a []byte, I'm passing a reference to that buffer to my resize functions, which are run concurrently.
Is this safe? I'm thinking that passing a reference to a large file for the resize functions to read will save me memory, and the concurrency will save me time.
Thank you!
Read-only data is usually fine for concurrent access, but you have to be very careful when passing references (pointers, slices, maps and so on) around. Today maybe no one is modifying them while you're also reading, but tomorrow someone may be.
If this is a throwaway script, you'll be fine. But if it's part of a larger program, I'd recommend future-proofing your code by judiciously protecting concurrent access. In your case something like a reader-writer lock could be a good match - all the readers will be able to acquire the lock concurrently, so the performance impact is negligible. And then if you do decide in the future this data could be modified, you already have the proper groundwork laid down w.r.t. safety.
Don't forget to run your code with the race detector enabled.

converting a mutex solution into a channel solution

I am writing a Go program that has to perform multiple download requests at a time, utilizing goroutines running in parallel (using GOMAXPROCS). In addition, there is some state to keep: which components have been downloaded and which are left to download. The mutex solution would be to lock the structure that tracks which components have been successfully downloaded. I have read that when keeping state, mutexes are the best option.
However, I am wondering what would be a solution utilizing channels (passing ownership instead of providing exclusive access to state) instead of mutexes, or are mutexes the best option?
P.S.
So far I have thought of passing the global structure keeping state between goroutines, all utilizing one channel (a read-write channel). A goroutine attempts to read the structure from the channel and then writes it back when it's done. The problem I found with this is that when the last running goroutine [assume all others have finished and stopped running] gives up its possession of the structure by writing to the channel, it will deadlock, since there are no receivers. In addition, this is still attempting to use channels as mutexes [attempting to provide exclusive access].

Coding Style: lock/unlock internal or external?

Another possibly inane style question:
How should concurrency be locked? Should the executor or the caller be responsible for locking?
e.g. in no particular language...
Caller::callAnotherThread() {
    _executor.method();
}

Executor::method() {
    _lock();
    doSomething();
    _unlock();
}

OR

Caller::callAnotherThread() {
    _executor.lock();
    _executor.method();
    _executor.unlock();
}

Executor::method() {
    doSomething();
}
I know little about threading and locking, so I want to make sure the code is robust. The second method allows thread-unsafe calls: you could technically call _executor.method() without taking any kind of lock.
Help?
Thanks,
The callee, not the caller, should do the locking. The callee is the only one who knows what needs to be synchronized and the only one who can ensure that it is. If you leave locking up to the callers, you do three bad things:
You increase the burden on users of your function/class, increasing design viscosity.
You make it possible for callers to update shared state without taking the lock.
You introduce the possibility of deadlocks if different functions take multiple locks in different order.
If you use locks internally, document it; otherwise your code can become a bottleneck under parallel execution, and users will have a hard time discovering why.
That said, external locking offers advantages if you need to perform several interrelated granular operations at once, or to work with a reference to an internal structure: you can hold the lock for as long as your set of work needs to be safe from other threads.
An example: A container that manages a list of items might want to provide an api to get a mutable reference to one item. Without external locking, as soon as the function call finishes, another thread could potentially lock and mutate data. A plausible solution is to return a copy of the one item, but this is inefficient.
That being said, for some cases, internal locking can have a cleaner api, provided you can be sure that you won't want to preserve a lock longer than one function call.