How can I determine which Allegro library functions are threadsafe? - thread-safety

I'm trying to learn how to use the Allegro 5 game programming library. I'm wondering how I can find out which library functions are threadsafe. I understand how to use mutexes to ensure safety in my own code, but the amount that I may need to use them when calling Allegro's own functions is unclear to me.
The Allegro FAQ says it is threadsafe, and links to this thread. However, that thread isn't very helpful because the "really good article" linked in the first comment is a dead link, and the conclusion of the commentors seems to be "Allegro is mostly threadsafe", with no indication about which parts may not be.

ALLEGRO relies internally on OpenGL (by default) for it's graphics routines, so they are not guaranteed to be thread safe. You can assume the same to be true for audio. All other functions though are indeed thread-safe:
Synchronization routines (mutex,cond...)
Timers
Filesystem and IO
What I do in my programs, is make all graphics calls from a single thread, and all audio calls from a single thread (not necessarily the same). All other threads use ALLEGRO sync routines to synchronize with graphics and audio.
NOTE:
Just to clarify, I meant that you shouldn't DRAW from two threads simultaneously. It's alright to create, copy etc. DIFFERENT bitmaps from different threads simultaneously, so long as you don't draw them on the screen.
NOTE 2:
I feel like this is obvious, but you shouldn't write to any object in any programming language from two different threads simultaneously.

Related

Golang main difference from CSP-Language by Hoare

Look at this statement taken from The examples from Tony Hoare's seminal 1978 paper:
Go's design was strongly influenced by Hoare's paper. Although Go differs significantly from the example language used in the paper, the examples still translate rather easily. The biggest difference apart from syntax is that Go models the conduits of concurrent communication explicitly as channels, while the processes of Hoare's language send messages directly to each other, similar to Erlang. Hoare hints at this possibility in section 7.3, but with the limitation that "each port is connected to exactly one other port in another process", in which case it would be a mostly syntactic difference.
I'm confused.
Processes in Hoare's language communicate directly to each other. Go routines communicate also directly to each other but using channels.
So what impact has the limitation in golang. What is the real difference?
The answer requires a fuller understanding of Hoare's work on CSP. The progression of his work can be summarised in three stages:
based on Dijkstra's semaphore's, Hoare developed monitors. These are as used in Java, except Java's implementation contains a mistake (see Welch's article Wot No Chickens). It's unfortunate that Java ignored Hoare's later work.
CSP grew out of this. Initially, CSP required direct exchange from process A to process B. This rendezvous approach is used by Ada and Erlang.
CSP was completed by 1985, when his Book was first published. This final version of CSP includes channels as used in Go. Along with Hoare's team at Oxford, David May concurrently developed Occam, a language deliberately intended to blend CSP into a practical programming language. CSP and Occam influenced each other (for example in The Laws of Occam Programming). For years, Occam was only available on the Transputer processor, which had its architecture tailored to suit CSP. More recently, Occam has developed to target other processors and has also absorbed Pi calculus, along with other general synchronisation primitives.
So, to answer the original question, it is probably helpful to compare Go with both CSP and Occam.
Channels: CSP, Go and Occam all have the same semantics for channels. In addition, Go makes it easy to add buffering into channels (Occam does not).
Choices: CSP defines both the internal and external choice. However, both Go and Occam have a single kind of selection: select in Go and ALT in Occam. The fact that there are two kinds of CSP choice proved to be less important in practical languages.
Occam's ALT allows condition guards, but Go's select does not (there is a workaround: channel aliases can be set to nil to imitate the same behaviour).
Mobility: Go allows channel ends to be sent (along with other data) via channels. This creates a dynamically-changing topology and goes beyond what is possible in CSP, but Milner's Pi calculus was developed (out of his CCS) to describe such networks.
Processes: A goroutine is a forked process; it terminates when it wants to and it doesn't have a parent. This is less like CSP / Occam, in which processes are compositional.
An example will help here: firstly Occam (n.b. indentation matters)
SEQ
PAR
processA()
processB()
processC()
and secondly Go
go processA()
go processB()
processC()
In the Occam case, processC doesn't start until both processA and processB have terminated. In Go, processA and processB fork very quickly, then processC runs straightaway.
Shared data: CSP is not really concerned with data directly. But it is interesting to note there is an important difference between Go and Occam concerning shared data. When multiple goroutines share a common set of data variables, race conditions are possible; Go's excellent race detector helps to eliminate problems. But Occam takes a different stance: shared mutable data is prevented at compilation time.
Aliases: related to the above, Go allows many pointers to refer to each data item. Such aliases are disallowed in Occam, so reducing the effort needed to detect race conditions.
The latter two points are less about Hoare's CSP and more about May's Occam. But they are relevant because they directly concern safe concurrent coding.
That's exactly the point: in the example language used in Hoare's initial paper (and also in Erlang), process A talks directly to process B, while in Go, goroutine A talks to channel C and goroutine B listens to channel C. I.e. in Go the channels are explicit while in Hoare's language and Erlang, they are implicit.
See this article for more info.
Recently, I've been working quite intensively with Go's channels, and have been working with concurrency and parallelism for many years, although I could never profess to know everything about this.
I think what you're asking is what's the subtle difference between sending a message to a channel and sending directly to each other? If I understand you, the quick answer is simple.
Sending to a Channel give the opportunity for parallelism / concurrency on both sides of the channel. Beautiful, and scalable.
We live in a concurrent world. Sending a long continuous stream of messages from A to B (asynchronously) means that B will need to process the messages at pretty much the same pace as A sends them, unless more than one instance of B has the opportunity to process a message taken from the channel, hence sharing the workload.
The good thing about channels is that that you can have a number of producer/receiver go-routines which are able to push messages to the queue, or consume from the queue and process it accordingly.
If you think linearly, like a single-core CPU, concurrency is basically like having a million jobs to do. Knowing a single-core CPU can only do one thing at a time, and yet also see that it gives the illusion that lots of things are happening at the same time. When executing some code, the time the OS needs to wait a while for something to come back from the network, disk, keyboard, mouse, etc, or even some process which sleeps for a while, give the OS the opportunity to do something else in the meantime. This all happens extremely quickly, creating the illusion of parallelism.
Parallelism on the other hand is different in that the job can be run on a completely different CPU independent of what's going with other CPUs, and therefore doesn't run under the same constraints as the other CPU (although most OS's do a pretty good job at ensuring workloads are evenly distributed to run across all of it's CPUs - with perhaps the exception of CPU-hungry, uncooperative non-os-yielding-code, but even then the OS tames them.
The point is, having multi-core CPUs means more parallelism and more concurrency can occur.
Imagine a single queue at a bank which fans-out to a number of tellers who can help you. If no customers are being served by any teller, one teller elects to handle the next customer and becomes busy, until they all become busy. Whenever a customer walks away from a teller, that teller is able to handle the next customer in the queue.

Why std::thread doesn't have try_join_for() and interrupt() methods

Can anybody explain me why std::thread is different from boost::thread by the following:
It doesn't have try_join_for / try_join_until methods
It has no interrupt method
There are some explanations from https://isocpp.org/wiki/faq/cpp11-library-concurrency:
There is no way to request a thread to terminate (i.e. request that it exit as a soon as possible and as gracefully as possible) or to force a thread to terminate (i.e. kill it). We are left with the options of
designing our own cooperative “interruption mechanism” (with a piece of shared data that a caller thread can set for a called thread to check, and quickly and gracefully exit when it is set),
“going native” by using thread::native_handle() to gain access to the operating system’s notion of a thread,
kill the process (std::quick_exit()),
kill the program (std::terminate()).
This was all the committee could agree upon. In particular, representatives from POSIX were vehemently against any form of “thread cancellation” however much C++’s model of resources rely on destructors. There is no perfect solution for every system and every possible application.
But maybe there are more complete explanations? Such methods (try_join and interrupt) sometimes are very useful.
Well, you answered your question by yourself. The important difference between boost and the c++11 standard is, that c++11 is a standard. So basically everyone had to agree with the methods and functions related to threads. But as you told us already, "Such methods (try_join and interrupt) sometimes are very useful"
So would it have been reasonable to impose that as standard to every compiler? Maybe, but if you really need it you could as well simply use the boost equivalent until it does maybe some day get into the standard.
#Howard-Hinnant suggested: There were voices on the committee adamantly claiming that cooperative thread cancellation could not be portably implemented in C++. At the time boost::thread did not have interrupt. So Anthony Williams implemented interrupt for boost largely as was currently proposed as a proof of concept that it could be portably implemented. This proof of concept was largely ignored by the committee, mainly because so much time had already been spent on the subject that we were at risk of sinking the entire standard because of this one issue.

Pixel drawing disadvantages?

Is there any disadvantage to drawing all one's graphics pixel-by-pixel, as compared to using pre-defined rectangle- or circle-drawing functions? Self-defining such functions is fine. Mainly, I'm just worried about execution speed.
Also, what about sprite sheets? How do they compare to all of the above?
There can be a number of issues:
The OS or library-supplied routines may use hardware acceleration which will be faster than what you're likely to write on your own.
The OS or library-supplied routines have likely been tested better than code you'll write from scratch. (It's already being used by millions of people, so if there was a bug, it probably would have been hit by now!)
You'll have to update your routines when hardware changes, whereas an OS or library can abstract that away. (They have to change theirs as well, but they keep the interface the same so your program keeps running.)
You're unlikely to write routines that are general and faster than those supplied by popular OSes and libraries since they're written by people with a lot of experience, highly optimized, and often written in conjunction with the people who designed the hardware.

What is the reason why high level abstractions that use lock free programming deep down aren't popular?

From what I gathered on the lock free programming, it is incredibly hard to do right... and I agree.
Just thinking about some problems makes my head hurt. But what I wonder is, why isn't
there a widespread use of high-level wrappers around (e.g. lock free queue and similar stuff)?
For example boost has no lock free library, although one was suggested as far as I know.
I mean I guess that there is a lot of applications where you cant avoid the fact that the critical
section is the big part of the load. So what are the reasons? Is it...
Patents - I heard that some stuff related to lock-free programming is patented.
Performance.
Google, and Microsoft have internal libraries like that but none of them are public...
Something else?
So my question is: Why are high level abstractions that use lock free programming deep down not very
popular, while at the same time "regular" multi-threaded programming is "in"?
EDIT: boost got a lockfree lib :)
There are few people who are familiar enough with the field to implement easy-to-use lock-free libraries. Of those few, even fewer publish work for free and of those almost none do the vital additional work to make the library useable - e.g. publish full API docs, etc. They tend to just release a zip file with code in, which is almost useless. Then of course you also need to find a library which is written in the language you want to use, compiles on the platform you're using and finally, word of the library has to get out, so people know it exists.
Patents are an issue, in that they limit what can be offered. There is, for example, to my knowledge no unpatented singly-linked list. All the skip list stuff is heavily patented, too.
A hero in this field is Cliff Click, who came up with a lock-free hash, which he has more-or-less placed in the public domain.
You can find my lock-free library here;
http://www.liblfds.org
Another is Samy Bahra's Concurrency Kit;
http://www.concurrencykit.org
FYI Microsoft's .Net framework gained some lock free classes in .Net 4.0. Namely container classes in the System.Collections.Concurrent namespace, which are:
ConcurrentDictionary
ConcurrentQueue
ConcurrentStack
I've looked into their implementation and they are relatively fiddly/complex under the hood therefore they do represent a significant amount of effort in designing and testing (threading issues are of course notoriously difficult to test to a high standard).
You can take a look at libcds C++ library. It is collection of lock-free containers (stacks, queues, sets and maps) and safe memory reclamation algorithms.
IMHO regarding C++ (I'm not advanced in other languages). New C++ standard has just been released and the compiler developers need a time to implement its requirements. Today, all compilers do not support C++11 memory model entirely since it requires significant changes in compiler’s optimization rules. Recently, Microsoft announces support of the atomic operations that is the base of lock-free programming in VC++ 11 Developer Preview. It is good news for us. As I know, GCC is going to support it in 4.8 (or above).
Second problem is patents. Many interesting lock-free container algorithms are patented that is a barrier to include them to vendor’s libraries.
Third, the main part of lock-free containers is garbage collecting (safe memory reclamation). C++ is free from any GC (fortunately). There are a few GC algos (Hazard Pointer, Pass-the-Buck, epoch-based and so on) but most of them are patented too.
Fourth, not enough instruments to prove the correctness of memory fences applied in your lock-free implementation. Now I known only one – relacy(http://www.1024cores.net/home/relacy-race-detector).
I think after 2-3 years we’ll see many production-ready multiplatform C++ libraries of lock-free containers and algorithms. These libraries are being developed by vendors and enthusiasts.
However, in my opinion, our future is the hardware transaction memory (HTM). Today AMD, Sun (sorry, Oracle), Intel (?) are investigating HTM with very interesting results. Let’s wait.
There is at least one "lock free” framework that is somewhat popular: Erlang.
One major problem is that unless one uses an excessive number of memory barriers, it's hard to be certain that one has enough; if one does use an excessive number of memory barriers, performance is likely to be inferior to what one would have gotten using locks.
The biggest problem with locks is not performance, but robustness. If a thread gets waylaid while it holds a lock, the system dies. By contrast, if a thread which is accessing a lock-free data structure gets waylaid, it won't affect other threads' use thereof. In some situations, a lock-free data structure may be preferable to one using locks, even if performance is inferior, because one must protect the system from being brought down by a malfunctioning thread (for example, even if one was prepared to kill off a thread which hit a StackOverflowException without taking down the process, how would one protect against a thread putting a lot of stuff on its stack before calling a method to access a lock-protected data structure that the method, such that the lock-guarded method hit a stack overflow?) If one uses lock-free data structures, such risks aren't a problem.

Distributed array in MPI for parallel numerics

in many distributed computing applications, you maintain a distributed array of objects. Each process manages a set of objects that it may read and write exclusively and furthermore a set of objects that may only read (the content of which is authored by and frequently recerived from other processes).
This is very basic and is likely to have been done a zillion times until times until now - for example, with MPI. Hence I suppose there is something like an open source extension for MPI, which provides the basic capabilities of a distributed array for computing.
Ideally, it would be written in C(++) and mimic the official MPI standard interface style. Does anybody know anything like that? Thank you.
From what I gather from your question, you're looking for a mechanism for allowing a global view (read-only) of the problem space, but each process has ownership (read-write) of a segment of the data.
MPI is simply an API specification for inter-process communication for parallel applications and any implementation of it will work at a level lower than what you are looking for.
It is quite common in HPC applications to perform data decomposition in a way that you mentioned, with MPI used to synchronise shared data to other processes. However each application have different sharing patterns and requirements (some may wish to only exchange halo regions with neighbouring nodes, and perhaps using non-blocking calls to overlap communication other computation) so as to improve performance by making use of knowledge of the problem domain.
The thing is, using MPI to sync data across processes is simple but implementing a layer above it to handle general purpose distribute array synchronisation that is easy to use yet flexible enough to handle different use cases can be rather trickly.
Apologies for taking so long to get to the point, but to answer your question, AFAIK there isn't be an extension to MPI or a library that can efficiently handle all use cases while still being easier to use than simply using MPI. However, it is possible to to work above the level of MPI which maintaining distributed data. For example:
Use the PGAS model to work with your data. You can then use libraries such as Global Arrays (interfaces for C, C++, Fortran, Python) or languages that support PGAS such as UPC or Co-Array Fortran (soon to be included into the Fortran standards). There are also languages designed specifically for this form of parallelism, i,e. Fortress, Chapel, X10
Roll your own. For example, I've worked on a library that uses MPI to do all the dirty work but hides the complexity by providing creating custom data types for the application domain, and exposing APIs such as:
X_Create(MODE, t_X) : instantiate the array, called by all processes with the MODE indicating if the current process will require READ-WRITE or READ-ONLY access
X_Sync_start(t_X) : non-blocking call to initiate synchronisation in the background.
X_Sync_complete(t_X) : data is required. Block if synchronisation has not completed.
... and other calls to delete data as well as perform domain specific tasks that may require MPI calls.
To be honest, in most cases it is often simpler to stick with basic MPI or OpenMP, or if one exists, using a parallel solver written for the application domain. This of course depends on your requirements.
For dense arrays, see Global Arrays and Elemental (Google will find them for you).
For sparse arrays, see PETSc.
I know this is a really short answer, but there is too much documentation of these elsewhere to bother repeating it.

Resources