Is this a proper thread-safe Random wrapper?

I am fairly inexperienced with threading and concurrency; to remedy that, I am currently working, for fun, on implementing a random-search algorithm in F#. I wrote a wrapper around the System.Random class, following ideas from existing C# examples, but I am not sure how I would even begin to unit test it for faulty behavior. I'd like to hear what more experienced minds have to say, and whether there are obvious flaws in or improvements to my code, whether due to F# syntax or a threading misunderstanding:
open System
open System.Threading

type Probability() =
    static let seedGenerator = new Random()
    let localGenerator =
        new ThreadLocal<Random>(
            fun _ ->
                lock seedGenerator (
                    fun _ ->
                        let seed = seedGenerator.Next()
                        new Random(seed)))
    member this.Draw() =
        localGenerator.Value.NextDouble()
My understanding of what this does: ThreadLocal ensures that for an instance, each thread receives its own instance of a Random, with its own random seed provided by a common, static Random. That way, even if multiple instances of the class are created close in time, they will receive their own seed, avoiding the problem of "duplicate" random sequences. The lock enforces that no two threads will get the same seed.
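For context, here is roughly how I intend to use it from multiple threads (a simplified sketch, not part of the wrapper itself):

let p = Probability()
let samples =
    [| 1 .. 1000 |]
    |> Array.Parallel.map (fun _ -> p.Draw())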
Does this look correct? Are there obvious issues?

I think your approach is pretty reasonable - using ThreadLocal gives you safe access to the Random, and using a master random number generator to provide seeds means that you'll get random values even if you access it from multiple threads at around the same time. It may not be random in the cryptographic sense, but it should be fine for most other applications.
As for testing, this is quite tricky. If Random breaks, it will return 0 all the time, but that's just empirical experience, and it is hard to say for how long you need to keep accessing it unsafely before that happens. The best thing I can suggest is to implement some simple randomness tests (some simple ones are on Wikipedia) and access your type from multiple threads in a loop - though this is still a fairly weak test, as it may not fail every time.
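Something along those lines might look like this (a rough sketch with names of my own; it hammers Draw() from several threads and sanity-checks that each per-thread mean is close to 0.5 for a uniform [0, 1) generator - treat it as a smoke test only, since it can pass even when the code is subtly wrong):

let p = Probability()
let means =
    [| 1 .. 8 |]
    |> Array.Parallel.map (fun _ ->
        Array.init 100000 (fun _ -> p.Draw()) |> Array.average)
printfn "%A" means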
As an aside, you don't need to use a type to encapsulate this behaviour. It can be written as a function too:
open System
open System.Threading

module Probability =
    let Draw =
        // Create master seed generator and thread-local value
        let seedGenerator = new Random()
        let localGenerator = new ThreadLocal<Random>(fun _ ->
            lock seedGenerator (fun _ ->
                let seed = seedGenerator.Next()
                new Random(seed)))
        // Return a function that uses the thread-local random generator
        fun () ->
            localGenerator.Value.NextDouble()

This feels wrong-headed. Why not just use a singleton (only ever create one Random instance, and lock it)?
If real randomness is a concern, then maybe see RNGCryptoServiceProvider, which is thread-safe.
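A minimal sketch of what that might look like from F# (my own wording; RNGCryptoServiceProvider only hands out raw bytes, so you would still need to map them onto whatever range you want):

open System.Security.Cryptography

let rng = new RNGCryptoServiceProvider()

// Fill a buffer with cryptographically strong random bytes;
// GetBytes is safe to call from multiple threads on the same instance.
let drawBytes count =
    let buffer = Array.zeroCreate<byte> count
    rng.GetBytes(buffer)
    buffer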

Unless there is a performance bottleneck, I think something like
let rand = new Random()
let rnext () =
    lock rand (
        fun () ->
            rand.Next())
will be easier to understand, but I think your method should be fine.

If you really want to go with the OO approach, then your code may be fine (I won't say it is fine, as I am not smart enough to understand OO :) ). But in case you want to go the functional way, it would be as simple as something like:
type Probability = { Draw : unit -> int }

let probabilityGenerator (n: int) =
    let rnd = new Random()
    Seq.init n (fun _ -> new Random(rnd.Next()))
    |> Seq.map (fun r -> { Draw = fun () -> r.Next() })
    |> Seq.toList
Here you can use the function probabilityGenerator to generate as many "Probability" objects as you need and then distribute them to various threads, which can work on them in parallel.
The important thing here is that we are not introducing locks etc. in the core type, i.e. Probability; it becomes the responsibility of the consumer to decide how to distribute it across threads.
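For example, one way of handing the generators out to parallel workers might look like this (just a sketch; the async body stands in for whatever work each thread actually does):

let generators = probabilityGenerator 4

let results =
    generators
    |> List.map (fun p -> async { return p.Draw() })
    |> Async.Parallel
    |> Async.RunSynchronously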


DeviceSumModuleF32 is broken

let sumModule = (new DeviceSumModuleF32(GPUModuleTarget.Worker(worker))).Create(2e2 |> int)
let t = worker.Malloc([|1.0f;1.0f;1.0f;1.0f;|])
let q = sumModule.Reduce(t.Ptr,4)
The above code crashes on the last line in roughly 66% of runs. I've tried varying the parameters, but it makes no difference. I think the DeviceSumModuleF32 might be broken.
let sumModule = (new DeviceReduceModule<float32>(GPUModuleTarget.Worker(worker),<# (+) #>)).Create(2e9 |> int)
let t = worker.Malloc([|1.0f;1.0f;1.0f;1.0f;|])
let q = sumModule.Reduce(t.Ptr,4)
The above, using DeviceReduceModule, works perfectly fine though.
See this post.
Edit: I should have written that instead of crashing, it goes into an infinite loop. Sorry about that.
I think this might be a bug in disposing the GPU module. Here is a workaround: switch the CUDA context mode to "threaded", and use the "use" keyword to manage the lifetime of the GPU module (a GPU module is the result of compilation, so it should be kept alive as long as possible to avoid re-compiling at runtime).
// workaround to use threaded cuda context mode
Alea.CUDA.Settings.Instance.Worker.DefaultContextType <- "threaded"
// compile GPU code and keep the module live for a long time
use reduceModule = new DeviceReduceModule<float32>(GPUModuleTarget.Worker(worker),<# (+) #>)
// now get a reducer from reduce module.
// this reduce object includes some temp memories for algorithm
use reducer = reduceModule.Create(maxReduceNumber)
reducer.Reduce(....)

How should I initialize an `Arc<[u8; 65536]>` efficiently?

I'm writing an application creating Arc objects of large arrays:
use std::sync::Arc;
let buffer: Arc<[u8; 65536]> = Arc::new([0u8; 65536]);
After profiling this code, I've found that a memmove is occurring, making this slow. With other calls to Arc::new, the compiler seems smart enough to initialize the stored data without the memmove.
Believe it or not, the above code is faster than:
use std::sync::Arc;
use std::mem;
let buffer: Arc<[u8; 65536]> = Arc::new(unsafe { mem::uninitialized() });
Which is a bit of a surprise.
Insights welcome, I expect this is a compiler issue.
Yeah, right now you have to lean on optimizations, and apparently the compiler isn't doing them in this case. I'm not sure why.
We are also still working on placement new functionality, which will be able to let you explicitly tell the compiler you want to initialize this on the heap directly. See https://github.com/rust-lang/rfcs/pull/809 (and https://github.com/rust-lang/rfcs/pull/1228 which proposes changes that are inconsequential for this question). Once this is implemented, this should work:
let buffer: Arc<_> = box [0u8; 65536];

How to check which index a loop is executing without slowing down the process?

What is the best way to check which index a loop is currently executing without slowing the process down too much?
For example, I want to find all long fancy numbers, and I have a loop like
for (long i = 1; i > 0; i++) {
    // block
}
and I want to see which i is executing, in real time.
Several ways I know of to do this in the block are printing i every time, checking something like if (i % 10000 == 0), or adding a listener.
Which of these ways is the fastest? Or what do you do in similar cases? Is there any way to access the value of i manually?
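For example, the modulo check I have in mind would look roughly like this (the interval is arbitrary):

for (long i = 1; i > 0; i++) {
    if (i % 100000000 == 0) {
        System.out.println("at iteration " + i);
    }
    // block
}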
Most of my recent experience is with Java, so I'd write something like this
import java.util.concurrent.atomic.AtomicLong;

public class Example {
    public static void main(String[] args) {
        AtomicLong atomicLong = new AtomicLong(1); // initialize to 1
        LoopMonitor lm = new LoopMonitor(atomicLong);
        Thread t = new Thread(lm);
        t.start(); // start LoopMonitor
        while (atomicLong.get() > 0) {
            long l = atomicLong.getAndIncrement(); // equivalent to long l = atomicLong++ if atomicLong were a primitive
            // block
        }
    }

    private static class LoopMonitor implements Runnable {
        private final AtomicLong atomicLong;

        public LoopMonitor(AtomicLong atomicLong) {
            this.atomicLong = atomicLong;
        }

        public void run() {
            while (true) {
                try {
                    System.out.println(atomicLong.longValue()); // print the current loop index
                    Thread.sleep(1000); // sleep for one second
                } catch (InterruptedException ex) {}
            }
        }
    }
}
Most AtomicLong implementations can be set in one clock cycle even on 32-bit platforms, which is why I used it here instead of a primitive long (you don't want to inadvertently print a half-set long); look into your compiler / platform details to see if you need something like this, but if you're on a 64-bit platform then you can probably use a primitive long regardless of which language you're using. The modified for loop doesn't take much of an efficiency hit - you've replaced a primitive long with a reference to a long, so all you've added is a pointer dereference.
It won't be easy, but probably the only way to probe the value without affecting the process is to access the loop variable in shared memory with another thread. Threading libraries vary from one system to another, so I can't help much there (on Linux I'd probably use pthreads). The "monitor" thread might do something like probe the value once a minute, sleep()ing in between, and so allowing the first thread to run uninterrupted.
To get essentially zero-cost reporting (on multi-CPU machines): make your index a "global" property (class-wide, for instance), and have a separate thread read and report its value.
The report could be timer-based (5 times per second or so).
Note: you may also need a boolean stating "are we in the loop?".
Volatile and Caches
If you're going to be doing this in, say, C / C++ and use a separate monitor thread as previously suggested, then you'll have to make the global/static loop variable volatile. You don't want the compiler deciding to keep the loop variable in a register. Some toolchains make that assumption anyway, but there's no harm in being explicit about it.
And then there's the small issue of caches. A separate monitor thread nowadays will end up on a separate core, and that'll mean that the two separate cache subsystems will have to agree on what the value is. That will unavoidably have a small impact on the runtime of the loop.
Real real time constraint?
So that begs the question of just how real time is your loop anyway? I doubt that your timing constraint is such that you're depending on it running within a specific number of CPU clock cycles. Two reasons, a) no modern OS will ever come close to guaranteeing that, you'd have to be running on the bare metal, b) most CPUs these days vary their own clock rate behind your back, so you can't count on a specific number of clock cycles corresponding to a specific real time interval.
Feature rich solution
So assuming that your real time requirement is not that constrained, you may wish to do a more capable monitor thread. Have a shared structure protected by a semaphore which your loop occasionally updates, and your monitor thread periodically inspects and reports progress. For best performance the monitor thread would take the semaphore, copy the structure, release the semaphore and then inspect/print the structure, minimising the semaphore locked time.
The only advantage of this approach over that suggested in previous answers is that you could report more than just the loop variable's value. There may be more information from your loop block that you'd like to report too.
Mutex semaphores in, say, C on Linux are pretty fast these days. Unless your loop block is very lightweight the runtime overhead of a single mutex is not likely to be significant, especially if you're updating the shared structure every 1000 loop iterations. A decent OS will put your threads on separate cores, but for the sake of good form you'd make the monitor thread's priority higher than the thread running the loop. This would ensure that the monitoring does actually happen if the two threads do end up on the same core.
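As a rough illustration (a sketch only, untested; the structure fields and the update interval are mine), the mutex-protected shared structure plus monitor thread could look something like this with POSIX threads:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

struct progress {
    long index;
    /* ...any other per-loop data worth reporting... */
};

static struct progress shared;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *monitor(void *arg) {
    (void)arg;
    for (;;) {
        struct progress snapshot;
        pthread_mutex_lock(&lock);
        snapshot = shared;                    /* copy under the lock... */
        pthread_mutex_unlock(&lock);
        printf("i = %ld\n", snapshot.index);  /* ...report outside it */
        sleep(1);
    }
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, monitor, NULL);
    for (long i = 1; i > 0; i++) {
        if (i % 1000 == 0) {                  /* update the shared copy occasionally */
            pthread_mutex_lock(&lock);
            shared.index = i;
            pthread_mutex_unlock(&lock);
        }
        /* block */
    }
    return 0;
}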

Is it possible to create a TreeModel for data in a State monad in GTK and Haskell?

I would imagine that the overall answers to this will push me over to Functional Reactive Programming, but... bear with me for a bit.
I also don't have example code for this question. I have wandered near the topic with some of my code, but I have stayed firmly in the IO monad with it.
Imagine that I have an application in which I am modelling somewhat complex state and put it into an overall Application State monad. I am doing it this way because I am wanting a certain level of detachment between my core application and the particular user interface.
data S = S DataStore EventStream Sockets
type AppState m = StateT S m
(assume that DataStore, EventStream, and Sockets are all data types that do basically what they sound like :))
Now, say I want to create a table in GTK (TreeView, but no child nodes) that views only the EventStream. I have already learned to do that by saying listStoreNew event_stream >>= treeViewNewWithModel (see http://markus.alyra.org/?p=1023 where I talked pretty extensively about the mechanics of setting this up).
But now I have a mutable copy of data that is also in my AppState monad. When the application goes off and does something that appends new data to the EventStream, that will not show up in the view. The only way I can think of to make it show up in the view is to send it over with a message like listStoreInsert my_new_event, in addition to the changes made in the monad. That's doable, but it is starting to feel clumsy.
Worse, though, this mythical tree view is an administrative view! It's editable! The admin says "oh, that event has some invalid data, I want to change it!". Now, I have no problems changing the data that is in the ListStore I created above. I can create callbacks that make the update with no problem. But I cannot think at all of how to get the update into the Global AppState Monad.
And those last few words show the core of the problem. If I have a global AppState monad, then anything that updates it has to be in one line of execution with everything that wants to view it. The TreeView breaks that. When a cell gets edited in the TreeView, the edit handler runs entirely in the IO monad and is expected to return nothing; the resulting type is IO (). Even if I had some nifty way to unwrap data from my AppState, run the edit handler, and then re-wrap the data in my AppState, no other branch of the application could see it.
Even if I can figure out how to create my own completely custom ModelView instance that provides a read-only view into my AppState, I cannot think of how to make state updates available to the rest of the application.
So...
Is it even possible to model GTK/Haskell application in this way? Or, have I gone down the road to madness?
You have no way of sharing the state reliably using a normal state monad. What if (contrived example) your user edits the model via the GUI and you get a new entry from somewhere else at the same time? You cannot possibly serialize the changes to your state monad in that situation using some pure monad stack.
What you could do is use some kind of synchronization using mutable references (with MVars, for example): you store the actual application state in an MVar, and whenever something happens that might read or change the state, you go through that MVar. Here's some pseudo-code that shows what I mean:
-- This is the MVar that stores your application state
appStateMVar :: MVar S
appStateMVar = unsafePerformIO $ newMVar initialAppState
{-# NOINLINE appStateMVar #-}

-- It could also be passed as a parameter to the functions below, so that when
-- you define the callbacks, you create a closure over the MVar that you use.
-- (i.e.:
-- > appStateMVar <- newMVar initialAppState
-- > createListViewWithCallback $ whenUserAddedSomethingViaTheGUI appStateMVar
-- )
-- That way, you don't have to have the MVar in global scope and can avoid the
-- use of `unsafePerformIO` to initialize it, etc.

main :: IO ()
main = do
    createListViewWithCallback whenUserAddedSomethingViaTheGUI
    createSocketsAndListenUsingCallback whenChangesArriveOverTheNetwork
    runSomeKindOfMainLoop

-- This would be called on any thread by the GUI when the user added something
-- in the view (for example)
whenUserAddedSomethingViaTheGUI :: AddedThing -> IO ()
whenUserAddedSomethingViaTheGUI theThingThatWasAdded =
    takeMVar appStateMVar >>=
    execStateT (addToTheState theThingThatWasAdded) >>=
    putMVar appStateMVar

-- This would be called by the network when something changed there
whenChangesArriveOverTheNetwork :: ArrivedChanges -> IO ()
whenChangesArriveOverTheNetwork theChangesThatArrived =
    takeMVar appStateMVar >>=
    execStateT (handleChanges theChangesThatArrived) >>=
    putMVar appStateMVar
Then, you can write addToTheState and handleChanges using a pure AppState monad, just like you did before.
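For instance, addToTheState could be an ordinary state update written against the pure AppState stack (a sketch only; appendEvent is a placeholder for however your EventStream actually grows):

-- Append a newly added thing to the EventStream inside S.
-- 'appendEvent' is a hypothetical EventStream operation.
addToTheState :: Monad m => AddedThing -> AppState m ()
addToTheState thing = modify $ \(S store events sockets) ->
    S store (appendEvent thing events) sockets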
Of course, if you decide to use FRP, you can avoid this very imperative-style state wiring by letting your application state be a pure signal that changes over time. I understand that reactive-banana has done some work that makes it possible to integrate bi-directional GUI editors/views with FRP event networks.

D Dynamic Arrays - RAII

I admit I have no deep understanding of D at this point, my knowledge relies purely on what documentation I have read and the few examples I have tried.
In C++ you could rely on the RAII idiom to call the destructor of objects on exiting their local scope.
Can you in D?
I understand D is a garbage collected language, and that it also supports RAII.
Why does the following code not cleanup the memory as it leaves a scope then?
import std.stdio;

void main() {
    {
        const int len = 1000 * 1000 * 256; // ~1 GiB
        int[] arr;
        arr.length = len;
        arr[] = 99;
    }
    while (true) {}
}
The infinite loop is there to keep the program alive, so that residual memory allocations are easily visible.
Comparing it with an equivalent C++ program, the C++ version immediately cleaned up the memory after leaving the scope (the refresh rate made it appear as if less memory had been allocated), whereas D kept the memory even though the array had left scope.
So, when does the GC clean up?
scope declarations are going away in D2, so I'm not terribly certain of the semantics, but what I'd imagine is happening is that scope T[] a; only allocates the array struct on the stack (which, needless to say, already happens regardless of scope). Since they are going away, don't use scope declarations (using scope(exit) and friends is different -- keep using those).
Dynamic arrays always use the GC to allocate their memory -- there's no getting around that. If you want something more deterministic, using std.container.Array would be the simplest approach, as I think you could pretty much drop it in where your scope vector3b array is:
Array!vector3b array;
Just don't bother setting the length to zero -- the memory will be freed once it goes out of scope (Array uses malloc/free from libc under the hood).
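Adapted to the example from the question, that might look something like this (an untested sketch; I'm assuming Array's length setter and slice assignment here):

import std.container : Array;

void main() {
    {
        const int len = 1000 * 1000 * 256;
        auto arr = Array!int();   // backed by malloc/free, not the GC
        arr.length = len;
        arr[] = 99;
    } // memory released here, when the Array goes out of scope
    while (true) {}
}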
No, you cannot assume that the garbage collector will collect your object at any point in time.
There is, however, a delete keyword (as well as a scope keyword) that can delete an object deterministically.
scope is used like:
{
    scope auto obj = new int[5];
    // ....
} // obj cleaned up here
and delete is used like in C++ (there's no [] notation for delete).
There are some gotchas, though:
It doesn't always work properly (I hear it doesn't work well with arrays)
The developers of D (e.g. Andrei) intend to remove them in later versions, because they can obviously mess things up if used incorrectly. (I personally hate this, given that it's so easy to screw things up anyway, but they're sticking with removing them, and I don't think people can convince them otherwise, although I'd love it if that were the case.)
In its place, there is already a clear method that you can use, like arr.clear(); however, I'm not quite sure what it exactly does yet myself, but you could look at the source code in object.d in the D runtime if you're interested.
As to your amazement: I'm glad you're amazed, but it shouldn't really be surprising, considering that they're both native code. :-)
