How can I wait for multiple child processes using the OCaml Unix module? - multiprocessing

I'm trying to parallelize some work: have subprocesses each take a piece of the work, then wait in the parent process for all of the work to complete.
The following solution seems to work for small examples:
let par_iter (items : 'i list) ~(f : 'i -> unit) : unit =
  let orig_pid = Unix.getpid () in
  let rec loop = function
    | [] -> ()
    | h :: t ->
      (match Unix.fork () with
       | 0 -> f h
       | _ -> loop t)
  in
  loop items;
  if Unix.getpid () = orig_pid then
    List.iter items ~f:(fun _ -> ignore @@ Unix.wait ())
  else
    exit 0
Usage:
let () =
  par_iter [1; 2; 3; 4] ~f:do_something_with_an_int
The contract of Unix.wait when there are multiple subprocesses is not very clear to me. Here I'm relying on the behavior where waiting n times will wait for all and only n subprocesses to finish.
Does this solution rely on undefined behavior? Is there an alternative way to correctly implement par_iter such that it spawns work on child processes and waits for the work to finish?
tested on OCaml 4.14.0

If the original parent process already had some subprocesses, this can fail. Thus, IMHO it's not usable as a general library. You should use Unix.waitpid to wait specifically for the processes you created.
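A minimal sketch of what that could look like (my illustration, untested; it assumes the same labelled Base/Core-style List functions the question uses): keep the pid returned by Unix.fork for every child and wait for exactly those pids with Unix.waitpid.
let par_iter (items : 'i list) ~(f : 'i -> unit) : unit =
  (* Fork one child per item, remembering each child's pid. *)
  let pids =
    List.map items ~f:(fun item ->
      match Unix.fork () with
      | 0 -> f item; exit 0   (* child: do its piece of the work, then leave *)
      | pid -> pid)           (* parent: remember the child's pid *)
  in
  (* Wait specifically for the children created above, not for arbitrary ones. *)
  List.iter pids ~f:(fun pid -> ignore @@ Unix.waitpid [] pid)
Because each wait names a specific pid, children that existed before par_iter was called (or that are created by other libraries) are never reaped by mistake.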

Related

Programming pattern or library (i.e. idiomatic way) to handle CLI arguments semantic errors?

I have a Haskell application which uses the optparse-applicative library for CLI argument parsing. My data type for CLI arguments contains FilePaths (both files and directories), Doubles, etc. optparse-applicative can handle parse errors, but I want to ensure that some files and some directories exist (or don't exist), that numbers are >= 0, and so on.
What can be done is an implementation of a bunch of helper functions like these ones:
exitIfM :: IO Bool -> Text -> IO ()
exitIfM predicateM errorMessage = whenM predicateM $ putTextLn errorMessage >> exitFailure
exitIfNotM :: IO Bool -> Text -> IO ()
exitIfNotM predicateM errorMessage = unlessM predicateM $ putTextLn errorMessage >> exitFailure
And then I use it like this:
body :: Options -> IO ()
body (Options path1 path2 path3 count) = do
  exitIfNotM (doesFileExist path1) ("File " <> toText path1 <> " does not exist")
  exitIfNotM (doesDirectoryExist path2) ("Directory " <> toText path2 <> " does not exist")
  exitIfM (doesFileExist path3) ("File " <> toText path3 <> " already exists")
  exitIf (count <= 0) "--counter should be positive"
This looks too ad-hoc and ugly to me. Also, I need similar functionality for almost every application I write. Are there idiomatic ways to deal with this sort of programming pattern when I want to do a bunch of checks before actually doing something with the data type? The less boilerplate involved, the better :)
Instead of validating the options record after it has been constructed, perhaps we could use applicative functor composition to combine argument parsing and validation:
import Control.Monad
import Data.Functor.Compose
import Control.Lens ((<&>)) -- flipped fmap
import Control.Applicative.Lift (runErrors, failure) -- from transformers
import qualified Options.Applicative as O
import System.Directory -- from directory

data Options = Options { path :: FilePath, count :: Int } deriving Show

main :: IO ()
main = do
  let pathOption = Compose (Compose (O.argument O.str (O.metavar "FILE") <&> \file ->
        do exists <- doesPathExist file
           pure $ if exists
                  then pure file
                  else failure ["Could not find file."]))
      countOption = Compose (Compose (O.argument O.auto (O.metavar "INT") <&> \i ->
        do pure $ if i < 10
                  then pure i
                  else failure ["Incorrect number."]))
      Compose (Compose parsy) = Options <$> pathOption <*> countOption
  io <- O.execParser $ O.info parsy mempty
  errs <- io
  case runErrors errs of
    Left msgs -> print msgs
    Right r -> print r
The composed parser has type Compose (Compose Parser IO) (Errors [String]) Options. The IO layer is for performing file existence checks, while Errors is a validation-like Applicative from transformers that accumulates error messages. Running the parser produces an IO action that, when run, produces an Errors [String] Options value.
The code is a bit verbose but those argument parsers could be packed in a library and reused.
Some examples from the REPL:
Λ :main "/tmp" 2
Options {path = "/tmp", count = 2}
Λ :main "/tmpx" 2
["Could not find file."]
Λ :main "/tmpx" 22
["Could not find file.","Incorrect number."]

OCaml stdin interface [duplicate]

This question already has an answer here:
How to make an interactive program?
(1 answer)
Closed 5 years ago.
I need to process user input from stdin with OCaml. The user will enter commands until they type quit, and then the program finishes. How do I do this? I know how to program imperatively, but I want to learn the functional way. The user is supposed to manipulate data on a stack based on their commands. I also want to write something like a parser to process the user commands.
Really appreciate your help!
Here is a sketch of something you could write using the OCaml Stack library. It's far from perfect and it could be improved in many ways, but the general structure is here.
The most important part as far as your question is concerned is the loop function. It reads a line from standard input and uses pattern matching to either end the program or evaluate the given command, then recursively calls itself to wait for another command.
The eval function uses pattern matching on the given arguments to do the right thing. You can find documentation for the Stack module here.
let stack = Stack.create ()

let eval args =
  match args with
  | ["push"; v] -> Stack.push v stack
  | ["pop"] ->
    begin
      try print_endline (Stack.pop stack) with
      | Stack.Empty -> print_endline "Stack is empty"
    end
  | ["show"] -> Stack.iter print_endline stack
  | _ -> print_endline "Unrecognized command"

let rec loop () =
  match read_line () with
  | "quit" -> print_endline "Bye"
  | _ as command -> eval (String.split_on_char ' ' command); loop ()

let () =
  loop ()
Note: I usually don't really like the idea of giving a full solution to a question that doesn't show a lot of research, but hey, you have to start somewhere when you're new to functional programming.
Note 2: This code will only work for string stacks. If you intend to store a different type, say ints, or if you want it to be polymorphic, you'll need to tweak that code a little bit.
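As an illustration (mine, not part of the original answer), an int-only variant could simply convert at the push and pop boundaries, roughly like this:
(* Hypothetical int-stack variant of the code above. *)
let stack : int Stack.t = Stack.create ()

let eval args =
  match args with
  | ["push"; v] ->
    (try Stack.push (int_of_string v) stack with
     | Failure _ -> print_endline "Not an integer")
  | ["pop"] ->
    (try print_endline (string_of_int (Stack.pop stack)) with
     | Stack.Empty -> print_endline "Stack is empty")
  | ["show"] -> Stack.iter (fun x -> print_endline (string_of_int x)) stack
  | _ -> print_endline "Unrecognized command"
A polymorphic version would instead take the conversion functions (string -> 'a and 'a -> string) as extra arguments to eval.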
EDIT: According to a remark made in the comments, below is an improved version of the above code, which doesn't use a global variable stack.
let eval s args =
  match args with
  | ["push"; v] -> Stack.push v s
  | ["pop"] ->
    begin
      try print_endline (Stack.pop s) with
      | Stack.Empty -> print_endline "Stack is empty"
    end
  | ["show"] -> Stack.iter print_endline s
  | _ -> print_endline "Unrecognized command"

let rec loop s =
  match read_line () with
  | "quit" -> print_endline "Bye"
  | _ as command -> eval s (String.split_on_char ' ' command); loop s

let () =
  loop (Stack.create ())

F# Akka.NET agents performance optimization when synchronizing

I am trying to solve the following problem. I have some agents running in real time, with a large heartbeat of a couple of milliseconds, so the order of operations they process is mostly deterministic (message processing is not the bottleneck).
Now I am running a large number of simulations of the system where I no longer have a heartbeat (otherwise it would take a couple of centuries), but I still need to make sure the order of operations is preserved. For this I adopted the following solution: the simulator makes sure each agent has processed its message queue by posting a dummy synchronization message and blocking while waiting for the answer. This works for my application, but the time it takes is not intuitive: a single-threaded implementation would be an order of magnitude faster (I guess around 100x, although I have not tested).
I have isolated a small test that shows the issue, even when trying another library, Akka.NET:
type Greet =
    | Greet of string
    | Hello of AsyncReplyChannel<bool>
    | Hello2

[<EntryPoint>]
let main argv =
    let system = System.create "MySystem" <| Configuration.load()
    let greeter = spawn system "greeter" <| fun mailbox ->
        let rec loop() = actor {
            let! msg = mailbox.Receive()
            let sender = mailbox.Sender()
            match msg with
            | Greet who -> () // printf "Hello, %s!\n" who
            | Hello2 -> sender.Tell(true)
            | _ -> ()
            return! loop()
        }
        loop()
    let greeterF =
        MailboxProcessor.Start
            (fun inbox ->
                async {
                    while true do
                        let! msg = inbox.Receive()
                        match msg with
                        | Greet who -> () // printf "Hello, %s!\n" who
                        | Hello reply -> reply.Reply true
                        | _ -> ()
                }
            )
    let n = 1000000
    let t1 = System.Diagnostics.Stopwatch()
    t1.Start()
    for i = 1 to n do
        let rep = greeterF.PostAndReply(fun reply -> (Hello reply)) |> ignore
        ()
    printfn "elapsed Mailbox:%A" t1.ElapsedMilliseconds
    t1.Restart()
    for i = 1 to n do
        let res = greeter.Ask (Hello2)
        let rr = res.Result
        ()
    printfn "elapsed Akka:%A" t1.ElapsedMilliseconds
    System.Console.ReadLine () |> ignore
    0
Basically, both take about 10 seconds for a mere 1 million synchronizations, with no computation whatsoever involved, and this is... unfortunate.
I am wondering if anybody has come across the same problem and if there is any way to switch off the overhead and force everything to run in single-threaded mode... something better than deactivating all the CPUs but one in the BIOS, or writing a clone of the whole system without the agents.
Any help is much appreciated.
The reason the Akka.NET version is slow here is how you communicate with the actor:
main process      Task      FutureActorRef      !!ThreadPool!!      greeter
    Ask ---------------------->
                               Tell ----------->
                                                 MailboxRun ------->
                                                                    (greeter mailbox is empty)
                               <------------------------------ Tell
              <-- Complete task
    <-- .Result
1. For each iteration, a TPL Task is created.
2. Then a single message is sent to the greeter.
3. The main process then blocks while waiting for the response to come back.
4. The greeter replies back, which in turn completes the task inside the FutureActorRef.
5. Rinse and repeat.
This design causes Akka.NET to start and stop the greeter's "mailbox run" for each message, since the mailbox queue becomes empty on every iteration.
That results in thread pool scheduling for every single message that is passed.
It's a bit like getting into your car, putting the pedal to the metal, then abruptly stopping and stepping out of the car, and repeating the procedure over and over.
That is just not a very effective way to travel fast.
@Aaronontheweb's suggestion will only have an effect once you work out the above kinks in your code.
The mailbox needs to be able to constantly pull messages off its internal queue and work on them in batches to achieve full throughput.
Instead, separate the producer from the consumer.
Create an actor that listens for the responses from your greeter.
And once that actor has processed your 1000000 messages, let it send a WorkCompleted message back to the producer.
[Edit]
I gave it a shot myself, I don't know F# so it might not be completely idiomatic :)
open Akka
open Akka.Actor
open Akka.FSharp

type Greet =
    | Greet of string
    | Hello of AsyncReplyChannel<bool>
    | Hello2

type Consume =
    | Response
    | SetSender

[<EntryPoint>]
let main argv =
    let system = System.create "MySystem" <| Configuration.load()
    let greeter = spawn system "greeter" <| fun mailbox ->
        let rec loop() = actor {
            let! msg = mailbox.Receive()
            let sender = mailbox.Sender()
            match msg with
            | Greet who -> () // printf "Hello, %s!\n" who
            | Hello2 -> sender.Tell(Response)
            | _ -> ()
            return! loop()
        }
        loop()
    let consumer = spawn system "consumer" <| fun mailbox ->
        let rec loop(count, sender : IActorRef) = actor {
            if count = 1000000 then sender.Tell(true)
            let! msg = mailbox.Receive()
            match msg with
            | Response -> return! loop(count + 1, sender)
            | SetSender -> return! loop(count, mailbox.Sender())
        }
        loop(0, null)
    let n = 1000000
    let t1 = System.Diagnostics.Stopwatch()
    t1.Start()
    for i = 1 to n do
        greeter.Tell(Hello2, consumer)
    let workdone = consumer.Ask SetSender
    workdone.Wait()
    printfn "elapsed Akka:%A" t1.ElapsedMilliseconds
    System.Console.ReadLine () |> ignore
    0
I updated your code to use a separate consumer for the actor responses and then reply back once all replies had been processed.
By doing so, your processing time is now down to 650ms on my machine.
If you want better throughput, you need to involve more actors to parallelize more.
I'm not sure if this helps in your specific scenario.
Here is a slightly modified MailboxProcessor version:
module MBPAsync =
    open System.Diagnostics // for Stopwatch

    type Greet =
        | Greet of string
        | Hello of AsyncReplyChannel<bool>

    let run n =
        let timer = Stopwatch.StartNew ()
        use greeter =
            MailboxProcessor.Start <| fun inbox -> async {
                while true do
                    let! msg = inbox.Receive()
                    match msg with
                    | Greet who -> () // printf "Hello, %s!\n" who
                    | Hello reply -> reply.Reply true
            }
        Async.RunSynchronously <| async {
            for i = 1 to n do
                do! Async.Ignore (greeter.PostAndAsyncReply Hello)
        }
        let elapsed = timer.Elapsed
        printfn "%A" elapsed
The difference here is that this version uses PostAndAsyncReply and keeps the computation in an async workflow. In my quick test this seemed to be much faster than using PostAndReply, but YMMV.
The timings I get from the above MBP version look roughly like this:
> MBPAsync.run 1000000 ;;
00:00:02.6883486
val it : unit = ()
A comment earlier mentioned my Hopac library. Here is an optimized version using Hopac:
module Hop =
    type Greet =
        | Greet of string
        | Hello of IVar<bool>

    let run n =
        let timer = Stopwatch.StartNew ()
        let greeterCh = ch ()
        do greeterCh >>= function
               | Greet who -> Job.unit ()
               | Hello reply -> reply <-= true
           |> Job.forever
           |> server
        Job.forUpToIgnore 1 n <| fun _ ->
            let reply = ivar ()
            greeterCh <-- Hello reply >>.
            reply
        |> run
        let elapsed = timer.Elapsed
        printfn "%A" elapsed
The timings I get from the above Hopac version look roughly like this:
> Hop.run 1000000 ;;
00:00:00.1088768
val it : unit = ()
I'm not an F# developer, but I'm a core dev on Akka.NET. A couple of ideas for your scenario:
If you're only using a single actor for this work, you can try using a PinnedDispatcher - that way the actor runs on its own dedicated thread all the time. That will save you on unnecessary context switching overhead.
You can also set the throughput of the mailbox to be much higher for this PinnedDispatcher than the normal settings. i.e. set a throughput value of 10000 (or something) instead of the normal 25. Assuming that the contents of your mailbox grow in large bursts this should save you on mailbox synchronization overhead.
Here's what your dispatcher configuration might look like:
my-pinned-dispatcher {
    type = PinnedDispatcher
    throughput = 1000 # your mileage may vary
}
And then configure an actor to use it
C# Fluent Interface
var myActor = myActorSystem.ActorOf(Props.Create<FooActor>()
    .WithDispatcher("my-pinned-dispatcher"));
Config
akka.actor.deployment {
    /greeter {
        dispatcher = my-pinned-dispatcher
    }
}
These are both options you can configure via HOCON in App.config or Web.config, or you can use the fluent interface on the Props class to do this. Also worth noting: there's a bug with pinned dispatchers at the moment, but that should be fixed in our next maintenance release (v1.0.1), which should be out next week.
Your mileage may vary, but this is what I would try - basically it's just designed to help reduce contention and overhead around a single actor.

Error in process <proc_num> with exit value:

I'm trying to parallelize a function but I seem to be having problems (in Erlang).
I have code similar to this:
-module(examples).
-import(lists, [sublist/2, sublist/3]).

motherFunc(Ls) -> childFunc(Ls, false).

childFunc(Ls, false) ->
    Pid1 = spawn(example, mergesortP, [lists:sublist(Ls, length(Ls) div 2), self()]),
    Pid2 = spawn(example, mergesortP, [lists:sublist(Ls, length(Ls) div 2), self()]),
    FirstHalf = [],
    SecondHalf = [],
    receive
        {Pid1, Msg} -> FirstHalf = Msg;
        {Pid2, Msg} -> SecondHalf = Msg
    end,
    SecondHalf ++ FirstHalf;
childFunc([], Pid) -> Pid ! {self(), []};
childFunc([L], Pid) -> Pid ! {self(), [L]};
childFunc(Ls, Pid) ->
    Pid1 = spawn(examples, childFunc, [lists:sublist(Ls, length(Ls) div 2), self()]),
    Pid2 = spawn(examples, childFunc, [lists:sublist(Ls, length(Ls) div 2 + 1, length(Ls) div 2 + 1), self()]),
    FirstHalf = [],
    SecondHalf = [],
    receive
        {Pid1, Msg} -> FirstHalf = Msg;
        {Pid2, Msg} -> SecondHalf = Msg
    end,
    Pid ! {self(), SecondHalf ++ FirstHalf}.
When I run this I'm getting this error message for both threads, and then it does nothing else: Error in process <0.31.0> with exit value: {undef,[{examples,childFunc,[[1,2,3,4],<0.2.0>],[]}]}
The function spawn/3, with the module and the function name as the first and second arguments, requires that the function be exported.
In fact, any explicit qualified call:
module:function(...)
or implicit qualified call:
apply(module, function, ...)
requires that the function be exported. Whether it's supposedly from the same module or not is irrelevant (1).
In your code, you should either use spawn/1 or export the function:
-export([childFunc/2, mergesortP/2]).
% ...
spawn(?MODULE, mergesortP, [lists:sublist(Ls, length(Ls) div 2), self()])
or
Self = self(),
spawn(fun() -> mergesortP(lists:sublist(Ls, length(Ls) div 2), Self) end).
Note the difference when passing self(). With spawn/3 invocation, self() is evaluated by the spawning process and therefore is its pid. If you call self() within the anonymous function passed to spawn/1, it will evaluate to the pid of the spawned process. So you need to call it before and pass it in a variable.
Additionally, your code is unlikely to work. Indeed, the following section does not do what you mean:
FirstHalf = [],
SecondHalf = [],
receive
{Pid1, Msg} -> FirstHalf = Msg;
{Pid2, Msg} -> SecondHalf = Msg
end,
This will first bind [] to the FirstHalf and SecondHalf variables, and then try to match [] against the first message that arrives, since variables in Erlang cannot be re-assigned. The second message is ignored, as receive is only used once. The first result probably isn't an empty list, so this will fail with a badmatch error. Instead, it seems you mean to run the merges in parallel and collect the results afterwards. You could write:
FirstHalf = receive {Pid1, Msg1} -> Msg1 end,
SecondHalf = receive {Pid2, Msg2} -> Msg2 end,
This will collect the first half and then the second half.
(1) Actually, a qualified call can in fact be executed by another module, or more precisely a newer version of the module. Qualifying calls to (a loop) function within the same module is a common method to continue execution in a newer version during code upgrades.

Converting OCaml to F#: F# equivalent of Pervasives at_exit

I am converting the OCaml Format module to F# and tracked a problem back to a use of the OCaml Pervasives at_exit.
val at_exit : (unit -> unit) -> unit
Register the given function to be called at program termination time. The functions registered with at_exit will be called when the program executes exit, or terminates, either normally or because of an uncaught exception. The functions are called in "last in, first out" order: the function most recently added with at_exit is called first.
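As a small illustration of those semantics (my own example, not from the OCaml documentation), this OCaml program prints the registered functions in reverse order of registration:
let () =
  at_exit (fun () -> print_endline "registered first, runs last");
  at_exit (fun () -> print_endline "registered second, runs first");
  print_endline "main body"
(* Output:
   main body
   registered second, runs first
   registered first, runs last *)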
In the process of conversion I commented out the line as the compiler did not flag it as being needed and I was not expecting an event in the code.
I checked the FSharp.PowerPack.Compatibility.PervasivesModule for at_exit using VS Object Browser and found none.
I did find how to run code "at_exit"? and How do I write an exit handler for an F# application?
The OCaml line is
at_exit print_flush
with print_flush signature: val print_flush : (unit -> unit)
Also, watching its use during a debug session of the OCaml code, it looks like at_exit is called both at the end of initialization and at the end of each use of a call to the module.
Any suggestions or hints on how to do this? This will be my first event in F#.
EDIT
Here is some of what I have learned about the Format module that should shed some light on the problem.
The Format module is a library of functions for basic pretty-printer commands over simple OCaml values such as int, bool, and string. The Format module has commands like print_string, but also commands to, say, put the next output in a bounded box (think of a new set of left and right margins). So one could write:
print_string "Hello"
or
open_box 0; print_string "<<";
open_box 0; print_string "p \/ q ==> r"; close_box();
print_string ">>"; close_box()
Commands such as open_box and print_string are handled by a loop that interprets the commands and then decides whether to print on the current line or advance to the next line. The commands are held in a queue, and there is a state record to hold mutable values such as the left and right margins.
The queue and state need to be primed, which from debugging the test cases against working OCaml code appears to be done at the end of initialization of the module but before the first call is made to any function in the Format module. The queue and state are cleaned up and primed again for the next set of commands via the at_exit mechanism: when the last matching frame for the initial call into the Format module has been removed, the at_exit handler is triggered, which pushes out any remaining commands in the queue and re-initializes the queue and state.
So the sequencing of the calls to print_flush is critical and appears to be at more than what the OCaml documentation states.
This should do it:
module Pervasives =
    open System
    open System.Threading

    //
    let mutable private exitFunctions : (unit -> unit) list = List.empty
    //
    let mutable private exitFunctionsExecutedFlag = 0
    //
    let private tryExecuteExitFunctions _ =
        if Interlocked.CompareExchange (&exitFunctionsExecutedFlag, 1, 0) = 0 then
            // Run the exit functions in last-in-first-out order.
            exitFunctions
            |> List.iter (fun f -> f ())

    // Register handlers for events which fire when the process exits cleanly
    // or due to an exception being thrown.
    do
        AppDomain.CurrentDomain.ProcessExit.Add tryExecuteExitFunctions
        AppDomain.CurrentDomain.UnhandledException.Add tryExecuteExitFunctions

    //
    let at_exit f =
        // TODO : This function should be re-written using atomic operations
        // for thread-safety!
        exitFunctions <- f :: exitFunctions
And some code to test it:
open System

// Register a couple of handlers to test our code.
Pervasives.at_exit <| fun () ->
    Console.WriteLine "The first registered function has fired!"

Pervasives.at_exit <| fun () ->
    Console.WriteLine "The second registered function has fired!"
    TimeSpan.FromSeconds 1.0
    |> System.Threading.Thread.Sleep
    Console.WriteLine "Exiting the second registered function!"

Pervasives.at_exit <| fun () ->
    Console.WriteLine "The third registered function has fired!"

// Do some stuff in our program
printfn "blah"
printfn "foo"
printfn "bar"

(* The functions we registered with at_exit should be fired here. *)

// Uncomment this to see that our handlers work even when the
// program crashes due to an unhandled exception.
//failwith "Uh oh!"
