Appending results to array after parallel processing

Some questions about writing this more elegantly:
A. How can I add a list produced by parallel processing directly to a concurrent results collection in an elegant way?
let results = System.Collections.Concurrent.ConcurrentBag<string>()
let tasks = System.Collections.Generic.List<string>()
tasks.Add("a")
tasks.Add("b")
let answers =
    tasks
    |> Seq.map asyncRequest
    |> Async.Parallel
    |> Async.RunSynchronously
    |> Array.toList
Array.append results answers
Attempt: is there a way to append via the pipe operator?
let answers =
    tasks
    |> Seq.map asyncRequest
    |> Async.Parallel
    |> Async.RunSynchronously
    |> Array.append results
B. Is there a way to add items via List constructor?
let tasks = System.Collections.Generic.List<string>()
tasks.Add("a")
tasks.Add("b")
C. Is there a way to construct a queue from array using Queue constructor?
let items: string[] = [|"a"; "b"; "c"|]
let jobs = System.Collections.Generic.Queue<string>()
items |> Array.map jobs.Enqueue |> ignore

A. You can't use Array.append on results, because results is a ConcurrentBag, whereas Array.append expects both of its arguments to be arrays. To add items to a ConcurrentBag, use its Add method, one item at a time:
tasks
|> Seq.map asyncRequest
|> Async.Parallel
|> Async.RunSynchronously
|> Array.iter results.Add
Adding items one by one is a little inefficient. If your ConcurrentBag is really created right in the same function, as your example shows, you may consider using its constructor that takes an IEnumerable<T>:
let answers =
    tasks
    |> Seq.map asyncRequest
    |> Async.Parallel
    |> Async.RunSynchronously
let results = System.Collections.Concurrent.ConcurrentBag<string>( answers )
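For reference, here is a minimal runnable sketch of the constructor approach. The body of asyncRequest is a hypothetical stand-in (the question does not show the real one), so only the shape of the pipeline should be taken from it.

```fsharp
open System.Collections.Concurrent

// Hypothetical stand-in for the question's asyncRequest; the real one is not shown.
let asyncRequest (task: string) : Async<string> =
    async { return task.ToUpperInvariant() }

let tasks = ResizeArray<string>([ "a"; "b" ])

// Run all requests in parallel and collect the answers as an array.
let answers =
    tasks
    |> Seq.map asyncRequest
    |> Async.Parallel
    |> Async.RunSynchronously

// Build the bag in one step from the finished array.
let results = ConcurrentBag<string>(answers)
```

Async.Parallel returns the results in the same order as the inputs, but a ConcurrentBag is unordered, so do not rely on the order when reading from results.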
B. Yes, there is a way to add items to a System.Collections.Generic.List<T>. This class provides a handy Add method for this purpose:
tasks.Add "a"
tasks.Add "b"
Enclosing the argument in parentheses (as in your attempt) is not necessary, but allowed:
tasks.Add("a")
tasks.Add("b")
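If the goal is to fill the list through the constructor itself, as the question asks, the overload of List<T> that accepts an IEnumerable<T> should do it. A small sketch, where the names tasks and sameTasks are only illustrative:

```fsharp
// List<T> has a constructor taking an IEnumerable<T>; an F# list literal qualifies.
let tasks = System.Collections.Generic.List<string>([ "a"; "b" ])

// ResizeArray is the F# abbreviation for the same .NET type.
let sameTasks = ResizeArray<string>([ "a"; "b" ])
```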
C. Yes, there is a way to construct a queue from an array. The Queue class has a constructor that takes an IEnumerable<T>, and arrays implement IEnumerable<T>, so you can call that constructor on an array:
let jobs = System.Collections.Generic.Queue<string>( items )
Please note that you hardly needed my help to get any of the above information. Everything is freely available on MSDN (see links above) or from autocompletion/intellisense in your favorite code editor.

Related

F# efficiency implications of passing large data structures between functions

How does F# pass data from a caller function to a called function? Does it make a copy of the data before handing it over or does it just pass a pointer? I would think the latter but want to make sure.
On a related note, are there any performance implications of the following 2 F# code styles.
let someFunction e =
    1 // pretend this is a complicated function
let someOtherFunction e =
    2 // pretend this is a complicated function
let foo f largeList =
    List.map (fun elem -> f elem) largeList
let bar largeList =
    largeList
    |> foo someFunction
    |> foo someOtherFunction
let bar2 largeList =
    let foo2 f f2 =
        largeList
        |> List.map (fun elem -> f elem)
        |> List.map (fun elem -> f2 elem)
    foo2 someFunction someOtherFunction
Would you expect bar to have a different performance to bar2? If not, are there any situations I should be aware of that would make a difference?
The short answer:
No. The entire list is not copied, just the reference to it is.
The long answer:
In F# (just like in C#) both value and reference types can be passed either by value or by reference.
Both value types and reference types are, by default, passed by value.
In the case of value types (structs) this means that you'll be passing around a copy of the entire data structure.
In the case of reference types (classes, discriminated unions, records, etc.) this means that the reference is passed by value. This does not mean that the entire data structure is copied; it just means that the pointer-sized reference to the data structure (32 or 64 bits, depending on the platform) is copied.
If you're working with mutable data structures, e.g. ResizeArray<'T> (.NET List<'T>) which are classes, passing references by value could have implications. Perhaps the function you've passed it to adds elements to the list, for example? Such an update would apply to the data structure referenced from both locations. Since your question uses the immutable F# List though, you don't have to worry about this!
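A small sketch of both points, using hypothetical helper names (passThrough, addItem) that are not part of the question:

```fsharp
// Returns its argument unchanged; only the reference travels through the call.
let passThrough (xs: int list) = xs

let largeList = [ 1 .. 100000 ]

// True: the caller and the callee see the very same list object, not a copy.
let sameObject = obj.ReferenceEquals(largeList, passThrough largeList)

// With a mutable collection, a change made by the callee is visible to the caller.
let addItem (xs: ResizeArray<int>) = xs.Add 42

let shared = ResizeArray<int>()
addItem shared
let countAfterCall = shared.Count
```

Here sameObject is true and countAfterCall is 1, because in both cases only a reference was handed to the function.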
You can also pass value/reference types by reference, for more detail about that see: https://msdn.microsoft.com/en-us/library/dd233213.aspx#Anchor_4
The F# list is implemented as a singly linked list, which means that accessing the head and prepending an element are both O(1). These data structures are also very memory efficient, because when you prepend an element to the list you only need to store the new value and a reference to the rest of the list.
So you can see how it works, such a data structure can be implemented like this:
type ExampleList<'T> =
    | Empty
    | Cons of 'T * ExampleList<'T>
Additional Information:
List.map is eagerly evaluated, meaning that every time you call it, a new list is created. If you use Seq.map (the F# list implements the IEnumerable<'T> interface), which is lazily evaluated, you can evaluate both map operations in a single enumeration of the list.
largeList
|> Seq.map (fun elem -> f elem)
|> Seq.map (fun elem -> f2 elem)
|> List.ofSeq
This is likely to be a lot more efficient for large lists because it involves allocating only one new list of results, rather than two.
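Another way to get a single pass, offered here as a sketch rather than as part of the original answer, is to compose the two functions with >> and call List.map once. The function bodies below are illustrative stand-ins for the "complicated" functions in the question.

```fsharp
// Illustrative stand-ins for the question's functions.
let someFunction e = e + 1
let someOtherFunction e = e * 2

let largeList = [ 1 .. 5 ]

// Two eager maps: an intermediate list is allocated between them.
let twoPasses =
    largeList
    |> List.map someFunction
    |> List.map someOtherFunction

// One eager map over the composed function: no intermediate list.
let onePass =
    largeList
    |> List.map (someFunction >> someOtherFunction)
```

Both produce [4; 6; 8; 10; 12]; the composed version simply skips the intermediate list.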

Task.async in Elixir Stream

I want to do a parallel map over a big list. The code looks somewhat like this:
big_list
|> Stream.map(&Task.async(Module, :do_something, [&1]))
|> Stream.map(&Task.await(&1))
|> Enum.filter filter_fun
But I was checking Stream implementation and as far as I understand Stream.map combines the functions and applies combined function to elements in the stream, which would mean that the sequence is like this:
Take first element
Create async task
Wait for it to finish
Take second element...
In that case, it doesn't do it in parallel. Am I right or am I missing something?
If I am right, what about this code?
Stream.map Task.async ...
|> Enum.map Task.await ...
Is that going to run in parallel?
The second one also doesn't do what you want. You can see it clearly with this code:
defmodule Test do
  def test do
    [1,2,3]
    |> Stream.map(&Task.async(Test, :job, [&1]))
    |> Enum.map(&Task.await(&1))
  end
  def job(number) do
    :timer.sleep 1000
    IO.inspect(number)
  end
end
Test.test
You'll see a number, then a 1 second wait, another number, and so on. The key here is that you want to create the tasks as soon as possible, so you shouldn't use the lazy Stream.map at all. Instead use the eager Enum.map at that point:
|> Enum.map(&Task.async(Test, :job, [&1]))
|> Enum.map(&Task.await(&1))
On the other hand you can use Stream.map when awaiting, as long as you do some eager operation later, like your filter. That way the awaits will be interspersed with any processing you might be doing on the results.
Elixir 1.4 provides the new Task.async_stream/5 function that will return a stream that runs a given function concurrently on each item in an enumerable.
There are also options to specify the maximum number of workers and a timeout, using the :max_concurrency and :timeout options parameters.
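Please note that you don't have to await this Task, because the function returns a stream, so you can either use Enum.to_list/1 or use Stream.run/1. Each element the stream emits is wrapped in an {:ok, result} tuple, so any function applied afterwards (such as filter_fun) has to handle that shape.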
This will make your example run concurrently:
big_list
|> Task.async_stream(Module, :do_something, [])
|> Enum.filter(filter_fun)
You can try Parallel Stream.
stream = 1..10 |> ParallelStream.map(fn i -> i * 2 end)
stream |> Enum.into([])
[2,4,6,8,10,12,14,16,18,20]
Update: or, better, use Flow.

Why am I getting an error using List.map in F#?

Why does the F# compiler complain "RequireQualifiedAccess ..." for the open statement and give an error for the use of List.map in:
open Microsoft.FSharp.Collections.Map
type Gen =
    static member Calc (data : int[]) = data.List.map (fun x -> x + 1)
First of all, your open statement has nothing to do with List.map. It would open the Map module, which you cannot open but have to access explicitly with Map., hence the error. The Map module contains functions similar to the ones in the List module, but they work with maps (similar to dictionaries in C#).
The function List.map is just called that: List.map. It is standalone and not a part of your data object, which, by the way, you have defined to be an array with (data : int[]).
So I think the code you meant to write is:
type Gen =
    static member Calc (data : List<int>) = data |> List.map (fun x -> x + 1)
And also note that the compiler is smart enough to deduce that data is a list of ints, so you can remove the type annotation if you like.
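If the input really is meant to stay an array, as the original int[] annotation suggests, the Array module has the matching function. A minimal sketch of that variant:

```fsharp
type Gen =
    // Array.map is the array counterpart of List.map and returns a new array.
    static member Calc (data: int[]) = data |> Array.map (fun x -> x + 1)

let result = Gen.Calc [| 1; 2; 3 |]
```

Here result is [|2; 3; 4|], and no open statement is needed because the Array module is available by default.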

composing many quotations into linq queries

I'm working on a project in which I'm trying to use F# and Linq for UDF's and stored procs in an SQL server.
Part of that has been to statically define all the valid queries, the sorting criteria, and a means of scoring the results of the queries.
I've so far been fairly successful, but I'm running into serious difficulty composing sortBy expressions.
Here's the basic concept
let sorter =
    let exprMap:Map<string,Quotations.Expr<seq<Product> -> seq<Product>>> =
        Map.ofList
            ["ProductName",<@ Seq.sortBy (fun prod -> prod.Name) @> ]
            // .. more entries ..
    let sortBuilder sortkeys =
        Array.foldBack
            (fun criteria acc -> <@ %(exprMap.[criteria]) >> (%acc) @>)
            sortkeys
            <@ Seq.map id @>
This ends up being used later in the query executor like so
let execQuery = fun (predicates,sorts,scorer) ->
    <@ seq { for prod in (%dc).Products do
               if (%predicates) prod then yield prod }
       |> (%sorts)
       |> (%scorer) @>
Using these basic outlines, everything works as long as I don't use (%sorts). Each time I pass that in, I get a "not recognized in F# to LINQ translator" error. I've tried a number of different combinators, but I have the sense I'm missing something. If I stub out the sorter function with the following
<@ Seq.sortBy (fun prod -> prod.Name) |> Seq.sortBy (fun prod -> prod.Style) @>
it works as expected. However, using a combinator like this:
let (|>*) = fun f g -> <@ fun c -> ((%f) c) |> (%g) @>
does not.
Any ideas?
Unfortunately, I don't have any good answer to this question.
I'm afraid that the F# LINQ translator is currently very sensitive to the structure of the query. Using composition, you should be able to get the same quotation you get if you write it by hand, so you may need to generate exactly the same thing that worked if written by hand.
For example with your sorter, you may need something like (I didn't try it, but I think this should produce exactly the same quotation as the usual code that works):
let (|>*) f g = fun c -> <@ (%c) |> (%f) |> (%g) @>
<@ seq { for prod in (%dc).Products do
           if (%predicates) prod then yield prod } @> |>
  ( <@ Seq.sortBy (fun prod -> prod.Name) @> |>*
    <@ Seq.sortBy (fun prod -> prod.Style) @> )
The problem is that if you include lambda functions in the quotation, the F# translator needs to deal with them - probably by partially evaluating them (because otherwise the LINQ to SQL translator would fail). There are quite a few tricky cases in this...
However, the F# team has been doing some improvements in this area recently. I think the best thing to do would be to find a simple repro case and send it to fsbugs at microsoft dot com. PowerPack releases are not that "sensitive" so you may be able to get the source code with the recent changes if you ask and offer help with testing (but no promises).

Wait for any event of multiple events simultaneously in F#

In F# I know how to wait asynchronously for one event using Async.AwaitEvent:
let test = async {
    let! move = Async.AwaitEvent(form.MouseMove)
    ...handle move... }
Suppose I want to wait for either the MouseMove or the KeyDown event. I'd like to have something like this:
let! moveOrKeyDown = Async.AwaitEvent(form.MouseMove, form.KeyDown)
This function doesn't exist but is there another way to do this?
let ignoreEvent e = Event.map ignore e
let merged = Event.merge (ignoreEvent f.KeyDown) (ignoreEvent f.MouseMove)
Async.AwaitEvent merged
EDIT: another version that preserves original types
let merged = Event.merge (f.KeyDown |> Event.map Choice1Of2) (f.MouseMove |> Event.map Choice2Of2)
Async.AwaitEvent merged
EDIT 2: according to comments of Tomas Petricek
let e1 = f.KeyDown |> Observable.map Choice1Of2
let e2 = f.MouseMove |> Observable.map Choice2Of2
let! evt = Observable.merge e1 e2 |> Async.AwaitObservable
AwaitObservable primitive can be taken from here ('Reactive demos in Silverlight' by Tomas Petricek).
I used an implementation of a method that you use in your sample in the talk about reactive programming that I had in London (there is a download link at the bottom of the page). If you're interested in this topic, you may find the talk useful as well :-).
The version I'm using takes IObservable instead of IEvent (so the name of the method is AwaitObservable). There are some serious memory leaks when using Event.merge (and other combinators from the Event module) together with AwaitEvent, so you should use Observable.merge etc. and AwaitObservable instead.
The problem is described in more detail here (see Section 3 for a clear example). Briefly: when you use Event.merge, it attaches a handler to the source event (e.g. MouseDown), but it does not remove the handler after you finish waiting using AwaitEvent, so the handler is never removed. If you keep waiting in a loop coded using an asynchronous workflow, you keep adding new handlers (that do not do anything when run).
A simple correct solution (based on what desco posted) would look like this:
let rec loop () = async {
    let e1 = f.KeyDown |> Observable.map Choice1Of2
    let e2 = f.MouseMove |> Observable.map Choice2Of2
    let! evt = Observable.merge e1 e2 |> Async.AwaitObservable
    // ...
    return! loop() } // Continue looping
BTW: You may also want to look at this article (based on chapter 16 from my book).
In the interest of understanding what's going on I looked up the source code to Event.map, Event.merge and Choice.
type Choice<'T1,'T2> =
    | Choice1Of2 of 'T1
    | Choice2Of2 of 'T2
[<CompiledName("Map")>]
let map f (w: IEvent<'Delegate,'T>) =
    let ev = new Event<_>()
    w.Add(fun x -> ev.Trigger(f x));
    ev.Publish
[<CompiledName("Merge")>]
let merge (w1: IEvent<'Del1,'T>) (w2: IEvent<'Del2,'T>) =
    let ev = new Event<_>()
    w1.Add(fun x -> ev.Trigger(x));
    w2.Add(fun x -> ev.Trigger(x));
    ev.Publish
This means our solution is creating 3 new events.
async {
    let merged = Event.merge
                    (f.KeyDown |> Event.map Choice1Of2)
                    (f.MouseMove |> Event.map Choice2Of2)
    let! move = Async.AwaitEvent merged
}
We could reduce this to one event by making a tightly coupled version of this library code.
type EventChoice<'T1, 'T2> =
    | EventChoice1Of2 of 'T1
    | EventChoice2Of2 of 'T2
    with
    static member CreateChoice (w1: IEvent<_,'T1>) (w2: IEvent<_,'T2>) =
        let ev = new Event<_>()
        w1.Add(fun x -> ev.Trigger(EventChoice1Of2 x))
        w2.Add(fun x -> ev.Trigger(EventChoice2Of2 x))
        ev.Publish
And here is our new code.
async {
    let merged = EventChoice.CreateChoice form.MouseMove form.KeyDown
    let! move = Async.AwaitEvent merged
}
You can use a combination of Event.map and Event.merge:
let eventOccurs e = e |> Event.map ignore
let mouseOrKey = Event.merge (eventOccurs frm.MouseMove) (eventOccurs frm.KeyDown)
Then you can use Async.AwaitEvent with this new event. If MouseMove and KeyDown had the same type, you could skip the Event.map step and just directly merge them.
EDIT
But as Tomas points out, you should use the Observable combinators in preference to the Event ones.
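The Observable-based merge can be tried without a form at all. The sketch below is only an illustration under stated assumptions: keyDown and mouseMove are plain Event<_> objects standing in for form.KeyDown and form.MouseMove, and it uses Observable.subscribe from FSharp.Core rather than the AwaitObservable helper mentioned above, which is not part of the core library.

```fsharp
// Stand-ins for form.KeyDown and form.MouseMove, so no UI framework is needed.
let keyDown = Event<string>()
let mouseMove = Event<int * int>()

// Tag each source so the merged stream keeps the original payload types.
let merged =
    Observable.merge
        (keyDown.Publish |> Observable.map Choice1Of2)
        (mouseMove.Publish |> Observable.map Choice2Of2)

let seen = ResizeArray<string>()

let subscription =
    merged
    |> Observable.subscribe (fun evt ->
        match evt with
        | Choice1Of2 key -> seen.Add(sprintf "key %s" key)
        | Choice2Of2 (x, y) -> seen.Add(sprintf "move %d,%d" x y))

keyDown.Trigger "A"
mouseMove.Trigger((3, 4))

// Disposing the subscription detaches the handlers from both sources.
subscription.Dispose()
keyDown.Trigger "B" // no longer observed
```

After this runs, seen holds "key A" and "move 3,4" only. The explicit Dispose is the point of preferring the Observable combinators: the subscription can be torn down, which the Event combinators do not offer.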
