F# how to do List.map in parallel - parallel-processing

What is the most lightweight, terse way to run the following code in parallel within the standard F# libs? Or failing that any widely used additional libs?
let newlist = oldlist |> List.map myComplexFunction
The best I could find was
let newlist = oldlist |> List.map (fun x -> async { return myComplexFunction x })
              |> Async.Parallel
              |> Async.RunSynchronously
              |> Array.toList
I don't like this because it's 4 lines long and constructs an array that I then have to convert back into a list. If I were working with arrays it would be simple (Array.Parallel.map), but I want to keep that lovely immutable-list functional purity. I just can't believe there is no list alternative, but so far I have been unable to find one.
Any good suggestions?

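Use the PSeq module (from the F# PowerPack; it is now also distributed as the FSharp.Collections.ParallelSeq package, in which case open FSharp.Collections.ParallelSeq instead):
open Microsoft.FSharp.Collections
let newlist =
    oldlist
    |> PSeq.map myComplexFunction
    |> PSeq.toList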

It's now worth taking a look at Array.Parallel in FSharp.Core.
let newlist =
    oldlist
    |> Array.ofList
    |> Array.Parallel.map (fun x -> myComplexFunction x)
    |> Array.toList
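If the array round-trip is needed in several places, it can be hidden behind a small helper. The following is only a sketch: the name parallelMapList is made up for illustration and is not part of FSharp.Core.

```fsharp
// Hypothetical helper (not part of FSharp.Core): a parallel map over an
// immutable list, implemented by round-tripping through an array.
let parallelMapList (f: 'a -> 'b) (xs: 'a list) : 'b list =
    xs
    |> Array.ofList
    |> Array.Parallel.map f   // the result keeps the input order
    |> List.ofArray

// Example: squares computed in parallel, returned as a list.
let squares = parallelMapList (fun x -> x * x) [1 .. 5]
```

The intermediate array is still allocated, but call sites stay one line long and only ever see lists.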

Related

Elegant Array.multipick(?) implementation

I'd like to implement something akin to imaginary Array.multipick:
Array.multipick : choosers:('a -> bool) [] -> array:'a [] -> 'a []
Internally, we test each element of the array with all choosers; the first chooser to return true is removed from the choosers array, and we add that chooser's argument to the result. After that, we continue iterating while the choosers array has elements left.
The last part is important, because without early exit requirement this could be solved with just Array.fold.
This could be easily implemented with something like:
let rec impl currentIndex currentChoosers results
But it's too procedural for my taste. Maybe there's more elegant solution?
It's quite difficult to write elegant code using arrays of changing size. Here is some code that works on lists instead and does not mutate any values.
let rec pick accum elem tried = function
    | [] -> (accum, List.rev tried)
    | chooser :: rest ->
        if chooser elem then (elem :: accum, List.rev_append tried rest)
        else pick accum elem (chooser :: tried) rest
let rec multipick_l accum choosers list =
    match choosers, list with
    | [], _
    | _, [] -> List.rev accum
    | _, elem :: elems ->
        let (accum', choosers') = pick accum elem [] choosers in
        multipick_l accum' choosers' elems
let multipick choosers array =
    Array.of_list
        (multipick_l [] (Array.to_list choosers) (Array.to_list array))
If you think that Array.fold_left is usable except for the early exit requirement, you can use an exception to exit early.
A fold with an early exit is a good idea, however a production-worthy one specifically targeting arrays would need to be written in a fairly imperative manner. For simplicity, I'll grab the more general sequence one from this answer.
let multipick (choosers: ('a -> bool) array) (arr: 'a array) : 'a array =
    let indexed =
        choosers
        |> Seq.indexed
        |> Map.ofSeq
    ((indexed, []), arr)
    ||> foldWhile (fun (cs, res) e ->
        if Map.isEmpty cs then
            None
        else
            match cs |> Seq.tryFind (fun kvp -> kvp.Value e) with
            | Some kvp -> Some (Map.remove kvp.Key cs, e :: res)
            | None -> Some (cs, res))
    |> snd
    |> List.rev
    |> Array.ofList
I'm using a Map keyed by array index to keep track of remaining functions - this allows for easy removal of elements, but still retains their order (since map key-value pairs are ordered by keys when iterating).
F# Set wouldn't work with functions due to comparison constraint. System.Collections.Generic.HashSet would work, but it's mutable, and I'm not sure if it would retain ordering.
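The foldWhile helper used above comes from a linked answer and is not shown here. A minimal sketch of what such a function might look like, assuming the signature implied by the call site (a folder that returns None to stop early and Some newState to continue), is:

```fsharp
// Assumed reconstruction, not the code from the linked answer:
// a fold over a sequence that stops as soon as the folder returns None.
let foldWhile (folder: 'State -> 'T -> 'State option) (state: 'State) (source: seq<'T>) : 'State =
    use e = source.GetEnumerator()
    let rec loop acc =
        if e.MoveNext() then
            match folder acc e.Current with
            | Some next -> loop next   // keep folding
            | None -> acc              // early exit with the last state
        else
            acc                        // source exhausted
    loop state

// Example: sum elements until one greater than 3 is encountered.
let sumUntilBig =
    foldWhile (fun acc x -> if x > 3 then None else Some (acc + x)) 0 [1; 2; 3; 4; 5]
```

Because the loop stops pulling elements once None is returned, this also terminates on infinite sequences, provided the folder eventually returns None.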

Equivalent of Ruby's Enumerable#each_slice method in FSharp

I'm currently writing a bit of F#. I've created a method that is the equivalent of Ruby's Enumerable#each_slice method and was wondering if somebody has a better (i.e. more elegant, more concise, more readable) solution.
Here it is:
let rec slicesBySize size list =
    match list with
    | [] -> [] // case needed for type inference
    | list when list.Length < size -> [list]
    | _ ->
        let first = list |> Seq.take size |> List.ofSeq
        let rest = list |> Seq.skip size |> List.ofSeq
        [first] @ slicesBySize size rest
Thanks for any and all feedback/help.
You're looking for List.chunkBySize, which was added in F# 4.0. There are also Seq and Array variants.
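For illustration, a short sketch of List.chunkBySize on a list whose length is not a multiple of the chunk size (the input values are arbitrary):

```fsharp
// Split a list into chunks of at most 3 elements; the last chunk may be shorter.
let slices = [1 .. 7] |> List.chunkBySize 3
// slices = [[1; 2; 3]; [4; 5; 6]; [7]]

// The Seq variant produces arrays as chunks.
let seqSlices = seq { 1 .. 5 } |> Seq.chunkBySize 2 |> List.ofSeq
// seqSlices = [[|1; 2|]; [|3; 4|]; [|5|]]
```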

Sort list High-to-Low in F#

List.sort sorts a list from low to high. How does one sort from high to low? Is there some kind of library function for this?
For a list of numbers:
list
|> List.sortBy (fun x -> -x)
The function (fun x -> -x) negates the number, therefore reversing the order.
For comparables in general, use List.sortWith with compare. Observe the ordering of a b in compare:
> List.sortWith (fun a b -> compare a b) ["a";"s";"d";"f"];;
val it : string list = ["a"; "d"; "f"; "s"]
> List.sortWith (fun a b -> compare b a) ["a";"s";"d";"f"];;
val it : string list = ["s"; "f"; "d"; "a"]
As the linked thread "F# Seq.sortBy in descending order" points out, there is a chance of overflow when you use List.sortBy (fun x -> -x): negating the smallest integer wraps around to the same value. To be correct, it should be:
List.sortBy (fun x -> -x-1)
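A small sketch of the difference, assuming 32-bit ints and F#'s default unchecked arithmetic (the input list is chosen only to trigger the edge case):

```fsharp
open System

let xs = [Int32.MinValue; 0; 1]

// Negating Int32.MinValue wraps around to Int32.MinValue, so the smallest
// element keeps the smallest key and wrongly stays at the front.
let viaNegation = xs |> List.sortBy (fun x -> -x)

// -x - 1 maps Int32.MinValue to Int32.MaxValue, so every key is distinct
// and the order is reversed correctly.
let viaNegationMinusOne = xs |> List.sortBy (fun x -> -x - 1)
```

Here viaNegation is [Int32.MinValue; 1; 0], while viaNegationMinusOne is the expected [1; 0; Int32.MinValue].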
In F# 4.0 (that comes with Visual Studio 2015 Preview), there are sortDescending/sortByDescending functions for this exact purpose.
You can use
list
|> List.sortDescending
or
list
|> List.sortByDescending id
See the comprehensive list of new core library functions at https://github.com/fsharp/FSharpLangDesign/blob/master/FSharp-4.0/ListSeqArrayAdditions.md.
You can use List.sortBy to sort by a custom function, and use the unary minus operator ~- as such function in a compact notation:
let list = [1..10]
list |> List.sortBy (~-)

Find duplicates in an unsorted sequence efficiently

I need a very efficient way to find duplicates in an unsorted sequence. This is what I came up with, but it has a few shortcomings, namely it
unnecessarily counts occurrences beyond 2
consumes the entire sequence before yielding duplicates
creates several intermediate sequences
module Seq =
    let duplicates items =
        items
        |> Seq.countBy id
        |> Seq.filter (snd >> ((<) 1))
        |> Seq.map fst
Regardless of the shortcomings, I don't see a reason to replace this with twice the code. Is it possible to improve this with comparably concise code?
A more elegant functional solution:
let duplicates xs =
    Seq.scan (fun xs x -> Set.add x xs) Set.empty xs
    |> Seq.zip xs
    |> Seq.choose (fun (x, xs) -> if Set.contains x xs then Some x else None)
Uses scan to accumulate sets of all elements seen so far. Then uses zip to combine each element with the set of elements before it. Finally, uses choose to filter out the elements that are in the set of previously-seen elements, i.e. the duplicates.
EDIT
Actually my original answer was completely wrong. Firstly, you don't want duplicates in your outputs. Secondly, you want performance.
Here is a purely functional solution that implements the algorithm you're after:
let duplicates xs =
    (Map.empty, xs)
    ||> Seq.scan (fun xs x ->
        match Map.tryFind x xs with
        | None -> Map.add x false xs
        | Some false -> Map.add x true xs
        | Some true -> xs)
    |> Seq.zip xs
    |> Seq.choose (fun (x, xs) ->
        match Map.tryFind x xs with
        | Some false -> Some x
        | None | Some true -> None)
This uses a map to track whether each element has been seen once or more than once, and emits an element only when it is encountered after having been seen exactly once, i.e. the first time it is duplicated.
Here is a faster imperative version:
let duplicates (xs: _ seq) =
    seq { let d = System.Collections.Generic.Dictionary(HashIdentity.Structural)
          let e = xs.GetEnumerator()
          while e.MoveNext() do
              let x = e.Current
              let mutable seen = false
              if d.TryGetValue(x, &seen) then
                  if not seen then
                      d.[x] <- true
                      yield x
              else
                  d.[x] <- false }
This is around 2× faster than any of your other answers (at the time of writing).
Using a for x in xs do loop to enumerate the elements in a sequence is substantially slower than using GetEnumerator directly but generating your own Enumerator is not significantly faster than using a computation expression with yield.
Note that the TryGetValue member of Dictionary allows me to avoid allocation in the inner loop by mutating a stack allocated value whereas the TryGetValue extension member offered by F# (and used by kvb in his/her answer) allocates its return tuple.
Here's an imperative solution (which is admittedly slightly longer):
let duplicates items =
    seq {
        let d = System.Collections.Generic.Dictionary()
        for i in items do
            match d.TryGetValue(i) with
            | false,_ -> d.[i] <- false // first observance
            | true,false -> d.[i] <- true; yield i // second observance
            | true,true -> () // already seen at least twice
    }
This is the best "functional" solution I could come up with that doesn't consume the entire sequence up front.
let duplicates =
    Seq.scan (fun (out, yielded:Set<_>, seen:Set<_>) item ->
        if yielded.Contains item then
            (None, yielded, seen)
        else
            if seen.Contains item then
                (Some(item), yielded.Add item, seen.Remove item)
            else
                (None, yielded, seen.Add item)
    ) (None, Set.empty, Set.empty)
    >> Seq.choose (fun (x,_,_) -> x)
Assuming your sequence is finite, this solution requires a single pass over the sequence:
open System.Collections.Generic
let duplicates items =
    let dict = Dictionary()
    items |> Seq.fold (fun acc item ->
        match dict.TryGetValue item with
        | true, 2 -> acc
        | true, 1 -> dict.[item] <- 2; item::acc
        | _ -> dict.[item] <- 1; acc) []
    |> List.rev
You can provide the length of the sequence as the capacity of the Dictionary, but that requires enumerating the whole sequence once more.
EDIT:
To resolve the second problem (consuming the entire sequence before yielding duplicates), one could generate duplicates on demand:
open System.Collections.Generic
let duplicates items =
    seq {
        let dict = Dictionary()
        for item in items do
            match dict.TryGetValue item with
            | true, 2 -> ()
            | true, 1 -> dict.[item] <- 2; yield item
            | _ -> dict.[item] <- 1
    }
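To see that the on-demand version really is lazy, it can be run against an infinite sequence. The sketch below repeats the same approach with explicit type annotations added for clarity; the cycling input is an arbitrary example.

```fsharp
open System.Collections.Generic

// Same approach as the on-demand version above: count occurrences in a
// Dictionary and yield an item the second time it is seen.
let duplicates (items: seq<'a>) : seq<'a> =
    seq {
        let dict = Dictionary<'a, int>()
        for item in items do
            match dict.TryGetValue(item) with
            | true, 2 -> ()             // already reported
            | true, 1 ->
                dict.[item] <- 2
                yield item              // second occurrence: report once
            | _ -> dict.[item] <- 1     // first occurrence
    }

// An infinite sequence 0, 1, 2, 0, 1, 2, ...
let cycling = Seq.initInfinite (fun i -> i % 3)

// Only the first six elements of the input are consumed.
let firstThree = cycling |> duplicates |> Seq.take 3 |> List.ofSeq
```

Here firstThree is [0; 1; 2]. Asking for a fourth duplicate would never return, because this particular input contains only three distinct values.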
Functional solution:
let duplicates items =
    let test (unique, result) v =
        if not(unique |> Set.contains v) then (unique |> Set.add v ,result)
        elif not(result |> Set.contains v) then (unique,result |> Set.add v)
        else (unique, result)
    items |> Seq.fold test (Set.empty, Set.empty) |> snd |> Set.toSeq

How to Create the Power Set (Combinations) of the Infinite Set in F# using Sequences?

Here is my failed attempt at the problem; any help would be appreciated.
I tried to come up with the best algorithm for the power set that works on eager lists. This part seems to be working fine. The part I'm having trouble with is translating it to work with sequences so that it can run on streaming/infinite lists. I really don't like the yield syntax, maybe because I don't understand it well, so I would rather have a solution that doesn't use it.
//All Combinations of items in a list
//i.e. the Powerset given each item is unique
//Note: lists are eager so can't be used for infinite
let listCombinations xs =
    List.fold (fun acc x ->
        List.collect (fun ys -> ys::[x::ys]) acc) [[]] xs
//This works fine (Still interested if it could be faster)
listCombinations [1;2;3;4;5] |> Seq.iter (fun x -> printfn "%A" x)
//All Combinations of items in a sequence
//i.e. the Powerset given each item is unique
//Note: Not working
let seqCombinations xs =
    Seq.fold (fun acc x ->
        Seq.collect (fun ys ->
            seq { yield ys
                  yield seq { yield x
                              yield! ys} }) acc) Seq.empty xs
//All Combinations of items in a sequence
//i.e. the Powerset given each item is unique
//Note: Not working (even wrong type signature)
let seqCombinations2 xs =
    Seq.fold (fun acc x ->
        Seq.collect (fun ys ->
            Seq.append ys (Seq.append x ys)) acc) Seq.empty xs
//Sequences to test on
let infiniteSequence = Seq.initInfinite (fun i -> i + 1)
let finiteSequence = Seq.take 5 infiniteSequence
//This should work easily since it's a finite sequence
//But it does not, so there must be a bug in 'seqCombinations' above
for xs in seqCombinations finiteSequence do
    for y in xs do
        printfn "%A" y
//This one is much more difficult to get to work
//since it's the powerset of the infinite sequence
//Nonetheless, if someone could help me find a way to make this work,
//this is my ultimate goal
let firstFew = Seq.take 20 (seqCombinations infiniteSequence)
for xs in firstFew do
    for y in xs do
        printfn "%A" y
Your seqCombinations is almost correct, but you didn't translate it from lists to sequences properly. The equivalent of [[]] is not Seq.empty, but Seq.singleton Seq.empty:
let seqCombinations xs =
    Seq.fold (fun acc x ->
        Seq.collect (fun ys ->
            seq { yield ys
                  yield seq { yield x
                              yield! ys} }) acc) (Seq.singleton Seq.empty) xs
The code above works for finite sequences. But for infinite ones, it doesn't work, because it first tries to reach the end, which it obviously never does for infinite sequences.
If you want a function that works with infinite sequences, I managed to figure out two ways, but neither of them is particularly nice. One of them uses mutable state:
let seqCombinations xs =
    let combs = ref [[]]
    seq {
        yield! !combs
        for x in xs do
            let added = List.map (fun ys -> x::ys) !combs
            yield! added
            combs := !combs @ added
    }
The other is too much about dealing with details of seq<T>:
open System.Collections.Generic
let seqCombinations (xs : seq<_>) =
    let rec combs acc (e : IEnumerator<_>) =
        seq {
            if (e.MoveNext()) then
                let added = List.map (fun ys -> (e.Current)::ys) acc
                yield! added
                yield! combs (acc @ added) e }
    seq {
        // acquire the enumerator inside the seq so that it is not
        // disposed before the result is actually enumerated
        use enumerator = xs.GetEnumerator()
        yield []
        yield! combs [[]] enumerator
    }
I think this would be much easier if you could treat infinite sequences as head and tail, like finite lists in F# or any sequence in Haskell. But it's certainly possible there is a nice way to express this in F#, and I just didn't find it.
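As a quick check of the mutable-state approach on an infinite input, here is a self-contained sketch. It follows the same idea as the first version above, with two small changes that are mine rather than the answer's: it uses .Value instead of the ! and := operators, and it creates the ref cell inside the seq so that each enumeration starts from a fresh state.

```fsharp
// Lazily enumerate all finite combinations of a (possibly infinite) sequence.
let seqCombinations (xs: seq<'a>) : seq<'a list> =
    seq {
        let combs = ref [ [] ]          // combinations produced so far
        yield! combs.Value              // the empty combination
        for x in xs do
            // extend every earlier combination with the new element
            let added = combs.Value |> List.map (fun ys -> x :: ys)
            yield! added
            combs.Value <- combs.Value @ added
    }

// First eight combinations of 1, 2, 3, ...
let firstEight =
    Seq.initInfinite (fun i -> i + 1)
    |> seqCombinations
    |> Seq.take 8
    |> List.ofSeq
```

Here firstEight is [[]; [1]; [2]; [2; 1]; [3]; [3; 1]; [3; 2]; [3; 2; 1]]. The stored list of combinations doubles with every input element, so memory use grows exponentially in the number of elements consumed.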
I've asked a similar question recently at Generate powerset lazily and got some nice answers.
For the powerset of finite sets, the answer by @Daniel in the above link is an efficient solution and probably suits your purpose. You can come up with a test case to compare his approach with yours.
Regarding the powerset of infinite sets, here is a bit of maths. According to Cantor's theorem, the power set of a countably infinite set is uncountably infinite. It means there is no way to enumerate the powerset of all integers (which are countably infinite), even in a lazy way. The intuition is the same for real numbers: since the real numbers are uncountably infinite, we can't actually model them using infinite sequences.
Therefore, there is no algorithm to enumerate powerset of a countably infinite set. Or that kind of algorithm just doesn't make sense.
This is sort of a joke, but will actually generate the correct result for an infinite sequence (it's just that it can't be proven--empirically, not mathematically).
let powerset s =
    seq {
        yield Seq.empty
        for x in s -> seq [x]
    }

Resources