Linq like function GroupMultipleValBy

Linq like function GroupMultipleValBy - linq

In the process of learning the awesomness that is F# + Linq I came to a problem I can not solve(nicely) using functional, OOP, nor Linq like syntax, would anyone be willing to help?
Lets say my input is the following sequence:
let db = seq [ ("bob", 90, ['x';'y'])
("bob", 70, ['z'])
("frank", 20, ['b'])
("charlie", 10, ['c']) ]
Rows could read for example "Student bob has enrolled in x,y in semester 90"
What I need is this instead:
[ ("bob", [90; 70], ['x'; 'y'; 'z'])
("frank", [20], ['b'])
("charlie", [10], ['c']) ]
This would read instead "Bob has finished semesters 90,70 and taken x,y,z".
Linq/Relational approach usualy gives the most readable solutions to such problems. But the best I can come up with is:
type Student = string
type Semester = int
type Class = char
let restructure (inp:seq<Student * Semester * Class list>) = query {
for (student, semester, classes) in inp do
groupValBy (semester,classes) student into data
yield (data.Key, Seq.map fst data, Seq.collect snd data)
}
Which is neither readable, nor fast, nor pretty, nor idiomatic, and due to intricacies of F# requires that I write the input type signature...
Is there a better way some GroupMultipleValBy function?
Thank you very much!

If you can stick to "classic" F# code you can rewrite it in a more readable way (especially by using locals to make the code even more readable)
Well it seems Tomas beat me to this, we've gone roughly the same road with some "quirks" in the middle
let restructure inp =
// could have been defined at a more global scope as helpers
let fst3 (x, _, _) = x
let flip f y x = f x y
let folder cont (student, semester, classes) (_, semesters, allClasses) =
cont (student, semester :: semesters, classes # allClasses)
let initialState = "", [], []
inp
|> Seq.groupBy fst3
|> Seq.map (snd >> flip (Seq.fold folder id) initialState)
The hard part to understand is the folder one, to keep semesters in the wanted order we either have to add an extra step reversing that part or as done here using a continuation
That (and the use of flip to keep it point free) added with the use of # makes me think Tomas code is "better" (but his answer makes semesters and classes seq instead of list)
Addendum
Here's Tomas code written in a way I find more readable (but that's a matter of taste) and maybe more agnostic about what's being manipulated although it's longer
[that doesn't take anything to it's answer which is great]
let restructure inp =
// could have been defined at a more global scope as helpers
let fst3 (x, _, _) = x
let snd3 (_, y, _) = y
let trd3 (_, _, z) = z
let mapping (key, values) =
key,
// replace with commented part to have lists instead of seqs
values |> Seq.map snd3, //[ for value in values -> snd3 value ],
values |> Seq.collect trd3 //[ for value in values do yield! trd3 value ]
inp
|> Seq.groupBy fst3
|> Seq.map mapping

If you do not insist on using the query syntax (which is needed if you are working with databases, but is just one of the options when working with in-memory data), then I would probably use simple Seq.groupBy function:
db
|> Seq.groupBy (fun (name, _, _) -> name)
|> Seq.map (fun (name, group) ->
name,
group |> Seq.map (fun (_, sem, _) -> sem),
group |> Seq.collect (fun (_, _, courses) -> courses) )
Here, we are saying that we want to group records by student name and return a triple with:
The student name, which was used as the grouping key
Get semester of all the records
Collect all the courses they attended
This is not shorter than your version, but I think a combination of groupBy and map is a fairly common pattern that is quite easy to understand. That said, I'm quite curious to see other answers! I can imagine there would be a nicer way of doing this...

Related

Elegant Array.multipick(?) implementation

I'd like to implement something akin to imaginary Array.multipick:
Array.multipick : choosers:('a -> bool) [] -> array:'a [] -> 'a []
Internally, we test each array's element with all choosers, the first chooser to return true is removed from choosers array, and we add that chooser's argument to the result. After that, we continue interation while choosers array has elements left.
The last part is important, because without early exit requirement this could be solved with just Array.fold.
This could be easily implemented with something like:
let rec impl currentIndex currentChoosers results
But it's too procedural for my taste. Maybe there's more elegant solution?

It's quite difficult to write elegant code using arrays of changing size. Here is some code that works on lists instead and does not mutate any values.
let rec pick accum elem tried = function
| [] -> (accum, List.rev tried)
| chooser :: rest ->
if chooser elem then (elem :: accum, List.rev_append tried rest)
else pick accum elem (chooser :: tried) rest
let rec multipick_l accum choosers list =
match choosers, list with
| [], _
| _, [] -> List.rev accum
| _, elem :: elems ->
let (accum', choosers') = pick accum elem [] choosers in
multipick_l accum' choosers' elems
let multipick choosers array =
Array.of_list
(multipick_l [] (Array.to_list choosers) (Array.to_list array))
If you think that Array.fold_left is usable except for the early exit requirement, you can use an exception to exit early.

A fold with an early exit is a good idea, however a production-worthy one specifically targeting arrays would need to be written in a fairly imperative manner. For simplicity, I'll grab the more general sequence one from this answer.
let multipick (choosers: ('a -> bool) array) (arr: 'a array) : 'a array =
let indexed =
choosers
|> Seq.indexed
|> Map.ofSeq
((indexed, []), arr)
||> foldWhile (fun (cs, res) e ->
if Map.isEmpty cs then
None
else
match cs |> Seq.tryFind (fun kvp -> kvp.Value e) with
| Some kvp -> Some (Map.remove kvp.Key cs, e :: res)
| None -> Some (cs, res))
|> snd
|> List.rev
|> Array.ofList
I'm using a Map keyed by array index to keep track of remaining functions - this allows for easy removal of elements, but still retains their order (since map key-value pairs are ordered by keys when iterating).
F# Set wouldn't work with functions due to comparison constraint. System.Collections.Generic.HashSet would work, but it's mutable, and I'm not sure if it would retain ordering.

F# List optimisation

From an unordered list of int, I want to have the smallest difference between two elements. I have a code that is working but way to slow. Can anyone sugest some change to improve the performance? Please explain why you did the change and what will be the performance gain.
let allInt = [ 5; 8; 9 ]
let sortedList = allInt |> List.sort;
let differenceList = [ for a in 0 .. N-2 do yield sortedList.Item a - sortedList.Item a + 1 ]
printfn "%i" (List.min differenceList) // print 1 (because 9-8 smallest difference)
I think I'm doing to much list creation or iteration but I don't know how to write it differently in F#...yet.
Edit: I'm testing this code on list with 100 000 items or more.
Edit 2: I believe that if I can calculte the difference and have the min in one go it should improve the perf a lot, but I don't know how to do that, anay idea?
Thanks in advance

The List.Item performs in O(n) time and is probably the main performance bottle neck in your code. The evaluation of differenceList iterates the elements of sortedList by index, which means the performance is around O((N-2)(2(N-2))), which simplifies to O(N^2), where N is the number of elements in sortedList. For long lists, this will eventually perform badly.
What I would do is to eliminate calls to Item and instead use the List.pairwise operation
let data =
[ let rnd = System.Random()
for i in 1..100000 do yield rnd.Next() ]
#time
let result =
data
|> List.sort
|> List.pairwise // convert list from [a;b;c;...] to [(a,b); (b,c); ...]
|> List.map (fun (a,b) -> a - b |> abs) // Calculates the absolute difference
|> List.min
#time
The #time directives lets me measure execution time in F# Interactive and the output I get when running this code is:
--> Timing now on
Real: 00:00:00.029, CPU: 00:00:00.031, GC gen0: 1, gen1: 1, gen2: 0
val result : int = 0
--> Timing now off

F#'s built-in list type is implemented as a linked list, which means accessing elements by index has to enumerate the list all the way to the index each time. In your case you have two index accesses repeated N-2 times, getting slower and slower with each iteration, as the index grows and each access needs to go through longer part of the list.
First way out of this would be using an array instead of a list, which is a trivial change, but grants you faster index access.
(*
[| and |] let you define an array literal,
alternatively use List.toArray allInt
*)
let allInt = [| 5; 8; 9 |]
let sortedArray = allInt |> Array.sort;
let differenceList = [ for a in 0 .. N-2 do yield sortedArray.[a] - sortedArray.[a + 1] ]
Another approach might be pairing up the neighbours in the list, subtracting them and then finding a min.
let differenceList =
sortedList
|> List.pairwise
|> List.map (fun (x,y) -> x - y)
List.pairwise takes a list of elements and returns a list of the neighbouring pairs. E.g. in your example List.pairwise [ 5; 8; 9 ] = [ (5, 8); (8, 9) ], so that you can easily work with the pairs in the next step, the subtraction mapping.
This way is better, but these functions from List module take a list as input and produce a new list as the output, having to pass through the list 3 times (1 for pairwise, 1 for map, 1 for min at the end). To solve this, you can use functions from the Seq module, which work with .NETs IEnumerable<'a> interface allowing lazy evaluation resulting usually in fewer passes.
Fortunately in this case Seq defines alternatives for all the functions we use here, so the next step is trivial:
let differenceSeq =
sortedList
|> Seq.pairwise
|> Seq.map (fun (x,y) -> x - y)
let minDiff = Seq.min differenceSeq
This should need only one enumeration of the list (excluding the sorting phase of course).
But I cannot guarantee you which approach will be fastest. My bet would be on simply using an array instead of the list, but to find out, you will have to try it out and measure for yourself, on your data and your hardware. BehchmarkDotNet library can help you with that.

The rest of your question is adequately covered by the other answers, so I won't duplicate them. But nobody has yet addressed the question you asked in your Edit 2. To answer that question, if you're doing a calculation and then want the minimum result of that calculation, you want List.minBy. One clue that you want List.minBy is when you find yourself doing a map followed by a min operation (as both the other answers are doing): that's a classic sign that you want minBy, which does that in one operation instead of two.
There's one gotcha to watch out for when using List.minBy: It returns the original value, not the result of the calculation. I.e., if you do ints |> List.pairwise |> List.minBy (fun (a,b) -> abs (a - b)), then what List.minBy is going to return is a pair of items, not the difference. It's written that way because if it gives you the original value but you really wanted the result, you can always recalculate the result; but if it gave you the result and you really wanted the original value, you might not be able to get it. (Was that difference of 1 the difference between 8 and 9, or between 4 and 5?)
So in your case, you could do:
let allInt = [5; 8; 9]
let minPair =
allInt
|> List.pairwise
|> List.minBy (fun (x,y) -> abs (x - y))
let a, b = minPair
let minDifference = abs (a - b)
printfn "The difference between %d and %d was %d" a b minDifference
The List.minBy operation also exists on sequences, so if your list is large enough that you want to avoid creating an intermediate list of pairs, then use Seq.pairwise and Seq.minBy instead:
let allInt = [5; 8; 9]
let minPair =
allInt
|> Seq.pairwise
|> Seq.minBy (fun (x,y) -> abs (x - y))
let a, b = minPair
let minDifference = abs (a - b)
printfn "The difference between %d and %d was %d" a b minDifference
EDIT: Yes, I see that you've got a list of 100,000 items. So you definitely want the Seq version of this. The F# seq type is just IEnumerable, so if you're used to C#, think of the Seq functions as LINQ expressions and you'll have the right idea.
P.S. One thing to note here: see how I'm doing let a, b = minPair? That's called destructuring assignment, and it's really useful. I could also have done this:
let a, b =
allInt
|> Seq.pairwise
|> Seq.minBy (fun (x,y) -> abs (x - y))
and it would have given me the same result. Seq.minBy returns a tuple of two integers, and the let a, b = (tuple of two integers) expression takes that tuple, matches it against the pattern a, b, and thus assigns a to have the value of that tuple's first item, and b to have the value of that tuple's second item. Notice how I used the phrase "matches it against the pattern": this is the exact same thing as when you use a match expression. Explaining match expressions would make this answer too long, so I'll just point you to an excellent reference on them if you haven't already read it:
https://fsharpforfunandprofit.com/posts/match-expression/

Here is my solution:
let minPair xs =
let foo (x, y) = abs (x - y)
xs
|> List.allPairs xs
|> List.filter (fun (x, y) -> x <> y)
|> List.minBy foo
|> foo

How can I select a random value from a list using F#

I'm new to F# and I'm trying to figure out how to return a random string value from a list/array of strings.
I have a list like this:
["win8FF40", "win10Chrome45", "win7IE11"]
How can I randomly select and return one item from the list above?
Here is my first try:
let combos = ["win8FF40";"win10Chrome45";"win7IE11"]
let getrandomitem () =
let rnd = System.Random()
fun (combos : string[]) -> combos.[rnd.Next(combos.Length)]

Both the answers given here by latkin and mydogisbox are good, but I still want to add a third approach that I sometimes use. This approach isn't faster, but it's more flexible and more composable, and fast enough for small sequences. Depending on your needs, you can use one of higher performance options given here, or you can use the following.
Single-argument function using Random
Instead of directly enabling you to select a single element, I often define a shuffleR function like this:
open System
let shuffleR (r : Random) xs = xs |> Seq.sortBy (fun _ -> r.Next())
This function has the type System.Random -> seq<'a> -> seq<'a>, so it works with any sort of sequence: lists, arrays, collections, and lazily evaluated sequences (although not with infinite sequences).
If you want a single random element from a list, you can still do that:
> [1..100] |> shuffleR (Random ()) |> Seq.head;;
val it : int = 85
but you can also take, say, three randomly picked elements:
> [1..100] |> shuffleR (Random ()) |> Seq.take 3;;
val it : seq<int> = seq [95; 92; 12]
No-argument function
Sometimes, I don't care about having to pass in that Random value, so I instead define this alternative version:
let shuffleG xs = xs |> Seq.sortBy (fun _ -> Guid.NewGuid())
It works in the same way:
> [1..100] |> shuffleG |> Seq.head;;
val it : int = 11
> [1..100] |> shuffleG |> Seq.take 3;;
val it : seq<int> = seq [69; 61; 42]
Although the purpose of Guid.NewGuid() isn't to provide random numbers, it's often random enough for my purposes - random, in the sense of being unpredictable.
Generalised function
Neither shuffleR nor shuffleG are truly random. Due to the ways Random and Guid.NewGuid() work, both functions may result in slightly skewed distributions. If this is a concern, you can define an even more general-purpose shuffle function:
let shuffle next xs = xs |> Seq.sortBy (fun _ -> next())
This function has the type (unit -> 'a) -> seq<'b> -> seq<'b> when 'a : comparison. It can still be used with Random:
> let r = Random();;
val r : Random
> [1..100] |> shuffle (fun _ -> r.Next()) |> Seq.take 3;;
val it : seq<int> = seq [68; 99; 54]
> [1..100] |> shuffle (fun _ -> r.Next()) |> Seq.take 3;;
val it : seq<int> = seq [99; 63; 11]
but you can also use it with some of the cryptographically secure random number generators provided by the Base Class Library:
open System.Security.Cryptography
open System.Collections.Generic
let rng = new RNGCryptoServiceProvider ()
let bytes = Array.zeroCreate<byte> 100
rng.GetBytes bytes
let q = bytes |> Queue
FSI:
> [1..100] |> shuffle (fun _ -> q.Dequeue()) |> Seq.take 3;;
val it : seq<int> = seq [74; 82; 61]
Unfortunately, as you can see from this code, it's quite cumbersome and brittle. You have to know the length of the sequence up front; RNGCryptoServiceProvider implements IDisposable, so you should make sure to dispose of rng after use; and items will be removed from q after use, which means it's not reusable.
Cryptographically random sort or selection
Instead, if you really need a cryptographically correct sort or selection, it'd be easier to do it like this:
let shuffleCrypto xs =
let a = xs |> Seq.toArray
use rng = new RNGCryptoServiceProvider ()
let bytes = Array.zeroCreate a.Length
rng.GetBytes bytes
Array.zip bytes a |> Array.sortBy fst |> Array.map snd
Usage:
> [1..100] |> shuffleCrypto |> Array.head;;
val it : int = 37
> [1..100] |> shuffleCrypto |> Array.take 3;;
val it : int [] = [|35; 67; 36|]
This isn't something I've ever had to do, though, but I thought I'd include it here for the sake of completeness. While I haven't measured it, it's most likely not the fastest implementation, but it should be cryptographically random.

Your problem is that you are mixing Arrays and F# Lists (*type*[] is a type notation for Array). You could modify it like this to use lists:
let getrandomitem () =
let rnd = System.Random()
fun (combos : string list) -> List.nth combos (rnd.Next(combos.Length))
That being said, indexing into a List is usually a bad idea since it has O(n) performance since an F# list is basically a linked-list. You would be better off making combos into an array if possible like this:
let combos = [|"win8FF40";"win10Chrome45";"win7IE11"|]

I wrote a blog post on exactly this topic a while ago: http://latkin.org/blog/2013/11/16/selecting-a-random-element-from-a-linked-list-3-approaches-in-f/
3 approaches are given there, with discussion of performance and tradeoffs of each.
To summarize:
// pro: simple, fast in practice
// con: 2-pass (once to get length, once to select nth element)
let method1 lst (rng : Random) =
List.nth lst (rng.Next(List.length lst))
// pro: ~1 pass, list length is not bound by int32
// con: more complex, slower in practice
let method2 lst (rng : Random) =
let rec step remaining picks top =
match (remaining, picks) with
| ([], []) -> failwith "Don't pass empty list"
// if only 1 element is picked, this is the result
| ([], [p]) -> p
// if multiple elements are picked, select randomly from them
| ([], ps) -> step ps [] -1
| (h :: t, ps) ->
match rng.Next() with
// if RNG makes new top number, picks list is reset
| n when n > top -> step t [h] n
// if RNG ties top number, add current element to picks list
| n when n = top -> step t (h::ps) top
// otherwise ignore and move to next element
| _ -> step t ps top
step lst [] -1
// pro: exactly 1 pass
// con: more complex, slowest in practice due to tuple allocations
let method3 lst (rng : Random) =
snd <| List.fold (fun (i, pick) elem ->
if rng.Next(i) = 0 then (i + 1, elem)
else (i + 1, pick)
) (0, List.head lst) lst
Edit: I should clarify that above shows a few ways to get a random element from a list, assuming you must use a list. If it fits with the rest of your program's design, it is definitely more efficient to take a random element from an array.

Merging lists of 'spatial' tuples

I have three lists of tuples, each tuple contains (startpos, endpos, value).
What I want to do is merge these into one list of tuples (startpos, endpos, values[]), but following a rule which I find it easier to draw than to write:
//third [---------] [------------]
//second [-------------] [---------------------------]
//first [-----------------------------] [--------------]
//(pos) 0123456789|123456789|123456789|123456789|123456789|123456789|123456789
//result [--1-][--2-][---3---][---1----] [---2--][---3--]
(The numbers in result represent the expected length of the values[] list for each resulting element)
Basically, I only keep a 'higher' element where it overlaps a 'lower' element, and I split up into 'homogenous' elements.
The positions can be considered as being of type int. As you can see from the result, the 'split' segments do not start and end at the same position, but at pos-1 or pos+1. The order of the values is not important, as long as it is defined.
Sample data (based on example above):
let third = [(12,22,3.1);(43,56,3.2)]
let second = [(6,20,2.1);(35,63,2.2)]
let first = [(0,30,1.1);(35,50,1.2)]
let after = [
(0,5,[1.1]);
(6,11,[1.1;2.1]);
(12,20,[1.1;2.1;3.1]);
(21,30,[1.1]);
(35,42,[1.2;2.2]);
(43,50,[1.2;2.2;3.2])
]
Right now I'm finding it difficult to think about this in a functional way, anything that comes to mind is imperative. Maybe that's inevitable in this case, but if anyone has any ideas...
UPDATE Actually, if we generalised the input case to already be of type (int*int*List<float>), we could just treat the case of two input lists, then fold that.
PS: This is not homework, or code golf, I've just sterilised the data somewhat.

Your after data is wrong; at least my program thinks it is, and I believe it. :)
let third = [(12,22,3.1);(43,56,3.2)]
let second = [(6,20,2.1);(35,63,2.2)]
let first = [(0,30,1.1);(35,50,1.2)]
let all = List.concat [first; second; third]
let min = all |> Seq.map (fun (x,y,z)->x) |> Seq.min
let max = all |> Seq.map (fun (x,y,z)->y) |> Seq.max
let setsEachValueIsIn =
[min..max]
|> List.map (fun i ->
i, all
|> List.filter (fun (x,y,z) -> x<=i && i<=y)
|> List.map (fun (x,y,z) -> z))
printfn "%A" setsEachValueIsIn
let x1,l1 = Seq.nth 0 setsEachValueIsIn
let result =
setsEachValueIsIn
|> List.fold (fun (((l,h,s)::t) as prev) (nx,ns) ->
if s=ns then (l,nx,s)::t else (nx,nx,ns)::prev
) [x1,x1,l1]
|> List.rev
let after = [
(0,5,[1.1]);
(6,11,[1.1;2.1]);
(12,20,[1.1;2.1;3.1]);
(21,30,[1.1]);
(35,42,[1.2;2.2]);
(43,50,[1.2;2.2;3.2])
]
printfn ""
printfn "%A" result
printfn ""
printfn "%A" after
assert(result = after)
Strategy: first I map every number in the whole range to the 'sets it is in'. Then I fold, seeding with the first result as (min,min,setsMinIsIn) and every step of the way, if the set does not change, I just widen the range, else if the set does change, I make a new element.
Key for var names in the fold: low, high, set, nx-next x, ns-next set

Complete rewrite (see edits), shorter, more elegant, maybe less readable. Still pinching Brian's logic.
UPDATE: now works, at least for the test above
let third = [(12,22,3.1);(43,56,3.2)]
let second = [(6,20,2.1);(35,63,2.2)]
let first = [(0,30,1.1);(35,50,1.2)]
//===helper functions===
// foldable combined min and max finder
let minmax (mn,mx) (x,y,_) = (min mn x, max mx y)
// test if x - y range overlaps position i
let overlaps i (x,y,_) = x<=i && i<=y
// get third element from triple
let getz (_,_,z) = z
//specialise function, given two tuples, will combine lists (L & U)
// but only if both lists have contents AND their indexes (il & iu)
// are not more than 1 apart, i is included simply so that we can pass
// merge directly to the List.map2 below
let merge (i,il,L) (_,iu,U) =
if L = [] || U = [] || iu - il > 1 then
(i, il, L)
else
(i, iu, L # U)
let input = [first;second;third] // input data - 'bottom' first
//find max and min positions
let (x0,yn) = input |> Seq.concat |> Seq.fold minmax (0,0)
//transform each data list to a list of (i,[z])
let valsByPos = input |> List.map (fun level -> //for each data 'level'
[x0..yn] |> List.map (fun i -> //for each position in range
//collect values of all elements in level that
// overlap this position
(i, level |> List.filter (overlaps i) |> List.map getz)))
// 'merge up' each level, keeping only upper values if lower values exist
// after we will have just one list of (i, [z])
let mergedValsByPos = valsByPos //offside here for SO formatting
//add an index into each tuple
|> List.mapi (fun i l -> l |> List.map (fun (j,z) -> (j,i,z)))
//use index to determine if we should 'merge up' for each subsublst
|> List.reduce (List.map2 merge)
//rip the index back out
|> List.map (fun (i,_,z) -> (i,z))
//get first value as seed for fold
let x1,l1 = Seq.nth 0 mergedValsByPos
//transform list (i,[z]) into list of (x,y,[z])
//key: (l)ow, (h)igh, (s)et, (nx)-next x, (ns)-next set
let result =
mergedValsByPos
//first remove any positions where there are no values
|> List.filter (fun el -> snd(el) <> [])
//double capture on state so we can take all or part of it
|> List.fold (fun (((l,h,s)::t) as prev) (nx,ns) ->
//if [z] value hasn't changed, we just enlarge range
// of current state (from (l,h) to (l,nx))
// otherwise we add a new element (nx,nx,ns) to state
if s=ns then (l,nx,s)::t else (nx,nx,ns)::prev
) [x1,x1,l1] //initial state from seed values
|> List.rev //folded value is backwards (because of::), so reverse

Why is F#'s Seq.sortBy much slower than LINQ's IEnumerable<T>.OrderBy extension method?

I've recently written a piece of code to read some data from a file, store it in a tuple and sort all the collected data by the first element of the tuple. After some tests I've noticed that using Seq.sortBy (and Array.sortBy) is extremely slower than using IEnumerable.OrderBy.
Below are two snippets of code which should show the behaviour I'm talking about:
(filename
|> File.ReadAllLines
|> Array.Parallel.map(fun ln -> let arr = ln.Split([|' '|], StringSplitOptions.RemoveEmptyEntries)
|> Array.map(double)
|> Array.sort in arr.[0], arr.[1])
).OrderBy(new Func(fun (a,b) -> a))
and
filename
|> File.ReadAllLines
|> Array.Parallel.map(fun ln -> let arr = ln.Split([|' '|], StringSplitOptions.RemoveEmptyEntries) |> Array.map(double) |> Array.sort in arr.[0], arr.[1])
|> Seq.sortBy(fun (a,_) -> a)
On a file containing 100000 lines made of two doubles, on my computer the latter version takes over twice as long as the first one (no improvements are obtained if using Array.sortBy).
Ideas?

the f# implementation uses a structural comparison of the resulting key.
let sortBy keyf seq =
let comparer = ComparisonIdentity.Structural
mkDelayedSeq (fun () ->
(seq
|> to_list
|> List.sortWith (fun x y -> comparer.Compare(keyf x,keyf y))
|> to_array) :> seq<_>)
(also sort)
let sort seq =
mkDelayedSeq (fun () ->
(seq
|> to_list
|> List.sortWith Operators.compare
|> to_array) :> seq<_>)
both Operators.compare and the ComparisonIdentity.Structural.Compare become (eventually)
let inline GenericComparisonFast<'T> (x:'T) (y:'T) : int =
GenericComparisonIntrinsic x y
// lots of other types elided
when 'T : float = if (# "clt" x y : bool #)
then (-1)
else (# "cgt" x y : int #)
but the route to this for the Operator is entirely inline, thus the JIT compiler will end up inserting a direct double comparison instruction with no additional method invocation overhead except for the (required in both cases anyway) delegate invocation.
The sortBy uses a comparer so will go through an additional virtual method call but is basically about the same.
In comparison the OrderBy function also must go through virtual method calls for the equality (Using EqualityComparer<T>.Default) but the significant difference is that it sorts in place and uses the buffer created for this as the result. In comparison if you take a look at the sortBy you will see that it sorts the list (not in place, it uses the StableSortImplementation which appears to be merge sort) and then creates a copy of it as a new array. This additional copy (given the size of your input data) is likely the principle cause of the slow down though the differing sort implementations may also have an effect.
That said this is all guessing. If this area is a concern for you in performance terms then you should simply profile to find out what is taking the time.
If you wish to see what effect the sorting/copying change would have try this alternate:
// these are taken from the f# source so as to be consistent
// beware doing this, the compiler may know about such methods
open System.Collections.Generic
let mkSeq f =
{ new IEnumerable<'b> with
member x.GetEnumerator() = f()
interface System.Collections.IEnumerable with
member x.GetEnumerator() = (f() :> System.Collections.IEnumerator) }
let mkDelayedSeq (f: unit -> IEnumerable<'T>) =
mkSeq (fun () -> f().GetEnumerator())
// the function
let sortByFaster keyf seq =
let comparer = ComparisonIdentity.Structural
mkDelayedSeq (fun () ->
let buffer = Seq.to_array seq
Array.sortInPlaceBy (fun x y -> comparer.Compare(keyf x,keyf y)) buffer
buffer :> seq<_>)
I get some reasonable percentage speedups within the repl with very large (> million) input sequences but nothing like an order of magnitude. Your mileage, as always, may vary.

A difference of x2 is not much when sorts are O(n.log(n)).
Small differences in data structures (e.g. optimising for input being ICollection<T>) could make this scale of difference.
And F# is currently Beta (not so much focus on optimisation vs. getting the language and libraries right), plus the generality of F# functions (supporting partial application etc.) could lead to a slight slow down in calling speed: more than enough to account for the different.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Linq like function GroupMultipleValBy - linq

Related

Elegant Array.multipick(?) implementation

F# List optimisation

How can I select a random value from a list using F#

Merging lists of 'spatial' tuples

Why is F#'s Seq.sortBy much slower than LINQ's IEnumerable<T>.OrderBy extension method?

Categories

Resources