Seq.take won't return elements

When I run the following code:
getTheData() |> Seq.take 3
it does not return the elements, instead it outputs this:
val it : seq<Collections.Generic.KeyValuePair<ID,Data>>
I am using Visual Studio 2017 and F# Interactive
What is wrong, should it not output the first 3 items?
The getTheData function is:
let getTheData() =
    (@"C:\Users\data.xlsx")
    |> (ParseExcel >> datap)
    |> Seq.distinct
    |> Seq.map(fun b -> b.ID, b)
    |> Map.ofSeq

Seq.take is not considered a terminal operation on a sequence in F#. As mentioned in the comments, sequences are lazily evaluated, and only operations that are considered "terminal" will cause a sequence to be iterated. Terminal operations include Seq.iter (if you want to perform an action on each element) and Seq.toList (if you want a materialized list of each element), as well as others like Seq.exactlyOne.
In F# Interactive, you can simply evaluate the sequence to see the first few values. In the following example, which mirrors yours, evaluating the it identifier at the end displays the 3 values taken:
open System
let getTheData() =
    seq {
        for n in {0..1000} -> Guid.NewGuid(), n
    } |> Map.ofSeq
getTheData()
|> Seq.take 3;;
it;;
val it : seq<Collections.Generic.KeyValuePair<Guid,int>> =
seq
[[001830fe-9ce3-4649-8609-571e4aedb4c7, 791]
{Key = 001830fe-9ce3-4649-8609-571e4aedb4c7;
Value = 791;};
[001bf0a9-5981-4bc0-bcaf-046af7f4866a, 383]
{Key = 001bf0a9-5981-4bc0-bcaf-046af7f4866a;
Value = 383;};
[004b44a7-85d2-4ce5-91bf-49bcc44f03ba, 91]
{Key = 004b44a7-85d2-4ce5-91bf-49bcc44f03ba;
Value = 91;}]
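If the goal is to force evaluation explicitly rather than rely on how F# Interactive prints the value, one option is to end the pipeline with a terminal operation such as Seq.toList. A minimal sketch, using a small hypothetical map (sampleData) in place of the result of getTheData():

```fsharp
// sampleData is a hypothetical stand-in for the result of getTheData()
let sampleData = Map.ofList [ (1, "a"); (2, "b"); (3, "c"); (4, "d") ]

// Seq.take only describes the work; Seq.toList forces the enumeration
let firstThree = sampleData |> Seq.take 3 |> Seq.toList

// Each element is a KeyValuePair, so read it through .Key and .Value
firstThree |> List.iter (fun kvp -> printfn "%d -> %s" kvp.Key kvp.Value)
```

Because a Map enumerates its entries in key order, the three entries taken here are the ones with the smallest keys.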

Related

Elegant Array.multipick(?) implementation

I'd like to implement something akin to imaginary Array.multipick:
Array.multipick : choosers:('a -> bool) [] -> array:'a [] -> 'a []
Internally, we test each element of the array against all choosers; the first chooser to return true is removed from the choosers array, and we add that chooser's argument (the element) to the result. After that, we continue iterating while the choosers array has elements left.
The last part is important, because without early exit requirement this could be solved with just Array.fold.
This could be easily implemented with something like:
let rec impl currentIndex currentChoosers results
But it's too procedural for my taste. Maybe there's more elegant solution?
It's quite difficult to write elegant code using arrays of changing size. Here is some code that works on lists instead and does not mutate any values.
let rec pick accum elem tried = function
    | [] -> (accum, List.rev tried)
    | chooser :: rest ->
        if chooser elem then (elem :: accum, List.rev tried @ rest)
        else pick accum elem (chooser :: tried) rest
let rec multipick_l accum choosers list =
    match choosers, list with
    | [], _
    | _, [] -> List.rev accum
    | _, elem :: elems ->
        let (accum', choosers') = pick accum elem [] choosers
        multipick_l accum' choosers' elems
let multipick choosers array =
    Array.ofList
        (multipick_l [] (Array.toList choosers) (Array.toList array))
If you think that Array.fold is usable except for the early exit requirement, you can use an exception to exit early.
A fold with an early exit is a good idea, however a production-worthy one specifically targeting arrays would need to be written in a fairly imperative manner. For simplicity, I'll grab the more general sequence one from this answer.
let multipick (choosers: ('a -> bool) array) (arr: 'a array) : 'a array =
    let indexed =
        choosers
        |> Seq.indexed
        |> Map.ofSeq
    ((indexed, []), arr)
    ||> foldWhile (fun (cs, res) e ->
        if Map.isEmpty cs then
            None
        else
            match cs |> Seq.tryFind (fun kvp -> kvp.Value e) with
            | Some kvp -> Some (Map.remove kvp.Key cs, e :: res)
            | None -> Some (cs, res))
    |> snd
    |> List.rev
    |> Array.ofList
I'm using a Map keyed by array index to keep track of remaining functions - this allows for easy removal of elements, but still retains their order (since map key-value pairs are ordered by keys when iterating).
F# Set wouldn't work with functions due to comparison constraint. System.Collections.Generic.HashSet would work, but it's mutable, and I'm not sure if it would retain ordering.
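The multipick above relies on a foldWhile helper from a linked answer that is not reproduced here. The following is only a sketch of what such a helper might look like, assuming the argument order implied by the call site (folder, then initial state, then source) and assuming that returning None from the folder means "stop and keep the current state". It is not the linked implementation.

```fsharp
// Hypothetical stand-in for the linked foldWhile: folds over the source
// until the folder returns None, then returns the last accumulated state.
let foldWhile (f: 'State -> 'T -> 'State option) (state: 'State) (source: seq<'T>) : 'State =
    use e = source.GetEnumerator()
    let rec loop acc =
        if e.MoveNext() then
            match f acc e.Current with
            | Some next -> loop next   // keep folding with the new state
            | None -> acc              // early exit: stop enumerating
        else
            acc                        // source exhausted
    loop state

// Sums elements until one greater than 3 is reached
let sumUpToThree =
    foldWhile (fun acc x -> if x > 3 then None else Some (acc + x)) 0 [1; 2; 3; 4; 5]
```

Since enumeration stops as soon as the folder returns None, a helper of this shape also terminates on an infinite sequence, provided the folder eventually returns None.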

How can I select a random value from a list using F#

I'm new to F# and I'm trying to figure out how to return a random string value from a list/array of strings.
I have a list like this:
["win8FF40", "win10Chrome45", "win7IE11"]
How can I randomly select and return one item from the list above?
Here is my first try:
let combos = ["win8FF40";"win10Chrome45";"win7IE11"]
let getrandomitem () =
    let rnd = System.Random()
    fun (combos : string[]) -> combos.[rnd.Next(combos.Length)]
Both of the answers given here, by latkin and mydogisbox, are good, but I still want to add a third approach that I sometimes use. This approach isn't faster, but it's more flexible and more composable, and fast enough for small sequences. Depending on your needs, you can use one of the higher-performance options given here, or you can use the following.
Single-argument function using Random
Instead of directly enabling you to select a single element, I often define a shuffleR function like this:
open System
let shuffleR (r : Random) xs = xs |> Seq.sortBy (fun _ -> r.Next())
This function has the type System.Random -> seq<'a> -> seq<'a>, so it works with any sort of sequence: lists, arrays, collections, and lazily evaluated sequences (although not with infinite sequences).
If you want a single random element from a list, you can still do that:
> [1..100] |> shuffleR (Random ()) |> Seq.head;;
val it : int = 85
but you can also take, say, three randomly picked elements:
> [1..100] |> shuffleR (Random ()) |> Seq.take 3;;
val it : seq<int> = seq [95; 92; 12]
No-argument function
Sometimes, I don't care about having to pass in that Random value, so I instead define this alternative version:
let shuffleG xs = xs |> Seq.sortBy (fun _ -> Guid.NewGuid())
It works in the same way:
> [1..100] |> shuffleG |> Seq.head;;
val it : int = 11
> [1..100] |> shuffleG |> Seq.take 3;;
val it : seq<int> = seq [69; 61; 42]
Although the purpose of Guid.NewGuid() isn't to provide random numbers, it's often random enough for my purposes - random, in the sense of being unpredictable.
Generalised function
Neither shuffleR nor shuffleG are truly random. Due to the ways Random and Guid.NewGuid() work, both functions may result in slightly skewed distributions. If this is a concern, you can define an even more general-purpose shuffle function:
let shuffle next xs = xs |> Seq.sortBy (fun _ -> next())
This function has the type (unit -> 'a) -> seq<'b> -> seq<'b> when 'a : comparison. It can still be used with Random:
> let r = Random();;
val r : Random
> [1..100] |> shuffle (fun _ -> r.Next()) |> Seq.take 3;;
val it : seq<int> = seq [68; 99; 54]
> [1..100] |> shuffle (fun _ -> r.Next()) |> Seq.take 3;;
val it : seq<int> = seq [99; 63; 11]
but you can also use it with some of the cryptographically secure random number generators provided by the Base Class Library:
open System.Security.Cryptography
open System.Collections.Generic
let rng = new RNGCryptoServiceProvider ()
let bytes = Array.zeroCreate<byte> 100
rng.GetBytes bytes
let q = bytes |> Queue
FSI:
> [1..100] |> shuffle (fun _ -> q.Dequeue()) |> Seq.take 3;;
val it : seq<int> = seq [74; 82; 61]
Unfortunately, as you can see from this code, it's quite cumbersome and brittle. You have to know the length of the sequence up front; RNGCryptoServiceProvider implements IDisposable, so you should make sure to dispose of rng after use; and items will be removed from q after use, which means it's not reusable.
Cryptographically random sort or selection
Instead, if you really need a cryptographically correct sort or selection, it'd be easier to do it like this:
let shuffleCrypto xs =
    let a = xs |> Seq.toArray
    use rng = new RNGCryptoServiceProvider ()
    let bytes = Array.zeroCreate a.Length
    rng.GetBytes bytes
    Array.zip bytes a |> Array.sortBy fst |> Array.map snd
Usage:
> [1..100] |> shuffleCrypto |> Array.head;;
val it : int = 37
> [1..100] |> shuffleCrypto |> Array.take 3;;
val it : int [] = [|35; 67; 36|]
This isn't something I've ever had to do, though, but I thought I'd include it here for the sake of completeness. While I haven't measured it, it's most likely not the fastest implementation, but it should be cryptographically random.
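If the skew of the sort-based shuffles is the main concern rather than cryptographic strength, another option is a Fisher-Yates shuffle. This is a swapped-in technique, not one used in the answer above; shuffleFY is a hypothetical name, and the sketch assumes it is acceptable to copy the input into an array first.

```fsharp
open System

// Fisher-Yates shuffle on a copy of the input (hypothetical helper name).
// Each position i is swapped with a uniformly chosen position j <= i.
let shuffleFY (r: Random) (xs: seq<'a>) : 'a[] =
    let a = Seq.toArray xs
    for i in a.Length - 1 .. -1 .. 1 do
        let j = r.Next(i + 1)   // 0 <= j <= i
        let tmp = a.[i]
        a.[i] <- a.[j]
        a.[j] <- tmp
    a

// Usage: three randomly picked elements
let threeRandom = [1 .. 100] |> shuffleFY (Random()) |> Array.take 3
```

The quality of the result still depends on the Random instance passed in, so this addresses only the bias introduced by sorting on random keys.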
Your problem is that you are mixing arrays and F# lists (type[] is the type notation for an array). You could modify it like this to use lists:
let getrandomitem () =
    let rnd = System.Random()
    fun (combos : string list) -> List.nth combos (rnd.Next(combos.Length))
That being said, indexing into a list is usually a bad idea: an F# list is basically a linked list, so indexing has O(n) performance. You would be better off making combos an array if possible, like this:
let combos = [|"win8FF40";"win10Chrome45";"win7IE11"|]
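With combos as an array, picking a random item becomes a constant-time index lookup. A minimal sketch, where pickRandom is a hypothetical helper name and a single shared Random instance is assumed to be acceptable:

```fsharp
open System

// Create the generator once and reuse it across calls
let rnd = Random()

// pickRandom is a hypothetical helper: O(1) indexing into an array
let pickRandom (items: 'a[]) = items.[rnd.Next(items.Length)]

let combos = [| "win8FF40"; "win10Chrome45"; "win7IE11" |]
let chosen = pickRandom combos
printfn "%s" chosen
```

Note that rnd.Next(items.Length) returns a value from 0 up to, but not including, items.Length, so the index is always in range for a non-empty array.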
I wrote a blog post on exactly this topic a while ago: http://latkin.org/blog/2013/11/16/selecting-a-random-element-from-a-linked-list-3-approaches-in-f/
3 approaches are given there, with discussion of performance and tradeoffs of each.
To summarize:
// pro: simple, fast in practice
// con: 2-pass (once to get length, once to select nth element)
let method1 lst (rng : Random) =
    List.nth lst (rng.Next(List.length lst))
// pro: ~1 pass, list length is not bound by int32
// con: more complex, slower in practice
let method2 lst (rng : Random) =
    let rec step remaining picks top =
        match (remaining, picks) with
        | ([], []) -> failwith "Don't pass empty list"
        // if only 1 element is picked, this is the result
        | ([], [p]) -> p
        // if multiple elements are picked, select randomly from them
        | ([], ps) -> step ps [] -1
        | (h :: t, ps) ->
            match rng.Next() with
            // if RNG makes new top number, picks list is reset
            | n when n > top -> step t [h] n
            // if RNG ties top number, add current element to picks list
            | n when n = top -> step t (h::ps) top
            // otherwise ignore and move to next element
            | _ -> step t ps top
    step lst [] -1
// pro: exactly 1 pass
// con: more complex, slowest in practice due to tuple allocations
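let method3 lst (rng : Random) =
    snd <| List.fold (fun (i, pick) elem ->
            // i is the 1-based position: element i replaces the pick with probability 1/i
            // (the counter must start at 1; starting at 0 makes rng.Next(1) always replace the first element)
            if rng.Next(i) = 0 then (i + 1, elem)
            else (i + 1, pick)
        ) (1, List.head lst) lst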
Edit: I should clarify that the above shows a few ways to get a random element from a list, assuming you must use a list. If it fits with the rest of your program's design, it is definitely more efficient to take a random element from an array.

Sort list High-to-Low in F#

List.sort sorts a list from low to high. How does one sort from high to low? Is there some kind of library function for this?
For a list of numbers:
list
|> List.sortBy (fun x -> -x)
The function (fun x -> -x) negates the number, therefore reversing the order.
For comparables in general, use List.sortWith with compare. Observe the ordering of a b in compare:
> List.sortWith (fun a b -> compare a b) ["a";"s";"d";"f"];;
val it : string list = ["a"; "d"; "f"; "s"]
> List.sortWith (fun a b -> compare b a) ["a";"s";"d";"f"];;
val it : string list = ["s"; "f"; "d"; "a"]
As the linked thread "F# Seq.sortBy in descending order" points out, there is a chance of overflow when you use List.sortBy (fun x -> -x): negating the smallest integer wraps around to the same value. To be correct, it should be:
List.sortBy (fun x -> -x-1)
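The difference between the two keys only shows up at the edge of the integer range. The following sketch illustrates it, assuming the default unchecked integer arithmetic (no open Checked); the names xs, naive and safe are just for illustration.

```fsharp
open System

let xs = [ Int32.MinValue; 0; Int32.MaxValue ]

// -Int32.MinValue wraps around to Int32.MinValue, so it keeps the smallest key
let naive = xs |> List.sortBy (fun x -> -x)
// naive = [Int32.MinValue; Int32.MaxValue; 0], which is not descending

// -x - 1 is the bitwise complement of x, which reverses the order for every int
let safe = xs |> List.sortBy (fun x -> -x - 1)
// safe = [Int32.MaxValue; 0; Int32.MinValue]
```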
In F# 4.0 (that comes with Visual Studio 2015 Preview), there are sortDescending/sortByDescending functions for this exact purpose.
You can use
list
|> List.sortDescending
or
list
|> List.sortByDescending id
See the comprehensive list of new core library functions at https://github.com/fsharp/FSharpLangDesign/blob/master/FSharp-4.0/ListSeqArrayAdditions.md.
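List.sortByDescending is most useful when the sort key is a projection rather than the element itself. A small sketch with hypothetical sample data, assuming F# 4.0 or later:

```fsharp
// Hypothetical sample data: (name, score) pairs
let scores = [ ("a", 1); ("b", 3); ("c", 2) ]

// Sort from highest to lowest score by projecting the second component
let ranked = scores |> List.sortByDescending snd
// ranked = [("b", 3); ("c", 2); ("a", 1)]
```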
You can use List.sortBy to sort by a custom function, and pass the unary minus operator (~-) as that function for a compact notation:
let list = [1..10]
list |> List.sortBy (~-)

Find duplicates in an unsorted sequence efficiently

I need a very efficient way to find duplicates in an unsorted sequence. This is what I came up with, but it has a few shortcomings, namely it
unnecessarily counts occurrences beyond 2
consumes the entire sequence before yielding duplicates
creates several intermediate sequences
module Seq =
    let duplicates items =
        items
        |> Seq.countBy id
        |> Seq.filter (snd >> ((<) 1))
        |> Seq.map fst
Regardless of the shortcomings, I don't see a reason to replace this with twice the code. Is it possible to improve this with comparably concise code?
A more elegant functional solution:
let duplicates xs =
    Seq.scan (fun xs x -> Set.add x xs) Set.empty xs
    |> Seq.zip xs
    |> Seq.choose (fun (x, xs) -> if Set.contains x xs then Some x else None)
Uses scan to accumulate sets of all elements seen so far. Then uses zip to combine each element with the set of elements before it. Finally, uses choose to filter out the elements that are in the set of previously-seen elements, i.e. the duplicates.
EDIT
Actually my original answer was completely wrong. Firstly, you don't want duplicates in your outputs. Secondly, you want performance.
Here is a purely functional solution that implements the algorithm you're after:
let duplicates xs =
    (Map.empty, xs)
    ||> Seq.scan (fun xs x ->
        match Map.tryFind x xs with
        | None -> Map.add x false xs
        | Some false -> Map.add x true xs
        | Some true -> xs)
    |> Seq.zip xs
    |> Seq.choose (fun (x, xs) ->
        match Map.tryFind x xs with
        | Some false -> Some x
        | None | Some true -> None)
This uses a map to track whether each element has been seen before once or many times and then emits the element if it is seen having only been seen once before, i.e. the first time it is duplicated.
Here is a faster imperative version:
let duplicates (xs: _ seq) =
    seq { let d = System.Collections.Generic.Dictionary(HashIdentity.Structural)
          let e = xs.GetEnumerator()
          while e.MoveNext() do
              let x = e.Current
              let mutable seen = false
              if d.TryGetValue(x, &seen) then
                  if not seen then
                      d.[x] <- true
                      yield x
              else
                  d.[x] <- false }
This is around 2× faster than any of your other answers (at the time of writing).
Using a for x in xs do loop to enumerate the elements in a sequence is substantially slower than using GetEnumerator directly but generating your own Enumerator is not significantly faster than using a computation expression with yield.
Note that the TryGetValue member of Dictionary allows me to avoid allocation in the inner loop by mutating a stack allocated value whereas the TryGetValue extension member offered by F# (and used by kvb in his/her answer) allocates its return tuple.
Here's an imperative solution (which is admittedly slightly longer):
let duplicates items =
    seq {
        let d = System.Collections.Generic.Dictionary()
        for i in items do
            match d.TryGetValue(i) with
            | false,_ -> d.[i] <- false // first observance
            | true,false -> d.[i] <- true; yield i // second observance
            | true,true -> () // already seen at least twice
    }
This is the best "functional" solution I could come up with that doesn't consume the entire sequence up front.
let duplicates =
    Seq.scan (fun (out, yielded:Set<_>, seen:Set<_>) item ->
        if yielded.Contains item then
            (None, yielded, seen)
        else
            if seen.Contains item then
                (Some(item), yielded.Add item, seen.Remove item)
            else
                (None, yielded, seen.Add item)
        ) (None, Set.empty, Set.empty)
    >> Seq.choose (fun (x,_,_) -> x)
Assuming your sequence is finite, this solution requires one run on the sequence:
open System.Collections.Generic
let duplicates items =
    let dict = Dictionary()
    items |> Seq.fold (fun acc item ->
        match dict.TryGetValue item with
        | true, 2 -> acc
        | true, 1 -> dict.[item] <- 2; item::acc
        | _ -> dict.[item] <- 1; acc) []
    |> List.rev
You can provide the length of the sequence as the capacity of the Dictionary, but that requires enumerating the whole sequence once more.
EDIT:
To resolve 2nd problem, one could generate duplicates on demand:
open System.Collections.Generic
let duplicates items =
    seq {
        let dict = Dictionary()
        for item in items do
            match dict.TryGetValue item with
            | true, 2 -> ()
            | true, 1 -> dict.[item] <- 2; yield item
            | _ -> dict.[item] <- 1
    }
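A variation on the on-demand approach, sketched here with two HashSet instances instead of a Dictionary. This is not taken from any of the answers; duplicatesViaHashSet is a hypothetical name, and the sketch relies on HashSet.Add returning false when the item is already present.

```fsharp
open System.Collections.Generic

// Hypothetical variation: yields each duplicated item once, on its second occurrence
let duplicatesViaHashSet (items: seq<_>) =
    seq {
        let seen = HashSet<_>(HashIdentity.Structural)
        let reported = HashSet<_>(HashIdentity.Structural)
        for item in items do
            // seen.Add is false for a repeat; reported.Add is true only the first time it is reported
            if not (seen.Add item) && reported.Add item then
                yield item
    }

// Usage: [1; 2] (each duplicate reported once, in order of second occurrence)
let dups = duplicatesViaHashSet [1; 2; 1; 3; 2; 1] |> Seq.toList
```

Because the result is a sequence expression, nothing is enumerated until the result is consumed, so it can be combined with Seq.take on a long or infinite input.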
Functional solution:
let duplicates items =
    let test (unique, result) v =
        if not(unique |> Set.contains v) then (unique |> Set.add v ,result)
        elif not(result |> Set.contains v) then (unique,result |> Set.add v)
        else (unique, result)
    items |> Seq.fold test (Set.empty, Set.empty) |> snd |> Set.toSeq

Merging lists of 'spatial' tuples

I have three lists of tuples, each tuple contains (startpos, endpos, value).
What I want to do is merge these into one list of tuples (startpos, endpos, values[]), but following a rule which I find easier to draw than to write:
//third              [---------]                    [------------]
//second       [-------------]              [---------------------------]
//first  [-----------------------------]    [--------------]
//(pos)  0123456789|123456789|123456789|123456789|123456789|123456789|123456789
//result [--1-][--2-][---3---][---1----]    [---2--][---3--]
(The numbers in result represent the expected length of the values[] list for each resulting element)
Basically, I only keep a 'higher' element where it overlaps a 'lower' element, and I split up into 'homogeneous' elements.
The positions can be considered as being of type int. As you can see from the result, the 'split' segments do not start and end at the same position, but at pos-1 or pos+1. The order of the values is not important, as long as it is defined.
Sample data (based on example above):
let third = [(12,22,3.1);(43,56,3.2)]
let second = [(6,20,2.1);(35,63,2.2)]
let first = [(0,30,1.1);(35,50,1.2)]
let after = [
(0,5,[1.1]);
(6,11,[1.1;2.1]);
(12,20,[1.1;2.1;3.1]);
(21,30,[1.1]);
(35,42,[1.2;2.2]);
(43,50,[1.2;2.2;3.2])
]
Right now I'm finding it difficult to think about this in a functional way, anything that comes to mind is imperative. Maybe that's inevitable in this case, but if anyone has any ideas...
UPDATE Actually, if we generalised the input case to already be of type (int*int*List<float>), we could just treat the case of two input lists, then fold that.
PS: This is not homework, or code golf, I've just sterilised the data somewhat.
Your after data is wrong; at least my program thinks it is, and I believe it. :)
let third = [(12,22,3.1);(43,56,3.2)]
let second = [(6,20,2.1);(35,63,2.2)]
let first = [(0,30,1.1);(35,50,1.2)]
let all = List.concat [first; second; third]
let min = all |> Seq.map (fun (x,y,z)->x) |> Seq.min
let max = all |> Seq.map (fun (x,y,z)->y) |> Seq.max
let setsEachValueIsIn =
    [min..max]
    |> List.map (fun i ->
        i, all
           |> List.filter (fun (x,y,z) -> x<=i && i<=y)
           |> List.map (fun (x,y,z) -> z))
printfn "%A" setsEachValueIsIn
let x1,l1 = Seq.nth 0 setsEachValueIsIn
let result =
    setsEachValueIsIn
    |> List.fold (fun (((l,h,s)::t) as prev) (nx,ns) ->
        if s=ns then (l,nx,s)::t else (nx,nx,ns)::prev
        ) [x1,x1,l1]
    |> List.rev
let after = [
(0,5,[1.1]);
(6,11,[1.1;2.1]);
(12,20,[1.1;2.1;3.1]);
(21,30,[1.1]);
(35,42,[1.2;2.2]);
(43,50,[1.2;2.2;3.2])
]
printfn ""
printfn "%A" result
printfn ""
printfn "%A" after
assert(result = after)
Strategy: first I map every number in the whole range to the 'sets it is in'. Then I fold, seeding with the first result as (min,min,setsMinIsIn) and every step of the way, if the set does not change, I just widen the range, else if the set does change, I make a new element.
Key for var names in the fold: low, high, set, nx-next x, ns-next set
Complete rewrite (see edits), shorter, more elegant, maybe less readable. Still pinching Brian's logic.
UPDATE: now works, at least for the test above
let third = [(12,22,3.1);(43,56,3.2)]
let second = [(6,20,2.1);(35,63,2.2)]
let first = [(0,30,1.1);(35,50,1.2)]
//===helper functions===
// foldable combined min and max finder
let minmax (mn,mx) (x,y,_) = (min mn x, max mx y)
// test if x - y range overlaps position i
let overlaps i (x,y,_) = x<=i && i<=y
// get third element from triple
let getz (_,_,z) = z
//specialise function, given two tuples, will combine lists (L & U)
// but only if both lists have contents AND their indexes (il & iu)
// are not more than 1 apart, i is included simply so that we can pass
// merge directly to the List.map2 below
let merge (i,il,L) (_,iu,U) =
    if L = [] || U = [] || iu - il > 1 then
        (i, il, L)
    else
        (i, iu, L @ U)
let input = [first;second;third] // input data - 'bottom' first
//find max and min positions
let (x0,yn) = input |> Seq.concat |> Seq.fold minmax (0,0)
//transform each data list to a list of (i,[z])
let valsByPos = input |> List.map (fun level -> //for each data 'level'
    [x0..yn] |> List.map (fun i -> //for each position in range
        //collect values of all elements in level that
        // overlap this position
        (i, level |> List.filter (overlaps i) |> List.map getz)))
// 'merge up' each level, keeping only upper values if lower values exist
// after we will have just one list of (i, [z])
let mergedValsByPos = valsByPos //offside here for SO formatting
    //add an index into each tuple
    |> List.mapi (fun i l -> l |> List.map (fun (j,z) -> (j,i,z)))
    //use index to determine if we should 'merge up' for each subsublst
    |> List.reduce (List.map2 merge)
    //rip the index back out
    |> List.map (fun (i,_,z) -> (i,z))
//get first value as seed for fold
let x1,l1 = Seq.nth 0 mergedValsByPos
//transform list (i,[z]) into list of (x,y,[z])
//key: (l)ow, (h)igh, (s)et, (nx)-next x, (ns)-next set
let result =
    mergedValsByPos
    //first remove any positions where there are no values
    |> List.filter (fun el -> snd(el) <> [])
    //double capture on state so we can take all or part of it
    |> List.fold (fun (((l,h,s)::t) as prev) (nx,ns) ->
        //if [z] value hasn't changed, we just enlarge range
        // of current state (from (l,h) to (l,nx))
        // otherwise we add a new element (nx,nx,ns) to state
        if s=ns then (l,nx,s)::t else (nx,nx,ns)::prev
        ) [x1,x1,l1] //initial state from seed values
    |> List.rev //folded value is backwards (because of ::), so reverse
