F# using sequence cache correctly - caching

I'm trying to use Seq.cache with a function that I made that returns a sequence of primes up to a number N excluding the number 1. I'm having trouble figuring out how to keep the cached sequence in scope but still use it in my definition.
let rec primesNot1 n =
    {2 .. n}
    |> Seq.filter (fun i ->
        (primesNot1 (i / 2) |> Seq.forall (fun o -> i % o <> 0)))
    |> Seq.append {2 .. 2}
    |> Seq.cache
Any ideas on how I could use Seq.cache to make this faster? Currently the cached sequence keeps dropping out of scope, and caching is only slowing things down.

Seq.cache caches an IEnumerable<T> instance so that each item in the sequence is only calculated once. In your case, though, you're caching the sequence returned by a function, and each time you call the function you get a new cached sequence, which doesn't do you any good. I don't think caching is really the right approach to your problem as you've outlined it; instead you should probably look into memoization.
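As a minimal sketch of the memoization suggested above, assuming a module-level Dictionary as the memo table (primeCache and primesUpTo are hypothetical names), each n is computed once and later calls reuse the stored list. It keeps the question's idea of testing i only against the primes up to i / 2.

```fsharp
open System.Collections.Generic

// Hypothetical memo table shared by all calls: n -> primes in 2 .. n.
let primeCache = Dictionary<int, int list>()

// Memoized version of the question's recursive definition: i is kept when no
// prime up to i / 2 divides it. Results for every n are stored and reused.
let rec primesUpTo (n: int) : int list =
    match primeCache.TryGetValue n with
    | true, cached -> cached
    | _ ->
        let result =
            [ for i in 2 .. n do
                if primesUpTo (i / 2) |> List.forall (fun p -> i % p <> 0) then
                    yield i ]
        primeCache.[n] <- result
        result
```

Because the recursive calls only ever ask for i / 2, the recursion depth grows roughly with log n, and every intermediate result lands in the table for later calls.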
If, instead of defining a function that gives the primes up to n, you define an infinite enumerable sequence of primes, then caching makes more sense. That would look more like this:
let rec upFrom i =
    seq {
        yield i
        yield! upFrom (i+1)
    }
let rec primes =
    seq {
        yield 2
        yield!
            upFrom 3
            |> Seq.filter (fun p -> primes |> Seq.takeWhile (fun j -> j*j <= p) |> Seq.forall (fun j -> p % j <> 0))
    }
    |> Seq.cache
I haven't compared the performance of this method with yours.
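For completeness, here is a self-contained version of the cached prime sequence above, plus a hypothetical primesUpTo helper that recovers the question's "primes up to n" from it. Because primes is a single top-level value, its cache is shared by every call, which is what makes Seq.cache pay off here. The #nowarn "40" line only suppresses the compiler's warning about the recursive value reference.

```fsharp
#nowarn "40" // silence the warning about the recursive reference to `primes`

// Infinite sequence i, i + 1, i + 2, ... (same upFrom as above).
let rec upFrom i =
    seq {
        yield i
        yield! upFrom (i + 1)
    }

// Cached infinite prime sequence: a candidate p is tested only against the
// already-found primes j with j * j <= p, which Seq.cache has stored.
let rec primes : seq<int> =
    seq {
        yield 2
        yield!
            upFrom 3
            |> Seq.filter (fun p ->
                primes
                |> Seq.takeWhile (fun j -> j * j <= p)
                |> Seq.forall (fun j -> p % j <> 0))
    }
    |> Seq.cache

// Hypothetical helper answering the original question: all primes <= n.
let primesUpTo n = primes |> Seq.takeWhile (fun p -> p <= n) |> Seq.toList
```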

I figured out how to solve my problem with a fold, but not with my idea of using Seq.cache.
let primesNot1 n =
    {2 .. n}
    |> Seq.fold (fun primes i ->
        if primes |> Seq.forall (fun o -> i % o <> 0) then
            List.append primes [i]
        else
            primes) [2]

Have you taken a look at LazyList? It seems to be designed to solve the same problem. It's in the F# PowerPack.

Related

F# performance difference between tail recursion and Seq library

I have this code in F# which finds the smallest positive number that is evenly divisible by all of the numbers from 1 to 20. It takes 10 seconds to complete.
let isDivisableByAll num (divisors: int[]) = Array.forall (fun div -> num % div = 0) divisors
let minNumDividedBy (divisors: int[]) =
    let rec minNumDividedByAll stopAt acc =
        if acc >= stopAt then 0
        else if isDivisableByAll acc divisors then acc
        else minNumDividedByAll stopAt (acc + 1)
    minNumDividedByAll 400000000 1
minNumDividedBy [|1..20|]
So I thought I could make it more elegant, since I prefer less code, and wrote the following:
let answer =
    { 1..400000000 }
    |> Seq.tryFind (fun el -> isDivisableByAll el [|1..20|])
It took 10 minutes! I couldn't explain the huge difference, since sequences are lazy. In an effort to investigate, I wrote an imperative loop.
let mutable i = 1
while i < 232792561 do
    if isDivisableByAll i [|1..20|] then
        printfn "%d" i
    i <- i + 1
It took 8 minutes. So it's not the sequence's fault either, right? Then why is the initial function so fast? It can't be because tail recursion avoids building up the stack, can it? I wouldn't expect a considerable stack, if any, to be built in the slow examples either.
This doesn't make much sense to me; can someone explain?
Thank you.
If I understand correctly, you are trying to find how many numbers between 1 and 400000000 (inclusive) are divisible by all the numbers from 1 to 20. I made my own crude version of it:
let factors = Array.rev [| 2 .. 20 |]
let divisible f n =
    Array.forall (fun x -> n % x = 0) f
let solution () =
    {1 .. 400000000}
    |> Seq.filter (divisible factors)
    |> Seq.length
This solution takes over 90 seconds to run on the machine where I tested it. But I came to realize that it is a variation of Project Euler problem 5, where we learn that 2520 is the first number divisible by all the numbers from 1 to 10. Using this fact, we can create a sequence of multiples of 2520 and test only the numbers from 11 to 19, since the multiples are guaranteed to be divisible by all the numbers from 1 to 10, and by 20 as well:
let factors = Array.rev [| 11 .. 19 |]
let divisible f n =
    Array.forall (fun x -> n % x = 0) f
let solution () =
    Seq.initInfinite (fun i -> (i + 1) * 2520)
    |> Seq.takeWhile (fun i -> i <= 400000000)
    |> Seq.filter (divisible factors)
    |> Seq.length
This solution takes 0.191 seconds.
Even if you don't know about Project Euler problem 5, you can compute sequences of multiples of a given starting value algorithmically. We feed the algorithm a sequence of numbers divisible by all numbers from 2 to n - 1, and it computes the first number divisible by all numbers from 2 to n. This is iterated until we have a sequence of multiples of the first number divisible by all the factors we want:
let narrowDown m n s =
    (s, {m .. n})
    ||> Seq.fold (fun a i ->
        // first element of the current sequence that is also divisible by i
        let j = Seq.find (fun x -> x % i = 0) a
        // the next sequence is the multiples of j: j, 2j, 3j, ...
        Seq.initInfinite (fun k -> (k + 1) * j))
let solution () =
    Seq.initInfinite (fun i -> i + 1)
    |> narrowDown 2 20
    |> Seq.takeWhile (fun i -> i <= 400000000)
    |> Seq.length
This solution runs in 0.018 seconds.
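The narrowDown fold effectively builds the least common multiple one factor at a time. If only the smallest such number is needed, a sketch that swaps in the standard gcd/lcm identity (a different technique from the sequence-based solutions above; gcd, lcm and smallestDivisibleByAll are hypothetical names) computes it directly. It uses int64 for headroom, although the result for 1 to 20 still fits in an int.

```fsharp
// Euclid's algorithm for the greatest common divisor.
let rec gcd (a: int64) (b: int64) = if b = 0L then a else gcd b (a % b)

// lcm(a, b) = a / gcd(a, b) * b; dividing first keeps intermediates small.
let lcm a b = a / gcd a b * b

// Smallest positive number evenly divisible by every number in 1 .. n.
let smallestDivisibleByAll n = [ 1L .. n ] |> List.fold lcm 1L
```

For n = 10 this gives 2520, the value quoted from the Euler problem above.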
As Fyodor Soikin commented, creating a new array [|1..20|] on every iteration of the seq solution is the main culprit. If I define the array once and pass it in, I can run it in 10 seconds, compared to 27 seconds for the recursive solution. The remaining disparity must be due to the extra machinery a lazy sequence needs, compared to recursion that is tail-call optimised into a loop.
Making isDivisableByAll an inline function makes a significant difference for the recursive solution (down to 6 seconds). It doesn't seem to affect the seq solution.
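A sketch combining both observations: the divisor array is built once and passed in, and the predicate is marked inline. firstDivisibleByAll is a hypothetical wrapper, parameterised so it can be tried on a smaller range; the timings quoted above are the answerer's and will vary by machine.

```fsharp
// Same predicate as in the question, marked inline.
let inline isDivisableByAll num (divisors: int[]) =
    Array.forall (fun div -> num % div = 0) divisors

// Hypothetical wrapper: the divisor array is allocated once, outside the
// lambda, instead of once per candidate as in the slow seq version.
let firstDivisibleByAll (upTo: int) (limit: int) =
    let divisors = [| 1 .. upTo |]
    { 1 .. limit } |> Seq.tryFind (fun el -> isDivisableByAll el divisors)
```

Calling firstDivisibleByAll 20 400000000 reproduces the original search.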

How can I select a random value from a list using F#

I'm new to F# and I'm trying to figure out how to return a random string value from a list/array of strings.
I have a list like this:
["win8FF40"; "win10Chrome45"; "win7IE11"]
How can I randomly select and return one item from the list above?
Here is my first try:
let combos = ["win8FF40";"win10Chrome45";"win7IE11"]
let getrandomitem () =
    let rnd = System.Random()
    fun (combos : string[]) -> combos.[rnd.Next(combos.Length)]
Both answers given here, by latkin and mydogisbox, are good, but I still want to add a third approach that I sometimes use. This approach isn't faster, but it's more flexible and more composable, and fast enough for small sequences. Depending on your needs, you can use one of the higher-performance options given here, or you can use the following.
Single-argument function using Random
Instead of directly enabling you to select a single element, I often define a shuffleR function like this:
open System
let shuffleR (r : Random) xs = xs |> Seq.sortBy (fun _ -> r.Next())
This function has the type System.Random -> seq<'a> -> seq<'a>, so it works with any sort of sequence: lists, arrays, collections, and lazily evaluated sequences (although not with infinite sequences).
If you want a single random element from a list, you can still do that:
> [1..100] |> shuffleR (Random ()) |> Seq.head;;
val it : int = 85
but you can also take, say, three randomly picked elements:
> [1..100] |> shuffleR (Random ()) |> Seq.take 3;;
val it : seq<int> = seq [95; 92; 12]
No-argument function
Sometimes, I don't care about having to pass in that Random value, so I instead define this alternative version:
let shuffleG xs = xs |> Seq.sortBy (fun _ -> Guid.NewGuid())
It works in the same way:
> [1..100] |> shuffleG |> Seq.head;;
val it : int = 11
> [1..100] |> shuffleG |> Seq.take 3;;
val it : seq<int> = seq [69; 61; 42]
Although the purpose of Guid.NewGuid() isn't to provide random numbers, it's often random enough for my purposes - random, in the sense of being unpredictable.
Generalised function
Neither shuffleR nor shuffleG is truly random. Due to the way Random and Guid.NewGuid() work, both functions may produce slightly skewed distributions. If this is a concern, you can define an even more general-purpose shuffle function:
let shuffle next xs = xs |> Seq.sortBy (fun _ -> next())
This function has the type (unit -> 'a) -> seq<'b> -> seq<'b> when 'a : comparison. It can still be used with Random:
> let r = Random();;
val r : Random
> [1..100] |> shuffle (fun _ -> r.Next()) |> Seq.take 3;;
val it : seq<int> = seq [68; 99; 54]
> [1..100] |> shuffle (fun _ -> r.Next()) |> Seq.take 3;;
val it : seq<int> = seq [99; 63; 11]
but you can also use it with some of the cryptographically secure random number generators provided by the Base Class Library:
open System.Security.Cryptography
open System.Collections.Generic
let rng = new RNGCryptoServiceProvider ()
let bytes = Array.zeroCreate<byte> 100
rng.GetBytes bytes
let q = bytes |> Queue
FSI:
> [1..100] |> shuffle (fun _ -> q.Dequeue()) |> Seq.take 3;;
val it : seq<int> = seq [74; 82; 61]
Unfortunately, as you can see from this code, it's quite cumbersome and brittle. You have to know the length of the sequence up front; RNGCryptoServiceProvider implements IDisposable, so you should make sure to dispose of rng after use; and items will be removed from q after use, which means it's not reusable.
Cryptographically random sort or selection
Instead, if you really need a cryptographically correct sort or selection, it'd be easier to do it like this:
let shuffleCrypto xs =
    let a = xs |> Seq.toArray
    use rng = new RNGCryptoServiceProvider ()
    let bytes = Array.zeroCreate a.Length
    rng.GetBytes bytes
    Array.zip bytes a |> Array.sortBy fst |> Array.map snd
Usage:
> [1..100] |> shuffleCrypto |> Array.head;;
val it : int = 37
> [1..100] |> shuffleCrypto |> Array.take 3;;
val it : int [] = [|35; 67; 36|]
I've never had to do this myself, but I thought I'd include it here for the sake of completeness. While I haven't measured it, it's most likely not the fastest implementation, but it should be cryptographically random.
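On .NET Core 3.0 or later, an alternative sketch uses a Fisher-Yates shuffle, a different technique from the sort-by-random-key approach above, and draws each swap index from the static RandomNumberGenerator.GetInt32 method. This avoids managing an RNGCryptoServiceProvider instance, which is marked obsolete in recent .NET versions. One possible motivation: shuffleCrypto sorts on single bytes, so with only 256 possible keys, ties are likely once the input has more than a couple of dozen elements. shuffleFisherYates is a hypothetical name.

```fsharp
open System.Security.Cryptography

// Sketch: Fisher-Yates shuffle driven by a cryptographic RNG
// (RandomNumberGenerator.GetInt32 requires .NET Core 3.0 or later).
let shuffleFisherYates (xs: seq<'a>) : 'a[] =
    let a = Seq.toArray xs
    // Walk from the last index down to 1, swapping each slot with a
    // uniformly chosen slot at or before it.
    for i in a.Length - 1 .. -1 .. 1 do
        let j = RandomNumberGenerator.GetInt32(i + 1) // 0 <= j <= i
        let tmp = a.[i]
        a.[i] <- a.[j]
        a.[j] <- tmp
    a
```

Usage mirrors shuffleCrypto: [1..100] |> shuffleFisherYates |> Array.take 3.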
Your problem is that you are mixing arrays and F# lists (T[] is the type notation for an array). You could modify it like this to use lists:
let getrandomitem () =
    let rnd = System.Random()
    fun (combos : string list) -> List.item (rnd.Next(combos.Length)) combos
That being said, indexing into a list is usually a bad idea, since it is O(n): an F# list is essentially a linked list. You would be better off making combos an array if possible, like this:
let combos = [|"win8FF40";"win10Chrome45";"win7IE11"|]
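With combos as an array, a minimal sketch of the whole selection is a single O(1) index lookup. It assumes one shared Random instance; getRandomItem is a hypothetical name.

```fsharp
// One shared Random instance: on .NET Framework, Random instances created
// in quick succession can share a time-based seed and repeat values.
let rnd = System.Random()

// Hypothetical helper: pick one element uniformly at random from an array.
// An empty array throws IndexOutOfRangeException, since rnd.Next(0) returns 0.
let getRandomItem (items: 'a[]) : 'a =
    items.[rnd.Next(items.Length)]

let combos = [| "win8FF40"; "win10Chrome45"; "win7IE11" |]
let picked = getRandomItem combos
```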
I wrote a blog post on exactly this topic a while ago: http://latkin.org/blog/2013/11/16/selecting-a-random-element-from-a-linked-list-3-approaches-in-f/
Three approaches are given there, with a discussion of the performance and trade-offs of each.
To summarize:
open System
// pro: simple, fast in practice
// con: 2-pass (once to get length, once to select nth element)
let method1 lst (rng : Random) =
    List.item (rng.Next(List.length lst)) lst
// pro: ~1 pass, list length is not bound by int32
// con: more complex, slower in practice
let method2 lst (rng : Random) =
    let rec step remaining picks top =
        match (remaining, picks) with
        | ([], []) -> failwith "Don't pass empty list"
        // if only 1 element is picked, this is the result
        | ([], [p]) -> p
        // if multiple elements are picked, select randomly from them
        | ([], ps) -> step ps [] (-1)
        | (h :: t, ps) ->
            match rng.Next() with
            // if RNG makes new top number, picks list is reset
            | n when n > top -> step t [h] n
            // if RNG ties top number, add current element to picks list
            | n when n = top -> step t (h::ps) top
            // otherwise ignore and move to next element
            | _ -> step t ps top
    step lst [] (-1)
// pro: exactly 1 pass
// con: more complex, slowest in practice due to tuple allocations
let method3 lst (rng : Random) =
    // reservoir sampling: the element at 0-based index i replaces the current
    // pick with probability 1/(i + 1), so every element is equally likely
    let folder (i, pick) elem =
        if rng.Next(i + 1) = 0 then (i + 1, elem)
        else (i + 1, pick)
    snd (List.fold folder (0, List.head lst) lst)
Edit: I should clarify that the code above shows a few ways to get a random element from a list, assuming you must use a list. If it fits with the rest of your program's design, it is definitely more efficient to take a random element from an array.

dynamic programming and continuation passing style

For simple problems like Fibonacci, writing in CPS is relatively straightforward:
let fibonacciCPS n =
    let rec fibonacci_cont a cont =
        if a <= 2 then cont 1
        else
            fibonacci_cont (a - 2) (fun x ->
                fibonacci_cont (a - 1) (fun y ->
                    cont(x + y)))
    fibonacci_cont n (fun x -> x)
However, in the rod-cutting example from here (or from the book Introduction to Algorithms), the number of closures is not always 2 and can't be hard-coded.
I imagine one has to change the intermediate variables to sequences.
(I like to think of the continuation as a contract saying "when you have the value, pass it on to me, and I'll pass it on to my boss after processing it", or something along those lines, which defers the actual execution.)
For the rod cutting, we have
//rod cutting
let p = [|1;5;8;9;10;17;17;20;24;30|]
let rec r n = seq { yield p.[n-1]; for i in 1..(n-1) -> (p.[i-1] + r (n-i)) } |> Seq.max
[1 .. 10] |> List.map (fun i -> i, r i)
In this case, I would need to attach the newly created continuation
let cont' = fun (results: _ array) -> cont(seq { yield p.[n-1]; for i in 1..(n-1) -> (p.[i-1] + ks.[n-i]) } |> Seq.max)
to the "cartesian product" continuation made by the returning subproblems.
Has anyone seen a CPS version of rod cutting, or does anyone have tips on this?
I assume you want to explicitly CPS everything, which means some nice features like the list comprehension will be lost (maybe async blocks can help; I don't know F# very well). So, starting from a simple recursive function:
let rec cutrod (prices: int[]) = function
    | 0 -> 0
    | n ->
        [1 .. min n (prices.Length - 1)]
        |> List.map (fun i -> prices.[i] + cutrod prices (n - i))
        |> List.max
It's clear that we need CPS versions of the list functions used (map, max and perhaps a list-building function if you want to CPS the [1..(blah)] expression too). map is quite interesting since it's a higher-order function, so its first parameter needs to be modified to take a CPS-ed function instead. Here's an implementation of a CPS List.map:
let rec map_k f list k =
    match list with
    | [] -> k []
    | x :: xs -> f x (fun y -> map_k f xs (fun ys -> k (y :: ys)))
Note that map_k invokes its argument f like any other CPS function and puts the recursion in map_k into the continuation. With map_k, max_k and gen_k (which builds a list from 1 to some value), the cut-rod function can be CPS-ed:
let rec cutrod_k (prices: int[]) n k =
    match n with
    | 0 -> k 0
    | n ->
        gen_k (min n (prices.Length - 1)) (fun indices ->
            map_k (fun i k -> cutrod_k prices (n - i) (fun ret -> k (prices.[i] + ret)))
                  indices
                  (fun totals -> max_k totals k))
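The answer uses gen_k and max_k without showing them. A minimal, self-contained sketch with hypothetical implementations of both (gen_k builds [1 .. n], max_k takes the maximum of a non-empty list, each passing its result to a continuation), together with map_k and cutrod_k from above, might look like this. The price table is an assumption: it is the textbook price list with a placeholder 0 at index 0, because cutrod_k reads prices.[i] as the price of a piece of length i.

```fsharp
// CPS List.map, as in the answer above.
let rec map_k f list k =
    match list with
    | [] -> k []
    | x :: xs -> f x (fun y -> map_k f xs (fun ys -> k (y :: ys)))

// Hypothetical gen_k: builds [1 .. n] and passes it to k.
let gen_k n k =
    let rec go i acc = if i < 1 then k acc else go (i - 1) (i :: acc)
    go n []

// Hypothetical max_k: maximum of a non-empty list, passed to k.
let max_k list k =
    let rec go best rest =
        match rest with
        | [] -> k best
        | x :: xs -> go (max best x) xs
    match list with
    | [] -> invalidArg "list" "max_k needs a non-empty list"
    | x :: xs -> go x xs

// CPS cut-rod, as in the answer above.
let rec cutrod_k (prices: int[]) n k =
    match n with
    | 0 -> k 0
    | n ->
        gen_k (min n (prices.Length - 1)) (fun indices ->
            map_k (fun i k -> cutrod_k prices (n - i) (fun ret -> k (prices.[i] + ret)))
                  indices
                  (fun totals -> max_k totals k))

// Assumed price table: index 0 is a placeholder, index i is the price of length i.
let prices = [| 0; 1; 5; 8; 9; 10; 17; 17; 20; 24; 30 |]
let best10 = cutrod_k prices 10 id
```

Passing id as the final continuation simply returns the optimal revenue. Like the direct version, this still explores every split, so it is exponential in n.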

Find duplicates in an unsorted sequence efficiently

I need a very efficient way to find duplicates in an unsorted sequence. This is what I came up with, but it has a few shortcomings, namely it
unnecessarily counts occurrences beyond 2
consumes the entire sequence before yielding duplicates
creates several intermediate sequences
module Seq =
    let duplicates items =
        items
        |> Seq.countBy id
        |> Seq.filter (snd >> ((<) 1))
        |> Seq.map fst
Regardless of the shortcomings, I don't see a reason to replace this with twice the code. Is it possible to improve this with comparably concise code?
A more elegant functional solution:
let duplicates xs =
    Seq.scan (fun xs x -> Set.add x xs) Set.empty xs
    |> Seq.zip xs
    |> Seq.choose (fun (x, xs) -> if Set.contains x xs then Some x else None)
Uses scan to accumulate sets of all elements seen so far. Then uses zip to combine each element with the set of elements before it. Finally, uses choose to filter out the elements that are in the set of previously-seen elements, i.e. the duplicates.
EDIT
Actually my original answer was completely wrong. Firstly, you don't want duplicates in your outputs. Secondly, you want performance.
Here is a purely functional solution that implements the algorithm you're after:
let duplicates xs =
    (Map.empty, xs)
    ||> Seq.scan (fun xs x ->
        match Map.tryFind x xs with
        | None -> Map.add x false xs
        | Some false -> Map.add x true xs
        | Some true -> xs)
    |> Seq.zip xs
    |> Seq.choose (fun (x, xs) ->
        match Map.tryFind x xs with
        | Some false -> Some x
        | None | Some true -> None)
This uses a map to track whether each element has been seen once or many times, and emits an element when it has been seen exactly once before, i.e. the first time it is duplicated.
Here is a faster imperative version:
let duplicates (xs: _ seq) =
    seq { let d = System.Collections.Generic.Dictionary(HashIdentity.Structural)
          let e = xs.GetEnumerator()
          while e.MoveNext() do
              let x = e.Current
              let mutable seen = false
              if d.TryGetValue(x, &seen) then
                  if not seen then
                      d.[x] <- true
                      yield x
              else
                  d.[x] <- false }
This is around 2× faster than any of the other answers (at the time of writing).
Using a for x in xs do loop to enumerate the elements in a sequence is substantially slower than using GetEnumerator directly but generating your own Enumerator is not significantly faster than using a computation expression with yield.
Note that the TryGetValue member of Dictionary allows me to avoid allocation in the inner loop by mutating a stack allocated value whereas the TryGetValue extension member offered by F# (and used by kvb in his/her answer) allocates its return tuple.
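To make the contrast concrete, here are the two ways of calling Dictionary.TryGetValue side by side (lookupBoth is a hypothetical helper). The performance difference is the answer's claim; whether the tuple in the first style is actually allocated may depend on the compiler version and on how the result is consumed.

```fsharp
open System.Collections.Generic

// Hypothetical helper contrasting the two call styles.
let lookupBoth (d: Dictionary<string, bool>) (key: string) =
    // Style 1: F# turns the trailing out parameter into part of a returned tuple.
    let found1, value1 = d.TryGetValue(key)
    // Style 2: pass the address of a mutable local explicitly.
    let mutable value2 = false
    let found2 = d.TryGetValue(key, &value2)
    (found1, value1), (found2, value2)

let d = Dictionary<string, bool>()
d.["seen"] <- true
```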
Here's an imperative solution (which is admittedly slightly longer):
let duplicates items =
    seq {
        let d = System.Collections.Generic.Dictionary()
        for i in items do
            match d.TryGetValue(i) with
            | false,_ -> d.[i] <- false // first observance
            | true,false -> d.[i] <- true; yield i // second observance
            | true,true -> () // already seen at least twice
    }
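Because this version is a seq { } that yields each duplicate as soon as it is seen for the second time, it also works on an infinite input, which addresses the question's second shortcoming. A quick check, repeating the function so the snippet stands alone:

```fsharp
open System.Collections.Generic

// The lazy Dictionary-based version from above.
let duplicates items =
    seq {
        let d = Dictionary()
        for i in items do
            match d.TryGetValue(i) with
            | false, _ -> d.[i] <- false            // first observance
            | true, false -> d.[i] <- true; yield i // second observance
            | true, true -> ()                      // already seen at least twice
    }

// 0, 1, 2, 0, 1, 2, ... never ends, yet the first three duplicates arrive.
let firstThree =
    Seq.initInfinite (fun i -> i % 3)
    |> duplicates
    |> Seq.take 3
    |> Seq.toList
```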
This is the best "functional" solution I could come up with that doesn't consume the entire sequence up front.
let duplicates items =
    // state: (value to emit now, items already emitted, items seen exactly once)
    let step (_, yielded: Set<_>, seen: Set<_>) item =
        if yielded.Contains item then
            (None, yielded, seen)
        elif seen.Contains item then
            (Some item, yielded.Add item, seen.Remove item)
        else
            (None, yielded, seen.Add item)
    items
    |> Seq.scan step (None, Set.empty, Set.empty)
    |> Seq.choose (fun (x, _, _) -> x)
Assuming your sequence is finite, this solution requires only one pass over the sequence:
open System.Collections.Generic
let duplicates items =
    let dict = Dictionary()
    items |> Seq.fold (fun acc item ->
        match dict.TryGetValue item with
        | true, 2 -> acc
        | true, 1 -> dict.[item] <- 2; item::acc
        | _ -> dict.[item] <- 1; acc) []
    |> List.rev
You can pass the length of the sequence as the capacity of the Dictionary, but that requires enumerating the whole sequence once more.
EDIT:
To resolve the second problem, one could generate duplicates on demand:
open System.Collections.Generic
let duplicates items =
    seq {
        let dict = Dictionary()
        for item in items do
            match dict.TryGetValue item with
            | true, 2 -> ()
            | true, 1 -> dict.[item] <- 2; yield item
            | _ -> dict.[item] <- 1
    }
Functional solution:
let duplicates items =
    let test (unique, result) v =
        if not(unique |> Set.contains v) then (unique |> Set.add v ,result)
        elif not(result |> Set.contains v) then (unique,result |> Set.add v)
        else (unique, result)
    items |> Seq.fold test (Set.empty, Set.empty) |> snd |> Set.toSeq

How to Create the Power Set (Combinations) of the Infinite Set in F# using Sequences?

Here is my failed attempt at the problem; any help would be appreciated.
I tried to come up with the best algorithm for the power set that works on eager lists. This part seems to be working fine. The part I'm having trouble with is translating it to work with sequences, so that it can run on streaming/infinite lists. I really don't like the yield syntax, maybe because I don't understand it well, so I would rather have a version without it as well.
//All Combinations of items in a list
//i.e. the Powerset given each item is unique
//Note: lists are eager so can't be used for infinite
let listCombinations xs =
    List.fold (fun acc x ->
        List.collect (fun ys -> ys::[x::ys]) acc) [[]] xs
//This works fine (Still interested if it could be faster)
listCombinations [1;2;3;4;5] |> Seq.iter (fun x -> printfn "%A" x)
//All Combinations of items in a sequence
//i.e. the Powerset given each item is unique
//Note: Not working
let seqCombinations xs =
    Seq.fold (fun acc x ->
        Seq.collect (fun ys ->
            seq { yield ys
                  yield seq { yield x
                              yield! ys} }) acc) Seq.empty xs
//All Combinations of items in a sequence
//i.e. the Powerset given each item is unique
//Note: Not working (even wrong type signature)
let seqCombinations2 xs =
    Seq.fold (fun acc x ->
        Seq.collect (fun ys ->
            Seq.append ys (Seq.append x ys)) acc) Seq.empty xs
//Sequences to test on
let infiniteSequence = Seq.initInfinite (fun i -> i + 1)
let finiteSequence = Seq.take 5 infiniteSequence
//This should work easily since it's a finite sequence
//But it does not, so there must be a bug in 'seqCombinations' above
for xs in seqCombinations finiteSequence do
    for y in xs do
        printfn "%A" y
//This one is much more difficult to get to work
//since it's the powerset of the infinite sequence
//Nonetheless, if someone could help me find a way to make this work,
//this is my ultimate goal
let firstFew = Seq.take 20 (seqCombinations infiniteSequence)
for xs in firstFew do
    for y in xs do
        printfn "%A" y
Your seqCombinations is almost correct, but you didn't translate it from lists to sequences properly. The equivalent of [[]] is not Seq.empty, but Seq.singleton Seq.empty:
let seqCombinations xs =
    Seq.fold (fun acc x ->
        Seq.collect (fun ys ->
            seq { yield ys
                  yield seq { yield x
                              yield! ys} }) acc) (Seq.singleton Seq.empty) xs
The code above works for finite sequences. But for infinite ones, it doesn't work, because it first tries to reach the end, which it obviously never does for infinite sequences.
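A quick check of the corrected version on a three-element input; the inner sequences are converted to lists only so the result is easy to compare. Each fold step keeps the existing subsets and adds copies with x prepended, so the output has 2^3 = 8 subsets.

```fsharp
// The corrected seqCombinations: the seed is a sequence containing one empty
// sequence, mirroring [[]] in the list version.
let seqCombinations xs =
    Seq.fold
        (fun acc x ->
            Seq.collect
                (fun ys ->
                    seq {
                        yield ys
                        yield seq { yield x; yield! ys }
                    })
                acc)
        (Seq.singleton Seq.empty)
        xs

// Materialise the inner sequences as lists so the result can be inspected.
let subsetsOf123 =
    seqCombinations [ 1; 2; 3 ]
    |> Seq.map Seq.toList
    |> Seq.toList
```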
If you want a function that works with infinite sequences, I managed to figure out two ways, but neither of them is particularly nice. One of them uses mutable state:
let seqCombinations xs =
    seq {
        // the ref cell lives inside seq { } so each enumeration starts fresh
        let combs = ref [[]]
        yield! combs.Value
        for x in xs do
            let added = List.map (fun ys -> x::ys) combs.Value
            yield! added
            combs.Value <- combs.Value @ added
    }
The other is too much about dealing with details of seq<T>:
open System.Collections.Generic
let seqCombinations (xs : seq<_>) =
    let rec combs acc (e : IEnumerator<_>) =
        seq {
            if (e.MoveNext()) then
                let added = List.map (fun ys -> (e.Current)::ys) acc
                yield! added
                yield! combs (acc @ added) e }
    seq {
        // create (and dispose) the enumerator inside the sequence,
        // so it stays alive for as long as the enumeration runs
        use enumerator = xs.GetEnumerator()
        yield []
        yield! combs [[]] enumerator
    }
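As a sanity check, a self-contained copy of the first (ref-cell) version can be run against the positive integers. It yields finite subsets in the order they become reachable, so taking a prefix terminates:

```fsharp
// Ref-cell version from above; the ref cell is created inside seq { }
// so every enumeration starts from scratch.
let seqCombinations (xs: seq<'a>) : seq<'a list> =
    seq {
        let combs = ref [ [] ]
        yield! combs.Value
        for x in xs do
            // every subset found so far, extended with x
            let added = List.map (fun ys -> x :: ys) combs.Value
            yield! added
            combs.Value <- combs.Value @ added
    }

let firstEight =
    Seq.initInfinite (fun i -> i + 1)
    |> seqCombinations
    |> Seq.take 8
    |> Seq.toList
```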
I think this would be much easier if you could treat infinite sequences as head and tail, like finite lists in F# or any sequence in Haskell. But it's certainly possible there is a nice way to express this in F#, and I just didn't find it.
I've asked a similar question recently at Generate powerset lazily and got some nice answers.
For the power set of a finite set, the answer by @Daniel in the above link is an efficient solution and probably suits your purpose. You could write a test case to compare his approach with yours.
Regarding the power set of an infinite set, here is a bit of maths. According to Cantor's theorem, the power set of a countably infinite set is uncountably infinite. This means there is no way to enumerate the power set of the integers (which are countably infinite), even lazily. The intuition is the same as for real numbers: since the reals are uncountable, we can't model all of them with an infinite sequence.
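Therefore, there is no algorithm that enumerates the power set of a countably infinite set; such an algorithm just doesn't make sense. (The finite subsets, which are what the sequence-based answers above produce, do form a countable set, so those can be enumerated lazily.)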
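The diagonal argument behind Cantor's theorem, stated for the natural numbers, shows why no sequence can list every subset:

```latex
Suppose $f : \mathbb{N} \to \mathcal{P}(\mathbb{N})$ were onto, and let
\[ D = \{\, n \in \mathbb{N} \mid n \notin f(n) \,\}. \]
Then $D = f(k)$ for some $k$, and
\[ k \in D \iff k \notin f(k) = D, \]
a contradiction. Hence no sequence $f(0), f(1), f(2), \dots$ lists every subset of $\mathbb{N}$.
```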
This is sort of a joke, but it will actually generate the correct result for an infinite sequence (it's just that this can't be proven empirically, as opposed to mathematically).
let powerset s =
    seq {
        yield Seq.empty
        for x in s -> seq [x]
    }
