Why do prints to console get mixed up when doing parallel computations? - parallel-processing

While running some code that performs parallel computations the output becomes garbled: different messages get mixed up. Here is a sample:
Iteration 1
Iteration
Iteration 23 of 19 - Calculating P&L for test window ending at 10/28/1968 12:00:00 AM
of
Iteration 4
Iteration of
Iteration 5
Iteration
Iteration 19 - Calculating P&L for test window ending at of 19 - Calculating P&L for test window ending at 5/29/1974 12:00:00 AM
6 of 878/18/1971 12:00:00 AM19 - Calculating P&L for test window ending at 3/4/1977 12:00:00 AM
of 19 of
of 19 - Calculating P&L for test window ending at 6/25/1985 12:00:00 AM
When running the same program sequentially the console output comes out nice, with no garbling.
Printing to the console is done by this function:
let windowTrainTest (comm: Communication) critFoo count (model: IModel) (assets: Assets) (paramList: Parameters list) =
    // Deleted some code here
    if comm = Verbose then
        let msg1 = sprintf "\nwindowTrainTestPandL: First date: %A, Last date: %A\nBest Criterion: %.2f\n" fDate lDate bestCriterion
        let msg2 = sprintf "Best Parameters: %A\n" bestParameters
        printfn "%s" <| msg1 + msg2
    (pandl, wgts), bestParameters, ( ["Criterion", bestCriterion] |> Map.ofList,
                                     ["FirstDate", fDate; "LastDate", lDate] |> Map.ofList )
Parallelization is done by this portion of the program:
let pSeqMapi f (xs: seq<'T>) = xs |> PSeq.mapi f

let trainTest n i (trainSize, fullSize) =
    let takenAssets = assets |> Assets.take (min fullSize len)
    lastDate takenAssets
    |> printfn "\nIteration %d of %d - Calculating P&L for test window ending at %A\n" (i + 1) n
    paramList
    |> windowTrainTest comm' critFoo trainSize model takenAssets

let mapTrainTest (initSizes: (int * int) list) =
    let f = trainTest initSizes.Length
    match calcType with
    | PSeq -> initSizes |> pSeqMapi f |> List.ofSeq
    | _    -> initSizes |> Seq.mapi f |> List.ofSeq
Is there a way to avoid this kind of behavior, for example by flushing the message to the console?

Parallel computations run on different threads, and if one thread is interrupted in the middle of a printfn and a second thread runs a printfn before the first thread gets run again, then their outputs will be interleaved.
The simplest way to deal with this is to create a new function that uses the lock function around printfn invocations:
let lockObj = new obj()
let lockedPrintfn msg = lock lockObj (fun _ -> printfn msg)
Then replace all your printfn calls with lockedPrintfn and you should get the serialized output you're expecting. Your performance will suffer just a little since your threads will occasionally be spending some time waiting for the printfn lock, but as long as your computations take significantly longer than the time spent printing output, you shouldn't actually notice the slightly-slower performance.
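For reference, here is a variant of the same idea (a minimal sketch, not part of the original answer): it takes an already-formatted string, so the whole line is written while the lock is held. It reuses the lockObj defined above and the identifiers from the question's loop.
let lockedPrint (s: string) = lock lockObj (fun _ -> printfn "%s" s)

// In the parallel loop, build the full line with sprintf and print it in one locked call:
lockedPrint (sprintf "Iteration %d of %d - Calculating P&L for test window ending at %A" (i + 1) n (lastDate takenAssets))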

I think I found the solution, and it does not require locks. I replaced the lines
lastDate takenAssets
|> printfn "\nIteration %d of %d - Calculating P&L for test window ending at %A\n" (i + 1) n
with
let msg = sprintf "\nIteration %d of %d - Calculating P&L for test window ending at %A\n" (i + 1) n (lastDate takenAssets)
printfn "%s" msg
I leave to those more knowledgeable to offer an explanation.

Related

F# syntax with sets and updating

In F# I'm trying to remove an occurrence from the set if a condition is met; however, it's not really working the way I'd like it to.
The trick to removing elements from a set is the function Set.filter, which takes a function as an argument - filter will feed in every value of the set to your function, and add it to a new set if the function returns true. An example implementation might be:
let filter f (original : Set<'T>) =
    set [ for value in original do if f value then yield value ]
which has type filter : ('T -> bool) -> Set<'T> -> Set<'T>. An example of using it would be
filter (fun x -> x % 2 = 0) (set [ 1; 2; 3; 4; 5 ])
This filters the set for even numbers, so the return value would be set [ 2; 4 ].
I'm not entirely sure what problem you're having exactly, but here is a solution to the game Mastermind using Knuth's algorithm, albeit with a random starting guess, rather than his choice of "1122".
I thought it was quite a nice exercise, though writing the checkGuess function was the hardest part of it for me!
You can test it by opening this in F# Interactive and running the function playMastermind (), which will show you its guesses.
/// The colours that pegs are allowed to be.
type Peg = Blue | Red | Green | Yellow | Purple | Brown

/// A shared instance of the System.Random () class for all the random number
/// generators.
let private rnd = new System.Random ()

/// Make a random set of four peg colours.
let randomGuess () =
    let randomPeg () =
        // Note: Random.Next's upper bound is exclusive, so (1, 7) yields 1..6
        // and Brown can actually be generated.
        match rnd.Next (1, 7) with
        | 1 -> Blue
        | 2 -> Red
        | 3 -> Green
        | 4 -> Yellow
        | 5 -> Purple
        | 6 -> Brown
        | _ -> failwith "Random number generation failed."
    [ randomPeg (); randomPeg (); randomPeg (); randomPeg () ]

/// Iterate over the colours to make all of the possible combinations.
let allPossibles =
    let colours = [ Blue; Red; Green; Yellow; Purple; Brown ]
    set [ for a in colours do for b in colours do for c in colours do for d in colours -> [ a; b; c; d ] ]

/// Get the number of white and black pegs when comparing solution to guess.
let checkGuess solution guess =
    /// Create a map of (colour -> count).
    let toMap = List.countBy id >> Map.ofList
    /// Compute how many pegs' colours are shared in the guesses.
    let mapIntersect map1 map2 =
        let overlap peg count =
            match Map.tryFind peg map2 with
            | None -> 0
            | Some num -> min num count
        Map.fold (fun acc peg count -> acc + overlap peg count) 0 map1
    /// Simply compare to see if each peg is in the correct place.
    let blacks = List.map2 (fun x y -> if x = y then 1 else 0) solution guess |> List.sum
    // The number of pegs of the right colour but the wrong location is the
    // same as the total number of pegs of the right colour subtract the ones
    // that are also in the right place.
    let whites = mapIntersect (toMap solution) (toMap guess) - blacks
    whites, blacks

/// Get a random element of a set.
let randomSetElement set =
    let arr = Set.toArray set
    arr.[rnd.Next (Array.length arr)]

let playMastermind () =
    // This creates a closure so we can check our guess against the solution,
    // without storing the actual value of the solution.
    let checkAnswer = checkGuess (randomGuess ())
    let rec loop turnCount remaining =
        if Set.count remaining = 1 then
            let answer = Set.maxElement remaining
            printfn "The answer is %A, which I calculated in %d turns." answer (turnCount - 1)
        else
            let guess = randomSetElement remaining
            let (whites, blacks) = checkAnswer guess
            printfn "On turn %d I guessed %A, which gave %d white pins and %d black pins." turnCount guess whites blacks
            /// Remove all possibilities from the solution that wouldn't give the
            /// same numbers of white and black pins and continue.
            loop (turnCount + 1) (Set.filter (fun possible -> (whites, blacks) = checkGuess possible guess) remaining)
    // Play the game!
    loop 1 allPossibles
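To make the white/black counting concrete, here is a small worked example of checkGuess (the peg values are chosen here for illustration and are not part of the original answer):
let solution = [ Blue; Red; Green; Yellow ]
let guess    = [ Red;  Blue; Green; Green ]
checkGuess solution guess
// evaluates to (2, 1): one black peg (Green is the right colour in the right place),
// two white pegs (Blue and Red are the right colours in the wrong places),
// and the surplus Green in the guess scores nothing.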
I'd make this a comment, but it's too long, so it needs to be an answer instead, even though it's not a complete answer to your problem.
One problem with your code, as it is now, is this section:
for candidate in candidateSet do
    let scString = candidate.ToString()
    let mutable secretList = []
    for i = 0 to 3 do
        let digit = (int scString.[i])-(int '0')
        secretList <- secretList @ [digit]
    let tempCode = List.map (fun x -> numberToCodeColorPlus (x)) secretList
    //Validate works and returns a peg set (b,w)..e.g. (0,0)
let secretCodePegs = validate guess tempCode
if secretCodePegs <> guessPegs then
    candidateSet <- Set.remove candidate candidateSet
F#, like Python, uses indentation to denote blocks. So that let secretCodePegs = validate guess tempCode line is outside the for loop, not inside the for loop the way you clearly intended it to be. And the if secretCodePegs <> guessPegs then line that follows it, as far as F# is concerned, is part of a new block, and not part of the for loop any longer (because the let secretCodePegs = ... line ended the for loop). All you need to do is indent the let secretCodePegs = ... line by one level, and your code will work. In other words, that section should have looked like this:
for candidate in candidateSet do
    let scString = candidate.ToString()
    let mutable secretList = []
    for i = 0 to 3 do
        let digit = (int scString.[i])-(int '0')
        secretList <- secretList @ [digit]
    let tempCode = List.map (fun x -> numberToCodeColorPlus (x)) secretList
    //Validate works and returns a peg set (b,w)..e.g. (0,0)
    let secretCodePegs = validate guess tempCode
    if secretCodePegs <> guessPegs then
        candidateSet <- Set.remove candidate candidateSet

Performance of iterating over Array vs List

Inspired by this question, I wanted to see if there were any performance differences between iterating over an Array vs a List.
Since we would be iterating over the entire collection, my initial thought was that there shouldn't really be a performance difference between the two. Furthermore, I thought that using a tail-recursive function to do a count should be as fast as just using a mutable variable. However, when I wrote a simple script to test the difference, I found the following (run in Release mode with VS2015):
add_k_list, elapsed 15804 ms, result 0L
add_k_list_mutable, elapsed 12800 ms, result 0L
add_k_array, elapsed 15719 ms, result 0L
I wonder why the list implementation that uses a mutable variable is noticeably faster than both the tail-recursive version and the one using a mutable variable and an array.
Here's my code:
open System.Diagnostics

let d = 100000
let n = 100000

let stopWatch =
    let sw = Stopwatch ()
    sw.Start ()
    sw

let testList = [1..d]
let testArray = [|1..d|]

let timeIt (name : string) (a : int -> int list -> 'T) : unit =
    let t = stopWatch.ElapsedMilliseconds
    let v = a 0 (testList)
    for i = 1 to (n) do
        a i testList |> ignore
    let d = stopWatch.ElapsedMilliseconds - t
    printfn "%s, elapsed %d ms, result %A" name d v

let timeItArr (name : string) (a : int -> int [] -> 'T) : unit =
    let t = stopWatch.ElapsedMilliseconds
    let v = a 0 (testArray)
    for i = 1 to (n) do
        a i testArray |> ignore
    let d = stopWatch.ElapsedMilliseconds - t
    printfn "%s, elapsed %d ms, result %A" name d v

let add_k_list x (k_range: int list) =
    let rec add k_range x acc =
        match k_range with
        | [] -> acc
        | k::ks -> let y = x ^^^ k
                   if (y < k || y > d) then
                       add ks x (acc + 1L)
                   else
                       add ks x acc
    add k_range x 0L

let add_k_list_mutable x (k_range: int list) =
    let mutable count = 0L
    for k in k_range do
        let y = x ^^^ k
        if (y < k || y > d) then
            count <- count + 1L
    count

let add_k_array x (k_range: int []) =
    let mutable count = 0L
    for k in k_range do
        let y = x ^^^ k
        if (y < k || y > d) then
            count <- count + 1L
    count

[<EntryPoint>]
let main argv =
    let x = 5
    timeItArr "add_k_array" add_k_array
    timeIt "add_k_list" add_k_list
    timeIt "add_k_list_mutable" add_k_list_mutable
    printfn "%A" argv
    0 // return an integer exit code
EDIT: The above test was run in 32-bit Release mode in VS2015. At the suggestion of s952163, I ran it as 64-bit and found that the results differ quite a bit:
add_k_list, elapsed 17918 ms, result 0L
add_k_list_mutable, elapsed 17898 ms, result 0L
add_k_array, elapsed 8261 ms, result 0L
I'm especially surprised that the difference between using tail recursion with an accumulator vs a mutable variable seems to have disappeared.
When running a slightly modified program (posted below) these are the numbers I received:
x64 Release .NET 4.6.1
TestRun: Total: 1000000000, Outer: 100, Inner: 10000000
add_k_array, elapsed 1296 ms, accumulated result 495000099L
add_k_list, elapsed 2675 ms, accumulated result 495000099L
add_k_list_mutable, elapsed 2678 ms, accumulated result 495000099L
TestRun: Total: 1000000000, Outer: 1000, Inner: 1000000
add_k_array, elapsed 869 ms, accumulated result 499624318L
add_k_list, elapsed 2486 ms, accumulated result 499624318L
add_k_list_mutable, elapsed 2483 ms, accumulated result 499624318L
TestRun: Total: 1000000000, Outer: 10000, Inner: 100000
add_k_array, elapsed 750 ms, accumulated result 507000943L
add_k_list, elapsed 1602 ms, accumulated result 507000943L
add_k_list_mutable, elapsed 1603 ms, accumulated result 507000943L
x86 Release .NET 4.6.1
TestRun: Total: 1000000000, Outer: 100, Inner: 10000000
add_k_array, elapsed 1601 ms, accumulated result 495000099L
add_k_list, elapsed 2014 ms, accumulated result 495000099L
add_k_list_mutable, elapsed 1835 ms, accumulated result 495000099L
TestRun: Total: 1000000000, Outer: 1000, Inner: 1000000
add_k_array, elapsed 1495 ms, accumulated result 499624318L
add_k_list, elapsed 1714 ms, accumulated result 499624318L
add_k_list_mutable, elapsed 1595 ms, accumulated result 499624318L
TestRun: Total: 1000000000, Outer: 10000, Inner: 100000
add_k_array, elapsed 1363 ms, accumulated result 507000943L
add_k_list, elapsed 1406 ms, accumulated result 507000943L
add_k_list_mutable, elapsed 1221 ms, accumulated result 507000943L
(As usual it's important to not run with the debugger attached as that changes how the JIT:er works. With debugger attached the JIT:er produces code that is easier for the debugger but also slower.)
The way this works is that the total number of iterations is kept constant but it varies the count of the outer loop and the size of the list/array.
For me the only measurement that is odd is that the array loop is worse in some cases than the list loop.
If the total amount of work is the same why do we see different results when outer/inner is varied?
The answer is most likely related to the CPU cache. When we iterate over an array of size 10,000,000, its actual size in memory is 40,000,000 bytes. My machine has "just" 6,000,000 bytes of L3 cache. When the array size is 1,000,000, the array occupies 4,000,000 bytes, which can fit in L3.
The list type in F# is essentially a singly linked list, and a rough estimate of the size of a list element is 4 (data) + 8 (64-bit pointer) + 8 (vtable pointer) + 4 (heap overhead) = 24 bytes. With this estimate, a list with 10,000,000 elements takes 240,000,000 bytes and one with 1,000,000 elements takes 24,000,000 bytes. Neither fits in the L3 cache on my machine.
When the number of elements is 100,000, the size of the array is 400,000 bytes and the list size is 2,400,000 bytes. Both fit snugly into the L3 cache.
This reasoning can explain the difference in performance between smaller arrays/lists and bigger ones.
If the elements of the list are not allocated sequentially (i.e. the heap is fragmented or the GC moved them around), the performance of the list is expected to be much worse when it doesn't fit into the cache, because the CPU prefetch strategy no longer works. The elements of an array are guaranteed to be sequential, so prefetching works fine if you iterate sequentially.
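The estimates above can be summed up in a quick back-of-the-envelope calculation (a sketch using the approximate per-element sizes from this answer, not exact measurements):
// Approximate memory footprint: 4 bytes per int in an array,
// ~24 bytes per cons cell in an F# list (the estimate given above).
let arrayBytes (n: int64) = n * 4L
let listBytes  (n: int64) = n * 24L

arrayBytes 10000000L  //  40,000,000 bytes -> larger than a 6 MB L3 cache
listBytes  10000000L  // 240,000,000 bytes -> far larger than L3
arrayBytes   100000L  //     400,000 bytes -> fits in L3
listBytes    100000L  //   2,400,000 bytes -> fits in L3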
Why is tail-recursion slower than the mutable for loop?
This actually isn't true in F# 3 where the for loop is expected to be much slower than the tail-recursion.
For a hint at the answer, I used ILSpy to look at the generated IL code.
I found that FSharpList<>::get_TailOrNull() is called twice per loop when using tail-recursion. Once to check if we reached the end and once to get the next element (redundant call).
The for loop version only calls FSharpList<>::get_TailOrNull() once.
This extra call likely explains why tail recursion is slower, but as you noted, in 64-bit mode both list versions were about as fast. What's going on?
I checked the JIT:ed assembly code and noted that the x64 JIT:er eliminated the extra call to FSharpList<>::get_TailOrNull(). The x86 JIT:er failed to eliminate the call.
Lastly, why is the array version slower than the list version on x86?
In general I expect arrays to have the least overhead of all collections in .NET. The reason is that an array is compact and sequential, and there are special instructions in ILAsm to access its elements.
So it's surprising to me that lists perform better in some cases.
Checking the assembly code again, it seems to come down to the array version requiring an extra variable to perform its work; the x86 CPU has few registers available, leading to an extra read from the stack per iteration. x64 has significantly more registers, so the array version only has to read once from memory per iteration, whereas the list version reads twice (head and tail).
Any conclusions?
When it comes to CPU performance x64 is the way to go (this hasn't always been the case)
Expect arrays to perform better than any other data structure in .NET for operations that are O(1) on arrays (inserts are slow, obviously)
The devil is in the details, meaning that in order to gain true insight we might need to check the assembly code.
Cache locality is very important for large collections. Since arrays are compact and guaranteed to be sequential they are often a good choice.
It's very difficult to predict performance, always measure
Iterate towards zero when possible if you are really hungry for performance. This can save one read from memory.
EDIT: OP wondered why it seemed x86 lists performed better than x64 lists
I reran the perf tests with list/array size set to 1,000. This will make sure the entire data structure fits into my L1 cache (256 kB)
x64 Release .NET 4.6.1
TestRun: Total: 1000000000, Outer: 1000000, Inner: 1000
add_k_array, elapsed 1062 ms, accumulated result 999499999L
add_k_list, elapsed 1134 ms, accumulated result 999499999L
add_k_list_mutable, elapsed 1110 ms, accumulated result 999499999L
x86 Release .NET 4.6.1
TestRun: Total: 1000000000, Outer: 1000000, Inner: 1000
add_k_array, elapsed 1617 ms, accumulated result 999499999L
add_k_list, elapsed 1359 ms, accumulated result 999499999L
add_k_list_mutable, elapsed 1100 ms, accumulated result 999499999L
We see that for this size x64 performs about as well as or better than x86. Why do we see the opposite in the other measurements? I speculate that this is because the size of a list element is larger in the x64 version, meaning we use more bandwidth moving data from L3 to L1.
So, if you can, try to make sure your data fits into the L1 cache.
Final musings
When working with these sort of questions I sometimes wonder if the whole Von Neumann architecture is a big mistake. Instead we should have a data flow architecture as data is slow and instructions are fast.
AFAIK under the hood CPU:s have a data flow architecture. The assembly language though looks like one would expect from a Von Neumann architecture so in some sense it's a high-level abstraction over the data flow architecture. But in order to provide reasonable performant code the CPU die is mostly occupied by cache (~95%). With a pure data flow architecture one would expect a higher percentage of the CPU die would do actual work.
Hope this was interesting, my modified program follows:
open System.Diagnostics

let stopWatch =
    let sw = Stopwatch ()
    sw.Start ()
    sw

let timeIt (name : string) (outer : int) (a : int -> int64) : unit =
    let t = stopWatch.ElapsedMilliseconds
    let mutable acc = a 0
    for i = 2 to outer do
        acc <- acc + a i
    let d = stopWatch.ElapsedMilliseconds - t
    printfn "%s, elapsed %d ms, accumulated result %A" name d acc

let add_k_list x l (k_range: int list) =
    let rec add k_range x acc =
        match k_range with
        | [] -> acc
        | k::ks -> let y = x ^^^ k
                   if (y < k || y > l) then
                       add ks x (acc + 1L)
                   else
                       add ks x acc
    add k_range x 0L

let add_k_list_mutable x l (k_range: int list) =
    let mutable count = 0L
    for k in k_range do
        let y = x ^^^ k
        if (y < k || y > l) then
            count <- count + 1L
    count

let add_k_array x l (k_range: int []) =
    let mutable count = 0L
    for k in k_range do
        let y = x ^^^ k
        if (y < k || y > l) then
            count <- count + 1L
    count

[<EntryPoint>]
let main argv =
    let total = 1000000000
    let outers = [|100; 1000; 10000|]
    for outer in outers do
        let inner = total / outer
        printfn "TestRun: Total: %d, Outer: %d, Inner: %d" total outer inner
        ignore <| System.GC.WaitForFullGCComplete ()
        let testList = [1..inner]
        let testArray = [|1..inner|]
        timeIt "add_k_array" outer <| fun x -> add_k_array x inner testArray
        timeIt "add_k_list" outer <| fun x -> add_k_list x inner testList
        timeIt "add_k_list_mutable" outer <| fun x -> add_k_list_mutable x inner testList
    0

finding primes very slow in F#

I have answered Project Euler Question 7 very easily using Sieve of Eratosthenes in C and I had no problem with it.
I am still quite new to F# so I tried implementing the same technique
let prime_at pos =
    let rec loop f l =
        match f with
        | x::xs -> loop xs (l |> List.filter(fun i -> i % x <> 0 || i = x))
        | _ -> l
    List.nth (loop [2..pos] [2..pos*pos]) (pos-1)
which works well when pos < 1000, but will crash at 10000 with an out of memory exception
I then tried changing the algorithm to
let isPrime n = n > 1 && seq { for f in [2..n/2] do yield f } |> Seq.forall(fun i -> n % i <> 0)
seq {for i in 2..(10000 * 10000) do if isPrime i then yield i} |> Seq.nth 10000 |> Dump
which runs successfully but still takes a few minutes.
If I understand correctly the first algorithm is tail optimized so why does it crash? And how can I write an algorithm that runs under 1 minute (I have a fast computer)?
Looking at your first attempt
let prime_at pos =
    let rec loop f l =
        match f with
        | x::xs -> loop xs (l |> List.filter(fun i -> i % x <> 0 || i = x))
        | _ -> l
    List.nth (loop [2..pos] [2..pos*pos]) (pos-1)
At each loop iteration, you are iterating over and creating a new list. This is very slow, as list creation is slow and you don't see any benefits from the cache. Several obvious optimisations, such as having the factor list skip the even numbers, are missing. When pos = 10 000 you are trying to create a list which will occupy 10 000 * 10 000 * 4 = 400 MB of just integers and a further 800 MB of pointers (F# lists are linked lists). Furthermore, as each list element takes up a very small amount of memory, there will probably be significant overhead from things like the GC. In the function you then create a new list of similar size. As a result, I am not surprised that this causes an OutOfMemoryException.
Looking at the second example,
let isPrime n =
    n > 1 &&
    seq { for f in [2..n/2] do yield f }
    |> Seq.forall(fun i -> n % i <> 0)
Here, the problem is pretty similar as you are generating giant lists for each element you are testing.
I have written a quite fast F# sieve here https://stackoverflow.com/a/12014908/124259 which shows how to do this faster.
As already mentioned by John, your implementation is slow because it generates some temporary data structures.
In the first case, you are building a list, which needs to be fully created in memory and that introduces significant overhead.
In the second case, you are building a lazy sequence, which does not consume memory (because it is built while it is being iterated), but it still introduces indirection that slows the algorithm down.
In most cases in F#, people tend to prefer readability and so using sequences is a nice way to write the code, but here you probably care more about performance, so I'd avoid sequences. If you want to keep the same structure of your code, you can rewrite isPrime like this:
let isPrime n =
    let rec nonDivisible by =
        if by = 1 then true          // Return 'true' if we reached the end
        elif n%by = 0 then false     // Return 'false' if there is a divisor
        else nonDivisible (by - 1)   // Otherwise continue looping
    n > 1 && nonDivisible (n/2)
This just replaces the sequence and forall with a recursive function nonDivisible that returns true when the number n is not divisible by any number between 2 and n/2. The function first checks the two termination cases and otherwise performs a recursive call.
With the original implementation, I'm able to find 1000th prime in 1.5sec and with the new one, it takes 22ms. Finding 10000th prime with the new implementation takes 3.2sec on my machine.
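As a further tweak (my own sketch, not part of either answer above; the name isPrimeSqrt is illustrative), the loop only needs to test divisors up to the square root of n, which reduces the work per number from roughly n/2 checks to about sqrt n checks:
let isPrimeSqrt n =
    // Test divisors 2, 3, ... while d * d <= n; any factor larger than sqrt n
    // would have a matching factor smaller than sqrt n, so it is already covered.
    let rec loop d =
        if d * d > n then true
        elif n % d = 0 then false
        else loop (d + 1)
    n > 1 && loop 2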

Slow tail recursion in F#

I have an F# function that returns a list of numbers starting from 0 in the pattern of skip n, choose n, skip n, choose n... up to a limit. For example, this function for input 2 will return [2, 3, 6, 7, 10, 11...].
Initially I implemented this as a non-tail-recursive function as below:
let rec indicesForStep start blockSize maxSize =
    match start with
    | i when i > maxSize -> []
    | _ -> [for j in start .. ((min (start + blockSize) maxSize) - 1) -> j] @ indicesForStep (start + 2 * blockSize) blockSize maxSize
Thinking that tail recursion is desirable, I reimplemented it using an accumulator list as follows:
let indicesForStepTail start blockSize maxSize =
    let rec indicesForStepInternal istart accumList =
        match istart with
        | i when i > maxSize -> accumList
        | _ -> indicesForStepInternal (istart + 2 * blockSize) (accumList @ [for j in istart .. ((min (istart + blockSize) maxSize) - 1) -> j])
    indicesForStepInternal start []
However, when I run this in fsi under Mono with the parameters 1, 1 and 20,000 (i.e. should return [1, 3, 5, 7...] up to 20,000), the tail-recursive version is significantly slower than the first version (12 seconds compared to sub-second).
Why is the tail-recursive version slower? Is it because of the list concatenation? Is it a compiler optimisation? Have I actually implemented it tail-recursively?
I also feel as if I should be using higher-order functions to do this, but I'm not sure exactly how to go about doing it.
As dave points out, the problem is that you're using the @ operator to append lists. This is a more significant performance issue than tail recursion. In fact, tail recursion doesn't really speed up the program much (but it makes it work on large inputs where the stack would otherwise overflow).
The reason why your second version is slower is that you're appending a shorter list (the one generated using [...]) to a longer list (accumList). This is slower than appending a longer list to a shorter list (because the operation needs to copy the first list).
You can fix it by collecting the elements in the accumulator in a reversed order and then reversing it before returning the result:
let indicesForStepTail start blockSize maxSize =
    let rec indicesForStepInternal istart accumList =
        match istart with
        | i when i > maxSize -> accumList |> List.rev
        | _ ->
            let acc =
                [for j in ((min (istart + blockSize) maxSize) - 1) .. -1 .. istart -> j]
                @ accumList
            indicesForStepInternal (istart + 2 * blockSize) acc
    indicesForStepInternal start []
As you can see, this has the shorter list (generated using [...]) as the first argument to @ and on my machine, it has similar performance to the non-tail-recursive version. Note that the [ ... ] comprehension generates elements in the reversed order - so that they can be reversed back at the end.
You can also write the whole thing more nicely using the F# seq { .. } syntax. You can avoid using the @ operator completely, because it allows you to yield individual elements using yield and perform tail-recursive calls using yield!:
let rec indicesForStepSeq start blockSize maxSize = seq {
    match start with
    | i when i > maxSize -> ()
    | _ ->
        for j in start .. ((min (start + blockSize) maxSize) - 1) do
            yield j
        yield! indicesForStepSeq (start + 2 * blockSize) blockSize maxSize }
This is how I'd write it. When calling it, you just need to add Seq.toList to evaluate the whole lazy sequence. The performance of this version is similar to the first one.
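For example, with the parameters from the question (1, 1 and 20,000), forcing the lazy sequence:
indicesForStepSeq 1 1 20000 |> Seq.toList   // [1; 3; 5; ...; 19999]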
EDIT With the correction from Daniel, the Seq version is actually slightly faster!
In F# the list type is implemented as a singly linked list. Because of this you get different performance for x @ y and y @ x if x and y are of different lengths. That's why you're seeing a difference in performance. (x @ y) has a running time of O(x.Length).
// e.g.
let x = [1;2;3;4]
let y = [5]
If you did x @ y then x (4 elements) would be copied into a new list and its internal next pointer would be set to the existing y list. If you did y @ x then y (1 element) would be copied into a new list and its next pointer would be set to the existing list x.
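Continuing that example (results shown for illustration):
x @ y   // [1; 2; 3; 4; 5] - copies the four cells of x, then links to the existing y
y @ x   // [5; 1; 2; 3; 4] - copies only the single cell of y, then links to the existing x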
I wouldn't use a higher order function to do this. I'd use list comprehension instead.
let indicesForStepTail start blockSize maxSize =
    [
        for block in start .. (blockSize * 2) .. (maxSize - 1) do
            for i in block .. (block + blockSize - 1) do
                yield i
    ]
This looks like the list append is the problem. Append is basically an O(N) operation on the size of the first argument. By accumulating on the left, this operation takes O(N^2) time.
The way this is typically done in functional code seems to be to accumulate the list in reverse order (by accumulating on the right), then at the end, return the reverse of the list.
The first version you have avoids the append problem, but as you point out, is not tail recursive.
In F#, probably the easiest way to solve this problem is with sequences. It is not very functional looking, but you can easily create an infinite sequence following your pattern, and use Seq.take to get the items you are interested in.
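A minimal sketch of that sequence-based approach (my own code, assuming the pattern starts at 0 as in the question; the name indicesForStepInfinite is illustrative):
// Keep an index i when its block number (i / blockSize) is odd: that gives
// skip blockSize, choose blockSize, skip blockSize, ... starting from 0.
let indicesForStepInfinite blockSize =
    Seq.initInfinite id
    |> Seq.filter (fun i -> (i / blockSize) % 2 = 1)

// For input 2 this yields 2, 3, 6, 7, 10, 11, ... as in the question;
// take as many items as needed, or use Seq.takeWhile to stop at a limit.
indicesForStepInfinite 2 |> Seq.take 6 |> Seq.toList   // [2; 3; 6; 7; 10; 11]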

Filter elements in a list by length - Ocaml

I have the following list:
["A";"AA";"ABC";"BCD";"B";"C"]
I am randomly extracting an element from the list. But the element I extract should be of size 3 only not lesser than 3.
I am trying to do this as follows:
let randomnum = (Random.int(List.length (list)));;

let rec code c =
  if (String.length c) = 3 then c
  else (code ((List.nth (list) (randomnum)))) ;;

print_string (code ((List.nth (list) (randomnum)))) ;;
This works fine if randomly a string of length 3 is picked out from the list.
But the program does not terminate if a string of length < 3 is picked up.
I am trying to do a recursive call so that new code keeps getting picked up till we get one of length = 3.
I am unable to figure out why this does not terminate. Nothing gets output by the print statement.
What you probably want to write is
let rec code list =
  let n = Random.int (List.length list) in  (* pick a fresh random index on every call *)
  let s = List.nth list n in
  if String.length s < 3 then code list else s
Note that, depending on the size of the list and the number of strings of length at least 3, you might want to work directly on a list containing only such strings:
let code list =
  let list = List.filter (fun s -> String.length s >= 3) list in
  match list with
  | [] -> raise Not_found
  | _ -> List.nth list (Random.int (List.length list))
This second function is better, as it always terminates, even when there are no strings of length 3 or more.
You only pick a random number once. Say you pick 5. You just keep recursing with 5 over and over and over. You need to get a new random number.
For your code to terminate, it would be better to first filter the list for suitable elements, then take your random number:
let code list =
  let suitables = List.filter (fun x -> String.length x = 3) list in
  match List.length suitables with
  | 0 -> raise Not_found (* no suitable elements at all! *)
  | len -> List.nth suitables (Random.int len)
Otherwise your code could take a very long time to terminate on a large list of elements with size <> 3; or worse, on a list with no element of size 3, it would not terminate at all!
