Why is my Trie lookup slower than that of the standard F# Map's? - performance

So, I just ported the Trie from OCaml. Unfortunately, it runs slower than the standard Map in terms of tryFind. I don't understand this - the trie seems like it should be faster. Is F#'s code libraries built in some special way as to make them faster than the code that the user typically deploys?
Here's the code -
[<RequireQualifiedAccess>]
module Trie
type Node<'k, 'v when 'k : comparison> =
{ TrieMap : Map<'k, Node<'k, 'v>>
TrieKvp : ('k list * 'v) option }
member inline x.IsEmpty = x.TrieKvp.IsNone && x.TrieMap.IsEmpty
let inline make map kvp =
{ TrieMap = map
TrieKvp = kvp }
let inline makeEmpty () : Node<'k, 'v> = make Map.empty None
let inline isEmpty (node : Node<'k, 'v>) = node.IsEmpty
let rec tryFind (key : 'k list) node =
if key.IsEmpty then
match node.TrieKvp with
| Some (_, value) -> Some value
| None -> None
else
let keyHead = key.Head
let keyTail = key.Tail
let optSubNode = Map.tryFind keyHead node.TrieMap
match optSubNode with
| Some subNode -> tryFind keyTail subNode
| None -> None
let inline containsKey key node =
(tryFind key node).IsSome
let rec addInternal (key : 'k list) value node =
if key.IsEmpty then make node.TrieMap (Some (key, value))
else
let keyHead = key.Head
let keyTail = key.Tail
let newTrie =
match Map.tryFind keyHead node.TrieMap with
| Some subTrie -> subTrie
| None -> makeEmpty ()
let newTrie2 = addInternal keyTail value newTrie
make (Map.add keyHead newTrie2 node.TrieMap) node.TrieKvp
let inline add key value node =
addInternal key value node
let rec addMany kvps node =
if Seq.isEmpty kvps then node
else
let kvpHead = Seq.head kvps
let kvpTail = Seq.skip 1 kvps
let newTrie = add (fst kvpHead) (snd kvpHead) node
addMany kvpTail newTrie
let inline ofList kvps =
addMany kvps (makeEmpty ())
let inline ofListBy by kvps =
let pairs = List.map by kvps
ofList pairs
let rec foldInternal folder rev node state =
match node.TrieKvp with
| Some (_, value) -> folder (Map.fold (fun state key value -> foldInternal folder (key :: rev) value state) state node.TrieMap) (List.rev rev) value
| None -> Map.fold (fun state key value -> foldInternal folder (key :: rev) value state) state node.TrieMap
let inline fold folder state node =
foldInternal folder [] node state
let rec map (mapper : 'k list -> 'v -> 'a) (node : Node<'k, 'v>) : Node<'k, 'a> =
match node.TrieKvp with
| Some (key, value) -> make (Map.map (fun _ value -> map mapper value) node.TrieMap) (Some (key, mapper key value))
| None -> make (Map.map (fun _ value -> map mapper value) node.TrieMap) None
let inline toValueList node =
fold (fun state _ value -> value :: state) [] node
let inline singleton (key, value) =
add key value (makeEmpty ())
Here's a performance test that Jon Harrop provided that I find adequate for measuring improvements -
let xs = Array.init 1000000 (fun i -> [i])
let timer = System.Diagnostics.Stopwatch.StartNew()
let mutable t = Trie.makeEmpty()
for i=0 to xs.Length-1 do
t <- Trie.add xs.[i] xs.[i] t
printfn "Trie took %fs to build" timer.Elapsed.TotalSeconds
timer.Restart()
for _ in 1..100 do
for i=0 to xs.Length-1 do
ignore(Trie.tryFind xs.[i])
printfn "Trie took %fs to search" timer.Elapsed.TotalSeconds
let timer = System.Diagnostics.Stopwatch.StartNew()
let mutable t = Map.empty
for i=0 to xs.Length-1 do
t <- Map.add xs.[i] xs.[i] t
printfn "Map took %fs to build" timer.Elapsed.TotalSeconds
timer.Restart()
for _ in 1..100 do
for i=0 to xs.Length-1 do
ignore(Map.tryFind xs.[i])
printfn "Map took %fs to search" timer.Elapsed.TotalSeconds
NOTE: if you have a faster lookup data structure in mind, please note that I need a persistent data structure.

Unfortunately, it runs slower than the standard Map in terms of tryFind. I don't understand this - the trie seems like it should be faster.
A quick benchmark here suggests that your trie is already faster than Map for at least simple case:
do
let n = 0
let xs = Array.init 1000000 (fun i -> [i])
let timer = System.Diagnostics.Stopwatch.StartNew()
let mutable t = Trie.makeEmpty()
for i=0 to xs.Length-1 do
t <- Trie.add xs.[i] xs.[i] t
printfn "Trie took %fs to build" timer.Elapsed.TotalSeconds
timer.Restart()
for _ in 1..100 do
for i=0 to xs.Length-1 do
ignore(Trie.tryFind xs.[i])
printfn "Trie took %fs to search" timer.Elapsed.TotalSeconds
let timer = System.Diagnostics.Stopwatch.StartNew()
let mutable t = Map.empty
for i=0 to xs.Length-1 do
t <- Map.add xs.[i] xs.[i] t
printfn "Map took %fs to build" timer.Elapsed.TotalSeconds
timer.Restart()
for _ in 1..100 do
for i=0 to xs.Length-1 do
ignore(Map.tryFind xs.[i])
printfn "Map took %fs to search" timer.Elapsed.TotalSeconds
I get 4s to build your Trie, 8.7s to build a Map and about 0.7 to search in both cases.
However, there is a lot of room for improvement in your implementation. I recently wrote an article about an optimized generic persistent hash trie implementation in F# that was published here.
Your later comments imply that you only want to use this to map over strings. If so, it would be vastly more efficient to specialize your trie for string keys.
EDIT
KVB suggested that I elaborate on the "room for improvement" so here's some feedback:
Use inline sparingly as an optimization and only on the basis of compelling performance measurements.
Make empty a value rather than a function.
Avoid List.head and List.tail whenever possible. Use pattern matching instead.
Avoid genericity when possible (e.g. hard-code for string keys in this case).

Alright, so after a little more thinking, I hypothesized that the real difference in performance is in the use of lists for keys as opposed to strings. Strings (and array) have much better cache coherency. So, I changed the key from a 'k list to a string and voila! Performance is now actually better than the Map in my application!
Here's the code -
[<RequireQualifiedAccess>]
module StringTrie
type Node<'v> =
{ TrieMap : Map<char, Node<'v>>
TrieKvp : (string * 'v) option }
member inline x.IsEmpty = x.TrieKvp.IsNone && x.TrieMap.IsEmpty
let inline make map kvp =
{ TrieMap = map
TrieKvp = kvp }
let inline makeEmpty () : Node<'v> = make Map.empty None
let inline isEmpty (node : Node<'v>) = node.IsEmpty
let rec tryFindInternal (key : string) index node =
if key.Length = index then
match node.TrieKvp with
| Some (_, value) -> Some value
| None -> None
else
let optSubNode = Map.tryFind key.[index] node.TrieMap
match optSubNode with
| Some subNode -> tryFindInternal key (index + 1) subNode
| None -> None
let inline tryFind (key : string) node =
tryFindInternal key 0 node
let inline containsKey key node =
(tryFind key node).IsSome
let rec addInternal (key : string) index value node =
if key.Length = index then make node.TrieMap (Some (key, value))
else
let char = key.[index]
let newTrie =
match Map.tryFind char node.TrieMap with
| Some subTrie -> subTrie
| None -> makeEmpty ()
let newTrie2 = addInternal key (index + 1) value newTrie
make (Map.add char newTrie2 node.TrieMap) node.TrieKvp
let inline add key value node =
addInternal key 0 value node
let rec addMany kvps node =
if Seq.isEmpty kvps then node
else
let kvpHead = Seq.head kvps
let kvpTail = Seq.skip 1 kvps
let newTrie = add (fst kvpHead) (snd kvpHead) node
addMany kvpTail newTrie
let inline ofList kvps =
addMany kvps (makeEmpty ())
let inline ofListBy by kvps =
let pairs = List.map by kvps
ofList pairs
let rec foldInternal folder rev node state =
match node.TrieKvp with
| Some (_, value) -> folder (Map.fold (fun state key value -> foldInternal folder (key :: rev) value state) state node.TrieMap) (List.rev rev) value
| None -> Map.fold (fun state key value -> foldInternal folder (key :: rev) value state) state node.TrieMap
let inline fold folder state node =
foldInternal folder [] node state
let rec map (mapper : string -> 'v -> 'a) (node : Node<'v>) : Node<'a> =
match node.TrieKvp with
| Some (key, value) -> make (Map.map (fun _ value -> map mapper value) node.TrieMap) (Some (key, mapper key value))
| None -> make (Map.map (fun _ value -> map mapper value) node.TrieMap) None
let inline toValueList node =
fold (fun state _ value -> value :: state) [] node
let inline singleton (key, value) =
add key value (makeEmpty ())
I also built a version that works for arrays in general and is also fast -
[<RequireQualifiedAccess>]
module ArrayTrie
type Node<'k, 'v when 'k : comparison> =
{ TrieMap : Map<'k, Node<'k, 'v>>
TrieKvp : ('k array * 'v) option }
member inline x.IsEmpty = x.TrieKvp.IsNone && x.TrieMap.IsEmpty
let inline make map kvp =
{ TrieMap = map
TrieKvp = kvp }
let inline makeEmpty () : Node<'k, 'v> = make Map.empty None
let inline isEmpty (node : Node<'k, 'v>) = node.IsEmpty
let rec tryFindInternal (key : 'k array) index node =
if key.Length = index then
match node.TrieKvp with
| Some (_, value) -> Some value
| None -> None
else
let optSubNode = Map.tryFind key.[index] node.TrieMap
match optSubNode with
| Some subNode -> tryFindInternal key (index + 1) subNode
| None -> None
let inline tryFind (key : 'k array) node =
tryFindInternal key 0 node
let inline containsKey key node =
(tryFind key node).IsSome
let rec addInternal (key : 'k array) index value node =
if key.Length = index then make node.TrieMap (Some (key, value))
else
let char = key.[index]
let newTrie =
match Map.tryFind char node.TrieMap with
| Some subTrie -> subTrie
| None -> makeEmpty ()
let newTrie2 = addInternal key (index + 1) value newTrie
make (Map.add char newTrie2 node.TrieMap) node.TrieKvp
let inline add key value node =
addInternal key 0 value node
let rec addMany kvps node =
if Seq.isEmpty kvps then node
else
let kvpHead = Seq.head kvps
let kvpTail = Seq.skip 1 kvps
let newTrie = add (fst kvpHead) (snd kvpHead) node
addMany kvpTail newTrie
let inline ofList kvps =
addMany kvps (makeEmpty ())
let inline ofListBy by kvps =
let pairs = List.map by kvps
ofList pairs
let rec foldInternal folder rev node state =
match node.TrieKvp with
| Some (_, value) -> folder (Map.fold (fun state key value -> foldInternal folder (key :: rev) value state) state node.TrieMap) (List.rev rev) value
| None -> Map.fold (fun state key value -> foldInternal folder (key :: rev) value state) state node.TrieMap
let inline fold folder state node =
foldInternal folder [] node state
let rec map (mapper : 'k array -> 'v -> 'a) (node : Node<'k, 'v>) : Node<'k, 'a> =
match node.TrieKvp with
| Some (key, value) -> make (Map.map (fun _ value -> map mapper value) node.TrieMap) (Some (key, mapper key value))
| None -> make (Map.map (fun _ value -> map mapper value) node.TrieMap) None
let inline toValueList node =
fold (fun state _ value -> value :: state) [] node
let inline singleton (key, value) =
add key value (makeEmpty ())
The only thing left that seem like it would improve performance is to get an internal pointer to the string and inc that rather than doing indexes over and over. This doesn't seem easy in F#, but seems at least possible for arrays in C#.

Why wouldn't it be? How about OCaml, is it any faster there? Since your Trie is implemented in terms of Map I would expect it to be slower than Map for at least some inputs. It can still perhaps outperform Map in some cases, for example when the size is very large.
Also, if your primary interest is lookup performance, why not freeze your Trie to use Dictionary-based nodes?

Related

How do I make this Guid-to-hash function more concise?

I came up with a Guid-to-hash function in F# as shown below, but I think it is too verbose. How can I make it more concise?
let digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
let nbase = bigint digits.Length
let zero = bigint.Zero
let hash (g: System.Guid) =
(g.ToByteArray(), [| 0x00uy |])
||> Array.append
|> bigint
|> Array.unfold (fun d ->
if d = zero then
None
else
let (n, r) = bigint.DivRem(d, nbase)
Some (r, n))
|> Array.rev
|> Array.skipWhile ((=) zero)
|> Array.map (fun b -> digits.[int b])
|> System.String
There's a little bit of cleanup you can do like so:
let hash (g: System.Guid) =
[| yield! g.ToByteArray(); 0x00uy |]
|> bigint
|> Array.unfold (fun d ->
if d = zero then
None
else
let (n, r) = bigint.DivRem(d, nbase)
Some (r, n))
|> Array.choose (fun b -> if b = zero then None else Some digits[int b])
|> System.String
But in general I'm not sure how to make this more succinct based on how I understand you want your hash function to operate.
That being said, if I wanted to hash a guid I'd just use the built-in hash function.
I wouldn't call this a hash function, as its output is usually fixed length and does not uniquely identify keys (but still has good uniformity over the key space). It's more like a conversion of the data type nigint to and from other data types, System.Guid and string (with an encoding added).
For conciseness, it appears that recursion with a list accumulator is quite suitable, as it does also the reversal to get the most significant character into the first position.
module Bigint =
let toString alphabet input =
let nbase = bigint(String.length alphabet)
let rec aux acc x =
if x = 0I then acc
else aux (alphabet.[int(x % nbase)]::acc) (x / nbase)
System.String(Array.ofList(aux [] input))
let fromString alphabet (input : string) =
let nbase = bigint(String.length alphabet)
let d = alphabet |> Seq.mapi (fun i c -> c, bigint i) |> dict
input.ToCharArray()
|> Array.rev
|> Array.mapi (fun i c -> pown nbase i * d.[c])
|> Array.sum
let fromGuid (g : System.Guid) =
bigint[| yield! g.ToByteArray(); yield 0uy |]
let toGuid (bi : bigint) =
let b = bi.ToByteArray()
if Array.length b > 16 then b.[..15]
else Array.append (Array.zeroCreate (16 - Array.length b)) b
|> fun a -> System.Guid a
The output is variable length, the zero Guid will give you an empty string. Testing:
let alphabet2 = "01"
Bigint.toString alphabet2 (bigint 0x181) // 385
// val it : System.String = "110000001"
|> Bigint.fromString alphabet2
// val it : System.Numerics.BigInteger = 385
let alphabet16 =
System.String(Array.concat[|[|'0'..'9'|];[|'A'..'F'|]|])
Bigint.toString alphabet16 (bigint 0x181) // 385
// val it : System.String = "181"
|> Bigint.fromString alphabet16
// val it : System.Numerics.BigInteger = 385
let alphabet62 =
System.String(Array.concat[[|'0'..'9'|];[|'A'..'Z'|];[|'a'..'z'|]])
let guid0 = System.Guid()
// val guid : System.Guid = 00000000-0000-0000-0000-000000000000
Bigint.fromGuid guid0
|> Bigint.toString alphabet62
// val it : System.String = ""
|> Bigint.fromString alphabet62
// val it : System.Numerics.BigInteger = 0
|> Bigint.toGuid
// val guid : System.Guid = 00000000-0000-0000-0000-000000000000
let guid1 = System.Guid.NewGuid()
// val guid1 : System.Guid = bf9f89db-e307-4dcb-b734-a4bbb61a8365
Bigint.fromGuid guid1
|> Bigint.toString alphabet62
// val it : System.String = "35Y8fherqxCBOydJ1VN9UR"
|> Bigint.fromString alphabet62
// val it : System.Numerics.BigInteger =
// 134932760282992047737587898291171854811
|> Bigint.toGuid
// val it : System.Guid = bf9f89db-e307-4dcb-b734-a4bbb61a8365
If you want the conversion with a fixed length, Base64 encoding might be an alternative:
System.Convert.ToBase64String(guid1.ToByteArray()).[..21]
// val it : string = "24mfvwfjy023NKS7thqDZQ"
|> fun s -> System.Guid(System.Convert.FromBase64String(s + "=="))
// val it : System.Guid = bf9f89db-e307-4dcb-b734-a4bbb61a8365

F# - Algorithm and strings

Let's says I have a string of a length N that contains only 0 or 1. I want to split that string in multiples strings and each string should contains only one digit.
Example:
00011010111
Should be split into:
000
11
0
1
0
111
The only solution I can think of if using a for loop with a string builder (Written in pseudo code below, more c# like sorry):
result = new list<string>
tmpChar = ""
tmpString = ""
for each character c in MyString
if tmpchar != c
if tmpsString != ""
result.add tmpString
tmpString = ""
endIf
tmpchar = c
endIf
tmpString += tmpChar
endFor
Do you have any other solution and maybe a clever solution that use a more functional approach?
I think Seq.scan would be a good fit for this, this is a very procedural problem in nature, preserving the order like that. But here is code that I believe does what you are asking.
"00011010111"
|> Seq.scan (fun (s, i) x ->
match s with
| Some p when p = x -> Some x, i
| _ -> Some x, i + 1 ) (None, 0)
|> Seq.countBy id
|> Seq.choose (function
| (Some t, _), n -> Some(t, n)
| _ -> None )
|> Seq.toList
Perhaps something along the lines of:
let result =
let rec groupWhileSame xs result =
match xs with
| a when a |> Seq.isEmpty -> result
| _ ->
let head = xs |> Seq.head
let str = xs |> Seq.takeWhile ((=) head)
let rest = xs |> Seq.skipWhile ((=) head)
groupWhileSame rest (Seq.append result [str])
groupWhileSame (myStr) []
Seq.fold (fun (acc:(string list)) x ->
match acc with
| y::rst when y.StartsWith(string x) -> (string x) + y::rst
| _ -> (string x)::acc)
[]
"00011010111"
Consider this function (which is generic):
let chunk s =
if Seq.isEmpty s then []
else
let rec chunk items chunks =
if Seq.isEmpty items then chunks
else
let chunks' =
match chunks with
| [] -> [(Seq.head items, 1)]
| x::xs ->
let c,n = x in let c' = Seq.head items in
if c = c' then (c, n + 1) :: xs else (c', 1) :: x :: xs
chunk (Seq.tail items) chunks'
chunk s [] |> List.rev
It returns a list of tuples, where each tuple represents an item and its repetitions.
So
"00011010111" |> Seq.toList |> chunk
actually returns
[('0', 3); ('1', 2); ('0', 1); ('1', 1); ('0', 1); ('1', 3)]
Basically, we're doing run length encoding (which is admittedly a bit wasteful in the case of your example string).
To get the list of strings that you want, we use code like following:
"00011010111"
|> Seq.toList
|> chunk
|> List.map (fun x -> let c,n = x in new string(c, n))
Here's a working version of OP's proposal with light syntax:
let chunk (s: string) =
let result = System.Collections.Generic.List<string>()
let mutable tmpChar = ""
let mutable tmpString = ""
for c in s do
if tmpChar <> string c then
if tmpString <> "" then
result.Add tmpString
tmpString <- ""
tmpChar <- string c
tmpString <- tmpString + tmpChar
result.Add tmpString
result
No attempt was made to follow a functional style.

How to abstract non-linear list iteration schemes into a re-usable algorithm?

On the odd chance, that someone has a brilliant idea...
I am not sure if there is a good way to generalize that.
EDIT: I think it might be nice to explain exactly what the inputs and outputs are. The code below is only how I approached the solution.
Inputs: data, recipe
data: set of string, string list here also called "set of named lists"
recipe: list of commands
Command Print (literal|list reference)
Adds the literal to the output or if it is a list reference, it adds the head of the referenced list to the output.
Command While (list reference)
when referenced list not empty --> next command
when referenced list empty --> skip entries in recipe list past the matching Wend.
Command Wend (list reference)
replace referenced list with tail (reference list)
when referenced list is empty, next command
when referenced list is not empty, next command is the matching while above.
Outputs: string list
The best answer is the implementation of that which is shortest and which allows to re-use that algorithm in new contexts.
This is not just a programming problem for the fun of it, btw. It is basically what happens if you try to implement data driven text templating.
The code below is my attempt to solve this problem.
The first code snippet is a non-generalized solution.
The second code snippet is an attempt to isolate the algorithm.
If you play with the code, simply paste the second snippet below the first snippet and both versions are working.
The whole topic is about understanding better how to separate the iteration algorithm from the rest of the code and then to simply apply it, in contrast of having all the other code within.
Would it not be great, if there was a way to abstract the way the statements are being processed and the looping of the while/wend, such,
that it can be reused in my main code, just as I keep re-using other "iteration schemes", such as List.map?
The commonalities between my main code and this study are:
An evolving "environment" which is threaded through all steps of the computation.
Collections, which need to be iterated in a well-formed nested manner. (Malformed would be: while x while y wend x wend y)
A series of "execution steps" form the body of each of those "while wend" loops.
Done in a "pure" manner. As you will note, nothing is mutable in the study. Want to keep it like that.
Each "While" introduces a new scope (as for binding values), which is discarded again, once the while loop is done.
So, I am looking for something like:
run: CommandClassifier -> commandExecutor -> Command list -> EnvType -> EnvType
where
CommandClassifier could be a function of the form Command -> NORMAL|LOOP_START|LOOP_END
and commandexecutor: Command -> EnvType -> EnvType
Of course, nesting of those while-blocks would not be limited to 2 (just tried to keep the testProgram() small).
SideNote: the "commands list" is an AST from a preceding parser run, but that should not really matter.
type MiniLanguage =
| Print of string
| While of string
| Wend of string
let testProgram =
[ Print("Hello, I am your Mini Language program")
While("names")
Print("<names>")
While("pets")
Print("<pets>")
Wend("pets")
Print("Done with pets.")
Wend("names")
Print("Done with names.")
]
type MiniEnvironment = { Bindings : Map<string,string>; Collections : Map<string, string list> }
let init collections =
{ Bindings = Map.empty; Collections = Map.ofList collections}
let bind n v env =
let newBindings =
env.Bindings
|> Map.remove n
|> Map.add n v
{ env with Bindings = newBindings; }
let unbind n env =
{ env with Bindings = Map.remove n env.Bindings; }
let bindingValue n env =
if env.Bindings.ContainsKey n then
Some(env.Bindings.Item n)
else
None
let consumeFirstFromCollection n env =
if env.Collections.ContainsKey n then
let coll = env.Collections.Item n
match coll with
| [] -> env |> unbind n
| _ ->
let first = coll.Head
let newCollections =
env.Collections
|> Map.remove n
|> Map.add n coll.Tail
{ env with Collections = newCollections }
|> bind n first
else failwith ("Unknown collection: " + n)
// All do functions take env - the execution environment - as last argument.
// All do functions return (a new) env as single return parameter.
let rec doPrint (s : string) env =
if s.StartsWith("<") && s.EndsWith(">") then
match bindingValue (s.Substring (1, s.Length - 2 )) env with
| Some(v) -> v
| _ -> s
else s
|> printfn "%s"
env
let rec skipPastWend name code =
match code with
| (Wend(cl) :: rest) when cl = name -> rest
| [] -> failwith "No Wend found!"
| (_ :: rest) -> skipPastWend name rest
let rec doWhileX name code env =
match code with
| (Print(s) :: rest) -> env |> (doPrint s) |> doWhileX name rest
| (While(cn) :: rest) -> env |> doWhile cn rest |> ignore; env |> doWhileX name (skipPastWend cn rest)
| (Wend(cn) :: rest) when cn = name -> env
| [] -> failwith ("While without Wend for: " + name)
| _ -> failwith ("nested while refering to same collection!")
and doWhile name code env =
let e0 = env |> consumeFirstFromCollection name
match bindingValue name e0 with
| Some(s) ->
e0 |> doWhileX name code |> doWhile name code
| None -> env
let rec run (program : MiniLanguage list) env =
match program with
| (Print(s) :: rest) -> env |> (doPrint s) |> run rest
| (While(cn) :: rest) ->
env
|> doWhile cn rest |> ignore
env |> run (skipPastWend cn program)
| (Wend(cn) :: rest) -> failwith "wend found in run()"
| [] -> env
let test() =
init [ "names", ["name1"; "name2"; "name3"; ]; "pets", ["pet1"; "pet2"] ]
|> run testProgram
|> printfn "%A"
(*
Running test() yields:
Hello, I am your Mini Language program
name1
pet1
pet2
Done with pets.
name2
pet1
pet2
Done with pets.
name3
pet1
pet2
Done with pets.
Done with names.
{Bindings = map [];
Collections =
map [("names", ["name1"; "name2"; "name3"]); ("pets", ["pet1"; "pet2"])];}
*)
Here my first version of isolating the algorithm. The number of callbacks is not entirely pretty. Can anyone come up with something simpler?
// The only function I had to "modify" to work with new "generalized" algorithm.
let consumeFirstFromCollection1 n env =
if env.Collections.ContainsKey n then
let coll = env.Collections.Item n
match coll with
| [] -> (env |> unbind n , false)
| _ ->
let first = coll.Head
let newCollections =
env.Collections
|> Map.remove n
|> Map.add n coll.Tail
({ env with Collections = newCollections }
|> bind n first , true)
else failwith ("Unknown collection: " + n)
type NamedList<'n,'t when 'n : comparison> = 'n * List<'t>
type Action<'a,'c> = 'c -> 'a -> 'a
type LoopPreparer<'a,'c> = 'c -> 'a -> 'a * bool
type CommandType = | RUN | BEGIN | END
type CommandClassifier<'c> = 'c -> CommandType
type Skipper<'c> = 'c -> List<'c> -> List<'c>
type InterpreterContext<'a,'c> =
{ classifier : CommandClassifier<'c>
executor : Action<'a,'c>
skipper : Skipper<'c>
prepareLoop : LoopPreparer<'a,'c>
isMatchingEnd : 'c -> 'c -> bool
}
let interpret (context : InterpreterContext<'a,'c>) (program : 'c list) (env : 'a) : 'a =
let rec loop front (code : 'c list) e =
let e0,hasData = e |> context.prepareLoop front
if hasData
then
e0
|> loop1 front (code)
|> loop front (code)
else e
and loop1 front code e =
match code with
| x :: more when (context.classifier x) = RUN ->
//printfn "RUN %A" x
e |> context.executor x |> loop1 front more
| x :: more when (context.classifier x) = BEGIN ->
//printfn "BEGIN %A" x
e |> loop x more |> ignore
e |> loop1 front (context.skipper x more)
| x :: more when (((context.classifier x) = END) && (context.isMatchingEnd front x)) -> /// && (context.isMatchingEnd front x)
//printfn "END %A" x
e
| [] -> failwith "No END."
| _ -> failwith "TODO: Not sure which case this is. But it is not a legal one!"
let rec interpr code e =
match code with
| [] -> e
| (first :: rest) ->
match context.classifier first with
| RUN -> env |> context.executor first |> interpr rest
| BEGIN ->
e |> loop first rest |> ignore
e |> interpr (context.skipper first rest)
| END -> failwith "END without BEGIN."
interpr program env
let test1() =
let context : InterpreterContext<MiniEnvironment,MiniLanguage> =
{ classifier = fun c-> match c with | MiniLanguage.Print(_) -> RUN | MiniLanguage.While(_) -> BEGIN | MiniLanguage.Wend(_) -> END;
executor = fun c env -> match c with | Print(s) -> doPrint s env | _ -> failwith "Not a known command.";
skipper = fun c cl -> match c with | While(n) -> skipPastWend n cl | _ -> failwith "first arg of skipper SHALL be While!"
prepareLoop = fun c env -> match c with | While(n) -> (consumeFirstFromCollection1 n env) | _ -> failwith "first arg of skipper SHALL be While!"
isMatchingEnd = fun cwhile cx -> match cwhile,cx with | (While(n),Wend(p)) when n = p -> true | _ -> false
}
init [ "names", ["name1"; "name2"; "name3"; ]; "pets", ["pet1"; "pet2"] ]
|> interpret context testProgram
|> printfn "%A"

Solving knapsack prob in F#: performance

I found an article:
Solving the 0-1 knapsack problem using continuation-passing style with memoization in F#
about knapsack problem implemented in F#. As I'm learning this language, I found this really interesting and tried to investigate this a bit. Here's the code I crafted:
open System
open System.IO
open System.Collections.Generic
let parseToTuple (line : string) =
let parsedLine = line.Split(' ') |> Array.filter(not << String.IsNullOrWhiteSpace) |> Array.map Int32.Parse
(parsedLine.[0], parsedLine.[1])
let memoize f =
let cache = Dictionary<_, _>()
fun x ->
if cache.ContainsKey(x)
then cache.[x]
else
let res = f x
cache.[x] <- res
res
type Item =
{
Value : int
Size : int
}
type ContinuationBuilder() =
member b.Bind(x, f) = fun k -> x (fun x -> f x k)
member b.Return x = fun k -> k x
member b.ReturnFrom x = x
let cont = ContinuationBuilder()
let set1 =
[
(4, 11)
(8, 4)
(10, 5)
(15, 8)
(4, 3)
]
let set2 =
[
(50, 341045); (1906, 4912); (41516, 99732); (23527, 56554); (559, 1818); (45136, 108372); (2625, 6750); (492, 1484)
(1086, 3072); (5516, 13532); (4875, 12050); (7570, 18440); (4436, 10972); (620, 1940); (50897, 122094); (2129, 5558)
(4265, 10630); (706, 2112); (2721, 6942); (16494, 39888); (29688, 71276); (3383, 8466); (2181, 5662); (96601, 231302)
(1795, 4690); (7512, 18324); (1242, 3384); (2889, 7278); (2133, 5566); (103, 706); (4446, 10992); (11326, 27552)
(3024, 7548); (217, 934); (13269, 32038); (281, 1062); (77174, 184848); (952, 2604); (15572, 37644); (566, 1832)
(4103, 10306); (313, 1126); (14393, 34886); (1313, 3526); (348, 1196); (419, 1338); (246, 992); (445, 1390)
(23552, 56804); (23552, 56804); (67, 634)
]
[<EntryPoint>]
let main args =
// prepare list of items from a file args.[0]
let header, items = set1
|> function
| h::t -> h, t
| _ -> raise (Exception("Wrong data format"))
let N, K = header
printfn "N = %d, K = %d" N K
let items = List.map (fun x -> {Value = fst x ; Size = snd x}) items |> Array.ofList
let rec combinations =
let innerSolver key =
cont
{
match key with
| (i, k) when i = 0 || k = 0 -> return 0
| (i, k) when items.[i-1].Size > k -> return! combinations (i-1, k)
| (i, k) -> let item = items.[i-1]
let! v1 = combinations (i-1, k)
let! beforeItem = combinations (i-1, k-item.Size)
let v2 = beforeItem + item.Value
return max v1 v2
}
memoize innerSolver
let res = combinations (N, K) id
printfn "%d" res
0
However, the problem with this implementation is that it's veeeery slow (in practice I'm unable to solve problem with 50 items and capacity of ~300000, which gets solved by my naive implementation in C# in less than 1s).
Could you tell me if I made a mistake somewhere? Or maybe the implementation is correct and this is simply the inefficient way of solving this problem.
When you naively apply a generic memoizer like this, and use continuation passing, the values in your memoization cache are continuations, not regular "final" results. Thus, when you get a cache hit, you aren't getting back a finalized result, you are getting back some function which promises to compute a result when you invoke it. This invocation might be expensive, might invoke various other continuations, might ultimately hit the memoization cache again itself, etc.
Effectively memoizing continuation-passing functions such that a) the caching works to full effect and b) the function remains tail-recursive is quite difficult. Read this discussion and come back when you fully understand it all. ;-)
The author of the blog post you linked is using a more sophisticated, less generic memoizer which is specially fitted to the problem. Admittedly, I don't fully grok it yet (code on the blog is incomplete/broken, so hard to try it out), but I think the gist of it is that it "forces" the chain of continuations before caching the final integer result.
To illustrate the point, here's a quick refactor of your code which is fully self-contained and traces out relevant info:
open System
open System.Collections.Generic
let mutable cacheHits = 0
let mutable cacheMisses = 0
let memoize f =
let cache = Dictionary<_, _>()
fun x ->
match cache.TryGetValue(x) with
| (true, v) ->
cacheHits <- cacheHits + 1
printfn "Hit for %A - Result is %A" x v
v
| _ ->
cacheMisses <- cacheMisses + 1
printfn "Miss for %A" x
let res = f x
cache.[x] <- res
res
type Item = { Value : int; Size : int }
type ContinuationBuilder() =
member b.Bind(x, f) = fun k -> x (fun x -> f x k)
member b.Return x = fun k -> k x
member b.ReturnFrom x = x
let cont = ContinuationBuilder()
let genItems n =
[| for i = 1 to n do
let size = i % 5
let value = (size * i)
yield { Value = value; Size = size }
|]
let N, K = (5, 100)
printfn "N = %d, K = %d" N K
let items = genItems N
let rec combinations_cont =
memoize (
fun key ->
cont {
match key with
| (0, _) | (_, 0) -> return 0
| (i, k) when items.[i-1].Size > k -> return! combinations_cont (i - 1, k)
| (i, k) -> let item = items.[i-1]
let! v1 = combinations_cont (i-1, k)
let! beforeItem = combinations_cont (i-1, k - item.Size)
let v2 = beforeItem + item.Value
return max v1 v2
}
)
let res = combinations_cont (N, K) id
printfn "Answer: %d" res
printfn "Memo hits: %d" cacheHits
printfn "Memo misses: %d" cacheMisses
printfn ""
let rec combinations_plain =
memoize (
fun key ->
match key with
| (i, k) when i = 0 || k = 0 -> 0
| (i, k) when items.[i-1].Size > k -> combinations_plain (i-1, k)
| (i, k) -> let item = items.[i-1]
let v1 = combinations_plain (i-1, k)
let beforeItem = combinations_plain (i-1, k-item.Size)
let v2 = beforeItem + item.Value
max v1 v2
)
cacheHits <- 0
cacheMisses <- 0
let res2 = combinations_plain (N, K)
printfn "Answer: %d" res2
printfn "Memo hits: %d" cacheHits
printfn "Memo misses: %d" cacheMisses
As you can see, the CPS version is caching continuations (not integers), and there are is a lot of extra activity going on toward the end as the continuations are invoked.
If you boost the problem size to let (N, K) = (20, 100) (and remove the printfn statements in the memoizer), you will see that the CPS version ends up doing over 1 million cache lookups, compared to plain version doing only a few hundred.
From running this code in FSI:
open System
open System.Diagnostics
open System.Collections.Generic
let time f =
System.GC.Collect()
let sw = Stopwatch.StartNew()
let r = f()
sw.Stop()
printfn "Took: %f" sw.Elapsed.TotalMilliseconds
r
let mutable cacheHits = 0
let mutable cacheMisses = 0
let memoize f =
let cache = Dictionary<_, _>()
fun x ->
match cache.TryGetValue(x) with
| (true, v) ->
cacheHits <- cacheHits + 1
//printfn "Hit for %A - Result is %A" x v
v
| _ ->
cacheMisses <- cacheMisses + 1
//printfn "Miss for %A" x
let res = f x
cache.[x] <- res
res
type Item = { Value : int; Size : int }
type ContinuationBuilder() =
member b.Bind(x, f) = fun k -> x (fun x -> f x k)
member b.Return x = fun k -> k x
member b.ReturnFrom x = x
let cont = ContinuationBuilder()
let genItems n =
[| for i = 1 to n do
let size = i % 5
let value = (size * i)
yield { Value = value; Size = size }
|]
let N, K = (80, 400)
printfn "N = %d, K = %d" N K
let items = genItems N
//let rec combinations_cont =
// memoize (
// fun key ->
// cont {
// match key with
// | (0, _) | (_, 0) -> return 0
// | (i, k) when items.[i-1].Size > k -> return! combinations_cont (i - 1, k)
// | (i, k) -> let item = items.[i-1]
// let! v1 = combinations_cont (i-1, k)
// let! beforeItem = combinations_cont (i-1, k - item.Size)
// let v2 = beforeItem + item.Value
// return max v1 v2
// }
// )
//
//
//cacheHits <- 0
//cacheMisses <- 0
//let res = time(fun () -> combinations_cont (N, K) id)
//printfn "Answer: %d" res
//printfn "Memo hits: %d" cacheHits
//printfn "Memo misses: %d" cacheMisses
//printfn ""
let rec combinations_plain =
memoize (
fun key ->
match key with
| (i, k) when i = 0 || k = 0 -> 0
| (i, k) when items.[i-1].Size > k -> combinations_plain (i-1, k)
| (i, k) -> let item = items.[i-1]
let v1 = combinations_plain (i-1, k)
let beforeItem = combinations_plain (i-1, k-item.Size)
let v2 = beforeItem + item.Value
max v1 v2
)
cacheHits <- 0
cacheMisses <- 0
printfn "combinations_plain"
let res2 = time (fun () -> combinations_plain (N, K))
printfn "Answer: %d" res2
printfn "Memo hits: %d" cacheHits
printfn "Memo misses: %d" cacheMisses
printfn ""
let recursivelyMemoize f =
let cache = Dictionary<_, _>()
let rec memoizeAux x =
match cache.TryGetValue(x) with
| (true, v) ->
cacheHits <- cacheHits + 1
//printfn "Hit for %A - Result is %A" x v
v
| _ ->
cacheMisses <- cacheMisses + 1
//printfn "Miss for %A" x
let res = f memoizeAux x
cache.[x] <- res
res
memoizeAux
let combinations_plain2 =
let combinations_plain2Aux combinations_plain2Aux key =
match key with
| (i, k) when i = 0 || k = 0 -> 0
| (i, k) when items.[i-1].Size > k -> combinations_plain2Aux (i-1, k)
| (i, k) -> let item = items.[i-1]
let v1 = combinations_plain2Aux (i-1, k)
let beforeItem = combinations_plain2Aux (i-1, k-item.Size)
let v2 = beforeItem + item.Value
max v1 v2
let memoized = recursivelyMemoize combinations_plain2Aux
fun x -> memoized x
cacheHits <- 0
cacheMisses <- 0
printfn "combinations_plain2"
let res3 = time (fun () -> combinations_plain2 (N, K))
printfn "Answer: %d" res3
printfn "Memo hits: %d" cacheHits
printfn "Memo misses: %d" cacheMisses
printfn ""
let recursivelyMemoizeCont f =
let cache = Dictionary HashIdentity.Structural
let rec memoizeAux x k =
match cache.TryGetValue(x) with
| (true, v) ->
cacheHits <- cacheHits + 1
//printfn "Hit for %A - Result is %A" x v
k v
| _ ->
cacheMisses <- cacheMisses + 1
//printfn "Miss for %A" x
f memoizeAux x (fun y ->
cache.[x] <- y
k y)
memoizeAux
let combinations_cont2 =
let combinations_cont2Aux combinations_cont2Aux key =
cont {
match key with
| (0, _) | (_, 0) -> return 0
| (i, k) when items.[i-1].Size > k -> return! combinations_cont2Aux (i - 1, k)
| (i, k) -> let item = items.[i-1]
let! v1 = combinations_cont2Aux (i-1, k)
let! beforeItem = combinations_cont2Aux (i-1, k - item.Size)
let v2 = beforeItem + item.Value
return max v1 v2
}
let memoized = recursivelyMemoizeCont combinations_cont2Aux
fun x -> memoized x id
cacheHits <- 0
cacheMisses <- 0
printfn "combinations_cont2"
let res4 = time (fun () -> combinations_cont2 (N, K))
printfn "Answer: %d" res4
printfn "Memo hits: %d" cacheHits
printfn "Memo misses: %d" cacheMisses
printfn ""
I get these results:
N = 80, K = 400
combinations_plain
Took: 7.191000
Answer: 6480
Memo hits: 6231
Memo misses: 6552
combinations_plain2
Took: 6.310800
Answer: 6480
Memo hits: 6231
Memo misses: 6552
combinations_cont2
Took: 17.021200
Answer: 6480
Memo hits: 6231
Memo misses: 6552
combinations_plain is from latkin's answer.
combinations_plain2 exposes the recursive memoization step explicitly.
combinations_cont2 adapts the recursive memoization function into one that memoizes the continuation results.
combinations_cont2 works by intercepting the result in the continuation before passing it on to the actual continuation. Subsequent calls on the same key provide a continuation and this continuation is fed the answer we intercepted originally.
This demonstrates that we are able to:
Memoize using continuation passing style.
Achieve similar (ish) performance characteristics to the vanilla memoized version.
I hope this clears things up a little. Sorry, my blog code snippet was incomplete (I think I might have lost it when reformatting recently).

Filter an array or list by consecutive pairs based on a matching rule

This is probably trivial, and I do have a solution but I'm not happy with it. Somehow, (much) simpler forms don't seem to work and it gets messy around the corner cases (either first, or last matching pairs in a row).
To keep it simple, let's define the matching rule as any two or more numbers that have a difference of two. Example:
> filterTwins [1; 2; 4; 6; 8; 10; 15; 17]
val it : int list = [2; 4; 6; 8; 10; 15; 17]
The code I currently use is this, which just feels sloppy and overweight:
let filterTwins list =
let func item acc =
let prevItem, resultList = acc
match prevItem, resultList with
| 0, []
-> item, []
| var, [] when var - 2 = item
-> item, item::var::resultList
| var, hd::tl when var - 2 = item && hd <> var
-> item, item::var::resultList
| var, _ when var - 2 = item
-> item, item::resultList
| _
-> item, resultList
List.foldBack func list (0, [])
|> snd
I intended my own original exercise to experiment with List.foldBack, large lists and parallel programming (which went well) but ended up messing with the "easy" part...
Guide through the answers
Daniel's last, 113 characters*, easy to follow, slow
Kvb's 2nd, 106 characters* (if I include the function), easy, but return value requires work
Stephen's 2nd, 397 characters*, long winded and comparably complex, but fastest
Abel's, 155 characters*, based on Daniel's, allows duplicates (this wasn't a necessity, btw) and is relatively fast.
There were more answers, but the above were the most distinct, I believe. Hope I didn't hurt anybody's feelings by accepting Daniel's answer as solution: each and every one solution deserves to be the selected answer(!).
* counting done with function names as one character
Would this do what you want?
let filterTwins l =
let rec filter l acc flag =
match l with
| [] -> List.rev acc
| a :: b :: rest when b - 2 = a ->
filter (b::rest) (if flag then b::acc else b::a::acc) true
| _ :: t -> filter t acc false
filter l [] false
This is terribly inefficient, but here's another approach using more built-in functions:
let filterTwinsSimple l =
l
|> Seq.pairwise
|> Seq.filter (fun (a, b) -> b - 2 = a)
|> Seq.collect (fun (a, b) -> [a; b])
|> Seq.distinct
|> Seq.toList
Maybe slightly better:
let filterTwinsSimple l =
seq {
for (a, b) in Seq.pairwise l do
if b - 2 = a then
yield a
yield b
}
|> Seq.distinct
|> Seq.toList
How about this?
let filterPairs f =
let rec filter keepHead = function
| x::(y::_ as xs) when f x y -> x::(filter true xs)
| x::xs ->
let rest = filter false xs
if keepHead then x::rest else rest
| _ -> []
filter false
let test = filterPairs (fun x y -> y - x = 2) [1; 2; 4; 6; 8; 10; 15; 17]
Or if all of your list's items are unique, you could do this:
let rec filterPairs f s =
s
|> Seq.windowed 2
|> Seq.filter (fun [|a;b|] -> f a b)
|> Seq.concat
|> Seq.distinct
let test = filterPairs (fun x y -> y - x = 2) [1; 2; 4; 6; 8; 10; 15; 17]
EDIT
Or here's another alternative which I find elegant. First define a function for breaking a list into a list of groups of consecutive items satisfying a predicate:
let rec groupConsec f = function
| [] -> []
| x::(y::_ as xs) when f x y ->
let (gp::gps) = groupConsec f xs
(x::gp)::gps
| x::xs -> [x]::(groupConsec f xs)
Then, build your function by collecting all results back together, discarding any singletons:
let filterPairs f =
groupConsec f
>> List.collect (function | [_] -> [] | l -> l)
let test = filterPairs (fun x y -> y - x = 2) [1; 2; 4; 6; 8; 10; 15; 17]
The following solution is in the spirit of your own, but I use a discriminate union to encapsulate aspects of the algorithm and reign in the madness a bit:
type status =
| Keep of int
| Skip of int
| Tail
let filterTwins xl =
(Tail, [])
|> List.foldBack
(fun cur (prev, acc) ->
match prev with
| Skip(prev) when prev - cur = 2 -> (Keep(cur), cur::prev::acc)
| Keep(prev) when prev - cur = 2 -> (Keep(cur), cur::acc)
| _ -> (Skip(cur), acc))
xl
|> snd
Here's another solution which uses a similar discriminate union strategy as my other answer but it works on sequences lazily so you can watch those twin (primes?) roll in as they come:
type status =
| KeepTwo of int * int
| KeepOne of int
| SkipOne of int
| Head
let filterTwins xl =
let xl' =
Seq.scan
(fun prev cur ->
match prev with
| KeepTwo(_,prev) | KeepOne prev when cur - prev = 2 ->
KeepOne cur
| SkipOne prev when cur - prev = 2 ->
KeepTwo(prev,cur)
| _ ->
SkipOne cur)
Head
xl
seq {
for x in xl' do
match x with
| KeepTwo(a,b) -> yield a; yield b
| KeepOne b -> yield b
| _ -> ()
}
for completeness sake, I'll answer this with what I eventually came up with, based on the friendly suggestions in this thread.
The benefits of this approach are that it doesn't need Seq.distinct, which I believe is an improvement as it allows for duplicates. However, it still needs List.rev which doesn't make it the fastest. Nor is it the most succinct code (see comparison of solution in question itself).
let filterTwins l =
l
|> Seq.pairwise
|> Seq.fold (fun a (x, y) ->
if y - x = 2 then (if List.head a = x then y::a else y::x::a)
else a) [0]
|> List.rev
|> List.tail

Resources