How do I make this Guid-to-hash function more concise? - algorithm

I came up with a Guid-to-hash function in F# as shown below, but I think it is too verbose. How can I make it more concise?
let digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
let nbase = bigint digits.Length
let zero = bigint.Zero
let hash (g: System.Guid) =
(g.ToByteArray(), [| 0x00uy |])
||> Array.append
|> bigint
|> Array.unfold (fun d ->
if d = zero then
None
else
let (n, r) = bigint.DivRem(d, nbase)
Some (r, n))
|> Array.rev
|> Array.skipWhile ((=) zero)
|> Array.map (fun b -> digits.[int b])
|> System.String

There's a little bit of cleanup you can do like so:
let hash (g: System.Guid) =
[| yield! g.ToByteArray(); 0x00uy |]
|> bigint
|> Array.unfold (fun d ->
if d = zero then
None
else
let (n, r) = bigint.DivRem(d, nbase)
Some (r, n))
|> Array.choose (fun b -> if b = zero then None else Some digits[int b])
|> System.String
But in general I'm not sure how to make this more succinct based on how I understand you want your hash function to operate.
That being said, if I wanted to hash a guid I'd just use the built-in hash function.

I wouldn't call this a hash function, as its output is usually fixed length and does not uniquely identify keys (but still has good uniformity over the key space). It's more like a conversion of the data type nigint to and from other data types, System.Guid and string (with an encoding added).
For conciseness, it appears that recursion with a list accumulator is quite suitable, as it does also the reversal to get the most significant character into the first position.
module Bigint =
let toString alphabet input =
let nbase = bigint(String.length alphabet)
let rec aux acc x =
if x = 0I then acc
else aux (alphabet.[int(x % nbase)]::acc) (x / nbase)
System.String(Array.ofList(aux [] input))
let fromString alphabet (input : string) =
let nbase = bigint(String.length alphabet)
let d = alphabet |> Seq.mapi (fun i c -> c, bigint i) |> dict
input.ToCharArray()
|> Array.rev
|> Array.mapi (fun i c -> pown nbase i * d.[c])
|> Array.sum
let fromGuid (g : System.Guid) =
bigint[| yield! g.ToByteArray(); yield 0uy |]
let toGuid (bi : bigint) =
let b = bi.ToByteArray()
if Array.length b > 16 then b.[..15]
else Array.append (Array.zeroCreate (16 - Array.length b)) b
|> fun a -> System.Guid a
The output is variable length, the zero Guid will give you an empty string. Testing:
let alphabet2 = "01"
Bigint.toString alphabet2 (bigint 0x181) // 385
// val it : System.String = "110000001"
|> Bigint.fromString alphabet2
// val it : System.Numerics.BigInteger = 385
let alphabet16 =
System.String(Array.concat[|[|'0'..'9'|];[|'A'..'F'|]|])
Bigint.toString alphabet16 (bigint 0x181) // 385
// val it : System.String = "181"
|> Bigint.fromString alphabet16
// val it : System.Numerics.BigInteger = 385
let alphabet62 =
System.String(Array.concat[[|'0'..'9'|];[|'A'..'Z'|];[|'a'..'z'|]])
let guid0 = System.Guid()
// val guid : System.Guid = 00000000-0000-0000-0000-000000000000
Bigint.fromGuid guid0
|> Bigint.toString alphabet62
// val it : System.String = ""
|> Bigint.fromString alphabet62
// val it : System.Numerics.BigInteger = 0
|> Bigint.toGuid
// val guid : System.Guid = 00000000-0000-0000-0000-000000000000
let guid1 = System.Guid.NewGuid()
// val guid1 : System.Guid = bf9f89db-e307-4dcb-b734-a4bbb61a8365
Bigint.fromGuid guid1
|> Bigint.toString alphabet62
// val it : System.String = "35Y8fherqxCBOydJ1VN9UR"
|> Bigint.fromString alphabet62
// val it : System.Numerics.BigInteger =
// 134932760282992047737587898291171854811
|> Bigint.toGuid
// val it : System.Guid = bf9f89db-e307-4dcb-b734-a4bbb61a8365
If you want the conversion with a fixed length, Base64 encoding might be an alternative:
System.Convert.ToBase64String(guid1.ToByteArray()).[..21]
// val it : string = "24mfvwfjy023NKS7thqDZQ"
|> fun s -> System.Guid(System.Convert.FromBase64String(s + "=="))
// val it : System.Guid = bf9f89db-e307-4dcb-b734-a4bbb61a8365

Related

F# - Algorithm and strings

Let's says I have a string of a length N that contains only 0 or 1. I want to split that string in multiples strings and each string should contains only one digit.
Example:
00011010111
Should be split into:
000
11
0
1
0
111
The only solution I can think of if using a for loop with a string builder (Written in pseudo code below, more c# like sorry):
result = new list<string>
tmpChar = ""
tmpString = ""
for each character c in MyString
if tmpchar != c
if tmpsString != ""
result.add tmpString
tmpString = ""
endIf
tmpchar = c
endIf
tmpString += tmpChar
endFor
Do you have any other solution and maybe a clever solution that use a more functional approach?
I think Seq.scan would be a good fit for this, this is a very procedural problem in nature, preserving the order like that. But here is code that I believe does what you are asking.
"00011010111"
|> Seq.scan (fun (s, i) x ->
match s with
| Some p when p = x -> Some x, i
| _ -> Some x, i + 1 ) (None, 0)
|> Seq.countBy id
|> Seq.choose (function
| (Some t, _), n -> Some(t, n)
| _ -> None )
|> Seq.toList
Perhaps something along the lines of:
let result =
let rec groupWhileSame xs result =
match xs with
| a when a |> Seq.isEmpty -> result
| _ ->
let head = xs |> Seq.head
let str = xs |> Seq.takeWhile ((=) head)
let rest = xs |> Seq.skipWhile ((=) head)
groupWhileSame rest (Seq.append result [str])
groupWhileSame (myStr) []
Seq.fold (fun (acc:(string list)) x ->
match acc with
| y::rst when y.StartsWith(string x) -> (string x) + y::rst
| _ -> (string x)::acc)
[]
"00011010111"
Consider this function (which is generic):
let chunk s =
if Seq.isEmpty s then []
else
let rec chunk items chunks =
if Seq.isEmpty items then chunks
else
let chunks' =
match chunks with
| [] -> [(Seq.head items, 1)]
| x::xs ->
let c,n = x in let c' = Seq.head items in
if c = c' then (c, n + 1) :: xs else (c', 1) :: x :: xs
chunk (Seq.tail items) chunks'
chunk s [] |> List.rev
It returns a list of tuples, where each tuple represents an item and its repetitions.
So
"00011010111" |> Seq.toList |> chunk
actually returns
[('0', 3); ('1', 2); ('0', 1); ('1', 1); ('0', 1); ('1', 3)]
Basically, we're doing run length encoding (which is admittedly a bit wasteful in the case of your example string).
To get the list of strings that you want, we use code like following:
"00011010111"
|> Seq.toList
|> chunk
|> List.map (fun x -> let c,n = x in new string(c, n))
Here's a working version of OP's proposal with light syntax:
let chunk (s: string) =
let result = System.Collections.Generic.List<string>()
let mutable tmpChar = ""
let mutable tmpString = ""
for c in s do
if tmpChar <> string c then
if tmpString <> "" then
result.Add tmpString
tmpString <- ""
tmpChar <- string c
tmpString <- tmpString + tmpChar
result.Add tmpString
result
No attempt was made to follow a functional style.

Boggle - count all possible paths on a N*N grid. Performance

While reading up this question, I wondered why no one would "simply" iterate all the possible paths on the boggle grid and have the word-tries follow and then cancel the path if there is no match in the word-trie. Cannot be that many paths on a tiny 4 by 4 grid, right? How many paths are there? So I set out to code a path-counter function in F#. The results yield what no one stated on that other page: Way more paths on the grid than I would have guessed (more paths than words in the word-set, actually).
While all that is pretty much the back story to my question, the code I ended up with was running rather slow and I found that I could not give good answers to a few aspects of the code. So here, the code first, then below it, you will find points which I think deserve explanations...
let moves n state square =
let allSquares = [0..n*n-1] |> Set.ofList
let right = Set.difference allSquares (Set.ofList [n-1..n..n*n])
let left = Set.difference allSquares (Set.ofList [0..n..n*n-1])
let up = Set.difference allSquares (Set.ofList [0..n-1])
let down = Set.difference allSquares (Set.ofList [n*n-n..n*n-1])
let downRight = Set.intersect right down
let downLeft = Set.intersect left down
let upRight = Set.intersect right up
let upLeft = Set.intersect left up
let appendIfInSet se v res =
if Set.contains square se then res # v else res
[]
|> appendIfInSet right [square + 1]
|> appendIfInSet left [square - 1]
|> appendIfInSet up [square - n]
|> appendIfInSet down [square + n]
|> appendIfInSet downRight [square + n + 1]
|> appendIfInSet downLeft [square + n - 1]
|> appendIfInSet upRight [square - n + 1]
|> appendIfInSet upLeft [square - n - 1]
|> List.choose (fun s -> if ((uint64 1 <<< s) &&& state) = 0UL then Some s else None )
let block state square =
state ||| (uint64 1 <<< square)
let countAllPaths n lmin lmax =
let mov = moves n // line 30
let rec count l state sq c =
let state' = block state sq
let m = mov state' sq
match l with
| x when x <= lmax && x >= lmin ->
List.fold (fun acc s -> count (l+1) state' s acc) (c+1) m
| x when x < lmin ->
List.fold (fun acc s -> count (l+1) state' s acc) (c) m
| _ ->
c
List.fold (fun acc s -> count 0 (block 0UL s) s acc) 0 [0..n*n-1]
[<EntryPoint>]
let main args =
printfn "%d: %A" (Array.length args) args
if 3 = Array.length args then
let n = int args.[0]
let lmin = int args.[1]
let lmax = int args.[2]
printfn "%d %d %d -> %d" n lmin lmax (countAllPaths n lmin lmax)
else
printfn "usage: wordgames.exe n lmin lmax"
0
In line 30, I curried the moves function with the first argument, hoping that maybe code optimization would benefit from it. Maybe optimizing the 9 sets I create in move which are only a function of n. After all, they need not be generated over and over again, right? On the other hand, I would not really bet on it actually happening.
So, question #1 is: How could I enforce this optimization in an as little code bloating way as possible? (I could of course create a type with 9 members and then an array of that type for each possible n and then do a look up table like usage of the pre-computed sets but that would be code bloat in my opinion).
Many sources hint that parallel folds are considered critical. How could I create a parallel version of the counting function (which runs on multiple cores)?
Does anyone have clever ideas how to speed this up? Maybe some pruning or memoization etc?
At first, when I ran the function for n=4 lmin=3 lmax=8 I thought it takes so long because I ran it in fsi. But then I compiled the code with -O and it still took about the same time...
UPDATE
While waiting for input from you guys, I did the code bloated manual optimization version (runs much faster) and then found a way to make it run on multiple cores.
All in all those 2 changes yielded about a speed up by a factor of 30. Here the (bloated) version I came up with (still looking for a way to avoid the bloat):
let squareSet n =
let allSquares = [0..n*n-1] |> Set.ofList
let right = Set.difference allSquares (Set.ofList [n-1..n..n*n])
let left = Set.difference allSquares (Set.ofList [0..n..n*n-1])
let up = Set.difference allSquares (Set.ofList [0..n-1])
let down = Set.difference allSquares (Set.ofList [n*n-n..n*n-1])
let downRight = Set.intersect right down
let downLeft = Set.intersect left down
let upRight = Set.intersect right up
let upLeft = Set.intersect left up
[|right;left;up;down;upRight;upLeft;downRight;downLeft|]
let RIGHT,LEFT,UP,DOWN = 0,1,2,3
let UPRIGHT,UPLEFT,DOWNRIGHT,DOWNLEFT = 4,5,6,7
let squareSets =
[|Set.empty;Set.empty;Set.empty;Set.empty;Set.empty;Set.empty;Set.empty;Set.empty;|]
::
[ for i in 1..8 do
yield squareSet i
]
|> Array.ofList
let moves n state square =
let appendIfInSet se v res =
if Set.contains square se then res # v else res
[]
|> appendIfInSet squareSets.[n].[RIGHT] [square + 1]
|> appendIfInSet squareSets.[n].[LEFT] [square - 1]
|> appendIfInSet squareSets.[n].[UP] [square - n]
|> appendIfInSet squareSets.[n].[DOWN] [square + n]
|> appendIfInSet squareSets.[n].[DOWNRIGHT] [square + n + 1]
|> appendIfInSet squareSets.[n].[DOWNLEFT] [square + n - 1]
|> appendIfInSet squareSets.[n].[UPRIGHT] [square - n + 1]
|> appendIfInSet squareSets.[n].[UPLEFT] [square - n - 1]
|> List.choose (fun s -> if ((uint64 1 <<< s) &&& state) = 0UL then Some s else None )
let block state square =
state ||| (uint64 1 <<< square)
let countAllPaths n lmin lmax =
let mov = moves n
let rec count l state sq c =
let state' = block state sq
let m = mov state' sq
match l with
| x when x <= lmax && x >= lmin ->
List.fold (fun acc s -> count (l+1) state' s acc) (c+1) m
| x when x < lmin ->
List.fold (fun acc s -> count (l+1) state' s acc) (c) m
| _ ->
c
//List.fold (fun acc s -> count 0 (block 0UL s) s acc) 0 [0..n*n-1]
[0..n*n-1]
|> Array.ofList
|> Array.Parallel.map (fun start -> count 0 (block 0UL start) start 0)
|> Array.sum
[<EntryPoint>]
let main args =
printfn "%d: %A" (Array.length args) args
if 3 = Array.length args then
let n = int args.[0]
let lmin = int args.[1]
let lmax = int args.[2]
printfn "%d %d %d -> %d" n lmin lmax (countAllPaths n lmin lmax)
else
printfn "usage: wordgames.exe n lmin lmax"
0
As for the non-optimization of the generation of sets.
The second version posted in the update to the question showed, that this is actually the case (not optimized by compiler) and it yielded a significant speed up. The final version (posted below in this answer) carries that approach even further and speeds up the path counting (and the solving of a boggle puzzle) even further.
Combined with parallel execution on multiple cores, the initially really slow (maybe 30s) version could be sped up to only around 100ms for the n=4 lmin=3 lmax=8 case.
For n=6 class of problems, the parallel and hand tuned implementation solves a puzzle in around 60ms on my machine. It makes sense, that this is faster than the path counting, as the word list probing (used a dictionary with around 80000 words) along with the dynamic programming approach pointed out by #GuyCoder renders the solution of the puzzle a less complex problem than the (brute force) path counting.
Lesson learned
The f# compiler does not seem to be all too mystical and magical if it comes to code optimizations. Hand tuning is worth the effort if performance is really required.
Turning a single threaded recursive search function into a parallel (concurrent) function was not really hard in this case.
The final version of the code
Compiled with:
fsc --optimize+ --tailcalls+ wordgames.fs
(Microsoft (R) F# Compiler version 14.0.23413.0)
let wordListPath = #"E:\temp\12dicts-6.0.2\International\3of6all.txt"
let acceptableWord (s : string) : bool =
let s' = s.Trim()
if s'.Length > 2
then
if System.Char.IsLower(s'.[0]) && System.Char.IsLetter(s'.[0]) then true
else false
else
false
let words =
System.IO.File.ReadAllLines(wordListPath)
|> Array.filter acceptableWord
let squareSet n =
let allSquares = [0..n*n-1] |> Set.ofList
let right = Set.difference allSquares (Set.ofList [n-1..n..n*n])
let left = Set.difference allSquares (Set.ofList [0..n..n*n-1])
let up = Set.difference allSquares (Set.ofList [0..n-1])
let down = Set.difference allSquares (Set.ofList [n*n-n..n*n-1])
let downRight = Set.intersect right down
let downLeft = Set.intersect left down
let upRight = Set.intersect right up
let upLeft = Set.intersect left up
[|right;left;up;down;upRight;upLeft;downRight;downLeft|]
let RIGHT,LEFT,UP,DOWN = 0,1,2,3
let UPRIGHT,UPLEFT,DOWNRIGHT,DOWNLEFT = 4,5,6,7
let squareSets =
[|Set.empty;Set.empty;Set.empty;Set.empty;Set.empty;Set.empty;Set.empty;Set.empty;|]
::
[ for i in 1..8 do
yield squareSet i
]
|> Array.ofList
let movesFromSquare n square =
let appendIfInSet se v res =
if Set.contains square se then v :: res else res
[]
|> appendIfInSet squareSets.[n].[RIGHT] (square + 1)
|> appendIfInSet squareSets.[n].[LEFT] (square - 1)
|> appendIfInSet squareSets.[n].[UP] (square - n)
|> appendIfInSet squareSets.[n].[DOWN] (square + n)
|> appendIfInSet squareSets.[n].[DOWNRIGHT] (square + n + 1)
|> appendIfInSet squareSets.[n].[DOWNLEFT] (square + n - 1)
|> appendIfInSet squareSets.[n].[UPRIGHT] (square - n + 1)
|> appendIfInSet squareSets.[n].[UPLEFT] (square - n - 1)
let lutMovesN n =
Array.init n (fun i -> if i > 0 then Array.init (n*n-1) (fun j -> movesFromSquare i j) else Array.empty)
let lutMoves =
lutMovesN 8
let moves n state square =
let appendIfInSet se v res =
if Set.contains square se then v :: res else res
lutMoves.[n].[square]
|> List.filter (fun s -> ((uint64 1 <<< s) &&& state) = 0UL)
let block state square =
state ||| (uint64 1 <<< square)
let countAllPaths n lmin lmax =
let mov = moves n
let rec count l state sq c =
let state' = block state sq
let m = mov state' sq
match l with
| x when x <= lmax && x >= lmin ->
List.fold (fun acc s -> count (l+1) state' s acc) (c+1) m
| x when x < lmin ->
List.fold (fun acc s -> count (l+1) state' s acc) (c) m
| _ ->
c
//List.fold (fun acc s -> count 0 (block 0UL s) s acc) 0 [0..n*n-1]
[|0..n*n-1|]
|> Array.Parallel.map (fun start -> count 0 (block 0UL start) start 0)
|> Array.sum
//printfn "%d " (words |> Array.distinct |> Array.length)
let usage() =
printfn "usage: wordgames.exe [--gen n count problemPath | --count n lmin lmax | --solve problemPath ]"
let rng = System.Random()
let genProblem n (sb : System.Text.StringBuilder) =
let a = Array.init (n*n) (fun _ -> char (rng.Next(26) + int 'a'))
sb.Append(a) |> ignore
sb.AppendLine()
let genProblems nproblems n (sb : System.Text.StringBuilder) : System.Text.StringBuilder =
for i in 1..nproblems do
genProblem n sb |> ignore
sb
let solve n (board : System.String) =
let ba = board.ToCharArray()
let testWord (w : string) : bool =
let testChar k sq = (ba.[sq] = w.[k])
let rec testSquare state k sq =
match k with
| 0 -> testChar 0 sq
| x ->
if testChar x sq
then
let state' = block state x
moves n state' x
|> List.exists (testSquare state' (x-1))
else
false
[0..n*n-1]
|> List.exists (testSquare 0UL (String.length w - 1))
words
|> Array.splitInto 32
|> Array.Parallel.map (Array.filter testWord)
|> Array.concat
[<EntryPoint>]
let main args =
printfn "%d: %A" (Array.length args) args
let nargs = Array.length args
let sw = System.Diagnostics.Stopwatch()
match nargs with
| x when x >= 2 ->
match args.[0] with
| "--gen" ->
if nargs = 4
then
let n = int args.[1]
let nproblems = int args.[2]
let outpath = args.[3]
let problems = genProblems nproblems n (System.Text.StringBuilder())
System.IO.File.WriteAllText (outpath,problems.ToString())
0
else
usage()
0
| "--count" ->
if nargs = 4
then
let n = int args.[1]
let lmin = int args.[2]
let lmax = int args.[3]
sw.Start()
let count = countAllPaths n lmin lmax
sw.Stop()
printfn "%d %d %d -> %d (took: %d)" n lmin lmax count (sw.ElapsedMilliseconds)
0
else
usage ()
0
| "--solve" ->
if nargs = 2
then
let problems = System.IO.File.ReadAllLines(args.[1])
problems
|> Array.iter
(fun (p : string) ->
let n = int (sqrt (float (String.length p)))
sw.Reset()
sw.Start()
let found = solve n p
sw.Stop()
printfn "%s\n%A\n%dms" p found (sw.ElapsedMilliseconds)
)
0
else
usage ()
0
| _ ->
usage ()
0
| _ ->
usage ()
0

Return string and code optimisation in F#

How to modify below code to Return "string" so that returned output displayed on my MVC page and also would like to accept enteredChar from user.
Is there better way to do create this pyramid?
Current code:
let enteredChar = 'F' // As Interactive window doesn't support to Read Input
let mylist = ['A'..enteredChar]
let mylistlength = mylist |> List.length
let myfunc i x tlist1 =
(for j = 0 to mylistlength-i-2 do printf "%c" ' ')
let a1 = [for p in tlist1 do if p < x then yield p]
for p in a1 do printf "%c" p
printf "%c" x
let a2 = List.rev a1
for p in a2 do printf "%c" p
printfn "%s" " "
mylist |> List.iteri(fun i x -> myfunc i x mylist)
Output:
A
ABA
ABCBA
ABCDCBA
ABCDEDCBA
ABCDEFEDCBA
A few small optimizations could be:
Use StringBuilder instead of printf which is quite slow with long strings.
Use Array instead of List since Array works better with String.
Here is a version producing a pyramid string, which is kept closely to your code:
open System
open System.Text
let generateString c =
let sb = StringBuilder()
let generate i x arr =
String.replicate (Array.length arr-i-1) " " |> sb.Append |> ignore
let a1 = Array.filter (fun p -> p < x) arr
String(a1) |> sb.Append |> ignore
sb.Append x |> ignore
String(Array.rev a1) |> sb.Append |> ignore
sb.AppendLine " " |> ignore
let arr = [|'A'..c|]
arr |> Array.iteri(fun i x -> generate i x arr)
sb.ToString()
generateString 'F' |> printfn "%s"
As an alternative to Daniel's solution, you can achieve what you want with minimal changes to the code logic. Instead of using printf that writes the output to the console, you can use Printf.bprintf which writes the output to a specified StringBuilder. Then you can simply get the resulting string from the StringBuilder.
The modified function will look like this. I added parameter str and replaced printf with Printf.bprintf str (and printfn with bprintf together with additional \n char):
let myfunc i x tlist1 str =
(for j = 0 to mylistlength-i-2 do Printf.bprintf str "%c" ' ')
let a1 = [for p in tlist1 do if p < x then yield p]
for p in a1 do Printf.bprintf str "%c" p
Printf.bprintf str "%c" x
let a2 = List.rev a1
for p in a2 do Printf.bprintf str "%c" p
Printf.bprintf str "%s\n" " "
To call the function, you first create StringBuilder and then pass it to myfunc in every call. At the end, you can get the result using ToString method:
let str = StringBuilder()
mylist |> List.iteri(fun i x -> myfunc i x mylist str)
str.ToString()
I think Daniel's solution looks nicer, but this is the most direct way to tunr your printing code into a string-building code (and it can be done, pretty much, using Search & Replace).
If I understand your question (this likely belongs on Code Review) here's one way to rewrite your function:
let showPyramid (output: TextWriter) lastChar =
let chars = [|'A' .. lastChar|]
let getRowChars n =
let rec loop acc i =
[|
if i < n then let c = chars.[i] in yield c; yield! loop (c::acc) (i+1)
else yield! List.tail acc
|]
loop [] 0
let n = chars.Length
for r = 1 to n do
output.WriteLine("{0}{1}{0}", String(' ', n - r), String(getRowChars r))
Example
showPyramid Console.Out 'F'
or, to output to a string
use output = new StringWriter()
showPyramid output 'F'
let pyramid = output.ToString()
EDIT
After seeing Tomas' answer I realized I skipped over "return a string" in your question. I updated the code and added examples to show how you could do that.
let pyramid (ch:char) =
let ar = [| 'A'..ch |]
let len = ar.Length
Array.mapi
(fun i ch ->
let ar = ar.[0..i]
String.replicate (len - i - 1) " " + new string(ar) + new string((Array.rev ar).[1..]))
ar
|> String.concat "\n"
pyramid 'F' |> printfn "%s"
Here's another approach that seems to be a good demonstration of functional composition. I bet it's the shortest solution among the answers here. :)
let charsToString = Seq.map string >> String.concat String.Empty
let pyramid lastChar =
let src = '-'::['A'..lastChar] |> List.toArray
let len = Array.length src - 1
fun row col -> row-abs(col-len+1)+1 |> max 0 |> Array.get src // (1)
>> Seq.init (len*2-1) >> charsToString // (2)
|> Seq.init len // (3)
pyramid 'X' |> Seq.iter (printfn "%s")
First, we generate an unusual array of initial data. Its element [0] contains a space or whatever separator you want to have; I preferred dash (-) for debugging purposes.
The (1) line makes a function that calculates what character to be placed. The result of row-abs(col-len+1)+1 can be either positive (and there is a char to be placed) or zeronegative, and there should be a space. Note that there is no if statement: it is hidden within the max function;
The (2) line composes a function int -> string for generating an individual row;
The (3) line passes the function above as argument for sequence initializer.
The three lines can be written in a more verbose way:
let genCell row col = row-abs(col-len+1)+1 |> max 0 |> Array.get src
let genRow = genCell >> Seq.init (len*2-1) >> charsToString
Seq.init len genRow
Note genRow needs no formal argument due to functional composition: the argument is being bound into genCell, returning a function of a single argument, exactly what Seq.init needs.

Why is my Trie lookup slower than that of the standard F# Map's?

So, I just ported the Trie from OCaml. Unfortunately, it runs slower than the standard Map in terms of tryFind. I don't understand this - the trie seems like it should be faster. Is F#'s code libraries built in some special way as to make them faster than the code that the user typically deploys?
Here's the code -
[<RequireQualifiedAccess>]
module Trie
type Node<'k, 'v when 'k : comparison> =
{ TrieMap : Map<'k, Node<'k, 'v>>
TrieKvp : ('k list * 'v) option }
member inline x.IsEmpty = x.TrieKvp.IsNone && x.TrieMap.IsEmpty
let inline make map kvp =
{ TrieMap = map
TrieKvp = kvp }
let inline makeEmpty () : Node<'k, 'v> = make Map.empty None
let inline isEmpty (node : Node<'k, 'v>) = node.IsEmpty
let rec tryFind (key : 'k list) node =
if key.IsEmpty then
match node.TrieKvp with
| Some (_, value) -> Some value
| None -> None
else
let keyHead = key.Head
let keyTail = key.Tail
let optSubNode = Map.tryFind keyHead node.TrieMap
match optSubNode with
| Some subNode -> tryFind keyTail subNode
| None -> None
let inline containsKey key node =
(tryFind key node).IsSome
let rec addInternal (key : 'k list) value node =
if key.IsEmpty then make node.TrieMap (Some (key, value))
else
let keyHead = key.Head
let keyTail = key.Tail
let newTrie =
match Map.tryFind keyHead node.TrieMap with
| Some subTrie -> subTrie
| None -> makeEmpty ()
let newTrie2 = addInternal keyTail value newTrie
make (Map.add keyHead newTrie2 node.TrieMap) node.TrieKvp
let inline add key value node =
addInternal key value node
let rec addMany kvps node =
if Seq.isEmpty kvps then node
else
let kvpHead = Seq.head kvps
let kvpTail = Seq.skip 1 kvps
let newTrie = add (fst kvpHead) (snd kvpHead) node
addMany kvpTail newTrie
let inline ofList kvps =
addMany kvps (makeEmpty ())
let inline ofListBy by kvps =
let pairs = List.map by kvps
ofList pairs
let rec foldInternal folder rev node state =
match node.TrieKvp with
| Some (_, value) -> folder (Map.fold (fun state key value -> foldInternal folder (key :: rev) value state) state node.TrieMap) (List.rev rev) value
| None -> Map.fold (fun state key value -> foldInternal folder (key :: rev) value state) state node.TrieMap
let inline fold folder state node =
foldInternal folder [] node state
let rec map (mapper : 'k list -> 'v -> 'a) (node : Node<'k, 'v>) : Node<'k, 'a> =
match node.TrieKvp with
| Some (key, value) -> make (Map.map (fun _ value -> map mapper value) node.TrieMap) (Some (key, mapper key value))
| None -> make (Map.map (fun _ value -> map mapper value) node.TrieMap) None
let inline toValueList node =
fold (fun state _ value -> value :: state) [] node
let inline singleton (key, value) =
add key value (makeEmpty ())
Here's a performance test that Jon Harrop provided that I find adequate for measuring improvements -
let xs = Array.init 1000000 (fun i -> [i])
let timer = System.Diagnostics.Stopwatch.StartNew()
let mutable t = Trie.makeEmpty()
for i=0 to xs.Length-1 do
t <- Trie.add xs.[i] xs.[i] t
printfn "Trie took %fs to build" timer.Elapsed.TotalSeconds
timer.Restart()
for _ in 1..100 do
for i=0 to xs.Length-1 do
ignore(Trie.tryFind xs.[i])
printfn "Trie took %fs to search" timer.Elapsed.TotalSeconds
let timer = System.Diagnostics.Stopwatch.StartNew()
let mutable t = Map.empty
for i=0 to xs.Length-1 do
t <- Map.add xs.[i] xs.[i] t
printfn "Map took %fs to build" timer.Elapsed.TotalSeconds
timer.Restart()
for _ in 1..100 do
for i=0 to xs.Length-1 do
ignore(Map.tryFind xs.[i])
printfn "Map took %fs to search" timer.Elapsed.TotalSeconds
NOTE: if you have a faster lookup data structure in mind, please note that I need a persistent data structure.
Unfortunately, it runs slower than the standard Map in terms of tryFind. I don't understand this - the trie seems like it should be faster.
A quick benchmark here suggests that your trie is already faster than Map for at least simple case:
do
let n = 0
let xs = Array.init 1000000 (fun i -> [i])
let timer = System.Diagnostics.Stopwatch.StartNew()
let mutable t = Trie.makeEmpty()
for i=0 to xs.Length-1 do
t <- Trie.add xs.[i] xs.[i] t
printfn "Trie took %fs to build" timer.Elapsed.TotalSeconds
timer.Restart()
for _ in 1..100 do
for i=0 to xs.Length-1 do
ignore(Trie.tryFind xs.[i])
printfn "Trie took %fs to search" timer.Elapsed.TotalSeconds
let timer = System.Diagnostics.Stopwatch.StartNew()
let mutable t = Map.empty
for i=0 to xs.Length-1 do
t <- Map.add xs.[i] xs.[i] t
printfn "Map took %fs to build" timer.Elapsed.TotalSeconds
timer.Restart()
for _ in 1..100 do
for i=0 to xs.Length-1 do
ignore(Map.tryFind xs.[i])
printfn "Map took %fs to search" timer.Elapsed.TotalSeconds
I get 4s to build your Trie, 8.7s to build a Map and about 0.7 to search in both cases.
However, there is a lot of room for improvement in your implementation. I recently wrote an article about an optimized generic persistent hash trie implementation in F# that was published here.
Your later comments imply that you only want to use this to map over strings. If so, it would be vastly more efficient to specialize your trie for string keys.
EDIT
KVB suggested that I elaborate on the "room for improvement" so here's some feedback:
Use inline sparingly as an optimization and only on the basis of compelling performance measurements.
Make empty a value rather than a function.
Avoid List.head and List.tail whenever possible. Use pattern matching instead.
Avoid genericity when possible (e.g. hard-code for string keys in this case).
Alright, so after a little more thinking, I hypothesized that the real difference in performance is in the use of lists for keys as opposed to strings. Strings (and array) have much better cache coherency. So, I changed the key from a 'k list to a string and voila! Performance is now actually better than the Map in my application!
Here's the code -
[<RequireQualifiedAccess>]
module StringTrie
type Node<'v> =
{ TrieMap : Map<char, Node<'v>>
TrieKvp : (string * 'v) option }
member inline x.IsEmpty = x.TrieKvp.IsNone && x.TrieMap.IsEmpty
let inline make map kvp =
{ TrieMap = map
TrieKvp = kvp }
let inline makeEmpty () : Node<'v> = make Map.empty None
let inline isEmpty (node : Node<'v>) = node.IsEmpty
let rec tryFindInternal (key : string) index node =
if key.Length = index then
match node.TrieKvp with
| Some (_, value) -> Some value
| None -> None
else
let optSubNode = Map.tryFind key.[index] node.TrieMap
match optSubNode with
| Some subNode -> tryFindInternal key (index + 1) subNode
| None -> None
let inline tryFind (key : string) node =
tryFindInternal key 0 node
let inline containsKey key node =
(tryFind key node).IsSome
let rec addInternal (key : string) index value node =
if key.Length = index then make node.TrieMap (Some (key, value))
else
let char = key.[index]
let newTrie =
match Map.tryFind char node.TrieMap with
| Some subTrie -> subTrie
| None -> makeEmpty ()
let newTrie2 = addInternal key (index + 1) value newTrie
make (Map.add char newTrie2 node.TrieMap) node.TrieKvp
let inline add key value node =
addInternal key 0 value node
let rec addMany kvps node =
if Seq.isEmpty kvps then node
else
let kvpHead = Seq.head kvps
let kvpTail = Seq.skip 1 kvps
let newTrie = add (fst kvpHead) (snd kvpHead) node
addMany kvpTail newTrie
let inline ofList kvps =
addMany kvps (makeEmpty ())
let inline ofListBy by kvps =
let pairs = List.map by kvps
ofList pairs
let rec foldInternal folder rev node state =
match node.TrieKvp with
| Some (_, value) -> folder (Map.fold (fun state key value -> foldInternal folder (key :: rev) value state) state node.TrieMap) (List.rev rev) value
| None -> Map.fold (fun state key value -> foldInternal folder (key :: rev) value state) state node.TrieMap
let inline fold folder state node =
foldInternal folder [] node state
let rec map (mapper : string -> 'v -> 'a) (node : Node<'v>) : Node<'a> =
match node.TrieKvp with
| Some (key, value) -> make (Map.map (fun _ value -> map mapper value) node.TrieMap) (Some (key, mapper key value))
| None -> make (Map.map (fun _ value -> map mapper value) node.TrieMap) None
let inline toValueList node =
fold (fun state _ value -> value :: state) [] node
let inline singleton (key, value) =
add key value (makeEmpty ())
I also built a version that works for arrays in general and is also fast -
[<RequireQualifiedAccess>]
module ArrayTrie
type Node<'k, 'v when 'k : comparison> =
{ TrieMap : Map<'k, Node<'k, 'v>>
TrieKvp : ('k array * 'v) option }
member inline x.IsEmpty = x.TrieKvp.IsNone && x.TrieMap.IsEmpty
let inline make map kvp =
{ TrieMap = map
TrieKvp = kvp }
let inline makeEmpty () : Node<'k, 'v> = make Map.empty None
let inline isEmpty (node : Node<'k, 'v>) = node.IsEmpty
let rec tryFindInternal (key : 'k array) index node =
if key.Length = index then
match node.TrieKvp with
| Some (_, value) -> Some value
| None -> None
else
let optSubNode = Map.tryFind key.[index] node.TrieMap
match optSubNode with
| Some subNode -> tryFindInternal key (index + 1) subNode
| None -> None
let inline tryFind (key : 'k array) node =
tryFindInternal key 0 node
let inline containsKey key node =
(tryFind key node).IsSome
let rec addInternal (key : 'k array) index value node =
if key.Length = index then make node.TrieMap (Some (key, value))
else
let char = key.[index]
let newTrie =
match Map.tryFind char node.TrieMap with
| Some subTrie -> subTrie
| None -> makeEmpty ()
let newTrie2 = addInternal key (index + 1) value newTrie
make (Map.add char newTrie2 node.TrieMap) node.TrieKvp
let inline add key value node =
addInternal key 0 value node
let rec addMany kvps node =
if Seq.isEmpty kvps then node
else
let kvpHead = Seq.head kvps
let kvpTail = Seq.skip 1 kvps
let newTrie = add (fst kvpHead) (snd kvpHead) node
addMany kvpTail newTrie
let inline ofList kvps =
addMany kvps (makeEmpty ())
let inline ofListBy by kvps =
let pairs = List.map by kvps
ofList pairs
let rec foldInternal folder rev node state =
match node.TrieKvp with
| Some (_, value) -> folder (Map.fold (fun state key value -> foldInternal folder (key :: rev) value state) state node.TrieMap) (List.rev rev) value
| None -> Map.fold (fun state key value -> foldInternal folder (key :: rev) value state) state node.TrieMap
let inline fold folder state node =
foldInternal folder [] node state
let rec map (mapper : 'k array -> 'v -> 'a) (node : Node<'k, 'v>) : Node<'k, 'a> =
match node.TrieKvp with
| Some (key, value) -> make (Map.map (fun _ value -> map mapper value) node.TrieMap) (Some (key, mapper key value))
| None -> make (Map.map (fun _ value -> map mapper value) node.TrieMap) None
let inline toValueList node =
fold (fun state _ value -> value :: state) [] node
let inline singleton (key, value) =
add key value (makeEmpty ())
The only thing left that seem like it would improve performance is to get an internal pointer to the string and inc that rather than doing indexes over and over. This doesn't seem easy in F#, but seems at least possible for arrays in C#.
Why wouldn't it be? How about OCaml, is it any faster there? Since your Trie is implemented in terms of Map I would expect it to be slower than Map for at least some inputs. It can still perhaps outperform Map in some cases, for example when the size is very large.
Also, if your primary interest is lookup performance, why not freeze your Trie to use Dictionary-based nodes?

Filter an array or list by consecutive pairs based on a matching rule

This is probably trivial, and I do have a solution but I'm not happy with it. Somehow, (much) simpler forms don't seem to work and it gets messy around the corner cases (either first, or last matching pairs in a row).
To keep it simple, let's define the matching rule as any two or more numbers that have a difference of two. Example:
> filterTwins [1; 2; 4; 6; 8; 10; 15; 17]
val it : int list = [2; 4; 6; 8; 10; 15; 17]
The code I currently use is this, which just feels sloppy and overweight:
let filterTwins list =
let func item acc =
let prevItem, resultList = acc
match prevItem, resultList with
| 0, []
-> item, []
| var, [] when var - 2 = item
-> item, item::var::resultList
| var, hd::tl when var - 2 = item && hd <> var
-> item, item::var::resultList
| var, _ when var - 2 = item
-> item, item::resultList
| _
-> item, resultList
List.foldBack func list (0, [])
|> snd
I intended my own original exercise to experiment with List.foldBack, large lists and parallel programming (which went well) but ended up messing with the "easy" part...
Guide through the answers
Daniel's last, 113 characters*, easy to follow, slow
Kvb's 2nd, 106 characters* (if I include the function), easy, but return value requires work
Stephen's 2nd, 397 characters*, long winded and comparably complex, but fastest
Abel's, 155 characters*, based on Daniel's, allows duplicates (this wasn't a necessity, btw) and is relatively fast.
There were more answers, but the above were the most distinct, I believe. Hope I didn't hurt anybody's feelings by accepting Daniel's answer as solution: each and every one solution deserves to be the selected answer(!).
* counting done with function names as one character
Would this do what you want?
let filterTwins l =
let rec filter l acc flag =
match l with
| [] -> List.rev acc
| a :: b :: rest when b - 2 = a ->
filter (b::rest) (if flag then b::acc else b::a::acc) true
| _ :: t -> filter t acc false
filter l [] false
This is terribly inefficient, but here's another approach using more built-in functions:
let filterTwinsSimple l =
l
|> Seq.pairwise
|> Seq.filter (fun (a, b) -> b - 2 = a)
|> Seq.collect (fun (a, b) -> [a; b])
|> Seq.distinct
|> Seq.toList
Maybe slightly better:
let filterTwinsSimple l =
seq {
for (a, b) in Seq.pairwise l do
if b - 2 = a then
yield a
yield b
}
|> Seq.distinct
|> Seq.toList
How about this?
let filterPairs f =
let rec filter keepHead = function
| x::(y::_ as xs) when f x y -> x::(filter true xs)
| x::xs ->
let rest = filter false xs
if keepHead then x::rest else rest
| _ -> []
filter false
let test = filterPairs (fun x y -> y - x = 2) [1; 2; 4; 6; 8; 10; 15; 17]
Or if all of your list's items are unique, you could do this:
let rec filterPairs f s =
s
|> Seq.windowed 2
|> Seq.filter (fun [|a;b|] -> f a b)
|> Seq.concat
|> Seq.distinct
let test = filterPairs (fun x y -> y - x = 2) [1; 2; 4; 6; 8; 10; 15; 17]
EDIT
Or here's another alternative which I find elegant. First define a function for breaking a list into a list of groups of consecutive items satisfying a predicate:
let rec groupConsec f = function
| [] -> []
| x::(y::_ as xs) when f x y ->
let (gp::gps) = groupConsec f xs
(x::gp)::gps
| x::xs -> [x]::(groupConsec f xs)
Then, build your function by collecting all results back together, discarding any singletons:
let filterPairs f =
groupConsec f
>> List.collect (function | [_] -> [] | l -> l)
let test = filterPairs (fun x y -> y - x = 2) [1; 2; 4; 6; 8; 10; 15; 17]
The following solution is in the spirit of your own, but I use a discriminate union to encapsulate aspects of the algorithm and reign in the madness a bit:
type status =
| Keep of int
| Skip of int
| Tail
let filterTwins xl =
(Tail, [])
|> List.foldBack
(fun cur (prev, acc) ->
match prev with
| Skip(prev) when prev - cur = 2 -> (Keep(cur), cur::prev::acc)
| Keep(prev) when prev - cur = 2 -> (Keep(cur), cur::acc)
| _ -> (Skip(cur), acc))
xl
|> snd
Here's another solution which uses a similar discriminate union strategy as my other answer but it works on sequences lazily so you can watch those twin (primes?) roll in as they come:
type status =
| KeepTwo of int * int
| KeepOne of int
| SkipOne of int
| Head
let filterTwins xl =
let xl' =
Seq.scan
(fun prev cur ->
match prev with
| KeepTwo(_,prev) | KeepOne prev when cur - prev = 2 ->
KeepOne cur
| SkipOne prev when cur - prev = 2 ->
KeepTwo(prev,cur)
| _ ->
SkipOne cur)
Head
xl
seq {
for x in xl' do
match x with
| KeepTwo(a,b) -> yield a; yield b
| KeepOne b -> yield b
| _ -> ()
}
for completeness sake, I'll answer this with what I eventually came up with, based on the friendly suggestions in this thread.
The benefits of this approach are that it doesn't need Seq.distinct, which I believe is an improvement as it allows for duplicates. However, it still needs List.rev which doesn't make it the fastest. Nor is it the most succinct code (see comparison of solution in question itself).
let filterTwins l =
l
|> Seq.pairwise
|> Seq.fold (fun a (x, y) ->
if y - x = 2 then (if List.head a = x then y::a else y::x::a)
else a) [0]
|> List.rev
|> List.tail

Resources