Linearize an expression syntax tree - data-structures

I defined a recursive type like this
type t = | Param of int | Add of t * t
and it allows me to write symbolic expressions like this
let x = Param(1)
let y = Param(2)
let z = Add(x, y)
# z;;
- : t = Add (Param 1, Param 2)
or something slightly more complex
let rec fib n x y = match n with
| 0 -> (x, y)
| _ -> (fib (n - 1) y (Add(x, y)))
let z5 = fib 5 x y
# z5;;
- : t * t =
(Add (Add (Param 2, Add (Param 1, Param 2)),
Add (Add (Param 1, Param 2), Add (Param 2, Add (Param 1, Param 2)))),
Add (Add (Add (Param 1, Param 2), Add (Param 2, Add (Param 1, Param 2))),
Add (Add (Param 2, Add (Param 1, Param 2)),
Add (Add (Param 1, Param 2), Add (Param 2, Add (Param 1, Param 2))))))
Now I want to convert this recursive thing into a linear form in which each unique Param and Add wold appears only once -- something like this:
(Step 0, (Param 1))
(Step 1, (Param 2))
(Step 2, (Add (Step 0, Step 1))
(Step 3, (Add (Step 1, Step 2))
(Step 4, (Add (Step 2, Step 3))
...
What is this transformation called? And how could it be implemented?

I usually call it flattening or unfolding, some other common names (that are sometimes too specific) are three address code or A-normal form.
It is probably more natural to represent steps with assignments to variables, rather that with the Step notion, e.g.,
type var = int
type const = int
type stmt = Set var * const
type expr = Cst of int | Var of int | Add of expr * expr | ...
So that your flattened fib will look like
Set (0, (Add (Cst 1, Cst 2)))
Set (1, (Add (Cst 2, Var 0)))
Set (2, (Add (Var 0, Var 1)))
...
The flattening itself is quite easy to implement using a recursive function that has type int -> expr -> stmt list -> int * expr * stmt list, e.g., (untested),
let rec flatten v exprs stmts = match expr with
| Cst _ | Var _ as expr -> v,expr,stmts
| Add (x,y) ->
let v,x,stmts = flatten v x stmts in
let v,y,stmts = flatten v x stmts in
v+1,Var (v+1),(Set (v+1,Add(x,y)) :: stmts)
The v parameter is the fresh generator for variable names (which we represent just with integers).
Besides, if you would like to see how flattening works in real-world AST, here is an example.
Implementing hash-consing and flattening
First we need to define a hash-consed representation of our AST, e.g.,
type exp = Cst of int | Ref of int | Add of hexp * hexp
and hexp = {ref : int; exp : exp}
We index each hash-consed expression with a unique ordinal so that physically equal expressions will have the same ref number. We need this in order to be able to store expressions in ordered data structures like maps and sets (OCaml doesn't allow us to order values by their physical address, and it makes sense, since OCaml uses generational GC so the addresses of values change over time).
Now, let's write the hashcons function that will hash-cons an expression, e.g.,
let hashes = Hashtbl.create 100
let hashcons exp =
match Hashtbl.find_opt hashes exp with
| None ->
let ref = Hashtbl.length hashes + 1 in
Hashtbl.add hashes exp ref;
{ref; exp}
| Some ref -> {ref; exp}
Now we can write our fib function that uses the hashconsed representation,
let rec fib n x y = match n with
| 0 -> (x, y)
| _ -> fib (n - 1) y (hashcons (Add (x, y)))
let x = hashcons##Cst 1
let y = hashcons##Cst 2
let _,z = fib 999 x y
No big changes. Now, let's write the flatten function. But before that we need some representation for the program,
module Program = Map.Make(Int)
let set id x = Program.update id (function
| None -> Some x
| x -> x)
let rec get ref prog = match Program.find_opt ref prog with
| None -> None
| Some _ -> Some {ref; exp = Ref ref}
let (++) exp prog = {ref=exp.ref; exp=Ref exp.ref},
set exp.ref exp prog
Our program is a mapping from reference number (which acts both like a variable name and the place where variable is defined - we have a built-in SSA property in our representation), and finally, our flatten function is,
let unify {ref} exp prog =
{ref; exp=Ref ref},
set ref {ref; exp} prog
let rec flatten input prog =
match get input.ref prog with
| Some exp -> exp,prog
| None -> match input.exp with
| Cst _ | Ref _ -> input ++ prog
| Add (x,y) ->
let x,prog = flatten x prog in
let y,prog = flatten y prog in
unify input (Add (x,y)) prog
It is linear, because when we flatten a complex expression we use unify and store the flattened form of that expression under its id. So that the next time we see the same complex expression, we can easily extract its already flattened version from the program, using the integer key for lookup.
And the output is,
# let p = snd##flatten z Program.empty
val p : hexp Program.t = <abstr>
# print_program p;;
#1 := 1
#2 := 2
#3 := #1 + #2
#4 := #2 + #3
#5 := #3 + #4
#6 := #4 + #5
<snip>
#1001 := #999 + #1000
- : unit = ()
where the print_program function is defined as,
let rec pp_exp ppf {exp} = match exp with
| Cst x -> Format.fprintf ppf "%d" x
| Ref x -> Format.fprintf ppf "#%d" x
| Add (x,y) -> Format.fprintf ppf "%a + %a" pp_exp x pp_exp y
let print_program prog =
Program.to_seq prog |>
Seq.iter ## fun (id,exp) ->
Format.printf "#%d := %a#\n" id pp_exp exp
The flattening function works extremely fast, but as always it is not for free. The hash-consed version of fib is now much slower, as we have to hash-cons each expression and during it we have to lookup into a hash table and compare huge values structurally. The fib function is not exploiting the hash-consed representation of the expressions and it is possible to write a more efficient fib function.

Related

Make a century in OCaml

Here is quite a typical make a century problem.
We have a natural number list [1;2;3;4;5;6;7;8;9].
We have a list of possible operators [Some '+'; Some '*';None].
Now we create a list of operators from above possibilities and insert each operator into between each consecutive numbers in the number list and compute the value.
(Note a None b = a * 10 + b)
For example, if the operator list is [Some '+'; Some '*'; None; Some '+'; Some '+'; Some '+'; Some '+'; Some '+'], then the value is 1 + 2 * 34 + 5 + 6 + 7 + 8 + 9 = 104.
Please find all possible operator lists, so the value = 10.
The only way I can think of is brute-force.
I generate all possible operator lists.
Compute all possible values.
Then filter so I get all operator lists which produce 100.
exception Cannot_compute
let rec candidates n ops =
if n = 0 then [[]]
else
List.fold_left (fun acc op -> List.rev_append acc (List.map (fun x -> op::x) (candidates (n-1) ops))) [] ops
let glue l opl =
let rec aggr acc_l acc_opl = function
| hd::[], [] -> (List.rev (hd::acc_l), List.rev acc_opl)
| hd1::hd2::tl, None::optl -> aggr acc_l acc_opl (((hd1*10+hd2)::tl), optl)
| hd::tl, (Some c)::optl -> aggr (hd::acc_l) ((Some c)::acc_opl) (tl, optl)
| _ -> raise Cannot_glue
in
aggr [] [] (l, opl)
let compute l opl =
let new_l, new_opl = glue l opl in
let rec comp = function
| hd::[], [] -> hd
| hd::tl, (Some '+')::optl -> hd + (comp (tl, optl))
| hd1::hd2::tl, (Some '-')::optl -> hd1 + (comp ((-hd2)::tl, optl))
| hd1::hd2::tl, (Some '*')::optl -> comp (((hd1*hd2)::tl), optl)
| hd1::hd2::tl, (Some '/')::optl -> comp (((hd1/hd2)::tl), optl)
| _, _ -> raise Cannot_compute
in
comp (new_l, new_opl)
let make_century l ops =
List.filter (fun x -> fst x = 100) (
List.fold_left (fun acc x -> ((compute l x), x)::acc) [] (candidates ((List.length l)-1) ops))
let rec print_solution l opl =
match l, opl with
| hd::[], [] -> Printf.printf "%d\n" hd
| hd::tl, (Some op)::optl -> Printf.printf "%d %c " hd op; print_solution tl optl
| hd1::hd2::tl, None::optl -> print_solution ((hd1*10+hd2)::tl) optl
| _, _ -> ()
I believe my code is ugly. So I have the following questions
computer l opl is to compute using the number list and operator list. Basically it is a typical math evaluation. Is there any nicer implementation?
I have read Chapter 6 in Pearls of Functional Algorithm Design. It used some techniques to improve the performance. I found it really really obscurity and hard to understand. Anyone who read it can help?
Edit
I refined my code. Basically, I will scan the operator list first to glue all numbers where their operator is None.
Then in compute, if I meet a '-' I will simply negate the 2nd number.
A classic dynamic programming solution (which finds the = 104
solution instantly) that does not risk any problem with operators
associativity or precedence. It only returns a boolean saying whether
it's possible to come with the number; modifying it to return the
sequences of operations to get the solution is an easy but interesting
exercise, I was not motivated to go that far.
let operators = [ (+); ( * ); ]
module ISet = Set.Make(struct type t = int let compare = compare end)
let iter2 res1 res2 f =
res1 |> ISet.iter ## fun n1 ->
res2 |> ISet.iter ## fun n2 ->
f n1 n2
let can_make input target =
let has_zero = Array.fold_left (fun acc n -> acc || (n=0)) false input in
let results = Array.make_matrix (Array.length input) (Array.length input) ISet.empty in
for imax = 0 to Array.length input - 1 do
for imin = imax downto 0 do
let add n =
(* OPTIMIZATION: if the operators are known to be monotonous, we need not store
numbers above the target;
(Handling multiplication by 0 requires to be a bit more
careful, and I'm not in the mood to think hard about this
(I think one need to store the existence of a solution,
even if it is above the target), so I'll just disable the
optimization in that case)
*)
if n <= target && not has_zero then
results.(imin).(imax) <- ISet.add n results.(imin).(imax) in
let concat_numbers =
(* concatenates all number from i to j:
i=0, j=2 -> (input.(0)*10 + input.(1))*10 + input.(2)
*)
let rec concat acc k =
let acc = acc + input.(k) in
if k = imax then acc
else concat (10 * acc) (k + 1)
in concat 0 imin
in add concat_numbers;
for k = imin to imax - 1 do
let res1 = results.(imin).(k) in
let res2 = results.(k+1).(imax) in
operators |> List.iter (fun op ->
iter2 res1 res2 (fun n1 n2 -> add (op n1 n2););
);
done;
done;
done;
let result = results.(0).(Array.length input - 1) in
ISet.mem target result
Here is my solution, which evaluates according to the usual rules of precedence. It finds 303 solutions to find [1;2;3;4;5;6;7;8;9] 100 in under 1/10 second on my MacBook Pro.
Here are two interesting ones:
# 123 - 45 - 67 + 89;;
- : int = 100
# 1 * 2 * 3 - 4 * 5 + 6 * 7 + 8 * 9;;
- : int = 100
This is a brute force solution. The only slightly clever thing is that I treat concatenation of digits as simply another (high precedence) operation.
The eval function is the standard stack-based infix expression evaluation that you will find described many places. Here is an SO article about it: How to evaluate an infix expression in just one scan using stacks? The essence is to postpone evaulating by pushing operators and operands onto stacks. When you find that the next operator has lower precedence you can go back and evaluate what you pushed.
type op = Plus | Minus | Times | Divide | Concat
let prec = function
| Plus | Minus -> 0
| Times | Divide -> 1
| Concat -> 2
let succ = function
| Plus -> Minus
| Minus -> Times
| Times -> Divide
| Divide -> Concat
| Concat -> Plus
let apply op stack =
match op, stack with
| _, [] | _, [_] -> [] (* Invalid input *)
| Plus, a :: b :: tl -> (b + a) :: tl
| Minus, a :: b :: tl -> (b - a) :: tl
| Times, a :: b :: tl -> (b * a) :: tl
| Divide, a :: b :: tl -> (b / a) :: tl
| Concat, a :: b :: tl -> (b * 10 + a) :: tl
let rec eval opstack numstack ops nums =
match opstack, numstack, ops, nums with
| [], sn :: _, [], _ -> sn
| sop :: soptl, _, [], _ ->
eval soptl (apply sop numstack) ops nums
| [], _, op :: optl, n :: ntl ->
eval [op] (n :: numstack) optl ntl
| sop :: soptl, _, op :: _, _ when prec sop >= prec op ->
eval soptl (apply sop numstack) ops nums
| _, _, op :: optl, n :: ntl ->
eval (op :: opstack) (n :: numstack) optl ntl
| _ -> 0 (* Invalid input *)
let rec incr = function
| [] -> []
| Concat :: rest -> Plus :: incr rest
| x :: rest -> succ x :: rest
let find nums tot =
match nums with
| [] -> []
| numhd :: numtl ->
let rec try1 ops accum =
let accum' =
if eval [] [numhd] ops numtl = tot then
ops :: accum
else
accum
in
if List.for_all ((=) Concat) ops then
accum'
else try1 (incr ops) accum'
in
try1 (List.map (fun _ -> Plus) numtl) []
I came up with a slightly obscure implementation (for a variant of this problem) that is a bit better than brute force. It works in place, rather than generating intermediate data structures, keeping track of the combined values of the operators that have already been evaluated.
The trick is to keep track of a pending operator and value so that you can evaluate the "none" operator easily. That is, if the algorithm had just progressed though 1 + 23, the pending operator would be +, and the pending value would be 23, allowing you to easily generate either 1 + 23 + 4 or 1 + 234 as necessary.
type op = Add | Sub | Nothing
let print_ops ops =
let len = Array.length ops in
print_char '1';
for i = 1 to len - 1 do
Printf.printf "%s%d" (match ops.(i) with
| Add -> " + "
| Sub -> " - "
| Nothing -> "") (i + 1)
done;
print_newline ()
let solve k target =
let ops = Array.create k Nothing in
let rec recur i sum pending_op pending_value =
let sum' = match pending_op with
| Add -> sum + pending_value
| Sub -> if sum = 0 then pending_value else sum - pending_value
| Nothing -> pending_value in
if i = k then
if sum' = target then print_ops ops else ()
else
let digit = i + 1 in
ops.(i) <- Add;
recur (i + 1) sum' Add digit;
ops.(i) <- Sub;
recur (i + 1) sum' Sub digit;
ops.(i) <- Nothing;
recur (i + 1) sum pending_op (pending_value * 10 + digit) in
recur 0 0 Nothing 0
Note that this will generate duplicates - I didn't bother to fix that. Also, if you are doing this exercise to gain strength in functional programming, it might be beneficial to reject the imperative approach taken here and search for a similar solution that doesn't make use of assignments.

OCaml merge sort function

So this is a merge sort function I'm playing with in OCaml. The funny thing is the code delivers what I expect, which means, it sorts the list. But then raises some errors. So can someone please check my code and tell me what's going on and why these errors? And how do I eliminate them? I'm a OCaml newbie but I really want to get what's going on:
(* Merge Sort *)
(* This works but produces some extra error. Consult someone!! *)
let rec length_inner l n =
match l with
[] -> n
| h::t -> length_inner t (n + 1)
;;
let length l = length_inner l 0;;
let rec take n l =
if n = 0 then [] else
match l with
h::t -> h :: take (n - 1) t
;;
let rec drop n l =
if n = 0 then l else
match l with
h::t -> drop (n - 1) t
;;
let rec merge x y =
match x, y with
[], l -> l
| l, [] -> l
| hx::tx, hy::ty ->
if hx < hy
then hx :: merge tx (hy :: ty)
else hy :: merge (hx :: tx) ty
;;
let rec msort l =
match l with
[] -> []
| [x] -> [x]
| _ ->
let left = take (length l/2) l in
let right = drop (length l/2) l in
merge (msort left) (msort right)
;;
msort [53; 9; 2; 6; 19];;
In the terminal, I get:
OCaml version 4.00.1
# #use "prac.ml";;
val length_inner : 'a list -> int -> int = <fun>
val length : 'a list -> int = <fun>
File "prac.ml", line 13, characters 2-44:
Warning 8: this pattern-matching is not exhaustive.
Here is an example of a value that is not matched:
[]
val take : int -> 'a list -> 'a list = <fun>
File "prac.ml", line 19, characters 2-39:
Warning 8: this pattern-matching is not exhaustive.
Here is an example of a value that is not matched:
[]
val drop : int -> 'a list -> 'a list = <fun>
val merge : 'a list -> 'a list -> 'a list = <fun>
val msort : 'a list -> 'a list = <fun>
- : int list = [2; 6; 9; 19; 53]
#
The compiler is telling you that your pattern matches aren't exhaustive. In fact it's telling exactly what to try to see the problem. For example, you might try:
drop 2 []
To fix the problem you need to decide what to do with empty lists in your functions. Here's a definition of drop with exhaustive matches:
let rec drop n l =
if n = 0 then l
else
match l with
| [] -> []
| h::t -> drop (n - 1) t
If this isn't clear: your code doesn't say what to do with an empty list. Your matches only say what to do if the list has the form h :: t. But an empty list doesn't have this form. You need to add a case for [] to your matches.

Project Euler No. 14 Haskell

I'm trying to resolve problem 14 of Project Euler (http://projecteuler.net/problem=14) and I hit a dead end using Haskell.
Now, I know that the numbers may be small enough and I could do a brute force, but that isn't the purpose of my exercise.
I am trying to memorize the intermediate results in a Map of type Map Integer (Bool, Integer) with the meaning of:
- the first Integer (the key) holds the number
- the Tuple (Bool, Interger) holds either (True, Length) or (False, Number)
where Length = length of the chain
Number = the number before him
Ex:
for 13: the chain is 13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1
My map should contain :
13 - (True, 10)
40 - (False, 13)
20 - (False, 40)
10 - (False, 20)
5 - (False, 10)
16 - (False, 5)
8 - (False, 16)
4 - (False, 8)
2 - (False, 4)
1 - (False, 2)
Now when I search for another number like 40 i know that the chain has (10 - 1) length and so on.
I want now, if I search for 10, not only to tell me that length of 10 is (10 - 3) length and update the map, but also I want to update 20, 40 in case they are still (False, _)
My code:
import Data.Map as Map
solve :: [Integer] -> Map Integer (Bool, Integer)
solve xs = solve' xs Map.empty
where
solve' :: [Integer] -> Map Integer (Bool, Integer) -> Map Integer (Bool, Integer)
solve' [] table = table
solve' (x:xs) table =
case Map.lookup x table of
Nothing -> countF x 1 (x:xs) table
Just (b, _) ->
case b of
True -> solve' xs table
False -> {-WRONG-} solve' xs table
f :: Integer -> Integer
f x
| x `mod` 2 == 0 = x `quot` 2
| otherwise = 3 * x + 1
countF :: Integer -> Integer -> [Integer] -> Map Integer (Bool, Integer) -> Map Integer (Bool, Integer)
countF n cnt (x:xs) table
| n == 1 = solve' xs (Map.insert x (True, cnt) table)
| otherwise = countF (f n) (cnt + 1) (x:xs) $ checkMap (f n) n table
checkMap :: Integer -> Integer -> Map Integer (Bool, Integer) -> Map Integer (Bool, Integer)
checkMap n rez table =
case Map.lookup n table of
Nothing -> Map.insert n (False, rez) table
Just _ -> table
At the {-WRONG-} part we should update all the values like in the following example:
--We are looking for 10:
10 - (False, 20)
|
V {-finally-} update 10 => (True, 10 - 1 - 1 - 1)
20 - (False, 40) ^
| |
V update 20 => 20 - (True, 10 - 1 - 1)
40 - (False, 13) ^
| |
V update 40 => 40 - (True, 10 - 1)
13 - (True, 10) ^
| |
---------------------------
The problem is that I don't know if its possible to do 2 things in a function like updating a number and continue the recurence. In a C like language I may do something like (pseudocode):
void f(int n, tuple(b,nr), int &length, table)
{
if(b == False) f (nr, (table lookup nr), 0, table);
// the bool is true so we got a length
else
{
length = nr;
return;
}
// Since this is a recurence it would work as a stack, producing the right output
table update(n, --cnt);
}
The last instruction would work since we are sending cnt by reference. Also we always know that it will finish at some point and cnt should not be < 1.
The easiest optimization (as you have identified) is memoization. You have attempted create a memoization system yourself, however have come across issues on how to store the memoized values. There are solutions to doing this in a maintainable way, such as using a State monad or a STArray. However, there is a much simpler solution to your problem - use haskell's existing memoization. Haskell by default remembers constant values, so if you create a value that stores the collatz values, it will be automatically memoized!
A simple example of this is the following fibonacci definition:
fib :: Int -> Integer
fib n = fibValues !! n where
fibValues = 1 : 1 : zipWith (+) fibValues (tail fibValues)
The fibValues is a [Integer], and as it is just a constant value, it is memoized. However, that doesn't mean it is all memoized at once, since as it is an infinte list, this would never finish. Instead, the values are only calculated when needed, as haskell is lazy.
So if you do something similar with your problem, you will get memoization without a lot of the work. However, using a list like above won't work well in your solution. This is because the collatz algorithm uses many different values to get the result for a given number, so the container used will require random access to be efficient. The obvious choice is an array.
collatzMemoized :: Array Integer Int
Next, we need to fill up the array with the correct values. I'll write this function pretending a collatz function exists that calculates the collatz value for any n. Also, note that arrays are fixed size, so a value needs to be used to determine the maximum number to memoize. I'll use a million, but any value can be used (it is a memory/speed tradeoff).
collatzMemoized = listArray (1, maxNumberToMemoize) $ map collatz [1..maxNumberToMemoize] where
maxNumberToMemroize = 1000000
That is pretty straightforward, the listArray is given bounds, and the a list of all the collatz values in that range is given to it. Remember that this won't calculate all the collatz values straight away, as the values are lazy.
Now, the collatz function can be written. The most important part is to only check the collatzMemoized array if the number being checked is within its bounds:
collatz :: Integer -> Int
collatz 1 = 1
collatz n
| inRange (bounds collatzMemoized) nextValue = 1 + collatzMemoized ! nextValue
| otherwise = 1 + collatz nextValue
where
nextValue = case n of
1 -> 1
n | even n -> n `div` 2
| otherwise -> 3 * n + 1
In ghci, you can now see the effectiveness of the memoization. Try collatz 200000. It will take about 2 seconds to finish. However, if you run it again, it will complete instantly.
Finally, the solution can be found:
maxCollatzUpTo :: Integer -> (Integer, Int)
maxCollatzUpTo n = maximumBy (compare `on` snd) $ zip [1..n] (map collatz [1..n]) where
and then printed:
main = print $ maxCollatzUpTo 1000000
If you run main, the result will be printed in about 10 seconds.
Now, a small problem with this approach is it uses a lot of stack space. It will work fine in ghci (which seems to use be more flexible with regards to stack space). However, if you compile it and try to run the executable, it will crash (with a stack space overflow). So to run the program, you have to specify more when you compile it. This can be done by adding -with-rtsopts='K64m' to the compile options. This increases the stack to 64mb.
Now the program can be compiled and ran:
> ghc -O3 --make -with-rtsopts='-K6m' problem.hs
Running ./problem will give the result in less than a second.
You are going about memoization the hard way, trying to write an imperative program in Haskell. Borrowing from David Eisenstat's solution, we'll solve it as j_random_hacker suggested:
collatzLength :: Integer -> Integer
collatzLength n
| n == 1 = 1
| even n = 1 + collatzLength (n `div` 2)
| otherwise = 1 + collatzLength (3*n + 1)
The dynamic programming solution for this is to replace the recursion with looking things up in a table. Let's make a function where we can replace the recursive call:
collatzLengthDef :: (Integer -> Integer) -> Integer -> Integer
collatzLengthDef r n
| n == 1 = 1
| even n = 1 + r (n `div` 2)
| otherwise = 1 + r (3*n + 1)
Now we could define the recursive algorithm as
collatzLength :: Integer -> Integer
collatzLength = collatzLengthDef collatzLength
Now we could also make a tabled version of this (it takes a number for the table size, and returns a collatzLength function that is calculated using a table of that size):
-- A utility function that makes memoizing things easier
buildTable :: (Ix i) => (i, i) -> (i -> e) -> Array i e
buildTable bounds f = array $ map (\x -> (x, f x)) $ range bounds
collatzLengthTabled :: Integer -> Integer -> Integer
collatzLengthTabled n = collatzLengthTableLookup
where
bounds = (1, n)
table = buildTable bounds (collatzLengthDef collatzLengthTableLookup)
collatzLengthTableLookup =
\x -> Case inRange bounds x of
True -> table ! x
_ -> (collatzLengthDef collatzLengthTableLookup) x
This works by defining the collatzLength to be a table lookup, with the table being the definition of the function, but with recursive calls replaced by table lookup. The table lookup function checks to see if the argument to the function is in the range that is tabled, and falls back on the definition of the function. We can even make this work for tabling any function like this:
tableRange :: (Ix a) => (a, a) -> ((a -> b) -> a -> b) -> a -> b
tableRange bounds definition = tableLookup
where
table = buildTable bounds (definition tableLookup)
tableLookup =
\x -> Case inRange bounds x of
True -> table ! x
_ -> (definition tableLookup) x
collatzLengthTabled n = tableRange (1, n) collatzLengthDef
You just need to make sure that you
let memoized = collatzLengthTabled 10000000
... memoized ...
So that only one table is built in memory.
I remember finding memoisation of dynamic programming algorithms very counterintuitive in Haskell, and it's been a while since I've done it, but hopefully the following trick works for you.
But first, I don't quite understand your current DP scheme, though I suspect it may be quite inefficient as it seems like it will need to update many entries for each answer. (a) I don't know how to do this in Haskell, and (b) you don't need to do this to solve the problem efficiently ;-)
I suggest the following approach instead: first build an ordinary recursive function that computes the right answer for an input number. (Hint: it will have a signature like collatzLength :: Int -> Int.) When you have this function working, just replace its definition with the definition of an array whose elements are defined lazily with the array function using an association list, and replace all recursive calls to the function to array lookups (e.g. collatzLength 42 would become collatzLength ! 42). This will automagically populate the array in the necessary order! So your "top-level" collatzLength object will now actually be an array, rather than a function.
As I suggested above, I would use an array instead of a map datatype to hold the DP table, since you will need to store values for all integer indices from 1 up to 1,000,000.
I don't have a Haskell compiler handy, so I apologize for any broken code.
Without memoization, there's a function
collatzLength :: Integer -> Integer
collatzLength n
| n == 1 = 1
| even n = 1 + collatzLength (n `div` 2)
| otherwise = 1 + collatzLength (3*n + 1)
With memoization, the type signature is
memoCL :: Map Integer Integer -> Integer -> (Map Integer Integer, Integer)
since memoCL receives a table as input and gives the updated table as output. What memoCL needs to do is intercept the return of the recursive call with a let form and insert the new result.
-- table must have an initial entry for 1
memoCL table n = case Map.lookup n table of
Just m -> (table, m)
Nothing -> let (table', m) = memoCL table (collatzStep n) in (Map.insert n (1 + m) table', 1 + m)
collatzStep :: Integer -> Integer
collatzStep n = if even n then n `div` 2 else 3*n + 1
At some point you'll get sick of the above idiom. Then it's time for monads.
I eventually modify the {-WRONG-} part to do what it should with a call to mark x (b, n) [] xs table where
mark :: Integer -> (Bool, Integer) -> [Integer] -> [Integer] -> Map Integer (Bool, Integer) -> Map Integer (Bool, Integer)
mark crtElem (b, n) list xs table
| b == False = mark n (findElem n table) (crtElem:list) xs table
| otherwise = continueWith n list xs table
continueWith :: Integer -> [Integer] -> [Integer] -> Map Integer (Bool, Integer) -> Map Integer (Bool, Integer)
continueWith _ [] xs table = solve' xs table
continueWith cnt (y:ys) xs table = continueWith (cnt - 1) ys xs (Map.insert y (True, cnt - 1) table)
findElem :: Integer -> Map Integer (Bool, Integer) -> (Bool, Integer)
findElem n table =
case Map.lookup n table of
Nothing -> (False, 0)
Just (b, nr) -> (b, nr)
But it seams that there are better (and far less verbose) answers than this 1
Maybe you might find interesting how I solved the problem. Its is pretty functional though it might be not the most efficient thing on earth :)
You can find the code here: https://github.com/fmancinelli/project-euler/blob/master/haskell/project-euler/Problem014.hs
P.S.: Disclaimer: I was doing Project Euler exercises in order to learn Haskell, so the quality of the solution could be debatable.
Since we are studying recursion schemes, here's one for you.
Let's consider functor N(A,B,X)=A+B*X, which is a stream of Bs with the last element being A.
{-# LANGUAGE DeriveFunctor
, TypeFamilies
, TupleSections #-}
import Data.Functor.Foldable
import qualified Data.Map as M
import Data.List
import Data.Function
import Data.Int
data N a b x = Z a | S b x deriving (Functor)
This stream is handy for several kinds of iterations. For one, we can use it to represent a chain of Ints in a Collatz sequence:
type instance Base Int64 = N Int Int64
instance Foldable Int64 where
project 1 = Z 1
project x | odd x = S x $ 3*x+1
project x = S x $ x `div` 2
This is just a algebra, not a initial one, because the transformation is not a isomorphism (same chain of Ints is part of a chain for 2*x and (x-1)/3), but this is sufficient to represent the fixpoint Base Int64 Int64.
With this definition, cata is going to feed the chain to the algebra given to it, and you can use it to construct a memo Map of integers to the chain length. Finally, anamorphism can use it to generate a stream of solutions to the problem of different sizes:
problems = ana (uncurry $ cata . phi) (M.empty, 1) where
phi :: M.Map Int64 Int ->
Base Int64 (Prim [(Int64, Int)] (M.Map Int64 Int, Int64)) ->
Prim [(Int64, Int)] (M.Map Int64 Int, Int64)
phi m (Z v) = found m 1 v
phi m (S x ~(Cons (_, v') (m', _))) = maybe (notFound m' x v') (found m x) $
M.lookup x m
The ~ before (Cons ...) means lazy pattern matching. We don't touch the pattern until the values are needed. If not for lazy pattern matching, it would always construct the whole chain, and using the map would be useless. With lazy pattern matching we only construct the values v' and m' if the chain length for x was not in the map.
Helper functions construct the stream of (Int, chain length) pairs:
found m x v = Cons (x, v) (m, x+1)
notFound m x v = Cons (x, 1+v) (M.insert x (1+v) m, x+1)
Now just take the first 999999 problems, and figure out the one that has the longest chain:
main = print $ maximumBy (compare `on` snd) $ take 999999 problems
This works slower than array-based solution, because Map lookup is logarithmic of map size, but this solution is not fixed size. Still, it finishes in about 5 seconds.

Pattern matching list tail tuple element

I have some Run Length Encoding code that I wrote as an exercise
let rle s =
s
|> List.map (fun x -> (x, 1))
|> List.fold (fun acc x ->
match acc with
| [] -> [(x, 1)]
| h::(x, n) -> h::(x, n+1)
| h -> h::(x, 1)
)
|> List.map (fun (x, n) ->
match n with
| 1 -> x.ToString()
| _ -> x.ToString() + n.ToString()
)
The pattern h::(x, n) -> h::(x, n+1) fails to compile.
Does anyone know why?
RLE (for grins)
let rle (s: string) =
let bldr = System.Text.StringBuilder()
let rec start = function
| [] -> ()
| c :: s -> count (1, c) s
and count (n, c) = function
| c1 :: s when c1 = c -> count (n+1, c) s
| s -> Printf.bprintf bldr "%d%c" n c; start s
start (List.ofSeq s)
bldr.ToString()
let s1 = "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW"
let s2 = "12W1B12W3B24W1B14W"
rle s1 = s2 |> printfn "%b" //"true"
It can't compile because the second argument for :: pattern match must be a list, but here it is a tuple. In general, you seem to just misunderstand head and tail. Head is the top element while tail is a list of following elements. Essentially swapping them does the trick:
|> List.fold (fun acc x ->
match acc with
| [] -> [(x, 1)]
| (x0, n)::t when x0=x -> (x0, n+1)::t
| t -> (x, 1)::t
)
[]
Note 1: As #pad noticed, List.fold requires one more argument, a "bootstrap" accumulator to start with. Obviously, it should be just an empty list, [].
Note 2: you can't directly match x. Instead, you bind it to x0 and compare x0 with x.
Note 3: matching empty list [] is not necessary as it would happily work with the last pattern.
This doesn't answer your question, but I was bored and wrote an implementation you might find a bit more instructive -- just step through it with the debugger in Visual Studio or MonoDevelop.
let rec private rleRec encoded lastChar count charList =
match charList with
| [] ->
// No more chars left to process, but we need to
// append the current run before returning.
let encoded' = (count, lastChar) :: encoded
// Reverse the encoded list so it's in the correct
// order, then return it.
List.rev encoded'
| currentChar :: charList' ->
// Does the current character match the
// last character to be processed?
if currentChar = lastChar then
// Just increment the count and recurse.
rleRec encoded currentChar (count + 1) charList'
else
// The current character is not the same as the last.
// Append the character and run-length for the previous
// character to the 'encoded' list, then start a new run
// with the current character.
rleRec ((count, lastChar) :: encoded) currentChar 1 charList'
let rle charList =
// If the list is empty, just return an empty list
match charList with
| [] -> []
| hd :: tl ->
// Call the implementation of the RLE algorithm.
// The initial run starts with the first character in the list.
rleRec [] hd 1 tl
let rleOfString (str : string) =
rle (List.ofSeq str)
let rec printRle encoded =
match encoded with
| [] ->
printfn ""
| (length, c) :: tl ->
printf "%i%O" length c
printRle tl
let printRleOfString = rleOfString >> printRle
Pasting the code into F# interactive and using the Wikipedia example for run-length encoding:
> printRleOfString "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW";;
12W1B12W3B24W1B14W
val it : unit = ()

Recursive addition in F# using

I'm trying to implement the following recursive definition for addition in F#
m + 0 := m
m + (n + 1) := (m + n) + 1
I can't seem to get the syntax correct, The closest I've come is
let rec plus x y =
match y with
| 0 -> x;
| succ(y) -> succ( plus(x y) );
Where succ n = n + 1. It throws an error on pattern matching for succ.
I'm not sure what succ means in your example, but it is not a pattern defined in the standard F# library. Using just the basic functionality, you'll need to use a pattern that matches any number and then subtract one (and add one in the body):
let rec plus x y =
match y with
| 0 -> x
| y -> 1 + (plus x (y - 1))
In F# (unlike e.g. in Prolog), you can't use your own functions inside patterns. However, you can define active patterns that specify how to decompose input into various cases. The following takes an integer and returns either Zero (for zero) or Succ y for value y + 1:
let (|Zero|Succ|) n =
if n < 0 then failwith "Unexpected!"
if n = 0 then Zero else Succ(n - 1)
Then you can write code that is closer to your original version:
let rec plus x y =
match y with
| Zero -> x
| Succ y -> 1 + (plus x y)
As Tomas said, you can't use succ like this without declaring it. What you can do is to create a discriminated union that represents a number:
type Number =
| Zero
| Succ of Number
And then use that in the plus function:
let rec plus x y =
match y with
| Zero -> x
| Succ(y1) -> Succ (plus x y1)
Or you could declare it as the + operator:
let rec (+) x y =
match y with
| Zero -> x
| Succ(y1) -> Succ (x + y1)
If you kept y where I have y1, the code would work, because the second y would hide the first one. But I think doing so makes the code confusing.
type N = Zero | Succ of N
let rec NtoInt n =
match n with
| Zero -> 0
| Succ x -> 1 + NtoInt x
let rec plus x y =
match x with
| Zero -> y
| Succ n -> Succ (plus n y)
DEMO:
> plus (Succ (Succ Zero)) Zero |> NtoInt ;;
val it : int = 2
> plus (Succ (Succ Zero)) (Succ Zero) |> NtoInt ;;
val it : int = 3
let rec plus x y =
match y with
| 0 -> x
| _ -> plus (x+1) (y-1)

Resources