Help with an algorithm to compute the column sums of a (quadtree) matrix?

Given this definition and a test matrix:
data (Eq a, Show a) => QT a = C a | Q (QT a) (QT a) (QT a) (QT a)
deriving (Eq, Show)
data (Eq a, Num a, Show a) => Mat a = Mat {nexp :: Int, mat :: QT a}
deriving (Eq, Show)
-- test matrix, exponent is 2, that is matrix is 4 x 4
test = Mat 2 (Q (C 5) (C 6) (Q (C 1) (C 0) (C 2) (C 1)) (C 3))
|     |     |
|  5  |  6  |
|     |     |
-------------
|1 | 0|     |
|--|--|  3  |
|2 | 1|     |
I'm trying to write a function that outputs the list of column sums, like [13, 11, 18, 18]. The basic idea is to sum each sub-quadtree:
If the quadtree is (C c), then output the value c * 2 ^ (n - 1), repeated 2 ^ (n - 1) times. Example: the first quadtree is (C 5), so we repeat 5 * 2 ^ (2 - 1) = 10, 2 ^ (n - 1) = 2 times, obtaining [10, 10].
Otherwise, given (Q a b c d), we zipWith the colsum of a and c (and b and d).
Of course this is not working (not even compiling) because after some recursion we have:
zipWith (+) [[10, 10], [12, 12]] [zipWith (+) [[1], [0]] [[2], [1]], [6, 6]]
Because I'm just beginning with Haskell, I feel I'm missing something and need some advice on which functions I can use. The non-working colsum definition is:
colsum :: (Eq a, Show a, Num a) => Mat a -> [a]
colsum m = csum (mat m)
where
n = nexp m
csum (C c) = take (2 ^ n) $ repeat (c * 2 ^ n)
csum (Q a b c d) = zipWith (+) [colsum $ submat a, colsum $ submat b]
[colsum $ submat c, colsum $ submat d]
submat q = Mat (n - 1) q
Any ideas would be great and much appreciated...

Someone should probably have explained to whoever is worried about the depth of the quadtree that the nexp field in the Mat type is meant precisely to determine the real size of a (C _).
As for the solution presented in the first answer: it works. However, constructing and deconstructing Mat at every step is unnecessary and easily avoided. Moreover, the call to fromIntegral used to "bypass" the type-checking problem caused by replicate is not needed either; there is no reason to go through Integral and back, for example:
let m = 2^n; k=2^n in replicate k (m*x)
Anyway, the real challenge here, as I see it, is to avoid the quadratic behavior caused by ++.
Cheers,

Let's consider your colsum:
colsum :: (Eq a, Show a, Num a) => Mat a -> [a]
colsum m = csum (mat m)
where
n = nexp m
csum (C c) = take (2 ^ n) $ repeat (c * 2 ^ n)
csum (Q a b c d) = zipWith (+) [colsum $ submat a, colsum $ submat b]
[colsum $ submat c, colsum $ submat d]
submat q = Mat (n - 1) q
It is almost correct, except for the line that defines csum (Q a b c d) = ....
Let's think about types. colsum returns a list of numbers. zipWith (+) sums two lists elementwise:
ghci> :t zipWith (+)
zipWith (+) :: Num a => [a] -> [a] -> [a]
This means that you need to pass two lists of numbers to zipWith (+). Instead you create two lists of lists of numbers, like this:
[colsum $ submat a, colsum $ submat b]
The type of this expression is [[a]], not [a] as you need.
What you need to do is to concatenate two lists of numbers to obtain a single list of numbers (and this is, probably, what you intended to do):
((colsum $ submat a) ++ (colsum $ submat b))
Similarly, concatenate the lists of partial sums for c and d, and your function should start working.
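Putting the fix together, here is a minimal sketch of a colsum that compiles. It assumes the datatype contexts from the question are dropped (they need an extension in current GHC) and that Q lists its quadrants as north-west, north-east, south-west, south-east, as in the picture in the question. It also pattern matches on Mat directly instead of going through csum; that is just one way to write it.

```haskell
-- Quadtree: a constant cell or four sub-quadrants (NW, NE, SW, SE).
data QT a = C a | Q (QT a) (QT a) (QT a) (QT a)
  deriving (Eq, Show)

-- nexp is the exponent: the matrix is 2^nexp x 2^nexp.
data Mat a = Mat { nexp :: Int, mat :: QT a }
  deriving (Eq, Show)

colsum :: Num a => Mat a -> [a]
-- A cell covering a 2^n x 2^n block contributes c * 2^n to each of 2^n columns.
colsum (Mat n (C c)) = replicate (2 ^ n) (c * 2 ^ n)
-- Western columns followed by eastern columns, north added to south.
colsum (Mat n (Q a b c d)) =
  zipWith (+) (sub a ++ sub b) (sub c ++ sub d)
  where
    sub q = colsum (Mat (n - 1) q)

-- The test matrix from the question.
test :: Mat Int
test = Mat 2 (Q (C 5) (C 6) (Q (C 1) (C 0) (C 2) (C 1)) (C 3))
```

With these definitions, colsum test evaluates to [13,11,18,18], the list the question asks for.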

Let's go more general, and come back to the goal at hand.
Consider how we would project a quadtree into a 2^n×2^n matrix. We may not need to create this projection in order to calculate its column sums, but it's a useful notion to work with.
If our quadtree is a single cell, then we'd just fill the entire matrix with that cell's value.
Otherwise, if n ≥ 1, we can divide the matrix up into quadrants, and let the subquadtrees each fill one quadrant (that is, have each subquadtree fill a 2^(n-1)×2^(n-1) matrix).
Note that there's still a case remaining. What if n = 0 (that is, we have a 1×1 matrix) and the quadtree isn't a single cell? We need to specify some behaviour for this case - maybe we just let one of the subquadtrees populate the entire matrix, or we fill the matrix with some default value.
Now consider the column sums of such a projection.
If our quadtree was a single cell, then the 2^n column sums will all be 2^n
times the value stored in that cell.
(hint: look at replicate and genericReplicate on hoogle).
Otherwise, if n ≥ 1, then each column overlaps two distinct quadrants.
Half of our columns will be completely determined by the western quadrants,
and the other half by the eastern quadrants. The sum for a particular column
can be defined as the sum of the contribution to that column
from its northern half (that is, the column sum for that column in the northern quadrant),
and its southern half (likewise).
(hint: We'll need to append the western column sums to the eastern column sums
to get all the column sums, and combine the northern and southern demi-column sums
to get the actual sums for each column).
Again, we have a third case, and the column sum here depends on how
you project four subquadtrees onto a 1×1 matrix. Fortunately, a 1×1 matrix means
only a single column sum!
Now, we only care about a particular projection - the projection onto a matrix of size 2^d×2^d
where d is the depth of our quadtree. So you'll need to figure the depth too. Since a
single cell fits "naturally" into a matrix of size 1×1, that implies that it has a
depth of 0. A quadbranch must have depth great enough to allow each of its subquads to fit
into their quadrant of the matrix.
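For reference, one way the hints above could be turned into code. This is only a sketch: the names depth, colSumsAt and colSums are mine, and the degenerate case (a branch projected onto a 1×1 matrix) is resolved by letting the north-west subquadtree fill the cell, which is an arbitrary choice among the options mentioned above.

```haskell
data QT a = C a | Q (QT a) (QT a) (QT a) (QT a)

-- A single cell fits a 1x1 matrix (depth 0); a branch needs one more
-- level than its deepest sub-quadrant.
depth :: QT a -> Int
depth (C _) = 0
depth (Q a b c d) = 1 + maximum (map depth [a, b, c, d])

-- Column sums of the projection onto a 2^n x 2^n matrix.
colSumsAt :: Num a => Int -> QT a -> [a]
colSumsAt n (C x) = replicate (2 ^ n) (x * 2 ^ n)
colSumsAt n (Q nw ne sw se)
  | n >= 1 = zipWith (+) (go nw ++ go ne) (go sw ++ go se)
  | otherwise = colSumsAt 0 nw -- degenerate case: arbitrary choice (assumption)
  where
    go = colSumsAt (n - 1)

-- Column sums at the quadtree's natural size.
colSums :: Num a => QT a -> [a]
colSums q = colSumsAt (depth q) q
```

Because depth is computed from the tree itself, colSums never reaches the degenerate case; it only matters if colSumsAt is called with an exponent smaller than the depth.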

Related

Generate all unique directed graphs with 2 inputs to each node

I'm trying to generate all unique digraphs that fit a spec:
each node must have exactly 2 inputs
and are allowed arbitrarily many outputs to other nodes in the graph
My current solution is slow. Eg for 6 nodes, the algo has taken 1.5 days to get where I think it's complete, but it'll probably be checking for a few more days still.
My algorithm for a graph with n nodes:
generate all n-length strings of 0, where one symbol is a 1, eg, for n=3, [[0,0,1], [0,1,0], [1,0,0]]. These can be thought of as rows from an identity matrix.
generate all possible n * n matrixes where each row is all possible combinations of step 1. + step 1.
This is the connectivity matrix where each cell represents a connection from column-index to row-index
So, for n=3, these are possible:
[0,1,0] + [1,0,0] = [1,1,0]
[1,0,0] + [1,0,0] = [2,0,0]
These represent the inputs to a node, and by adding step 1 to itself, the result will always represent 2 inputs.
For ex:
A B C
A' [[0,1,1],
B' [0,2,0],
C' [1,1,0]]
So B and C connect to A once each: B -> A', C -> A',
And B connects to itself twice: B => B'
I only want unique ones, so for each connectivity matrix generated, I can only keep it if it is not isomorphic to an already-seen graph.
This step is expensive. I need to convert the graph to a "canonical form" by running through each permutation of isomorphic graphs, sorting them, and considering the first one as the "canonical form".
If anyone dives into testing any of this out, here are the count of unique graphs for n nodes:
2 - 6
3 - 44
4 - 475
5 - 6874
6 - 109,934 (I think, it's not done running yet but I haven't found a new graph in >24 hrs.)
7 - I really wanna know!
Possible optimizations:
since I get to generate the graphs to test, is there a way of ruling them out, without testing, as being isomorphic to already-seen ones?
is there a faster graph-isomorphism algorithm? I think this one is related to "Nauty", and there are others I've read of in papers, but I haven't had the expertise (or bandwidth) to implement them yet.
Here's a demonstrable connectivity matrix that can be plotted at graphonline.ru for fun, showing self connections, and 2 connections to the same node:
1, 0, 0, 0, 0, 1,
1, 0, 0, 0, 1, 0,
0, 1, 0, 1, 0, 0,
0, 1, 2, 0, 0, 0,
0, 0, 0, 1, 0, 1,
0, 0, 0, 0, 1, 0,
Here's the code in Haskell if you want to play with it, but I'm more concerned about getting the algorithm right (e.g. pruning down the search space) than about the implementation:
import Data.List (foldl', permutations)
import qualified Data.Set as Set
-- | generate all permutations of length n given symbols from xs
npermutations :: [a] -> Int -> [[a]]
npermutations xs size = mapM (const xs) [1..size]
identity :: Int -> [[Int]]
identity size = scanl
(\xs _ -> take size $ 0 : xs) -- keep shifting right
(1 : (take (size - 1) (repeat 0))) -- initial, [1,0,0,...]
[1 .. size-1] -- correct size
-- | return all possible pairings of [Column]
columnPairs :: [[a]] -> [([a], [a])]
columnPairs xs = (map (\x y -> (x,y)) xs)
<*> xs
-- | remove duplicates
rmdups :: Ord a => [a] -> [a]
rmdups = rmdups' Set.empty where
rmdups' _ [] = []
rmdups' a (b : c) = if Set.member b a
then rmdups' a c
else b : rmdups' (Set.insert b a) c
-- | all possible patterns for inputting 2 things into one node.
-- eg [0,1,1] means cells B, and C project into some node
-- [0,2,0] means cell B projects twice into one node
binaryInputs :: Int -> [[Int]]
binaryInputs size = rmdups $ map -- rmdups because [1,0]+[0,1] is same as flipped
(\(x,y) -> zipWith (+) x y)
(columnPairs $ identity size)
transposeAdjMat :: [[Int]] -> [[Int]]
transposeAdjMat ([]:_) = []
transposeAdjMat m = (map head m) : transposeAdjMat (map tail m)
-- | AdjMap [(name, inbounds)]
data AdjMap a = AdjMap [(a, [a])] deriving (Show, Eq)
addAdjColToMap :: Int -- index
-> [Int] -- inbound
-> AdjMap Int
-> AdjMap Int
addAdjColToMap ix col (AdjMap xs) =
let conns = foldl (\c (cnt, i) -> case cnt of
1 -> i:c
2 -> i:i:c
_ -> c
)
[]
(zip col [0..]) in
AdjMap ((ix, conns) : xs)
adjMatToMap :: [[Int]] -> AdjMap Int
adjMatToMap cols = foldl
(\adjMap@(AdjMap nodes) col -> addAdjColToMap (length nodes) col adjMap)
(AdjMap [])
cols
-- | a graph's canonical form : http://mfukar.github.io/2015/09/30/haskellxiii.html
-- very expensive algo, of course
canon :: (Ord a, Enum a, Show a) => AdjMap a -> String
canon (AdjMap g) = minimum $ map f $ Data.List.permutations [1..(length g)]
where
-- Graph vertices:
vs = map fst g
-- Find, via brute force on all possible orderings (permutations) of vs,
-- a mapping of vs to [1..(length g)] which is minimal.
-- For example, map [1, 5, 6, 7] to [1, 2, 3, 4].
-- Minimal is defined lexicographically, since `f` returns strings:
f p = let n = zip vs p
in (show [(snd x, sort id $ map (\x -> snd $ head $ snd $ break ((==) x . fst) n)
$ snd $ take_edge g x)
| x <- sort snd n])
-- Sort elements of N in ascending order of (map f N):
sort f n = foldr (\x xs -> let (lt, gt) = break ((<) (f x) . f) xs
in lt ++ [x] ++ gt) [] n
-- Get the first entry from the adjacency list G that starts from the given node X
-- (actually, the vertex is the first entry of the pair, hence `(fst x)`):
take_edge g x = head $ dropWhile ((/=) (fst x) . fst) g
-- | all possible matrixes where each node has 2 inputs and arbitrary outs
binaryMatrixes :: Int -> [[[Int]]]
binaryMatrixes size = let columns = binaryInputs size
unfiltered = mapM (const columns) [1..size] in
fst $ foldl'
(\(keep, seen) x -> let can = canon . adjMatToMap $ x in
(if Set.member can seen
then keep
else id $! x : keep
, Set.insert can seen))
([], Set.fromList [])
unfiltered
There are a number of approaches you could try. One thing that I do note is that having loops with multi-edges (colored loops?) is a little unusual, but it probably just needs a refinement of existing techniques.
Filter the output of another program
The obvious candidate here is of course nAUTy/traces (http://pallini.di.uniroma1.it/) or similar (saucy, bliss, etc). Depending on how you want to do this, it could be as simple as run nauty (for example) and output to file, then read in the list filtering as you go.
For larger values of n this could start to be a problem if you are generating huge files. I'm not sure whether you start to run out of space before you run out of time, but still. What might be better is to generate and test them as you go, throwing away candidates. For your purposes, there may be an existing library for generation - I found this one but I have no idea how good it is.
Use graph invariants
A very easy first step to more efficient listing of graphs is to filter using graph invariants. An obvious one would be degree sequence (the ordered list of degrees of the graph). Others include the number of cycles, the girth, and so on. For your purposes, there might be some indegree/outdegree sequence you could use.
The basic idea is to use the invariant as a filter to avoid expensive checks for isomorphism. You can store the (list of ) invariants for already generated graphs, and check the new one against the list first. The canonical form of a structure is a kind of invariant.
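As an illustration of the filtering idea, here is a rough sketch for the connectivity matrices in the question (entry (r, c) counts edges from node c into node r). The particular invariant, each node's out-degree paired with its self-loop count, is my own pick for this class of graphs, and the names invariant and bucketByInvariant are hypothetical. Every row sums to 2 by construction, so in-degree carries no information here.

```haskell
import Data.Function (on)
import Data.List (groupBy, sort, sortOn, transpose)

-- Connectivity matrix: entry (r, c) counts edges from node c into node r.
type AdjMat = [[Int]]

-- For every node: (out-degree, self-loop multiplicity). Sorting the pairs
-- makes the result independent of how the nodes are labelled.
invariant :: AdjMat -> [(Int, Int)]
invariant m = sort (zip outDegrees selfLoops)
  where
    outDegrees = map sum (transpose m)            -- column sums
    selfLoops = [row !! i | (i, row) <- zip [0 ..] m] -- diagonal entries

-- Group candidates by invariant. Only matrices in the same bucket can be
-- isomorphic, so the expensive canonical-form check is needed within
-- buckets only.
bucketByInvariant :: [AdjMat] -> [[AdjMat]]
bucketByInvariant = groupBy ((==) `on` invariant) . sortOn invariant
```

Equal invariants do not prove isomorphism; the invariant only rules pairs out cheaply, and a full check is still required inside each bucket.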
Implement an algorithm
There are lots of GI algorithms, including the ones used by nauty and friends. However, they do tend to be quite hard! The description given in this answer is an excellent overview, but the devil is in the details of course.
Also note that the description is for general graphs, while you have a specific subclass of graph that might be easier to generate. There may be papers out there for digraph listing (generating) but I have not checked.

Is there a fast algorithm to determine the godel number of a term of a context free language?

Suppose we have a simple grammar specification. There is a way to enumerate terms of that grammar that guarantees that any finite term will have a finite position, by iterating it diagonally. For example, for the following grammar:
S ::= add
add ::= mul | add + mul
mul ::= term | mul * term
term ::= number | ( S )
number ::= digit | digit number
digit ::= 0 | 1 | ... | 9
You can enumerate terms like that:
0
1
0+0
0*0
0+1
(0)
1+0
0*1
0+0*0
00
... etc
My question is: is there a way to do the opposite? That is, to take a valid term of that grammar, say, 0+0*0, and find its position on such enumeration - in that case, 9?
For this specific problem, we can cook up something fairly simple, if we allow ourselves to choose a different enumeration ordering. The idea is basically the one in Every Bit Counts, which I also mentioned in the comments. First, some preliminaries: some imports/extensions, a data type representing the grammar, and a pretty-printer. For the sake of simplicity, my digits only go up to 2 (big enough to not be binary any more, but small enough not to wear out my fingers and your eyes).
{-# LANGUAGE TypeSynonymInstances #-}
import Control.Applicative
import Data.Universe.Helpers
type S = Add
data Add = Mul Mul | Add :+ Mul deriving (Eq, Ord, Show, Read)
data Mul = Term Term | Mul :* Term deriving (Eq, Ord, Show, Read)
data Term = Number Number | Parentheses S deriving (Eq, Ord, Show, Read)
data Number = Digit Digit | Digit ::: Number deriving (Eq, Ord, Show, Read)
data Digit = D0 | D1 | D2 deriving (Eq, Ord, Show, Read, Bounded, Enum)
class PP a where pp :: a -> String
instance PP Add where
pp (Mul m) = pp m
pp (a :+ m) = pp a ++ "+" ++ pp m
instance PP Mul where
pp (Term t) = pp t
pp (m :* t) = pp m ++ "*" ++ pp t
instance PP Term where
pp (Number n) = pp n
pp (Parentheses s) = "(" ++ pp s ++ ")"
instance PP Number where
pp (Digit d) = pp d
pp (d ::: n) = pp d ++ pp n
instance PP Digit where pp = show . fromEnum
Now let's define the enumeration order. We'll use two basic combinators, +++ for interleaving two lists (mnemonic: the middle character is a sum, so we're taking elements from either the first argument or the second) and +*+ for the diagonalization (mnemonic: the middle character is a product, so we're taking elements from both the first and second arguments). More information on these in the universe documentation. One invariant we'll maintain is that our lists -- with the exception of digits -- are always infinite. This will be important later.
ss = adds
adds = (Mul <$> muls ) +++ (uncurry (:+) <$> adds +*+ muls)
muls = (Term <$> terms ) +++ (uncurry (:*) <$> muls +*+ terms)
terms = (Number <$> numbers) +++ (Parentheses <$> ss)
numbers = (Digit <$> digits) ++ interleave [[d ::: n | n <- numbers] | d <- digits]
digits = [D0, D1, D2]
Let's see a few terms:
*Main> mapM_ (putStrLn . pp) (take 15 ss)
0
0+0
0*0
0+0*0
(0)
0+0+0
0*(0)
0+(0)
1
0+0+0*0
0*0*0
0*0+0
(0+0)
0+0*(0)
0*1
Okay, now let's get to the good bit. Let's assume we have two infinite lists a and b. There's two things to notice. First, in a +++ b, all the even indices come from a, and all the odd indices come from b. So we can look at the last bit of an index to see which list to look in, and the remaining bits to pick an index in that list. Second, in a +*+ b, we can use the standard bijection between pairs of numbers and single numbers to translate between indices in the big list and pairs of indices in the a and b lists. Nice! Let's get to it. We'll define a class for Godel-able things that can be translated back and forth between numbers -- indices into the infinite list of inhabitants. Later we'll check that this translation matches the enumeration we defined above.
type Nat = Integer -- bear with me here
class Godel a where
to :: a -> Nat
from :: Nat -> a
instance Godel Nat where to = id; from = id
instance (Godel a, Godel b) => Godel (a, b) where
to (m_, n_) = (m + n) * (m + n + 1) `quot` 2 + m where
m = to m_
n = to n_
from p = (from m, from n) where
isqrt = floor . sqrt . fromIntegral
base = (isqrt (1 + 8 * p) - 1) `quot` 2
triangle = base * (base + 1) `quot` 2
m = p - triangle
n = base - m
The instance for pairs here is the standard Cantor diagonal. It's just a bit of algebra: use the triangle numbers to figure out where you're going/coming from. Now building up instances for this class is a breeze. Numbers are just represented in base 3:
-- this instance is a lie! there aren't infinitely many Digits
-- but we'll be careful about how we use it
instance Godel Digit where
to = fromIntegral . fromEnum
from = toEnum . fromIntegral
instance Godel Number where
to (Digit d) = to d
to (d ::: n) = 3 + to d + 3 * to n
from n
| n < 3 = Digit (from n)
| otherwise = let (q, r) = quotRem (n-3) 3 in from r ::: from q
For the remaining three types, we will, as suggested above, check the tag bit to decide which constructor to emit, and use the remaining bits as indices into a diagonalized list. All three instances necessarily look very similar.
instance Godel Term where
to (Number n) = 2 * to n
to (Parentheses s) = 1 + 2 * to s
from n = case quotRem n 2 of
(q, 0) -> Number (from q)
(q, 1) -> Parentheses (from q)
instance Godel Mul where
to (Term t) = 2 * to t
to (m :* t) = 1 + 2 * to (m, t)
from n = case quotRem n 2 of
(q, 0) -> Term (from q)
(q, 1) -> uncurry (:*) (from q)
instance Godel Add where
to (Mul m) = 2 * to m
to (m :+ t) = 1 + 2 * to (m, t)
from n = case quotRem n 2 of
(q, 0) -> Mul (from q)
(q, 1) -> uncurry (:+) (from q)
And that's it! We can now "efficiently" translate back and forth between parse trees and their Godel numbering for this grammar. Moreover, this translation matches the above enumeration, as you can verify:
*Main> map from [0..29] == take 30 ss
True
We did abuse many nice properties of this particular grammar -- non-ambiguity, the fact that almost all the nonterminals had infinitely many derivations -- but variations on this technique can get you quite far, especially if you are not too strict on requiring every number to be associated with something unique.
Also, by the way, you might notice that, except for the instance for (Nat, Nat), these Godel numberings are particularly nice in that they look at/produce one bit (or trit) at a time. So you could imagine doing some streaming. But the (Nat, Nat) one is pretty nasty: you have to know the whole number ahead of time to compute the sqrt. You actually can turn this into a streaming guy, too, without losing the property of being dense (every Nat being associated with a unique (Nat, Nat)), but that's a topic for another answer...
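As a side note on that sqrt: the pair instance above computes it with floor . sqrt . fromIntegral, which goes through Double and can lose precision once indices get large. Below is a small standalone sketch of the same pairing with an exact integer square root instead. The Newton-iteration isqrt and the names pair and unpair are my additions, not part of the code above.

```haskell
-- Integer square root by Newton's iteration; exact for arbitrarily large inputs.
isqrt :: Integer -> Integer
isqrt 0 = 0
isqrt n = go n
  where
    go x
      | y >= x = x
      | otherwise = go y
      where
        y = (x + n `div` x) `div` 2

-- Cantor pairing, laid out like the `to` of the pair instance.
pair :: (Integer, Integer) -> Integer
pair (m, n) = (m + n) * (m + n + 1) `quot` 2 + m

-- Inverse of pair, laid out like the `from` of the pair instance.
unpair :: Integer -> (Integer, Integer)
unpair p = (m, base - m)
  where
    base = (isqrt (1 + 8 * p) - 1) `quot` 2
    triangle = base * (base + 1) `quot` 2
    m = p - triangle
```

A quick sanity check in ghci is all (\p -> pair (unpair p) == p) [0..1000], which should be True.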

Euclid's algorithm using until

I'm a beginner in Haskell, just started now learning about folds and what not, in college, first year.
One of the problems I'm facing now is to define Euclid's algorithm using the until function.
Here's the Euclid's recursive definition (EDIT: just to show how euclid works, I'm trying to define euclid's without the recursive. Just using until):
gcd a b = if b == 0 then a else gcd b (a `mod` b)
Here's what I have using until:
gcd a b = until (==0) (mod a ) b
Obviously this doesn't make any sense since it's always going to return 0, as that is my stopping point instead of printing the value of a when b == 0. I can't for the life of me though figure out how to get the value of a.
Any help is appreciated.
Thank you in advance guys.
Hints:
Now
until :: (a -> Bool) -> (a -> a) -> a -> a
so we need a function that we can apply repeatedly until a condition holds, but we have two numbers a and b, so how can we do that?
The solution is to make the two numbers into one value, (a,b), so think of gcd this way:
uncurriedGCD (a,b) = if b == 0 then (a,b) else uncurriedGCD (b,a `mod` b)
Now you can make two functions, next & check and use them with until.
Helpers for until:
next (a,b) = (b,a `mod` b)
check (a,b) = b == 0
This means that we now could have written uncurriedGCD using until.
Answer:
For example:
ghci> until check next (6,4)
(2,0)
ghci> until check next (12,18)
(6,0)
So we can define:
gcd a b = c where (c,_) = until check next (a,b)
giving:
ghci> gcd 20 44
4
ghci> gcd 60 108
12
What Euclid's algorithm says is this: for (a, b), compute (b, mod a b) until (the new) b equals zero; the first component of the pair is then the gcd. This can be translated directly to an implementation using until like this:
myGcd a b = fst $ until (\(_, y) -> y == 0) (\(x, y) -> (y, x `mod` y)) (a, b)
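For anyone who wants to paste this into a file, here is a small self-contained version with type signatures. The name gcdUntil is my choice, used so the definition does not clash with the Prelude's gcd, and fixing the type to Integer is an assumption made for simplicity.

```haskell
-- One Euclid step on the pair (a, b).
next :: (Integer, Integer) -> (Integer, Integer)
next (a, b) = (b, a `mod` b)

-- Stop as soon as the second component reaches zero.
check :: (Integer, Integer) -> Bool
check (_, b) = b == 0

-- Iterate the step with until and read the gcd off the first component.
gcdUntil :: Integer -> Integer -> Integer
gcdUntil a b = fst (until check next (a, b))
```

For non-negative inputs this should agree with the Prelude's gcd, for example gcdUntil 20 44 gives 4.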

Project Euler No. 14 Haskell

I'm trying to resolve problem 14 of Project Euler (http://projecteuler.net/problem=14) and I hit a dead end using Haskell.
Now, I know that the numbers may be small enough and I could do a brute force, but that isn't the purpose of my exercise.
I am trying to memoize the intermediate results in a Map of type Map Integer (Bool, Integer), with the following meaning:
- the first Integer (the key) holds the number
- the tuple (Bool, Integer) holds either (True, Length) or (False, Number)
where Length = length of the chain
Number = the number that precedes it in the chain
Ex:
for 13: the chain is 13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1
My map should contain :
13 - (True, 10)
40 - (False, 13)
20 - (False, 40)
10 - (False, 20)
5 - (False, 10)
16 - (False, 5)
8 - (False, 16)
4 - (False, 8)
2 - (False, 4)
1 - (False, 2)
Now when I search for another number like 40, I know that its chain has length (10 - 1), and so on.
What I want now is that, if I search for 10, the function not only tells me that the chain of 10 has length (10 - 3) and updates the map, but also updates 20 and 40 in case they are still (False, _).
My code:
import Data.Map as Map
solve :: [Integer] -> Map Integer (Bool, Integer)
solve xs = solve' xs Map.empty
where
solve' :: [Integer] -> Map Integer (Bool, Integer) -> Map Integer (Bool, Integer)
solve' [] table = table
solve' (x:xs) table =
case Map.lookup x table of
Nothing -> countF x 1 (x:xs) table
Just (b, _) ->
case b of
True -> solve' xs table
False -> {-WRONG-} solve' xs table
f :: Integer -> Integer
f x
| x `mod` 2 == 0 = x `quot` 2
| otherwise = 3 * x + 1
countF :: Integer -> Integer -> [Integer] -> Map Integer (Bool, Integer) -> Map Integer (Bool, Integer)
countF n cnt (x:xs) table
| n == 1 = solve' xs (Map.insert x (True, cnt) table)
| otherwise = countF (f n) (cnt + 1) (x:xs) $ checkMap (f n) n table
checkMap :: Integer -> Integer -> Map Integer (Bool, Integer) -> Map Integer (Bool, Integer)
checkMap n rez table =
case Map.lookup n table of
Nothing -> Map.insert n (False, rez) table
Just _ -> table
At the {-WRONG-} part we should update all the values like in the following example:
--We are looking for 10:
10 - (False, 20)
|
V {-finally-} update 10 => (True, 10 - 1 - 1 - 1)
20 - (False, 40) ^
| |
V update 20 => 20 - (True, 10 - 1 - 1)
40 - (False, 13) ^
| |
V update 40 => 40 - (True, 10 - 1)
13 - (True, 10) ^
| |
---------------------------
The problem is that I don't know if it's possible to do 2 things in one function, like updating a number and continuing the recurrence. In a C-like language I might do something like (pseudocode):
void f(int n, tuple(b,nr), int &length, table)
{
if(b == False) f (nr, (table lookup nr), 0, table);
// the bool is true so we got a length
else
{
length = nr;
return;
}
// Since this is a recurrence it would work as a stack, producing the right output
table update(n, --cnt);
}
The last instruction would work since we are sending cnt by reference. Also we always know that it will finish at some point and cnt should not be < 1.
The easiest optimization (as you have identified) is memoization. You have attempted to create a memoization system yourself, but ran into issues with how to store the memoized values. There are ways of doing this in a maintainable way, such as using a State monad or an STArray. However, there is a much simpler solution to your problem: use Haskell's existing memoization. Haskell by default remembers constant values, so if you create a value that stores the collatz values, it will be automatically memoized!
A simple example of this is the following fibonacci definition:
fib :: Int -> Integer
fib n = fibValues !! n where
fibValues = 1 : 1 : zipWith (+) fibValues (tail fibValues)
fibValues is an [Integer], and since it is just a constant value, it is memoized. That doesn't mean it is all computed at once, though: it is an infinite list, so that would never finish. Instead, the values are only calculated when needed, because Haskell is lazy.
So if you do something similar with your problem, you will get memoization without a lot of the work. However, using a list like above won't work well in your solution. This is because the collatz algorithm uses many different values to get the result for a given number, so the container used will require random access to be efficient. The obvious choice is an array.
collatzMemoized :: Array Integer Int
Next, we need to fill up the array with the correct values. I'll write this function pretending a collatz function exists that calculates the collatz value for any n. Also, note that arrays are fixed size, so a value needs to be used to determine the maximum number to memoize. I'll use a million, but any value can be used (it is a memory/speed tradeoff).
collatzMemoized = listArray (1, maxNumberToMemoize) $ map collatz [1..maxNumberToMemoize] where
  maxNumberToMemoize = 1000000
That is pretty straightforward: listArray is given the bounds and a list of all the collatz values in that range. Remember that this won't calculate all the collatz values straight away, as the values are lazy.
Now, the collatz function can be written. The most important part is to only check the collatzMemoized array if the number being checked is within its bounds:
collatz :: Integer -> Int
collatz 1 = 1
collatz n
| inRange (bounds collatzMemoized) nextValue = 1 + collatzMemoized ! nextValue
| otherwise = 1 + collatz nextValue
where
nextValue = case n of
1 -> 1
n | even n -> n `div` 2
| otherwise -> 3 * n + 1
In ghci, you can now see the effectiveness of the memoization. Try collatz 200000. It will take about 2 seconds to finish. However, if you run it again, it will complete instantly.
Finally, the solution can be found:
maxCollatzUpTo :: Integer -> (Integer, Int)
maxCollatzUpTo n = maximumBy (compare `on` snd) $ zip [1..n] (map collatz [1..n])
and then printed:
main = print $ maxCollatzUpTo 1000000
If you run main, the result will be printed in about 10 seconds.
Now, a small problem with this approach is that it uses a lot of stack space. It will work fine in ghci (which seems to be more flexible with regard to stack space). However, if you compile it and try to run the executable, it will crash (with a stack space overflow). So to run the program, you have to specify a larger stack when you compile it. This can be done by adding -with-rtsopts='-K64m' to the compile options. This increases the stack to 64 MB.
Now the program can be compiled and run:
> ghc -O3 --make -with-rtsopts='-K64m' problem.hs
Running ./problem will give the result in less than a second.
You are going about memoization the hard way, trying to write an imperative program in Haskell. Borrowing from David Eisenstat's solution, we'll solve it as j_random_hacker suggested:
collatzLength :: Integer -> Integer
collatzLength n
| n == 1 = 1
| even n = 1 + collatzLength (n `div` 2)
| otherwise = 1 + collatzLength (3*n + 1)
The dynamic programming solution for this is to replace the recursion with looking things up in a table. Let's make a function where we can replace the recursive call:
collatzLengthDef :: (Integer -> Integer) -> Integer -> Integer
collatzLengthDef r n
| n == 1 = 1
| even n = 1 + r (n `div` 2)
| otherwise = 1 + r (3*n + 1)
Now we could define the recursive algorithm as
collatzLength :: Integer -> Integer
collatzLength = collatzLengthDef collatzLength
Now we could also make a tabled version of this (it takes a number for the table size, and returns a collatzLength function that is calculated using a table of that size):
-- A utility function that makes memoizing things easier
buildTable :: (Ix i) => (i, i) -> (i -> e) -> Array i e
buildTable bounds f = array bounds $ map (\x -> (x, f x)) $ range bounds
collatzLengthTabled :: Integer -> Integer -> Integer
collatzLengthTabled n = collatzLengthTableLookup
where
bounds = (1, n)
table = buildTable bounds (collatzLengthDef collatzLengthTableLookup)
collatzLengthTableLookup =
\x -> case inRange bounds x of
True -> table ! x
_ -> (collatzLengthDef collatzLengthTableLookup) x
This works by defining the collatzLength to be a table lookup, with the table being the definition of the function, but with recursive calls replaced by table lookup. The table lookup function checks to see if the argument to the function is in the range that is tabled, and falls back on the definition of the function. We can even make this work for tabling any function like this:
tableRange :: (Ix a) => (a, a) -> ((a -> b) -> a -> b) -> a -> b
tableRange bounds definition = tableLookup
where
table = buildTable bounds (definition tableLookup)
tableLookup =
\x -> case inRange bounds x of
True -> table ! x
_ -> (definition tableLookup) x
collatzLengthTabled n = tableRange (1, n) collatzLengthDef
You just need to make sure that you
let memoized = collatzLengthTabled 10000000
... memoized ...
So that only one table is built in memory.
I remember finding memoisation of dynamic programming algorithms very counterintuitive in Haskell, and it's been a while since I've done it, but hopefully the following trick works for you.
But first, I don't quite understand your current DP scheme, though I suspect it may be quite inefficient as it seems like it will need to update many entries for each answer. (a) I don't know how to do this in Haskell, and (b) you don't need to do this to solve the problem efficiently ;-)
I suggest the following approach instead: first build an ordinary recursive function that computes the right answer for an input number. (Hint: it will have a signature like collatzLength :: Int -> Int.) When you have this function working, just replace its definition with the definition of an array whose elements are defined lazily with the array function using an association list, and replace all recursive calls to the function with array lookups (e.g. collatzLength 42 would become collatzLength ! 42). This will automagically populate the array in the necessary order! So your "top-level" collatzLength object will now actually be an array, rather than a function.
As I suggested above, I would use an array instead of a map datatype to hold the DP table, since you will need to store values for all integer indices from 1 up to 1,000,000.
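A rough sketch of what that could look like. The names limit, table, chainLength, lookupLength and longestUpTo are made up for the example, the table size of 100000 is arbitrary, and Int is assumed to be 64 bits wide so intermediate values do not overflow. It uses listArray rather than array with an association list, and it falls back to plain recursion whenever a chain leaves the range of the table.

```haskell
import Data.Array (Array, bounds, inRange, listArray, (!))
import Data.List (foldl')

-- Table size; an arbitrary choice for this sketch.
limit :: Int
limit = 100000

-- Lazily filled table: element n is a thunk that is evaluated on first access.
table :: Array Int Int
table = listArray (1, limit) (map chainLength [1 .. limit])

-- Chain length of n, counting both n itself and the final 1.
chainLength :: Int -> Int
chainLength 1 = 1
chainLength n = 1 + lookupLength (if even n then n `div` 2 else 3 * n + 1)

-- Use the table when the value is in range, plain recursion otherwise.
lookupLength :: Int -> Int
lookupLength m
  | inRange (bounds table) m = table ! m
  | otherwise = chainLength m

-- Start value with the longest chain among 1..n, paired with that length.
longestUpTo :: Int -> (Int, Int)
longestUpTo n = foldl' best (1, 1) [(k, lookupLength k) | k <- [2 .. n]]
  where
    best acc@(_, l) cand@(_, l') = if l' > l then cand else acc
```

For the chain in the question, chainLength 13 gives 10.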
I don't have a Haskell compiler handy, so I apologize for any broken code.
Without memoization, there's a function
collatzLength :: Integer -> Integer
collatzLength n
| n == 1 = 1
| even n = 1 + collatzLength (n `div` 2)
| otherwise = 1 + collatzLength (3*n + 1)
With memoization, the type signature is
memoCL :: Map Integer Integer -> Integer -> (Map Integer Integer, Integer)
since memoCL receives a table as input and gives the updated table as output. What memoCL needs to do is intercept the return of the recursive call with a let form and insert the new result.
-- table must have an initial entry for 1
memoCL table n = case Map.lookup n table of
  Just m  -> (table, m)
  Nothing -> let (table', m) = memoCL table (collatzStep n)
             in (Map.insert n (1 + m) table', 1 + m)
collatzStep :: Integer -> Integer
collatzStep n = if even n then n `div` 2 else 3*n + 1
At some point you'll get sick of the above idiom. Then it's time for monads.
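For what that might look like, here is a hedged sketch that threads the same table through the State monad from the transformers package. memoCLM and longestChain are hypothetical names, not part of the code above.

```haskell
import Control.Monad.Trans.State.Strict (State, evalState, gets, modify)
import qualified Data.Map as Map

collatzStep :: Integer -> Integer
collatzStep n = if even n then n `div` 2 else 3 * n + 1

-- Same logic as memoCL, but the table is passed along implicitly by State.
-- The initial state must already contain an entry for 1.
memoCLM :: Integer -> State (Map.Map Integer Integer) Integer
memoCLM n = do
  cached <- gets (Map.lookup n)
  case cached of
    Just m  -> return m
    Nothing -> do
      m <- memoCLM (collatzStep n)
      let len = 1 + m
      modify (Map.insert n len)
      return len

-- Starting value in 1..limit with the longest chain, paired with that length.
-- All calls share one table.
longestChain :: Integer -> (Integer, Integer)
longestChain limit = evalState (go 1 (1, 1)) (Map.singleton 1 1)
  where
    go n best@(_, bestLen)
      | n > limit = return best
      | otherwise = do
          len <- memoCLM n
          go (n + 1) (if len > bestLen then (n, len) else best)
```

The explicit (table, result) pairs disappear; gets and modify are the only places where the table is mentioned.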
I eventually modified the {-WRONG-} part to do what it should, with a call to mark x (b, n) [] xs table, where
mark :: Integer -> (Bool, Integer) -> [Integer] -> [Integer] -> Map Integer (Bool, Integer) -> Map Integer (Bool, Integer)
mark crtElem (b, n) list xs table
  | not b     = mark n (findElem n table) (crtElem:list) xs table
  | otherwise = continueWith n list xs table
continueWith :: Integer -> [Integer] -> [Integer] -> Map Integer (Bool, Integer) -> Map Integer (Bool, Integer)
continueWith _ [] xs table = solve' xs table
continueWith cnt (y:ys) xs table = continueWith (cnt - 1) ys xs (Map.insert y (True, cnt - 1) table)
findElem :: Integer -> Map Integer (Bool, Integer) -> (Bool, Integer)
findElem n table =
  case Map.lookup n table of
    Nothing      -> (False, 0)
    Just (b, nr) -> (b, nr)
But it seems that there are better (and far less verbose) answers than this one.
You might find it interesting to see how I solved the problem. It is pretty functional, though it might not be the most efficient thing on earth :)
You can find the code here: https://github.com/fmancinelli/project-euler/blob/master/haskell/project-euler/Problem014.hs
P.S.: Disclaimer: I was doing Project Euler exercises in order to learn Haskell, so the quality of the solution could be debatable.
Since we are studying recursion schemes, here's one for you.
Let's consider the functor N(A,B,X) = A + B*X, which is a stream of Bs with the last element being A.
{-# LANGUAGE DeriveFunctor
, TypeFamilies
, TupleSections #-}
import Data.Functor.Foldable
import qualified Data.Map as M
import Data.List
import Data.Function
import Data.Int
data N a b x = Z a | S b x deriving (Functor)
This stream is handy for several kinds of iterations. For one, we can use it to represent a chain of Ints in a Collatz sequence:
type instance Base Int64 = N Int Int64
instance Foldable Int64 where
  project 1 = Z 1
  project x | odd x = S x $ 3*x+1
  project x = S x $ x `div` 2
This is just an algebra, not an initial one, because the transformation is not an isomorphism (the same chain of Ints is part of the chains for both 2*x and (x-1)/3), but this is sufficient to represent the fixpoint Base Int64 Int64.
With this definition, cata is going to feed the chain to the algebra given to it, and you can use it to construct a memo Map of integers to the chain length. Finally, an anamorphism can use it to generate a stream of solutions to problems of different sizes:
problems = ana (uncurry $ cata . phi) (M.empty, 1) where
  phi :: M.Map Int64 Int ->
         Base Int64 (Prim [(Int64, Int)] (M.Map Int64 Int, Int64)) ->
         Prim [(Int64, Int)] (M.Map Int64 Int, Int64)
  phi m (Z v) = found m 1 v
  phi m (S x ~(Cons (_, v') (m', _))) = maybe (notFound m' x v') (found m x) $
                                        M.lookup x m
The ~ before (Cons ...) means lazy pattern matching. We don't touch the pattern until the values are needed. If not for lazy pattern matching, it would always construct the whole chain, and using the map would be useless. With lazy pattern matching we only construct the values v' and m' if the chain length for x was not in the map.
Helper functions construct the stream of (Int, chain length) pairs:
found m x v = Cons (x, v) (m, x+1)
notFound m x v = Cons (x, 1+v) (M.insert x (1+v) m, x+1)
Now just take the first 999999 problems, and figure out the one that has the longest chain:
main = print $ maximumBy (compare `on` snd) $ take 999999 problems
This runs slower than an array-based solution, because a Map lookup is logarithmic in the size of the map, but unlike an array the map is not limited to a fixed size. Still, it finishes in about 5 seconds.

Finding largest f satisfying a property given f is non-decreasing in its arguments

This has been bugging me for a while.
Let's say you have a function f x y where x and y are integers and you know that f is non-decreasing in its arguments,
i.e. f (x+1) y >= f x y and f x (y+1) >= f x y.
What would be the fastest way to find the largest f x y satisfying a property, given that x and y are bounded?
I was thinking that this might be a variation of saddleback search and I was wondering if there was a name for this type of problem.
Also, more specifically I was wondering if there was a faster way to solve this problem if you knew that f was the multiplication operator.
Thanks!
Edit: Seeing the comments below, the property can be anything.
Given a property g (where g takes a value and returns a boolean), I am simply looking for the largest f such that g(f) == True.
For example, a naive implementation (in Haskell) would be:
import Data.List (sort)
maximise :: (Int -> Int -> Int) -> (Int -> Bool) -> Int -> Int -> Int
maximise f g xLim yLim = head . filter g . reverse . sort $ results
  where results = [f x y | x <- [1..xLim], y <- [1..yLim]]
Let's draw an example grid for your problem to help think about it. Here's an example plot of f for each x and y. It is monotone in each argument, which is an interesting constraint we might be able to do something clever with.
+------- x --------->
| 0 0 1 1 1 2
| 0 1 1 2 2 4
y 1 1 3 4 6 6
| 1 2 3 6 6 7
| 7 7 7 7 7 7
v
Since we don't know anything about the property, we can't really do better than to list the values in the range of f in decreasing order. The question is how to do that efficiently.
The first thing that comes to mind is to traverse it like a graph starting at the lower-right corner. Here is my attempt:
import Data.Maybe (listToMaybe)
maximise :: (Ord b, Num b) => (Int -> Int -> b) -> (b -> Bool) -> Int -> Int -> Maybe b
maximise f p xLim yLim =
  listToMaybe . filter p . map (negate . snd) $
    enumIncreasing measure successors (xLim,yLim)
  where
    measure (x,y) = negate $ f x y
    successors (x,y) = [ (x-1,y) | x > 0 ] ++ [ (x,y-1) | y > 0 ]
The signature is not as general as it could be (Num should not be necessary, but I needed it to negate the measure function because enumIncreasing returns an increasing rather than a decreasing list -- I could have also done it with a newtype wrapper).
Using this function, we can find the largest odd number which can be written as a product of two numbers <= 100:
ghci> maximise (*) odd 100 100
Just 9801
I wrote enumIncreasing using meldable-heap on Hackage to solve this problem, but it is pretty general. You could tweak the above to add additional constraints on the domain, etc.
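The definition of enumIncreasing is not shown here. The following is only a guess at its shape, inferred from how maximise uses it (a list of (node, measure) pairs in increasing measure order). It swaps the meldable-heap priority queue for a Data.Set from containers, and it assumes the measure never decreases along a successor edge.

```haskell
import qualified Data.Set as Set

-- Hypothetical stand-in for enumIncreasing: best-first traversal of the graph
-- reachable from `start`, always emitting the unvisited node with the smallest
-- measure. Requires measure s >= measure n for every successor s of n.
enumIncreasing :: (Ord k, Ord a) => (a -> k) -> (a -> [a]) -> a -> [(a, k)]
enumIncreasing measure successors start =
  go (Set.singleton (measure start, start)) (Set.singleton start)
  where
    -- frontier: set of (measure, node) pairs used as a priority queue
    -- seen:     every node that has ever been put on the frontier
    go frontier seen = case Set.minView frontier of
      Nothing -> []
      Just ((k, node), rest) ->
        let new       = [s | s <- successors node, s `Set.notMember` seen]
            seen'     = foldr Set.insert seen new
            frontier' = foldr (\s -> Set.insert (measure s, s)) rest new
        in (node, k) : go frontier' seen'
```

The result is produced lazily, so maximise stops exploring the grid as soon as filter p finds its first match.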
The answer depends on what's expensive. The case that might be interesting is when f is expensive.
What you might want to do is look at Pareto optimality. Suppose you have two points
(1, 2) and (3, 4)
Then you know that the latter point is going to be a better solution, so long as f is a nondecreasing function. However, of course, if you have points,
(1, 2) and (2, 1)
then you can't know. So, one solution would be to establish a Pareto-optimal frontier of points that the predicate g permits, and then evaluate these through f.
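As a rough illustration of the frontier idea, the sketch below takes a list of candidate points (however they were selected) and keeps only those not dominated by another candidate, so f is evaluated on the frontier alone. The names dominates, paretoFrontier and maxOverFrontier are made up here, and the quadratic filter is chosen for clarity, not speed.

```haskell
-- p dominates q when p is at least as large in both coordinates and differs from q.
dominates :: (Int, Int) -> (Int, Int) -> Bool
dominates (a, b) (c, d) = a >= c && b >= d && (a, b) /= (c, d)

-- Keep the candidates that no other candidate dominates.
paretoFrontier :: [(Int, Int)] -> [(Int, Int)]
paretoFrontier ps = [p | p <- ps, not (any (`dominates` p) ps)]

-- For a non-decreasing f, the maximum over all candidates is attained on the
-- frontier, so f only needs to be evaluated there.
maxOverFrontier :: Ord b => (Int -> Int -> b) -> [(Int, Int)] -> Maybe b
maxOverFrontier f ps = case paretoFrontier ps of
  [] -> Nothing
  fs -> Just (maximum [f x y | (x, y) <- fs])
```

With the two examples above, paretoFrontier [(1, 2), (3, 4)] keeps only (3, 4), while paretoFrontier [(1, 2), (2, 1)] keeps both points.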

Resources