Simple loop with good performance in Haskell - algorithm

I am starting in Haskell and am interested in how to get matching performance for simple code I would normally write in C or Python. Consider the following problem.
You are given a long string of 1s and 0s of length n. For each substring (window) of length m, we want to output the number of 1s in that window. That is, the output has n-m+1 values, each between 0 and m inclusive.
In C this is very simple to do in time proportional to n, using extra space (on top of the space needed to store the input) proportional to m bits. You count the number of 1s in the first window of length m, then maintain two pointers, one to the start of the window and one to the end, and increment or decrement the count depending on whether one points to a 1 and the other to a 0, or the opposite.
Is it possible to get the same theoretical performance in a purely functional way in Haskell?
Some terrible code:
chunkBits m = helper
  where helper [] = []
        helper xs = sum (take m xs) : helper (drop m xs)
main = print $ chunkBits 5 [0,1,1,0,1,0,0,1,0,1,0,1,1,1,0,0,0,1]

C Code
Here is the C code you've described:
int sliding_window(const char * const str, const int n, const int m, int * result){
    const char * back = str;
    const char * front = str + m;
    int sum = 0;
    int i;
    for(i = 0; i < m; ++i){
        sum += str[i] == '1';
    }
    *result++ = sum;
    for(; i < n; ++i){
        sum += *front++ == '1';
        sum -= *back++ == '1';
        *result++ = sum;
    }
    return n - m + 1;
}
Algorithm
The code above is O(n), since the two loops together run n iterations. But let's take a step back and have a look at the underlying algorithm:
Sum the first m elements. Keep this as sum. O(m)
Our first window has sum 1s. O(1)
Until we've exhausted our original string: O(n)
"Slide" the window. O(1)
add 1 to sum if we gain a '1' by sliding O(1)
subtract 1 from sum if we lose a '1' by sliding O(1)
Push sum onto the results. O(1)
Since m ≤ n (otherwise there is no window), the O(m) setup is absorbed and O(n) holds overall.
Moulding a Haskell variant
That's basically a left scan (scanl) with a way to get a list of those differences in (2.1.). So all we need is a way to somehow slide:
slide :: Int -> [Char] -> [Int]
slide m xs = zipWith f xs (drop m xs)
  where
    f '1' '0' = -1 -- we lose a one
    f '0' '1' = 1 -- we gain a one
    f _ _ = 0 -- nothing :/
That's O(n), where n is the length of our list.
slidingWindow :: Int -> [Char] -> [Int]
slidingWindow m xs = scanl (+) start (slide m xs)
  where
    start = length (filter (== '1') (take m xs))
That's O(n), same as in C, since both use the same algorithm.
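As a quick sanity check, here is a minimal self-contained sketch that compares the scan-based version against a naive recount of every window. The helpers naiveWindows and agree are not part of the answer; they are hypothetical names added only for the comparison.

```haskell
-- Differences between consecutive windows, as in the answer above.
slide :: Int -> String -> [Int]
slide m xs = zipWith f xs (drop m xs)
  where
    f '1' '0' = -1   -- the window loses a one
    f '0' '1' = 1    -- the window gains a one
    f _   _   = 0    -- count unchanged

slidingWindow :: Int -> String -> [Int]
slidingWindow m xs = scanl (+) start (slide m xs)
  where
    start = length (filter (== '1') (take m xs))

-- Hypothetical reference: recount each window from scratch, O(n*m).
-- Only meaningful for 0 < m <= length xs.
naiveWindows :: Int -> String -> [Int]
naiveWindows m xs =
  [ length (filter (== '1') (take m (drop i xs)))
  | i <- [0 .. length xs - m] ]

-- Both versions should produce the same counts on the question's input.
agree :: Bool
agree = slidingWindow 5 input == naiveWindows 5 input
  where
    input = "011010010101110001"
```

For example, slidingWindow 3 "00111110000" evaluates to [1,2,3,3,3,2,1,0,0].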
Caveats
In a real life application, you would always use Text or ByteString instead of String, since the latter is a list of Char with much overhead. Since you only use a string of '1' and '0', you can use ByteString:
import Data.ByteString.Char8 (ByteString)
import qualified Data.ByteString.Char8 as BS
import Data.List (scanl')
slide :: Int -> ByteString -> [Int]
slide m xs = BS.zipWith f xs (BS.drop m xs)
  where
    f '1' '0' = -1
    f '0' '1' = 1
    f _ _ = 0
slidingWindow :: Int -> ByteString -> [Int]
slidingWindow m xs = scanl' (+) start (slide m xs)
  where
    start = BS.count '1' (BS.take m xs)

Update
After reading the question more carefully I noticed that the
C program reads its input from an array.
So here is an equivalent Haskell "pure" function which performs the task.
import qualified Data.Vector as V
import Data.List
import Control.Monad
count :: Int -> V.Vector Int -> [Int]
count m v =
  let c0 = V.sum (V.take m v)
      n = V.length v
      results = scanl' go c0 [0..n-m-1]
        where go r i = r - (v V.! i) + (v V.! (i+m))
  in results
test1 = let v = V.fromList [0,0,1,1,1,1,1,0,0,0,0]
        in print $ count 3 v
Even though count returns a list, it is generated lazily. Moreover, if it is consumed by another list operation, the intermediate list may be optimized away by one of the various fusion techniques.
Original Answer
This is a good exercise, but why does it have to be "purely functional" (and what does that mean anyway)?
You can write the C algorithm in Haskell - it's not as terse, but it will
generate essentially the same code.
import Data.Vector.Unboxed.Mutable as V
count m = do
  v <- V.replicate m '0'
  let toInt ch = if ch == '1' then 1 else 0
  let loop c i = do
        ch <- getChar
        oldch <- V.read v i
        let c' = c + toInt ch - toInt oldch
        V.write v i ch
        let i' = mod (i+1) m
        putStrLn $ show c
        loop c' i'
  loop 0 0
main = count 3
(For simplicity this generates n results.)
If you were to benchmark this, note that you are also including the performance of
getChar, putStrLn and show, so it might be difficult to make a fair
comparison with a C program. However, it has O(n) complexity and O(m) memory
usage (constant in n), which is what I think you're asking for.

The most basic level is re-implementing the cool HOF-based algorithms with hand-written recursive functions to express the loops.
Banged patterns mark arguments as strict, so simple values can be calculated without unnecessary delay (this is implicitly taken care of when using scanl', for example). This also shows that "pointers" are just names:
{-# LANGUAGE BangPatterns #-}
-- assumes xs has only 0s and 1s
counts :: Int -> [Int] -> [Int]
counts m xs = g 0 m xs
  where
    g !c 0 ys = h c ys xs
    g !c _ [] = [] -- m > |xs|
    g !c m (y:ys) = g (c+y) (m-1) ys
    h !c [] _ = [c]
    h !c (y:ys) (x:xs) = c : h (c+y-x) ys xs
Testing,
> counts 2 [1,1,0,0,1,1,0,1]
[2,1,0,1,2,1,1]
> counts 3 [1,1,0,0,1,1,1,1]
[2,1,1,2,3,3]

Related

Memoize multi-dimensional recursive solutions in haskell

I was solving a recursive problem in Haskell. I can get the solution, but I would like to cache the outputs of subproblems, since the problem has the overlapping-subproblems property.
The question is: given a grid of dimension n*m and an integer k, how many ways are there to reach cell (n, m) from (1, 1) with no more than k changes of direction?
Here is the code without memoization:
paths :: Int -> Int -> Int -> Int -> Int -> Int -> Integer
paths i j n m k dir
  | i > n || j > m || k < 0 = 0
  | i == n && j == m = 1
  | dir == 0 = paths (i+1) j n m k 1 + paths i (j+1) n m k 2 -- is in grid (1,1)
  | dir == 1 = paths (i+1) j n m k 1 + paths i (j+1) n m (k-1) 2 -- down was the direction took to reach here
  | dir == 2 = paths (i+1) j n m (k-1) 1 + paths i (j+1) n m k 2 -- right was the direction took to reach here
  | otherwise = -1
Here the dependent variables are i, j, k, dir. In languages like C++/Java a 4-d DP array could have been used (dp[n][m][k][3]); in Haskell I can't find a way to implement that.
"Tying the knot" is a well-known technique for getting the GHC runtime to memoize results for you, if you know ahead of time all the values you will ever need to look up. The idea is to turn your recursive function into a self-referential data structure, and then simply look up the value you actually care about. I chose to use Array for this, but a Map would work as well. In either case, the array or map you use must be lazy/non-strict, because we will be inserting values into it that we aren't ready to compute until the whole array is filled.
import Data.Array (array, bounds, inRange, (!))
paths :: Int -> Int -> Int -> Integer
paths m n k = go (1, 1, k, 0)
  where go (i, j, k, dir)
          | i == m && j == n = 1
          | dir == 1 = get (i+1, j, k, 1) + get (i, j+1, k-1, 2) -- down was the direction took to reach here
          | dir == 2 = get (i+1, j, k-1, 1) + get (i, j+1, k, 2) -- right was the direction took to reach here
          | otherwise = get (i+1, j, k, 1) + get (i, j+1, k, 2) -- is in grid (1,1)
        a = array ((1, 1, 0, 1), (m, n, k, 2))
                  [(c, go c) | c <- (,,,) <$> [1..m] <*> [1..n] <*> [0..k] <*> [1..2]]
        get x | inRange (bounds a) x = a ! x
              | otherwise = 0
I simplified your API a bit:
The m and n parameters don't change with each iteration, so they shouldn't be part of the recursive call
The client shouldn't have to tell you what i, j, and dir start as, so they've been removed from the function signature and implicitly start at 1, 1, and 0 respectively
I also swapped the order of m and n, because it's just weird to take an n parameter first. This caused me quite a bit of headache, because I didn't notice for a while that I also needed to change the base case!
Then, as I said earlier, the idea is to fill up the array with all the recursive calls we'll need to make: that's the array call. Notice the cells in array are initialized with a call to go, which (except for the base case!) involves calling get, which involves looking up an element in the array. In this way, a is self-referential or recursive. But we don't have to decide what order to look things up in, or what order to insert them in: we're sufficiently lazy that GHC evaluates the array elements as needed.
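To see the same self-referential trick in isolation, here is a minimal sketch on a one-dimensional problem. It is not from the answer above: fibMemo and its local names are hypothetical, chosen only to show a lazy array whose cells are defined in terms of other cells of the same array.

```haskell
import Data.Array (Array, listArray, (!))

-- Fibonacci via a lazy, self-referential array (assumes n >= 0).
fibMemo :: Int -> Integer
fibMemo n = table ! n
  where
    -- Each cell holds an unevaluated call to go; nothing is computed yet.
    table :: Array Int Integer
    table = listArray (0, n) (map go [0 .. n])

    -- go reads other cells of the same table, so every index is
    -- computed at most once, in whatever order demand dictates.
    go :: Int -> Integer
    go 0 = 0
    go 1 = 1
    go i = table ! (i - 1) + table ! (i - 2)
```

Here fibMemo 50 evaluates to 12586269025 with a linear number of additions, whereas the plain doubly recursive definition would take exponentially many.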
I've also been a bit cheeky by only making space in the array for dir=1 and dir=2, not dir=0. I get away with this because dir=0 only happens on the first call, and I can call go directly for that case, bypassing the bounds-checking in get. This trick does mean you'll get a runtime error if you pass an m or n less than 1, or a k less than zero. You could add a guard for that to paths itself, if you need to handle that case.
And of course, it does indeed work:
> paths 3 3 2
4
One other thing you could do would be to use a real data type for your direction, instead of an Int:
import Data.Array (Ix, array, bounds, inRange, (!))
import Prelude hiding (Right)
data Direction = Neutral | Down | Right deriving (Eq, Ord, Ix)
paths :: Int -> Int -> Int -> Integer
paths m n k = go (1, 1, k, Neutral)
  where go (i, j, k, dir)
          | i == m && j == n = 1
          | otherwise = case dir of
              Neutral -> get (i+1, j, k, Down) + get (i, j+1, k, Right)
              Down -> get (i+1, j, k, Down) + get (i, j+1, k-1, Right)
              Right -> get (i+1, j, k-1, Down) + get (i, j+1, k, Right)
        a = array ((1, 1, 0, Down), (m, n, k, Right))
                  [(c, go c) | c <- (,,,) <$> [1..m] <*> [1..n] <*> [0..k] <*> [Down, Right]]
        get x | inRange (bounds a) x = a ! x
              | otherwise = 0
(I and J might be better names than Down and Right, I don't know if that's easier or harder to remember). I think this is probably an improvement, since the types have more meaning now, and you don't have this weird otherwise clause that handles things like dir=7 which ought to be illegal. But it is still a bit wonky because it relies on the ordering of the enum values: it would break if we put Neutral in between Down and Right. (I tried removing the Neutral direction entirely and adding more special-casing for the first step, but this gets ugly in its own way)
In Haskell these kinds of things aren't the most trivial ones, indeed. You would really like to have some in-place mutations going on to save up on memory and time, so I don't see any better way than equipping the frightening ST monad.
This could be done over various data structures, arrays, vectors, repa tensors. I chose HashTable from hashtables because it is the simplest to use and is performant enough to make sense in my example.
First of all, introduction:
{-# LANGUAGE Rank2Types #-}
module Solution where
import Control.Monad.ST
import Control.Monad
import Data.HashTable.ST.Basic as HT
Rank2Types are useful when dealing with ST, because of the phantom types. I picked the Basic variant of the hashtable, because authors claim it has the fastest lookups --- and we are going to lookup a lot.
It is advised to use a type alias for the map, so here we go:
type Mem s = HT.HashTable s (Int, Int, Int, Int) Integer
ST-free entrypoint just to create the map and call our monster:
runpaths :: Int -> Int -> Int -> Int -> Int -> Int -> Integer
runpaths i j n m k dir = runST $ do
  mem <- HT.new
  paths mem i j n m k dir
Here is the memoized computation of paths. We look the result up in the map, and if it is not there we compute it, save it, and return it:
mempaths mem i j n m k dir = do
  res <- HT.lookup mem (i, j, k, dir)
  case res of
    Just x -> return x
    Nothing -> do
      x <- paths mem i j n m k dir
      HT.insert mem (i, j, k, dir) x
      return x
And here goes the brain of the algorithm. It is just a monadic action that uses memoized calls in place of plain recursion:
paths mem i j n m k dir
  | i > n || j > m || k < 0 = return 0
  | i == n && j == m = return 1
  | dir == 0 = do
      x1 <- mempaths mem (i+1) j n m k 1
      x2 <- mempaths mem i (j+1) n m k 2 -- is in grid (1,1)
      return $ x1 + x2
  | dir == 1 = do
      x1 <- mempaths mem (i+1) j n m k 1
      x2 <- mempaths mem i (j+1) n m (k-1) 2 -- down was the direction took to reach here
      return $ x1 + x2
  | dir == 2 = do
      x1 <- mempaths mem (i+1) j n m (k-1) 1
      x2 <- mempaths mem i (j+1) n m k 2 -- right was the direction took to reach here
      return $ x1 + x2
  | otherwise = return (-1)

Floyd-Warshall Algorithm in Haskell

I'm studying the Floyd-Warshall algorithm. I have managed to implement it in Haskell, but the way I implemented it is similar to how it is done in imperative languages (that is, using a list of lists to simulate a 2D array). This is really inefficient, given that accessing an element in a list is much slower than in an array.
Is there a smarter way to do this in Haskell? I thought I could do this by concatenating some lists, but I keep failing.
My Code:
floydwarshall :: [[Weight]] -> [[Weight]]
floydwarshall lst = fwAlg 1 $ initMatrix 0 $ list2matrix lst
fwAlg :: Int -> [[Weight]] -> [[Weight]]
fwAlg k m | k < rows m = let n = rows m
                             m' = foldl (\m (i,j) -> updateDist i j k m) m [(i,j) | i <- [0..n-1], j <- [0..n-1]]
                         in fwAlg (k+1) m'
          | otherwise = m
-- a special case where k is 0
initMatrix :: Int -> [[Weight]] -> [[Weight]]
initMatrix n m = if n == rows m then m else initMatrix (n+1) $ updateAtM 0.0 (n,n) m
updateDist :: Int -> Int -> Int -> [[Weight]] -> [[Weight]]
updateDist i j k m =
  let w = min (weight i j m) (weight i k m + weight k j m)
  in updateAtM w (i, j) m
weight :: Vertice -> Vertice -> [[Weight]] -> Weight
weight i j m = let Just w = elemAt (i, j) m in w
The algorithm has a regular access pattern so we can avoid a lot of
indexing and still write it with lists, with (I think) the same
asymptotic performance as the imperative version.
If you do want to use arrays for more speed, you might still want to do
something similar to this with bulk operations on rows and columns
rather than reading and writing individual cells.
import Data.List (transpose)
-- Let's have a type for weights. We could use Maybe but the ordering
-- behaviour is wrong - when there's no weight it should be like
-- +infinity.
data Weight = Weight Int | None deriving (Eq, Ord, Show)
addWeights :: Weight -> Weight -> Weight
addWeights (Weight x) (Weight y) = Weight (x + y)
addWeights _ _ = None
-- the main function just steps the matrix a number of times equal to
-- the node count. Also pass along k at each step.
floydwarshall :: [[Weight]] -> [[Weight]]
floydwarshall m = snd (iterate step (0, m) !! length m)
-- step takes k and the matrix for k, returns k+1 and the matrix for
-- k+1.
step :: (Int, [[Weight]]) -> (Int, [[Weight]])
step (k, m) = (k + 1, zipWith (stepRow ktojs) istok m)
  where
    ktojs = m !! k -- current k to each j
    istok = transpose m !! k -- each i to current k
-- Make shortest paths from one i to all j.
-- We need the shortest paths from the current k to all j
-- and the shortest path from this i to the current k
-- and the shortest paths from this i to all j
stepRow :: [Weight] -> Weight -> [Weight] -> [Weight]
stepRow ktojs itok itojs = zipWith stepOne itojs ktojs
  where
    stepOne itoj ktoj = itoj `min` (itok `addWeights` ktoj)
-- example from wikipedia for testing
test :: [[Weight]]
test = [[Weight 0, None, Weight (-2), None],
        [Weight 4, Weight 0, Weight 3, None],
        [None, None, Weight 0, Weight 2],
        [None, Weight (-1), None, Weight 0]]
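If you do go the array route with bulk steps, a minimal sketch might look like the following. It is not the code from this answer: floydWarshallA, plus, inf and testA are hypothetical names, a large Int stands in for the None constructor, and the matrix is assumed to be square with the same index range in both dimensions.

```haskell
import Data.Array (Array, bounds, elems, listArray, (!))

-- Assumption: a large Int stands in for "no path" (the None case above).
inf :: Int
inf = maxBound `div` 2

-- Addition in which "no path" is absorbing.
plus :: Int -> Int -> Int
plus a b
  | a >= inf || b >= inf = inf
  | otherwise            = a + b

-- One bulk step per intermediate vertex k: rebuild the whole matrix
-- from the previous one instead of updating cells one at a time.
floydWarshallA :: Array (Int, Int) Int -> Array (Int, Int) Int
floydWarshallA d0 = foldl step d0 [lo .. hi]
  where
    ((lo, _), (hi, _)) = bounds d0
    step d k =
      listArray (bounds d)
        [ min (d ! (i, j)) ((d ! (i, k)) `plus` (d ! (k, j)))
        | i <- [lo .. hi], j <- [lo .. hi] ]

-- The same Wikipedia example, with inf for the missing edges.
testA :: Array (Int, Int) Int
testA = listArray ((0, 0), (3, 3))
  [ 0,   inf, -2,  inf
  , 4,   0,   3,   inf
  , inf, inf, 0,   2
  , inf, -1,  inf, 0 ]
```

Here elems (floydWarshallA testA) gives [0,-1,-2,0,4,0,2,4,5,1,0,2,3,-1,1,0], the rows of the final distance matrix for that graph. Each step still allocates a fresh n*n array; an unboxed or mutable array would avoid that, at the cost of more plumbing.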
I don't know how to achieve peak performance, but I can give you some tips on making your code abstract so that you can work on performance tuning more easily.
First of all, it would be nice if, when you change around your data types, you don't have to rewrite everything. Right now, you've made everything concretely about lists of lists, so let's see if we can abstract that out. First, we have to figure out what your minimal matrix interface is. Glancing at your code, you appear to have initMatrix, list2matrix, rows, elemAt, and updateAtM. These are the functions that query or modify your matrix, and these are what you would need to implement to make a new version of this code for a different Matrix type.
One way to organize this interface is to make a class out of it. For instance:
class Matrix m where
  list2matrix :: [[a]] -> m a
  matrix2List :: m a -> [[a]]
  rows :: m a -> Int
  elemAt :: Int -> Int -> m a -> a
  updateAtM :: a -> (Int, Int) -> m a -> m a
  setDiag :: a -> m a -> m a
(I went ahead and added a matrix2List function for extracting your result and renamed/modified initMatrix into setDiag, which felt a little more general.)
We can then update your code to use this new class:
floydwarshall :: Matrix m => [[Weight]] -> m Weight
floydwarshall lst = fwAlg 1 $ initMatrix $ list2matrix lst
fwAlg :: Matrix m => Int -> m Weight -> m Weight
fwAlg k m | k < rows m = let n = rows m
                             m' = foldl (\m (i,j) -> updateDist i j k m) m [(i,j) | i <- [0..n-1], j <- [0..n-1]]
                         in fwAlg (k+1) m'
          | otherwise = m
initMatrix :: Matrix m => m Weight -> m Weight
initMatrix = setDiag 0
updateDist :: Matrix m => Int -> Int -> Int -> m Weight -> m Weight
updateDist i j k m =
  let w = min (elemAt i j m) (elemAt i k m + elemAt k j m)
  in updateAtM w (i, j) m
dist :: Matrix m => Int -> Int -> Int -> m Weight -> Weight
dist i j 0 m = elemAt i j m
dist i j k m = min (dist i j (k-1) m) (dist i k (k-1) m + dist k j (k-1) m)
Now all we need to do is start defining some Matrix types and see how performance is!
Let's start with lists, since you've already done this work. We'll have to use a newtype wrapper to make GHC happy, but ignoring the wrapping and unwrapping, this is morally the same as the code you wrote:
newtype ListMatrix a = ListMatrix { getListMatrix :: [[a]] }
instance Matrix ListMatrix where
  list2matrix = ListMatrix
  matrix2List = getListMatrix
  rows = length . getListMatrix
  elemAt i j (ListMatrix m) = m !! i !! j
  updateAtM a (i,j) (ListMatrix m) =
    let (firstRows, row:laterRows) = splitAt i m
        (firstCols, _:laterCols) = splitAt j row
    in ListMatrix $ firstRows <> ((firstCols <> (a:laterCols)):laterRows)
  setDiag x = go 0
    where go n m = if n == rows m then m else go (n+1) $ updateAtM x (n,n) m
(Also, I filled in elemAt and updateAtM.) You should be able to run
matrix2List @ListMatrix $ floydwarshall myList
and get the same result (and performance) that you currently have.
Now, on to the experimentation! All that's necessary is for us to define new instances of Matrix and see what happens. Perhaps we should try pure functions:
data FunMatrix a = FunMatrix { size :: Int, getFunMatrix :: Int -> Int -> a }
instance Matrix FunMatrix where
  list2matrix l = FunMatrix (length l) (\i j -> l !! i !! j)
  matrix2List (FunMatrix s f) = (\i -> f i <$> [0..s-1]) <$> [0..s-1]
  rows = size
  elemAt i j m = getFunMatrix m i j
  updateAtM a (i,j) (FunMatrix s f) = FunMatrix s (\i' j' -> if i==i' && j==j' then a else f i' j')
  setDiag x (FunMatrix s f) = FunMatrix s (\i j -> if i==j then x else f i j)
How does that perform? One problem is that the starting lookup function is still just indexing into the list of lists, which is slow. One fix would be to convert to an array or vector first and then index. Because we've nicely abstracted everything, all that would need to change is the definition of list2matrix right here, and you'll probably get a nice performance boost!
On the topic of performance, there's one other note I can point out. The definition of dist does some serious "dynamic programming". This could work fine if you were writing and reading directly into an array, but in this recursive form, you may end up doing a lot of duplicate work. One fix is to memoize. My go-to memoization package is MemoTrie, which makes it really easy to memoize things. In this case, you could change dist to:
dist :: Matrix m => m Weight -> Int -> Int -> Int -> Weight
dist m = go'
  where
    go' = memo3 go
    go i j 0 = elemAt i j m
    go i j k = min (go' i j (k-1)) (go' i k (k-1) + go' k j (k-1))
That might give you a bit of a boost!
You might consider taking @Chi's advice and use STUArray, but you'll run into a problem: the STUArray interface demands that array lookups are in a monad. It's still possible to use the abstraction method I show off above, but you'll have to change the types of the functions. And, because you change the types in the interface, you'll need to update your algorithm code to be monadic. It can be a bit of a pain, but it might be necessary to get optimal performance.

Which one of the following is better?

So I have two implementations of the function tabulate, which, given a function f :: Int -> a and a number n, should produce the list [f 0, f 1, ..., f (n-1)]. I'm trying to guess which one is better in terms of work and span.
tabulate1 :: (Int -> a) -> Int -> [a]
tabulate1 f n = tab (\x -> f (n - x)) n where
  tab _ 0 = []
  tab g n = let (x,xs) = (g n) ||| (tab g (n-1))
            in (x:xs)
tabulate2 :: (Int -> a) -> Int -> [a]
tabulate2 f n = tab f 0 (n-1) where
  tab f n m
    | n > m = []
    | n == m = [f n]
    | otherwise = let i = (n + m) `div` 2
                      (l, r) = (tab f n i) ||| (tab f (i+1) m)
                  in (l ++ r)
While the first one avoids using (++), which has linear work and span, the second one computes the two sublists in parallel but uses (++).
So... which one is better?
Time and space complexity in Haskell is often non-trivial as it is a lazy language. This means that while a function might be O(n!), its result might never be needed and therefore never evaluated. Or like in this case, if your function returns a list, and only the first 3 elements are needed by other functions, only those are evaluated.
Anyway, your function is just a particular case of map, and as such it could be coded in a much more readable way:
tabulate f n = map f [0..n-1]
In GHC, map takes part in list fusion, so for a sequential version this is probably about as optimised as you can get.
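To make the laziness point concrete, here is a small sketch using the range [0 .. n-1] that the question asks for. The name firstThree is hypothetical and only serves as an example.

```haskell
-- The list [f 0, f 1, ..., f (n-1)], built lazily.
tabulate :: (Int -> a) -> Int -> [a]
tabulate f n = map f [0 .. n - 1]

-- Only three elements are ever demanded, so only three applications
-- of the function are evaluated, despite the huge n.
firstThree :: [Int]
firstThree = take 3 (tabulate (* 2) 1000000000)
```

Evaluating firstThree gives [0,2,4] immediately. Note that this says nothing about parallel span; it only shows that the sequential version does no more work than its consumer demands.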

First appearance in Stern's Diatomic Sequence

You get an integer n and you need to find the index of its first appearance in Stern's Diatomic Sequence.
The sequence is defined like this:
a[0] = 0
a[1] = 1
a[2*i] = a[i]
a[2*i+1] = a[i] + a[i+1]
See MathWorld.
Because n can be up to 400000, it's not a good idea to brute-force it, especially since the time limit is 4000 ms.
The sequence is pretty odd: first occurrence of 8 is 21, but first occurrence of 6 is 33.
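For reference, the definition can be transcribed directly, which is enough to check the two examples above on small inputs. This is only a brute-force sketch with hypothetical names (fusc, firstIndexNaive); it is far too slow for n near 400000.

```haskell
-- Direct transcription of the recurrence; fine for small indices only.
fusc :: Int -> Integer
fusc 0 = 0
fusc 1 = 1
fusc k
  | even k    = fusc i                  -- a[2*i]   = a[i]
  | otherwise = fusc i + fusc (i + 1)   -- a[2*i+1] = a[i] + a[i+1]
  where
    i = k `div` 2

-- Hypothetical brute-force search for the first index holding n.
-- Assumes n >= 0; it does not terminate for negative n.
firstIndexNaive :: Integer -> Int
firstIndexNaive n = head [k | k <- [0 ..], fusc k == n]
```

With this, firstIndexNaive 8 is 21 and firstIndexNaive 6 is 33, matching the values quoted above.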
Any ideas how to solve this?
Maybe this might help: OEIS
We can easily solve for the first occurrence of a number in the range of 400000 in under four seconds:
Prelude Diatomic> firstDiatomic 400000
363490989
(0.03 secs, 26265328 bytes)
Prelude Diatomic> map firstDiatomic [400000 .. 400100]
[363490989,323659475,580472163,362981813,349334091,355685483,346478235,355707595
,291165867,346344083,347155797,316314293,576398643,315265835,313171245,355183267
,315444051,315970205,575509833,311741035,340569429,313223987,565355925,296441165
,361911645,312104147,557145429,317106853,323637939,324425077,610613547,311579309
,316037811,311744107,342436533,348992869,313382235,325406123,355818699,312128723
,347230875,324752171,313178421,312841811,313215645,321754459,576114987,325793195
,313148763,558545581,355294101,359224397,345462093,307583675,355677549,312120731
,341404245,316298389,581506779,345401947,312109779,316315061,315987123,313447771
,361540179,313878107,304788843,325765547,316036275,313731751,355635795,312035947
,346756533,313873883,349358379,357393763,559244877,313317739,325364139,312128107
,580201947,358182323,314944173,357403987,584291115,312158827,347448723,363246413
,315935571,349386085,315929427,312137323,357247725,313207657,320121429,356954923
,557139285,296392013,576042123,311726765,296408397]
(2.45 secs, 3201358192 bytes)
The key to it is the Calkin-Wilf tree.
Starting from the fraction 1/1, it is built by the rule that for a node with the fraction a/b, its left child carries the fraction a/(a+b), and its right child the fraction (a+b)/b.
         1/1
        /   \
       /     \
      /       \
    1/2       2/1
   /   \     /   \
 1/3   3/2 2/3   3/1
etc. The diatomic sequence (starting at index 1) is the sequence of numerators of the fractions in the Calkin-Wilf tree, when that is traversed level by level, each level from left to right.
If we look at the tree of indices
        1
       / \
      /   \
     /     \
    2       3
   / \     / \
  4   5   6   7
 / \
8   9 ...
we can easily verify that the node at index k in the Calkin-Wilf tree carries the fraction a[k]/a[k+1] by induction.
That is obviously true for k = 1 (a[1] = a[2] = 1), and from then on,
for k = 2*j we have the left child of the node with index j, so the fraction is a[j]/(a[j]+a[j+1]) and a[k] = a[j] and a[k+1] = a[j] + a[j+1] are the defining equations of the sequence.
for k = 2*j+1 we have the right child of the node with index j, so the fraction is (a[j]+a[j+1])/a[j+1] and that is a[k]/a[k+1] again by the defining equations.
All positive reduced fractions occur exactly once in the Calkin-Wilf tree (left as an exercise for the reader), hence all positive integers occur in the diatomic sequence.
We can find the node in the Calkin-Wilf tree from the index by following the binary representation of the index from the most significant bit to the least: for a 1-bit we go to the right child and for a 0-bit to the left. (For that, it is nice to augment the Calkin-Wilf tree with a node 0/1 whose right child is the 1/1 node, so that we also have a step for the most significant set bit of the index.)
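The walk just described can be sketched in a few lines. The names bitsOf and cwNode are hypothetical; the function starts at the auxiliary 0/1 node and applies the left/right child rules bit by bit.

```haskell
-- Binary digits of k, most significant first (empty for k = 0).
bitsOf :: Int -> [Int]
bitsOf 0 = []
bitsOf k = bitsOf (k `div` 2) ++ [k `mod` 2]

-- Fraction (numerator, denominator) at index k of the Calkin-Wilf
-- tree, reached from the auxiliary node 0/1.
cwNode :: Int -> (Integer, Integer)
cwNode = foldl stepTo (0, 1) . bitsOf
  where
    stepTo (a, b) 1 = (a + b, b)   -- 1-bit: right child (a+b)/b
    stepTo (a, b) _ = (a, a + b)   -- 0-bit: left child  a/(a+b)
```

For instance, cwNode 5 is (3,2) and cwNode 21 is (8,5), which agrees with a[5]/a[6] and a[21]/a[22].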
Now, that doesn't yet help very much to solve the problem at hand.
But, let us first solve a related problem: For a reduced fraction p/q, determine its index.
Suppose that p > q. Then we know that p/q is a right child, and its parent is (p-q)/q. If also p-q > q, we have again a right child, whose parent is (p - 2*q)/q. Continuing, if
p = a*q + b, 1 <= b < q
then we reach the p/q node from the b/q node by going to the right child a times.
Now we need to find a node whose numerator is smaller than its denominator. That is of course the left child of its parent. The parent of b/q is b/(q-b) then. If
q = c*b + d, 1 <= d < b
we have to go to the left child c times from the node b/d to reach b/q.
And so on.
We can find the way from the root (1/1) to the p/q node using the continued fraction (I consider only simple continued fractions here) expansion of p/q. Let p > q and
p/q = [a_0, a_1, ..., a_r,1]
the continued fraction expansion of p/q ending in 1.
If r is even, then go to the right child a_r times, then to the left a_(r-1) times, then to the right child ... then a_1 times to the left child, and finally a_0 times to the right.
If r is odd, then first go to the left child a_r times, then a_(r-1) times to the right ... then a_1 times to the left child, and finally a_0 times to the right.
For p < q, we must end going to the left, hence start going to the left for even r and start going to the right for odd r.
We have thus found a close connection between the binary representation of the index and the continued fraction expansion of the fraction carried by the node via the path from the root to the node.
Let the run-length-encoding of the index k be
[c_1, c_2, ..., c_j] (all c_i > 0)
i.e. the binary representation of k starts with c_1 ones, followed by c_2 zeros, then c_3 ones etc., and ending with c_j
ones, if k is odd - hence j is also odd;
zeros, if k is even - hence j is also even.
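Then, for odd k, [c_j, c_(j-1), ..., c_2, c_1] is the continued fraction expansion of a[k]/a[k+1] whose length has the same parity as k (every rational has exactly two continued fraction expansions, one with odd length, the other with even length). For even k the fraction is smaller than 1, so its expansion needs a leading partial quotient 0; the list itself is then the expansion of the reciprocal a[k+1]/a[k].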
The RLE gives the path from the 0/1 node above 1/1 to a[k]/a[k+1]. The length of the path is
the number of bits necessary to represent k, and
the sum of the partial quotients in the continued fraction expansion.
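A small sketch can check this correspondence for odd indices, which is the case needed below. The names bitsOf, rle, evalCF and fractionAt are hypothetical helpers, not part of the module given later.

```haskell
import Data.List (group)
import Data.Ratio ((%))

-- Binary digits of k, most significant first (k > 0).
bitsOf :: Integer -> [Int]
bitsOf 0 = []
bitsOf k = bitsOf (k `div` 2) ++ [fromIntegral (k `mod` 2)]

-- Run lengths of the binary representation: [c_1, ..., c_j].
rle :: Integer -> [Integer]
rle = map (fromIntegral . length) . group . bitsOf

-- Value of a simple continued fraction [a_0, a_1, ..., a_r].
evalCF :: [Integer] -> Rational
evalCF []         = error "evalCF: empty expansion"
evalCF [a]        = fromIntegral a
evalCF (a : rest) = fromIntegral a + 1 / evalCF rest

-- For odd k, the reversed run-length encoding read as a continued
-- fraction should equal a[k]/a[k+1].
fractionAt :: Integer -> Rational
fractionAt = evalCF . reverse . rle
```

For example, 21 is 10101 in binary, so rle 21 is [1,1,1,1,1] and fractionAt 21 is 8 % 5; likewise fractionAt 13 is 5 % 3.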
Now, to find the index of the first occurrence of n > 0 in the diatomic sequence, we first observe that the smallest index must necessarily be odd, since a[k] = a[k/2] for even k. Let the smallest index be k = 2*j+1. Then
the length of the RLE of k is odd,
the fraction at the node with index k is a[2*j+1]/a[2*j+2] = (a[j] + a[j+1])/a[j+1], hence it is a right child.
So the smallest index k with a[k] = n corresponds to the left-most ending of all the shortest paths to a node with numerator n.
The shortest paths correspond to the continued fraction expansions of n/m, where 0 < m <= n is coprime to n [the fraction must be reduced] with the smallest sum of the partial quotients.
What kind of length do we need to expect? Given a continued fraction p/q = [a_0, a_1, ..., a_r] with a_0 > 0 and sum
s = a_0 + ... + a_r
the numerator p is bounded by F(s+1) and the denominator q by F(s), where F(j) is the j-th Fibonacci number. The bounds are sharp, for a_0 = a_1 = ... = a_r = 1 the fraction is F(s+1)/F(s).
So if F(t) < n <= F(t+1), the sum of the partial quotients of the continued fraction expansion (either of the two) is >= t. Often there is an m such that the sum of the partial quotients of the continued fraction expansion of n/m is exactly t, but not always:
F(5) = 5 < 6 <= F(6) = 8
and the continued fraction expansions of the two reduced fractions 6/m with 0 < m <= 6 are
6/1 = [6] (alternatively [5,1])
6/5 = [1,4,1] (alternatively [1,5])
with sum of the partial quotients 6. However, the smallest possible sum of partial quotients is never much larger (the largest I'm aware of is t+2).
The continued fraction expansions of n/m and n/(n-m) are closely related. Let's assume that m < n/2, and let
n/m = [a_0, a_1, ..., a_r]
Then a_0 >= 2,
(n-m)/m = [a_0 - 1, a_1, ..., a_r]
and since
n/(n-m) = 1 + m/(n-m) = 1 + 1/((n-m)/m)
the continued fraction expansion of n/(n-m) is
n/(n-m) = [1, a_0 - 1, a_1, ..., a_r]
In particular, the sum of the partial quotients is the same for both.
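This relation is easy to check numerically. The sketch below uses two hypothetical helpers: contFrac computes the usual expansion by the Euclidean algorithm, and complementCF applies the rewrite from n/m to n/(n-m) for m < n/2.

```haskell
-- Simple continued fraction of n/d (n, d > 0); last quotient is > 1
-- unless the fraction is the integer 1.
contFrac :: Int -> Int -> [Int]
contFrac n d = case n `quotRem` d of
  (a, 0) -> [a]
  (a, b) -> a : contFrac d b

-- [a_0, a_1, ..., a_r]  ->  [1, a_0 - 1, a_1, ..., a_r]
complementCF :: [Int] -> [Int]
complementCF (a0 : rest) = 1 : (a0 - 1) : rest
complementCF []          = []
```

For n = 7 and m = 2, contFrac 7 2 is [3,2] and contFrac 7 5 is [1,2,2], which is exactly complementCF [3,2]; both have partial quotient sum 5.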
Unfortunately, I'm not aware of a way to find the m with the smallest sum of partial quotients without brute force, so the algorithm is (I assume n > 2):
1. For 0 < m < n/2 coprime to n, find the continued fraction expansion of n/m, collecting the ones with the smallest sum of the partial quotients (the usual algorithm produces expansions whose last partial quotient is > 1; we assume that).
2. Adjust the found continued fraction expansions [those are not large in number] in the following way:
   - if the CF [a_0, a_1, ..., a_r] has even length, convert it to [a_0, a_1, ..., a_(r-1), a_r - 1, 1]
   - otherwise, use [1, a_0 - 1, a_1, ..., a_(r-1), a_r - 1, 1]
   (that chooses the one between n/m and n/(n-m) leading to the smaller index)
3. Reverse the continued fractions to obtain the run-length-encodings of the corresponding indices.
4. Choose the smallest among them.
In step 1, it is useful to use the smallest sum found so far to short-cut.
Code (Haskell, since that's easiest):
module Diatomic (diatomic, firstDiatomic, fuscs) where
import Data.List
strip :: Int -> Int -> Int
strip p = go
  where
    go n = case n `quotRem` p of
             (q,r) | r == 0 -> go q
                   | otherwise -> n
primeFactors :: Int -> [Int]
primeFactors n
    | n < 1 = error "primeFactors: non-positive argument"
    | n == 1 = []
    | n `rem` 2 == 0 = 2 : go (strip 2 (n `quot` 2)) 3
    | otherwise = go n 3
      where
        go 1 _ = []
        go m p
            | m < p*p = [m]
            | r == 0 = p : go (strip p q) (p+2)
            | otherwise = go m (p+2)
              where
                (q,r) = m `quotRem` p
contFracLim :: Int -> Int -> Int -> Maybe [Int]
contFracLim = go []
  where
    go acc lim n d = case n `quotRem` d of
                       (a,b) | lim < a -> Nothing
                             | b == 0 -> Just (a:acc)
                             | otherwise -> go (a:acc) (lim - a) d b
fixUpCF :: [Int] -> [Int]
fixUpCF [a]
    | a < 3 = [a]
    | otherwise = [1,a-2,1]
fixUpCF xs
    | even (length xs) = case xs of
                           (1:_) -> fixEnd xs
                           (a:bs) -> 1 : (a-1) : bs
    | otherwise = case xs of
                    (1:_) -> xs
                    (a:bs) -> 1 : fixEnd ((a-1):bs)
fixEnd :: [Int] -> [Int]
fixEnd [a,1] = [a+1]
fixEnd [a] = [a-1,1]
fixEnd (a:bs) = a : fixEnd bs
fixEnd _ = error "Shouldn't have called fixEnd with an empty list"
cfCompare :: [Int] -> [Int] -> Ordering
cfCompare (a:bs) (c:ds) = case compare a c of
                            EQ -> cfCompare ds bs
                            cp -> cp
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
toNumber :: [Int] -> Integer
toNumber = foldl' ((+) . (*2)) 0 . concat . (flip (zipWith replicate) $ cycle [1,0])
fuscs :: Integer -> (Integer, Integer)
fuscs 0 = (0,1)
fuscs 1 = (1,1)
fuscs n = case n `quotRem` 2 of
            (q,r) -> let (a,b) = fuscs q
                     in if r == 0
                          then (a,a+b)
                          else (a+b,b)
diatomic :: Integer -> Integer
diatomic = fst . fuscs
firstDiatomic :: Int -> Integer
firstDiatomic n
    | n < 0 = error "Diatomic sequence has no negative terms"
    | n < 2 = fromIntegral n
    | n == 2 = 3
    | otherwise = toNumber $ bestCF n
bestCF :: Int -> [Int]
bestCF n = check [] estimate start
  where
    pfs = primeFactors n
    (step,ops) = case pfs of
                   (2:xs) -> (2,xs)
                   _ -> (1,pfs)
    start0 = (n-1) `quot` 2
    start | even n && even start0 = start0 - 1
          | otherwise = start0
    eligible k = all ((/= 0) . (k `rem`)) ops
    estimate = length (takeWhile (<= fromIntegral n) fibs) + 2
    check candidates lim k
        | k < 1 || n `quot` k >= lim = if null candidates
                                         then check [] (2*lim) start
                                         else minimumBy cfCompare candidates
        | eligible k = case contFracLim lim n k of
                         Nothing -> check candidates lim (k-step)
                         Just cf -> let s = sum cf
                                    in if s < lim
                                         then check [fixUpCF cf] s (k - step)
                                         else check (fixUpCF cf : candidates) lim (k-step)
        | otherwise = check candidates lim (k-step)
I would recommend you read this letter from Dijkstra which explains an alternative way of computing this function via:
n, a, b := N, 1, 0;
do n ≠ 0 and even(n) → a, n := a + b, n/2
 [] odd(n)           → b, n := b + a, (n-1)/2
od {b = fusc(N)}
This starts with a,b=1,0 and effectively uses successive bits of N (from least to most significant) to increase a and b, the final result being the value of b.
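The loop translates almost line for line into Haskell. The following is a minimal sketch of that translation; the name fuscLoop is chosen here for illustration and does not come from the letter or from the code above.

```haskell
-- Sketch of the guarded-command loop as a tail-recursive function.
-- fuscLoop is a hypothetical name; a and b play the same roles as in the loop.
fuscLoop :: Integer -> Integer
fuscLoop = go 1 0
  where
    go a b n
      | n == 0    = b                                 -- loop finished: b = fusc(N)
      | even n    = go (a + b) b (n `quot` 2)         -- low bit 0: a := a + b
      | otherwise = go a (b + a) ((n - 1) `quot` 2)   -- low bit 1: b := b + a
```

For example, map fuscLoop [0..10] gives [0,1,1,2,1,3,2,3,1,4,3], the start of the diatomic sequence.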
The index of the first appearance of a particular value for b can therefore be computed via finding the smallest n for which this iteration will result in that value of b.
One method for finding this smallest n is to use A* search where the cost is the value of n. The efficiency of the algorithm will be determined by your choice of heuristic.
For the heuristic, I would recommend noting that:
the final value will always be a multiple of the gcd(a,b) (this can be used to rule out some nodes that can never produce the target)
b always increases
there is a maximum (exponential) rate at which b can increase (the rate depends on the current value of a)
EDIT
Here is some example Python code to illustrate the A* approach.
from heapq import *

def gcd(a,b):
    while a:
        a,b=b%a,a
    return b

def heuristic(node,goal):
    """Estimate least n required to make b==goal"""
    n,a,b,k = node
    if b==goal: return n
    # Otherwise needs to have at least one more bit set
    # Improve this heuristic to make the algorithm faster
    return n+(1<<k)

def diatomic(goal):
    """Return index of first appearance of goal in Stern's Diatomic sequence"""
    start=0,1,0,0
    f_score=[] # This is used as a heap
    heappush(f_score, (0,start) )
    while 1:
        s,node = heappop(f_score)
        n,a,b,k = node
        if b==goal:
            return n
        # Next bit is 0 (a grows) or 1 (b grows, n gains bit k)
        for node in [ (n,a+b,b,k+1),(n+(1<<k),a,b+a,k+1) ]:
            n2,a2,b2,k2 = node
            # Prune nodes that overshoot or can never reach the goal
            if b2<=goal and (goal%gcd(a2,b2))==0:
                heappush(f_score,(heuristic(node,goal),node))

print([diatomic(n) for n in range(1,10)])

Fast obtention of all the subsets of size N in Haskell

The following (suboptimal) code generates all the subsets of size N of a given set.
This code works but, as I said, is highly suboptimal. Using an intermediate list to avoid the O(log(n)) cost of Set.insert doesn't seem to help, due to the large cost of later reconverting the list to a Set.
Can anybody suggest how to optimize the code?
import qualified Data.Set as Set

subsetsOfSizeN :: Ord a => Int -> Set.Set a -> Set.Set (Set.Set a)
subsetsOfSizeN n s
  | Set.size s < n || n < 0 = error "subsetOfSizeN: wrong parameters"
  | otherwise = doSubsetsOfSizeN n s
  where
    doSubsetsOfSizeN n s
      | n == 0 = Set.singleton Set.empty
      | Set.size s == n = Set.singleton s
      | otherwise =
          case Set.minView s of
            Nothing -> Set.empty
            Just (firstS, restS) ->
              let partialN n = doSubsetsOfSizeN n restS in
              Set.map (Set.insert firstS) (partialN (n-1)) `Set.union` partialN n
This is inspired by Pascal's triangle.
choose :: [b] -> Int -> [[b]]
_ `choose` 0 = [[]]
[] `choose` _ = []
(x:xs) `choose` k = (x:) `fmap` (xs `choose` (k-1)) ++ xs `choose` k
"This code works but, as I said, is highly suboptimal."
Doesn't seem so terribly bad to me. The number of subsets of size k of a set of size n is n `choose` k which grows rather fast for k ~ n/2. So creating all the subsets must scale badly.
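To make that growth concrete, here is a minimal sketch that computes the count directly and cross-checks it against brute-force enumeration on small inputs. The names binomial and countBySize are introduced here for illustration only; they are not part of any code in this thread.

```haskell
import Data.List (subsequences)

-- Number of k-element subsets of an n-element set (hypothetical helper).
binomial :: Integer -> Integer -> Integer
binomial n k
  | k < 0 || k > n = 0
  | otherwise      = product [n - k + 1 .. n] `div` product [1 .. k]

-- Brute-force cross-check: enumerate every subset and keep those of size k.
-- Only sensible for short lists, since it walks all 2^n subsets.
countBySize :: Int -> [a] -> Int
countBySize k = length . filter ((== k) . length) . subsequences
```

For a 19-element set and k = 9, binomial 19 9 is already 92378, so any algorithm that materialises every subset has at least that much output to build.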
"Using an intermediate list to avoid the O(log(n)) of Set.insert doesn't seem to help due to the large cost of later reconverting the list to a Set."
Hmm, I found using lists to give better performance. Not asymptotically, I think, but by a non-negligible, more-or-less constant factor.
But first, there is an inefficiency in your code that is simple to repair:
Set.map (Set.insert firstS) (partialN (n-1))
Note that Set.map must rebuild a tree from scratch. But we know that firstS is always smaller than any element in any of the sets in partialN (n-1), so we can use Set.mapMonotonic that can reuse the spine of the set.
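As a minimal sketch of that amendment, assuming the same recursion as in the question: the name subsetsMonotonic is made up here, and returning Set.empty for an out-of-range n (instead of calling error) is a simplification of this sketch, not something the question's code does.

```haskell
import qualified Data.Set as Set

-- Hypothetical amended version of the question's function.
-- The only substantive change is Set.mapMonotonic in place of Set.map.
subsetsMonotonic :: Ord a => Int -> Set.Set a -> Set.Set (Set.Set a)
subsetsMonotonic n s
  | n < 0 || Set.size s < n = Set.empty              -- assumption: no error call
  | n == 0                  = Set.singleton Set.empty
  | Set.size s == n         = Set.singleton s
  | otherwise =
      case Set.minView s of
        Nothing        -> Set.empty
        Just (x, rest) ->
          -- x is smaller than every element of every subset of rest, so
          -- inserting it preserves the ordering of the subsets.
          Set.mapMonotonic (Set.insert x) (subsetsMonotonic (n - 1) rest)
            `Set.union` subsetsMonotonic n rest
```

Set.mapMonotonic does not check its precondition, so it is only safe here because the minimum element is the one being inserted; with any other element the resulting set could be malformed.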
That principle is also what makes lists attractive: the subsets are generated in lexicographic order, so instead of Set.fromList we can use the more efficient Set.fromDistinctAscList. Transcribing the algorithm yields
onlyLists :: Ord a => Int -> Set.Set a -> Set.Set (Set.Set a)
onlyLists n s
    | n == 0                    = Set.singleton Set.empty
    | Set.size s < n || n < 0   = error "onlyLists: out of range n"
    | Set.size s == n           = Set.singleton s
    | otherwise                 = Set.fromDistinctAscList . map Set.fromDistinctAscList $
                                    go n (Set.size s) (Set.toList s)
      where
        go 1 _ xs = map return xs
        go k l (x:xs)
            | k == l    = [x:xs]
            | otherwise = map (x:) (go (k-1) (l-1) xs) ++ go k (l-1) xs
which in the few benchmarks I've run is between 1.5 and 2× faster than the amended algorithm using Sets.
And that is in turn, in my criterion benchmarks, nearly twice as fast as dave4420's.
subsets :: Int -> [a] -> [[a]]
subsets 0 _ = [[]]
subsets _ [] = []
subsets k (x:xs) = map (x:) (subsets (k - 1) xs) ++ subsets k xs
First, use a better algorithm.
Look at your final line:
Set.map (Set.insert firstS) (partialN (n-1)) `Set.union` partialN n
Evaluating doSubsetsOfSizeN k (Set.fromList $ 1:2:xs) will involve evaluating doSubsetsOfSizeN (k-1) (Set.fromList xs) twice (once when inserting 1, and once when inserting 2). This duplication is wasteful.
Enter a better algorithm.
mine :: Ord a => Int -> Set.Set a -> Set.Set (Set.Set a)
mine n s | Set.size s < n || n < 0 = Set.empty
         | otherwise               = Set.foldr cons nil s !! n
    where
        nil :: Ord a => [Set.Set (Set.Set a)]
        nil = Set.singleton Set.empty : repeat Set.empty
        cons :: Ord a => a -> [Set.Set (Set.Set a)] -> [Set.Set (Set.Set a)]
        cons x sets = zipWith Set.union sets
                               (Set.empty : map (Set.map $ Set.insert x) sets)
mine 9 (Data.Set.fromList [0..18]) `seq` () is faster than subsetsOfSizeN 9 (Data.Set.fromList [0..18]) `seq` () and should have better asymptotic performance.
I haven't tried optimising this any further. There may be a better algorithm still.
(If the cost of insert and fromList are issues, you should consider giving back a list of lists instead of a set of sets.)
I found this; maybe it can help you:
f [] = [[1]]
f l = (:) [u] l'
  where
    u = succ (head (head l))
    l' = (++) l (map (\x -> (:) u x) l)

fix f n = if (n==0) then [] else f (fix f (n-1))
To test it:
$ length $ (fix f 10) => 1023 -- the empty set is not generated; counting it as well gives 1024

Resources