Can I prove (s : Stream a) -> (head s :: tail s = s) in Idris? - lazy-evaluation

The following Idris proof doesn’t typecheck.
hts : (s : Stream a) -> (head s :: tail s = s)
hts (x::xs) = Refl
The error I get is:
Type mismatch between
x :: Delay xs = x :: Delay xs
and
x :: Delay (tail (x :: Delay xs)) = x :: Delay xs
A similar proof for non-empty Vects typechecks just fine:
import Data.Vect
htv : (s : Vect (S k) a) -> (head s :: tail s = s)
htv (x::xs) = Refl
So I’m guessing the issue is in the laziness of Stream.
My working theory is that Idris doesn’t like simplifying whatever is inside of Delay, because it might get into an infinite loop that way. However, I want to coerce Idris to dip its toes in anyway, as the definition of Prelude.Stream.tail guarantees that the LHS would then reduce to x :: Delay xs, completing my proof.
Is my suspicion correct? And can I fix the proof somehow?

Yes, it can be done. I used an auxiliary congruence lemma:
%default total
consCong : {xs, ys : Stream a} -> (x : a) -> xs = ys -> x :: xs = x :: ys
consCong _ Refl = Refl
to prove the main lemma:
hts : (s : Stream a) -> (head s :: tail s = s)
hts (x :: xs) = consCong _ $ Refl

Related

Is it possible to make `foldrRanges` as fast as `foldrRange2D`?

This:
foldrRange :: (Int -> t -> t) -> t -> Int -> Int -> t
foldrRange cons nil a b = foldr cons nil [a..b-1]
Defines a function that folds over a list from a til b. This:
foldrRange :: (Int -> t -> t) -> t -> Int -> Int -> t
foldrRange cons nil a b = go (b-1) nil where
go b !r | b < a = r
| otherwise = go (b-1) (cons b r)
{-# INLINE foldrRange #-}
is a ~50x faster version due to proper strictness usage (we know the last element, so we can roll like foldl').
This:
foldrRange2D cons nil (ax,ay) (bx,by)
= foldr cons nil
$ do
y <- [ay..by-1]
x <- [ax..bx-1]
return (x,y)
Is a 2D version of foldrRange, i.e., it works over 2D rectangles so that foldrRange2d (:) [] (0,0) (2,2) == [(0,0),(1,0),(0,1),(1,1)]. This:
foldrRange2D :: ((Int,Int) -> t -> t) -> t -> (Int,Int) -> (Int,Int) -> t
foldrRange2D cons nil (ax,ay) (bx,by) = go (by-1) nil where
go by !r
| by < ay = r
| otherwise = go (by-1) (foldrRange (\ ax -> cons (ax,by)) r ax bx)
Is, again, an ~50x faster definition due to better strictness usage. Writing foldrRange3D, foldrRange4D, etc., would be cumbersome, so one can generalize it like so:
foldrRangeND :: forall t . ([Int] -> t -> t) -> t -> [Int] -> [Int] -> t
foldrRangeND cons nil as bs = foldr co ni (zip as bs) [] nil where
co (a,b) tail lis = foldrRange (\ h t -> tail (h:lis) . t) id a b
ni lis = cons lis
Unfortunately this definition is at around 120 times slower than foldrRange2D, as one can verify with this test:
main = do
let n = 2000
print $ foldrRange2D (\ (a,b) c -> a+b+c) 0 (0,0) (n,n)
print $ foldrRanges (\ [a,b] c -> a+b+c) 0 [0,0] [n,n]
I could probably use ST to get a faster foldrRanges, but is it possible to do so with recursion alone?
You have an efficient implementation of your algorithm which is inductive on the dimension of the input. Fortunately, you can do that in Haskell!
First, replace lists with type level Nat indexed vectors. This gives us a type to do induction on (it could probably be done with lists ... but this is much safer).
data Nat = Z | S Nat
infixl 5 :<
data Vec (n :: Nat) (a :: *) where
Nil :: Vec Z a
(:<) :: Vec n a -> a -> Vec (S n) a
instance Functor (Vec n) where
fmap _ Nil = Nil
fmap f (xs :< x) = fmap f xs :< f x
Then your desired function is just the same as the 2D case - just generalize the recursive call:
{-# INLINE foldrRangeN #-}
foldrRangeN :: (Vec n Int -> t -> t) -> t -> Vec n Int -> Vec n Int -> t
foldrRangeN f x Nil Nil = f Nil x
foldrRangeN cons nil (ax :< ay) (bx :< by) = go (by-1) nil where
go by !r
| by < ay = r
| otherwise = go (by-1) (foldrRangeN (\ ax -> cons (ax :< by)) r ax bx)
Although when I tested the performance, I was disappointed to see it couldn't keep up with the 2D version. The trick seems to be more inlining. By putting the function in a class, you can get it to inline at each 'dimension' (there must be a better way to do this...)
class FoldrRange n where
foldrRangeN' :: (Vec n Int -> t -> t) -> t -> Vec n Int -> Vec n Int -> t
instance FoldrRange Z where
{-# INLINE foldrRangeN' #-}
foldrRangeN' f x Nil Nil = f Nil x
instance FoldrRange n => FoldrRange (S n) where
{-# INLINE foldrRangeN' #-}
foldrRangeN' cons nil (ax :< ay) (bx :< by) = go (by-1) nil where
go by !r
| by < ay = r
| otherwise = go (by-1) (foldrRangeN' (\ ax -> cons (ax :< by)) r ax bx)
Tested as follows:
main = do
i:n':_ <- getArgs
let n = read n' :: Int
rs = [ foldrRange2D (\ (a,b) c -> a+b+c) 0 (0,0) (n,n)
, foldrRangeND (\ [a,b] c -> a+b+c) 0 [0,0] [n,n]
, foldrRangeN (\ (Nil :< a :< b) c -> a+b+c) 0 (Nil :< 0 :< 0) (Nil :< n :< n)
, foldrRangeN' (\ (Nil :< a :< b) c -> a+b+c) 0 (Nil :< 0 :< 0) (Nil :< n :< n)
]
print $ rs !! read i
and the results on my system
./test 0 4000 +RTS -s : 0.02s
./test 1 4000 +RTS -s : 7.63s
./test 2 4000 +RTS -s : 0.59s
./test 3 4000 +RTS -s : 0.03s

Generic algorithm to enumerate sum and product types on Haskell?

Some time ago, I've asked how to map back and forth from godel numbers to terms of a context-free language. While the answer solved the issue specificaly, I'm having trouble in actually programming it generically. So, this question is more generic: given a recursive algebraic data type with terminals, sums and products - such as
data Term = Prod Term Term | SumL Term | SumR Term | AtomA | AtomB
what is an algorithm that will map a term of this type to its godel number, and its inverse?
Edit: for example:
data Foo = A | B Foo | C Foo deriving Show
to :: Foo -> Int
to A = 1
to (B x) = to x * 2
to (C x) = to x * 2 + 1
from :: Int -> Foo
from 1 = A
from n = case mod n 2 of
0 -> B (from (div n 2))
1 -> C (from (div n 2))
Here, to and from do what I want for Foo. I'm just asking for a systematic way to derive those functions for any datatype.
In order to avoid dealing with a particular Goedel numbering, let's define a class that'll abstract the necessary operations (with some imports we'll need later):
{-# LANGUAGE TypeOperators, DefaultSignatures, FlexibleContexts, DeriveGeneric #-}
import Control.Applicative
import GHC.Generics
import Test.QuickCheck
import Test.QuickCheck.Gen
class GodelNum a where
fromInt :: Integer -> a
toInt :: a -> Maybe Integer
encode :: [a] -> a
decode :: a -> [a]
So we can inject natural numbers and encode sequences. Let's further create a canonical instance of this class that'll use throughout the code, which does no real Goedel encoding, just constructs a tree of terms.
data TermNum = Value Integer | Complex [TermNum]
deriving (Show)
instance GodelNum TermNum where
fromInt = Value
toInt (Value x) = Just x
toInt _ = Nothing
encode = Complex
decode (Complex xs) = xs
decode _ = []
For real encoding we'd use another implementation that'd use just one Integer, something like newtype SomeGoedelNumbering = SGN Integer.
Let's further create a class for types that we can encode/decode:
class GNum a where
gto :: (GodelNum g) => a -> g
gfrom :: (GodelNum g) => g -> Maybe a
default gto :: (Generic a, GodelNum g, GGNum (Rep a)) => a -> g
gto = ggto . from
default gfrom :: (Generic a, GodelNum g, GGNum (Rep a)) => g -> Maybe a
gfrom = liftA to . ggfrom
The last four lines define a generic implementation of gto and gfrom using GHC Generics and DefaultSignatures. The class GGNum that they use is a helper class which we'll use to define encoding for the atomic ADT operations - products, sums, etc.:
class GGNum f where
ggto :: (GodelNum g) => f a -> g
ggfrom :: (GodelNum g) => g -> Maybe (f a)
-- no-arg constructors
instance GGNum U1 where
ggto U1 = encode []
ggfrom _ = Just U1
-- products
instance (GGNum a, GGNum b) => GGNum (a :*: b) where
ggto (a :*: b) = encode [ggto a, ggto b]
ggfrom e | [x, y] <- decode e = liftA2 (:*:) (ggfrom x) (ggfrom y)
| otherwise = Nothing
-- sums
instance (GGNum a, GGNum b) => GGNum (a :+: b) where
ggto (L1 x) = encode [fromInt 0, ggto x]
ggto (R1 y) = encode [fromInt 1, ggto y]
ggfrom e | [n, x] <- decode e = case toInt n of
Just 0 -> L1 <$> ggfrom x
Just 1 -> R1 <$> ggfrom x
_ -> Nothing
-- metadata
instance (GGNum a) => GGNum (M1 i c a) where
ggto (M1 x) = ggto x
ggfrom e = M1 <$> ggfrom e
-- constants and recursion of kind *
instance (GNum a) => GGNum (K1 i a) where
ggto (K1 x) = gto x
ggfrom e = K1 <$> gfrom e
Having that, we can then define a data type like yours and just declare its GNum instance, everything else will be automatically derived.
data Term = Prod Term Term | SumL Term | SumR Term | AtomA | AtomB
deriving (Eq, Show, Generic)
instance GNum Term where
And just to be sure we've done everything right, let's use QuickCheck to verify that our gfrom is an inverse of gto:
instance Arbitrary Term where
arbitrary = oneof [ return AtomA
, return AtomB
, SumL <$> arbitrary
, SumR <$> arbitrary
, Prod <$> arbitrary <*> arbitrary
]
prop_enc_dec :: Term -> Property
prop_enc_dec x = Just x === gfrom (gto x :: TermNum)
main :: IO ()
main = quickCheck prop_enc_dec
Notes:
The same thing could be accomplished using Scrap Your Boilerplate, perhaps more efficiently, as it allows somewhat higher-level access - enumerating constructors and records, etc.
See also paper Efficient Bijective G¨odel Numberings for Term Algebras (I haven't read the paper yet, but seems related).
For fun, I decided to try the approach in the link you posted, and didn't get stuck anywhere. So here's my code, with no commentary (the explanation is the same as the last time). First, code stolen from the other answer:
{-# LANGUAGE TypeSynonymInstances #-}
import Control.Applicative
import Data.Universe.Helpers
type Nat = Integer
class Godel a where
to :: a -> Nat
from :: Nat -> a
instance Godel Nat where to = id; from = id
instance (Godel a, Godel b) => Godel (a, b) where
to (m_, n_) = (m + n) * (m + n + 1) `quot` 2 + m where
m = to m_
n = to n_
from p = (from m, from n) where
isqrt = floor . sqrt . fromIntegral
base = (isqrt (1 + 8 * p) - 1) `quot` 2
triangle = base * (base + 1) `quot` 2
m = p - triangle
n = base - m
And the code specific to your new type:
data Term = Prod Term Term | SumL Term | SumR Term | AtomA | AtomB
deriving (Eq, Ord, Read, Show)
ts = AtomA : AtomB : interleave [uncurry Prod <$> ts +*+ ts, SumL <$> ts, SumR <$> ts]
instance Godel Term where
to AtomA = 0
to AtomB = 1
to (Prod t1 t2) = 2 + 0 + 3 * to (t1, t2)
to (SumL t) = 2 + 1 + 3 * to t
to (SumR t) = 2 + 2 + 3 * to t
from 0 = AtomA
from 1 = AtomB
from n = case quotRem (n-2) 3 of
(q, 0) -> uncurry Prod (from q)
(q, 1) -> SumL (from q)
(q, 2) -> SumR (from q)
The same ghci test as last time:
*Main> take 30 (map from [0..]) == take 30 ts
True

Haskell: comparison of techniques for generating combinations

I was doing a few of the 99 Haskell Problems earlier and I thought that exercise 27 ("write a function to enumerate the possible combinations") was interesting as it's a simple concept and it lends itself to multiple implementations.
I was curious about relative efficiency so I decided to run a couple of different implementations - results are in the table below. (For reference: Emacs bash ansi-term in LXDE (Ubuntu 14.04) running on VirtualBox; Thinkpad X220; 8gb RAM, i5 64bit 2.4ghz.)
TL;DR:
(i) Why are combination-generating techniques #7 and #8 (from the table below; code included at bottom of post) so much faster than the rest?
(ii) Also, what do the figures in the Bytes column actually represent?
(i) It's odd because function #7 works by filtering the powerset (which is waaaay larger than the combinations list); I suspect this is laziness at work, i.e., that this is the function which is most effectively exploiting the fact that we've only asked for the length of the list and not the list itself. (Also, its 'memory usage' is lower than that of the other functions, but, then again, I'm not sure exactly what memory-related stat is being shown.)
Regarding function #8: kudos to Bergi for that ridiculously fast implementation and thanks to user5402 for suggesting the addition. Still trying to wrap my ahead around the speed difference of this one.
(ii) The figures in the Bytes column are reported by GHCi after running the :set +s command; they clearly don't represent max memory usage as I only have ~25gb of RAM + free HD space.)?
Code:
import Data.List
--algorithms to generate combinations
--time required to compute the following: length $ 13 "abcdefghijklmnopqrstuvwxyz"
--(90.14 secs, 33598933424 bytes)
combDC1 :: (Eq a) => Int -> [a] -> [[a]]
combDC1 n xs = filter (/= []) $ combHelper n n xs []
combHelper :: Int -> Int -> [a] -> [a] -> [[a]]
combHelper n _ [] chosen = if length chosen == n
then [chosen]
else [[]]
combHelper n i remaining chosen
| length chosen == n = [chosen]
| n - length chosen > length remaining = [[]]
| otherwise = combHelper n (i-1) (tail remaining) ((head remaining):chosen) ++
combHelper n i (tail remaining) chosen
--(167.63 secs, 62756587760 bytes)
combSoln1 :: Int -> [a] -> [([a],[a])]
combSoln1 0 xs = [([],xs)]
combSoln1 n [] = []
combSoln1 n (x:xs) = ts ++ ds
where
ts = [ (x:ys,zs) | (ys,zs) <- combSoln1 (n-1) xs ]
ds = [ (ys,x:zs) | (ys,zs) <- combSoln1 n xs ]
--(71.40 secs, 30480652480 bytes)
combSoln2 :: Int -> [a] -> [[a]]
combSoln2 0 _ = [ [] ]
combSoln2 n xs = [ y:ys | y:xs' <- tails xs
, ys <- combSoln2 (n-1) xs']
--(83.75 secs, 46168207528 bytes)
combSoln3 :: Int -> [a] -> [[a]]
combSoln3 0 _ = return []
combSoln3 n xs = do
y:xs' <- tails xs
ys <- combSoln3 (n-1) xs'
return (y:ys)
--(92.34 secs, 40541644232 bytes)
combSoln4 :: Int -> [a] -> [[a]]
combSoln4 0 _ = [[]]
combSoln4 n xs = [ xs !! i : x | i <- [0..(length xs)-1]
, x <- combSoln4 (n-1) (drop (i+1) xs) ]
--(90.63 secs, 33058536696 bytes)
combSoln5 :: Int -> [a] -> [[a]]
combSoln5 _ [] = [[]]
combSoln5 0 _ = [[]]
combSoln5 k (x:xs) = x_start ++ others
where x_start = [ x : rest | rest <- combSoln5 (k-1) xs ]
others = if k <= length xs then combSoln5 k xs else []
--(61.74 secs, 33053297832 bytes)
combSoln6 :: Int -> [a] -> [[a]]
combSoln6 0 _ = [[]]
combSoln6 _ [] = []
combSoln6 n (x:xs) = (map (x:) (combSoln6 (n-1) xs)) ++ (combSoln6 n xs)
--(8.41 secs, 10785499208 bytes)
combSoln7 k ns = filter ((k==).length) (subsequences ns)
--(3.15 secs, 2889815872 bytes)
subsequencesOfSize :: Int -> [a] -> [[a]]
subsequencesOfSize n xs = let l = length xs
in if n>l then [] else subsequencesBySize xs !! (l-n)
where
subsequencesBySize [] = [[[]]]
subsequencesBySize (x:xs) = let next = subsequencesBySize xs
in zipWith (++) ([]:next) (map (map (x:)) next ++ [[]])
You should also test the algorithm found in this SO answer:
subsequences of length n from list performance
subsequencesOfSize :: Int -> [a] -> [[a]]
subsequencesOfSize n xs = let l = length xs
in if n>l then [] else subsequencesBySize xs !! (l-n)
where
subsequencesBySize [] = [[[]]]
subsequencesBySize (x:xs) = let next = subsequencesBySize xs
in zipWith (++) ([]:next) (map (map (x:)) next ++ [[]])
On my machine I get the following timing and memory usage from ghci:
ghci> length $ combSoln7 13 "abcdefghijklmnopqrstuvwxyz"
10400600
(13.42 secs, 10783921008 bytes)
ghci> length $ subsequencesOfSize 13 "abcdefghijklmnopqrstuvwxyz"
10400600
(6.52 secs, 2889807480 bytes)
fact :: (Integral a) => a -> a
fact n = product [1..n]
ncombs n k = -- to evaluate number of combinations
let n' = toInteger n
k' = toInteger k
in div (fact n') ((fact k') * (fact (n' - k')))
combinations :: Int -> [a] -> [[a]]
combinations 0 xs = [[]]
combinations 1 xs = [[x] | x <- xs]
combinations n xs =
let ps = reverse [0..n - 1]
inc (p:[])
| pn < length xs = pn:[]
| otherwise = p:[]
where pn = p + 1
inc (p:ps)
| pn < length xs = pn:ps
| (head psn) < length xs = inc ((head psn):psn)
| otherwise = (p:ps)
where pn = p + 1
psn = inc ps
amount = ncombs (length xs) n
pointers = take (fromInteger amount) (iterate inc ps)
c' xs ps = map (xs!!) (reverse ps)
in map (c' xs) pointers
I am learning Haskell and found a comparably fast implementation. I had a hard time with the type system with some functions requiring Ints and some fractional numbers and some Integers. On my computer the fastest solution presented here takes about 6,1 seconds to run and mine takes 3,5 to 2,9 seconds.

Construct infinite sorted list without adding duplicates

I am relatively new to Haskell, but I am trying to learn both by reading and trying to solve problems on Project Euler. I am currently trying to implement a function that takes an infinite list of integers and returns the ordered list of pairwise sums of elements in said list. I am really looking for solutions to the specific issue I am facing, rather than advice on different strategies or approaches, but those are welcome as well, as being a coder doesn't mean knowing how to implement a strategy, but also choosing the best strategy available.
My approach relies on traversing an infinite list of infinite generators and retrieving elements in order, with several mathematical properties that are useful in implementing my solution.
If I were trying to obtain the sequence of pairwise sums of the natural numbers, for example, this would be my code:
myList :: [Integer]
myList = [1..]
myGens :: [[Integer]]
myGens = gens myList
where
gens = \xs -> map (\x -> [x+y|y<-(dropWhile (<x) xs)]) xs
Regardless of the number set used, provided that it is sorted, the following conditions hold:
∀ i ≥ 0, head (gens xs !! i) == 2*(myList !! i)
∀ i,j,k ≥ 0, l > 0, (((gens xs) !! i) !! j) < (((gens xs) !! i+k) !! j+l)
Special cases for the second condition are:
∀ i,j ≥ 0, (((gens xs) !! i) !! j) < (((gens xs) !! i+1) !! j)
∀ i,j ≥ 0, k > 0, (((gens xs) !! i) !! j) < (((gens xs) !! i+k) !! j)
Here is the particular code I am trying to modify:
stride :: [Integer] -> [Int] -> [[Integer]] -> [Integer]
stride xs cs xss = x : stride xs counts streams
where
(x,i) = step xs cs xss
counts = inc i cs
streams = chop i xss
step :: [Integer] -> [Int] -> [[Integer]] -> (Integer,Int)
step xs cs xss = pace xs (defer cs xss)
pace :: [Integer] -> [(Integer,Int)] -> (Integer,Int)
pace hs xs#((x,i):xt) = minim (x,i) hs xt
where
minim :: (Integer,Int) -> [Integer] -> [(Integer,Int)] -> (Integer,Int)
minim m _ [] = m
minim m#(g,i) hs (y#(h,n):ynt) | g > h && 2*(hs !! n) > h = y
| g > h = minim y hs ynt
| 2*(hs !! n) > g = m
| otherwise = minim m hs ynt
defer :: [Int] -> [[a]] -> [(a,Int)]
defer cs xss = (infer (zip cs (zip (map head xss) [0..])))
infer :: [(Int,(a,Int))] -> [(a,Int)]
infer [] = []
infer ((c,xi):xis) | c == 0 = xi:[]
| otherwise = xi:(infer (dropWhile (\(p,(q,r)) -> p>=c) xis))
The set in question I am using has the property that multiple distinct pairs produce an identical sum. I want an efficient method of handling all duplicate elements at once, in order to avoid an increased cost of computing all the pairwise sums up to N, as it requires M more tests if M is the number of duplicates.
Does anyone have any suggestions?
EDIT:
I made some changes to the code, independently of what was suggested, and would appreciate feedback on the relative efficiencies of my original code, my revised code, and the proposals so far.
stride :: [Integer] -> [Int] -> [[Integer]] -> [Integer]
stride xs cs xss = x : stride xs counts streams
where
(x,is) = step xs cs xss
counts = foldr (\i -> inc i) cs is
streams = foldr (\i -> chop i) xss is
step :: [Integer] -> [Int] -> [[Integer]] -> (Integer,[Int])
step xs cs xss = pace xs (defer cs xss)
pace :: [Integer] -> [(Integer,Int)] -> (Integer,[Int])
pace hs xs#((x,i):xt) = minim (x,(i:[])) hs xt
where
minim :: (Integer,[Int]) -> [Integer] -> [(Integer,Int)] -> (Integer,[Int])
minim m _ [] = m
minim m#(g,is#(i:_)) hs (y#(h,n):ynt) | g > h && 2*(hs !! n) > h = (h,[n])
| g > h = minim (h,[n]) hs ynt
| g == h && 2*(hs !! n) > h = (g,n:is)
| g == h = minim (g,n:is) hs ynt
| g < h && 2*(hs !! n) > g = m
| g < h = minim m hs ynt
Also, I left out the code for inc and chop:
alter :: (a->a) -> Int -> [a] -> [a]
alter = \f -> \n -> \xs -> (take (n) xs) ++ [f (xs !! n)] ++ (drop (n+1) xs)
inc :: Int -> [Int] -> [Int]
inc = alter (1+)
chop :: Int -> [[a]] -> [[a]]
chop = alter (tail)
I'm going to present a solution that uses an infinite pairing heap. We'll have logarithmic overhead per element constructed, but no one knows how to do better (in a model with comparison-based methods and real numbers).
The first bit of code is just the standard pairing heap.
module Queue where
import Data.Maybe (fromMaybe)
data Queue k = E
| T k [Queue k]
deriving Show
fromOrderedList :: (Ord k) => [k] -> Queue k
fromOrderedList [] = E
fromOrderedList [k] = T k []
fromOrderedList (k1 : ks'#(k2 : _ks''))
| k1 <= k2 = T k1 [fromOrderedList ks']
mergePairs :: (Ord k) => [Queue k] -> Queue k
mergePairs [] = E
mergePairs [q] = q
mergePairs (q1 : q2 : qs'') = merge (merge q1 q2) (mergePairs qs'')
merge :: (Ord k) => Queue k -> Queue k -> Queue k
merge (E) q2 = q2
merge q1 (E) = q1
merge q1#(T k1 q1's) q2#(T k2 q2's)
= if k1 <= k2 then T k1 (q2 : q1's) else T k2 (q1 : q2's)
deleteMin :: (Ord k) => Queue k -> Maybe (k, Queue k)
deleteMin (E) = Nothing
deleteMin (T k q's) = Just (k, mergePairs q's)
toOrderedList :: (Ord k) => Queue k -> [k]
toOrderedList q
= fromMaybe [] $
do (k, q') <- deleteMin q
return (k : toOrderedList q')
Note that fromOrderedList accepts infinite lists. I think that this can be justified theoretically by pretending as though the infinite list of descendants effectively are merged "just in time". This feels like the kind of thing that should be in the literature on purely functional data structures already, but I'm going to be lazy and not look right now.
The function mergeOrderedByMin takes this one step further and merges a potentially infinite list of queues, where the min element in each queue is nondecreasing. I don't think that we can reuse merge, since merge appears to be insufficiently lazy.
mergeOrderedByMin :: (Ord k) => [Queue k] -> Queue k
mergeOrderedByMin [] = E
mergeOrderedByMin (E : qs') = mergeOrderedByMin qs'
mergeOrderedByMin (T k q's : qs')
= T k (mergeOrderedByMin qs' : q's)
The next function removes duplicates from a sorted list. It's in the library that m09 suggested, but for the sake of completeness, I'll define it here.
nubOrderedList :: (Ord k) => [k] -> [k]
nubOrderedList [] = []
nubOrderedList [k] = [k]
nubOrderedList (k1 : ks'#(k2 : _ks''))
| k1 < k2 = k1 : nubOrderedList ks'
| k1 == k2 = nubOrderedList ks'
Finally, we put it all together. I'll use the squares as an example.
squares :: [Integer]
squares = map (^ 2) [0 ..]
sumsOfTwoSquares :: [Integer]
sumsOfTwoSquares
= nubOrderedList $ toOrderedList $
mergeOrderedByMin
[fromOrderedList (map (s +) squares) | s <- squares]
If you don't want to modify your code that much, you can use the nub function of Data.List.Ordered (installable by cabal install data-ordlist) to filter duplicates out.
It runs in linear time, ie complexity wise your algorithm won't change.
for your example [1..] the result is just [2..]. A "very smart compiler" could deduce this from the general solution with implicit heap, that follows.
gens xs is better expressed as
gens xs = map (\t#(x:_) -> map (x+) t) $ tails xs -- or should it be
-- map (\(x:ys) -> map (x+) ys) $ tails xs -- ?
Its resulting list of lists is easily merged without duplicates by tree-like folding1 (pictured here), with
pairsums xs = foldi (\(x:l) r-> x : union l r) $ gens xs
This assumes the input list is ordered in increasing order. If it's merely in non-decreasing order (with only finite runs of equals in it, of course), you'll need to slap an orderedNub on top of that (as m09 mentions),
pairsums' = orderedNub . pairsums
Just by using foldi where foldr would work, we often get an algorithmic improvement in complexity from a factor of n to log n, a pretty significant speedup. I use it as a general tool all the time.
1The code, adjusted for infinite lists only:
foldi f (x:xs) = f x (foldi f (pairs f xs))
pairs f (x:y:t) = f x y : pairs f t
union (x:xs) (y:ys) = case compare x y of
LT -> x : union xs (y:ys)
EQ -> x : union xs ys
GT -> y : union (x:xs) ys
See also:
mergesort as foldtree (by Heinrich Apfelmus)
infinite tree folding (by Dave Bayer)
Implicit Heap (by apfelmus)
I propose to build the pairs above the diagonal, that way a lot of duplicates are not even generated:
sums xs = zipWith (map . (+)) hs ts where
(hs:ts) = tails xs
Now you have a list of lists, each containing sorted sums. Because they are sorted, it is possible to determine the next element of the sequence in a finite number of steps:
filtermerge :: (Ord a) => [[a]]->[a]
filtermerge ((h:t):ts) = h : filtermerge (insert t ts) where
insert [] ts = ts
insert xs [] = [xs]
insert h ([]:t) = insert h t
insert (h:t) ts#((h1:t1):t2)
| h < h1 = (h:t):ts
| h == h1 = insert (h:t) $ insert t1 t2
| otherwise = insert (h1:t1) $ insert (h:t) t2
filtermerge _ = []

Efficient functional algorithm for computing closure under an operator

I'm interested in efficient functional algorithms (preferably in Haskell, and even more preferably already implemented as part of a library!) for computing the closure of a container under a unary operator.
A basic and inefficient example of what I have in mind, for lists, is:
closure :: Ord a => (a -> a) -> [a] -> [a]
closure f xs = first_dup (iterate (\xs -> nub $ sort $ xs ++ map f xs) xs) where
first_dup (xs:ys:rest) = if xs == ys then xs else first_dup (ys:rest)
A more efficient implementation keeps tracks of the new elements generated at each stage (the "fringe") and doesn't apply the function to elements to which it has already been applied:
closure' :: Ord a => (a -> a) -> [a] -> [a]
closure' f xs = stable (iterate close (xs, [])) where
-- return list when it stabilizes, i.e., when fringe is empty
stable ((fringe,xs):iterates) = if null fringe then xs else stable iterates
-- one iteration of closure on (fringe, rest); key invariants:
-- (1) fringe and rest are disjoint; (2) (map f rest) subset (fringe ++ rest)
close (fringe, xs) = (fringe', xs') where
xs' = sort (fringe ++ xs)
fringe' = filter (`notElem` xs') (map f fringe)
As an example, if xs is a nonempty sublist of [0..19], then closure' (\x->(x+3)`mod`20) xs is [0..19], and the iteration stabilizes in 20 steps for [0], 13 steps for [0,1], and 4 steps for [0,4,8,12,16].
Even more efficiency could be gotten using a tree-based ordered-set implementation.
Has this been done already? What about the related but harder question of closure under binary (or higher-arity) operators?
How about something like this which uses the Hash Array Mapped Trie data structures in unordered-containers. For unordered-containers member and insert are O(min(n,W)) where W is the length of the hash.
module Closed where
import Data.HashSet (HashSet)
import Data.Hashable
import qualified Data.HashSet as Set
data Closed a = Closed { seen :: HashSet a, iter :: a -> a }
insert :: (Hashable a, Eq a) => a -> Closed a -> Closed a
insert a c#(Closed set iter)
| Set.member a set = c
| otherwise = insert (iter a) $ Closed (Set.insert a set) iter
empty :: (a -> a) -> Closed a
empty = Closed Set.empty
close :: (Hashable a, Eq a) => (a -> a) -> [a] -> Closed a
close iter = foldr insert (empty iter)
Here's a variation on the above that generates the solution set more lazily, in a breadth-first manner.
data Closed' a = Unchanging | Closed' (a -> a) (HashSet a) (Closed' a)
close' :: (Hashable a, Eq a) => (a -> a) -> [a] -> Closed' a
close' iter = build Set.empty where
inserter :: (Hashable a, Eq a) => a -> (HashSet a, [a]) -> (HashSet a, [a])
inserter a (set, fresh) | Set.member a set = (set, fresh)
| otherwise = (Set.insert a set, a:fresh)
build curr [] = Unchanging
build curr as =
Closed' iter curr $ step (foldr inserter (curr, []) as)
step (set, added) = build set (map iter added)
-- Only computes enough iterations of the closure to
-- determine whether a particular element has been generated yet
--
-- Returns both a boolean and a new 'Closed'' value which will
-- will be more precisely defined and thus be faster to query
member :: (Hashable a, Eq a) => a -> Closed' a -> (Bool, Closed' a)
member _ Unchanging = False
member a c#(Closed' _ set next) | Set.member a set = (True, c)
| otherwise = member a next
improve :: Closed' a -> Maybe ([a], Closed' a)
improve Unchanging = Nothing
improve (Closed' _ set next) = Just (Set.toList set, next)
seen' :: Closed' a -> HashSet a
seen' Unchanging = Set.empty
seen' (Closed' _ set Unchanging) = set
seen' (Closed' _ set next) = seen' next
And to check
>>> member 6 $ close (+1) [0]
...
>>> fst . member 6 $ close' (+1) [0]
True

Resources