Efficient functional algorithm for computing closure under an operator

I'm interested in efficient functional algorithms (preferably in Haskell, and even more preferably already implemented as part of a library!) for computing the closure of a container under a unary operator.
A basic and inefficient example of what I have in mind, for lists, is:
import Data.List (nub, sort)

closure :: Ord a => (a -> a) -> [a] -> [a]
closure f xs = first_dup (iterate (\xs -> nub $ sort $ xs ++ map f xs) xs)
  where
    first_dup (xs:ys:rest) = if xs == ys then xs else first_dup (ys:rest)
A more efficient implementation keeps track of the new elements generated at each stage (the "fringe") and doesn't apply the function to elements to which it has already been applied:
closure' :: Ord a => (a -> a) -> [a] -> [a]
closure' f xs = stable (iterate close (xs, []))
  where
    -- return the list when it stabilizes, i.e., when the fringe is empty
    stable ((fringe, xs) : iterates) = if null fringe then xs else stable iterates

    -- one iteration of closure on (fringe, rest); key invariants:
    -- (1) fringe and rest are disjoint; (2) (map f rest) subset (fringe ++ rest)
    close (fringe, xs) = (fringe', xs')
      where
        xs'     = sort (fringe ++ xs)
        fringe' = filter (`notElem` xs') (map f fringe)
As an example, if xs is a nonempty sublist of [0..19], then closure' (\x->(x+3)`mod`20) xs is [0..19], and the iteration stabilizes in 20 steps for [0], 13 steps for [0,1], and 4 steps for [0,4,8,12,16].
Even more efficiency could be gotten using a tree-based ordered-set implementation.
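For concreteness, here is a rough, untested sketch of the kind of thing I mean, using Data.Set (closureSet is just an illustrative name):

import qualified Data.Set as Set

-- same fringe idea as closure', but with an ordered-set representation
closureSet :: Ord a => (a -> a) -> [a] -> Set.Set a
closureSet f = go Set.empty . Set.fromList
  where
    go acc fringe
      | Set.null fringe = acc
      | otherwise       = go acc' fringe'
      where
        acc'    = acc `Set.union` fringe
        fringe' = Set.map f fringe `Set.difference` acc'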
Has this been done already? What about the related but harder question of closure under binary (or higher-arity) operators?

How about something like this, which uses the Hash Array Mapped Trie data structure from unordered-containers? For unordered-containers, member and insert are O(min(n,W)), where W is the bit-length of the hash.
module Closed where

import Data.HashSet (HashSet)
import Data.Hashable
import qualified Data.HashSet as Set

data Closed a = Closed { seen :: HashSet a, iter :: a -> a }

insert :: (Hashable a, Eq a) => a -> Closed a -> Closed a
insert a c@(Closed set iter)
  | Set.member a set = c
  | otherwise        = insert (iter a) $ Closed (Set.insert a set) iter

empty :: (a -> a) -> Closed a
empty = Closed Set.empty

close :: (Hashable a, Eq a) => (a -> a) -> [a] -> Closed a
close iter = foldr insert (empty iter)
Here's a variation on the above that generates the solution set more lazily, in a breadth-first manner.
data Closed' a = Unchanging | Closed' (a -> a) (HashSet a) (Closed' a)

close' :: (Hashable a, Eq a) => (a -> a) -> [a] -> Closed' a
close' iter = build Set.empty
  where
    inserter :: (Hashable a, Eq a) => a -> (HashSet a, [a]) -> (HashSet a, [a])
    inserter a (set, fresh)
      | Set.member a set = (set, fresh)
      | otherwise        = (Set.insert a set, a : fresh)

    build curr [] = Unchanging
    build curr as =
      Closed' iter curr $ step (foldr inserter (curr, []) as)

    step (set, added) = build set (map iter added)

-- Only computes enough iterations of the closure to
-- determine whether a particular element has been generated yet.
--
-- Returns both a Bool and a new 'Closed'' value which will
-- be more precisely defined and thus faster to query.
member :: (Hashable a, Eq a) => a -> Closed' a -> (Bool, Closed' a)
member _ Unchanging = (False, Unchanging)
member a c@(Closed' _ set next)
  | Set.member a set = (True, c)
  | otherwise        = member a next

improve :: Closed' a -> Maybe ([a], Closed' a)
improve Unchanging = Nothing
improve (Closed' _ set next) = Just (Set.toList set, next)

seen' :: Closed' a -> HashSet a
seen' Unchanging = Set.empty
seen' (Closed' _ set Unchanging) = set
seen' (Closed' _ set next) = seen' next
And to check (the strict close loops forever here, since the closure of (+1) starting from [0] is infinite, while the lazy close' can still answer a membership query):
>>> Set.member 6 . seen $ close (+1) [0]
...
>>> fst . member 6 $ close' (+1) [0]
True

Related

Haskell - Sort by second element and then by first element

I have a list of tuples and I would like to sort it by second element (descending) and then by first element (ascending).
My code looks like this:
sortedOcc :: Eq a => [a] -> [(a, Int)]
sortedOcc = sortBy (flip compare `on` snd) . occurences
and this is the first sort, descending by the second element of the tuples in the list returned by my occurences function. How should I add the second sort (ascending) by the first element?
The Data.Ord module provides a Down newtype whose purpose is solely to reverse the ordering.
It also provides a comparing function:
comparing :: Ord a => (b -> a) -> b -> b -> Ordering
which must be fed some transformation function before it can be passed to sortBy.
Like this:
$ ghci
GHCi, version 8.8.4: https://www.haskell.org/ghc/ :? for help
λ>
λ> sortBy (comparing (\(a,v) -> (Down v, a))) [(1,2),(1,3),(5,2),(5,3)]
[(1,3),(5,3),(1,2),(5,2)]
λ>
The values returned by the transformation function are then sorted using their own “natural” order. In our case, this is the lexicographic order on pairs of ordered types.
Overall, the code would require an Ord a constraint:
sortedOcc :: Ord a => [a] -> [(a, Int)]
sortedOcc = sortBy (comparing (\(a,v) -> (Down v, a))) . occurences
I'd probably write this using the Monoid instance on Ordering and on function types.
Sorting on the second value in the tuple looks like flip compare `on` snd, as you've already determined, while sorting on the first value looks like compare `on` fst.
These can be combined Monoidally with <>.
import Data.Function (on)
import Data.List (sortBy)

d :: [(String, Int)]
d = [("b", 1), ("a", 1), ("c", 3), ("d", 4)]
sortedD = sortBy ((flip compare `on` snd) <> (compare `on` fst)) d
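For the example list d above, this should give the pairs ordered by descending count and then ascending key:
>>> sortedD
[("d",4),("c",3),("a",1),("b",1)]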
I know that the rest of the answers are shorter, but I recommend implementing these functions yourself before using the ones Haskell already provides, so that you understand how they work.
-- Order a list of tuples by the first item
orderBy1stTupleItem :: Ord a => (a, b1) -> (a, b2) -> Ordering
orderBy1stTupleItem tup1 tup2
  | item1 > item2 = GT
  | item1 < item2 = LT
  | otherwise = EQ
  where
    item1 = fst tup1
    item2 = fst tup2

-- Order a list of tuples by the second item
orderBy2ndTupleItem :: Ord a1 => (a2, a1) -> (a3, a1) -> Ordering
orderBy2ndTupleItem tup1 tup2
  | item1 > item2 = GT
  | item1 < item2 = LT
  | otherwise = EQ
  where
    item1 = snd tup1
    item2 = snd tup2

-- Wrapper function: order a list of tuples by the first item and then by the second item
orderTuplesBy1stThenBy2ndItem :: (Ord a1, Ord a2) => [(a2, a1)] -> [(a2, a1)]
orderTuplesBy1stThenBy2ndItem listTuples =
  sortBy orderBy2ndTupleItem (sortBy orderBy1stTupleItem listTuples)
Example
let exampleListTuples = [(1,2),(0,8),(6,1),(3,6),(9,1),(7,8),(0,9)]
Then let's get the 1st list, ordered by the first item of each tuple:
> listOrderedByTuple1stItem = sortBy orderBy1stTupleItem exampleListTuples
> listOrderedByTuple1stItem
[(0,8),(0,9),(1,2),(3,6),(6,1),(7,8),(9,1)]
Now we order this result list by the second item of each tuple
> sortBy orderBy2ndTupleItem listOrderedByTuple1stItem
[(6,1),(9,1),(1,2),(3,6),(0,8),(7,8),(0,9)]
Or, you can just run the wrapper function orderTuplesBy1stThenBy2ndItem directly:
> orderTuplesBy1stThenBy2ndItem exampleListTuples
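which should print the same result as the two-step version above:
[(6,1),(9,1),(1,2),(3,6),(0,8),(7,8),(0,9)]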
What is sortBy's signature?
sortBy :: (a -> a -> Ordering) -> [a] -> [a]
This means that its first argument must have the type a -> a -> Ordering:
sortedOcc :: Eq a => [a] -> [(a, Int)]
sortedOcc = sortBy g . occurences
g :: a -> a -> Ordering
g = (flip compare `on` snd)
but that means that
g :: a -> a -> Ordering
g x y = (flip compare `on` snd) x y
      = flip compare (snd x) (snd y)
      = compare (snd y) (snd x)
and so to add your requirement into the mix we simply have to write it down,
      = let test1 = compare (snd y) (snd x)
            test2 = compare (snd y) (snd x)
        in ......
right?
The above intentionally contains errors, which should be straightforward for you to fix.
A word of advice, only use point-free code if it is easy and natural for you to read and write, and modify.

Haskell: Sort an almost-sorted array

I've been learning Haskell in my spare time by working through LYAH, and I would like to improve my Haskell (and functional programming) skills by solving some problems from the imperative world. One of the problems from EPI is to print an "almost sorted array" in sorted order, where it is guaranteed that no element in the array is more than k positions away from its correct position. The input is a stream of elements, and the requirement is to do this in O(n log k) time complexity and O(k) space complexity.
I've attempted to re-implement the imperative solution in Haskell as follows:
import qualified Data.Heap as Heap

-- print the k-sorted list in a sorted fashion
ksorted :: (Ord a, Show a) => [a] -> Int -> IO ()
ksorted [] _ = return ()
ksorted xs k = do
    heap <- ksorted' xs Heap.empty
    mapM_ print (Heap.toAscList heap)  -- print the remaining elements in the heap
  where
    ksorted' :: (Ord a, Show a) => [a] -> Heap.MinHeap a -> IO (Heap.MinHeap a)
    ksorted' [] h = return h
    ksorted' (x:xs) h = do
      let (m, h') = getMinAndBuildHeap h x
      printMin m >> ksorted' xs h'

    printMin :: Show a => Maybe a -> IO ()
    printMin m = case m of
      Nothing   -> return ()
      Just item -> print item

    getMinAndBuildHeap :: (Ord a, Show a) => Heap.MinHeap a -> a -> (Maybe a, Heap.MinHeap a)
    getMinAndBuildHeap h item =
      if Heap.size h > k
        then (Heap.viewHead h, Heap.insert item (Heap.drop 1 h))
        else (Nothing, Heap.insert item h)
I would like to know a better way of solving this in Haskell. Any inputs would be appreciated.
[Edit 1]: The input is a stream, but for now I assumed a list instead (with only a forward iterator/input iterator, in some sense).
[Edit 2]: added Data.Heap import to the code.
Thanks.
I think the main improvement is to separate the production of the sorted list from the printing of the sorted list. So:
import Data.Heap (MinHeap)
import qualified Data.Heap as Heap
ksort :: Ord a => Int -> [a] -> [a]
ksort k xs = go (Heap.fromList b) e
  where
    (b, e) = splitAt (k-1) xs

    go :: Ord a => MinHeap a -> [a] -> [a]
    go heap [] = Heap.toAscList heap
    go heap (x:xs) = x' : go heap' xs
      where
        Just (x', heap') = Heap.view (Heap.insert x heap)

printKSorted :: (Ord a, Show a) => Int -> [a] -> IO ()
printKSorted k xs = mapM_ print (ksort k xs)
If I were feeling extra-special-fancy, I might try to turn go into a foldr or perhaps a mapAccumR, but in this case I think the explicit recursion is relatively readable, too.
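For what it's worth, here is a sketch of how that foldr version might look (untested; ksortF and stepK are just illustrative names, and it reuses the same Data.Heap API as the code above). The heap is threaded through as the argument of the function that the fold builds up, playing the same role as the accumulator in go:
import Data.Heap (MinHeap)
import qualified Data.Heap as Heap

stepK :: Ord a => a -> (MinHeap a -> [a]) -> MinHeap a -> [a]
stepK x rest heap = x' : rest heap'
  where Just (x', heap') = Heap.view (Heap.insert x heap)

ksortF :: Ord a => Int -> [a] -> [a]
ksortF k xs = foldr stepK Heap.toAscList e (Heap.fromList b)
  where (b, e) = splitAt (k - 1) xs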

Sort file system data without using anything else than Prelude

I have defined a data type called FS the following way:
type Name = String

data Ext where { Txt :: Ext; Mp3 :: Ext; Jar :: Ext; Doc :: Ext; Hs :: Ext }
  deriving (Eq, Show)

data FS where { A   :: (Name, Ext) -> FS;
                Dir :: Name -> [FS] -> FS }
  deriving (Eq, Show)
(A stands for file and Dir for directory)
And I'm trying to make a function that, given an FS (a directory), returns the same FS but ordered alphabetically at all levels. My attempt so far is the following:
orderFS :: FS -> FS
orderFS (Dir x y) = Dir x (map orderFS (sort y))
orderFS (A (x,y)) = A (x,y)
The only piece I'm missing is a function called "sort" that takes a [FS] and returns it ordered alphabetically by the Name field.
I read that there are functions like sort from Data.List that could help, but I have to do this without using anything other than the Prelude.
So how should I implement such a function? Thanks in advance.
I do not believe the Prelude contains any sorting functions at all; they live in modules like Data.List. Note that Data.List is in the base package that ships with GHC, so in basically any situation where the Prelude is available, I would imagine that Data.List would be as well; you shouldn't need to download or include any other packages in order to use it.
That said, if you do want to write your own sorting function, you are probably best off taking an existing simple sorting algorithm and using it. There are very neat/simple ways of writing quicksorts and merge sorts in Haskell, although the obvious implementations sometimes don't have the same exact performance characteristics as you would expect. Merge sort, for example, has roughly the same asymptotics, but partitioning the list into two actually takes some time, since the list is singly-linked and you therefore have to walk through half of it in order to split it. But, it can be a very nice short function that looks a lot like the algorithm, and is probably worth doing as a learning exercise.
Also, I noticed that you are defining your Ext and FS types as GADTs, which I'm not really sure about the motivation for; I would suggest using the non-GADT syntax, which is much simpler for this example:
type Name = String
data Ext = Txt | Mp3 | Jar | Doc | Hs deriving (Eq, Show)
data FS = A Name Ext | Dir Name [FS] deriving (Eq, Show)
In order to sort them by name, it would probably be worth writing a simple accessor function that can get the name of an FS:
name :: FS -> Name
name (A n _) = n
name (Dir n _) = n
Another approach would be to factor out the thing (Name) that is common to both cases:
data FS = NamedFS { name :: Name, fs :: UnnamedFS }
data UnnamedFS = A Ext | Dir [FS]
The first entry here uses record syntax, which, among other things, will automatically make a name :: FS -> Name accessor, as well as an fs :: FS -> UnnamedFS.
For the actual sort, it looks a lot like the algorithmic description of merge sort. To start with, let's write a function to divide a list in two:
split :: [a] -> ([a], [a])
split xs = splitAt (length xs `div` 2) xs
We also need a function to merge two sorted lists:
merge :: Ord a => [a] -> [a] -> [a]
merge [] x = x
merge x [] = x
merge (x:xs) (y:ys) | x < y     = x : merge xs (y:ys)
                    | otherwise = y : merge (x:xs) ys
Actually, this is not what we want, because it always uses the < from the Ord instance; instead, we want something that takes in a comparison function. In this case, we assume that if the comparison function returns true when called with x and y, x is conceptually less than y.
merge :: (a -> a -> Bool) -> [a] -> [a] -> [a]
merge _ [] x = x
merge _ x [] = x
merge le (x:xs) (y:ys) | x `le` y  = x : merge le xs (y:ys)
                       | otherwise = y : merge le (x:xs) ys
Now, we can implement mergesort like usual:
mergeSort :: (a -> a -> Bool) -> [a] -> [a]
mergeSort _ [] = []
mergeSort _ [x] = [x]
mergeSort f l = merge f (mergeSort f left) (mergeSort f right)
  where (left, right) = split l
And just call it like:
-- fss is some [FS]
mergeSort (\x y -> name x < name y) fss
If we could use Data.Ord, this could be further reduced (adapting comparing to the Bool-valued predicate this mergeSort expects) to:
mergeSort (\x y -> comparing name x y == LT) fss
The function in Data.List that could help you is sortOn; this, together with a function getName :: FS -> Name, would allow you to sort by comparing the Names, as sketched below.
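For instance, with the FS type from the question, that could look something like this (a sketch, with getName written out as the accessor just mentioned):
import Data.List (sortOn)

getName :: FS -> Name
getName (A (n, _)) = n
getName (Dir n _)  = n

orderFS :: FS -> FS
orderFS (Dir n xs) = Dir n (map orderFS (sortOn getName xs))
orderFS file       = file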
If you cannot use functions from Data.List you will have to implement a sorting algorithm yourself (of which there are many to choose from). One example is "QuickSort" as implemented in the Learn you a Haskell book:
quicksort :: [FS] -> [FS]
quicksort [] = []
quicksort (x:xs) =
  let smallerSorted = quicksort [a | a <- xs, getName a <= getName x]
      biggerSorted  = quicksort [a | a <- xs, getName a >  getName x]
  in  smallerSorted ++ [x] ++ biggerSorted
Note that I changed the comparisons to compare the getNames instead of the whole nodes.
Another thing: you are using GADT syntax to define your datatypes and I cannot see any reason for you to do so. Here is how I would write them instead using ordinary datatype declarations:
data Ext
  = Txt
  | Mp3
  | Jar
  | Doc
  | Hs
  deriving (Eq, Show)

data FS
  = File Name Ext
  | Dir Name [FS]
  deriving (Eq, Show)
In my opinion, the most natural list-sorting function in Haskell is the bottom-up mergesort (Peter Amidon's answer gives a top-down mergesort). Suppose you're starting with the list
[25,1,22,2,10,8,6,20,13,28,5,3,11]
The first step is to turn the list into a list of lists. The simplest way is with map (:[]), which yields
[[25],[1],[22],[2],[10],[8],[6],[20],[13],[28],[5],[3],[11]]
Next, you merge lists pairwise:
[[1,25],[2,22],[8,10],[6,20],[13,28],[3,5],[11]]
Repeat!
[[1,2,22,25],[6,8,10,20],[3,5,13,28],[11]]
And again!
[[1,2,6,8,10,20,22,25],[3,5,11,13,28]]
And once more:
[[1,2,3,5,6,8,10,11,13,20,22,25,28]]
As we now have just one list, we extract it.
To implement this, I think it's best to use a custom type for the lists of lists in this case, to express the appropriate strictness:
infixr 5 :::
data LL a = ![a] ::: LL a | Nil
Implementation proceeds step by step. While breaking the list up into singletons is the simplest way to start off the process, it's a bit wasteful and will slow down access to the first few elements. So let's go by pairs there too:
breakUp :: (a -> a -> Ordering) -> [a] -> LL a
breakUp _cmp [] = Nil
breakUp _cmp xs@[a] = xs ::: Nil
breakUp cmp (x1 : x2 : xs)
  | GT <- cmp x1 x2 = [x2, x1] ::: breakUp cmp xs
  | otherwise       = [x1, x2] ::: breakUp cmp xs
Now we need to write a merge function:
merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a]
merge _cmp [] ys = ys
merge _cmp xs [] = xs
merge cmp xss@(x:xs) yss@(y:ys)
  | GT <- cmp x y = y : merge cmp xss ys
  | otherwise     = x : merge cmp xs yss
Next we need to be able to merge a list of lists pairwise:
mergePairwise :: (a -> a -> Ordering) -> LL a -> LL a
mergePairwise _cmp Nil = Nil
mergePairwise _cmp xs@(_ ::: Nil) = xs
mergePairwise cmp (xs1 ::: xs2 ::: xss)
  = merge cmp xs1 xs2 ::: mergePairwise cmp xss
Then we must merge up until done:
mergeAll :: (a -> a -> Ordering) -> LL a -> [a]
mergeAll _cmp Nil = []
mergeAll _cmp (xs ::: Nil) = xs
mergeAll cmp xss = mergeAll cmp (mergePairwise cmp xss)
Now we're cooking with gas!
sortBy :: (a -> a -> Ordering) -> [a] -> [a]
sortBy cmp = mergeAll cmp . breakUp cmp
sort :: Ord a => [a] -> [a]
sort = sortBy compare
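As a quick sanity check (with the definitions above in scope), the example list from the start of this answer comes out sorted:
>>> sort [25,1,22,2,10,8,6,20,13,28,5,3,11]
[1,2,3,5,6,8,10,11,13,20,22,25,28]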
There's a bit of room to improve the above implementation by recognizing that the lists stored in the lists of lists are never empty. So we might get a small performance improvement using
data NonEmpty a = a :| [a]
data LL a = (:::) {-# UNPACK #-} !(NonEmpty a) (LL a) | Nil

Generalizing a combinatoric function?

I've been solving a few combinatoric problems on Haskell, so I wrote down those 2 functions:
permutations :: (Eq a) => [a] -> [[a]]
permutations [] = [[]]
permutations list = do
  x  <- list
  xs <- permutations (filter (/= x) list)
  return (x : xs)

combinations :: (Eq a, Ord a) => Int -> [a] -> [[a]]
combinations 0 _ = [[]]
combinations n list = do
  x  <- list
  xs <- combinations (n-1) (filter (> x) list)
  return (x : xs)
Which works as follows:
*Main> permutations [1,2,3]
[[1,2,3],[1,3,2],[2,1,3],[2,3,1],[3,1,2],[3,2,1]]
*Main> combinations 2 [1,2,3,4]
[[1,2],[1,3],[1,4],[2,3],[2,4],[3,4]]
Those were uncomfortably similar, so I had to abstract it. I wrote the following abstraction:
combinatoric next [] = [[]]
combinatoric next list = do
  x  <- list
  xs <- combinatoric next (next x list)
  return (x : xs)
Which receives a function that controls how to filter the elements of the list. It can be used to easily define permutations:
permutations :: (Eq a) => [a] -> [[a]]
permutations = combinatoric (\ x ls -> filter (/= x) ls)
But I couldn't define combinations this way, since it carries a state (n). I could extend combinatoric with an additional state argument, but that would become too clunky, and I remember that such an approach was not necessary in a somewhat similar situation. Thus, I wonder: is it possible to define combinations using combinatoric? If not, what is a better abstraction of combinatoric that successfully subsumes both functions?
This isn't a direct answer to your question (sorry), but I don't think your code is correct. The Eq and Ord constraints tipped me off - they shouldn't be necessary - so I wrote a couple of QuickCheck properties.
import Test.QuickCheck

prop_numberOfPermutations xs = length (permutations xs) === factorial (length xs)
  where _ = (xs :: [Int]) -- force xs to be instantiated to [Int]

prop_numberOfCombinations (Positive n) (NonEmpty xs) = n <= length xs ==>
    length (combinations n xs) === choose (length xs) n
  where _ = (xs :: [Int])

factorial :: Int -> Int
factorial x = foldr (*) 1 [1..x]

choose :: Int -> Int -> Int
choose n 0 = 1
choose 0 r = 0
choose n r = choose (n-1) (r-1) * n `div` r
The first property checks that the number of permutations of a list of length n is n!. The second checks that the number of r-combinations of a list of length n is C(n, r). Both of these properties fail when I run them against your definitions:
ghci> quickCheck prop_numberOfPermutations
*** Failed! Falsifiable (after 5 tests and 4 shrinks):
[0,0,0]
3 /= 6
ghci> quickCheck prop_numberOfCombinations
*** Failed! Falsifiable (after 4 tests and 1 shrink):
Positive {getPositive = 2}
NonEmpty {getNonEmpty = [3,3]}
0 /= 1
It looks like your functions fail when the input list contains duplicate elements. Writing an abstraction for an incorrect implementation isn't a good idea - don't try and run before you can walk! You might find it helpful to read the source code for the standard library's definition of permutations, which does not have an Eq constraint.
First let's improve the original functions. You assume that all elements are distinct wrt their equality for permutations, and that they're distinct and have an ordering for combinations. These constraints aren't necessary and as described in the other answer, the code can produce wrong results. Following the robustness principle, let's accept just unconstrained lists. For this we'll need a helper function that produces all possible splits of a list:
split :: [a] -> [([a], a, [a])]
split = loop []
  where
    loop _ [] = []
    loop rs (x:xs) = (rs, x, xs) : loop (x:rs) xs
Note that the implementation causes prefixes returned by this function to be reversed, but it's nothing we require.
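For example:
>>> split [1,2,3]
[([],1,[2,3]),([1],2,[3]),([2,1],3,[])]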
This allows us to write generic permutations and combinations.
permutations :: [a] -> [[a]]
permutations [] = [[]]
permutations list = do
  (pre, x, post) <- split list
  -- reversing 'pre' isn't really necessary, but makes the output
  -- order natural
  xs <- permutations (reverse pre ++ post)
  return (x : xs)

combinations :: Int -> [a] -> [[a]]
combinations 0 _ = [[]]
combinations n list = do
  (_, x, post) <- split list
  xs <- combinations (n-1) post
  return (x : xs)
Now, what they have in common: at each step they pick an element to output, they update the list of elements to pick from, and they stop after some condition is met.
The last point is a bit problematic, as for permutations we end once the list to choose from is empty, while for combinations we have a counter. This is probably the reason why it was difficult to generalize. We can work around this by realizing that for permutations the number of steps is equal to the length of the input list, so we can express the condition in the number of repetitions.
For such problems it's often very convenient to express them using StateT s [] monad, where s is the state we're working with. In our case it'll be the list of elements to choose from. The core of our combinatorial functions can be then expressed with StateT [a] [] a: pick an element from the state and update the state for the next step. Since the stateful computations all happen in the [] monad, we automatically branch all possibilities. With that, we can define a generic function:
import Control.Monad.State
combinatoric :: Int -> StateT [a] [] b -> [a] -> [[b]]
combinatoric n k = evalStateT $ replicateM n k
And then define permutations and combinations by specifying the appropriate number of repetitions and what's the core StateT [a] [] a function:
permutations' :: [a] -> [[a]]
permutations' xs = combinatoric (length xs) f xs
  where
    f = StateT $ map (\(pre, x, post) -> (x, reverse pre ++ post)) . split

combinations' :: Int -> [a] -> [[a]]
combinations' n xs = combinatoric n f xs
  where
    f = StateT $ map (\(_, x, post) -> (x, post)) . split
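As a quick check (loading the definitions above together), these reproduce the outputs from the question:
>>> permutations' [1,2,3]
[[1,2,3],[1,3,2],[2,1,3],[2,3,1],[3,1,2],[3,2,1]]
>>> combinations' 2 [1,2,3,4]
[[1,2],[1,3],[1,4],[2,3],[2,4],[3,4]]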

Construct infinite sorted list without adding duplicates

I am relatively new to Haskell, but I am trying to learn both by reading and by solving problems on Project Euler. I am currently trying to implement a function that takes an infinite list of integers and returns the ordered list of pairwise sums of elements in said list. I am really looking for solutions to the specific issue I am facing, rather than advice on different strategies or approaches, but those are welcome as well, since being a coder isn't just about knowing how to implement a strategy but also about choosing the best strategy available.
My approach relies on traversing an infinite list of infinite generators and retrieving elements in order, with several mathematical properties that are useful in implementing my solution.
If I were trying to obtain the sequence of pairwise sums of the natural numbers, for example, this would be my code:
myList :: [Integer]
myList = [1..]

myGens :: [[Integer]]
myGens = gens myList
  where
    gens = \xs -> map (\x -> [x + y | y <- dropWhile (< x) xs]) xs
Regardless of the number set used, provided that it is sorted, the following conditions hold:
∀ i ≥ 0, head (gens xs !! i) == 2*(myList !! i)
∀ i,j,k ≥ 0, l > 0, (((gens xs) !! i) !! j) < (((gens xs) !! i+k) !! j+l)
Special cases for the second condition are:
∀ i,j ≥ 0, (((gens xs) !! i) !! j) < (((gens xs) !! i+1) !! j)
∀ i,j ≥ 0, k > 0, (((gens xs) !! i) !! j) < (((gens xs) !! i+k) !! j)
Here is the particular code I am trying to modify:
stride :: [Integer] -> [Int] -> [[Integer]] -> [Integer]
stride xs cs xss = x : stride xs counts streams
  where
    (x, i)  = step xs cs xss
    counts  = inc i cs
    streams = chop i xss

step :: [Integer] -> [Int] -> [[Integer]] -> (Integer, Int)
step xs cs xss = pace xs (defer cs xss)

pace :: [Integer] -> [(Integer, Int)] -> (Integer, Int)
pace hs xs@((x,i):xt) = minim (x,i) hs xt
  where
    minim :: (Integer, Int) -> [Integer] -> [(Integer, Int)] -> (Integer, Int)
    minim m _ [] = m
    minim m@(g,i) hs (y@(h,n):ynt)
      | g > h && 2*(hs !! n) > h = y
      | g > h                    = minim y hs ynt
      | 2*(hs !! n) > g          = m
      | otherwise                = minim m hs ynt

defer :: [Int] -> [[a]] -> [(a, Int)]
defer cs xss = infer (zip cs (zip (map head xss) [0..]))

infer :: [(Int, (a, Int))] -> [(a, Int)]
infer [] = []
infer ((c,xi):xis)
  | c == 0    = xi : []
  | otherwise = xi : infer (dropWhile (\(p,(q,r)) -> p >= c) xis)
The set I am actually using has the property that multiple distinct pairs produce an identical sum. I want an efficient method of handling all duplicate elements at once, in order to avoid the increased cost of computing all the pairwise sums up to N, as it requires M more tests if M is the number of duplicates.
Does anyone have any suggestions?
EDIT:
I made some changes to the code, independently of what was suggested, and would appreciate feedback on the relative efficiencies of my original code, my revised code, and the proposals so far.
stride :: [Integer] -> [Int] -> [[Integer]] -> [Integer]
stride xs cs xss = x : stride xs counts streams
  where
    (x, is) = step xs cs xss
    counts  = foldr (\i -> inc i) cs is
    streams = foldr (\i -> chop i) xss is

step :: [Integer] -> [Int] -> [[Integer]] -> (Integer, [Int])
step xs cs xss = pace xs (defer cs xss)

pace :: [Integer] -> [(Integer, Int)] -> (Integer, [Int])
pace hs xs@((x,i):xt) = minim (x, [i]) hs xt
  where
    minim :: (Integer, [Int]) -> [Integer] -> [(Integer, Int)] -> (Integer, [Int])
    minim m _ [] = m
    minim m@(g, is@(i:_)) hs (y@(h,n):ynt)
      | g > h && 2*(hs !! n) > h  = (h, [n])
      | g > h                     = minim (h, [n]) hs ynt
      | g == h && 2*(hs !! n) > h = (g, n:is)
      | g == h                    = minim (g, n:is) hs ynt
      | g < h && 2*(hs !! n) > g  = m
      | g < h                     = minim m hs ynt
Also, I left out the code for inc and chop:
alter :: (a -> a) -> Int -> [a] -> [a]
alter = \f -> \n -> \xs -> take n xs ++ [f (xs !! n)] ++ drop (n+1) xs

inc :: Int -> [Int] -> [Int]
inc = alter (1+)

chop :: Int -> [[a]] -> [[a]]
chop = alter tail
I'm going to present a solution that uses an infinite pairing heap. We'll have logarithmic overhead per element constructed, but no one knows how to do better (in a model with comparison-based methods and real numbers).
The first bit of code is just the standard pairing heap.
module Queue where

import Data.Maybe (fromMaybe)

data Queue k = E
             | T k [Queue k]
  deriving Show

fromOrderedList :: (Ord k) => [k] -> Queue k
fromOrderedList [] = E
fromOrderedList [k] = T k []
fromOrderedList (k1 : ks'@(k2 : _ks''))
  | k1 <= k2 = T k1 [fromOrderedList ks']

mergePairs :: (Ord k) => [Queue k] -> Queue k
mergePairs [] = E
mergePairs [q] = q
mergePairs (q1 : q2 : qs'') = merge (merge q1 q2) (mergePairs qs'')

merge :: (Ord k) => Queue k -> Queue k -> Queue k
merge (E) q2 = q2
merge q1 (E) = q1
merge q1@(T k1 q1's) q2@(T k2 q2's)
  = if k1 <= k2 then T k1 (q2 : q1's) else T k2 (q1 : q2's)

deleteMin :: (Ord k) => Queue k -> Maybe (k, Queue k)
deleteMin (E) = Nothing
deleteMin (T k q's) = Just (k, mergePairs q's)

toOrderedList :: (Ord k) => Queue k -> [k]
toOrderedList q
  = fromMaybe [] $
      do (k, q') <- deleteMin q
         return (k : toOrderedList q')
Note that fromOrderedList accepts infinite lists. I think that this can be justified theoretically by pretending as though the infinite list of descendants effectively are merged "just in time". This feels like the kind of thing that should be in the literature on purely functional data structures already, but I'm going to be lazy and not look right now.
The function mergeOrderedByMin takes this one step further and merges a potentially infinite list of queues, where the min element in each queue is nondecreasing. I don't think that we can reuse merge, since merge appears to be insufficiently lazy.
mergeOrderedByMin :: (Ord k) => [Queue k] -> Queue k
mergeOrderedByMin [] = E
mergeOrderedByMin (E : qs') = mergeOrderedByMin qs'
mergeOrderedByMin (T k q's : qs')
  = T k (mergeOrderedByMin qs' : q's)
The next function removes duplicates from a sorted list. It's in the library that m09 suggested, but for the sake of completeness, I'll define it here.
nubOrderedList :: (Ord k) => [k] -> [k]
nubOrderedList [] = []
nubOrderedList [k] = [k]
nubOrderedList (k1 : ks'@(k2 : _ks''))
  | k1 < k2  = k1 : nubOrderedList ks'
  | k1 == k2 = nubOrderedList ks'
Finally, we put it all together. I'll use the squares as an example.
squares :: [Integer]
squares = map (^ 2) [0 ..]
sumsOfTwoSquares :: [Integer]
sumsOfTwoSquares
  = nubOrderedList $ toOrderedList $
      mergeOrderedByMin
        [fromOrderedList (map (s +) squares) | s <- squares]
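As a rough sanity check (with all of the above in one module), the first few sums of two squares come out as expected:
>>> take 10 sumsOfTwoSquares
[0,1,2,4,5,8,9,10,13,16]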
If you don't want to modify your code that much, you can use the nub function from Data.List.Ordered (in the data-ordlist package, installable via cabal install data-ordlist) to filter duplicates out.
It runs in linear time, i.e., complexity-wise your algorithm won't change.
For your example [1..] the result is just [2..]. A "very smart compiler" could deduce this from the general solution with an implicit heap, which follows.
gens xs is better expressed as
gens xs = map (\t@(x:_) -> map (x+) t) $ tails xs -- or should it be
       -- map (\(x:ys) -> map (x+) ys) $ tails xs -- ?
Its resulting list of lists is easily merged without duplicates by tree-like folding [1], with
pairsums xs = foldi (\(x:l) r -> x : union l r) $ gens xs
This assumes the input list is ordered in increasing order. If it's merely in non-decreasing order (with only finite runs of equals in it, of course), you'll need to slap an orderedNub on top of that (as m09 mentions),
pairsums' = orderedNub . pairsums
Just by using foldi where foldr would work, we often get an algorithmic improvement in complexity from a factor of n to log n, a pretty significant speedup. I use it as a general tool all the time.
[1] The code, adjusted for infinite lists only:
foldi f (x:xs) = f x (foldi f (pairs f xs))
pairs f (x:y:t) = f x y : pairs f t

union (x:xs) (y:ys) = case compare x y of
  LT -> x : union xs (y:ys)
  EQ -> x : union xs ys
  GT -> y : union (x:xs) ys
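As a quick check (with foldi, pairs, union, gens, and pairsums above in scope), the claim at the top of this answer holds for a finite prefix:
>>> take 10 $ pairsums [1..]
[2,3,4,5,6,7,8,9,10,11]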
See also:
mergesort as foldtree (by Heinrich Apfelmus)
infinite tree folding (by Dave Bayer)
Implicit Heap (by apfelmus)
I propose to build the pairs above the diagonal, that way a lot of duplicates are not even generated:
sums xs = zipWith (map . (+)) hs ts
  where (hs:ts) = tails xs
Now you have a list of lists, each containing sorted sums. Because they are sorted, it is possible to determine the next element of the sequence in a finite number of steps:
filtermerge :: (Ord a) => [[a]] -> [a]
filtermerge ((h:t):ts) = h : filtermerge (insert t ts)
  where
    insert [] ts = ts
    insert xs [] = [xs]
    insert h ([]:t) = insert h t
    insert (h:t) ts@((h1:t1):t2)
      | h < h1    = (h:t) : ts
      | h == h1   = insert (h:t) $ insert t1 t2
      | otherwise = insert (h1:t1) $ insert (h:t) t2
filtermerge _ = []
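A quick check of the two pieces together (sums needs tails from Data.List in scope); since only pairs above the diagonal are generated, the smallest sum for [1..] is 1+2 = 3:
>>> take 8 $ filtermerge (sums [1..])
[3,4,5,6,7,8,9,10]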
