Generalizing a combinatoric function? - algorithm

I've been solving a few combinatoric problems in Haskell, so I wrote down these two functions:
permutations :: (Eq a) => [a] -> [[a]]
permutations [] = [[]]
permutations list = do
    x <- list
    xs <- permutations (filter (/= x) list)
    return (x : xs)
combinations :: (Eq a, Ord a) => Int -> [a] -> [[a]]
combinations 0 _ = [[]]
combinations n list = do
    x <- list
    xs <- combinations (n-1) (filter (> x) list)
    return (x : xs)
Which works as follows:
*Main> permutations [1,2,3]
[[1,2,3],[1,3,2],[2,1,3],[2,3,1],[3,1,2],[3,2,1]]
*Main> combinations 2 [1,2,3,4]
[[1,2],[1,3],[1,4],[2,3],[2,4],[3,4]]
Those were uncomfortably similar, so I had to abstract it. I wrote the following abstraction:
combinatoric next [] = [[]]
combinatoric next list = do
    x <- list
    xs <- combinatoric next (next x list)
    return (x : xs)
Which receives a function that controls how to filter the elements of the list. It can be used to easily define permutations:
permutations :: (Eq a) => [a] -> [[a]]
permutations = combinatoric (\ x ls -> filter (/= x) ls)
But I couldn't define combinations this way since it carries a state (n). I could extend combinatoric with an additional state argument, but that would become too clunky, and I remember that such an approach was not necessary in a somewhat similar situation. Thus, I wonder: is it possible to define combinations using combinatoric? If not, what is a better abstraction of combinatoric which successfully subsumes both functions?

This isn't a direct answer to your question (sorry), but I don't think your code is correct. The Eq and Ord constraints tipped me off - they shouldn't be necessary - so I wrote a couple of QuickCheck properties.
prop_numberOfPermutations xs = length (permutations xs) === factorial (length xs)
  where _ = (xs :: [Int])  -- force xs to be instantiated to [Int]
prop_numberOfCombinations (Positive n) (NonEmpty xs) = n <= length xs ==>
    length (combinations n xs) === choose (length xs) n
  where _ = (xs :: [Int])
factorial :: Int -> Int
factorial x = foldr (*) 1 [1..x]
choose :: Int -> Int -> Int
choose n 0 = 1
choose 0 r = 0
choose n r = choose (n-1) (r-1) * n `div` r
The first property checks that the number of permutations of a list of length n is n!. The second checks that the number of r-combinations of a list of length n is C(n, r). Both of these properties fail when I run them against your definitions:
ghci> quickCheck prop_numberOfPermutations
*** Failed! Falsifiable (after 5 tests and 4 shrinks):
[0,0,0]
3 /= 6
ghci> quickCheck prop_numberOfCombinations
*** Failed! Falsifiable (after 4 tests and 1 shrink):
Positive {getPositive = 2}
NonEmpty {getNonEmpty = [3,3]}
0 /= 1
It looks like your functions fail when the input list contains duplicate elements. Writing an abstraction for an incorrect implementation isn't a good idea - don't try and run before you can walk! You might find it helpful to read the source code for the standard library's definition of permutations, which does not have an Eq constraint.
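The failure can also be reproduced by hand, without QuickCheck. The following is a minimal sketch that copies the two definitions from the question under the hypothetical names permutationsQ and combinationsQ (renamed only to avoid a clash with Data.List) and evaluates them on the counterexamples reported above.

```haskell
import qualified Data.List as L

-- The question's definitions, renamed (hypothetical names) for this sketch.
permutationsQ :: Eq a => [a] -> [[a]]
permutationsQ [] = [[]]
permutationsQ list = do
    x <- list
    xs <- permutationsQ (filter (/= x) list)
    return (x : xs)

combinationsQ :: Ord a => Int -> [a] -> [[a]]
combinationsQ 0 _ = [[]]
combinationsQ n list = do
    x <- list
    xs <- combinationsQ (n - 1) (filter (> x) list)
    return (x : xs)

-- filter (/= 0) removes every copy of 0 at once,
-- so each branch stops after a single element.
brokenPermutations :: [[Int]]
brokenPermutations = permutationsQ [0, 0, 0]   -- [[0],[0],[0]]

-- filter (> 3) leaves nothing to pick the second element from.
brokenCombinations :: [[Int]]
brokenCombinations = combinationsQ 2 [3, 3]    -- []

-- The standard library version distinguishes positions, not values.
libraryCount :: Int
libraryCount = length (L.permutations [0, 0, 0 :: Int])   -- 6
```

Filtering by value is what loses the duplicates; selecting by position avoids the problem.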

First let's improve the original functions. You assume that all elements are distinct with respect to equality for permutations, and that they're distinct and have an ordering for combinations. These constraints aren't necessary and, as described in the other answer, the code can produce wrong results. Following the robustness principle, let's accept arbitrary, unconstrained lists. For this we'll need a helper function that produces all possible splits of a list:
split :: [a] -> [([a], a, [a])]
split = loop []
  where
    loop _ [] = []
    loop rs (x:xs) = (rs, x, xs) : loop (x:rs) xs
Note that the implementation causes prefixes returned by this function to be reversed, but it's nothing we require.
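As a small illustration of what the helper returns, here is split applied to a three-element list. The definition is repeated so the snippet stands alone; the name example is just a placeholder for this sketch.

```haskell
split :: [a] -> [([a], a, [a])]
split = loop []
  where
    loop _ [] = []
    loop rs (x:xs) = (rs, x, xs) : loop (x:rs) xs

-- Each triple is (reversed prefix, chosen element, suffix).
example :: [(String, Char, String)]
example = split "abc"
-- [("",'a',"bc"),("a",'b',"c"),("ba",'c',"")]
```

The third triple shows the reversed prefix: "ba" rather than "ab".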
This allows us to write generic permutations and combinations.
permutations :: [a] -> [[a]]
permutations [] = [[]]
permutations list = do
    (pre, x, post) <- split list
    -- reversing 'pre' isn't really necessary, but makes the output
    -- order natural
    xs <- permutations (reverse pre ++ post)
    return (x : xs)
combinations :: Int -> [a] -> [[a]]
combinations 0 _ = [[]]
combinations n list = do
    (_, x, post) <- split list
    xs <- combinations (n-1) post
    return (x : xs)
Now what they have in common:
At each step they pick an element to output,
update the list of elements to pick from and
stop after some condition is met.
The last point is a bit problematic, as for permutations we end once the list to choose from is empty, while for combinations we have a counter. This is probably the reason why it was difficult to generalize. We can work around this by realizing that for permutations the number of steps is equal to the length of the input list, so we can express the condition in the number of repetitions.
For such problems it's often very convenient to express them using StateT s [] monad, where s is the state we're working with. In our case it'll be the list of elements to choose from. The core of our combinatorial functions can be then expressed with StateT [a] [] a: pick an element from the state and update the state for the next step. Since the stateful computations all happen in the [] monad, we automatically branch all possibilities. With that, we can define a generic function:
import Control.Monad.State
combinatoric :: Int -> StateT [a] [] b -> [a] -> [[b]]
combinatoric n k = evalStateT $ replicateM n k
And then define permutations and combinations by specifying the appropriate number of repetitions and what's the core StateT [a] [] a function:
permutations' :: [a] -> [[a]]
permutations' xs = combinatoric (length xs) f xs
  where
    f = StateT $ map (\(pre, x, post) -> (x, reverse pre ++ post)) . split
combinations' :: Int -> [a] -> [[a]]
combinations' n xs = combinatoric n f xs
  where
    f = StateT $ map (\(_, x, post) -> (x, post)) . split
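To make the branching explicit, the same idea can be written with no monad transformer at all. This is only a sketch of what replicateM over StateT [a] [] amounts to, not a replacement for the definitions above; the names combinatoricPlain, permutationsP and combinationsP are made up for this illustration, and only base is assumed.

```haskell
split :: [a] -> [([a], a, [a])]
split = loop []
  where
    loop _ [] = []
    loop rs (x:xs) = (rs, x, xs) : loop (x:rs) xs

-- Run the step n times. Each step yields every possible
-- (picked element, remaining state) pair, and the list
-- comprehension explores all of them.
combinatoricPlain :: Int -> ([s] -> [(b, [s])]) -> [s] -> [[b]]
combinatoricPlain n step st
  | n <= 0 = [[]]
  | otherwise =
      [ x : rest
      | (x, st') <- step st
      , rest <- combinatoricPlain (n - 1) step st'
      ]

-- Permutations: any element may be picked; everything else remains.
permutationsP :: [a] -> [[a]]
permutationsP xs = combinatoricPlain (length xs) pick xs
  where
    pick s = [ (x, reverse pre ++ post) | (pre, x, post) <- split s ]

-- Combinations: after picking x, only the elements after it remain.
combinationsP :: Int -> [a] -> [[a]]
combinationsP n = combinatoricPlain n pick
  where
    pick s = [ (x, post) | (_, x, post) <- split s ]
```

Because elements are chosen by position, duplicates are handled: combinationsP 2 [3,3] yields [[3,3]].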

Related

Is there such a thing as maximumWith?

Specifically I'm searching for a function 'maximumWith',
maximumWith :: (Foldable f, Ord b) => (a -> b) -> f a -> a
Which behaves in the following way:
maximumWith length [[1, 2], [0, 1, 3]] == [0, 1, 3]
maximumWith null [[(+), (*)], []] == []
maximumWith (const True) x == head x
My use case is picking the longest word in a list.
For this I'd like something akin to maximumWith length.
I'd thought such a thing existed, since sortWith etc. exist.
Let me collect all the notes in the comments together...
Let's look at sort. There are 4 functions in the family:
sortBy is the actual implementation.
sort = sortBy compare uses Ord overloading.
sortWith = sortBy . comparing is the analogue of your desired maximumWith. However, this function has an issue. The ranking of an element is given by applying the given mapping function to it, but the ranking is not memoized, so if an element needs to be compared multiple times, the ranking will be recomputed. You can only use it guilt-free if the ranking function is very cheap. Such functions include selectors (e.g. fst) and newtype constructors. YMMV on simple arithmetic and data constructors. Between this inefficiency, the simplicity of the definition, and its location in GHC.Exts, it's easy to deduce that it's not used that often.
sortOn fixes the inefficiency by decorating each element with its image under the ranking function in a pair, sorting by the ranks, and then erasing them.
The first two have analogues in maximum: maximumBy and maximum. sortWith has no analogy; you may as well write out maximumBy (comparing _) every time. There is also no maximumOn, even though such a thing would be more efficient. The easiest way to define a maximumOn is probably just to copy sortOn:
maximumOn :: (Functor f, Foldable f, Ord r) => (a -> r) -> f a -> a
maximumOn rank = snd . maximumBy (comparing fst) . fmap annotate
  where annotate e = let r = rank e in r `seq` (r, e)
There's a bit of interesting code in maximumBy that keeps this from optimizing properly on lists. It also works to use
import Data.Foldable (foldl')
import Data.Maybe (fromJust)
maximumOn :: (Foldable f, Ord r) => (a -> r) -> f a -> a
maximumOn rank = snd . fromJust . foldl' max' Nothing
  where max' Nothing x = let r = rank x in r `seq` Just (r, x)
        max' old@(Just (ro, xo)) xn = let rn = rank xn
                                      in case ro `compare` rn of
                                           LT -> Just (rn, xn)  -- the new element wins
                                           _ -> old
These pragmas may be useful:
{-# SPECIALIZE maximumOn :: Ord r => (a -> r) -> [a] -> a #-}
{-# SPECIALIZE maximumOn :: (a -> Int) -> [a] -> a #-}
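For the original use case (the longest word in a list), a list-only variant might look like the sketch below. It is an illustration rather than a library function: maximumOnSketch and longestWord are hypothetical names, and returning Maybe instead of calling fromJust is a choice made here so that the empty list is handled explicitly.

```haskell
import Data.List (foldl')

-- Rank every element exactly once and carry the best (rank, element) pair.
maximumOnSketch :: Ord r => (a -> r) -> [a] -> Maybe a
maximumOnSketch rank = fmap snd . foldl' step Nothing
  where
    step Nothing x = let r = rank x in r `seq` Just (r, x)
    step old@(Just (ro, _)) x =
      let rn = rank x
      in if ro < rn then Just (rn, x) else old  -- ties keep the earlier element

longestWord :: [String] -> Maybe String
longestWord = maximumOnSketch length
```

With this fold, ties are resolved in favour of the first maximal element, which matches the maximumWith (const True) x == head x behaviour asked for in the question.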
HTNW has explained how to do what you asked, but I figured I should mention that for the specific application you mentioned, there's a way that's more efficient in certain cases (assuming the words are represented by Strings). Suppose you want
longest :: [[a]] -> [a]
If you ask for maximumOn length [replicate (10^9) (), []], then you'll end up calculating the length of a very long list unnecessarily. There are several ways to work around this problem, but here's how I'd do it:
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')
data MS a = MS
  { _longest :: [a]
  , _longest_suffix :: [a]
  , _longest_bound :: !Int }
We will ensure that longest is the first of the longest strings seen thus far, and that longest_bound + length longest_suffix = length longest.
step :: MS a -> [a] -> MS a
step (MS longest longest_suffix longest_bound) xs =
    go longest_bound longest_suffix xs'
  where
    -- the new list is not longer
    go n suffo [] = MS longest suffo n
    -- the new list is longer
    go n [] suffn = MS xs suffn n
    -- don't know yet
    go !n (_ : suffo) (_ : suffn) =
      go (n + 1) suffo suffn
    xs' = drop longest_bound xs
longest :: [[a]] -> [a]
longest = _longest . foldl' step (MS [] [] 0)
Now if the second to longest list has q elements, we'll walk at most q conses into each list. This is the best possible complexity. Of course, it's only significantly better than the maximumOn solution when the longest list is much longer than the second to longest.

Finding a "Count Sequence"

Given a list of integers xs, let:
count :: [Integer] -> Integer -> Integer
count xs n = fromIntegral . length . filter (== n) $ xs
count the number of times the integer n occurs in the list.
Now, given a "list" (some sort of array of integers, can be something besides a List) of length n, write a function
countSequence :: [Integer] -> Integer -> Integer -> [Integer]
countSequence xs n m = [count xs x | x <- [0..m]]
that outputs the "list of counts" (the 0th index contains the number of times 0 occurs in the list, the 1st index contains the number of times 1 occurs, etc.) and has time complexity o(m*n), i.e. strictly better than O(m*n).
The above implementation I've given has complexity O(m*n). In Python (which I'm more familiar with), it's easy to do this in O(m + n) time: iterate through the list and, for each element, increment a counter in some other list, which is initialized to all zeros and has length m+1.
How could I get a better implementation in Haskell? I'd prefer if it wasn't some trivial way to implement the Python solution (such as adding another input to the function to keep the "list of counts" in and then iterating through it).
In O(n+m) (sort of, I think, maybe):
import Data.Ix (inRange)
import Data.List (foldl')
import qualified Data.IntMap.Strict as IM
countSequence m =
    foldl' count IM.empty . filter (inRange (0,m))
  where count a b = IM.insertWith (+) b 1 a
gives
> countSequence 2 [1,2,3,1,2,-1]
fromList [(1,2),(2,2)]
I haven't used n because you also didn't use n and I'm not sure what it's supposed to be. I also moved the list to the last argument to put it in a position to be eta reduced.
I think you should use your Python intuition -- iterate through the one list and increment a counter in another list. Here's an implementation with O(n+m) runtime:
import Data.Array
countSequence xs m = accumArray (+) 0 (0,m) [(x, 1) | x <- xs, inRange (0,m) x]
(This use case is even the motivating example for the existence of accumArray in the documentation!) In ghci:
> countSequence ([1..5] ++ [1,3..5] ++ [1,4..5] ++ [1,5]) 3
array (0,3) [(0,0),(1,4),(2,1),(3,2)]
I guess using Data.IntMap would be as efficient as it gets for this job. One foldr pass is done to establish the IntMap (cm) and a map to construct a new list holding the counts of elements at corresponding positions.
import qualified Data.IntMap.Lazy as IM
countSequence :: [Int] -> [Int]
countSequence xs = map (\x -> IM.findWithDefault 0 x cm) xs
  where
    -- build the count map once, outside the lambda, so every lookup shares it
    cm = foldr (\x m -> IM.insertWith (+) x 1 m) IM.empty xs
*Main> countSequence [1,2,5,1,3,7,8,5,6,4,1,2,3,7,9,3,4,8]
[3,2,2,3,3,2,2,2,1,2,3,2,3,2,1,3,2,2]
*Main> countSequence [4,5,4]
[2,1,2]
*Main> countSequence [9,8,7,6,5]
[1,1,1,1,1]

Slower execution when using an infinite list

I'm beginning to try and get my head round haskell performance, and what makes things fast and slow, and I'm a little confused by this.
I have two implementations of a function that generates a list of primes up to a certain value. The first is straight off the Haskell wiki:
primesTo :: (Ord a, Num a, Enum a) => a -> [a]
primesTo m = eratos [2..m] where
    eratos [] = []
    eratos (p:xs) = p : eratos (xs `minus` [p*p, p*p+p..m])
The second is the same, but using an infinite list internally:
primes2 :: (Ord a, Num a, Enum a) => a -> [a]
primes2 m = takeWhile (<= m) (eratos [2..]) where
    eratos [] = []
    eratos (p:xs) = p : eratos (xs `minus` [p*p, p*p+p..])
In both cases, the minus function is:
minus :: (Ord a) => [a] -> [a] -> [a]
minus (x:xs) (y:ys) = case (compare x y) of
    LT -> x : minus xs (y:ys)
    EQ -> minus xs ys
    GT -> minus (x:xs) ys
minus xs _ = xs
The latter implementation is significantly (~100x) slower than the former, and I don't get why. I would have thought that Haskell's lazy evaluation would make them fairly equivalent under the hood.
This is obviously a reduced test case for the purposes of the question - in real life the optimisation would be no problem (although I don't understand why it is needed), but to me a function that just generates an infinite list of primes is more generically useful than a finite list, but appears slower to work with.
It looks to me like there's a big difference between
(xs `minus` [p*p, p*p+p..m]) -- primesTo
(xs `minus` [p*p, p*p+p..]) -- primes2
The function minus steps through lists pairwise and terminates when one list reaches the end. In the first minus expression above, this occurs in no more than (m-p*p)/p steps when the latter list is exhausted. In the second one, it will always take steps on the order of length xs.
So your infinite lists have disabled at least one meaningful optimization.
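The effect can be seen on a small scale by counting comparisons. The sketch below is an instrumented variant written for illustration (minusCount is a hypothetical name, not part of the question's code): it mirrors the clauses of minus but returns how many comparisons are made before the traversal finishes.

```haskell
-- Same case analysis as minus, but counting comparisons
-- instead of building the resulting list.
minusCount :: Ord a => [a] -> [a] -> Int
minusCount (x:xs) (y:ys) = 1 + case compare x y of
    LT -> minusCount xs (y:ys)
    EQ -> minusCount xs ys
    GT -> minusCount (x:xs) ys
minusCount _ _ = 0   -- one list ran out: the rest of xs is returned as is

-- Bounded multiples: the second list ends at 10, so the walk stops early.
boundedSteps :: Int
boundedSteps = minusCount [11 .. 1000 :: Int] [4, 6 .. 10]

-- Unbounded multiples: the walk continues until the first list is used up.
unboundedSteps :: Int
unboundedSteps = minusCount [11 .. 1000 :: Int] [4, 6 ..]
```

Here boundedSteps is 4, since only the four multiples below 11 are examined, while unboundedSteps grows with the length of the first list.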
One difference is that in the second case you need to generate one extra prime. You need to generate the first prime greater than m before takeWhile knows it's time to stop.
Additionally, the [..m] bounds on both the list to filter and the lists of multiples help reduce the number of calculations. Whenever one of these lists becomes empty, minus immediately returns via its second clause, while in the infinite case minus gets stuck in the first clause. You can explore this a bit better if you also test the cases where only one of the lists is infinite:
-- this is also slow
primes3 :: (Ord a, Num a, Enum a) => a -> [a]
primes3 m = takeWhile (<= m) (eratos [2..m]) where
    eratos [] = []
    eratos (p:xs) = p : eratos (xs `minus` [p*p, p*p+p..])
-- this is fast
primes4 :: (Ord a, Num a, Enum a) => a -> [a]
primes4 m = takeWhile (<= m) (eratos [2..]) where
    eratos [] = []
    eratos (p:xs) = p : eratos (xs `minus` [p*p, p*p+p..m])

most idiomatic way to implement recursive list comprehension in F#

the question in short: What is the most idiomatic way to do "recursive List comprehension" in F#?
more detailed: As I have learned so far (I am new to F#) we have essentially the following tools to "build up" lists: List.map and list comprehension. Imho they both do more or less the same thing, they generate a list by "altering" the elements of a given list (in case of comprehension the given list is of the form [k..n]).
What I want to do is to inductively build up lists (before people ask: for no other reason than curiosity) i.e. is there any built in function with the behavior one would expect from a function called something like "List.maplist" that might take as arguments
a function f : 'a List -> 'a and an n : int,
returning the list
[... ; f (f []) ; f [] ] of length n.
To illustrate what I mean I wrote such a function on my own (as an exercise)
let rec recListComprehension f n =
    if n=0 then []
    else
        let oldList = recListComprehension f (n-1)
        f (oldList) :: oldList
or a bit less readable but in turn tail recursive:
let rec tailListComprehension f n list =
    if n=0 then list
    else tailListComprehension f (n-1) ((f list)::list)
let trecListComprehension f n = tailListComprehension f n []
for example, a list containing the first 200 fibonacci numbers can be generated by
let fiboGen =
    function
    | a::b::tail -> a+b
    | _ -> 1UL
trecListComprehension (fiboGen) 200
To sum up the question: is there a built-in function in F# that behaves more or less like "trecListComprehension", and if not, what is the most idiomatic way to achieve this sort of functionality?
PS: sorry for being a bit verbose..
What is the most idiomatic way to do "recursive List comprehension" in F#?
It's a matter of style. You will encounter higher-order functions more often. For certain situations, e.g. expressing nested computation or achieving laziness, using a sequence expression seems more natural.
To illustrate, your example is written in sequence expression:
let rec recListComprehension f n = seq {
    if n > 0 then
        let oldList = recListComprehension f (n-1)
        yield f oldList
        yield! oldList }
recListComprehension fiboGen 200 |> Seq.toList
You have a very readable function with both laziness and tail-recursiveness which you can't easily achieve by using Seq.unfold.
Similarly, the nested computation of a cartesian product is more readable as a sequence expression / list comprehension:
let cartesian xs ys =
    [ for x in xs do
        for y in ys do
            yield (x, y) ]
than with higher-order functions:
let cartesian xs ys =
    List.collect (fun x -> List.map (fun y -> (x, y)) ys) xs
I once asked about the differences between list comprehensions and higher-order functions, which might be of interest to you.
You're basically folding over the numeric range. So it could be written:
let listComp f n = List.fold (fun xs _ -> f xs :: xs) [] [1 .. n]
This has the added benefit of gracefully handling negative values of n.
You could do a Seq.unfold and then do Seq.toList.
See the example from here:
let seq1 = Seq.unfold (fun state -> if (state > 20) then None else Some(state, state + 1)) 0
printfn "The sequence seq1 contains numbers from 0 to 20."
for x in seq1 do printf "%d " x
let fib = Seq.unfold (fun state ->
    if (snd state > 1000) then None
    else Some(fst state + snd state, (snd state, fst state + snd state))) (1,1)
printfn "\nThe sequence fib contains Fibonacci numbers."
for x in fib do printf "%d " x

Two simple codes to generate divisors of a number. Why is the recursive one faster?

While solving a problem, I had to calculate the divisors of a number. I have two implementations that produce all divisors > 1 for a given number.
The first is using simple recursion:
divisors :: Int64 -> [Int64]
divisors k = divisors' 2 k
  where
    divisors' n k | n*n > k = [k]
                  | n*n == k = [n, k]
                  | k `mod` n == 0 = (n:(k `div` n):result)
                  | otherwise = result
      where result = divisors' (n+1) k
The second one uses list processing functions from the Prelude:
divisors2 :: Int64 -> [Int64]
divisors2 k = k : (concatMap (\x -> [x, k `div` x]) $!
                     filter (\x -> k `mod` x == 0) $!
                     takeWhile (\x -> x*x <= k) [2..])
I find that the first implementation is faster (I printed the whole list returned, so that no part of the result remains unevaluated due to laziness). The two implementations produce differently ordered divisors, but that is not a problem for me. (In fact, if k is a perfect square, the square root is output twice in the second implementation - again not a problem).
In general are such recursive implementations faster in Haskell? Also, I would appreciate any pointers to make either of these codes faster. Thanks!
EDIT:
Here is the code I am using to compare these two implementations for performance: https://gist.github.com/3414372
Here are my timing measurements:
Using divisors2 with strict evaluation ($!):
$ ghc --make -O2 div.hs
[1 of 1] Compiling Main ( div.hs, div.o )
Linking div ...
$ time ./div > /tmp/out1
real 0m7.651s
user 0m7.604s
sys 0m0.012s
Using divisors2 with lazy evaluation ($):
$ ghc --make -O2 div.hs
[1 of 1] Compiling Main ( div.hs, div.o )
Linking div ...
$ time ./div > /tmp/out1
real 0m7.461s
user 0m7.444s
sys 0m0.012s
Using function divisors
$ ghc --make -O2 div.hs
[1 of 1] Compiling Main ( div.hs, div.o )
Linking div ...
$ time ./div > /tmp/out1
real 0m7.058s
user 0m7.036s
sys 0m0.020s
The recursive version is not in general faster than the list-based version. This is because the GHC compiler employs List fusion optimizations when a computation follows a certain pattern. This means that list generators and "list transformers" might be fused into one big generator instead.
However, when you use $!, you basically tell the compiler to "Please produce the first cons of this list before performing the next step." This means that GHC is forced to at least compute one intermediate list element, which disables the whole fusion optimization entirely.
So, the second algorithm is slower, because you produce intermediate lists that have to be constructed and destructed, while the recursive algorithm simply produces a single list straight away.
Since you asked, to make it faster a different algorithm should be used. Simple and straightforward is to find a prime factorization first, then construct the divisors from it somehow.
Standard prime factorization by trial division is:
factorize :: Integral a => a -> [a]
factorize n = go n (2:[3,5..])    -- or: `go n primes`
  where
    go n ds@(d:t)
      | d*d > n = [n]
      | r == 0 = d : go q ds
      | otherwise = go n t
      where (q,r) = quotRem n d
-- factorize 12348 ==> [2,2,3,3,7,7,7]
Equal prime factors can be grouped and counted:
import Data.List (group)
primePowers :: Integral a => a -> [(a, Int)]
primePowers n = [(head x, length x) | x <- group $ factorize n]
-- primePowers = map (head &&& length) . group . factorize
-- primePowers 12348 ==> [(2,2),(3,2),(7,3)]
Divisors are usually constructed, though out of order, with:
divisors :: Integral a => a -> [a]
divisors n = map product $ sequence
               [take (k+1) $ iterate (p*) 1 | (p,k) <- primePowers n]
Hence, we have
numDivisors :: Integral a => a -> Int
numDivisors n = product [ k+1 | (_,k) <- primePowers n]
The product here comes from the sequence in the definition above it, because sequence :: Monad m => [m a] -> m [a] for the list monad m ~ [] constructs lists of all possible combinations of elements picked one from each member list, sequence_lists = foldr (\xs rs -> [x:r | x <- xs, r <- rs]) [[]], so that length . sequence_lists === product . map length, and of course length . take n === n for infinite argument lists.
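A small hand-worked instance may make this concrete. The sketch below hard-codes the power lists for 12 = 2^2 * 3 instead of calling primePowers, so it depends on nothing but the Prelude; the names powerLists, choices and divisorsOf12 are made up for the illustration.

```haskell
-- Powers of each prime factor of 12, written out by hand:
-- 2^0, 2^1, 2^2 and 3^0, 3^1.
powerLists :: [[Int]]
powerLists = [[1, 2, 4], [1, 3]]

-- sequence in the list monad picks one element from each inner list
-- in every possible way.
choices :: [[Int]]
choices = sequence powerLists
-- [[1,1],[1,3],[2,1],[2,3],[4,1],[4,3]]

-- Multiplying each choice gives one divisor per combination.
divisorsOf12 :: [Int]
divisorsOf12 = map product choices
-- [1,3,2,6,4,12]
```

There are 3 * 2 = 6 choices, which is exactly the count numDivisors computes as the product of (k+1) over the prime powers.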
In-order generation is possible, too:
ordDivisors :: Integral a => a -> [a]
ordDivisors n = foldr (\(p,k)-> foldi merge [] . take (k+1) . iterate (map (p*)))
                      [1] $ reverse $ primePowers n
foldi :: (a -> a -> a) -> a -> [a] -> a
foldi f z (x:xs) = f x (foldi f z (pairs xs)) where
    pairs (x:y:xs) = f x y:pairs xs
    pairs xs = xs
foldi f z [] = z
merge :: Ord a => [a] -> [a] -> [a]
merge (x:xs) (y:ys) = case (compare y x) of
    LT -> y : merge (x:xs) ys
    _ -> x : merge xs (y:ys)
merge xs [] = xs
merge [] ys = ys
{- ordDivisors 12348 ==>
[1,2,3,4,6,7,9,12,14,18,21,28,36,42,49,63,84,98,126,147,196,252,294,343,441,588,
686,882,1029,1372,1764,2058,3087,4116,6174,12348] -}
This definition is productive, too, i.e. it starts producing the divisors right away, without noticeable delay:
{- take 20 $ ordDivisors $ product $ concat $ replicate 5 $ take 11 primes
==> [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
(0.00 secs, 525068 bytes)
numDivisors $ product $ concat $ replicate 5 $ take 11 primes
==> 362797056 -}
