First non-repeating char in a string, in Haskell or F#

Given a sequence of chars, what is the most efficient way to find the first non-repeating char?
I'm interested in a purely functional implementation; Haskell or F# preferred.

A fairly straightforward use of Data.Set in combination with filter will do the job in an efficient one-liner. Since this seems homeworkish, I'm declining to provide the precise line in question :-)
The complexity should, I think, be O(n log m) where m is the number of distinct characters in the string and n is the total number of characters in the string.

A simple F# solution:
let f (s: string) =
    let n = Map(Seq.countBy id s)
    Seq.find (fun c -> n.[c] = 1) s
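For readers following along in Haskell, the same count-then-scan idea might look like the sketch below. It assumes the containers package (which ships with GHC) for Data.Map.Strict; the name firstUnique is made up for this illustration and does not come from any of the answers.

```haskell
import qualified Data.Map.Strict as M
import Data.List (find)

-- Count every element once, then return the first element whose count is 1.
-- Roughly O(n log m) for n elements and m distinct values.
firstUnique :: Ord a => [a] -> Maybe a
firstUnique xs = find isUnique xs
  where
    counts = M.fromListWith (+) [(x, 1 :: Int) | x <- xs]
    isUnique x = M.lookup x counts == Just 1
```

Returning Maybe avoids both the exception that Seq.find throws and a sentinel character when every element repeats.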

Here's an F# solution in O(n log n): sort the array, then for each character in the original array, binary search for it in the sorted array: if it's the only one of its kind, that's it.
open System
open System.IO
open System.Collections.Generic
let Solve (str : string) =
    let arrStr = str.ToCharArray()
    let sorted = Array.sort arrStr
    let len = str.Length - 1
    let rec Inner i =
        if i = len + 1 then
            '-'
        else
            let index = Array.BinarySearch(sorted, arrStr.[i])
            if index = 0 && sorted.[index+1] <> sorted.[index] then
                arrStr.[i]
            elif index = len && sorted.[index-1] <> sorted.[index] then
                arrStr.[i]
            elif index > 0 && index < len &&
                 sorted.[index+1] <> sorted.[index] &&
                 sorted.[index-1] <> sorted.[index] then
                arrStr.[i]
            else
                Inner (i + 1)
    Inner 0
let _ =
    printfn "%c" (Solve "abcdefabcf")
A result of '-' means that every character is repeated.
Edit: using '-' for "no solution" is an ugly hack, since you can use Options, which I keep forgetting about! An exercise for the reader, as this does look like homework.

Here's a bit longish solution, but guaranteed to be worst-case O(n log n):
import Data.List (sortBy, minimumBy)
import Data.Ord (comparing)
sortPairs :: Ord a => [(a, b)]->[(a, b)]
sortPairs = sortBy (comparing fst)
index :: Integral b => [a] -> [(a, b)]
index = flip zip [1..]
dropRepeated :: Eq a => [(a, b)]->[(a, b)]
dropRepeated [] = []
dropRepeated [x] = [x]
dropRepeated (x:xs)
  | fst x == fst (head xs) =
      dropRepeated $ dropWhile ((==(fst x)).fst) xs
  | otherwise =
      x:(dropRepeated xs)
nonRepeatedPairs :: (Ord a, Integral b) => [a]->[(a, b)]
nonRepeatedPairs = dropRepeated . sortPairs . index
firstNonRepeating :: Ord a => [a]->a
firstNonRepeating = fst . minimumBy (comparing snd) . nonRepeatedPairs
The idea is: sort the string lexicographically, so that it's easy to remove any repeated characters in linear time and find the first character which is not repeated. But in order to find it, we need to save information about characters' positions in text.
The speed on easy cases (like [1..10000]) is not perfect, but for something harder ([1..10000] ++ [1..10000] ++ [10001]) you can see the difference between this and a naive O(n^2).
Of course this can be done in linear time, if the size of alphabet is O(1), but who knows how large the alphabet is...

An alternate Haskell O(n log n) solution using Data.Map and no sorting:
module NonRepeat (
firstNonRepeat
)
where
import Data.List (minimumBy)
import Data.Map (fromListWith, toList)
import Data.Ord (comparing)
data Occurance = Occ { first :: Int, count :: Int }
  deriving (Eq, Ord)
note :: Int -> a -> (a, Occurance)
note pos a = (a, Occ pos 1)
combine :: Occurance -> Occurance -> Occurance
combine (Occ p0 c0) (Occ p1 c1) = Occ (p0 `min` p1) (c0 + c1)
firstNonRepeat :: (Ord a) => [a] -> Maybe a
firstNonRepeat = fmap fst . findMinimum . occurances
  where occurances = toList . fromListWith combine . zipWith note [0..]
        findMinimum = safeMinimum . filter ((== 1).count.snd)
        safeMinimum [] = Nothing
        safeMinimum xs = Just $ minimumBy (comparing snd) xs

let firstNonRepeating (str:string) =
    let rec inner i cMap =
        if i = str.Length then
            cMap
            |> Map.filter (fun c (count, index) -> count = 1)
            |> Map.toSeq
            |> Seq.minBy (fun (c, (count, index)) -> index)
            |> fst
        else
            let c = str.[i]
            let value = if cMap.ContainsKey c then
                            let (count, index) = cMap.[c]
                            (count + 1, index)
                        else
                            (1, i)
            let cMap = cMap.Add(c, value)
            inner (i + 1) cMap
    inner 0 (Map.empty)
Here is a simpler version that sacrifices speed.
let firstNonRepeating (str:string) =
    let (c, count) = str
                     |> Seq.countBy (fun c -> c)
                     |> Seq.minBy (fun (c, count) -> count)
    if count = 1 then Some c else None

How about something like this:
let firstNonRepeat s =
    let repeats =
        ((Set.empty, Set.empty), s)
        ||> Seq.fold (fun (one,many) c -> Set.add c one, if Set.contains c one then Set.add c many else many)
        |> snd
    s
    |> Seq.tryFind (fun c -> not (Set.contains c repeats))

This is pure C# (so I assume there's a similar F# version), which will be efficient if GroupBy is efficient (which it ought to be):
static char FstNonRepeatedChar(string s)
{
    return s.GroupBy(x => x).Where(xs => xs.Count() == 1).First().First();
}

Related

Why is my Haskell selection sort implementation extremely fast?

I implemented selection sort and compared it to Data.List's sort. It is orders of magnitude faster than Data.List's sort. If I apply it to 10,000 randomly generated numbers, the results are as follows:
✓ in 1.22µs: Selection sort
✓ in 9.84ms: Merge sort (Data.List)
This can't be right. First I thought maybe merge sort's intermediate results are cached and selection sort uses those to be much faster. Even when I comment out merge sort and only time selection sort, it is this fast however. I also verified the output and it is correctly sorted.
What causes this behaviour?
I use this code to test:
{-# LANGUAGE BangPatterns #-}
module Lib
    ( testSortingAlgorithms
    ) where
import System.Random (randomRIO)
import Text.Printf
import Control.Exception
import System.CPUTime
import Data.List (sort, sortOn)
selectionSort :: Ord a => [a] -> [a]
selectionSort [] = []
selectionSort nrs =
    let (smallest, rest) = getSmallest nrs
    in smallest : selectionSort rest
  where getSmallest :: Ord a => [a] -> (a, [a])
        getSmallest [a] = (a, [])
        getSmallest (a:as) = let (smallest, rest) = getSmallest as
                             in if smallest > a then (a, smallest : rest)
                                                else (smallest, a : rest)
main :: IO ()
main = testSortingAlgorithms
testSortingAlgorithms :: IO ()
testSortingAlgorithms = do
    !list' <- list (10000)
    results <- mapM (timeIt list') sorts
    let results' = sortOn fst results
    mapM_ (\(diff, msg) -> printf (msg) (diff::Double)) results'
    return ()
sorts :: Ord a => [(String, [a] -> [a])]
sorts = [
      ("Selection sort", selectionSort)
    , ("Merge sort (Data.List)", sort)
    ]
list :: Int -> IO [Int]
list n = sequence $ replicate n $ randomRIO (-127,127::Int)
timeIt :: (Ord a, Show a)
       => [a] -> (String, [a] -> [a]) -> IO (Double, [Char])
timeIt vals (name, sorter) = do
    start <- getCPUTime
    --v <- sorter vals `seq` return ()
    let !v = sorter vals
    --putStrLn $ show v
    end <- getCPUTime
    let (diff, ext) = unit $ (fromIntegral (end - start)) / (10^3)
    let msg = if correct v
                 then (" ✓ in %0.2f" ++ ext ++ ": " ++ name ++ "\n")
                 else (" ✗ in %0.2f" ++ ext ++ ": " ++ name ++ "\n")
    return (diff, msg)
correct :: (Ord a) => [a] -> Bool
correct [] = True
correct (a:[]) = True
correct (a1:a2:as) = a1 <= a2 && correct (a2:as)
unit :: Double -> (Double, String)
unit v | v < 10^3 = (v, "ns")
       | v < 10^6 = (v / 10^3, "µs")
       | v < 10^9 = (v / 10^6, "ms")
       | otherwise = (v / 10^9, "s")
You write
let !v = sorter vals
which is "strict", but only to WHNF (weak head normal form, i.e. only up to the outermost constructor). So you are only timing how long it takes to find the smallest element of the list, not how long it takes to sort the whole thing. Selection sort starts by doing exactly that, so it is "optimal" for this incorrect benchmark, while mergesort does a bunch more work that's "wasted" if you only look at the first element.
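To make the difference concrete, here is a small sketch using only base. The names forceList and whnfIsShallow are invented for this illustration; in real benchmarks the usual route is force or deepseq from the deepseq package, or a benchmarking library.

```haskell
-- Walks every cons cell and evaluates each element to WHNF, which is
-- full evaluation for flat element types such as Int.
forceList :: [a] -> ()
forceList = foldr seq ()

-- seq (like a bang pattern) only evaluates to the outermost constructor,
-- so the undefined tail is never touched and this is simply True.
whnfIsShallow :: Bool
whnfIsShallow = (1 : undefined :: [Int]) `seq` True
```

In the benchmark above, demanding forceList v between the two getCPUTime calls (for example with a bang pattern on its result) should make the timing cover the whole sort rather than just the first constructor.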

Haskell Multiple Function Composition

I'm trying to understand function composition in Haskell.
According to ZVON http://zvon.org/other/haskell/Outputprelude/filter_f.html
the filter function should have two arguments, a bool function and a list.
Example filter (>5) [1,2,3,4,5,6,7,8] returns anything greater than 5:
[6,7,8]
Question, how does the following line with several function compositions pass in a boolean for the filter to utilize?
map fst . filter snd . assocs . soeA
Shouldn't it be map fst . filter (==True) snd . assocs . soeA?
To analyze I run the first two functions of the composition and pass an argument: assocs . soeA $ 9 returns
[(0,False),(1,False),(2,True),(3,True),(4,False),(5,True),(6,False),(7,True),(8,False),(9,False)]
soe 9 returns [2,3,5,7]
Somehow the bool value in each array element of soeA is being used, but any help explaining how this composition is working would be very much appreciated.
Full Code is:
{-# LANGUAGE ScopedTypeVariables #-}
module FastSeive where
import Control.Monad
import Control.Monad.ST
import Data.Array.ST
import Data.Array.Unboxed
soeST :: forall s. Int -> ST s (STUArray s Int Bool)
soeST n = do
    arr <- newArray (0, n) True
    mapM_ (\i -> writeArray arr i False) [0, 1]
    let n2 = n `div` 2
    let loop :: Int -> ST s ()
        loop i | i > n2 = return ()
        loop i = do
            b <- readArray arr i
            let reset :: Int -> ST s ()
                reset j | j > n = return ()
                reset j = writeArray arr j False >> reset (j + i)
            when b (reset (2*i))
            loop (succ i)
    loop 2
    return arr
soeA :: Int -> UArray Int Bool
soeA n = runST (soeST n >>= freeze)
soe :: Int -> [Int]
soe = map fst . filter snd . assocs . soeA
soeCount :: Int -> Int
soeCount = length . filter id . elems . soeA
The short answer is: here, snd is the Bool-returning function filter expects. In the expression you wrote, map fst . filter (==True) snd . assocs . soeA, (==True) would be filter's first argument and snd its second. That won't typecheck: snd is not a list, and filter applied to two arguments yields a list rather than a function, so it cannot be used in a function composition.
For a longer answer, we can actually apply (.)'s definition to find out what's happening:
(f . g) x = f (g x)
-- In Haskell, (.) is defined as right-associative,
-- meaning that if we put in explicit parentheses, we'd have:
soe = (map fst . (filter snd . (assocs . soeA)))
-- That only really matters for the compiler, though,
-- because we know function composition is associative.
soe = map fst . filter snd . assocs . soeA
-- "Un-pointfree-ing" it:
soe x = (map fst . filter snd . assocs . soeA) x
-- Applying (.)'s definition:
soe x = map fst ((filter snd . assocs . soeA) x)
-- Again:
soe x = map fst (filter snd ((assocs . soeA) x))
-- And again:
soe x = map fst (filter snd (assocs (soeA x)))
It's now clear that snd is filter's first argument, while the second argument will evaluate to what assocs (soeA x) will evaluate to.
More generally, when one writes f . g . h, this can be read right-to-left as a function that first applies h to its argument, then g to the result, then f to the next result, and yields that final value.
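As a small self-contained check of that reading, the sketch below writes the same pipeline three ways. The names trueIndices, trueIndices' and trueIndicesVerbose are made up for this example.

```haskell
-- Point-free: filter on the second component, then project the first.
trueIndices :: [(Int, Bool)] -> [Int]
trueIndices = map fst . filter snd

-- The same function with the composition unfolded by hand.
trueIndices' :: [(Int, Bool)] -> [Int]
trueIndices' pairs = map fst (filter snd pairs)

-- Comparing with True explicitly is legal, but the comparison has to be
-- composed with snd rather than passed next to it.
trueIndicesVerbose :: [(Int, Bool)] -> [Int]
trueIndicesVerbose = map fst . filter ((== True) . snd)
```

All three give the same result on any input; the last one shows where the (==True) from the question would have to go if one insisted on writing it.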
Now, for the even longer answer, we can look at how the types for your expression will be inferred. It'll tell us why snd is the Bool-returning function filter expects even though it has a type signature of snd :: (a, b) -> b.
Disclaimer: I don't have a background in compiler engineering; the terms I'll be using may be inexact.
The type of filter is (a -> Bool) -> [a] -> [a]. The type of snd is (a, b) -> b.
Those are actually parameterized types. We can make the type parameters explicit:
filter :: forall a. (a -> Bool) -> [a] -> [a]
snd :: forall a b. (a, b) -> b
We'll also rename filter's type argument in order to make it non-ambiguous in what we'll write next:
filter :: forall c. (c -> Bool) -> [c] -> [c]
filter gets applied first to snd. So, we can try and unify c -> Bool from filter with (a, b) -> b, snd's type. We get these equations:
c -> Bool = (a, b) -> b
===
c = (a, b)
b = Bool
===
c = (a, Bool)
b = Bool
We'll assume that assocs (soeA x)'s type is [(Int, Bool)]. Since filter's second argument has the type [c], we can unify further:
[c] = [(Int, Bool)]
===
c = (Int, Bool)
This also gives us:
(Int, Bool) = c = (a, Bool)
===
a = Int
So, after type application, we get these concrete types for our sub-expressions :
filter :: ((Int, Bool) -> Bool) -> [(Int, Bool)] -> [(Int, Bool)]
snd :: (Int, Bool) -> Bool
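One way to double-check such a derivation without GHCi is to write the specialised signatures down and let the compiler accept or reject them. A minimal sketch, with sndAtPair and filterAtPair as invented names:

```haskell
-- snd specialised with a = Int, b = Bool.
sndAtPair :: (Int, Bool) -> Bool
sndAtPair = snd

-- filter specialised with c = (Int, Bool).
filterAtPair :: ((Int, Bool) -> Bool) -> [(Int, Bool)] -> [(Int, Bool)]
filterAtPair = filter
```

If either signature were not an instance of the general type, the definition would fail to compile.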
Well, of course, we could have used GHC's type inference all along to tell us about that, either using GHCi, or through a text editor's haskell plugin.

Haskell performance using dynamic programming

I am attempting to calculate the Levenshtein distance between two strings using dynamic programming. This is being done through Hackerrank, so I have timing constraints. I used a technique I saw in "How are Dynamic Programming algorithms implemented in idiomatic Haskell?" and it seems to be working. Unfortunately, it is timing out in one test case. I do not have access to the specific test case, so I don't know the exact size of the input.
import Control.Monad
import Data.Array.IArray
import Data.Array.Unboxed
main = do
    n <- readLn
    replicateM_ n $ do
        s1 <- getLine
        s2 <- getLine
        print $ editDistance s1 s2
editDistance :: String -> String -> Int
editDistance s1 s2 = dynamic editDistance' (length s1, length s2)
  where
    s1' :: UArray Int Char
    s1' = listArray (1,length s1) s1
    s2' :: UArray Int Char
    s2' = listArray (1,length s2) s2
    editDistance' table (i,j)
      | min i j == 0 = max i j
      | otherwise = min' (table!((i-1),j) + 1) (table!(i,(j-1)) + 1) (table!((i-1),(j-1)) + cost)
      where
        cost = if s1'!i == s2'!j then 0 else 1
        min' a b = min (min a b)
dynamic :: (Array (Int,Int) Int -> (Int,Int) -> Int) -> (Int,Int) -> Int
dynamic compute (xBnd, yBnd) = table!(xBnd,yBnd)
  where
    table = newTable $ map (\coord -> (coord, compute table coord)) [(x,y) | x<-[0..xBnd], y<-[0..yBnd]]
    newTable xs = array ((0,0),fst (last xs)) xs
I've switched to using arrays, but the speed-up was insufficient. I cannot use unboxed arrays for the table, because this code relies on laziness. Are there any glaring performance mistakes I have made? Or how else can I speed it up?
The backward equations for edit distance calculations are:
f(i, j) = minimum [
    1 + f(i + 1, j), -- delete from the 1st string
    1 + f(i, j + 1), -- delete from the 2nd string
    f(i + 1, j + 1) + if a(i) == b(j) then 0 else 1 -- substitute or match
  ]
So within each dimension, you need nothing more than the very next index: + 1. This is a sequential access pattern, which does not need the random access that arrays provide; it can be implemented using lists and nested right folds:
editDistance :: Eq a => [a] -> [a] -> Int
editDistance a b = head . foldr loop [n, n - 1..0] $ zip a [m, m - 1..]
  where
    (m, n) = (length a, length b)
    loop (s, l) lst = foldr go [l] $ zip3 b lst (tail lst)
      where
        go (t, i, j) acc@(k:_) = inc `seq` inc:acc
          where inc = minimum [i + 1, k + 1, if s == t then j else j + 1]
You may test this code in Hackerrank Edit Distance Problem as in:
import Control.Applicative ((<$>))
import Control.Monad (replicateM_)
import Text.Read (readMaybe)
editDistance :: Eq a => [a] -> [a] -> Int
editDistance a b = ... -- as implemented above
main :: IO ()
main = do
    Just n <- readMaybe <$> getLine
    replicateM_ n $ do
        a <- getLine
        b <- getLine
        print $ editDistance a b
which passes all tests with a decent performance.
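The same observation (each row needs only its neighbouring row) also works in the forward direction. The following is a rough sketch of that variant, not the code from the answer: it keeps one row of distances and builds the next one with scanl. The name levenshtein is chosen just for this example.

```haskell
import Data.List (foldl')

-- Forward row-by-row edit distance. Each row holds the distances between
-- the current prefix of xs and every prefix of ys.
levenshtein :: Eq a => [a] -> [a] -> Int
levenshtein xs ys = last (foldl' nextRow [0 .. length ys] xs)
  where
    nextRow [] _ = []                      -- unreachable: rows are never empty
    nextRow prev@(p : ps) x = scanl step (p + 1) (zip3 ys prev ps)
      where
        -- left: entry just computed in this row; up and diag: previous row
        step left (y, diag, up) =
          minimum [left + 1, up + 1, diag + (if x == y then 0 else 1)]
```

For example, levenshtein "kitten" "sitting" evaluates to 3. This sketch is lazy inside each row, so for very long inputs one would probably want to force each row as it is built.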

Construct infinite sorted list without adding duplicates

I am relatively new to Haskell, but I am trying to learn both by reading and trying to solve problems on Project Euler. I am currently trying to implement a function that takes an infinite list of integers and returns the ordered list of pairwise sums of elements in said list. I am really looking for solutions to the specific issue I am facing, rather than advice on different strategies or approaches, but those are welcome as well, as being a coder doesn't mean knowing how to implement a strategy, but also choosing the best strategy available.
My approach relies on traversing an infinite list of infinite generators and retrieving elements in order, with several mathematical properties that are useful in implementing my solution.
If I were trying to obtain the sequence of pairwise sums of the natural numbers, for example, this would be my code:
myList :: [Integer]
myList = [1..]
myGens :: [[Integer]]
myGens = gens myList
  where
    gens = \xs -> map (\x -> [x+y|y<-(dropWhile (<x) xs)]) xs
Regardless of the number set used, provided that it is sorted, the following conditions hold:
∀ i ≥ 0, head (gens xs !! i) == 2*(myList !! i)
∀ i,j,k ≥ 0, l > 0, (((gens xs) !! i) !! j) < (((gens xs) !! i+k) !! j+l)
Special cases for the second condition are:
∀ i,j ≥ 0, (((gens xs) !! i) !! j) < (((gens xs) !! i+1) !! j)
∀ i,j ≥ 0, k > 0, (((gens xs) !! i) !! j) < (((gens xs) !! i+k) !! j)
Here is the particular code I am trying to modify:
stride :: [Integer] -> [Int] -> [[Integer]] -> [Integer]
stride xs cs xss = x : stride xs counts streams
  where
    (x,i) = step xs cs xss
    counts = inc i cs
    streams = chop i xss
step :: [Integer] -> [Int] -> [[Integer]] -> (Integer,Int)
step xs cs xss = pace xs (defer cs xss)
pace :: [Integer] -> [(Integer,Int)] -> (Integer,Int)
pace hs xs@((x,i):xt) = minim (x,i) hs xt
  where
    minim :: (Integer,Int) -> [Integer] -> [(Integer,Int)] -> (Integer,Int)
    minim m _ [] = m
    minim m@(g,i) hs (y@(h,n):ynt) | g > h && 2*(hs !! n) > h = y
                                   | g > h = minim y hs ynt
                                   | 2*(hs !! n) > g = m
                                   | otherwise = minim m hs ynt
defer :: [Int] -> [[a]] -> [(a,Int)]
defer cs xss = (infer (zip cs (zip (map head xss) [0..])))
infer :: [(Int,(a,Int))] -> [(a,Int)]
infer [] = []
infer ((c,xi):xis) | c == 0 = xi:[]
                   | otherwise = xi:(infer (dropWhile (\(p,(q,r)) -> p>=c) xis))
The set in question I am using has the property that multiple distinct pairs produce an identical sum. I want an efficient method of handling all duplicate elements at once, in order to avoid an increased cost of computing all the pairwise sums up to N, as it requires M more tests if M is the number of duplicates.
Does anyone have any suggestions?
EDIT:
I made some changes to the code, independently of what was suggested, and would appreciate feedback on the relative efficiencies of my original code, my revised code, and the proposals so far.
stride :: [Integer] -> [Int] -> [[Integer]] -> [Integer]
stride xs cs xss = x : stride xs counts streams
  where
    (x,is) = step xs cs xss
    counts = foldr (\i -> inc i) cs is
    streams = foldr (\i -> chop i) xss is
step :: [Integer] -> [Int] -> [[Integer]] -> (Integer,[Int])
step xs cs xss = pace xs (defer cs xss)
pace :: [Integer] -> [(Integer,Int)] -> (Integer,[Int])
pace hs xs@((x,i):xt) = minim (x,(i:[])) hs xt
  where
    minim :: (Integer,[Int]) -> [Integer] -> [(Integer,Int)] -> (Integer,[Int])
    minim m _ [] = m
    minim m@(g,is@(i:_)) hs (y@(h,n):ynt) | g > h && 2*(hs !! n) > h = (h,[n])
                                          | g > h = minim (h,[n]) hs ynt
                                          | g == h && 2*(hs !! n) > h = (g,n:is)
                                          | g == h = minim (g,n:is) hs ynt
                                          | g < h && 2*(hs !! n) > g = m
                                          | g < h = minim m hs ynt
Also, I left out the code for inc and chop:
alter :: (a->a) -> Int -> [a] -> [a]
alter = \f -> \n -> \xs -> (take (n) xs) ++ [f (xs !! n)] ++ (drop (n+1) xs)
inc :: Int -> [Int] -> [Int]
inc = alter (1+)
chop :: Int -> [[a]] -> [[a]]
chop = alter (tail)
I'm going to present a solution that uses an infinite pairing heap. We'll have logarithmic overhead per element constructed, but no one knows how to do better (in a model with comparison-based methods and real numbers).
The first bit of code is just the standard pairing heap.
module Queue where
import Data.Maybe (fromMaybe)
data Queue k = E
             | T k [Queue k]
             deriving Show
fromOrderedList :: (Ord k) => [k] -> Queue k
fromOrderedList [] = E
fromOrderedList [k] = T k []
fromOrderedList (k1 : ks'@(k2 : _ks''))
  | k1 <= k2 = T k1 [fromOrderedList ks']
mergePairs :: (Ord k) => [Queue k] -> Queue k
mergePairs [] = E
mergePairs [q] = q
mergePairs (q1 : q2 : qs'') = merge (merge q1 q2) (mergePairs qs'')
merge :: (Ord k) => Queue k -> Queue k -> Queue k
merge (E) q2 = q2
merge q1 (E) = q1
merge q1@(T k1 q1's) q2@(T k2 q2's)
  = if k1 <= k2 then T k1 (q2 : q1's) else T k2 (q1 : q2's)
deleteMin :: (Ord k) => Queue k -> Maybe (k, Queue k)
deleteMin (E) = Nothing
deleteMin (T k q's) = Just (k, mergePairs q's)
toOrderedList :: (Ord k) => Queue k -> [k]
toOrderedList q
  = fromMaybe [] $
      do (k, q') <- deleteMin q
         return (k : toOrderedList q')
Note that fromOrderedList accepts infinite lists. I think that this can be justified theoretically by pretending as though the infinite list of descendants effectively are merged "just in time". This feels like the kind of thing that should be in the literature on purely functional data structures already, but I'm going to be lazy and not look right now.
The function mergeOrderedByMin takes this one step further and merges a potentially infinite list of queues, where the min element in each queue is nondecreasing. I don't think that we can reuse merge, since merge appears to be insufficiently lazy.
mergeOrderedByMin :: (Ord k) => [Queue k] -> Queue k
mergeOrderedByMin [] = E
mergeOrderedByMin (E : qs') = mergeOrderedByMin qs'
mergeOrderedByMin (T k q's : qs')
  = T k (mergeOrderedByMin qs' : q's)
The next function removes duplicates from a sorted list. It's in the library that m09 suggested, but for the sake of completeness, I'll define it here.
nubOrderedList :: (Ord k) => [k] -> [k]
nubOrderedList [] = []
nubOrderedList [k] = [k]
nubOrderedList (k1 : ks'@(k2 : _ks''))
  | k1 < k2 = k1 : nubOrderedList ks'
  | k1 == k2 = nubOrderedList ks'
Finally, we put it all together. I'll use the squares as an example.
squares :: [Integer]
squares = map (^ 2) [0 ..]
sumsOfTwoSquares :: [Integer]
sumsOfTwoSquares
  = nubOrderedList $ toOrderedList $
      mergeOrderedByMin
        [fromOrderedList (map (s +) squares) | s <- squares]
If you don't want to modify your code that much, you can use the nub function of Data.List.Ordered (installable by cabal install data-ordlist) to filter duplicates out.
It runs in linear time, ie complexity wise your algorithm won't change.
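If pulling in a package is not an option, a dependency-free stand-in is easy to write with Data.List.group. This is only a sketch of the idea, not the library's implementation, and the name nubSorted is made up here.

```haskell
import Data.List (group)

-- Drops adjacent duplicates from an already sorted list.
-- group is lazy, so this also works on infinite sorted lists.
nubSorted :: Eq a => [a] -> [a]
nubSorted xs = [x | (x : _) <- group xs]
```

Because it only ever compares neighbours, it does a single pass and can be placed directly on top of any merge that produces a non-decreasing stream.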
For your example [1..] the result is just [2..]. A "very smart compiler" could deduce this from the general solution with an implicit heap, which follows.
gens xs is better expressed as
gens xs = map (\t@(x:_) -> map (x+) t) $ tails xs -- or should it be
-- map (\(x:ys) -> map (x+) ys) $ tails xs -- ?
Its resulting list of lists is easily merged without duplicates by tree-like folding (see footnote 1), with
pairsums xs = foldi (\(x:l) r-> x : union l r) $ gens xs
This assumes the input list is ordered in increasing order. If it's merely in non-decreasing order (with only finite runs of equals in it, of course), you'll need to slap an orderedNub on top of that (as m09 mentions),
pairsums' = orderedNub . pairsums
Just by using foldi where foldr would work, we often get an algorithmic improvement in complexity from a factor of n to log n, a pretty significant speedup. I use it as a general tool all the time.
1. The code, adjusted for infinite lists only:
foldi f (x:xs) = f x (foldi f (pairs f xs))
pairs f (x:y:t) = f x y : pairs f t
union (x:xs) (y:ys) = case compare x y of
    LT -> x : union xs (y:ys)
    EQ -> x : union xs ys
    GT -> y : union (x:xs) ys
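Putting the pieces of this answer into one file, a runnable sketch could look as follows. The extra clauses for empty and finite lists and the helper name mergeHead are additions for this illustration; everything else follows the definitions above.

```haskell
import Data.List (tails)

-- One increasing stream of sums per element: x + every element from x on.
gens :: Num a => [a] -> [[a]]
gens xs = [map (x +) t | t@(x : _) <- tails xs]

-- Merge two increasing lists, keeping a single copy of equal elements.
union :: Ord a => [a] -> [a] -> [a]
union (x : xs) (y : ys) = case compare x y of
    LT -> x : union xs (y : ys)
    EQ -> x : union xs ys
    GT -> y : union (x : xs) ys
union xs ys = xs ++ ys                     -- one side is exhausted

pairs :: ([a] -> [a] -> [a]) -> [[a]] -> [[a]]
pairs f (x : y : t) = f x y : pairs f t
pairs _ t = t

-- Tree-shaped fold: the merges form a tree whose depth grows slowly.
foldi :: ([a] -> [a] -> [a]) -> [[a]] -> [a]
foldi f (x : xs) = f x (foldi f (pairs f xs))
foldi _ [] = []

pairsums :: (Ord a, Num a) => [a] -> [a]
pairsums = foldi mergeHead . gens
  where
    -- The head of the left stream is never larger than anything on the
    -- right, so it can be emitted before the right side is inspected.
    mergeHead (x : l) r = x : union l r
    mergeHead [] r = r
```

With this, take 10 (pairsums [1..]) gives [2,3,4,5,6,7,8,9,10,11], and the sums of two squares start 0, 1, 2, 4, 5, 8, 9, 10, 13, 16.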
See also:
mergesort as foldtree (by Heinrich Apfelmus)
infinite tree folding (by Dave Bayer)
Implicit Heap (by apfelmus)
I propose to build the pairs above the diagonal, that way a lot of duplicates are not even generated:
sums xs = zipWith (map . (+)) hs ts where
    (hs:ts) = tails xs
Now you have a list of lists, each containing sorted sums. Because they are sorted, it is possible to determine the next element of the sequence in a finite number of steps:
filtermerge :: (Ord a) => [[a]]->[a]
filtermerge ((h:t):ts) = h : filtermerge (insert t ts) where
    insert [] ts = ts
    insert xs [] = [xs]
    insert h ([]:t) = insert h t
    insert (h:t) ts@((h1:t1):t2)
      | h < h1 = (h:t):ts
      | h == h1 = insert (h:t) $ insert t1 t2
      | otherwise = insert (h1:t1) $ insert (h:t) t2
filtermerge _ = []

Calculating permutations in F#

Inspired by this question and answer, how do I create a generic permutations algorithm in F#? Google doesn't give any useful answers to this.
EDIT: I provide my best answer below, but I suspect that Tomas's is better (certainly shorter!)
You can also write something like this:
let rec permutations list taken =
  seq { if Set.count taken = List.length list then yield [] else
        for l in list do
          if not (Set.contains l taken) then
            for perm in permutations list (Set.add l taken) do
              yield l::perm }
The 'list' argument contains all the numbers that you want to permute and 'taken' is a set that contains the numbers already used. The function yields a single empty list once all numbers are taken.
Otherwise, it iterates over all numbers that are still available, gets all possible permutations of the remaining numbers (recursively using 'permutations') and prepends the current number to each of them before returning (l::perm).
To run this, you'll give it an empty set, because no numbers are used at the beginning:
permutations [1;2;3] Set.empty;;
I like this implementation (but can't remember the source of it):
let rec insertions x = function
    | [] -> [[x]]
    | (y :: ys) as l -> (x::l)::(List.map (fun x -> y::x) (insertions x ys))
let rec permutations = function
    | [] -> seq [ [] ]
    | x :: xs -> Seq.concat (Seq.map (insertions x) (permutations xs))
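For comparison, the same insertion-based scheme can be written in Haskell almost line for line. This is a sketch for illustration; the name permutationsByInsertion is chosen to avoid clashing with Data.List.permutations.

```haskell
-- All ways to insert x into a list, from the front to the back.
insertions :: a -> [a] -> [[a]]
insertions x [] = [[x]]
insertions x l@(y : ys) = (x : l) : map (y :) (insertions x ys)

-- Permute the tail first, then insert the head at every position.
permutationsByInsertion :: [a] -> [[a]]
permutationsByInsertion [] = [[]]
permutationsByInsertion (x : xs) =
  concatMap (insertions x) (permutationsByInsertion xs)
```

Note that this needs no equality or ordering on the elements, but the output is not in lexicographic order.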
Tomas' solution is quite elegant: it's short, purely functional, and lazy. I think it may even be tail-recursive. Also, it produces permutations lexicographically. However, we can improve performance two-fold using an imperative solution internally while still exposing a functional interface externally.
The function permutations takes a generic sequence e as well as a generic comparison function f : ('a -> 'a -> int) and lazily yields immutable permutations lexicographically. The comparison function allows us to generate permutations of elements which are not necessarily comparable as well as easily specify reverse or custom orderings.
The inner function permute is the imperative implementation of the algorithm described here. The conversion function let comparer f = { new System.Collections.Generic.IComparer<'a> with member self.Compare(x,y) = f x y } allows us to use the System.Array.Sort overload which does in-place sub-range custom sorts using an IComparer.
let permutations f e =
    ///Advances (mutating) perm to the next lexical permutation.
    let permute (perm:'a[]) (f: 'a->'a->int) (comparer:System.Collections.Generic.IComparer<'a>) : bool =
        try
            //Find the longest "tail" that is ordered in decreasing order ((s+1)..perm.Length-1).
            //will throw an index out of bounds exception if perm is the last permutation,
            //but will not corrupt perm.
            let rec find i =
                if (f perm.[i] perm.[i-1]) >= 0 then i-1
                else find (i-1)
            let s = find (perm.Length-1)
            let s' = perm.[s]
            //Change the number just before the tail (s') to the smallest number bigger than it in the tail (perm.[t]).
            let rec find i imin =
                if i = perm.Length then imin
                elif (f perm.[i] s') > 0 && (f perm.[i] perm.[imin]) < 0 then find (i+1) i
                else find (i+1) imin
            let t = find (s+1) (s+1)
            perm.[s] <- perm.[t]
            perm.[t] <- s'
            //Sort the tail in increasing order.
            System.Array.Sort(perm, s+1, perm.Length - s - 1, comparer)
            true
        with
        | _ -> false
    //permutation sequence expression
    let c = f |> comparer
    let freeze arr = arr |> Array.copy |> Seq.readonly
    seq { let e' = Seq.toArray e
          yield freeze e'
          while permute e' f c do
              yield freeze e' }
Now for convenience we have the following where let flip f x y = f y x:
let permutationsAsc e = permutations compare e
let permutationsDesc e = permutations (flip compare) e
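For readers who want to see the "next lexicographic permutation" step without mutation, here is a pure list-based sketch in Haskell. It is not a translation of the F# code above (it uses the elements' own Ord instance rather than a custom comparer), and the names ascPrefix, nextPermutation and permutationsLex are invented for this example.

```haskell
import Data.List (sort, unfoldr)

-- Longest non-decreasing prefix of a list, plus the remainder.
ascPrefix :: Ord a => [a] -> ([a], [a])
ascPrefix (a : b : rest)
  | a <= b = let (p, r) = ascPrefix (b : rest) in (a : p, r)
ascPrefix (a : rest) = ([a], rest)
ascPrefix [] = ([], [])

-- Next permutation in lexicographic order, or Nothing at the last one.
-- Working on the reversed list, the non-increasing tail of xs shows up as
-- an ascending prefix, followed by the pivot that has to be increased.
nextPermutation :: Ord a => [a] -> Maybe [a]
nextPermutation xs =
  case ascPrefix (reverse xs) of
    (tailAsc, pivot : restRev) ->
      case span (<= pivot) tailAsc of
        -- g is the smallest tail element greater than the pivot; swapping
        -- them leaves the tail sorted, so no extra sort is needed.
        (le, g : gt) -> Just (reverse restRev ++ g : le ++ pivot : gt)
        _            -> Nothing
    _ -> Nothing

-- All distinct permutations in ascending lexicographic order.
permutationsLex :: Ord a => [a] -> [[a]]
permutationsLex xs = start : unfoldr step start
  where
    start = sort xs
    step p = fmap (\q -> (q, q)) (nextPermutation p)
```

Like the array version, this yields each distinct arrangement once when the input has duplicates; permutationsLex [1,1,2] gives [[1,1,2],[1,2,1],[2,1,1]].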
My latest best answer
//mini-extension to List for removing 1 element from a list
module List =
    let remove n lst = List.filter (fun x -> x <> n) lst
//Node type declared outside permutations function allows us to define a pruning filter
type Node<'a> =
    | Branch of ('a * Node<'a> seq)
    | Leaf of 'a
let permutations treefilter lst =
    //Builds a tree representing all possible permutations
    let rec nodeBuilder lst x = //x is the next element to use
        match lst with //lst is all the remaining elements to be permuted
        | [x] -> seq { yield Leaf(x) } //only x left in list -> we are at a leaf
        | h -> //anything else left -> we are at a branch, recurse
            let ilst = List.remove x lst //get new list without x, use this to build subnodes of branch
            seq { yield Branch(x, Seq.collect (nodeBuilder ilst) ilst) }
    //converts a tree to a list for each leafpath
    let rec pathBuilder pth n = // pth is the accumulated path, n is the current node
        match n with
        | Leaf(i) -> seq { yield List.rev (i :: pth) } //path list is constructed from root to leaf, so have to reverse it
        | Branch(i, nodes) -> Seq.collect (pathBuilder (i :: pth)) nodes
    let nodes =
        lst //using input list
        |> Seq.collect (nodeBuilder lst) //build permutations tree
        |> Seq.choose treefilter //prune tree if necessary
        |> Seq.collect (pathBuilder []) //convert to seq of path lists
    nodes
The permutations function works by constructing an n-ary tree representing all possible permutations of the list of 'things' passed in, then traversing the tree to construct a list of lists. Using 'Seq' dramatically improves performance as it makes everything lazy.
The second parameter of the permutations function allows the caller to define a filter for 'pruning' the tree before generating the paths (see my example below, where I don't want any leading zeros).
Some example usage: Node<'a> is generic, so we can do permutations of 'anything':
let myfilter n = Some(n) //i.e., don't filter
permutations myfilter ['A';'B';'C';'D']
//in this case, I want to 'prune' leading zeros from my list before generating paths
let noLeadingZero n =
    match n with
    | Branch(0, _) -> None
    | n -> Some(n)
//Curry myself an int-list permutations function with no leading zeros
let noLZperm = permutations noLeadingZero
noLZperm [0..9]
(Special thanks to Tomas Petricek, any comments welcome)
If you need distinct permutations (when the original set has duplicates), you can use this:
let rec insertions pre c post =
    seq {
        if List.length post = 0 then
            yield pre @ [c]
        else
            if List.forall (fun x->x<>c) post then
                yield pre@[c]@post
            yield! insertions (pre@[post.Head]) c post.Tail
    }
let rec permutations l =
    seq {
        if List.length l = 1 then
            yield l
        else
            let subperms = permutations l.Tail
            for sub in subperms do
                yield! insertions [] l.Head sub
    }
This is a straight-forward translation from this C# code. I am open to suggestions for a more functional look-and-feel.
Take a look at this one:
http://fsharpcode.blogspot.com/2010/04/permutations.html
let length = Seq.length
let take = Seq.take
let skip = Seq.skip
let (++) = Seq.append
let concat = Seq.concat
let map = Seq.map
let (|Empty|Cons|) (xs:seq<'a>) : Choice<Unit, 'a * seq<'a>> =
    if (Seq.isEmpty xs) then Empty else Cons(Seq.head xs, Seq.skip 1 xs)
let interleave x ys =
    seq { for i in [0..length ys] ->
            (take i ys) ++ seq [x] ++ (skip i ys) }
let rec permutations xs =
    match xs with
    | Empty -> seq [seq []]
    | Cons(x,xs) -> concat(map (interleave x) (permutations xs))
If you need permutations with repetitions, this is the "by the book" approach using List.indexed instead of element comparison to filter out elements while constructing a permutation.
let permutations s =
    let rec perm perms carry rem =
        match rem with
        | [] -> carry::perms
        | l ->
            let li = List.indexed l
            let permutations =
                seq { for ci in li ->
                        let (i, c) = ci
                        (perm
                            perms
                            (c::carry)
                            (li |> List.filter (fun (index, _) -> i <> index) |> List.map (fun (_, char) -> char))) }
            permutations |> Seq.fold List.append []
    perm [] [] s
