Merge sorted inputs in Haskell? - algorithm

I'm a newbie to Haskell, and I'm trying to write an elegant function to merge an arbitrary number of sorted lists into a single sorted list... Can anyone provide an elegant and efficient reference implementation?
Thanks!

Something like this should work:
merge2 pred xs [] = xs
merge2 pred [] ys = ys
merge2 pred (x:xs) (y:ys) =
case pred x y of
True -> x: merge2 pred xs (y:ys)
False -> y: merge2 pred (x:xs) ys
merge pred [] = []
merge pred (x:[]) = x
merge pred (x:xs) = merge2 pred x (merge pred xs)
Here, the function merge2 merges 2 lists. The function merge merges a list of lists. The pred is predicate you use for sorting.
Example:
merge (<) [[1, 3, 9], [2, 3, 4], [7, 11, 15, 22]]
should return
[1,2,3,3,4,7,9,11,15,22]

Since I like taking advantage of infix operators and higher-order functions where it makes sense to, I would write
infixr 5 ##
(##) :: (Ord a) => [a] -> [a] -> [a]
-- if one side is empty, the merges can only possibly go one way
[] ## ys = ys
xs ## [] = xs
-- otherwise, take the smaller of the two heads out, and continue with the rest
(x:xs) ## (y:ys) = case x `compare` y of
LT -> x : xs ## (y:ys)
EQ -> x : xs ## ys
GT -> y : (x:xs) ## ys
-- a n-way merge can be implemented by a repeated 2-way merge
merge :: (Ord a) => [[a]] -> [a]
merge = foldr1 (##)
Here, xs ## ys merges two lists by their natural ordering (and drops duplicates), while merge [xs, ys, zs..] merges any number of lists.
This leads to the very natural definition of the Hamming numbers:
hamming :: (Num a, Ord a) => [a]
hamming = 1 : map (2*) hamming ## map (3*) hamming ## map (5*) hamming
hamming = 1 : merge [map (n*) hamming | n <- [2, 3, 5]] -- alternative
-- this generates, in order, all numbers of the form 2^i * 3^j * 5^k
-- hamming = [1,2,3,4,5,6,8,9,10,12,15,16,18,20,24,25,27,30,32,36,40,45,48,50,..]
Stealing yairchu's unimplemented idea:
{-# LANGUAGE ViewPatterns #-}
import qualified Data.Map as M
import Data.List (foldl', unfoldr)
import Data.Maybe (mapMaybe)
-- merge any number of ordered lists, dropping duplicate elements
merge :: (Ord a) => [[a]] -> [a]
-- create a map of {n: [tails of lists starting with n]}; then
-- repeatedly take the least n and re-insert the tails
merge = unfoldr ((=<<) step . M.minViewWithKey) . foldl' add M.empty where
add m (x:xs) = M.insertWith' (++) x [xs] m; add m _ = m
step ((x, xss), m) = Just (x, foldl' add m xss)
-- merge any number of ordered lists, preserving duplicate elements
mergeDup :: (Ord a) => [[a]] -> [a]
-- create a map of {(n, i): tail of list number i (which starts with n)}; then
-- repeatedly take the least n and re-insert the tail
-- the index i <- [0..] is used to prevent map from losing duplicates
mergeDup = unfoldr step . M.fromList . mapMaybe swap . zip [0..] where
swap (n, (x:xs)) = Just ((x, n), xs); swap _ = Nothing
step (M.minViewWithKey -> Just (((x, n), xs), m)) =
Just (x, case xs of y:ys -> M.insert (y, n) ys m; _ -> m)
step _ = Nothing
where merge, like my original, eliminates duplicates, while mergeDup preserves them (like Igor's answer).

if efficiency wasn't a concern I'd go with
merge = sort . concat
otherwise:
merge :: Ord a => [[a]] -> [a]
merge [] = []
merge lists =
minVal : merge nextLists
where
heads = map head lists
(minVal, minIdx) = minimum $ zip heads [0..]
(pre, ((_:nextOfMin):post)) = splitAt minIdx lists
nextLists =
if null nextOfMin
then pre ++ post
else pre ++ nextOfMin : post
note however that this implementation always linearly searches for the minimum (while for a large number of list one may wish to maintain a heap etc.)

Unlike the other posts, I would have merge :: [a] -> [a] -> [a]
type SortedList a = [a]
merge :: (Ord a) => SortedList a -> SortedList a -> SortedList a
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys)
| x < y = x : merge xs (y : ys)
| otherwise = y : merge (x : xs) ys
mergeAll :: (Ord a) => [SortedList a] -> SortedList a
mergeAll = foldr merge []

Just a quick note: if you want to have the optimal log n behavior when merging several lists (such as you'd get with a priority queue), you can do it very easily with a tweak to Igor's beautiful solution above. (I would have put this as a comment on his answer above, but I don't have enough reputation.) In particular, you do:
merge2 pred xs [] = xs
merge2 pred [] ys = ys
merge2 pred (x:xs) (y:ys) =
case pred x y of
True -> x: merge2 pred xs (y:ys)
False -> y: merge2 pred (x:xs) ys
everyother [] = []
everyother e0:[] = e0:[]
everyother (e0:e1:es) = e0:everyother es
merge pred [] = []
merge pred (x:[]) = x
merge pred xs = merge2 pred (merge pred . everyother $ xs)
(merge pred . everyother . tail $ xs)
Note that a real priority queue would be a bit faster/more space efficient, but that this solution is asymptotically just as good and, as I say, has the advantage that it's a very minor tweak to Igor's beautifully clear solution above.
Comments?

Related

How do I sort a merged list into ascending order?

merge :: [a] -> [a] -> [a]
merge xs [] = xs
merge [] ys = ys
merge (x:xs) (y:ys) = x : y : merge xs ys
I have got this working but now I need to sort them in ascending order.
You just need to compare x and y and decide which to add to the result first. Note that you only add one at a time; the next element after x might still come before y.
Note you need the Ord constraint to ensure that <= is defined for a.
merge :: Ord a => [a] -> [a] -> [a]
merge xs [] = xs
merge [] ys = ys
merge xs#(x:xs') ys#(y:ys') | x <= y = x : merge xs' ys
| otherwise = y : merge xs ys'

Is this a correctly implemented mergesort in Haskell?

I could not find my code anywhere on the net, so can you please tell me why or why not the function myMergeSort is a mergesort? I know my function myMergeSort sorts, but am not sure if it really sorts using the mergesort algorithm or if it is a different algorithm. I just began with Haskell a few days ago.
merge xs [] = xs
merge [] ys = ys
merge (x : xs) (y : ys)
| x <= y = x : merge xs (y : ys)
| otherwise = y : merge (x : xs) ys
myMergeSort :: [Int] -> [Int]
myMergeSort [] = []
myMergeSort (x:[]) = [x]
myMergeSort (x:xs) = foldl merge [] (map (\x -> [x]) (x:xs))
I have no questions about the merge function.
The following function mergeSortOfficial was the solution presented to us, I understand it but am not sure if I am implementing the mergesort algorithm in my function myMergeSort correctly or not.
Official solution - implemenation:
mergeSortOfficial [] = []
mergeSortOfficial (x : []) = [x]
mergeSortOfficial xs = merge
(mergeSortOfficial (take ((length xs) ‘div‘ 2) xs))
(mergeSortOfficial (drop ((length xs) ‘div‘ 2) xs))
No, that's not mergeSort. That's insertionSort, which is essentially the same algorithm as bubbleSort, depending on how you stare at it. At each step, a singleton list is merged with the accumulated ordered-list-so-far, so, effectively, the element of that singleton is inserted.
As other commenters have already observed, to get mergeSort (and in particular, its efficiency), it's necessary to divide the problem repeatedly into roughly equal parts (rather than "one element" and "the rest"). The "official" solution gives a rather clunky way to do that. I quite like
foldr (\ x (ys, zs) -> (x : zs, ys)) ([], [])
as a way to split a list in two, not in the middle, but into elements in even and odd positions.
If, like me, you like to have structure up front where you can see it, you can make ordered lists a Monoid.
import Data.Monoid
import Data.Foldable
import Control.Newtype
newtype Merge x = Merge {merged :: [x]}
instance Newtype (Merge x) [x] where
pack = Merge
unpack = merged
instance Ord x => Monoid (Merge x) where
mempty = Merge []
mappend (Merge xs) (Merge ys) = Merge (merge xs ys) where
-- merge is as you defined it
And now you have insertion sort just by
ala' Merge foldMap (:[]) :: [x] -> [x]
One way to get the divide-and-conquer structure of mergeSort is to make it a data structure: binary trees.
data Tree x = None | One x | Node (Tree x) (Tree x) deriving Foldable
I haven't enforced a balancing invariant here, but I could. The point is that the same operation as before has another type
ala' Merge foldMap (:[]) :: Tree x -> [x]
which merges lists collected from a treelike arrangement of elements. To obtain said arrangements, think "what's cons for Tree?" and make sure you keep your balance, by the same kind of twistiness I used in the above "dividing" operation.
twistin :: x -> Tree x -> Tree x -- a very cons-like type
twistin x None = One x
twistin x (One y) = Node (One x) (One y)
twistin x (Node l r) = Node (twistin x r) l
Now you have mergeSort by building a binary tree, then merging it.
mergeSort :: Ord x => [x] -> [x]
mergeSort = ala' Merge foldMap (:[]) . foldr twistin None
Of course, introducing the intermediate data structure has curiosity value, but you can easily cut it out and get something like
mergeSort :: Ord x => [x] -> [x]
mergeSort [] = []
mergeSort [x] = [x]
mergeSort xs = merge (mergeSort ys) (mergeSort zs) where
(ys, zs) = foldr (\ x (ys, zs) -> (x : zs, ys)) ([], []) xs
where the tree has become the recursion structure of the program.
myMergeSort is not a correct merge sort. It is a correct insertion sort though. We start with an empty list, then insert the elements one-by-one into the correct position:
myMergeSort [2, 1, 4, 3] ==
foldl merge [] [[2], [1], [4], [3]] ==
((([] `merge` [2]) `merge` [1]) `merge` [4]) `merge` [3] ==
(([2] `merge` [1]) `merge` [4]) `merge` [3]
([1, 2] `merge` [4]) `merge` [3] ==
[1, 2, 4] `merge` [3] ==
[1, 2, 3, 4]
Since each insertion takes linear time, the whole sort is quadratic.
mergeSortOfficial is technically right, but it's inefficient. length takes linear time, and it's called at each level of recursion for the total length of the list. take and drop are also linear. The overall complexity remains the optimal n * log n, but we run a couple of unnecessary circles.
If we stick to top-down merging, we could do better with splitting the list to a list of elements with even indices and another with odd indices. Splitting is still linear, but it's only a single traversal instead of two (length and then take / drop in the official sort).
split :: [a] -> ([a], [a])
split = go [] [] where
go as bs [] = (as, bs)
go as bs (x:xs) = go (x:bs) as xs
mergeSortOfficial :: [Int] -> [Int]
mergeSortOfficial [] = []
mergeSortOfficial (x : []) = [x]
mergeSortOfficial xs =
let (as, bs) = split xs in
merge (mergeSortOfficial as) (mergeSortOfficial bs)
As WillNess noted in the comments, the above split yields an unstable sort. We can use a stable alternative:
import Control.Arrow
stableSplit :: [a] -> ([a], [a])
stableSplit xs = go xs xs where
go (x:xs) (_:_:ys) = first (x:) (go xs ys)
go xs ys = ([], xs)
The best way is probably doing a bottom-up merge. It's the approach the sort in Data.List takes. Here we merge consecutive pairs of lists until there is only a single list left:
mergeSort :: Ord a => [a] -> [a]
mergeSort [] = []
mergeSort xs = mergeAll (map (:[]) xs) where
mergePairs (x:y:ys) = merge x y : mergePairs ys
mergePairs xs = xs
mergeAll [xs] = xs
mergeAll xs = mergeAll (mergePairs xs)
Data.List.sort works largely the same as above, except it starts with finding descending and ascending runs in the input instead of just creating singleton lists from the elements.

Permutation of a list in haskell

We are given two lists xs :: [a] and ys :: [Int]. For example:
xs = ["some", "random", "text"]
ys = [2, 3, 1]
We have to generate a new list zs :: [a], such that zs is a permutation of xs generated using ys. For above example:
zs = ["random", "text", "some"]
Explanation: "random" occurs at 2nd position in xs, "text" occurs on the 3rd position and "some" occurs on the 1st position.
Till now, I have arrived at this solution:
f :: [a] -> [Int] -> [a]
f xs ys = getList (listArray (1, n) xs) ys where
n = length xs
getList :: Array Int a -> [Int] -> [a]
getList a ys = [ a ! x | x <- ys]
Is there a better definition for f which will avoid use of array? I am looking for memory efficient solutions. Array is a bad choice if xs is say a large list of big strings. Time complexity of f could be relaxed to O(n log n).
Simply sorting twice, back and forth, does the job:
import Data.Ord
import Data.List
f :: [a] -> [Int] -> [a]
f xs = map fst . sortBy (comparing snd) . zip xs .
map fst . sortBy (comparing snd) . zip ([1..] :: [Int])
So that
Prelude Data.Ord Data.List> f ["some", "random", "text"] [2, 3, 1]
["random","text","some"]
(using the idea from this answer).
Since we sort on Int indices the both times, you can use some integer sorting like radix sort, for an O(n) solution.

Balanced Partition in Haskell

In haskell, how can I generate a balanced partition of a set?
Assuming I have a set {1,3,4,6,9}, a balanced partition of that set would be s1{9,3} and s2{6,4,1}, seeing as s1-s2 is 1.
Well, for brute force, we can generate all partitions recursively by generating partitions for the tail and then putting the head on the left list or the right:
partitions :: [a] -> [([a], [a])]
partitions [] = [([], [])]
partitions (x : xs) = let ps = partitions xs in
[(x : ys, zs) | (ys, zs) <- ps] ++ [(ys, x : zs) | (ys, zs) <- ps]
have a way to compute the unbalance:
unbalance :: Num a => ([a], [a]) -> a
unbalance (ys, zs) = abs (sum ys - sum zs)
and then put it all together:
balancedPartition :: (Num a, Ord a) => [a] -> ([a], [a])
balancedPartition = minimumBy (comparing unbalance) . partitions
Here's the complete module:
module Balance where
import Data.List(minimumBy)
import Data.Ord(comparing)
partitions :: [a] -> [([a], [a])]
partitions [] = [([], [])]
partitions (x : xs) = let ps = partitions xs in
[(x : ys, zs) | (ys, zs) <- ps] ++ [(ys, x : zs) | (ys, zs) <- ps]
unbalance :: Num a => ([a], [a]) -> a
unbalance (ys, zs) = abs (sum ys - sum zs)
balancedPartition :: (Num a, Ord a) => [a] -> ([a], [a])
balancedPartition = minimumBy (comparing unbalance) . partitions
Here's a solution that does a little better:
balancedPartition :: (Num a, Ord a) => [a] -> ([a], [a])
balancedPartition = snd . head . partitionsByBadness . sort
where
-- recursively builds a list of possible partitionings and their badness
-- sorted by their (increasing) badness
partitionsByBadness [] = [(0, ([], []))]
partitionsByBadness (x:xs) = let res = partitionsByBadness xs
withX = map ( (+x) *** first (x:)) res
sansX = map (subtract x *** second (x:)) res
in merge withX $ normalize sansX
-- When items are added to the second list, the difference between the sums
-- decreases - and might become negative
-- We take those cases and swap the lists, so that the first list has always
-- a greater sum and the difference is always positive
-- So that we can sort the list again (with linear complexity)
normalize xs = let (neg, pos) = span ((<0) . fst) xs
in merge pos $ reverse $ map (negate *** swap) neg
-- merge two sorted lists (as known from mergesort, but
-- omits "duplicates" with same badness)
merge :: Ord k => [(k, v)] -> [(k, v)] -> [(k, v)]
merge [] zss = zss
merge yss [] = yss
merge yss#(y:ys) zss#(z:zs) = case comparing fst y z of
LT -> y : merge ys zss
EQ -> merge ys zss
GT -> z : merge yss zs
Bin packing works pretty well:
% stack ghci --package Binpack
λ: import Data.BinPack
λ: let bins numberOfBins items = let totalSize = sum items; binSize = succ (totalSize `div` (max 1 numberOfBins)) in binpack WorstFit Decreasing id (replicate numberOfBins (emptyBin binSize)) items
λ: bins 2 [1,3,4,6,9]
([(0,[3,9]),(1,[1,4,6])],[])
If you know your input will fit into the bins you can extract out the partitions:
λ: map snd . fst . bins 2 $ [1,3,4,6,9]
[[3,9],[1,4,6]]

Simplifying selectionsort and mergesort

I have managed to implement insertionsort and quicksort in a couple of lines, but selectionsort and mergesort still give me headaches ;)
selectionsort [] = []
selectionsort (x:xs) =
let (minimum, greater) = extractMinimum x [] xs
in minimum : selectionsort greater
extractMinimum minimumSoFar greater [] = (minimumSoFar, greater)
extractMinimum minimumSoFar greater (x:xs)
| x < minimumSoFar = extractMinimum x (minimumSoFar:greater) xs
| otherwise = extractMinimum minimumSoFar (x:greater) xs
Is something like the extractMinimum function available in the standard library? I tried hoogling for (a -> a -> Bool/Ordering) -> [a] -> (a, [a]) without any luck.
mergesort [ ] = [ ]
mergesort [x] = [x]
mergesort xs =
let (left, right) = splitAt (length xs `div` 2) xs
in merge (mergesort left) (mergesort right)
merge xs [] = xs
merge [] ys = ys
merge xxs#(x:xs) yys#(y:ys)
| x < y = x : merge xs yys
| otherwise = y : merge xxs ys
Again, do I have to write merge myself, or can I reuse existing components? Hoogle gave me no useful results for (a -> a -> Bool/Ordering) -> [a] -> [a] -> [a].
There's nothing in the standard libraries, but at least merge is provided by a package on hackage, although I'm not sure it's worth pulling in a dependency for such a simple function.
However,
merge xxs#(x:xs) yys#(y:ys)
| x < y = x : merge xs yys
| otherwise = y : merge xxs ys
produces a non-stable sort, to get a stable sort, the condition to place x should be x <= y.
For extractMinimum, I haven't found anything either, but I can offer an alternative definition,
extractMinimum :: Ord a => a -> [a] -> (a,[a])
extractMinimum x = foldl' select (x, [])
where
select (mini, greater) y
| y < mini = (y, mini:greater)
| otherwise = (mini, y:greater)
A nice definition of selectionSort would be
import Data.List -- for unfoldr
selectionSort :: Ord a => [a] -> [a]
selectionSort = unfoldr getMin
where
getMin [] = Nothing
getMin (x:xs) = Just $ extractMinimum x xs
My suggestion for selection sort:
import Data.List
selectionsort xs = unfoldr f xs where
f [] = Nothing
f xs = Just $ extractMinimum xs
extractMinimum (x:xs) = foldl' f (x,[]) xs where
f (minimum, greater) x | x < minimum = (x, minimum : greater)
| otherwise = (minimum, x : greater)

Resources