Reversible permutations algorithm - algorithm

Which algorithm for permutation of list is predictable?
For example, i can get number of i-th permutation
(Haskell code)
--List of all possible permutations
permut [] = [[]]
permut xs = [x:ys|x<-xs,ys<-permut (delete x xs)]
--In command line call:
> permut "abc" !! 2
"bac"
but i don't know how to reverse it.
I want to o something like this:
> getNumOfPermut "abc" "bac"
2
Any reversible algorithm goes!
Thank you in advance!

Okay, I wanted to wait until you answered my question about what you had tried, but I had so much fun working out the answer that I just had to write it up and share it. Nerd sniping, I guess! I'm sure I'm not the first to have invented the algorithm below, but I hope you enjoy the presentation.
Our first step is to give an actual runnable implementation of permut (which you have not done). Our implementation strategy will be a simple one: choose some element of the list, choose some permutation of the remaining elements, and concatenate the two.
chooseFrom [] = []
chooseFrom (x:xs) = (x,xs) : [(y, x:ys) | (y, ys) <- chooseFrom xs]
permut [] = [[]]
permut xs = do
(element, remaining) <- chooseFrom xs
permutation <- permut remaining
return (element:permutation)
If we run this on a sample list, it's pretty clear how it behaves:
> permut [1..4]
[[0,1,2,3],[0,1,3,2],[0,2,1,3],[0,2,3,1],[0,3,1,2],[0,3,2,1],[1,0,2,3],[1,0,3,2],[1,2,0,3],[1,2,3,0],[1,3,0,2],[1,3,2,0],[2,0,1,3],[2,0,3,1],[2,1,0,3],[2,1,3,0],[2,3,0,1],[2,3,1,0],[3,0,1,2],[3,0,2,1],[3,1,0,2],[3,1,2,0],[3,2,0,1],[3,2,1,0]]
The result has a lot of structure; for example, if we group by the first element of the contained lists, there are four groups, each containing 6 (which is 3!) elements:
> mapM_ print $ groupBy ((==) `on` head) it
[[0,1,2,3],[0,1,3,2],[0,2,1,3],[0,2,3,1],[0,3,1,2],[0,3,2,1]]
[[1,0,2,3],[1,0,3,2],[1,2,0,3],[1,2,3,0],[1,3,0,2],[1,3,2,0]]
[[2,0,1,3],[2,0,3,1],[2,1,0,3],[2,1,3,0],[2,3,0,1],[2,3,1,0]]
[[3,0,1,2],[3,0,2,1],[3,1,0,2],[3,1,2,0],[3,2,0,1],[3,2,1,0]]
So! The first digit of the list tells us "how many 6s to add". Additionally, each list in the above grouping exhibits similar structure: the lists in the first group have three groups of 2! elements each containing 1, 2, and 3 as their second element; the lists in each of those groups have 2 groups of 1! elements each starting with each of the remaining digits; and each of those groups have 1 group(s) of 0! elements each starting with the only remaining digit. So the second digit tells us "how many 2s to add", the third digit tells us "how many 1s to add", and the last digit tells us "how many 1s to add" (but always tells us to add 0 1s).
If you have implemented a change-of-base function on numbers before (e.g. decimal to hexadecimal or similar) you may recognize this pattern. Indeed, we can treat this as a change-of-base operation with a sliding base: instead of 1s, 10s, 100s, 1000s, and so on columns, we have 0!s, 1!s, 2!s, 3!s, 4!s, and so on columns. Let's write it! For efficiency, we'll compute all the sliding bases up front with a factorials function.
import Data.List
factorials n = scanr (*) 1 [n,n-1..1]
deleteAt i xs = case splitAt i xs of (b, e) -> b ++ drop 1 e
permutIndices permutation original
= go (factorials (length permutation - 1))
permutation
original
where
go _ [] [] = [0]
go _ [] _ = []
go _ _ [] = []
go (base:bases) (x:xs) ys = do
i <- elemIndices x ys
remainder <- go bases xs (deleteAt i ys)
return (i*base + remainder)
go [] _ _ = error "the impossible happened!"
Here's a sample sanity-check:
> map (`permutIndices` [1..4]) (permut [1..4])
[[0],[1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17],[18],[19],[20],[21],[22],[23]]
And, for fun, here you can see it handling ambiguity correctly:
> permutIndices "acbba" "aabbc"
[21,23,45,47]
> map (permut "aabbc"!!) it
["acbba","acbba","acbba","acbba"]
...and showing that it's significantly more efficient than elemIndices:
> :set +s
> elemIndices "zyxwvutsr" (permut "rstuvwxyz")
[362879]
(2.65 secs, 1288004848 bytes)
> permutIndices "zyxwvutsr" "rstuvwxyz"
[362879]
(0.00 secs, 1030304 bytes)
Less than one thousandth the allocation/time. Seems like a win!

So, to be clear, you are looking for a way to find the position of a given permution-
"bac"
in a list of given permutions-
["abc", "acb", "bac", ....]
This problem actually has nothing inherently to do with permutions themselves. You want to find the location of an element in an array.
As #raymonad mentioned in his comment, stackoverflow.com/questions/20641772/ deals with this question, and the answer there was, use elemIndex.
elemIndex thePermutionToFind $ permut theString
Keep in mind, that if letters repeat, a value might appear more than once in the output, if your "permut" function doesn't remove these duplicates (ie- Note that permut "aa" = ["aa", "aa"]).... In this case the elemIndices function will come in useful.
If elemIndex returns Nothing, it means the string you supplied wasn't a permution.
(this isn't the most effecient algorithm for large strings, since the number of permutions grows like the factorial of the size of the string.... Which is worse than exponential.)

Related

Algorithm to find list given dot product and another list

I need to write a function findL that takes a list L1 of integers and a desired dot product n, and returns a list L2 of nonnegative integers such that L1 · L2 = n. (By "dot product" I mean the sum of the pairwise products; for example, [1,2] · [3,4] = 1·3+2·4 = 11.)
So, for example, findL(11, [1,2]) might return SOME [3,4]. If there's no possible list, I return NONE.
I'm using a functional language. (Specifically Standard ML, but the exact language isn't so important, I'm just trying to think of an FP algorithm.) What I have written so far:
Let's say I have findL(n, L1):
if L1 = [], I return NONE.
if L1 = [x] (list of length 1)
if (n >= 0 and x > 0 and n mod x = 0), return SOME [n div x]
else return NONE
If L1 has length greater than 1, I recurse on findL (n, L[1:]). If that returns a list L2, I return [1] concatenated to L2. If the recursive call returns NONE, I did another recursive call on findL (0, L[1:]) and prepended [n div x] to the result if it wasn't NONE. This works on many inputs but are failing on others.
I need to change part 3, but I'm not sure if I have the right idea. I would appreciate any tips!
Unless you need to say that empty lists in the input are always bad (even n = 0 with the list []), I'd recommend returning something different for an empty list based on whether you've reached 0 at the end (everything has been subtracted away) or not, then recurse when receiving any nonempty list rather than special-casing a one-element list.
As far as step three, you need to test every possible positive integer multiple of the first element of your input list until they exceed n, not just the first and last. The first non-None value you get is good enough, so you just prepend the multiplier (not the multiple) to the return list. If everything gives you Nones, you return None.
I don't know SML, but here's how I'd do it in Haskell:
import Data.Maybe (isJust, listToMaybe)
-- Find linear combinations of positive integers
solve :: Integer -> [Integer] -> Maybe [Integer]
-- If we've made it to the end with zero left, good!
solve 0 [] = Just []
-- Otherwise, this way isn't the way to go.
solve _ [] = Nothing
-- If one of the elements of the input list is zero, just multiply that element by one.
solve n (0:xs) = case solve n xs of
Nothing -> Nothing
Just ys -> Just (1:ys)
solve n (x:xs) = listToMaybe -- take first solution if it exists
. map (\ (m, Just ys) -> m:ys) -- put multiplier at front of list
. filter (isJust . snd) -- remove nonsolutions
. zip [1 ..] -- tuple in the multiplier
. map (\ m -> solve (n - m) xs) -- use each multiple
$ [x, x + x .. n] -- the multiples of x up to n
Here it is solving 11 with [1, 2] and 1 with [1, 2].

Haskell Recursion - finding largest difference between numbers in list

Here's the problem at hand: I need to find the largest difference between adjacent numbers in a list using recursion. Take the following list for example: [1,2,5,6,7,9]. The largest difference between two adjacent numbers is 3 (between 2 and 5).
I know that recursion may not be the best solution, but I'm trying to improve my ability to use recursion in Haskell.
Here's the current code I currently have:
largestDiff (x:y:xs) = if (length (y:xs) > 1) then max((x-y), largestDiff (y:xs)) else 0
Basically - the list will keep getting shorter until it reaches 1 (i.e. no more numbers can be compared, then it returns 0). As 0 passes up the call stack, the max function is then used to implement a 'King of the Hill' type algorithm. Finally - at the end of the call stack, the largest number should be returned.
Trouble is, I'm getting an error in my code that I can't work around:
Occurs check: cannot construct the infinite type:
t1 = (t0, t1) -> (t0, t1)
In the return type of a call of `largestDiff'
Probable cause: `largestDiff' is applied to too few arguments
In the expression: largestDiff (y : xs)
In the first argument of `max', namely
`((x - y), largestDiff (y : xs))'
Anyone have some words of wisdom to share?
Thanks for your time!
EDIT: Thanks everyone for your time - I ended up independently discovering a much simpler way after much trial and error.
largestDiff [] = error "List too small"
largestDiff [x] = error "List too small"
largestDiff [x,y] = abs(x-y)
largestDiff (x:y:xs) = max(abs(x-y)) (largestDiff (y:xs))
Thanks again, all!
So the reason why your code is throwing an error is because
max((x-y), largestDiff (y:xs))
In Haskell, you do not use parentheses around parameters and separate them by commas, the correct syntax is
max (x - y) (largestDiff (y:xs))
The syntax you used is getting parsed as
max ((x - y), largestDiff (y:xs))
Which looks like you're passing a tuple to max!
However, this does not solve the problem. I always got 0 back. Instead, I would recommend breaking up the problem into two functions. You want to calculate the maximum of the difference, so first write a function to calculate the differences and then a function to calculate the maximum of those:
diffs :: Num a => [a] -> [a]
diffs [] = [] -- No elements case
diffs [x] = [] -- One element case
diffs (x:y:xs) = y - x : diffs (y:xs) -- Two or more elements case
largestDiff :: (Ord a, Num a) => [a] -> a
largestDiff xs = maximum $ map abs $ diffs xs
Notice how I've pulled the recursion out into the simplest possible case. We didn't need to calculate the maximum as we traversed the list; it's possible, just more complex. Since Haskell has a handy built-in function for calculating the maximum of a list for us, we can also leverage that. Our recursive function is clean and simple, and it is then combined with maximum to implement the desired largestDiff. As an FYI, diffs is really just a function to compute the derivative of a list of numbers, it can be a very useful function for data processing.
EDIT: Needed Ord constraint on largestDiff and added in map abs before calculating maximum.
Here's my take at it.
First some helpers:
diff a b = abs(a-b)
pick a b = if a > b then a else b
Then the solution:
mdiff :: [Int] -> Int
mdiff [] = 0
mdiff [_] = 0
mdiff (a:b:xs) = pick (diff a b) (mdiff (b:xs))
You have to provide two closing clauses, because the sequence might have either even or odd number of elements.
Another solution to this problem, which circumvents your error, can be obtained
by just transforming lists and folding/reducing them.
import Data.List (foldl')
diffs :: (Num a) => [a] -> [a]
diffs x = zipWith (-) x (drop 1 x)
absMax :: (Ord a, Num a) => [a] -> a
absMax x = foldl' max (fromInteger 0) (map abs x)
Now I admit this is a bit dense for a beginner, so I will explain the above.
The function zipWith transforms two given lists by using a binary function,
which is (-) in this case.
The second list we pass to zipWith is drop 1 x, which is just another way of
describing the tail of a list, but where tail [] results in an error,
drop 1 [] just yields the empty list. So drop 1 is the "safer" choice.
So the first function calculates the adjacent differences.
The name of the second function suggests that it calculates the maximum absolute
value of a given list, which is only partly true, it results in "0" if passed an
empty list.
But how does this happen, reading from right to left, we see that map abs
transforms every list element to its absolute value, which is asserted by
the Num a constraint. Then the foldl'-function traverses the list and
accumulates the maximum of the previous accumulator and the current element of
the list traversal. Moreover I'd like to mention that foldl' is the "strict"
sister/brother of the foldl-function, where the latter is rarely of use,
because it tends to build up a bunch of unevaluated expressions called thunks.
So let's quit all this blah blah and see it in action ;-)
> let a = diffs [1..3] :: [Int]
>>> zipWith (-) [1,2,3] (drop 1 [1,2,3])
<=> zipWith (-) [1,2,3] [2,3]
<=> [1-2,2-3] -- zipWith stops at the end of the SHORTER list
<=> [-1,-1]
> b = absMax a
>>> foldl' max (fromInteger 0) (map abs [-1,-1])
-- fromInteger 0 is in this case is just 0 - interesting stuff only happens
-- for other numerical types
<=> foldl' max 0 (map abs [-1,-1])
<=> foldl' max 0 [1,1]
<=> foldl' max (max 0 1) [1]
<=> foldl' max 1 [1]
<=> foldl' max (max 1 1) []
<=> foldl' max 1 [] -- foldl' _ acc [] returns just the accumulator
<=> 1

Recursion confusion in Haskell again - subsets with an inclusion test

I'm testing a simple program to generate subsets with an inclusion test. For example, given
*Main Data.List> factorsets 7
[([2],2),([2,3],1),([3],1),([5],1),([7],1)]
calling chooseP 3 (factorsets 7), I would like to get (read from right to left, a la cons)
[[([5],1),([3],1),([2],2)]
,[([7],1),([3],1),([2],2)]
,[([7],1),([5],1),([2],2)]
,[([7],1),([5],1),([2,3],1)]
,[([7],1),([5],1),([3],1)]]
But my program is returning an extra [([7],1),([5],1),([3],1)] (and missing a [([7],1),([5],1),([2],2)]):
[[([5],1),([3],1),([2],2)]
,[([7],1),([3],1),([2],2)]
,[([7],1),([5],1),([3],1)]
,[([7],1),([5],1),([2,3],1)]
,[([7],1),([5],1),([3],1)]]
The inclusion test is: members' first part of the tuple must have a null intersection.
Once tested as working, the plan is to sum the internal products of each subset's snds, rather than accumulate them.
Since I've asked a similar question before, I imagine that an extra branch is generated since when the recursion splits at [2,3], the second branch runs over the same possibilities once it passes the skipped section. Any pointers on how to resolve that would be appreciated; and if you'd like to share ideas about how to enumerate and sum such product combinations more efficiently, that would be great, too.
Haskell code:
chooseP k xs = chooseP' xs [] 0 where
chooseP' [] product count = if count == k then [product] else []
chooseP' yys product count
| count == k = [product]
| null yys = []
| otherwise = f ++ g
where (y:ys) = yys
(factorsY,numY) = y
f = let zzs = dropWhile (\(fs,ns) -> not . and . map (null . intersect fs . fst) $ product) yys
in if null zzs
then chooseP' [] product count
else let (z:zs) = zzs in chooseP' zs (z:product) (count + 1)
g = if and . map (null . intersect factorsY . fst) $ product
then chooseP' ys product count
else chooseP' ys [] 0
Your code is complicated enough that I might recommend starting over. Here's how I would proceed.
Write a specification. Let it be as stupidly inefficient as necessary -- for example, the spec I choose below will build all combinations of k elements from the list, then filter out the bad ones. Even the filter will be stupidly slow.
sorted xs = sort xs == xs
unique xs = nub xs == xs
disjoint xs = and $ liftM2 go xs xs where
go x1 x2 = x1 == x2 || null (intersect x1 x2)
-- check that x is valid according to all the validation functions in fs
-- (there are other fun ways to spell this, but this is particularly
-- readable and clearly correct -- just what we want from a spec)
allFuns fs x = all ($x) fs
choosePSpec k = filter good . replicateM k where
good pairs = allFuns [unique, disjoint, sorted] (map fst pairs)
Just to make sure it's right, we can test it at the prompt:
*Main> mapM_ print $ choosePSpec 3 [([2],2),([2,3],1),([3],1),([5],1),([7],1)]
[([2],2),([3],1),([5],1)]
[([2],2),([3],1),([7],1)]
[([2],2),([5],1),([7],1)]
[([2,3],1),([5],1),([7],1)]
[([3],1),([5],1),([7],1)]
Looks good.
Now that we have a spec, we can try to improve the speed one refactoring at a time, always checking that it matches the spec. The first thing I'd want to do is notice that we can ensure uniqueness and sortedness just by sorting the input and picking things "in an increasing way". To do this, we can define a function which chooses subsequences of a given length. It piggy-backs on the tails function, which you can think of as nondeterministically choosing a place to split its input list.
subseq 0 xs = [[]]
subseq n xs = do
x':xt <- tails xs
xs' <- subseq (n-1) xt
return (x':xs')
Here's an example of this function in action:
*Main> subseq 3 [1..4]
[[1,2,3],[1,2,4],[1,3,4],[2,3,4]]
Now we can write a slightly faster chooseP by replacing replicateM with subseq. Recall that we're assuming the inputs are already sorted and unique, though.
choosePSlow k = filter good . subseq k where
good pairs = disjoint $ map fst pairs
We can sanity-check that it's working by running it on the particular input we have from above:
*Main> let i = [([2],2),([2,3],1),([3],1),([5],1),([7],1)]
*Main> choosePSlow 3 i == choosePSpec 3 i
True
Or, better yet, we can stress-test it with QuickCheck. We'll need a tiny bit more code. The condition k < 5 is just because the spec is so hopelessly slow that bigger values of k take forever.
propSlowMatchesSpec :: NonNegative Int -> OrderedList ([Int], Int) -> Property
propSlowMatchesSpec (NonNegative k) (Ordered xs)
= k < 5 && unique (map fst xs)
==> choosePSlow k xs == choosePSpec k xs
*Main> quickCheck propSlowMatchesSpec
+++ OK, passed 100 tests.
There are several more opportunities to make things faster. For instance, the disjoint test could be sped up using choose 2 instead of liftM2; or we might be able to ensure disjointness during element selection and prune the search even earlier; etc. How you want to improve it from here I leave to you -- but the basic technique (start with stupid and slow, then make it smarter, testing as you go) should be helpful to you.

Haskell: brute force and maximum subarray problem

I am trying to solve the maximum sub array problem with a brute force approach i.e generating all the possible subarrays combinations. I got something that works but it's not satisfying at all because it produces way too many duplicated subarrays.
Does anyone knows a smart way to generate all the subarrays (in [[]] form) with a minimal number of duplicated elements ?
By the way, I'm new to Haskell. Here's my current solution:
import qualified Data.List as L
maximumSubList::[Integer]->[Integer]
maximumSubList x = head $ L.sortBy (\a b -> compare (sum b) (sum a)) $ L.nub $ slice x
where
-- slice will return all the "sub lists"
slice [] = []
slice x = (slice $ tail x) ++ (sliceLeft x) ++ (sliceRight x)
-- Create sub lists by removing "left" part
-- ex [1,2,3] -> [[1,2,3],[2,3],[3]]
sliceRight [] = []
sliceRight x = x : (sliceRight $ tail x)
-- Create sub lists by removing "right" part
-- ex [1,2,3] -> [[1,2,3],[1,2],[1]]
sliceLeft [] = []
sliceLeft x = x : (sliceLeft $ init x)
There are many useful functions for operating on lists in the standard Data.List module.
import Data.List
slice :: [a] -> [[a]]
slice = filter (not . null) . concatMap tails . inits
dave4420's answer is how to do what you want to do using smart, concise Haskell. I'm no Haskell expert, but I occasionally play around with it and find solving a problem like this to be an interesting distraction, and enjoy figuring out exactly why it works. Hopefully the following explanation will be helpful :)
The key property of dave4420's answer (which your answer doesn't have) is that the pair (startPos, endPos) is unique for each subarray it generates. Now, observe that two subarrays are distinct if either their startPos or endPos is different. Applying inits to the original array returns a list of subarrays that each have unique startPos, and the same endPos (equal to the number of elements in the array). Applying tails to each of these subarrays in turn produces another list of subarrays -- one list of subarrays is output per input subarray. Notice that tails does not disturb the distinctness between input subarrays because the subarrays output by invoking tails on a single input subarray all retain the same startPos: that is, if you have two subarrays with distinct startPoses, and put both of them through tails, each of the subarrays produced from the first input subarray will be distinct from each of the subarrays produced from the second one.
Additionally, each of the subarrays produced by the invocation of tails on a single subarray are distinct because, although they all share the same startPos, they all have distinct endPoses. Therefore all subarrays produced by (concatMap tails) . inits are distinct. It only remains to note that no subarray is missed out: for any subarray starting at position i and ending at position j, that subarray must appear as the j-i+1th list produced by applying tails to the i+1th list produced by inits. So in conclusion, every possible subarray appears exactly once!

In Haskell, how can you sort a list of infinite lists of strings?

So basically, if I have a (finite or infinite) list of (finite or infinite) lists of strings, is it possible to sort the list by length first and then by lexicographic order, excluding duplicates? A sample input/output would be:
Input:
[["a", "b",...], ["a", "aa", "aaa"], ["b", "bb", "bbb",...], ...]
Output:
["a", "b", "aa", "bb", "aaa", "bbb", ...]
I know that the input list is not a valid haskell expression but suppose that there is an input like that. I tried using merge algorithm but it tends to hang on the inputs that I give it. Can somebody explain and show a decent sorting function that can do this? If there isn't any function like that, can you explain why?
In case somebody didn't understand what I meant by the sorting order, I meant that shortest length strings are sorted first AND if one or more strings are of same length then they are sorted using < operator.
Thanks!
Ultimately, you can't sort an infinite list, because items at the tail of the list could percolate all the way to the front of the result, so you can't finish sorting an infinite list until you've seen the last item, but your list is infinite, so you'll never get there.
The only way that you could even try to sort an infinite list would require constraints on the inhabitants of the list. If the values of the list items comes from a well-founded set and the contents of the list are unique then you could at least make some progress in returning elements the initial elements of the list. For instance if the list was of distinct natural numbers, you could return the first 0 you see, then the first 1, etc. but you couldn't make any headway in the result until you saw 2, no matter how far down the list you went. Ultimately, if you ever skipped an element in the set because it wasn't present in the source, you'd cease to produce new output elements until you had the entire input in hand.
You can do the same thing with strings, because they are well founded, but that is only even remotely viable if you plan on returning all possible strings.
In short, if you need this, you're going about solving the problem you have in the wrong way. This isn't a tractable path to any solution you will want to use.
As yairchu noted, merging a finite number of sorted infinite lists works fine though.
In general it is impossible to sort infinite lists. Because the smallest item could be at infinite position and we must find it before we output it.
Merging infinite sorted lists is possible.
In general, merging an infinite list of sorted lists is impossible. For same reason that sorting them is.
Merging an infinite list of sorted lists, which are sorted by heads (forall i j. i < j => head (lists !! i) <= head (lists !! j)), is possible.
So I'm guessing that what you really want is to merge a sorted infinite list of sorted lists. It's even a task that makes some sense. There's even some existing code that uses it, implemented for monadic lists there - kinda ugly syntax-wise etc. So here's a version for plain lists:
mergeOnSortedHeads :: Ord b => (a -> b) -> [[a]] -> [a]
mergeOnSortedHeads _ [] = []
mergeOnSortedHeads f ([]:xs) = mergeOnSortedHeads f xs
mergeOnSortedHeads f ((x:xs):ys) =
x : mergeOnSortedHeads f (bury xs ys)
where
bury [] ks = ks
bury js [] = [js]
bury js ([]:ks) = bury js ks
bury jj#(j:js) ll#(kk#(k:ks):ls)
| f j <= f k = jj : ll
| otherwise = kk : bury jj ls
ghci> take 20 $ mergeOnSortedHeads id $ [[0,4,6], [2,3,9], [3,5..], [8]] ++ map repeat [12..]
[0,2,3,3,4,5,6,7,8,9,9,11,12,12,12,12,12,12,12,12]
btw: what do you need this for?
Well, I'm going to ignore your request for sorting infinite data.
To sort by length of the sublists, then by lexicographic order, we can do this pretty easily. Oh, and you want duplicates removed.
We'll start with a sample:
> s
[["a","b"],["a","aa","aaa"],["b","bb","bbb"]]
And then build the program incrementally.
First sorting on length (using Data.Ord.comparing to build the sort body):
> sortBy (comparing length) s
[["a","b"],["a","aa","aaa"],["b","bb","bbb"]]
Ok. That looks reasonable. So let's just concat, and sortBy length then alpha:
> sortBy (comparing length) . nub . concat $ s
["a","b","aa","bb","aaa","bbb"]
If your input is sorted. Otherwise you'll need a sligtly different body to sortBy.
Thanks to everyone for their inputs and sorry for the late reply. Turns out I was just approaching the problem in a wrong way. I was trying to do what Yairchu showed but I was using the built in function length to do the merging but length doesnt work on an infinite list for obvious reasons. Anyways, I solved my problem by sorting as I created the list on the go, not on the end result. I wonder what other languages offer infinite lists? Such a weird but useful concept.
Here is an algorithm that let you online sort:
it is not efficient, but it is lazy enough to let you goto different sort generations, even if you sort infinite lists. It is a nice gimmick, but not very usable. For example sorting the infinite list [10,9..]:
*Main> take 10 $ sortingStream [10,9..] !! 0
[9,8,7,6,5,4,3,2,1,0]
*Main> take 10 $ sortingStream [10,9..] !! 1
[8,7,6,5,4,3,2,1,0,-1]
*Main> take 10 $ sortingStream [10,9..] !! 2
[7,6,5,4,3,2,1,0,-1,-2]
*Main> take 10 $ sortingStream [10,9..] !! 3
[6,5,4,3,2,1,0,-1,-2,-3]
*Main> take 10 $ sortingStream [10,9..] !! 4
[5,4,3,2,1,0,-1,-2,-3,-4]
*Main> take 10 $ sortingStream [10,9..] !! 1000
[-991,-992,-993,-994,-995,-996,-997,-998,-999,-1000]
As you can see the sorting improves each generation. The code:
produce :: ([a] -> [a]) -> [a] -> [[a]]
produce f xs = f xs : (produce f (f xs))
sortingStream :: (Ord a) => [a] -> [[a]]
sortingStream = produce ss
ss :: (Ord a) => [a] -> [a]
ss [] = []
ss [x] = [x]
ss [x,y] | x <= y = [x,y]
| otherwise = [y,x]
ss (x:y:xs) | x <= y = x: (ss (y:xs))
| otherwise = y:(ss (x:xs))
Whether it can be done depends very much on the nature of your input data. If you can 'stop looking' for lists of a certain length when you've seen a longer one and there are only a finite number of lists of each length, then you can go through the lengths in ascending order, sort those and concatenate the results. Something like this should work:
listsUptoLength n xss = takeWhile (\xs -> length xs <= n) $ xss
listsUptoLength' n [] = []
listsUptoLength' n (xss:xsss) = case listsUptoLength n xss of
[] -> []
xss' -> xss' : listsUptoLength' n xsss
listsOfLength n xsss = concatMap (\xss -> (filter (\xs -> length xs == n) xss)) (listsUptoLength' n xsss)
sortInfinite xsss = concatMap (\n -> sort . nub $ (listsOfLength n xsss)) [0..]
f xs y = [xs ++ replicate n y | n <- [1..]]
test = [ map (\x -> [x]) ['a'..'e'], f "" 'a', f "" 'b', f "b" 'a', f "a" 'b' ] ++ [f start 'c' | start <- f "" 'a']
(The names could probably be chosen to be more illuminating :)
I'm guessing you're working with regular expressions, so I think something like this could be made to work; I repeat the request for more background!

Resources