Can someone please help explain how this merge sort algorithm works? - sorting

I started learning F# yesterday and am struggling a bit with all the new functional programming.
I am trying to understand this implementation of merge sort which uses a split function. The split function is defined as:
let rec split = function
    | [] -> ([], [])
    | [a] -> ([a], [])
    | a :: b :: cs -> let (r, s) = split cs
                      in (a :: r, b :: s)
The way I understand it, we take a list and return a tuple of lists, having split the list into two halves. If we pattern match on the empty list we return a tuple of empty lists, if we match on a list with one element we return a tuple with the list and an empty list, but the recursive case is eluding me.
a :: b :: cs means a prepended to b prepended to cs, right? So this is the case where the list has at least 3 elements? If so, we return two values, r and s, but I have not seen this "in" keyword used before. As far as I can tell, we prepend a, the first element, to r and b, the second element, to s, and then split on the remainder of the list, cs. But this does not appear to split the list in half to me.
Could anybody please help explain how the recursive case works? Thanks a lot.

You can ignore the in keyword in this case, so you can read the last case as just:
| a :: b :: cs ->
    let (r, s) = split cs
    (a :: r, b :: s)
Note that this will match any list of length 2 or greater, not 3 as you originally thought. When the list has exactly two elements, cs will be the empty list.
So what's going on in this case is:
If the list has at least 2 elements:
  Name the first element a
  Name the second element b
  Name the rest of the list cs (even if it's empty)
  Split cs recursively, which gives us two new lists, r and s
  Create two more new lists:
    One with a on the front of r
    The other with b on the front of s
  Return the two new lists
You can see this in operation if you call the function like this:
split [] |> printfn "%A" // [],[]
split [1] |> printfn "%A" // [1],[]
split [1; 2] |> printfn "%A" // [1],[2]
split [1; 2; 3] |> printfn "%A" // [1; 3],[2]
split [1; 2; 3; 4] |> printfn "%A" // [1; 3],[2; 4]
split [1; 2; 3; 4; 5] |> printfn "%A" // [1; 3; 5],[2; 4]
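To see how split fits into a full merge sort, here is a minimal sketch. Only split comes from the question; the names merge and mergeSort, and their exact shape, are assumptions made for illustration, since the rest of the original implementation is not shown.

```fsharp
// split is the function from the question, written with indentation instead of `in`.
let rec split = function
    | [] -> ([], [])
    | [a] -> ([a], [])
    | a :: b :: cs ->
        let (r, s) = split cs
        (a :: r, b :: s)

// merge is an assumed helper (not in the question): combines two sorted lists.
let rec merge xs ys =
    match xs, ys with
    | [], rest | rest, [] -> rest
    | x :: xt, y :: yt ->
        if x <= y then x :: merge xt ys
        else y :: merge xs yt

// mergeSort is an assumed name: split, sort each part, merge the results.
let rec mergeSort = function
    | [] -> []
    | [x] -> [x]
    | xs ->
        let (left, right) = split xs
        merge (mergeSort left) (mergeSort right)

printfn "%A" (mergeSort [5; 2; 4; 1; 3]) // [1; 2; 3; 4; 5]
```

Note that split does not need to cut the list in the middle. Merge sort only needs two parts of roughly equal size, and dealing elements out alternately gives exactly that.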
Update: What exactly does in do?
The in keyword is just a way to put a let-binding inside an expression. So, for example, we could write let x = 5 in x + x, which is an expression that has the value 10. This syntax is inherited from OCaml, and is still useful when you want to write the entire expression on one line.
In modern F#, we can use whitespace/indentation instead, by replacing the in keyword with a newline. So nowadays, we would usually write this expression as follows:
let x = 5
x + x
The two forms are semantically equivalent. More details here.

cs is [] when there are only two items in the list. When there are 3 or more items in the list then it recurses where cs is the list without the first two items. When there is only one item it returns [a],[] and when the list is empty it returns [],[] .

Related

Not able to sort a list properly in ocaml

So I'm trying to sort this list of integers so that all the even numbers are in the front and the odds are all in the back. I have my program now which works for the most part but it keeps reversing the order of my odds numbers which I don't want it to do. E.g. given the input [1;2;3;4;5;6] I would like to get [2;4;6;1;3;5], but I'm getting [2;4;6;5;3;1] Any help is greatly appreciated!
let rec evens (xl:int list) (odd:int list) : int list =
  match xl with
  | [] -> []
  | h::t ->
    if h mod 2 = 0
    then (h)::evens t odd
    else
      evens t odd@[(h)]
The main part of your current code parses like this:
if h mod 2 = 0 then
  h :: (evens t odd)
else
  (evens t odd) @ [h]
It says this: if the next number h is even, sort out the rest of the list, then add h to the front. If the next number h is odd, sort out the rest of the list, then add h to the end. So it follows that the odd numbers will be reversed at the end.
It's worth noting that your parameter named odd is always passed along unchanged, and hence will always be an empty list (or whatever you pass as the second parameter of evens).
When I first looked at your code, I assumed you were planning to accumulate the odd numbers in the odd parameter. If you want to do that, you need to make two changes. First you need to rewrite like this:
if h mod 2 = 0 then
  h :: evens t odd
else
  evens t (odd @ [h])
The precedence rules of OCaml require the parentheses if you want to add h to the odd parameter. Your current code adds h to the returned result of evens (as above).
This rewrite will accumulate the odd numbers, in order, in the odd parameter.
Then you need to actually use the odd parameter at the end of the recursion. I.e., you need to use it when xl is empty.
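Putting both changes together might look like the sketch below. It is written in F# rather than OCaml (so `%` stands in for `mod`), and returning odd in the empty case is one reading of "use the odd parameter at the end of the recursion", not code taken from the answer.

```fsharp
// F# transliteration of the OCaml function discussed above; `%` is F#'s `mod`.
let rec evens (xl: int list) (odd: int list) : int list =
    match xl with
    | [] -> odd                      // end of input: emit the accumulated odds
    | h :: t ->
        if h % 2 = 0 then h :: evens t odd
        else evens t (odd @ [h])     // parentheses make `@` apply to the accumulator

printfn "%A" (evens [1; 2; 3; 4; 5; 6] []) // [2; 4; 6; 1; 3; 5]
```

Each `odd @ [h]` copies the accumulator, so this sketch favours clarity over speed.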
The standard library has a neat solution to your problem.
List.partition (fun x -> x mod 2 = 0) [1;2;3;4;5;6]
- : int list * int list = ([2; 4; 6], [1; 3; 5])
The partition function splits your list into a tuple of two lists:
The list of elements that validate a predicate;
The list of elements that don't.
All you have to do is combine those lists together.
let even_first l =
  let evens, odds = List.partition (fun x -> x mod 2 = 0) l in
  evens @ odds
If you want to make it more generic, let the predicate be an argument:
let order_by_predicate ~f l =
  let valid, invalid = List.partition f l in
  valid @ invalid

F# List optimisation

From an unordered list of int, I want to find the smallest difference between two elements. I have code that works but is way too slow. Can anyone suggest some changes to improve the performance? Please explain why you made the change and what the performance gain will be.
let allInt = [ 5; 8; 9 ]
let N = List.length allInt
let sortedList = allInt |> List.sort
let differenceList = [ for a in 0 .. N-2 do yield sortedList.Item(a + 1) - sortedList.Item(a) ]
printfn "%i" (List.min differenceList) // prints 1 (because 9-8 is the smallest difference)
I think I'm doing too much list creation or iteration but I don't know how to write it differently in F#...yet.
Edit: I'm testing this code on lists with 100 000 items or more.
Edit 2: I believe that if I can calculate the difference and find the min in one go it should improve the perf a lot, but I don't know how to do that. Any ideas?
Thanks in advance
List.Item runs in O(n) time and is probably the main performance bottleneck in your code. Evaluating differenceList accesses sortedList by index twice per iteration, so it performs roughly 2(N-1) lookups that each cost up to O(N), which is O(N^2) overall, where N is the number of elements in sortedList. For long lists, this will eventually perform badly.
What I would do is eliminate the calls to Item and instead use the List.pairwise operation:
let data =
    [ let rnd = System.Random()
      for i in 1..100000 do yield rnd.Next() ]
#time
let result =
    data
    |> List.sort
    |> List.pairwise // convert list from [a;b;c;...] to [(a,b); (b,c); ...]
    |> List.map (fun (a,b) -> a - b |> abs) // Calculates the absolute difference
    |> List.min
#time
The #time directive lets me measure execution time in F# Interactive, and the output I get when running this code is:
--> Timing now on
Real: 00:00:00.029, CPU: 00:00:00.031, GC gen0: 1, gen1: 1, gen2: 0
val result : int = 0
--> Timing now off
F#'s built-in list type is implemented as a linked list, which means accessing an element by index has to walk the list all the way to that index each time. In your case you have two index accesses repeated N-1 times, getting slower and slower with each iteration, because as the index grows each access has to go through a longer part of the list.
The first way out of this would be to use an array instead of a list, which is a trivial change but grants you faster index access.
(*
   [| and |] let you define an array literal,
   alternatively use List.toArray allInt
*)
let allInt = [| 5; 8; 9 |]
let N = allInt.Length
let sortedArray = allInt |> Array.sort
// the array is ascending, so subtract the earlier element from the later one
let differenceList = [ for a in 0 .. N-2 do yield sortedArray.[a + 1] - sortedArray.[a] ]
Another approach might be pairing up the neighbours in the list, subtracting them and then finding a min.
let differenceList =
    sortedList
    |> List.pairwise
    |> List.map (fun (x,y) -> y - x) // sortedList is ascending, so y - x is non-negative
List.pairwise takes a list of elements and returns a list of the neighbouring pairs. E.g. in your example List.pairwise [ 5; 8; 9 ] = [ (5, 8); (8, 9) ], so that you can easily work with the pairs in the next step, the subtraction mapping.
This way is better, but these functions from the List module each take a list as input and produce a new list as output, so the data is traversed 3 times (once for pairwise, once for map, once for min at the end). To avoid this, you can use functions from the Seq module, which work with .NET's IEnumerable<'a> interface and allow lazy evaluation, usually resulting in fewer passes.
Fortunately in this case Seq defines alternatives for all the functions we use here, so the next step is trivial:
let differenceSeq =
    sortedList
    |> Seq.pairwise
    |> Seq.map (fun (x,y) -> y - x) // sortedList is ascending, so y - x is non-negative
let minDiff = Seq.min differenceSeq
This should need only one enumeration of the list (excluding the sorting phase of course).
But I cannot guarantee which approach will be fastest. My bet would be on simply using an array instead of the list, but to find out, you will have to try it and measure for yourself, on your data and your hardware. The BenchmarkDotNet library can help you with that.
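As a rough illustration of the array approach combined with the "difference and min in one go" idea from Edit 2, the sketch below sorts into an array and tracks the minimum in a single loop. The function name minAdjacentDifference is made up for this example, and returning Int32.MaxValue for inputs with fewer than two elements is an arbitrary assumption; whether this is actually faster on your data is something only a measurement can tell.

```fsharp
// Hypothetical helper: smallest gap between any two elements of the input.
let minAdjacentDifference (values: int list) : int =
    // Array.sort returns a new sorted array; index access on it is O(1).
    let sorted = values |> List.toArray |> Array.sort
    // Assumption: with fewer than two elements there is no gap, so MaxValue is returned.
    let mutable best = System.Int32.MaxValue
    for i in 0 .. sorted.Length - 2 do
        let d = sorted.[i + 1] - sorted.[i] // non-negative because the array is ascending
        if d < best then best <- d
    best

printfn "%d" (minAdjacentDifference [9; 5; 8]) // 1
```

The sketch ignores integer overflow, which could occur if the values span nearly the whole int range.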
The rest of your question is adequately covered by the other answers, so I won't duplicate them. But nobody has yet addressed the question you asked in your Edit 2. To answer that question, if you're doing a calculation and then want the minimum result of that calculation, you want List.minBy. One clue that you want List.minBy is when you find yourself doing a map followed by a min operation (as both the other answers are doing): that's a classic sign that you want minBy, which does that in one operation instead of two.
There's one gotcha to watch out for when using List.minBy: It returns the original value, not the result of the calculation. I.e., if you do ints |> List.pairwise |> List.minBy (fun (a,b) -> abs (a - b)), then what List.minBy is going to return is a pair of items, not the difference. It's written that way because if it gives you the original value but you really wanted the result, you can always recalculate the result; but if it gave you the result and you really wanted the original value, you might not be able to get it. (Was that difference of 1 the difference between 8 and 9, or between 4 and 5?)
So in your case, you could do:
let allInt = [5; 8; 9]
let minPair =
    allInt
    |> List.sort // pairwise only compares neighbours, so an unordered list must be sorted first
    |> List.pairwise
    |> List.minBy (fun (x,y) -> abs (x - y))
let a, b = minPair
let minDifference = abs (a - b)
printfn "The difference between %d and %d was %d" a b minDifference
The List.minBy operation also exists on sequences, so if your list is large enough that you want to avoid creating an intermediate list of pairs, then use Seq.pairwise and Seq.minBy instead:
let allInt = [5; 8; 9]
let minPair =
    allInt
    |> List.sort // pairwise only compares neighbours, so an unordered list must be sorted first
    |> Seq.pairwise
    |> Seq.minBy (fun (x,y) -> abs (x - y))
let a, b = minPair
let minDifference = abs (a - b)
printfn "The difference between %d and %d was %d" a b minDifference
EDIT: Yes, I see that you've got a list of 100,000 items. So you definitely want the Seq version of this. The F# seq type is just IEnumerable, so if you're used to C#, think of the Seq functions as LINQ expressions and you'll have the right idea.
P.S. One thing to note here: see how I'm doing let a, b = minPair? That's called destructuring assignment, and it's really useful. I could also have done this:
let a, b =
    allInt
    |> Seq.pairwise
    |> Seq.minBy (fun (x,y) -> abs (x - y))
and it would have given me the same result. Seq.minBy returns a tuple of two integers, and the let a, b = (tuple of two integers) expression takes that tuple, matches it against the pattern a, b, and thus assigns a to have the value of that tuple's first item, and b to have the value of that tuple's second item. Notice how I used the phrase "matches it against the pattern": this is the exact same thing as when you use a match expression. Explaining match expressions would make this answer too long, so I'll just point you to an excellent reference on them if you haven't already read it:
https://fsharpforfunandprofit.com/posts/match-expression/
Here is my solution:
let minPair xs =
    let foo (x, y) = abs (x - y)
    xs
    |> List.allPairs xs
    |> List.filter (fun (x, y) -> x <> y)
    |> List.minBy foo
    |> foo

Adding 2 Int Lists Together F#

I am working on homework and the problem is where we get 2 int lists of the same size, and then add the numbers together. Example as follows.
vecadd [1;2;3] [4;5;6];; would return [5;7;9]
I am new to this and I need to keep my code pretty simple so I can learn from it. I have this so far. (Not working)
let rec vecadd L K =
    if L <> [] then vecadd ((L.Head+K.Head)::L) K else [];;
I essentially want to just replace the first list (L) with the added numbers. Also I have tried to code it a different way using the match cases.
let rec vecadd L K =
    match L with
    |[]->[]
    |h::[]-> L
    |h::t -> vecadd ((h+K.Head)::[]) K
Neither of them are working and I would appreciate any help I can get.
First, your idea about modifying the first list instead of returning a new one is misguided. Mutation (i.e. modifying data in place) is the number one reason for bugs today (used to be goto, but that's been banned for a long time now). Making every operation produce a new datum rather than modify existing ones is much, much safer. And in some cases it may be even more performant, quite counterintuitively (see below).
Second, the way you're trying to do it, you're not doing what you think you're doing. The double-colon doesn't mean "modify the first item". It means "attach an item in front". For example:
let a = [1; 2; 3]
let b = 4 :: a // b = [4; 1; 2; 3]
let c = 5 :: b // c = [5; 4; 1; 2; 3]
That's how lists are actually built: you start with an empty list and prepend items to it. The [1; 2; 3] syntax you're using is just syntactic sugar for that. That is, [1; 2; 3] === 1::2::3::[].
So how do I modify a list, you ask? The answer is, you don't! F# lists are immutable data structures. Once you've created a list, you can't modify it.
This immutability allows for an interesting optimization. Take another look at the example I posted above, the one with three lists a, b, and c. How many cells of memory do you think these three lists occupy? The first list has 3 items, second - 4, and third - 5, so the total amount of memory taken must be 12, right? Wrong! The total amount of memory taken up by these three lists is actually just 5 cells. This is because list b is not a block of memory of length 4, but rather just the number 4 paired with a pointer to the list a. The number 4 is called "head" of the list, and the pointer is called its "tail". Similarly, the list c consists of one number 5 (its "head") and a pointer to list b, which is its "tail".
If lists were not immutable, one couldn't organize them like this: what if somebody modifies my tail? Lists would have to be copied every time (google "defensive copy").
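The sharing described above can be observed directly. The sketch below reuses the lists a, b and c from the earlier example and checks reference identity with LanguagePrimitives.PhysicalEquality; the names bSharesA and cSharesB are just labels chosen for this illustration.

```fsharp
let a = [1; 2; 3]
let b = 4 :: a
let c = 5 :: b

// PhysicalEquality compares object references rather than list contents.
let bSharesA = LanguagePrimitives.PhysicalEquality (List.tail b) a
let cSharesB = LanguagePrimitives.PhysicalEquality (List.tail c) b

printfn "%b %b" bSharesA cSharesB // true true
```

Both checks come out true because the tail of b is the very same object as a, not a copy of it.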
So the only way to work with lists is to return a new one. What you're trying to do can be described like this: if the input lists are empty, the result is an empty list; otherwise, the result is the sum of tails prepended with the sum of heads. You can write this down in F# almost verbatim:
let rec add a b =
    match a, b with
    | [], [] -> [] // sum of two empty lists is an empty list
    | a::atail, b::btail -> (a + b) :: (add atail btail) // sum of non-empty lists is sum of their tails prepended with sum of their heads
Note that this program is incomplete: it doesn't specify what the result should be when one input is empty and the other is not. The compiler will generate a warning about this. I'll leave the solution as an exercise for the reader.
You can map over both lists together with List.map2 (see the docs)
It goes over both lists pairwise and you can give it a function (the first parameter of List.map2) to apply to every pair of elements from the lists. And that generates the new list.
let a = [1;2;3]
let b = [4;5;6]
let vecadd = List.map2 (+)
let result = vecadd a b
printfn "%A" result
And if you want to do more of the work yourself, something like this?
let a = [1;2;3]
let b = [4;5;6]
let vecadd l1 l2 =
    let rec step l1 l2 acc =
        match l1, l2 with
        | [], [] -> acc
        | [], _ | _, [] -> failwithf "one list is bigger than the other"
        | h1 :: t1, h2 :: t2 -> step t1 t2 (List.append acc [(h1 + h2)])
    step l1 l2 []
let result = vecadd a b
printfn "%A" result
The step function is a recursive function that takes two lists and an accumulator to carry the result.
In the last match statement it does three things
Sum the head of both lists
Add the result to the accumulator
Recursively call itself with the new accumulator and the tails of the lists
The first match returns the accumulator when the remaining lists are empty
The second match returns an error when one of the lists is longer than the other.
The accumulator is returned as the result when the remaining lists are empty.
The call step l1 l2 [] kicks it off with the two supplied lists and an empty accumulator.
I have done this for crossing two lists (multiply items with same index together):
let items = [1I..50_000I]
let another = [1I..50_000I]
let rec cross a b =
    let rec cross_internal = function
        | r, [], [] -> r
        | r, [], t -> r @ t
        | r, t, [] -> r @ t
        | r, head::t1, head2::t2 -> cross_internal(r @ [head*head2], t1, t2)
    cross_internal([], a, b)
let result = cross items another
result |> printf "%A,"
Note: not really performant. There are list object creations at each step which is horrible. Ideally the inner function cross_internal must create a mutable list and keep updating it.
Note2: my ranges were larger initially and used bigint (hence the I suffix in 50_000I), but I then reduced the sample code above to just 50,000 elements.
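One way to avoid the repeated appends without a mutable list is to prepend each product to an accumulator and reverse it once at the end. This is a different technique from the mutable list suggested in the note above, not the author's code. The sketch uses int instead of bigint for brevity, and the inner name loop is an assumption.

```fsharp
// Multiply items at the same index; leftover items of the longer list are kept as-is,
// mirroring the behaviour of the version above.
let cross (a: int list) (b: int list) : int list =
    // acc holds the products in reverse order, so each step is an O(1) prepend
    let rec loop acc xs ys =
        match xs, ys with
        | x :: xt, y :: yt -> loop (x * y :: acc) xt yt
        | [], rest | rest, [] -> List.rev acc @ rest
    loop [] a b

printfn "%A" (cross [1; 2; 3] [4; 5; 6]) // [4; 10; 18]
printfn "%A" (cross [1; 2; 3] [4; 5])    // [4; 10; 3]
```

The single List.rev at the end is linear, so the whole function stays linear in the length of the input.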

F# insert/remove item from list

How should I go about removing a given element from a list? As an example, say I have list ['A'; 'B'; 'C'; 'D'; 'E'] and want to remove the element at index 2 to produce the list ['A'; 'B'; 'D'; 'E']? I've already written the following code which accomplishes the task, but it seems rather inefficient to traverse the start of the list when I already know the index.
let remove lst i =
    let rec remove lst lst' =
        match lst with
        | [] -> lst'
        | h::t -> if List.length lst = i then
                      lst' @ t
                  else
                      remove t (lst' @ [h])
    remove lst []
let myList = ['A'; 'B'; 'C'; 'D'; 'E']
let newList = remove myList 2
Alternatively, how should I insert an element at a given position? My code is similar to the above approach and most likely inefficient as well.
let insert lst i x =
    let rec insert lst lst' =
        match lst with
        | [] -> lst'
        | h::t -> if List.length lst = i then
                      lst' @ [x] @ lst
                  else
                      insert t (lst' @ [h])
    insert lst []
let myList = ['A'; 'B'; 'D'; 'E']
let newList = insert myList 2 'C'
Removing an element at a specified index isn't a typical operation in functional programming, which is why it seems difficult to find the right implementation of these operations. In functional programming, you'll usually process the list element by element using recursion, or implement the processing in terms of higher-level declarative operations. Perhaps if you could clarify what your motivation is, we can give a better answer.
Anyway, to implement the two operations you wanted, you can use existing higher-order functions (that traverse the entire list a few times, because there is really no good way of doing this without traversing the list):
let removeAt index input =
    input
    // Associate each element with a boolean flag specifying whether
    // we want to keep the element in the resulting list
    |> List.mapi (fun i el -> (i <> index, el))
    // Remove elements for which the flag is 'false' and drop the flags
    |> List.filter fst |> List.map snd
To insert element to the specified index, you could write:
let insertAt index newEl input =
    // For each element, we generate a list of elements that should
    // replace the original one - either singleton list or two elements
    // for the specified index
    input |> List.mapi (fun i el -> if i = index then [newEl; el] else [el])
          |> List.concat
However, as noted earlier, unless you have a very good reason for using these functions, you should probably consider describing your goals more broadly and use an alternative (more functional) solution.
Seems the most idiomatic (not tail recursive):
let rec insert v i l =
    match i, l with
    | 0, xs -> v::xs
    | i, x::xs -> x::insert v (i - 1) xs
    | i, [] -> failwith "index out of range"
let rec remove i l =
    match i, l with
    | 0, x::xs -> xs
    | i, x::xs -> x::remove (i - 1) xs
    | i, [] -> failwith "index out of range"
"it seems rather inefficient to traverse the start of the list when I already know the index."
F# lists are singly-linked lists, so you don't have indexed access to them. But most of the time, you don't need it. The majority of indexed operations on arrays are iteration from front to end, which is exactly the most common operation on immutable lists. It's also pretty common to add items to the end of an array, which isn't really the most efficient operation on singly-linked lists, but most of the time you can use the "cons and reverse" idiom or use an immutable queue to get the same result.
Arrays and ResizeArrays are really the best choice if you need indexed access, but they aren't immutable. A handful of immutable data structures like VLists allow you to create list-like data structures supporting O(1) cons and O(log n) indexed random access if you really need it.
If you need random access in a list, consider using System.Collections.Generic.List<T> or System.Collections.Generic.LinkedList<T> instead of a F# list.
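For comparison, here is a small sketch of the same remove and insert operations on a ResizeArray, which is F#'s alias for System.Collections.Generic.List<T>. The variable names are made up for this example. Note that RemoveAt and Insert mutate the collection in place, and that they still shift the elements after the index, so only the lookup itself is constant time.

```fsharp
// ResizeArray is the F# name for System.Collections.Generic.List<T>.
let items = ResizeArray<char>([ 'A'; 'B'; 'C'; 'D'; 'E' ])

items.RemoveAt(2)                    // removes 'C' in place
let afterRemove = List.ofSeq items   // snapshot as an immutable F# list

items.Insert(2, 'C')                 // puts 'C' back at index 2
let afterInsert = List.ofSeq items

printfn "%A" afterRemove // ['A'; 'B'; 'D'; 'E']
printfn "%A" afterInsert // ['A'; 'B'; 'C'; 'D'; 'E']
```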
I know this has been here for a while now, but I just had to do something like this recently and came up with this solution. Maybe it isn't the most efficient, but it is surely the shortest idiomatic code I found for it:
let removeAt index list =
    list |> List.indexed |> List.filter (fun (i, _) -> i <> index) |> List.map snd
List.indexed returns a list of tuples, each holding an index and the item at that position. After that, all it takes is to filter out the one tuple matching the given index and then extract the items.
I hope this helps someone who's not extremely concerned with efficiency and wants brief code
The following includes a bit of error checking as well
let removeAt index = function
    | xs when index >= 0 && index < List.length xs ->
        xs
        |> List.splitAt index
        |> fun (x,y) -> y |> List.skip 1 |> List.append x
    | ys -> ys
Let's go through it and explain the code:
// use the function syntax
let removeAt index = function
    // then check if index is within the size of the list
    | xs when index >= 0 && index < List.length xs ->
        xs
        // split the list before the given index
        // splitAt : int -> 'T list -> ('T list * 'T list)
        // this gives us a tuple of 2 sublists
        |> List.splitAt index
        // define a function which
        // first drops the first element of the snd tuple element
        // then appends the remainder of that sublist to the fst tuple element
        // and returns all of it
        |> fun (x,y) -> y |> List.skip 1 |> List.append x
    // index out of range so return the original list
    | ys -> ys
And if you don't like the idea of simply returning the original list on indexOutOfRange - wrap the return into something
let removeAt index = function
    | xs when index >= 0 && index < List.length xs ->
        xs
        |> List.splitAt index
        |> fun (x,y) -> y |> List.skip 1 |> List.append x
        |> Some
    | ys -> None
I think this should be quite faster than Juliet's or Tomas' proposal but most certainly Mauricio's comment is hitting it home. If one needs to remove or delete items other data structures seem a better fit.

In Haskell, how can you sort a list of infinite lists of strings?

So basically, if I have a (finite or infinite) list of (finite or infinite) lists of strings, is it possible to sort the list by length first and then by lexicographic order, excluding duplicates? A sample input/output would be:
Input:
[["a", "b",...], ["a", "aa", "aaa"], ["b", "bb", "bbb",...], ...]
Output:
["a", "b", "aa", "bb", "aaa", "bbb", ...]
I know that the input list is not a valid haskell expression but suppose that there is an input like that. I tried using merge algorithm but it tends to hang on the inputs that I give it. Can somebody explain and show a decent sorting function that can do this? If there isn't any function like that, can you explain why?
In case somebody didn't understand what I meant by the sorting order, I meant that shortest length strings are sorted first AND if one or more strings are of same length then they are sorted using < operator.
Thanks!
Ultimately, you can't sort an infinite list, because items at the tail of the list could percolate all the way to the front of the result, so you can't finish sorting an infinite list until you've seen the last item, but your list is infinite, so you'll never get there.
The only way that you could even try to sort an infinite list would require constraints on the inhabitants of the list. If the values of the list items come from a well-founded set and the contents of the list are unique, then you could at least make some progress in returning the initial elements of the list. For instance, if the list was of distinct natural numbers, you could return the first 0 you see, then the first 1, etc., but you couldn't make any headway in the result until you saw 2, no matter how far down the list you went. Ultimately, if you ever skipped an element in the set because it wasn't present in the source, you'd cease to produce new output elements until you had the entire input in hand.
You can do the same thing with strings, because they are well founded, but that is only even remotely viable if you plan on returning all possible strings.
In short, if you need this, you're going about solving the problem you have in the wrong way. This isn't a tractable path to any solution you will want to use.
As yairchu noted, merging a finite number of sorted infinite lists works fine though.
In general it is impossible to sort infinite lists. Because the smallest item could be at infinite position and we must find it before we output it.
Merging infinite sorted lists is possible.
In general, merging an infinite list of sorted lists is impossible. For same reason that sorting them is.
Merging an infinite list of sorted lists, which are sorted by heads (forall i j. i < j => head (lists !! i) <= head (lists !! j)), is possible.
So I'm guessing that what you really want is to merge a sorted infinite list of sorted lists. It's even a task that makes some sense. There's even some existing code that uses it, implemented for monadic lists there - kinda ugly syntax-wise etc. So here's a version for plain lists:
mergeOnSortedHeads :: Ord b => (a -> b) -> [[a]] -> [a]
mergeOnSortedHeads _ [] = []
mergeOnSortedHeads f ([]:xs) = mergeOnSortedHeads f xs
mergeOnSortedHeads f ((x:xs):ys) =
  x : mergeOnSortedHeads f (bury xs ys)
  where
    bury [] ks = ks
    bury js [] = [js]
    bury js ([]:ks) = bury js ks
    bury jj@(j:js) ll@(kk@(k:ks):ls)
      | f j <= f k = jj : ll
      | otherwise = kk : bury jj ls
ghci> take 20 $ mergeOnSortedHeads id $ [[0,4,6], [2,3,9], [3,5..], [8]] ++ map repeat [12..]
[0,2,3,3,4,5,6,7,8,9,9,11,12,12,12,12,12,12,12,12]
btw: what do you need this for?
Well, I'm going to ignore your request for sorting infinite data.
To sort by length of the sublists, then by lexicographic order, we can do this pretty easily. Oh, and you want duplicates removed.
We'll start with a sample:
> s
[["a","b"],["a","aa","aaa"],["b","bb","bbb"]]
And then build the program incrementally.
First sorting on length (using Data.Ord.comparing to build the sort body):
> sortBy (comparing length) s
[["a","b"],["a","aa","aaa"],["b","bb","bbb"]]
Ok. That looks reasonable. So let's just concat, and sortBy length then alpha:
> sortBy (comparing length) . nub . concat $ s
["a","b","aa","bb","aaa","bbb"]
This works if your input is sorted. Otherwise you'll need a slightly different body for sortBy.
Thanks to everyone for their input, and sorry for the late reply. It turns out I was just approaching the problem in the wrong way. I was trying to do what Yairchu showed, but I was using the built-in function length to do the merging, and length doesn't work on an infinite list for obvious reasons. Anyway, I solved my problem by sorting as I created the list on the go, not on the end result. I wonder what other languages offer infinite lists? Such a weird but useful concept.
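On the question of other languages: F# sequences (seq, i.e. IEnumerable) are lazy and can be infinite. As a hedged sketch, merging two sorted, possibly infinite sequences, the two-list case of the merge discussed in this thread, could look like the following. The function name mergeSorted and the restriction to int are assumptions made for this example.

```fsharp
// Lazily merge two ascending sequences; either one may be infinite.
let mergeSorted (xs: seq<int>) (ys: seq<int>) : seq<int> =
    seq {
        use ex = xs.GetEnumerator()
        use ey = ys.GetEnumerator()
        // ref cells track whether each enumerator currently has an element
        let hasX = ref (ex.MoveNext())
        let hasY = ref (ey.MoveNext())
        while hasX.Value || hasY.Value do
            if hasX.Value && (not hasY.Value || ex.Current <= ey.Current) then
                yield ex.Current
                hasX.Value <- ex.MoveNext()
            else
                yield ey.Current
                hasY.Value <- ey.MoveNext()
    }

let evens = Seq.initInfinite (fun i -> 2 * i)
let odds = Seq.initInfinite (fun i -> 2 * i + 1)
printfn "%A" (mergeSorted evens odds |> Seq.take 6 |> List.ofSeq) // [0; 1; 2; 3; 4; 5]
```

Nothing is computed until elements are requested, which is why Seq.take 6 terminates even though both inputs are infinite.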
Here is an algorithm that lets you sort online.
It is not efficient, but it is lazy enough to let you go to different sort generations, even if you sort infinite lists. It is a nice gimmick, but not very usable. For example, sorting the infinite list [10,9..]:
*Main> take 10 $ sortingStream [10,9..] !! 0
[9,8,7,6,5,4,3,2,1,0]
*Main> take 10 $ sortingStream [10,9..] !! 1
[8,7,6,5,4,3,2,1,0,-1]
*Main> take 10 $ sortingStream [10,9..] !! 2
[7,6,5,4,3,2,1,0,-1,-2]
*Main> take 10 $ sortingStream [10,9..] !! 3
[6,5,4,3,2,1,0,-1,-2,-3]
*Main> take 10 $ sortingStream [10,9..] !! 4
[5,4,3,2,1,0,-1,-2,-3,-4]
*Main> take 10 $ sortingStream [10,9..] !! 1000
[-991,-992,-993,-994,-995,-996,-997,-998,-999,-1000]
As you can see the sorting improves each generation. The code:
produce :: ([a] -> [a]) -> [a] -> [[a]]
produce f xs = f xs : (produce f (f xs))
sortingStream :: (Ord a) => [a] -> [[a]]
sortingStream = produce ss
ss :: (Ord a) => [a] -> [a]
ss [] = []
ss [x] = [x]
ss [x,y] | x <= y = [x,y]
         | otherwise = [y,x]
ss (x:y:xs) | x <= y = x: (ss (y:xs))
            | otherwise = y:(ss (x:xs))
Whether it can be done depends very much on the nature of your input data. If you can 'stop looking' for lists of a certain length when you've seen a longer one and there are only a finite number of lists of each length, then you can go through the lengths in ascending order, sort those and concatenate the results. Something like this should work:
listsUptoLength n xss = takeWhile (\xs -> length xs <= n) $ xss
listsUptoLength' n [] = []
listsUptoLength' n (xss:xsss) = case listsUptoLength n xss of
  [] -> []
  xss' -> xss' : listsUptoLength' n xsss
listsOfLength n xsss = concatMap (\xss -> (filter (\xs -> length xs == n) xss)) (listsUptoLength' n xsss)
sortInfinite xsss = concatMap (\n -> sort . nub $ (listsOfLength n xsss)) [0..]
f xs y = [xs ++ replicate n y | n <- [1..]]
test = [ map (\x -> [x]) ['a'..'e'], f "" 'a', f "" 'b', f "b" 'a', f "a" 'b' ] ++ [f start 'c' | start <- f "" 'a']
(The names could probably be chosen to be more illuminating :)
I'm guessing you're working with regular expressions, so I think something like this could be made to work; I repeat the request for more background!

Resources