Slower execution when using an infinite list - performance

I'm beginning to try and get my head round haskell performance, and what makes things fast and slow, and I'm a little confused by this.
I have two implementations of a function that generates a list of primes up to a certain value. The first is straight off the Haskell wiki:
primesTo :: (Ord a, Num a, Enum a) => a -> [a]
primesTo m = eratos [2..m] where
eratos [] = []
eratos (p:xs) = p : eratos (xs `minus` [p*p, p*p+p..m])
The second is the same, but using an infinite list internally:
primes2 :: (Ord a, Num a, Enum a) => a -> [a]
primes2 m = takeWhile (<= m) (eratos [2..]) where
eratos [] = []
eratos (p:xs) = p : eratos (xs `minus` [p*p, p*p+p..])
In both cases, the minus function is:
minus :: (Ord a) => [a] -> [a] -> [a]
minus (x:xs) (y:ys) = case (compare x y) of
LT -> x : minus xs (y:ys)
EQ -> minus xs ys
GT -> minus (x:xs) ys
minus xs _ = xs
The latter implementation is significantly (~100x) slower than the former, and I don't get why. I would have thought that haskell's lazy evalutation would make them fairly equivalent under the hood.
This is obviously a reduced test case for the purposes of the question - in real life the optimisation would be no problem (although I don't understand why it is needed), but to me a function that just generates an infinite list of primes is more generically useful than a finite list, but appears slower to work with.

Looks like to me that there's a big difference between
(xs `minus` [p*p, p*p+p..m]) -- primesTo
(xs `minus` [p*p, p*p+p..]) -- primes2
The function minus steps through lists pairwise and terminates when one list reaches the end. In the first minus expression above, this occurs in no more than (m-p*p)/p steps when the latter list is exhausted. In the second one, it will always take steps on the order of length xs.
So your infinite lists have disabled at least one meaningful optimization.

One difference is that in the second case you need to generate one extra prime. You need to generate the first prime greater than m before takeWhile knows its time to stop.
Additionally, the [..m] bounds on both the list to filter and the lists of multiples help reduce the number of calculations. Whenever one of these lists gets empty minus immediately returns via its secons clause while in the infinite case the minus gets stuck in the first case. You can explore this a bit better if you also test the cases where only one of the lists is infinite:
--this is also slow
primes3 :: (Ord a, Num a, Enum a) => a -> [a]
primes3 m = takeWhile (<= m) (eratos [2..m]) where
eratos [] = []
eratos (p:xs) = p : eratos (xs `minus` [p*p, p*p+p..])
--this fast
primes4 :: (Ord a, Num a, Enum a) => a -> [a]
primes4 m = takeWhile (<= m) (eratos [2..]) where
eratos [] = []
eratos (p:xs) = p : eratos (xs `minus` [p*p, p*p+p..m])

Related

Is there such a thing as maximumWith?

Specifically I'm searching for a function 'maximumWith',
maximumWith :: (Foldable f, Ord b) => (a -> b) -> f a -> a
Which behaves in the following way:
maximumWith length [[1, 2], [0, 1, 3]] == [0, 1, 3]
maximumWith null [[(+), (*)], []] == []
maximumWith (const True) x == head x
My use case is picking the longest word in a list.
For this I'd like something akin to maximumWith length.
I'd thought such a thing existed, since sortWith etc. exist.
Let me collect all the notes in the comments together...
Let's look at sort. There are 4 functions in the family:
sortBy is the actual implementation.
sort = sortBy compare uses Ord overloading.
sortWith = sortBy . comparing is the analogue of your desired maximumWith. However, this function has an issue. The ranking of an element is given by applying the given mapping function to it. However, the ranking is not memoized, so if an element needs to compared multiple times, the ranking will be recomputed. You can only use it guilt-free if the ranking function is very cheap. Such functions include selectors (e.g. fst), and newtype constructors. YMMV on simple arithmetic and data constructors. Between this inefficiency, the simplicity of the definition, and its location in GHC.Exts, it's easy to deduce that it's not used that often.
sortOn fixes the inefficiency by decorating each element with its image under the ranking function in a pair, sorting by the ranks, and then erasing them.
The first two have analogues in maximum: maximumBy and maximum. sortWith has no analogy; you may as well write out maximumBy (comparing _) every time. There is also no maximumOn, even though such a thing would be more efficient. The easiest way to define a maximumOn is probably just to copy sortOn:
maximumOn :: (Functor f, Foldable f, Ord r) => (a -> r) -> f a -> a
maximumOn rank = snd . maximumBy (comparing fst) . fmap annotate
where annotate e = let r = rank e in r `seq` (r, e)
There's a bit of interesting code in maximumBy that keeps this from optimizing properly on lists. It also works to use
maximumOn :: (Foldable f, Ord r) => (a -> r) -> f a -> a
maximumOn rank = snd . fromJust . foldl' max' Nothing
where max' Nothing x = let r = rank x in r `seq` Just (r, x)
max' old#(Just (ro, xo)) xn = let rn = rank xn
in case ro `compare` rn of
LT -> Just (rn, xo)
_ -> old
These pragmas may be useful:
{-# SPECIALIZE maximumOn :: Ord r => (a -> r) -> [a] -> a #-}
{-# SPECIALIZE maximumOn :: (a -> Int) -> [a] -> a #-}
HTNW has explained how to do what you asked, but I figured I should mention that for the specific application you mentioned, there's a way that's more efficient in certain cases (assuming the words are represented by Strings). Suppose you want
longest :: [[a]] -> [a]
If you ask for maximumOn length [replicate (10^9) (), []], then you'll end up calculating the length of a very long list unnecessarily. There are several ways to work around this problem, but here's how I'd do it:
data MS a = MS
{ _longest :: [a]
, _longest_suffix :: [a]
, _longest_bound :: !Int }
We will ensure that longest is the first of the longest strings seen thus far, and that longest_bound + length longest_suffix = length longest.
step :: MS a -> [a] -> MS a
step (MS longest longest_suffix longest_bound) xs =
go longest_bound longest_suffix xs'
where
-- the new list is not longer
go n suffo [] = MS longest suffo n
-- the new list is longer
go n [] suffn = MS xs suffn n
-- don't know yet
go !n (_ : suffo) (_ : suffn) =
go (n + 1) suffo suffn
xs' = drop longest_bound xs
longest :: [[a]] -> [a]
longest = _longest . foldl' step (MS [] [] 0)
Now if the second to longest list has q elements, we'll walk at most q conses into each list. This is the best possible complexity. Of course, it's only significantly better than the maximumOn solution when the longest list is much longer than the second to longest.

Haskell: Sort an almost-sorted array

I've been learning Haskell in my spare time working through LYAH. Would like to improve upon my Haskell (/ Functional programming) skills by solving some problems from the imperative world. One of the problems from EPI is to print an "almost sorted array", in a sorted fashion where it is guaranteed that no element in the array is more than k away from its correct position. The input is a stream of elements and the requirement is to do this in O(n log k) time complexity and O(k) space complexity.
I've attempted to re-implement the imperative solution in Haskell as follows:
import qualified Data.Heap as Heap
-- print the k-sorted list in a sorted fashion
ksorted :: (Ord a, Show a) => [a] -> Int -> IO ()
ksorted [] _ = return ()
ksorted xs k = do
heap <- ksorted' xs Heap.empty
mapM_ print $ (Heap.toAscList heap) -- print the remaining elements in the heap.
where
ksorted' :: (Ord a, Show a) => [a] -> Heap.MinHeap a -> IO (Heap.MinHeap a)
ksorted' [] h = return h
ksorted' (x:xs) h = do let (m, h') = getMinAndBuildHeap h x in
(printMin m >> ksorted' xs h')
printMin :: (Show a) => Maybe a -> IO ()
printMin m = case m of
Nothing -> return ()
(Just item) -> print item
getMinAndBuildHeap :: (Ord a, Show a) => Heap.MinHeap a -> a -> (Maybe a, Heap.MinHeap a)
getMinAndBuildHeap h item= if (Heap.size h) > k
then ((Heap.viewHead h), (Heap.insert item (Heap.drop 1 h)))
else (Nothing, (Heap.insert item h))
I would like to know a better way of solving this in Haskell. Any inputs would be appreciated.
[Edit 1]: The input is stream, but for now I assumed a list instead (with only a forward iterator/ input iterator in some sense.)
[Edit 2]: added Data.Heap import to the code.
Thanks.
I think the main improvement is to separate the production of the sorted list from the printing of the sorted list. So:
import Data.Heap (MinHeap)
import qualified Data.Heap as Heap
ksort :: Ord a => Int -> [a] -> [a]
ksort k xs = go (Heap.fromList b) e where
(b, e) = splitAt (k-1) xs
go :: Ord a => MinHeap a -> [a] -> [a]
go heap [] = Heap.toAscList heap
go heap (x:xs) = x' : go heap' xs where
Just (x', heap') = Heap.view (Heap.insert x heap)
printKSorted :: (Ord a, Show a) => Int -> [a] -> IO ()
printKSorted k xs = mapM_ print (ksort k xs)
If I were feeling extra-special-fancy, I might try to turn go into a foldr or perhaps a mapAccumR, but in this case I think the explicit recursion is relatively readable, too.

Generalizing a combinatoric function?

I've been solving a few combinatoric problems on Haskell, so I wrote down those 2 functions:
permutations :: (Eq a) => [a] -> [[a]]
permutations [] = [[]]
permutations list = do
x <- list
xs <- permutations (filter (/= x) list)
return (x : xs)
combinations :: (Eq a, Ord a) => Int -> [a] -> [[a]]
combinations 0 _ = [[]]
combinations n list = do
x <- list
xs <- combinations (n-1) (filter (> x) list)
return (x : xs)
Which works as follows:
*Main> permutations [1,2,3]
[[1,2,3],[1,3,2],[2,1,3],[2,3,1],[3,1,2],[3,2,1]]
*Main> combinations 2 [1,2,3,4]
[[1,2],[1,3],[1,4],[2,3],[2,4],[3,4]]
Those were uncomfortably similar, so I had to abstract it. I wrote the following abstraction:
combinatoric next [] = [[]]
combinatoric next list = do
x <- list
xs <- combinatoric next (next x list)
return (x : xs)
Which receives a function that controls how to filter the elements of the list. It can be used to easily define permutations:
permutations :: (Eq a) => [a] -> [[a]]
permutations = combinatoric (\ x ls -> filter (/= x) ls)
But I couldn't define combinations this way since it carries an state (n). I could extend the combinatoric with an additional state argument, but that'd become too clunky and I remember such approach was not necessary in a somewhat similar situation. Thus, I wonder: is it possible to define combinations using combinatorics? If not, what is a better abstraction of combinatorics which successfully subsumes both functions?
This isn't a direct answer to your question (sorry), but I don't think your code is correct. The Eq and Ord constraints tipped me off - they shouldn't be necessary - so I wrote a couple of QuickCheck properties.
prop_numberOfPermutations xs = length (permutations xs) === factorial (length xs)
where _ = (xs :: [Int]) -- force xs to be instantiated to [Int]
prop_numberOfCombinations (Positive n) (NonEmpty xs) = n <= length xs ==>
length (combinations n xs) === choose (length xs) n
where _ = (xs :: [Int])
factorial :: Int -> Int
factorial x = foldr (*) 1 [1..x]
choose :: Int -> Int -> Int
choose n 0 = 1
choose 0 r = 0
choose n r = choose (n-1) (r-1) * n `div` r
The first property checks that the number of permutations of a list of length n is n!. The second checks that the number of r-combinations of a list of length n is C(n, r). Both of these properties fail when I run them against your definitions:
ghci> quickCheck prop_numberOfPermutations
*** Failed! Falsifiable (after 5 tests and 4 shrinks):
[0,0,0]
3 /= 6
ghci> quickCheck prop_numberOfCombinations
*** Failed! Falsifiable (after 4 tests and 1 shrink):
Positive {getPositive = 2}
NonEmpty {getNonEmpty = [3,3]}
0 /= 1
It looks like your functions fail when the input list contains duplicate elements. Writing an abstraction for an incorrect implementation isn't a good idea - don't try and run before you can walk! You might find it helpful to read the source code for the standard library's definition of permutations, which does not have an Eq constraint.
First let's improve the original functions. You assume that all elements are distinct wrt their equality for permutations, and that they're distinct and have an ordering for combinations. These constraints aren't necessary and as described in the other answer, the code can produce wrong results. Following the robustness principle, let's accept just unconstrained lists. For this we'll need a helper function that produces all possible splits of a list:
split :: [a] -> [([a], a, [a])]
split = loop []
where
loop _ [] = []
loop rs (x:xs) = (rs, x, xs) : loop (x:rs) xs
Note that the implementation causes prefixes returned by this function to be reversed, but it's nothing we require.
This allows us to write generic permutations and combinations.
permutations :: [a] -> [[a]]
permutations [] = [[]]
permutations list = do
(pre, x, post) <- split list
-- reversing 'pre' isn't really necessary, but makes the output
-- order natural
xs <- permutations (reverse pre ++ post)
return (x : xs)
combinations :: Int -> [a] -> [[a]]
combinations 0 _ = [[]]
combinations n list = do
(_, x, post) <- split list
xs <- combinations (n-1) post
return (x : xs)
Now what they have in common:
At each step they pick an element to output,
update the list of elements to pick from and
stop after some condition is met.
The last point is a bit problematic, as for permutations we end once the list to choose from is empty, while for combinations we have a counter. This is probably the reason why it was difficult to generalize. We can work around this by realizing that for permutations the number of steps is equal to the length of the input list, so we can express the condition in the number of repetitions.
For such problems it's often very convenient to express them using StateT s [] monad, where s is the state we're working with. In our case it'll be the list of elements to choose from. The core of our combinatorial functions can be then expressed with StateT [a] [] a: pick an element from the state and update the state for the next step. Since the stateful computations all happen in the [] monad, we automatically branch all possibilities. With that, we can define a generic function:
import Control.Monad.State
combinatoric :: Int -> StateT [a] [] b -> [a] -> [[b]]
combinatoric n k = evalStateT $ replicateM n k
And then define permutations and combinations by specifying the appropriate number of repetitions and what's the core StateT [a] [] a function:
permutations' :: [a] -> [[a]]
permutations' xs = combinatoric (length xs) f xs
where
f = StateT $ map (\(pre, x, post) -> (x, reverse pre ++ post)) . split
combinations' :: Int -> [a] -> [[a]]
combinations' n xs = combinatoric n f xs
where
f = StateT $ map (\(_, x, post) -> (x, post)) . split

How to find a second largest number in a list in Haskell?

The question is like this:
Write a function that takes an integer list and returns its length and the
second largest integer in the list.
I can solve this with two functions but is there any solution that use only one function to do it?
Any help is appreciated!
Thanks!
Don't make it complicated.
f xs = (sort xs !! 1, length xs)
If you choose to make it complicated, make it complicated in the right way.
Edited to use #ThomasM.DuBuisson's suggestion
You can solve this the same way that you could finding the max: using a fold. Max can be pretty trivially implemented with
mymaximum :: Ord a => [a] -> a
mymaximum xs = foldl searcher (head xs) xs
where
searcher :: Ord a => a -> a -> a
searcher a b
| a > b = a
| otherwise = b
So we can implement it similarly by just keeping up with the two largest values (notice that we have to "seed" the fold with a tuple as well):
nextmaximum :: Ord a => [a] -> a
nextmaximum xs = fst $ foldl searcher (h, h) xs
where
h = head xs
searcher :: (Ord a) => (a, a) -> a -> (a, a)
searcher (s, f) x = (min f (max s x), max f x)
You can just compose individual functions together. This is neither efficient nor robust, but it's sure easy to write:
f xs = maximum . filter (< maximum xs) $ xs
head . (!!1) . group . sortBy (flip compare) $ [1,1,2,3,4,4,4,5,5,6,6,6,6]
5

Finding unique (as in only occurring once) element haskell

I need a function which takes a list and return unique element if it exists or [] if it doesn't. If many unique elements exists it should return the first one (without wasting time to find others).
Additionally I know that all elements in the list come from (small and known) set A.
For example this function does the job for Ints:
unique :: Ord a => [a] -> [a]
unique li = first $ filter ((==1).length) ((group.sort) li)
where first [] = []
first (x:xs) = x
ghci> unique [3,5,6,8,3,9,3,5,6,9,3,5,6,9,1,5,6,8,9,5,6,8,9]
ghci> [1]
This is however not good enough because it involves sorting (n log n) while it could be done in linear time (because A is small).
Additionally it requires the type of list elements to be Ord while all which should be needed is Eq. It would also be nice if amount of comparisons was as small as possible (ie if we traverse a list and encounter element el twice we don't test subsequent elements for equality with el)
This is why for example this: Counting unique elements in a list doesn't solve the problem - all answers involve either sorting or traversing the whole list to find count of all elements.
The question is: how to do it correctly and efficiently in Haskell ?
Okay, linear time, from a finite domain. The running time will be O((m + d) log d), where m is the size of the list and d is the size of the domain, which is linear when d is fixed. My plan is to use the elements of the set as the keys of a trie, with the counts as values, then look through the trie for elements with count 1.
import qualified Data.IntTrie as IntTrie
import Data.List (foldl')
import Control.Applicative
Count each of the elements. This traverses the list once, builds a trie with the results (O(m log d)), then returns a function which looks up the result in the trie (with running time O(log d)).
counts :: (Enum a) => [a] -> (a -> Int)
counts xs = IntTrie.apply (foldl' insert (pure 0) xs) . fromEnum
where
insert t x = IntTrie.modify' (fromEnum x) (+1) t
We use the Enum constraint to convert values of type a to integers in order to index them in the trie. An Enum instance is part of the witness of your assumption that a is a small, finite set (Bounded would be the other part, but see below).
And then look for ones that are unique.
uniques :: (Eq a, Enum a) => [a] -> [a] -> [a]
uniques dom xs = filter (\x -> cts x == 1) dom
where
cts = counts xs
This function takes as its first parameter an enumeration of the entire domain. We could have required a Bounded a constraint and used [minBound..maxBound] instead, which is semantically appealing to me since finite is essentially Enum+Bounded, but quite inflexible since now the domain needs to be known at compile time. So I would choose this slightly uglier but more flexible variant.
uniques traverses the domain once (lazily, so head . uniques dom will only traverse as far as it needs to to find the first unique element -- not in the list, but in dom), for each element running the lookup function which we have established is O(log d), so the filter takes O(d log d), and building the table of counts takes O(m log d). So uniques runs in O((m + d) log d), which is linear when d is fixed. It will take at least Ω(m log d) to get any information from it, because it has to traverse the whole list to build the table (you have to get all the way to the end of the list to see if an element was repeated, so you can't do better than this).
There really isn't any way to do this efficiently with just Eq. You'd need to use some much less efficient way to build the groups of equal elements, and you can't know that only one of a particular element exists without scanning the whole list.
Also, note that to avoid useless comparisons you'd need a way of checking to see if an element has been encountered before, and the only way to do that would be to have a list of elements known to have multiple occurrences, and the only way to check if the current element is in that list is... to compare it for equality with each.
If you want this to work faster than O(something really horrible) you need that Ord constraint.
Ok, based on the clarifications in comments, here's a quick and dirty example of what I think you're looking for:
unique [] _ _ = Nothing
unique _ [] [] = Nothing
unique _ (r:_) [] = Just r
unique candidates results (x:xs)
| x `notElem` candidates = unique candidates results xs
| x `elem` results = unique (delete x candidates) (delete x results) xs
| otherwise = unique candidates (x:results) xs
The first argument is a list of candidates, which should initially be all possible elements. The second argument is the list of possible results, which should initially be empty. The third argument is the list to examine.
If it runs out of candidates, or reaches the end of the list with no results, it returns Nothing. If it reaches the end of the list with results, it returns the one at the front of the result list.
Otherwise, it examines the next input element: If it's not a candidate, it ignores it and continues. If it's in the result list we've seen it twice, so remove it from the result and candidate lists and continue. Otherwise, add it to the results and continue.
Unfortunately, this still has to scan the entire list for even a single result, since that's the only way to be sure it's actually unique.
First off, if your function is intended to return at most one element, you should almost certainly use Maybe a instead of [a] to return your result.
Second, at minimum, you have no choice but to traverse the entire list: you can't tell for sure if any given element is actually unique until you've looked at all the others.
If your elements are not Ordered, but can only be tested for Equality, you really have no better option than something like:
firstUnique (x:xs)
| elem x xs = firstUnique (filter (/= x) xs)
| otherwise = Just x
firstUnique [] = Nothing
Note that you don't need to filter out the duplicated elements if you don't want to -- the worst case is quadratic either way.
Edit:
The above misses the possibility of early exit due to the above-mentioned small/known set of possible elements. However, note that the worst case will still require traversing the entire list: all that is necessary is for at least one of these possible elements to be missing from the list...
However, an implementation that provides an early out in case of set exhaustion:
firstUnique = f [] [<small/known set of possible elements>] where
f [] [] _ = Nothing -- early out
f uniques noshows (x:xs)
| elem x uniques = f (delete x uniques) noshows xs
| elem x noshows = f (x:uniques) (delete x noshows) xs
| otherwise = f uniques noshows xs
f [] _ [] = Nothing
f (u:_) _ [] = Just u
Note that if your list has elements which shouldn't be there (because they aren't in the small/known set), they will be pointedly ignored by the above code...
As others have said, without any additional constraints, you can't do this in less than quadratic time, because without knowing something about the elements, you can't keep them in some reasonable data structure.
If we are able to compare elements, an obvious O(n log n) solution to compute the count of elements first and then find the first one with count equal to 1:
import Data.List (foldl', find)
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Maybe (fromMaybe)
count :: (Ord a) => Map a Int -> a -> Int
count m x = fromMaybe 0 $ Map.lookup x m
add :: (Ord a) => Map a Int -> a -> Map a Int
add m x = Map.insertWith (+) x 1 m
uniq :: (Ord a) => [a] -> Maybe a
uniq xs = find (\x -> count cs x == 1) xs
where
cs = foldl' add Map.empty xs
Note that the log n factor comes from the fact that we need to operate on a Map of size n. If the list has only k unique elements then the size of our map will be at most k, so the overall complexity will be just O(n log k).
However, we can do even better - we can use a hash table instead of a map to get an O(n) solution. For this we'll need the ST monad to perform mutable operations on the hash map, and our elements will have to be Hashable. The solution is basically the same as before, just a little bit more complex due to working within the ST monad:
import Control.Monad
import Control.Monad.ST
import Data.Hashable
import qualified Data.HashTable.ST.Basic as HT
import Data.Maybe (fromMaybe)
count :: (Eq a, Hashable a) => HT.HashTable s a Int -> a -> ST s Int
count ht x = liftM (fromMaybe 0) (HT.lookup ht x)
add :: (Eq a, Hashable a) => HT.HashTable s a Int -> a -> ST s ()
add ht x = count ht x >>= HT.insert ht x . (+ 1)
uniq :: (Eq a, Hashable a) => [a] -> Maybe a
uniq xs = runST $ do
-- Count all elements into a hash table:
ht <- HT.newSized (length xs)
forM_ xs (add ht)
-- Find the first one with count 1
first (\x -> liftM (== 1) (count ht x)) xs
-- Monadic variant of find which exists once an element is found.
first :: (Monad m) => (a -> m Bool) -> [a] -> m (Maybe a)
first p = f
where
f [] = return Nothing
f (x:xs') = do
b <- p x
if b then return (Just x)
else f xs'
Notes:
If you know that there will be only a small number of distinct elements in the list, you could use HT.new instead of HT.newSized (length xs). This will save you some memory and one pass over xs but in the case of many distinct elements the hash table will be have to resized several times.
Here is a version that does the trick:
unique :: Eq a => [a] -> [a]
unique = select . collect []
where
collect acc [] = acc
collect acc (x : xs) = collect (insert x acc) xs
insert x [] = [[x]]
insert x (ys#(y : _) : yss)
| x == y = (x : ys) : yss
| otherwise = ys : insert x yss
select [] = []
select ([x] : _) = [x]
select ((_ : _) : xss) = select xss
So, first we traverse the input list (collect) while maintaining a list of buckets of equal elements that we update with insert. Then we simply select the first element that appears in a singleton bucket (select).
The bad news is that this takes quadratic time: for every visited element in collect we need to go over the list of buckets. I am afraid that is the price you will have to pay for only being able to constrain the element type to be in Eq.
Something like this look pretty good.
unique = fst . foldl' (\(a, b) c -> if (c `elem` b)
then (a, b)
else if (c `elem` a)
then (delete c a, c:b)
else (c:a, b)) ([],[])
The first element of the resulted tuple of the fold, contain what you are expecting, a list containing unique element. The second element of the tuple is the memory of the process remembered if an element has already been discarded or not.
About space performance.
As your problem is design, all the element of the list should be traversed at least one time, before a result can be display. And the internal algorithm must keep trace of discarded value in addition to the good one, but discarded value will appears only one time. Then in the worst case the required amount of memory is equal to the size of the inputted list. This sound goods as you said that expected input are small.
About time performance.
As the expected input are small and not sorted by default, trying to sort the list into the algorithm is useless, or before to apply it is useless. In fact statically we can almost said, that the extra operation to place an element at its ordered place (into the sub list a and b of the tuple (a,b)) will cost the same amount of time than to check if this element appear into the list or not.
Below a nicer and more explicit version of the foldl' one.
import Data.List (foldl', delete, elem)
unique :: Eq a => [a] -> [a]
unique = fst . foldl' algorithm ([], [])
where
algorithm (result0, memory0) current =
if (current `elem` memory0)
then (result0, memory0)
else if (current`elem` result0)
then (delete current result0, memory)
else (result, memory0)
where
result = current : result0
memory = current : memory0
Into the nested if ... then ... else ... instruction the list result is traversed twice in the worst case, this can be avoid using the following helper function.
unique' :: Eq a => [a] -> [a]
unique' = fst . foldl' algorithm ([], [])
where
algorithm (result, memory) current =
if (current `elem` memory)
then (result, memory)
else helper current result memory []
where
helper current [] [] acc = ([current], [])
helper current [] memory acc = (acc, memory)
helper current (r:rs) memory acc
| current == r = (acc ++ rs, current:memory)
| otherwise = helper current rs memory (r:acc)
But the helper can be rewrite using fold as follow, which is definitely nicer.
helper current [] _ = ([current],[])
helper current memory result =
foldl' (\(r, m) x -> if x==current
then (r, current:m)
else (current:r, m)) ([], memory) $ result

Resources