Haskell any way to improve this code - performance

Hey, I have implemented this code segment as a move ordering system for an alpha-beta pruning function. It does speed up my code a little, but when I profiled my code I saw it was very clunky.
move_ord [] (primary_ord,secondary_ord) = primary_ord ++ secondary_ord
move_ord (y:ys) (primary_ord,secondary_ord) = case no_of_pebbles state y of
    0 -> move_ord ys (primary_ord,secondary_ord)
    13 -> move_ord ys (y : primary_ord,secondary_ord)
    x
        | 7 - y == x -> move_ord ys (y : primary_ord,secondary_ord)
        | otherwise -> move_ord ys (primary_ord,y : secondary_ord)
It is meant to place moves with specific pebble values (13 and 7-y==x) at the front of the list, while also filtering out illegal moves of 0 pebbles.
Pebbles are stored as Int. y is an Int.
Thank you in advance.

Does the order in which the elements of primary_ord appear matter?
No it does not. I am ordering branches to check first for alpha-beta cutoffs. The cases I outlined have a higher probability of triggering a pruning on the next branch evaluated. Though since I have no other information they can be in any order as long as they appear in front of the other cases.
In that case, you should deliver the good ones as soon as you find them, and only defer delivering the bad ones.
If move_ord is - except in the recursive calls - only called with ([],[]) as the second argument, I'd recommend
move_ord = go []
  where
    go acc (y:ys) = case no_of_pebbles state y of
        0 -> go acc ys
        13 -> y : go acc ys
        x | x == 7-y -> y : go acc ys
          | otherwise -> go (y:acc) ys
    go acc _ = acc
Thus a) you can run in smaller space (unless the consumer accumulates the entire result) and b) the consumer need not wait for the entire list to be traversed before it can start working.
Of course, if there are only very few or even no "good" ys, it may not make a difference, and neither will it if the consumer needs the entire list before it can do anything. But usually, that should improve matters somewhat. Otherwise, there is not much that can be done in this function; no_of_pebbles would be what uses the most resources here.
If move_ord can be called with non-empty primary_ord or secondary_ord, use a wrapper
move_ord xs (primary, secondary) = primary ++ go secondary xs
  where
    go acc ... -- as above
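A self-contained sketch of the streaming version, for experimenting in GHCi. The `Board` type (an association list from pit index to pebble count) and this `noOfPebbles` are assumptions made up for illustration; the question does not show the real `state` or `no_of_pebbles`.

```haskell
-- Hypothetical board representation: pit index -> pebble count.
type Board = [(Int, Int)]

-- Hypothetical stand-in for no_of_pebbles; missing pits count as empty.
noOfPebbles :: Board -> Int -> Int
noOfPebbles board pit = maybe 0 id (lookup pit board)

-- Good moves are emitted as soon as they are found;
-- only the ordinary moves are accumulated and delivered at the end.
moveOrd :: Board -> [Int] -> [Int]
moveOrd board = go []
  where
    go acc (y:ys) = case noOfPebbles board y of
      0  -> go acc ys                     -- illegal move, drop it
      13 -> y : go acc ys                 -- good move, deliver now
      x | x == 7 - y -> y : go acc ys     -- good move, deliver now
        | otherwise  -> go (y : acc) ys   -- ordinary move, defer
    go acc [] = acc
```

With the board `[(1,6),(2,0),(3,13),(4,2),(5,2)]`, `moveOrd board [1..5]` gives `[1,3,5,4]`, and the first good move is available before the rest of the input has been inspected.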

I'm assuming that move_ord starts out being called as move_ord ys ([], []). We then have a streaming filter pattern on Either.
import Data.Either
sorter :: (a -> Bool) -> [a] -> [Either a a]
sorter p = map go where go x = if p x then Left x else Right x
then,
uncurry (++)
  . partitionEithers
  . sorter (\x -> no_of_pebbles x == 13 || 7 - x == no_of_pebbles x)
  . filter (\x -> no_of_pebbles x /= 0)
Which is still a little ugly because we keep computing no_of_pebbles in various places. This might be alright for documentation purposes, but we could also precompute no_of_pebbles.
uncurry (++)
  . partitionEithers
  . sorter (\(x, num) -> num == 13 || 7 - x == num)
  . filter ((/=0) . snd)
  . map (\x -> (x, no_of_pebbles x))
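For reference, here is a compilable sketch of the precomputing pipeline. It assumes the pebble lookup is passed in as a plain function (the parameter `pebbles` is hypothetical; the original closes over `state`), and it tags each move directly instead of going through `sorter` and then erasing the count.

```haskell
import Data.Either (partitionEithers)

-- Order moves: "good" ones (13 pebbles, or 7 - y == pebbles) first,
-- zero-pebble moves removed. The pebble count is computed once per move.
orderMoves :: (Int -> Int) -> [Int] -> [Int]
orderMoves pebbles =
    uncurry (++)
  . partitionEithers
  . map classify
  . filter ((/= 0) . snd)
  . map (\y -> (y, pebbles y))
  where
    classify (y, n)
      | n == 13 || 7 - y == n = Left y    -- goes to the front
      | otherwise             = Right y   -- goes to the back
```

Unlike the accumulator versions, this keeps the ordinary moves in their original relative order.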

Specialization.
Because you use literal constants, the compiler may default to the Integer type rather than Int if nothing else pins the type down.
So you should give your function a specialized type signature, like so.
move_ord :: [Int] -> ([Int], [Int]) -> ([Int], [Int])
Memoization.
Your input list can contain duplicate elements, so two strategies are possible.
Memoize your calls to no_of_pebbles, which will save you extra computation, or sort the input list and remove its duplicates before processing it.
Return a tuple.
You accumulate the result as a tuple, so maybe you should return it as is.
Merging its two elements inside this function seems to be out of scope.
That should be handled later in your code. For the merging itself, it is good to know about difference lists (dlist), a common data type for building lists with cheap appends.

Related

Is there such a thing as maximumWith?

Specifically I'm searching for a function 'maximumWith',
maximumWith :: (Foldable f, Ord b) => (a -> b) -> f a -> a
Which behaves in the following way:
maximumWith length [[1, 2], [0, 1, 3]] == [0, 1, 3]
maximumWith null [[(+), (*)], []] == []
maximumWith (const True) x == head x
My use case is picking the longest word in a list.
For this I'd like something akin to maximumWith length.
I'd thought such a thing existed, since sortWith etc. exist.
Let me collect all the notes in the comments together...
Let's look at sort. There are 4 functions in the family:
sortBy is the actual implementation.
sort = sortBy compare uses Ord overloading.
sortWith = sortBy . comparing is the analogue of your desired maximumWith. However, this function has an issue. The ranking of an element is given by applying the given mapping function to it. However, the ranking is not memoized, so if an element needs to be compared multiple times, the ranking will be recomputed. You can only use it guilt-free if the ranking function is very cheap. Such functions include selectors (e.g. fst), and newtype constructors. YMMV on simple arithmetic and data constructors. Between this inefficiency, the simplicity of the definition, and its location in GHC.Exts, it's easy to deduce that it's not used that often.
sortOn fixes the inefficiency by decorating each element with its image under the ranking function in a pair, sorting by the ranks, and then erasing them.
The first two have analogues in maximum: maximumBy and maximum. sortWith has no analogue; you may as well write out maximumBy (comparing _) every time. There is also no maximumOn, even though such a thing would be more efficient. The easiest way to define a maximumOn is probably just to copy sortOn:
maximumOn :: (Functor f, Foldable f, Ord r) => (a -> r) -> f a -> a
maximumOn rank = snd . maximumBy (comparing fst) . fmap annotate
    where annotate e = let r = rank e in r `seq` (r, e)
There's a bit of interesting code in maximumBy that keeps this from optimizing properly on lists. It also works to use
maximumOn :: (Foldable f, Ord r) => (a -> r) -> f a -> a
maximumOn rank = snd . fromJust . foldl' max' Nothing
    where max' Nothing x = let r = rank x in r `seq` Just (r, x)
          max' old@(Just (ro, _)) xn = let rn = rank xn
                                        in case ro `compare` rn of
                                                LT -> Just (rn, xn)
                                                _ -> old
These pragmas may be useful:
{-# SPECIALIZE maximumOn :: Ord r => (a -> r) -> [a] -> a #-}
{-# SPECIALIZE maximumOn :: (a -> Int) -> [a] -> a #-}
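A total variant of the fold-based definition may be handy for the longest-word use case. The sketch below is an assumption-laden illustration, not part of the answer: the name `maximumOnMaybe` is hypothetical, it returns `Nothing` on empty input instead of failing in `fromJust`, and on ties it keeps the earliest element.

```haskell
import Data.List (foldl')

-- Rank each element exactly once and keep the best (rank, element) pair.
maximumOnMaybe :: (Foldable f, Ord r) => (a -> r) -> f a -> Maybe a
maximumOnMaybe rank = fmap snd . foldl' step Nothing
  where
    step Nothing x = let r = rank x in r `seq` Just (r, x)
    step old@(Just (ro, _)) x =
      let r = rank x
      in if r > ro then Just (r, x) else old  -- strictly greater: ties keep the first
```

For example, `maximumOnMaybe length (words "a quick brown fox")` picks `"quick"`, the first of the two five-letter words.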
HTNW has explained how to do what you asked, but I figured I should mention that for the specific application you mentioned, there's a way that's more efficient in certain cases (assuming the words are represented by Strings). Suppose you want
longest :: [[a]] -> [a]
If you ask for maximumOn length [replicate (10^9) (), []], then you'll end up calculating the length of a very long list unnecessarily. There are several ways to work around this problem, but here's how I'd do it:
data MS a = MS
  { _longest :: [a]
  , _longest_suffix :: [a]
  , _longest_bound :: !Int }
We will ensure that longest is the first of the longest strings seen thus far, and that longest_bound + length longest_suffix = length longest.
step :: MS a -> [a] -> MS a
step (MS longest longest_suffix longest_bound) xs =
    go longest_bound longest_suffix xs'
  where
    -- the new list is not longer
    go n suffo [] = MS longest suffo n
    -- the new list is longer
    go n [] suffn = MS xs suffn n
    -- don't know yet (the !n pattern needs {-# LANGUAGE BangPatterns #-})
    go !n (_ : suffo) (_ : suffn) =
      go (n + 1) suffo suffn
    xs' = drop longest_bound xs
longest :: [[a]] -> [a]
longest = _longest . foldl' step (MS [] [] 0)
Now if the second to longest list has q elements, we'll walk at most q conses into each list. This is the best possible complexity. Of course, it's only significantly better than the maximumOn solution when the longest list is much longer than the second to longest.

a haskell function to test if an integer appears after another integer

I'm writing a function called after which takes a list of integers and two integers as parameters. after list num1 num2 should return True if num1 occurs in the list and num2 occurs in the list after num1 (not necessarily immediately after).
after::[Int]->Int->Int->Bool
after [] _ _=False
after [x:xs] b c
|x==b && c `elem` xs =True
|x/=b && b `elem` xs && b `elem` xs=True
This is what I have so far,my biggest problem is that I don't know how to force num2 to be after num1.
There are a few different ways to approach this one; while it's tempting to go straight for recursion on this, it's nice to avoid using recursion explicitly if there's another option.
Here's a simple version using some list utilities. Note that it's a Haskell idiom that the object we're operating over is usually the last argument. In this case switching the arguments lets us write it as a pipeline with its third argument (the list) passed implicitly:
after :: Int -> Int -> [Int] -> Bool
after a b = elem b . dropWhile (/= a)
Hopefully this is pretty easy to understand; we drop elements of the list until we hit an a, assuming we find one we check if there's a b in the remaining list. If there was no a, this list is [] and obviously there's no b there, so it returns False as expected.
You haven't specified what happens if 'a' and 'b' are equal, so I'll leave it up to you to adapt it for that case. HINT: add a tail somewhere ;)
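One possible reading of the hint, as a sketch: if the second number must occur strictly after the first occurrence of the first one, even when the two are equal, the found element can be skipped before searching. The name `after'` is hypothetical, and `drop 1` is used here rather than `tail` so that the "no `a` found" case stays safe.

```haskell
-- True if b occurs strictly after the first occurrence of a.
after' :: Int -> Int -> [Int] -> Bool
after' a b = elem b . drop 1 . dropWhile (/= a)
```

With this variant `after' 1 1 [1]` is `False`, while `after' 1 1 [1,1]` is `True`.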
Here are a couple of other approaches if you're interested:
This is pretty easily handled using a fold. We have three states to model: either we're looking for the first elem, or we're looking for the second elem, or we've found them (in the right order).
data State =
    FindA | FindB | Found
    deriving Eq
Then we can 'fold' (aka reduce) the list down to the result of whether it matches or not.
after :: Int -> Int -> [Int] -> Bool
after a b xs = foldl go FindA xs == Found
  where
    go FindA x = if x == a then FindB else FindA
    go FindB x = if x == b then Found else FindB
    go Found _ = Found
You can also do it recursively if you like:
after :: Int -> Int -> [Int] -> Bool
after _ _ [] = False
after a b (x:xs)
  | x == a = b `elem` xs
  | otherwise = after a b xs
Cheers!
You can split it into two parts: the first one will find the first occurrence of num1. After that, you just need to drop all elements before it and just check that num2 is in the remaining part of the list.
There's a standard function elemIndex for the first part. The second one is just elem.
import Data.List (elemIndex)
after xs x y =
  case x `elemIndex` xs of
    Just i -> y `elem` (drop (i + 1) xs)
    Nothing -> False
If you'd like to implement it without elem or elemIndex, you could include a subroutine. Something like:
after xs b c = go xs False
  where
    go [] _ = False  -- empty input: nothing to find
    go (x:xs) bFound
      | x == b && not (null xs) = go xs True
      | bFound && x == c = True
      | null xs = False
      | otherwise = go xs bFound

Performance gain implementing concatMap with foldl' for finite list?

I read in Foldr Foldl Foldl' that foldl' is more efficient for long finite lists because of its strictness. I am aware that it is not suitable for infinite lists.
Thus, I am limiting the comparison only for long finite lists.
concatMap
concatMap is implemented using foldr, which gives it laziness. However, using it with long finite lists will build up a long unreduced chain according to the article.
concatMap :: Foldable t => (a -> [b]) -> t a -> [b]
concatMap f xs = build (\c n -> foldr (\x b -> foldr c b (f x)) n xs)
Thus I come up with the following implementation with use of foldl'.
concatMap' :: Foldable t => (a -> [b]) -> t a -> [b]
concatMap' f = reverse . foldl' (\acc x -> f x ++ acc) []
Test it out
I have built the following two functions to test the performance.
lastA = last . concatMap (: []) $ [1..10000]
lastB = last . concatMap' (: []) $ [1..10000]
However, I was shocked by the results.
lastA:
(0.23 secs, 184,071,944 bytes)
(0.24 secs, 184,074,376 bytes)
(0.24 secs, 184,071,048 bytes)
(0.24 secs, 184,074,376 bytes)
(0.25 secs, 184,075,216 bytes)
lastB:
(0.81 secs, 224,075,080 bytes)
(0.76 secs, 224,074,504 bytes)
(0.78 secs, 224,072,888 bytes)
(0.84 secs, 224,073,736 bytes)
(0.79 secs, 224,074,064 bytes)
Follow-up Questions
concatMap outcompetes my concatMap' in both time and memory. I wonder whether I made mistakes in my concatMap' implementation.
Thus, I doubt the articles that praise foldl'.
Is there any black magic in concatMap to make it so efficient?
Is it true that foldl' is more efficient for long finite lists?
Is it true that using foldr with long finite lists will build up a long unreduced chain and impact the performance?
Is there any black magic in concatMap to make it so efficient?
No, not really.
Is it true that foldl' is more efficient for long finite lists?
Not always. It depends on the folding function.
The point is, foldl and foldl' always have to scan the whole input list before producing the output. Instead, foldr does not always have to.
As an extreme case, consider
foldr (\x xs -> x) 0 [10..10000000]
which evaluates to 10 instantly -- only the first element of the list is evaluated. The reduction goes something like
foldr (\x xs -> x) 0 [10..10000000]
= foldr (\x xs -> x) 0 (10 : [11..10000000])
= (\x xs -> x) 10 (foldr (\x xs -> x) 0 [11..10000000])
= (\xs -> 10) (foldr (\x xs -> x) 0 [11..10000000])
= 10
and the recursive call is not evaluated thanks to laziness.
In general, when computing foldr f a xs, it is important to check whether f y ys is able to construct a part of the output before evaluating ys. For instance
foldr f [] xs
where f y ys = (2*y) : ys
produces a list cell _ : _ before evaluating 2*y and ys. This makes it an excellent candidate for foldr.
Again, we can define
map f xs = foldr (\y ys -> f y : ys) [] xs
which runs just fine. It consumes one element from xs and outputs the first output cell. Then it consumes the next element, outputs the next element, and so on. Using foldl' would not output anything until the whole list is processed, making the code quite inefficient.
Instead, if we wrote
sum xs = foldr (\y ys -> y+ys) 0 xs
then we do not output anything after the first element of xs is consumed.
We build a long chain of thunks, wasting a lot of memory.
Here, foldl' would instead work in constant space.
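The contrast can be tried out directly. In the sketch below the names `mapR`, `concatMapR` and `sumL` are hypothetical; the first two are foldr-based and can stream their output, while the last one reduces to a single number and so uses foldl'.

```haskell
import Data.List (foldl')

-- foldr-based map: each step yields a list cell before touching the rest.
mapR :: (a -> b) -> [a] -> [b]
mapR f = foldr (\y ys -> f y : ys) []

-- foldr-based concatMap on lists: also streams, chunk by chunk.
concatMapR :: (a -> [b]) -> [a] -> [b]
concatMapR f = foldr (\x acc -> f x ++ acc) []

-- Summation has an "atomic" result, so a strict left fold fits better.
sumL :: [Int] -> Int
sumL = foldl' (+) 0
```

Because the foldr versions produce output incrementally, `take 3 (mapR (*2) [1..])` terminates even though the input is infinite.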
Is it true that using foldr with long finite lists will build up a long unreduced chain and impact the performance?
Not always. It strongly depends on how the output is consumed by the caller.
As a rule of thumb, if the output is "atomic", meaning that the output consumer cannot observe only a part of it (e.g. Bool, Int, ...), then it's better to use foldl'. If the output is "composed" of many independent values (lists, trees, ...), foldr is probably a better choice, provided f can produce its output step-by-step, in a "streaming" fashion.
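On the question of whether concatMap' itself contains a mistake: as far as I can tell it does, independent of performance. Reversing the accumulated list at the end also reverses the elements inside each chunk, which the test with `(: [])` cannot reveal because every chunk has one element. The sketch below restates the question's definition for lists (as `concatMapQ`, a hypothetical name) next to one possible repair.

```haskell
import Data.List (foldl')

-- The definition from the question, specialised to lists.
concatMapQ :: (a -> [b]) -> [a] -> [b]
concatMapQ f = reverse . foldl' (\acc x -> f x ++ acc) []

-- One possible repair: push each chunk reversed, so that the final
-- reverse restores the order within chunks as well as between them.
concatMapFixed :: (a -> [b]) -> [a] -> [b]
concatMapFixed f = reverse . foldl' (\acc x -> reverse (f x) ++ acc) []
```

For `f x = [x, x + 10]` and input `[1,2]`, `concatMapQ` yields `[11,1,12,2]` whereas `concatMapFixed` and the standard `concatMap` both yield `[1,11,2,12]`. The repaired version still has to consume the whole input before producing anything.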

Recursion confusion in Haskell again - subsets with an inclusion test

I'm testing a simple program to generate subsets with an inclusion test. For example, given
*Main Data.List> factorsets 7
[([2],2),([2,3],1),([3],1),([5],1),([7],1)]
calling chooseP 3 (factorsets 7), I would like to get (read from right to left, a la cons)
[[([5],1),([3],1),([2],2)]
,[([7],1),([3],1),([2],2)]
,[([7],1),([5],1),([2],2)]
,[([7],1),([5],1),([2,3],1)]
,[([7],1),([5],1),([3],1)]]
But my program is returning an extra [([7],1),([5],1),([3],1)] (and missing a [([7],1),([5],1),([2],2)]):
[[([5],1),([3],1),([2],2)]
,[([7],1),([3],1),([2],2)]
,[([7],1),([5],1),([3],1)]
,[([7],1),([5],1),([2,3],1)]
,[([7],1),([5],1),([3],1)]]
The inclusion test is: members' first part of the tuple must have a null intersection.
Once tested as working, the plan is to sum the internal products of each subset's snds, rather than accumulate them.
Since I've asked a similar question before, I imagine that an extra branch is generated since when the recursion splits at [2,3], the second branch runs over the same possibilities once it passes the skipped section. Any pointers on how to resolve that would be appreciated; and if you'd like to share ideas about how to enumerate and sum such product combinations more efficiently, that would be great, too.
Haskell code:
chooseP k xs = chooseP' xs [] 0 where
  chooseP' [] product count = if count == k then [product] else []
  chooseP' yys product count
    | count == k = [product]
    | null yys = []
    | otherwise = f ++ g
    where (y:ys) = yys
          (factorsY,numY) = y
          f = let zzs = dropWhile (\(fs,ns) -> not . and . map (null . intersect fs . fst) $ product) yys
              in if null zzs
                   then chooseP' [] product count
                   else let (z:zs) = zzs in chooseP' zs (z:product) (count + 1)
          g = if and . map (null . intersect factorsY . fst) $ product
                then chooseP' ys product count
                else chooseP' ys [] 0
Your code is complicated enough that I might recommend starting over. Here's how I would proceed.
Write a specification. Let it be as stupidly inefficient as necessary -- for example, the spec I choose below will build all combinations of k elements from the list, then filter out the bad ones. Even the filter will be stupidly slow.
import Control.Monad (liftM2, replicateM)
import Data.List (intersect, nub, sort, tails)
sorted xs = sort xs == xs
unique xs = nub xs == xs
disjoint xs = and $ liftM2 go xs xs where
  go x1 x2 = x1 == x2 || null (intersect x1 x2)
-- check that x is valid according to all the validation functions in fs
-- (there are other fun ways to spell this, but this is particularly
-- readable and clearly correct -- just what we want from a spec)
allFuns fs x = all ($x) fs
choosePSpec k = filter good . replicateM k where
  good pairs = allFuns [unique, disjoint, sorted] (map fst pairs)
Just to make sure it's right, we can test it at the prompt:
*Main> mapM_ print $ choosePSpec 3 [([2],2),([2,3],1),([3],1),([5],1),([7],1)]
[([2],2),([3],1),([5],1)]
[([2],2),([3],1),([7],1)]
[([2],2),([5],1),([7],1)]
[([2,3],1),([5],1),([7],1)]
[([3],1),([5],1),([7],1)]
Looks good.
Now that we have a spec, we can try to improve the speed one refactoring at a time, always checking that it matches the spec. The first thing I'd want to do is notice that we can ensure uniqueness and sortedness just by sorting the input and picking things "in an increasing way". To do this, we can define a function which chooses subsequences of a given length. It piggy-backs on the tails function, which you can think of as nondeterministically choosing a place to split its input list.
subseq 0 xs = [[]]
subseq n xs = do
  x':xt <- tails xs
  xs' <- subseq (n-1) xt
  return (x':xs')
Here's an example of this function in action:
*Main> subseq 3 [1..4]
[[1,2,3],[1,2,4],[1,3,4],[2,3,4]]
Now we can write a slightly faster chooseP by replacing replicateM with subseq. Recall that we're assuming the inputs are already sorted and unique, though.
choosePSlow k = filter good . subseq k where
  good pairs = disjoint $ map fst pairs
We can sanity-check that it's working by running it on the particular input we have from above:
*Main> let i = [([2],2),([2,3],1),([3],1),([5],1),([7],1)]
*Main> choosePSlow 3 i == choosePSpec 3 i
True
Or, better yet, we can stress-test it with QuickCheck. We'll need a tiny bit more code. The condition k < 5 is just because the spec is so hopelessly slow that bigger values of k take forever.
propSlowMatchesSpec :: NonNegative Int -> OrderedList ([Int], Int) -> Property
propSlowMatchesSpec (NonNegative k) (Ordered xs)
  = k < 5 && unique (map fst xs)
  ==> choosePSlow k xs == choosePSpec k xs
*Main> quickCheck propSlowMatchesSpec
+++ OK, passed 100 tests.
There are several more opportunities to make things faster. For instance, the disjoint test could be sped up using choose 2 instead of liftM2; or we might be able to ensure disjointness during element selection and prune the search even earlier; etc. How you want to improve it from here I leave to you -- but the basic technique (start with stupid and slow, then make it smarter, testing as you go) should be helpful to you.
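As one illustration of the "ensure disjointness during element selection" idea, here is a hedged sketch. The name `chooseDisjoint` and the `used` accumulator are my own choices, and like `choosePSlow` it assumes the input is already sorted with unique first components. It extends `subseq` so that a candidate is rejected as soon as its factors overlap the ones already chosen.

```haskell
import Data.List (intersect, tails)

-- Choose k elements in increasing position order, pruning any branch
-- whose factor list overlaps the factors picked so far.
chooseDisjoint :: Eq a => Int -> [([a], b)] -> [[([a], b)]]
chooseDisjoint k = go k []
  where
    go 0 _ _ = [[]]
    go n used xs =
      [ p : rest
      | p@(fs, _) : xt <- tails xs            -- pick a split point
      , null (fs `intersect` used)            -- prune overlapping picks early
      , rest <- go (n - 1) (fs ++ used) xt
      ]
```

On the example input from the question with k = 3 this produces the same five subsets, in the same order, as the `choosePSpec` run shown earlier; checking it against the spec with QuickCheck would be the natural next step.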

Finding unique (as in only occurring once) element haskell

I need a function which takes a list and returns the unique element if it exists, or [] if it doesn't. If several unique elements exist it should return the first one (without wasting time finding the others).
Additionally I know that all elements in the list come from (small and known) set A.
For example this function does the job for Ints:
unique :: Ord a => [a] -> [a]
unique li = first $ filter ((==1).length) ((group.sort) li)
    where first [] = []
          first (x:xs) = x
ghci> unique [3,5,6,8,3,9,3,5,6,9,3,5,6,9,1,5,6,8,9,5,6,8,9]
[1]
This is however not good enough because it involves sorting (n log n) while it could be done in linear time (because A is small).
Additionally it requires the type of the list elements to be Ord, while all that should be needed is Eq. It would also be nice if the number of comparisons was as small as possible (i.e. if we traverse a list and encounter element el twice, we don't test subsequent elements for equality with el).
This is why for example this: Counting unique elements in a list doesn't solve the problem - all answers involve either sorting or traversing the whole list to find count of all elements.
The question is: how to do it correctly and efficiently in Haskell ?
Okay, linear time, from a finite domain. The running time will be O((m + d) log d), where m is the size of the list and d is the size of the domain, which is linear when d is fixed. My plan is to use the elements of the set as the keys of a trie, with the counts as values, then look through the trie for elements with count 1.
import qualified Data.IntTrie as IntTrie
import Data.List (foldl')
import Control.Applicative
Count each of the elements. This traverses the list once, builds a trie with the results (O(m log d)), then returns a function which looks up the result in the trie (with running time O(log d)).
counts :: (Enum a) => [a] -> (a -> Int)
counts xs = IntTrie.apply (foldl' insert (pure 0) xs) . fromEnum
  where
    insert t x = IntTrie.modify' (fromEnum x) (+1) t
We use the Enum constraint to convert values of type a to integers in order to index them in the trie. An Enum instance is part of the witness of your assumption that a is a small, finite set (Bounded would be the other part, but see below).
And then look for ones that are unique.
uniques :: (Eq a, Enum a) => [a] -> [a] -> [a]
uniques dom xs = filter (\x -> cts x == 1) dom
  where
    cts = counts xs
This function takes as its first parameter an enumeration of the entire domain. We could have required a Bounded a constraint and used [minBound..maxBound] instead, which is semantically appealing to me since finite is essentially Enum+Bounded, but quite inflexible since now the domain needs to be known at compile time. So I would choose this slightly uglier but more flexible variant.
uniques traverses the domain once (lazily, so head . uniques dom will only traverse as far as it needs to to find the first unique element -- not in the list, but in dom), for each element running the lookup function which we have established is O(log d), so the filter takes O(d log d), and building the table of counts takes O(m log d). So uniques runs in O((m + d) log d), which is linear when d is fixed. It will take at least Ω(m log d) to get any information from it, because it has to traverse the whole list to build the table (you have to get all the way to the end of the list to see if an element was repeated, so you can't do better than this).
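If pulling in the trie package is inconvenient, the same plan can be sketched with `Data.IntMap.Strict` from the containers library. This swaps the trie for an IntMap, so it is a variation on the answer rather than the answer's own code; the names `countsIM` and `uniquesIM` are hypothetical.

```haskell
import qualified Data.IntMap.Strict as IntMap

-- Count every element once, then expose the table as a lookup function.
countsIM :: Enum a => [a] -> (a -> Int)
countsIM xs = \x -> IntMap.findWithDefault 0 (fromEnum x) table
  where
    table = IntMap.fromListWith (+) [(fromEnum x, 1) | x <- xs]

-- Walk the domain and keep the elements that occur exactly once.
uniquesIM :: Enum a => [a] -> [a] -> [a]
uniquesIM dom xs = filter (\x -> cts x == 1) dom
  where
    cts = countsIM xs
```

As with `uniques`, the result is ordered by the domain enumeration, not by position in the input list.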
There really isn't any way to do this efficiently with just Eq. You'd need to use some much less efficient way to build the groups of equal elements, and you can't know that only one of a particular element exists without scanning the whole list.
Also, note that to avoid useless comparisons you'd need a way of checking to see if an element has been encountered before, and the only way to do that would be to have a list of elements known to have multiple occurrences, and the only way to check if the current element is in that list is... to compare it for equality with each.
If you want this to work faster than O(something really horrible) you need that Ord constraint.
Ok, based on the clarifications in comments, here's a quick and dirty example of what I think you're looking for:
unique [] _ _ = Nothing
unique _ [] [] = Nothing
unique _ (r:_) [] = Just r
unique candidates results (x:xs)
  | x `notElem` candidates = unique candidates results xs
  | x `elem` results = unique (delete x candidates) (delete x results) xs
  | otherwise = unique candidates (x:results) xs
The first argument is a list of candidates, which should initially be all possible elements. The second argument is the list of possible results, which should initially be empty. The third argument is the list to examine.
If it runs out of candidates, or reaches the end of the list with no results, it returns Nothing. If it reaches the end of the list with results, it returns the one at the front of the result list.
Otherwise, it examines the next input element: If it's not a candidate, it ignores it and continues. If it's in the result list we've seen it twice, so remove it from the result and candidate lists and continue. Otherwise, add it to the results and continue.
Unfortunately, this still has to scan the entire list for even a single result, since that's the only way to be sure it's actually unique.
First off, if your function is intended to return at most one element, you should almost certainly use Maybe a instead of [a] to return your result.
Second, at minimum, you have no choice but to traverse the entire list: you can't tell for sure if any given element is actually unique until you've looked at all the others.
If your elements are not Ordered, but can only be tested for Equality, you really have no better option than something like:
firstUnique (x:xs)
  | elem x xs = firstUnique (filter (/= x) xs)
  | otherwise = Just x
firstUnique [] = Nothing
Note that you don't need to filter out the duplicated elements if you don't want to -- the worst case is quadratic either way.
Edit:
The above misses the possibility of early exit due to the above-mentioned small/known set of possible elements. However, note that the worst case will still require traversing the entire list: all that is necessary is for at least one of these possible elements to be missing from the list...
However, an implementation that provides an early out in case of set exhaustion:
firstUnique = f [] [<small/known set of possible elements>] where
    f [] [] _ = Nothing -- early out
    f uniques noshows (x:xs)
      | elem x uniques = f (delete x uniques) noshows xs
      | elem x noshows = f (x:uniques) (delete x noshows) xs
      | otherwise = f uniques noshows xs
    f [] _ [] = Nothing
    f (u:_) _ [] = Just u
Note that if your list has elements which shouldn't be there (because they aren't in the small/known set), they will be pointedly ignored by the above code...
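To try the early-out version without editing the placeholder, the known set can be passed in as an argument. The sketch below makes that one change; the name `firstUniqueIn` and the `domain` parameter are assumptions for illustration, and the helper is otherwise the code above.

```haskell
import Data.List (delete)

-- domain: the small, known set of possible elements (assumed duplicate-free).
firstUniqueIn :: Eq a => [a] -> [a] -> Maybe a
firstUniqueIn domain = f [] domain
  where
    f [] [] _ = Nothing                      -- every element seen twice: early out
    f uniques noshows (x:xs)
      | x `elem` uniques = f (delete x uniques) noshows xs    -- second sighting
      | x `elem` noshows = f (x : uniques) (delete x noshows) xs  -- first sighting
      | otherwise        = f uniques noshows xs               -- already discarded
    f [] _ [] = Nothing
    f (u:_) _ [] = Just u
```

One thing to be aware of: new uniques are consed onto the front, so when several elements occur exactly once this returns the most recently recorded one, e.g. `firstUniqueIn [1,2,3] [1,2]` is `Just 2`. Returning the earliest one would need the `uniques` list kept in arrival order.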
As others have said, without any additional constraints, you can't do this in less than quadratic time, because without knowing something about the elements, you can't keep them in some reasonable data structure.
If we are able to compare elements, an obvious O(n log n) solution is to compute the count of each element first and then find the first one with count equal to 1:
import Data.List (foldl', find)
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Maybe (fromMaybe)
count :: (Ord a) => Map a Int -> a -> Int
count m x = fromMaybe 0 $ Map.lookup x m
add :: (Ord a) => Map a Int -> a -> Map a Int
add m x = Map.insertWith (+) x 1 m
uniq :: (Ord a) => [a] -> Maybe a
uniq xs = find (\x -> count cs x == 1) xs
  where
    cs = foldl' add Map.empty xs
Note that the log n factor comes from the fact that we need to operate on a Map of size n. If the list has only k unique elements then the size of our map will be at most k, so the overall complexity will be just O(n log k).
However, we can do even better - we can use a hash table instead of a map to get an O(n) solution. For this we'll need the ST monad to perform mutable operations on the hash map, and our elements will have to be Hashable. The solution is basically the same as before, just a little bit more complex due to working within the ST monad:
import Control.Monad
import Control.Monad.ST
import Data.Hashable
import qualified Data.HashTable.ST.Basic as HT
import Data.Maybe (fromMaybe)
count :: (Eq a, Hashable a) => HT.HashTable s a Int -> a -> ST s Int
count ht x = liftM (fromMaybe 0) (HT.lookup ht x)
add :: (Eq a, Hashable a) => HT.HashTable s a Int -> a -> ST s ()
add ht x = count ht x >>= HT.insert ht x . (+ 1)
uniq :: (Eq a, Hashable a) => [a] -> Maybe a
uniq xs = runST $ do
    -- Count all elements into a hash table:
    ht <- HT.newSized (length xs)
    forM_ xs (add ht)
    -- Find the first one with count 1
    first (\x -> liftM (== 1) (count ht x)) xs
-- Monadic variant of find which exits once an element is found.
first :: (Monad m) => (a -> m Bool) -> [a] -> m (Maybe a)
first p = f
  where
    f [] = return Nothing
    f (x:xs') = do
        b <- p x
        if b then return (Just x)
             else f xs'
Notes:
If you know that there will be only a small number of distinct elements in the list, you could use HT.new instead of HT.newSized (length xs). This will save you some memory and one pass over xs, but in the case of many distinct elements the hash table will have to be resized several times.
Here is a version that does the trick:
unique :: Eq a => [a] -> [a]
unique = select . collect []
  where
    collect acc [] = acc
    collect acc (x : xs) = collect (insert x acc) xs
    insert x [] = [[x]]
    insert x (ys@(y : _) : yss)
      | x == y = (x : ys) : yss
      | otherwise = ys : insert x yss
    select [] = []
    select ([x] : _) = [x]
    select ((_ : _) : xss) = select xss
So, first we traverse the input list (collect) while maintaining a list of buckets of equal elements that we update with insert. Then we simply select the first element that appears in a singleton bucket (select).
The bad news is that this takes quadratic time: for every visited element in collect we need to go over the list of buckets. I am afraid that is the price you will have to pay for only being able to constrain the element type to be in Eq.
Something like this looks pretty good.
unique = fst . foldl' (\(a, b) c -> if (c `elem` b)
                                      then (a, b)
                                      else if (c `elem` a)
                                        then (delete c a, c:b)
                                        else (c:a, b)) ([],[])
The first element of the tuple produced by the fold contains what you are expecting: a list of the unique elements. The second element of the tuple is the memory of the process, recording which elements have already been discarded.
About space performance.
As your problem is designed, every element of the list has to be traversed at least once before a result can be displayed. The algorithm must also keep track of the discarded values in addition to the good ones, but each discarded value appears only once. So in the worst case the required amount of memory is equal to the size of the input list. This sounds good, as you said that the expected inputs are small.
About time performance.
As the expected inputs are small and not sorted by default, sorting the list, whether inside the algorithm or before applying it, is useless. In fact, statistically we can almost say that the extra work to place an element at its ordered position (in the sublists a and b of the tuple (a,b)) costs about the same amount of time as checking whether this element appears in the list or not.
Below is a nicer and more explicit version of the foldl' one.
import Data.List (foldl', delete, elem)
unique :: Eq a => [a] -> [a]
unique = fst . foldl' algorithm ([], [])
  where
    algorithm (result0, memory0) current =
        if (current `elem` memory0)
            then (result0, memory0)
            else if (current `elem` result0)
                then (delete current result0, memory)
                else (result, memory0)
      where
        result = current : result0
        memory = current : memory0
In the nested if ... then ... else ... expression the list result is traversed twice in the worst case; this can be avoided using the following helper function.
unique' :: Eq a => [a] -> [a]
unique' = fst . foldl' algorithm ([], [])
  where
    algorithm (result, memory) current =
        if (current `elem` memory)
            then (result, memory)
            else helper current result memory []
      where
        -- not found in result: record current as a new candidate
        helper current [] memory acc = (current : reverse acc, memory)
        helper current (r:rs) memory acc
            -- found: drop it from result and remember it as discarded
            | current == r = (reverse acc ++ rs, current:memory)
            | otherwise = helper current rs memory (r:acc)
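But the helper can be rewritten using a fold as follows, which is definitely nicer. Note that this version takes its arguments in the order current, memory, result.
helper current memory result
  | found     = (rest, current : memory)
  | otherwise = (current : result, memory)
  where
    (rest, found) = foldr (\x (r, seen) -> if x == current
                                              then (r, True)
                                              else (x : r, seen)) ([], False) result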

Resources