Why is my Haskell selection sort implementation extremely fast? - performance

I implemented selection sort and compared it to Data.List's sort. It is orders of magnitudes faster than Data.List's sort. If I apply it to 10,000 randomly generated numbers the results are as follows:
✓ in 1.22µs: Selection sort
✓ in 9.84ms: Merge sort (Data.List)
This can't be right. First I thought maybe merge sort's intermediate results are cached and selection sort uses those to be much faster. Even when I comment out merge sort and only time selection sort, it is this fast however. I also verified the output and it is correctly sorted.
What causes this behaviour?
I use this code to test:
{-# LANGUAGE BangPatterns #-}
module Lib
( testSortingAlgorithms
) where
import System.Random (randomRIO)
import Text.Printf
import Control.Exception
import System.CPUTime
import Data.List (sort, sortOn)
selectionSort :: Ord a => [a] -> [a]
selectionSort [] = []
selectionSort nrs =
let (smallest, rest) = getSmallest nrs
in smallest : selectionSort rest
where getSmallest :: Ord a => [a] -> (a, [a])
getSmallest [a] = (a, [])
getSmallest (a:as) = let (smallest, rest) = getSmallest as
in if smallest > a then (a, smallest : rest)
else (smallest, a : rest)
main :: IO ()
main = testSortingAlgorithms
testSortingAlgorithms :: IO ()
testSortingAlgorithms = do
!list' <- list (10000)
results <- mapM (timeIt list') sorts
let results' = sortOn fst results
mapM_ (\(diff, msg) -> printf (msg) (diff::Double)) results'
return ()
sorts :: Ord a => [(String, [a] -> [a])]
sorts = [
("Selection sort", selectionSort)
, ("Merge sort (Data.List)", sort)
]
list :: Int -> IO [Int]
list n = sequence $ replicate n $ randomRIO (-127,127::Int)
timeIt :: (Ord a, Show a)
=> [a] -> (String, [a] -> [a]) -> IO (Double, [Char])
timeIt vals (name, sorter) = do
start <- getCPUTime
--v <- sorter vals `seq` return ()
let !v = sorter vals
--putStrLn $ show v
end <- getCPUTime
let (diff, ext) = unit $ (fromIntegral (end - start)) / (10^3)
let msg = if correct v
then (" ✓ in %0.2f" ++ ext ++ ": " ++ name ++ "\n")
else (" ✗ in %0.2f" ++ ext ++ ": " ++ name ++ "\n")
return (diff, msg)
correct :: (Ord a) => [a] -> Bool
correct [] = True
correct (a:[]) = True
correct (a1:a2:as) = a1 <= a2 && correct (a2:as)
unit :: Double -> (Double, String)
unit v | v < 10^3 = (v, "ns")
| v < 10^6 = (v / 10^3, "µs")
| v < 10^9 = (v / 10^6, "ms")
| otherwise = (v / 10^9, "s")

You write
let !v = sorter vals
which is "strict", but only to WHNF. So you are only timing how long it takes to find the smallest element of the list, not how long it takes to sort the whole thing. Selection sort starts by doing exactly that, so it is "optimal" for this incorrect benchmark, while mergesort does a bunch more work that's "wasted" if you only look at the first element.

Related

pure Knuth/Fisher-Yates shuffle in haskell

In go I could write a function like this:
func pureFisherYates(s []int, swaps []int) []int {
newS := copy(s)
for i, _ := range newS {
for _, j := range swaps {
newS[i], newS[j] = newS[j], newS[i]
}
}
}
To me this seems like a pure function. It always returns the same output given the same input, and it doesn't mutate the state of the world (except in some strict sense the same way that any other function does, taking up cpu resources, creating heat energy, etc). Yet whenever I look for how to do pure shuffling I find stuff like this, and whenever I look for specifically Haskell implementation Fisher-Yates I either get an 0^2 Fisher-Yates implemented with a list or a [a] -> IO [a] implementation. Does there exist a [a] -> [a] O(n) shuffle and if not why is my above go implementation impure.
The ST monad allows exactly such encapsulated mutability, and Data.Array.ST contains arrays which can be mutated in ST and then an immutable version returned outside.
https://wiki.haskell.org/Random_shuffle gives two implementations of Fisher-Yates shuffle using ST. They aren't literally [a] -> [a], but that's because random number generation needs to be handled as well:
import System.Random
import Data.Array.ST
import Control.Monad
import Control.Monad.ST
import Data.STRef
-- | Randomly shuffle a list without the IO Monad
-- /O(N)/
shuffle' :: [a] -> StdGen -> ([a],StdGen)
shuffle' xs gen = runST (do
g <- newSTRef gen
let randomRST lohi = do
(a,s') <- liftM (randomR lohi) (readSTRef g)
writeSTRef g s'
return a
ar <- newArray n xs
xs' <- forM [1..n] $ \i -> do
j <- randomRST (i,n)
vi <- readArray ar i
vj <- readArray ar j
writeArray ar j vi
return vj
gen' <- readSTRef g
return (xs',gen'))
where
n = length xs
newArray :: Int -> [a] -> ST s (STArray s Int a)
newArray n xs = newListArray (1,n) xs
and
import Control.Monad
import Control.Monad.ST
import Control.Monad.Random
import System.Random
import Data.Array.ST
import GHC.Arr
shuffle :: RandomGen g => [a] -> Rand g [a]
shuffle xs = do
let l = length xs
rands <- forM [0..(l-2)] $ \i -> getRandomR (i, l-1)
let ar = runSTArray $ do
ar <- thawSTArray $ listArray (0, l-1) xs
forM_ (zip [0..] rands) $ \(i, j) -> do
vi <- readSTArray ar i
vj <- readSTArray ar j
writeSTArray ar j vi
writeSTArray ar i vj
return ar
return (elems ar)
*Main> evalRandIO (shuffle [1..10])
[6,5,1,7,10,4,9,2,8,3]
EDIT: with a fixed swaps argument as in your Go code, the code is quite simple
{-# LANGUAGE ScopedTypeVariables #-}
import Data.Array.ST
import Data.Foldable
import Control.Monad.ST
shuffle :: forall a. [a] -> [Int] -> [a]
shuffle xs swaps = runST $ do
let n = length xs
ar <- newListArray (1,n) xs :: ST s (STArray s Int a)
for_ [1..n] $ \i ->
for_ swaps $ \j -> do
vi <- readArray ar i
vj <- readArray ar j
writeArray ar j vi
writeArray ar i vj
getElems ar
but I am not sure you can reasonably call it Fisher-Yates shuffle.

Haskell: Sort an almost-sorted array

I've been learning Haskell in my spare time working through LYAH. Would like to improve upon my Haskell (/ Functional programming) skills by solving some problems from the imperative world. One of the problems from EPI is to print an "almost sorted array", in a sorted fashion where it is guaranteed that no element in the array is more than k away from its correct position. The input is a stream of elements and the requirement is to do this in O(n log k) time complexity and O(k) space complexity.
I've attempted to re-implement the imperative solution in Haskell as follows:
import qualified Data.Heap as Heap
-- print the k-sorted list in a sorted fashion
ksorted :: (Ord a, Show a) => [a] -> Int -> IO ()
ksorted [] _ = return ()
ksorted xs k = do
heap <- ksorted' xs Heap.empty
mapM_ print $ (Heap.toAscList heap) -- print the remaining elements in the heap.
where
ksorted' :: (Ord a, Show a) => [a] -> Heap.MinHeap a -> IO (Heap.MinHeap a)
ksorted' [] h = return h
ksorted' (x:xs) h = do let (m, h') = getMinAndBuildHeap h x in
(printMin m >> ksorted' xs h')
printMin :: (Show a) => Maybe a -> IO ()
printMin m = case m of
Nothing -> return ()
(Just item) -> print item
getMinAndBuildHeap :: (Ord a, Show a) => Heap.MinHeap a -> a -> (Maybe a, Heap.MinHeap a)
getMinAndBuildHeap h item= if (Heap.size h) > k
then ((Heap.viewHead h), (Heap.insert item (Heap.drop 1 h)))
else (Nothing, (Heap.insert item h))
I would like to know a better way of solving this in Haskell. Any inputs would be appreciated.
[Edit 1]: The input is stream, but for now I assumed a list instead (with only a forward iterator/ input iterator in some sense.)
[Edit 2]: added Data.Heap import to the code.
Thanks.
I think the main improvement is to separate the production of the sorted list from the printing of the sorted list. So:
import Data.Heap (MinHeap)
import qualified Data.Heap as Heap
ksort :: Ord a => Int -> [a] -> [a]
ksort k xs = go (Heap.fromList b) e where
(b, e) = splitAt (k-1) xs
go :: Ord a => MinHeap a -> [a] -> [a]
go heap [] = Heap.toAscList heap
go heap (x:xs) = x' : go heap' xs where
Just (x', heap') = Heap.view (Heap.insert x heap)
printKSorted :: (Ord a, Show a) => Int -> [a] -> IO ()
printKSorted k xs = mapM_ print (ksort k xs)
If I were feeling extra-special-fancy, I might try to turn go into a foldr or perhaps a mapAccumR, but in this case I think the explicit recursion is relatively readable, too.

Haskell performance using dynamic programming

I am attempting to calculate the Levenshtein distance between two strings using dynamic programming. This is being done through Hackerrank, so I have timing constraints. I used a techenique I saw in: How are Dynamic Programming algorithms implemented in idiomatic Haskell? and it seems to be working. Unfortunaly, it is timing out in one test case. I do not have access to the specific test case, so I don't know the exact size of the input.
import Control.Monad
import Data.Array.IArray
import Data.Array.Unboxed
main = do
n <- readLn
replicateM_ n $ do
s1 <- getLine
s2 <- getLine
print $ editDistance s1 s2
editDistance :: String -> String -> Int
editDistance s1 s2 = dynamic editDistance' (length s1, length s2)
where
s1' :: UArray Int Char
s1' = listArray (1,length s1) s1
s2' :: UArray Int Char
s2' = listArray (1,length s2) s2
editDistance' table (i,j)
| min i j == 0 = max i j
| otherwise = min' (table!((i-1),j) + 1) (table!(i,(j-1)) + 1) (table!((i-1),(j-1)) + cost)
where
cost = if s1'!i == s2'!j then 0 else 1
min' a b = min (min a b)
dynamic :: (Array (Int,Int) Int -> (Int,Int) -> Int) -> (Int,Int) -> Int
dynamic compute (xBnd, yBnd) = table!(xBnd,yBnd)
where
table = newTable $ map (\coord -> (coord, compute table coord)) [(x,y) | x<-[0..xBnd], y<-[0..yBnd]]
newTable xs = array ((0,0),fst (last xs)) xs
I've switched to using arrays, but that speed up was insufficient. I cannot use Unboxed arrays, because this code relies on laziness. Are there any glaring performance mistakes I have made? Or how else can I speed it up?
The backward equations for edit distance calculations are:
f(i, j) = minimum [
1 + f(i + 1, j), -- delete from the 1st string
1 + f(i, j + 1), -- delete from the 2nd string
f(i + 1, j + 1) + if a(i) == b(j) then 0 else 1 -- substitute or match
]
So within each dimension, you need nothing more than the very next index: + 1. This is a sequential access pattern, not random access to require arrays; and can be implemented using lists and nested right folds:
editDistance :: Eq a => [a] -> [a] -> Int
editDistance a b = head . foldr loop [n, n - 1..0] $ zip a [m, m - 1..]
where
(m, n) = (length a, length b)
loop (s, l) lst = foldr go [l] $ zip3 b lst (tail lst)
where
go (t, i, j) acc#(k:_) = inc `seq` inc:acc
where inc = minimum [i + 1, k + 1, if s == t then j else j + 1]
You may test this code in Hackerrank Edit Distance Problem as in:
import Control.Applicative ((<$>))
import Control.Monad (replicateM_)
import Text.Read (readMaybe)
editDistance :: Eq a => [a] -> [a] -> Int
editDistance a b = ... -- as implemented above
main :: IO ()
main = do
Just n <- readMaybe <$> getLine
replicateM_ n $ do
a <- getLine
b <- getLine
print $ editDistance a b
which passes all tests with a decent performance.

Construct infinite sorted list without adding duplicates

I am relatively new to Haskell, but I am trying to learn both by reading and trying to solve problems on Project Euler. I am currently trying to implement a function that takes an infinite list of integers and returns the ordered list of pairwise sums of elements in said list. I am really looking for solutions to the specific issue I am facing, rather than advice on different strategies or approaches, but those are welcome as well, as being a coder doesn't mean knowing how to implement a strategy, but also choosing the best strategy available.
My approach relies on traversing an infinite list of infinite generators and retrieving elements in order, with several mathematical properties that are useful in implementing my solution.
If I were trying to obtain the sequence of pairwise sums of the natural numbers, for example, this would be my code:
myList :: [Integer]
myList = [1..]
myGens :: [[Integer]]
myGens = gens myList
where
gens = \xs -> map (\x -> [x+y|y<-(dropWhile (<x) xs)]) xs
Regardless of the number set used, provided that it is sorted, the following conditions hold:
∀ i ≥ 0, head (gens xs !! i) == 2*(myList !! i)
∀ i,j,k ≥ 0, l > 0, (((gens xs) !! i) !! j) < (((gens xs) !! i+k) !! j+l)
Special cases for the second condition are:
∀ i,j ≥ 0, (((gens xs) !! i) !! j) < (((gens xs) !! i+1) !! j)
∀ i,j ≥ 0, k > 0, (((gens xs) !! i) !! j) < (((gens xs) !! i+k) !! j)
Here is the particular code I am trying to modify:
stride :: [Integer] -> [Int] -> [[Integer]] -> [Integer]
stride xs cs xss = x : stride xs counts streams
where
(x,i) = step xs cs xss
counts = inc i cs
streams = chop i xss
step :: [Integer] -> [Int] -> [[Integer]] -> (Integer,Int)
step xs cs xss = pace xs (defer cs xss)
pace :: [Integer] -> [(Integer,Int)] -> (Integer,Int)
pace hs xs#((x,i):xt) = minim (x,i) hs xt
where
minim :: (Integer,Int) -> [Integer] -> [(Integer,Int)] -> (Integer,Int)
minim m _ [] = m
minim m#(g,i) hs (y#(h,n):ynt) | g > h && 2*(hs !! n) > h = y
| g > h = minim y hs ynt
| 2*(hs !! n) > g = m
| otherwise = minim m hs ynt
defer :: [Int] -> [[a]] -> [(a,Int)]
defer cs xss = (infer (zip cs (zip (map head xss) [0..])))
infer :: [(Int,(a,Int))] -> [(a,Int)]
infer [] = []
infer ((c,xi):xis) | c == 0 = xi:[]
| otherwise = xi:(infer (dropWhile (\(p,(q,r)) -> p>=c) xis))
The set in question I am using has the property that multiple distinct pairs produce an identical sum. I want an efficient method of handling all duplicate elements at once, in order to avoid an increased cost of computing all the pairwise sums up to N, as it requires M more tests if M is the number of duplicates.
Does anyone have any suggestions?
EDIT:
I made some changes to the code, independently of what was suggested, and would appreciate feedback on the relative efficiencies of my original code, my revised code, and the proposals so far.
stride :: [Integer] -> [Int] -> [[Integer]] -> [Integer]
stride xs cs xss = x : stride xs counts streams
where
(x,is) = step xs cs xss
counts = foldr (\i -> inc i) cs is
streams = foldr (\i -> chop i) xss is
step :: [Integer] -> [Int] -> [[Integer]] -> (Integer,[Int])
step xs cs xss = pace xs (defer cs xss)
pace :: [Integer] -> [(Integer,Int)] -> (Integer,[Int])
pace hs xs#((x,i):xt) = minim (x,(i:[])) hs xt
where
minim :: (Integer,[Int]) -> [Integer] -> [(Integer,Int)] -> (Integer,[Int])
minim m _ [] = m
minim m#(g,is#(i:_)) hs (y#(h,n):ynt) | g > h && 2*(hs !! n) > h = (h,[n])
| g > h = minim (h,[n]) hs ynt
| g == h && 2*(hs !! n) > h = (g,n:is)
| g == h = minim (g,n:is) hs ynt
| g < h && 2*(hs !! n) > g = m
| g < h = minim m hs ynt
Also, I left out the code for inc and chop:
alter :: (a->a) -> Int -> [a] -> [a]
alter = \f -> \n -> \xs -> (take (n) xs) ++ [f (xs !! n)] ++ (drop (n+1) xs)
inc :: Int -> [Int] -> [Int]
inc = alter (1+)
chop :: Int -> [[a]] -> [[a]]
chop = alter (tail)
I'm going to present a solution that uses an infinite pairing heap. We'll have logarithmic overhead per element constructed, but no one knows how to do better (in a model with comparison-based methods and real numbers).
The first bit of code is just the standard pairing heap.
module Queue where
import Data.Maybe (fromMaybe)
data Queue k = E
| T k [Queue k]
deriving Show
fromOrderedList :: (Ord k) => [k] -> Queue k
fromOrderedList [] = E
fromOrderedList [k] = T k []
fromOrderedList (k1 : ks'#(k2 : _ks''))
| k1 <= k2 = T k1 [fromOrderedList ks']
mergePairs :: (Ord k) => [Queue k] -> Queue k
mergePairs [] = E
mergePairs [q] = q
mergePairs (q1 : q2 : qs'') = merge (merge q1 q2) (mergePairs qs'')
merge :: (Ord k) => Queue k -> Queue k -> Queue k
merge (E) q2 = q2
merge q1 (E) = q1
merge q1#(T k1 q1's) q2#(T k2 q2's)
= if k1 <= k2 then T k1 (q2 : q1's) else T k2 (q1 : q2's)
deleteMin :: (Ord k) => Queue k -> Maybe (k, Queue k)
deleteMin (E) = Nothing
deleteMin (T k q's) = Just (k, mergePairs q's)
toOrderedList :: (Ord k) => Queue k -> [k]
toOrderedList q
= fromMaybe [] $
do (k, q') <- deleteMin q
return (k : toOrderedList q')
Note that fromOrderedList accepts infinite lists. I think that this can be justified theoretically by pretending as though the infinite list of descendants effectively are merged "just in time". This feels like the kind of thing that should be in the literature on purely functional data structures already, but I'm going to be lazy and not look right now.
The function mergeOrderedByMin takes this one step further and merges a potentially infinite list of queues, where the min element in each queue is nondecreasing. I don't think that we can reuse merge, since merge appears to be insufficiently lazy.
mergeOrderedByMin :: (Ord k) => [Queue k] -> Queue k
mergeOrderedByMin [] = E
mergeOrderedByMin (E : qs') = mergeOrderedByMin qs'
mergeOrderedByMin (T k q's : qs')
= T k (mergeOrderedByMin qs' : q's)
The next function removes duplicates from a sorted list. It's in the library that m09 suggested, but for the sake of completeness, I'll define it here.
nubOrderedList :: (Ord k) => [k] -> [k]
nubOrderedList [] = []
nubOrderedList [k] = [k]
nubOrderedList (k1 : ks'#(k2 : _ks''))
| k1 < k2 = k1 : nubOrderedList ks'
| k1 == k2 = nubOrderedList ks'
Finally, we put it all together. I'll use the squares as an example.
squares :: [Integer]
squares = map (^ 2) [0 ..]
sumsOfTwoSquares :: [Integer]
sumsOfTwoSquares
= nubOrderedList $ toOrderedList $
mergeOrderedByMin
[fromOrderedList (map (s +) squares) | s <- squares]
If you don't want to modify your code that much, you can use the nub function of Data.List.Ordered (installable by cabal install data-ordlist) to filter duplicates out.
It runs in linear time, ie complexity wise your algorithm won't change.
for your example [1..] the result is just [2..]. A "very smart compiler" could deduce this from the general solution with implicit heap, that follows.
gens xs is better expressed as
gens xs = map (\t#(x:_) -> map (x+) t) $ tails xs -- or should it be
-- map (\(x:ys) -> map (x+) ys) $ tails xs -- ?
Its resulting list of lists is easily merged without duplicates by tree-like folding1 (pictured here), with
pairsums xs = foldi (\(x:l) r-> x : union l r) $ gens xs
This assumes the input list is ordered in increasing order. If it's merely in non-decreasing order (with only finite runs of equals in it, of course), you'll need to slap an orderedNub on top of that (as m09 mentions),
pairsums' = orderedNub . pairsums
Just by using foldi where foldr would work, we often get an algorithmic improvement in complexity from a factor of n to log n, a pretty significant speedup. I use it as a general tool all the time.
1The code, adjusted for infinite lists only:
foldi f (x:xs) = f x (foldi f (pairs f xs))
pairs f (x:y:t) = f x y : pairs f t
union (x:xs) (y:ys) = case compare x y of
LT -> x : union xs (y:ys)
EQ -> x : union xs ys
GT -> y : union (x:xs) ys
See also:
mergesort as foldtree (by Heinrich Apfelmus)
infinite tree folding (by Dave Bayer)
Implicit Heap (by apfelmus)
I propose to build the pairs above the diagonal, that way a lot of duplicates are not even generated:
sums xs = zipWith (map . (+)) hs ts where
(hs:ts) = tails xs
Now you have a list of lists, each containing sorted sums. Because they are sorted, it is possible to determine the next element of the sequence in a finite number of steps:
filtermerge :: (Ord a) => [[a]]->[a]
filtermerge ((h:t):ts) = h : filtermerge (insert t ts) where
insert [] ts = ts
insert xs [] = [xs]
insert h ([]:t) = insert h t
insert (h:t) ts#((h1:t1):t2)
| h < h1 = (h:t):ts
| h == h1 = insert (h:t) $ insert t1 t2
| otherwise = insert (h1:t1) $ insert (h:t) t2
filtermerge _ = []

First non-repeating char in a string ? in haskell or F#

Given a sequence of char what is the most efficient way to find the first non repeating char
Interested purely functional implementation haskell or F# preffered.
A fairly straightforward use of Data.Set in combination with filter will do the job in an efficient one-liner. Since this seems homeworkish, I'm declining to provide the precise line in question :-)
The complexity should, I think, be O(n log m) where m is the number of distinct characters in the string and n is the total number of characters in the string.
A simple F# solution:
let f (s: string) =
let n = Map(Seq.countBy id s)
Seq.find (fun c -> n.[c] = 1) s
Here's an F# solution in O(n log n): sort the array, then for each character in the original array, binary search for it in the sorted array: if it's the only one of its kind, that's it.
open System
open System.IO
open System.Collections.Generic
let Solve (str : string) =
let arrStr = str.ToCharArray()
let sorted = Array.sort arrStr
let len = str.Length - 1
let rec Inner i =
if i = len + 1 then
'-'
else
let index = Array.BinarySearch(sorted, arrStr.[i])
if index = 0 && sorted.[index+1] <> sorted.[index] then
arrStr.[i]
elif index = len && sorted.[index-1] <> sorted.[index] then
arrStr.[i]
elif index > 0 && index < len &&
sorted.[index+1] <> sorted.[index] &&
sorted.[index-1] <> sorted.[index] then
arrStr.[i]
else
Inner (i + 1)
Inner 0
let _ =
printfn "%c" (Solve "abcdefabcf")
A - means all characters are repeated.
Edit: ugly hack with using the - for "no solution" as you can use Options, which I keep forgetting about! An exercise for the reader, as this does look like homework.
Here's a bit longish solution, but guaranteed to be worst-case O(n log n):
import List
import Data.Ord.comparing
sortPairs :: Ord a => [(a, b)]->[(a, b)]
sortPairs = sortBy (comparing fst)
index :: Integral b => [a] -> [(a, b)]
index = flip zip [1..]
dropRepeated :: Eq a => [(a, b)]->[(a, b)]
dropRepeated [] = []
dropRepeated [x] = [x]
dropRepeated (x:xs) | fst x == fst (head xs) =
dropRepeated $ dropWhile ((==(fst x)).fst) xs
| otherwise =
x:(dropRepeated xs)
nonRepeatedPairs :: Ord a => Integral b => [a]->[(a, b)]
nonRepeatedPairs = dropRepeated . sortPairs . index
firstNonRepeating :: Ord a => [a]->a
firstNonRepeating = fst . minimumBy (comparing snd) . nonRepeatedPairs
The idea is: sort the string lexicographically, so that it's easy to remove any repeated characters in linear time and find the first character which is not repeated. But in order to find it, we need to save information about characters' positions in text.
The speed on easy cases (like [1..10000]) is not perfect, but for something harder ([1..10000] ++ [1..10000] ++ [10001]) you can see the difference between this and a naive O(n^2).
Of course this can be done in linear time, if the size of alphabet is O(1), but who knows how large the alphabet is...
An alternate Haskell O(n log n) solution using Data.Map and no sorting:
module NonRepeat (
firstNonRepeat
)
where
import Data.List (minimumBy)
import Data.Map (fromListWith, toList)
import Data.Ord (comparing)
data Occurance = Occ { first :: Int, count :: Int }
deriving (Eq, Ord)
note :: Int -> a -> (a, Occurance)
note pos a = (a, Occ pos 1)
combine :: Occurance -> Occurance -> Occurance
combine (Occ p0 c0) (Occ p1 c1) = Occ (p0 `min` p1) (c0 + c1)
firstNonRepeat :: (Ord a) => [a] -> Maybe a
firstNonRepeat = fmap fst . findMinimum . occurances
where occurances = toList . fromListWith combine . zipWith note [0..]
findMinimum = safeMinimum . filter ((== 1).count.snd)
safeMinimum [] = Nothing
safeMinimum xs = Just $ minimumBy (comparing snd) xs
let firstNonRepeating (str:string) =
let rec inner i cMap =
if i = str.Length then
cMap
|> Map.filter (fun c (count, index) -> count = 1)
|> Map.toSeq
|> Seq.minBy (fun (c, (count, index)) -> index)
|> fst
else
let c = str.[i]
let value = if cMap.ContainsKey c then
let (count, index) = cMap.[c]
(count + 1, index)
else
(1, i)
let cMap = cMap.Add(c, value)
inner (i + 1) cMap
inner 0 (Map.empty)
Here is a simpler version that sacrifices speed.
let firstNonRepeating (str:string) =
let (c, count) = str
|> Seq.countBy (fun c -> c)
|> Seq.minBy (fun (c, count) -> count)
if count = 1 then Some c else None
How about something like this:
let firstNonRepeat s =
let repeats =
((Set.empty, Set.empty), s)
||> Seq.fold (fun (one,many) c -> Set.add c one, if Set.contains c one then Set.add c many else many)
|> snd
s
|> Seq.tryFind (fun c -> not (Set.contains c repeats))
This is pure C# (so I assume there's a similar F# version), which will be efficient if GroupBy is efficient (which it ought to be):
static char FstNonRepeatedChar(string s)
{
return s.GroupBy(x => x).Where(xs => xs.Count() == 1).First().First();
}

Resources