Suppose two Maps:
import qualified Data.Map as M
sparse1, sparse2 :: M.Map Int Float
sparse1 = M.fromList [(1,2.0),(10,3),(12,5),(100,7),(102,11)]
sparse2 = M.fromList [(2,13.0),(11,17),(12,19),(101,23),(102,29)]
How do you define an elegant function
combi :: M.Map Int Float -> M.Map Int Float -> Float
such that combi sparse1 sparse2 returns 414.0 (= 5 * 19 + 11 * 29), because 12 and 102 are the only common keys of the two maps? There is an elegant (simple and efficient) function with lists, since those would be strictly ordered:
combiList xs ys = cL xs ys 0
cL [] _ acc = acc
cL _ [] acc = acc
cL (x@(k,r):xs) (y@(k',r'):ys) acc
| k < k' = cL xs (y:ys) acc
| k == k' = cL xs ys (acc+r*r')
| k > k' = cL (x:xs) ys acc
But is
combi m1 m2 = combiList (M.toList m1) (M.toList m2)
a good idea, given that the lists are not used anywhere else in the code? And if not, how would you efficiently write combi without toList?
Using foldr and intersectionWith on the maps is a bit more elegant (and probably faster):
combi :: M.Map Int Float -> M.Map Int Float -> Float
combi x y = M.foldr (+) 0 $ M.intersectionWith (*) x y
combi sparse1 sparse2 returns 414.0 as desired.
And if you care about performance, try using Data.IntMap: it should be several times faster than Data.Map here.
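For instance, a minimal sketch of the IntMap variant (my addition, not from the answer; it assumes Data.IntMap from the containers package, whose foldr and intersectionWith mirror the Data.Map ones):
import qualified Data.IntMap as IM
-- same shape as combi above, but keyed on Int directly
combiIM :: IM.IntMap Float -> IM.IntMap Float -> Float
combiIM x y = IM.foldr (+) 0 $ IM.intersectionWith (*) x y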
I am implementing an approximate counting algorithm where we:
Maintain a counter X using log(log n) bits
Initialize X to 0
When an item arrives, increase X by 1 with probability (1/2)^X
When the stream is over, output 2^X − 1 so that E[2^X] = n + 1
My implementation is as follows:
import System.Random
type Prob = Double
type Tosses = Int
-- * for sake of simplicity we assume 0 <= p <= 1
tos :: Prob -> StdGen -> (Bool,StdGen)
tos p s = (q <= 100*p, s')
where (q,s') = randomR (1,100) s
toses :: Prob -> Tosses -> StdGen -> [(Bool,StdGen)]
toses _ 0 _ = []
toses p n s = let t@(b,s') = tos p s in t : toses p (pred n) s'
toses' :: Prob -> Tosses -> StdGen -> [Bool]
toses' p n = fmap fst . toses p n
morris :: StdGen -> [a] -> Int
morris s xs = go s xs 0 where
go _ [] n = n
go s (_:xs) n = go s' xs n' where
(h,s') = tos (0.5^n) s
n' = if h then succ n else n
main :: IO Int
main = do
s <- newStdGen
return $ morris s [1..10000]
The problem is that my X is always incorrect for any |stream| > 2, and it seems like for all StdGen and |stream| > 1000, X = 7
I tested the same algorithm in Matlab and it works there, so I assume it's either
an issue with my random number generator, or
raising 1/2 to a large n in Double
Please suggest a path forward.
The problem is actually very simple: with randomR (1,100) you preclude values within the first percent, so you have a complete cutoff at high powers of 1/2 (which all lie in that small interval). As a general rule, ranges should start at zero, not at one†, unless there's a specific reason otherwise.
But why even use a range of 100 in the first place? I'd just make it
tos :: Prob -> StdGen -> (Bool,StdGen)
tos p s = (q <= p, s')
where (q,s') = randomR (0,1) s
†I know, Matlab gets this wrong all over the place. Just one of the many horrible things about that language.
Unrelated to your problem: as chi remarked this kind of code looks a lot nicer if you use a suitable random monad, instead of manually passing around StdGens.
import Data.Random
import Data.Random.Source.Std
type Prob = Double
tos :: Prob -> RVar Bool
tos p = do
q <- uniform 0 1
return $ q <= p
morris :: [a] -> RVar Int
morris xs = go xs 0 where
go [] n = return n
go (_:xs) n = do
h <- tos (0.5^n)
go xs $ if h then succ n else n
morrisTest :: Int -> IO Int
morrisTest n = do
runRVar (morris [1..n]) StdRandom
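As a quick sanity check (my own sketch, not part of the answer; it reuses morrisTest above), one can average the estimates 2^X − 1 over many runs and compare against the true count, since E[2^X] = n + 1:
checkMorris :: IO ()
checkMorris = do
  let n    = 10000
      runs = 200
  xs <- mapM (const (morrisTest n)) [1 .. runs]
  -- each run yields a counter x; 2^x - 1 estimates the stream length
  let avg = fromIntegral (sum [2 ^ x - 1 | x <- xs]) / fromIntegral runs :: Double
  putStrLn $ "average estimate: " ++ show avg ++ " (true n = " ++ show n ++ ")"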
I was doing a few of the 99 Haskell Problems earlier and I thought that exercise 27 ("write a function to enumerate the possible combinations") was interesting as it's a simple concept and it lends itself to multiple implementations.
I was curious about relative efficiency so I decided to run a couple of different implementations - results are in the table below. (For reference: Emacs bash ansi-term in LXDE (Ubuntu 14.04) running on VirtualBox; Thinkpad X220; 8gb RAM, i5 64bit 2.4ghz.)
TL;DR:
(i) Why are combination-generating techniques #7 and #8 (from the table below; code included at bottom of post) so much faster than the rest?
(ii) Also, what do the figures in the Bytes column actually represent?
(i) It's odd because function #7 works by filtering the powerset (which is waaaay larger than the combinations list); I suspect this is laziness at work, i.e., that this is the function which is most effectively exploiting the fact that we've only asked for the length of the list and not the list itself. (Also, its 'memory usage' is lower than that of the other functions, but, then again, I'm not sure exactly what memory-related stat is being shown.)
Regarding function #8: kudos to Bergi for that ridiculously fast implementation and thanks to user5402 for suggesting the addition. Still trying to wrap my head around the speed difference of this one.
(ii) The figures in the Bytes column are reported by GHCi after running the :set +s command; they clearly don't represent max memory usage, as I only have ~25gb of RAM + free HD space.
Code:
import Data.List
--algorithms to generate combinations
--time required to compute the following: length $ <function> 13 "abcdefghijklmnopqrstuvwxyz"
--(90.14 secs, 33598933424 bytes)
combDC1 :: (Eq a) => Int -> [a] -> [[a]]
combDC1 n xs = filter (/= []) $ combHelper n n xs []
combHelper :: Int -> Int -> [a] -> [a] -> [[a]]
combHelper n _ [] chosen = if length chosen == n
then [chosen]
else [[]]
combHelper n i remaining chosen
| length chosen == n = [chosen]
| n - length chosen > length remaining = [[]]
| otherwise = combHelper n (i-1) (tail remaining) ((head remaining):chosen) ++
combHelper n i (tail remaining) chosen
--(167.63 secs, 62756587760 bytes)
combSoln1 :: Int -> [a] -> [([a],[a])]
combSoln1 0 xs = [([],xs)]
combSoln1 n [] = []
combSoln1 n (x:xs) = ts ++ ds
where
ts = [ (x:ys,zs) | (ys,zs) <- combSoln1 (n-1) xs ]
ds = [ (ys,x:zs) | (ys,zs) <- combSoln1 n xs ]
--(71.40 secs, 30480652480 bytes)
combSoln2 :: Int -> [a] -> [[a]]
combSoln2 0 _ = [ [] ]
combSoln2 n xs = [ y:ys | y:xs' <- tails xs
, ys <- combSoln2 (n-1) xs']
--(83.75 secs, 46168207528 bytes)
combSoln3 :: Int -> [a] -> [[a]]
combSoln3 0 _ = return []
combSoln3 n xs = do
y:xs' <- tails xs
ys <- combSoln3 (n-1) xs'
return (y:ys)
--(92.34 secs, 40541644232 bytes)
combSoln4 :: Int -> [a] -> [[a]]
combSoln4 0 _ = [[]]
combSoln4 n xs = [ xs !! i : x | i <- [0..(length xs)-1]
, x <- combSoln4 (n-1) (drop (i+1) xs) ]
--(90.63 secs, 33058536696 bytes)
combSoln5 :: Int -> [a] -> [[a]]
combSoln5 _ [] = [[]]
combSoln5 0 _ = [[]]
combSoln5 k (x:xs) = x_start ++ others
where x_start = [ x : rest | rest <- combSoln5 (k-1) xs ]
others = if k <= length xs then combSoln5 k xs else []
--(61.74 secs, 33053297832 bytes)
combSoln6 :: Int -> [a] -> [[a]]
combSoln6 0 _ = [[]]
combSoln6 _ [] = []
combSoln6 n (x:xs) = (map (x:) (combSoln6 (n-1) xs)) ++ (combSoln6 n xs)
--(8.41 secs, 10785499208 bytes)
combSoln7 k ns = filter ((k==).length) (subsequences ns)
--(3.15 secs, 2889815872 bytes)
subsequencesOfSize :: Int -> [a] -> [[a]]
subsequencesOfSize n xs = let l = length xs
in if n>l then [] else subsequencesBySize xs !! (l-n)
where
subsequencesBySize [] = [[[]]]
subsequencesBySize (x:xs) = let next = subsequencesBySize xs
in zipWith (++) ([]:next) (map (map (x:)) next ++ [[]])
You should also test the algorithm found in this SO answer:
subsequences of length n from list performance
subsequencesOfSize :: Int -> [a] -> [[a]]
subsequencesOfSize n xs = let l = length xs
in if n>l then [] else subsequencesBySize xs !! (l-n)
where
subsequencesBySize [] = [[[]]]
subsequencesBySize (x:xs) = let next = subsequencesBySize xs
in zipWith (++) ([]:next) (map (map (x:)) next ++ [[]])
On my machine I get the following timing and memory usage from ghci:
ghci> length $ combSoln7 13 "abcdefghijklmnopqrstuvwxyz"
10400600
(13.42 secs, 10783921008 bytes)
ghci> length $ subsequencesOfSize 13 "abcdefghijklmnopqrstuvwxyz"
10400600
(6.52 secs, 2889807480 bytes)
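For intuition (my own illustration, not from the original answer): the local helper subsequencesBySize groups the subsequences by size, largest first, so the !! (l-n) lookup selects exactly the size-n group and the other groups' elements are never demanded. Lifting the helper to the top level (hypothetically renamed bySize):
bySize :: [a] -> [[[a]]]
bySize [] = [[[]]]
bySize (x:xs) = let next = bySize xs
                in zipWith (++) ([]:next) (map (map (x:)) next ++ [[]])
ghci> bySize "abc"
[["abc"],["bc","ac","ab"],["c","b","a"],[""]]
ghci> bySize "abc" !! (3-2)  -- the size-2 subsequences
["bc","ac","ab"]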
fact :: (Integral a) => a -> a
fact n = product [1..n]
ncombs n k = -- to evaluate number of combinations
let n' = toInteger n
k' = toInteger k
in div (fact n') ((fact k') * (fact (n' - k')))
combinations :: Int -> [a] -> [[a]]
combinations 0 xs = [[]]
combinations 1 xs = [[x] | x <- xs]
combinations n xs =
let ps = reverse [0..n - 1]
inc (p:[])
| pn < length xs = pn:[]
| otherwise = p:[]
where pn = p + 1
inc (p:ps)
| pn < length xs = pn:ps
| (head psn) < length xs = inc ((head psn):psn)
| otherwise = (p:ps)
where pn = p + 1
psn = inc ps
amount = ncombs (length xs) n
pointers = take (fromInteger amount) (iterate inc ps)
c' xs ps = map (xs!!) (reverse ps)
in map (c' xs) pointers
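A quick check of the result and its enumeration order (my own GHCi session):
ghci> combinations 2 "abcd"
["ab","ac","ad","bc","bd","cd"]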
I am learning Haskell and found a comparably fast implementation. I had a hard time with the type system, with some functions requiring Ints, some fractional numbers, and some Integers. On my computer the fastest solution presented here takes about 6.1 seconds to run, and mine takes 2.9 to 3.5 seconds.
I am relatively new to Haskell, but I am trying to learn both by reading and by solving problems on Project Euler. I am currently trying to implement a function that takes an infinite list of integers and returns the ordered list of pairwise sums of elements in said list. I am really looking for solutions to the specific issue I am facing rather than advice on different strategies or approaches, but those are welcome as well: being a coder isn't just about knowing how to implement a strategy, it's also about choosing the best strategy available.
My approach relies on traversing an infinite list of infinite generators and retrieving elements in order, with several mathematical properties that are useful in implementing my solution.
If I were trying to obtain the sequence of pairwise sums of the natural numbers, for example, this would be my code:
myList :: [Integer]
myList = [1..]
myGens :: [[Integer]]
myGens = gens myList
where
gens = \xs -> map (\x -> [x+y|y<-(dropWhile (<x) xs)]) xs
Regardless of the number set used, provided that it is sorted, the following conditions hold:
∀ i ≥ 0, head (gens xs !! i) == 2 * (myList !! i)
∀ i, j, k ≥ 0, l > 0, ((gens xs !! i) !! j) < ((gens xs !! (i+k)) !! (j+l))
Special cases for the second condition are:
∀ i, j ≥ 0, ((gens xs !! i) !! j) < ((gens xs !! (i+1)) !! j)
∀ i, j ≥ 0, k > 0, ((gens xs !! i) !! j) < ((gens xs !! (i+k)) !! j)
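Concretely (my own illustration): for myList = [1..], each generator row starts at twice the corresponding element, and rows grow both downward and rightward, consistent with the conditions above:
ghci> take 4 (map (take 4) myGens)
[[2,3,4,5],[4,5,6,7],[6,7,8,9],[8,9,10,11]]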
Here is the particular code I am trying to modify:
stride :: [Integer] -> [Int] -> [[Integer]] -> [Integer]
stride xs cs xss = x : stride xs counts streams
where
(x,i) = step xs cs xss
counts = inc i cs
streams = chop i xss
step :: [Integer] -> [Int] -> [[Integer]] -> (Integer,Int)
step xs cs xss = pace xs (defer cs xss)
pace :: [Integer] -> [(Integer,Int)] -> (Integer,Int)
pace hs xs@((x,i):xt) = minim (x,i) hs xt
where
minim :: (Integer,Int) -> [Integer] -> [(Integer,Int)] -> (Integer,Int)
minim m _ [] = m
minim m@(g,i) hs (y@(h,n):ynt) | g > h && 2*(hs !! n) > h = y
| g > h = minim y hs ynt
| 2*(hs !! n) > g = m
| otherwise = minim m hs ynt
defer :: [Int] -> [[a]] -> [(a,Int)]
defer cs xss = (infer (zip cs (zip (map head xss) [0..])))
infer :: [(Int,(a,Int))] -> [(a,Int)]
infer [] = []
infer ((c,xi):xis) | c == 0 = xi:[]
| otherwise = xi:(infer (dropWhile (\(p,(q,r)) -> p>=c) xis))
The set in question I am using has the property that multiple distinct pairs produce an identical sum. I want an efficient method of handling all duplicate elements at once, in order to avoid an increased cost of computing all the pairwise sums up to N, as it requires M more tests if M is the number of duplicates.
Does anyone have any suggestions?
EDIT:
I made some changes to the code, independently of what was suggested, and would appreciate feedback on the relative efficiencies of my original code, my revised code, and the proposals so far.
stride :: [Integer] -> [Int] -> [[Integer]] -> [Integer]
stride xs cs xss = x : stride xs counts streams
where
(x,is) = step xs cs xss
counts = foldr (\i -> inc i) cs is
streams = foldr (\i -> chop i) xss is
step :: [Integer] -> [Int] -> [[Integer]] -> (Integer,[Int])
step xs cs xss = pace xs (defer cs xss)
pace :: [Integer] -> [(Integer,Int)] -> (Integer,[Int])
pace hs xs@((x,i):xt) = minim (x,[i]) hs xt
where
minim :: (Integer,[Int]) -> [Integer] -> [(Integer,Int)] -> (Integer,[Int])
minim m _ [] = m
minim m@(g,is@(i:_)) hs (y@(h,n):ynt) | g > h && 2*(hs !! n) > h = (h,[n])
| g > h = minim (h,[n]) hs ynt
| g == h && 2*(hs !! n) > h = (g,n:is)
| g == h = minim (g,n:is) hs ynt
| g < h && 2*(hs !! n) > g = m
| g < h = minim m hs ynt
Also, I left out the code for inc and chop:
alter :: (a->a) -> Int -> [a] -> [a]
alter f n xs = take n xs ++ [f (xs !! n)] ++ drop (n+1) xs
inc :: Int -> [Int] -> [Int]
inc = alter (1+)
chop :: Int -> [[a]] -> [[a]]
chop = alter (tail)
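For reference (my own examples): alter rebuilds the list with a function applied at one index, inc bumps a counter, and chop drops the head of the selected stream:
ghci> alter (1+) 1 [0,0,0]
[0,1,0]
ghci> chop 0 [[1,2],[3,4]]
[[2],[3,4]]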
I'm going to present a solution that uses an infinite pairing heap. We'll have logarithmic overhead per element constructed, but no one knows how to do better (in a model with comparison-based methods and real numbers).
The first bit of code is just the standard pairing heap.
module Queue where
import Data.Maybe (fromMaybe)
data Queue k = E
| T k [Queue k]
deriving Show
fromOrderedList :: (Ord k) => [k] -> Queue k
fromOrderedList [] = E
fromOrderedList [k] = T k []
fromOrderedList (k1 : ks'@(k2 : _ks''))
| k1 <= k2 = T k1 [fromOrderedList ks']
mergePairs :: (Ord k) => [Queue k] -> Queue k
mergePairs [] = E
mergePairs [q] = q
mergePairs (q1 : q2 : qs'') = merge (merge q1 q2) (mergePairs qs'')
merge :: (Ord k) => Queue k -> Queue k -> Queue k
merge (E) q2 = q2
merge q1 (E) = q1
merge q1@(T k1 q1's) q2@(T k2 q2's)
= if k1 <= k2 then T k1 (q2 : q1's) else T k2 (q1 : q2's)
deleteMin :: (Ord k) => Queue k -> Maybe (k, Queue k)
deleteMin (E) = Nothing
deleteMin (T k q's) = Just (k, mergePairs q's)
toOrderedList :: (Ord k) => Queue k -> [k]
toOrderedList q
= fromMaybe [] $
do (k, q') <- deleteMin q
return (k : toOrderedList q')
Note that fromOrderedList accepts infinite lists. I think this can be justified theoretically by pretending that the infinite list of descendants is effectively merged "just in time". This feels like the kind of thing that should be in the literature on purely functional data structures already, but I'm going to be lazy and not look right now.
The function mergeOrderedByMin takes this one step further and merges a potentially infinite list of queues, where the min element in each queue is nondecreasing. I don't think that we can reuse merge, since merge appears to be insufficiently lazy.
mergeOrderedByMin :: (Ord k) => [Queue k] -> Queue k
mergeOrderedByMin [] = E
mergeOrderedByMin (E : qs') = mergeOrderedByMin qs'
mergeOrderedByMin (T k q's : qs')
= T k (mergeOrderedByMin qs' : q's)
The next function removes duplicates from a sorted list. It's in the library that m09 suggested, but for the sake of completeness, I'll define it here.
nubOrderedList :: (Ord k) => [k] -> [k]
nubOrderedList [] = []
nubOrderedList [k] = [k]
nubOrderedList (k1 : ks'@(k2 : _ks''))
| k1 < k2 = k1 : nubOrderedList ks'
| k1 == k2 = nubOrderedList ks'
Finally, we put it all together. I'll use the squares as an example.
squares :: [Integer]
squares = map (^ 2) [0 ..]
sumsOfTwoSquares :: [Integer]
sumsOfTwoSquares
= nubOrderedList $ toOrderedList $
mergeOrderedByMin
[fromOrderedList (map (s +) squares) | s <- squares]
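As a quick check (my own GHCi session):
ghci> take 10 sumsOfTwoSquares
[0,1,2,4,5,8,9,10,13,16]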
If you don't want to modify your code that much, you can use the nub function of Data.List.Ordered (installable by cabal install data-ordlist) to filter duplicates out.
It runs in linear time, i.e. complexity-wise your algorithm won't change.
For your example [1..] the result is just [2..]. A "very smart compiler" could deduce this from the general solution with an implicit heap, which follows.
gens xs is better expressed as
gens xs = map (\t@(x:_) -> map (x+) t) $ tails xs -- or should it be
-- map (\(x:ys) -> map (x+) ys) $ tails xs -- ?
Its resulting list of lists is easily merged without duplicates by tree-like folding¹ (pictured here), with
pairsums xs = foldi (\(x:l) r -> x : union l r) $ gens xs
This assumes the input list is ordered in increasing order. If it's merely in non-decreasing order (with only finite runs of equals in it, of course), you'll need to slap an orderedNub on top of that (as m09 mentions),
pairsums' = orderedNub . pairsums
Just by using foldi where foldr would work, we often get an algorithmic improvement in complexity from a factor of n to log n, a pretty significant speedup. I use it as a general tool all the time.
¹ The code, adjusted for infinite lists only:
foldi f (x:xs) = f x (foldi f (pairs f xs))
pairs f (x:y:t) = f x y : pairs f t
union (x:xs) (y:ys) = case compare x y of
LT -> x : union xs (y:ys)
EQ -> x : union xs ys
GT -> y : union (x:xs) ys
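As a quick check (my own session, with gens, foldi, and union as above):
ghci> take 10 $ pairsums [1..]
[2,3,4,5,6,7,8,9,10,11]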
See also:
mergesort as foldtree (by Heinrich Apfelmus)
infinite tree folding (by Dave Bayer)
Implicit Heap (by apfelmus)
I propose to build only the pairs above the diagonal; that way, a lot of duplicates are never generated at all:
sums xs = zipWith (map . (+)) hs ts where
(hs:ts) = tails xs
Now you have a list of lists, each containing sorted sums. Because they are sorted, it is possible to determine the next element of the sequence in a finite number of steps:
filtermerge :: (Ord a) => [[a]]->[a]
filtermerge ((h:t):ts) = h : filtermerge (insert t ts) where
insert [] ts = ts
insert xs [] = [xs]
insert h ([]:t) = insert h t
insert (h:t) ts@((h1:t1):t2)
| h < h1 = (h:t):ts
| h == h1 = insert (h:t) $ insert t1 t2
| otherwise = insert (h1:t1) $ insert (h:t) t2
filtermerge _ = []
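A quick check (my own session): with only the above-diagonal pairs of [1..], the smallest sum is 1+2 = 3, and every larger integer appears exactly once after merging:
ghci> take 8 $ filtermerge (sums [1..])
[3,4,5,6,7,8,9,10]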
I need to make a function that takes a list and an element and returns a list in which the first occurrence of the element is removed: something like
removeFst [1,5,2,3,5,3,4,5,6] 5
[1,2,3,5,3,4,5,6]
What I tried is:
main :: IO()
main = do
putStr ( show $ removeFst [1,5,2,3,5,3,4,5,6] 5)
removeFst :: [Int] -> Int -> [Int]
removeFst [] m = []
removeFst [x] m
| x == m = []
| otherwise = [x]
removeFst (x:xs) m
| x == m = xs
| otherwise = removeFst xs m
But this doesn't work: it also drops all the elements before the first occurrence. I think I should make the recursive call rebuild the list, something like:
removeFst (x:xs) m
| x == m = xs
| otherwise = removeFst (-- return the whole list till element x) m
You are very close; what you're missing is prepending the elements before the first found m to the result list:
removeFst :: [Int] -> Int -> [Int]
removeFst [] m = []
removeFst (x:xs) m
| x == m = xs
| otherwise = x : removeFst xs m
-- ^^^ keep x /= m
Note that the special case for one-element lists is superfluous.
Also note that removeFst = flip delete with delete from Data.List.
It should be mentioned that your function is equivalent to Data.List.delete.
Here another version:
import Data.List
removeFst xs x = front ++ drop 1 back where
(front, back) = break (==x) xs
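A quick check of this version (my own session):
ghci> removeFst [1,5,2,3,5,3,4,5,6] 5
[1,2,3,5,3,4,5,6]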
Given a sequence of char what is the most efficient way to find the first non repeating char
I'm interested in a purely functional implementation; Haskell or F# preferred.
A fairly straightforward use of Data.Set in combination with filter will do the job in an efficient one-liner. Since this seems homeworkish, I'm declining to provide the precise line in question :-)
The complexity should, I think, be O(n log m) where m is the number of distinct characters in the string and n is the total number of characters in the string.
A simple F# solution:
let f (s: string) =
let n = Map(Seq.countBy id s)
Seq.find (fun c -> n.[c] = 1) s
Here's an F# solution in O(n log n): sort the array, then for each character in the original array, binary search for it in the sorted array: if it's the only one of its kind, that's it.
open System
open System.IO
open System.Collections.Generic
let Solve (str : string) =
let arrStr = str.ToCharArray()
let sorted = Array.sort arrStr
let len = str.Length - 1
let rec Inner i =
if i = len + 1 then
'-'
else
let index = Array.BinarySearch(sorted, arrStr.[i])
if index = 0 && sorted.[index+1] <> sorted.[index] then
arrStr.[i]
elif index = len && sorted.[index-1] <> sorted.[index] then
arrStr.[i]
elif index > 0 && index < len &&
sorted.[index+1] <> sorted.[index] &&
sorted.[index-1] <> sorted.[index] then
arrStr.[i]
else
Inner (i + 1)
Inner 0
let _ =
printfn "%c" (Solve "abcdefabcf")
A '-' means all characters are repeated.
Edit: using '-' for "no solution" is an ugly hack; you could use Options instead, which I keep forgetting about! Left as an exercise for the reader, as this does look like homework.
Here's a bit longish solution, but guaranteed to be worst-case O(n log n):
import Data.List
import Data.Ord (comparing)
sortPairs :: Ord a => [(a, b)] -> [(a, b)]
sortPairs = sortBy (comparing fst)
index :: Integral b => [a] -> [(a, b)]
index = flip zip [1..]
dropRepeated :: Eq a => [(a, b)] -> [(a, b)]
dropRepeated [] = []
dropRepeated [x] = [x]
dropRepeated (x:xs)
  | fst x == fst (head xs) = dropRepeated $ dropWhile ((== fst x) . fst) xs
  | otherwise = x : dropRepeated xs
nonRepeatedPairs :: (Ord a, Integral b) => [a] -> [(a, b)]
nonRepeatedPairs = dropRepeated . sortPairs . index
firstNonRepeating :: Ord a => [a] -> a
firstNonRepeating = fst . minimumBy (comparing snd) . nonRepeatedPairs
The idea is: sort the string lexicographically, so that it's easy to remove any repeated characters in linear time and find the first character which is not repeated. But in order to find it, we need to save information about the characters' positions in the text.
The speed on easy cases (like [1..10000]) is not perfect, but for something harder ([1..10000] ++ [1..10000] ++ [10001]) you can see the difference between this and a naive O(n^2).
Of course this can be done in linear time, if the size of alphabet is O(1), but who knows how large the alphabet is...
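A quick check (my own session):
ghci> firstNonRepeating "abcdefabcf"
'd'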
An alternate Haskell O(n log n) solution using Data.Map and no sorting:
module NonRepeat (
firstNonRepeat
)
where
import Data.List (minimumBy)
import Data.Map (fromListWith, toList)
import Data.Ord (comparing)
data Occurance = Occ { first :: Int, count :: Int }
deriving (Eq, Ord)
note :: Int -> a -> (a, Occurance)
note pos a = (a, Occ pos 1)
combine :: Occurance -> Occurance -> Occurance
combine (Occ p0 c0) (Occ p1 c1) = Occ (p0 `min` p1) (c0 + c1)
firstNonRepeat :: (Ord a) => [a] -> Maybe a
firstNonRepeat = fmap fst . findMinimum . occurances
where occurances = toList . fromListWith combine . zipWith note [0..]
findMinimum = safeMinimum . filter ((== 1).count.snd)
safeMinimum [] = Nothing
safeMinimum xs = Just $ minimumBy (comparing snd) xs
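A quick check (my own session):
ghci> firstNonRepeat "abcdefabcf"
Just 'd'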
let firstNonRepeating (str:string) =
let rec inner i cMap =
if i = str.Length then
cMap
|> Map.filter (fun c (count, index) -> count = 1)
|> Map.toSeq
|> Seq.minBy (fun (c, (count, index)) -> index)
|> fst
else
let c = str.[i]
let value = if cMap.ContainsKey c then
let (count, index) = cMap.[c]
(count + 1, index)
else
(1, i)
let cMap = cMap.Add(c, value)
inner (i + 1) cMap
inner 0 (Map.empty)
Here is a simpler version that sacrifices speed.
let firstNonRepeating (str:string) =
let (c, count) = str
|> Seq.countBy (fun c -> c)
|> Seq.minBy (fun (c, count) -> count)
if count = 1 then Some c else None
How about something like this:
let firstNonRepeat s =
let repeats =
((Set.empty, Set.empty), s)
||> Seq.fold (fun (one,many) c -> Set.add c one, if Set.contains c one then Set.add c many else many)
|> snd
s
|> Seq.tryFind (fun c -> not (Set.contains c repeats))
This is pure C# (so I assume there's a similar F# version), which will be efficient if GroupBy is efficient (which it ought to be):
static char FstNonRepeatedChar(string s)
{
return s.GroupBy(x => x).Where(xs => xs.Count() == 1).First().First();
}