Functional Relational Operators, join-like, outside of queries - linq

Let's say I have two ordered collections of key-value-pairs, and I need to perform an inner equi-join on them to produce pairs of values: seq<'a * 'b> -> seq<'a * 'c> -> seq<'b * 'c>.
This is easily enough wired to the extension methods of System.Linq.Enumerable.
let foo abs acs =
System.Linq.Enumerable.Join(abs, acs,
(fun ab -> fst ab), (fun ac -> fst ac),
fun ab ac -> snd ab, snd ac )
foo ["A",1;"B",2] ["B",1;"C",2;"B",3]
// val it : System.Collections.Generic.IEnumerable<int * int> =
// seq [(2, 1); (2, 3)]
Translation of that functionality to F# is not that straightforward. To avoid filtering on the Cartesian product, I'm probably loosing some laziness/execution on demand.
let bar abs acs =
let d = dict(Seq.groupBy fst acs) in seq{
for a, b in abs do
let ok, cs = d.TryGetValue a
if ok then for c in cs -> b, snd c }
bar ["A",1;"B",2] ["B",1;"C",2;"B",3]
// val it : seq<int * int> = seq [(2, 1); (2, 3)]
Yet it feels crufty, like a hand-forged, bespoke solution to a particular sub-problem. How can I approach relational operators, possibly also natural, semi-, anti- or outer joins, in a more general, abstract way?

You can use a query expression to achieve the same thing.
query {
for a in ["A",1;"B",2] do
join b in ["B",1;"C",2;"B",3] on (fst a = fst b)
select (snd a, snd b) };;
val it : seq<int * int> = seq [(2, 1); (2, 3)]
Alternatively, there is nothing wrong with just using System.Linq.Enumerable.Join.

Related

How can I follow F# Lint's suggestion to use `id`

I am comparing two lists of thangs. Since I'm more familiar with Linq than F#, I did this:
let r1 = (rows1.Zip (rows2, fun r1 r2 -> rowComparer r1 r2)) .All (fun f -> f)
This raises two complaints from the F# linter.
Lint: If `rowComparer` has no mutable arguments partially applied then the lambda can be removed.
Lint: `fun x -> x` might be able to be refactored into `id`.
Of these, I could understand the latter, and tried this:
let r1 = (rows1.Zip (rows2, fun r1 r2 -> rowComparer r1 r2)) .All id
But this made the F# compiler complain:
This expression was expected to have type
'System.Func<bool,bool>'
but here has type
''a -> 'a'
Can someone say how this code can be more righteous?
I would suggest using the F# List or Seq modules instead of LINQ methods. Then you'll be able to use F# types like 'a -> 'a instead of System.Func<'a, 'a>, and you can pass id to the forAll function. If you could post a complete example, it would be easier to give you a complete answer, but I think something like this would be roughly equivalent to what you're doing with LINQ:
let compare (rowComparer: ('a * 'a) -> bool) rows =
Seq.zip rows >> Seq.map rowComparer >> Seq.forall id
This creates a function that takes two sequences and compares each value in the first to the corresponding value in the second, generating a sequence of booleans. It then returns true if all of the values in the sequence are true, otherwise it returns false. This is achieved using function composition and partial application to build a new function with the required signature.
You can then partially apply a row comparer function to create a specialized compare function for each of your scenarios, as follows:
let compareEqual = compare (fun (a,b) -> a = b)
compareEqual [0; 1; 2] [0; 1; 2] // true
compareEqual [0; 1; 2] [2; 1; 2] // false
You can supply the standard function id as an argument if you create an instance of System.Func with the correct number of generic type parameters from it. When employing a lambda expression, the F# compiler does that for you.
open System.Linq
let foo rowComparer (rows1 : seq<_>) (rows2 : seq<_>) =
(rows1.Zip (rows2, fun r1 r2 -> rowComparer r1 r2)).All(System.Func<_,_>(id))
// val foo :
// rowComparer:('a -> 'b -> bool) -> rows1:seq<'a> -> rows2:seq<'b> -> bool

Haskell groupBy depending on accumulator value

I have a list of pairs of views which represents list of content labels and their widths which I want to group in lines (if the next content label doesn't fit in line then put it into another line). So we have: viewList = [(View1, 45), (View2, 223.5), (View3, 14) (View4, 42)].
I want to write a function groupViews :: [a] -> [[a]] to group this list into a list of sublists where each sublist will contain only views with sum of widths less than the maximum specified width (let's say 250).
So for a sorted viewList this function will return : [[(View3, 14), (View4, 42), (View1, 45)],[(View2, 223.5)]]
It looks similar to groupBy. However, groupBy doesn't maintain an accumulator. I tried to use scanl + takeWhile(<250) combination but in this case I was able to receive only first valid sublist. Maybe use iterate + scanl + takeWhile somehow? But this looks very cumbersome and not functional at all. Any help will be much appreciated.
I would start with a recursive definition like this:
groupViews :: Double -> (a -> Double) -> [a] -> [[a]]
groupViews maxWidth width = go (0, [[]])
where
go (current, acc : accs) (view : views)
| current + width view <= maxWidth
= go (current + width view, (view : acc) : accs) views
| otherwise = go (width view, [view] : acc : accs) views
go (_, accs) []
= reverse $ map reverse accs
Invoked like groupViews 250 snd (sortOn snd viewList). The first thing I notice is that it can be represented as a left fold:
groupViews' maxWidth width
= reverse . map reverse . snd . foldl' go (0, [[]])
where
go (current, acc : accs) view
| current + width view <= maxWidth
= (current + width view, (view : acc) : accs)
| otherwise
= (width view, [view] : acc : accs)
I think this is fine, though you could factor it further if you like, into one scan to accumulate the widths modulo the max width, and another pass to group the elements into ascending runs. For example, here’s a version that works on integer widths:
groupViews'' maxWidth width views
= map fst
$ groupBy ((<) `on` snd)
$ zip views
$ drop 1
$ scanl (\ current view -> (current + width view) `mod` maxWidth) 0 views
And of course you can include the sort in these definitions instead of passing the sorted list from outside.
I don't know a clever way to do this just by combining functions from the standard library, but I do think you can do better than just implementing it from scratch.
This problem fits into a class of problems that I've seen before: "batch up items from this list somehow, and combine its items into batches according to some combination rule and some rule for deciding when a batch is too big". Years ago, when I was writing Clojure, I built a function that abstracted out this idea of batched combinations, just asking you to specify the rules for batching, and was able to use it in a surprising number of places.
Here's how I think it might be reimagined in Haskell:
glue :: Monoid a => (a -> Bool) -> [a] -> [a]
glue tooBig = go mempty
where go current [] = [current]
go current (x:xs) | tooBig x' = current : go x xs
| otherwise = go x' xs
where x' = current `mappend` x
If you had such a glue function already, you could build a simple data type with the appropriate Monoid instance (a list of objects and their cumulative sum), and then let glue do the heavy lifting:
import Data.Monoid (Sum(..))
data ViewGroup contents size = ViewGroup {totalSize :: size,
elements :: [(contents, size)]}
instance Monoid b => Monoid (ViewGroup a b) where
mempty = ViewGroup mempty []
mappend (ViewGroup lSize lElts) (ViewGroup rSize rElts) =
ViewGroup (lSize `mappend` rSize)
(lElts ++ rElts)
viewGroups = let views = [("a", 14), ("b", 42), ("c", 45), ("d", 223.5)]
in glue ((> 250) . totalSize) [ViewGroup (Sum width) [(x, Sum width)]
| (x, width) <- views]
main = print (viewGroups :: [ViewGroup String (Sum Double)])
[ViewGroup {totalSize = Sum {getSum = 101.0},
elements = [("a",Sum {getSum = 14.0}),
("b",Sum {getSum = 42.0}),
("c",Sum {getSum = 45.0})]},
ViewGroup {totalSize = Sum {getSum = 223.5},
elements = [("d",Sum {getSum = 223.5})]}]
On the one hand this looks like quite a bit of work for a simple function, but on the other it's rather nice to have a type that describes the cumulative summing you're doing, and Monoid instances are nice to have anyway...and after defining the type and the Monoid instance there's almost no work left to do in the calling of glue itself.
Well, I don't know, maybe it's still too much work, especially if you don't believe you can reuse that type. But I do think it's useful to recognize that this is a specific case of a more general problem, and try to solve the more general problem as well.
Given that groupBy and span themselves are defined by manual recursive functions, our modified functions will use the same mechanism.
Let us first define a general function groupAcc which takes an initial value for the accumulator, and then a function which takes an element in the list, the current accumulator state and potentially produces a new accumulated value (Nothing means the element is not accepted):
{-# LANGUAGE LambdaCase #-}
import Data.List (sortOn)
import Control.Arrow (first, second)
spanAcc :: z -> (a -> z -> Maybe z) -> [a] -> ((z, [a]), [a])
spanAcc z0 p = \case
xs#[] -> ((z0, xs), xs)
xs#(x:xs') -> case p x z0 of
Nothing -> ((z0, []), xs)
Just z1 -> first (\(z2, xt) -> (if null xt then z1 else z2, x : xt)) $
spanAcc z1 p xs'
groupAcc :: z -> (a -> z -> Maybe z) -> [a] -> [(z, [a])]
groupAcc z p = \case
[] -> [] ;
xs -> uncurry (:) $ second (groupAcc z p) $ spanAcc z p xs
For our specific problem, we define:
threshold :: (Num a, Ord a) => a -> a -> a -> Maybe a
threshold max a z0 = let z1 = a + z0 in if z1 < max then Just z1 else Nothing
groupViews :: (Ord z, Num z) => [(lab, z)] -> [[(lab, z)]]
groupViews = fmap snd . groupAcc 0 (threshold 250 . snd)
Which finally gives us:
groupFinal :: (Num a, Ord a) => [(lab, a)] -> [[(lab, a)]]
groupFinal = groupViews . sortOn snd
And ghci gives us:
> groupFinal [("a", 45), ("b", 223.5), ("c", 14), ("d", 42)]
[[("c",14.0),("d",42.0),("a",45.0)],[("b",223.5)]]
If we want to, we can simplify groupAcc by assuming that z is a Monoid wherefore mempty may be used, such that:
groupAcc2 :: Monoid z => (a -> z -> Maybe z) -> [a] -> [(z, [a])]
groupAcc2 p = \case
[] -> [] ;
xs -> let z = mempty in
uncurry (:) $ second (groupAcc z p) $ spanAcc z p xs

Round-robin algorithm in OCaml

This is the followup question of What's the grouping plan so that every two people are grouped together just once?
Basically, I implemented the Round robin algorithm.
By the algorithm, it can generate pairs list where each possible pair of elements are grouped together exactly once.
For example, we have a, b, c, d, then
On first day, we do
a b
c d
Then we group like [(a,c);(b,d)].
Then we round it clockwise like
a c
d b
Then we group like [(a,d);(c,b)].
Then we round it clockwise like
a d
b c
Then we group like [(a,b);(d,c)].
(Note, a is fixed all the time.)
Finally I can get
[(a,c);(b,d)]
[(a,d);(c,b)]
[(a,b);(d,c)]
Here are the ocaml code:
let split = List.fold_left (fun (l1, l2) x -> (l2, x::l1)) ([], [])
let round l1 l2 =
match List.rev l1, l2 with
| _, [] | [], _ -> raise Cant_round
| hd1::tl1, hd2::tl2 ->
hd2::(List.rev tl1), List.rev (hd1::List.rev tl2)
let rec robin fixed stopper acc = function
| _, [] | [], _ -> raise Cant_robin
| l1, (hd2::tl2 as l2) ->
if hd2 = stopper then acc
else robin fixed stopper ((List.combine (fixed::l1) l2)::acc) (round l1 l2)
let round_robin = function
| [] | _::[] -> raise Cant_round_robin
| hd::tl ->
let l1, l2 = in
match split tl with
| _, [] -> raise Cant_round_robin
| l1, (hd2::_ as l2) ->
robin hd hd2 ((List.combine (hd::l1) l2)::[]) (round l1 l2)
The code is quite straight forward following the algorithm. Is there a better implmentation?
let round_robin ~nplayers ~round i =
(* only works for an even number of players *)
assert (nplayers mod 2 = 0);
assert (0 <= round && round < nplayers - 1);
(* i is the position of a match,
at each round there are nplayers/2 matches *)
assert (0 <= i && i < nplayers / 2);
let last = nplayers - 1 in
let player pos =
if pos = last then last
else (pos + round) mod last
in
(player i, player (last - i))
let all_matches nplayers =
Array.init (nplayers - 1) (fun round ->
Array.init (nplayers / 2) (fun i ->
round_robin ~nplayers ~round i))
let _ = all_matches 6;;
(**
[|[|(0, 5); (1, 4); (2, 3)|];
[|(1, 5); (2, 0); (3, 4)|];
[|(2, 5); (3, 1); (4, 0)|];
[|(3, 5); (4, 2); (0, 1)|];
[|(4, 5); (0, 3); (1, 2)|]|]
*)
You don't need to compute the clockwise rotation by operating over actual data. You can represent it as picking indices in a fixed array (of things you rotate): after rotating the array t r times, the element at index i in the rotated array will be at index i+r in the original array, in fact (i+r) mod (Array.length t) to have wrap-around.
With this idea you could compute pairing without moving data around, simply incrementing a counter representing the number of rotations performed so far. In fact, you could probably even come up with a purely numerical solution that does not create any data structure (the array of things-to-rotate), and reasons on the various indices to apply this reasoning.
Although this question has been answered, but the correct answer is in an imperative way.
I finally found the following way to deal with round-robin algorithm simpler in functional way.
let round l1 l2 = let move = List.hd l2 in move::l1, (List.tl l2)#[move]
let combine m l1 l2 =
let rec comb i acc = function
|[], _ | _, [] -> acc
|_ when i >= m -> acc
|hd1::tl1, hd2::tl2 -> comb (i+1) ((hd1,hd2)::acc) (tl1,tl2)
in
comb 0 [] (l1,l2)
let round_robin l =
let fix = List.hd l in
let half = (List.length l)/2 in
List.fold_left (
fun (acc, (l1, l2)) _ -> (combine half (fix::l1) l2)::acc, round l1 l2
) ([], (List.tl l, List.rev l)) l |> fst |> List.tl

How make this piece of Haskell code more concise?

As practice, I am trying to write a simulation for the casino game "war" in Haskell.
http://en.wikipedia.org/wiki/Casino_war
It is a very simple game with a few rules. It would be an otherwise very simple problem to write in any of the imperative language I know, however I am struggling to write it in Haskell.
The code I have so far:
-- Simulation for the Casino War
import System.Random
import Data.Map
-------------------------------------------------------------------------------
-- stolen from the internet
fisherYatesStep :: RandomGen g => (Map Int a, g) -> (Int, a) -> (Map Int a, g)
fisherYatesStep (m, gen) (i, x) = ((insert j x . insert i (m ! j)) m, gen')
where
(j, gen') = randomR (0, i) gen
fisherYates :: RandomGen g => g -> [a] -> ([a], g)
fisherYates gen [] = ([], gen)
fisherYates gen l = toElems $ Prelude.foldl
fisherYatesStep (initial (head l) gen) (numerate (tail l))
where
toElems (x, y) = (elems x, y)
numerate = zip [1..]
initial x gen = (singleton 0 x, gen)
-------------------------------------------------------------------------------
data State = Deal | Tie deriving Show
-- state: game state
-- # cards to deal
-- # cards to burn
-- cards on the table
-- indices for tied players
-- # players
-- players winning
-- dealer's winning
type GameState = (State, Int, Int, [Int], [Int], Int, [Int], Int)
gameRound :: GameState -> Int -> GameState
gameRound (Deal, toDeal, toBurn, inPlay, tied, numPlayers, pWins, dWins) card
| toDeal > 0 =
-- not enough card, deal a card
(Deal, toDeal - 1, 0, card:inPlay, tied, numPlayers, pWins, dWins)
| toDeal == 0 =
-- enough cards in play now
-- here should detemine whether or not there is any ties on the table,
-- and go to the tie state
let
dealerCard = head inPlay
p = zipWith (+) pWins $ (tail inPlay) >>=
(\x -> if x < dealerCard then return (-1) else return 1)
d = if dealerCard == (maximum inPlay) then dWins + 1 else dWins - 1
in
(Deal, numPlayers + 1, 0, [], tied, numPlayers, p, d)
gameRound (Tie, toDeal, toBurn, inPlay, tied, numPlayers, pWins, dWins) card
-- i have no idea how to write the logic for the tie state AKA the "war" state
| otherwise = (Tie, toDeal, toBurn, inPlay, tied, numPlayers, pWins, dWins)
-------------------------------------------------------------------------------
main = do
rand <- newStdGen
-- create the shuffled deck
(deck, _) <- return $ fisherYates rand $ [2 .. 14] >>= (replicate 6)
-- fold the state updating function over the deck
putStrLn $ show $ Prelude.foldl gameRound
(Deal, 7, 0, [], [], 6, [0 ..], 0) deck
-------------------------------------------------------------------------------
I understand why extra work has to go towards creating random numbers, but I am pretty sure I am missing some basic construct or concept. It shouldn't be this awkward to keep a collection of states, and run a branching logic over a list of input. I couldn't even figure out a good way to write the logic for the case where there are ties on the table.
I am not asking for complete solutions. It would be real nice if someone could point out what I am doing wrong, or some good reading materials that are relevant.
Thanks in advance.
A useful design pattern for maintaining application state is the so called state monad. You can find a description and some introductory examples here. Also, you might want to consider using a data type with named fields instead of a tuple for GameState, for example:
data GameState = GameState { state :: State,
toDeal :: Int
-- and so on
}
This will make it easier to access/update individual fields using record syntax.
To make the code more readable, you should break up the structure of the game into meaningful components, and reorganizing your code accordingly. What you've done is to put all the game's state into one data structure. The result is that you have to deal with all the game details all the time.
The game keeps track of scores for each player and the dealer. Sometimes it adds 1 or subtracts 1 from a score. Scores aren't used for anything else. Separate out the score management from the other code:
-- Scores for each player and the dealer
data Score = Score [Int] Int
-- Outcome for each player and the dealer. 'True' means a round was won.
data Outcome = Outcome [Bool] Bool
startingScore :: Int -> Score
startingScore n = Score (replicate n 0) 0
updateScore :: Outcome -> Score -> Score
updateScore (Outcome ps d) (Score pss ds) = Score (zipWith upd pss pos) (update ds d)
where upd s True = s+1
upd s False = s-1
The cards dealt are also associated with players and the dealer. Winning or losing a round is based only on the card values. Separate out the score computation from the other code:
type Card = Int
data Dealt = Dealt [Card] Card
scoreRound :: Dealt -> Outcome
scoreRound (Dealt ps dealerCard) = Outcome (map scorePlayer ps) (dealerCard == maximumCard)
where
maximumCard = maximum (dealerCard : ps)
scorePlayer p = p >= dealerCard
I would say a game round consists of all steps needed to produce a single Outcome. Reorganize the code accordingly:
type Deck = [Card]
deal :: Int -> Deck -> (Dealt, Deck)
deal n d = (Dealt (take n d) (head $ drop n d), drop (n+1) d) -- Should check whether deck has enough cards
-- The 'input-only' parts of GameState
type GameConfig =
GameConfig {nPlayers :: Int}
gameRound :: GameConfig -> Deck -> (Deck, Outcome)
gameRound config deck = let
(dealt, deck') = deal (nPlayers config) deck
outcome = scoreRound dealt
in (deck', outcome)
This covers most of what was in the original code. You can approach the rest in a similar way.
The main idea you should get is that Haskell makes it easy to decompose programs into small pieces that are meaningful on their own. That is what makes code easier to work with.
Instead of putting everything into GameState, I created Score, Outcome, Dealt, and Deck. Some of these data types came from the original GameState. Others were not in the original code at all; they were implicit in the way complicated loops were organized. Instead of putting the entire game into gameRound, I created updateScore, scoreRound, deal, and other functions. Each of these interacts with only a few pieces of data.
It occurred to me that the recommendation 'use StateT' might be a little opaque so I translated a bit into that jargon, hoping you could see how to go from there. It might be best to include the state of the deck in the game state. gameround below just restates your function in StateT lingo. The previous definition, game uses the deck field of the game state, continuously reduced, and contains the whole game. I introduce IO actions, just to show how it's done, and so you can see the succession of states if you call main in ghci. You 'lift' IO actions into the StateT machinery, to put them on a level with the gets and puts. Note that in mose subcases, we put the new state and then call for the action to be repeated, so that the do block contains the complete recursive operation. (Tie and an empty deck end the game immediately.) Then in the last line of main we runStateT on this self-updating game yielding a function GameState -> IO (GameState,()); then we feed this with a certain starting state including the randomly determined deck to get the IO action which is the main business. (I don't follow how the game is supposed to work, but was mechanically moving things around to get the idea across.)
import Control.Monad.Trans.State
import Control.Monad.Trans
import System.Random
import Data.Map
data Stage = Deal | Tie deriving Show
data GameState =
GameState { stage :: Stage
, toDeal :: Int
, toBurn :: Int
, inPlay :: [Int]
, tied :: [Int]
, numPlayers :: Int
, pWins :: [Int]
, dWins :: Int
, deck :: [Int]} deriving Show
-- deck field is added for the `game` example
type GameRound m a = StateT GameState m a
main = do
rand <- newStdGen
let deck = fst $ fisherYates rand $ concatMap (replicate 6) [2 .. 14]
let startState = GameState Deal 7 0 [] [] 6 [0 ..100] 0 deck
runStateT game startState
game :: GameRound IO ()
game = do
st <- get
lift $ putStrLn "Playing: " >> print st
case deck st of
[] -> lift $ print "no cards"
(card:cards) ->
case (toDeal st, stage st) of
(0, Deal) -> do put (first_case_update st card cards)
game -- <-- recursive call with smaller deck
(_, Deal) -> do put (second_case_update st card cards)
game
(_, Tie) -> do lift $ putStrLn "This is a tie"
lift $ print st
where -- state updates:
-- I separate these out hoping this will make the needed sort
-- of 'logic' above clearer.
first_case_update s card cards=
s { numPlayers = numPlayers s + 1
, pWins = [if x < dealerCard then -1 else 1 |
x <- zipWith (+) (pWins s) (tail (inPlay s)) ]
, dWins = if dealerCard == maximum (inPlay s)
then dWins s + 1
else dWins s - 1
, deck = cards }
where dealerCard = head (inPlay s)
second_case_update s card cards =
s { toDeal = toDeal s - 1
, toBurn = 0
, inPlay = card : inPlay s
, deck = cards}
-- a StateTified formulation of your gameRound
gameround :: Monad m => Int -> GameRound m ()
gameround card = do
s <- get
case (toDeal s, stage s) of
(0, Deal) ->
put $ s { toDeal = numPlayers s + 1
, pWins = [if x < dealerCard then -1 else 1 |
x <- zipWith (+) (pWins s) (tail (inPlay s)) ]
, dWins = if dealerCard == maximum (inPlay s)
then dWins s + 1
else dWins s - 1}
where dealerCard = head (inPlay s)
(_, Deal) ->
put $ s { toDeal = toDeal s - 1
, toBurn = 0
, inPlay = card : inPlay s}
(_, Tie) -> return ()
fisherYatesStep :: RandomGen g => (Map Int a, g) -> (Int, a) -> (Map Int a, g)
fisherYatesStep (m, gen) (i, x) = ((insert j x . insert i (m ! j)) m, gen')
where
(j, gen') = randomR (0, i) gen
fisherYates :: RandomGen g => g -> [a] -> ([a], g)
fisherYates gen [] = ([], gen)
fisherYates gen l = toElems $ Prelude.foldl
fisherYatesStep (initial (head l) gen) (numerate (tail l))
where
toElems (x, y) = (elems x, y)
numerate = zip [1..]
initial x gen = (singleton 0 x, gen)

First non-repeating char in a string ? in haskell or F#

Given a sequence of char what is the most efficient way to find the first non repeating char
Interested purely functional implementation haskell or F# preffered.
A fairly straightforward use of Data.Set in combination with filter will do the job in an efficient one-liner. Since this seems homeworkish, I'm declining to provide the precise line in question :-)
The complexity should, I think, be O(n log m) where m is the number of distinct characters in the string and n is the total number of characters in the string.
A simple F# solution:
let f (s: string) =
let n = Map(Seq.countBy id s)
Seq.find (fun c -> n.[c] = 1) s
Here's an F# solution in O(n log n): sort the array, then for each character in the original array, binary search for it in the sorted array: if it's the only one of its kind, that's it.
open System
open System.IO
open System.Collections.Generic
let Solve (str : string) =
let arrStr = str.ToCharArray()
let sorted = Array.sort arrStr
let len = str.Length - 1
let rec Inner i =
if i = len + 1 then
'-'
else
let index = Array.BinarySearch(sorted, arrStr.[i])
if index = 0 && sorted.[index+1] <> sorted.[index] then
arrStr.[i]
elif index = len && sorted.[index-1] <> sorted.[index] then
arrStr.[i]
elif index > 0 && index < len &&
sorted.[index+1] <> sorted.[index] &&
sorted.[index-1] <> sorted.[index] then
arrStr.[i]
else
Inner (i + 1)
Inner 0
let _ =
printfn "%c" (Solve "abcdefabcf")
A - means all characters are repeated.
Edit: ugly hack with using the - for "no solution" as you can use Options, which I keep forgetting about! An exercise for the reader, as this does look like homework.
Here's a bit longish solution, but guaranteed to be worst-case O(n log n):
import List
import Data.Ord.comparing
sortPairs :: Ord a => [(a, b)]->[(a, b)]
sortPairs = sortBy (comparing fst)
index :: Integral b => [a] -> [(a, b)]
index = flip zip [1..]
dropRepeated :: Eq a => [(a, b)]->[(a, b)]
dropRepeated [] = []
dropRepeated [x] = [x]
dropRepeated (x:xs) | fst x == fst (head xs) =
dropRepeated $ dropWhile ((==(fst x)).fst) xs
| otherwise =
x:(dropRepeated xs)
nonRepeatedPairs :: Ord a => Integral b => [a]->[(a, b)]
nonRepeatedPairs = dropRepeated . sortPairs . index
firstNonRepeating :: Ord a => [a]->a
firstNonRepeating = fst . minimumBy (comparing snd) . nonRepeatedPairs
The idea is: sort the string lexicographically, so that it's easy to remove any repeated characters in linear time and find the first character which is not repeated. But in order to find it, we need to save information about characters' positions in text.
The speed on easy cases (like [1..10000]) is not perfect, but for something harder ([1..10000] ++ [1..10000] ++ [10001]) you can see the difference between this and a naive O(n^2).
Of course this can be done in linear time, if the size of alphabet is O(1), but who knows how large the alphabet is...
An alternate Haskell O(n log n) solution using Data.Map and no sorting:
module NonRepeat (
firstNonRepeat
)
where
import Data.List (minimumBy)
import Data.Map (fromListWith, toList)
import Data.Ord (comparing)
data Occurance = Occ { first :: Int, count :: Int }
deriving (Eq, Ord)
note :: Int -> a -> (a, Occurance)
note pos a = (a, Occ pos 1)
combine :: Occurance -> Occurance -> Occurance
combine (Occ p0 c0) (Occ p1 c1) = Occ (p0 `min` p1) (c0 + c1)
firstNonRepeat :: (Ord a) => [a] -> Maybe a
firstNonRepeat = fmap fst . findMinimum . occurances
where occurances = toList . fromListWith combine . zipWith note [0..]
findMinimum = safeMinimum . filter ((== 1).count.snd)
safeMinimum [] = Nothing
safeMinimum xs = Just $ minimumBy (comparing snd) xs
let firstNonRepeating (str:string) =
let rec inner i cMap =
if i = str.Length then
cMap
|> Map.filter (fun c (count, index) -> count = 1)
|> Map.toSeq
|> Seq.minBy (fun (c, (count, index)) -> index)
|> fst
else
let c = str.[i]
let value = if cMap.ContainsKey c then
let (count, index) = cMap.[c]
(count + 1, index)
else
(1, i)
let cMap = cMap.Add(c, value)
inner (i + 1) cMap
inner 0 (Map.empty)
Here is a simpler version that sacrifices speed.
let firstNonRepeating (str:string) =
let (c, count) = str
|> Seq.countBy (fun c -> c)
|> Seq.minBy (fun (c, count) -> count)
if count = 1 then Some c else None
How about something like this:
let firstNonRepeat s =
let repeats =
((Set.empty, Set.empty), s)
||> Seq.fold (fun (one,many) c -> Set.add c one, if Set.contains c one then Set.add c many else many)
|> snd
s
|> Seq.tryFind (fun c -> not (Set.contains c repeats))
This is pure C# (so I assume there's a similar F# version), which will be efficient if GroupBy is efficient (which it ought to be):
static char FstNonRepeatedChar(string s)
{
return s.GroupBy(x => x).Where(xs => xs.Count() == 1).First().First();
}

Resources