Find duplicates in an unsorted sequence efficiently - performance

I need a very efficient way to find duplicates in an unsorted sequence. This is what I came up with, but it has a few shortcomings, namely it
unnecessarily counts occurrences beyond 2
consumes the entire sequence before yielding duplicates
creates several intermediate sequences
module Seq =
let duplicates items =
items
|> Seq.countBy id
|> Seq.filter (snd >> ((<) 1))
|> Seq.map fst
Regardless of the shortcomings, I don't see a reason to replace this with twice the code. Is it possible to improve this with comparably concise code?

A more elegant functional solution:
let duplicates xs =
Seq.scan (fun xs x -> Set.add x xs) Set.empty xs
|> Seq.zip xs
|> Seq.choose (fun (x, xs) -> if Set.contains x xs then Some x else None)
Uses scan to accumulate sets of all elements seen so far. Then uses zip to combine each element with the set of elements before it. Finally, uses choose to filter out the elements that are in the set of previously-seen elements, i.e. the duplicates.
EDIT
Actually my original answer was completely wrong. Firstly, you don't want duplicates in your outputs. Secondly, you want performance.
Here is a purely functional solution that implements the algorithm you're after:
let duplicates xs =
(Map.empty, xs)
||> Seq.scan (fun xs x ->
match Map.tryFind x xs with
| None -> Map.add x false xs
| Some false -> Map.add x true xs
| Some true -> xs)
|> Seq.zip xs
|> Seq.choose (fun (x, xs) ->
match Map.tryFind x xs with
| Some false -> Some x
| None | Some true -> None)
This uses a map to track whether each element has been seen before once or many times and then emits the element if it is seen having only been seen once before, i.e. the first time it is duplicated.
Here is a faster imperative version:
let duplicates (xs: _ seq) =
seq { let d = System.Collections.Generic.Dictionary(HashIdentity.Structural)
let e = xs.GetEnumerator()
while e.MoveNext() do
let x = e.Current
let mutable seen = false
if d.TryGetValue(x, &seen) then
if not seen then
d.[x] <- true
yield x
else
d.[x] <- false }
This is around 2× faster than any of your other answers (at the time of writing).
Using a for x in xs do loop to enumerate the elements in a sequence is substantially slower than using GetEnumerator directly but generating your own Enumerator is not significantly faster than using a computation expression with yield.
Note that the TryGetValue member of Dictionary allows me to avoid allocation in the inner loop by mutating a stack allocated value whereas the TryGetValue extension member offered by F# (and used by kvb in his/her answer) allocates its return tuple.

Here's an imperative solution (which is admittedly slightly longer):
let duplicates items =
seq {
let d = System.Collections.Generic.Dictionary()
for i in items do
match d.TryGetValue(i) with
| false,_ -> d.[i] <- false // first observance
| true,false -> d.[i] <- true; yield i // second observance
| true,true -> () // already seen at least twice
}

This is the best "functional" solution I could come up with that doesn't consume the entire sequence up front.
let duplicates =
Seq.scan (fun (out, yielded:Set<_>, seen:Set<_>) item ->
if yielded.Contains item then
(None, yielded, seen)
else
if seen.Contains item then
(Some(item), yielded.Add item, seen.Remove item)
else
(None, yielded, seen.Add item)
) (None, Set.empty, Set.empty)
>> Seq.Choose (fun (x,_,_) -> x)

Assuming your sequence is finite, this solution requires one run on the sequence:
open System.Collections.Generic
let duplicates items =
let dict = Dictionary()
items |> Seq.fold (fun acc item ->
match dict.TryGetValue item with
| true, 2 -> acc
| true, 1 -> dict.[item] <- 2; item::acc
| _ -> dict.[item] <- 1; acc) []
|> List.rev
You can provide length of the sequence as the capacity of Dictionary, but it requires to enumerate the whole sequence once more.
EDIT:
To resolve 2nd problem, one could generate duplicates on demand:
open System.Collections.Generic
let duplicates items =
seq {
let dict = Dictionary()
for item in items do
match dict.TryGetValue item with
| true, 2 -> ()
| true, 1 -> dict.[item] <- 2; yield item
| _ -> dict.[item] <- 1
}

Functional solution:
let duplicates items =
let test (unique, result) v =
if not(unique |> Set.contains v) then (unique |> Set.add v ,result)
elif not(result |> Set.contains v) then (unique,result |> Set.add v)
else (unique, result)
items |> Seq.fold test (Set.empty, Set.empty) |> snd |> Set.toSeq

Related

Flip dimensions of a 2-dimensional list (or list of lists), where each sublist has equal length

I have a list of lists, where each inner list has the same length, and I need to project that as its inverse (that is, I need the dimensions flipped).
In other words, take the first item of each sublist and put that in a new list, take the second item of each sublist and put it in a new list etc. Then return a list of all these new lists.
Example: if input is:
let ls = [[1;2;3];[4;5;6];[7;8;9];[0;0;0]];;
Then output is supposed to be:
val it : int list list = [[1; 4; 7; 0]; [2; 5; 8; 0]; [3; 6; 9; 0]]
I have working code, but it doesn't feel right. It traverses multiple times over the lists, needs to do a List.rev multiple times and has to check for empty on the inner lists:
let rec getInnerHeads acc skipped lst =
match lst with
| [] -> List.rev acc, List.rev skipped
| item::rest ->
match item with
| [] -> [], skipped
| innerHead::skip1 ->
getInnerHeads (innerHead::acc) (skip1::skipped) rest
let rec flipDimensions acc lst =
match lst with
| [] -> acc |> List.rev
| z when (z |> List.forall List.isEmpty) -> acc |> List.rev
| rest ->
let (elem, skip1Elems) = getInnerHeads [] [] rest
flipDimensions (elem::acc) skip1Elems
The only upside of above code is that it is rail-recursive (at least I think it is).
Anybody has a more efficient, or succinct, or both algorithm? I checked F# Snippets and SO, figured this would've been asked before, it seems so common, but I didn't find any examples.
Maybe something with List.unfold?
let transpose matrix =
let rec loop acc = function
| (_::_)::_ as m -> loop (List.map List.head m :: acc) (List.map List.tail m)
| _ -> List.rev acc
loop [] matrix

Elegant Array.multipick(?) implementation

I'd like to implement something akin to imaginary Array.multipick:
Array.multipick : choosers:('a -> bool) [] -> array:'a [] -> 'a []
Internally, we test each array's element with all choosers, the first chooser to return true is removed from choosers array, and we add that chooser's argument to the result. After that, we continue interation while choosers array has elements left.
The last part is important, because without early exit requirement this could be solved with just Array.fold.
This could be easily implemented with something like:
let rec impl currentIndex currentChoosers results
But it's too procedural for my taste. Maybe there's more elegant solution?
It's quite difficult to write elegant code using arrays of changing size. Here is some code that works on lists instead and does not mutate any values.
let rec pick accum elem tried = function
| [] -> (accum, List.rev tried)
| chooser :: rest ->
if chooser elem then (elem :: accum, List.rev_append tried rest)
else pick accum elem (chooser :: tried) rest
let rec multipick_l accum choosers list =
match choosers, list with
| [], _
| _, [] -> List.rev accum
| _, elem :: elems ->
let (accum', choosers') = pick accum elem [] choosers in
multipick_l accum' choosers' elems
let multipick choosers array =
Array.of_list
(multipick_l [] (Array.to_list choosers) (Array.to_list array))
If you think that Array.fold_left is usable except for the early exit requirement, you can use an exception to exit early.
A fold with an early exit is a good idea, however a production-worthy one specifically targeting arrays would need to be written in a fairly imperative manner. For simplicity, I'll grab the more general sequence one from this answer.
let multipick (choosers: ('a -> bool) array) (arr: 'a array) : 'a array =
let indexed =
choosers
|> Seq.indexed
|> Map.ofSeq
((indexed, []), arr)
||> foldWhile (fun (cs, res) e ->
if Map.isEmpty cs then
None
else
match cs |> Seq.tryFind (fun kvp -> kvp.Value e) with
| Some kvp -> Some (Map.remove kvp.Key cs, e :: res)
| None -> Some (cs, res))
|> snd
|> List.rev
|> Array.ofList
I'm using a Map keyed by array index to keep track of remaining functions - this allows for easy removal of elements, but still retains their order (since map key-value pairs are ordered by keys when iterating).
F# Set wouldn't work with functions due to comparison constraint. System.Collections.Generic.HashSet would work, but it's mutable, and I'm not sure if it would retain ordering.

How to Create the Power Set (Combinations) of the Infinite Set in F# using Sequences?

Here is my failed attempt at the problem any help would be appreciated.
I tried to come up with the best algo for the power set that worked on eager lists. This part seems to be working fine. The part I'm having trouble with is translating it to work with Sequences so it can run it on streaming\infinite lists. I really don't like the yield syntax maybe because I don't understand it well but I would rather have it without using the yield syntax as well.
//All Combinations of items in a list
//i.e. the Powerset given each item is unique
//Note: lists are eager so can't be used for infinite
let listCombinations xs =
List.fold (fun acc x ->
List.collect (fun ys -> ys::[x::ys]) acc) [[]] xs
//This works fine (Still interested if it could be faster)
listCombinations [1;2;3;4;5] |> Seq.iter (fun x -> printfn "%A" x)
//All Combinations of items in a sequence
//i.e. the Powerset given each item is unique
//Note: Not working
let seqCombinations xs =
Seq.fold (fun acc x ->
Seq.collect (fun ys ->
seq { yield ys
yield seq { yield x
yield! ys} }) acc) Seq.empty xs
//All Combinations of items in a sequence
//i.e. the Powerset given each item is unique
//Note: Not working (even wrong type signature)
let seqCombinations2 xs =
Seq.fold (fun acc x ->
Seq.collect (fun ys ->
Seq.append ys (Seq.append x ys)) acc) Seq.empty xs
//Sequences to test on
let infiniteSequence = Seq.initInfinite (fun i -> i + 1)
let finiteSequence = Seq.take 5 infiniteSequence
//This should work easy since its in a finite sequence
//But it does not, so their must be a bug in 'seqCombinations' above
for xs in seqCombinations finiteSequence do
for y in xs do
printfn "%A" y
//This one is much more difficult to get to work
//since its the powerset on the infinate sequence
//None the less If someone could help me find a way to make this work
//This is my ultimate goal
let firstFew = Seq.take 20 (seqCombinations infiniteSequence)
for xs in firstFew do
for y in xs do
printfn "%A" y
Your seqCombinations is almost correct, but you didn't translate it from lists to sequences properly. The equivalent of [[]] is not Seq.empty, but Seq.singleton Seq.empty:
let seqCombinations xs =
Seq.fold (fun acc x ->
Seq.collect (fun ys ->
seq { yield ys
yield seq { yield x
yield! ys} }) acc) (Seq.singleton Seq.empty) xs
The code above works for finite sequences. But for infinite ones, it doesn't work, because it first tries to reach the end, which it obviously never does for infinite sequences.
If you want a function that will work with infinite sequences I managed to figure out two ways, but neither of them is particularly nice. One of them uses mutable state:
let seqCombinations xs =
let combs = ref [[]]
seq {
yield! !combs
for x in xs do
let added = List.map (fun ys -> x::ys) !combs
yield! added
combs := !combs # added
}
The other is too much about dealing with details of seq<T>:
open System.Collections.Generic
let seqCombinations (xs : seq<_>) =
let rec combs acc (e : IEnumerator<_>) =
seq {
if (e.MoveNext()) then
let added = List.map (fun ys -> (e.Current)::ys) acc
yield! added
yield! combs (acc # added) e }
use enumerator = xs.GetEnumerator()
seq {
yield []
yield! combs [[]] enumerator
}
I think this would be much easier if you could treat infinite sequences as head and tail, like finite lists in F# or any sequence in Haskell. But it's certainly possible there is a nice way to express this in F#, and I just didn't find it.
I've asked a similar question recently at Generate powerset lazily and got some nice answers.
For powerset of finite sets, the answer by #Daniel in the above link is an efficient solution and probably suits your purpose. You can come up with a test case to compare between his approach and yours.
Regarding powerset of infinite sets, here is a bit of maths. According to Cantor's theorem, the power set of a countably infinite set is uncountably infinite. It means there is no way to enumerate powerset of all integers (which is countably infinite) even in a lazy way. The intuition is the same for real numbers; since real number is uncountably infinite, we can't actually model them using infinite sequences.
Therefore, there is no algorithm to enumerate powerset of a countably infinite set. Or that kind of algorithm just doesn't make sense.
This is sort of a joke, but will actually generate the correct result for an infinite sequence (it's just that it can't be proven--empirically, not mathematically).
let powerset s =
seq {
yield Seq.empty
for x in s -> seq [x]
}

F# insert/remove item from list

How should I go about removing a given element from a list? As an example, say I have list ['A'; 'B'; 'C'; 'D'; 'E'] and want to remove the element at index 2 to produce the list ['A'; 'B'; 'D'; 'E']? I've already written the following code which accomplishes the task, but it seems rather inefficient to traverse the start of the list when I already know the index.
let remove lst i =
let rec remove lst lst' =
match lst with
| [] -> lst'
| h::t -> if List.length lst = i then
lst' # t
else
remove t (lst' # [h])
remove lst []
let myList = ['A'; 'B'; 'C'; 'D'; 'E']
let newList = remove myList 2
Alternatively, how should I insert an element at a given position? My code is similar to the above approach and most likely inefficient as well.
let insert lst i x =
let rec insert lst lst' =
match lst with
| [] -> lst'
| h::t -> if List.length lst = i then
lst' # [x] # lst
else
insert t (lst' # [h])
insert lst []
let myList = ['A'; 'B'; 'D'; 'E']
let newList = insert myList 2 'C'
Removing element at the specified index isn't a typical operation in functional programming - that's why it seems difficult to find the right implementation of these operations. In functional programming, you'll usually process the list element-by-element using recursion, or implement the processing in terms of higher-level declarative operations. Perhaps if you could clarfiy what is your motivation, we can give a better answer.
Anyway, to implement the two operations you wanted, you can use existing higher-order functions (that traverse the entire list a few times, because there is really no good way of doing this without traversing the list):
let removeAt index input =
input
// Associate each element with a boolean flag specifying whether
// we want to keep the element in the resulting list
|> List.mapi (fun i el -> (i <> index, el))
// Remove elements for which the flag is 'false' and drop the flags
|> List.filter fst |> List.map snd
To insert element to the specified index, you could write:
let insertAt index newEl input =
// For each element, we generate a list of elements that should
// replace the original one - either singleton list or two elements
// for the specified index
input |> List.mapi (fun i el -> if i = index then [newEl; el] else [el])
|> List.concat
However, as noted earlier - unless you have a very good reasons for using these functions, you should probably consider describing your goals more broadly and use an alternative (more functional) solution.
Seems the most idiomatic (not tail recursive):
let rec insert v i l =
match i, l with
| 0, xs -> v::xs
| i, x::xs -> x::insert v (i - 1) xs
| i, [] -> failwith "index out of range"
let rec remove i l =
match i, l with
| 0, x::xs -> xs
| i, x::xs -> x::remove (i - 1) xs
| i, [] -> failwith "index out of range"
it seems rather inefficient to
traverse the start of the list when I
already know the index.
F# lists are singly-linked lists, so you don't have indexed access to them. But most of the time, you don't need it. The majority of indexed operations on arrays are iteration from front to end, which is exactly the most common operation on immutable lists. Its also pretty common to add items to the end of an array, which isn't really the most efficient operation on singly linked lists, but most of the time you can use the "cons and reverse" idiom or use an immutable queue to get the same result.
Arrays and ResizeArrays are really the best choice if you need indexed access, but they aren't immutable. A handful of immutable data structures like VLists allow you to create list-like data structures supporting O(1) cons and O(log n) indexed random access if you really need it.
If you need random access in a list, consider using System.Collections.Generic.List<T> or System.Collections.Generic.LinkedList<T> instead of a F# list.
I know this has been here for a while now, but just had to do something like this recently and I came up with this solution, maybe it isn't the most efficient, but it surely is the shortest idiomatic code I found for it
let removeAt index list =
list |> List.indexed |> List.filter (fun (i, _) -> i <> index) |> List.map snd
The List.Indexed returns a list of tuples which are the index in the list and the actual item in that position after that all it takes is to filter the one tuple matching the inputted index and get the actual item afterwards.
I hope this helps someone who's not extremely concerned with efficiency and wants brief code
The following includes a bit of error checking as well
let removeAt index = function
| xs when index >= 0 && index < List.length xs ->
xs
|> List.splitAt index
|> fun (x,y) -> y |> List.skip 1 |> List.append x
| ys -> ys
Lets go thru it and explain the code
// use the function syntax
let removeAt index = function
// then check if index is within the size of the list
| xs when index >= 0 && index < List.length xs ->
xs
// split the list before the given index
// splitAt : int -> 'T list -> ('T list * 'T list)
// this gives us a tuple of the the list with 2 sublists
|> List.splitAt index
// define a function which
// first drops the element on the snd tuple element
// then appends the remainder of that sublist to the fst tuple element
// and return all of it
|> fun (x,y) -> y |> List.skip 1 |> List.append x
//index out of range so return the original list
| ys -> ys
And if you don't like the idea of simply returning the original list on indexOutOfRange - wrap the return into something
let removeAt index = function
| xs when index >= 0 && index < List.length xs ->
xs
|> List.splitAt index
|> fun (x,y) -> y |> List.skip 1 |> List.append x
|> Some
| ys -> None
I think this should be quite faster than Juliet's or Tomas' proposal but most certainly Mauricio's comment is hitting it home. If one needs to remove or delete items other data structures seem a better fit.

Calculating permutations in F#

Inspired by this question and answer, how do I create a generic permutations algorithm in F#? Google doesn't give any useful answers to this.
EDIT: I provide my best answer below, but I suspect that Tomas's is better (certainly shorter!)
you can also write something like this:
let rec permutations list taken =
seq { if Set.count taken = List.length list then yield [] else
for l in list do
if not (Set.contains l taken) then
for perm in permutations list (Set.add l taken) do
yield l::perm }
The 'list' argument contains all the numbers that you want to permute and 'taken' is a set that contains numbers already used. The function returns empty list when all numbers all taken.
Otherwise, it iterates over all numbers that are still available, gets all possible permutations of the remaining numbers (recursively using 'permutations') and appends the current number to each of them before returning (l::perm).
To run this, you'll give it an empty set, because no numbers are used at the beginning:
permutations [1;2;3] Set.empty;;
I like this implementation (but can't remember the source of it):
let rec insertions x = function
| [] -> [[x]]
| (y :: ys) as l -> (x::l)::(List.map (fun x -> y::x) (insertions x ys))
let rec permutations = function
| [] -> seq [ [] ]
| x :: xs -> Seq.concat (Seq.map (insertions x) (permutations xs))
Tomas' solution is quite elegant: it's short, purely functional, and lazy. I think it may even be tail-recursive. Also, it produces permutations lexicographically. However, we can improve performance two-fold using an imperative solution internally while still exposing a functional interface externally.
The function permutations takes a generic sequence e as well as a generic comparison function f : ('a -> 'a -> int) and lazily yields immutable permutations lexicographically. The comparison functional allows us to generate permutations of elements which are not necessarily comparable as well as easily specify reverse or custom orderings.
The inner function permute is the imperative implementation of the algorithm described here. The conversion function let comparer f = { new System.Collections.Generic.IComparer<'a> with member self.Compare(x,y) = f x y } allows us to use the System.Array.Sort overload which does in-place sub-range custom sorts using an IComparer.
let permutations f e =
///Advances (mutating) perm to the next lexical permutation.
let permute (perm:'a[]) (f: 'a->'a->int) (comparer:System.Collections.Generic.IComparer<'a>) : bool =
try
//Find the longest "tail" that is ordered in decreasing order ((s+1)..perm.Length-1).
//will throw an index out of bounds exception if perm is the last permuation,
//but will not corrupt perm.
let rec find i =
if (f perm.[i] perm.[i-1]) >= 0 then i-1
else find (i-1)
let s = find (perm.Length-1)
let s' = perm.[s]
//Change the number just before the tail (s') to the smallest number bigger than it in the tail (perm.[t]).
let rec find i imin =
if i = perm.Length then imin
elif (f perm.[i] s') > 0 && (f perm.[i] perm.[imin]) < 0 then find (i+1) i
else find (i+1) imin
let t = find (s+1) (s+1)
perm.[s] <- perm.[t]
perm.[t] <- s'
//Sort the tail in increasing order.
System.Array.Sort(perm, s+1, perm.Length - s - 1, comparer)
true
with
| _ -> false
//permuation sequence expression
let c = f |> comparer
let freeze arr = arr |> Array.copy |> Seq.readonly
seq { let e' = Seq.toArray e
yield freeze e'
while permute e' f c do
yield freeze e' }
Now for convenience we have the following where let flip f x y = f y x:
let permutationsAsc e = permutations compare e
let permutationsDesc e = permutations (flip compare) e
My latest best answer
//mini-extension to List for removing 1 element from a list
module List =
let remove n lst = List.filter (fun x -> x <> n) lst
//Node type declared outside permutations function allows us to define a pruning filter
type Node<'a> =
| Branch of ('a * Node<'a> seq)
| Leaf of 'a
let permutations treefilter lst =
//Builds a tree representing all possible permutations
let rec nodeBuilder lst x = //x is the next element to use
match lst with //lst is all the remaining elements to be permuted
| [x] -> seq { yield Leaf(x) } //only x left in list -> we are at a leaf
| h -> //anything else left -> we are at a branch, recurse
let ilst = List.remove x lst //get new list without i, use this to build subnodes of branch
seq { yield Branch(x, Seq.map_concat (nodeBuilder ilst) ilst) }
//converts a tree to a list for each leafpath
let rec pathBuilder pth n = // pth is the accumulated path, n is the current node
match n with
| Leaf(i) -> seq { yield List.rev (i :: pth) } //path list is constructed from root to leaf, so have to reverse it
| Branch(i, nodes) -> Seq.map_concat (pathBuilder (i :: pth)) nodes
let nodes =
lst //using input list
|> Seq.map_concat (nodeBuilder lst) //build permutations tree
|> Seq.choose treefilter //prune tree if necessary
|> Seq.map_concat (pathBuilder []) //convert to seq of path lists
nodes
The permutations function works by constructing an n-ary tree representing all possible permutations of the list of 'things' passed in, then traversing the tree to construct a list of lists. Using 'Seq' dramatically improves performance as it makes everything lazy.
The second parameter of the permutations function allows the caller to define a filter for 'pruning' the tree before generating the paths (see my example below, where I don't want any leading zeros).
Some example usage: Node<'a> is generic, so we can do permutations of 'anything':
let myfilter n = Some(n) //i.e., don't filter
permutations myfilter ['A';'B';'C';'D']
//in this case, I want to 'prune' leading zeros from my list before generating paths
let noLeadingZero n =
match n with
| Branch(0, _) -> None
| n -> Some(n)
//Curry myself an int-list permutations function with no leading zeros
let noLZperm = permutations noLeadingZero
noLZperm [0..9]
(Special thanks to Tomas Petricek, any comments welcome)
If you need distinct permuations (when the original set has duplicates), you can use this:
let rec insertions pre c post =
seq {
if List.length post = 0 then
yield pre # [c]
else
if List.forall (fun x->x<>c) post then
yield pre#[c]#post
yield! insertions (pre#[post.Head]) c post.Tail
}
let rec permutations l =
seq {
if List.length l = 1 then
yield l
else
let subperms = permutations l.Tail
for sub in subperms do
yield! insertions [] l.Head sub
}
This is a straight-forward translation from this C# code. I am open to suggestions for a more functional look-and-feel.
Take a look at this one:
http://fsharpcode.blogspot.com/2010/04/permutations.html
let length = Seq.length
let take = Seq.take
let skip = Seq.skip
let (++) = Seq.append
let concat = Seq.concat
let map = Seq.map
let (|Empty|Cons|) (xs:seq<'a>) : Choice<Unit, 'a * seq<'a>> =
if (Seq.isEmpty xs) then Empty else Cons(Seq.head xs, Seq.skip 1 xs)
let interleave x ys =
seq { for i in [0..length ys] ->
(take i ys) ++ seq [x] ++ (skip i ys) }
let rec permutations xs =
match xs with
| Empty -> seq [seq []]
| Cons(x,xs) -> concat(map (interleave x) (permutations xs))
If you need permutations with repetitions, this is the "by the book" approach using List.indexed instead of element comparison to filter out elements while constructing a permutation.
let permutations s =
let rec perm perms carry rem =
match rem with
| [] -> carry::perms
| l ->
let li = List.indexed l
let permutations =
seq { for ci in li ->
let (i, c) = ci
(perm
perms
(c::carry)
(li |> List.filter (fun (index, _) -> i <> index) |> List.map (fun (_, char) -> char))) }
permutations |> Seq.fold List.append []
perm [] [] s

Resources