Haskell: stock span algorithm - algorithm

I'm trying to implement the "stock span problem" in Haskell. This is the solution I've come up with. Would like to see if there is any other idiomatic way to do this. this is O(n^2) algorithm (?) and using a stack, we can make it a O(n). Any pointers to the other higher order functions that can be used is appreciated.
import Data.List (inits)
type Quote = Int
type Quotes = [Quote]
type Span = [Int]
stockSpanQuad :: Quotes -> Span
stockSpanQuad [] = []
stockSpanQuad xs = map spanQ (map splitfunc (tail $ inits xs))
where
spanQ (qs, q) = 1 + (length $ takeWhile (\a -> a <= q) (reverse qs))
splitfunc xs = (init xs, last xs)

The link that you've provided contains a solution which uses stack data structure. The examples in every language mutate the stack, use the arrays with indexes and access the elements of the array by index. All these operations are not very common in Haskell.
Let's consider the following solution:
type Quote = Int
type Quotes = [Quote]
type Span = [Int]
stockSpanWithStack :: Quotes -> Span
stockSpanWithStack quotes = calculateSpan quotesWithIndexes []
where
quotesWithIndexes = zip quotes [0..]
calculateSpan [] _ = []
calculateSpan ((x, index):xs) stack =
let
newStack = dropWhile (\(y, _) -> y <= x) stack
stockValue [] = index + 1
stockValue ((_, x):_) = index - x
in
(stockValue newStack) : (calculateSpan xs ((x, index):newStack))
And let's compare it with the Python solution:
# Calculate span values for rest of the elements
for i in range(1, n):
# Pop elements from stack whlie stack is not
# empty and top of stack is smaller than price[i]
while( len(st) > 0 and price[st[0]] <= price[i]):
st.pop()
# If stack becomes empty, then price[i] is greater
# than all elements on left of it, i.e. price[0],
# price[1], ..price[i-1]. Else the price[i] is
# greater than elements after top of stack
S[i] = i+1 if len(st) <= 0 else (i - st[0])
# Push this element to stack
st.append(i)
For the stack solution, we need elements with indexes. It can be imitated like this:
quotesWithIndexes = zip quotes [0..]
The list of quotes is iterated recursively and instead of modifying a stack in every loop iteration, we can call the function with a modified value:
calculateSpan ((x, index):xs) stack =
The following line in Python (popping of elements of the stack, which are smaller than the current value):
while( len(st) > 0 and price[st[0]] <= price[i]):
st.pop()
Can be rewritten in Haskell as:
newStack = dropWhile (\(y, _) -> y <= x) stack
And calculation of a stock value:
S[i] = i+1 if len(st) <= 0 else (i - st[0])
Can be interpreted as:
stockValue [] = index + 1
stockValue ((_, x):_) = index - x
Below it was said, that it's not a common thing to modify the state of the variables, like S[i] = ... or st.append(i). But we can recursively call the function with new stack value and prepend the current result to it:
(stockValue newStack) : (calculateSpan xs ((x, index):newStack))
Technically, we push at the beginning of the list and drop the first elements of the list, because it's idiomatic and more performant way of working with lists in Haskell.

I've come up with the following, however I'm not sure if its "idiomatic". I think this is similar to #Igor's answer. Please comment.
{-# LANGUAGE BangPatterns #-}
stockSpanLinear :: Quotes -> Span
stockSpanLinear = reverse.snd.(foldl func ([],[]))
type Stack = [(Quote, Int)]
func :: (Stack, Span)-> Quote -> (Stack, Span)
func ([], []) q = ([(q, 1)], [1])
func p#((_, !i):pis, span) q = go p q (i+1)
where
go :: (Stack, Span) -> Quote -> Int -> (Stack, Span)
go (stack,span) q index = let ys = dropWhile (\(p, _) -> p <= q) stack
in case ys of
[] -> ((q, index):ys, index+1:span)
(_,i):_ -> ((q, index):ys, index-i:span)

Related

Haskell Optimizations for List Processing stymied by Lazy Evaluation

I'm trying to improve the efficiency of the following code. I want to count all occurrences of a symbol before a given point (as part of pattern-matching using a Burrows-Wheeler transform). There's some overlap in how I'm counting symbols. However, when I have tried to implement what looks like it should be more efficient code, it turns out to be less efficient, and I'm assuming that lazy evaluation and my poor understanding of it is to blame.
My first attempt at a counting function went like this:
count :: Ord a => [a] -> a -> Int -> Int
count list sym pos = length . filter (== sym) . take pos $ list
Then in the body of the matching function itself:
matching str refCol pattern = match 0 (n - 1) (reverse pattern)
where n = length str
refFstOcc sym = length $ takeWhile (/= sym) refCol
match top bottom [] = bottom - top + 1
match top bottom (sym : syms) =
let topCt = count str sym top
bottomCt = count str sym (bottom + 1)
middleCt = bottomCt - topCt
refCt = refFstOcc sym
in if middleCt > 0
then match (refCt + topCt) (refCt + bottomCt - 1) syms
else 0
(Stripped down for brevity - I'm memoizing first occurrences of symbols in refCol through a Map, and a couple other details as well).
Edit: Sample use would be:
matching "AT$TCTAGT" "$AACGTTTT" "TCG"
which should be 1 (assuming I didn't mistype anything).
Now, I'm recounting everything in the middle between the top pointer and the bottom twice, which adds up when I count a million character DNA string with only 4 possible choices for characters (and profiling tells me that this is the big bottleneck, too, taking 48% of my time for bottomCt and around 38% of my time for topCt). For reference, when calculating this for a million character string and trying to match 50 patterns (each of which is between 1 and 1000 characters), the program takes about 8.5 to 9.5 seconds to run.
However, if I try to implement the following function:
countBetween :: Ord a => [a] -> a -> Int -> Int -> (Int, Int)
countBetween list sym top bottom =
let (topList, bottomList) = splitAt top list
midList = take (bottom - top) bottomList
getSyms = length . filter (== sym)
in (getSyms topList, getSyms midList)
(with changes made to the matching function to compensate), the program takes between 18 and 22 seconds to run.
I've also tried passing in a Map which can keep track of previous calls, but that also takes about 20 seconds to run and runs up the memory usage.
Similarly, I've shorted length . filter (== sym) to a fold, but again - 20 seconds for foldr, and 14-15 for foldl.
So what would be a proper Haskell way to optimize this code through rewriting it? (Specifically, I'm looking for something that doesn't involve precomputation - I may not be reusing strings very much - and which explains something of why this is happening).
Edit: More clearly, what I am looking for is the following:
a) Why does this behaviour happen in Haskell? How does lazy evaluation play a role, what optimizations is the compiler making to rewrite the count and countBetween functions, and what other factors may be involved?
b) What is a simple code rewrite which would address this issue so that I don't traverse the lists multiple times? I'm looking specifically for something which addresses that issue, rather than a solution which sidesteps it. If the final answer is, count is the most efficient possible way to write the code, why is that?
I'm not sure lazy evaluation has much to do with the performance of the code. I think the main problem is the use of String - which is a linked list - instead of more performant string type.
Note that this call in your countBetween function:
let (topList, bottomList) = splitAt top list
will re-create the linked link corresponding to topList meaning
a lot more allocations.
A Criterion benchmark to compare splitAt versus using take n/drop n
may be found here: http://lpaste.net/174526. The splitAt version is
about 3 times slower and, of course, has a lot more allocations.
Even if you don't want to "pre-compute" the counts you can improve
matters a great deal by simply switching to either ByteString or Text.
Define:
countSyms :: Char -> ByteString -> Int -> Int -> Int
countSyms sym str lo hi =
length [ i | i <- [lo..hi], BS.index str i == sym ]
and then:
countBetween :: ByteString -> Char -> Int -> Int -> (Int,Int)
countBetween str sym top bottom = (a,b)
where a = countSyms sym str 0 (top-1)
b = countSyms sym str top (bottom-1)
Also, don't use reverse on large lists - it will reallocate the
entire list. Just index into a ByteString / Text in reverse.
Memoizing counts may or may not help. It all depends on how it's done.
It seems that the main point of the match routine is
to transform a interval (bottom,top) to another interval
based on the current symbol sym. The formulas are
basically:
ref_fst = index of sym in ref_col
-- defined in an outer scope
match :: Char -> (Int,Int) -> (Int,Int)
match sym (bottom, top) | bottom > top = (bottom, top) -- if the empty interval
match sym (bottom, top) =
let
top_count = count of sym in str from index 0 to top
bot_count = count of sym in str from index 0 to bottom
mid_count = top_count - bot_count
in if mid_count > 0
then (ref_fst + bot_count, ref_fst + top_count)
else (1,0) -- the empty interval
And then matching is just a fold over pattern using match
with the initial interval (0, n-1).
Both top_count and bot_count can be computed efficiently
using a precomputed lookup table, and below is code which
does that.
If you run test1 you'll see a trace of how the interval
is transformed via each symbol in the pattern.
Note: There may be off-by-1 errors, and I've hard coded
ref_fst to be 0 - I'm not sure how this fits into the
larger algorithm, but the basic idea should be sound.
Note that once the counts vector has been created
there is no need to index into the original string anymore.
Therefore, even though I use a ByteString here for
the (larger) DNA sequence, it's not crucial, and the
mkCounts routine should work just as well if passed a String
instead.
Code also available at http://lpaste.net/174288
{-# LANGUAGE OverloadedStrings #-}
import Data.Vector.Unboxed ((!))
import qualified Data.Vector.Unboxed as UV
import qualified Data.Vector.Unboxed.Mutable as UVM
import qualified Data.ByteString.Char8 as BS
import Debug.Trace
import Text.Printf
import Data.List
mkCounts :: BS.ByteString -> UV.Vector (Int,Int,Int,Int)
mkCounts syms = UV.create $ do
let n = BS.length syms
v <- UVM.new (n+1)
let loop x i | i >= n = return x
loop x i = let s = BS.index syms i
(a,t,c,g) = x
x' = case s of
'A' -> (a+1,t,c,g)
'T' -> (a,t+1,c,g)
'C' -> (a,t,c+1,g)
'G' -> (a,t,c,g+1)
_ -> x
in do UVM.write v i x
loop x' (i+1)
x <- loop (0,0,0,0) 0
UVM.write v n x
return v
data DNA = A | C | T | G
deriving (Show)
getter :: DNA -> (Int,Int,Int,Int) -> Int
getter A (a,_,_,_) = a
getter T (_,t,_,_) = t
getter C (_,_,c,_) = c
getter G (_,_,_,g) = g
-- narrow a window
narrow :: Int -> UV.Vector (Int,Int,Int,Int) -> DNA -> (Int,Int) -> (Int,Int)
narrow refcol counts sym (lo,hi) | trace msg False = undefined
where msg = printf "-- lo: %d hi: %d refcol: %d sym: %s top_cnt: %d bot_count: %d" lo hi refcol (show sym) top_count bot_count
top_count = getter sym (counts ! (hi+1))
bot_count = getter sym (counts ! lo)
narrow refcol counts sym (lo,hi) =
let top_count = getter sym (counts ! (hi+1))
bot_count = getter sym (counts ! (lo+0))
mid_count = top_count - bot_count
in if mid_count > 0
then ( refcol + bot_count, refcol + top_count-1 )
else (lo+1,lo) -- signal an wmpty window
findFirst :: DNA -> UV.Vector (Int,Int,Int,Int) -> Int
findFirst sym v =
let n = UV.length v
loop i | i >= n = n
loop i = if getter sym (v ! i) > 0
then i
else loop (i+1)
in loop 0
toDNA :: String -> [DNA]
toDNA str = map charToDNA str
charToDNA :: Char -> DNA
charToDNA = go
where go 'A' = A
go 'C' = C
go 'T' = T
go 'G' = G
dnaToChar A = 'A'
dnaToChar C = 'C'
dnaToChar T = 'T'
dnaToChar G = 'G'
first :: DNA -> BS.ByteString -> Int
first sym str = maybe len id (BS.elemIndex (dnaToChar sym) str)
where len = BS.length str
test2 = do
-- matching "AT$TCTAGT" "$AACGTTTT" "TCG"
let str = "AT$TCTAGT"
refcol = "$AACGTTTT"
syms = toDNA "TCG"
-- hard coded for now
-- may be computeed an memoized
refcol_G = 4
refcol_C = 3
refcol_T = 5
counts = mkCounts str
w0 = (0, BS.length str -1)
w1 = narrow refcol_G counts G w0
w2 = narrow refcol_C counts C w1
w3 = narrow refcol_T counts T w2
firsts = (first A refcol, first T refcol, first C refcol, first G refcol)
putStrLn $ "firsts: " ++ show firsts
putStrLn $ "w0: " ++ show w0
putStrLn $ "w1: " ++ show w1
putStrLn $ "w2: " ++ show w2
putStrLn $ "w3: " ++ show w3
let (lo,hi) = w3
len = if lo <= hi then hi - lo + 1 else 0
putStrLn $ "length: " ++ show len
matching :: BS.ByteString -> BS.ByteString -> String -> Int
matching str refcol pattern =
let counts = mkCounts str
n = BS.length str
syms = toDNA (reverse pattern)
firsts = (first A refcol, first T refcol, first C refcol, first G refcol)
go (lo,hi) sym = narrow refcol counts sym (lo,hi)
where refcol = getter sym firsts
(lo, hi) = foldl' go (0,n-1) syms
len = if lo <= hi then hi - lo + 1 else 0
in len
test3 = matching "AT$TCTAGT" "$AACGTTTT" "TCG"

Writing infinite list to skip every factor of p?

How can I efficiently represent the list [0..] \\ [t+0*p, t+1*p ..]?
I have defined:
Prelude> let factors p t = [t+0*p, t+1*p ..]
I want to efficiently represent an infinite list that is the difference of [0..] and factors p t, but using \\ from Data.List requires too much memory for even medium-sized lists:
Prelude Data.List> [0..10000] \\ (factors 5 0)
<interactive>: out of memory
I know that I can represent the values between t+0*p and t+1*p with:
Prelude> let innerList p1 p2 t = [t+p1+1, t+p1+2 .. t+p2-1]
Prelude> innerList 0 5 0
[1,2,3,4]
However, repeatedly calculating and concatenating innerList for increasing intervals seems clumsy.
Can I efficiently represent [0..] \\ (factors p t) without calculating rem or mod for each element?
For the infinite list [0..] \\ [t,t+p..],
yourlist t p = [0..t-1] ++ [i | m <- [0,p..], i <- [t+m+1..t+m+p-1]]
Of course this approach doesn't scale, at all, if you'd want to remove some other factors, like
[0..] \\ [t,t+p..] \\ [s,s+q..] \\ ...
in which case you'll have to remove them in sequence with minus, mentioned in Daniel Fischer's answer. There is no magic bullet here.
But there's also a union, with which the above becomes
[0..] \\ ( [t,t+p..] `union` [s,s+q..] `union` ... )
the advantage is, we can arrange the unions in a tree, and get algorithmic improvement.
You can't use (\\) for that, because
(\\) :: (Eq a) => [a] -> [a] -> [a]
(\\) = foldl (flip delete)
the list of elements you want to remove is infinite, and a left fold never terminates when the list it folds over is infinite.
If you rather want to use something already written than write it yourself, you can use minus from the data-ordlist package.
The performance should be adequate.
Otherwise,
minus :: Ord a => [a] -> [a] -> [a]
minus xxs#(x:xs) yys#(y:ys)
| x < y = x : minus xs yys
| x == y = minus xs ys
| otherwise = minus xss ys
minus xs _ = xs
You can use a list comprehesion with a predicate, using rem:
>>> let t = 0
>>> let p = 5
>>> take 40 $ [ x | x <- [1..], x `rem` p /= t ]
[1,2,3,4,6,7,8,9,11,12,13,14,16,17,18,19,21,22,23,24,26,27,28,29,31,32,33,34,36,37,38,39,41,42,43,44,46,47,48,49]
If you want efficiency, why does your solution have to use list comprehension syntax?
Why not something like this?
gen' n i p | i == p = gen' (n + p) 1 p
gen' n i p = (n+i) : gen' n (i+1) p
gen = gen' 0 1
and then do
gen 5
Because you have ascending lists, you can simply lazily merge them:
nums = [1..]
nogos = factors p t
result = merge nums (dropWhile (<head nums) nogos) where
merge (a:as) (b:bs)
| a < b = a : merge as (b:bs)
| a == b = merge as bs
| otherwise = error "should not happen"
Writing this in a general way so that we have a function that builds the difference of two infinite lists, provided only that they are in ascending order, is left as exercise. In the end, the following should be possible
[1..] `infiniteDifference` primes `infiniteDifference` squares
For this, make it a left associative operator.

Minimizing chunks in a matrix

Suppose I have the following matrix:
The matrix can be broken down into chunks such that each chunk must, for all rows, have the same number of columns where the value is marked true for that row.
For example, the following chunk is valid:
This means that rows do not have to be contiguous.
Columns do not have to be contiguous either, as the following is a valid chunk:
However, the following is invalid:
That said, what is an algorithm that can be used to select chunks such that the minimal number of chunks will be used when finding all the chunks?
Given the example, above, the proper solution is (items with the same color represent a valid chunk):
In the above example, three is the minimal number of chunks that this can be broken down into.
Note that the following is also a valid solution:
There's not a preference to the solutions, really, just to get the least number of chunks.
I thought of counting using adjacent cells, but that doesn't account for the fact that the column values don't have to be contiguous.
I believe the key lies in finding the chunks with the largest area given the constraints, removing those items, and then repeating.
Taking that approach, the solution is:
But how to traverse the matrix and find the largest area is eluding me.
Also note, that if you want to reshuffle the rows and/or columns during the operations, that's a valid operation (in order to find the largest area), but I'd imagine you can only do it after you remove the largest areas from the matrix (after one area is found and moving onto the next).
You are doing circuit minimization on a truth table. For 4x4 truth tables, you can use a K map. The Quine-McCluskey algorithm is a generalization that can handle larger truth tables.
Keep in mind the problem is NP-Hard, so depending on the size of your truth tables, this problem can quickly grow to a size that is intractable.
This problem is strongly related to Biclustering, for which there are many efficient algorithms (and freely available implementations). Usually you will have to specify the number K of clusters you expect to find; if you don't have a good idea what K should be, you can proceed by binary search on K.
In case the biclusters don't overlap, you are done, otherwise you need to do some geometry to cut them into "blocks".
The solution I propose is fairly straightforward, but very time consuming.
It can be decomposed in 4 major steps:
find all the existing patterns in the matrix,
find all the possible combinations of these patterns,
remove all the incomplete pattern sets,
scan the remaining list to get the set with the minimum number of elements
First of, the algorithm below works on either column or row major matrices. I chose column for the explanations, but you may swap it for rows at your convenience, as long as it remains consistent accross the whole process.
The sample code accompanying the answer is in OCaml, but doesn't use any specific feature of the language, so it should be easy to port to other ML dialects.
Step 1:
Each column can be seen as a bit vector. Observe that a pattern (what you call chunk in your question) can be constructed by intersecting (ie. and ing) all the columns, or all the rows composing it, or even a combinations. So the first step is really about producing all the combinations of rows and columns (the powerset of the matrix' rows and columns if you will), intersecting them at the same time, and filter out the duplicates.
We consider the following interface for a matrix datatype:
module type MATRIX = sig
type t
val w : int (* the width of the matrix *)
val h : int (* the height ........ *)
val get : t -> int -> int -> bool (* cell value getter *)
end
Now let's have a look at this step's code:
let clength = M.h
let rlength = M.w
(* the vector datatype used throughought the algorithm
operator on this type are in the module V *)
type vector = V.t
(* a pattern description and comparison operators *)
module Pattern = struct
type t = {
w : int; (* width of thd pattern *)
h : int; (* height of the pattern *)
rows : vector; (* which rows of the matrix are used *)
cols : vector; (* which columns... *)
}
let compare a b = Pervasives.compare a b
let equal a b = compare a b = 0
end
(* pattern set : let us store patterns without duplicates *)
module PS = Set.Make(Pattern)
(* a simple recursive loop on #f #k times *)
let rec fold f acc k =
if k < 0
then acc
else fold f (f acc k) (pred k)
(* extract a column/row of the given matrix *)
let cr_extract mget len =
fold (fun v j -> if mget j then V.set v j else v) (V.null len) (pred len)
let col_extract m i = cr_extract (fun j -> M.get m i j) clength
let row_extract m i = cr_extract (fun j -> M.get m j i) rlength
(* encode a single column as a pattern *)
let col_encode c i =
{ w = 1; h = count c; rows = V.set (V.null clength) i; cols = c }
let row_encode r i =
{ h = 1; w = count r; cols = V.set (V.null rlength) i; rows = r }
(* try to add a column to a pattern *)
let col_intersect p c i =
let col = V.l_and p.cols c in
let h = V.count col in
if h > 0
then
let row = V.set (V.copy p.rows) i in
Some {w = V.count row; h = h; rows = row; clos = col}
else None
let row_intersect p r i =
let row = V.l_and p.rows r in
let w = V.count row in
if w > 0
then
let col = V.set (V.copy p.cols) i in
Some { w = w; h = V.count col; rows = row; cols = col }
else None
let build_patterns m =
let bp k ps extract encode intersect =
let build (l,k) =
let c = extract m k in
let u = encode c k in
let fld p ps =
match intersect p c k with
None -> l
| Some npc -> PS.add npc ps
in
PS.fold fld (PS.add u q) q, succ k
in
fst (fold (fun res _ -> build res) (ps, 0) k)
in
let ps = bp (pred rlength) PS.empty col_extract col_encode col_intersect in
let ps = bp (pred clength) ps row_extract row_encode row_intersect in
PS.elements ps
The V module must comply with the following signature for the whole algorithm:
module type V = sig
type t
val null : int -> t (* the null vector, ie. with all entries equal to false *)
val copy : t -> t (* copy operator *)
val get : t -> int -> bool (* get the nth element *)
val set : t -> int -> t (* set the nth element to true *)
val l_and : t -> t -> t (* intersection operator, ie. logical and *)
val l_or : t -> t -> t (* logical or *)
val count : t -> int (* number of elements set to true *)
val equal : t -> t -> bool (* equality predicate *)
end
Step 2:
Combining the patterns can also be seen as a powerset construction, with some restrictions: A valid pattern set may only contain patterns which don't overlap. The later can be defined as true for two patterns if both contain at least one common matrix cell.
With the pattern data structure used above, the overlap predicate is quite simple:
let overlap p1 p2 =
let nullc = V.null h
and nullr = V.null w in
let o v1 v2 n = not (V.equal (V.l_and v1 v2) n) in
o p1.rows p2.rows nullr && o p1.cols p2.cols nullc
The cols and rows of the pattern record indicate which coordinates in the matrix are included in the pattern. Thus a logical and on both fields will tell us if the patterns overlap.
For including a pattern in a pattern set, we must ensure that it does not overlap with any pattern of the set.
type pset = {
n : int; (* number of patterns in the set *)
pats : pattern list;
}
let overlap sp p =
List.exists (fun x -> overlap x p) sp.pats
let scombine sp p =
if overlap sp p
then None
else Some {
n = sp.n + 1;
pats = p::sp.pats;
}
let build_pattern_sets l =
let pset l p =
let sp = { n = 1; pats = [p] } in
List.fold_left (fun l spx ->
match scombine spx p with
None -> l
| Some nsp -> nsp::l
) (sp::l) l
in List.fold_left pset [] l
This step produces a lot of sets, and thus is very memory and computation intensive. It's certainly the weak point of this solution, but I don't see yet how to reduce the fold.
Step 3:
A pattern set is incomplete if when rebuilding the matrix with it, we do not obtain the original one. So the process is rather simple.
let build_matrix ps w =
let add m p =
let rec add_col p i = function
| [] -> []
| c::cs ->
let c =
if V.get p.rows i
then V.l_or c p.cols
else c
in c::(add_col p (succ i) cs)
in add_col p 0 m
in
(* null matrix as a list of null vectors *)
let m = fold (fun l _ -> V.null clength::l) [] (pred rlength) in
List.fold_left add m ps.pats
let drop_incomplete_sets m l =
(* convert the matrix to a list of columns *)
let m' = fold (fun l k -> col_extract m k ::l) [] (pred rlength) in
let complete m sp =
let m' = build_matrix sp in
m = m'
in List.filter (fun x -> complete m' x) l
Step 4:
The last step is just selecting the set with the smallest number of elements:
let smallest_set l =
let smallest ps1 ps2 = if ps1.n < ps2.n then ps1 else ps2 in
match l with
| [] -> assert false (* there should be at least 1 solution *)
| h::t -> List.fold_left smallest h t
The whole computation is then just the chaining of each steps:
let compute m =
let (|>) f g = g f in
build_patterns m |> build_pattern_sets |> drop_incomplete_sets m |> smallest_set
Notes
The algorithm above constructs a powerset of a powerset, with some limited filtering. There isn't as far as I know a way to reduce the search (as mentioned in a comment, if this is a NP hard problem, there isn't any).
This algorithm checks all the possible solutions, and correctly returns an optimal one (tested with many matrices, including the one given in the problem description.
One quick remark regarding the heuristic you propose in your question:
it could be easily implemented using the first step, removing the largest pattern found, and recursing. That would yeld a solution much more rapidly than my algorithm. However, the solution found may not be optimal.
For instance, consider the following matrix:
.x...
.xxx
xxx.
...x.
The central 4 cell chunck is the largest which may be found, but the set using it would comprise 5 patterns in total.
.1...
.223
422.
...5.
Yet this solution uses only 4:
.1...
.122
334.
...4.
Update:
Link to the full code I wrote for this answer.

First non-repeating char in a string ? in haskell or F#

Given a sequence of char what is the most efficient way to find the first non repeating char
Interested purely functional implementation haskell or F# preffered.
A fairly straightforward use of Data.Set in combination with filter will do the job in an efficient one-liner. Since this seems homeworkish, I'm declining to provide the precise line in question :-)
The complexity should, I think, be O(n log m) where m is the number of distinct characters in the string and n is the total number of characters in the string.
A simple F# solution:
let f (s: string) =
let n = Map(Seq.countBy id s)
Seq.find (fun c -> n.[c] = 1) s
Here's an F# solution in O(n log n): sort the array, then for each character in the original array, binary search for it in the sorted array: if it's the only one of its kind, that's it.
open System
open System.IO
open System.Collections.Generic
let Solve (str : string) =
let arrStr = str.ToCharArray()
let sorted = Array.sort arrStr
let len = str.Length - 1
let rec Inner i =
if i = len + 1 then
'-'
else
let index = Array.BinarySearch(sorted, arrStr.[i])
if index = 0 && sorted.[index+1] <> sorted.[index] then
arrStr.[i]
elif index = len && sorted.[index-1] <> sorted.[index] then
arrStr.[i]
elif index > 0 && index < len &&
sorted.[index+1] <> sorted.[index] &&
sorted.[index-1] <> sorted.[index] then
arrStr.[i]
else
Inner (i + 1)
Inner 0
let _ =
printfn "%c" (Solve "abcdefabcf")
A - means all characters are repeated.
Edit: ugly hack with using the - for "no solution" as you can use Options, which I keep forgetting about! An exercise for the reader, as this does look like homework.
Here's a bit longish solution, but guaranteed to be worst-case O(n log n):
import List
import Data.Ord.comparing
sortPairs :: Ord a => [(a, b)]->[(a, b)]
sortPairs = sortBy (comparing fst)
index :: Integral b => [a] -> [(a, b)]
index = flip zip [1..]
dropRepeated :: Eq a => [(a, b)]->[(a, b)]
dropRepeated [] = []
dropRepeated [x] = [x]
dropRepeated (x:xs) | fst x == fst (head xs) =
dropRepeated $ dropWhile ((==(fst x)).fst) xs
| otherwise =
x:(dropRepeated xs)
nonRepeatedPairs :: Ord a => Integral b => [a]->[(a, b)]
nonRepeatedPairs = dropRepeated . sortPairs . index
firstNonRepeating :: Ord a => [a]->a
firstNonRepeating = fst . minimumBy (comparing snd) . nonRepeatedPairs
The idea is: sort the string lexicographically, so that it's easy to remove any repeated characters in linear time and find the first character which is not repeated. But in order to find it, we need to save information about characters' positions in text.
The speed on easy cases (like [1..10000]) is not perfect, but for something harder ([1..10000] ++ [1..10000] ++ [10001]) you can see the difference between this and a naive O(n^2).
Of course this can be done in linear time, if the size of alphabet is O(1), but who knows how large the alphabet is...
An alternate Haskell O(n log n) solution using Data.Map and no sorting:
module NonRepeat (
firstNonRepeat
)
where
import Data.List (minimumBy)
import Data.Map (fromListWith, toList)
import Data.Ord (comparing)
data Occurance = Occ { first :: Int, count :: Int }
deriving (Eq, Ord)
note :: Int -> a -> (a, Occurance)
note pos a = (a, Occ pos 1)
combine :: Occurance -> Occurance -> Occurance
combine (Occ p0 c0) (Occ p1 c1) = Occ (p0 `min` p1) (c0 + c1)
firstNonRepeat :: (Ord a) => [a] -> Maybe a
firstNonRepeat = fmap fst . findMinimum . occurances
where occurances = toList . fromListWith combine . zipWith note [0..]
findMinimum = safeMinimum . filter ((== 1).count.snd)
safeMinimum [] = Nothing
safeMinimum xs = Just $ minimumBy (comparing snd) xs
let firstNonRepeating (str:string) =
let rec inner i cMap =
if i = str.Length then
cMap
|> Map.filter (fun c (count, index) -> count = 1)
|> Map.toSeq
|> Seq.minBy (fun (c, (count, index)) -> index)
|> fst
else
let c = str.[i]
let value = if cMap.ContainsKey c then
let (count, index) = cMap.[c]
(count + 1, index)
else
(1, i)
let cMap = cMap.Add(c, value)
inner (i + 1) cMap
inner 0 (Map.empty)
Here is a simpler version that sacrifices speed.
let firstNonRepeating (str:string) =
let (c, count) = str
|> Seq.countBy (fun c -> c)
|> Seq.minBy (fun (c, count) -> count)
if count = 1 then Some c else None
How about something like this:
let firstNonRepeat s =
let repeats =
((Set.empty, Set.empty), s)
||> Seq.fold (fun (one,many) c -> Set.add c one, if Set.contains c one then Set.add c many else many)
|> snd
s
|> Seq.tryFind (fun c -> not (Set.contains c repeats))
This is pure C# (so I assume there's a similar F# version), which will be efficient if GroupBy is efficient (which it ought to be):
static char FstNonRepeatedChar(string s)
{
return s.GroupBy(x => x).Where(xs => xs.Count() == 1).First().First();
}

Calculating permutations in F#

Inspired by this question and answer, how do I create a generic permutations algorithm in F#? Google doesn't give any useful answers to this.
EDIT: I provide my best answer below, but I suspect that Tomas's is better (certainly shorter!)
you can also write something like this:
let rec permutations list taken =
seq { if Set.count taken = List.length list then yield [] else
for l in list do
if not (Set.contains l taken) then
for perm in permutations list (Set.add l taken) do
yield l::perm }
The 'list' argument contains all the numbers that you want to permute and 'taken' is a set that contains numbers already used. The function returns empty list when all numbers all taken.
Otherwise, it iterates over all numbers that are still available, gets all possible permutations of the remaining numbers (recursively using 'permutations') and appends the current number to each of them before returning (l::perm).
To run this, you'll give it an empty set, because no numbers are used at the beginning:
permutations [1;2;3] Set.empty;;
I like this implementation (but can't remember the source of it):
let rec insertions x = function
| [] -> [[x]]
| (y :: ys) as l -> (x::l)::(List.map (fun x -> y::x) (insertions x ys))
let rec permutations = function
| [] -> seq [ [] ]
| x :: xs -> Seq.concat (Seq.map (insertions x) (permutations xs))
Tomas' solution is quite elegant: it's short, purely functional, and lazy. I think it may even be tail-recursive. Also, it produces permutations lexicographically. However, we can improve performance two-fold using an imperative solution internally while still exposing a functional interface externally.
The function permutations takes a generic sequence e as well as a generic comparison function f : ('a -> 'a -> int) and lazily yields immutable permutations lexicographically. The comparison functional allows us to generate permutations of elements which are not necessarily comparable as well as easily specify reverse or custom orderings.
The inner function permute is the imperative implementation of the algorithm described here. The conversion function let comparer f = { new System.Collections.Generic.IComparer<'a> with member self.Compare(x,y) = f x y } allows us to use the System.Array.Sort overload which does in-place sub-range custom sorts using an IComparer.
let permutations f e =
///Advances (mutating) perm to the next lexical permutation.
let permute (perm:'a[]) (f: 'a->'a->int) (comparer:System.Collections.Generic.IComparer<'a>) : bool =
try
//Find the longest "tail" that is ordered in decreasing order ((s+1)..perm.Length-1).
//will throw an index out of bounds exception if perm is the last permuation,
//but will not corrupt perm.
let rec find i =
if (f perm.[i] perm.[i-1]) >= 0 then i-1
else find (i-1)
let s = find (perm.Length-1)
let s' = perm.[s]
//Change the number just before the tail (s') to the smallest number bigger than it in the tail (perm.[t]).
let rec find i imin =
if i = perm.Length then imin
elif (f perm.[i] s') > 0 && (f perm.[i] perm.[imin]) < 0 then find (i+1) i
else find (i+1) imin
let t = find (s+1) (s+1)
perm.[s] <- perm.[t]
perm.[t] <- s'
//Sort the tail in increasing order.
System.Array.Sort(perm, s+1, perm.Length - s - 1, comparer)
true
with
| _ -> false
//permuation sequence expression
let c = f |> comparer
let freeze arr = arr |> Array.copy |> Seq.readonly
seq { let e' = Seq.toArray e
yield freeze e'
while permute e' f c do
yield freeze e' }
Now for convenience we have the following where let flip f x y = f y x:
let permutationsAsc e = permutations compare e
let permutationsDesc e = permutations (flip compare) e
My latest best answer
//mini-extension to List for removing 1 element from a list
module List =
let remove n lst = List.filter (fun x -> x <> n) lst
//Node type declared outside permutations function allows us to define a pruning filter
type Node<'a> =
| Branch of ('a * Node<'a> seq)
| Leaf of 'a
let permutations treefilter lst =
//Builds a tree representing all possible permutations
let rec nodeBuilder lst x = //x is the next element to use
match lst with //lst is all the remaining elements to be permuted
| [x] -> seq { yield Leaf(x) } //only x left in list -> we are at a leaf
| h -> //anything else left -> we are at a branch, recurse
let ilst = List.remove x lst //get new list without i, use this to build subnodes of branch
seq { yield Branch(x, Seq.map_concat (nodeBuilder ilst) ilst) }
//converts a tree to a list for each leafpath
let rec pathBuilder pth n = // pth is the accumulated path, n is the current node
match n with
| Leaf(i) -> seq { yield List.rev (i :: pth) } //path list is constructed from root to leaf, so have to reverse it
| Branch(i, nodes) -> Seq.map_concat (pathBuilder (i :: pth)) nodes
let nodes =
lst //using input list
|> Seq.map_concat (nodeBuilder lst) //build permutations tree
|> Seq.choose treefilter //prune tree if necessary
|> Seq.map_concat (pathBuilder []) //convert to seq of path lists
nodes
The permutations function works by constructing an n-ary tree representing all possible permutations of the list of 'things' passed in, then traversing the tree to construct a list of lists. Using 'Seq' dramatically improves performance as it makes everything lazy.
The second parameter of the permutations function allows the caller to define a filter for 'pruning' the tree before generating the paths (see my example below, where I don't want any leading zeros).
Some example usage: Node<'a> is generic, so we can do permutations of 'anything':
let myfilter n = Some(n) //i.e., don't filter
permutations myfilter ['A';'B';'C';'D']
//in this case, I want to 'prune' leading zeros from my list before generating paths
let noLeadingZero n =
match n with
| Branch(0, _) -> None
| n -> Some(n)
//Curry myself an int-list permutations function with no leading zeros
let noLZperm = permutations noLeadingZero
noLZperm [0..9]
(Special thanks to Tomas Petricek, any comments welcome)
If you need distinct permuations (when the original set has duplicates), you can use this:
let rec insertions pre c post =
seq {
if List.length post = 0 then
yield pre # [c]
else
if List.forall (fun x->x<>c) post then
yield pre#[c]#post
yield! insertions (pre#[post.Head]) c post.Tail
}
let rec permutations l =
seq {
if List.length l = 1 then
yield l
else
let subperms = permutations l.Tail
for sub in subperms do
yield! insertions [] l.Head sub
}
This is a straight-forward translation from this C# code. I am open to suggestions for a more functional look-and-feel.
Take a look at this one:
http://fsharpcode.blogspot.com/2010/04/permutations.html
let length = Seq.length
let take = Seq.take
let skip = Seq.skip
let (++) = Seq.append
let concat = Seq.concat
let map = Seq.map
let (|Empty|Cons|) (xs:seq<'a>) : Choice<Unit, 'a * seq<'a>> =
if (Seq.isEmpty xs) then Empty else Cons(Seq.head xs, Seq.skip 1 xs)
let interleave x ys =
seq { for i in [0..length ys] ->
(take i ys) ++ seq [x] ++ (skip i ys) }
let rec permutations xs =
match xs with
| Empty -> seq [seq []]
| Cons(x,xs) -> concat(map (interleave x) (permutations xs))
If you need permutations with repetitions, this is the "by the book" approach using List.indexed instead of element comparison to filter out elements while constructing a permutation.
let permutations s =
let rec perm perms carry rem =
match rem with
| [] -> carry::perms
| l ->
let li = List.indexed l
let permutations =
seq { for ci in li ->
let (i, c) = ci
(perm
perms
(c::carry)
(li |> List.filter (fun (index, _) -> i <> index) |> List.map (fun (_, char) -> char))) }
permutations |> Seq.fold List.append []
perm [] [] s

Resources