Haskell performance using dynamic programming

I am attempting to calculate the Levenshtein distance between two strings using dynamic programming. This is being done through Hackerrank, so I have timing constraints. I used a technique I saw in: How are Dynamic Programming algorithms implemented in idiomatic Haskell? and it seems to be working. Unfortunately, it is timing out in one test case. I do not have access to the specific test case, so I don't know the exact size of the input.
import Control.Monad
import Data.Array.IArray
import Data.Array.Unboxed
main = do
  n <- readLn
  replicateM_ n $ do
    s1 <- getLine
    s2 <- getLine
    print $ editDistance s1 s2
editDistance :: String -> String -> Int
editDistance s1 s2 = dynamic editDistance' (length s1, length s2)
  where
    s1' :: UArray Int Char
    s1' = listArray (1,length s1) s1
    s2' :: UArray Int Char
    s2' = listArray (1,length s2) s2
    editDistance' table (i,j)
      | min i j == 0 = max i j
      | otherwise = min' (table!((i-1),j) + 1) (table!(i,(j-1)) + 1) (table!((i-1),(j-1)) + cost)
      where
        cost = if s1'!i == s2'!j then 0 else 1
        min' a b = min (min a b)
dynamic :: (Array (Int,Int) Int -> (Int,Int) -> Int) -> (Int,Int) -> Int
dynamic compute (xBnd, yBnd) = table!(xBnd,yBnd)
  where
    table = newTable $ map (\coord -> (coord, compute table coord)) [(x,y) | x<-[0..xBnd], y<-[0..yBnd]]
    newTable xs = array ((0,0),fst (last xs)) xs
I've switched to using arrays, but that speed up was insufficient. I cannot use Unboxed arrays, because this code relies on laziness. Are there any glaring performance mistakes I have made? Or how else can I speed it up?

The backward equations for edit distance calculations are:
f(i, j) = minimum [
    1 + f(i + 1, j), -- delete from the 1st string
    1 + f(i, j + 1), -- delete from the 2nd string
    f(i + 1, j + 1) + if a(i) == b(j) then 0 else 1 -- substitute or match
  ]
So within each dimension, you need nothing more than the very next index: + 1. This is a sequential access pattern, not the random access that would require arrays, and it can be implemented using lists and nested right folds:
editDistance :: Eq a => [a] -> [a] -> Int
editDistance a b = head . foldr loop [n, n - 1..0] $ zip a [m, m - 1..]
  where
    (m, n) = (length a, length b)
    loop (s, l) lst = foldr go [l] $ zip3 b lst (tail lst)
      where
        go (t, i, j) acc@(k:_) = inc `seq` inc:acc
          where inc = minimum [i + 1, k + 1, if s == t then j else j + 1]
You may test this code in Hackerrank Edit Distance Problem as in:
import Control.Applicative ((<$>))
import Control.Monad (replicateM_)
import Text.Read (readMaybe)
editDistance :: Eq a => [a] -> [a] -> Int
editDistance a b = ... -- as implemented above
main :: IO ()
main = do
  Just n <- readMaybe <$> getLine
  replicateM_ n $ do
    a <- getLine
    b <- getLine
    print $ editDistance a b
which passes all tests with a decent performance.
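For cross-checking on short inputs, the backward equations above can also be transcribed directly as plain recursion. This is only a sketch: editDistanceRef is a hypothetical name that does not appear in the answer, and without memoization it takes exponential time, so it is not a candidate for the Hackerrank limits.

```haskell
-- Naive reference: a direct transcription of the backward equations.
-- Exponential time; only meant for checking the fold-based version
-- on short inputs.
editDistanceRef :: Eq a => [a] -> [a] -> Int
editDistanceRef [] ys = length ys   -- first string exhausted: drop the rest of the 2nd
editDistanceRef xs [] = length xs   -- second string exhausted: drop the rest of the 1st
editDistanceRef xxs@(x:xs) yys@(y:ys) = minimum
  [ 1 + editDistanceRef xs yys                         -- delete from the 1st string
  , 1 + editDistanceRef xxs ys                         -- delete from the 2nd string
  , editDistanceRef xs ys + (if x == y then 0 else 1)  -- substitute or match
  ]
```

Comparing editDistanceRef with editDistance on a handful of short random strings is a cheap way to catch an off-by-one in the folds.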

Related

How to optimize this Haskell code summing up the primes in sublinear time?

Problem 10 from Project Euler is to find the sum of all the primes below given n.
I solved it simply by summing up the primes generated by the sieve of Eratosthenes. Then I came across much more efficient solution by Lucy_Hedgehog (sub-linear!).
For n = 2⋅10^9:
Python code (from the quote above) runs in 1.2 seconds in Python 2.7.3.
C++ code (mine) runs in about 0.3 seconds (compiled with g++ 4.8.4).
I re-implemented the same algorithm in Haskell, since I'm learning it:
import Data.List
import Data.Map (Map, (!))
import qualified Data.Map as Map
problem10 :: Integer -> Integer
problem10 n = (sieve (Map.fromList [(i, i * (i + 1) `div` 2 - 1) | i <- vs]) 2 r vs) ! n
  where vs = [n `div` i | i <- [1..r]] ++ reverse [1..n `div` r - 1]
        r = floor (sqrt (fromIntegral n))
sieve :: Map Integer Integer -> Integer -> Integer -> [Integer] -> Map Integer Integer
sieve m p r vs | p > r = m
               | otherwise = sieve (if m ! p > m ! (p - 1) then update m vs p else m) (p + 1) r vs
update :: Map Integer Integer -> [Integer] -> Integer -> Map Integer Integer
update m vs p = foldl' decrease m (map (\v -> (v, sumOfSieved m v p)) (takeWhile (>= p*p) vs))
decrease :: Map Integer Integer -> (Integer, Integer) -> Map Integer Integer
decrease m (k, v) = Map.insertWith (flip (-)) k v m
sumOfSieved :: Map Integer Integer -> Integer -> Integer -> Integer
sumOfSieved m v p = p * (m ! (v `div` p) - m ! (p - 1))
main = print $ problem10 $ 2*10^9
I compiled it with ghc -O2 10.hs and ran it with time ./10.
It gives the correct answer, but takes about 7 seconds.
I compiled it with ghc -prof -fprof-auto -rtsopts 10 and ran it with ./10 +RTS -p -h.
10.prof shows that decrease takes 52.2% of the time and 67.5% of the allocations.
After running hp2ps 10.hp I got a heap profile (image not preserved).
Again, it looks like decrease takes most of the heap. GHC version 7.6.3.
How would you optimize run time of this Haskell code?
Update 13.06.17:
I tried replacing immutable Data.Map with mutable Data.HashTable.IO.BasicHashTable from the hashtables package, but I'm probably doing something bad, since for tiny n = 30 it already takes too long, about 10 seconds. What's wrong?
Update 18.06.17:
Curious about the HashTable performance issues is a good read. I took Shersh's code using mutable Data.HashTable.ST.Linear, but dropped Data.Judy in instead. It runs in 1.1 seconds, still relatively slow.
I've made some small improvements, so it runs in 3.4-3.5 seconds on my machine.
Using IntMap.Strict helped a lot. Other than that, I just manually performed some GHC optimizations to be sure, and made the Haskell code closer to the Python code from your link. As a next step you could try some mutable HashMap, but I'm not sure... IntMap can't be much faster than a mutable container because it's immutable. I'm still surprised by its efficiency, though. I hope this can be implemented faster.
Here is the code:
import Data.List (foldl')
import Data.IntMap.Strict (IntMap, (!))
import qualified Data.IntMap.Strict as IntMap
p :: Int -> Int
p n = (sieve (IntMap.fromList [(i, i * (i + 1) `div` 2 - 1) | i <- vs]) 2 r vs) ! n
  where vs = [n `div` i | i <- [1..r]] ++ [n', n' - 1 .. 1]
        r = floor (sqrt (fromIntegral n) :: Double)
        n' = n `div` r - 1
sieve :: IntMap Int -> Int -> Int -> [Int] -> IntMap Int
sieve m' p' r vs = go m' p'
  where
    go m p | p > r = m
           | m ! p > m ! (p - 1) = go (update m vs p) (p + 1)
           | otherwise = go m (p + 1)
update :: IntMap Int -> [Int] -> Int -> IntMap Int
update s vs p = foldl' decrease s (takeWhile (>= p2) vs)
  where
    sp = s ! (p - 1)
    p2 = p * p
    sumOfSieved v = p * (s ! (v `div` p) - sp)
    decrease m v = IntMap.adjust (subtract $ sumOfSieved v) v m
main :: IO ()
main = print $ p $ 2*10^(9 :: Int)
UPDATE:
Using mutable hashtables I've managed to get the running time down to ~5.5 sec in Haskell with this implementation.
Also, I used unboxed vectors instead of lists in several places. Linear hashing seems to be the fastest. I think this can be done even faster. I noticed the sse42 option in the hashtables package. I'm not sure I've managed to set it correctly, but even without it the code runs that fast.
UPDATE 2 (19.06.2017)
I've managed to make it 3x faster than the best solution from @Krom (using my code + his map) by dropping the Judy hashmap altogether. Plain arrays are used instead. You can come up with the same idea if you notice that the keys of the S hashmap are either the sequence from 1 to n' or n div i for i from 1 to r. So we can represent such a HashMap as two arrays, choosing which array to look in depending on the key being searched.
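A minimal sketch of the two-array idea, written from the description above rather than taken from the answerer's code (which is not reproduced here). All names (primeSumSketch, small, large, look) are hypothetical. It assumes a 64-bit Int and that floor (sqrt n) computed in Double is exact for the n of interest. Keys v <= r live in small at index v; keys v > r live in large at index n div v.

```haskell
import Control.Monad (forM_, when)
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STUArray, newListArray, readArray, writeArray)

-- Sum of all primes <= n, with the map S stored as two unboxed arrays.
primeSumSketch :: Int -> Int
primeSumSketch n
  | n < 2 = 0
  | otherwise = runST $ do
      let r = floor (sqrt (fromIntegral n :: Double)) :: Int
          tri v = v * (v + 1) `div` 2 - 1          -- 2 + 3 + ... + v
      -- small ! v = S(v) for 1 <= v <= r
      small <- newListArray (1, r) [tri v | v <- [1 .. r]]
                 :: ST s (STUArray s Int Int)
      -- large ! i = S(n `div` i) for 1 <= i <= r
      large <- newListArray (1, r) [tri (n `div` i) | i <- [1 .. r]]
                 :: ST s (STUArray s Int Int)
      let look v
            | v <= r    = readArray small v
            | otherwise = readArray large (n `div` v)
      forM_ [2 .. r] $ \p -> do
        sp1 <- readArray small (p - 1)
        sp  <- readArray small p
        when (sp > sp1) $ do                       -- p is prime
          let p2 = p * p
          -- keys n `div` i in descending order, while still >= p*p
          forM_ (takeWhile (\i -> n `div` i >= p2) [1 .. r]) $ \i -> do
            s   <- look (n `div` i `div` p)
            old <- readArray large i
            writeArray large i (old - p * (s - sp1))
          -- remaining keys r, r-1, ..., p*p (empty when p*p > r)
          forM_ [r, r - 1 .. p2] $ \v -> do
            s   <- readArray small (v `div` p)
            old <- readArray small v
            writeArray small v (old - p * (s - sp1))
      readArray large 1                            -- S(n)
```

Both loops walk the keys from largest to smallest, so every S(v div p) that is read still holds its value from before the current prime, as the algorithm requires.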
My code + Judy HashMap
$ time ./judy
95673602693282040
real 0m0.590s
user 0m0.588s
sys 0m0.000s
My code + my sparse map
$ time ./sparse
95673602693282040
real 0m0.203s
user 0m0.196s
sys 0m0.004s
This can be made even faster if, instead of IOUArray, pre-generated vectors from the Vector library are used and readArray is replaced by unsafeRead. But I don't think this should be done unless you're really interested in optimizing this as much as possible.
Comparison with this solution is cheating and is not fair. I expect the same ideas implemented in Python and C++ will be even faster. But @Krom's solution with a closed hashmap is already cheating, because it uses a custom data structure instead of a standard one. At least you can see that the standard and most popular hash maps in Haskell are not that fast. Using better algorithms and better ad-hoc data structures can be better for such problems.
Here's resulting code.
First as a baseline, the timings of the existing approaches
on my machine:
Original program posted in the question:
time stack exec primorig
95673602693282040
real 0m4.601s
user 0m4.387s
sys 0m0.251s
Second the version using Data.IntMap.Strict from
here
time stack exec primIntMapStrict
95673602693282040
real 0m2.775s
user 0m2.753s
sys 0m0.052s
Shersh's code with Data.Judy dropped in here
time stack exec prim-hash2
95673602693282040
real 0m0.945s
user 0m0.955s
sys 0m0.028s
Your python solution.
I compiled it with
python -O -m py_compile problem10.py
and the timing:
time python __pycache__/problem10.cpython-36.opt-1.pyc
95673602693282040
real 0m1.163s
user 0m1.160s
sys 0m0.003s
Your C++ version:
$ g++ -O2 --std=c++11 p10.cpp -o p10
$ time ./p10
sum(2000000000) = 95673602693282040
real 0m0.314s
user 0m0.310s
sys 0m0.003s
I didn't bother to provide a baseline for slow.hs, as I didn't
want to wait for it to complete when run with an argument of
2*10^9.
Subsecond performance
The following program runs in under a second on my machine.
It uses a hand-rolled hashmap, which uses closed hashing with
linear probing and some variant of Knuth's hash function,
see here.
Certainly it is somewhat tailored to the case, as the lookup
function for example expects the searched keys to be present.
Timings:
time stack exec prim
95673602693282040
real 0m0.725s
user 0m0.714s
sys 0m0.047s
First I implemented my hand rolled hashmap simply to hash
the keys with
key `mod` size
and selected a size multiple times higher than the expected
input, but the program took 22s or more to complete.
Finally it was a matter of choosing a hash function which was
good for the workload.
Here is the program:
import Data.Maybe
import Control.Monad
import Data.Array.IO
import Data.Array.Base (unsafeRead)
type Number = Int
data Map = Map { keys :: IOUArray Int Number
               , values :: IOUArray Int Number
               , size :: !Int
               , factor :: !Int
               }
newMap :: Int -> Int -> IO Map
newMap s f = do
  k <- newArray (0, s-1) 0
  v <- newArray (0, s-1) 0
  return $ Map k v s f
storeKey :: IOUArray Int Number -> Int -> Int -> Number -> IO Int
storeKey arr s f key = go ((key * f) `mod` s)
  where
    go :: Int -> IO Int
    go ind = do
      v <- readArray arr ind
      go2 v ind
    go2 v ind
      | v == 0 = do { writeArray arr ind key; return ind; }
      | v == key = return ind
      | otherwise = go ((ind + 1) `mod` s)
loadKey :: IOUArray Int Number -> Int -> Int -> Number -> IO Int
loadKey arr s f key = s `seq` key `seq` go ((key *f) `mod` s)
  where
    go :: Int -> IO Int
    go ix = do
      v <- unsafeRead arr ix
      if v == key then return ix else go ((ix + 1) `mod` s)
insertIntoMap :: Map -> (Number, Number) -> IO Map
insertIntoMap m@(Map ks vs s f) (k, v) = do
  ix <- storeKey ks s f k
  writeArray vs ix v
  return m
fromList :: Int -> Int -> [(Number, Number)] -> IO Map
fromList s f xs = do
  m <- newMap s f
  foldM insertIntoMap m xs
(!) :: Map -> Number -> IO Number
(!) (Map ks vs s f) k = do
  ix <- loadKey ks s f k
  readArray vs ix
mupdate :: Map -> Number -> (Number -> Number) -> IO ()
mupdate (Map ks vs s fac) i f = do
  ix <- loadKey ks s fac i
  old <- readArray vs ix
  let x' = f old
  x' `seq` writeArray vs ix x'
r' :: Number -> Number
r' = floor . sqrt . fromIntegral
vs' :: Integral a => a -> a -> [a]
vs' n r = [n `div` i | i <- [1..r]] ++ reverse [1..n `div` r - 1]
vss' n r = r + n `div` r -1
list' :: Int -> Int -> [Number] -> IO Map
list' s f vs = fromList s f [(i, i * (i + 1) `div` 2 - 1) | i <- vs]
problem10 :: Number -> IO Number
problem10 n = do
  m <- list' (19*vss) (19*vss+7) vs
  nm <- sieve m 2 r vs
  nm ! n
  where vs = vs' n r
        vss = vss' n r
        r = r' n
sieve :: Map -> Number -> Number -> [Number] -> IO Map
sieve m p r vs | p > r = return m
               | otherwise = do
                   v1 <- m ! p
                   v2 <- m ! (p - 1)
                   nm <- if v1 > v2 then update m vs p else return m
                   sieve nm (p + 1) r vs
update :: Map -> [Number] -> Number -> IO Map
update m vs p = foldM (decrease p) m $ takeWhile (>= p*p) vs
decrease :: Number -> Map -> Number -> IO Map
decrease p m k = do
  v <- sumOfSieved m k p
  mupdate m k (subtract v)
  return m
sumOfSieved :: Map -> Number -> Number -> IO Number
sumOfSieved m v p = do
  v1 <- m ! (v `div` p)
  v2 <- m ! (p - 1)
  return $ p * (v1 - v2)
main = do { n <- problem10 (2*10^9) ; print n; } -- 2*10^9
I am not a professional with hashing and that sort of stuff, so
this can certainly be improved a lot. Maybe we Haskellers should
improve the off-the-shelf hash maps or provide some simpler ones.
My hashmap, Shersh's code
If I plug my hashmap into Shersh's code (see answer below), see here,
we are even down to
time stack exec prim-hash2
95673602693282040
real 0m0.601s
user 0m0.604s
sys 0m0.034s
Why is slow.hs slow?
If you read through the source
for the function insert in Data.HashTable.ST.Basic, you
will see that it deletes the old key value pair and inserts
a new one. It doesn't look up the "place" for the value and
mutate it, as one might imagine, if one reads that it is
a "mutable" hashtable. Here the hashtable itself is mutable,
so you don't need to copy the whole hashtable for insertion
of a new key value pair, but the value places for the pairs
are not. I don't know if that is the whole story of slow.hs
being slow, but my guess is, it is a pretty big part of it.
A few minor improvements
So that's the idea I followed while trying to improve
your program the first time.
See, you don't need a mutable mapping from keys to values.
Your key set is fixed. You want a mapping from keys to mutable
places. (Which is, by the way, what you get from C++ by default.)
And so I tried to come up with that. I used IntMap IORef from
Data.IntMap.Strict and Data.IORef first and got a timing
of
time stack exec prim
95673602693282040
real 0m2.134s
user 0m2.141s
sys 0m0.028s
I thought maybe it would help to work with unboxed values
and to get that, I used IOUArray Int Int with 1 element
each instead of IORef and got those timings:
time stack exec prim
95673602693282040
real 0m2.015s
user 0m2.018s
sys 0m0.038s
Not much of a difference and so I tried to get rid of bounds
checking in the 1 element arrays by using unsafeRead and
unsafeWrite and got a timing of
time stack exec prim
95673602693282040
real 0m1.845s
user 0m1.850s
sys 0m0.030s
which was the best I got using Data.IntMap.Strict.
Of course I ran each program multiple times to see if
the times are stable and the differences in run time aren't
just noise.
It looks like these are all just micro-optimizations.
And here is the program that ran fastest for me without using a hand rolled data structure:
import qualified Data.IntMap.Strict as M
import Control.Monad
import Data.Array.IO
import Data.Array.Base (unsafeRead, unsafeWrite)
type Number = Int
type Place = IOUArray Number Number
type Map = M.IntMap Place
tupleToRef :: (Number, Number) -> IO (Number, Place)
tupleToRef = traverse (newArray (0,0))
insertRefs :: [(Number, Number)] -> IO [(Number, Place)]
insertRefs = traverse tupleToRef
fromList :: [(Number, Number)] -> IO Map
fromList xs = M.fromList <$> insertRefs xs
(!) :: Map -> Number -> IO Number
(!) m i = unsafeRead (m M.! i) 0
mupdate :: Map -> Number -> (Number -> Number) -> IO ()
mupdate m i f = do
  let place = m M.! i
  old <- unsafeRead place 0
  let x' = f old
  -- make the application of f strict
  x' `seq` unsafeWrite place 0 x'
r' :: Number -> Number
r' = floor . sqrt . fromIntegral
vs' :: Integral a => a -> a -> [a]
vs' n r = [n `div` i | i <- [1..r]] ++ reverse [1..n `div` r - 1]
list' :: [Number] -> IO Map
list' vs = fromList [(i, i * (i + 1) `div` 2 - 1) | i <- vs]
problem10 :: Number -> IO Number
problem10 n = do
  m <- list' vs
  nm <- sieve m 2 r vs
  nm ! n
  where vs = vs' n r
        r = r' n
sieve :: Map -> Number -> Number -> [Number] -> IO Map
sieve m p r vs | p > r = return m
               | otherwise = do
                   v1 <- m ! p
                   v2 <- m ! (p - 1)
                   nm <- if v1 > v2 then update m vs p else return m
                   sieve nm (p + 1) r vs
update :: Map -> [Number] -> Number -> IO Map
update m vs p = foldM (decrease p) m $ takeWhile (>= p*p) vs
decrease :: Number -> Map -> Number -> IO Map
decrease p m k = do
  v <- sumOfSieved m k p
  mupdate m k (subtract v)
  return m
sumOfSieved :: Map -> Number -> Number -> IO Number
sumOfSieved m v p = do
  v1 <- m ! (v `div` p)
  v2 <- m ! (p - 1)
  return $ p * (v1 - v2)
main = do { n <- problem10 (2*10^9) ; print n; } -- 2*10^9
If you profile that, you see that it spends most of the time in the custom lookup function (!);
I don't know how to improve that further. Trying to inline (!) with {-# INLINE (!) #-}
didn't yield better results; maybe GHC already did this.
This code of mine evaluates the sum to 2⋅10^9 in 0.3 seconds and the sum to 10^12 (18435588552550705911377) in 19.6 seconds (if given sufficient RAM).
import Control.DeepSeq
import qualified Control.Monad as ControlMonad
import qualified Data.Array as Array
import qualified Data.Array.ST as ArrayST
import qualified Data.Array.Base as ArrayBase
primeLucy :: (Integer -> Integer) -> (Integer -> Integer) -> Integer -> (Integer->Integer)
primeLucy f sf n = g
  where
    r = fromIntegral $ integerSquareRoot n
    ni = fromIntegral n
    loop from to c = let go i = ControlMonad.when (to<=i) (c i >> go (i-1)) in go from
    k = ArrayST.runSTArray $ do
      k <- ArrayST.newListArray (-r,r) $ force $
        [sf (div n (toInteger i)) - sf 1|i<-[r,r-1..1]] ++
        [0] ++
        [sf (toInteger i) - sf 1|i<-[1..r]]
      ControlMonad.forM_ (takeWhile (<=r) primes) $ \p -> do
        l <- ArrayST.readArray k (p-1)
        let q = force $ f (toInteger p)
        let adjust = \i j -> do { v <- ArrayBase.unsafeRead k (i+r); w <- ArrayBase.unsafeRead k (j+r); ArrayBase.unsafeWrite k (i+r) $!! v+q*(l-w) }
        loop (-1) (-div r p) $ \i -> adjust i (i*p)
        loop (-div r p-1) (-min r (div ni (p*p))) $ \i -> adjust i (div (-ni) (i*p))
        loop r (p*p) $ \i -> adjust i (div i p)
      return k
    g :: Integer -> Integer
    g m
      | m >= 1 && m <= integerSquareRoot n = k Array.! (fromIntegral m)
      | m >= integerSquareRoot n && m <= n && div n (div n m)==m = k Array.! (fromIntegral (negate (div n m)))
      | otherwise = error $ "Function not precalculated for value " ++ show m
primeSum :: Integer -> Integer
primeSum n = (primeLucy id (\m -> div (m*m+m) 2) n) n
If your integerSquareRoot function is buggy (as reportedly some are), you can replace it here with floor . sqrt . fromIntegral.
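The listing above calls integerSquareRoot (and primes) without defining them; presumably they come from a number-theory library, which is an assumption on my part. Since Double carries only a 53-bit significand, floor . sqrt . fromIntegral can be off by one for very large arguments, so an exact stand-in may be preferable. A minimal sketch using Newton's iteration on Integer, under the hypothetical name isqrt:

```haskell
-- Exact floor of the square root via Newton's iteration on Integer.
-- isqrt is a hypothetical stand-in for the undefined integerSquareRoot.
isqrt :: Integer -> Integer
isqrt n
  | n < 0     = error "isqrt: negative argument"
  | n < 2     = n
  | otherwise = go n
  where
    -- Starting at or above the root, the iterates decrease strictly
    -- until they reach floor (sqrt n); stop at the first non-decrease.
    go x =
      let y = (x + n `div` x) `div` 2
      in if y >= x then x else go y
```

With that in place, integerSquareRoot = isqrt is a drop-in definition for the code above.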
Explanation:
As the name suggests, it is based upon a generalization of the famous method by "Lucy Hedgehog" eventually discovered by the original poster.
It allows you to calculate many sums of the form $\sum_{p \le N} f(p)$ (with p prime) without enumerating all the primes up to N, in time O(N^0.75).
Its inputs are the function f (i.e., id if you want the prime sum), its summatory function over all the integers (i.e., in that case the sum of the first m integers, or div (m*m+m) 2), and N.
primeLucy returns a lookup function $g(m) = \sum_{p \le m} f(p)$ (with p prime) restricted to certain values of m: $1 \le m \le \lfloor \sqrt{N} \rfloor$, or $m = \lfloor N/k \rfloor$ for some integer $k \ge 1$.
Try this and let me know how fast it is:
-- sum of primes
import Control.Monad (forM_, when)
import Control.Monad.ST
import Data.Array.ST
import Data.Array.Unboxed
sieve :: Int -> UArray Int Bool
sieve n = runSTUArray $ do
    let m = (n-1) `div` 2
        r = floor . sqrt $ fromIntegral n
    bits <- newArray (0, m-1) True
    forM_ [0 .. r `div` 2 - 1] $ \i -> do
        isPrime <- readArray bits i
        when isPrime $ do
            let a = 2*i*i + 6*i + 3
                b = 2*i*i + 8*i + 6
            forM_ [a, b .. (m-1)] $ \j -> do
                writeArray bits j False
    return bits
primes :: Int -> [Int]
primes n = 2 : [2*i+3 | (i, True) <- assocs $ sieve n]
main = do
    print $ sum $ primes 1000000
You can run it on ideone. My algorithm is the Sieve of Eratosthenes, and it should be quite fast for small n. For n = 2,000,000,000, the array size may be a problem, in which case you will need to use a segmented sieve. See my blog for more information about the Sieve of Eratosthenes. See this answer for information about a segmented sieve (but not in Haskell, unfortunately).

Solving CHRL4 on code chef (CHEF and Way) in Haskell

I am trying to solve this question in Haskell but the codechef compiler keeps on saying it is the wrong answer. The question is as follows:
After visiting a childhood friend, Chef wants to get back to his home. Friend lives at the first street, and Chef himself lives at the N-th (and the last) street. Their city is a bit special: you can move from the X-th street to the Y-th street if and only if 1 <= Y - X <= K, where K is the integer value that is given to you. Chef wants to get to home in such a way that the product of all the visited streets' special numbers is minimal (including the first and the N-th street). Please, help him to find such a product.
Input
The first line of input consists of two integer numbers - N and K - the number of streets and the value of K respectively. The second line consist of N numbers - A1, A2, ..., AN respectively, where Ai equals to the special number of the i-th street.
The output should be modulo 1000000007
Input
4 2
1 2 3 4
Output
8
The solution I used is as follows:
import qualified Data.ByteString.Char8 as B
import Data.Maybe (fromJust)
findMinIndex x index minIndex n
  | index == n = minIndex
  | (x!!index) < (x!!minIndex) = findMinIndex x (index+1) index n
  | otherwise = findMinIndex x (index+1) minIndex n
minCost [] _ = 1
minCost (x:xs) k = let indexList = take k xs
                       minIndex = findMinIndex indexList 0 0 (length indexList)
                   in x * minCost(drop minIndex xs) k
main :: IO()
main = do
  t <- B.getContents
  let inputs = B.lines t
  let firstLine = inputs !! 0
  let secondLine = inputs !! 1
  let [n,k] = map (fst . fromJust . B.readInt) $ B.split ' ' firstLine
  let specialNums = reverse $ map (fst . fromJust . B.readInteger) $ B.split ' ' secondLine
  putStrLn $ show ((minCost specialNums k) `mod` 1000000007)
It worked for the given test case and a few other test cases I tried out, but it is not being accepted by CodeChef. I followed the editorial for the problem when writing it. Basically, starting from the last number in the list of special numbers, the program searches its immediate k predecessors, finds the minimum one in that range, multiplies it with the current value, and so on until the beginning of the list.
Your algorithm doesn't always give the smallest product for all the inputs, e.g. this one:
5 2
3 2 3 2 3
The editorial explains the problem thoroughly; you really should read it again.
This problem is basically a shortest-path problem: streets are vertices, possible movements from street to street are edges of the graph, and the weight of an edge is determined by the special value of the tail alone. Although the total movement cost is defined as the product rather than the sum of all the costs, the question can be normalized by taking logarithms of all the special values, since
a * b = exp(log(a) + log(b))
Given that log is a monotonically increasing function, the minimal product is just the minimal sum of logarithms.
In the editorial the author picked Dijkstra's algorithm, but after the log transformation it is a standard shortest-path problem and can be solved with any shortest-path algorithm you like.
There are many implementations of Dijkstra's algorithm in Haskell; I found two on Hackage and one here. The parsing and graph-initializing code is straightforward.
import Control.Monad (foldM)
import Control.Monad.ST
import Data.Array
import Data.Array.MArray
import Data.Array.ST
import Data.Function (on)
import Data.IntMap.Strict as M
import Data.List (groupBy)
import Data.Set as S
-- Code from http://rosettacode.org/wiki/Dijkstra's_algorithm#Haskell
dijkstra :: (Ix v, Num w, Ord w, Bounded w) => v -> v -> Array v [(v,w)] -> (Array v w, Array v v)
dijkstra src invalid_index adj_list = runST $ do
  min_distance <- newSTArray b maxBound
  writeArray min_distance src 0
  previous <- newSTArray b invalid_index
  let aux vertex_queue =
        case S.minView vertex_queue of
          Nothing -> return ()
          Just ((dist, u), vertex_queue') ->
            let edges = adj_list Data.Array.! u
                f vertex_queue (v, weight) = do
                  let dist_thru_u = dist + weight
                  old_dist <- readArray min_distance v
                  if dist_thru_u >= old_dist then
                    return vertex_queue
                  else do
                    let vertex_queue' = S.delete (old_dist, v) vertex_queue
                    writeArray min_distance v dist_thru_u
                    writeArray previous v u
                    return $ S.insert (dist_thru_u, v) vertex_queue'
            in
              foldM f vertex_queue' edges >>= aux
  aux (S.singleton (0, src))
  m <- freeze min_distance
  p <- freeze previous
  return (m, p)
  where b = bounds adj_list
newSTArray :: Ix i => (i,i) -> e -> ST s (STArray s i e)
newSTArray = newArray
shortest_path_to :: (Ix v) => v -> v -> Array v v -> [v]
shortest_path_to target invalid_index previous =
  aux target [] where
    aux vertex acc | vertex == invalid_index = acc
                   | otherwise = aux (previous Data.Array.! vertex) (vertex : acc)
-- Code I wrote
instance Bounded Double where
  minBound = -1e100
  maxBound = 1e100
constructInput :: Int -> Int -> M.IntMap Integer -> Array Int [(Int, Double)]
constructInput n k specMap =
  let
    specMap' = fmap (log . fromIntegral) specMap
    edges = [(src, [(dest, specMap' M.! dest) | dest <- [src+1..src+k], dest <= n]) | src <- [1..n]]
  in
    array (1, n) edges
main :: IO ()
main = do
  rawInput <- getContents
  let
    [l, l'] = lines rawInput
    [n,k] = fmap read . words $ l
    specs = fmap read . words $ l'
    specMap = M.fromList $ [1..n] `zip` specs
    adj_list = constructInput n k specMap
    (_, previous) = dijkstra 1 0 adj_list
    path = shortest_path_to n 0 previous
    weight = (product $ fmap (specMap M.!) path) `mod` 1000000007
  print weight
PS: My program scores 30 with a lot of TLE (short for "Time Limit Exceeded") on CodeChef; for full marks you may have to try it yourself and find a better solution.
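Because every move goes to a higher-numbered street, the graph is acyclic, so the same minimum can also be found with a plain left-to-right dynamic program instead of Dijkstra. The following is only a sketch under stated assumptions: minProduct and modulus are hypothetical names, k is assumed to be at least 1, candidates are compared by their Double log-sums (which could misorder two nearly equal products), and the O(N·K) running time may still be too slow for the largest inputs.

```haskell
import Data.List (foldl', minimumBy)
import Data.Ord (comparing)

modulus :: Integer
modulus = 1000000007

-- Minimal product (mod 1000000007) over all walks from the first to the
-- last street with steps of at most k. Each entry pairs the log of the
-- best product reaching a street with that product reduced mod modulus.
minProduct :: Int -> [Integer] -> Integer
minProduct _ [] = 1
minProduct k (a:as) = snd (head (foldl' step [entry0] as))
  where
    entry0 = (log (fromIntegral a) :: Double, a `mod` modulus)
    -- window: entries for the previous (at most k) streets, newest first
    step window x =
      let (bestLog, bestProd) = minimumBy (comparing fst) window
          newLog  = bestLog + log (fromIntegral x)
          newProd = bestProd * x `mod` modulus
      in newLog `seq` newProd `seq` take k ((newLog, newProd) : window)
```

On the counterexample above (K = 2, special numbers 3 2 3 2 3) this picks the route through streets 1, 3 and 5 with product 27, whereas the greedy program from the question multiplies 3, 2, 2 and 3 to get 36.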

Randomized algorithm not behaving as expected

I am implementing an approximate counting algorithm where we:
Maintain a counter X using log(log n) bits
Initialize X to 0
When an item arrives, increase X by 1 with probability (1/2)^X
When the stream is over, output 2^X − 1, so that E[2^X] = n + 1
My implementation is as follows:
import System.Random
type Prob = Double
type Tosses = Int
-- * for sake of simplicity we assume 0 <= p <= 1
tos :: Prob -> StdGen -> (Bool,StdGen)
tos p s = (q <= 100*p, s')
  where (q,s') = randomR (1,100) s
toses :: Prob -> Tosses -> StdGen -> [(Bool,StdGen)]
toses _ 0 _ = []
toses p n s = let t@(b,s') = tos p s in t : toses p (pred n) s'
toses' :: Prob -> Tosses -> StdGen -> [Bool]
toses' p n = fmap fst . toses p n
morris :: StdGen -> [a] -> Int
morris s xs = go s xs 0 where
  go _ [] n = n
  go s (_:xs) n = go s' xs n' where
    (h,s') = tos (0.5^n) s
    n' = if h then succ n else n
main :: IO Int
main = do
  s <- newStdGen
  return $ morris s [1..10000]
The problem is that my X is always incorrect for any |stream| > 2, and it seems like for all StdGen and |stream| > 1000, X = 7
I tested the same algorithm in Matlab and it works there, so I assume it's either
an issue with my random number generator, or
raising 1/2 to a large n in Double
Please suggest a path forward?
The problem is actually very simple: with randomR (1,100) you preclude values within the first percent, so you have a complete cutoff at high powers of 1/2 (which all lie in that small interval). Actually a general thing: ranges should start at zero, not at one†, unless there's a specific reason.
But why even use a range of 100 in the first place? I'd just make it
tos :: Prob -> StdGen -> (Bool,StdGen)
tos p s = (q <= p, s')
  where (q,s') = randomR (0,1) s
†I know, Matlab gets this wrong all over the place. Just one of the many horrible things about that language.
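The cutoff can be made concrete with a little arithmetic. In the question's tos, q is compared with 100*p, so it is a Double; assuming randomR (1,100) draws it uniformly from the interval [1,100], the test q <= 100*p succeeds with probability (100p − 1)/99 when 100p >= 1 and never otherwise. A small sketch (successProb and firstStuck are hypothetical names) that finds the first counter value at which the toss can no longer succeed:

```haskell
-- Probability that the question's `tos p` returns True, assuming
-- randomR (1,100) draws a Double uniformly from the interval [1,100].
successProb :: Double -> Double
successProb p = max 0 (100 * p - 1) / 99

-- Smallest counter value n at which tos (0.5^n) can never succeed.
firstStuck :: Int
firstStuck = head [n | n <- [0 ..], successProb (0.5 ^ n) == 0]
```

Since 0.5^6 = 0.015625 is still above 0.01 but 0.5^7 = 0.0078125 is below it, firstStuck is 7, which matches the X = 7 observed in the question for long streams.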
Unrelated to your problem: as chi remarked this kind of code looks a lot nicer if you use a suitable random monad, instead of manually passing around StdGens.
import Data.Random
import Data.Random.Source.Std
type Prob = Double
tos :: Prob -> RVar Bool
tos p = do
  q <- uniform 0 1
  return $ q <= p
morris :: [a] -> RVar Int
morris xs = go xs 0 where
  go [] n = return n
  go (_:xs) n = do
    h <- tos (0.5^n)
    go xs $ if h then succ n else n
morrisTest :: Int -> IO Int
morrisTest n = do
  runRVar (morris [1..n]) StdRandom

First non-repeating char in a string ? in haskell or F#

Given a sequence of chars, what is the most efficient way to find the first non-repeating char?
I'm interested in a purely functional implementation; Haskell or F# preferred.
A fairly straightforward use of Data.Set in combination with filter will do the job in an efficient one-liner. Since this seems homeworkish, I'm declining to provide the precise line in question :-)
The complexity should, I think, be O(n log m) where m is the number of distinct characters in the string and n is the total number of characters in the string.
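One possible reading of that hint, sketched in Haskell. This is not necessarily the one-liner the answerer had in mind, and firstNonRepeated is a hypothetical name. A single fold collects the elements seen at least twice into a Data.Set, and a second pass returns the first element that is not in that set, which gives the stated O(n log m) bound.

```haskell
import Data.List (find, foldl')
import qualified Data.Set as Set

-- First element that occurs exactly once, or Nothing if every
-- element repeats (or the input is empty).
firstNonRepeated :: Ord a => [a] -> Maybe a
firstNonRepeated xs = find (`Set.notMember` repeated) xs
  where
    -- seen: everything encountered so far; dup: everything seen twice or more
    (_, repeated) = foldl' step (Set.empty, Set.empty) xs
    step (seen, dup) x
      | x `Set.member` seen = (seen, Set.insert x dup)
      | otherwise           = (Set.insert x seen, dup)
```

For example, firstNonRepeated "abcdefabcf" skips a, b and c (all repeated later) and returns Just 'd'.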
A simple F# solution:
let f (s: string) =
    let n = Map(Seq.countBy id s)
    Seq.find (fun c -> n.[c] = 1) s
Here's an F# solution in O(n log n): sort the array, then for each character in the original array, binary search for it in the sorted array: if it's the only one of its kind, that's it.
open System
open System.IO
open System.Collections.Generic
let Solve (str : string) =
    let arrStr = str.ToCharArray()
    let sorted = Array.sort arrStr
    let len = str.Length - 1
    let rec Inner i =
        if i = len + 1 then
            '-'
        else
            let index = Array.BinarySearch(sorted, arrStr.[i])
            if index = 0 && sorted.[index+1] <> sorted.[index] then
                arrStr.[i]
            elif index = len && sorted.[index-1] <> sorted.[index] then
                arrStr.[i]
            elif index > 0 && index < len &&
                 sorted.[index+1] <> sorted.[index] &&
                 sorted.[index-1] <> sorted.[index] then
                arrStr.[i]
            else
                Inner (i + 1)
    Inner 0
let _ =
    printfn "%c" (Solve "abcdefabcf")
A - means all characters are repeated.
Edit: ugly hack with using the - for "no solution" as you can use Options, which I keep forgetting about! An exercise for the reader, as this does look like homework.
Here's a bit longish solution, but guaranteed to be worst-case O(n log n):
import Data.List (sortBy, minimumBy)
import Data.Ord (comparing)
sortPairs :: Ord a => [(a, b)]->[(a, b)]
sortPairs = sortBy (comparing fst)
index :: Integral b => [a] -> [(a, b)]
index = flip zip [1..]
dropRepeated :: Eq a => [(a, b)]->[(a, b)]
dropRepeated [] = []
dropRepeated [x] = [x]
dropRepeated (x:xs) | fst x == fst (head xs) =
                        dropRepeated $ dropWhile ((==(fst x)).fst) xs
                    | otherwise =
                        x:(dropRepeated xs)
nonRepeatedPairs :: (Ord a, Integral b) => [a]->[(a, b)]
nonRepeatedPairs = dropRepeated . sortPairs . index
firstNonRepeating :: Ord a => [a]->a
firstNonRepeating = fst . minimumBy (comparing snd) . nonRepeatedPairs
The idea is: sort the string lexicographically, so that it's easy to remove any repeated characters in linear time and find the first character which is not repeated. But in order to find it, we need to save information about characters' positions in text.
The speed on easy cases (like [1..10000]) is not perfect, but for something harder ([1..10000] ++ [1..10000] ++ [10001]) you can see the difference between this and a naive O(n^2).
Of course this can be done in linear time, if the size of alphabet is O(1), but who knows how large the alphabet is...
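For Char input the alphabet is bounded (at most 0x110000 code points), so the linear-time variant mentioned above can be sketched with a counting array. This is an illustration only: firstUniqueChar is a hypothetical name, and the table covers the whole Char range, so the constant cost of allocating it dominates for short strings.

```haskell
import Data.Array.Unboxed (UArray, accumArray, (!))
import Data.Char (ord)
import Data.List (find)

-- First character that occurs exactly once, in time linear in the
-- input length plus the (constant) size of the counting table.
firstUniqueChar :: String -> Maybe Char
firstUniqueChar s = find (\c -> counts ! ord c == 1) s
  where
    counts :: UArray Int Int
    counts = accumArray (+) 0 (0, 0x10FFFF) [(ord c, 1) | c <- s]
```

If the input is known to be, say, ASCII only, shrinking the bounds to (0, 127) removes most of that constant overhead; that restriction would be an extra assumption about the data.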
An alternate Haskell O(n log n) solution using Data.Map and no sorting:
module NonRepeat (
    firstNonRepeat
    )
    where
import Data.List (minimumBy)
import Data.Map (fromListWith, toList)
import Data.Ord (comparing)
data Occurance = Occ { first :: Int, count :: Int }
  deriving (Eq, Ord)
note :: Int -> a -> (a, Occurance)
note pos a = (a, Occ pos 1)
combine :: Occurance -> Occurance -> Occurance
combine (Occ p0 c0) (Occ p1 c1) = Occ (p0 `min` p1) (c0 + c1)
firstNonRepeat :: (Ord a) => [a] -> Maybe a
firstNonRepeat = fmap fst . findMinimum . occurances
  where occurances = toList . fromListWith combine . zipWith note [0..]
        findMinimum = safeMinimum . filter ((== 1).count.snd)
        safeMinimum [] = Nothing
        safeMinimum xs = Just $ minimumBy (comparing snd) xs
let firstNonRepeating (str:string) =
    let rec inner i cMap =
        if i = str.Length then
            cMap
            |> Map.filter (fun c (count, index) -> count = 1)
            |> Map.toSeq
            |> Seq.minBy (fun (c, (count, index)) -> index)
            |> fst
        else
            let c = str.[i]
            let value = if cMap.ContainsKey c then
                            let (count, index) = cMap.[c]
                            (count + 1, index)
                        else
                            (1, i)
            let cMap = cMap.Add(c, value)
            inner (i + 1) cMap
    inner 0 (Map.empty)
Here is a simpler version that sacrifices speed.
let firstNonRepeating (str:string) =
    let (c, count) = str
                     |> Seq.countBy (fun c -> c)
                     |> Seq.minBy (fun (c, count) -> count)
    if count = 1 then Some c else None
How about something like this:
let firstNonRepeat s =
    let repeats =
        ((Set.empty, Set.empty), s)
        ||> Seq.fold (fun (one,many) c -> Set.add c one, if Set.contains c one then Set.add c many else many)
        |> snd
    s
    |> Seq.tryFind (fun c -> not (Set.contains c repeats))
This is pure C# (so I assume there's a similar F# version), which will be efficient if GroupBy is efficient (which it ought to be):
static char FstNonRepeatedChar(string s)
{
    return s.GroupBy(x => x).Where(xs => xs.Count() == 1).First().First();
}

Performance of looping over an Unboxed array in Haskell

First of all, it's great. However, I came across a situation where my benchmarks turned up weird results. I am new to Haskell, and this is the first time I've gotten my hands dirty with mutable arrays and monads. The code below is based on this example.
I wrote a generic monadic for function that takes numbers and a step function rather than a range (like forM_ does). I compared using my generic for function (Loop A) against embedding an equivalent recursive function (Loop B). Having Loop A is noticeably faster than having Loop B. Weirder, having both Loop A and B together is faster than having Loop B by itself (but slightly slower than Loop A by itself).
Here are some possible explanations I can think of for the discrepancies. Note that these are just guesses:
Something I haven't learned yet about how Haskell extracts results from monadic functions.
Loop B faults the array in a less cache efficient manner than Loop A. Why?
I made a dumb mistake; Loop A and Loop B are actually different.
Note that in all 3 cases of having either or both Loop A and Loop B, the program produces the same output.
Here is the code. I tested it with ghc -O2 for.hs using GHC version 6.10.4.
import Control.Monad
import Control.Monad.ST
import Data.Array.IArray
import Data.Array.MArray
import Data.Array.ST
import Data.Array.Unboxed
for :: (Num a, Ord a, Monad m) => a -> a -> (a -> a) -> (a -> m b) -> m ()
for start end step f = loop start where
    loop i
        | i <= end = do
            f i
            loop (step i)
        | otherwise = return ()
primesToNA :: Int -> UArray Int Bool
primesToNA n = runSTUArray $ do
    a <- newArray (2,n) True :: ST s (STUArray s Int Bool)
    let sr = floor . (sqrt::Double->Double) . fromIntegral $ n+1
    -- Loop A
    for 4 n (+ 2) $ \j -> writeArray a j False
    -- Loop B
    let f i
            | i <= n = do
                writeArray a i False
                f (i+2)
            | otherwise = return ()
      in f 4
    forM_ [3,5..sr] $ \i -> do
        si <- readArray a i
        when si $
            forM_ [i*i,i*i+i+i..n] $ \j -> writeArray a j False
    return a
primesTo :: Int -> [Int]
primesTo n = [i | (i,p) <- assocs . primesToNA $ n, p]
main = print $ primesTo 30000000
I just tried benchmarking this with Criterion and GHC 6.12.1, and Loop A looks only slightly faster for me. I definitely don't get the weird "both together are faster than B alone" effect.
Also, if your step function really is just a step and doesn't do anything wacky with its argument, the following version of for seems a bit faster, especially for smaller arrays:
for' :: (Enum a, Num a, Ord a, Monad m) => a -> a -> (a -> a) -> (a -> m b) -> m ()
for' start end step = forM_ $ enumFromThenTo start (step start) end
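To make the "nothing wacky" caveat concrete, here is a small sketch comparing which indices the two loops visit. The helper visited is hypothetical and not part of the benchmark code. It runs a loop in the pair monad from base (a pair whose first component is a list behaves like a writer), so each call to the body records one index.

```haskell
import Control.Monad (forM_)

-- The original loop: repeatedly applies the step function.
for :: (Num a, Ord a, Monad m) => a -> a -> (a -> a) -> (a -> m b) -> m ()
for start end step f = loop start
  where
    loop i
      | i <= end = do
          _ <- f i
          loop (step i)
      | otherwise = return ()

-- The enumFromThenTo variant: only looks at the first step (step start).
for' :: (Enum a, Num a, Ord a, Monad m) => a -> a -> (a -> a) -> (a -> m b) -> m ()
for' start end step = forM_ $ enumFromThenTo start (step start) end

-- Hypothetical helper: collect the indices a loop visits, in order.
-- The loop body returns ([i], ()), and the Monad instance for pairs
-- concatenates the lists as the loop runs.
visited :: ((Int -> ([Int], ())) -> ([Int], ())) -> [Int]
visited runLoop = fst (runLoop (\i -> ([i], ())))
```

With a constant step both agree: visited (for 4 12 (+2)) and visited (for' 4 12 (+2)) are both [4,6,8,10,12]. With a step such as (*2) they differ: visited (for 1 20 (*2)) is [1,2,4,8,16], while visited (for' 1 20 (*2)) is [1..20], because for' only uses the distance between start and step start.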
Here are the results from Criterion, where loopA' is your loop A using my for', and where loopC is both A and B together:
benchmarking loopA...
mean: 2.372893 s, lb 2.370982 s, ub 2.374914 s, ci 0.950
std dev: 10.06753 ms, lb 8.820194 ms, ub 11.66965 ms, ci 0.950
benchmarking loopA'...
mean: 2.368167 s, lb 2.354312 s, ub 2.381413 s, ci 0.950
std dev: 69.50334 ms, lb 65.94236 ms, ub 73.17173 ms, ci 0.950
benchmarking loopB...
mean: 2.423160 s, lb 2.419131 s, ub 2.427260 s, ci 0.950
std dev: 20.78412 ms, lb 18.06613 ms, ub 24.99021 ms, ci 0.950
benchmarking loopC...
mean: 4.308503 s, lb 4.304875 s, ub 4.312110 s, ci 0.950
std dev: 18.48732 ms, lb 16.19325 ms, ub 21.32299 ms, ci 0.950
And here's the code:
module Main where
import Control.Monad
import Control.Monad.ST
import Data.Array.ST
import Data.Array.Unboxed
import Criterion.Main
for :: (Num a, Ord a, Monad m) => a -> a -> (a -> a) -> (a -> m b) -> m ()
for start end step f = loop start where
    loop i
        | i <= end = do
            f i
            loop (step i)
        | otherwise = return ()
for' :: (Enum a, Num a, Ord a, Monad m) => a -> a -> (a -> a) -> (a -> m b) -> m ()
for' start end step = forM_ $ enumFromThenTo start (step start) end
loopA arr n = for 4 n (+ 2) $ flip (writeArray arr) False
loopA' arr n = for' 4 n (+ 2) $ flip (writeArray arr) False
loopB arr n =
    let f i | i <= n = do writeArray arr i False
                          f (i+2)
            | otherwise = return ()
    in f 4
loopC arr n = do
    loopA arr n
    loopB arr n
runPrimes loop n = do
    let sr = floor . (sqrt::Double->Double) . fromIntegral $ n+1
    a <- newArray (2,n) True :: (ST s (STUArray s Int Bool))
    loop a n
    forM_ [3,5..sr] $ \i -> do
        si <- readArray a i
        when si $
            forM_ [i*i,i*i+i+i..n] $ \j -> writeArray a j False
    return a
primesA n = [i | (i,p) <- assocs $ runSTUArray $ runPrimes loopA n, p]
primesA' n = [i | (i,p) <- assocs $ runSTUArray $ runPrimes loopA' n, p]
primesB n = [i | (i,p) <- assocs $ runSTUArray $ runPrimes loopB n, p]
primesC n = [i | (i,p) <- assocs $ runSTUArray $ runPrimes loopC n, p]
main = let n = 10000000 in
    defaultMain [ bench "loopA" $ nf primesA n
                , bench "loopA'" $ nf primesA' n
                , bench "loopB" $ nf primesB n
                , bench "loopC" $ nf primesC n ]
Perhaps compare and contrast with the Shootout nsieve program? In any case, the only way to know what is really happening is to look at the Core (e.g. with the ghc-core tool).
{-# OPTIONS -O2 -optc-O -fbang-patterns -fglasgow-exts -optc-march=pentium4 #-}
--
-- The Computer Language Shootout
-- http://shootout.alioth.debian.org/
--
-- Contributed by Don Stewart 2005
-- nsieve over an ST monad Bool array
--
import Control.Monad.ST
import Data.Array.ST
import Data.Array.Base
import System
import Control.Monad
import Data.Bits
import Text.Printf
main = do
    n <- getArgs >>= readIO . head :: IO Int
    mapM_ (\i -> sieve (10000 `shiftL` (n-i))) [0, 1, 2]
sieve n = do
    let r = runST (do a <- newArray (2,n) True :: ST s (STUArray s Int Bool)
                      go a n 2 0)
    printf "Primes up to %8d %8d\n" (n::Int) (r::Int) :: IO ()
go !a !m !n !c
    | n == m = return c
    | otherwise = do
        e <- unsafeRead a n
        if e then let loop j
                        | j < m = do
                            x <- unsafeRead a j
                            when x $ unsafeWrite a j False
                            loop (j+n)
                        | otherwise = go a m (n+1) (c+1)
                  in loop (n `shiftL` 1)
             else go a m (n+1) c

Resources