Haskell - converting from List to Data.Vector - performance

After profiling my haskell program, I've found that 66% of the time in the program is spent indexing into lists. The solution seems to be using Data.Vector, but I'm having trouble converting: when I change the code to use a Vector it uses tons and tons of memory, and hangs so badly I can't even profile it. What could cause this?
Here is the file I would like to convert: https://github.com/drew-gross/Blokus-AI/blob/master/Grid.hs
and my attempt at converting it: https://github.com/drew-gross/Blokus-AI/blob/convert-to-vector/Grid.hs
Any ideas what I am doing wrong? Or at least, where to look?

makeEmptyGrid width height defaultCell = Grid (Data.Vector.take arraySize $ fromList $ repeat defaultCell) width height
That's a killer right there. fromList converts an entire list to a Vector, but repeat defaultCell is an infinite list.
makeEmptyGrid width height defaultCell = Grid (fromListN arraySize $ repeat defaultCell) width height
or
makeEmptyGrid width height defaultCell = Grid (fromList $ replicate arraySize defaultCell) width height
would fix that.
A cursory look over the rest didn't result in further obvious traps, but I may easily have overlooked some.

This is just an additional thought following upon Daniel. It looked like your Grids were only of Colors It probably won't do much for a small 'Grid' but it is comparatively easy to make an Unbox instance for Color. Then a grid will contain an unboxed array. In Grid.hs you would import Data.Vector.Unboxed rather than Data.Vector. This is in general much better for many reasons, but will require you to put an Unbox a => constraint on many of the definitions. This might have consequences if you want to make or 'map' into Grids full of things of another type than Color, unless it has an Unbox instance.
Below I just add the TH incantation from vector-th-unbox (I just learned about that package recently, and am taking the occasion to test it again) and the two requisite definitions. It wouldn't be much harder to write it by hand following the Bool instance in Data.Vector.Unboxed.Base.
{-#LANGUAGE TemplateHaskell, TypeFamilies, MultiParamTypeClasses#-}
module Color where
import Display
import Data.Vector.Unboxed.Deriving
import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Generic as G
import qualified Data.Vector.Generic.Mutable as M
import Data.Word (Word8)
data Color = Yellow | Red | Green | Blue | Empty
deriving (Show, Eq, Ord, Enum, Bounded)
fromColor :: Color -> Word8
{-# INLINE fromColor #-}
fromColor = fromIntegral . fromEnum
toColor :: Word8 -> Color
{-# INLINE toColor #-}
toColor x | x < 5 = toEnum (fromIntegral x)
toColor _ = Empty
derivingUnbox "Color"
[t| Color -> Word8 |]
[| fromColor |]
[| toColor |]
-- test
colorCycle :: Int -> V.Vector Color
colorCycle n = V.unfoldr colorop 0 where
colorop m | m < n = Just (toColor (fromIntegral (m `mod` 5)),m+1)
colorop _ = Nothing
-- *Colour> colorCycle 12
-- fromList [Yellow,Red,Green,Blue,Empty,Yellow,
-- Red,Green,Blue,Empty,Yellow,Red]
colorBlack = "\ESC[0;30m"
colorRed = "\ESC[0;31m"
colorGreen = "\ESC[0;32m"
colorYellow = "\ESC[0;33m"
colorBlue = "\ESC[0;34m"
instance Display Color where
display Red = colorRed ++ "R" ++ colorBlack
display Green = colorGreen ++ "G" ++ colorBlack
display Yellow = colorYellow ++ "Y" ++ colorBlack
display Blue = colorBlue ++ "B" ++ colorBlack
display Empty = "."
Edit: In versions of vector-th-unbox preceding 0.1 the following template was used:
derivingUnbox "Color"
[d| instance Unbox' (Color) Word8 |]
[| fromColor |]
[| toColor |]

Related

Haskell performance—topological sort is not fast enough

I'm a Haskell beginner and have chosen it to solve a programming task for my class, however my solution is too slow and doesn't get accepted. I'm trying to profile it and was hoping that I could get some pointers from more advanced Haskellers here.
The only other solution in my class that got accepted so far was written in Rust. I'm sure that I should be able to achieve similar performance in Haskell and I wrote horrible imperative code in the hope of improving performance, alas to no avail.
My first suspicion relates to work, where I am using forever to go over the in-degree array until I get an out-of-bounds exception. I was hoping for this to be tail-recursive and to compile to a while (true) style loop.
My second suspicion is that I/O is perhaps slowing things down.
EDIT: The problem has likely to do with my algorithm because I am not keeping a queue of nodes with indegree 0. Thank you #luqui.
EDIT2: It seems that the real bottleneck was I/O, I fixed that thanks to #Davislor.
The task is based on this: http://www.spoj.com/UKCPLAD/problems/TOPOSORT/ and I am constrained to use only the libraries in the Haskell Platform.
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE LambdaCase #-}
{-# OPTIONS_GHC -O3 #-}
import Control.Monad
import Data.Array.IO
import Data.IORef
import Data.Int
import Control.Exception
type List = []
type Node = Int32
type Edge = (Node, Node)
type Indegree = Int32
main = do
(numNodes, _) <- readPair <$> getLine
edges <- map readPair . lines <$> getContents
topo numNodes edges
-- lower bound
{-# INLINE lb #-}
lb = 1
topo :: Node -> List Edge -> IO ()
topo numNodes edges = do
result <- newIORef []
count <- newIORef 0
indegrees <- newArray (lb,numNodes) 0 :: IO (IOUArray Node Indegree)
neighbours <- newArray (lb,numNodes) [] :: IO (IOArray Node (List Node))
forM_ edges $ \(from,to) -> do
update indegrees to (+1)
update neighbours from (to:)
let work = forever $ do
z <- getNext indegrees
modifyIORef' result (z:)
modifyIORef' count (+1)
ns <- readArray neighbours z
forM_ ns $ \n -> update indegrees n pred
work `catch`
\(_ :: SomeException) -> do
count <- readIORef count
if numNodes == count
then (mapM_ (\n -> putStr (show n ++ " ")) . reverse) =<< readIORef result
else putStrLn "Sandro fails."
{-# INLINE update #-}
update a i f = do
x <- readArray a i
writeArray a i (f x)
{-# INLINE getNext #-}
getNext indegrees = getNext' indegrees =<< getBounds indegrees
{-# INLINE getNext' #-}
getNext' indegrees (lb,ub) = readArray indegrees lb >>= \case
0 -> writeArray indegrees lb (-1) >> return lb
_ -> getNext' indegrees (lb+1,ub)
readPair :: String -> (Node,Node)
{-# INLINE readPair #-}
readPair = toPair . map read . words
where toPair [x,y] = (x,y)
toPair _ = error "Only two entries per line allowed"
Example output
$ ./topo
8 9
1 4
1 2
4 2
4 3
3 2
5 2
3 5
8 2
8 6
^D
1 4 3 5 7 8 2 6
If you haven’t already, profile your program by compiling with -prof -fprof-auto and then executing with the command-line options +RTS -p. This will generate a profile *.prof that will tell you which functions the program is spending all its time in. However, I can see immediately where the biggest time-waster is. Your instincts were right: it’s the I/O.
Having done that a lot, I can guarantee you that you’ll find that it’s spending the vast majority of its time doing I/O. The first thing you should always do to speed up your program is rewrite it to use fast I/O. Haskell is a fast language, when you use the right data structures. The default I/O library in the Prelude uses singly-linked lists with lazily-evaluated thunks where each node holds a single Unicode character. That would be slow in C, too!
I’ve gotten the best results with Data.ByteString.Lazy.Char8 when the input is ASCII, and Data.ByteString.Builder to generate the output. (An alternative is Data.Text.) That gets you a lazily-evaluated list of strict character buffers on input (so interactive input and output still works), and fills a single buffer on output.
After you’ve written the skeleton of the program with fast I/O, the next step is to look at your algorithm, and especially your data structures. Use profiling to see where all the time goes. But I’d recommend you use a functional algorithm rather than trying to write imperative programs in Haskell with do.
I almost always approach problems like this in Haskell with a more functional style: in particular, my main function is almost always something similar to:
import qualified Data.ByteString.Lazy.Char8 as B8
main :: IO()
main = B8.interact ( output . compute . input )
This makes everything except the call to interact a pure function, and isolates the parsing code and the formatting code so the compute part in the middle can be independent of that.
Since this is an assignment and you want to solve the problem yourself, I’ll refrain from refactoring the program for you, but here’s an example I wrote in response to a question on another forum to perform a counting sort. It should be suitable as a skeleton for other kinds of problems.
import Data.Array.IArray (accumArray, assocs)
import Data.Array.Unboxed (UArray)
import Data.ByteString.Builder (Builder, char7, intDec, toLazyByteString)
import qualified Data.ByteString.Lazy.Char8 as B8
import Data.Monoid ((<>))
main :: IO()
main = B8.interact ( output . compute . input ) where
input :: B8.ByteString -> [Int]
input = map perLine . tail . B8.lines where
perLine = decode . B8.readInt
decode (Just (x, _)) = x
decode Nothing = error "Invalid input: expected integer."
compute :: [Int] -> [Int]
compute = concatMap expand . assocs . countingSort . map encode where
encode i = (i, 1)
countingSort :: [(Int, Int)] -> UArray Int Int
countingSort = accumArray (+) 0 (lower, upper)
lower = 0
upper = 1000000
expand (i,c) = replicate c i
output :: [Int] -> B8.ByteString
output = toLazyByteString . foldMap perCase where
perCase :: Int -> Builder
perCase x = intDec x <> char7 '\n'
At present, this version ran in less than half the time of anyone else’s Haskell solution for the same problem, the same holds true for the actual contest problems I’ve used it for, and the approach generalizes.
So I suggest changing the I/O to be similar to that, first, then profiling, and coming back with the profiling output if that doesn’t make enough of a difference. This might also be a good Code Review question.
Thanks to #Davislor's suggestions I managed to get it to be much faster and I also refactored the code for the better and now I actually have an m log(n) algorithm. Surprisingly this doesn't make that much of a difference—the I/O far outweighed the suboptimal complexity of the algorithm.
EDIT: got rid of unsafePerformIO and it actually runs a teeny-weeny bit faster. Plus adding -XStrict shaves off even more time.
{-# LANGUAGE Strict #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE OverloadedStrings #-}
{-# OPTIONS_GHC -O2 #-}
import Control.Monad
import Data.Array.IO
import Data.Int
import Data.Set (Set)
import qualified Data.Set as Set
import Data.ByteString.Builder (Builder, char7, intDec, toLazyByteString)
import qualified Data.ByteString.Lazy.Char8 as B8
import Data.Monoid ((<>))
type List = []
type Node = Int
type Edge = (Node, Node)
type Indegree = Int
main = B8.putStrLn =<< topo . map readPair . B8.lines =<< B8.getContents
readPair :: B8.ByteString -> (Node,Node)
readPair str = (x,y)
where
(Just (x, str')) = B8.readInt str
(Just (y, _ )) = B8.readInt (B8.tail str')
topo :: List Edge -> IO B8.ByteString
topo inp = do
let (numNodes, _) = head inp
edges = tail inp
indegrees <- newArray (1,numNodes) 0 :: IO (IOUArray Node Indegree)
neighbours <- newArray (1,numNodes) [] :: IO (IOArray Node (List Node))
-- setup
forM_ edges $ \(from,to) -> do
update indegrees to (+1)
update neighbours from (to:)
zeroes <- collectIndegreeZero [] indegrees =<< getBounds indegrees
processQueue (Set.fromList zeroes) [] numNodes indegrees neighbours
where
collectIndegreeZero acc indegrees (lb,ub)
| lb > ub = return acc
| otherwise = do
indegr <- readArray indegrees lb
let acc' = if indegr == 0 then (lb:acc) else acc
collectIndegreeZero acc' indegrees (lb+1,ub)
processQueue queue result numNodes indegrees neighbours = do
if null queue
then if numNodes == 0
then return . toLazyByteString . foldMap whitespace . reverse $ result
else return "Sandro fails."
else do
(node,queue) <- return $ Set.deleteFindMin queue
ns <- readArray neighbours node
queue <- foldM decrIndegrees queue ns
processQueue queue (node:result) (numNodes-1) indegrees neighbours
where
decrIndegrees :: Set Node -> Node -> IO (Set Node)
decrIndegrees q n = do
i <- readArray indegrees n
writeArray indegrees n (i-1)
return $ if i == 1 then Set.insert n q else q
whitespace x = intDec x <> char7 ' '
{-# INLINE update #-}
update a i f = do
x <- readArray a i
writeArray a i (f x)

Haskell : matrix sorting much slower than vector sorting

I have to sort the lines of large integer matrices in Haskell and I started benchmarking with random data. I found that Haskell is 3 times slower than C++.
Because of the randomness, I expect line comparison to always terminate at the first column (which should have no duplicates). So I narrowed the matrix to a single column implemented as a Vector (Unboxed.Vector Int) and compared its sorting to a usual Vector Int.
Vector Int sorts as fast as C++ (good news !), but again, the column matrix is 3 times slower. Do you have an idea why ? Please find the code below.
import qualified Data.Vector.Unboxed as UV(Vector, fromList)
import qualified Data.Vector as V(Vector, fromList, modify)
import Criterion.Main(env, bench, nf, defaultMain)
import System.Random(randomIO)
import qualified Data.Vector.Algorithms.Intro as Alg(sort)
randomVector :: Int -> IO (V.Vector Int)
randomVector count = V.fromList <$> mapM (\_ -> randomIO) [1..count]
randomVVector :: Int -> IO (V.Vector (UV.Vector Int))
randomVVector count = V.fromList <$> mapM (\_ -> do
x <- randomIO
return $ UV.fromList [x]) [1..count]
benchSort :: IO ()
benchSort = do
let bVVect = env (randomVVector 300000) $ bench "sortVVector" . nf (V.modify Alg.sort)
bVect = env (randomVector 300000) $ bench "sortVector" . nf (V.modify Alg.sort)
defaultMain [bVect, bVVect]
main = benchSort
As Edward Kmett as explained to me, the Haskell version has one extra layer of indirection. A UV.Vector looks something like
data Vector a = Vector !Int !Int ByteArray#
So each entry in your vector of vectors is actually a pointer to a record holding slice indices and a pointer to an array of bytes. This is an extra indirection that the C++ code doesn't have. The solution is to use an ArrayArray#, which is an array of direct pointers to byte arrays or to further ArrayArray#s. If you need vector, you'll have to figure out what to do about the slicing machinery. Another option is to switch to primitive, which offers simpler arrays.
Following dfeuer's advice, implementing a vector of vectors as an ArrayArray# is 4 times faster than Vector (Unboxed.Vector Int) and only 40% slower than sorting a c++ std::vector<std::vector<int> > :
import Control.Monad.Primitive
import Data.Primitive.ByteArray
import qualified Data.Vector.Generic.Mutable.Base as GM(MVector(..))
import GHC.Prim
data MutableArrayArray s a = MutableArrayArray (MutableArrayArray# s)
instance GM.MVector MutableArrayArray ByteArray where
{-# INLINE basicLength #-}
basicLength (MutableArrayArray marr) = I# (sizeofMutableArrayArray# marr)
{-# INLINE basicUnsafeRead #-}
basicUnsafeRead (MutableArrayArray marr) (I# i) = primitive $ \s -> case readByteArrayArray# marr i s of
(# s1, bar #) -> (# s1, ByteArray bar #)
{-# INLINE basicUnsafeWrite #-}
basicUnsafeWrite (MutableArrayArray marr) (I# i) (ByteArray bar) = primitive $ \s ->
(# writeByteArrayArray# marr i bar s, () #)
For example, sorting a matrix of integers will then use
sortIntArrays :: ByteArray -> ByteArray -> Ordering
sortIntArrays x y = let h1 = indexByteArray x 0 :: Int
h2 = indexByteArray y 0 :: Int in
compare h1 h2

Why is the F# version of this program 6x faster than the Haskell one?

Haskell version(1.03s):
module Main where
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Control.Monad
import Control.Applicative ((<$>))
import Data.Vector.Unboxed (Vector,(!))
import qualified Data.Vector.Unboxed as V
solve :: Vector Int -> Int
solve ar =
V.foldl' go 0 ar' where
ar' = V.zip ar (V.postscanr' max 0 ar)
go sr (p,m) = sr + m - p
main = do
t <- fmap (read . T.unpack) TIO.getLine -- With Data.Text, the example finishes 15% faster.
T.unlines . map (T.pack . show . solve . V.fromList . map (read . T.unpack) . T.words)
<$> replicateM t (TIO.getLine >> TIO.getLine) >>= TIO.putStr
F# version(0.17s):
open System
let solve (ar : uint64[]) =
let ar' =
let t = Array.scanBack max ar 0UL |> fun x -> Array.take (x.Length-1) x
Array.zip ar t
let go sr (p,m) = sr + m - p
Array.fold go 0UL ar'
let getIntLine() =
Console.In.ReadLine().Split [|' '|]
|> Array.choose (fun x -> if x <> "" then uint64 x |> Some else None)
let getInt() = getIntLine().[0]
let t = getInt()
for i=1 to int t do
getInt() |> ignore
let ar = getIntLine()
printfn "%i" (solve ar)
The above two programs are the solutions for the Stock Maximize problem and times are for the first test case of the Run Code button.
For some reason the F# version is roughly 6x faster, but I am pretty sure that if I replaced the slow library functions with imperative loops that I could speed it up by at least 3 times and more likely 10x.
Could the Haskell version be similarly improved?
I am doing the above for learning purposes and in general I am finding it difficult to figure out how to write efficient Haskell code.
If you switch to ByteString and stick with plain Haskell lists (instead of vectors) you will get a more efficient solution. You may also rewrite the solve function with a single left fold and bypass zip and right scan (1).
Overall, on my machine, I get 20 times performance improvement compared to your Haskell solution (2).
Below Haskell code performs faster than the F# code:
import Data.List (unfoldr)
import Control.Applicative ((<$>))
import Control.Monad (replicateM_)
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C
parse :: ByteString -> [Int]
parse = unfoldr $ C.readInt . C.dropWhile (== ' ')
solve :: [Int] -> Int
solve xs = foldl go (const 0) xs minBound
where go f x s = if s < x then f x else s - x + f s
main = do
[n] <- parse <$> B.getLine
replicateM_ n $ B.getLine >> B.getLine >>= print . solve . parse
1. See edits for an earlier version of this answer which implements solve using zip and scanr.
2. HackerRank website shows even a larger performance improvement.
If I wanted to do that quickly in F# I would avoid all of the higher-order functions inside solve and just write a C-style imperative loop:
let solve (ar : uint64[]) =
let mutable sr, m = 0UL, 0UL
for i in ar.Length-1 .. -1 .. 0 do
let p = ar.[i]
m <- max p m
sr <- sr + m - p
sr
According to my measurements, this is 11x faster than your F#.
Then the performance is limited by the IO layer (unicode parsing) and string splitting. This can be optimised by reading into a byte buffer and writing the lexer by hand:
let buf = Array.create 65536 0uy
let mutable idx = 0
let mutable length = 0
do
use stream = System.Console.OpenStandardInput()
let rec read m =
let c =
if idx < length then
idx <- idx + 1
else
length <- stream.Read(buf, 0, buf.Length)
idx <- 1
buf.[idx-1]
if length > 0 && '0'B <= c && c <= '9'B then
read (10UL * m + uint64(c - '0'B))
else
m
let read() = read 0UL
for _ in 1UL .. read() do
Array.init (read() |> int) (fun _ -> read())
|> solve
|> System.Console.WriteLine
Just for the record, the F# version is also not optimal. I don't think it really matters at this point, but if people wanted to compare the performance, then it is worth noting that it can be made faster.
I have not tried very hard (you can certainly make it even faster by using restricted mutation, which would not be against the nature of F#), but simple change to use Seq instead of Array in the right places (to avoid allocating temporary arrays) makes the code about 2x to 3x faster:
let solve (ar : uint64[]) =
let ar' = Seq.zip ar (Array.scanBack max ar 0UL)
let go sr (p,m) = sr + m - p
Seq.fold go 0UL ar'
If you use Seq.zip, you can also drop the take call (because Seq.zip truncates the sequence automatically). Measured using #time using the following snippet:
let rnd = Random()
let inp = Array.init 100000 (fun _ -> uint64 (rnd.Next()))
for a in 0 .. 10 do ignore (solve inp) // Measure this line
I get around 150ms for the original code and something between 50-75ms using the new version.

Haskell : Two different functions using the same where clauses

Is there a way I can make a structure for 2 different functions using the same where clauses?
My code:
bonusColBullet :: Bonus -> Bullet -> World -> World
bonusColBullet bn#(Bonus{bnpos=pos}) b w#(World{bullets=bs, bonuses=bns, score=s})
| doBoxesCollide bnlp bnrp blp brp = w{bullets=delete b bs, bonuses=delete bn bns, score=incVal s}
| otherwise = w
where
blp = bpos' - bSizeH --bullet corners
brp = bpos' + bSizeH
bnlp = pos - bnSizeH --bonus obj corners
bnrp = pos + bnSizeH
bpos' = bpos b
incVal s#(Score{sval=sv, multiplier}) = s{sval=sv+multiplier}
enemyColBullet :: Enemy -> Bullet -> World -> World
enemyColBullet e#(Enemy{epos=pos}) b w#(World{bullets=bs, enemies=es, score=s})
| doBoxesCollide elp erp blp brp = w{bullets=delete b bs, enemies=delete e es, score=incVal s}
| otherwise = w
where
blp = bpos' - bSizeH -- bullet corners
brp = bpos' + bSizeH
elp = pos - eSizeH -- enemy corners
erp = pos + eSizeH
bpos' = bpos b
incVal s#(Score{sval=sv, multiplier}) = s{sval=sv+multiplier}
Because like this it looks very inefficient to me, so I figured there should be a way to only have to write the where clause ones and to somehow make it includeable for both functions?
If anyone could help me out on this it'd be much appreciated!
Best regards,
Skyfe.
Yep! You're right that this can be factored out. One cool feature in Haskell which may help you is ambiguity. Haskell has two sorts of ambiguity, and you can use either one in this case. The first sort of ambiguity is called parametric types, for example the list type [x] exists for all x no matter what x is, so there are actually a lot of different functions reverse :: [x] -> [x], one for every type which you can put inside those lists. The only problem is that sometimes you want a constraint on these parameters, so that you can do stuff with them (reverse is constrained by the total ambiguity; it can't do anything to the elements but can only reorder them and possibly drop or repeat them). The constrained ambiguity is a feature called type classes. Here's how I'd use them to fit your case:
type Size = ____ -- fill this in with whatever that type actually is.
class Sprite s where
kill :: s -> World -> World
position :: s -> Size
size :: s -> Size
instance Sprite Bullet where
kill b w = w{bullets = delete b (bullets w)}
position = bpos
size = const bSizeH -- constant for all bullets, right?
instance Sprite Bonus where
kill bn w = w{bonuses = delete bn (bonuses w)}
position = bnpos
size = const bnSizeH
instance Sprite Enemy where
kill e w = w{enemies = delete e (enemies w)}
position = epos
size = const eSizeH
Now you can write something more generic:
collides :: (Sprite x, Sprite y) => x -> y -> Bool
collides x y = doBoxesCollide (px - sx) (px + sx) (py - sy) (py + sy)
where px = position x
py = position y
sx = size x
sy = size y
addScore :: World -> World
addScore w = w{score = s{sval = sval s + multiplier s}} where s = score w
killCollision :: (Sprite x, Sprite y) => x -> y -> World -> World
killCollision x y = if collides x y then addScore . kill x . kill y else id
We've gone up from your 22 lines up to 27 lines, but we now have a lot of smaller parts interacting in clearer ways, and no repeating ourselves. It's an artistic choice whether this is worth it or not -- it often helps longer-term maintenance of a program, but if you're just trying to get a program out there often copy-paste is faster.
So now where you would have written bonusColBullet bn b w you can just write killCollision bn b w and it does the same thing, and so does killCollision e b w for enemies e. But you've got a little more power. Suppose you want to have enemies eat bonuses that they collide with: but you do not get a score for that. Then that would be if collides e bn then kill bn else id. Or you might decide that different sprites have different points which they are worth; then you add to the Sprite class points :: x -> Points (where Points is whatever Num type your points are -- presumably Integer but I didn't want to assume). You modify addScore to become:
addScore :: (Sprite x, Sprite y) => x -> y -> World -> World
addScore x y w = w{score = s{sval = newscore}
where s = score w
newscore = sval s + multiplier s * (points x + points y)
and you modify killCollision by replacing addScore with addScore x y. That's it. Now different enemies or bonuses can be worth different amounts. With a little more work, you can have an enemy who eats a bonus get the points from that bonus (so that if you kill them you still get the bonus points). Stuff like that.
Is there a way I can make a structure for 2 different functions using the same where clauses
where gives you a binding with local scope. So to share that between two functions, they have to be "inside" the where scope.
Easier is to float the where clause out. E.g. calculate it once and pass x and y to your functions.
main = do
let x = func a - c
y = func' b - c
someFunc x y
someOtherFunc x y

Optimizing Haskell code

I'm trying to learn Haskell and after an article in reddit about Markov text chains, I decided to implement Markov text generation first in Python and now in Haskell. However I noticed that my python implementation is way faster than the Haskell version, even Haskell is compiled to native code. I am wondering what I should do to make the Haskell code run faster and for now I believe it's so much slower because of using Data.Map instead of hashmaps, but I'm not sure
I'll post the Python code and Haskell as well. With the same data, Python takes around 3 seconds and Haskell is closer to 16 seconds.
It comes without saying that I'll take any constructive criticism :).
import random
import re
import cPickle
class Markov:
def __init__(self, filenames):
self.filenames = filenames
self.cache = self.train(self.readfiles())
picklefd = open("dump", "w")
cPickle.dump(self.cache, picklefd)
picklefd.close()
def train(self, text):
splitted = re.findall(r"(\w+|[.!?',])", text)
print "Total of %d splitted words" % (len(splitted))
cache = {}
for i in xrange(len(splitted)-2):
pair = (splitted[i], splitted[i+1])
followup = splitted[i+2]
if pair in cache:
if followup not in cache[pair]:
cache[pair][followup] = 1
else:
cache[pair][followup] += 1
else:
cache[pair] = {followup: 1}
return cache
def readfiles(self):
data = ""
for filename in self.filenames:
fd = open(filename)
data += fd.read()
fd.close()
return data
def concat(self, words):
sentence = ""
for word in words:
if word in "'\",?!:;.":
sentence = sentence[0:-1] + word + " "
else:
sentence += word + " "
return sentence
def pickword(self, words):
temp = [(k, words[k]) for k in words]
results = []
for (word, n) in temp:
results.append(word)
if n > 1:
for i in xrange(n-1):
results.append(word)
return random.choice(results)
def gentext(self, words):
allwords = [k for k in self.cache]
(first, second) = random.choice(filter(lambda (a,b): a.istitle(), [k for k in self.cache]))
sentence = [first, second]
while len(sentence) < words or sentence[-1] is not ".":
current = (sentence[-2], sentence[-1])
if current in self.cache:
followup = self.pickword(self.cache[current])
sentence.append(followup)
else:
print "Wasn't able to. Breaking"
break
print self.concat(sentence)
Markov(["76.txt"])
--
module Markov
( train
, fox
) where
import Debug.Trace
import qualified Data.Map as M
import qualified System.Random as R
import qualified Data.ByteString.Char8 as B
type Database = M.Map (B.ByteString, B.ByteString) (M.Map B.ByteString Int)
train :: [B.ByteString] -> Database
train (x:y:[]) = M.empty
train (x:y:z:xs) =
let l = train (y:z:xs)
in M.insertWith' (\new old -> M.insertWith' (+) z 1 old) (x, y) (M.singleton z 1) `seq` l
main = do
contents <- B.readFile "76.txt"
print $ train $ B.words contents
fox="The quick brown fox jumps over the brown fox who is slow jumps over the brown fox who is dead."
a) How are you compiling it? (ghc -O2 ?)
b) Which version of GHC?
c) Data.Map is pretty efficient, but you can be tricked into lazy updates -- use insertWith' , not insertWithKey.
d) Don't convert bytestrings to String. Keep them as bytestrings, and store those in the Map
Data.Map is designed under the assumption that the class Ord comparisons take constant time. For string keys this may not be the case—and when the strings are equal it is never the case. You may or may not be hitting this problem depending on how large your corpus is and how many words have common prefixes.
I'd be tempted to try a data structure that is designed to operate with sequence keys, such as for example a the bytestring-trie package kindly suggested by Don Stewart.
I tried to avoid doing anything fancy or subtle. These are just two approaches to doing the grouping; the first emphasizes pattern matching, the second doesn't.
import Data.List (foldl')
import qualified Data.Map as M
import qualified Data.ByteString.Char8 as B
type Database2 = M.Map (B.ByteString, B.ByteString) (M.Map B.ByteString Int)
train2 :: [B.ByteString] -> Database2
train2 words = go words M.empty
where go (x:y:[]) m = m
go (x:y:z:xs) m = let addWord Nothing = Just $ M.singleton z 1
addWord (Just m') = Just $ M.alter inc z m'
inc Nothing = Just 1
inc (Just cnt) = Just $ cnt + 1
in go (y:z:xs) $ M.alter addWord (x,y) m
train3 :: [B.ByteString] -> Database2
train3 words = foldl' update M.empty (zip3 words (drop 1 words) (drop 2 words))
where update m (x,y,z) = M.alter (addWord z) (x,y) m
addWord word = Just . maybe (M.singleton word 1) (M.alter inc word)
inc = Just . maybe 1 (+1)
main = do contents <- B.readFile "76.txt"
let db = train3 $ B.words contents
print $ "Built a DB of " ++ show (M.size db) ++ " words"
I think they are both faster than the original version, but admittedly I only tried them against the first reasonable corpus I found.
EDIT
As per Travis Brown's very valid point,
train4 :: [B.ByteString] -> Database2
train4 words = foldl' update M.empty (zip3 words (drop 1 words) (drop 2 words))
where update m (x,y,z) = M.insertWith (inc z) (x,y) (M.singleton z 1) m
inc k _ = M.insertWith (+) k 1
Here's a foldl'-based version that seems to be about twice as fast as your train:
train' :: [B.ByteString] -> Database
train' xs = foldl' (flip f) M.empty $ zip3 xs (tail xs) (tail $ tail xs)
where
f (a, b, c) = M.insertWith (M.unionWith (+)) (a, b) (M.singleton c 1)
I tried it on the Project Gutenberg Huckleberry Finn (which I assume is your 76.txt), and it produces the same output as your function. My timing comparison was very unscientific, but this approach is probably worth a look.
1) I'm not clear on your code.
a) You define "fox" but don't use it. Were you meaning for us to try to help you using "fox" instead of reading the file?
b) You declare this as "module Markov" then have a 'main' in the module.
c) System.Random isn't needed. It does help us help you if you clean code a bit before posting.
2) Use ByteStrings and some strict operations as Don said.
3) Compile with -O2 and use -fforce-recomp to be sure you actually recompiled the code.
4) Try this slight transformation, it works very fast (0.005 seconds). Obviously the input is absurdly small, so you'd need to provide your file or just test it yourself.
{-# LANGUAGE OverloadedStrings, BangPatterns #-}
module Main where
import qualified Data.Map as M
import qualified Data.ByteString.Lazy.Char8 as B
type Database = M.Map (B.ByteString, B.ByteString) (M.Map B.ByteString Int)
train :: [B.ByteString] -> Database
train xs = go xs M.empty
where
go :: [B.ByteString] -> Database -> Database
go (x:y:[]) !m = m
go (x:y:z:xs) !m =
let m' = M.insertWithKey' (\key new old -> M.insertWithKey' (\_ n o -> n + 1) z 1 old) (x, y) (M.singleton z 1) m
in go (y:z:xs) m'
main = print $ train $ B.words fox
fox="The quick brown fox jumps over the brown fox who is slow jumps over the brown fox who is dead."
As Don suggested, look into using the stricer versions o your functions: insertWithKey' (and M.insertWith' since you ignore the key param the second time anyway).
It looks like your code probably builds up a lot of thunks until it gets to the end of your [String].
Check out: http://book.realworldhaskell.org/read/profiling-and-optimization.html
...especially try graphing the heap (about halfway through the chapter). Interested to see what you figure out.

Resources