Haskell computation performance

I have several different implementations of the same function; the difference lies in the use of bang patterns. The question is: why does perimeterNaiveFast perform the same as perimeterStrictFast?
Benchmark results:
Function implementations:
> {-# LANGUAGE BangPatterns #-}
>
> import Criterion.Main
>
> data Point = Point { x :: Int, y :: Int }
>
> distance :: Point -> Point -> Double
> distance (Point x1 y1) (Point x2 y2) =
>   sqrt $ fromIntegral $ (x1 - x2) ^ (2 :: Int) + (y1 - y2) ^ (2 :: Int)
>
> perimeterNaive :: [Point] -> Double
> perimeterNaive [] = 0.0
> perimeterNaive points = foldPoints points 0.0
>   where
>     firstPoint = head points
>     foldPoints [] _ = 0.0
>     foldPoints [lst] acc = acc + distance firstPoint lst
>     foldPoints (prev:next:rst) acc = foldPoints (next:rst) (acc + distance prev next)
>
> perimeterStrict :: [Point] -> Double
> perimeterStrict [] = 0.0
> perimeterStrict points = foldPoints points 0.0
>   where
>     firstPoint = head points
>     foldPoints [] _ = 0.0
>     foldPoints [lst] acc = acc + distance firstPoint lst
>     foldPoints (prev:next:rst) !acc = foldPoints (next:rst) (acc + distance prev next)
>
> perimeterNaiveFast :: [Point] -> Double
> perimeterNaiveFast [] = 0.0
> perimeterNaiveFast (first:rest) = foldPoints rest first 0.0
>   where
>     foldPoints [] lst acc = acc + distance first lst
>     foldPoints (next:rst) prev acc = foldPoints rst next (acc + distance next prev)
>
> perimeterStrictFast :: [Point] -> Double
> perimeterStrictFast [] = 0.0
> perimeterStrictFast (first:rest) = foldPoints rest first 0.0
>   where
>     foldPoints [] lst acc = acc + distance first lst
>     foldPoints (next:rst) prev !acc = foldPoints rst next (acc + distance next prev)
>
> main :: IO ()
> main = defaultMain [ perimeterBench $ 10 ^ (6 :: Int) ]
>
> perimeterBench :: Int -> Benchmark
> perimeterBench n = bgroup "Perimeter"
>   [ bench "Naive" $ nf perimeterNaive points
>   , bench "Strict" $ nf perimeterStrict points
>   , bench "Naive fast" $ nf perimeterNaiveFast points
>   , bench "Strict fast" $ nf perimeterStrictFast points
>   ]
>   where
>     points = map (\i -> Point i i) [1..n]

GHC's strictness analysis pass has noticed that the Double accumulator is not (and cannot be) lazily consumed, and has converted your naive version into your strict version on your behalf.
The reason it has not also done this for the other (non-fast) naive/strict pair is this clause:
foldPoints [] _ = 0.0
Here you ignore the accumulator, so the compiler believes that in some cases not forcing the computation as you go along might be better. Changing this to
foldPoints [] acc = acc
is enough for GHC to make your other naive/strict pair have the same performance on my machine.
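For completeness, here is perimeterNaive with just that one clause changed (a restatement of the suggested fix; the name perimeterNaiveFixed is mine, and I have not re-benchmarked it):
perimeterNaiveFixed :: [Point] -> Double
perimeterNaiveFixed [] = 0.0
perimeterNaiveFixed points = foldPoints points 0.0
  where
    firstPoint = head points
    foldPoints [] acc = acc   -- was: foldPoints [] _ = 0.0
    foldPoints [lst] acc = acc + distance firstPoint lst
    foldPoints (prev:next:rst) acc = foldPoints (next:rst) (acc + distance prev next)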

Related

Haskell: Knight's tour never finishes if I try more than 55 steps?

This is my code:
maxX = 8; maxY = 8
maxSteps = 60 -- If I change maxSteps=55 I get an answer

move :: [(Int, Int)] -> [(Int, Int)]
move list
  | lastX > maxX || lastY > maxY || lastX <= 0 || lastY <= 0 = []
  | lastMove `elem` (init list) = []
  | length list == maxSteps = list
  | length m1 == maxSteps = m1
  | length m2 == maxSteps = m2
  | length m3 == maxSteps = m3
  | length m4 == maxSteps = m4
  | length m5 == maxSteps = m5
  | length m6 == maxSteps = m6
  | length m7 == maxSteps = m7
  | length m8 == maxSteps = m8
  | otherwise = []
  where lastMove = last list
        lastX = fst lastMove
        lastY = snd lastMove
        m1 = move (list ++ [(lastX+1,lastY+2)])
        m2 = move (list ++ [(lastX+2,lastY+1)])
        m3 = move (list ++ [(lastX-1,lastY+2)])
        m4 = move (list ++ [(lastX-2,lastY+1)])
        m5 = move (list ++ [(lastX+1,lastY-2)])
        m6 = move (list ++ [(lastX+2,lastY-1)])
        m7 = move (list ++ [(lastX-1,lastY+2)])
        m8 = move (list ++ [(lastX-2,lastY-1)])

y = move [(1,1)]

main = print $ y
Do you know why it never finishes (or maybe I just need to wait longer)? Do you have another way to implement the same brute-force algorithm that would run faster?
It does terminate (it runs for about 1 minute on my computer) and produces a correct answer.
One simple way to speed it up is to add each new move to the front of the list (and reverse the result before printing it). Adding an element to the front of a list takes constant time, while appending an element to the back takes time linear in the list's length.
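To illustrate the cost difference, a minimal sketch (my own, not part of the answer):
-- Prepending allocates a single cons cell; appending rebuilds the whole spine.
prepend :: a -> [a] -> [a]
prepend x xs = x : xs      -- O(1)

append :: a -> [a] -> [a]
append x xs = xs ++ [x]    -- O(length xs)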
There is also a bug in your code: m3 and m7 are the same. After fixing this bug and adding new moves to the front of the list, the code runs in under one second:
maxX = 8
maxY = 8
maxSteps = 60

move :: [(Int, Int)] -> [(Int, Int)]
move list
  | lastX > maxX || lastY > maxY || lastX <= 0 || lastY <= 0 = []
  | lastMove `elem` (tail list) = []
  | length list == maxSteps = list
  | length m1 == maxSteps = m1
  | length m2 == maxSteps = m2
  | length m3 == maxSteps = m3
  | length m4 == maxSteps = m4
  | length m5 == maxSteps = m5
  | length m6 == maxSteps = m6
  | length m7 == maxSteps = m7
  | length m8 == maxSteps = m8
  | otherwise = []
  where lastMove = head list
        lastX = fst lastMove
        lastY = snd lastMove
        m1 = move ((lastX + 1, lastY + 2) : list)
        m2 = move ((lastX + 2, lastY + 1) : list)
        m3 = move ((lastX - 1, lastY + 2) : list)
        m4 = move ((lastX - 2, lastY + 1) : list)
        m5 = move ((lastX + 1, lastY - 2) : list)
        m6 = move ((lastX + 2, lastY - 1) : list)
        m7 = move ((lastX - 1, lastY - 2) : list)
        m8 = move ((lastX - 2, lastY - 1) : list)

y = move [(1, 1)]

main = print $ reverse y
I have made a few more changes. First of all, I got rid of "manually" adding the 8 possible moves at each step; we can use a list comprehension to do that, which helps avoid bugs like the one above. It also turns out that the execution time depends on the order in which new moves are examined. This version finds an open tour in about a minute (and, in my opinion, it's more readable than the original code):
maxX = 8
maxY = 8
maxSteps = 64
shifts = [-1, 1, -2, 2]

move :: [(Int, Int)] -> [(Int, Int)]
move path
  | lastX > maxX || lastY > maxY || lastX <= 0 || lastY <= 0 = []
  | lastMove `elem` tail path = []
  | length path == maxSteps = path
  | not (null validNewPaths) = head validNewPaths
  | otherwise = []
  where lastMove@(lastX, lastY) = head path
        newPaths = [(lastX + x, lastY + y) : path | x <- shifts, y <- shifts, abs x /= abs y]
        validNewPaths = filter (\xs -> length xs == maxSteps) (map move newPaths)

main = print $ reverse (move [(1, 1)])

Why is the "better" digit-listing function slower?

I was playing around with Project Euler #34, and I wrote these functions:
import Data.Time.Clock.POSIX
import Data.Char

digits :: (Integral a) => a -> [Int]
digits x
  | x < 10    = [fromIntegral x]
  | otherwise = let (q, r) = x `quotRem` 10 in (fromIntegral r) : (digits q)

digitsByShow :: (Integral a, Show a) => a -> [Int]
digitsByShow = map (\x -> ord x - ord '0') . show
I thought that for sure digits has to be the faster one, as we don't convert to a String. I could not have been more wrong. I ran the two versions via pe034:
pe034 digitFunc = sum $ filter sumFactDigit [3..2540160]
  where
    sumFactDigit :: Int -> Bool
    sumFactDigit n = n == (sum $ map sFact $ digitFunc n)
    sFact :: Int -> Int
    sFact n
      | n == 0 = 1
      | n == 1 = 1
      | n == 2 = 2
      | n == 3 = 6
      | n == 4 = 24
      | n == 5 = 120
      | n == 6 = 720
      | n == 7 = 5040
      | n == 8 = 40320
      | n == 9 = 362880

main = do
  begin <- getPOSIXTime
  print $ pe034 digitsByShow -- or digits
  end <- getPOSIXTime
  print $ end - begin
After compiling with ghc -O, digits consistently takes .5 seconds, while digitsByShow consistently takes .3 seconds. Why is this so? Why is the function that stays within integer arithmetic slower, while the function that goes through a String conversion is faster?
I ask this because I come from programming in Java and similar languages, where the % 10 trick of generating digits is way faster than the "convert to String" method. I haven't been able to wrap my head around the fact that converting to a string could be faster.
This is the best I can come up with.
digitsV2 :: (Integral a) => a -> [Int]
digitsV2 n = go n []
  where
    go x xs
      | x < 10    = fromIntegral x : xs
      | otherwise = case quotRem x 10 of
          (q, r) -> go q (fromIntegral r : xs)
When compiled with -O2 and tested with Criterion:
digits runs in 470.4 ms
digitsByShow runs in 421.8 ms
digitsV2 runs in 258.0 ms
(results may vary)
Edit: I am not sure why building the list like this helps so much, but you can improve your code's speed by strictly evaluating quotRem x 10. You can do this with BangPatterns
| otherwise = let !(q, r) = x `quotRem` 10 in (fromIntegral r) : (digits q)
or with case
| otherwise = case quotRem x 10 of
(q,r) -> fromIntegral r : digits q
Doing this drops digits down to 323.5 ms
Edit: times without using Criterion:
digits = 464.3 ms
digitsStrict = 328.2 ms
digitsByShow = 259.2 ms
digitsV2 = 252.5 ms
note: The criterion package measures software performance.
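For reference, a minimal sketch of the kind of Criterion harness that could produce timings like these (my own reconstruction; the answer does not show its benchmark code, and pe034, digits, digitsByShow and digitsV2 are assumed to be in scope from above):
import Criterion.Main

-- whnf is enough here because pe034 returns a single Int.
main :: IO ()
main = defaultMain
  [ bench "digits"       $ whnf pe034 digits
  , bench "digitsByShow" $ whnf pe034 digitsByShow
  , bench "digitsV2"     $ whnf pe034 digitsV2
  ]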
Let's investigate why @No_signal's solution is faster.
I made three runs of ghc:
ghc -O2 -ddump-simpl digits.hs >digits.txt
ghc -O2 -ddump-simpl digitsV2.hs >digitsV2.txt
ghc -O2 -ddump-simpl show.hs >show.txt
digits.hs
digits :: (Integral a) => a -> [Int]
digits x
  | x < 10    = [fromIntegral x]
  | otherwise = let (q, r) = x `quotRem` 10 in (fromIntegral r) : (digits q)

main = return $ digits 1
digitsV2.hs
digitsV2 :: (Integral a) => a -> [Int]
digitsV2 n = go n []
  where
    go x xs
      | x < 10    = fromIntegral x : xs
      | otherwise = let (q, r) = x `quotRem` 10 in go q (fromIntegral r : xs)

main = return $ digitsV2 1
show.hs
import Data.Char
digitsByShow :: (Integral a, Show a) => a -> [Int]
digitsByShow = map (\x -> ord x - ord '0') . show
main = return $ digitsByShow 1
If you'd like to view the complete txt files, I placed them on ideone (rather than paste a 10000 char dump here):
digits.txt
digitsV2.txt
show.txt
If we carefully look through digits.txt, it appears that this is the relevant section:
lvl_r1qU = __integer 10
Rec {
Main.$w$sdigits [InlPrag=[0], Occ=LoopBreaker]
:: Integer -> (# Int, [Int] #)
[GblId, Arity=1, Str=DmdType <S,U>]
Main.$w$sdigits =
\ (w_s1pI :: Integer) ->
case integer-gmp-1.0.0.0:GHC.Integer.Type.ltInteger#
w_s1pI lvl_r1qU
of wild_a17q { __DEFAULT ->
case GHC.Prim.tagToEnum# # Bool wild_a17q of _ [Occ=Dead] {
False ->
let {
ds_s16Q [Dmd=<L,U(U,U)>] :: (Integer, Integer)
[LclId, Str=DmdType]
ds_s16Q =
case integer-gmp-1.0.0.0:GHC.Integer.Type.quotRemInteger
w_s1pI lvl_r1qU
of _ [Occ=Dead] { (# ipv_a17D, ipv1_a17E #) ->
(ipv_a17D, ipv1_a17E)
} } in
(# case ds_s16Q of _ [Occ=Dead] { (q_a11V, r_X12h) ->
case integer-gmp-1.0.0.0:GHC.Integer.Type.integerToInt r_X12h
of wild3_a17c { __DEFAULT ->
GHC.Types.I# wild3_a17c
}
},
case ds_s16Q of _ [Occ=Dead] { (q_X12h, r_X129) ->
case Main.$w$sdigits q_X12h
of _ [Occ=Dead] { (# ww1_s1pO, ww2_s1pP #) ->
GHC.Types.: # Int ww1_s1pO ww2_s1pP
}
} #);
True ->
(# GHC.Num.$fNumInt_$cfromInteger w_s1pI, GHC.Types.[] # Int #)
}
}
end Rec }
digitsV2.txt:
lvl_r1xl = __integer 10
Rec {
Main.$wgo [InlPrag=[0], Occ=LoopBreaker]
:: Integer -> [Int] -> (# Int, [Int] #)
[GblId, Arity=2, Str=DmdType <S,U><L,U>]
Main.$wgo =
\ (w_s1wh :: Integer) (w1_s1wi :: [Int]) ->
case integer-gmp-1.0.0.0:GHC.Integer.Type.ltInteger#
w_s1wh lvl_r1xl
of wild_a1dp { __DEFAULT ->
case GHC.Prim.tagToEnum# # Bool wild_a1dp of _ [Occ=Dead] {
False ->
case integer-gmp-1.0.0.0:GHC.Integer.Type.quotRemInteger
w_s1wh lvl_r1xl
of _ [Occ=Dead] { (# ipv_a1dB, ipv1_a1dC #) ->
Main.$wgo
ipv_a1dB
(GHC.Types.:
# Int
(case integer-gmp-1.0.0.0:GHC.Integer.Type.integerToInt ipv1_a1dC
of wild2_a1ea { __DEFAULT ->
GHC.Types.I# wild2_a1ea
})
w1_s1wi)
};
True -> (# GHC.Num.$fNumInt_$cfromInteger w_s1wh, w1_s1wi #)
}
}
end Rec }
I actually couldn't find the relevant section for show.txt. I'll work on that later.
Right off the bat, digitsV2.hs produces shorter code. That's probably a good sign for it.
digits.hs seems to be following this pseudocode:
def digits(w_s1pI):
    if w_s1pI < 10: return [fromInteger(w_s1pI)]
    else:
        ds_s16Q = quotRem(w_s1pI, 10)
        q_X12h = ds_s16Q[0]
        r_X12h = ds_s16Q[1]
        wild3_a17c = integerToInt(r_X12h)
        ww1_s1pO = r_X12h
        ww2_s1pP = digits(q_X12h)
        ww2_s1pP.pushFront(ww1_s1pO)
        return ww2_s1pP
digitsV2.hs seems to be following this pseudocode:
def digitsV2(w_s1wh, w1_s1wi=[]):  # actually disguised as go(), as @No_signal wrote
    if w_s1wh < 10:
        w1_s1wi.pushFront(fromInteger(w_s1wh))
        return w1_s1wi
    else:
        ipv_a1dB, ipv1_a1dC = quotRem(w_s1wh, 10)
        w1_s1wi.pushFront(integerToInt(ipv1_a1dC))
        return digitsV2(ipv_a1dB, w1_s1wi)
It might not be that these functions mutate lists like my pseudocode suggests, but this immediately suggests something: it looks as if digitsV2 is fully tail-recursive, whereas digits is not (it may need some Haskell trampoline or similar). It appears as if Haskell needs to hold on to all the remainders in digits before pushing them onto the front of the list, whereas in digitsV2 it can just push them and forget about them. This is purely speculation, but it is well-founded speculation.

Haskell: performance of a memoization implementation of dynamic programming is poor [duplicate]

I'm trying to memoize the following function:
gridwalk x y
  | x == 0 = 1
  | y == 0 = 1
  | otherwise = (gridwalk (x - 1) y) + (gridwalk x (y - 1))
Looking at this I came up with the following solution:
gw :: (Int -> Int -> Int) -> Int -> Int -> Int
gw f x y
  | x == 0 = 1
  | y == 0 = 1
  | otherwise = (f (x - 1) y) + (f x (y - 1))

gwlist :: [Int]
gwlist = map (\i -> gw fastgw (i `mod` 20) (i `div` 20)) [0..]

fastgw :: Int -> Int -> Int
fastgw x y = gwlist !! (x + y * 20)
Which I then can call like this:
gw fastgw 20 20
Is there an easier, more concise, and more general way to memoize functions with multiple parameters in Haskell? (Notice how I had to hardcode the maximum grid dimensions in gwlist in order to map the 2D coordinates to a 1D index into the memoizing list.)
You can use a list of lists to memoize the function result for both parameters:
memo :: (Int -> Int -> a) -> [[a]]
memo f = map (\x -> map (f x) [0..]) [0..]
gw :: Int -> Int -> Int
gw 0 _ = 1
gw _ 0 = 1
gw x y = (fastgw (x - 1) y) + (fastgw x (y - 1))
gwstore :: [[Int]]
gwstore = memo gw
fastgw :: Int -> Int -> Int
fastgw x y = gwstore !! x !! y
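A minimal usage sketch (my addition, not part of the answer): with these definitions in scope you can query the memoized function directly; for a 20x20 grid the result is the central binomial coefficient, which fits in a 64-bit Int.
main :: IO ()
main = print (fastgw 20 20)   -- 137846528820, i.e. binomial(40, 20)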
Use the data-memocombinators package from Hackage. It provides easy-to-use memoization combinators and a concise way to apply them:
import Data.MemoCombinators (memo2, integral)

gridwalk = memo2 integral integral gridwalk' where
  gridwalk' x y
    | x == 0 = 1
    | y == 0 = 1
    | otherwise = (gridwalk (x - 1) y) + (gridwalk x (y - 1))
Here is a version using Data.MemoTrie from the MemoTrie package to memoize the function:
import Data.MemoTrie (memo2)

gridwalk :: Int -> Int -> Int
gridwalk = memo2 gw
  where
    gw 0 _ = 1
    gw _ 0 = 1
    gw x y = gridwalk (x - 1) y + gridwalk x (y - 1)
If you want maximum generality, you can memoize a memoizing function.
memo :: (Num a, Enum a) => (a -> b) -> [b]
memo f = map f (enumFrom 0)
gwvals = fmap memo (memo gw)
fastgw :: Int -> Int -> Int
fastgw x y = gwvals !! x !! y
This technique will work with functions that have any number of arguments.
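To illustrate that claim, here is a hypothetical three-argument sketch (my own; paths3, table3 and fastPaths3 are not from the answer). It applies one layer of memo per argument, using the gridwalk recurrence naively extended to three parameters purely as an example:
-- Gridwalk-style recurrence in three arguments (illustrative only).
paths3 :: Int -> Int -> Int -> Integer
paths3 0 _ _ = 1
paths3 _ 0 _ = 1
paths3 _ _ 0 = 1
paths3 x y z = fastPaths3 (x-1) y z + fastPaths3 x (y-1) z + fastPaths3 x y (z-1)

-- One 'memo' layer per argument: a lazy list of lists of lists of results.
table3 :: [[[Integer]]]
table3 = memo (\x -> memo (\y -> memo (paths3 x y)))

fastPaths3 :: Int -> Int -> Int -> Integer
fastPaths3 x y z = table3 !! x !! y !! z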
Edit: thanks to Philip K. for pointing out a bug in the original code. Originally memo had a "Bounded" constraint instead of "Num" and began the enumeration at minBound, which would only be valid for natural numbers.
Lists aren't a good data structure for memoizing, though, because they have linear lookup complexity. You might be better off with a Map or IntMap. Or look on Hackage.
Note that this particular code does rely on laziness, so if you wanted to switch to using a Map you would need to take a bounded amount of elements from the list, as in:
-- assumes: import Data.Maybe (fromMaybe) and import qualified Data.Map as M
gwByMap :: Int -> Int -> Int -> Int -> Int
gwByMap maxX maxY x y = fromMaybe (gw x y) $ M.lookup (x,y) memomap
  where
    memomap = M.fromList $ concat [[((x',y'),z) | (y',z) <- zip [0..maxY] ys]
                                  | (x',ys) <- zip [0..maxX] gwvals]

fastgw2 :: Int -> Int -> Int
fastgw2 = gwByMap 20 20
I think GHC may be stupid about sharing in this case; you may need to lift out the x and y parameters, like this:
gwByMap maxX maxY = \x y -> fromMaybe (gw x y) $ M.lookup (x,y) memomap

Creating image from function in parallel

I'm writing a program in Haskell that creates a fractal and writes it to a PNG file. I have a function
f :: Int -> Int -> PixelRGB8
which calculates the color of the pixel at the given image coordinates. (The output color format, PixelRGB8, is not important; I can easily change it to, say, an RGB tuple or anything else.)
Using Codec.Picture, I can write
writePng "test.png" $ generateImage f width height
which indeed writes the desired image file. However, it works very slowly, and I can see that my CPU load is low. I want to use parallel computations, since the computation of each pixel value does not depend on its neighbors. As far as I can see, Codec.Picture does not provide any means to do this. I understand how parMap works, but I can't see a way to apply it here. I think one possible solution is to use repa.DevIL, but I'm kind of lost in its multidimensional array notation, which looks like overkill in my case. So, the question is: how do I construct an image file from a given function using parallel computations?
UPDATE. Here's the complete code (the function extract is omitted because it's long and called only once):
import Data.Complex
import System.IO
import Data.List.Split
import Codec.Picture

eval :: (Floating a) => [a] -> a -> a
eval [p] _ = p
eval (p:ps) z = p * z ** (fromIntegral (length ps)) + (eval ps z)

type Comp = Complex Double

-- func, der, z, iter
convergesOrNot :: (Comp -> Comp) -> (Comp -> Comp) -> Comp -> Int -> Int
convergesOrNot _ _ _ 0 = 0
convergesOrNot f d z iter | realPart (abs (f z)) < 1e-6 = 1
                          | otherwise = convergesOrNot f d (z - (f z)/(d z)) (iter-1)

-- x, y, f, d, xMin, yMin, stepX, stepY
getPixel :: Int -> Int -> (Comp -> Comp) -> (Comp -> Comp) -> Double -> Double -> Double -> Double -> PixelRGB8
getPixel x y f d xMin yMin stepX stepY | convergesOrNot f d z 16 == 1 = PixelRGB8 255 255 255
                                       | otherwise = PixelRGB8 0 0 0
  where
    real = xMin + (fromIntegral x)*stepX
    imag = yMin + (fromIntegral y)*stepY
    z = real :+ imag

data Params = Params { f :: [Comp],
                       d :: [Comp],
                       xMin :: Double,
                       yMin :: Double,
                       stepX :: Double,
                       stepY :: Double,
                       width :: Int,
                       height :: Int
                     } deriving (Show)

getPixelParams :: Int -> Int -> Params -> PixelRGB8
getPixelParams x y params = getPixel x y func derv (xMin params) (yMin params) (stepX params) (stepY params)
  where
    func = \z -> eval (f params) z
    derv = \z -> eval (d params) z

main = do
  handle <- openFile "config.txt" ReadMode
  config <- hGetContents handle
  let params = extract config
  writePng "test.png" $ generateImage (\x y -> getPixelParams x y params) (width params) (height params)
  hClose handle
The profiling shows that most of the time is spent in the eval function. The result (the .prof file) is as follows (it's only the top part of the file; the rest is a bunch of zeroes):
COST CENTRE MODULE no. entries %time %alloc (individual) %time %alloc (inherited)
MAIN MAIN 91 0 0.0 0.0 100.0 100.0
main Main 183 0 0.0 0.0 99.9 100.0
main.\ Main 244 0 0.0 0.0 0.0 0.0
getPixelParams Main 245 0 0.0 0.0 0.0 0.0
getPixelParams.derv Main 269 1 0.0 0.0 0.0 0.0
getPixelParams.func Main 246 1 0.0 0.0 0.0 0.0
generateImage Codec.Picture.Types 199 1 0.0 0.0 99.8 99.9
generateImage.generated Codec.Picture.Types 234 1 0.0 0.0 99.8 99.9
generateImage.generated.lineGenerator Codec.Picture.Types 238 257 0.0 0.0 99.8 99.9
generateImage.generated.lineGenerator.column Codec.Picture.Types 239 65792 0.5 0.8 99.8 99.9
unsafeWritePixel Codec.Picture.Types 275 65536 0.0 0.0 0.0 0.0
main.\ Main 240 65536 0.1 0.0 99.2 99.1
getPixelParams Main 241 65536 0.7 0.0 99.1 99.1
getPixelParams.derv Main 270 0 0.2 0.0 19.3 18.5
getPixelParams.derv.\ Main 271 463922 0.2 0.0 19.2 18.5
eval Main 272 1391766 18.9 18.5 18.9 18.5
getPixelParams.func Main 247 0 0.5 0.0 62.3 59.0
getPixelParams.func.\ Main 248 993380 0.4 0.0 61.8 59.0
eval Main 249 3973520 61.4 59.0 61.4 59.0
getPixel Main 242 65536 0.2 0.0 16.7 21.5
getPixel.imag Main 262 256 0.0 0.0 0.0 0.0
getPixel.z Main 261 65536 0.1 0.1 0.1 0.1
getPixel.real Main 251 65536 0.2 0.1 0.2 0.1
convergesOrNot Main 243 531889 16.3 21.3 16.3 21.3
UPDATE 2. After a number of changes from @Cirdec and @Jedai, the code looks like this:
import Data.Complex
import System.IO
import Data.List.Split
import qualified Data.List as DL
import Codec.Picture
import Codec.Picture.Types
import Control.Parallel
import Data.Array
import Control.Parallel.Strategies
import GHC.Conc (numCapabilities)

class Ix a => Partitionable a where
    partition :: Int -> (a, a) -> [(a, a)]
    default partition :: (Num a) => Int -> (a, a) -> [(a, a)]
    partition n r@(l,_) = zipWith (\x y -> (x, x+y-1)) starts steps
      where
        (span, longerSpans) = rangeSize r `quotRem` n
        steps = zipWith (+) (replicate (min (rangeSize r) n) (fromIntegral span)) (replicate longerSpans 1 ++ repeat 0)
        starts = scanl (+) l steps

instance Partitionable Int

instance (Partitionable a, Partitionable b) => Partitionable (a, b) where
    partition n ((x0,y0), (x1, y1)) = do
        xr'@(x0', x1') <- partition n (x0, x1)
        let n' = n * rangeSize xr' `div` rangeSize (x0, x1)
        (y0', y1') <- partition n' (y0, y1)
        return ((x0', y0'), (x1', y1'))

mkArrayPar :: (Partitionable i) => Int -> Strategy e -> (i, i) -> (i -> e) -> Array i e
mkArrayPar n s bounds f = listArray bounds (concat workUnits)
  where
    partitions = partition n bounds
    workUnits = parMap (evalList s) (map f . range) partitions

generateImagePar :: forall a . Pixel a => (Int -> Int -> a) -> Int -> Int -> Image a
generateImagePar f w h = generateImage f' w h
  where
    bounds = ((0, 0), (w-1,h-1))
    pixels = mkArrayPar numCapabilities rseq bounds (uncurry f)
    f' = curry (pixels !)

--
-- Newton
--

eval :: (Floating a) => [a] -> a -> a
eval cs z = DL.foldl1' (\acc c -> acc * z + c) cs

diff :: (Floating a) => [a] -> [a]
diff [p] = []
diff (p:ps) = [(fromIntegral (length ps))*p] ++ diff ps

type Comp = Complex Double

convergesOrNot :: (Comp -> Comp) -> (Comp -> Comp) -> Comp -> Int -> Int
convergesOrNot _ _ _ 0 = 0
convergesOrNot f d z iter | realPart (abs (f z)) < 1e-6 = 1
                          | otherwise = convergesOrNot f d (z - (f z)/(d z)) (iter-1)

-- x, y, f, d, xMin, yMin, stepX, stepY
getPixel :: Int -> Int -> (Comp -> Comp) -> (Comp -> Comp) -> Double -> Double -> Double -> Double -> PixelRGB8
getPixel x y f d xMin yMin stepX stepY | convergesOrNot f d z 16 == 1 = PixelRGB8 255 255 255
                                       | otherwise = PixelRGB8 0 0 0
  where
    real = xMin + (fromIntegral x)*stepX
    imag = yMin + (fromIntegral y)*stepY
    z = real :+ imag

data Params = Params { f :: [Comp],
                       d :: [Comp],
                       xMin :: Double,
                       yMin :: Double,
                       stepX :: Double,
                       stepY :: Double,
                       width :: Int,
                       height :: Int
                     } deriving (Show)

extract :: String -> Params
extract config = Params poly deriv xMin yMin stepX stepY width height
  where
    lines = splitOn "\n" config
    wh = splitOn " " (lines !! 0)
    width = read (wh !! 0) :: Int
    height = read (wh !! 1) :: Int
    bottomLeft = splitOn " " (lines !! 1)
    upperRight = splitOn " " (lines !! 2)
    xMin = read $ (bottomLeft !! 0) :: Double
    yMin = read $ (bottomLeft !! 1) :: Double
    xMax = read $ (upperRight !! 0) :: Double
    yMax = read $ (upperRight !! 1) :: Double
    stepX = (xMax - xMin)/(fromIntegral width)
    stepY = (yMax - yMin)/(fromIntegral height)
    poly = map (\x -> (read x :: Double) :+ 0) (splitOn " " (lines !! 3))
    deriv = diff poly

getPixelParams :: Int -> Int -> Params -> PixelRGB8
getPixelParams x y params = getPixel x y func derv (xMin params) (yMin params) (stepX params) (stepY params)
  where
    func = \z -> eval (f params) z
    derv = \z -> eval (d params) z

main = do
  handle <- openFile "config.txt" ReadMode
  config <- hGetContents handle
  let params = extract config
  writePng "test.png" $ generateImagePar (\x y -> getPixelParams x y params) (width params) (height params)
  hClose handle
I compile it with
ghc -O2 -threaded -rtsopts -XDefaultSignatures -XExistentialQuantification partNewton.hs -o newton
and I run it with ./newton +RTS -N. But when I run it on this config:
2048 2048
-1 -1
1 1
1 0 0 1
it fails with the error:
Stack space overflow: current size 8388608 bytes.
You can calculate the pixels in parallel before generating the image. To make the pixel lookup for generateImage simple, we'll stuff all of the pixels into an Array.
{-# LANGUAGE RankNTypes #-}
import Data.Array
import Control.Parallel.Strategies
To generate the image in parallel, we'll calculate the pixels in parallel for each point within the range of the bounds of the image. We'll build a temporary Array to hold all the pixels. The array's lookup function, !, will provide an efficient lookup function to pass to generateImage.
generateImagePar :: forall a . Pixel a => (Int -> Int -> a) -> Int -> Int -> Image a
generateImagePar f w h = generateImage f' w h
  where
    bounds = ((0, 0), (w-1,h-1))
    pixels = parMap rseq (uncurry f) (range bounds)
    pixelArray = listArray bounds pixels
    f' = curry (pixelArray !)
We can then write your example in terms of generateImagePar.
writePng "test.png" $ generateImagePar f width height
This may be no faster and may in fact be slower than using generateImage. It's important to profile your code to understand why it is slow before attempting to improve its performance. For example, if your program is memory starved or is thrashing resources, using generateImagePar will certainly be slower than using generateImage.
Partitioning the work
We can partition the work into chunks to reduce the number of sparks without resorting to any sort of mutable data structure. First we'll define the class of indexes whose ranges can be divided into partitions. We'll define a default for dividing up numeric ranges.
class Ix a => Partitionable a where
    partition :: Int -> (a, a) -> [(a, a)]
    default partition :: (Num a) => Int -> (a, a) -> [(a, a)]
    partition n r@(l,_) = zipWith (\x y -> (x, x+y-1)) starts steps
      where
        (span, longerSpans) = rangeSize r `quotRem` n
        steps = zipWith (+) (replicate (min (rangeSize r) n) (fromIntegral span)) (replicate longerSpans 1 ++ repeat 0)
        starts = scanl (+) l steps
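For intuition, a small worked example of what this default should produce (my own check, assuming the Partitionable Int instance defined just below): splitting the ten-element range (0, 9) into 3 partitions gives chunks of sizes 4, 3 and 3.
-- Assumed worked example, not from the answer:
exampleSplit :: Bool
exampleSplit = partition 3 (0 :: Int, 9) == [(0,3), (4,6), (7,9)]   -- True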
Ints (and any other Num) can be made Partitionable using the default implementation.
instance Partitionable Int
Index products can be partitioned by first partitioning the first dimension, and then partitioning the second dimension if there aren't enough possible divisions in the first dimension.
instance (Partitionable a, Partitionable b) => Partitionable (a, b) where
    partition n ((x0,y0), (x1, y1)) = do
        xr'@(x0', x1') <- partition n (x0, x1)
        let n' = n * rangeSize xr' `div` rangeSize (x0, x1)
        (y0', y1') <- partition n' (y0, y1)
        return ((x0', y0'), (x1', y1'))
We can build an array in parallel by partitioning the work into units and sparking each work unit.
mkArrayPar :: (Partitionable i) => Int -> Strategy e -> (i, i) -> (i -> e) -> Array i e
mkArrayPar n s bounds f = listArray bounds (concat workUnits)
  where
    partitions = partition n bounds
    workUnits = parMap (evalList s) (map f . range) partitions
Now we can define generateImagePar in terms of making an array in parallel. A good number of partitions is a small multiple of the number of actual processors, numCapabilities; we'll start with one partition per processor.
import GHC.Conc (numCapabilities)
generateImagePar :: forall a . Pixel a => (Int -> Int -> a) -> Int -> Int -> Image a
generateImagePar f w h = generateImage f' w h
  where
    bounds = ((0, 0), (w-1,h-1))
    pixels = mkArrayPar numCapabilities rseq bounds (uncurry f)
    f' = curry (pixels !)

Optimizing the Damerau version of the Levenshtein algorithm to better than O(n*m)

Here is the algorithm (in ruby)
# http://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance
def self.dameraulevenshtein(seq1, seq2)
  oneago = nil
  thisrow = (1..seq2.size).to_a + [0]
  seq1.size.times do |x|
    twoago, oneago, thisrow = oneago, thisrow, [0] * seq2.size + [x + 1]
    seq2.size.times do |y|
      delcost = oneago[y] + 1
      addcost = thisrow[y - 1] + 1
      subcost = oneago[y - 1] + ((seq1[x] != seq2[y]) ? 1 : 0)
      thisrow[y] = [delcost, addcost, subcost].min
      if (x > 0 and y > 0 and seq1[x] == seq2[y-1] and seq1[x-1] == seq2[y] and seq1[x] != seq2[y])
        thisrow[y] = [thisrow[y], twoago[y-2] + 1].min
      end
    end
  end
  return thisrow[seq2.size - 1]
end
My problem is that with a seq1 of length 780, and seq2 of length 7238, this takes about 25 seconds to run on an i7 laptop. Ideally, I'd like to get this reduced to about a second, since it's running as part of a webapp.
I found that there is a way to optimize the vanilla levenshtein distance such that the runtime drops from O(n*m) to O(n + d^2) where n is the length of the longer string, and d is the edit distance. So, my question becomes, can the same optimization be applied to the damerau version I have (above)?
Yes, the optimization can be applied to the Damerau version. Here is Haskell code to do this (I don't know Ruby):
distd :: Eq a => [a] -> [a] -> Int
distd a b
    = last (if lab == 0 then mainDiag
            else if lab > 0 then lowers !! (lab - 1)
            else {- < 0 -} uppers !! (-1 - lab))
    where mainDiag = oneDiag a b (head uppers) (-1 : head lowers)
          uppers = eachDiag a b (mainDiag : uppers) -- upper diagonals
          lowers = eachDiag b a (mainDiag : lowers) -- lower diagonals
          eachDiag a [] diags = []
          eachDiag a (bch:bs) (lastDiag:diags) = oneDiag a bs nextDiag lastDiag : eachDiag a bs diags
              where nextDiag = head (tail diags)
          oneDiag a b diagAbove diagBelow = thisdiag
              where doDiag [_] b nw n w = []
                    doDiag a [_] nw n w = []
                    doDiag (apr:ach:as) (bpr:bch:bs) nw n w = me : (doDiag (ach:as) (bch:bs) me (tail n) (tail w))
                        where me = if ach == bch then nw else if ach == bpr && bch == apr then nw else 1 + min3 (head w) nw (head n)
                    firstelt = 1 + head diagBelow
                    thisdiag = firstelt : doDiag a b firstelt diagAbove (tail diagBelow)
          lab = length a - length b
          min3 x y z = if x < y then x else min y z
distance :: [Char] -> [Char] -> Int
distance a b = distd ('0':a) ('0':b)
The code above is an adaptation of this code.
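A minimal usage sketch (my addition): distance pads both strings with a sentinel before calling distd, so it can be applied to plain Strings directly.
main :: IO ()
main = do
  print (distance "kitten" "sitting")   -- ordinary insert/delete/substitute edits
  print (distance "abcdef" "abdcef")    -- includes an adjacent transposition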
