I implemented the following function:
import Control.Monad.State
iterateState :: Int -> (a -> State s a) -> (a -> State s [a])
iterateState 0 f a = return []
iterateState n f a = do
  b <- f a
  xs <- iterateState (n - 1) f b
  return $ b : xs
My primary use case is for a = Double. It works, but it is very slow. It allocates 528MB of heap space to produce a list of 1M Double values and spends most of its time doing garbage collection.
I have experimented with implementations that work on the type s -> (a, s) directly as well as with various strictness annotations. I was able to reduce the heap allocation somewhat, but not even close to what one would expect from a reasonable implementation. I suspect that the resulting ([a], s) being a combination of something to be consumed lazily ([a]) and something whose WHNF forces the entire computation (s) makes optimization difficult for GHC.
Assuming that the iterative nature of lists would be unsuitable for this situation, I turned to the vector package. To my delight, it already contains
iterateNM :: (Monad m, Unbox a) => Int -> (a -> m a) -> a -> m (Vector a)
Unfortunately, this is only slightly faster than my list implementation, still allocating 328MB of heap space. I assumed that this is because it uses unstreamM, whose description reads
Load monadic stream bundle into a newly allocated vector. This function goes through a list, so prefer using unstream, unless you need to be in a monad.
Looking at its behavior for the list monad, it is understandable that there is no efficient implementation for general monads. Luckily, I only need the state monad, and I found another function that almost fits the signature of the state monad.
unfoldrExactN :: Unbox a => Int -> (b -> (a, b)) -> b -> Vector a
This function is blazingly fast and performs no excess heap allocation beyond the 8MB needed to hold the resulting unboxed vector of 1M Double values. Unfortunately, it does not return the final state at the end of the computation, so it cannot be wrapped in the State type.
I looked at the implementation of unfoldrExactN to see if I could adjust it to expose the final state at the end of the computation. Unfortunately, this seems to be difficult, as the stream constructed by
unfoldrExactN :: Monad m => Int -> (s -> (a, s)) -> s -> Stream m a
which is eventually expanded into a vector by unstream has already forgotten the state type s.
I imagine I could circumvent the entire Stream infrastructure and implement iterateState directly on mutable vectors in the ST monad (similarly to how unstream expands a stream into a vector). However, I would lose all the benefits of stream fusion, as well as turning a computation that is easily expressed as a pure function into imperative low-level mush just for performance reasons. This is particularly frustrating while knowing that the existing unfoldrExactN already calculates all the values I want, but I have no access to them.
Is there a better way?
Can this function be implemented in a purely functional way with reasonable performance and no excess heap allocations? Preferably in a way that ties into the vector package and its stream fusion infrastructure.
The following program has 12MB max residency on my computer when compiled with optimizations:
import Data.Vector.Unboxed
import Data.Vector.Unboxed.Mutable
iterateNState :: Unbox a => Int -> (a -> s -> (s, a)) -> (a -> s -> (s, Vector a))
iterateNState n f a0 s0 = createT (unsafeNew n >>= go 0 a0 s0) where
  go i a s arr
    | i >= n = pure (s, arr)
    | otherwise = do
        unsafeWrite arr i a
        case f a s of
          (s', a') -> go (i+1) a' s' arr
main = id
  . print
  . Data.Vector.Unboxed.sum
  . snd
  $ iterateNState 1000000 (\a s -> (s+1, a+s :: Int)) 0 0
(It continues to have a nice low residency even when the final two 0s are read from input dynamically.)
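If you want to keep the question's `State`-based signature, the same mutable-vector loop can be wrapped with `state`. This is a minimal sketch, not the answer's code: it assumes the mtl `Control.Monad.State` module and the vector package's `createT`, and `iterateStateV` and `fill` are hypothetical names. Unlike `iterateNState` above, it stores each result of `f` rather than its input, so it yields the same elements as the list-based `iterateState`, and it forces the state to weak head normal form at every step so that a lazy state does not build up a chain of thunks.

```haskell
{-# LANGUAGE BangPatterns #-}
import Control.Monad.ST (ST)
import Control.Monad.State (State, runState, state)
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Unboxed.Mutable as MU

-- Produces f a0, f (f a0), ... (n elements) and returns the final state,
-- so the whole computation fits back into State.
iterateStateV :: U.Unbox a => Int -> (a -> State s a) -> a -> State s (U.Vector a)
iterateStateV n f a0 = state $ \s0 ->
  -- createT freezes the vector inside the (state, vector) pair
  let (sFinal, v) = U.createT (MU.unsafeNew n >>= fill n f 0 a0 s0)
  in (v, sFinal)

-- Fill slots i..n-1, threading both the current element and the state by hand.
fill :: U.Unbox a
     => Int -> (a -> State s a) -> Int -> a -> s
     -> MU.MVector st a -> ST st (s, MU.MVector st a)
fill n f !i a s arr
  | i >= n = pure (s, arr)
  | otherwise = case runState (f a) s of
      (b, !s') -> do            -- the bang forces the new state to WHNF
        MU.unsafeWrite arr i b  -- unboxed write also forces the element
        fill n f (i + 1) b s' arr
```

For example, with `step x = state (\s -> (x + s, s + 1))`, `runState (iterateStateV 5 step 0) 0` gives the vector [0, 1, 3, 6, 10] and the final state 5, the same as the list version.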
I'm new to Haskell, and trying to learn it by thinking in terms of image processing.
So far, I have been stuck thinking about how you would implement a neighbourhood-filtering algorithm in Haskell (or any functional programming language, really).
How would a spatial averaging filter (say 3x3 kernel, 5x5 image) be written functionally? Coming from an entirely imperative background, I can't seem to come up with a way to either structure the data so the solution is elegant, or not do it by iterating through the image matrix, which doesn't seem very declarative.
Working with neighborhoods is easy to do elegantly in a functional language. Operations like convolution with a kernel are higher order functions that can be written in terms of one of the usual tools of functional programming languages - lists.
To write some real, useful code, we'll first play pretend to explain a library.
Pretend
You can think of each image as a function from a coordinate in the image to the value of the data held at that coordinate. This would be defined over all possible coordinates, so it would be useful to pair it with some bounds which tell us where the function is defined. This would suggest a data type like
data Image coordinate value = Image {
    lowerBound :: coordinate,
    upperBound :: coordinate,
    value :: coordinate -> value
  }
Haskell has a very similar data type called Array in Data.Array. This data type comes with an additional feature that the value function in Image wouldn't have - it remembers the value for each coordinate so that it never needs to be recomputed. We'll work with Arrays using three functions, which I'll describe in terms of how they'd be defined for Image above. This will help us see that even though we are using the very useful Array type, everything could be written in terms of functions and algebraic data types.
type Array i e = Image i e
bounds gets the bounds of the Array
bounds :: Array i e -> (i, i)
bounds img = (lowerBound img, upperBound img)
The ! looks up a value in the Array
(!) :: Array i e -> i -> e
img ! coordinate = value img coordinate
Finally, makeArray builds an Array
makeArray :: Ix i => (i, i) -> (i -> e) -> Array i e
makeArray (lower, upper) f = Image lower upper f
Ix is a typeclass for things that behave like image coordinates: they have a range. There are instances for most of the base types like Int, Integer, Bool, Char, etc. For example, the range of (1, 5) is [1, 2, 3, 4, 5]. There are also instances for products or tuples of things that themselves have Ix instances; the instance for tuples ranges over all combinations of the ranges of each component. For example, range (('a',1),('c',2)) is
[('a',1),('a',2),
 ('b',1),('b',2),
 ('c',1),('c',2)]
We are only interested in two functions from the Ix typeclass, range :: Ix a => (a, a) -> [a] and inRange :: Ix a => a -> (a, a) -> Bool. inRange quickly checks if a value would be in the result of range.
Reality
In reality, makeArray isn't provided by Data.Array, but we can define it in terms of listArray, which constructs an Array from a list of items in the same order as the range of its bounds:
import Data.Array
makeArray :: (Ix i) => (i, i) -> (i -> e) -> Array i e
makeArray bounds f = listArray bounds . map f . range $ bounds
When we convolve an array with a kernel, we will compute the neighborhood by adding the coordinates from the kernel to the coordinate we are calculating. The Ix typeclass doesn't require that we can combine two indexes together. There's one candidate typeclass for "things that combine" in base, Monoid, but there aren't instances for Int or Integer or other numbers because there's more than one sensible way to combine them: + and *. To address this, we'll make our own typeclass Offset for things that combine with a new operator called .+.. Usually we don't make typeclasses except for things that have laws. We'll just say that Offset should "work sensibly" with Ix.
class Offset a where
  (.+.) :: a -> a -> a
Integers, the default type Haskell uses when you write an integer literal like 9, can be used as offsets.
instance Offset Integer where
  (.+.) = (+)
Additionally, pairs of things that are themselves Offsets can be combined pairwise.
instance (Offset a, Offset b) => Offset (a, b) where
  (x1, y1) .+. (x2, y2) = (x1 .+. x2, y1 .+. y2)
We have one more wrinkle before we write convolve: how will we deal with the edges of the image? I intend to pad them with 0 for simplicity. pad background makes a version of ! that's defined everywhere: outside the bounds of an Array, it returns the background.
pad :: Ix i => e -> Array i e -> i -> e
pad background array i =
  if inRange (bounds array) i
    then array ! i
    else background
We're now prepared to write a higher order function for convolve. convolve a b convolves the image b with the kernel a. convolve is higher order because each of its arguments and its result is an Array, which is really a combination of a function ! and its bounds.
convolve :: (Num n, Ix i, Offset i) => Array i n -> Array i n -> Array i n
convolve a b = makeArray (bounds b) f
  where
    f i = sum . map (g i) . range . bounds $ a
    g i o = a ! o * pad 0 b (i .+. o)
To convolve an image b with a kernel a, we make a new image defined over the same bounds as b. Each point in the image can be computed by the function f, which sums the product (*) of the value in the kernel a and the value in the padded image b for each offset o in the range of the bounds of the kernel a.
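Because convolve adds the kernel offset to the pixel coordinate (i .+. o), it strictly speaking computes a cross-correlation; a convolution in the signal-processing sense uses i - o, which mirrors the kernel. For a symmetric kernel like the averaging filter below the two agree, but for an asymmetric kernel they differ. A minimal sketch of one way to get a true convolution, assuming the kernel's bounds are symmetric around the origin (as in the 3x3 example): mirror the kernel with `ixmap` and reuse `convolve`. `flipKernel` and `trueConvolve` are hypothetical names, and the answer's definitions are repeated (with two variables renamed) so the block compiles on its own.

```haskell
import Data.Array

-- The answer's definitions, repeated so this block compiles on its own.
makeArray :: Ix i => (i, i) -> (i -> e) -> Array i e
makeArray bnds f = listArray bnds . map f . range $ bnds

class Offset a where
  (.+.) :: a -> a -> a

instance Offset Integer where
  (.+.) = (+)

instance (Offset a, Offset b) => Offset (a, b) where
  (x1, y1) .+. (x2, y2) = (x1 .+. x2, y1 .+. y2)

-- Like (!), but returns the background value outside the array's bounds.
pad :: Ix i => e -> Array i e -> i -> e
pad background arr i = if inRange (bounds arr) i then arr ! i else background

-- Adds the kernel offset o to the pixel coordinate i: a cross-correlation.
convolve :: (Num n, Ix i, Offset i) => Array i n -> Array i n -> Array i n
convolve a b = makeArray (bounds b) f
  where
    f i = sum . map (g i) . range . bounds $ a
    g i o = a ! o * pad 0 b (i .+. o)

-- Hypothetical helper: mirror a 2D kernel through the origin.
-- Assumes the kernel's bounds are symmetric, e.g. ((-1,-1),(1,1)).
flipKernel :: (Ix a, Ix b, Num a, Num b) => Array (a, b) e -> Array (a, b) e
flipKernel k = ixmap (bounds k) (\(x, y) -> (negate x, negate y)) k

-- Hypothetical: a true convolution, i.e. correlation with the mirrored kernel.
trueConvolve :: (Num n, Ix a, Ix b, Num a, Num b, Offset a, Offset b)
             => Array (a, b) n -> Array (a, b) n -> Array (a, b) n
trueConvolve a b = convolve (flipKernel a) b
```

With a 1x3 kernel that is 1 at offset (0,-1) and 0 elsewhere, applied to a 1x3 image [1, 10, 100], `convolve` shifts the image one way ([0, 1, 10]) and `trueConvolve` shifts it the other way ([10, 100, 0]).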
Example
With the six declarations from the previous section, we can write the example you requested, a spatial averaging filter with a 3x3 kernel applied to a 5x5 image. The kernel a defined below is a 3x3 image that uses one ninth of the value from each of the 9 sampled neighbors. The 5x5 image b is a gradient increasing from 2 in the top left corner to 10 in the bottom right corner.
main = do
  let
    a = makeArray ((-1, -1), (1, 1)) (const (1.0/9))
    b = makeArray ((1,1),(5,5)) (\(x,y) -> fromInteger (x + y))
    c = convolve a b
  print b
  print c
The printed input b is
array ((1,1),(5,5))
[((1,1),2.0),((1,2),3.0),((1,3),4.0),((1,4),5.0),((1,5),6.0)
,((2,1),3.0),((2,2),4.0),((2,3),5.0),((2,4),6.0),((2,5),7.0)
,((3,1),4.0),((3,2),5.0),((3,3),6.0),((3,4),7.0),((3,5),8.0)
,((4,1),5.0),((4,2),6.0),((4,3),7.0),((4,4),8.0),((4,5),9.0)
,((5,1),6.0),((5,2),7.0),((5,3),8.0),((5,4),9.0),((5,5),10.0)]
The convolved output c is
array ((1,1),(5,5))
[((1,1),1.3333333333333333),((1,2),2.333333333333333),((1,3),2.9999999999999996),((1,4),3.6666666666666665),((1,5),2.6666666666666665)
,((2,1),2.333333333333333),((2,2),3.9999999999999996),((2,3),5.0),((2,4),6.0),((2,5),4.333333333333333)
,((3,1),2.9999999999999996),((3,2),5.0),((3,3),6.0),((3,4),7.0),((3,5),5.0)
,((4,1),3.6666666666666665),((4,2),6.0),((4,3),7.0),((4,4),8.0),((4,5),5.666666666666666)
,((5,1),2.6666666666666665),((5,2),4.333333333333333),((5,3),5.0),((5,4),5.666666666666666),((5,5),4.0)]
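As a check on the printed output, two of the values can be worked out by hand: the corner (1,1), where five of the nine sampled neighbours fall outside the image and are padded with 0, and the centre (3,3), where all nine lie inside and b(x,y) = x + y.

```latex
\begin{aligned}
c(1,1) &= \tfrac{1}{9}\bigl(b(1,1) + b(1,2) + b(2,1) + b(2,2)\bigr) = \tfrac{2 + 3 + 3 + 4}{9} = \tfrac{12}{9} \approx 1.333\\
c(3,3) &= \tfrac{1}{9}\sum_{x=2}^{4}\sum_{y=2}^{4} (x + y) = \tfrac{54}{9} = 6
\end{aligned}
```

Values such as 3.9999999999999996 instead of 4.0 appear because 1/9 is not exactly representable as a Double, so the nine products and their sum carry small rounding errors.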
Depending on the complexity of what you want to do, you might consider using more established libraries, like the oft recommended repa, rather than implementing an image processing kit for yourself.
I wanted to write an efficient implementation of the Floyd-Warshall all pairs shortest path algorithm in Haskell using Vectors to hopefully get good performance.
The implementation is quite straight-forward, but instead of using a 3-dimensional |V|×|V|×|V| matrix, a 2-dimensional vector is used, since we only ever read the previous k value.
Thus, the algorithm is really just a series of steps where a 2D vector is passed in, and a new 2D vector is generated. The final 2D vector contains the shortest paths between all nodes (i,j).
My intuition told me that it would be important to make sure that the previous 2D vector was evaluated before each step, so I used BangPatterns on the prev argument to the fw function and the strict foldl':
{-# Language BangPatterns #-}
import Control.DeepSeq
import Control.Monad (forM_)
import Data.List (foldl')
import qualified Data.Map.Strict as M
import Data.Vector (Vector, (!), (//))
import qualified Data.Vector as V
import qualified Data.Vector.Mutable as V hiding (length, replicate, take)
type Graph = Vector (M.Map Int Double)
type TwoDVector = Vector (Vector Double)
infinity :: Double
infinity = 1/0
-- calculate shortest path between all pairs in the given graph, if there are
-- negative cycles, return Nothing
allPairsShortestPaths :: Graph -> Int -> Maybe TwoDVector
allPairsShortestPaths g v =
  let initial = fw g v V.empty 0
      results = foldl' (fw g v) initial [1..v]
  in if negCycle results
       then Nothing
       else Just results
  where -- check for negative elements along the diagonal
    negCycle a = any not $ map (\i -> a ! i ! i >= 0) [0..(V.length a-1)]
-- one step of the Floyd-Warshall algorithm
fw :: Graph -> Int -> TwoDVector -> Int -> TwoDVector
fw g v !prev k = V.create $ do -- ← bang
  curr <- V.new v
  forM_ [0..(v-1)] $ \i ->
    V.write curr i $ V.create $ do
      ivec <- V.new v
      forM_ [0..(v-1)] $ \j -> do
        let d = distance g prev i j k
        V.write ivec j d
      return ivec
  return curr
distance :: Graph -> TwoDVector -> Int -> Int -> Int -> Double
distance g _ i j 0 -- base case; 0 if same vertex, edge weight if neighbours
  | i == j = 0.0
  | otherwise = M.findWithDefault infinity j (g ! i)
distance _ a i j k =
  let c1 = a ! i ! j
      c2 = (a ! i ! (k-1)) + (a ! (k-1) ! j)
  in min c1 c2
However, when running this program with a 1000-node graph with 47978 edges, things do not look good at all. The memory usage is very high and the program takes way too long to run. The program was compiled with ghc -O2.
I rebuilt the program for profiling, and limited the number of iterations to 50:
results = foldl' (fw g v) initial [1..50]
I then ran the program with +RTS -p -hc and +RTS -p -hd:
This is... interesting, but I guess it's showing that it's accumulating tonnes of thunks. Not good.
Ok, so after a few shots in the dark, I added a deepseq in fw to make sure prev really is evaluated:
let d = prev `deepseq` distance g prev i j k
Now things look better, and I can actually run the program to completion with constant memory usage. It's obvious that the bang on the prev argument was not enough.
For comparison with the previous graphs, here is the memory usage for 50 iterations after adding the deepseq:
Ok, so things are better, but I still have some questions:
Is this the correct solution for this space leak? Am I wrong in feeling that inserting a deepseq is a bit ugly?
Is my usage of Vectors here idiomatic/correct? I'm building a completely new vector for every iteration and hoping that the garbage collector will delete the old Vectors.
Are there any other things I could do to make this run faster with this approach?
For references, here is graph.txt: http://sebsauvage.net/paste/?45147f7caf8c5f29#7tiCiPovPHWRm1XNvrSb/zNl3ujF3xB3yehrxhEdVWw=
Here is main:
main = do
  ls <- fmap lines $ readFile "graph.txt"
  let numVerts = head . map read . words . head $ ls
  let edges = map (map read . words) (tail ls)
  let g = V.create $ do
        g' <- V.new numVerts
        forM_ [0..(numVerts-1)] (\idx -> V.write g' idx M.empty)
        forM_ edges $ \[f,t,w] -> do
          -- subtract one from vertex IDs so we can index directly
          curr <- V.read g' (f-1)
          V.write g' (f-1) $ M.insert (t-1) (fromIntegral w) curr
        return g'
  let a = allPairsShortestPaths g numVerts
  case a of
    Nothing -> putStrLn "Negative cycle detected."
    Just a' -> do
      putStrLn $ "The shortest, shortest path has length "
              ++ show ((V.minimum . V.map V.minimum) a')
First, some general code cleanup:
In your fw function, you explicitly allocate and fill mutable vectors. However, there is a premade function for this exact purpose, namely generate. fw can therefore be rewritten as
fw g v prev k = V.generate v (\i -> V.generate v (\j -> distance g prev i j k))
Similarly, the graph generation code can be replaced with replicate and accum:
let parsedEdges = map (\[f,t,w] -> (f - 1, (t - 1, fromIntegral w))) edges
let g = V.accum (flip (uncurry M.insert)) (V.replicate numVerts M.empty) parsedEdges
Note that this totally removes all need for mutation, without losing any performance.
Now, to the actual questions:
In my experience, deepseq is very useful, but only as a quick fix for space leaks like this one. The fundamental problem is not that you need to force the results after you've produced them. Instead, the use of deepseq implies that you should have been building the structure more strictly in the first place. In fact, if you add a bang pattern in your vector creation code like so:
let !d = distance g prev i j k
Then the problem is fixed without deepseq. Note that this doesn't work with the generate code, because, for some reason (I might create a feature request for this), vector does not provide strict functions for boxed vectors. However, when I get to unboxed vectors in answer to question 3, which are strict, both approaches work without strictness annotations.
As far as I know, the pattern of repeatedly generating new vectors is idiomatic. The only thing not idiomatic is the use of mutability - except when they are strictly necessary, mutable vectors are generally discouraged.
There are a couple of things to do:
Most simply, you can replace Map Int with IntMap. As that isn't really the slow point of the function, this doesn't matter too much, but IntMap can be much faster for heavy workloads.
You can switch to using unboxed vectors. Although the outer vector has to remain boxed, as vectors of vectors can't be unboxed, the inner vector can be. This also solves your strictness problem - because unboxed vectors are strict in their elements, you don't get a space leak. Note that on my machine, this improves the performance from 4.1 seconds to 1.3 seconds, so the unboxing is very helpful.
You can flatten the vector into a single one and use multiplication and division to switch between two-dimensional indices and one-dimensional indices. I don't recommend this, as it is a bit involved, quite ugly, and, due to the division, actually slows down the code on my machine.
You can use repa. This has the huge advantage of automatically parallelizing your code. Note that, since repa flattens its arrays and apparently doesn't properly get rid of the divisions needed to fill them (it's possible to do with nested loops, but I think it uses a single loop and a division), it has the same performance penalty as I mentioned above, bringing the runtime from 1.3 seconds to 1.8. However, if you enable parallelism and use a multicore machine, you start seeing some benefits. Unfortunately, your current test case is too small to see much benefit, so, on my 6-core machine, I see it drop back down to 1.2 seconds. If I increase the size back to [1..v] instead of [1..50], the parallelism brings it from 32 seconds to 13. Presumably, if you give this program a larger input, you might see more benefit.
If you're interested, I've posted my repa-ified version here.
EDIT: Use -fllvm. Testing on my computer, using repa, I get 14.7 seconds without parallelism, which is almost as good as without -fllvm and with parallelism. In general, LLVM can just handle array based code like this very well.
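Putting the pure-code suggestions above together (generate instead of explicit mutation, accum to build the graph, IntMap adjacency maps, unboxed inner rows) gives something like the following. It is a minimal sketch under assumptions, not the answer's benchmarked code: vertices are numbered from 0, so the step index k is the intermediate vertex itself rather than k-1, and `buildGraph`, `forceRows`, `fwStep` and `allPairs` are hypothetical names. Each row is forced explicitly, because a boxed `V.generate` can store its elements as unevaluated thunks, and such a thunk would keep the previous matrix alive.

```haskell
import Data.List (foldl')
import qualified Data.IntMap.Strict as IM
import qualified Data.Vector as V
import qualified Data.Vector.Unboxed as U

-- For each vertex: neighbour -> edge weight (vertices numbered from 0).
type Graph = V.Vector (IM.IntMap Double)

-- Boxed outer vector of unboxed rows; unboxed rows are strict in their elements.
type Dist = V.Vector (U.Vector Double)

infinity :: Double
infinity = 1 / 0

-- Build the graph from (from, to, weight) triples without explicit mutation.
buildGraph :: Int -> [(Int, Int, Double)] -> Graph
buildGraph n es =
  V.accum (\m (t, w) -> IM.insert t w m) (V.replicate n IM.empty)
    [ (f, (t, w)) | (f, t, w) <- es ]

-- Evaluate every row; for an unboxed row, WHNF means all elements are computed.
forceRows :: Dist -> Dist
forceRows d = V.foldl' (\() row -> row `seq` ()) () d `seq` d

-- Distances using direct edges only (0 on the diagonal).
initialDist :: Graph -> Int -> Dist
initialDist g v = forceRows $ V.generate v $ \i ->
  U.generate v $ \j ->
    if i == j then 0 else IM.findWithDefault infinity j (g V.! i)

-- One Floyd-Warshall step: additionally allow vertex k as an intermediate.
fwStep :: Int -> Dist -> Int -> Dist
fwStep v prev k = forceRows $ V.generate v $ \i ->
  let row = prev V.! i
      rowK = prev V.! k
      dik = row U.! k
  in U.generate v $ \j -> min (row U.! j) (dik + rowK U.! j)

-- Nothing if a negative cycle makes some diagonal entry negative.
allPairs :: Graph -> Maybe Dist
allPairs g
  | any (\i -> result V.! i U.! i < 0) [0 .. v - 1] = Nothing
  | otherwise = Just result
  where
    v = V.length g
    result = foldl' (fwStep v) (initialDist g v) [0 .. v - 1]
```

With d the matrix inside the result of allPairs g, d V.! i U.! j is the shortest distance from vertex i to vertex j (infinity if j is unreachable).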
I need a pseudo-random number generator that gives me a number in the range [-1, 1] (the range is optional) from two inputs of type float.
I'll also try to explain why I need it:
I'm using the Diamond-Square algorithm to create a height map for my terrain engine. The terrain is split into patches (Chunked LOD).
The problem with Diamond-Square is that it uses the random function: if two neighboring patches share the same point (x, z), I want the height to be the same for both so that I don't get cracks between them.
Some may say I could fetch the height information from the neighboring patch, but then the result could differ depending on which patch was created first.
So that's why I need a pseudo-random number generator that returns a unique number given the two inputs (x, z).
(I'm not asking someone to write such function, I just need a general feedback and or known algorithms that do something similar).
You need something similar to a hash function on the pair (x, z).
I would suggest something like
(a * x + b * z + c) ^ d
where all numbers are integers, a and b are big primes so that the integer multiplications overflow, and c and d are some random integers. ^ is bitwise exclusive or. The result is a random integer which you can scale to the desired range.
This assumes that the map is not used in a game where knowing the terrain is of substantial value, as such a function is not secure for keeping it a secret. In that case you'd better use some cryptographic function.
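A minimal sketch of this formula, assuming Word32 arithmetic (so the multiplications wrap around on overflow) and feeding in the raw bit patterns of the two Float coordinates via GHC.Float.castFloatToWord32, so that exactly equal coordinates always produce the same value. The four constants are hypothetical: large odd numbers standing in for the "big primes" and "random integers" described above. `heightNoise`, `hash2` and `toSigned` are illustrative names.

```haskell
import Data.Bits (xor)
import Data.Word (Word32)
import GHC.Float (castFloatToWord32)

-- Deterministic pseudo-random value in [-1, 1] for the point (x, z).
-- Patches that share a point pass the same Floats, so they get the same value.
heightNoise :: Float -> Float -> Double
heightNoise x z = toSigned (hash2 (castFloatToWord32 x) (castFloatToWord32 z))

-- (a * x + b * z + c) `xor` d, with Word32 arithmetic wrapping on overflow.
hash2 :: Word32 -> Word32 -> Word32
hash2 x z = (a * x + b * z + c) `xor` d
  where
    -- Hypothetical constants: large odd multipliers standing in for the
    -- "big primes", plus two arbitrary offsets.
    a = 0x9E3779B1
    b = 0x85EBCA77
    c = 0x27D4EB2F
    d = 0x165667B1

-- Scale the full Word32 range onto [-1, 1].
toSigned :: Word32 -> Double
toSigned h = 2 * fromIntegral h / fromIntegral (maxBound :: Word32) - 1
```

Two caveats: 0.0 and -0.0 have different bit patterns, so they hash differently unless you normalise them first; and a linear formula like this mixes bits only weakly, so nearby coordinates may give visibly correlated values, in which case a stronger integer hash may be preferable.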
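If you're looking for a bijection from $\mathbb{R}\times\mathbb{R}$ to $]-1, 1[$, I can suggest this:
Bijection from $\mathbb{R}$ to $]-a, a[$
First, some bijections between $\mathbb{R}$ and bounded open intervals; with them, we just need to find a bijection from $]0,1[\times]0,1[$ to $]0,1[$.
$\tan : ]-\pi/2, \pi/2[ \to \mathbb{R}$
$\arctan : \mathbb{R} \to ]-\pi/2, \pi/2[$
$x \mapsto \frac{1}{\pi}\arctan(x) + \frac{1}{2} : \mathbb{R} \to ]0, 1[$
$x \mapsto 2\arctan(x) : \mathbb{R} \to ]-\pi, \pi[$ (more generally, $x \mapsto \frac{2a}{\pi}\arctan(x) : \mathbb{R} \to ]-a, a[$)
and
$\ln : ]0, +\infty[ \to \mathbb{R}$
$\exp : \mathbb{R} \to ]0, +\infty[$
Bijection from $]0,1[\times]0,1[$ to $]0,1[$
Let $(x, y) \in ]0,1[\times]0,1[$ and write both numbers in base 10:
$x = 0.x_1x_2x_3x_4\ldots x_n\ldots$, where $x_1x_2x_3x_4\ldots x_n\ldots$ are the decimal digits of $x$
$y = 0.y_1y_2y_3y_4\ldots y_m\ldots$, likewise
Define $z = 0.x_1y_1x_2y_2x_3y_3\ldots x_ny_n\ldots \in ]0,1[$.
By construction, the digits of $z$ can be split back into those of $x$ and $y$, so this looks like a bijection from $]0,1[\times]0,1[$ to $]0,1[$.
(I'm not sure it is exactly a bijection, because some numbers have two decimal expansions, e.g. $0.1 = 0.0999\ldots$; if every number is written with the expansion that does not end in repeating 9s, it is at least a "very good" injection. Tell me if I'm wrong.)
Let's name this function $\mathrm{CANTOR}(x, y)$.
Then $2\,\mathrm{CANTOR}(x, y) - 1$ goes from $]0,1[\times]0,1[$ to $]-1,1[$.
Combining it with the bijection $\mathbb{R} \to ]0,1[$ from above gives the map from $\mathbb{R}\times\mathbb{R}$ to $]-1,1[$:
$(x, y) \mapsto 2\,\mathrm{CANTOR}\left(\frac{1}{\pi}\arctan(x) + \frac{1}{2},\ \frac{1}{\pi}\arctan(y) + \frac{1}{2}\right) - 1$
The inverse is built the same way. Let $\mathrm{RCANTOR} : ]0,1[ \to ]0,1[\times]0,1[$ be the inverse of CANTOR (it splits the digits of $z$ back into those of $x$ and $y$). For $z \in ]-1,1[$, let $(u, v) = \mathrm{RCANTOR}\left(\frac{z+1}{2}\right)$; then
$z \mapsto \left(\tan\left(\pi\left(u - \frac{1}{2}\right)\right),\ \tan\left(\pi\left(v - \frac{1}{2}\right)\right)\right) : ]-1,1[ \to \mathbb{R}\times\mathbb{R}$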
Just pick any old hash function, stick in the binary description of the coordinates and use the output.
How would you make the following code functional with the same speed? In general, as input I have a list of objects containing position coordinates and other stuff, and I need to create a 2D array consisting of those objects.
let m = Matrix.Generic.create 6 6 []
let pos = [(1.3,4.3); (5.6,5.4); (1.5,4.8)]
pos |> List.iter (fun (pz,py) ->
    let z, y = int pz, int py
    m.[z,y] <- (pz,py) :: m.[z,y])
It could probably be done this way:
let pos = [(1.3,4.3); (5.6,5.4); (1.5,4.8)]
Matrix.Generic.init 6 6 (fun z y ->
    pos |> List.fold (fun state (pz,py) ->
        let iz, iy = int pz, int py
        if iz = z && iy = y then (pz,py) :: state else state) [])
But I guess it would be much slower, because it goes through the whole list for every cell of the matrix, whereas the former version iterates over the list only once...
PS: the code might be wrong as I do not have F# on this computer to check it.
It depends on the definition of "functional". I would say that a "functional" function means that it always returns the same result for the same parameters and that it doesn't modify any global state (or the value of parameters if they are mutable). I think this is a sensible definition for F#, but it also means that there is nothing "dis-functional" with using mutation locally.
From my point of view, the following function is "functional", because it creates and returns a new matrix instead of modifying an existing one, but of course, the implementation of the function uses mutation.
let performStep m =
    // start from a copy of m, so cells that receive no new point keep their lists
    let res = Matrix.Generic.init 6 6 (fun i j -> m.[i,j])
    let pos = [(1.3,4.3); (5.6,5.4); (1.5,4.8)]
    for pz, py in pos do
        let z, y = int pz, int py
        // prepend to res (not m), so several points landing in one cell are all kept
        res.[z,y] <- (pz,py) :: res.[z,y]
    res
Mutation-free version:
Now, if you wanted to make the implementation fully functional, then I would start by creating a matrix that contains Some(pz, py) in the places where you want to add the new list element to the element of the matrix and None in all other places. I guess this could be done by initializing a sparse matrix. Something like this:
let sp = pos |> List.map (fun (pz, py) -> int pz, int py, Some (pz, py))
let elementsToAdd = Matrix.Generic.initSparse 6 6 sp
Then you should be able to combine the original matrix m with the newly created elementsToAdd. This can certainly be done using init (though something like map2 might be nicer):
let res = Matrix.Generic.init 6 6 (fun i j ->
    match elementsToAdd.[i, j], m.[i, j] with
    | Some(n), res -> n::res
    | _, res -> res)
There is still quite likely some mutation hidden in the F# library functions (such as init and initSparse), but at least it shows one way to implement the operation using more primitive operations.
EDIT: This will work only if you need to add at most a single element to each matrix cell. If you wanted to add multiple elements, you'd have to group them first (e.g. using Seq.groupBy).
You can do something like this:
[1.3, 4.3; 5.6, 5.4; 1.5, 4.8]
|> Seq.groupBy (fun (pz, py) -> int pz, int py)
|> Seq.map (fun ((pz, py), ps) -> pz, py, ps)
|> Matrix.Generic.initSparse 6 6
But in your question you said:
How would you make the following code functional with the same speed?
And in a later comment you said:
Well, I try to avoid mutability so that the code would be simple to parallelize in the future
I am afraid this is a triumph of hope over reality. Functional code generally has poor absolute performance and scales badly when parallelized. Given the huge amount of allocation this code is doing, you're not likely to see any performance gain from parallelism at all.
Why do you want to do it functionally? The Matrix type is designed to be mutated, so the way you're doing it now looks good to me.
If you really want to do it functionally, though, here's what I'd do:
let pos = [(1.3,4.3); (5.6,5.4); (1.5,4.8)]
let addValue m k v =
    if Map.containsKey k m then
        Map.add k (v::m.[k]) m
    else
        Map.add k [v] m
let map =
    pos
    |> List.map (fun (x,y) -> (int x, int y),(x,y))
    |> List.fold (fun m (p,q) -> addValue m p q) Map.empty
let m = Matrix.Generic.init 6 6 (fun x y -> defaultArg (Map.tryFind (x,y) map) [])
This runs through the list once, creating an immutable map from indices to lists of points. Then, we initialize each entry in the matrix, doing a single map lookup for each entry. This should take total time O((M + N) log N), where M and N are the number of entries in your matrix and list respectively. I believe that your original solution using mutation takes O(M+N) time and your revised solution takes O(M*N) time.
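For comparison, the same single-pass bucketing can be sketched in Haskell (the language used for the other examples here) with the standard accumArray from Data.Array, which builds an immutable array by folding every (index, value) pair into an initial element. This is an illustration of the idea rather than an F# solution; `bucket` is a hypothetical name, and cells are chosen by truncating the coordinates, as F#'s int conversion does.

```haskell
import Data.Array (Array, accumArray, (!))

-- Group points into the cell obtained by truncating their coordinates,
-- prepending each point to that cell's list (like the mutable F# loop).
-- accumArray fails if a point falls outside the given bounds.
bucket :: Int -> Int -> [(Double, Double)] -> Array (Int, Int) [(Double, Double)]
bucket rows cols ps =
  accumArray (flip (:)) [] ((0, 0), (rows - 1, cols - 1))
    [ ((truncate pz, truncate py), p) | p@(pz, py) <- ps ]
```

With the three points from the question, bucket 6 6 puts [(1.5,4.8),(1.3,4.3)] in cell (1,4), since both points truncate to it, and [(5.6,5.4)] in cell (5,5); every other cell is empty.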