I need to traverse an N-ary tree and add to each node the number of the visit in a preorder traversal. I have the n-ary tree defined like this:
data NT a = N a [NT a] deriving Show
Example:
If I have following tree:
let ntree = N "eric" [N "lea" [N "kristy" [],N "pedro" [] ,N "rafael" []],N "anna" [],N "bety" []]
I want to transform it to
let ntree = N (1,"eric") [N (2,"lea") [N (3,"kristy") [],N (4,"pedro") [] ,N (5,"rafael") []],N (6,"anna") [],N (7,"bety") []]
"Preorderedness" isn't that important.
I want to see how to write a function that passes values between levels: how to pass the number down into the list of successors, and how to pass the updated number back up to the parent and carry on with it into the other branches.
So far I have been able to write functions like this:
traverse :: NT String -> String
traverse (N val [])   = " " ++ val ++ " "
traverse (N val list) = val ++ " " ++ (concat $ map traverse list)
which outputs
"eric lea kristy pedro rafael anna bety "
EDIT: The question is:
How can I write a function
numberNodes :: NT a -> NT (a,Int)
that numbers nodes according to the preorder traversal of the tree?
The hard part for me to understand is passing auxiliary data around; could you please elaborate on that?
In this concrete case it is a single Int that represents the "time", i.e. the order in which I traverse the tree.
First Attempt: Hard Work
For the case of n-ary trees, there are three things going on: numbering elements, numbering trees, and numbering lists of trees. It would help to treat them separately. Types first:
aNumber :: a -- thing to number
-> Int -- number to start from
-> ( (a, Int) -- numbered thing
, Int -- next available number afterwards
)
ntNumber :: NT a -- thing to number
-> Int -- number to start from
-> ( NT (a, Int) -- numbered thing
, Int -- next available number afterwards
)
ntsNumber :: [NT a] -- thing to number
-> Int -- number to start from
-> ( [NT (a, Int)] -- numbered thing
, Int -- next available number afterwards
)
Notice that all three types share the same pattern. When you see that there is a pattern that you are following, apparently by coincidence, you know you have an opportunity to learn something. But let's press on for now and learn later.
Numbering an element is easy: copy the starting number into the output and return its successor as the next available.
aNumber a i = ((a, i), i + 1)
For the other two, the pattern (there's that word again) is:
1. split the input into its top-level components;
2. number each component in turn, threading the numbers through.
It's easy to do the first with pattern matching (inspecting the data visually) and the second with where clauses (grabbing the two parts of the output).
For trees, a top level split gives us two components: an element and a list. In the where clause, we call the appropriate numbering functions as directed by those types. In each case, the "thing" output tells us what to put in place of the "thing" input. Meanwhile, we thread the numbers through, so the starting number for the whole is the starting number for the first component, the "next" number for the first component starts the second, and the "next" number from the second is the "next" number for the whole.
ntNumber (N a ants) i0 = (N ai aints, i2) where
(ai, i1) = aNumber a i0
(aints, i2) = ntsNumber ants i1
For lists, we have two possibilities. An empty list has no components, so we return it directly without using any more numbers. A "cons" has two components, we do exactly as we did before, using the appropriate numbering functions as directed by the type.
ntsNumber [] i = ([], i)
ntsNumber (ant : ants) i0 = (aint : aints, i2) where
(aint, i1) = ntNumber ant i0
(aints, i2) = ntsNumber ants i1
Let's give it a go.
> let ntree = N "eric" [N "lea" [N "kristy" [],N "pedro" [] ,N "rafael" []],N "anna" [],N "bety" []]
> ntNumber ntree 0
(N ("eric",0) [N ("lea",1) [N ("kristy",2) [],N ("pedro",3) [],N ("rafael",4) []],N ("anna",5) [],N ("bety",6) []],7)
So we're there. But are we happy? Well, I'm not. I have the annoying sensation that I wrote pretty much the same type three times and pretty much the same program twice. And if I wanted to do more element-numbering for differently organised data (e.g., your binary trees), I'd have to write the same thing yet again. Repetitive patterns in Haskell code are always missed opportunities: it's important to develop a sense of self-criticism and ask whether there's a neater way.
Second Attempt: Numbering and Threading
Two of the repetitive patterns we saw, above, are
1. the similarity of the types,
2. the similarity of the way the numbers get threaded.
If you match up the types to see what's in common, you'll notice they're all
input -> Int -> (output, Int)
for different inputs and outputs. Let's give the largest common component a name.
type Numbering output = Int -> (output, Int)
Now our three types are
aNumber :: a -> Numbering (a, Int)
ntNumber :: NT a -> Numbering (NT (a, Int))
ntsNumber :: [NT a] -> Numbering [NT (a, Int)]
You often see such types in Haskell:
input -> DoingStuffToGet output
Now, to deal with the threading, we can build some helpful tools to work with and combine Numbering operations. To see which tools we need, look at how we combine the outputs after we've numbered the components. The "thing" parts of the outputs are always built by applying some functions which don't get numbered (data constructors, usually) to some "thing" outputs from numberings.
To deal with the functions, we can build a gadget that looks a lot like our [] case, where no actual numbering was needed.
steady :: thing -> Numbering thing
steady x i = (x, i)
Don't be put off by the way the type makes it look as if steady has only one argument: remember that Numbering thing abbreviates a function type, so there really is another -> in there. We get
steady [] :: Numbering [a]
steady [] i = ([], i)
just like in the first line of ntsNumber.
But what about the other constructors, N and (:)? Ask ghci.
> :t steady N
steady N :: Numbering (a -> [NT a] -> NT a)
> :t steady (:)
steady (:) :: Numbering (a -> [a] -> [a])
We get numbering operations with functions as outputs, and we want to generate the arguments to those functions by more numbering operations, producing one big overall numbering operation with the numbers threaded through. One step of that process is to feed a numbering-generated function one numbering-generated input. I'll define that as an infix operator.
($$) :: Numbering (a -> b) -> Numbering a -> Numbering b
infixl 2 $$
Compare with the type of the explicit application operator, $
> :t ($)
($) :: (a -> b) -> a -> b
This $$ operator is "application for numberings". If we can get it right, our code becomes
ntNumber :: NT a -> Numbering (NT (a, Int))
ntNumber (N a ants) i = (steady N $$ aNumber a $$ ntsNumber ants) i
ntsNumber :: [NT a] -> Numbering [NT (a, Int)]
ntsNumber [] i = steady [] i
ntsNumber (ant : ants) i = (steady (:) $$ ntNumber ant $$ ntsNumber ants) i
with aNumber as it was (for the moment). This code just does the data reconstruction, plugging together the constructors and the numbering processes for the components. We had better give the definition of $$ and make sure it gets the threading right.
($$) :: Numbering (a -> b) -> Numbering a -> Numbering b
(fn $$ an) i0 = (f a, i2) where
(f, i1) = fn i0
(a, i2) = an i1
Here, our old threading pattern gets done once. Each of fn and an is a function expecting a starting number, and the whole of fn $$ an is a function, which gets the starting number i0. We thread the numbers through, collecting first the function, then the argument. We then do the actual application and hand back the final "next" number.
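To see the threading at work, here is a quick sanity check of my own in GHCi, using only aNumber, steady and $$ as defined so far:
> (steady (,) $$ aNumber 'x' $$ aNumber 'y') 0
((('x',0),('y',1)),2)
The first number issued is 0, the second is 1, and the final "next available" number is 2.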
Now, notice that in every line of code, the i input is fed in as the argument to a numbering process. We can simplify this code by just talking about the processes, not the numbers.
ntNumber :: NT a -> Numbering (NT (a, Int))
ntNumber (N a ants) = steady N $$ aNumber a $$ ntsNumber ants
ntsNumber :: [NT a] -> Numbering [NT (a, Int)]
ntsNumber [] = steady []
ntsNumber (ant : ants) = steady (:) $$ ntNumber ant $$ ntsNumber ants
One way to read this code is to filter out all the Numbering, steady and $$ uses.
ntNumber :: NT a -> ......... (NT (a, Int))
ntNumber (N a ants) = ...... N .. (aNumber a) .. (ntsNumber ants)
ntsNumber :: [NT a] -> ......... [NT (a, Int)]
ntsNumber [] = ...... []
ntsNumber (ant : ants) = ...... (:) .. (ntNumber ant) .. (ntsNumber ants)
and you see it just looks like a preorder traversal, reconstructing the original data structure after processing the elements. We're doing the right thing with the values, provided steady and $$ are correctly combining the processes.
We could try to do the same for aNumber
aNumber :: a -> Numbering (a, Int)
aNumber a = steady (,) $$ steady a $$ ????
but the ???? is where we actually need the number. We could build a numbering process that fits in that hole: a numbering process that issues the next number.
next :: Numbering Int
next i = (i, i + 1)
That's the essence of numbering: the "thing" output is the number to be used now (which is the starting number), and the "next" number output is the one after. We may write
aNumber a = steady (,) $$ steady a $$ next
which simplifies to
aNumber a = steady ((,) a) $$ next
In our filtered view, that's
aNumber a = ...... ((,) a) .. next
What we've done is to bottle the idea of a "numbering process" and we've built the right tools to do ordinary functional programming with those processes. The threading pattern turns into the definitions of steady and $$.
Numbering is not the only thing that works this way. Try this...
> :info Applicative
class Functor f => Applicative (f :: * -> *) where
pure :: a -> f a
(<*>) :: f (a -> b) -> f a -> f b
...and you also get some more stuff. I just want to draw attention to the types of pure and <*>. They're a lot like steady and $$, but they are not just for Numbering. Applicative is the type class for every kind of process which works that way. I'm not saying "learn Applicative now!", just suggesting a direction of travel.
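To make the correspondence concrete, here is a sketch of my own (not part of the answer above) that wraps Numbering in a newtype and gives it Functor and Applicative instances; pure plays the role of steady, and <*> plays the role of $$:
newtype Numbered a = Numbered { runNumbered :: Int -> (a, Int) }

instance Functor Numbered where
  fmap f (Numbered g) = Numbered $ \ i -> let (a, i') = g i in (f a, i')

instance Applicative Numbered where
  pure x = Numbered $ \ i -> (x, i)                   -- just like steady
  Numbered fn <*> Numbered an = Numbered $ \ i0 ->    -- just like ($$)
    let (f, i1) = fn i0
        (a, i2) = an i1
    in  (f a, i2)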
Third Attempt: Type-Directed Numbering
So far, our solution is directed towards one particular data structure, NT a, with [NT a] showing up as an auxiliary notion because it's used in NT a. We can make the whole thing a bit more plug-and-play if we focus on one layer of the type at a time. We defined numbering a list of trees in terms of numbering trees. In general, we know how to number a list of stuff if we know how to number each item of stuff.
If we know how to number an a to get b, we should be able to number a list of a to get a list of b. We can abstract over "how to process each item".
listNumber :: (a -> Numbering b) -> [a] -> Numbering [b]
listNumber na [] = steady []
listNumber na (a : as) = steady (:) $$ na a $$ listNumber na as
and now our old list-of-trees-numbering function becomes
ntsNumber :: [NT a] -> Numbering [NT (a, Int)]
ntsNumber = listNumber ntNumber
which is hardly worth naming. We can just write
ntNumber :: NT a -> Numbering (NT (a, Int))
ntNumber (N a ants) = steady N $$ aNumber a $$ listNumber ntNumber ants
We can play the same game for the trees themselves. If you know how to number stuff, you know how to number a tree of stuff.
ntNumber' :: (a -> Numbering b) -> NT a -> Numbering (NT b)
ntNumber' na (N a ants) = steady N $$ na a $$ listNumber (ntNumber' na) ants
Now we can do things like this
myTree :: NT [String]
myTree = N ["a", "b", "c"] [N ["d", "e"] [], N ["f"] []]
> ntNumber' (listNumber aNumber) myTree 0
(N [("a",0),("b",1),("c",2)] [N [("d",3),("e",4)] [],N [("f",5)] []],6)
Here, the node data is now itself a list of things, but we've been able to number those things individually. Our equipment is more adaptable because each component aligns with one layer of the type.
Now, try this:
> :t traverse
traverse :: (Applicative f, Traversable t) => (a -> f b) -> t a -> f (t b)
It's an awful lot like the thing we just did, where f is Numbering and t is sometimes lists and sometimes trees.
The Traversable class captures what it means to be a type-former that lets you thread some sort of process through the stored elements. Again, the pattern you're using is very common and has been anticipated. Learning to use traverse saves a lot of work.
Eventually...
...you'll learn that a thing to do the job of Numbering already exists in the library: it's called State Int and it belongs to the Monad class, which means it must also be in the Applicative class. To get hold of it,
import Control.Monad.State
and the operation which kicks off a stateful process with its initial state, like our feeding-in of 0, is this thing:
> :t evalState
evalState :: State s a -> s -> a
Our next operation becomes
next' :: State Int Int
next' = get <* modify (1+)
where get is the process that accesses the state, modify makes a process that changes the state, and <* means "but also do".
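For instance, a quick check of next' (assuming Control.Monad.State is imported, plus Control.Monad for replicateM):
> runState next' 0
(0,1)
> evalState (replicateM 3 next') 0
[0,1,2]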
If you start your file with the language extension pragma
{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}
you can declare your datatype like this
data NT a = N a [NT a] deriving (Show, Functor, Foldable, Traversable)
and Haskell will write traverse for you.
Your program then becomes one line...
evalState (traverse (\ a -> pure ((,) a) <*> get <* modify (1+)) ntree) 0
-- ^ how to process one element ^^^^^^^^^^^^^^^
-- ^ how to process an entire tree of elements ^^^^^^^^^
-- ^ processing your particular tree ^^^^^^^^^^^^^^^^^^^^^^^^^^^
-- ^ kicking off the process with a starting number of 0 ^^^^^^^^^^^^^^^^
...but the journey to that one line involves a lot of "bottling the pattern" steps, which takes some (hopefully rewarding) learning.
I'll update this answer as soon as I get some progress.
Right now I have reduced the problem from the n-ary tree to a binary tree.
data T a = Leaf a | N (T a) a (T a) deriving Show
numberNodes :: T a -> T (a, Int)
numberNodes tree = snd $ numberNodes2 tree 0

numberNodes2 :: T a -> Int -> (Int, T (a, Int))
numberNodes2 (Leaf a) time = (time, Leaf (a, time))
numberNodes2 (N left nodeVal right) time = (rightTime, N leftTree (nodeVal, time) rightTree)
  where (leftTime,  leftTree)  = numberNodes2 left  (time + 1)
        (rightTime, rightTree) = numberNodes2 right (leftTime + 1)
The function numberNodes turns this tree:
let bt = N (N (Leaf "anna" ) "leo" (Leaf "laura")) "eric" (N (Leaf "john") "joe" (Leaf "eddie"))
into the following tree:
N (N (Leaf ("anna",2)) ("leo",1) (Leaf ("laura",3))) ("eric",0) (N (Leaf ("john",5)) ("joe",4) (Leaf ("eddie",6)))
And now I just need to rewrite it for the n-ary tree... (which I don't know how to do; any hints?)
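For what it's worth, a minimal sketch of the same hand-threaded approach for the original NT type (essentially the ntNumber/ntsNumber pair from the accepted answer folded into one definition; the names numberNT, go and goList are mine):
numberNT :: NT a -> NT (a, Int)
numberNT t = fst (go t 0)
  where
    go (N a kids) time = (N (a, time) kids', time')
      where (kids', time') = goList kids (time + 1)
    goList []       time = ([], time)
    goList (k : ks) time = (k' : ks', time'')
      where (k', time')   = go k time
            (ks', time'') = goList ks time'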
This answer by #pigworker is excellent, and I learned lots from it.
However, I believe we can use mapAccumL from Data.Traversable to achieve a very similar behaviour:
{-# LANGUAGE DeriveTraversable #-}
import Data.Traversable
import Data.Tuple
-- original data type from the question
data NT a = N a [NT a]
deriving (Show, Functor, Foldable, Traversable)
-- additional type from #pigworker's answer
type Numbering output = Int -> (output, Int)
-- compare this to signature of ntNumber
-- swap added to match the signature
ntNumberSimple :: (NT a) -> Numbering (NT (a, Int))
ntNumberSimple t n = swap $ mapAccumL func n t
where
func i x = (i+1, (x, i))
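For example, running it on the tree from the question should reproduce the desired numbering, since the derived Traversable instance visits the node label before its children (i.e. in preorder):
> fst $ ntNumberSimple (N "eric" [N "lea" [N "kristy" [],N "pedro" [],N "rafael" []],N "anna" [],N "bety" []]) 1
N ("eric",1) [N ("lea",2) [N ("kristy",3) [],N ("pedro",4) [],N ("rafael",5) []],N ("anna",6) [],N ("bety",7) []]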
I believe that mapAccumL is using the very same State monad under the hood, but at the very least it's completely hidden from the caller.
I have a very large decision tree. It is used as follows:
-- once per application start
t :: Tree
t = buildDecisionTree
-- done several times
makeDecision :: Something -> Decision
makeDecision something = search t something
This decision tree is way too large to fit in memory. But, thanks to lazy evaluation, it is only partially evaluated.
The problem is that there are scenarios where all possible decisions are tried, causing the whole tree to be evaluated. This is not going to terminate, but it should not cause a memory overflow either. Furthermore, if this process is aborted, the memory usage does not decrease, as a huge subtree has already been evaluated.
A solution would be to reevaluate the tree every time makeDecision is called, but this would lose the benefit of caching decisions and significantly slow down makeDecision.
I would like to take a middle course. In particular, it is very common in my application to make successive decisions that share a common path prefix in the tree. So I would like to cache the last used path but drop the others, causing them to be reevaluated the next time they are used. How can I do this in Haskell?
This is not possible in pure Haskell; see the question Can a thunk be duplicated to improve memory performance? (as pointed out by #shang). You can, however, do it with IO.
We start with the module header and export only the type and the functions that make this module (which will use unsafePerformIO) safe to use. It is also possible to do this without unsafePerformIO, but that would mean the user has to keep more of their code in IO.
{-# LANGUAGE ExistentialQuantification #-}
module ReEval (ReEval, newReEval, readReEval, resetReEval) where
import Data.IORef
import System.IO.Unsafe
We start by defining a data type that stores a value in a way that prevents all sharing, by keeping the function and the argument away from each other and only applying the function when we want the value. Note that the value returned by unsharedValue can be shared, but not with the return values of other invocations (assuming the function is doing something non-trivial):
data Unshared a = forall b. Unshared (b -> a) b
unsharedValue :: Unshared a -> a
unsharedValue (Unshared f x) = f x
Now we define our data type of resettable computations. We need to store the computation and the current value. The latter is stored in an IORef, as we want to be able to reset it.
data ReEval a = ReEval {
calculation :: Unshared a,
currentValue :: IORef a
}
To wrap a value in a ReEval box, we need to have a function and an argument. Why not just a -> ReEval a? Because then there would be no way to prevent the parameter from being shared.
newReEval :: (b -> a) -> b -> ReEval a
newReEval f x = unsafePerformIO $ do
let c = Unshared f x
ref <- newIORef (unsharedValue c)
return $ ReEval c ref
Reading is simple: just get the value from the IORef. This use of unsafePerformIO is safe because we will always get the value of unsharedValue c, although a different “copy” of it.
readReEval :: ReEval a -> a
readReEval r = unsafePerformIO $ readIORef (currentValue r)
And finally the resetting. I left it in the IO monad, not because it would be any less safe than the other functions to wrap in unsafePerformIO, but because this is the easiest way to give the user control over when the resetting actually happens. You don't want to risk all your calls to resetReEval being lazily delayed until your memory has run out, or even optimized away because there is no return value to use.
resetReEval :: ReEval a -> IO ()
resetReEval r = writeIORef (currentValue r) (unsharedValue (calculation r))
This is the end of the module. Here is example code:
import Debug.Trace
import ReEval
main = do
let func a = trace ("func " ++ show a) negate a
let l = [ newReEval func n | n <- [1..5] ]
print (map readReEval l)
print (map readReEval l)
mapM_ resetReEval l
print (map readReEval l)
And here you can see that it does what is expected:
$ runhaskell test.hs
func 1
func 2
func 3
func 4
func 5
[-1,-2,-3,-4,-5]
[-1,-2,-3,-4,-5]
func 1
func 2
func 3
func 4
func 5
[-1,-2,-3,-4,-5]
I want to write a parallel map function in Haskell that's as efficient as possible. My initial attempt, which currently seems to be the best, is to simply write:
pmap :: (a -> b) -> [a] -> [b]
pmap f = runEval . parList rseq . map f
I'm not seeing an even division of work across CPUs, however. If this is related to the number of sparks, could I write a pmap that divides the list into as many segments as there are CPUs, so that a minimal number of sparks is created? I tried the following, but the performance (and the number of sparks) is much worse:
pmap :: (a -> b) -> [a] -> [b]
pmap f xs = concat $ runEval $ parList rseq $ map (map f) (chunk xs) where
  -- the (len / 4) argument represents the size of the sublists
  chunk xs = chunk' (length xs `div` 4) xs
  chunk' n xs | length xs <= n = [xs]
              | otherwise      = take n xs : chunk' n (drop n xs)
The worse performance may be correlated with the higher memory use. The original pmap does scale somewhat on 24-core systems, so it's not that I don't have enough data.
(The number of CPUs on my desktop is 4, so I just hardcoded that.)
Edit 1
Some performance data using +RTS -H512m -N -sstderr -RTS is here:
my 4-core desktop
24-core server
The parallel package defines a number of parallel map strategies for you:
parMap :: Strategy b -> (a -> b) -> [a] -> [b]
parMap is a combination of parList and map. There is also specific support for chunking the list:
parListChunk :: Int -> Strategy a -> Strategy [a]
Divides a list into chunks, and applies the strategy evalList strat to each chunk in parallel.
You should be able to use a combination of these to get any sparking behavior you desire. Or, for even more control, there is the Par monad package, which lets you control the number of threads created (purely).
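Using those strategies, a sketch (the names are mine; the chunk size of 64 is arbitrary and rdeepseq is just one reasonable per-chunk strategy, so tune both for your workload):
import Control.Parallel.Strategies
import Control.DeepSeq (NFData)

-- one spark per chunk of 64 elements instead of one spark per element
pmapChunked :: NFData b => (a -> b) -> [a] -> [b]
pmapChunked f = withStrategy (parListChunk 64 rdeepseq) . map f

-- or let parMap pair the strategy with map for you (one spark per element)
pmapSimple :: (a -> b) -> [a] -> [b]
pmapSimple f = parMap rseq f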
References: The haddock docs for the parallel package
I am looking for a mutable (balanced) tree/map/hash table in Haskell, or a way to simulate one inside a function. That is, when I call the same function several times, the structure is preserved. So far I have tried Data.HashTable (which is OK, but somewhat slow) and Data.Array.Judy, but I was unable to make the latter work with GHC 6.10.4. Are there any other options?
If you want mutable state, you can have it. Just keep passing the updated map around, or keep it in a state monad (which turns out to be the same thing).
import qualified Data.Map as Map
import Control.Monad.ST
import Data.STRef
memoize :: Ord k => (k -> ST s a) -> ST s (k -> ST s a)
memoize f = do
mc <- newSTRef Map.empty
return $ \k -> do
c <- readSTRef mc
case Map.lookup k c of
Just a -> return a
Nothing -> do a <- f k
writeSTRef mc (Map.insert k a c) >> return a
You can use this like so. (In practice, you might want to add a way to clear items from the cache, too.)
import Control.Monad
main :: IO ()
main = do
fib <- stToIO $ fixST $ \fib -> memoize $ \n ->
if n < 2 then return n else liftM2 (+) (fib (n-1)) (fib (n-2))
mapM_ (print <=< stToIO . fib) [1..10000]
At your own risk, you can unsafely escape from the requirement of threading state through everything that needs it.
import System.IO.Unsafe
unsafeMemoize :: Ord k => (k -> a) -> k -> a
unsafeMemoize f = unsafePerformIO $ do
f' <- stToIO $ memoize $ return . f
return $ unsafePerformIO . stToIO . f'
fib :: Integer -> Integer
fib = unsafeMemoize $ \n -> if n < 2 then n else fib (n-1) + fib (n-2)
main :: IO ()
main = mapM_ (print . fib) [1..1000]
Building on #Ramsey's answer, I also suggest you reconceive your function to take a map and return a modified one. Then code using good ol' Data.Map, which is pretty efficient at modifications. Here is a pattern:
import qualified Data.Map as Map
-- | takes input and a map, and returns a result and a modified map
myFunc :: a -> Map.Map k v -> (r, Map.Map k v)
myFunc a m = … -- put your function here
-- | run myFunc over a list of inputs, gathering the outputs
mapFuncWithMap :: [a] -> Map.Map k v -> ([r], Map.Map k v)
mapFuncWithMap as m0 = foldr step ([], m0) as
where step a (rs, m) = let (r, m') = myFunc a m in (r:rs, m')
-- this starts with an initial map, uses successive versions of the map
-- on each iteration, and returns a tuple of the results, and the final map
-- | run myFunc over a list of inputs, gathering the outputs
mapFunc :: [a] -> [r]
mapFunc as = fst $ mapFuncWithMap as Map.empty
-- same as above, but starts with an empty map, and ignores the final map
It is easy to abstract this pattern and make mapFuncWithMap generic over functions that use maps in this way.
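For instance, a sketch of that abstraction (the name mapWithMap is mine; it is the same foldr as above, with the map-using step passed in as an argument, reusing the Data.Map import from the code above):
mapWithMap :: (a -> Map.Map k v -> (r, Map.Map k v))
           -> [a] -> Map.Map k v -> ([r], Map.Map k v)
mapWithMap f as m0 = foldr step ([], m0) as
  where step a (rs, m) = let (r, m') = f a m in (r : rs, m')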
Although you ask for a mutable type, let me suggest that you use an immutable data structure and that you pass successive versions to your functions as an argument.
Regarding which data structure to use,
There is an implementation of red-black trees at Kent
If you have integer keys, Data.IntMap is extremely efficient.
If you have string keys, the bytestring-trie package from Hackage looks very good.
The problem is that I cannot (or don't know how to) use a non-mutable type.
If you're lucky, you can pass your table data structure as an extra parameter to every function that needs it. If, however, your table needs to be widely distributed, you may wish to use a state monad where the state is the contents of your table.
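If it helps, here is a small sketch of the state-monad route; the String keys and Int values are purely for illustration:
import qualified Data.Map as Map
import Control.Monad.State

type TableM a = State (Map.Map String Int) a

insertEntry :: String -> Int -> TableM ()
insertEntry k v = modify (Map.insert k v)

lookupEntry :: String -> TableM (Maybe Int)
lookupEntry k = gets (Map.lookup k)

-- run a whole computation against an initial (empty) table
runTable :: TableM a -> (a, Map.Map String Int)
runTable act = runState act Map.empty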
If you are trying to memoize, you can try some of the lazy memoization tricks from Conal Elliott's blog, but as soon as you go beyond integer arguments, lazy memoization becomes very murky—not something I would recommend you try as a beginner. Maybe you can post a question about the broader problem you are trying to solve? Often with Haskell and mutability the issue is how to contain the mutation or updates within some kind of scope.
It's not so easy learning to program without any global mutable variables.
If I read your comments right, then you have a structure with possibly ~500k total values to compute. The computations are expensive, so you want them done only once, and on subsequent accesses, you just want the value without recomputation.
In this case, use Haskell's laziness to your advantage! ~500k is not so big: Just build a map of all the answers, and then fetch as needed. The first fetch will force computation, subsequent fetches of the same answer will reuse the same result, and if you never fetch a particular computation - it never happens!
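A minimal sketch of that idea (the expensive function and the key range are stand-ins for your real computation):
import qualified Data.Map as Map

expensive :: Int -> Integer
expensive n = product [1 .. fromIntegral n]   -- stand-in for the real, costly computation

-- building the map forces the keys and the spine, but each value stays an
-- unevaluated thunk until it is first demanded
answers :: Map.Map Int Integer
answers = Map.fromList [ (k, expensive k) | k <- [0 .. 500000] ]

-- forcing a looked-up value computes it once; later lookups reuse the result
lookupAnswer :: Int -> Maybe Integer
lookupAnswer k = Map.lookup k answers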
You can find a small implementation of this idea using 3D point distances as the computation in the file PointCloud.hs. That file uses Debug.Trace to log when the computation actually gets done:
> ghc --make PointCloud.hs
[1 of 1] Compiling Main ( PointCloud.hs, PointCloud.o )
Linking PointCloud ...
> ./PointCloud
(1,2)
(<calc (1,2)>)
Just 1.0
(1,2)
Just 1.0
(1,5)
(<calc (1,5)>)
Just 1.0
(1,2)
Just 1.0
Are there any other options?
A mutable reference to a purely functional dictionary like Data.Map.
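For instance, a sketch of that combination (the String/Int key and value types are arbitrary):
import Data.IORef
import qualified Data.Map as Map

type Table = IORef (Map.Map String Int)

newTable :: IO Table
newTable = newIORef Map.empty

insertT :: Table -> String -> Int -> IO ()
insertT ref k v = do
  m <- readIORef ref
  writeIORef ref (Map.insert k v m)

lookupT :: Table -> String -> IO (Maybe Int)
lookupT ref k = fmap (Map.lookup k) (readIORef ref)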