Why are (constant) expressions not evaluated at compile time in Haskell? - performance

I am currently learning Haskell, and there is one thing that baffles me:
When I build a complex expression (whose computation will take some time) and this expression is constant (meaning it is built only of known, hard-coded values), the expression is not evaluated at compile time.
Coming from a C/C++ background, I am used to this kind of optimization.
What is the reason NOT to perform such an optimization (by default) in Haskell / GHC? What are the advantages, if any?
data Tree a =
    EmptyTree
  | Node a (Tree a) (Tree a)
  deriving (Show, Read, Eq)

elementToTree :: a -> Tree a
elementToTree x = Node x EmptyTree EmptyTree

treeInsert :: (Ord a) => a -> Tree a -> Tree a
treeInsert x EmptyTree = elementToTree x
treeInsert x (Node a left right)
  | x == a = Node x left right
  | x < a  = Node a (treeInsert x left) right
  | x > a  = Node a left (treeInsert x right)

treeFromList :: (Ord a) => [a] -> Tree a
treeFromList [] = EmptyTree
treeFromList (x:xs) = treeInsert x (treeFromList xs)

treeElem :: (Ord a) => a -> Tree a -> Bool
treeElem x EmptyTree = False
treeElem x (Node a left right)
  | x == a = True
  | x < a  = treeElem x left
  | x > a  = treeElem x right

main = do
  let tree = treeFromList [0..90000]
  putStrLn $ show (treeElem 3 tree)
As this will always print True, I would expect the compiled program to print the result and exit
almost immediately.

You may like this reddit thread. The compiler could try to do this, but it could be dangerous, as constants of any type can do funny things like loop. There are at least two solutions: one is supercompilation, not available as part of any compiler yet but you can try prototypes from various researchers; the more practical one is to use Template Haskell, which is GHC's mechanism for letting the programmer ask for some code to be run at compile time.

The process you are talking about is called supercompilation and it's more difficult than you make it out to be. It is actually one of the active research topics in computing science! There are some people that are trying to create such a supercompiler for Haskell (probably based on GHC, my memory is vague) but the feature is not included in GHC (yet) because the maintainers want to keep compilation times down. You mention C++ as a language that does this – C++ also happens to have notoriously bad compilation times!
Your alternative for Haskell is to do this optimisation manually with Template Haskell, which is Haskell's compile-time evaluated macro system.
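For instance, here is a minimal sketch (my own illustration, not taken from either answer) of forcing the question's computation to happen at compile time with a Template Haskell splice. It uses a plain elem over the list rather than the Tree code, since a splice cannot refer to definitions in the same module; lift comes from Language.Haskell.TH.Syntax and embeds the already-computed value as a literal in the generated code:

{-# LANGUAGE TemplateHaskell #-}
module Main where

import Language.Haskell.TH.Syntax (lift)

-- The membership test runs while the module is being compiled;
-- only the resulting Bool literal ends up in the binary.
answer :: Bool
answer = $(lift (3 `elem` [0..90000 :: Int]))

main :: IO ()
main = print answer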

In this case, GHC cannot be sure that the computation would finish. It's not a question of lazy versus strict, but rather the halting problem. To you, it looks quite simple to say that treeFromList [0..90000] is a constant that can be evaluated at compile time, but how does the compiler know this? The compiler can easily optimize [0..90000] to a constant, but you wouldn't even notice this change.

Related

Should I avoid constructing in Haskell?

While reading a snippet from Learn You a Haskell for Great Good I found the following situation:
treeInsert :: (Ord a) => a -> Tree a -> Tree a
treeInsert x EmptyTree = singleton x
treeInsert x (Node a left right)
  | x == a = Node x left right
  | x < a  = Node a (treeInsert x left) right
  | x > a  = Node a left (treeInsert x right)
Wouldn't it be better for performance if we just reused the given Tree when x == a?
treeInsert :: (Ord a) => a -> Tree a -> Tree a
treeInsert x EmptyTree = singleton x
treeInsert x all@(Node a left right)
  | x == a = all
  | x < a  = Node a (treeInsert x left) right
  | otherwise = Node a left (treeInsert x right)
In real life coding, what should I do? Are there any drawbacks when returning the same thing?
Let's look at the core! (Without optimisations here)
$ ghc-7.8.2 -ddump-simpl wtmpf-file13495.hs
The relevant difference is that the first version (without all@(...)) has
case GHC.Classes.> @ a_aUH $dOrd_aUV eta_B2 a1_aBQ
of _ [Occ=Dead] {
  GHC.Types.False ->
    Control.Exception.Base.patError
      @ (TreeInsert.Tree a_aUH)
      "wtmpf-file13495.hs:(9,1)-(13,45)|function treeInsert"#;
  GHC.Types.True ->
    TreeInsert.Node
      @ a_aUH
      a1_aBQ
      left_aBR
      (TreeInsert.treeInsert @ a_aUH $dOrd_aUV eta_B2 right_aBS)
where reusing the node with that as-pattern does just
TreeInsert.Node
  @ a_aUI
  a1_aBR
  left_aBS
  (TreeInsert.treeInsert @ a_aUI $dOrd_aUW eta_B2 right_aBT);
This is an avoided check that may well make a significant performance difference.
However, this difference actually has nothing to do with the as-pattern. It's just because your first snippet uses an x > a guard, which is not trivial. The second uses otherwise, which is optimised away.
If you change the first snippet to
treeInsert :: (Ord a) => a -> Tree a -> Tree a
treeInsert x EmptyTree = singleton x
treeInsert x (Node a left right)
  | x == a = Node x left right
  | x < a  = Node a (treeInsert x left) right
  | otherwise = Node a left (treeInsert x right)
then the difference boils down to
GHC.Types.True -> TreeInsert.Node # a_aUH a1_aBQ left_aBR right_aBS
vs
GHC.Types.True -> wild_Xa
Which is indeed just the difference of Node x left right vs all.
...without optimisations, that is. The versions diverge further when I turn on -O2, but I can't really make out how the performance would differ there.
In real life coding, what should I do? Are there any drawbacks when returning the same thing?
a == b does not guarantee that f a == f b for all functions f. So, you may have to return a new object even if they compare equal.
In other words, it may not be feasible to change Node x left right to Node a left right or all when a == x, regardless of performance gains.
For example, you may have types which carry metadata. When you compare them for equality, you may only care about the values and ignore the metadata. But if you replace one with the other just because they compare equal, then you will lose the metadata.
newtype ValMeta a b = ValMeta (a, b) -- value, along with meta data
  deriving (Show)

instance Eq a => Eq (ValMeta a b) where
  -- equality only compares values, ignores meta data
  ValMeta (a, b) == ValMeta (a', b') = a == a'
The point is that the Eq type class only says that you may compare values for equality. It does not guarantee anything beyond that.
A real-world example where a == b does not guarantee that f a == f b is when you maintain a Set of unique values within a self-balancing tree. A self-balancing tree (such as Red-Black tree) has some guarantees about the structure of tree but the actual depth and structure depends on the order that the data were added to or removed from the set.
Now when you compare 2 sets for equality, you want to compare that values within the set are equal, not that the underlying trees have the same exact structure. But if you have a function such as depth which exposes the depth of underlying tree maintaining the set then you cannot guarantee that the depths are equal even if the sets compare equal.
Here is a video of the great Philip Wadler realizing live and on-stage that many useful relations do not preserve equality (starting at 42 min).
Edit: An example in GHCi where a == b does not imply f a == f b:
> import Data.Set
> let a = fromList [1, 2, 3, 4, 5, 10, 9, 8, 7, 6]
> let b = fromList [1..10]
> let f = showTree
> a == b
True
> f a == f b
False
Another real-world example is a hash table. Two hash tables are equal if and only if they hold the same key-value pairs. However, the capacity of a hash table, i.e. the number of keys you may add before having to re-allocate and rehash, depends on the order of inserts/deletes.
So if you have a function which returns the capacity of a hash table, it may return different values for hash tables a and b even though a == b.
My two cents... perhaps not even about the original question:
Instead of writing guards with x < a and x == a, I would match compare x a against LT, EQ and GT, e.g.:
treeInsert x all@(Node a left right) =
  case compare x a of
    EQ -> ...
    LT -> ...
    GT -> ...
I would do this especially if x and a can be complex data structures, since a test like x < a could be expensive.
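A filled-in version of that suggestion might look as follows (my own completion, reusing the branches from the earlier snippets):

treeInsert :: (Ord a) => a -> Tree a -> Tree a
treeInsert x EmptyTree = singleton x
treeInsert x all@(Node a left right) =
  case compare x a of
    EQ -> all                              -- reuse the existing node
    LT -> Node a (treeInsert x left) right
    GT -> Node a left (treeInsert x right)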
This answer seems to be wrong. I just leave it here for reference...
With your second function you avoid creating a new node, because the compiler cannot really understand equality (== is just some function.) If you change the first version to
-- version C
treeInsert :: (Ord a) => a -> Tree a -> Tree a
treeInsert x EmptyTree = singleton x
treeInsert x (Node a left right)
  | x == a = Node a left right -- Difference here! Changed x to a.
  | x < a  = Node a (treeInsert x left) right
  | x > a  = Node a left (treeInsert x right)
the compiler will probably be able to do common subexpression elimination, because the optimizer will be able to see that Node a left right is the same as Node a left right.
On the other hand, I doubt that the compiler can deduce from a == x that Node a left right is the same as Node x left right.
So, I'm pretty sure that under -O2, version B and version C are the same, but version A is probably slower because it allocates a new node in the a == x case.
Well, if the first case had used a instead of x as follows, then there's at least the chance that GHC would eliminate the allocation of a new node through common subexpression elimination.
treeInsert x (Node a left right)
  | x == a = Node a left right
However, this is all but irrelevant in any non-trivial use case, because the path down the tree to the node is going to be duplicated even when the element already exists. And this path is going to be significantly longer than a single node unless your use case is trivial.
In the world of ML, the fairly idiomatic way to avoid this is to throw a KeyAlreadyExists exception, and then catch that exception at the top-level insertion function and return the original tree. This would cause the stack to be unwound instead of allocating any of the Nodes on the heap.
A direct implementation of the ML idiom is basically a no-no in Haskell, for good reasons. If avoiding this duplication matters, the simplest and possibly best thing to do is to check if the tree contains the key before you insert it.
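A minimal sketch of that check-before-insert approach, reusing the treeElem and treeInsert definitions from earlier on this page (an illustration, not code from the answer):

insertIfAbsent :: Ord a => a -> Tree a -> Tree a
insertIfAbsent x t
  | treeElem x t = t               -- already present: hand back the original tree
  | otherwise    = treeInsert x t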
The downside of this approach, compared to a direct Haskell insert or the ML idiom, is that it involves two traversals of the path instead of one. Now, here is a non-duplicating, single-pass insert you can implement in Haskell:
treeInsert :: Ord a => a -> Tree a -> Tree a
treeInsert x original_tree = result_tree
  where
    (result_tree, new_tree) = loop x original_tree

    loop x EmptyTree = (new_tree, singleton x)
    loop x (Node a left right) =
      case compare x a of
        LT -> let (res, new_left) = loop x left
               in (res, Node a new_left right)
        EQ -> (original_tree, error "unreachable")
        GT -> let (res, new_right) = loop x right
               in (res, Node a left new_right)
However, older versions of GHC (roughly 7-10 years ago) don't handle this sort of recursion through lazy pairs of results very efficiently, and in my experience check-before-insert is likely to perform better. I'd be slightly surprised if this observation has really changed in the context of more recent GHC versions.
One can certainly imagine a function that directly constructs (but does not return) a new path for the tree, and decides to return the new path or the original path once it's known whether the element exists already. (The new path would immediately become garbage if it is not returned.) This conforms to the basic principles of the GHC runtime, but isn't really expressible in the source language.
Of course, any completely non-duplicating insertion function on a lazy data structure is going to have different strictness properties than a simple, duplicating insert. So no matter the implementation technique, they are different functions if laziness matters.
But of course, whether or not the path is duplicated may not matter that much. The cases where it would matter the most would be when you are using the tree persistently, because in linear use cases the old path would become garbage immediately after each insertion. And of course, this only matters when you are inserting a significant number of duplicates.

Efficiency of unfoldr versus zipWith

Over on Code Review, I answered a question about a naive Haskell fizzbuzz solution by suggesting an implementation that iterates forward, avoiding the quadratic cost of the increasing number of primes and discarding modulo division (almost) entirely. Here's the code:
fizz :: Int -> String
fizz = const "fizz"
buzz :: Int -> String
buzz = const "buzz"
fizzbuzz :: Int -> String
fizzbuzz = const "fizzbuzz"
fizzbuzzFuncs = cycle [show, show, fizz, show, buzz, fizz, show, show, fizz, buzz, show, fizz, show, show, fizzbuzz]
toFizzBuzz :: Int -> Int -> [String]
toFizzBuzz start count =
  let offsetFuncs = drop (mod (start - 1) 15) fizzbuzzFuncs
  in take count $ zipWith ($) offsetFuncs [start..]
As a further prompt, I suggested rewriting it using Data.List.unfoldr. The unfoldr version is an obvious, simple modification to this code so I'm not going to type it here unless people seeking to answer my question insist it is important (no spoilers for the OP over on Code Review). But I do have a question about the relative efficiency of the unfoldr solution compared to the zipWith one. While I am no longer a Haskell neophyte, I am no expert on Haskell internals.
An unfoldr solution does not require the [start..] infinite list, since it can simply unfold from start. My thoughts are
The zipWith solution does not memoize each successive element of [start..] as it is asked for. Each element is used and discarded because no reference to the head of [start..] is kept. So there is no more memory consumed there than with unfoldr.
Discussions about the performance of unfoldr, and the recent patches to make it always inlined, are conducted at a level which I have not yet reached.
So I think the two are equivalent in memory consumption but have no idea about the relative performance. Hoping more informed Haskellers can direct me towards an understanding of this.
unfoldr seems a natural thing to use to generate sequences, even if other solutions are more expressive. I just know I need to understand more about its actual performance. (For some reason I find foldr much easier to comprehend on that level.)
Note: unfoldr's use of Maybe was the first potential performance issue that occurred to me, before I even started investigating the issue (and the only bit of the optimisation/inlining discussions that I fully understood). So I was able to stop worrying about Maybe right away (given a recent version of Haskell).
As the one responsible for the recent changes in the implementations of zipWith and unfoldr, I figured I should probably take a stab at this. I can't really compare them so easily, because they're very different functions, but I can try to explain some of their properties and the significance of the changes.
unfoldr
Inlining
The old version of unfoldr (before base-4.8/GHC 7.10) was recursive at the top level (it called itself directly). GHC never inlines recursive functions, so unfoldr was never inlined. As a result, GHC could not see how it interacted with the function it was passed. The most troubling effect of this was that the function passed in, of type (b -> Maybe (a, b)), would actually produce Maybe (a, b) values, allocating memory to hold the Just and (,) constructors. By restructuring unfoldr as a "worker" and a "wrapper", the new code allows GHC to inline it and (in many cases) fuse it with the function passed in, so the extra constructors are stripped away by compiler optimizations.
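Roughly, the restructuring looks like this (a sketch under my own naming, not the exact library source): the outer wrapper is small and non-recursive, so GHC can inline it at call sites; the recursion lives in a local worker that can then be specialised against the function that was passed in.

unfoldrWW :: (b -> Maybe (a, b)) -> b -> [a]
{-# INLINE unfoldrWW #-}
unfoldrWW f b0 = go b0
  where
    -- the recursive worker; after inlining, `f` is known here and the
    -- Just/(,) constructors it builds can often be optimised away
    go b = case f b of
             Just (a, b') -> a : go b'
             Nothing      -> []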
For example, under GHC 7.10, the code
module Blob where
import Data.List

bloob :: Int -> [Int]
bloob k = unfoldr go 0 where
  go n | n == k = Nothing
       | otherwise = Just (n * 2, n + 1)
compiled with ghc -O2 -ddump-simpl -dsuppress-all -dno-suppress-type-signatures leads to the core
$wbloob :: Int# -> [Int]
$wbloob =
  \ (ww_sYv :: Int#) ->
    letrec {
      $wgo_sYr :: Int# -> [Int]
      $wgo_sYr =
        \ (ww1_sYp :: Int#) ->
          case tagToEnum# (==# ww1_sYp ww_sYv) of _ {
            False -> : (I# (*# ww1_sYp 2)) ($wgo_sYr (+# ww1_sYp 1));
            True -> []
          }; } in
    $wgo_sYr 0

bloob :: Int -> [Int]
bloob =
  \ (w_sYs :: Int) ->
    case w_sYs of _ { I# ww1_sYv -> $wbloob ww1_sYv }
Fusion
The other change to unfoldr was rewriting it to participate in "fold/build" fusion, an optimization framework used in GHC's list libraries. The idea of both "fold/build" fusion and the newer, differently balanced, "stream fusion" (used in the vector library) is that if a list is produced by a "good producer", transformed by "good transformers", and consumed by a "good consumer", then the list conses never actually need to be allocated at all. The old unfoldr was not a good producer, so if you produced a list with unfoldr and consumed it with, say, foldr, the pieces of the list would be allocated (and immediately become garbage) as computation proceeded. Now, unfoldr is a good producer, so you can write a loop using, say, unfoldr, filter, and foldr, and not (necessarily) allocate any memory at all.
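For reference, the core of the fold/build framework is tiny; a good producer is expressed with build (the definition below is the standard one from GHC.Exts, sketched from memory), and a single rewrite rule lets GHC cancel it against foldr:

{-# LANGUAGE RankNTypes #-}
-- build abstracts a list over its (:) and [], so a consumer's own
-- combining function and starting value can be substituted in directly
build :: (forall b. (a -> b -> b) -> b -> b) -> [a]
build g = g (:) []

-- GHC's "fold/build" rewrite rule, shown as an equation rather than code:
--   foldr k z (build g)  ==>  g k z
-- so when a good consumer meets a good producer, no list cells are allocated.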
For example, given the above definition of bloob, and a stern {-# INLINE bloob #-} (this stuff is a bit fragile; good producers sometimes need to be inlined explicitly to be good), the code
hooby :: Int -> Int
hooby = sum . bloob
compiles to the GHC core
$whooby :: Int# -> Int#
$whooby =
  \ (ww_s1oP :: Int#) ->
    letrec {
      $wgo_s1oL :: Int# -> Int# -> Int#
      $wgo_s1oL =
        \ (ww1_s1oC :: Int#) (ww2_s1oG :: Int#) ->
          case tagToEnum# (==# ww1_s1oC ww_s1oP) of _ {
            False -> $wgo_s1oL (+# ww1_s1oC 1) (+# ww2_s1oG (*# ww1_s1oC 2));
            True -> ww2_s1oG
          }; } in
    $wgo_s1oL 0 0

hooby :: Int -> Int
hooby =
  \ (w_s1oM :: Int) ->
    case w_s1oM of _ { I# ww1_s1oP ->
    case $whooby ww1_s1oP of ww2_s1oT { __DEFAULT -> I# ww2_s1oT }
    }
which has no lists, no Maybes, and no pairs; the only allocation it performs is the Int used to store the final result (the application of I# to ww2_s1oT). The entire computation can reasonably be expected to be performed in machine registers.
zipWith
zipWith has a bit of a weird story. It fits into the fold/build framework a bit awkwardly (I believe it works quite a bit better with stream fusion). It is possible to make zipWith fuse with either its first or its second list argument, and for many years, the list library tried to make it fuse with either, if either was a good producer. Unfortunately, making it fuse with its second list argument can make a program less defined under certain circumstances. That is, a program using zipWith could work just fine when compiled without optimization, but produce an error when compiled with optimization. This is not a great situation. Therefore, as of base-4.8, zipWith no longer attempts to fuse with its second list argument. If you want it to fuse with a good producer, that good producer had better be in the first list argument.
Specifically, the reference implementation of zipWith leads to the expectation that, say, zipWith (+) [1,2,3] (1 : 2 : 3 : undefined) will give [2,4,6], because it stops as soon as it hits the end of the first list. With the previous zipWith implementation, if the second list looked like that but was produced by a good producer, and if zipWith happened to fuse with it rather than the first list, then it would go boom.
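For concreteness, the reference implementation referred to above is essentially the textbook definition (paraphrased here and renamed to avoid clashing with the Prelude), which checks the first list first and stops as soon as either argument runs out:

zipWithRef :: (a -> b -> c) -> [a] -> [b] -> [c]
zipWithRef f (x:xs) (y:ys) = f x y : zipWithRef f xs ys
zipWithRef _ _      _      = []

With this definition, zipWithRef (+) [1,2,3] (1 : 2 : 3 : undefined) evaluates to [2,4,6], because the first list is exhausted before the undefined tail of the second is ever inspected.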

Breaking lists at index

I have a performance question today.
I am writing a (Haskell) program and, when profiling, I saw that most of the time is spent in the function you can find below. Its purpose is to take the nth element of a list and return that element along with the list with it removed. My current (slow) definition is as follows:
breakOn :: Int -> [a] -> (a,[a])
breakOn 1 (x:xs) = (x, xs)
breakOn n (x:xs) = (y, x:ys)
  where
    (y, ys) = breakOn (n-1) xs
The Int argument is known to be in the range 1..n, where n is the length of the (never null) list (x:xs), so the function never raises an error.
However, I get poor performance here. My first guess is that I should change lists for another structure. But before I start picking different structures and testing code (which will take me a lot of time) I wanted to ask here for a third-person opinion. Also, I'm pretty sure that I'm not doing it in the best way. Any pointers are welcome!
Please note that the type a may not be an instance of Eq.
Solution
I adapted my code to use Sequences from the Data.Sequence module. The result is here:
import qualified Data.Sequence as S

breakOn :: Int -> S.Seq a -> (a, S.Seq a)
breakOn n xs = (S.index zs 0, ys <> S.drop 1 zs)
  where
    (ys, zs) = S.splitAt (n-1) xs
However, I still accept further suggestions of improvement!
Yes, this is inefficient. You can do a bit better by using splitAt (which unboxes the number during the recursive bit), a lot better by using a data structure with efficient splitting, e.g. a fingertree, and best by massaging the context to avoid needing this operation. If you post a bit more context, it may be possible to give more targeted advice.
Prelude functions are generally pretty efficient. You could rewrite your function using splitAt, as so:
breakOn :: Int -> [a] -> (a,[a])
breakOn n xs = (z, ys ++ zs)
  where
    (ys, z:zs) = splitAt (n-1) xs

Can I always convert mutable-only algorithms to single-assignment and still be efficient?

The Context
The context of this question is that I want to play around with Gene Expression Programming (GEP), a form of evolutionary algorithm, using Erlang. GEP makes use of a string-based DSL called 'Karva notation'. Karva notation is easily translated into expression parse trees, but the translation algorithm assumes an implementation with mutable objects: incomplete sub-expressions are created early on in the translation process and their own sub-expressions are filled in later on with values that were not known at the time they were created.
The purpose of Karva notation is that it guarantees syntactically correct expressions are created without any expensive encoding techniques or corrections of genetic code. The problem is that with a single-assignment programming language like Erlang, I have to recreate the expression tree continually as each sub-expression gets filled in. This takes an inexpensive - O(n)? - update operation and converts it into one that would complete in exponential time (unless I'm mistaken). If I can't find an efficient functional algorithm to convert K-expressions into expression trees, then one of the compelling features of GEP is lost.
The Question
I appreciate that the K-expression translation problem is pretty obscure, so what I want is advice on how to convert an inherently non-functional algorithm (one that exploits mutable data structures) into one that does not rely on mutability. How do pure functional programming languages adapt the many algorithms and data structures, produced in the early days of computer science, that depend on mutability to get the performance characteristics they need?
Carefully designed immutability avoids unnecessary updating
Immutable data structures are only an efficiency problem if they're constantly changing, or you build them up the wrong way. For example, continually appending more to the end of a growing list is quadratic, whereas concatenating a list of lists is linear. If you think carefully, you can usually build up your structure in a sensible way, and lazy evaluation is your friend - hand out a promise to work it out and stop worrying.
Blindly trying to replicate an imperative algorithm can be inefficient, but you're mistaken in your assertion that functional programming has to be asymptotically bad here.
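As a tiny illustration of that point (the helper names here are my own, not from the answer): appending one element at a time to the end of an accumulator is quadratic, while producing the pieces and concatenating once is linear.

-- quadratic: each (++) copies the whole accumulator built so far
buildSlow :: Int -> [Int]
buildSlow n = foldl (\acc x -> acc ++ [x]) [] [1..n]

-- linear: concat walks each piece exactly once
buildFast :: Int -> [Int]
buildFast n = concat [ [x] | x <- [1..n] ]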
Case study: pure functional GEP: Karva notation in linear time
I'll stick with your case study of parsing Karva notation for GEP. (I've played with this solution more fully in this answer.)
Here's a fairly clean pure functional solution to the problem. I'll take the opportunity to name drop some good general recursion schemes along the way.
Code
(Importing Data.Tree supplies data Tree a = Node {rootLabel :: a, subForest :: Forest a} where type Forest a = [Tree a].)
import Data.Tree
import Data.Tree.Pretty -- from the pretty-tree package for visualising trees

arity :: Char -> Int
arity c
  | c `elem` "+*-/" = 2
  | c `elem` "Q" = 1
  | otherwise = 0
A hylomorphism is the composition of an anamorphism (build up, unfoldr) and a catamorphism (combine, foldr).
These terms were introduced to the FP community in the seminal paper Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire.
We're going to pull the levels out (ana/unfold) and combine them back together (cata/fold).
hylomorphism :: b -> (a -> b -> b) -> (c -> (a, c)) -> (c -> Bool) -> c -> b
hylomorphism base combine pullout stop seed = hylo seed where
  hylo s | stop s    = base
         | otherwise = combine new (hylo s')
    where (new, s') = pullout s
To pull out a level, we use the total arity from the previous level to find where to split off this new level, and pass on the total arity for this one ready for next time:
pullLevel :: (Int, String) -> (String, (Int, String))
pullLevel (n, cs) = (level, (total, cs')) where
  (level, cs') = splitAt n cs
  total        = sum $ map arity level
To combine a level (as a String) with the level below (that's already a Forest), we just pull off the number of trees that each character needs.
combineLevel :: String -> Forest Char -> Forest Char
combineLevel "" [] = []
combineLevel (c:cs) levelBelow = Node c subforest : combineLevel cs theRest
  where (subforest, theRest) = splitAt (arity c) levelBelow
Now we can parse the Karva using a hylomorphism. Note that we seed it with a total arity from outside the string of 1, since there's only one node at the root level. Correspondingly we apply head to the result to get this singleton back out after the hylomorphism.
karvaToTree :: String -> Tree Char
karvaToTree cs = let
    zero (n, _) = n == 0
  in head $ hylomorphism [] combineLevel pullLevel zero (1, cs)
Linear Time
There's no exponential blowup, nor repeated O(log(n)) lookups or expensive modifications, so we shouldn't be in too much trouble.
arity is O(1)
splitAt part is O(part)
pullLevel (part,cs) is O(part): grabbing level with splitAt is O(part), plus O(part) for the map arity level, so O(part) overall
combineLevel (c:cs) is O(arity c) for the splitAt, and O(sum $ map arity cs) for the recursive call
hylomorphism [] combineLevel pullLevel zero (1,cs)
makes a pullLevel call for each level, so the total pullLevel cost is O(sum parts) = O(n)
makes a combineLevel call for each level, so the total combineLevel cost is O(sum $ map arity levels) = O(n), since the total arity of the entire input is bound by n for valid strings.
makes O(#levels) calls to zero (which is O(1)), and #levels is bound by n, so that's below O(n) too
Hence karvaToTree is linear in the length of the input.
I think that puts to rest the assertion that you needed to use mutability to get a linear algorithm here.
Demo
Let's have a draw of the results (because Tree is so full of syntax it's hard to read the output!). You have to cabal install pretty-tree to get Data.Tree.Pretty.
see :: Tree Char -> IO ()
see = putStrLn.drawVerticalTree.fmap (:"")
ghci> karvaToTree "Q/a*+b-cbabaccbac"
Node {rootLabel = 'Q', subForest = [Node {rootLabel = '/', subForest = [Node {rootLabel = 'a', subForest = []},Node {rootLabel = '*', subForest = [Node {rootLabel = '+', subForest = [Node {rootLabel = '-', subForest = [Node {rootLabel = 'b', subForest = []},Node {rootLabel = 'a', subForest = []}]},Node {rootLabel = 'c', subForest = []}]},Node {rootLabel = 'b', subForest = []}]}]}]}
ghci> see $ karvaToTree "Q/a*+b-cbabaccbac"
     Q
     |
     /
     |
  ------
 /      \
 a      *
        |
      -----
     /     \
     +     b
     |
    ----
   /    \
   -    c
   |
   --
  /  \
  b  a
which matches the output expected from the tutorial where I found the example.
There isn't a single way to do this; it really has to be attempted case by case. I typically try to break them down into simpler operations using fold and unfold and then optimize from there. The Karva decoding case is a breadth-first tree unfold, as others have noted, so I started with unfoldTreeM_BF. Perhaps there are similar functions in Erlang.
If the decoding operation is unreasonably expensive, you could memoize the decoding and share/reuse subtrees... though it probably wouldn't fit into a generic tree unfolder and you'd need to write a specialized function to do so. If the fitness function is slow enough, it may be fine to use a naive decoder like the one I have listed below. It will fully rebuild the tree on each invocation.
import Control.Monad.State.Lazy
import Data.Tree

type MaxArity = Int
type NodeType = Char

treeify :: MaxArity -> [Char] -> Tree NodeType
treeify maxArity (x:xs) = evalState (unfoldTreeM_BF (step maxArity) x) xs
treeify _ [] = error "empty list"

step :: MaxArity -> NodeType -> State [Char] (NodeType, [NodeType])
step maxArity node = do
  xs <- get
  -- figure out the actual child node count and use it instead of maxArity
  let (children, ys) = splitAt maxArity xs
  put ys
  return (node, children)

main :: IO ()
main = do
  let x = treeify 3 "0138513580135135135"
  putStr $ drawTree . fmap (:[]) $ x
  return ()
There are a couple of solutions when mutable state in functional programming is required.
Use a different algorithm that solves the same problem. E.g. quicksort is usually formulated in terms of in-place mutation and may therefore be less natural in a functional setting, but mergesort is generally better suited to one. I can't tell if this option is possible or makes sense in your case.
Even functional programming languages usually provide some way to mutate state. (This blog post seems to show how to do it in Erlang.) For some algorithms and data structures this is indeed the only available option (there's active research on the topic, I think); for example hash tables in functional programming languages are generally implemented with mutable state.
In your case, I'm not so sure immutability really leads to a performance bottleneck. You are right, the (sub)tree will be recreated on update, but the Erlang implementation will probably reuse all the subtrees that haven't changed, leading to O(log n) complexity per update instead of O(1) with mutable state. Also, the nodes of the trees won't be copied but instead the references to the nodes, which should be relatively efficient. You can read about tree updates in a functional setting in e.g. the thesis from Okasaki or in his book "Purely Functional Data Structures" based on the thesis. I'd try implementing the algorithm with an immutable data structure and switch to a mutable one if you have a performance problem.
Also see some relevant SO questions here and here.
I think I figured out how to solve your particular problem with the K trees (the general problem is too hard :P). My solution is presented in some horrible sort of hybrid Python-like pseudocode (I am very slow on my FP today), but it doesn't change a node after you create one (the trick is building the tree bottom-up).
First, we need to find which nodes belong to which level:
levels currsize nodes =
    this_level, rest = take currsize from nodes, whats left
    next_size = sum of the arities of the nodes in this_level
    return [this_level | levels next_size rest]
(initial currsize is 1)
So in the +/*abcd example, this should give you [+, /*, abcd]. Now you can convert this into a tree bottom-up:
curr_trees = last level
for level in reverse(levels except the last):
    next_trees = []
    for root in level:
        n = arity of root
        trees, curr_trees = take n from curr_trees, whats left
        next_trees.append( Node(root, trees) )
    curr_trees = next_trees
curr_trees should be a list with the single root node now.
I am pretty sure we can convert this into single assignment Erlang/Haskell very easily now.
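For what it's worth, here is my own rough translation of that bottom-up pseudocode into Haskell (the names levels, buildLevel and karvaToTreeBU are mine; arity is the same helper used in the hylomorphism answer above):

import Data.Tree (Tree(..), Forest)

arity :: Char -> Int
arity c
  | c `elem` "+*-/" = 2
  | c `elem` "Q" = 1
  | otherwise = 0

-- split the Karva string into levels, driven by the running arity count
levels :: Int -> String -> [String]
levels 0 _ = []
levels _ [] = []
levels currSize nodes = thisLevel : levels nextSize rest
  where
    (thisLevel, rest) = splitAt currSize nodes
    nextSize = sum (map arity thisLevel)

-- attach each node of a level to the forest built for the level below it
buildLevel :: String -> Forest Char -> Forest Char
buildLevel [] _ = []
buildLevel (c:cs) below = Node c kids : buildLevel cs rest
  where (kids, rest) = splitAt (arity c) below

-- fold the levels from the bottom up; the root level holds exactly one tree
karvaToTreeBU :: String -> Tree Char
karvaToTreeBU cs = head (foldr buildLevel [] (levels 1 cs))

On the example from earlier, karvaToTreeBU "Q/a*+b-cbabaccbac" produces the same tree as the hylomorphism version.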

Space leak in list program

I am solving some Project Euler problems in Haskell. I wrote a program for one of the riddles and it did not work as I expected.
When I looked at the task manager while running the program, I saw that it was using > 1 gigabyte of RAM under GHC. A friend of mine wrote an equivalent program in Java and it finished in 7 seconds.
import Data.List
opl = find vw $ map (\x -> fromDigits (x ++ [0,0,9]))
    $ sequence [[1],re,[2],re,[3],re,[4],re,[5],re,[6],re,[7],re,[8],re]

vw x = hh^2 == x
  where hh = (round.sqrt.fromIntegral) x

re = [0..9]

fromDigits x = foldl1 (\n m -> 10*n + m) x
I know this program would output the number I want given enough RAM and time, but there has to be a better-performing way.
The main problem here is that sequence has a space leak. It is defined like this:
sequence [] = [[]]
sequence (xs:xss) = [ y:ys | y <- xs, ys <- sequence xss ]
so the problem is that the list produced by the recursive call sequence xss is re-used for each of the elements of xs, so it can't be discarded until the end. A version without the space leak is
myseq :: [[a]] -> [[a]]
myseq xs = go (reverse xs) []
  where
    go [] acc = [acc]
    go (xs:xss) acc = concat [ go xss (x:acc) | x <- xs ]
PS. the answer seems to be Just 1229314359627783009
Edit version avoiding the concat:
seqlists :: [[a]] -> [[a]]
seqlists xss = go (reverse xss) [] []
  where
    go [] acc rest = acc : rest
    go (xs:xss) acc rest = foldr (\y r -> go xss (y:acc) r) rest xs
note that both of these versions generate the results in a different order from the standard sequence, so while they work for this problem we can't use one as a specialised version of sequence.
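To illustrate the order difference, here is a quick GHCi check of my own on small inputs:

> sequence [[1,2],[3,4]]
[[1,3],[1,4],[2,3],[2,4]]
> myseq [[1,2],[3,4]]
[[1,3],[2,3],[1,4],[2,4]]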
Following on from the answer given by Simon Marlow, here's a version of sequence that avoids the space leak while otherwise working just like the original, including preserving the order.
It still uses the nice, simple list comprehension of the original sequence - the only difference is that a fake data dependency is introduced that prevents the recursive call from being shared.
sequenceDummy d [] = d `seq` [[]]
sequenceDummy _ (xs:xss) = [ y:ys | y <- xs, ys <- sequenceDummy (Just y) xss ]
sequenceUnshared = sequenceDummy Nothing
I think this is a better way of avoiding the sharing that leads to the space leak.
I'd blame the excessive sharing on the "full laziness" transformation. Normally this does a great job of creating sharing that avoids recomputations, but sometimes recomputation is very much more efficient than storing shared results.
It'd be nice if there was a more direct way to tell the compiler not to share a specific expression - the above dummy Maybe argument works and is efficient, but it's basically a hack that's just complicated enough that ghc can't tell that there's no real dependency. (In a strict language you don't have these issues because you only have sharing where you explicitly bind a variable to a value.)
EDIT: I think I'm wrong here - changing the type signature to :: Maybe Word64 (which would be enough bits for this problem I think) also takes forever / has a space leak, so it couldn't be the old Integer bug.
Your problem seems to be an old GHC bug (that I thought was fixed) with Integer causing a space leak. The below code finishes in about 150 ms when compiled with -O2.
import Data.List
import Data.Word

main = print opl

opl :: Maybe Word32
opl = find vw $ map (\x -> fromDigits (x ++ [0,0,9]))
    $ sequence [[1],re,[2],re,[3],re,[4],re,[5],re,[6],re,[7],re,[8],re]

vw x = hh^2 == x
  where hh = (round.sqrt.fromIntegral) x

re = [0..9]

fromDigits x = foldl1 (\n m -> 10*n + m) x
Since you're looking for a nineteen-digit number with the characteristics checked in vw, I'd try to simplify the construction in the mapped function: just say fromDigits x * 1000 + 9 for starters. Appending to a list is O(length-of-the-left-list), so throwing those last three digits on the end hurts the computation time a bunch.
As an aside (to you both), using the strict version of the fold (foldl1') will also help.
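A sketch combining both suggestions (my own illustration; the Word64 element type is an arbitrary choice here):

import Data.List (foldl1')
import Data.Word (Word64)

fromDigits :: [Word64] -> Word64
fromDigits = foldl1' (\n m -> 10*n + m)   -- strict left fold, no thunk build-up

-- instead of fromDigits (x ++ [0,0,9]):
candidate :: [Word64] -> Word64
candidate x = fromDigits x * 1000 + 9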

Resources