Haskell - foldl' in terms of foldr and performance issues - performance

While studying fold in depth with A tutorial on the universality and expressiveness of fold
I found an amazing definition of foldl using foldr:
-- I used one lambda function inside another only to improve reading
foldl :: (b -> a -> b) -> b -> [a] -> b
foldl f z xs = foldr (\x g -> (\a -> g (f a x))) id xs z
After understanding what is going on, I thought I could even use foldr to define foldl', which would be like this:
foldl' :: (b -> a -> b) -> b -> [a] -> b
foldl' f z xs = foldr (\x g -> (\a -> let z' = a `f` x in z' `seq` g z')) id xs z
Which is parallel to this:
foldl' :: (b -> a -> b) -> b -> [a] -> b
foldl' f z (x:xs) = let z' = z `f` x
                    in seq z' $ foldl' f z' xs
foldl' _ z _ = z
It seems that both of them are running in constant space (not creating thunks) in simple cases like this:
*Main> foldl' (+) 0 [1..1000000]
500000500000
May I consider both definitions of foldl' equivalent in terms of performance?

In GHC 7.10+, foldl and foldl' are both defined in terms of foldr. The reason they weren't before is that GHC didn't optimize the foldr-based definition well enough for it to participate usefully in foldr/build fusion. GHC 7.10 introduced a new optimization specifically to allow foldr/build fusion to succeed when foldl or foldl' is defined that way.
The big win here is that an expression like foldl' (+) 0 [1..10] can be optimized down to never allocating a (:) constructor at all. And we all know that the absolute fastest garbage collection is when there's no garbage to collect.
See http://www.joachim-breitner.de/publications/CallArity-TFP.pdf for information on the new optimization in GHC 7.10, and why it was necessary.
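A minimal sketch for checking that the two definitions from the question compute the same results. The names foldlViaFoldr' and foldlRec' are made up here to avoid clashing with the library foldl'. It compares results only; allocation behaviour depends on the GHC version and optimization flags and is not measured.

```haskell
-- foldl' written with foldr, following the question (renamed for this sketch).
foldlViaFoldr' :: (b -> a -> b) -> b -> [a] -> b
foldlViaFoldr' f z xs =
  foldr (\x g a -> let z' = a `f` x in z' `seq` g z') id xs z

-- Directly recursive version, following the question (renamed for this sketch).
foldlRec' :: (b -> a -> b) -> b -> [a] -> b
foldlRec' f z (x:xs) = let z' = z `f` x in z' `seq` foldlRec' f z' xs
foldlRec' _ z []     = z

main :: IO ()
main = do
  print (foldlViaFoldr' (+) 0 [1 .. 1000000 :: Integer])
  print (foldlRec' (+) 0 [1 .. 1000000 :: Integer])
  -- A non-commutative operator confirms that both fold left to right.
  print (foldlViaFoldr' (-) 10 [1, 2, 3 :: Int], foldlRec' (-) 10 [1, 2, 3 :: Int])
```

Both versions force the new accumulator with seq before moving on to the next element, which is what the two definitions have in common.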

Related

Performance gain implementing concatMap with foldl' for finite list?

I read in Foldr Foldl Foldl' that foldl' is more efficient for long finite lists because of its strictness. I am aware that it is not suitable for infinite lists.
Thus, I am limiting the comparison to long finite lists.
concatMap
concatMap is implemented using foldr, which gives it laziness. However, using it with long finite lists will build up a long unreduced chain according to the article.
concatMap :: Foldable t => (a -> [b]) -> t a -> [b]
concatMap f xs = build (\c n -> foldr (\x b -> foldr c b (f x)) n xs)
Thus I came up with the following implementation using foldl'.
concatMap' :: Foldable t => (a -> [b]) -> t a -> [b]
concatMap' f = reverse . foldl' (\acc x -> f x ++ acc) []
Test it out
I have built the following two functions to test the performance.
lastA = last . concatMap (: []) $ [1..10000]
lastB = last . concatMap' (: []) $ [1..10000]
However, I was shocked by the results.
lastA:
(0.23 secs, 184,071,944 bytes)
(0.24 secs, 184,074,376 bytes)
(0.24 secs, 184,071,048 bytes)
(0.24 secs, 184,074,376 bytes)
(0.25 secs, 184,075,216 bytes)
lastB:
(0.81 secs, 224,075,080 bytes)
(0.76 secs, 224,074,504 bytes)
(0.78 secs, 224,072,888 bytes)
(0.84 secs, 224,073,736 bytes)
(0.79 secs, 224,074,064 bytes)
Follow-up Questions
concatMap outcompetes my concatMap' in both time and memory. I wonder whether I made mistakes in my concatMap' implementation.
This makes me doubt the articles that praise foldl'.
Is there any black magic in concatMap that makes it so efficient?
Is it true that foldl' is more efficient for long finite lists?
Is it true that using foldr with long finite lists will build up a long unreduced chain and impact the performance?
Is there any black magic in concatMap that makes it so efficient?
No, not really.
Is it true that foldl' is more efficient for long finite lists?
Not always. It depends on the folding function.
The point is, foldl and foldl' always have to scan the whole input list before producing the output. Instead, foldr does not always have to.
As an extreme case, consider
foldr (\x xs -> x) 0 [10..10000000]
which evaluates to 10 instantly -- only the first element of the list is evaluated. The reduction goes something like
foldr (\x xs -> x) 0 [10..10000000]
= foldr (\x xs -> x) 0 (10 : [11..10000000])
= (\x xs -> x) 10 (foldr (\x xs -> x) 0 [11..10000000])
= (\xs -> 10) (foldr (\x xs -> x) 0 [11..10000000])
= 10
and the recursive call is not evaluated thanks to laziness.
In general, when computing foldr f a xs, it is important to check whether f y ys is able to construct a part of the output before evaluating ys. For instance
foldr f [] xs
  where f y ys = (2*y) : ys
produces a list cell _ : _ before evaluating 2*y and ys. This makes it an excellent candidate for foldr.
Again, we can define
map f xs = foldr (\y ys -> f y : ys) [] xs
which runs just fine. It consumes one element from xs and outputs the first output cell. Then it consumes the next element, outputs the next element, and so on. Using foldl' would not output anything until the whole list is processed, making the code quite inefficient.
Instead, if we wrote
sum xs = foldr (\y ys -> y+ys) 0 xs
then we do not output anything after the first element of xs is consumed.
We build a long chain of thunks, wasting a lot of memory.
Here, foldl' would instead work in constant space.
Is it true that using foldr with long finite lists will build up a long unreduced chain and impact the performance?
Not always. It strongly depends on how the output is consumed by the caller.
As a rule of thumb, if the output is "atomic", meaning that the output consumer cannot observe only a part of it (e.g. Bool, Int, ...), then it's better to use foldl'. If the output is "composed" of many independent values (lists, trees, ...), foldr is probably a better choice, provided f can produce its output step by step, in a "streaming" fashion.
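The distinction can be sketched with two small folds. The names doubleAll and total are made up for this illustration; the first produces its output cell by cell, the second produces a single Int.

```haskell
import Data.List (foldl')

-- Streaming producer: each step emits a cons cell before touching the rest.
doubleAll :: [Int] -> [Int]
doubleAll = foldr (\y ys -> 2 * y : ys) []

-- Atomic result: nothing useful exists until the whole list is consumed.
total :: [Int] -> Int
total = foldl' (+) 0

main :: IO ()
main = do
  -- Works on an infinite list because only three cells are ever demanded.
  print (take 3 (doubleAll [1 ..]))
  -- Short-circuits: the folding function ignores the rest of the fold.
  print (foldr (\x _ -> x) 0 [10 :: Integer ..])
  -- Needs every element, so a strict left fold is the natural fit.
  print (total [1 .. 100000])
```

Swapping the folds shows the trade-off: a left fold in doubleAll could not return anything for an infinite input, while a lazy right fold in total would delay all the additions until the end of the list is reached.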

Sorting a list using a "a -> a -> Maybe Ordering" function

Is there a variant of
sortBy :: (a -> a -> Ordering) -> [a] -> [a]
(in Data.List) that allows me to use a a -> a -> Maybe Ordering sorting function instead of a -> a -> Ordering?
What this variant would do is this:
sortBy' :: (a -> a -> Maybe Ordering) -> [a] -> Maybe [a]
If a -> a -> Maybe Ordering ever returns Nothing when it's called during the sort, sortBy' would return Nothing. Otherwise it would return the sorted list wrapped in Just.
If such a variant is not already available, can you please help me construct one? (Preferably one that is at least as efficient as sortBy.)
You can adapt quicksort:
quickSortBy :: (a -> a -> Maybe Ordering) -> [a] -> Maybe [a]
quickSortBy f [] = Just []
quickSortBy f (x:xs) = do
  comparisons <- fmap (zip xs) $ mapM (f x) xs
  sortLesser <- quickSortBy f . map fst $ filter ((`elem` [GT, EQ]) . snd) comparisons
  sortUpper <- quickSortBy f . map fst $ filter ((== LT) . snd) comparisons
  return $ sortLesser ++ [x] ++ sortUpper
At least assume that your sorting predicate f :: a -> a -> Maybe Ordering is anti-symmetric: f x y == Just LT if and only if f y x == Just GT. Then when quickSortBy f returns Just [x1,...,xn], I think you have this guarantee: for all i in [1..n-1], f xi x(i+1) is Just LT or Just EQ.
In particular, when f is a partial order (transitive), [x1,...,xn] is totally ordered.
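A small usage sketch, under the assumption that the comparison fails whenever either element is Nothing. The comparator cmpMaybe is hypothetical, and quickSortBy is restated with list comprehensions so the block runs on its own; the partitioning logic is the same as above.

```haskell
quickSortBy :: (a -> a -> Maybe Ordering) -> [a] -> Maybe [a]
quickSortBy _ [] = Just []
quickSortBy f (x:xs) = do
  -- Compare the pivot against every other element; any Nothing aborts the sort.
  comparisons <- zip xs <$> mapM (f x) xs
  lesser <- quickSortBy f [y | (y, o) <- comparisons, o /= LT]
  upper  <- quickSortBy f [y | (y, o) <- comparisons, o == LT]
  return (lesser ++ [x] ++ upper)

-- Hypothetical partial comparison: a Nothing element cannot be compared.
cmpMaybe :: Maybe Int -> Maybe Int -> Maybe Ordering
cmpMaybe (Just a) (Just b) = Just (compare a b)
cmpMaybe _ _               = Nothing

main :: IO ()
main = do
  print (quickSortBy cmpMaybe [Just 3, Just 1, Just 2])
  print (quickSortBy cmpMaybe [Just 1, Nothing, Just 2])
```

Note that a failure is only detected when the comparison is actually called, so a single-element list is returned as is, whatever it contains.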

Performance of Foldable's default methods

I've been exploring the Foldable class and also the Monoid class.
Firstly, let's say I want to fold over a list of the Monoid First, like so:
x :: [First a]
fold? mappend mempty x
Then I assume the most appropriate fold in this case would be foldr, as mappend for First is lazy in its second argument.
Conversely, for Last we'd want foldl' (or just foldl, I'm not sure).
Now moving away from lists, I've defined a simple binary tree like so:
{-# LANGUAGE GADTs #-}
data BinaryTree a where
  BinaryTree :: BinaryTree a -> BinaryTree a -> BinaryTree a
  Leaf :: a -> BinaryTree a
And I've made it Foldable with the most straightforward definition:
instance Foldable BinaryTree where
  foldMap f (BinaryTree left right) =
    (foldMap f left) `mappend` (foldMap f right)
  foldMap f (Leaf x) = f x
As Foldable defines fold as simply foldMap id we can now do:
x1 :: BinaryTree (First a)
fold x1
x2 :: BinaryTree (Last a)
fold x2
Assuming our BinaryTree is balanced and there aren't many Nothing values, these operations should take O(log(n)) time, I believe.
But Foldable also defines a whole lot of default methods like foldl, foldl', foldr and foldr' based on foldMap.
These default definitions seem to be implemented by composing a bunch of functions, wrapped in a Monoid called Endo, one for each element in the collection, and then composing them all.
For the purpose of this discussion I am not modifying these default definitions.
So let's now consider:
x1 :: BinaryTree (First a)
foldr mappend mempty x1
x2 :: BinaryTree (Last a)
foldl mappend mempty x2
Does running these retain O(log(n)) performance of the ordinary fold? (I'm not worried about constant factors for the moment). Does laziness result in the tree not needing to be fully traversed? Or will the default definitions of foldl and foldr require an entire traversal of the tree?
I tried to go through the algorithm step by step (much as they did in the Foldr Foldl Foldl' article), but I ended up completely confusing myself, as this is a bit more complex: it involves an interaction between Foldable, Monoid and Endo.
So what I'm looking for is an explanation of why (or why not) the default definition of say foldr, would only take O(log(n)) time on a balanced binary tree like above. A step by step example like what's from the Foldr Foldl Foldl' article would be really helpful, but I understand if that's too difficult, as I totally confused myself attempting it.
Yes, it has O(log(n)) best case performance.
Endo is a wrapper around functions of type a -> a, with this Monoid instance:
instance Monoid (Endo a) where
  mempty = Endo id
  Endo f `mappend` Endo g = Endo (f . g)
And the default implementation of foldr in Data.Foldable:
foldr :: (a -> b -> b) -> b -> t a -> b
foldr f z t = appEndo (foldMap (Endo #. f) t) z
The definition of (.) (function composition), in case you need it:
(.) f g = \x -> f (g x)
Endo is defined as a newtype, so its constructor exists only at compile time, not at run time.
The #. operator changes the type of its second operand and discards the first.
The newtype constructor and the #. operator guarantee that you can ignore the wrapper when considering performance.
So the default implementation of foldr can be reduced to:
-- mappend = (.), mempty = id from instance Monoid (Endo a)
foldr :: (a -> b -> b) -> b -> t a -> b
foldr f z t = foldMap f t z
For your Foldable BinaryTree:
foldr f z t
  = foldMap f t z
  = case t of
      Leaf a -> f a z
      -- the case we care about
      BinaryTree l r -> ((foldMap f l) . (foldMap f r)) z
The default lazy evaluation in Haskell is ultimately simple. There are just two rules:
function application first
evaluate the arguments from left to right if the values matter
That makes it easy to trace the evaluation of the last line of the code above:
((foldMap f l) . (foldMap f r)) z
= (\z -> foldMap f l (foldMap f r z)) z
= foldMap f l (foldMap f r z)
-- let z' = foldMap f r z
= foldMap f l z' -- height 1
-- if the branch l is still not a Leaf node
= ((foldMap f ll) . (foldMap f lr)) z'
= (\z -> foldMap f ll (foldMap f lr z)) z'
= foldMap f ll (foldMap f lr z')
-- let z'' = foldMap f lr z'
= foldMap f ll z'' -- height 2
The right branch of the tree is never expanded before the left has been fully expanded, and each step goes one level deeper after an O(1) operation of function expansion and application. So when it reaches the left-most Leaf node:
= foldMap f leaf#(Leaf a) z'heightOfLeftMostLeaf
= f a z'heightOfLeftMostLeaf
Then f looks at the value a and decides to ignore its second argument (as mappend does for First values), so the evaluation short-circuits. The cost is O(height of the left-most leaf), or O(log(n)) when the tree is balanced.
foldl works the same way: it's just foldr with mappend flipped, i.e. O(log(n)) best-case performance with Last.
foldl' and foldr' are different.
foldl' :: (b -> a -> b) -> b -> t a -> b
foldl' f z0 xs = foldr f' id xs z0
  where f' x k z = k $! f z x
At every step of the reduction, the argument is evaluated before the function is applied, so the whole tree is traversed, i.e. O(n) best-case performance.
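The short-circuiting can be observed with a tree whose right subtree is undefined. This is a minimal sketch: the tree is declared with ordinary data syntax instead of GADT syntax to avoid the extension, and leftOnly is a made-up test value.

```haskell
import Data.Monoid (First (..))

data BinaryTree a
  = BinaryTree (BinaryTree a) (BinaryTree a)
  | Leaf a

-- Only foldMap is defined; foldr and friends use the class defaults.
instance Foldable BinaryTree where
  foldMap f (BinaryTree l r) = foldMap f l `mappend` foldMap f r
  foldMap f (Leaf x)         = f x

-- The right subtree is undefined, so any traversal into it would crash.
leftOnly :: BinaryTree (First Int)
leftOnly = BinaryTree (Leaf (First (Just 1))) undefined

main :: IO ()
main = print (getFirst (foldr mappend mempty leftOnly))
```

Because mappend for First ignores its second argument once the first is a Just, the default foldr returns the left leaf without ever forcing the undefined subtree.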

What's the meaning of strict version in haskell?

According to Real World Haskell, foldl' is the strict version of foldl.
But it's hard for me to understand: what does strict mean?
foldl f z0 xs0 = lgo z0 xs0
  where
    lgo z [] = z
    lgo z (x:xs) = lgo (f z x) xs
foldl' f z0 xs0 = lgo z0 xs0
  where
    lgo z [] = z
    lgo z (x:xs) = let z' = f z x in z' `seq` lgo z' xs
It is not widely known, but foldl' is actually non-strict in its accumulator argument! Recall the type:
foldl' :: (a -> b -> a) -> a -> [b] -> a
Its strictness in argument 2 depends on the strictness of the function given for argument 1, as you see if you pass const:
Prelude Data.List> foldl' (const (+1)) undefined [1]
2
Prelude Data.List> foldl' (const (+1)) undefined [1..4]
5
You would have thought, naively, that "foldl' is strict" means "strict in the accumulator argument". The above contradicts that.
However, it is even more insidious, as the strictness is only on the result of the function application in the cons case of the loop. So you still get bottoms if you enter the base case, but not the inductive case:
Prelude Data.List> foldl' (const (+1)) undefined []
*** Exception: Prelude.undefined
So the strictness in argument 2 also depends on the value of argument 3!
This is how I'd write it: "fully" strict in its 2nd argument.
-- requires the BangPatterns extension
foldl' f z0 xs0 = go z0 xs0
  where
    go !z [] = z
    go !z (x:xs) = go (f z x) xs
This is truly strict in its second argument, as you can see:
Prelude Data.List.Stream> foldl' (\a b -> 1) undefined [undefined]
*** Exception: Prelude.undefined
Compared with the Haskell2010 version:
Prelude Data.List.Stream> Data.List.foldl' (\a b -> 1 ) undefined [undefined]
1
This actually has a practical impact: the current definition will not unbox its accumulator argument consistently.
Historical note: this was discovered when we were specifying the list library's strictness semantics for the stream fusion paper in 2007, and the approach to specifying strictness is given in Duncan Coutts's PhD thesis.
foldl and (the strict) foldl' are close to semantically equivalent. The difference is in performance, especially when you are traversing a large list. Laziness has the overhead of building a thunk, and foldl' is the more efficient way to arrive at the result because it doesn't build a huge thunk.
There is a really good article explaining this in detail on Haskell Wiki
Strict functions work like functions in C or other languages in that their arguments are generally evaluated eagerly.
A strict function is a function whose arguments are evaluated before the body is.

Optimisations with folds

I am just curious if there are any (first order polymorphic only) optimisations with folds.
For maps, there's deforestation: map g (map f ls) => map (g . f) ls, and rev (map f ls) => rev_map f ls (faster in OCaml).
But fold is so powerful, it seems to defy any optimisation.
The obvious ones:
fold_left f acc (List.map g li) => fold_left (fun acc x -> f acc (g x)) acc li
fold_right f li acc => fold_left f acc li (* if (f,acc) is a monoid *)
You may be interested in the classical paper on the topic, "Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire". Beware however that it is technical and has impenetrable notation.
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.125
Edit: my first version of the first rule was wrong, edited thanks to vincent-hugot.
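The two rules can be spot-checked on small inputs. The sketch below is in Haskell rather than OCaml, to match the rest of this page, and the names viaMap and fused are made up for the illustration.

```haskell
-- Left side of the first rule: fold over a mapped list.
viaMap :: (b -> c -> b) -> (a -> c) -> b -> [a] -> b
viaMap f g acc li = foldl f acc (map g li)

-- Right side: the map is pushed into the step function, so no
-- intermediate list is needed.
fused :: (b -> c -> b) -> (a -> c) -> b -> [a] -> b
fused f g acc li = foldl (\a x -> f a (g x)) acc li

main :: IO ()
main = do
  -- A non-commutative operator makes the check meaningful.
  print (viaMap (-) (* 2) 100 [1, 2, 3 :: Int], fused (-) (* 2) 100 [1, 2, 3 :: Int])
  -- Second rule: (+, 0) is a monoid, so right and left folds agree.
  print (foldr (+) 0 [1 .. 10 :: Int] == foldl (+) 0 [1 .. 10 :: Int])
```

A check on a few inputs is of course not a proof; it only guards against getting the argument order wrong when writing the fused step function.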
You can use deforestation on folds. In fact, map/map fusion is a special case of that.
The trick is to replace list construction by a special build function:
build :: (forall b. (a -> b -> b) -> b -> b) -> [a]
build g = g (:) []
Now, using the standard definition of foldr
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr c n [] = n
foldr c n (x:xs) = c x (foldr c n xs)
We have the following equivalence:
foldr c n (build g) == g c n
(Actually this is only true under certain, but common, circumstances. For details see "Correctness of short-cut fusion").
If you write your list producing functions (including map) using build and your consumers using foldr, then the above equality can remove most intermediate lists. Haskell's list comprehensions are translated into combinations of build and foldr.
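A minimal sketch of the law on a concrete producer. The function upTo is hypothetical: it counts from lo to hi by calling whatever cons and nil it is given, which is the shape build expects. Here both sides are evaluated by hand; in GHC the rewrite is applied by the optimizer.

```haskell
{-# LANGUAGE RankNTypes #-}

build :: (forall b. (a -> b -> b) -> b -> b) -> [a]
build g = g (:) []

-- Hypothetical producer: abstracts over the list constructors.
upTo :: Int -> Int -> (Int -> b -> b) -> b -> b
upTo lo hi cons nil
  | lo > hi   = nil
  | otherwise = lo `cons` upTo (lo + 1) hi cons nil

main :: IO ()
main = do
  -- Supplying (:) and [] recovers the ordinary list.
  print (build (upTo 1 5))
  -- Both sides of the law; the right side never builds a list.
  print (foldr (+) 0 (build (upTo 1 5)), upTo 1 5 (+) 0)
```

The rank-2 type of build is what keeps the producer honest: it has to work for every result type b, so it cannot inspect the list it is supposedly building.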
The downside of this approach is that it cannot handle left folds. Stream Fusion handles them just fine, though. It expresses list producers and transformers as streams (coinductive datatypes, kind of like iterators). The Stream Fusion paper is very readable, so I recommend taking a look.
The "bananas" paper mentioned by gasche goes into more details about kinds of folds and their equivalences.
Finally, there is Bird and de Moor's "Algebra of Programming", which mentions transformations such as combining two folds into one.
If you're interested in going a bit deeper into theory, I suggest you read something about catamorphisms, anamorphisms and hylomorphisms. While the category theory surrounding them may seem a bit scary, the concept isn't that difficult.
Catamorphisms are functions that consume recursive data structures and produce some kind of a value. Anamorphisms are functions that, given some value (a kind of a seed), produce recursive data structures. In particular, foldr and build mentioned in the other answers are functions to build catamorphisms and anamorphisms on lists. But this concept can be applied to basically any recursive data structure, such as different kinds of trees etc.
Now if you build a recursive data structure with an anamorphism and then consume it with a catamorphism, you get what is called a hylomorphism. In such a case, you actually don't need the intermediate structure. You can skip creating it and destroying it. This is often called deforestation.
Concerning map: this function is interesting in that it's both a catamorphism and an anamorphism:
map consumes a list and produces something; but also
map produces a list, consuming something.
So you can view the composition of two maps map f . map g as a composition of an anamorphism (map g) with a catamorphism (map f), forming a hylomorphism. So you know you can optimize (deforest) by not creating the intermediate list.
To be specific: You could write map in two ways, one using foldr and the other using build:
mapAna :: (a -> b) -> [a] -> [b]
mapAna f xs = build (mapAna' f xs)
mapAna' :: (a -> b) -> [a] -> (b -> c -> c) -> c -> c
mapAna' f [] cons nil = nil
mapAna' f (x : xs) cons nil = (f x) `cons` (mapAna' f xs cons nil)
mapCata :: (a -> b) -> [a] -> [b]
mapCata f xs = foldr (\x ys -> f x : ys) [] xs
and the composition map f (map g zs) as mapCata f (mapAna g zs), which after some simplifications and applying foldr c n (build g) == g c n results in map (f . g).

Resources