Why is it impossible to Applicative-traverse arrays? (Or is it?)

While pondering how best to map, i.e. traverse, an a -> Maybe a Kleisli arrow over an unboxed vector, I looked for an existing implementation. Obviously U.Vector is not Traversable, but it does supply a mapM, which for Maybe of course works just fine.
But the question is: is the Monad constraint really needed? Well, it turns out that even boxed vectors cheat for the Traversable instance: they really just traverse a list, which they convert from/to:
instance Traversable.Traversable Vector where
  {-# INLINE traverse #-}
  traverse f xs = Data.Vector.fromList Applicative.<$> Traversable.traverse f (toList xs)
mono-traversable does the same thing for unboxed vectors as well; there it seems even more gruesome performance-wise.
Now, I wouldn't be surprised if vector was actually able to fuse many of these hacked traversals into a far more efficient form, but still – there seems to be a fundamental problem, preventing us from implementing a traversal on an array right away. Is there any “deep reason” for this inability?

After reading through the relevant source of vector and trying to make mapM work with Applicative, I think the reason why Data.Vector.Unboxed.Vector doesn't have a traverse :: (Applicative f, Unbox a, Unbox b) => (a -> f b) -> Vector a -> f (Vector b) function and Data.Vector.Vector doesn't have a native traverse is the fusion code. The offender is the following Stream type:
-- Data/Vector/Fusion/Stream/Monadic.hs, line 137

-- | Result of taking a single step in a stream
data Step s a where
  Yield :: a -> s -> Step s a
  Skip  :: s -> Step s a
  Done  :: Step s a

-- | Monadic streams
data Stream m a = forall s. Stream (s -> m (Step s a)) s
This is used internally to implement mapM. The m will be the same as from your initial call to Data.Vector.Unboxed.mapM. But because the spine of this stream is inside the m functor, it is not possible to work with it if you only have an applicative for m.
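To make that concrete, here is a minimal sketch of a stream consumer (toListM is a hypothetical name; the real internals are more involved), assuming the Step and Stream definitions above are in scope. The next state s' only becomes available inside m, so inspecting the Step to decide how to continue requires (>>=):

toListM :: Monad m => Stream m a -> m [a]
toListM (Stream step s0) = go s0
  where
    go s = step s >>= \r -> case r of
      -- the Step (and with it the next state s') is wrapped in m;
      -- only (>>=) lets us branch on it to decide what to do next
      Yield x s' -> fmap (x :) (go s')
      Skip    s' -> go s'
      Done       -> return []

With only an Applicative you can combine effects whose shape is fixed up front, but you can never branch on a result that lives inside m. That is exactly what the fromList/toList detour buys: the list's spine exists outside of m, so traverse can compose the effects along a structure that is already known.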
See also this issue on the vector GitHub repo: Weaken constraint on mapM.
Disclaimer: I don't really know how fusion works. I don't know how vector works.

Related

Is there a way to receive information on execution order (specifically for the sieve of Eratosthenes)?

I am attempting to understand one of the prime number algorithms enumerated here: https://wiki.haskell.org/index.php?title=Prime_numbers&oldid=36858#Postponed_Filters_Sieve, specifically:
primes :: [Integer]
primes = 2: 3: sieve (tail primes) [5,7..]
  where
    sieve (p:ps) xs = h ++ sieve ps [x | x <- t, x `rem` p /= 0]
                           -- or: filter ((/=0).(`rem`p)) t
      where (h,~(_:t)) = span (< p*p) xs
So conceptually I understand how this algorithm works (sieve of Eratosthenes): start with 2, 3, and a list of numbers, then eliminate any that are greater than the previous prime's square and divisible by any prime below it.
But I'm having a hard time following the nested recursive step (primes calls sieve on primes, which calls sieve on primes, which...).
I understand that this works due to lazy evaluation, and it demonstrably produces the right result, but I am incapable of following it.
So for example if I were to run take 5 primes what would actually happen:
e.g (I will refer to the result of the take operation as t for ease of reading/reasoning):
Step 1)
primes returns a list [2,3, xs]
so t is [2,3, take 3 xs]
where xs is sieve (tail primes) [5,7..]
Step 2)
tail primes is 3:xs
where xs is sieve (tail primes) [5,7..]
etc
so t should now be [2,3,3,3,3,3...]
I have little trouble following sieve itself...
So I guess I have two questions.
1) How exactly does this algorithm actually work, and where/why is my trace wrong?
2) Is there a way, generally, in Haskell to figure out what order things are running in? Maybe print a recursion tree? Or at the very least drop in a debugger halt?
I took the liberty of de-optimizing and clarifying the algorithm a little bit:
primes :: [Integer]
primes = 2 : sieve primes [3 ..]

sieve :: [Integer] -> [Integer] -> [Integer]
sieve [] xs = xs  -- degenerate case for testing
sieve (p:ps) xs = h ++ sieve ps [x | x <- t, x `rem` p /= 0]
  where (h, t) = span (< p*p) xs
This is the same base logic, but it does a lot more redundant work (a constant factor per output value, though) than the version you provided. I think that's a simpler starting point, and once you understand how this version works, it's easy to see what the optimizations do. I also pulled sieve into its own definition. It didn't use anything from its enclosing scope, and the ability to test it standalone might help with understanding what's going on.
If you'd like to peek into how evaluation proceeds, you can use the Debug.Trace module. The two functions I use most from it are trace and traceShow, depending on the value I want to see.
So, let's get a bit of tracing info from sieve:
import Debug.Trace

primes :: [Integer]
primes = 2 : sieve primes [3 ..]

sieve :: [Integer] -> [Integer] -> [Integer]
sieve [] xs = trace "degenerate case for testing" xs
sieve (p:ps) xs = traceShow (p, h) $ h ++ sieve ps [x | x <- t, x `rem` p /= 0]
  where (h, t) = span (< p*p) xs
And to test it out:
ghci> take 10 primes
[2(2,[3])
,3(3,[5,7])
,5,7(5,[11,13,17,19,23])
,11,13,17,19,23(7,[29,31,37,41,43,47])
,29]
Well, that's a lot less clear than hoped. When ghci prints out a result, it uses the Show instance for the result's type. And the Show instance for [Integer] is lazy itself, so the printing of the list is getting interleaved with the tracing. To do better, let's have ghci produce a value that won't be output until after the tracing is complete. The sum should do:
ghci> sum $ take 10 primes
129
That was... less than useful. Where'd the tracing go? Well, remember that the tracing functions are very impure. Their explicit goal is to produce side effects. But GHC doesn't respect side effects. It assumes that all functions are pure. One result of that assumption is that it can store the result of evaluating expressions. (Whether it does so or not depends on whether there is a shared reference, or whether CSE optimizations kick in. In this case, primes itself is a shared reference.)
Maybe if we ask it to evaluate further than it has so far?
ghci> sum $ take 20 primes
(11,[53,59,61,67,71,73,79,83,89,97,101,103,107,109,113])
639
Ok, the tracing is separate from ghci's output as desired. But it's not really very informative at that point. To get a better picture, it needs to start back at the beginning. To do that, we need to get ghci to unload the definition of primes so that it will re-evaluate it from scratch. There are a bunch of ways to do this, but I'll demonstrate a method that has some additional ways to be useful.
ghci> :load *sieve.hs
[1 of 1] Compiling Main ( sieve.hs, interpreted )
Ok, modules loaded: Main.
By putting the * in front of the file name in the :load command, I instructed ghci to interpret the source from scratch, regardless of its current state. This works here because it forces re-interpretation even though the source hasn't changed. It's also useful when you want to use :load on a source file that has compiled output in the current directory, and have it interpret the whole module rather than just load the compiled code.
ghci> sum $ take 10 primes
(2,[3])
(3,[5,7])
(5,[11,13,17,19,23])
(7,[29,31,37,41,43,47])
129
Now, let's get into how the algorithm actually works. The first thing to look at is what the components of the tracing output are. The first element is the prime whose multiples are being sieved out of the potential outputs. The second element is the list of values being accepted as primes because they're less than p*p, and all non-primes less than that have already been removed from the candidate list. The mechanics of that should be familiar from any study of the sieve of Eratosthenes.
The calls to sieve start with sieve primes [3..]. The first place laziness critically comes into play is the pattern match on the first argument. The (:) constructor is already known, so the pattern matches p to the literal 2, and ps to an unevaluated expression. It's very important that it's unevaluated, because this call to sieve is what calculates the value. If it forced it to be evaluated to proceed, it would introduce a circular data dependency, which results in an infinite loop.
As the tracing indicates, the prime being used to remove elements from the candidates is 2. The call to span splits the input [3..] into ([3], [4..]). h is [3], as demonstrated by the tracing output. So the result of the call to sieve is [3] ++ <recursive call to sieve>. This is the second place laziness critically comes into play in the algorithm. The implementation of (++) doesn't do anything at all with its second argument until it has already produced the prefix of the list. This means that before the recursive call to sieve is evaluated, it's known that ps refers to a thunk that evaluates to [3] ++ <recursive call>.
That's enough information to handle the recursive call to sieve. Now, p is matched to 3, ps is matched to a thunk, and the logic continues. The tracing should illustrate what's going on at this point.
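If it helps, here is an informal hand-expansion of the first few steps (a reduction sketch, not runnable code; <ps> stands for a not-yet-evaluated tail of primes):

primes = 2 : sieve primes [3..]
       = 2 : sieve (2 : <ps>) [3..]                              -- span (< 4) [3..] = ([3], [4..])
       = 2 : 3 : sieve <ps> [x | x <- [4..], x `rem` 2 /= 0]
       = 2 : 3 : sieve (3 : <ps'>) [5,7,9,...]                   -- span (< 9) gives ([5,7], [9,11,...])
       = 2 : 3 : 5 : 7 : sieve <ps'> [x | x <- [9,11,...], x `rem` 3 /= 0]
       ...

Each prime becomes available just in time to be pattern-matched by the next level of sieve, which is why the circular definition never bites.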
Now, the version you started with does a few things to optimize. First, it observes that the first element of t is always going to equal p*p, and it uses pattern matching to eliminate that element without doing any remainder calculation on it. This is a small saving per prime examined, but it is a clear saving.
Second, it skips filtering out the multiples of two, and just doesn't generate them in the first place. This reduces the amount of elements generated to be filtered later by a factor of two, and it reduces the number of filters being applied to each odd element by one.
As an aside, note that the stacking filter behavior is actually algorithmically significant, and not faithful to the sieve of Eratosthenes as described in literature. For further discussion of this, see The Genuine Sieve of Eratosthenes by Melissa O'Neill.

Example of a matrix as Applicative functor

I already asked a similar question but it was not clear enough, so I decided to rephrase it.
I know that a matrix is an applicative functor but not a monad. I am wondering if there is a simple and practical example of <*> for matrices.
A possible Applicative instance for matrices would be one analogous to ZipList. With F a matrix of functions and X a matrix of values, F <*> X applies each function in F pointwise to the corresponding value in X. The result is truncated in each dimension to fit the shortest matrix. pure f gives an infinite matrix with f at each point. As an example, pointwise matrix multiplication is then (*) <$> A <*> B.
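Here is a minimal sketch of that instance, representing a matrix as a list of rows (the Matrix wrapper and the instance are illustrative, not taken from a library):

newtype Matrix a = Matrix [[a]] deriving Show

instance Functor Matrix where
  fmap f (Matrix xss) = Matrix (map (map f) xss)

instance Applicative Matrix where
  -- an infinite matrix with x at every position
  pure x = Matrix (repeat (repeat x))
  -- pointwise application; zipWith truncates to the shorter
  -- extent in each dimension, giving the ZipList-like semantics
  Matrix fss <*> Matrix xss = Matrix (zipWith (zipWith ($)) fss xss)

With that in scope, (*) <$> Matrix [[1,2],[3,4]] <*> Matrix [[10,20],[30,40]] evaluates to Matrix [[10,40],[90,160]].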
Instead of truncation and working with infinity, you could fix the shape of the matrix by using a phantom type parameter, as accelerate does. Of course, then you could also declare a Monad instance, just as with fixed-size ziplists.

Why doesn't sortBy take (a -> a -> Bool)?

The Haskell sortBy function takes (a -> a -> Ordering) as its first argument. Can anyone educate me as to what the reasoning is there? My background is entirely in languages that have a similar function take (a -> a -> Bool) instead, so having to write one that returns LT/GT was a bit confusing.
Is this the standard way of doing it in statically typed/pure functional languages? Is this peculiar to ML-descended languages? Is there some fundamental advantage to it that I'm not seeing, or some hidden disadvantage to using booleans instead?
Summarizing:
An Ordering is not just GT | LT; it's actually LT | EQ | GT (apparently GHC doesn't make use of this under the hood for the purposes of sorting, but still)
Returning a trichotomous value more closely models the possible outcomes of comparing two elements
In certain cases, using an Ordering rather than a Bool will save a comparison
Using an Ordering makes it easier to implement stable sorts
Using an Ordering makes it clear to readers that a comparison between two elements is being done (a boolean doesn't inherently carry this meaning, though I suspect many readers will assume it)
I'm tentatively accepting Carl's answer, and posting the above summary since no single answer has hit all the points as of this edit.
I think Boolean Blindness is the main reason. Bool is a type with no domain semantics. Its semantics in the case of a function like sortBy come entirely from convention, not from the domain the function is operating on.
This adds one level of indirection to the mental process involved in writing a comparison function. Instead of just saying "the three values I can return are less than, equal, or greater", the semantic building blocks of ordering, you say "I want to return less than, so I must convert it to a boolean." There's an extra mental conversion step that's always present. Even if you are well-versed in the convention, it still slows you down a bit. And if you're not well-versed in the convention, you are slowed down quite a bit by having to check to see what it is.
The fact that it's 3-valued instead of 2-valued means you don't need to be quite as careful in your sort implementation to get stability, either - but that's a minor implementation detail. It's not nearly as important as actually having your values have meanings. (Also, Bool is no more efficient than Ordering. It's not a primitive in Haskell. They're both algebraic data types defined in libraries.)
When you sort things, you put them in order; there's not a "truth" value to determine.
More to the point, what would "true" mean? That the first argument is less than the second? Greater than? Now you're overriding "true" to really mean "less than" (or "greater than", depending on how you choose to implement the function). And what if they're equal?
So why not cut out the middle man, so to speak, and return what you really mean?
There's no reason it couldn't. If you look at the ghc implementation, it only checks whether the result is GT or not. The Haskell Report version of the code uses insertBy, which likewise only checks for GT or not. You could write the following and use it without any problem:
import Data.List (sortBy)

sortByBool :: (a -> a -> Bool) -> [a] -> [a]
sortByBool lte = sortBy (\x y -> if lte x y then LT else GT)

sort' :: Ord a => [a] -> [a]
sort' = sortByBool (<=)
Some sorts could conceivably perform optimizations by knowing when elements are EQ, but the implementations currently used do not need this information.
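A quick check in ghci that this behaves as expected:

ghci> sort' [3,1,2]
[1,2,3]
ghci> sortByBool (>=) [3,1,2]
[3,2,1]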
I think there were two separate design decisions:
1) Creating the Ordering type
2) Choosing for sortBy's comparison argument to return an Ordering value
The Ordering type is useful for more than just sortBy - for example, compare is the "centerpiece" of the Ord typeclass. Its type is :: Ord a => a -> a -> Ordering. Given two values, then, you can find out whether they're less than, greater than, or equal -- with any other comparison function ((<), (<=), (>), (>=)), you can only rule out one of those three possibilities.
Here's a simple example where Ordering (at least in my opinion) makes a function's intent a little clearer:
f a b =
  case compare a b of
    GT -> {- something -}
    LT -> {- something -}
    EQ -> {- something -}
Once you've decided to create the Ordering type, then I think it's natural to use it in places where that's the information you're truly looking for (like sortBy), instead of using Bool as a sort of workaround.
A three-valued Ordering is needed to save comparisons in cases where we do need to distinguish the EQ case. In a duplicates-preserving sort or merge, we ignore the EQ case, so a predicate with less-than-or-equal semantics is perfectly acceptable. But not in the case of union or nubSort, where we do want to distinguish all three outcomes of a comparison.
mergeBy _ xs [] = xs
mergeBy _ [] ys = ys
mergeBy lte (x:xs) (y:ys)
  | lte y x   = y : mergeBy lte (x:xs) ys
  | otherwise = x : mergeBy lte xs (y:ys)

union xs [] = xs
union [] ys = ys
union (x:xs) (y:ys) = case compare x y of
  LT -> x : union xs (y:ys)
  EQ -> x : union xs ys
  GT -> y : union (x:xs) ys
Writing the latter with an lte predicate is unnatural.
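To see why, here is union rewritten against an lte predicate (unionByLte is an illustrative name): recovering the three-way split now costs two predicate calls per step instead of one comparison.

unionByLte _ xs [] = xs
unionByLte _ [] ys = ys
unionByLte lte (x:xs) (y:ys)
  | not (lte y x) = x : unionByLte lte xs (y:ys)  -- x strictly before y
  | not (lte x y) = y : unionByLte lte (x:xs) ys  -- y strictly before x
  | otherwise     = x : unionByLte lte xs ys      -- equal: keep one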

Purely functional set

Is there an algorithm that implements a purely functional set?
Expected operations would be union, intersection, difference, element?, empty? and adjoin.
Those are not hard requirements though and I would be happy to learn an algorithm that only implements a subset of them.
You can use a purely functional map implementation, where you just ignore the values.
See http://hackage.haskell.org/packages/archive/containers/0.1.0.1/doc/html/Data-IntMap.html (linked to from https://cstheory.stackexchange.com/questions/1539/whats-new-in-purely-functional-data-structures-since-okasaki ).
(sidenote: For more information on functional datastructures, see http://www.amazon.com/Purely-Functional-Structures-Chris-Okasaki/dp/0521663504 )
A purely functional implementation exists for almost any data structure. In the case of sets or maps, you typically use some form of search tree, e.g. red/black trees or AVL trees. The standard reference for functional data structures is the book by Okasaki:
http://www.cambridge.org/gb/knowledge/isbn/item1161740/
Significant parts of it are available for free via his thesis:
http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf
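For flavor, here is a minimal sketch of the idea in Haskell: a set as an unbalanced binary search tree. Real implementations add rebalancing (the red/black and AVL variants mentioned above), but the persistent, purely functional character is already visible.

data Tree a = Leaf | Node (Tree a) a (Tree a)

-- element?
member :: Ord a => a -> Tree a -> Bool
member _ Leaf = False
member x (Node l y r) = case compare x y of
  LT -> member x l
  EQ -> True
  GT -> member x r

-- adjoin: builds a new tree that shares every untouched
-- subtree with the old one, so the old set remains valid
insert :: Ord a => a -> Tree a -> Tree a
insert x Leaf = Node Leaf x Leaf
insert x t@(Node l y r) = case compare x y of
  LT -> Node (insert x l) y r
  EQ -> t
  GT -> Node l y (insert x r)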
The links from the answer by ninjagecko are good. What I've been following recently are the Persistent Data Structures used in Clojure, which are functional, immutable and persistent.
A description of the implementation of the persistent hash map can be found in this two-part blog post:
http://blog.higher-order.net/2009/09/08/understanding-clojures-persistenthashmap-deftwice/
http://blog.higher-order.net/2010/08/16/assoc-and-clojures-persistenthashmap-part-ii/
These are implementations of some of the ideas (see the first answer, first entry) found in this reference request question.
The sets that come out of these structures support the functions you need:
http://clojure.org/data_structures (see the Sets section)
All that's left is to browse the source code and try to wrap your head around it.
Here is an implementation of a purely functional set in OCaml (it is part of OCaml's standard library).
Is there an algorithm that implements a purely functional set?
You can implement set operations using many different purely functional data structures. Some have better complexity than others.
Examples include:
Lists
Where we have:
List Difference:
(\\) :: Eq a => [a] -> [a] -> [a]
The \\ function is list difference (non-associative). In the result of xs \\ ys, the first occurrence of each element of ys in turn (if any) has been removed from xs. Thus (xs ++ ys) \\ xs == ys.
union :: Eq a => [a] -> [a] -> [a]
The union function returns the list union of the two lists. For example,
"dog" `union` "cow" == "dogcw"
Duplicates, and elements of the first list, are removed from the second list, but if the first list contains duplicates, so will the result. It is a special case of unionBy, which allows the programmer to supply their own equality test.
intersect :: Eq a => [a] -> [a] -> [a]
The intersect function takes the list intersection of two lists. For example,
[1,2,3,4] `intersect` [2,4,6,8] == [2,4]
If the first list contains duplicates, so will the result.
Immutable Sets
More efficient data structures can be designed to improve the complexity of set operations. For example, the standard Data.Set library in Haskell implements sets as size-balanced binary trees:
Stephen Adams, "Efficient sets: a balancing act", Journal of Functional Programming 3(4):553-562, October 1993, http://www.swiss.ai.mit.edu/~adams/BB/.
Which is this data structure:
data Set a = Bin !Size !a !(Set a) !(Set a)
           | Tip

type Size = Int
Yielding complexity of:
union, intersection, difference: O(n+m)
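As a quick demonstration, all of the requested operations exist in Data.Set from the containers package (these are the real API names):

import qualified Data.Set as Set

main :: IO ()
main = do
  let a = Set.fromList [1, 2, 3 :: Int]
      b = Set.fromList [3, 4]
  print (Set.union a b)         -- fromList [1,2,3,4]
  print (Set.intersection a b)  -- fromList [3]
  print (Set.difference a b)    -- fromList [1,2]
  print (Set.member 2 a)        -- True  (element?)
  print (Set.null Set.empty)    -- True  (empty?)
  print (Set.insert 5 a)        -- fromList [1,2,3,5]  (adjoin)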

Is this implementation tail-recursive?

I read in an algorithms book that the Ackermann function cannot be made tail-recursive (what they say is "it can't be transformed into an iteration"). I'm pretty perplexed about this, so I tried and came up with this:
let ackb m n =
  let rec rAck cont m n =
    match (m, n) with
    | 0, n -> cont (n + 1)
    | m, 0 -> rAck cont (m - 1) 1
    | m, n -> rAck (fun x -> rAck cont (m - 1) x) m (n - 1)
  in rAck (fun x -> x) m n
;;
(it's OCaml / F# code).
My problem is, I'm not sure that this is actually tail-recursive. Could you confirm that it is? If not, why? And while we're at it, what does it mean when people say that the Ackermann function is not primitive recursive?
Thanks!
Yes, it is tail-recursive. Every function can be made tail-recursive by an explicit transformation to continuation-passing style (CPS).
This does not mean that the function will execute in constant memory: you build stacks of continuations that must be allocated. It may be more efficient to defunctionalize the continuations, representing that data as a simple algebraic datatype.
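For instance, every continuation built above has the shape fun x -> rAck cont (m - 1) x, so each frame is fully described by the integer m - 1, and the whole continuation chain can be flattened into a plain int list. A sketch of that defunctionalization (ack_defun is an illustrative name):

let ack_defun m n =
  let rec go stack m n =
    match m, n with
    | 0, n -> apply stack (n + 1)
    | m, 0 -> go stack (m - 1) 1
    | m, n -> go ((m - 1) :: stack) m (n - 1)
  and apply stack x =
    match stack with
    | [] -> x
    | m :: rest -> go rest m x
  in
  go [] m n

It computes the same results as the CPS version, but the pending work is now visible as data: the list plays exactly the role of the call stack that the original recursion would have used.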
Being primitive recursive is a very different notion, related to the expressiveness of a certain restricted form of recursive definition used in mathematical theory, but probably not very relevant to computer science as you know it: primitive recursive definitions are of very reduced expressiveness, while systems with function composition (starting with Gödel's System T), such as all current programming languages, are much more powerful.
In terms of programming languages, primitive recursive functions roughly correspond to programs without general recursion, where all loops/iterations are statically bounded (the number of possible repetitions is known in advance).
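As an illustrative contrast (a sketch, not a formal definition): addition on naturals is primitive recursive, because the recursion is bounded by the structure of its first argument, whereas Ackermann grows faster than any function definable under such a bounded scheme, which is exactly what "not primitive recursive" means.

(* primitive-recursive in spirit: the loop count is bounded by m *)
let rec add m n =
  if m = 0 then n else add (m - 1) (n + 1)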
Yes.
By definition, any recursive function can be transformed into an iteration as long as it has access to an unbounded stack-like construct. The interesting question is whether it can be done without a stack or any other unbounded data storage.
A tail-recursive function can be turned into such an iteration only if the size of its arguments is bounded. In your example (and almost any recursive function that uses continuations), the cont parameter is, for all intents and purposes, a stack that can grow to any size. Indeed, the entire point of continuation-passing style is to store data usually present on the call stack ("what to do after I return?") in a continuation parameter instead.
