How to optimize the speed of a numerical library in Haskell

I have released a small numerical library for solving delay differential equations: http://github.com/masterdezign/dde
The main technical constraints:
Make the library flexible with respect to the number of dynamical variables (x(t), y(t), ...), i.e. the State of the dynamical system at a given moment in time.
Easily integrate with libraries that exploit Data.Vector.Storable (e.g. hmatrix). Therefore, as input/output I extensively employ Data.Vector.Storable.
As a result, unlike in this solution:
How do I optimize numerical integration performance in Haskell (with example),
newtype State = State { _state :: V.Vector Double } is used
instead of data State = State {-# UNPACK #-} !Double {-# UNPACK #-} !Double. However, the library now runs twice as slowly.
The question: is there any way to combine the speed of data State = State {-# UNPACK #-}... with the flexibility of newtype State = State { _state :: V.Vector Double } for an unspecified number of variables? Should I consider Template Haskell to generate UNPACK-style data declarations at compile time?

I wouldn't commit to any particular vector implementation. Variable-length types like Data.Vector are a bad choice, not only because the extra length information is considerable overhead when the dimension of the space is low, but also because you lose any type-system guarantee that the dimensions match.
Instead, you should make everything parametric in the choice of vector space. I.e., you effectively make the dimension a compile-time variable, and you also allow vector types with meaningfully-named subvariables.
import Data.VectorSpace
import Data.AdditiveGroup
newtype Stepper1 state = Stepper1 {
    _stepper :: Double
             -> RHS state  -- parameterised in a similar way
             -> state
             -> (Double, Double)
             -> (Double, Double)
             -> state
  }
rk₄ :: VectorSpace v => Stepper1 v
rk₄ = Stepper1 _rk4
 where _rk4 h rhs' y₀ ... = y₀ ^+^ (h/6)*^(k₁ ^+^ 2*^k₂ ^+^ 2*^k₃ ^+^ k₄)
        where k₁ = rhs' (y₀, ...)
              k₂ = rhs' (y₀ ^+^ (h/2)*^k₁, ...)
              k₃ = rhs' (y₀ ^+^ (h/2)*^k₂, ...)
              k₄ = rhs' (y₀ ^+^ h*^k₃, ...)
Then, which particular implementation it will be can be chosen by the user. For a 2-dimensional vector, a standard choice is V2 from the linear package; it is re-exported with VectorSpace instances from free-vector-spaces. But for testing, you could also just use plain old tuples, which have a VectorSpace instance right in vector-space. Of course, wrapping the types from hmatrix would also be possible, but this would really not be good for performance – better to just convert the final results, if necessary.
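For instance (a minimal sketch I'm adding, not part of the original answer; the oscillator and its step function are made up for illustration), plain tuples already support the ^+^ / *^ vocabulary the stepper above relies on:
import Data.VectorSpace

-- A damped harmonic oscillator state: (position, velocity).
type OscState = (Double, Double)

-- Right-hand side of the ODE, written against the VectorSpace vocabulary.
oscRhs :: OscState -> OscState
oscRhs (x, v) = (v, negate x - 0.1 * v)

-- One explicit-Euler step, using the same ^+^ / *^ operations as the
-- RK4 stepper sketched above.
eulerStep :: Double -> OscState -> OscState
eulerStep h y = y ^+^ h *^ oscRhs y

main :: IO ()
main = print (iterate (eulerStep 0.01) (1, 0) !! 1000)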
To get the best performance, you may need to use some {-# INLINE #-} pragmas. Bang patterns OTOH don't usually bring much performance advantage – most important is that the types are strict, and unboxed. It's certainly no use to pre-emptively put a bang before every variable definition – those won't have any effect anyway, because a CAF is only evaluated when it's used.
I would be excited to hear about the performance you get in the end! If it's notably worse than your original State {-# UNPACK #-} !Double {-# UNPACK #-} !Double, that's something we should investigate.

Related

Iterate State Monad and Collect Results in Sequence with Good Performance

I implemented the following function:
iterateState :: Int -> (a -> State s a) -> (a -> State s [a])
iterateState 0 f a = return []
iterateState n f a = do
  b <- f a
  xs <- iterateState (n - 1) f b
  return $ b : xs
My primary use case is for a = Double. It works, but it is very slow. It allocates 528MB of heap space to produce a list of 1M Double values and spends most of its time doing garbage collection.
I have experimented with implementations that work on the type s -> (a, s) directly as well as with various strictness annotations. I was able to reduce the heap allocation somewhat, but not even close to what one would expect from a reasonable implementation. I suspect that the resulting ([a], s) being a combination of something to be consumed lazily ([a]) and something whose WHNF forces the entire computation (s) makes optimization difficult for GHC.
Assuming that the iterative nature of lists would be unsuitable for this situation, I turned to the vector package. To my delight, it already contains
iterateNM :: (Monad m, Unbox a) => Int -> (a -> m a) -> a -> m (Vector a)
Unfortunately, this is only slightly faster than my list implementation, still allocating 328MB of heap space. I assumed that this is because it uses unstreamM, whose description reads
Load monadic stream bundle into a newly allocated vector. This function goes through a list, so prefer using unstream, unless you need to be in a monad.
Looking at its behavior for the list monad, it is understandable that there is no efficient implementation for general monads. Luckily, I only need the state monad, and I found another function that almost fits the signature of the state monad.
unfoldrExactN :: Unbox a => Int -> (b -> (a, b)) -> b -> Vector a
This function is blazingly fast and performs no excess heap allocation beyond the 8MB needed to hold the resulting unboxed vector of 1M Double values. Unfortunately, it does not return the final state at the end of the computation, so it cannot be wrapped in the State type.
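For concreteness, a minimal usage sketch (mine, not from the question), assuming Data.Vector.Unboxed exports unfoldrExactN with the signature quoted above:
import qualified Data.Vector.Unboxed as U

-- Produces the million values with no excess allocation, but the final
-- running state (the last s) is discarded by the combinator, which is
-- exactly the limitation described above.
million :: U.Vector Double
million = U.unfoldrExactN 1000000 (\s -> (s, s + 1)) 0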
I looked at the implementation of unfoldrExactN to see if I could adjust it to expose the final state at the end of the computation. Unfortunately, this seems to be difficult, as the stream constructed by
unfoldrExactN :: Monad m => Int -> (s -> (a, s)) -> s -> Stream m a
which is eventually expanded into a vector by unstream has already forgotten the state type s.
I imagine I could circumvent the entire Stream infrastructure and implement iterateState directly on mutable vectors in the ST monad (similarly to how unstream expands a stream into a vector). However, I would lose all the benefits of stream fusion, as well as turning a computation that is easily expressed as a pure function into imperative low-level mush just for performance reasons. This is particularly frustrating while knowing that the existing unfoldrExactN already calculates all the values I want, but I have no access to them.
Is there a better way?
Can this function be implemented in a purely functional way with reasonable performance and no excess heap allocations? Preferably in a way that ties into the vector package and its stream fusion infrastructure.
The following program has 12MB max residency on my computer when compiled with optimizations:
import Data.Vector.Unboxed
import Data.Vector.Unboxed.Mutable
iterateNState :: Unbox a => Int -> (a -> s -> (s, a)) -> (a -> s -> (s, Vector a))
iterateNState n f a0 s0 = createT (unsafeNew n >>= go 0 a0 s0) where
  go i a s arr
    | i >= n = pure (s, arr)
    | otherwise = do
        unsafeWrite arr i a
        case f a s of
          (s', a') -> go (i+1) a' s' arr

main = id
     . print
     . Data.Vector.Unboxed.sum
     . snd
     $ iterateNState 1000000 (\a s -> (s+1, a+s :: Int)) 0 0
(It continues to have a nice low residency even when the final two 0s are read from input dynamically.)
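If you want the State-monad interface the question started from, one possible wrapper (a sketch using the transformers/mtl State type; iterateStateV is a name I'm introducing, not part of the answer) is:
import Control.Monad.State (State, runState, state)
import Data.Tuple (swap)

-- iterateNState's step returns (state, value) and its result is
-- (state, vector), while State expects (value, state), hence the two swaps.
iterateStateV :: Unbox a => Int -> (a -> State s a) -> a -> State s (Vector a)
iterateStateV n f a0 = state $ \s0 ->
  swap (iterateNState n (\a s -> swap (runState (f a) s)) a0 s0)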

Efficiency of unfoldr versus zipWith

Over on Code Review, I answered a question about a naive Haskell fizzbuzz solution by suggesting an implementation that iterates forward, avoiding the quadratic cost of the increasing number of primes and discarding modulo division (almost) entirely. Here's the code:
fizz :: Int -> String
fizz = const "fizz"
buzz :: Int -> String
buzz = const "buzz"
fizzbuzz :: Int -> String
fizzbuzz = const "fizzbuzz"
fizzbuzzFuncs = cycle [show, show, fizz, show, buzz, fizz, show, show, fizz, buzz, show, fizz, show, show, fizzbuzz]
toFizzBuzz :: Int -> Int -> [String]
toFizzBuzz start count =
  let offsetFuncs = drop (mod (start - 1) 15) fizzbuzzFuncs
  in take count $ zipWith ($) offsetFuncs [start..]
As a further prompt, I suggested rewriting it using Data.List.unfoldr. The unfoldr version is an obvious, simple modification to this code so I'm not going to type it here unless people seeking to answer my question insist that is important (no spoilers for the OP over on Code Review). But I do have a question about the relative efficiency of the unfoldr solution compared to the zipWith one. While I am no longer a Haskell neophyte, I am no expert on Haskell internals.
An unfoldr solution does not require the [start..] infinite list, since it can simply unfold from start. My thoughts are
The zipWith solution does not memoize each successive element of [start..] as it is asked for. Each element is used and discarded because no reference to the head of [start..] is kept. So there is no more memory consumed there than with unfoldr.
Discussions about the performance of unfoldr, and the recent patches to make it always inline, are conducted at a level which I have not yet reached.
So I think the two are equivalent in memory consumption but have no idea about the relative performance. Hoping more informed Haskellers can direct me towards an understanding of this.
unfoldr seems a natural thing to use to generate sequences, even if other solutions are more expressive. I just know I need to understand more about its actual performance. (For some reason I find foldr much easier to comprehend on that level.)
Note: unfoldr's use of Maybe was the first potential performance issue that occurred to me, before I even started investigating the issue (and the only bit of the optimisation/inlining discussions that I fully understood). So I was able to stop worrying about Maybe right away (given a recent version of Haskell).
As the one responsible for the recent changes in the implementations of zipWith and unfoldr, I figured I should probably take a stab at this. I can't really compare them so easily, because they're very different functions, but I can try to explain some of their properties and the significance of the changes.
unfoldr
Inlining
The old version of unfoldr (before base-4.8/GHC 7.10) was recursive at the top level (it called itself directly). GHC never inlines recursive functions, so unfoldr was never inlined. As a result, GHC could not see how it interacted with the function it was passed. The most troubling effect of this was that the function passed in, of type (b -> Maybe (a, b)), would actually produce Maybe (a, b) values, allocating memory to hold the Just and (,) constructors. By restructuring unfoldr as a "worker" and a "wrapper", the new code allows GHC to inline it and (in many cases) fuse it with the function passed in, so the extra constructors are stripped away by compiler optimizations.
For example, under GHC 7.10, the code
module Blob where
import Data.List
bloob :: Int -> [Int]
bloob k = unfoldr go 0 where
  go n | n == k    = Nothing
       | otherwise = Just (n * 2, n + 1)
compiled with ghc -O2 -ddump-simpl -dsuppress-all -dno-suppress-type-signatures leads to the core
$wbloob :: Int# -> [Int]
$wbloob =
  \ (ww_sYv :: Int#) ->
    letrec {
      $wgo_sYr :: Int# -> [Int]
      $wgo_sYr =
        \ (ww1_sYp :: Int#) ->
          case tagToEnum# (==# ww1_sYp ww_sYv) of _ {
            False -> : (I# (*# ww1_sYp 2)) ($wgo_sYr (+# ww1_sYp 1));
            True -> []
          }; } in
    $wgo_sYr 0

bloob :: Int -> [Int]
bloob =
  \ (w_sYs :: Int) ->
    case w_sYs of _ { I# ww1_sYv -> $wbloob ww1_sYv }
Fusion
The other change to unfoldr was rewriting it to participate in "fold/build" fusion, an optimization framework used in GHC's list libraries. The idea of both "fold/build" fusion and the newer, differently balanced, "stream fusion" (used in the vector library) is that if a list is produced by a "good producer", transformed by "good transformers", and consumed by a "good consumer", then the list conses never actually need to be allocated at all. The old unfoldr was not a good producer, so if you produced a list with unfoldr and consumed it with, say, foldr, the pieces of the list would be allocated (and immediately become garbage) as computation proceeded. Now, unfoldr is a good producer, so you can write a loop using, say, unfoldr, filter, and foldr, and not (necessarily) allocate any memory at all.
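For reference, the fusing definition is structured roughly like this (a paraphrase from memory, written as unfoldr' so it doesn't clash with Data.List; see base for the authoritative source):
import GHC.Exts (build)

-- The recursion lives in a local worker, and the whole thing is wrapped in
-- build so that foldr/build fusion can eliminate the intermediate list.
unfoldr' :: (b -> Maybe (a, b)) -> b -> [a]
{-# INLINE unfoldr' #-}
unfoldr' f b0 = build $ \cons nil ->
  let go b = case f b of
        Just (a, b') -> a `cons` go b'
        Nothing      -> nil
  in go b0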
For example, given the above definition of bloob, and a stern {-# INLINE bloob #-} (this stuff is a bit fragile; good producers sometimes need to be inlined explicitly to be good), the code
hooby :: Int -> Int
hooby = sum . bloob
compiles to the GHC core
$whooby :: Int# -> Int#
$whooby =
  \ (ww_s1oP :: Int#) ->
    letrec {
      $wgo_s1oL :: Int# -> Int# -> Int#
      $wgo_s1oL =
        \ (ww1_s1oC :: Int#) (ww2_s1oG :: Int#) ->
          case tagToEnum# (==# ww1_s1oC ww_s1oP) of _ {
            False -> $wgo_s1oL (+# ww1_s1oC 1) (+# ww2_s1oG (*# ww1_s1oC 2));
            True -> ww2_s1oG
          }; } in
    $wgo_s1oL 0 0

hooby :: Int -> Int
hooby =
  \ (w_s1oM :: Int) ->
    case w_s1oM of _ { I# ww1_s1oP ->
      case $whooby ww1_s1oP of ww2_s1oT { __DEFAULT -> I# ww2_s1oT }
    }
which has no lists, no Maybes, and no pairs; the only allocation it performs is the Int used to store the final result (the application of I# to ww2_s1oT). The entire computation can reasonably be expected to be performed in machine registers.
zipWith
zipWith has a bit of a weird story. It fits into the fold/build framework a bit awkwardly (I believe it works quite a bit better with stream fusion). It is possible to make zipWith fuse with either its first or its second list argument, and for many years, the list library tried to make it fuse with either, if either was a good producer. Unfortunately, making it fuse with its second list argument can make a program less defined under certain circumstances. That is, a program using zipWith could work just fine when compiled without optimization, but produce an error when compiled with optimization. This is not a great situation. Therefore, as of base-4.8, zipWith no longer attempts to fuse with its second list argument. If you want it to fuse with a good producer, that good producer had better be in the first list argument.
Specifically, the reference implementation of zipWith leads to the expectation that, say, zipWith (+) [1,2,3] (1 : 2 : 3 : undefined) will give [2,4,6], because it stops as soon as it hits the end of the first list. With the previous zipWith implementation, if the second list looked like that but was produced by a good producer, and if zipWith happened to fuse with it rather than the first list, then it would go boom.
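Written out, the reference implementation in question looks like this (named zipWithRef here so it doesn't clash with the Prelude):
-- Stops as soon as the first list is exhausted; once that happens, the
-- remainder of the second list is never forced.
zipWithRef :: (a -> b -> c) -> [a] -> [b] -> [c]
zipWithRef f (x:xs) (y:ys) = f x y : zipWithRef f xs ys
zipWithRef _ _      _      = []

-- e.g. zipWithRef (+) [1,2,3] (1 : 2 : 3 : undefined)  ==  [2,4,6]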

Space leak with wrapper around Data.ByteString.Builder

I have a wrapper type around Data.ByteString.Builder that allows me to track the length of the ByteString being built (cf. my previous question):
import Data.Monoid
import qualified Data.ByteString.Builder as B
import System.IO (stdout)

data LBuilder = LBuilder { toBuilder :: !B.Builder
                         , lbLength  :: !Int }

instance Monoid LBuilder where
  mempty = LBuilder mempty 0
  (LBuilder x1 l1) `mappend` (LBuilder x2 l2) =
    LBuilder (x1 <> x2) (l1 + l2)

char c = LBuilder (B.char7 c) 1

hPutLBuilder h = B.hPutBuilder h . toBuilder
As far as I understand it, this should be roughly as efficient as using Builder directly. But trying the following test case seems to reveal a space leak:
parts = replicate 10000000 $ char 'x'
main = hPutLBuilder stdout $ mconcat parts
Running this code takes a few seconds and consumes around 250MB of memory. Doing the same task with Builder is far faster and needs only 40KB. The memory profile shows that all the extra space is taken up by instances of BuildStep and Builder, which does not happen when using Builder directly.
What makes this code so inefficient? Why does it not happen when using Builder?
Edit:
Michael's answer below led me to look at how parts is actually evaluated.
After playing around some more, I rewrote the test code in the following way:
makeStuff !acc 0 = acc
makeStuff !acc i = makeStuff (acc <> char 'x') (i - 1)
stuff = makeStuff mempty 10000000
-- stuffOld = mconcat $ replicate 10000000 $ char 'x'
main = hPutLBuilder stdout stuff
Using this definition, the performance and memory usage for Builder and LBuilder are exactly the same (i.e. horrible :-). So it looks like the original version is so fast when using Builder because the compiler can somehow rewrite mconcat $ replicate n $ char c into something like B.lazyByteString $ L.replicate n (toAscii c) at compile time, instead of composing 10000000 functions on the heap at runtime. I tried to confirm this by looking at the generated core. I could tell that:
The definition for stuffOld is a call to a relatively short function that does something with types in Data.ByteString.Builder.Internal.
The definition for stuff is a call to makeStuff.
Said core was not meant to be understood by mere mortals.
So I guess this is just the benchmark being a pathological case, and the actual performance problem in my application lies somewhere else.
One issue is that forcing evaluation of the mconcat parts expression forces evaluation of its lbLength, which in turn forces evaluation of all the individual char 'x' values, and that is where the space leak seems to come from. However, the only way I've found to get the performance of your code to match the original Builder is to use a newtype around B.Builder. Even just data LBuilder = LBuilder !B.Builder introduces significant overhead.
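For comparison, a minimal sketch of that newtype variant (the names NBuilder and charN are mine, not from the post): the wrapper has no runtime representation of its own, so mconcat over it behaves essentially as it does for Builder directly, and the length would have to be tracked by other means.
import qualified Data.ByteString.Builder as B

newtype NBuilder = NBuilder { toNBuilder :: B.Builder }

instance Semigroup NBuilder where
  NBuilder x <> NBuilder y = NBuilder (x <> y)

instance Monoid NBuilder where
  mempty = NBuilder mempty

charN :: Char -> NBuilder
charN = NBuilder . B.char7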

Can I always convert mutable-only algorithms to single-assignment and still be efficient?

The Context
The context of this question is that I want to play around with Gene Expression Programming (GEP), a form of evolutionary algorithm, using Erlang. GEP makes use of a string based DSL called 'Karva notation'. Karva notation is easily translated into expression parse trees, but the translation algorithm assumes an implementation having mutable objects: incomplete sub-expressions are created early-on the translation process and their own sub-expressions are filled-in later-on with values that were not known at the time they were created.
The purpose of Karva notation is that it guarantees syntactically correct expressions are created without any expensive encoding techniques or corrections of genetic code. The problem is that with a single-assignment programming language like Erlang, I have to recreate the expression tree continually as each sub expression gets filled in. This takes an inexpensive - O(n)? - update operation and converts it into one that would complete in exponential time (unless I'm mistaken). If I can't find an efficient functional algorithm to convert K-expressions into expression trees, then one of the compelling features of GEP is lost.
The Question
I appreciate that the K-expression translation problem is pretty obscure, so what I want is advice on how to convert an inherently-non-functional algorithm (alg that exploits mutable data structures) into one that does not. How do pure functional programming languages adapt many of the algorithms and data structures that were produced in the early days of computer science that depend on mutability to get the performance characteristics they need?
Carefully designed immutability avoids unnecessary updating
Immutable data structures are only an efficiency problem if they're constantly changing, or you build them up the wrong way. For example, continually appending more to the end of a growing list is quadratic, whereas concatenating a list of lists is linear. If you think carefully, you can usually build up your structure in a sensible way, and lazy evaluation is your friend - hand out a promise to work it out and stop worrying.
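For instance (an illustration I'm adding, not from the original answer), the difference already shows up with plain (++):
-- Growing a list by repeatedly appending to its end re-traverses the
-- accumulator on every step; concatenating the chunks at the end is linear.
quadraticConcat, linearConcat :: [[a]] -> [a]
quadraticConcat = foldl (\acc chunk -> acc ++ chunk) []  -- O(n^2) overall
linearConcat    = foldr (++) []                          -- O(n), same as concat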
Blindly trying to replicate an imperative algorithm can be inefficient, but you're mistaken in your assertion that functional programming has to be asymptotically bad here.
Case study: pure functional GEP: Karva notation in linear time
I'll stick with your case study of parsing Karva notation for GEP. (I've played with this solution more fully in this answer.)
Here's a fairly clean pure functional solution to the problem. I'll take the opportunity to name drop some good general recursion schemes along the way.
Code
(Importing Data.Tree supplies data Tree a = Node {rootLabel :: a, subForest :: Forest a} where type Forest a = [Tree a].)
import Data.Tree
import Data.Tree.Pretty -- from the pretty-tree package for visualising trees
arity :: Char -> Int
arity c
  | c `elem` "+*-/" = 2
  | c `elem` "Q"    = 1
  | otherwise       = 0
A hylomorphism is the composition of an anamorphism (build up, unfoldr) and a catamorphism (combine, foldr).
These terms were introduced to the FP community in the seminal paper Functional Programming with Bananas, Lenses and Barbed Wire.
We're going to pull the levels out (ana/unfold) and combine them back together (cata/fold).
hylomorphism :: b -> (a -> b -> b) -> (c -> (a, c)) -> (c -> Bool) -> c -> b
hylomorphism base combine pullout stop seed = hylo seed where
  hylo s | stop s    = base
         | otherwise = combine new (hylo s')
    where (new, s') = pullout s
To pull out a level, we use the total arity from the previous level to find where to split off this new level, and pass on the total arity for this one ready for next time:
pullLevel :: (Int,String) -> (String,(Int,String))
pullLevel (n,cs) = (level, (total, cs')) where
  (level, cs') = splitAt n cs
  total        = sum $ map arity level
To combine a level (as a String) with the level below (that's already a Forest), we just pull off the number of trees that each character needs.
combineLevel :: String -> Forest Char -> Forest Char
combineLevel "" [] = []
combineLevel (c:cs) levelBelow = Node c subforest : combineLevel cs theRest
  where (subforest, theRest) = splitAt (arity c) levelBelow
Now we can parse the Karva using a hylomorphism. Note that we seed it with a total arity from outside the string of 1, since there's only one node at the root level. Correspondingly we apply head to the result to get this singleton back out after the hylomorphism.
karvaToTree :: String -> Tree Char
karvaToTree cs = let
    zero (n,_) = n == 0
  in head $ hylomorphism [] combineLevel pullLevel zero (1,cs)
Linear Time
There's no exponential blowup, nor repeated O(log(n)) lookups or expensive modifications, so we shouldn't be in too much trouble.
arity is O(1)
splitAt part is O(part)
pullLevel (part,cs) is O(part): O(part) to grab the level using splitAt, plus O(part) for the map arity level, so O(part) overall
combineLevel (c:cs) is O(arity c) for the splitAt, and O(sum $ map arity cs) for the recursive call
hylomorphism [] combineLevel pullLevel zero (1,cs)
makes a pullLevel call for each level, so the total pullLevel cost is O(sum parts) = O(n)
makes a combineLevel call for each level, so the total combineLevel cost is O(sum $ map arity levels) = O(n), since the total arity of the entire input is bound by n for valid strings.
makes O(#levels) calls to zero (which is O(1)), and #levels is bound by n, so that's below O(n) too
Hence karvaToTree is linear in the length of the input.
I think that puts to rest the assertion that you needed to use mutability to get a linear algorithm here.
Demo
Let's have a draw of the results (because Tree is so full of syntax it's hard to read the output!). You have to cabal install pretty-tree to get Data.Tree.Pretty.
see :: Tree Char -> IO ()
see = putStrLn.drawVerticalTree.fmap (:"")
ghci> karvaToTree "Q/a*+b-cbabaccbac"
Node {rootLabel = 'Q', subForest = [Node {rootLabel = '/', subForest = [Node {rootLabel = 'a', subForest = []},Node {rootLabel = '*', subForest = [Node {rootLabel = '+', subForest = [Node {rootLabel = '-', subForest = [Node {rootLabel = 'b', subForest = []},Node {rootLabel = 'a', subForest = []}]},Node {rootLabel = 'c', subForest = []}]},Node {rootLabel = 'b', subForest = []}]}]}]}
ghci> see $ karvaToTree "Q/a*+b-cbabaccbac"
      Q
      |
      /
      |
  --------
 /        \
 a        *
          |
        ------
       /      \
       +      b
       |
     -----
    /     \
    -     c
    |
   ----
   /  \
   b  a
which matches the output expected from the tutorial where I found the example.
There isn't a single way to do this; it really has to be attempted case by case. I typically try to break them down into simpler operations using fold and unfold and then optimize from there. The Karva decoding case is a breadth-first tree unfold, as others have noted, so I started with unfoldTreeM_BF. Perhaps there are similar functions in Erlang.
If the decoding operation is unreasonably expensive, you could memoize the decoding and share/reuse subtrees... though it probably wouldn't fit into a generic tree unfolder and you'd need to write a specialized function to do so. If the fitness function is slow enough, it may be fine to use a naive decoder like the one listed below. It will fully rebuild the tree on each invocation.
import Control.Monad.State.Lazy
import Data.Tree

type MaxArity = Int
type NodeType = Char

treeify :: MaxArity -> [Char] -> Tree NodeType
treeify maxArity (x:xs) = evalState (unfoldTreeM_BF (step maxArity) x) xs
treeify _        []     = error "empty list"

step :: MaxArity -> NodeType -> State [Char] (NodeType, [NodeType])
step maxArity node = do
  xs <- get
  -- figure out the actual child node count and use it instead of maxArity
  let (children, ys) = splitAt maxArity xs
  put ys
  return (node, children)

main :: IO ()
main = do
  let x = treeify 3 "0138513580135135135"
  putStr $ drawTree . fmap (:[]) $ x
  return ()
There are a couple of solutions when mutable state in functional programming is required.
Use a different algorithm that solves the same problem. E.g. quicksort is generally regarded as a mutable algorithm and may therefore be less useful in a functional setting, but mergesort is generally better suited to one. I can't tell if this option is possible or makes sense in your case.
Even functional programming languages usually provide some way to mutate state. (This blog post seems to show how to do it in Erlang.) For some algorithms and data structures this is indeed the only available option (there's active research on the topic, I think); for example hash tables in functional programming languages are generally implemented with mutable state.
In your case, I'm not so sure immutability really leads to a performance bottleneck. You are right, the (sub)tree will be recreated on update, but the Erlang implementation will probably reuse all the subtrees that haven't changed, leading to O(log n) complexity per update instead of O(1) with mutable state. Also, the nodes of the trees won't be copied but instead the references to the nodes, which should be relatively efficient. You can read about tree updates in a functional setting in e.g. the thesis from Okasaki or in his book "Purely Functional Data Structures" based on the thesis. I'd try implementing the algorithm with an immutable data structure and switch to a mutable one if you have a performance problem.
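A small sketch of that path-copying behaviour (illustrative, not from the original answer): updating one position rebuilds only the nodes on the path to it, and every untouched subtree is shared with the old tree.
data BTree a = BLeaf | BNode (BTree a) a (BTree a)

-- Follow a path (False = go left, True = go right) and replace the value there.
updateAt :: [Bool] -> a -> BTree a -> BTree a
updateAt _      x BLeaf         = BNode BLeaf x BLeaf
updateAt []     x (BNode l _ r) = BNode l x r
updateAt (b:bs) x (BNode l v r)
  | b         = BNode l v (updateAt bs x r)   -- left subtree reused as-is
  | otherwise = BNode (updateAt bs x l) v r   -- right subtree reused as-is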
Also see some relevant SO questions here and here.
I think I figured out how to solve your particular problem with the K trees (the general problem is too hard :P). My solution is presented in some horrible sort of hybrid Python-like pseudocode (I am very slow on my FP today), but it doesn't change a node after you create one (the trick is building the tree bottom-up).
First, we need to find which nodes belong to which level:
levels currsize nodes =
this_level , rest = take currsize from nodes, whats left
next_size = sum of the arities of the nodes
return [this_level | levels next_size rest]
(initial currsize is 1)
So in the +/*abcd example, this should give you [+, /*, abcd]. Now you can convert this into a tree bottom-up:
curr_trees = last level
for level in reverse(levels except the last)
next_trees = []
for root in level:
n = arity of root
trees, curr_trees = take n from curr_trees, whats left
next_trees.append( Node(root, trees) )
curr_trees = next_trees
curr_trees should be a list with the single root node now.
I am pretty sure we can convert this into single assignment Erlang/Haskell very easily now.
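For what it's worth, a possible Haskell rendering of that pseudocode (the karva-prefixed names are mine, not from the answer; it ends up very close to the hylomorphism solution above):
import Data.Tree

-- Arity table for the example alphabet (same convention as the earlier answer).
karvaArity :: Char -> Int
karvaArity c
  | c `elem` "+*-/" = 2
  | c == 'Q'        = 1
  | otherwise       = 0

-- Split the string into levels; each level's width is the total arity of
-- the level above (the root level has width 1).
karvaLevels :: Int -> String -> [String]
karvaLevels 0 _  = []
karvaLevels _ [] = []
karvaLevels n cs = level : karvaLevels (sum (map karvaArity level)) rest
  where (level, rest) = splitAt n cs

-- Bottom-up pass: give each node of a level the trees it needs from the
-- level below; a right fold over the levels plays the role of the loop.
attach :: String -> Forest Char -> Forest Char
attach []     _     = []
attach (c:cs) below = Node c sub : attach cs rest
  where (sub, rest) = splitAt (karvaArity c) below

karvaToTree' :: String -> Tree Char
karvaToTree' cs = head $ foldr attach [] (karvaLevels 1 cs)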

Haskell's algebraic data types

I'm trying to fully understand all of Haskell's concepts.
In what ways are algebraic data types similar to generic types, e.g., in C# and Java? And how are they different? What's so algebraic about them anyway?
I'm familiar with universal algebra and its rings and fields, but I only have a vague idea of how Haskell's types work.
Haskell's algebraic data types are named such since they correspond to an initial algebra in category theory, giving us some laws, some operations and some symbols to manipulate. We may even use algebraic notation for describing regular data structures, where:
+ represents sum types (disjoint unions, e.g. Either).
• represents product types (e.g. structs or tuples)
X for the singleton type (e.g. data X a = X a)
1 for the unit type ()
and μ for the least fixed point (e.g. recursive types), usually implicit.
with some additional notation:
X² for X•X
In fact, you might say (following Brent Yorgey) that a Haskell data type is regular if it can be expressed in terms of 1, X, +, •, and a least fixed point.
With this notation, we can concisely describe many regular data structures:
Units: data () = ()
1
Options: data Maybe a = Nothing | Just a
1 + X
Lists: data [a] = [] | a : [a]
L = 1+X•L
Binary trees: data BTree a = Empty | Node a (BTree a) (BTree a)
B = 1 + X•B²
Other operations hold (taken from Brent Yorgey's paper, listed in the references):
Expansion: unfolding the fixed point can be helpful for thinking about lists. L = 1 + X + X² + X³ + ... (that is, lists are either empty, or they have one element, or two elements, or three, or ...)
Composition, ◦: given types F and G, the composition F ◦ G is a type which builds "F-structures made out of G-structures" (e.g. R = X • (L ◦ R), where L is lists, is a rose tree).
Differentiation: the derivative of a data type D (written D′) is the type of D-structures with a single "hole", that is, a distinguished location not containing any data. Amazingly, this satisfies the same rules as differentiation in calculus (a small list-zipper sketch follows the rules below):
1′ = 0
X′ = 1
(F + G)′ = F′ + G′
(F • G)′ = F • G′ + F′ • G
(F ◦ G)′ = (F′ ◦ G) • G′
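As a small illustration (mine, not from the original answer): differentiating L = 1 + X•L and solving gives L′ = L•L, i.e. a list with one hole is a pair of lists, the elements before the hole and the elements after it.
-- A one-hole context for lists, sometimes called a list zipper frame.
data ListHole a = ListHole [a] [a]   -- (reversed prefix, suffix)

-- Plugging an element back into the hole recovers a complete list.
plugHole :: a -> ListHole a -> [a]
plugHole x (ListHole before after) = reverse before ++ x : after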
References:
Species and Functors and Types, Oh My!, Brent A. Yorgey, Haskell’10, September 30, 2010, Baltimore, Maryland, USA
Clowns to the left of me, jokers to the right (Dissecting Data Structures), Conor McBride, POPL 2008
"Algebraic Data Types" in Haskell support full parametric polymorphism, which is the more technically correct name for generics, as a simple example the list data type:
data List a = Cons a (List a) | Nil
is equivalent (as much as is possible, and ignoring non-strict evaluation, etc.) to
class List<a> {
  class Cons : List<a> {
    a head;
    List<a> tail;
  }
  class Nil : List<a> {}
}
Of course Haskell's type system allows more ... interesting use of type parameters, but this is just a simple example. With regards to the "algebraic type" name, I've honestly never been entirely sure of the exact reason for them being named that, but have assumed that it's due to the mathematical underpinnings of the type system. I believe that the reason boils down to the theoretical definition of an ADT being the "product of a set of constructors"; however, it's been a couple of years since I escaped university, so I can no longer remember the specifics.
[Edit: Thanks to Chris Conway for pointing out my foolish error, ADT are of course sum types, the constructors providing the product/tuple of fields]
In universal algebra, an algebra consists of some sets of elements (think of each set as the set of values of a type) and some operations, which map elements to elements.
For example, suppose you have a type of "list elements" and a type of "lists". As operations you have the "empty list", which is a 0-argument function returning a "list", and a "cons" function which takes two arguments, a "list element" and a "list", and produces a "list".
At this point there are many algebras that fit the description, as two undesirable things may happen:
There could be elements in the "list" set which cannot be built from the "empty list" and the "cons" operation, so-called "junk". This could be lists starting from some element that fell from the sky, or loops without a beginning, or infinite lists.
The results of "cons" applied to different arguments could be equal, e.g. consing an element onto a non-empty list could be equal to the empty list. This is sometimes called "confusion".
An algebra which has neither of these undesirable properties is called initial, and this is the intended meaning of the algebraic data type.
The name initial derives from the property that there is exactly one homomorphism from the initial algebra to any given algebra. Essentially you can evaluate the value of a list by applying the operations in the other algebra, and the result is well-defined.
It gets more complicated for polymorphic types ...
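To make the homomorphism remark concrete (a sketch I'm adding, not part of the original answer): for lists, that unique map is exactly the fold.
data List a = Nil | Cons a (List a)

-- For any target algebra (nil, cons) there is exactly one
-- structure-preserving map out of List a, namely this fold.
foldList :: b -> (a -> b -> b) -> List a -> b
foldList nil _    Nil         = nil
foldList nil cons (Cons x xs) = cons x (foldList nil cons xs)

-- Example: evaluating a list in the "length algebra" (nil = 0, cons = succ).
lengthAlg :: List a -> Int
lengthAlg = foldList 0 (\_ n -> 1 + n)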
A simple reason why they are called algebraic: there are both sum (logical disjunction) and product (logical conjunction) types. A sum type is a discriminated union, e.g.:
data Bool = False | True
A product type is a type with multiple parameters:
data Pair a b = Pair a b
In OCaml, "product" is made more explicit:
type ('a, 'b) pair = Pair of 'a * 'b
Haskell's datatypes are called "algebraic" because of their connection to categorical initial algebras. But that way lies madness.
#olliej: ADTs are actually "sum" types. Tuples are products.
#Timbo:
You are basically right about it being sort of like an abstract Tree class with three derived classes (Empty, Leaf, and Node), but you would also need to enforce the guarantee that someone using your Tree class can never add any new derived classes, since the strategy for using the Tree data type is to write code that switches at runtime based on the type of each element in the tree (and adding new derived types would break existing code). You can sort of imagine this getting nasty in C# or C++, but in Haskell, ML, and OCaml, this is central to the language design and syntax, so coding style supports it in a much more convenient manner, via pattern matching.
ADT (sum types) are also sort of like tagged unions or variant types in C or C++.
Old question, but no one's mentioned nullability, which is an important aspect of algebraic data types, perhaps the most important. Since each value must be one of the alternatives, exhaustive case-based pattern matching is possible.
For me, the concept of Haskell's algebraic data types always looked like polymorphism in OO-languages like C#.
Look at the example from http://en.wikipedia.org/wiki/Algebraic_data_types:
data Tree = Empty
          | Leaf Int
          | Node Tree Tree
This could be implemented in C# as a TreeNode base class, with a derived Leaf class and a derived TreeNodeWithChildren class, and if you want even a derived EmptyNode class.
(OK I know, nobody would ever do that, but at least you could do it.)
