Implementing longest path algorithm in Haskell

I would like some help implementing a longest path algorithm in Haskell. I've only used Haskell for about two weeks and haven't done anything in a functional language before. I get really lost trying to implement algorithms in a functional language, where I'm limited to immutable data and recursion.
I've been trying to implement this algorithm: http://www.geeksforgeeks.org/find-longest-path-directed-acyclic-graph/
My graph is constructed like this:
data Graph w = Graph { vertices :: [(Char, w)],
                       edges    :: [(Char, Char, w)] } deriving Show
So I have weights on both vertices and edges, and the weights can be any datatype. Therefore I also need to take two functions, f and g, when computing the longest path: the weight of the longest path from vertex a to b is then the sum of f(w) over the vertex weights and g(w) over the edge weights along the path.
I have tried implementing this but I always find myself trying to code the "imperative" way, which gets really ugly really fast...
Please point me in the right direction.
weight_of_longest_path :: (Ord w) => Graph w -> Char -> Char
                       -> (w -> w) -> (w -> w) -> w
weight_of_longest_path (Graph v w) startVert endVert f g =
    let
        topSort   = dropWhile (/= startVert) $ topological_ordering (Graph v w)
        distList  = zip topSort $
                        (snd $ head $ filter (\(a,b) -> a == startVert) v)
                        : (repeat (-999999999))
        finalList = getFinalList (Graph v w) topSort distList f g
    in
        snd $ head $ filter (\(a,b) -> b == endVert) finalList
getFinalList :: (Ord w) => Graph w -> [Char] -> [(Char, w)]
             -> (w -> w) -> (w -> w) -> [(Char, w)]
getFinalList _ [] finalList _ _ = finalList
getFinalList (Graph v w) (firstVert:rest) distList f g =
    let
        neighbours = secondNodes $ filter (\(a,b,w) -> a == firstVert) w
        finalList  = updateList firstVert neighbours distList (Graph v w) f g
    in
        getFinalList (Graph v w) rest finalList f g

updateList :: (Ord w) => Char -> [Char] -> [(Char, w)] -> Graph w
           -> (w -> w) -> (w -> w) -> [(Char, w)]
updateList _ [] updatedList _ _ _ = updatedList
updateList firstVert (neighbour:rest) distList (Graph vertices weights) f g =
    let
        edgeWeight    = selectThird $ head
                        $ filter (\(a,b,w) -> a == firstVert && b == neighbour) weights
        verticeWeight = snd $ head
                        $ filter (\(a,b) -> a == neighbour) vertices
        newDist       = calcDist firstVert neighbour verticeWeight edgeWeight
                                 distList f g
        updatedList   = replace distList neighbour newDist
    in
        updateList firstVert rest updatedList (Graph vertices weights) f g
calcDist :: (Ord w) => Char -> Char -> w -> w -> [(Char, w)]
         -> (w -> w) -> (w -> w) -> w
calcDist firstVert neighbour verticeWeight edgeWeight distList f g =
    if (compareTo f g
            (snd $ head $ filter (\(a,b) -> a == neighbour) distList)
            (snd $ head $ filter (\(a,b) -> a == firstVert) distList)
            edgeWeight verticeWeight) == True
    then
        (f (snd $ head $ filter (\(a,b) -> a == firstVert) distList))
        + (g edgeWeight) + (f verticeWeight)
    else
        (f (snd $ head $ filter (\(a,b) -> a == neighbour) distList))

replace :: [(Char, w)] -> Char -> w -> [(Char, w)]
replace distList vertice value =
    map (\p@(f, _) -> if f == vertice then (vertice, value) else p)
        distList
As you can see it's a lot of messy code for such a simple algorithm, and I'm sure it's doable in a much cleaner way.

Here is an approach that employs a more "functional" way of thinking. It revolves around two functions:
longestPath :: Graph -> Node -> Node -> [Edge]
pathCost :: Graph -> [Edge] -> Int
longestPath returns the longest path as a list of edges; pathCost returns the cost of a path.
The definition of longestPath goes something like this:
longestPath g start end
  | start == end = []
  | otherwise =
      maximumBy (comparing (pathCost g))
                [ e : path | e <- edges of the node start,
                             let start' = the other vertex of e,
                             let g'     = graph g with node start deleted,
                             let path   = longestPath g' start' end ]
(maximumBy comes from Data.List and comparing from Data.Ord)
N.B. The edges will be generated in reverse order.
There are a host of implementation details to figure out, and, in particular, you'll have to modify this slightly to handle the case where there is no path from start to end (which can happen once you start deleting nodes), but this is the approach I would start with.
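To make the outline above concrete, here is a minimal sketch of this approach. It uses simplified, hypothetical types (Char nodes and a single Int weight per edge, not the question's Graph w with two weight functions), and it returns Maybe so the missing-path case mentioned above is handled:
import Data.List  (maximumBy)
import Data.Maybe (mapMaybe)
import Data.Ord   (comparing)

-- Hypothetical simplified types for this sketch (not the question's Graph w):
type Node = Char
type Edge = (Node, Node, Int)        -- (from, to, weight)
newtype Graph = Graph [Edge]

pathCost :: Graph -> [Edge] -> Int
pathCost _ es = sum [w | (_, _, w) <- es]

-- Nothing when no path exists, which can happen once nodes are deleted.
longestPath :: Graph -> Node -> Node -> Maybe [Edge]
longestPath g@(Graph es) start end
  | start == end = Just []
  | otherwise    = case mapMaybe extend outgoing of
      []    -> Nothing
      paths -> Just (maximumBy (comparing (pathCost g)) paths)
  where
    outgoing           = [e | e@(a, _, _) <- es, a == start]
    -- "delete" the start node by dropping every edge touching it
    g'                 = Graph [e | e@(a, b, _) <- es, a /= start, b /= start]
    extend e@(_, b, _) = (e :) <$> longestPath g' b end
Deleting the start node before each recursive call is what guarantees termination here, even on graphs that contain cycles.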

Related

Haskell NB: ‘Edge’ is a non-injective type family

I am trying to write my own graph library in Haskell for use in Advent of Code. I am trying to use a class for graphs and one concrete implementation using Data.Map. I am trying to write Dijkstra's algorithm, but I am running into some trouble with type families. I have the following typeclass and concrete implementation:
{-# LANGUAGE TypeFamilies, AllowAmbiguousTypes, ScopedTypeVariables, TypeFamilyDependencies #-}

class Graph g where
  type Node g
  type Edge g
  nodeSet    :: g -> S.Set (Node g)
  neighbours :: g -> (Node g) -> Maybe [(Edge g, Node g)]

data MapGraph e n = MapGraph { mGraph :: M.Map n [(e, n)] } deriving Show

instance (Num e, Ord e, Ord n) => Graph (MapGraph e n) where
  type Node (MapGraph e n) = n
  type Edge (MapGraph e n) = e
  nodeSet mapGraph = S.fromList $ M.keys $ mGraph mapGraph
  neighbours mapGraph node = M.lookup node (mGraph mapGraph)
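For reference, a small hypothetical value of this type (assuming Data.Map is imported qualified as M and Data.Set as S, as in the snippets above) could look like:
exampleGraph :: MapGraph Int String
exampleGraph = MapGraph $ M.fromList
    [ ("a", [(1, "b"), (4, "c")])
    , ("b", [(2, "c")])
    , ("c", [])
    ]

-- neighbours exampleGraph "a" == Just [(1, "b"), (4, "c")]
-- nodeSet exampleGraph        == S.fromList ["a", "b", "c"]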
To represent the Infinity value of unvisited nodes in Dijkstra's algorithm I have created a sum data type:
data MaxBoundedNum a = Inf | Num a deriving Show
I am trying to work on the recursive function for the algorithm which will take in the graph, the current node, the destination node, the unvisited set, and a map of nodes and their length from the source node. The following skeleton function seems to be what I want:
go :: (Graph g) =>
      g -> (Node g) -> (Node g) ->
      S.Set (Node g) ->
      M.Map (Node g) (MaxBoundedNum (Edge g)) ->
      Maybe (M.Map (Node g) (MaxBoundedNum (Edge g)))
go graph curr dest uset vals = do
    currNeighbours <- neighbours graph curr
    undefined
This appears to work correctly for a graph g where graph :: MapGraph Int String
go graph
  :: [Char]
     -> [Char]
     -> S.Set [Char]
     -> M.Map [Char] (MaxBoundedNum Int)
     -> Maybe (M.Map [Char] (MaxBoundedNum Int))
The next part of my go function needs to look up the current distance from the vals map.
currDist <- M.lookup curr vals
This works outside the go function if I do the following:
currDist = M.lookup current vals
*Main> :t currDist
currDist :: Maybe (MaxBoundedNum Integer)
However, inside the do block I get this error:
Could not deduce (Ord (Node g)) arising from a use of ‘M.lookup’
  from the context: Graph g
    bound by the type signature for:
              go :: forall g.
                    Graph g =>
                    g
                    -> Node g
                    -> Node g
                    -> S.Set (Node g)
                    -> M.Map (Node g) (MaxBoundedNum (Edge g))
                    -> Maybe (M.Map (Node g) (MaxBoundedNum (Edge g)))
      at WithClass.hs:(96,1)-(100,49)
• In a stmt of a 'do' block: currDist <- M.lookup curr vals
The part Could not deduce made me think I need to give it a type annotation, so I did that:
currDist <- M.lookup curr vals :: Maybe (MaxBoundedNum (Edge g))
But that gives me this error:
WithClass.hs:102:15: error:
    • Couldn't match type ‘Edge g’ with ‘Edge g1’
      Expected type: Maybe (MaxBoundedNum (Edge g1))
        Actual type: Maybe (MaxBoundedNum (Edge g))
      NB: ‘Edge’ is a non-injective type family
    • In a stmt of a 'do' block:
        currDist <- M.lookup curr vals :: Maybe (MaxBoundedNum (Edge g))
      In the expression:
        do currDist <- M.lookup curr vals :: Maybe (MaxBoundedNum (Edge g))
           currNeighbours <- neighbours graph curr
           undefined
      In an equation for ‘go’:
          go graph curr dest uset vals
            = do currDist <- M.lookup curr vals ::
                               Maybe (MaxBoundedNum (Edge g))
                 currNeighbours <- neighbours graph curr
                 undefined
    • Relevant bindings include
        vals :: M.Map (Node g) (MaxBoundedNum (Edge g))
          (bound at WithClass.hs:101:25)
        uset :: S.Set (Node g) (bound at WithClass.hs:101:20)
        dest :: Node g (bound at WithClass.hs:101:15)
        curr :: Node g (bound at WithClass.hs:101:10)
        graph :: g (bound at WithClass.hs:101:4)
        go :: g
              -> Node g
              -> Node g
              -> S.Set (Node g)
              -> M.Map (Node g) (MaxBoundedNum (Edge g))
              -> Maybe (M.Map (Node g) (MaxBoundedNum (Edge g)))
          (bound at WithClass.hs:101:1)
I had a look at this question, but the accepted answer just said to add the TypeFamilyDependencies language extension, which doesn't appear to do anything for me. What am I doing wrong and how can I fix my code? Thank you in advance.
Operations on Set (Node g) and Map (Node g) require you to be able to compare nodes. This is what the Ord (Node g) constraint represents. The type you gave for go says it works for any definition of Node g, even those that can't be compared. The error tells you that when you use M.lookup, that needs an Ord (Node g) constraint, but there is no way to satisfy it.
You can satisfy that constraint by adding it to the signature of go:
{-# LANGUAGE FlexibleContexts #-} -- Also enable this extension
go :: (Graph g, Ord (Node g)) => ...
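With that constraint added (and FlexibleContexts enabled), the skeleton from the question type checks unchanged; a minimal sketch:
go :: (Graph g, Ord (Node g)) =>
      g -> Node g -> Node g ->
      S.Set (Node g) ->
      M.Map (Node g) (MaxBoundedNum (Edge g)) ->
      Maybe (M.Map (Node g) (MaxBoundedNum (Edge g)))
go graph curr dest uset vals = do
    currDist       <- M.lookup curr vals    -- Ord (Node g) is now available here
    currNeighbours <- neighbours graph curr
    undefined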

Breadth-First Search using State monad in Haskell

Recently I asked a question on Stack Overflow about building a DFS tree from a graph, and learned that it can be implemented simply by using the State monad.
DFS in haskell
While DFS only requires tracking visited nodes, so a Set, a List, or some other linear data structure suffices, BFS needs both a 'visited' structure and a queue.
My pseudocode for BFS is
Q = empty queue
T = empty Tree
mark all nodes except u as unvisited
while Q is nonempty do
    u = deq(Q)
    for each vertex v ∈ Adj(u)
        if v is not visited
        then add edge (u,v) to T
             Mark v as visited and enq(v)
As can be inferred from the pseudocode, we only have to do three things per iteration:
1. dequeue a point from the queue
2. add all unvisited neighbours of the point to the current tree's children, the queue, and the 'visited' list
3. repeat this for the next point in the queue
Since we are not using a recursive traversal for BFS, we need some other looping construct, such as a while loop. I've looked at the loop-while package on Hackage, but it seems somewhat deprecated.
What I assume is that I require some sort of code like this:
{-...-}
... = evalState (bfs) ((Set.singleton start), [start])
  where
    neighbors x = Map.findWithDefault [] x adj
    bfs = do (vis, x:queue) <- get
             map (\neighbor ->
                    if (Set.member neighbor vis)
                    then put (vis, queue)
                    else put ((Set.insert neighbor vis), queue ++ [neighbor]) >> (addToTree neighbor)
                 ) neighbors x
             (vis, queue) <- get
             while (length queue > 0)
I understand that this implementation is very erroneous, but it should give a minimalistic view of how I think BFS should be implemented. Also, I really don't know how to avoid using a while loop in a do block (i.e. should I use a recursive algorithm to get around it, or should I think of a completely different strategy?).
Considering one of the answers to my previous question linked above, it seems like the answer should look like this:
newtype Graph a = Graph (Map.Map a [a]) deriving (Ord, Eq, Show)
data Tree a = Tree a [Tree a] deriving (Ord, Eq, Show)

bfs :: (Ord a) => Graph a -> a -> Tree a
bfs (Graph adj) start = evalState (bfs') ((Set.singleton start), [start])
  where
    bfs' = {-part where I don't know-}
Finally, if such an implementation of BFS using the State monad is impossible for some reason (which I believe is not the case), please correct my false assumption.
I've seen some examples of BFS in Haskell that don't use the State monad, but I want to learn more about how the State monad can be used, and I couldn't find any examples of BFS implemented with it.
Thanks in advance.
EDIT:
I came up with some sort of algorithm using the State monad, but it falls into an infinite loop.
bfs :: (Ord a) => Graph a -> a -> Tree a
bfs (Graph adj) start = evalState (bfs' (Graph adj) start) (Set.singleton start)

bfs' :: (Ord a) => Graph a -> a -> State (Set.Set a) (Tree a)
bfs' (Graph adj) point = do
    vis <- get
    let neighbors x = Map.findWithDefault [] x adj
    let addableNeighbors (x:xs) = if Set.member x vis
                                  then addableNeighbors (xs)
                                  else x : addableNeighbors (xs)
    let addVisited (vis) (ns) = Set.union (vis) $ Set.fromList ns
    let newVisited = addVisited vis $ addableNeighbors $ neighbors point
    put newVisited
    return (Tree point $ map (flip evalState newVisited)
                             (map (bfs' (Graph adj)) $ addableNeighbors $ neighbors point))
EDIT2: At some expense of space complexity, I came up with a solution that gets the BFS graph, using a graph to accumulate the return value and a queue to process nodes. Despite not being the optimal solution for generating a BFS tree/graph, it will work.
bfs :: (Ord a) => Graph a -> a -> Graph a
bfs (Graph adj) start =
    evalState (bfs' (Graph adj) (Graph (Map.empty)) [start]) (Set.singleton start)

bfs' :: (Ord a) => Graph a -> Graph a -> [a] -> State (Set.Set a) (Graph a)
bfs' _ (Graph ret) [] = return (Graph ret)
bfs' (Graph adj) (Graph ret) (p:points) = do
    vis <- get
    let neighbors x = Map.findWithDefault [] x adj
    let addableNeighbors ns
          | null ns   = []
          | otherwise = if Set.member (head ns) vis
                        then addableNeighbors (tail ns)
                        else (head ns) : addableNeighbors (tail ns)
    let addVisited (v) (ns) = Set.union (v) $ Set.fromList ns
    let unVisited  = addableNeighbors $ neighbors p
    let newVisited = addVisited vis unVisited
    let unionGraph (Graph g1) (Graph g2) = Graph (Map.union g1 g2)
    put newVisited
    bfs' (Graph adj) (unionGraph (Graph ret) (Graph (Map.singleton p unVisited))) (points ++ unVisited)
EDIT3: I've added a convert function from graph to tree. Running the functions in EDIT2 and EDIT3 will yield a BFS tree. It is not the best algorithm computation-time-wise, but I believe it is intuitive and easy to understand for newbies like me :)
graphToTree :: (Ord a) => Graph a -> a -> Tree a
graphToTree (Graph adj) point = Tree point $ map (graphToTree (Graph adj)) $ neighbors point
  where neighbors x = Map.findWithDefault [] x adj
Converting a graph into a Tree breadth-first is a bit more difficult than simply searching the graph breadth-first. If you are searching the graph, you only ever need to return from a single branch. When converting the graph into a tree, the result needs to include results from multiple branches.
We can use a more general type than Graph a for what we can search or convert to trees. We can search or convert to trees anything with a function a -> [a]. For a Graph we'd use the function (Map.!) m, where m is the Map. Searching with a transposition table has a signature like
breadthFirstSearchUnseen :: Ord r =>
    (a -> r)    ->  -- how to compare `a`s
    (a -> Bool) ->  -- where to stop
    (a -> [a])  ->  -- where you can go from an `a`
    [a]         ->  -- where to start
    Maybe [a]
Converting such a function to a tree that contains each reachable node at the earliest depth has a signature like
shortestPathTree :: Ord r =>
    (a -> r)   ->  -- how to compare `a`s
    (a -> l)   ->  -- what label to put in the tree
    (a -> [a]) ->  -- where you can go from an `a`
    a          ->  -- where to start
    Tree l
We can slightly more generally start at any number of nodes and build a Forest that contains each reachable node at the earliest depth.
shortestPathTrees :: Ord r =>
    (a -> r)   ->  -- how to compare `a`s
    (a -> l)   ->  -- what label to put in the tree
    (a -> [a]) ->  -- where you can go from an `a`
    [a]        ->  -- where to start
    [Tree l]
Searching
Performing the conversion to a tree doesn't really help us search; we can perform breadth-first searches on the original graph.
import Data.Sequence (viewl, ViewL (..), (><))
import qualified Data.Sequence as Seq
import qualified Data.Set as Set

breadthFirstSearchUnseen :: Ord r => (a -> r) -> (a -> Bool) -> (a -> [a]) -> [a] -> Maybe [a]
breadthFirstSearchUnseen repr p expand = combine Set.empty Seq.empty []
  where
    combine seen queued ancestors unseen =
        go (seen   `Set.union` (Set.fromList . map repr $ unseen))
           (queued ><          (Seq.fromList . map ((,) ancestors) $ unseen))
    go seen queue =
        case viewl queue of
            EmptyL -> Nothing
            (ancestors, a) :< queued ->
                if p a
                then Just . reverse $ ancestors'
                else combine seen queued ancestors' unseen
              where
                ancestors' = a : ancestors
                unseen     = filter (flip Set.notMember seen . repr) . expand $ a
The state maintained in the above search algorithm is a Seq queue of what nodes to visit next and a Set of nodes that have already been seen. If we instead kept track of nodes that have already been visited, then we could visit the same node multiple times if we find multiple paths to the node at the same depth. There's a more complete explanation in the answer I wrote this breadth first search for.
We can easily write searching Graphs in terms of our general search.
import qualified Data.Map as Map
newtype Graph a = Graph (Map.Map a [a]) deriving (Ord, Eq, Show)
bfsGraph :: (Ord a) => Graph a -> (a -> Bool) -> [a] -> Maybe [a]
bfsGraph (Graph adj) test = breadthFirstSearchUnseen id test ((Map.!) adj)
We can also write how to search Trees themselves.
import Data.Tree
bfsTrees :: (Ord a) => (a -> Bool) -> [Tree a] -> Maybe [a]
bfsTrees test = fmap (map rootLabel) . breadthFirstSearchUnseen rootLabel (test . rootLabel) subForest
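For example (a small hypothetical graph; the expected result below is worked out by hand, so treat it as illustrative):
searchExample :: Graph Char
searchExample = Graph $ Map.fromList
    [ ('a', "bc"), ('b', "d"), ('c', "d"), ('d', "") ]

-- bfsGraph searchExample (== 'd') "a"  ~>  Just "abd"   (the path a -> b -> d)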
Building trees
Building trees breadth-first is a lot more difficult. Fortunately Data.Tree already provides ways to build Trees in breadth first order from a monadic unfold. The breadth first order will take care of the queuing, we will only need to keep track of the state for the nodes we've already seen.
unfoldTreeM_BF has the type Monad m => (b -> m (a, [b])) -> b -> m (Tree a). m is the Monad our computations will be in, b is the type of data we are going to build the tree based on, and a is the type for the labels of the tree. In order to use it to build a tree we need to make a function b -> m (a, [b]). We're going to rename a to l for label, and b to a, which is what we've been using for our nodes. We need to make an a -> m (l, [a]). For m, we'll use the State monad from transformers to keep track of some state; the state will be the Set of nodes whose representation r we've already seen; we'll be using the State (Set.Set r) monad. Overall, we need to provide a function a -> State (Set.Set r) (l, [a]).
expandUnseen :: Ord r => (a -> r) -> (a -> l) -> (a -> [a]) -> a -> State (Set.Set r) (l, [a])
expandUnseen repr label expand a = do
    seen <- get
    let unseen = filter (flip Set.notMember seen . repr) . uniqueBy repr . expand $ a
    put . Set.union seen . Set.fromList . map repr $ unseen
    return (label a, unseen)
To build the trees, we run the state computation built by unfoldForestM_BF
shortestPathTrees :: Ord r => (a -> r) -> (a -> l) -> (a -> [a]) -> [a] -> [Tree l]
shortestPathTrees repr label expand = run . unfoldForestM_BF k . uniqueBy repr
  where
    run = flip evalState Set.empty
    k   = expandUnseen repr label expand
uniqueBy is a nubBy that takes advantage of an Ord instance instead of Eq.
uniqueBy :: Ord r => (a -> r) -> [a] -> [a]
uniqueBy repr = go Set.empty
  where
    go seen [] = []
    go seen (x:xs) =
        if Set.member (repr x) seen
        then go seen xs
        else x : go (Set.insert (repr x) seen) xs
We can write building shortest path trees from Graphs in terms of our general shortest path tree building.
shortestPathsGraph :: Ord a => Graph a -> [a] -> [Tree a]
shortestPathsGraph (Graph adj) = shortestPathTrees id id ((Map.!) adj)
We can do the same for filtering a Forest to only the shortest paths through the Forest.
shortestPathsTree :: Ord a => [Tree a] -> [Tree a]
shortestPathsTree = shortestPathTrees rootLabel rootLabel subForest
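As a quick check of the tree building (again a hypothetical graph, with the result written out by hand using the Node constructor):
pathsExample :: Graph Char
pathsExample = Graph $ Map.fromList
    [ ('a', "bc"), ('b', "d"), ('c', "d"), ('d', "") ]

-- shortestPathsGraph pathsExample "a"
--   ~> [Node 'a' [Node 'b' [Node 'd' []], Node 'c' []]]
-- 'd' is reachable through both 'b' and 'c', but it appears only once,
-- at its earliest depth.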
My solution is based on working level-by-level (with respect to BFS); see also this question and answer.
The general idea is: assume we already know the sets of visited elements prior to each level of our BFS, as a list of sets. Then we can traverse the graph, level by level, updating our list of sets and constructing the output Tree on the way.
The trick is that after such a level-by-level traversal, we'll have the sets of visited elements after each level. And this is the same as the list before each level, just shifted by one. So by tying the knot, we can use the shifted output as the input for the procedure.
import Control.Monad.State
import qualified Data.Map as M
import Data.Maybe (fromMaybe, catMaybes)
import qualified Data.Set as S
import Data.Tree
newtype Graph a = Graph (M.Map a [a])
    deriving (Ord, Eq, Show)
tagBfs :: (Ord a) => Graph a -> a -> Maybe (Tree a)
tagBfs (Graph g) s = let (t, sets) = runState (thread s) (S.empty : sets)
                     in t
  where
    thread x = do
        sets@(s : subsets) <- get
        case M.lookup x g of
            Just vs | not (S.member x s) -> do
                -- recursively create sub-nodes and update the subsets list
                let (nodes, subsets') = runState
                        (catMaybes `liftM` mapM thread vs) subsets
                -- put the new combined list of sets
                put (S.insert x s : subsets')
                -- .. and return the node
                return . Just $ Node x nodes
            _ -> return Nothing -- node not in the graph, or already visited
Running tagBfs example2 'b' on the following example
example2 :: Graph Char
example2 = Graph $ M.fromList
    [ ('a', ['b', 'c', 'd'])
    , ('b', ['a'])
    , ('c', [])
    , ('d', [])
    ]
yields
Just (Node {rootLabel = 'b',
            subForest = [Node {rootLabel = 'a',
                               subForest = [Node {rootLabel = 'c',
                                                  subForest = []},
                                            Node {rootLabel = 'd',
                                                  subForest = []}
                                           ]}
                        ]}
     )

Haskell: Optimising Graph processing algorithm

This is a follow up to this post, with code now based on Structuring Depth-First Search Algorithms in Haskell by King and Launchbury from the 1990s. That paper suggests a generate-and-prune strategy, but uses a mutable array with a State Monad (with some syntax that I suspect has since been deprecated). The authors hint that a set could be used for remembering visited nodes, at the cost of an additional O(log n) factor. I tried to implement this with a set (we have better machines now than they did in the 1990s!), to use modern State Monad syntax, and to use Vectors rather than arrays (as I read that this is normally better).
As before, my code runs on small data sets, but fails to return on the 5m edge graph I need to analyse, and I'm looking for hints only as to the weakness operating at scale. What I do know is that the code operates comfortably within memory, so that is not the problem, but have I inadvertently slipped to O(n²)? (By contrast, the official implementation of this paper in the Data.Graph library (which I have lately also borrowed some code from) uses a mutable Array but fails on the big data set with a ... Stack Overflow!!!)
So now I have a Vector data store with IntSet State that does not complete and an Array with ST Monad Array 'official' one that crashes! Haskell should be able to do better than this?
import Data.Vector (Vector)
import qualified Data.IntSet as IS
import qualified Data.Vector as V
import qualified Data.ByteString.Char8 as BS
import Control.Monad.State
type Vertex = Int
type Table a = Vector a
type Graph = Table [Vertex]
type Edge = (Vertex, Vertex)
data Tree a = Node a (Forest a) deriving (Show,Eq)
type Forest a = [Tree a]
-- ghc -O2 -threaded --make
-- +RTS -Nx
generate :: Graph -> Vertex -> Tree Vertex
generate g v = Node v $ map (generate g) (g V.! v)
chop :: Forest Vertex -> State IS.IntSet (Forest Vertex)
chop [] = return []
chop (Node x ts : us) = do
    visited <- contains x
    if visited then
        chop us
    else do
        include x
        x1 <- chop ts
        x2 <- chop us
        return (Node x x1 : x2)
prune :: Forest Vertex -> State IS.IntSet (Forest Vertex)
prune vs = chop vs
main = do
    --edges <- V.fromList `fmap` getEdges "testdata.txt"
    edges <- V.fromList `fmap` getEdges "SCC.txt"
    let
        -- calculate size of five largest SCC
        maxIndex = fst $ V.last edges
        gr       = buildG maxIndex edges
        sccRes   = scc gr
        big5     = take 5 sccRes
        big5'    = map (\l -> length $ postorder l) big5
    putStrLn $ show $ big5'
contains :: Vertex -> State IS.IntSet Bool
contains v = state $ \visited -> (v `IS.member` visited, visited)
include :: Vertex -> State IS.IntSet ()
include v = state $ \visited -> ((), IS.insert v visited)
getEdges :: String -> IO [Edge]
getEdges path = do
    lines <- (map BS.words . BS.lines) `fmap` BS.readFile path
    let pairs = (map . map) (maybe (error "can't read Int") fst . BS.readInt) lines
    return [(a, b) | [a, b] <- pairs]
vertices :: Graph -> [Vertex]
vertices gr = [1.. (V.length gr - 1)]
edges :: Graph -> [Edge]
edges g = [(u,v) | u <- vertices g, v <- g V.! u]
-- accumulate :: (a -> b -> a) -> Vector a -> Vector (Int, b) -> Vector a
-- accumulating function f
-- initial vector (of length m)
-- vector of index/value pairs (of length n)
buildG :: Int -> Table Edge -> Graph
buildG maxIndex edges = graph'
  where
    graph  = V.replicate (maxIndex + 1) []
    --graph' = V.accumulate (\existing new -> new:existing) graph edges
    -- flip f takes its (first) two arguments in the reverse order of f
    graph' = V.accumulate (flip (:)) graph edges
mapT :: Ord a => (Vertex -> a -> b) -> Table a -> Table b
mapT = V.imap
outDegree :: Graph -> Table Int
outDegree g = mapT numEdges g
  where numEdges v es = length es
indegree :: Graph -> Table Int
indegree g = outDegree $ transposeG g
transposeG :: Graph -> Graph
transposeG g = buildG (V.length g - 1) (reverseE g)
reverseE :: Graph -> Table Edge
reverseE g = V.fromList [(w, v) | (v,w) <- edges g]
-- --------------------------------------------------------------
postorder :: Tree a -> [a]
postorder (Node a ts) = postorderF ts ++ [a]
postorderF :: Forest a -> [a]
postorderF ts = concat (map postorder ts)
postOrd :: Graph -> [Vertex]
postOrd g = postorderF (dff g)
dfs :: Graph -> [Vertex] -> Forest Vertex
dfs g vs = map (generate g) vs
dfs' :: Graph -> [Vertex] -> Forest Vertex
dfs' g vs = fst $ runState (prune d) $ IS.fromList []
  where d = dfs g vs
dff :: Graph -> Forest Vertex
dff g = dfs' g $ reverse (vertices g)
scc :: Graph -> Forest Vertex
scc g = dfs' g $ reverse $ postOrd (transposeG g)
Some small possible improvements:
Change
type Edge = (Vertex, Vertex)
to
data Edge = Edge {-# UNPACK #-} !Vertex {-# UNPACK #-} !Vertex
to reduce the memory usage for each edge from 7 words to 3 words and to improve cache locality. Reducing memory pressure almost always also improves runtime. As @jberryman mentioned, you could use an unboxed vector for Table Edge (then you don't need the above custom data type).
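A minimal sketch of that unboxed-vector variant (keeping the tuple Edge, since pairs of Ints have an Unbox instance); the construction site would then look something like this, converting back to a boxed vector where V.accumulate expects one, and other uses of the edge table (e.g. V.last in main, V.fromList in getEdges) would switch to their VU equivalents:
import qualified Data.Vector.Unboxed as VU

type EdgeTable = VU.Vector Edge    -- Edge = (Int, Int) is stored unboxed

buildG :: Int -> EdgeTable -> Graph
buildG maxIndex es =
    V.accumulate (flip (:)) (V.replicate (maxIndex + 1) []) (V.convert es)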
generate :: Graph -> Vertex -> Tree Vertex
generate g v = Node v $ map (generate g) (g V.! v)
If you're sure that the index is in bounds, you could use the unsafe indexing function V.unsafeIndex from vector instead of (V.!).
contains :: Vertex -> State IS.IntSet Bool
contains v = state $ \visited -> (v `IS.member` visited, visited)
Use a combination of get and put $! instead.
include :: Vertex -> State IS.IntSet ()
include v = state $ \visited -> ((), IS.insert v visited)
Use modify' instead.
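Concretely, those two helpers might end up looking something like this (a sketch; for the read-only contains, a plain gets already avoids the state lambda):
contains :: Vertex -> State IS.IntSet Bool
contains v = gets (IS.member v)       -- read-only: get/gets is enough

include :: Vertex -> State IS.IntSet ()
include v = modify' (IS.insert v)     -- strict modify avoids thunk build-up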
You're using quite a lot of lists in your program. Linked lists aren't the most memory/cache efficient data structures. See if you can convert your code to use more vectors.

Data Structure Differentiation, Intuition Building

According to this paper, differentiation works on data structures.
According to this answer:
The derivative of a data type D (given as D′) is the type of D-structures with a single “hole”, that is, a distinguished location not containing any data. Amazingly, these satisfy the same rules as differentiation in calculus.
The rules are:
1′ = 0
X′ = 1
(F + G)′ = F′ + G′
(F • G)′ = F • G′ + F′ • G
(F ◦ G)′ = (F′ ◦ G) • G′
The referenced paper is a bit too complex for me to get an intuition.
What does this mean in practice? A concrete example would be fantastic.
What's a one hole context for an X in an X? There's no choice: it's (-), representable by the unit type.
What's a one hole context for an X in an X*X? It's something like (-,x2) or (x1,-), so it's representable by X+X (or 2*X, if you like).
What's a one hole context for an X in an X*X*X? It's something like (-,x2,x3) or (x1,-,x3) or (x1,x2,-), representable by X*X + X*X + X*X (or 3*X^2, if you like).
More generally, an F*G with a hole is either an F with a hole and a G intact, or an F intact and a G with a hole.
Recursive datatypes are often defined as fixpoints of polynomials.
data Tree = Leaf | Node Tree Tree
is really saying Tree = 1 + Tree*Tree. Differentiating the polynomial tells you the contexts for immediate subtrees: no subtrees in a Leaf; in a Node, it's either hole on the left, tree on the right, or tree on the left, hole on the right.
data Tree' = NodeLeft () Tree | NodeRight Tree ()
That's the polynomial differentiated and rendered as a type. A context for a subtree in a tree is thus a list of those Tree' steps.
type TreeCtxt = [Tree']
type TreeZipper = (Tree, TreeCtxt)
Here, for example, is a function (untried code) which searches a tree for subtrees passing a given test.
search :: (Tree -> Bool) -> Tree -> [TreeZipper]
search p t = go (t, []) where
  go :: TreeZipper -> [TreeZipper]
  go z = here z ++ below z
  here :: TreeZipper -> [TreeZipper]
  here z@(t, _) | p t       = [z]
                | otherwise = []
  below (Leaf, _)      = []
  below (Node l r, cs) = go (l, NodeLeft () r : cs) ++ go (r, NodeRight l () : cs)
The role of "below" is to generate the inhabitants of Tree' relevant to the given Tree.
Differentiation of datatypes is a good way to make programs like "search" generic.
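For instance, the same context information lets you rebuild the original tree from any zipper that search returns (a small sketch on top of the Tree, Tree', TreeCtxt and TreeZipper definitions above):
plug :: TreeZipper -> Tree
plug (t, [])                  = t
plug (t, NodeLeft () r : cs)  = plug (Node t r, cs)   -- hole was on the left
plug (t, NodeRight l () : cs) = plug (Node l t, cs)   -- hole was on the right
Replacing the focused subtree before plugging gives you tree editing essentially for free.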
My interpretation is that the derivative (zipper) of T is the type of all instances that resemble the "shape" of T, but with exactly one element replaced by a "hole".
For instance, a list is
List t = 1        []
       + t        [a]
       + t^2      [a,b]
       + t^3      [a,b,c]
       + t^4      [a,b,c,d]
       + ...      [a,b,c,d,...]
if we replace any of those 'a', 'b', 'c' etc by a hole (represented as #), we'll get
List' t = 0         empty list doesn't have a hole
        + 1         [#]
        + 2*t       [#,b] or [a,#]
        + 3*t^2     [#,b,c] or [a,#,c] or [a,b,#]
        + 4*t^3     [#,b,c,d] or [a,#,c,d] or [a,b,#,d] or [a,b,c,#]
        + ...
Another example, a binary tree is
data Tree t = TEmpty | TNode t (Tree t) (Tree t)
-- Tree t = 1 + t (Tree t)^2
so adding a hole generates the type:
{-
Tree' t = 0                       empty tree doesn't have a hole
        + (Tree t)^2              the root is a hole, followed by 2 normal trees
        + t*(Tree' t)*(Tree t)    the left tree has a hole, the right is normal
        + t*(Tree t)*(Tree' t)    the left tree is normal, the right has a hole

        #          or        x          or        x
       / \                  / \                  / \
      a   b               #?   b                a   #?
     / \  / \             / \  / \             / \  / \
    c   d e  f          #?  #? e  f           c   d #? #?
-}
data Tree' t = THit   (Tree t)  (Tree t)
             | TLeft  t (Tree' t) (Tree t)
             | TRight t (Tree t)  (Tree' t)
A third example which illustrates the chain rule is the rose tree (variadic tree):
data Rose t = RNode t [Rose t]
-- R t = t*List(R t)
the derivative says R' t = List(R t) + t * List'(R t) * R' t, which means
{-
R' t = List (R t)       the root is a hole
     + t                we have a normal root node,
       * List' (R t)    and a list that has a hole,
       * R' t           and we put a holed rose tree at the list's hole

             x
             |
  [a,b,c,...,p,#?,r,...]
                |
            [#?,...]
-}
data Rose' t = RHit [Rose t] | RChild t (List' (Rose t)) (Rose' t)
Note that data List' t = LHit [t] | LTail t (List' t).
(These may be different from the conventional types where a zipper is a list of "directions", but they are isomorphic.)
The derivative is a systematic way to record how to encode a location in a structure, e.g. we can search with: (not quite optimized)
locateL :: (t -> Bool) -> [t] -> Maybe (t, List' t)
locateL _ [] = Nothing
locateL f (x:xs) | f x       = Just (x, LHit xs)
                 | otherwise = do
                       (el, ctx) <- locateL f xs
                       return (el, LTail x ctx)

locateR :: (t -> Bool) -> Rose t -> Maybe (t, Rose' t)
locateR f (RNode a child)
    | f a       = Just (a, RHit child)
    | otherwise = do
          (whichChild, listCtx) <- locateL (isJust . locateR f) child
          (el, ctx)             <- locateR f whichChild
          return (el, RChild a listCtx ctx)
and mutate (plug in the hole) using the context info:
updateL :: t -> List' t -> [t]
updateL x (LHit xs) = x:xs
updateL x (LTail a ctx) = a : updateL x ctx
updateR :: t -> Rose' t -> Rose t
updateR x (RHit child) = RNode x child
updateR x (RChild a listCtx ctx) = RNode a (updateL (updateR x ctx) listCtx)
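A tiny hypothetical usage example tying locate and update together (names and values invented for illustration; the result shown is worked out by hand):
exampleRose :: Rose Int
exampleRose = RNode 1 [RNode 2 [], RNode 3 [RNode 4 []]]

-- Find the subtree labelled 4 and replace that label with 40:
-- fmap (updateR 40 . snd) (locateR (== 4) exampleRose)
--   ~> Just (RNode 1 [RNode 2 [], RNode 3 [RNode 40 []]])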

Monads and custom traversal functions in Haskell

Given the following simple BST definition:
data Tree x = Empty | Leaf x | Node x (Tree x) (Tree x)
    deriving (Show, Eq)
inOrder :: Tree x -> [x]
inOrder Empty = []
inOrder (Leaf x) = [x]
inOrder (Node root left right) = inOrder left ++ [root] ++ inOrder right
I'd like to write an in-order function that can have side effects. I achieved that with:
inOrderM :: (Show x, Monad m) => (x -> m a) -> Tree x -> m ()
inOrderM f (Empty) = return ()
inOrderM f (Leaf y) = f y >> return ()
inOrderM f (Node root left right) = inOrderM f left >> f root >> inOrderM f right
-- print tree in order to stdout
inOrderM print tree
This works fine, but it seems repetitive: the same logic is already present in inOrder, and my experience with Haskell leads me to believe that I'm probably doing something wrong if I'm writing a similar thing twice.
Is there any way that I can write a single function inOrder that can take either pure or monadic functions?
In inOrder you are mapping a Tree x to a [x], i.e. you sequentialize your tree. Why not just use mapM or mapM_ on the resulting list?
mapM_ print $ inOrder tree
Just to remind the types of the functions I've mentioned:
mapM :: (Monad m) => (a -> m b) -> [a] -> m [b]
mapM_ :: (Monad m) => (a -> m b) -> [a] -> m ()
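So the monadic traversal from the question can be recovered from the pure one in a single line (a sketch; the name inOrderM' is just for illustration):
inOrderM' :: Monad m => (x -> m a) -> Tree x -> m ()
inOrderM' f = mapM_ f . inOrder

-- e.g. inOrderM' print tree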
You might want to look at implementing the Data.Traversable class or Data.Foldable class for your tree structure. Each only requires the definition of a single method.
In particular, if you implement the Data.Foldable class, you get the following two functions for free:
mapM_ :: (Foldable t, Monad m) => (a -> m b) -> t a -> m ()
toList :: Foldable t => t a -> [a]
It will also give you the rich set of functions (foldr, concatMap, any, ...) that you are used to using with the list type.
You only have to implement one of the following functions to create an instance of Data.Foldable:
foldMap :: Monoid m => (a -> m) -> t a -> m
foldr :: (a -> b -> b) -> b -> t a -> b
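For the Tree in the question, a foldMap-based instance is only a few lines (a sketch; with it, toList recovers the in-order list and mapM_ works directly on the tree):
instance Foldable Tree where
    foldMap _ Empty               = mempty
    foldMap f (Leaf x)            = f x
    foldMap f (Node x left right) = foldMap f left <> f x <> foldMap f right

-- toList tree       -- the in-order list, same as inOrder tree
-- mapM_ print tree  -- prints in order, no separate inOrderM needed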
