Linear ordering of directed multigraph of dependencies allowing for duplicates - algorithm

Problem description
Given vertices V which can be seen as named "propositions".
Given weights:
data W
= Requires -- ^ Denotes that a "proposition" depends on another.
| Invalidates -- ^ Denotes that a "proposition" invalidates another.
In a linear ordering, if A requires B, then B must come before A, conversely, if A invalidates B, then B must come after A.
Given a weighted directed multigraph (multidigraph) with at most 2 parallel edges... Where a vertex can only require the inclusion of another vertex once, and only invalidates another vertex once...
G = (V, E)
E = (V, V, W)
Or alternatively represented as a directed cyclic graph with no self-loops and where the only cycles form directly between one vertex and another. With weights changed to:
data W
= Requires -- ^ Denotes that a "proposition" depends on another.
| InvalidatedBy -- ^ Denotes that a "proposition" is invalidated by another.
Given that vertices may occur more than once in the ordering...
How can a linear ordering be constructed from such a graph?
Additionally, if the tail of the linear ordering ends with a vertex V which was included due to being InvalidatedBy another vertex, then it may be omitted if the head of the ordering starts with V.
Some desired properties are:
Minimality - there should be as little duplication of vertices as possible
Stability - the ordering should be as similar as possible to the order between vertices on the same "level" in which the graph was constructed
Run-time complexity - The number of vertices are not that high, but still... the run-time complexity should be as low as possible.
If various algorithms fulfill these to varying degrees, I'd love to see all of them with their trade offs.
Algorithms written in any language, or pseudocode, are welcome.
Example graphs:
Example graph 1:
B `requires` A
C `requires` A
D `requires` A
E `invalidates` A
F `invalidates` A
G `invalidates` A
With minimal linear ordering: [A, B, C, D, E, F, G]
Example graph 2:
C `requires` A
C `invalidates` A
B `requires` A
With minimal linear ordering: [A, B, C]
Example graph 3:
B `requires` A
B `invalidates` A
C `requires` A
C `invalidates` A
With minimal linear ordering: [A, B, A, C]
Naive implementation
A naive implementation constructs a linear ordering by starting with all nodes with no incoming edges and for all of those nodes:
fetches all outgoing edges
partitions those by requires/invalidates
constructs the linear ordering of "requires" and puts that first
adds the current node
constructs the linear ordering of "invalidates" and adds that.
Here's a Haskell implementation of this description:
import Data.List (partition)
import Data.Maybe (fromJust)
import Control.Arrow ((***))
import Data.Graph.Inductive.Graph
fboth :: Functor f => (a -> b) -> (f a, f a) -> (f b, f b)
fboth f = fmap f *** fmap f
outs :: Graph gr => gr a b -> Node -> (Adj b, a)
outs gr n = let (_, _, l, o) = fromJust $ fst $ match n gr in (o, l)
starts :: Graph gr => gr a b -> [(Adj b, a)]
starts gr = filter (not . null . fst) $ outs gr <$> nodes gr
partW :: Adj W -> (Adj W, Adj W)
partW = partition ((Requires ==) . fst)
linearize :: Graph gr => gr a W -> [a]
linearize gr = concat $ linearize' gr <$> starts gr
linearize' :: Graph gr => gr a W -> (Adj W, a) -> [a]
linearize' gr (o, a) = concat req ++ [a] ++ concat inv
where (req, inv) = fboth (linearize' gr . outs gr . snd) $ partW o
The ordering can then be optimized by removing equal consecutive like so:
-- | Remove consecutive elements which are equal to a previous element.
-- Runtime complexity: O(n), space: O(1)
removeConsequtiveEq :: Eq a => [a] -> [a]
removeConsequtiveEq = \case
[] -> []
[x] -> [x]
(h:t) -> h : ug h t
where
ug e = \case
[] -> []
(x:xs) | e == x -> ug x xs
(x:xs) | otherwise -> x : ug x xs
Edit: Using DCG, SCC, and topsort
With the algorithm described by #Cirdec :
Given a directed cyclic graph (DCG) where edges of form: (f, t) denote that f must come before t in the ordering.
Compute the condensation of the DCG in 1.
Turn each SSC in the condensation in 2. into a palindrome.
Compute the topsort of the graph in 3.
Concatenate the computed ordering.
In Haskell:
{-# LANGUAGE LambdaCase #-}
import Data.List (nub)
import Data.Maybe (fromJust)
import Data.Graph.Inductive.Graph
import Data.Graph.Inductive.PatriciaTree
import Data.Graph.Inductive.NodeMap
import Data.Graph.Inductive.Query.DFS
data MkEdge = MkEdge Bool Int Int
req = MkEdge True
inv = MkEdge False
toGraph :: [MkEdge] -> [(Int, Int, Bool)] -> Gr Int Bool
toGraph edges es = run_ empty nm
where ns = nub $ edges >>= \(MkEdge _ f t) -> [f, t]
nm = insMapNodesM ns >> insMapEdgesM es
-- | Make graph into a directed cyclic graph (DCG).
-- "Requires" denotes a forward edge.
-- "Invalidates" denotes a backward edge.
toDCG :: [MkEdge] -> Gr Int Bool
toDCG edges = toGraph edges $
(\(MkEdge w f t) -> if w then (t, f, w) else (f, t, w)) <$> edges
-- | Make a palindrome of the given list by computing: [1 .. n] ++ [n - 1 .. 1].
-- Runtime complexity: O(n).
palindrome :: [a] -> [a]
palindrome = \case
[] -> []
xs -> xs ++ tail (reverse xs)
linearize :: Gr Int a -> [Int]
linearize dcg = concat $ topsort' scc2
where scc = nmap (fmap (fromJust . lab dcg)) $ condensation dcg
scc2 = nmap palindrome scc
For the graph g2:
g2 = [ 2 `req` 1
, 2 `inv` 1
, 3 `req` 1
, 3 `inv` 1
, 4 `req` 1
, 5 `inv` 1
]
> prettyPrint $ toDCG g2
1:2->[(False,2)]
2:1->[(True,1),(True,3),(True,4)]
3:3->[(False,2)]
4:4->[]
5:5->[(False,2)]
> prettyPrint $ condensation $ toDCG g2
1:[5]->[((),2)]
2:[1,2,3]->[((),3)]
3:[4]->[]
> linearize $ toDCG g2
[5,2,1,3,1,2,4]
This ordering is neither minimal nor valid since the ordering violates the dependencies. 5 invalidates 1, which 2 depends on. 2 invalidates 1 which 4 depends on.
A valid and minimal ordering is: [1,4,2,1,3,5]. By shifting the list to the right, we get [5,1,4,2,1,3] which is also a valid ordering.
If the direction of the graph is flipped, the ordering becomes: [4,2,1,3,1,2,5]. This is not a valid ordering either... At the boundaries, 5 can happen, and then 4, but 5 invalidates 1 which 4 depends on.

I believe the following algorithm will find a minimal string of vertices in linear time:
Decompose the graph into its strongly connected components. Existing algorithms do this in linear time.
In each strongly connected component each node needs to be listed both before and after every other node. List the nodes [1..n] of each strongly connected component in the following order [1..n] ++ [n-1..1]
Concatenate the strongly connected components together in order by a topological sort. Existing algorithms topologically sort directed acylic graphs like this in linear time.

Related

Generate all unique directed graphs with 2 inputs to each node

I'm trying to generate all unique digraphs that fit a spec:
each node must have exactly 2 inputs
and are allowed arbitrarily many outputs to other nodes in the graph
My current solution is slow. Eg for 6 nodes, the algo has taken 1.5 days to get where I think it's complete, but it'll probably be checking for a few more days still.
My algorithm for a graph with n nodes:
generate all n-length strings of 0, where one symbol is a 1, eg, for n=3, [[0,0,1], [0,1,0], [1,0,0]]. These can be thought of as rows from an identity matrix.
generate all possible n * n matrixes where each row is all possible combinations of step 1. + step 1.
This is the connectivity matrix where each cell represents a connection from column-index to row-index
So, for n=3, these are possible:
[0,1,0] + [1,0,0] = [1,1,0]
[1,0,0] + [1,0,0] = [2,0,0]
These represent the inputs to a node, and by adding step 1 to itself, the result will always represent 2 inputs.
For ex:
A B C
A' [[0,1,1],
B' [0,2,0],
C' [1,1,0]]
So B and C connect to A once each: B -> A', C -> A',
And B connects to itself twice: B => B'
I only want unique ones, so for each connectivity matrix generated, I can only keep it if it is not isomorphic to an already-seen graph.
This step is expensive. I need to convert the graph to a "canonical form" by running through each permutation of isomorphic graphs, sorting them, and considering the first one as the "canonical form".
If anyone dives into testing any of this out, here are the count of unique graphs for n nodes:
2 - 6
3 - 44
4 - 475
5 - 6874
6 - 109,934 (I think, it's not done running yet but I haven't found a new graph in >24 hrs.)
7 - I really wanna know!
Possible optimizations:
since I get to generate the graphs to test, is there a way of ruling them out, without testing, as being isomorphic to already-seen ones?
is there a faster graph-isomorphism algorithm? I think this one is related to "Nauty", and there are others I've read of in papers, but I haven't had the expertise (or bandwidth) to implement them yet.
Here's a demonstrable connectivity matrix that can be plotted at graphonline.ru for fun, showing self connections, and 2 connections to t he same node:
1, 0, 0, 0, 0, 1,
1, 0, 0, 0, 1, 0,
0, 1, 0, 1, 0, 0,
0, 1, 2, 0, 0, 0,
0, 0, 0, 1, 0, 1,
0, 0, 0, 0, 1, 0,
here's the code in haskell if you want to play with it, but I'm more concerned about getting the algorithm right (eg pruning down the search space), than the implementation:
-- | generate all permutations of length n given symbols from xs
npermutations :: [a] -> Int -> [[a]]
npermutations xs size = mapM (const xs) [1..size]
identity :: Int -> [[Int]]
identity size = scanl
(\xs _ -> take size $ 0 : xs) -- keep shifting right
(1 : (take (size - 1) (repeat 0))) -- initial, [1,0,0,...]
[1 .. size-1] -- correct size
-- | return all possible pairings of [Column]
columnPairs :: [[a]] -> [([a], [a])]
columnPairs xs = (map (\x y -> (x,y)) xs)
<*> xs
-- | remove duplicates
rmdups :: Ord a => [a] -> [a]
rmdups = rmdups' Set.empty where
rmdups' _ [] = []
rmdups' a (b : c) = if Set.member b a
then rmdups' a c
else b : rmdups' (Set.insert b a) c
-- | all possible patterns for inputting 2 things into one node.
-- eg [0,1,1] means cells B, and C project into some node
-- [0,2,0] means cell B projects twice into one node
binaryInputs :: Int -> [[Int]]
binaryInputs size = rmdups $ map -- rmdups because [1,0]+[0,1] is same as flipped
(\(x,y) -> zipWith (+) x y)
(columnPairs $ identity size)
transposeAdjMat :: [[Int]] -> [[Int]]
transposeAdjMat ([]:_) = []
transposeAdjMat m = (map head m) : transposeAdjMat (map tail m)
-- | AdjMap [(name, inbounds)]
data AdjMap a = AdjMap [(a, [a])] deriving (Show, Eq)
addAdjColToMap :: Int -- index
-> [Int] -- inbound
-> AdjMap Int
-> AdjMap Int
addAdjColToMap ix col (AdjMap xs) =
let conns = foldl (\c (cnt, i) -> case cnt of
1 -> i:c
2 -> i:i:c
_ -> c
)
[]
(zip col [0..]) in
AdjMap ((ix, conns) : xs)
adjMatToMap :: [[Int]] -> AdjMap Int
adjMatToMap cols = foldl
(\adjMap#(AdjMap nodes) col -> addAdjColToMap (length nodes) col adjMap)
(AdjMap [])
cols
-- | a graph's canonical form : http://mfukar.github.io/2015/09/30/haskellxiii.html
-- very expensive algo, of course
canon :: (Ord a, Enum a, Show a) => AdjMap a -> String
canon (AdjMap g) = minimum $ map f $ Data.List.permutations [1..(length g)]
where
-- Graph vertices:
vs = map fst g
-- Find, via brute force on all possible orderings (permutations) of vs,
-- a mapping of vs to [1..(length g)] which is minimal.
-- For example, map [1, 5, 6, 7] to [1, 2, 3, 4].
-- Minimal is defined lexicographically, since `f` returns strings:
f p = let n = zip vs p
in (show [(snd x, sort id $ map (\x -> snd $ head $ snd $ break ((==) x . fst) n)
$ snd $ take_edge g x)
| x <- sort snd n])
-- Sort elements of N in ascending order of (map f N):
sort f n = foldr (\x xs -> let (lt, gt) = break ((<) (f x) . f) xs
in lt ++ [x] ++ gt) [] n
-- Get the first entry from the adjacency list G that starts from the given node X
-- (actually, the vertex is the first entry of the pair, hence `(fst x)`):
take_edge g x = head $ dropWhile ((/=) (fst x) . fst) g
-- | all possible matrixes where each node has 2 inputs and arbitrary outs
binaryMatrixes :: Int -> [[[Int]]]
binaryMatrixes size = let columns = binaryInputs size
unfiltered = mapM (const columns) [1..size] in
fst $ foldl'
(\(keep, seen) x -> let can = canon . adjMatToMap $ x in
(if Set.member can seen
then keep
else id $! x : keep
, Set.insert can seen))
([], Set.fromList [])
unfiltered
There are a number of approaches you could try. One thing that I do note is that having loops with multi-edges (colored loops?) is a little unusual, but is probably just needs a refinement of existing techniques.
Filter the output of another program
The obvious candidate here is of course nAUTy/traces (http://pallini.di.uniroma1.it/) or similar (saucy, bliss, etc). Depending on how you want to do this, it could be as simple as run nauty (for example) and output to file, then read in the list filtering as you go.
For larger values of n this could start to be a problem if you are generating huge files. I'm not sure whether you start to run out of space before you run out of time, but still. What might be better is to generate and test them as you go, throwing away candidates. For your purposes, there may be an existing library for generation - I found this one but I have no idea how good it is.
Use graph invariants
A very easy first step to more efficient listing of graphs is to filter using graph invariants. An obvious one would be degree sequence (the ordered list of degrees of the graph). Others include the number of cycles, the girth, and so on. For your purposes, there might be some indegree/outdegree sequence you could use.
The basic idea is to use the invariant as a filter to avoid expensive checks for isomorphism. You can store the (list of ) invariants for already generated graphs, and check the new one against the list first. The canonical form of a structure is a kind of invariant.
Implement an algorithm
There are lost of GI algorithms, including the ones used by nauty and friends. However, they do tend to be quite hard! The description given in this answer is an excellent overview, but the devil is in the details of course.
Also note that the description is for general graphs, while you have a specific subclass of graph that might be easier to generate. There may be papers out there for digraph listing (generating) but I have not checked.

sort a list of numbers by their 'visual similarity'

consider a function, which rates the level of 'visual similarity' between two numbers: 666666 and 666166 would be very similar, unlike 666666 and 111111
type N = Int
type Rate = Int
similar :: N -> N -> Rate
similar a b = length . filter id . zipWith (==) a' $ b'
where a' = show a
b' = show b
similar 666666 666166
--> 5
-- high rate : very similar
similar 666666 111111
--> 0
-- low rate : not similar
There will be more sophisticated implementations for this, however this serves the purpose.
The intention is to find a function that sorts a given list of N's, so that each item is the most similar one to it's preceding item. Since the first item does not have a predecessor, there must be a given first N.
similarSort :: N -> [N] -> [N]
Let's look at some sample data: They don't need to have the same arity but it makes it easier to reason about it.
sample :: [N]
sample = [2234, 8881, 1222, 8888, 8822, 2221, 5428]
one could be tempted to implement the function like so:
similarSortWrong x xs = reverse . sortWith (similar x) $ xs
but this would lead to a wrong result:
similarSortWrong 2222 sample
--> [2221,1222,8822,2234,5428,8888,8881]
In the beginning it looks correct, but it's obvious that 8822 should rather be followed by 8881, since it's more similar that 2234.
So here's the implementation I came up with:
similarSort _ [] = []
similarSort x xs = x : similarSort a as
where (a:as) = reverse . sortWith (similar x) $ xs
similarSort 2222 sample
--> [2222,2221,2234,1222,8822,8888,8881]
It seems to work. but it also seems to do lot more more work than necessary. Every step the whole rest is sorted again, just to pick up the first element. Usually lazyness should allow this, but reverse might break this again. I'd be keen to hear, if someone know if there's a common abstraction for this problem.
It's relatively straightforward to implement the greedy algorithm you ask for. Let's start with some boilerplate; we'll use the these package for a zip-like that hands us the "unused" tail ends of zipped-together lists:
import Data.Align
import Data.These
sampleStart = "2222"
sampleNeighbors = ["2234", "8881", "1222", "8888", "8822", "2221", "5428"]
Instead of using numbers, I'll use lists of digits -- just so we don't have to litter the code with conversions between the form that's convenient for the user and the form that's convenient for the algorithm. You've been a bit fuzzy about how to rate the similarity of two digit strings, so let's make it as concrete as possible: any digits that differ cost 1, and if the digit strings vary in length we have to pay 1 for each extension to the right. Thus:
distance :: Eq a => [a] -> [a] -> Int
distance l r = sum $ alignWith elemDistance l r where
elemDistance (These l r) | l == r = 0
elemDistance _ = 1
A handy helper function will pick the smallest element of some list (by a user-specified measure) and return the rest of the list in some implementation-defined order.
minRestOn :: Ord b => (a -> b) -> [a] -> Maybe (a, [a])
minRestOn f [] = Nothing
minRestOn f (x:xs) = Just (go x [] xs) where
go min rest [] = (min, rest)
go min rest (x:xs) = if f x < f min
then go x (min:rest) xs
else go min (x:rest) xs
Now the greedy algorithm almost writes itself:
greedy :: Eq a => [a] -> [[a]] -> [[a]]
greedy here neighbors = here : case minRestOn (distance here) neighbors of
Nothing -> []
Just (min, rest) -> greedy min rest
We can try it out on your sample:
> greedy sampleStart sampleNeighbors
["2222","1222","2221","2234","5428","8888","8881","8822"]
Just eyeballing it, that seems to do okay. However, as with many greedy algorithms, this one only minimizes the local cost of each edge in the path. If you want to minimize the total cost of the path found, you need to use another algorithm. For example, we can pull in the astar package. For simplicity, I'm going to do everything in a very inefficient way, but it's not too hard to do it "right". We'll need a fair chunk more imports:
import Data.Graph.AStar
import Data.Hashable
import Data.List
import Data.Maybe
import qualified Data.HashSet as HS
Unlike before, where we only wanted the nearest neighbor, we'll now want all the neighbors. (Actually, we could probably implement the previous use of minRestOn using the following function and minimumOn or something. Give it a try if you're interested!)
neighbors :: (a, [a]) -> [(a, [a])]
neighbors (_, xs) = go [] xs where
go ls [] = []
go ls (r:rs) = (r, ls ++ rs) : go (r:ls) rs
We can now call the aStar search method with appropriate parameters. We'll use ([a], [[a]]) -- representing the current list of digits and the remaining lists that we can choose from -- as our node type. The arguments to aStar are then, in order: the function for finding neighboring nodes, the function for computing distance between neighboring nodes, the heuristic for how far we have left to go (we'll just say 1 for each unique element in the list), whether we've reached a goal node, and the initial node to start the search from. We'll call fromJust, but it should be okay: all nodes have at least one path to a goal node, just by choosing the remaining lists of digits in order.
optimal :: (Eq a, Ord a, Hashable a) => [a] -> [[a]] -> [[a]]
optimal here elsewhere = (here:) . map fst . fromJust $ aStar
(HS.fromList . neighbors)
(\(x, _) (y, _) -> distance x y)
(\(x, xs) -> HS.size (HS.fromList (x:xs)) - 1)
(\(_, xs) -> null xs)
(here, elsewhere)
Let's see it run in ghci:
> optimal sampleStart sampleNeighbors
["2222","1222","8822","8881","8888","5428","2221","2234"]
We can see that it's done better this time by adding a pathLength function that computes all the distances between neighbors in a result.
pathLength :: Eq a => [[a]] -> Int
pathLength xs = sum [distance x y | x:y:_ <- tails xs]
In ghci:
> pathLength (greedy sampleStart sampleNeighbors)
15
> pathLength (optimal sampleStart sampleNeighbors)
14
In this particular example, I think the greedy algorithm could have found the optimal path if it had made the "right" choices whenever there were ties for minimal next step; but I expect it is not too hard to cook up an example where the greedy algorithm is forced into bad early choices.

explain the Haskell breadth first numbering code to traverse trees

I am reading this paper by Chris Okasaki; titled "Breadth-First Numbering: Lessons from a Small Exercise in Algorithm Design".
A question is - how is the magic happening in the algorithm? There are some figures (e.g. figure 7 titled "threading the output of one level into the input of next level")
Unfortunately, maybe it's only me, but that figure has completely baffled me. I don't understand how the threading happens at all?
Breadth first traversal means traversing levels of a tree one by one. So let's assume we already know what are the numbers at the beginning of each level - the number of traversed elements so far before each level. For the simple example in the paper
import Data.Monoid
data Tree a = Tree (Tree a) a (Tree a)
| Empty
deriving (Show)
example :: Tree Char
example = Tree (Tree Empty 'b' (Tree Empty 'c' Empty)) 'a' (Tree Empty 'd' Empty)
the sizes would be 0, 1, 3, 4. Knowing this, we can thread such a list of sizes through a give tree (sub-tree) left-to-right: We advance the first element of the list by one for the node, and thread the tail of the list first through the left and then through the right subtree (see thread below).
After doing so, we'll get again the same list of sizes, only shifted by one - now we have the total number of elements after each level. So the trick is: Assume we have such a list, use it for the computation, and then feed the output as the input - tie the knot.
A sample implementation:
tagBfs :: (Monoid m) => (a -> m) -> Tree a -> Tree m
tagBfs f t = let (ms, r) = thread (mempty : ms) t
in r
where
thread ms Empty = (ms, Empty)
thread (m : ms) (Tree l x r) =
let (ms1, l') = thread ms l
(ms2, r') = thread ms1 r
in ((m <> f x) : ms2, Tree l' m r')
generalized to Monoid (for numbering you'd give const $ Sum 1 as the function).
One way to view tree numbering is in terms of a traversal. Specifically, we want to traverse the tree in breadth-first order using State to count up. The necessary Traversable instance looks something like this. Note that you'd probably actually want to define this instance for a newtype like BFTree, but I'm just using the raw Tree type for simplicity. This code is strongly inspired by ideas in Cirdec's monadic rose tree unfolding code, but the situation here seems to be substantially simpler. Hopefully I haven't missed something horrible.
{-# LANGUAGE DeriveFunctor,
GeneralizedNewtypeDeriving,
LambdaCase #-}
{-# OPTIONS_GHC -Wall #-}
module BFT where
import Control.Applicative
import Data.Foldable
import Data.Traversable
import Prelude hiding (foldr)
data Tree a = Tree (Tree a) a (Tree a)
| Empty
deriving (Show, Functor)
newtype Forest a = Forest {getForest :: [Tree a]}
deriving (Functor)
instance Foldable Forest where
foldMap = foldMapDefault
-- Given a forest, produce the forest consisting
-- of the children of the root nodes of non-empty
-- trees.
children :: Forest a -> Forest a
children (Forest xs) = Forest $ foldr go [] xs
where
go Empty c = c
go (Tree l _a r) c = l : r : c
-- Given a forest, produce a list of the root nodes
-- of the elements, with `Nothing` values in place of
-- empty trees.
parents :: Forest a -> [Maybe a]
parents (Forest xs) = foldr go [] xs
where
go Empty c = Nothing : c
go (Tree _l a _r) c = Just a : c
-- Given a list of values (mixed with blanks) and
-- a list of trees, attach the values to pairs of
-- trees to build trees; turn the blanks into `Empty`
-- trees.
zipForest :: [Maybe a] -> Forest a -> [Tree a]
zipForest [] _ts = []
zipForest (Nothing : ps) ts = Empty : zipForest ps ts
zipForest (Just p : ps) (Forest ~(t1 : ~(t2 : ts'))) =
Tree t1 p t2 : zipForest ps (Forest ts')
instance Traversable Forest where
-- Traversing an empty container always gets you
-- an empty one.
traverse _f (Forest []) = pure (Forest [])
-- First, traverse the parents. The `traverse.traverse`
-- gets us into the `Maybe`s. Then traverse the
-- children. Finally, zip them together, and turn the
-- result into a `Forest`. If the `Applicative` in play
-- is lazy enough, like lazy `State`, I believe
-- we avoid the double traversal Okasaki mentions as
-- a problem for strict implementations.
traverse f xs = (Forest .) . zipForest <$>
(traverse.traverse) f (parents xs) <*>
traverse f (children xs)
instance Foldable Tree where
foldMap = foldMapDefault
instance Traversable Tree where
traverse f t =
(\case {(Forest [r]) -> r;
_ -> error "Whoops!"}) <$>
traverse f (Forest [t])
Now we can write code to pair up each element of the tree with its breadth-first number like this:
import Control.Monad.Trans.State.Lazy
numberTree :: Tree a -> Tree (Int, a)
numberTree tr = flip evalState 1 $ for tr $ \x ->
do
v <- get
put $! (v+1)
return (v,x)

Breadth-First Search using State monad in Haskell

Recently, I've asked a question for building DFS tree from Graph in Stackoverflow and had learned that it can be simply implemented by using State Monad.
DFS in haskell
While DFS requires to track only visited nodes, so that we can use 'Set' or 'List' or some sort of linear data structure to track visited nodes, BFS requires 'visited node' and 'queue' data structure to be accomplished.
My pseudocode for BFS is
Q = empty queue
T = empty Tree
mark all nodes except u as unvisited
while Q is nonempty do
u = deq(Q)
for each vertex v ∈ Adj(u)
if v is not visited
then add edge (u,v) to T
Mark v as visited and enq(v)
As can be inferred from pseudocode, we only have to do 3 processes per iteration.
dequeue point from queue
add all unvisited neighbors of the point to current tree's child, queue and 'visited' list
repeat this for next in queue
Since we are not using recursive traversal for BFS search, we need some other traversal method such as while loop. I've looked up loop-while package in hackage, but it seems somewhat deprecated.
What I assume is that I require some sort of code like this :
{-...-}
... = evalState (bfs) ((Set.singleton start),[start])
where
neighbors x = Map.findWithDefault [] x adj
bfs =do (vis,x:queue)<-get
map (\neighbor ->
if (Set.member neighbor vis)
then put(vis,queue)
else put ((Set.insert neighbor vis), queue++[neighbor]) >> (addToTree neighbor)
) neighbors x
(vis,queue)<-get
while (length queue > 0)
I understand that this implementation is very erroneous but this should give minimalistic view for how I think BFS should be implemented. Also, I really don't know how to circumvent using while loop for do blocks.(i.e should I use recursive algorithm to overcome it or should I think of completely different strategy)
Considering one of the answer I've found in previous question linked above, it seems like the answer should look like this :
newtype Graph a = Graph (Map.Map a [a]) deriving (Ord, Eq, Show)
data Tree a = Tree a [Tree a] deriving (Ord, Eq, Show)
bfs :: (Ord a) => Graph a -> a -> Tree a
bfs (Graph adj) start = evalState (bfs') ((Set.singleton start),[start])
where
bfs' = {-part where I don't know-}
Finally, if such implementation for BFS using state monad is impossible due to some reason, (which I believe not to be) please correct my false assumption.
I've seen some of the examples for BFS in Haskell without using state monad but I want to learn more about how state monad can be processed and couldn't have found any of examples of BFS implemented using state monad.
Thanks in advance.
EDIT:
I came up with some sort of algorithm using state monad but I fall in infinite loop.
bfs :: (Ord a) => Graph a -> a -> Tree a
bfs (Graph adj) start = evalState (bfs' (Graph adj) start) (Set.singleton start)
bfs' :: (Ord a) => Graph a -> a -> State (Set.Set a) (Tree a)
bfs' (Graph adj) point= do
vis <- get
let neighbors x = Map.findWithDefault [] x adj
let addableNeighbors (x:xs) = if Set.member x vis
then addableNeighbors(xs)
else x:addableNeighbors(xs)
let addVisited (vis) (ns) = Set.union (vis) $ Set.fromList ns
let newVisited = addVisited vis $ addableNeighbors $ neighbors point
put newVisited
return (Tree point $ map (flip evalState newVisited) (map (bfs' (Graph adj)) $ addableNeighbors $ neighbors point))
EDIT2: With some expense of space complexity, I've came out with a solution to get BFS graph using graph to return and queue to process. Despite it is not the optimal solution for generating BFS tree/graph, it will work.
bfs :: (Ord a) => Graph a -> a -> Graph a
bfs (Graph adj) start = evalState (bfs' (Graph adj) (Graph(Map.empty)) [start]) (Set.singleton start)
bfs':: (Ord a) => Graph a -> Graph a -> [a] -> State (Set.Set a) (Graph a)
bfs' _ (Graph ret) [] = return (Graph ret)
bfs' (Graph adj) (Graph ret) (p:points)= do
vis <- get
let neighbors x = Map.findWithDefault [] x adj
let addableNeighbors ns
| null ns = []
| otherwise = if Set.member (head ns) vis
then addableNeighbors(tail ns)
else (head ns):addableNeighbors(tail ns)
let addVisited (v) (ns) = Set.union (v) $ Set.fromList ns
let unVisited = addableNeighbors $ neighbors p
let newVisited = addVisited vis unVisited
let unionGraph (Graph g1) (Graph g2) = Graph (Map.union g1 g2)
put newVisited
bfs' (Graph adj) (unionGraph (Graph ret) (Graph (Map.singleton p unVisited))) (points ++ unVisited)
EDIT3: I've added convert function for graph to tree. Running function in EDIT2, and EDIT3 will yield BFS Tree. It is not the best algorithm for computation time wise, but I believe it is intuitive and easy to understand for newbies like me :)
graphToTree :: (Ord a) => Graph a -> a -> Tree a
graphToTree (Graph adj) point = Tree point $ map (graphToTree (Graph adj)) $ neighbors point
where neighbors x = Map.findWithDefault [] x adj
Converting a graph into a Tree breadth-first is a bit more difficult than simply searching the graph breadth-first. If you are searching the graph, you only ever need to return from a single branch. When converting the graph into a tree, the result needs to include results from multiple branches.
We can use a more general type than Graph a for what we can search or convert to trees. We can search or convert to trees anything with a function a -> [a]. For a Graph we'd use the function (Map.!) m, where m is the Map. Searching with a transposition table has a signature like
breadthFirstSearchUnseen:: Ord r => (a -> r) -> -- how to compare `a`s
(a -> Bool) -> -- where to stop
(a -> [a]) -> -- where you can go from an `a`
[a] -> -- where to start
Maybe [a]
Converting the function to a tree that contains each reachable node at the earliest depth has a signature like
shortestPathTree :: Ord r => (a -> r) -> -- how to compare `a`s
(a -> l) -- what label to put in the tree
(a -> [a]) -> -- where you can go from an `a`
a -> -- where to start
Tree l
We can slightly more generally start at any number of nodes and build a Forest that contains each reachable node at the earliest depth.
shortestPathTrees :: Ord r => (a -> r) -> -- how to compare `a`s
(a -> l) -- what label to put in the tree
(a -> [a]) -> -- where you can go from an `a`
[a] -> -- where to start
[Tree l]
Searching
Performing the conversion to a tree doesn't really help us search, we can perform breadth first searches on the original graph.
import Data.Sequence (viewl, ViewL (..), (><))
import qualified Data.Sequence as Seq
import qualified Data.Set as Set
breadthFirstSearchUnseen:: Ord r => (a -> r) -> (a -> Bool) -> (a -> [a]) -> [a] -> Maybe [a]
breadthFirstSearchUnseen repr p expand = combine Set.empty Seq.empty []
where
combine seen queued ancestors unseen =
go
(seen `Set.union` (Set.fromList . map repr $ unseen))
(queued >< (Seq.fromList . map ((,) ancestors) $ unseen))
go seen queue =
case viewl queue of
EmptyL -> Nothing
(ancestors, a) :< queued ->
if p a
then Just . reverse $ ancestors'
else combine seen queued ancestors' unseen
where
ancestors' = a:ancestors
unseen = filter (flip Set.notMember seen . repr) . expand $ a
The state maintained in the above search algorithm is a Seq queue of what nodes to visit next and a Set of nodes that have already been seen. If we instead kept track of nodes that have already been visited, then we could visit the same node multiple times if we find multiple paths to the node at the same depth. There's a more complete explanation in the answer I wrote this breadth first search for.
We can easily write searching Graphs in terms of our general search.
import qualified Data.Map as Map
newtype Graph a = Graph (Map.Map a [a]) deriving (Ord, Eq, Show)
bfsGraph :: (Ord a) => Graph a -> (a -> Bool) -> [a] -> Maybe [a]
bfsGraph (Graph adj) test = breadthFirstSearchUnseen id test ((Map.!) adj)
We can also write how to search Trees themselves.
import Data.Tree
bfsTrees :: (Ord a) => (a -> Bool) -> [Tree a] -> Maybe [a]
bfsTrees test = fmap (map rootLabel) . breadthFirstSearchUnseen rootLabel (test . rootLabel) subForest
Building trees
Building trees breadth-first is a lot more difficult. Fortunately Data.Tree already provides ways to build Trees in breadth first order from a monadic unfold. The breadth first order will take care of the queuing, we will only need to keep track of the state for the nodes we've already seen.
unfoldTreeM_BF has the type Monad m => (b -> m (a, [b])) -> b -> m (Tree a). m is the Monad our computations will be in, b is the type of data we are going to build the tree based on, and a is the type for the labels of the tree. In order to use it to build a tree we need to make a function b -> m (a, [b]). We're going to rename a to l for label, and b to a, which is what we've been using for our nodes. We need to make an a -> m (l, [a]). For m, we'll use the State monad from transformers to keep track of some state; the state will be the Set of nodes whose representation r we've already seen; we'll be using the State (Set.Set r) monad. Overall, we need to provide a function a -> State (Set.Set r) (l, [a]).
expandUnseen :: Ord r => (a -> r) -> (a -> l) -> (a -> [a]) -> a -> State (Set.Set r) (l, [a])
expandUnseen repr label expand a = do
seen <- get
let unseen = filter (flip Set.notMember seen . repr) . uniqueBy repr . expand $ a
put . Set.union seen . Set.fromList . map repr $ unseen
return (label a, unseen)
To build the trees, we run the state computation built by unfoldForestM_BF
shortestPathTrees :: Ord r => (a -> r) -> (a -> l) -> (a -> [a]) -> [a] -> [Tree l]
shortestPathTrees repr label expand = run . unfoldForestM_BF k . uniqueBy repr
where
run = flip evalState Set.empty
k = expandUnseen repr label expand
uniqueBy is a nubBy that takes advantage of an Ord instance instead of Eq.
uniqueBy :: Ord r => (a -> r) -> [a] -> [a]
uniqueBy repr = go Set.empty
where
go seen [] = []
go seen (x:xs) =
if Set.member (repr x) seen
then go seen xs
else x:go (Set.insert (repr x) seen) xs
We can write building shortest path trees from Graphs in terms of our general shortest path tree building
shortestPathsGraph :: Ord a => Graph a -> [a] -> [Tree a]
shortestPathsGraph (Graph adj) = shortestPathTrees id id ((Map.!) adj)
We can do the same for filtering a Forest to only the shortest paths through the Forest.
shortestPathsTree :: Ord a => [Tree a] -> [Tree a]
shortestPathsTree = shortestPathTrees rootLabel rootLabel subForest
My solution is based on working level-by-level (wrt. to BFS), see also this question and answer.
The general idea is: Assume we already know the sets of visited elements prior each level of our BFS as a list of sets. Then we can traverse the graph, level by level, updating our list of sets, constructing the output Tree on the way.
The trick is that after such a level-by-level traversal, we'll have the sets of visited elements after each level. And this is the same as the list before each level, just shifted by one. So by tying the knot, we can use the shifted output as the input for the procedure.
import Control.Monad.State
import qualified Data.Map as M
import Data.Maybe (fromMaybe, catMaybes)
import qualified Data.Set as S
import Data.Tree
newtype Graph a = Graph (M.Map a [a])
deriving (Ord, Eq, Show)
tagBfs :: (Ord a) => Graph a -> a -> Maybe (Tree a)
tagBfs (Graph g) s = let (t, sets) = runState (thread s) (S.empty : sets)
in t
where
thread x = do
sets#(s : subsets) <- get
case M.lookup x g of
Just vs | not (S.member x s) -> do
-- recursively create sub-nodes and update the subsets list
let (nodes, subsets') = runState
(catMaybes `liftM` mapM thread vs) subsets
-- put the new combined list of sets
put (S.insert x s : subsets')
-- .. and return the node
return . Just $ Node x nodes
_ -> return Nothing -- node not in the graph, or already visited
Running tagBfs example2 'b' it on the following example
example2 :: Graph Char
example2 = Graph $ M.fromList
[ ('a', ['b', 'c', 'd'])
, ('b', ['a'])
, ('c', [])
, ('d', [])
]
yields
Just (Node {rootLabel = 'b',
subForest = [Node {rootLabel = 'a',
subForest = [Node {rootLabel = 'c',
subForest = []},
Node {rootLabel = 'd',
subForest = []}
]}
]}
)

Haskell: Optimising Graph processing algorithm

This is a follow up to this post, with code now based on Structuring Depth-First Search Algorithms in Haskell to do depth first search, by King and Launchbury in the 1990s. That paper suggests a generate and prune strategy, but uses a mutable array with a State Monad (with some grammar that I suspect has since been deprecated). The authors hint that a set could be used for remembering nodes visited, as the cost of an additional O(log n). I tried to implement with a set (we have better machines now than they did in the 1990s!), to use modern State Monad syntax, and to use Vectors rather than arrays (as I read that that is normally better).
As before, my code runs on small data sets, but fails to return on the 5m edge graph I need to analyse, and I'm looking for hints only as to the weakness operating at scale. What I do know is that the code operates comfortably within memory, so that is not the problem, but have I inadvertently slipped to O(n2)? (By contrast, the official implementation of this paper in the Data.Graph library (which I have lately also borrowed some code from) uses a mutable Array but fails on the big data set with a ... Stack Overflow!!!)
So now I have a Vector data store with IntSet State that does not complete and an Array with ST Monad Array 'official' one that crashes! Haskell should be able to do better than this?
import Data.Vector (Vector)
import qualified Data.IntSet as IS
import qualified Data.Vector as V
import qualified Data.ByteString.Char8 as BS
import Control.Monad.State
type Vertex = Int
type Table a = Vector a
type Graph = Table [Vertex]
type Edge = (Vertex, Vertex)
data Tree a = Node a (Forest a) deriving (Show,Eq)
type Forest a = [Tree a]
-- ghc -O2 -threaded --make
-- +RTS -Nx
generate :: Graph -> Vertex -> Tree Vertex
generate g v = Node v $ map (generate g) (g V.! v)
chop :: Forest Vertex -> State IS.IntSet (Forest Vertex)
chop [] = return []
chop (Node x ts:us) = do
visited <- contains x
if visited then
chop us
else do
include x
x1 <- chop ts
x2 <- chop us
return (Node x x1:x2)
prune :: Forest Vertex -> State IS.IntSet (Forest Vertex)
prune vs = chop vs
main = do
--edges <- V.fromList `fmap` getEdges "testdata.txt"
edges <- V.fromList `fmap` getEdges "SCC.txt"
let
-- calculate size of five largest SCC
maxIndex = fst $ V.last edges
gr = buildG maxIndex edges
sccRes = scc gr
big5 = take 5 sccRes
big5' = map (\l -> length $ postorder l) big5
putStrLn $ show $ big5'
contains :: Vertex -> State IS.IntSet Bool
contains v = state $ \visited -> (v `IS.member` visited, visited)
include :: Vertex -> State IS.IntSet ()
include v = state $ \visited -> ((), IS.insert v visited)
getEdges :: String -> IO [Edge]
getEdges path = do
lines <- (map BS.words . BS.lines) `fmap` BS.readFile path
let pairs = (map . map) (maybe (error "can't read Int") fst . BS.readInt) lines
return [(a, b) | [a, b] <- pairs]
vertices :: Graph -> [Vertex]
vertices gr = [1.. (V.length gr - 1)]
edges :: Graph -> [Edge]
edges g = [(u,v) | u <- vertices g, v <- g V.! u]
-- accumulate :: (a -> b -> a) -> Vector a-> Vector (Int, b)--> Vector a
-- accumulating function f
-- initial vector (of length m)
-- vector of index/value pairs (of length n)
buildG :: Int -> Table Edge -> Graph
buildG maxIndex edges = graph' where
graph = V.replicate (maxIndex + 1) []
--graph' = V.accumulate (\existing new -> new:existing) graph edges
-- flip f takes its (first) two arguments in the reverse order of f
graph' = V.accumulate (flip (:)) graph edges
mapT :: Ord a => (Vertex -> a -> b) -> Table a -> Table b
mapT = V.imap
outDegree :: Graph -> Table Int
outDegree g = mapT numEdges g
where numEdges v es = length es
indegree :: Graph -> Table Int
indegree g = outDegree $ transposeG g
transposeG :: Graph -> Graph
transposeG g = buildG (V.length g - 1) (reverseE g)
reverseE :: Graph -> Table Edge
reverseE g = V.fromList [(w, v) | (v,w) <- edges g]
-- --------------------------------------------------------------
postorder :: Tree a -> [a]
postorder (Node a ts) = postorderF ts ++ [a]
postorderF :: Forest a -> [a]
postorderF ts = concat (map postorder ts)
postOrd :: Graph -> [Vertex]
postOrd g = postorderF (dff g)
dfs :: Graph -> [Vertex] -> Forest Vertex
dfs g vs = map (generate g) vs
dfs' :: Graph -> [Vertex] -> Forest Vertex
dfs' g vs = fst $ runState (prune d) $ IS.fromList []
where d = dfs g vs
dff :: Graph -> Forest Vertex
dff g = dfs' g $ reverse (vertices g)
scc :: Graph -> Forest Vertex
scc g = dfs' g $ reverse $ postOrd (transposeG g)
Some small possible improvements:
Change
type Edge = (Vertex, Vertex)
to
data Edge = Edge {-# UNPACK #-} !Vertex {-# UNPACK #-} !Vertex
to reuse the memory usage for each edge from 7 words to 3 words and to improve cache locality. Reducing memory pressure almost always also improves runtime. As #jberryman mentioned could use an unboxed vector for Table Edge (then you don't need the above custom data type).
generate :: Graph -> Vertex -> Tree Vertex
generate g v = Node v $ map (generate g) (g V.! v)
If you're sure that the index is in bounds, you could use the unsafe indexing function from vector instead of .!.
contains :: Vertex -> State IS.IntSet Bool
contains v = state $ \visited -> (v `IS.member` visited, visited)
Use a combination of get and put $! instead.
include :: Vertex -> State IS.IntSet ()
include v = state $ \visited -> ((), IS.insert v visited)
Use modify' instead.
You're using quite a lot of lists in your program. Linked lists aren't the most memory/cache efficient data structures. See if you can convert your code to use more vectors.

Resources