Algorithm for allowing concurrent walk of a graph

In a directed acyclic graph describing a set of tasks to process, I need to find all tasks that can be processed concurrently. The graph has no loops and is quite small (~1000 nodes, ~2000 edges), so performance is not a primary concern.
Examples with desired result:
[] is a group. All tasks in a group must be processed before continuing
[x & y] means x and y can be processed concurrently (x and y in parallel)
x -> y means x and y must be processed sequentially (x before y)
1. a -> [b & c] -> d
2. [a & e] -> b -> c -> [d & f]
3. [ [a -> b] & [e -> f] ] -> [ [c -> d] & g ]
I do not want to actually execute the graph, but rather build a data structure that is as parallel as possible while maintaining the order. The nomenclature and the names of the relevant algorithms are not that familiar to me, so I'm having a hard time finding similar problems/solutions online.
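(For concreteness, one way to render the bracket notation as an actual data structure is a small tree of sequential and parallel groups. This is only a sketch of the desired output shape; the names Plan, Seq and Par are made up for illustration.)

-- A possible result type: leaves are tasks, Seq runs its children in
-- order, Par runs them concurrently.
data Plan a = Task a | Seq [Plan a] | Par [Plan a]
  deriving Show

-- Example 2 above, [a & e] -> b -> c -> [d & f], written as a Plan:
example2 :: Plan Char
example2 = Seq [Par [Task 'a', Task 'e'], Task 'b', Task 'c', Par [Task 'd', Task 'f']]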

Mathematically, I would frame this problem as finding a minimally defined series-parallel partial order extending the given partial order.
I would start by transitively reducing the graph and then repeatedly applying two heuristics:
1. If x has exactly one dependent y, and y has exactly one dependency x, merge them into a new node z = [x → y].
2. If x and y have the same dependencies and the same dependents, merge them into a new node z = [x & y].
Now, if the input is already series-parallel, the result will be one node. In general, however, this will leave a graph that embeds an N-shaped structure like b → c, b → g, f → g from the last example in the question. This structure must be addressed by adding one or more of b → f, c → f, c → g, f → b, f → c, g → c. But in a different instance, this act would in turn create new N-shaped structures. There's no obvious notion of a closure, which is why this problem feels hard to me.
Some of these choices seem worse than others. For example, c → f forces the sequence b → c → f → g, whereas f → c is the only choice that doesn't increase the length of the critical path.
I guess what I'd try is:
3. If heuristics 1 and 2 have no targets, form a graph with edges x--y if and only if x and y have either a common dependent or a common dependency, compute the connected components of this graph, and &-merge the smallest component that isn't a singleton, followed by another transitive reduction.
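As a rough sketch of heuristics 1 and 2, assuming the DAG is kept as a map from each node to its set of direct dependencies and that merged nodes just get composite string labels (both assumptions of mine, not part of the answer's notation):

import Control.Applicative ((<|>))
import Data.List (tails)
import qualified Data.Map.Strict as M
import qualified Data.Set as S

-- A transitively reduced DAG: node label -> direct dependencies.
type Dag = M.Map String (S.Set String)

deps, dents :: Dag -> String -> S.Set String
deps g n = M.findWithDefault S.empty n g
dents g n = S.fromList [ m | (m, ds) <- M.toList g, n `S.member` ds ]

-- Replace x and y by a merged node z and rewrite all edges to match.
fuse :: String -> String -> String -> Dag -> Dag
fuse x y z g = M.map (S.map sub) g'
  where g' = M.insert z (S.delete x (deps g x `S.union` deps g y))
           $ M.delete x $ M.delete y g
        sub n | n == x || n == y = z
              | otherwise = n

-- Heuristic 1: x's only dependent is y, and y's only dependency is x.
step1 :: Dag -> Maybe Dag
step1 g = case [ (x, y) | x <- M.keys g
                        , [y] <- [S.toList (dents g x)]
                        , deps g y == S.singleton x ] of
  (x, y) : _ -> Just (fuse x y ("[" ++ x ++ " -> " ++ y ++ "]") g)
  [] -> Nothing

-- Heuristic 2: x and y share both their dependencies and their dependents.
step2 :: Dag -> Maybe Dag
step2 g = case [ (x, y) | x : ys <- tails (M.keys g), y <- ys
                        , deps g x == deps g y
                        , dents g x == dents g y ] of
  (x, y) : _ -> Just (fuse x y ("[" ++ x ++ " & " ++ y ++ "]") g)
  [] -> Nothing

-- Apply the heuristics to a fixed point. A series-parallel input
-- collapses to a single node whose label spells out the schedule.
reduce :: Dag -> Dag
reduce g = maybe g reduce (step1 g <|> step2 g)

On example 2, reduce collapses everything to the single node [[[a & e] -> [b -> c]] -> [d & f]]; on example 3 it stops at the N-shaped remainder discussed above.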

Here's a solution I came up with (pseudocode):
sequence = []
for each (node, depth) in depthFirstSearch(graph):
    sequence[depth].push(node)
return sequence
The sequence defines the order to process the graph. If an item in it contains more than one node, they can be processed concurrently.
While this allows for some concurrency, it does not advance as fast as it could. For example, f in the 3rd example in the question would require a to be completed first (as f will be at depth 1, while a and e are at depth 0). Ideally, work on f could start as soon as e is done.
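For comparison, here is a minimal Haskell rendering of that levelling idea, with each node's level computed as one more than the deepest of its dependencies (the Map-of-Sets representation is my own assumption). It is correct but exhibits exactly the weakness just described:

import Data.Function (on)
import Data.List (groupBy, sortOn)
import qualified Data.Map.Strict as M
import qualified Data.Set as S

-- node label -> direct dependencies
type Dag = M.Map String (S.Set String)

-- Group nodes by level: a node's level is one more than the deepest of
-- its dependencies, and all nodes in one group may run concurrently.
levels :: Dag -> [[String]]
levels g = map (map fst) . groupBy ((==) `on` snd) . sortOn snd
         $ [ (n, level n) | n <- M.keys g ]
  where level n = case S.toList (M.findWithDefault S.empty n g) of
          [] -> 0 :: Int
          ds -> 1 + maximum (map level ds)

On the 3rd example this produces [[a,e],[b,f],[c,g],[d]], so f sits in the second group and waits for a even though it only needs e.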

Related

Efficient way of generating graphs from source nodes

Let's say I have a graph G, and around each node I have a few source nodes xs. I have to create a new graph G' using the nodes xs = [[a, b, c], [d, e], [f]] such that they won't conflict with the grey donuts shown in the figure (not reproduced here).
The expected output G' is [[a, d, f], [a, e, f], [b, e, f]]; all other combinations conflict with a grey donut.
I currently solve this by taking all permutations and combinations of the nodes xs. This works for small numbers of nodes, but as the number of nodes xs grows with a bigger graph G, there are soon hundreds of thousands of combinations to try.
I am looking for an efficient algorithm that speeds this up and finds all the non-conflicting graphs in a minimum number of iterations.
You have a fairly obvious minimum set of edges for each stage of your path. They are both necessary and sufficient for your solution. For notational convenience, I'll label the original graph X--Y--Z. Your corresponding G' candidates are:
X: a b c
Y: d e
Z: f
You do this in two steps:
For each edge in G, you must test each possible edge in G' for validity. This consists of:
X--Y: [a, b, c] x [d, e]
a total of 6 edges, of which 3 qualify: set XY = [a--d, a--e, b--d]
Y--Z: [d, e] x [f]
a total of 2 edges, of which 2 qualify: set YZ = [d--f, e--f]
Now, you need only generate all combinations of XY x YZ where the Y nodes match. If you sort the lists by the "inner" node, you can do this very quickly as
[a--d, b--d] x [d--f]
[a--e] x [e--f]
Most current languages have modules to perform combinations for you, so the code will be short enough.
Does that get you going?
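If it helps, here is a small Haskell sketch of that scheme, where ok stands in for the donut-conflict test that your figure defines (so any concrete ok is an assumption): candidate edges between consecutive stages are filtered as each path is extended, never materializing the full cartesian product.

-- Enumerate stage-by-stage choices, pruning each invalid edge as we go.
-- ok x y says whether the edge x--y between consecutive stages is valid.
paths :: (a -> a -> Bool) -> [[a]] -> [[a]]
paths _ [] = [[]]
paths ok (s0 : rest) = go s0 rest
  where go xs [] = [ [x] | x <- xs ]
        go xs (ys : more) =
          [ x : p | x <- xs, p <- go [ y | y <- ys, ok x y ] more ]

With your stages, paths ok [["a","b","c"], ["d","e"], ["f"]] would yield exactly the join of the XY and YZ edge sets.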

Linear ordering of directed multigraph of dependencies allowing for duplicates

Problem description
Given vertices V which can be seen as named "propositions".
Given weights:
data W
  = Requires    -- ^ Denotes that a "proposition" depends on another.
  | Invalidates -- ^ Denotes that a "proposition" invalidates another.
In a linear ordering, if A requires B, then B must come before A; conversely, if A invalidates B, then B must come after A.
Given a weighted directed multigraph (multidigraph) with at most 2 parallel edges, where a vertex can only require the inclusion of another vertex once, and can only invalidate another vertex once:
G = (V, E)
E = (V, V, W)
Or, alternatively, represented as a directed cyclic graph with no self-loops, where the only cycles form directly between one vertex and another, with the weights changed to:
data W
  = Requires      -- ^ Denotes that a "proposition" depends on another.
  | InvalidatedBy -- ^ Denotes that a "proposition" is invalidated by another.
Given that vertices may occur more than once in the ordering...
How can a linear ordering be constructed from such a graph?
Additionally, if the tail of the linear ordering ends with a vertex V which was included due to being InvalidatedBy another vertex, then it may be omitted if the head of the ordering starts with V.
Some desired properties are:
Minimality - there should be as little duplication of vertices as possible
Stability - the ordering should be as similar as possible to the order in which vertices on the same "level" were added when the graph was constructed
Run-time complexity - the number of vertices is not that high, but still, the run-time complexity should be as low as possible.
If various algorithms fulfill these to varying degrees, I'd love to see all of them with their trade-offs.
Algorithms written in any language, or pseudocode, are welcome.
Example graphs:
Example graph 1:
B `requires` A
C `requires` A
D `requires` A
E `invalidates` A
F `invalidates` A
G `invalidates` A
With minimal linear ordering: [A, B, C, D, E, F, G]
Example graph 2:
C `requires` A
C `invalidates` A
B `requires` A
With minimal linear ordering: [A, B, C]
Example graph 3:
B `requires` A
B `invalidates` A
C `requires` A
C `invalidates` A
With minimal linear ordering: [A, B, A, C]
Naive implementation
A naive implementation constructs a linear ordering by starting with all nodes with no incoming edges and for all of those nodes:
fetches all outgoing edges
partitions those by requires/invalidates
constructs the linear ordering of "requires" and puts that first
adds the current node
constructs the linear ordering of "invalidates" and adds that.
Here's a Haskell implementation of this description:
import Data.List (partition)
import Data.Maybe (fromJust)
import Control.Arrow ((***))
import Data.Graph.Inductive.Graph
fboth :: Functor f => (a -> b) -> (f a, f a) -> (f b, f b)
fboth f = fmap f *** fmap f
outs :: Graph gr => gr a b -> Node -> (Adj b, a)
outs gr n = let (_, _, l, o) = fromJust $ fst $ match n gr in (o, l)
starts :: Graph gr => gr a b -> [(Adj b, a)]
starts gr = filter (not . null . fst) $ outs gr <$> nodes gr
partW :: Adj W -> (Adj W, Adj W)
partW = partition ((Requires ==) . fst)
linearize :: Graph gr => gr a W -> [a]
linearize gr = concat $ linearize' gr <$> starts gr
linearize' :: Graph gr => gr a W -> (Adj W, a) -> [a]
linearize' gr (o, a) = concat req ++ [a] ++ concat inv
  where (req, inv) = fboth (linearize' gr . outs gr . snd) $ partW o
The ordering can then be optimized by removing consecutive equal elements like so:
-- | Remove consecutive elements which are equal to a previous element.
-- Runtime complexity: O(n), space: O(1).
-- (Needs the LambdaCase extension, as in the next listing.)
removeConsequtiveEq :: Eq a => [a] -> [a]
removeConsequtiveEq = \case
  []    -> []
  [x]   -> [x]
  (h:t) -> h : ug h t
  where
    ug e = \case
      [] -> []
      (x:xs) | e == x    -> ug x xs
      (x:xs) | otherwise -> x : ug x xs
Edit: Using DCG, SCC, and topsort
With the algorithm described by @Cirdec:
1. Given a directed cyclic graph (DCG) where an edge (f, t) denotes that f must come before t in the ordering,
2. compute the condensation of the DCG from step 1,
3. turn each SCC in the condensation from step 2 into a palindrome,
4. compute the topsort of the graph from step 3,
5. and concatenate the computed ordering.
In Haskell:
{-# LANGUAGE LambdaCase #-}
import Data.List (nub)
import Data.Maybe (fromJust)
import Data.Graph.Inductive.Graph
import Data.Graph.Inductive.PatriciaTree
import Data.Graph.Inductive.NodeMap
import Data.Graph.Inductive.Query.DFS
data MkEdge = MkEdge Bool Int Int
req = MkEdge True
inv = MkEdge False
toGraph :: [MkEdge] -> [(Int, Int, Bool)] -> Gr Int Bool
toGraph edges es = run_ empty nm
  where ns = nub $ edges >>= \(MkEdge _ f t) -> [f, t]
        nm = insMapNodesM ns >> insMapEdgesM es
-- | Make graph into a directed cyclic graph (DCG).
-- "Requires" denotes a forward edge.
-- "Invalidates" denotes a backward edge.
toDCG :: [MkEdge] -> Gr Int Bool
toDCG edges = toGraph edges $
  (\(MkEdge w f t) -> if w then (t, f, w) else (f, t, w)) <$> edges
-- | Make a palindrome of the given list by computing: [1 .. n] ++ [n - 1 .. 1].
-- Runtime complexity: O(n).
palindrome :: [a] -> [a]
palindrome = \case
  [] -> []
  xs -> xs ++ tail (reverse xs)
linearize :: Gr Int a -> [Int]
linearize dcg = concat $ topsort' scc2
  where scc  = nmap (fmap (fromJust . lab dcg)) $ condensation dcg
        scc2 = nmap palindrome scc
For the graph g2:
g2 = [ 2 `req` 1
     , 2 `inv` 1
     , 3 `req` 1
     , 3 `inv` 1
     , 4 `req` 1
     , 5 `inv` 1
     ]
> prettyPrint $ toDCG g2
1:2->[(False,2)]
2:1->[(True,1),(True,3),(True,4)]
3:3->[(False,2)]
4:4->[]
5:5->[(False,2)]
> prettyPrint $ condensation $ toDCG g2
1:[5]->[((),2)]
2:[1,2,3]->[((),3)]
3:[4]->[]
> linearize $ toDCG g2
[5,2,1,3,1,2,4]
This ordering is neither minimal nor valid, since it violates the dependencies: 5 invalidates 1, which 2 depends on; 2 invalidates 1, which 4 depends on.
A valid and minimal ordering is [1,4,2,1,3,5]. By rotating the list to the right, we get [5,1,4,2,1,3], which is also a valid ordering.
If the direction of the graph is flipped, the ordering becomes [4,2,1,3,1,2,5]. This is not a valid ordering either: with the wrap-around rule, 5 can happen first and then 4, but 5 invalidates 1, which 4 depends on.
I believe the following algorithm will find a minimal string of vertices in linear time:
1. Decompose the graph into its strongly connected components. Existing algorithms do this in linear time.
2. In each strongly connected component, each node needs to be listed both before and after every other node. List the nodes [1..n] of each strongly connected component in the order [1..n] ++ [n-1,n-2..1].
3. Concatenate the strongly connected components together in order by a topological sort. Existing algorithms topologically sort directed acyclic graphs like this in linear time.

Sort a list of numbers by their 'visual similarity'

Consider a function which rates the level of 'visual similarity' between two numbers: 666666 and 666166 would be very similar, unlike 666666 and 111111.
type N = Int
type Rate = Int
similar :: N -> N -> Rate
similar a b = length . filter id . zipWith (==) a' $ b'
  where a' = show a
        b' = show b
similar 666666 666166
--> 5
-- high rate : very similar
similar 666666 111111
--> 0
-- low rate : not similar
There will be more sophisticated implementations for this, but it serves the purpose.
The intention is to find a function that sorts a given list of N's so that each item is the one most similar to its preceding item. Since the first item does not have a predecessor, a first N must be given.
similarSort :: N -> [N] -> [N]
Let's look at some sample data. The numbers don't need to have the same number of digits, but it makes them easier to reason about:
sample :: [N]
sample = [2234, 8881, 1222, 8888, 8822, 2221, 5428]
One could be tempted to implement the function like so (sortWith comes from GHC.Exts; sortOn from Data.List would also work):
similarSortWrong x xs = reverse . sortWith (similar x) $ xs
but this would lead to a wrong result:
similarSortWrong 2222 sample
--> [2221,1222,8822,2234,5428,8888,8881]
In the beginning it looks correct, but it's obvious that 8822 should rather be followed by 8881, since it is more similar to it than 2234 is.
So here's the implementation I came up with:
similarSort _ [] = []
similarSort x xs = x : similarSort a as
  where (a:as) = reverse . sortWith (similar x) $ xs
similarSort 2222 sample
--> [2222,2221,2234,1222,8822,8888,8881]
It seems to work, but it also seems to do a lot more work than necessary: at every step the whole rest is sorted again, just to pick the first element. Usually laziness would allow this, but reverse might break it. I'd be keen to hear if someone knows whether there's a common abstraction for this problem.
It's relatively straightforward to implement the greedy algorithm you ask for. Let's start with some boilerplate; we'll use the these package for a zip-like that hands us the "unused" tail ends of zipped-together lists:
import Data.Align
import Data.These
sampleStart = "2222"
sampleNeighbors = ["2234", "8881", "1222", "8888", "8822", "2221", "5428"]
Instead of using numbers, I'll use lists of digits -- just so we don't have to litter the code with conversions between the form that's convenient for the user and the form that's convenient for the algorithm. You've been a bit fuzzy about how to rate the similarity of two digit strings, so let's make it as concrete as possible: any digits that differ cost 1, and if the digit strings vary in length we have to pay 1 for each extension to the right. Thus:
distance :: Eq a => [a] -> [a] -> Int
distance l r = sum $ alignWith elemDistance l r where
  elemDistance (These l r) | l == r = 0
  elemDistance _           = 1
A handy helper function will pick the smallest element of some list (by a user-specified measure) and return the rest of the list in some implementation-defined order.
minRestOn :: Ord b => (a -> b) -> [a] -> Maybe (a, [a])
minRestOn f [] = Nothing
minRestOn f (x:xs) = Just (go x [] xs) where
  go min rest [] = (min, rest)
  go min rest (x:xs) = if f x < f min
    then go x (min:rest) xs
    else go min (x:rest) xs
Now the greedy algorithm almost writes itself:
greedy :: Eq a => [a] -> [[a]] -> [[a]]
greedy here neighbors = here : case minRestOn (distance here) neighbors of
  Nothing -> []
  Just (min, rest) -> greedy min rest
We can try it out on your sample:
> greedy sampleStart sampleNeighbors
["2222","1222","2221","2234","5428","8888","8881","8822"]
Just eyeballing it, that seems to do okay. However, as with many greedy algorithms, this one only minimizes the local cost of each edge in the path. If you want to minimize the total cost of the path found, you need to use another algorithm. For example, we can pull in the astar package. For simplicity, I'm going to do everything in a very inefficient way, but it's not too hard to do it "right". We'll need a fair chunk more imports:
import Data.Graph.AStar
import Data.Hashable
import Data.List
import Data.Maybe
import qualified Data.HashSet as HS
Unlike before, where we only wanted the nearest neighbor, we'll now want all the neighbors. (Actually, we could probably implement the previous use of minRestOn using the following function and minimumOn or something. Give it a try if you're interested!)
neighbors :: (a, [a]) -> [(a, [a])]
neighbors (_, xs) = go [] xs where
  go ls [] = []
  go ls (r:rs) = (r, ls ++ rs) : go (r:ls) rs
We can now call the aStar search method with appropriate parameters. We'll use ([a], [[a]]) -- representing the current list of digits and the remaining lists that we can choose from -- as our node type. The arguments to aStar are then, in order: the function for finding neighboring nodes, the function for computing distance between neighboring nodes, the heuristic for how far we have left to go (we'll just say 1 for each unique element in the list), whether we've reached a goal node, and the initial node to start the search from. We'll call fromJust, but it should be okay: all nodes have at least one path to a goal node, just by choosing the remaining lists of digits in order.
optimal :: (Eq a, Ord a, Hashable a) => [a] -> [[a]] -> [[a]]
optimal here elsewhere = (here:) . map fst . fromJust $ aStar
  (HS.fromList . neighbors)
  (\(x, _) (y, _) -> distance x y)
  (\(x, xs) -> HS.size (HS.fromList (x:xs)) - 1)
  (\(_, xs) -> null xs)
  (here, elsewhere)
Let's see it run in ghci:
> optimal sampleStart sampleNeighbors
["2222","1222","8822","8881","8888","5428","2221","2234"]
We can see that it's done better this time by adding a pathLength function that computes all the distances between neighbors in a result.
pathLength :: Eq a => [[a]] -> Int
pathLength xs = sum [distance x y | x:y:_ <- tails xs]
In ghci:
> pathLength (greedy sampleStart sampleNeighbors)
15
> pathLength (optimal sampleStart sampleNeighbors)
14
In this particular example, I think the greedy algorithm could have found the optimal path if it had made the "right" choices whenever there were ties for minimal next step; but I expect it is not too hard to cook up an example where the greedy algorithm is forced into bad early choices.

Calculation of combinations/cartesian product of sets (without duplicates and without order restrictions)

I have a combinatorial problem that can be solved inefficiently using the cartesian
product of multiple sets. Concretely, I have multiple items and multiple elements that
satisfy each item. The problem consists of finding all possible combinations of elements
that satisfy all items. For example:
items -> elements
------------------------
1 -> {a,b} // a and b cover the item 1
2 -> {a,b} // a and b cover the item 2
3 -> {a,b,c} // a, b and c cover the item 3
4 -> {a,b,e,f} // a, b, e, f cover the item 4
Alternative representation:
element -> items covered
------------------------
a -> {1,2,3,4}
b -> {1,2,3,4}
c -> {3}
e -> {4}
f -> {4}
The goal is to find all combinations that cover items 1,2,3,4.
Valid solutions are:
{a},{a,b},{a,c},{a,e},{a,f},{a,b,c},{a,b,e},{a,b,f},{a,b,c,e},{a,b,c,f}
{b},{b,c},{b,e},{b,f},{b,c,e},{b,c,f}
Note that the order is not important, so {a,b} = {b,a} ({a,b} x {c,d} = {c,d} x {a,b}).
Also, note that {a,a,a,a}, {a,a,a,b}... are redundant combinations.
As you can see, this problem is similar to the set cover problem, where the universe of items for this example is U = {1,2,3,4} and the set of subsets of U is S = {ab = {1,2,3,4}, c = {3}, ef = {4}}, where {1,2,3,4} is the set of items covered by the elements a and b, {3} is the set of items covered by c, and {4} is the set of items covered by the elements e and f. However, the goal here is not finding the minimal combination of sets from S that covers all items of U, but finding all combinations of elements {a,b,c,e,f} that cover all items {1,2,3,4}.
A naive implementation could perform a cartesian product between the sets for 1, 2, 3 and 4, and then filter out the redundant combinations. However, this approach is very inefficient. Suppose I have this situation:
1 -> {a,b,c,d,e,f,g,h}
2 -> {a,b,c,d,e,f,g,h}
3 -> {a,b,c,d,e,f,g,h}
4 -> {a,b,c,d,e,f,g,h}
5 -> {a,b,c,d,e,f,g,h}
6 -> {a,b,c,d,e,f,g,h,i}
A cartesian product of the six sets will produce 8^5 * 9 = 294912 combinations, when there are actually many fewer: the subsets of {a,b,c,d,e,f,g,h}, plus those same subsets each extended with i.
Another way to solve this problem is to enumerate all elements, skipping combinations that are equivalent to previously generated ones, and also skipping repeated elements. This is fairly easy to compute and can be implemented as an iterator that returns one combination at a time, but I don't know if there is a better way to solve this problem, or whether this problem has been studied before.
How would you solve this problem?
First, realize that if a set of elements does not satisfy all items, neither does any of its subsets.
Second, realize that if a set satisfies all items, so do all its supersets.
Now, all you have to do is:
Let S be the set of all elements.
Let R be the empty set.
Define a function find(s, r) which does:
  If r includes s, return r.
  If s does not satisfy all items, return r.
  Otherwise add s to r.
  For every element e in s:
    let s' be s with e removed
    set r to find(s', r)
  Return r.
Just call find(S, R) and you have your answer.
This method performs some duplicate tests, but it always kills a branch as soon as the branch is identified as failing, which leads to a lot of pruning on a large set of elements.
Both the lookup of whether r already includes a particular set of elements and the check whether s satisfies all items can be made very fast at the expense of extra memory.
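For concreteness, here is one possible Haskell rendering of that search; the Covers representation and the use of plain element lists are assumptions of this sketch:

import Data.List ((\\))
import qualified Data.Set as Set

-- covers maps each element to the items it satisfies.
type Covers = [(Char, [Int])]

-- Does the element set s cover every required item?
satisfiesAll :: Covers -> [Int] -> [Char] -> Bool
satisfiesAll covers items s =
  Set.fromList items `Set.isSubsetOf`
  Set.fromList (concat [ is | (e, is) <- covers, e `elem` s ])

-- Walk down from the full element set, recording every satisfying set
-- and killing a branch as soon as it stops covering all items.
find :: Covers -> [Int] -> [Char] -> Set.Set [Char] -> Set.Set [Char]
find covers items s r
  | s `Set.member` r = r
  | not (satisfiesAll covers items s) = r
  | otherwise = foldl (\acc e -> find covers items (s \\ [e]) acc)
                      (Set.insert s r) s

Calling find [('a',[1,2,3,4]),('b',[1,2,3,4]),('c',[3]),('e',[4]),('f',[4])] [1,2,3,4] "abcef" Set.empty returns every satisfying subset (all those containing a or b), the same 24 sets as the combs output further below.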
What if you did this:
1 -> {a,b}
2 -> {b,c}
3 -> {a,b,c}
4 -> {a,e,f}
=>
a -> [1,3,4]
b -> [1,2,3]
c -> [2,3]
e -> [4]
f -> [4]
Then enumerate the combinations of the left side that provide (at least) [1,2,3,4]: for each item in the set of all-satisfying sets, enumerate combinations with the other items.
All-Satisfying-Sets: {{a,b},{b,e},{b,f}}
Combinations within All-Satisfying-Sets: {{a,b,e},{a,b,f},{b,e,f},{a,b,e,f}}
Others: {c}
Combinations with Others: {{a,b,c},{b,e,c},{b,f,c},{a,b,e,c},{a,b,f,c},{b,e,f,c},{a,b,e,f,c}}
Or you could do this in Haskell:
import Data.List (union, subsequences, sort)
example1 = [(["a"],[1,2,3,4])
,(["b"],[1,2,3,4])
,(["c"],[3])
,(["e"],[4])
,(["f"],[4])]
example2 = [(["a"],[1,2,3,4,5,6])
,(["b"],[1,2,3,4,5,6])
,(["c"],[1,2,3,4,5,6])
,(["e"],[1,2,3,4,5,6])
,(["f"],[1,2,3,4,5,6])
,(["g"],[1,2,3,4,5,6])
,(["h"],[1,2,3,4,5,6])
,(["i"],[6])]
-- For every subsequence of the element list, union the items it covers,
-- and keep those subsequences that cover exactly the requested items.
combs items list =
  let unify (a,b) (a',b') = (sort (a ++ a'), sort (union b b'))
  in map fst
   . filter ((==items) . snd)
   . map (foldr unify ([],[]))
   . subsequences
   $ list
OUTPUT:
*Main> combs [1..4] example1
[["a"],["b"],["a","b"],["a","c"],["b","c"],["a","b","c"],["a","e"],["b","e"],
["a","b","e"],["a","c","e"],["b","c","e"],["a","b","c","e"],["a","f"],["b","f"],
["a","b","f"],["a","c","f"],["b","c","f"],["a","b","c","f"],["a","e","f"],
["b","e","f"],["a","b","e","f"],["a","c","e","f"],["b","c","e","f"],
["a","b","c","e","f"]]

Can I insert data unsorted in Red-black tree?

While I'm still struggling to find a solution for this question, I have another one which may be easier. The following is the insert function of Okasaki's red-black tree implementation. What I want to do is to keep the data unsorted as I insert it into the tree, so the data always goes to the leftmost/bottom-most leaf every time I insert. There is no need to compare for x < y, x > y or x == y.
It seems pretty straightforward at first, by just removing these guards and only doing: ins s@(T color a y b) = balance color (ins a) y b. The behaviour seems to be that the tree is kept balanced, but the coloring becomes a bit messed up, and eventually that affects future inserts. Any idea how this can be achieved? I think this could possibly be my first step towards my previous question. I just started playing with Haskell, so I am not getting it right straight away. Thanks a lot.
insertSet x s = T B a y b
  where T _ a y b = ins s   -- re-blacken the root
        ins E = T R E x E
        ins s@(T color a y b) =
          if x < y then balance color (ins a) y b
          else if x > y then balance color a y (ins b)
          else s
['d','a','s','f']

      s
     / \
    a   f
   /
  d          (unsorted tree)
You can use my RBTree implementation in Haskell:
http://hackage.haskell.org/package/RBTree
Using the insert function:
insert :: (a -> a -> Ordering) -> RBTree a -> a -> RBTree a
Feed it a (\_ _ -> LT) comparison function; then every new element will always go into the leftmost place.
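For instance (the module name and emptyRB are from the package's documentation as best I recall, so treat them as assumptions):

import Data.Tree.RBTree (RBTree, emptyRB, insert)

-- Insert with a constant LT comparison: every new element is treated as
-- smaller than everything already in the tree, so it lands at the
-- leftmost position while the usual red-black rebalancing still runs.
insertLeftmost :: RBTree a -> a -> RBTree a
insertLeftmost = insert (\_ _ -> LT)

-- Build the unsorted tree from the question:
unsortedTree :: RBTree Char
unsortedTree = foldl insertLeftmost emptyRB "dasf"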
