I have a dependency graph that I have represented as a Map<Node, Collection<Node>> (in Java-speak, or f(Node n) -> Collection[Node] as a function; this is a mapping from a given node n to a collection of nodes that depend on n). The graph is potentially cyclic*.
Given a list badlist of nodes, I would like to solve a reachability problem: i.e. generate a Map<Node, Set<Node>> badmap that represents a mapping from each node N in badlist to the set of nodes containing N and every node that transitively depends on it.
Example:
(x -> y means node y depends on node x)
n1 -> n2
n2 -> n3
n3 -> n1
n3 -> n5
n4 -> n2
n4 -> n5
n6 -> n1
n7 -> n1
This can be represented as the adjacency map {n1: [n2], n2: [n3], n3: [n1, n5], n4: [n2, n5], n6: [n1], n7: [n1]}.
If badlist = [n4, n5, n1] then I expect to get badmap = {n4: [n4, n2, n3, n1, n5], n5: [n5], n1: [n1, n2, n3, n5]}.
I'm floundering with finding graph algorithm references online, so if anyone could point me at an efficient algorithm description for reachability, I'd appreciate it. (An example of something that is not helpful to me is http://www.cs.fit.edu/~wds/classes/cse5081/reach/reach.html since that algorithm is to determine whether a specific node A is reachable from a specific node B.)
*cyclic: if you're curious, it's because it represents C/C++ types, and structures can have members which are pointers to the structure in question.
In Python:
def reachable(graph, badlist):
    badmap = {}
    for root in badlist:
        stack = [root]
        visited = set()
        while stack:
            v = stack.pop()
            if v in visited: continue
            stack.extend(graph.get(v, ()))  # .get() tolerates nodes with no outgoing edges
            visited.add(v)
        badmap[root] = visited
    return badmap
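A quick check against the example graph from the question (strings standing in for the Node objects; thanks to .get() above, n5 needs no empty entry):

graph = {'n1': ['n2'], 'n2': ['n3'], 'n3': ['n1', 'n5'],
         'n4': ['n2', 'n5'], 'n6': ['n1'], 'n7': ['n1']}
print(reachable(graph, ['n4', 'n5', 'n1']))
# {'n4': {'n1','n2','n3','n4','n5'}, 'n5': {'n5'}, 'n1': {'n1','n2','n3','n5'}} (set order may vary)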
Here's what I ended up using, based on #quaint's answer:
(requires a few Guava classes for convenience)
static public <T> Set<T> findDependencies(
        T rootNode,
        Multimap<T, T> dependencyGraph)
{
    Set<T> dependencies = Sets.newHashSet();
    LinkedList<T> todo = Lists.newLinkedList();
    for (T node = rootNode; node != null; node = todo.poll())
    {
        if (dependencies.contains(node))
            continue;
        dependencies.add(node);
        Collection<T> directDependencies =
            dependencyGraph.get(node);
        if (directDependencies != null)
            todo.addAll(directDependencies);
    }
    return dependencies;
}

static public <T> Multimap<T,T> findDependencies(
        Iterable<T> rootNodes,
        Multimap<T, T> dependencyGraph)
{
    Multimap<T, T> dependencies = HashMultimap.create();
    for (T rootNode : rootNodes)
        dependencies.putAll(rootNode,
            findDependencies(rootNode, dependencyGraph));
    return dependencies;
}

static public void testDependencyFinder()
{
    Multimap<Integer, Integer> dependencyGraph =
        HashMultimap.create();
    dependencyGraph.put(1, 2);
    dependencyGraph.put(2, 3);
    dependencyGraph.put(3, 1);
    dependencyGraph.put(3, 5);
    dependencyGraph.put(4, 2);
    dependencyGraph.put(4, 5);
    dependencyGraph.put(6, 1);
    dependencyGraph.put(7, 1);

    Multimap<Integer, Integer> dependencies =
        findDependencies(ImmutableList.of(4, 5, 1), dependencyGraph);
    System.out.println(dependencies);
    // prints {1=[1, 2, 3, 5], 4=[1, 2, 3, 4, 5], 5=[5]}
}
You might want to build a reachability matrix from your adjacency list for fast searches. I just found the paper Course Notes for CS336: Graph Theory - Jayadev Misra, which describes how to build the reachability matrix from an adjacency matrix.
If A is your adjacency matrix, the reachability matrix would be R = A + A² + ... + A^n, where n is the number of nodes in the graph. The powers A², A³, ... can be calculated by:
A² = A x A
A³ = A x A²
...
For the matrix multiplication, the logical OR is used in place of + and the logical AND in place of x. The complexity is O(n^4): n matrix multiplications, each of which is O(n^3).
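For illustration, here is a direct, unoptimized transcription of that scheme in Python (a sketch, assuming the graph is given as an n x n 0/1 adjacency matrix):

def reachability_matrix(adj):
    # R = A + A^2 + ... + A^n, with OR playing the role of + and AND of x
    n = len(adj)
    def boolmul(p, q):
        return [[int(any(p[i][k] and q[k][j] for k in range(n)))
                 for j in range(n)] for i in range(n)]
    reach = [row[:] for row in adj]     # A
    power = adj
    for _ in range(n - 1):              # accumulate A^2 ... A^n
        power = boolmul(power, adj)
        for i in range(n):
            for j in range(n):
                reach[i][j] |= power[i][j]
    return reach

The n boolean matrix products make the O(n^4) cost visible.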
Ordinary depth-first search or breadth-first search will do the trick: execute it once for each bad node.
Here's a working Java solution:
// build the example graph
Map<Node, Collection<Node>> graph = new HashMap<Node, Collection<Node>>();
graph.put(n1, Arrays.asList(new Node[] {n2}));
graph.put(n2, Arrays.asList(new Node[] {n3}));
graph.put(n3, Arrays.asList(new Node[] {n1, n5}));
graph.put(n4, Arrays.asList(new Node[] {n2, n5}));
graph.put(n5, Arrays.asList(new Node[] {}));
graph.put(n6, Arrays.asList(new Node[] {n1}));
graph.put(n7, Arrays.asList(new Node[] {n1}));

// compute the badmap
Node[] badlist = {n4, n5, n1};
Map<Node, Collection<Node>> badmap = new HashMap<Node, Collection<Node>>();
for (Node bad : badlist) {
    Stack<Node> toExplore = new Stack<Node>();
    toExplore.push(bad);
    Collection<Node> reachable = new HashSet<Node>(toExplore);
    while (toExplore.size() > 0) {
        Node aNode = toExplore.pop();
        for (Node n : graph.get(aNode)) {
            if (!reachable.contains(n)) {
                reachable.add(n);
                toExplore.push(n);
            }
        }
    }
    badmap.put(bad, reachable);
}
System.out.println(badmap);
As in Christian Ammer's answer, take A to be the adjacency matrix and use Boolean arithmetic when doing the following, where I is the identity matrix:
B = A + I;
C = B * B;
while (B != C) {
    B = C;
    C = B * B;
}
return B;
Furthermore, standard matrix multiplication (both arithmetical and logical) is O(n^3), not O(n^2). But if n <= 64, you can sort of get rid of one factor of n, because you can do 64 bits in parallel on today's 64-bit machines. For larger graphs 64-bit parallelism is useful too, but shader techniques might be even better.
EDIT: one can do 128 bits in parallel with SSE instructions, and with AVX even more.
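To illustrate the word-level parallelism in a portable way, here is a sketch in Python, where arbitrary-width integers serve as the bit rows (this is closer to Warshall's algorithm than to the squaring loop above, but the row-wide OR is the same trick):

def transitive_closure_bitset(adj_rows):
    # adj_rows[i] is an integer whose bit j is set iff there is an edge i -> j
    n = len(adj_rows)
    reach = [row | (1 << i) for i, row in enumerate(adj_rows)]   # B = A + I
    changed = True
    while changed:
        changed = False
        for i in range(n):
            row = reach[i]
            for j in range(n):
                if row >> j & 1:
                    row |= reach[j]   # OR in a whole row: many bits per operation
            if row != reach[i]:
                reach[i] = row
                changed = True
    return reach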
Related
I am solving question 684. Redundant Connection from Leetcode with partial success (I am failing one of the test cases). The question is asked as follows:
In this problem, a tree is an undirected graph that is connected and
has no cycles.
You are given a graph that started as a tree with n nodes labeled from
1 to n, with one additional edge added. The added edge has two
different vertices chosen from 1 to n, and was not an edge that
already existed. The graph is represented as an array edges of length
n where edges[i] = [ai, bi] indicates that there is an edge between
nodes ai and bi in the graph.
Return an edge that can be removed so that the resulting graph is a
tree of n nodes. If there are multiple answers, return the answer that
occurs last in the input.
I solved this using BFS, which gives a time complexity of O(n^2); however, I read that using union-find will give a time complexity of O(n). My attempt using union-find with path compression is only partially working, and I am stuck figuring out why.
For the following data, my code works correctly (meaning it returns the correct edge):
However, for the test case below, my code doesn't succeed in finding the correct edge (my union-find runs through all the edges without ever returning false, meaning no redundant edge was detected):
Input data: [[7,8],[2,6],[2,8],[1,4],[9,10],[1,7],[3,9],[6,9],[3,5],[3,10]]
From logs I can see that my union-find constructs the graph as follows (with no detected redundant edges):
Here is my code:
class Solution {
    fun findRedundantConnection(edges: Array<IntArray>): IntArray {
        val parents = IntArray(edges.size + 1) // 1 to N, we don't use 0
        for (i in 1..parents.size - 1)
            parents[i] = i // each node is its own parent, since this is an undirected graph

        val rank = IntArray(edges.size + 1) { 1 } // all nodes have rank 1 since they are their own parent
        val res = IntArray(2)
        for (edge in edges) {
            val (node1, node2) = edge
            if (union(node1, node2, parents, rank, res) == false)
                return intArrayOf(node1, node2)
        }
        return res
    }

    private fun find(
        node: Int,
        parents: IntArray
    ): Int {
        var parent = parents[node]
        while (parents[node] != parent) {
            parents[parent] = parents[parents[parent]] // path compression
            parent = parents[parent]
        }
        return parent
    }

    // modified union which returns false on redundant connection
    private fun union(
        node1: Int,
        node2: Int,
        parents: IntArray,
        rank: IntArray,
        res: IntArray
    ): Boolean {
        val parent1 = find(node1, parents)
        val parent2 = find(node2, parents)
        if (parent1 == parent2) { // redundant connection
            res[0] = node1
            res[1] = node2
            return false
        }

        if (rank[parent1] > rank[parent2]) {
            parents[parent2] = parent1
            rank[parent1] += rank[parent2]
        } else { // rank[parent1] <= rank[parent2]
            parents[parent1] = parent2
            rank[parent2] += rank[parent1]
        }
        return true
    }
}
Any suggestions on what the problem might be? I have not been able to figure it out.
var parent = parents[node]
while(parents[node] != parent){
You're never going to get into the while loop: parent was just initialized to parents[node], so the condition is false on the very first check, and find returns without walking up to the root (and without compressing anything).
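For reference, a standard iterative find with path compression keeps advancing until it reaches a node that is its own parent; a sketch of the fixed loop (in Python for brevity, but the Kotlin version has the same shape):

def find(node, parents):
    root = node
    while parents[root] != root:        # walk up to the root (a root is its own parent)
        root = parents[root]
    while parents[node] != root:        # second pass: compress the path
        parents[node], node = root, parents[node]
    return root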
I'm solving a problem where you have N events (1 <= N <= 100000) over M days (2 <= M <= 10^9). You are trying to find the minimum time of occurrence for each event.
For each event, you know that it couldn't have occurred prior to a day Si. You also have C triples (1 <= C <= 10^5) described by (a, b, x). An event b must have occurred at least x days after a.
Example:
There are 4 events, spread over 10 days. Event 1 had to occur on Day 1 or after. Event 2 had to occur on Day 2 or after. Event 3 had to occur on Day 3 or after. Event 4 had to occur on Day 4 or after.
The triples are (1, 2, 5); (2, 4, 2); (3, 4, 4). This means that Event 2 had to occur at least 5 days after Event 1; Event 4 had to occur at least 2 days after Event 2; and Event 4 had to occur at least 4 days after Event 3.
The solution is that Event 1 occurred on Day 1; Event 2 occurred on Day 6; Event 3 occurred on Day 3; and Event 4 occurred on Day 8. The reasoning behind this is that Event 2 occurred at least five days after Event 1, so it cannot have occurred before Day 1+5=6. Event 4 occurred at least two days after Event 2, so it cannot have occurred before Day 6+2=8.
My solution:
I had the idea to use the triples to create a Directed graph. So in the example above, the graph would look like this:
1 --5-> 2 --2-> 4
3 --4-> 4
Basically you create a directed edge from the Event that happened first to the Event that had to happen after. The edge weight would be the number of days it had to at least happen after.
I thought that we would first use the input data to create the graph. Then, you would binary search on all possible starting dates of the first event (1 through 10^9, so about 30 iterations). In this case, the first event is Event 1. Then, you would go through the graph and see if this starting date was possible. If you ever encountered an event whose date was before its Si date, you would terminate this search and continue binary searching. This solution would have worked easily if it weren't for the "event b must have occurred AT LEAST x days after a" requirement.
Does anyone have any other solutions for solving this problem, or how to alter mine so that it works? Thank you! If you have any questions please let me know :))
This can be mapped to a Simple Temporal Network, where the literature is rich, e.g.:
Dechter, Rina, Itay Meiri, and Judea Pearl. "Temporal constraint networks." Artificial Intelligence 49.1-3 (1991): 61-95.
Planken, Léon Robert. "Algorithms for simple temporal reasoning." (2013). (full dissertation)
As indicated in the comments, all-pairs shortest-paths can calculate the minimal-network (which also generates new arcs/constraints between all these events). If your graph is sparse, Johnson's algorithm is better than Floyd-Warshall.
If you don't care about the complete minimal network, but only about the bounds of your events, you are only interested in the first column and the first row of the all-pairs shortest-paths distance matrix. You can calculate these values with a couple of Bellman-Ford passes (once from the root for root -> i, and once on the reversed graph for i -> root), where root is time 0 (a concrete sketch follows the remarks below).
Just some remarks about things which Damien indicated (reasoning from scratch it seems: impressive):
we use negative weights in the general problem, such that pure Dijkstra won't do
existence of a negative cycle <-> infeasibility / no solution / inconsistency
there will be a need for some root vertex which is the origin of time
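Since all constraints here are lower bounds (t_b >= t_a + x and t_i >= S_i), a Bellman-Ford-style relaxation loop directly yields the earliest feasible time of every event; a sketch in Python with 0-indexed events (my own framing of the approach, not code from the papers above):

def earliest_times(S, triples):
    # S[i] = earliest allowed day of event i; each triple (a, b, x) means t[b] >= t[a] + x
    t = list(S)
    for _ in range(len(t)):             # n rounds suffice when there is no cycle
        changed = False
        for a, b, x in triples:
            if t[a] + x > t[b]:
                t[b] = t[a] + x
                changed = True
        if not changed:
            return t
    return None                         # still changing after n rounds: positive cycle, infeasible

print(earliest_times([1, 2, 3, 4], [(0, 1, 5), (1, 3, 2), (2, 3, 4)]))   # [1, 6, 3, 8]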
Edit: the above somewhat targets strong inference / propagation, like giving tight bounds on the events' value-domains.
If you are only interested in some consistent solution, it might be another idea just to post these constraints as a linear program and use one of the highly-optimized implementations to solve it (open-source world: CoinOR clp; maybe Google's glop). Simplex-based solvers should give you an integral solution (I think the problem is totally unimodular). Interior-point based solvers should be faster, but I'm not sure the result will be integral without some additional crossover step. (It might be a good idea to add some dummy objective like min(max(x)), makespan-like.)
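As a toy illustration of the LP route (scipy's linprog standing in for clp/glop here; minimizing the sum of times pushes every event as early as its constraints allow):

from scipy.optimize import linprog

S = [1, 2, 3, 4]                              # per-event lower bounds
triples = [(0, 1, 5), (1, 3, 2), (2, 3, 4)]   # (a, b, x): t[b] >= t[a] + x

n = len(S)
A_ub, b_ub = [], []
for a, b, x in triples:                       # rewrite as t[a] - t[b] <= -x
    row = [0] * n
    row[a], row[b] = 1, -1
    A_ub.append(row)
    b_ub.append(-x)

res = linprog(c=[1] * n, A_ub=A_ub, b_ub=b_ub, bounds=[(s, None) for s in S])
print(res.x)                                  # [1. 6. 3. 8.]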
Consider a topological sort of your DAG.
For a list L corresponding to the toposort of your graph, the leaves are at the end.
Then for a vertex just before them,
L = [..., v, leaves]
you know that the edges outgoing from v can only go to the vertices after it (here the leaves).
This allows you to compute the minimal weight associated to v by applying Damien's max.
Do so up to the head of L.
Topological sorting is O(V+E).
Here is an illustration with a more interesting graph (read it from top to bottom):

      5
     / \
    4   7
       / \
      1   2
      |
      0
      |
      6

A topological ordering is (4 6 0 1 2 7 5), so we will visit in order 4, 6, 0, 1, 2, 7, then 5, and any vertex we visit has all its dependencies already computed.
Assume each vertex k has its event occurring after 2^k days; this "occurs after" date is referred to as the weight.
E.g. vertex 4 is weighted 2^4.
Assume each edge (i,j) is weighted 5*i + j.
6 is weighted 2^6 = 64
0 is weighted max(2^0, 64 + (5*0+6)) = 70
1 takes max(2^1, 70 + (5*1+0)) = 75
7 takes max(2^7, 75 + (5*7+1), 4 + (5*7+2)) = 2^7
A point to be highlighted (here for 7) is that the minimal date induced by the dependencies of a node may occur before the date attached to that node itself (and we have to keep the biggest one).
function topologicalSort({ V, E }) {
  const visited = new Set()
  const stack = []
  function dfs (v) {
    if (visited.has(v)) { return }
    E.has(v) && E.get(v).forEach(({ to, w }) => dfs(to))
    visited.add(v)
    stack.push(v)
  }
  // process nodes without incoming edges first
  const heads = new Set([...V])
  for (const v of V) {
    const edges = E.get(v)
    edges && edges.forEach(({ to }) => heads.delete(to))
  }
  for (const v of heads) {
    dfs(v)
  }
  for (const v of V) {
    dfs(v)
  }
  return stack
}

class G {
  constructor () {
    this.V = new Set()
    this.E = new Map()
  }
  setEdges (from, tos) {
    this.V.add(from)
    tos.forEach(({ to, w }) => this.V.add(to))
    this.E.set(from, tos)
  }
}

function solve ({ g, vToWeight }) {
  const stack = topologicalSort(g)
  console.log('ordering', stack.join(''))
  stack.forEach(v => {
    const edges = g.E.get(v)
    if (!edges) { return }
    const newval = Math.max(
      vToWeight.get(v),
      ...edges.map(({ to, w }) => vToWeight.get(to) + w)
    )
    console.log('setting best for', v, edges.map(({ to, w }) => [vToWeight.get(to), w].join('+')))
    vToWeight.set(v, newval)
  })
  return vToWeight
}

function demo () {
  const g = new G()
  g.setEdges(2, [{ to: 1, w: 5 }])
  g.setEdges(4, [{ to: 2, w: 2 }, { to: 3, w: 4 }])
  const vToWeight = new Map([
    [1, 1],
    [2, 6],
    [3, 3],
    [4, 4]
  ])
  return { g, vToWeight }
}

function demo2 () {
  const g = new G()
  const addEdges = (i, ...tos) => {
    g.setEdges(i, tos.map(to => ({ to, w: 5 * i + to })))
  }
  addEdges(5, 4, 7)
  addEdges(7, 1, 2)
  addEdges(1, 0)
  addEdges(0, 6)
  const vToWeight = new Map([...g.V].map(v => [v, 2 ** v]))
  return { g, vToWeight }
}

function dump (map) {
  return [...map].map(([k, v]) => k + '->' + v)
}

console.log("----op's sol----\n", dump(solve(demo())))
console.log('----that case---\n', dump(solve(demo2())))
The distance matrix (between all pairs of events = nodes) can be obtained in an iterative way, similar to the Floyd-Warshall algorithm. Basically, iteratively:
T(x, y) = max(T(x, y), T(x, z) + T(z, y))
However, as mentioned by the OP in a comment, the Floyd-Warshall algorithm is O(n^3), which is too much for a value of n up to 10^5.
A key point is that no loop exists, and therefore a more efficient algorithm should exist.
A nice proposal was made by grodzi in their answer: use a topological sort of the Directed Acyclic Graph (DAG).
I made an implementation in C++ according to this idea, with one main difference: I used a simple sort (from the C++ standard library) for building the topological ordering. Doing so is simple and has a complexity of O(n log n). The dedicated method proposed by grodzi could be more efficient (it seems to be O(n)). However, this approach is very easy to implement and such a complexity remains low.
After the topological sorting, we know that a given event only depends on the events before it. For this part, this ensures a complexity of O(C), where C is the number of triples, i.e. the number of edges.
#include <iostream>
#include <vector>
#include <set>
#include <unordered_set>
#include <algorithm>
#include <tuple>
#include <numeric>

struct Triple {
    int event1;
    int event2;
    int days;
};

struct Pred {
    int pred;
    int days;
};

void print_result (const std::vector<int> &index, const std::vector<int> &times) {
    int n = times.size();
    for (int i = 0; i < n; i++) {
        std::cout << index[i]+1 << " " << times[index[i]] << "\n";
    }
}

std::tuple<std::vector<int>, std::vector<int>> ordering (int n, const std::vector<Triple> &triples) {
    std::vector<int> index(n);
    std::vector<int> times(n, 0);
    std::iota(index.begin(), index.end(), 0);

    // Build predecessors matrix and sets
    std::vector<std::vector<Pred>> pred (n);
    std::vector<std::unordered_set<int>> set_pred (n);
    for (auto &triple: triples) {
        pred[triple.event2 - 1].emplace_back(Pred{triple.event1 - 1, triple.days});
        set_pred[triple.event2 - 1].insert(triple.event1 - 1);
    }

    // Topological sort
    std::sort (index.begin(), index.end(), [&set_pred] (int &i, int &j) {return set_pred[j].find(i) != set_pred[j].end();});

    // Iterative calculation of times of arrival
    for (int i = 1; i < n; ++i) {
        int ip = index[i];
        for (auto &p: pred[ip]) {
            times[ip] = std::max(times[ip], times[p.pred] + p.days);
        }
    }

    // Final sort, according to times of arrival
    std::sort (index.begin(), index.end(), [&times] (int &i, int &j) {return times[i] < times[j];});

    return {index, times};
}

int main() {
    int n_events = 4;
    std::vector<Triple> triples = {
        {1, 2, 5},
        {1, 3, 1},
        {3, 2, 6},
        {3, 4, 1}
    };
    std::vector<int> index(n_events);
    std::vector<int> times(n_events);

    std::tie (index, times) = ordering (n_events, triples);
    print_result (index, times);
}
Result:
1 0
3 1
4 2
2 7
A person is currently at (0,0) and wants to reach his house at (X,0), which takes a number of jumps. From a point, say (a,0), he can jump either to (a+k1,0), i.e. a forward jump of k1 steps, or to (a-k2,0), i.e. a backward jump of k2 steps. The first jump he takes must be forward. Also, he cannot jump backward twice consecutively, but he can make any number of consecutive forward jumps. There are n points a1, a2, ..., an where he cannot land.
I have to determine the minimum number of jumps to reach his house, or conclude that he cannot reach it. If he can reach the house, print yes and the number of jumps; if not, print no.
Here
X = location of the person's house.
N = no. of points where he cannot jump.
k1 = forward jump.
k2 = backward jump.
example
For inputs
X=6 N=2 k1=4 k2=2
Blocked points = 3 5
the answer is 3 (4 to 8 to 6 or 4 to 2 to 6)
For input
6 2 5 2
1 3
the person cannot reach his house
N can be up to 10^4 and X can be up to 10^5.
I thought of using dynamic programming but I'm not able to implement it. Can anyone help?
I think your direction of using dynamic programming can work but I will show another way to solve the question with the same asymptotic time complexity as dynamic programming would achieve.
This question can be described as a problem on graphs: you have X nodes indexed 1 to X, and there is an edge between every a and a + k1 and between every b and b - k2, with the nodes in N removed.
This would be enough if you could jump backward as many times as you liked, but since you cannot jump backward twice in a row, add the following modification: duplicate the graph's nodes; duplicate the forward-going edges too, but make them go from the duplicated nodes to the originals; then make all of the backward-going edges go into the duplicated graph. Now every backward edge sends you to the duplicate, and you cannot take a backward edge again until a forward edge brings you back to the original. This makes sure that after a backward edge you always take a forward edge - so you cannot jump backward twice in a row.
Now finding the shortest path from 1 to X gives the smallest number of jumps, since each edge is a jump.
Finding the shortest path in a directed unweighted graph takes O(|V|+|E|) time and memory (using BFS); your graph has |V| = 2 * X and at most 2 * 2 * X edges, so the time and memory complexity is O(X).
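A rough sketch of this construction in Python (the function name and the search bound are my own assumptions; the pair (pos, may_jump_back) encodes which of the two layers we are in):

from collections import deque

def min_jumps(X, k1, k2, blocked):
    # may_jump_back == False is the duplicated layer, entered by a backward jump
    limit = X + k1 + k2                  # assumed bound on useful positions
    start = (0, True)
    dist = {start: 0}
    q = deque([start])
    while q:
        pos, may_back = q.popleft()
        if pos == X:
            return dist[(pos, may_back)]
        steps = [(pos + k1, True)]       # forward edges lead back to the original layer
        if may_back:
            steps.append((pos - k2, False))
        for state in steps:
            p = state[0]
            if 0 <= p <= limit and p not in blocked and state not in dist:
                dist[state] = dist[(pos, may_back)] + 1
                q.append(state)
    return None                          # the house is unreachable

print(min_jumps(6, 4, 2, {3, 5}))        # 3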
For the case where you can jump backward twice in a row, you can use the networkx library in Python for a simple demo (you could also use it for the more complicated duplicated-graph construction):
import matplotlib.pyplot as plt
import networkx as nx

X = 6
N = 2
k1 = 4
k2 = 2
nodes = [0, 1, 2, 4, 6]

G = nx.DiGraph()
G.add_nodes_from(nodes)
for n in nodes:
    if n + k1 in nodes:
        G.add_edge(n, n + k1)
    if n - k2 in nodes:
        G.add_edge(n, n - k2)

nx.draw(G, with_labels=True, font_weight='bold')
plt.plot()
plt.show()

path = nx.shortest_path(G, 0, X)
print(f"Number of jumps: {len(path) - 1}. path: {str(path)}")
Would a breadth-first search be efficient enough?
Something like this? (Python code)
from collections import deque

def f(x, k1, k2, blocked):
    # queue entries: (position, number of jumps so far, direction of the arriving jump)
    queue = deque([(k1, 1, "forward")])   # the first jump must be forward
    while queue:
        (p, depth, direction) = queue.popleft()
        if p in blocked or not (0 <= p <= x + k1 + k2):  # not sure about these boundaries ... ideas welcome
            continue
        if p == x:
            return depth
        blocked.add(p)  # visited
        queue.append((p + k1, depth + 1, "forward"))
        if direction != "backward":  # no two consecutive backward jumps
            queue.append((p - k2, depth + 1, "backward"))

X = 6
k1 = 4
k2 = 2
blocked = set([3, 5])
print(f(X, k1, k2, blocked))  # 3

X = 2
k1 = 3
k2 = 4
blocked = set()
print(f(X, k1, k2, blocked))
Here is גלעדברקן's code in C++:
#include <iostream>
#include <queue>
using namespace std;

struct node {
    int id;
    int depth;
    int direction; // 1 = backward, 0 = forward
};

int BFS(int start, int end, int k1, int k2, bool blocked[], int length)
{
    queue<node> q;
    blocked[0] = true;
    q.push({start, 0, 0});
    while (!q.empty())
    {
        node f = q.front();
        q.pop();
        if (f.id == end) {
            return f.depth;
        }
        if (f.id + k1 < length and !blocked[f.id + k1])
        {
            blocked[f.id + k1] = true;
            q.push({f.id + k1, f.depth + 1, 0});
        }
        if (f.direction != 1) { // if the last jump was backward - don't jump backward again
            if (f.id - k2 >= 0 and !blocked[f.id - k2])
            {
                blocked[f.id - k2] = true;
                q.push({f.id - k2, f.depth + 1, 1});
            }
        }
    }
    return -1;
}

int main() {
    bool blocked[] = {false, false, false, false, false, false, false};
    std::cout << BFS(0, 6, 4, 2, blocked, 7) << std::endl;
    return 0;
}
You can control the length of the steps, the start and end, and the blocked nodes.
Consider an infinite binary tree defined as follows.
For a node labelled v, let its left child be denoted 2*v and its right child 2*v+1. The root of the tree is labelled 1.
Given n ranges [a_1, b_1], [a_2, b_2], ..., [a_n, b_n] with a_i <= b_i for all i, each range [a_i, b_i] denotes the set of all integers not less than a_i and not greater than b_i. For example, [5,9] would represent the set {5,6,7,8,9}.
Let S be the union of the [a_i, b_i] over all i up to n.
For a given integer T, I need to find the number of unique pairs (irrespective of order) of elements x, y in S such that lca(x, y) = T.
(Wikipedia has a pretty good explanation of what the LCA of two nodes is.)
For example, for input:
A = {2, 12, 11}
B = {3, 13, 12}
T = 1
The output should be 6. (The ranges are [2,3], [12,13], and [11,12], and their union is the set {2,3,11,12,13}. Of all 10 possible pairs, exactly 6 of them ((2,3), (2,12), (2,13), (3,11), (11,12), and (11,13)) have an LCA of 1.)
And for input:
A = {1,7}
B = {2,15}
T = 3
The output should be 6. (The given ranges are [1,2] and [7,15], and their union is the set {1,2,7,8,9,10,11,12,13,14,15}. Of the 55 possible pairs, exactly 6 of them ((7,12), (7,13), (12,14), (12,15), (13,14) and (13,15)) have an LCA of 3.)
Well, it is fairly simple to compute the LCA of two nodes in your notation, using this recursive method:
int lca(int a, int b) {
    if (a == b) return a;
    if (a < b) return lca(b, a);
    return lca(a / 2, b);
}
Now to find the union of the sets, we first need to be able to find what set a particular range represents. Let's introduce a factory method for this:
Set<Integer> rangeSet(int a, int b) {
    Set<Integer> result = new HashSet<Integer>(b - a + 1);
    for (int n = a; n <= b; n++) result.add(n);
    return result;
}
This will return a Set<Integer> containing all the integers contained in the range.
To find the union of these sets, just addAll their elements to one set:
Set<Integer> unionSet(Set<Integer>... sets) {
    Set<Integer> result = new HashSet<Integer>();
    for (Set<Integer> s : sets)
        result.addAll(s);
    return result;
}
Now, we need to iterate over all possible pairs in the set:
int pairLcaCount(int t, Set<Integer> nodes) {
    int result = 0;
    for (int x : nodes)
        for (int y : nodes)
            if (x > y && lca(x, y) == t) result++;
    return result;
}
Everything else is just glue logic, methods to convert from your input requirements to the ones taken here. For instance, something like:
@SuppressWarnings("unchecked")
Set<Integer> unionSetFromBoundsLists(int[] a, int[] b) {
    // note: "new Set<Integer>[n]" is rejected by javac (generic array creation),
    // so create an array of the wildcard type and cast
    Set<Integer>[] ranges = (Set<Integer>[]) new Set<?>[a.length];
    for (int idx = 0; idx < ranges.length; idx++)
        ranges[idx] = rangeSet(a[idx], b[idx]);
    return unionSet(ranges);
}
Let's say I have a graph whose nodes are stored in a sorted list. I now want to topologically sort this graph while keeping the original order where the topological order is undefined.
Are there any good algorithms for this?
One possibility is to compute the lexicographically least topological order. The algorithm is to maintain a priority queue containing the nodes whose effective in-degree (over nodes not yet processed) is zero. Repeatedly dequeue the node with the least label, append it to the order, decrement the effective in-degrees of its successors, enqueue the ones that now have in-degree zero. This produces 1234567890 on btilly's example but does not in general minimize inversions.
The properties I like about this algorithm are that the output has a clean definition obviously satisfied by only one order and that, whenever there's an inversion (node x appears after node y even though x < y), x's largest dependency is larger than y's largest dependency, which is an "excuse" of sorts for inverting x and y. A corollary is that, in the absence of constraints, the lex least order is sorted order.
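A compact sketch of that algorithm in Python, with heapq as the priority queue (nodes labeled 0..N-1, edges[i] listing the successors of i; runtime O((V+E) log V)):

import heapq

def lex_least_topo(N, edges):
    indeg = [0] * N
    for i in range(N):
        for j in edges[i]:
            indeg[j] += 1
    heap = [i for i in range(N) if indeg[i] == 0]
    heapq.heapify(heap)
    order = []
    while heap:
        i = heapq.heappop(heap)          # always the least-labeled available node
        order.append(i)
        for j in edges[i]:
            indeg[j] -= 1
            if indeg[j] == 0:
                heapq.heappush(heap, j)
    return order                         # shorter than N means the graph has a cycle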
The problem is two-fold:
Topological sort
Stable sort
After many trials and errors I came up with a simple algorithm that resembles bubble sort but with topological order criteria.
I thoroughly tested the algorithm on full graphs with complete edge combinations, so it can be considered proven.
Cyclic dependencies are tolerated and resolved according to the original order of elements in the sequence. The resulting order is perfect and represents the closest possible match.
Here is the source code in C#:
static class TopologicalSort
{
    /// <summary>
    /// Delegate definition for dependency function.
    /// </summary>
    /// <typeparam name="T">The type.</typeparam>
    /// <param name="a">The A.</param>
    /// <param name="b">The B.</param>
    /// <returns>
    /// Returns <c>true</c> when A depends on B. Otherwise, <c>false</c>.
    /// </returns>
    public delegate bool TopologicalDependencyFunction<in T>(T a, T b);

    /// <summary>
    /// Sorts the elements of a sequence in dependency order according to comparison function with Gapotchenko algorithm.
    /// The sort is stable. Cyclic dependencies are tolerated and resolved according to original order of elements in sequence.
    /// </summary>
    /// <typeparam name="T">The type of the elements of source.</typeparam>
    /// <param name="source">A sequence of values to order.</param>
    /// <param name="dependencyFunction">The dependency function.</param>
    /// <param name="equalityComparer">The equality comparer.</param>
    /// <returns>The ordered sequence.</returns>
    public static IEnumerable<T> StableOrder<T>(
        IEnumerable<T> source,
        TopologicalDependencyFunction<T> dependencyFunction,
        IEqualityComparer<T> equalityComparer)
    {
        if (source == null)
            throw new ArgumentNullException("source");
        if (dependencyFunction == null)
            throw new ArgumentNullException("dependencyFunction");
        if (equalityComparer == null)
            throw new ArgumentNullException("equalityComparer");

        var graph = DependencyGraph<T>.TryCreate(source, dependencyFunction, equalityComparer);
        if (graph == null)
            return source;

        var list = source.ToList();
        int n = list.Count;

    Restart:
        for (int i = 0; i < n; ++i)
        {
            for (int j = 0; j < i; ++j)
            {
                if (graph.DoesXHaveDirectDependencyOnY(list[j], list[i]))
                {
                    bool jOnI = graph.DoesXHaveTransientDependencyOnY(list[j], list[i]);
                    bool iOnJ = graph.DoesXHaveTransientDependencyOnY(list[i], list[j]);

                    bool circularDependency = jOnI && iOnJ;
                    if (!circularDependency)
                    {
                        var t = list[i];
                        list.RemoveAt(i);
                        list.Insert(j, t);
                        goto Restart;
                    }
                }
            }
        }

        return list;
    }

    /// <summary>
    /// Sorts the elements of a sequence in dependency order according to comparison function with Gapotchenko algorithm.
    /// The sort is stable. Cyclic dependencies are tolerated and resolved according to original order of elements in sequence.
    /// </summary>
    /// <typeparam name="T">The type of the elements of source.</typeparam>
    /// <param name="source">A sequence of values to order.</param>
    /// <param name="dependencyFunction">The dependency function.</param>
    /// <returns>The ordered sequence.</returns>
    public static IEnumerable<T> StableOrder<T>(
        IEnumerable<T> source,
        TopologicalDependencyFunction<T> dependencyFunction)
    {
        return StableOrder(source, dependencyFunction, EqualityComparer<T>.Default);
    }

    sealed class DependencyGraph<T>
    {
        private DependencyGraph()
        {
        }

        public IEqualityComparer<T> EqualityComparer
        {
            get;
            private set;
        }

        public sealed class Node
        {
            public int Position
            {
                get;
                set;
            }

            List<T> _Children = new List<T>();

            public IList<T> Children
            {
                get
                {
                    return _Children;
                }
            }
        }

        public IDictionary<T, Node> Nodes
        {
            get;
            private set;
        }

        public static DependencyGraph<T> TryCreate(
            IEnumerable<T> source,
            TopologicalDependencyFunction<T> dependencyFunction,
            IEqualityComparer<T> equalityComparer)
        {
            var list = source as IList<T>;
            if (list == null)
                list = source.ToArray();

            int n = list.Count;
            if (n < 2)
                return null;

            var graph = new DependencyGraph<T>();
            graph.EqualityComparer = equalityComparer;
            graph.Nodes = new Dictionary<T, Node>(n, equalityComparer);

            bool hasDependencies = false;

            for (int position = 0; position < n; ++position)
            {
                var element = list[position];

                Node node;
                if (!graph.Nodes.TryGetValue(element, out node))
                {
                    node = new Node();
                    node.Position = position;
                    graph.Nodes.Add(element, node);
                }

                foreach (var anotherElement in list)
                {
                    if (equalityComparer.Equals(element, anotherElement))
                        continue;

                    if (dependencyFunction(element, anotherElement))
                    {
                        node.Children.Add(anotherElement);
                        hasDependencies = true;
                    }
                }
            }

            if (!hasDependencies)
                return null;

            return graph;
        }

        public bool DoesXHaveDirectDependencyOnY(T x, T y)
        {
            Node node;
            if (Nodes.TryGetValue(x, out node))
            {
                if (node.Children.Contains(y, EqualityComparer))
                    return true;
            }
            return false;
        }

        sealed class DependencyTraverser
        {
            public DependencyTraverser(DependencyGraph<T> graph)
            {
                _Graph = graph;
                _VisitedNodes = new HashSet<T>(graph.EqualityComparer);
            }

            DependencyGraph<T> _Graph;
            HashSet<T> _VisitedNodes;

            public bool DoesXHaveTransientDependencyOnY(T x, T y)
            {
                if (!_VisitedNodes.Add(x))
                    return false;

                Node node;
                if (_Graph.Nodes.TryGetValue(x, out node))
                {
                    if (node.Children.Contains(y, _Graph.EqualityComparer))
                        return true;

                    foreach (var i in node.Children)
                    {
                        if (DoesXHaveTransientDependencyOnY(i, y))
                            return true;
                    }
                }
                return false;
            }
        }

        public bool DoesXHaveTransientDependencyOnY(T x, T y)
        {
            var traverser = new DependencyTraverser(this);
            return traverser.DoesXHaveTransientDependencyOnY(x, y);
        }
    }
}
And a small sample application:
class Program
{
    static bool DependencyFunction(char a, char b)
    {
        switch (a + " depends on " + b)
        {
            case "A depends on B":
                return true;
            case "B depends on D":
                return true;
            default:
                return false;
        }
    }

    static void Main(string[] args)
    {
        var source = "ABCDEF";
        var result = TopologicalSort.StableOrder(source.ToCharArray(), DependencyFunction);
        Console.WriteLine(string.Concat(result));
    }
}
Given the input elements {A, B, C, D, E, F} where A depends on B and B depends on D the output is {D, B, A, C, E, F}.
UPDATE:
I wrote a small article about the stable topological sort: its objective, the algorithm and its proof. I hope it gives more explanation and is useful to developers and researchers.
You have insufficient criteria to specify what you're looking for. For instance consider a graph with two directed components.
1 -> 2 -> 3 -> 4 -> 5
6 -> 7 -> 8 -> 9 -> 0
Which of the following sorts would you prefer?
6, 7, 8, 9, 0, 1, 2, 3, 4, 5
1, 2, 3, 4, 5, 6, 7, 8, 9, 0
The first results from breaking all ties by putting the lowest node as close to the head of the list as possible. Thus 0 wins. The second results from trying to minimize the number of times that A < B and B appears before A in the topological sort. Both are reasonable answers. The second is probably more pleasing.
I can easily produce an algorithm for the first. To start, take the lowest node, and do a breadth-first search to find the shortest distance to a root node. Should there be a tie, identify the set of nodes that could appear on such a shortest path. Take the lowest node in that set, place the best possible path from it to a root, and then place the best possible path from the lowest node we started with to it. Search for the next lowest node that is not already in the topological sort, and continue.
Producing an algorithm for the more pleasing version seems much harder. See http://en.wikipedia.org/wiki/Feedback_arc_set for a related problem that strongly suggests that it is, in fact, NP-complete.
Here's an easy iterative approach to topological sorting: continually remove a node with in-degree 0, along with its edges.
To achieve a stable version, just modify to: continually remove the smallest-index node with in-degree 0, along with its edges.
In pseudo-python:
# N is the number of nodes, labeled 0..N-1
# edges[i] is a list of nodes j, corresponding to edges (i, j)
inDegree = [0] * N
for i in range(N):
    for j in edges[i]:
        inDegree[j] += 1

# Now we maintain a "frontier" of in-degree 0 nodes.
# We take the smallest one until the frontier is exhausted.
# Note: You could use a priority queue / heap instead of a list,
# giving O(NlogN) runtime. This naive implementation is
# O(N^2) worst-case (when the order is very ambiguous).
frontier = []
for i in range(N):
    if inDegree[i] == 0:
        frontier.append(i)

order = []
while frontier:
    i = min(frontier)
    frontier.remove(i)
    order.append(i)  # emit the chosen node
    for j in edges[i]:
        inDegree[j] -= 1
        if inDegree[j] == 0:
            frontier.append(j)

# Done - order is now a list of the nodes in topological order,
# with ties broken by original order in the list.
The depth-first search algorithm on Wikipedia worked for me:
const assert = chai.assert;

const stableTopologicalSort = ({
  edges,
  nodes
}) => {
  // https://en.wikipedia.org/wiki/Topological_sorting#Depth-first_search
  const result = [];
  const marks = new Map();

  const visit = node => {
    if (marks.get(node) !== `permanent`) {
      assert.notEqual(marks.get(node), `temporary`, `not a DAG`);
      marks.set(node, `temporary`);
      edges.filter(([, to]) => to === node).forEach(([from]) => visit(from));
      marks.set(node, `permanent`);
      result.push(node);
    }
  };

  nodes.forEach(visit);
  return result;
};

const graph = {
  edges: [
    [5, 11],
    [7, 11],
    [3, 8],
    [11, 2],
    [11, 9],
    [11, 10],
    [8, 9],
    [3, 10]
  ],
  nodes: [2, 3, 5, 7, 8, 9, 10, 11]
};

assert.deepEqual(stableTopologicalSort(graph), [5, 7, 11, 2, 3, 8, 9, 10]);
<script src="https://cdnjs.cloudflare.com/ajax/libs/chai/4.2.0/chai.min.js"></script>
Interpreting "stable topological sort" as a linearization of a DAG such that ranges in the linearization where the topological order doesn't matter, are sorted lexicographically. This can be solved with the DFS method of linearization, with the modification that nodes are visited in lexicographical order.
I have a Python Digraph class with a linearization method which looks like this:
def linearize_as_needed(self):
    if self.islinearized:
        return

    # Algorithm: DFS Topological sort
    # https://en.wikipedia.org/wiki/Topological_sorting#Depth-first_search
    temporary = set()
    permanent = set()
    L = []

    def visit(vertices):
        for vertex in sorted(vertices, reverse=True):
            if vertex in permanent:
                pass
            elif vertex in temporary:
                raise NotADAG
            else:
                temporary.add(vertex)
                if vertex in self.arrows:
                    visit(self.arrows[vertex])
                L.append(vertex)
                temporary.remove(vertex)
                permanent.add(vertex)
        # print('visit: {} => {}'.format(vertices, L))

    visit(self.vertices)

    self._linear = list(reversed(L))
    self._iter = iter(self._linear)
    self.islinearized = True
Here self.vertices is the set of all vertices, and self.arrows holds the adjacency relation as a dict of left nodes to sets of right nodes.