Time for 2 nodes to collide - algorithm

We are given a graph of N nodes (1..N), where each node has exactly one directed edge to some node (which may be the node itself).
We need to answer queries of the form (A, B): how much time is required for two objects to collide if one starts at A and the other at B? Both move 1 hop in 1 second. If they can never collide, the time is -1.
Time: from X -> to Y: 1 hop = 1 second.
Constraints :
N, Q <= 10^5 (number of nodes, number of queries).
Example: for the given graph

A -> B -> C -> D -> E
               ^    |
               K <- F

Query(A, E): 3 seconds, since at time t = 3 they will both be on node D.
Query(C, D): -1, since they will never collide.
What's the optimal way to answer each query?
Brute Force Approach: time - O(Q * N)
Improved solution using binary lifting technique: time - O(Q * log(N))
private static int[] collisionTime(int N, int Q, int[] A, int[][] queries) {
    // ancestor matrix : creation time - O(n * log(n))
    int M = (int) (Math.ceil(Math.log10(N) / Math.log10(2))) + 1;
    int[][] ancestor = new int[N + 1][M];
    for (int i = 1; i <= N; i++) {
        ancestor[i][0] = A[i]; // 2^0-th ancestor.
    }
    for (int j = 1; j < M; j++) {
        for (int i = 1; i <= N; i++) {
            ancestor[i][j] = ancestor[ancestor[i][j-1]][j-1];
        }
    }
    int[] answer = new int[Q];
    for (int i = 0; i < Q; i++) {
        int u = queries[i][0];
        int v = queries[i][1];
        answer[i] = timeToCollide(u, v, ancestor);
    }
    return answer;
}

// using binary lifting: time - O(log(n))
private static int timeToCollide(int u, int v, int[][] ancestor) {
    int m = ancestor[0].length;
    // edge cases
    if (u == v) // already in collision state
        return 0;
    if (ancestor[u][m-1] != ancestor[v][m-1]) // their topmost ancestors differ, so they can never collide.
        return -1;
    int t = 0;
    for (int j = m - 1; j >= 0; j--) {
        if (ancestor[u][j] != ancestor[v][j]) {
            u = ancestor[u][j];
            v = ancestor[v][j];
            t += (1 << j);
        }
    }
    return t + 1;
}
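For a quick sanity check, here is a small driver on the example graph above; the 1..7 numbering of A..K is my own assumption, not from the original problem:

    public static void main(String[] args) {
        // A=1, B=2, C=3, D=4, E=5, F=6, K=7; node i points to A[i] (index 0 unused)
        int[] A = {0, 2, 3, 4, 5, 6, 7, 4};
        int[][] queries = {{1, 5}, {3, 4}};
        int[] answers = collisionTime(7, 2, A, queries);
        System.out.println(java.util.Arrays.toString(answers)); // expected: [3, -1]
    }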

Find all the terminal cycles and designate an arbitrary vertex in each terminal cycle as the cycle root (O(N))
For each vertex, record the length of its terminal cycle, its distance to entry into the terminal cycle, and the distance to the terminal cycle root (O(N)).
Sever the outgoing link from each cycle root. This turns the graph into a forest.
For each query, find the closest (lowest) common ancestor of the two query nodes in this forest.
From the information saved about each query node and the lowest common ancestor, you can figure out the time to collision in constant time.
Step (4), the lowest common ancestor query, is a very well-studied problem. The best algorithms require only linear preprocessing time and provide constant query time, leading to O(N + Q) time for this problem altogether.
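Below is a minimal Java sketch of steps (1)-(3), assuming 0-based nodes and next[v] as the single out-edge; the array names and the unwinding scheme are my own, and the distance-to-cycle-root bookkeeping plus the LCA machinery of steps (4)-(5) are omitted:

    static void annotate(int[] next, int[] cycleLen, int[] distToCycle, int[] cycleId) {
        int n = next.length;
        int[] state = new int[n];  // 0 = unvisited, 1 = on current path, 2 = done
        int[] order = new int[n];  // position of a vertex on the current path
        int cycles = 0;
        for (int s = 0; s < n; s++) {
            if (state[s] != 0) continue;
            java.util.ArrayList<Integer> path = new java.util.ArrayList<>();
            int v = s;
            while (state[v] == 0) {               // walk forward, remembering the path
                state[v] = 1;
                order[v] = path.size();
                path.add(v);
                v = next[v];
            }
            int tail;                             // path vertices that are not in a new cycle
            if (state[v] == 1) {                  // closed a new cycle; designate v its root
                int len = path.size() - order[v];
                int id = cycles++;
                for (int i = order[v]; i < path.size(); i++) {
                    int u = path.get(i);
                    cycleLen[u] = len; distToCycle[u] = 0; cycleId[u] = id;
                }
                tail = order[v];
            } else {
                tail = path.size();               // ran into an already-annotated vertex
            }
            for (int i = tail - 1; i >= 0; i--) { // unwind, inheriting from the successor
                int u = path.get(i);
                cycleLen[u] = cycleLen[next[u]];
                distToCycle[u] = distToCycle[next[u]] + 1;
                cycleId[u] = cycleId[next[u]];
            }
            for (int u : path) state[u] = 2;
        }
    }

The caller allocates the three output arrays of length N; every vertex then knows its terminal cycle's length, its distance to the cycle, and which subgraph it belongs to, all in O(N) total.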

I believe that the following approach technically achieves O(N+Q) time complexity.
Observations
Subgraphs: The graph is not necessarily contiguous. Every graph consists of one or more disjoint, contiguous, complete subgraphs, meaning:
No nodes are shared between subgraphs ("disjoint")
All of the nodes in the subgraph are connected ("contiguous")
There are no paths connecting different subgraphs ("complete")
I will hereafter refer to these as the subgraphs of the graph or just "subgraphs". These subgraphs have the following additional properties, which are a consequence of their definition (above) and the types of nodes in the graph (they are all "parent-pointer nodes" with exactly one out-edge/pointer):
All such subgraphs must have exactly one cycle in them (because a cycle is the only way that they can terminate or be closed)
The cycle can be of any length cycle.Len >= 1
Additionally, there may be any number (t >= 0) of trees attached to the cycle at their root (base)
All nodes are either in the cycle or in one of these trees (the roots of the trees are in the cycle, but also counted as part of a tree)
Terms:
cycle Length: The number of nodes in a cycle.
cycle Base: An arbitrarily chosen node in the cycle, used to measure distances between two nodes in the same cycle, and also between any two nodes in the same subgraph.
tree Base: The base or root node of one of the trees attached to the cycle. As the tree base is also the node that attaches it to the cycle, tree base nodes are counted as being in the cycle (and also part of their tree).
Distance: For a node in the cycle, this is the distance (number of hops) from that node to the cycle Base (zero if it is the cycle Base). For a node in a tree (not counting tree Base nodes, which count as in the cycle), this is the distance from that node to the tree Base node.
Collisions Cases
Trivial
There are many ways or "forms" of collisions possible in a graph, but we can identify two trivial cases upfront:
(A, B) Relation     | Collide? | Collision Distance
--------------------+----------+-------------------
same node           | Yes      | 0
different subgraphs | No       | -1
Obviously, if A and B are the same node, then they trivially collide at distance zero. Likewise, if they are in two different subgraphs, then they can never collide because there are no connections between the subgraphs. For the collision cases that follow I will be assuming that these two cases have already been filtered out so that:
A and B are assumed to be different nodes, and
A and B are assumed to be in the same subgraph
Non-Trivial
The following table covers all of the other, non-trivial, cases of the relation between two nodes.
(A, B) Relation                              | Collide?                                                           | Collision Distance                                | Notes
---------------------------------------------+--------------------------------------------------------------------+---------------------------------------------------+------------------------------------------------------------------
same cycle                                   | No                                                                 | -1                                                | nodes in a cycle always stay the same distance apart
A in a tree & B in the cycle (or vice-versa) | only if they both arrive at A's treeBase at the same time          | -1 OR A.treeDist                                  | collide iff B.cycleDist = (A.treeDist MOD cycle.Len)
A and B in different trees                   | only if their distances to the cycle Base are equal MOD cycle.Len  | MAX(A.treeDist, B.treeDist)                       | they meet when the farther one gets to the cycle (its tree base)
A & B in the same tree, different treeDists  | only if their treeDists are equal MOD cycle.Len                    | MAX(A.treeDist, B.treeDist)                       | they meet when the farther one gets to the cycle (tree root/base)
A & B in the same tree, same treeDist        | Yes                                                                | at their lowest common ancestor (LCA) in the tree | have to search up the tree
One important principle applied several times above is that two different nodes in a cycle can never collide. As each node follows its path around the cycle, the two always stay the same distance apart; there is no way for one node's path to "catch up" to the other's in the cycle. They can only "collide" if they start out in the cycle at the same node.
The consequences of this are that:
Two different nodes in the cycle can never collide.
A node in a tree can only collide with a node in the cycle if their total distances to the cycle base are the same modulo the cycle length (that is, the remainder when divided by the cycle length).
This is also true for two nodes in different trees, and for two nodes in the same tree with different distances to their tree base.
In all of these cases (from #2 and #3), they will collide when the node that is farther from its tree Base gets to the cycle (which is also its tree base). This is because nodes in the cycle cannot "catch up" to each other, so if they are ever going to coincide, they must already coincide as soon as both are in the cycle. Thus they always collide when the farther one finally gets to the cycle.
Another important consequence is that every case in both tables above, except for the last one, can be answered in O(1) time, simply by annotating the nodes with some easily determined information:
their Base node (tree or cycle)
their Distance to that base node
the Subgraph they belong to
the Length of their subgraph's Cycle
These can all be easily determined when initializing the graph in O(1) time per node (or O(N) total time).
Approach
Nodes
After the nodes are initially loaded into the graph (an MPDGraph object), I annotate the nodes with the additional information listed above. This process (called "Mapping" in the code) proceeds as follows:
Pick any node
Follow its path until it "terminates" by reaching a node already in its path, or a node that was previously mapped
If #2 crossed its own path, then we have found a new cycle. Designate the node we crossed as the base node of the cycle, and fill in the mapping properties (cycle, base node, distance, etc.). Unwind our path one node at a time, filling in each node and marking it as InCycle as we go back up the path until we reach the base node again. Now we are at the base of the tree that our path followed into the cycle, so when we move back to the previous node in the path we switch to marking it as a tree node, until we return to the first node in our path.
If instead, #2 reached an already-mapped node, then we attach our path to that node and copy its tree/cycle, base, etc. information to our current node. Then we return back up our path as in #3, setting the mapping properties of each node as we go.
If there are any unmapped nodes, pick one and go to #2.
This all takes O(N) time.
Queries
We have a method (called MPDGraph.FindCollision) that, given two nodes, applies the rules in the two Collision Cases tables above and returns the result. For every case except the last (nodes in the same tree at the same distance) the distance can be determined in O(1) time by using the mapping properties.
If the two nodes are in the same tree and are also the same distance from their tree base, then they could meet anywhere between them and their common treeBase node. For this case the FindCollision(A,B) method calls the findTreeDistance(A,B) which:
Returns zero if they are the same node.
Otherwise it checks a cache of previously calculated distances to see if the answer has already been calculated for these two nodes. If so, it returns that value.
Otherwise, it calls findTreeDistance passing in the parents of the current two nodes to get their distance, and adds one to that. Then it adds this to the cache and returns the value.
Without this memoization (i.e., the cache), each query of this type would take approximately O(log N) on average. With the memoization it is hard to calculate exactly, but I would guess no worse than O(log log N), and for Q counts much larger than N this converges to O(1).
This makes the query processing time complexity somewhere between O(Q log log N) and O(Q), and the total time between O(N + Q(log log N)) and O(N + Q).
Code
public static int[] collisionTime(int N, int Q, int[] A, int[,] queries)
{
    // create the graph and fill-in the mapping attributes for all nodes
    var graph = new MPDGraph(A);
    graph.MapAllNodes();

    int[] answers = new int[queries.GetLength(0)];
    for (int i = 0; i < answers.Length; i++)
    {
        answers[i] = graph.FindCollision(queries[i, 0], queries[i, 1]);
    }
    return answers;
}
This utilizes the following classes:
MPDGraph Class:
// MPDG: Mono-Pointing, Directed Graph
// An MPDG is a directed graph where every node has exactly one out-edge.
// (MPDG is my own term, I don't know the real name for these)
class MPDGraph
{
    public Node[] Nodes;
    Dictionary<(Node, Node), int> cachedDistances = new Dictionary<(Node, Node), int>();

    // constructor
    public MPDGraph(int[] Pointers)
    {
        Nodes = new Node[Pointers.Length];
        // first fill the array with objects
        for (int i = 0; i < Nodes.Length; i++) { Nodes[i] = new Node(i); }
        // apply their pointers
        for (int i = 0; i < Nodes.Length; i++) { Nodes[i].toNode = Nodes[Pointers[i]]; }
    }

    // map all of the nodes by filling their mapping properties
    public void MapAllNodes()
    {
        for (int i = 0; i < Nodes.Length; i++)
        {
            if (!Nodes[i].isMapped)
                MapPath(Nodes[i], 1);
        }
    }

    // recursively map a path of unmapped nodes, starting from curr
    // returns true if curr is in a cycle, false otherwise
    public Boolean MapPath(Node curr, int pathNo)
    {
        Boolean inCycle = false;
        curr.pathNo = pathNo;
        Node next = curr.toNode;
        if (next.IsInPath)
        {
            // we have found a new cycle
            Cycle Cycle = new Cycle(this, next, curr.pathNo - next.pathNo + 1);
            curr.Map(Cycle);
            return true;
        }
        else if (next.isMapped)
        {
            // we are joining an already partially mapped tree
            if (next.IsInCycle)
            {
                // next is a tree-Base, the top of our tree and also in the cycle
                curr.Map(next.Cycle, false, next, 1);
            }
            else
            {
                // next is a normal tree-node
                curr.Map(next.Cycle, false, next.BaseNode, next.Distance + 1);
            }
            return false;
        }
        else
        {
            // continue forward on the path, recurse to the next node
            inCycle = MapPath(next, pathNo + 1);
            if (inCycle)
            {
                if (next.Cycle.Base == next || next.Distance == 0)
                {
                    // we have returned from the cycleBase, which is also a treeBase
                    // so, switch from Cycle to Tree
                    curr.Map(next.Cycle, false, next, 1);
                    return false;
                }
                else
                {
                    // still in the cycle
                    curr.Map(next.Cycle);
                }
            }
            else
            {
                // returned in tree
                curr.Map(next.Cycle, false, next.BaseNode, next.Distance + 1);
            }
            return inCycle;
        }
    }

    // Given two starting nodes, determine how many steps it takes until their
    // paths collide. Returns -1 if they will never collide.
    public int FindCollision(int index1, int index2)
    {
        Node node1 = Nodes[index1];
        Node node2 = Nodes[index2];

        // eliminate trivial cases
        if (node1.Cycle != node2.Cycle)
            return -1;      // can't collide if they're in different subgraphs
        else if (node1 == node2)
            return 0;       // if they're the same node, then distance = 0
        else if (node1.IsInCycle && node2.IsInCycle)
            return -1;      // different nodes in a cycle never collide
        else
        {
            // they're both in the same subgraph, use math to tell if they collide
            // get their distances to the cycle base
            int dist1 = node1.Distance + (node1.IsInCycle ? 0 : node1.BaseNode.Distance);
            int dist2 = node2.Distance + (node2.IsInCycle ? 0 : node2.BaseNode.Distance);
            int cycleLen = node1.Cycle.Length;

            // use math: modulo(cycle length)
            if ((dist1 % cycleLen) != (dist2 % cycleLen))
            {
                return -1;  // incompatible distances: cannot possibly collide
            }
            else
            {
                // they must collide somewhere, figure out how far that is
                if (node1.IsInCycle || node2.IsInCycle)
                {
                    // if one is in the cycle, they will collide when
                    // the other one reaches the cycle (its treeBase)
                    return (!node1.IsInCycle ? node1.Distance : node2.Distance);
                }
                else if (node1.BaseNode != node2.BaseNode)
                {
                    // They are in different trees: they will collide at
                    // the treeBase of the node that is farther
                    return Math.Max(node1.Distance, node2.Distance);
                }
                else
                {
                    // They are in the same tree:
                    if (node1.Distance != node2.Distance)
                    {
                        // if they are in the same tree, but have different distances
                        // to the treeBase, then they will collide at the treeBase
                        // when the farther one arrives at the treeBase
                        return Math.Max(node1.Distance, node2.Distance);
                    }
                    else
                    {
                        // the hard case, have to walk down their paths
                        // to find their LCA (Lowest Common Ancestor)
                        return findTreeDistance(node1, node2);
                    }
                }
            }
        }
    }

    int findTreeDistance(Node node1, Node node2)
    {
        if (node1 == node2) return 0;

        // normalize their order
        if (node1.index > node2.index)
        {
            var tmp = node1;
            node1 = node2;
            node2 = tmp;
        }

        // check the cache
        int dist;
        if (cachedDistances.ContainsKey((node1, node2)))
        {
            dist = cachedDistances[(node1, node2)];
        }
        else
        {
            // keep recursing to find where they meet
            dist = findTreeDistance(node1.toNode, node2.toNode) + 1;
            // cache this new distance
            cachedDistances.Add((node1, node2), dist);
        }
        return dist;
    }
}
Node Class:
// Represents a node in the MPDG (Mono-Pointing Directed Graph) with the constraint
// that all nodes have exactly one out-edge ("toNode").
class Node
{
    // Primary properties (from input)
    public int index { get; set; }    // this node's unique index (into the original array)
    public Node toNode { get; set; }  // what our single out-edge is pointing to

    public Node(int Index_) { this.index = Index_; }

    // Mapping properties
    // (these must be filled-in to finish mapping the node)

    // The unique cycle of this node's subgraph (all MPDG-subgraphs have exactly one)
    public Cycle Cycle;

    // Every node is either in its subgraph's cycle or in one of the inverted
    // trees whose apex (base) is attached to the cycle. Only valid when BaseNode is set.
    // (tree base nodes are counted as being in the cycle)
    public Boolean IsInCycle = false;

    // The base node of the tree or cycle that this node is in.
    // If (IsInCycle) then it points to this cycle's base node (cycleBase),
    // else it points to the base node of this node's tree (treeBase)
    public Node BaseNode;

    // The distance (hops) from this node to the BaseNode
    public int Distance = -1; // -1 if not yet mapped

    // Total distance from this node to the cycleBase
    public int TotalDistance { get { return Distance + (IsInCycle ? 0 : BaseNode.Distance); } }

    // housekeeping for mapping nodes
    public int pathNo = -1; // order in our working path

    // Derived (implicit) properties
    public Boolean isMapped { get { return Cycle != null; } }
    public Boolean IsInPath { get { return (pathNo >= 0); } }

    public void Map(Cycle Cycle, bool InCycle = true, Node BaseNode = null, int distance_ = -1)
    {
        this.Cycle = Cycle;
        IsInCycle = InCycle;
        if (InCycle)
        {
            this.BaseNode = Cycle.Base;
            Distance = (Cycle.Length + (Cycle.Base.pathNo - pathNo)) % Cycle.Length;
        }
        else
        {
            this.BaseNode = BaseNode;
            Distance = distance_;
        }
        pathNo = -1; // clean-up the path once we're done
    }
}
Cycle Class:
// represents the cycle of a unique MPDG-subgraph
// (should have one of these for each subgraph)
class Cycle
{
    public MPDGraph Graph; // the MPDG that contains this cycle
    public Node Base;      // the base node of a subgraph's cycle
    public int Length;     // the length of this cycle

    public Cycle(MPDGraph graph_, Node base_, int length_)
    {
        Graph = graph_;
        Base = base_;
        Length = length_;
    }
}
Performance Measurements:

Node Count | Build & Map Graph (mean µs) | Question Count | All Questions (mean µs) | Per Question (mean µs) | Total (mean µs)
-----------+-----------------------------+----------------+-------------------------+------------------------+----------------
        50 |                         0.9 |           1225 |                      26 |                 0.0212 |            26.9
       500 |                        10.1 |         124750 |                    2267 |                 0.0182 |          2277.1
      1000 |                        23.4 |         499500 |                    8720 |                 0.0175 |          8743.4
      5000 |                       159.6 |       12497500 |                  229000 |                 0.0183 |        229159.6
     10000 |                       345.3 |       49995000 |                  793212 |                 0.0159 |        793557.3

It is only possible for a collision to occur on a node that has more than 1 link leading to it. Node D in your example.
Let's call these nodes "crash sites"
So you can prune your graph down to just the crash site nodes. The nodes that lead to the crash site nodes become attributes of the crash site nodes.
Like this for your example:
D : { A,B,C }, { E,F,K }
A collision can ONLY occur if the starting nodes are on two different attribute lists of the same crash site node.
Once you are sure a crash can occur, then you can check that both starting nodes are the same distance from the crash site.
Algorithm:

Prune graph to crash site nodes
LOOP over questions
    Get 2 start nodes
    LOOP over crash sites
        IF start nodes on two different attributes of crash site
            IF start nodes are equi-distant from crash site
                report crash time
                BREAK from crash site loop
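For the pruning step, finding the crash sites is just an in-degree count; here is a small sketch (my own, in Java rather than the C++ of the linked repo, with 0-based node indices assumed):

    static java.util.List<Integer> crashSites(int[] next) {
        int[] indegree = new int[next.length];
        for (int v : next) indegree[v]++;          // count links leading into each node
        java.util.List<Integer> sites = new java.util.ArrayList<>();
        for (int v = 0; v < next.length; v++)
            if (indegree[v] > 1) sites.add(v);     // only these can be first meeting points
        return sites;
    }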
Here is a randomly generated graph with 50 nodes, where every node has one out-edge connected to another node chosen at random.
The collision sites are
5 7 8 9 10 11 18 19 23 25 31 33 36 37 39
So the algorithm need only loop over 15 nodes, at most, instead of 50.
The answer to the question 'do two particular nodes collide?' is almost always 'NO'. It is kind of boring that way. So let's ask a slightly different question: 'for a particular graph, which pairs of nodes result in a collision?' This requires the same algorithm (applied to every pair of nodes) but always produces an interesting answer.
For this graph I get the following answer:
0 and 29 collide at 41
1 and 5 collide at 40
2 and 23 collide at 13
8 and 16 collide at 34
8 and 22 collide at 34
8 and 39 collide at 34
9 and 30 collide at 37
10 and 31 collide at 25
11 and 47 collide at 23
12 and 28 collide at 25
12 and 35 collide at 25
12 and 49 collide at 25
13 and 38 collide at 27
14 and 44 collide at 1
15 and 17 collide at 0
15 and 18 collide at 0
15 and 37 collide at 0
16 and 22 collide at 34
16 and 39 collide at 34
17 and 18 collide at 0
17 and 37 collide at 0
18 and 37 collide at 0
20 and 26 collide at 9
20 and 42 collide at 9
20 and 43 collide at 9
21 and 45 collide at 24
22 and 39 collide at 34
25 and 34 collide at 3
26 and 42 collide at 9
26 and 43 collide at 9
28 and 35 collide at 25
28 and 49 collide at 25
32 and 48 collide at 34
33 and 36 collide at 7
35 and 49 collide at 25
42 and 43 collide at 9
Some timing results:

Node Count | Crash Sites (ms) | Question Count | Question mean (µs)
-----------+------------------+----------------+-------------------
        50 |              0.4 |           1225 |               0.02
       500 |               50 |         124750 |               0.02
      5000 |             5500 |           ~12M |               0.02
     10000 |            30000 |           ~50M |               0.03
     30000 |           181000 |          ~450M |                0.6
Notes:
The mean time for a question is the average of checking every possible pair of nodes for a possible collision.
Answering a single question is extremely fast, about 20 nanoseconds for moderately sized graphs ( < 10,000 nodes ) [ A previous timing report included outputting the results when a collision was found, which takes much longer than detecting the collision. These results were taken with all output to the console commented out. ]
Setting up the crash sites and their tributaries gets slow with moderately sized graphs ( > 5,000 nodes ). It is only worth doing if a lot of questions are going to be asked.
The code for this is available at https://github.com/JamesBremner/PathFinder

Related

How to optimise my solution to HackerRank's Largest Rectangle problem? [duplicate]

I have a histogram with integer heights and constant width 1. I want to maximize the rectangular area under a histogram.
e.g.:
 _
| |
| |_
|   |
|   |_
|     |

The answer for this would be 6, 3 * 2, using col1 and col2 (i.e., heights [5, 3, 1]).
O(n^2) brute force is clear to me, I would like an O(n log n) algorithm. I'm trying to think dynamic programming along the lines of maximum increasing subsequence O(n log n) algo, but am not going forward. Should I use divide and conquer algorithm?
PS: People with enough reputation are requested to remove the divide-and-conquer tag if there is no such solution.
After mho's comments: I mean the area of the largest rectangle that fits entirely. (Thanks j_random_hacker for clarifying :) ).
The above answers have given the best O(n) solution in code; however, their explanations are quite tough to comprehend. The O(n) algorithm using a stack seemed like magic to me at first, but now it makes complete sense. OK, let me explain it.
First observation:
To find the maximal rectangle, if for every bar x we know the first smaller bar on each of its sides, say l and r, then we are certain that height[x] * (r - l - 1) is the best we can get by using the height of bar x. For example, for a bar of height 5 whose nearest smaller neighbors have heights 1 and 2, those two bars are its l and r.
OK, let's assume we can do this in O(1) time for each bar; then we can solve this problem in O(n) by scanning each bar.
Then the question comes: for every bar, can we really find the first smaller bar on its left and on its right in O(1) time? That seems impossible, right? ... It is possible, by using an increasing stack.
Why can an increasing stack keep track of the first smaller bar on the left and right?
Just telling you that an increasing stack can do the job may not be convincing at all, so I will walk you through it.
Firstly, to keep the stack increasing, we need one operation:
while x < stack.top():
    stack.pop()
stack.push(x)
Then you can check that in the increasing stack (as depicted below), for stack[x], stack[x-1] is the first smaller on its left, then a new element that can pop stack[x] out is the first smaller on its right.
Still can't believe stack[x-1] is the first smaller bar on the left of stack[x]?
I will prove it by contradiction.
First of all, stack[x-1] < stack[x] is certain. But let's assume stack[x-1] is not the first smaller on the left of stack[x].
So where is the first smaller bar, fs?

If fs < stack[x-1]:
    stack[x-1] would have been popped out by fs;
else (fs >= stack[x-1]):
    fs would have been pushed onto the stack.

Either case would put fs between stack[x-1] and stack[x], which contradicts the fact that there is no item between stack[x-1] and stack[x].
Therefore stack[x-1] must be the first smaller.
Summary:
An increasing stack keeps track of the first smaller element on the left and right of each element. By using this property, the maximal rectangle in a histogram can be found using a stack in O(n).
Congratulations! This is really a tough problem; I'm glad my prosaic explanation didn't stop you from finishing. Attached is my proven solution as your reward :)
def largestRectangleArea(A):
    ans = 0
    A = [-1] + A
    A.append(-1)
    n = len(A)
    stack = [0]  # store index
    for i in range(n):
        while A[i] < A[stack[-1]]:
            h = A[stack.pop()]
            area = h * (i - stack[-1] - 1)
            ans = max(ans, area)
        stack.append(i)
    return ans
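As a quick check, largestRectangleArea([5, 3, 1]) returns 6, matching the histogram in the question.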
There are three ways to solve this problem in addition to the brute force approach; I will write down all of them. The Java code below has passed the tests on an online judge site called LeetCode (http://www.leetcode.com/onlinejudge#question_84), so I am confident it is correct.
Solution 1: dynamic programming + n*n matrix as cache
time: O(n^2), space: O(n^2)
Basic idea: use the n*n matrix dp[i][j] to cache the minimal height between bar[i] and bar[j]. Start filling the matrix from rectangles of width 1.
public int solution1(int[] height) {
    int n = height.length;
    if (n == 0) return 0;

    int[][] dp = new int[n][n];
    int max = Integer.MIN_VALUE;

    for (int width = 1; width <= n; width++) {
        for (int l = 0; l + width - 1 < n; l++) {
            int r = l + width - 1;
            if (width == 1) {
                dp[l][l] = height[l];
                max = Math.max(max, dp[l][l]);
            } else {
                dp[l][r] = Math.min(dp[l][r-1], height[r]);
                max = Math.max(max, dp[l][r] * width);
            }
        }
    }
    return max;
}
Solution 2: dynamic programming + 2 arrays as cache.
time: O(n^2), space: O(n)
Basic idea: this solution is like solution 1, but saves some space. In solution 1 we build the matrix from row 1 to row n, but in each iteration only the previous row contributes to building the current row. So we use two arrays that take turns serving as the previous row and the current row.
public int Solution2(int[] height) {
    int n = height.length;
    if (n == 0) return 0;

    int max = Integer.MIN_VALUE;
    // dp[0] and dp[1] take turns to be the "previous" line.
    int[][] dp = new int[2][n];

    for (int width = 1; width <= n; width++) {
        for (int l = 0; l + width - 1 < n; l++) {
            if (width == 1) {
                dp[width%2][l] = height[l];
            } else {
                dp[width%2][l] = Math.min(dp[1 - width%2][l], height[l + width - 1]);
            }
            max = Math.max(max, dp[width%2][l] * width);
        }
    }
    return max;
}
Solution 3: use stack.
time: O(n), space:O(n)
This solution is tricky, and I learnt how to do it from an explanation without graphs and an explanation with graphs. I suggest you read the two links before reading my explanation below. It's hard to explain without graphs, so my explanations might be hard to follow.

Following are my explanations:

For each bar, we must be able to find the biggest rectangle containing this bar. The biggest of these n rectangles is what we want.

To get the biggest rectangle for a certain bar (say bar[i], the (i+1)th bar), we just need to find out the biggest interval that contains this bar. What we know is that all the bars in this interval must be at least the same height as bar[i]. So if we figure out how many consecutive same-height-or-higher bars there are on the immediate left of bar[i], and how many there are on the immediate right, we will know the length of the interval, which is the width of the biggest rectangle for bar[i].

To count the number of consecutive same-height-or-higher bars on the immediate left of bar[i], we only need to find the closest bar on the left that is shorter than bar[i], because all the bars between this bar and bar[i] will be consecutive same-height-or-higher bars.

We use a stack to dynamically keep track of all the left bars that are shorter than a certain bar. In other words, if we iterate from the first bar to bar[i], when we just arrive at bar[i] and haven't updated the stack, the stack should store all the bars that are no higher than bar[i-1], including bar[i-1] itself. We compare bar[i]'s height with every bar in the stack until we find one that is shorter than bar[i], which is the closest shorter bar. If bar[i] pops every bar in the stack, it means all bars on the left of bar[i] are at least as high as bar[i].

We can do the same thing on the right side of the i-th bar. Then for bar[i] we know how many bars there are in its interval.
public int solution3(int[] height) {
    int n = height.length;
    if (n == 0) return 0;

    Stack<Integer> left = new Stack<Integer>();
    Stack<Integer> right = new Stack<Integer>();
    int[] width = new int[n]; // widths of intervals.
    Arrays.fill(width, 1);    // all intervals should at least be 1 unit wide.

    for (int i = 0; i < n; i++) {
        // count # of consecutive same-height-or-higher bars on the left of the (i+1)th bar
        while (!left.isEmpty() && height[i] <= height[left.peek()]) {
            // while there are bars stored in the stack, check the bar on top of the stack.
            left.pop();
        }
        if (left.isEmpty()) {
            // all elements on the left are at least as high as height[i].
            width[i] += i;
        } else {
            // bar[left.peek()] is the closest shorter bar.
            width[i] += i - left.peek() - 1;
        }
        left.push(i);
    }

    for (int i = n - 1; i >= 0; i--) {
        while (!right.isEmpty() && height[i] <= height[right.peek()]) {
            right.pop();
        }
        if (right.isEmpty()) {
            // all elements to the right are at least as high as height[i].
            width[i] += n - 1 - i;
        } else {
            width[i] += right.peek() - i - 1;
        }
        right.push(i);
    }

    int max = Integer.MIN_VALUE;
    for (int i = 0; i < n; i++) {
        // find the maximum value of all rectangle areas.
        max = Math.max(max, width[i] * height[i]);
    }
    return max;
}
Implementation in Python of @IVlad's O(n) solution:

from collections import namedtuple

Info = namedtuple('Info', 'start height')

def max_rectangle_area(histogram):
    """Find the area of the largest rectangle that fits entirely under
    the histogram.
    """
    stack = []
    top = lambda: stack[-1]
    max_area = 0
    pos = 0  # current position in the histogram
    for pos, height in enumerate(histogram):
        start = pos  # position where rectangle starts
        while True:
            if not stack or height > top().height:
                stack.append(Info(start, height))  # push
            elif stack and height < top().height:
                max_area = max(max_area, top().height * (pos - top().start))
                start, _ = stack.pop()
                continue
            break  # height == top().height goes here
    pos += 1
    for start, height in stack:
        max_area = max(max_area, height * (pos - start))
    return max_area
Example:
>>> f = max_rectangle_area
>>> f([5,3,1])
6
>>> f([1,3,5])
6
>>> f([3,1,5])
5
>>> f([4,8,3,2,0])
9
>>> f([4,8,3,1,1,0])
9
Linear search using a stack of incomplete subproblems
Copy-paste algorithm's description (in case the page goes down):
We process the elements in left-to-right order and maintain a stack of information about started but yet unfinished subhistograms. Whenever a new element arrives it is subjected to the following rules. If the stack is empty we open a new subproblem by pushing the element onto the stack. Otherwise we compare it to the element on top of the stack. If the new one is greater we again push it. If the new one is equal we skip it. In all these cases, we continue with the next new element. If the new one is less, we finish the topmost subproblem by updating the maximum area w.r.t. the element at the top of the stack. Then, we discard the element at the top, and repeat the procedure keeping the current new element. This way, all subproblems are finished until the stack becomes empty, or its top element is less than or equal to the new element, leading to the actions described above. If all elements have been processed, and the stack is not yet empty, we finish the remaining subproblems by updating the maximum area w.r.t. the elements at the top.

For the update w.r.t. an element, we find the largest rectangle that includes that element. Observe that an update of the maximum area is carried out for all elements except for those skipped. If an element is skipped, however, it has the same largest rectangle as the element on top of the stack at that time, which will be updated later. The height of the largest rectangle is, of course, the value of the element. At the time of the update, we know how far the largest rectangle extends to the right of the element, because then, for the first time, a new element with smaller height arrived. The information of how far the largest rectangle extends to the left of the element is available if we store it on the stack, too.

We therefore revise the procedure described above. If a new element is pushed immediately, either because the stack is empty or it is greater than the top element of the stack, the largest rectangle containing it extends to the left no farther than the current element. If it is pushed after several elements have been popped off the stack, because it is less than these elements, the largest rectangle containing it extends to the left as far as that of the most recently popped element.

Every element is pushed and popped at most once, and in every step of the procedure at least one element is pushed or popped. Since the amount of work for the decisions and the update is constant, the complexity of the algorithm is O(n) by amortized analysis.
The other answers here have done a great job presenting the O(n)-time, O(n)-space solution using two stacks. There's another perspective on this problem that independently provides an O(n)-time, O(n)-space solution to the problem, and might provide a little bit more insight as to why the stack-based solution works.
The key idea is to use a data structure called a Cartesian tree. A Cartesian tree is a binary tree structure (though not a binary search tree) that's built around an input array. Specifically, the root of the Cartesian tree is built above the minimum element of the array, and the left and right subtrees are recursively constructed from the subarrays to the left and right of the minimum value.
For example, here's a sample array and its Cartesian tree:
      +----------------------- 23 ------+
      |                                 |
+---- 26 --+                      +---- 79
|          |                      |
31 --+     53 --+                 84
     |          |
     41 --+     58 -------+
          |               |
          59              +-- 93
                          |
                          97

+----+----+----+----+----+----+----+----+----+----+----+
| 31 | 41 | 59 | 26 | 53 | 58 | 97 | 93 | 23 | 84 | 79 |
+----+----+----+----+----+----+----+----+----+----+----+
The reason that Cartesian trees are useful in this problem is that the question at hand has a really nice recursive structure to it. Begin by looking at the lowest rectangle in the histogram. There are three options for where the maximum rectangle could end up being placed:
It could pass right under the minimum value in the histogram. In that case, to make it as large as possible, we'd want to make it as wide as the entire array.
It could be entirely to the left of the minimum value. In that case, we recursively want the answer formed from the subarray purely to the left of the minimum value.
It could be entirely to the right of the minimum value. In that case, we recursively want the answer formed from the subarray purely to the right of the minimum value.
Notice that this recursive structure - find the minimum value, do something with the subarrays to the left and the right of that value - perfectly matches the recursive structure of a Cartesian tree. In fact, if we can create a Cartesian tree for the overall array when we get started, we can then solve this problem by recursively walking the Cartesian tree from the root downward. At each point, we recursively compute the optimal rectangle in the left and right subarrays, along with the rectangle you'd get by fitting right under the minimum value, and then return the best option we find.
In pseudocode, this looks like this:
function largestRectangleUnder(int low, int high, Node root) {
    /* Base case: If the range is empty, the biggest rectangle we
     * can fit is the empty rectangle.
     */
    if (low == high) return 0;

    /* Assume the Cartesian tree nodes are annotated with their
     * positions in the original array.
     */
    return max {
        (high - low) * root.value, // widest rectangle under the minimum
        largestRectangleUnder(low, root.index, root.left),
        largestRectangleUnder(root.index + 1, high, root.right)
    }
}
Once we have the Cartesian tree, this algorithm takes time O(n), since we visit each node exactly once and do O(1) work per node.
It turns out that there's a simple, linear-time algorithm for building Cartesian trees. The "natural" way you'd probably think to build one would be to scan across the array, find the minimum value, then recursively build a Cartesian tree from the left and right subarrays. The problem is that the process of finding the minimum value is really expensive, and this can take time Θ(n²).
The "fast" way to build a Cartesian tree is by scanning the array from the left to the right, adding in one element at a time. This algorithm is based on the following observations about Cartesian trees:
First, Cartesian trees obey the heap property: every element is less than or equal to its children. The reason for this is that the Cartesian tree root is the smallest value in the overall array, and its children are the smallest elements in their subarrays, etc.
Second, if you do an inorder traversal of a Cartesian tree, you get back the elements of the array in the order in which they appear. To see why this is, notice that if you do an inorder traversal of a Cartesian tree, you first visit everything to the left of the minimum value, then the minimum value, then everything to the right of the minimum value. Those visitations are recursively done the same way, so everything ends up being visited in order.
These two rules give us a lot of information about what happens if we start with a Cartesian tree of the first k elements of the array and want to form a Cartesian tree for the first k+1 elements. That new element will have to end up on the right spine of the Cartesian tree - the part of the tree formed by starting at the root and only taking steps to the right - because otherwise something would come after it in an inorder traversal. And, within that right spine, it has to be placed in a way that makes it bigger than everything above it, since we need to obey the heap property.
The way that you actually add a new node to the Cartesian tree is to start at the rightmost node in the tree and walk upwards until you either hit the root of the tree or find a node that has a smaller value. You then make the new value have as its left child the last node it walked up on top of.
Here's a trace of that algorithm on a small array:
+---+---+---+---+
| 2 | 4 | 3 | 1 |
+---+---+---+---+

2 becomes the root.

2 --+
    |
    4

4 is bigger than 2, we can't move upwards. Append to right.

+---+---+---+---+
| 2 | 4 | 3 | 1 |
+---+---+---+---+

2 ------+
        |
    +-- 3
    |
    4

3 is less than 4, so climb over it. We can't climb further over 2, as it is smaller than 3. The subtree rooted at 4 that we climbed over goes to the left of the new value 3, and 3 becomes the rightmost node now.

+---+---+---+---+
| 2 | 4 | 3 | 1 |
+---+---+---+---+

+---------- 1
|
2 ------+
        |
    +-- 3
    |
    4

1 climbs over the root 2, the entire tree rooted at 2 is moved to the left of 1, and 1 is now the new root - and also the rightmost value.

+---+---+---+---+
| 2 | 4 | 3 | 1 |
+---+---+---+---+
Although this might not seem to run in linear time - wouldn't you potentially end up climbing all the way to the root of the tree over and over and over again? - you can show that this runs in linear time using a clever argument. If you climb up over a node in the right spine during an insertion, that node ends up getting moved off the right spine and therefore can't be rescanned in a future insertion. Therefore, every node is only ever scanned over at most once, so the total work done is linear.
And now the kicker - the standard way that you'd actually implement this approach is by maintaining a stack of the values that correspond to the nodes on the right spine. The act of "walking up" and over a node corresponds to popping a node off the stack. Therefore, the code for building a Cartesian tree looks something like this:
Stack s;
for (each array element x) {
    pop s until it's empty or s.top > x
    push x onto the stack.
    do some sort of pointer rewiring based on what you just did.
}
The stack manipulations here might seem really familiar, and that's because these are the exact stack operations that you would do in the answers shown elsewhere here. In fact, you can think of what those approaches are doing as implicitly building the Cartesian tree and running the recursive algorithm shown above in the process of doing so.
The advantage, I think, of knowing about Cartesian trees is that it provides a really nice conceptual framework for seeing why this algorithm works correctly. If you know that what you're doing is running a recursive walk of a Cartesian tree, it's easier to see that you're guaranteed to find the largest rectangle. Plus, knowing that the Cartesian tree exists gives you a useful tool for solving other problems. Cartesian trees show up in the design of fast data structures for the range minimum query problem and are used to convert suffix arrays into suffix trees.
Here's some Java code that implements this idea, courtesy of @Azeem!
import java.util.Stack;

public class CartesianTreeMakerUtil {

    private static class Node {
        int val;
        Node left;
        Node right;
    }

    public static Node cartesianTreeFor(int[] nums) {
        Node root = null;
        Stack<Node> s = new Stack<>();
        for (int curr : nums) {
            Node lastJumpedOver = null;
            while (!s.empty() && s.peek().val > curr) {
                lastJumpedOver = s.pop();
            }
            Node currNode = new Node();
            currNode.val = curr;
            if (s.isEmpty()) {
                root = currNode;
            } else {
                s.peek().right = currNode;
            }
            currNode.left = lastJumpedOver;
            s.push(currNode);
        }
        return root;
    }

    public static void printInOrder(Node root) {
        if (root == null) return;
        if (root.left != null) {
            printInOrder(root.left);
        }
        System.out.println(root.val);
        if (root.right != null) {
            printInOrder(root.right);
        }
    }

    public static void main(String[] args) {
        int[] nums = new int[args.length];
        for (int i = 0; i < args.length; i++) {
            nums[i] = Integer.parseInt(args[i]);
        }
        Node root = cartesianTreeFor(nums);
        printInOrder(root);
    }
}
The easiest solution in O(N)
long long getMaxArea(long long hist[], long long n)
{
    stack<long long> s;
    long long max_area = 0;
    long long tp;
    long long area_with_top;

    long long i = 0;
    while (i < n)
    {
        if (s.empty() || hist[s.top()] <= hist[i])
            s.push(i++);
        else
        {
            tp = s.top(); // store the top index
            s.pop();      // pop the top
            area_with_top = hist[tp] * (s.empty() ? i : i - s.top() - 1);
            if (max_area < area_with_top)
            {
                max_area = area_with_top;
            }
        }
    }
    while (!s.empty())
    {
        tp = s.top();
        s.pop();
        area_with_top = hist[tp] * (s.empty() ? i : i - s.top() - 1);
        if (max_area < area_with_top)
            max_area = area_with_top;
    }
    return max_area;
}
There is also another solution using divide and conquer. The algorithm is:

1) Divide the array into 2 parts with the smallest height as the breaking point

2) The maximum area is the maximum of:
    a) Smallest height * size of the array
    b) Maximum rectangle in the left half of the array
    c) Maximum rectangle in the right half of the array

The time complexity comes to O(n log n).
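A minimal sketch of that recursion (my own, in Java), using a plain linear scan for the minimum; note that on sorted input this scan makes the worst case O(n^2), and keeping the stated O(n log n) would require an O(log n) range-minimum query instead:

    static int maxRect(int[] h, int lo, int hi) {       // half-open range [lo, hi)
        if (lo >= hi) return 0;
        int min = lo;
        for (int i = lo + 1; i < hi; i++)
            if (h[i] < h[min]) min = i;                 // 1) smallest height = split point
        int best = h[min] * (hi - lo);                  // 2a) smallest height * width
        best = Math.max(best, maxRect(h, lo, min));     // 2b) left half
        best = Math.max(best, maxRect(h, min + 1, hi)); // 2c) right half
        return best;
    }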
The stack solution is one of the cleverest solutions I've seen to date, and it can be a little hard to understand why it works.
I've taken a jab at explaining the same in some detail here.
Summary points from the post:

The general way our brain thinks is:
Create every situation and try to find the value of the constraint that is needed to solve the problem.
And we happily convert that to code as: find the value of the constraint (min) for each situation (pair (i, j)).

The clever solution tries to flip the problem. For each constraint/min value of the area, what are the best possible left and right extremes?
So if we traverse over each possible min in the array: what are the left and right extremes for each value?
A little thought says: the first leftmost value less than the current min, and similarly the first rightmost value that is less than the current min.
So now we need to see if we can find a clever way to find the first left and right values less than the current value.
To think: if we have traversed the array partially, say till min_i, how can the solution for min_(i+1) be built?
We need the first value less than min_i to its left.
Inverting the statement: we need to ignore all values to the left of min_i that are greater than min_i. We stop when we find the first value smaller than min_i. The troughs in the curve hence become useless once we have crossed them. In the histogram (2 4 3) => if 3 is min_i, 4 being larger is not of interest.
Corollary: in a range (i, j), with j being the min value we are considering, all values between j and its left extreme i are useless, even for further calculations.
Any histogram on the right with a min value larger than j will be bounded at j. The values of interest on the left form a monotonically increasing sequence with j being the largest value. (Values of interest here being possible values that may be of interest for the later array.)
Since we are travelling from left to right, for each min value/current value we do not know whether the right side of the array will have an element smaller than it.
So we have to keep it in memory until we get to know this value is useless (i.e., until a smaller value is found).
All this leads to the usage of our very own stack structure.
We keep things on the stack until we don't know they're useless.
We remove things from the stack once we know they're crap.
So for each min value, to find its left smaller value, we do the following:
pop the elements larger than it (useless values);
the first element smaller than the value is the left extreme, the i for our min.
We can do the same thing from the right side of the array, and we will get the j for our min.
It's quite hard to explain, but if this is making sense then I'd suggest reading the complete article here, since it has more insights and details.
I don't understand the other entries, but I think I know how to do it in O(n) as follows.
A) for each index find the largest rectangle inside the histogram ending at that index where the index column touches the top of the rectangle and remember where the rectangle starts. This can be done in O(n) using a stack based algorithm.
B) Similarly for each index find the largest rectangle starting at that index where the index column touches the top of the rectangle and remember where the rectangle ends. Also O(n) using the same method as (A) but scanning the histogram backwards.
C) For each index combine the results of (A) and (B) to determine the largest rectangle where the column at that index touches the top of the rectangle. O(n) like (A).
D) Since the largest rectangle must be touched by some column of the histogram the largest rectangle is the largest rectangle found in step (C).
The hard part is implementing (A) and (B), which I think is what JF Sebastian may have solved rather than the general problem stated.
I coded this one and felt a little better about it:
import java.util.Stack;

class StackItem {
    public int sup;
    public int height;
    public int sub;

    public StackItem(int a, int b, int c) {
        sup = a;
        height = b;
        sub = c;
    }

    public int getArea() {
        return (sup - sub) * height;
    }

    @Override
    public String toString() {
        return " from:" + sup +
               " to:" + sub +
               " height:" + height +
               " Area =" + getArea();
    }
}

public class MaxRectangleInHistogram {
    Stack<StackItem> S;
    StackItem curr;
    StackItem maxRectangle;

    public StackItem getMaxRectangleInHistogram(int A[], int n) {
        int i = 0;
        S = new Stack<>();
        S.push(new StackItem(0, 0, -1));
        maxRectangle = new StackItem(0, 0, -1);

        while (i < n) {
            curr = new StackItem(i, A[i], i);
            if (curr.height > S.peek().height) {
                S.push(curr);
            } else if (curr.height == S.peek().height) {
                S.peek().sup = i + 1;
            } else if (curr.height < S.peek().height) {
                while ((S.size() > 1) && (curr.height <= S.peek().height)) {
                    curr.sub = S.peek().sub;
                    S.peek().sup = i;
                    decideMaxRectangle(S.peek());
                    S.pop();
                }
                S.push(curr);
            }
            i++;
        }
        while (S.size() > 1) {
            S.peek().sup = i;
            decideMaxRectangle(S.peek());
            S.pop();
        }
        return maxRectangle;
    }

    private void decideMaxRectangle(StackItem s) {
        if (s.getArea() > maxRectangle.getArea())
            maxRectangle = s;
    }
}
Just Note:
Time Complexity: T(n) < O(2n) ~ O(n)
Space Complexity S(n) < O(n)
I would like to thank @templatetypedef for his/her extremely detailed and intuitive answer. The Java code below is based on his/her suggestion to use Cartesian trees and solves the problem in O(N) time and O(N) space. I suggest that you read @templatetypedef's answer above before reading the code below. The code is given in the format of the solution to the problem on LeetCode: https://leetcode.com/problems/largest-rectangle-in-histogram/description/ and passes all 96 test cases.
class Solution {

    private class Node {
        int val;
        Node left;
        Node right;
        int index;
    }

    public Node getCartesianTreeFromArray(int[] nums) {
        Node root = null;
        Stack<Node> s = new Stack<>();
        for (int i = 0; i < nums.length; i++) {
            int curr = nums[i];
            Node lastJumpedOver = null;
            while (!s.empty() && s.peek().val >= curr) {
                lastJumpedOver = s.pop();
            }
            Node currNode = new Node();
            currNode.val = curr;
            currNode.index = i;
            if (s.isEmpty()) {
                root = currNode;
            } else {
                s.peek().right = currNode;
            }
            currNode.left = lastJumpedOver;
            s.push(currNode);
        }
        return root;
    }

    public int largestRectangleUnder(int low, int high, Node root, int[] nums) {
        /* Base case: If the range is empty, the biggest rectangle we
         * can fit is the empty rectangle.
         */
        if (root == null) return 0;
        if (low == high) {
            if (0 <= low && low <= nums.length - 1) {
                return nums[low];
            }
            return 0;
        }

        /* Assume the Cartesian tree nodes are annotated with their
         * positions in the original array.
         */
        int leftArea = -1, rightArea = -1;
        if (root.left != null) {
            leftArea = largestRectangleUnder(low, root.index - 1, root.left, nums);
        }
        if (root.right != null) {
            rightArea = largestRectangleUnder(root.index + 1, high, root.right, nums);
        }
        return Math.max((high - low + 1) * root.val,
                        Math.max(leftArea, rightArea));
    }

    public int largestRectangleArea(int[] heights) {
        if (heights == null || heights.length == 0) {
            return 0;
        }
        if (heights.length == 1) {
            return heights[0];
        }
        Node root = getCartesianTreeFromArray(heights);
        return largestRectangleUnder(0, heights.length - 1, root, heights);
    }
}
python-3
a = [3, 4, 7, 4, 6]
a.sort()
n = len(a)
r = 0
for i in range(n):
    if a[i] * (n - i) > r:
        r = a[i] * (n - i)
print(r)
output:
16
I came across this question in an interview. While trying to solve it, I observed the following:

Need to check the consecutive left elements greater than or equal to the current element
Need to check the consecutive right elements greater than or equal to the current element
Calculate the area: (number of left-side max elements + number of right-side max elements + 1) * current element
Check and replace the existing maxArea if the calculated area is greater than maxArea

Following is the JS code implementing the above pseudocode:
function maxAreaCovered(arr) {
    let maxArea = 0;
    for (let index = 0; index < arr.length; index++) {
        let l = index - 1;
        let r = index + 1;
        let maxEleCount = 0;
        while (l > -1) {
            if (arr[l] >= arr[index]) {
                maxEleCount++;
            } else {
                break;
            }
            l--;
        }
        while (r < arr.length) {
            if (arr[r] >= arr[index]) {
                maxEleCount++;
            } else {
                break;
            }
            r++;
        }
        let area = (maxEleCount + 1) * arr[index];
        maxArea = Math.max(area, maxArea);
    }
    return maxArea;
}
console.log(maxAreaCovered([6, 2, 5, 4, 5, 1, 6]));
You can use an O(n) method which uses a stack to calculate the maximum area under the histogram.
long long histogramArea(vector<int> &histo) {
    stack<int> s;
    long long maxArea = 0;
    long long area = 0;
    int i = 0;
    for (i = 0; i < histo.size();) {
        if (s.empty() || histo[s.top()] <= histo[i]) {
            s.push(i++);
        } else {
            int top = s.top(); s.pop();
            area = histo[top] * (s.empty() ? i : i - s.top() - 1);
            if (area > maxArea)
                maxArea = area;
        }
    }
    while (!s.empty()) {
        int top = s.top(); s.pop();
        area = histo[top] * (s.empty() ? i : i - s.top() - 1);
        if (area > maxArea)
            maxArea = area;
    }
    return maxArea;
}
For an explanation, you can read here: http://www.geeksforgeeks.org/largest-rectangle-under-histogram/

Least distance between two values in a large binary tree with duplicate values

Given a binary tree that might contain duplicate values, you need to find the minimum distance between two given values. Note that the binary tree can be large.
For example:
            5
          /   \
         1     7
        / \   / \
       4   3 8   2
      / \
     1   2

The function should return 2 (for 1 and 2 as input).
(If duplicates are not present, we can find LCA and then calculate the distance.)
I've written the following code but I couldn't handle cases when the values are present in different subtrees and in the below cases:
root = 1, root.left = 4, root.left.left = 3, root.left.right = 2, root.left.left.left = 1
root = 1, root.left = 4, root.left.left = 3, root.left.left.left = 1, root.left.left.right = 2
void dist(struct node* root, int& min, int n1, int n2, int pos1, int pos2, int level) {
    if (!root)
        return;
    if (root->data == n1) {
        pos1 = level;
        if (pos2 >= 0)
            if (pos1 - pos2 < min)
                min = pos1 - pos2;
    }
    else if (root->data == n2) {
        pos2 = level;
        if (pos1 >= 0)
            if (pos2 - pos1 < min)
                min = pos2 - pos1;
    }
    dist(root->left, min, n1, n2, pos1, pos2, level + 1);
    dist(root->right, min, n1, n2, pos1, pos2, level + 1);
}
I think at each node we can check whether that node is the LCA of the two values or not. If that node is the LCA, then find the distance and update min accordingly, but this would take O(n^2).
Following is an algorithm to solve the problem:

Traverse the whole tree and calculate the path to each node as a binary string ('0' = left, '1' = right), storing all paths for each value in a hash map.
e.g. For your tree the hash map will be

1 => 0, 000
2 => 001, 11
3 => 01
...

When querying the distance between (u, v), check each pair of paths and calculate the distance between them: remove the common prefix from the two strings, then sum the remaining lengths.
eg. u=1 and v=2
distance(0,001) = 2
distance(0,11) = 3
distance(000,001) = 2
distance(000,11) = 5
min = 2
Note: I think the second step can be made more efficient, but I need to do more research.
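A small Java helper for step 2 (hypothetical, not from the original answer), where a path string uses '0' for left and '1' for right:

    static int pathDistance(String p, String q) {
        int k = 0; // length of the common prefix = depth of the LCA
        while (k < p.length() && k < q.length() && p.charAt(k) == q.charAt(k)) k++;
        return (p.length() - k) + (q.length() - k);
    }

For example, pathDistance("0", "001") returns 2 and pathDistance("000", "11") returns 5, matching the worked distances above.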
You can compute the LCA of a set of nodes by computing LCA(x1, LCA(x2, LCA(x3, ...))), and all the nodes in the set will be somewhere below this LCA. If you compare the LCAs of two sets of nodes and one is not directly beneath the other, then the minimum distance between any two nodes in different sets will be at least the distance between the LCAs. If one LCA is above the other, then the minimum distance could be zero.
This allows a sort of branch and bound approach. At each point you have a best minimum distance so far (initialized as infinity). Given two sets of nodes, use their LCAs to work out a lower bound on their minimum distance and discard them if this is no better than the best answer so far. If not discarded, split each set into two plus a possible single depending on whether each node in the set is to the left of the LCA, to the right of the LCA, or is the LCA. Recursively check for the minimum distance in the (up to nine) pairs of split sets. If both splits in a pair are below some minimum size, just work out the LCAs and minimum distances of each pair of nodes across the two sets - at this point may find out that you have a new best answer and can update the best answer so far.
Looking at the example at the top of the question, the LCA of the 2s is the root of the tree, and the LCA of the 1s is the highest 1. So the minimum distance between these two sets could be close to zero. Now split each set in two. The left hand 2 is distance two from both of the two 1s. The LCA of the right hand 2 is itself, on the right hand branch of the tree, and the LCA of each of the two 1s is down on the left hand branch of the tree. So the distance between the two is at least two, and we could tell this even if we had a large number of 2s anywhere below the position of the existing right-hand two, and a large number of 1s anywhere on the left hand subtree.
Do a pre-order traversal of the tree (or any traversal should work).
During this process, simply keep track of the closest 1 and 2, and update the distance whenever you find a 2 and the closest 1 is closer than the closest distance so far, or vice versa.
Code (C++, untested first draft): (hardcoded 1 and 2 for simplicity)
int getLeastDistance(Node *n, int *distTo1, int *distTo2)
{
    if (n == NULL)
        return LARGE_VALUE;

    int dist = LARGE_VALUE;

    // process current node
    if (n->data == 1)
    {
        dist = *distTo2;
        *distTo1 = 0;
    }
    else if (n->data == 2)
    {
        dist = *distTo1;
        *distTo2 = 0;
    }

    // go left
    int newDistTo1 = *distTo1 + 1,
        newDistTo2 = *distTo2 + 1;
    dist = min(dist, getLeastDistance(n->left, &newDistTo1, &newDistTo2));

    // update distances
    *distTo1 = min(*distTo1, newDistTo1 + 1);
    *distTo2 = min(*distTo2, newDistTo2 + 1);

    // go right
    newDistTo1 = *distTo1 + 1;
    newDistTo2 = *distTo2 + 1;
    dist = min(dist, getLeastDistance(n->right, &newDistTo1, &newDistTo2));
    return dist;
}
Caller:
Node root = ...;
int distTo1 = LARGE_VALUE, distTo2 = LARGE_VALUE;
int dist = getLeastDistance(&root, &distTo1, &distTo2);
Just be sure to make LARGE_VALUE far enough from the maximum value for int such that it won't overflow if incremented (-1 is probably safer, but it requires more complex code).

A fast way to find connected component in a 1-NN graph?

First of all, I have an N*N distance matrix; for each point, I calculated its nearest neighbor, so we have an N*2 matrix, which looks like this:
0 -> 1
1 -> 2
2 -> 3
3 -> 2
4 -> 2
5 -> 6
6 -> 7
7 -> 6
8 -> 6
9 -> 8
The second column is the nearest neighbor's index, so this is a special kind of directed graph in which each vertex has exactly one outgoing edge (out-degree 1).
Of course, we could first transform the N*2 matrix into a standard graph representation and perform BFS/DFS to get the connected components.
But given the characteristics of this special graph, is there any other fast way to do the job?
Any help would be really appreciated.
Update:
I've implemented a simple algorithm for this case here.
Note that I did not use a union-find algorithm, because the data structure may not make things that easy, and I doubt it's the fastest way in my case (I mean practically).
You could argue that the _merge process could be time consuming, but if we swap the edges into contiguous places while assigning new labels, the merging may cost little; however, it needs another N spaces to trace the original indices.
The fastest algorithm for finding connected components given an edge list is union-find: each node holds a pointer to another node in the same set, with all pointers in a set converging on a single root node; whenever you follow a chain of length at least 2, you reconnect the bottom node further up (path compression).
This will definitely run in linear time:
- push all edges into a union-find structure: O(n)
- store each node in its set (the union-find root)
and update the set of non-empty sets: O(n)
- return the set of non-empty sets (graph components).
Since the list of edges already almost forms a union-find tree, it is possible to skip the first step:
for each node
- if the node is not marked as collected
-- walk along the edges until you find an order-1 or order-2 loop,
collecting nodes en-route
-- reconnect all nodes to the end of the path and consider it a root for the set.
-- store all nodes in the set for the root.
-- update the set of non-empty sets.
-- mark all nodes as collected.
return the set of non-empty sets
The second algorithm is linear as well, but only a benchmark will tell if it's actually faster. The strength of the union-find algorithm lies in its optimizations (weighting and path compression); this variant delays the optimization to the second step but removes the first step completely.
You can probably squeeze out a little more performance if you join the union step with the nearest neighbor calculation, then collect the sets in the second pass.
If you want to do it sequentially, you can use weighted quick-union with path compression; the complexity is O(N + M log(log(N))). Check this link.
Here is the pseudocode, honoring #pycho's words:
public class QuickUnion
{
    private int[] id; // parent links
    private int[] sz; // subtree sizes, for weighting

    public QuickUnion(int N)
    {
        id = new int[N];
        sz = new int[N];
        for (int i = 0; i < N; i++) { id[i] = i; sz[i] = 1; }
    }

    public int root(int i)
    {
        while (i != id[i])
        {
            id[i] = id[id[i]]; // path compression (halving)
            i = id[i];
        }
        return i;
    }

    public boolean find(int p, int q)
    {
        return root(p) == root(q);
    }

    public void unite(int p, int q)
    {
        int i = root(p);
        int j = root(q);
        if (i == j) return;
        // weighted: hang the smaller tree under the larger one
        if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; }
        else               { id[j] = i; sz[i] += sz[j]; }
    }
}
Reference: https://www.cs.princeton.edu/~rs/AlgsDS07/01UnionFind.pdf
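For instance, applied to the nearest-neighbour list from the question (a small usage sketch of the class above; variable names are mine):

int[] nn = {1, 2, 3, 2, 2, 6, 7, 6, 6, 8}; // second column of the N*2 matrix
QuickUnion uf = new QuickUnion(nn.length);
for (int v = 0; v < nn.length; v++)
    uf.unite(v, nn[v]); // one union per out-edge: O(N) unions overall
// now uf.root(0) == uf.root(4) and uf.root(5) == uf.root(9): the components
// are {0,1,2,3,4} and {5,6,7,8,9}, as in the question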
If you want to find connected components in parallel, the asymptotic complexity can be reduced to O(log(log(N))) time using pointer jumping and weighted quick-union with path compression. Check this link:
https://vishwasshanbhog.wordpress.com/2016/05/04/efficient-parallel-algorithm-to-find-the-connected-components-of-the-graphs/
Since each node has only one outgoing edge, you can just traverse the graph one edge at a time until you get to a vertex you've already visited. An out-degree of 1 means any further traversal at this point will only take you where you've already been. The traversed vertices in that path are all in the same component.
In your example:
0->1->2->3->2, so [0,1,2,3] is a component
4->2, so update the component to [0,1,2,3,4]
5->6->7->6, so [5,6,7] is a component
8->6, so update the component to [5,6,7,8]
9->8, so update the component to [5,6,7,8,9]
You can visit each node exactly once, so time is O(n). Space is O(n) since all you need is a component id for each node, and a list of component ids.
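Here is a hedged Java sketch of that traversal (names are mine; nn[v] is the second column of the question's N*2 matrix):

import java.util.*;

class OneNNComponents {
    // Walk each unvisited chain until it hits a vertex that is already
    // labelled (adopt its label) or a vertex already on the current walk
    // (a new cycle: mint a fresh label), then label the whole walk.
    // Every vertex is walked over at most twice, so this is O(n).
    static int[] components(int[] nn) {
        int n = nn.length;
        int[] comp = new int[n];
        Arrays.fill(comp, -1);       // -1 = not labelled yet
        int[] onPath = new int[n];   // id of the walk that last touched v
        Arrays.fill(onPath, -1);
        int nextId = 0;
        for (int start = 0; start < n; start++) {
            if (comp[start] != -1) continue;
            List<Integer> path = new ArrayList<>();
            int v = start;
            while (comp[v] == -1 && onPath[v] != start) {
                onPath[v] = start;
                path.add(v);
                v = nn[v];
            }
            // either we reached an existing component or closed a new cycle
            int id = (comp[v] != -1) ? comp[v] : nextId++;
            for (int u : path) comp[u] = id;
        }
        return comp;
    }
}

On the question's input {1, 2, 3, 2, 2, 6, 7, 6, 6, 8} this assigns one component id to vertices 0-4 and another to vertices 5-9.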

Dijkstra's pathfinding algorithm

I'm learning about pathfinding from a book, but I've spotted a weird statement.
For completeness, I will include most of the code, but feel free to skip to the second part (Search())
template <class graph_type>
class Graph_SearchDijkstra
{
private:
    //create typedefs for the node and edge types used by the graph
    typedef typename graph_type::EdgeType Edge;
    typedef typename graph_type::NodeType Node;

private:
    const graph_type& m_Graph;

    //this vector contains the edges that comprise the shortest path tree -
    //a directed sub-tree of the graph that encapsulates the best paths from
    //every node on the SPT to the source node.
    std::vector<const Edge*> m_ShortestPathTree;

    //this is indexed into by node index and holds the total cost of the best
    //path found so far to the given node. For example, m_CostToThisNode[5]
    //will hold the total cost of all the edges that comprise the best path
    //to node 5 found so far in the search (if node 5 is present and has
    //been visited of course).
    std::vector<double> m_CostToThisNode;

    //this is an indexed (by node) vector of "parent" edges leading to nodes
    //connected to the SPT but that have not been added to the SPT yet.
    std::vector<const Edge*> m_SearchFrontier;

    int m_iSource;
    int m_iTarget;

    void Search();

public:
    Graph_SearchDijkstra(const graph_type& graph,
                         int source,
                         int target = -1) : m_Graph(graph),
                                            m_ShortestPathTree(graph.NumNodes()),
                                            m_SearchFrontier(graph.NumNodes()),
                                            m_CostToThisNode(graph.NumNodes()),
                                            m_iSource(source),
                                            m_iTarget(target)
    {
        Search();
    }

    //returns the vector of edges defining the SPT. If a target is given
    //in the constructor, then this will be the SPT comprising all the nodes
    //examined before the target is found, else it will contain all the nodes
    //in the graph.
    std::vector<const Edge*> GetAllPaths() const;

    //returns a vector of node indexes comprising the shortest path
    //from the source to the target. It calculates the path by working
    //backward through the SPT from the target node.
    std::list<int> GetPathToTarget() const;

    //returns the total cost to the target
    double GetCostToTarget() const;
};
Search():
template <class graph_type>
void Graph_SearchDijkstra<graph_type>::Search()
{
    //create an indexed priority queue that sorts smallest to largest
    //(front to back). Note that the maximum number of elements the iPQ
    //may contain is NumNodes(). This is because no node can be represented
    //on the queue more than once.
    IndexedPriorityQLow<double> pq(m_CostToThisNode, m_Graph.NumNodes());

    //put the source node on the queue
    pq.insert(m_iSource);

    //while the queue is not empty
    while (!pq.empty())
    {
        //get the lowest cost node from the queue. Don't forget, the return value
        //is a *node index*, not the node itself. This node is the node not already
        //on the SPT that is the closest to the source node
        int NextClosestNode = pq.Pop();

        //move this edge from the search frontier to the shortest path tree
        m_ShortestPathTree[NextClosestNode] = m_SearchFrontier[NextClosestNode];

        //if the target has been found exit
        if (NextClosestNode == m_iTarget) return;

        //now to relax the edges. For each edge connected to the next closest node
        typename graph_type::ConstEdgeIterator ConstEdgeItr(m_Graph, NextClosestNode);
        for (const Edge* pE = ConstEdgeItr.begin();
             !ConstEdgeItr.end();
             pE = ConstEdgeItr.next())
        {
            //the total cost to the node this edge points to is the cost to the
            //current node plus the cost of the edge connecting them.
            double NewCost = m_CostToThisNode[NextClosestNode] + pE->Cost();

            //if this edge has never been on the frontier make a note of the cost
            //to reach the node it points to, then add the edge to the frontier
            //and the destination node to the PQ.
            if (m_SearchFrontier[pE->To()] == 0)
            {
                m_CostToThisNode[pE->To()] = NewCost;
                pq.insert(pE->To());
                m_SearchFrontier[pE->To()] = pE;
            }
            //else test to see if the cost to reach the destination node via the
            //current node is cheaper than the cheapest cost found so far. If
            //this path is cheaper we assign the new cost to the destination
            //node, update its entry in the PQ to reflect the change, and add the
            //edge to the frontier
            else if ((NewCost < m_CostToThisNode[pE->To()]) &&
                     (m_ShortestPathTree[pE->To()] == 0))
            {
                m_CostToThisNode[pE->To()] = NewCost;

                //because the cost is less than it was previously, the PQ must be
                //resorted to account for this.
                pq.ChangePriority(pE->To());
                m_SearchFrontier[pE->To()] = pE;
            }
        }
    }
}
What I don't get is this part:
//else test to see if the cost to reach the destination node via the
//current node is cheaper than the cheapest cost found so far. If
//this path is cheaper we assign the new cost to the destination
//node, update its entry in the PQ to reflect the change, and add the
//edge to the frontier
else if ((NewCost < m_CostToThisNode[pE->To()]) &&
         (m_ShortestPathTree[pE->To()] == 0))
If the new cost is lower than the cost already found, then why do we also test whether the node has already been added to the SPT? This seems to defeat the purpose of the check?
FYI, in m_ShortestPathTree[pE->To()] == 0 the container is a vector that holds a pointer to an edge (or NULL) at each index (the index represents a node).
Imagine the following graph:
S --5-- A --2-- F
 \     /
 -3   -4
   \ /
    B
And you want to go from S to F. First, let me tell you that Dijkstra's algorithm assumes there are no negative-weight loops in the graph. In my example, this loop is S -> B -> A -> S or, simpler yet, S -> B -> S.
If you have such a loop, you can go around it infinitely, and your cost to F gets lower and lower. That is why this is not acceptable for Dijkstra's algorithm.
Now, how do you check for that? Exactly as the code you posted does. Every time you want to update the weight of a node, besides checking whether it gets a smaller weight, you check whether it is not already in the accepted list. If it is, then you must have had a negative-weight loop; otherwise, how could you move forward and reach an already accepted node with a smaller weight?
Let's follow the algorithm on the example graph (nodes in [] are accepted):
Without the if in question:
Starting Weights: S(0), A(inf), B(inf), F(inf)
- Accept S
New weights: [S(0)], A(5), B(-3), F(inf)
- Accept B
New weights: [S(-3)], A(-7), [B(-3)], F(inf)
- Accept A
New weights: [S(-3)], [A(-7)], [B(-11)], F(-5)
- Accept B again
New weights: [S(-14)], [A(-18)], [B(-11)], F(-5)
- Accept A again
... infinite loop
With the if in question:
Starting Weights: S(0), A(inf), B(inf), F(inf)
- Accept S
New weights: [S(0)], A(5), B(-3), F(inf)
- Accept B (doesn't change S)
New weights: [S(0)], A(-7), [B(-3)], F(inf)
- Accept A (doesn't change S or B)
New weights: [S(0)], [A(-7)], [B(-3)], F(-5)
- Accept F
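For comparison, here is a hedged, self-contained Java sketch of the same guard outside the book's framework (a lazy-deletion Dijkstra; accepted plays the role of m_ShortestPathTree):

import java.util.*;

class DijkstraGuard {
    // w[u][v] is the edge weight, or Double.POSITIVE_INFINITY if there is
    // no edge; returns the cost array from src. The !accepted[v] test is
    // the check under discussion: once a node is accepted its cost is
    // never relaxed again, so a negative loop cannot pump costs down forever.
    static double[] dijkstra(double[][] w, int src) {
        int n = w.length;
        double[] cost = new double[n];
        boolean[] accepted = new boolean[n];
        Arrays.fill(cost, Double.POSITIVE_INFINITY);
        cost[src] = 0;
        PriorityQueue<double[]> pq =    // entries are {cost, node}
            new PriorityQueue<>(Comparator.comparingDouble((double[] a) -> a[0]));
        pq.add(new double[]{0, src});
        while (!pq.isEmpty()) {
            int u = (int) pq.poll()[1];
            if (accepted[u]) continue;  // stale queue entry
            accepted[u] = true;
            for (int v = 0; v < n; v++) {
                if (Double.isInfinite(w[u][v])) continue;
                double c = cost[u] + w[u][v];
                if (c < cost[v] && !accepted[v]) { // the check in question
                    cost[v] = c;
                    pq.add(new double[]{c, v});
                }
            }
        }
        return cost;
    }
}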

Find all chordless cycles in an undirected graph

How to find all chordless cycles in an undirected graph?
For example, given the graph
0 --- 1
|     | \
|     |  \
4 --- 3 - 2
the algorithm should return 1-2-3 and 0-1-3-4, but never 0-1-2-3-4.
(Note: [1] This question is not the same as small cycle finding in a planar graph because the graph is not necessarily planar. [2] I have read the paper Generating all cycles, chordless cycles, and Hamiltonian cycles with the principle of exclusion but I don't understand what they're doing :). [3] I have tried CYPATH but the program only gives the count, algorithm EnumChordlessPath in readme.txt has significant typos, and the C code is a mess. [4] I am not trying to find an arbitrary set of fundamental cycles. Cycle basis can have chords.)
Assign numbers to nodes from 1 to n.
Pick the node number 1. Call it 'A'.
Enumerate pairs of links coming out of 'A'.
Pick one. Let's call the adjacent nodes 'B' and 'C' with B less than C.
If B and C are connected, then output the cycle ABC, return to step 3 and pick a different pair.
If B and C are not connected:
Enumerate all nodes connected to B. Suppose it's connected to D, E, and F. Create a list of vectors CABD, CABE, CABF. For each of these:
if the last node is connected to any internal node except C and B, discard the vector
if the last node is connected to C, output and discard
if it's not connected to either, create a new list of vectors, appending all nodes to which the last node is connected.
Repeat until you run out of vectors.
Repeat steps 3-5 with all pairs.
Remove node 1 and all links that lead to it. Pick the next node and go back to step 2.
Edit: and you can do away with one nested loop.
This seems to work at first sight; there may be bugs, but you should get the idea:
#include <algorithm>
#include <iostream>
#include <list>
#include <vector>
using namespace std;

//adjacency is a dim*dim matrix: adjacency[i+j*dim] is nonzero
//iff nodes i and j are connected
void chordless_cycles(int* adjacency, int dim)
{
    for(int i=0; i<dim-2; i++)
    {
        for(int j=i+1; j<dim-1; j++)
        {
            if(!adjacency[i+j*dim])
                continue;
            list<vector<int> > candidates;
            for(int k=j+1; k<dim; k++)
            {
                if(!adjacency[i+k*dim])
                    continue;
                if(adjacency[j+k*dim])
                {
                    cout << i+1 << " " << j+1 << " " << k+1 << endl;
                    continue;
                }
                vector<int> v;
                v.resize(3);
                v[0]=j;
                v[1]=i;
                v[2]=k;
                candidates.push_back(v);
            }
            while(!candidates.empty())
            {
                vector<int> v = candidates.front();
                candidates.pop_front();
                int k = v.back();
                for(int m=i+1; m<dim; m++)
                {
                    if(find(v.begin(), v.end(), m) != v.end())
                        continue;
                    if(!adjacency[m+k*dim])
                        continue;
                    bool chord = false;
                    size_t n;
                    for(n=1; n<v.size()-1; n++)
                        if(adjacency[m+v[n]*dim])
                            chord = true;
                    if(chord)
                        continue;
                    if(adjacency[m+j*dim])
                    {
                        for(n=0; n<v.size(); n++)
                            cout << v[n]+1 << " ";
                        cout << m+1 << endl;
                        continue;
                    }
                    vector<int> w = v;
                    w.push_back(m);
                    candidates.push_back(w);
                }
            }
        }
    }
}
#aioobe has a point. Just find all the cycles and then exclude the non-chordless ones. This may be too inefficient, but the search space can be pruned along the way to reduce the inefficiencies. Here is a general algorithm:
void printChordlessCycles(ChordlessCycle path) {
    // report a cycle only when the last node links back to the first
    if (path.size() >= 3 && path.lastNode().getNeighbors().contains(path.firstNode()))
        System.out.println(path.toString());
    for (Node n : path.lastNode().getNeighbors()) {
        if (path.canAdd(n)) {
            path.add(n);
            printChordlessCycles(path);
            path.remove(n);
        }
    }
}

Graph g = loadGraph(...);
ChordlessCycle p = new ChordlessCycle();
for (Node n : g.getNodes()) {
    p.add(n);
    printChordlessCycles(p);
    p.remove(n);
}

class ChordlessCycle {
    private CountedSet<Node> connected_nodes; // how many path nodes each vertex touches
    private List<Node> path;
    ... // plus trivial accessors: size(), firstNode(), lastNode(), contains(n), get(i)

    public void add(Node n) {
        for (Node neighbor : n.getNeighbors()) {
            connected_nodes.increment(neighbor);
        }
        path.add(n);
    }

    public void remove(Node n) {
        for (Node neighbor : n.getNeighbors()) {
            connected_nodes.decrement(neighbor);
        }
        path.remove(n);
    }

    public boolean canAdd(Node n) {
        // n must be new and may touch only the last path node (no chord),
        // or additionally the first node, which is what closes a cycle
        if (path.contains(n)) return false;
        int count = connected_nodes.getCount(n);
        return count == 1
            || (count == 2 && n.getNeighbors().contains(path.get(0)));
    }
}
Just a thought:
Let's say you are enumerating cycles on your example graph and you are starting from node 0.
If you do a breadth-first search for each given edge, e.g. 0 - 1, you reach a fork at 1. Then the cycles that reach 0 again first are chordless, and the rest are not and can be eliminated... at least I think this is the case.
Could you use an approach like this? Or is there a counterexample?
How about this. First, reduce the problem to finding all chordless cycles that pass through a given vertex A. Once you've found all of those, you can remove A from the graph, and repeat with another point until there's nothing left.
And how do you find all the chordless cycles that pass through vertex A? Reduce this to finding all chordless paths from B to A, given a list of permitted vertices, and search either breadth-first or depth-first. Note that when iterating over the vertices reachable (in one step) from B, when you choose one of them you must remove all of the others from the list of permitted vertices (take special care when B=A, so as not to eliminate three-edge paths).
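A hedged Java sketch of this idea (all names are mine; for simplicity it checks chords directly against the current path rather than maintaining a permitted-vertex list, which should produce the same output and is easier to get right):

import java.util.*;

class ChordlessDfs {
    // For each start vertex, search depth-first for chordless paths that
    // return to start, then delete start and repeat, as described above.
    static void allChordlessCycles(boolean[][] adj) {
        boolean[] removed = new boolean[adj.length];
        for (int start = 0; start < adj.length; start++) {
            List<Integer> path = new ArrayList<>();
            path.add(start);
            dfs(adj, path, removed, start);
            removed[start] = true;
        }
    }

    static void dfs(boolean[][] adj, List<Integer> path, boolean[] removed, int start) {
        int last = path.get(path.size() - 1);
        if (path.size() >= 3 && adj[last][start]) {
            if (path.get(1) < last)    // report each cycle in one orientation only
                System.out.println(path);
            return;                    // extending further would leave the chord last-start
        }
        for (int next = 0; next < adj.length; next++) {
            if (removed[next] || path.contains(next) || !adj[last][next])
                continue;
            boolean chord = false;     // next may touch no inner vertex of the path
            for (int i = 1; i < path.size() - 1 && !chord; i++)
                if (adj[path.get(i)][next]) chord = true;
            if (chord) continue;
            path.add(next);
            dfs(adj, path, removed, start);
            path.remove(path.size() - 1);
        }
    }
}

On the example graph from the question this prints [0, 1, 3, 4] and [1, 2, 3], and never the chorded 0-1-2-3-4.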
Find all cycles.
The definition of a chordless cycle is a set of points in which no cycle over a subset of those points exists. So, once you have all cycles, the problem is simply to eliminate cycles that do have a subset cycle.
For efficiency, for each cycle you find, loop through all existing cycles and verify that it is not a subset of another cycle (or vice versa); if it is, eliminate the larger cycle.
Beyond that, the only difficulty is figuring out how to write an algorithm that determines whether one set is a subset of another.
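If each cycle is stored as a set of vertex ids, that test is a one-liner (a hedged illustration):

// with cycles kept as java.util.Set<Integer> of vertex ids: if big
// contains every vertex of small, the larger cycle has a chord and
// should be eliminated, per the rule above
static boolean isSubsetOf(java.util.Set<Integer> small, java.util.Set<Integer> big) {
    return big.containsAll(small);
}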

Resources