Space/Time Complexity of DFS/Backtracking Question - performance

I'm writing a function that calculates the max path from one node to another in an adjacency-list graph, traversing the graph in a DFS/backtracking manner. A path may not contain cycles, but the same nodes can appear in different paths. Ex: A->B->C->D and A->C->B->D are valid, while A->B->A->C->D is not. To avoid cycles, I can use a visited set, adding nodes when discovered and popping them later on.
The algorithm must go through every possible path from the start node to the end node, since paths with the same nodes but a different ordering may be valued differently.
I believe the algorithm may be O(n!), considering everything in the graph may be connected, but I'm not too sure. I'm a bit new to graphs, so I'm having a hard time understanding the exact space/time complexity of things.
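A quick way to sanity-check that factorial intuition is to brute-force count the simple paths in a worst case, the complete graph K_n (this counter is illustrative, not from the question's code):

```cpp
#include <vector>

// Count all simple paths from u to dst in the complete graph on n vertices,
// using the visited-set backtracking scheme described in the question.
long long countPaths(int u, int dst, int n, std::vector<bool>& visited) {
    if (u == dst) return 1;
    visited[u] = true;                       // mark on the way down...
    long long total = 0;
    for (int v = 0; v < n; ++v)
        if (!visited[v]) total += countPaths(v, dst, n, visited);
    visited[u] = false;                      // ...and unmark when backtracking
    return total;
}
```

The counts grow roughly like e * (n-2)! (K_4 has 5 simple paths between two fixed vertices, K_5 has 16, K_6 has 65), so the worst-case time is indeed factorial, while the visited set and the recursion stack keep the space at O(n).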

Consider a concrete graph (a tree) whose vertices are numbered rather than lettered. The problem you want to solve calls for the depth-first search algorithm, as your requirements suggest.
Assume we have the following graph (tree):
For this graph, the max path from one vertex to another must be calculated.
Here is a solution:
#include <iostream>
#include <vector>
using namespace std;

const int maximumSize = 10;
vector<int> visited(maximumSize, 0), distances(maximumSize, 0);
vector<int> graph[maximumSize];
int vertices, edges;

void showContentVector1D(vector<int>& input)
{
    for (int index = 0; index < input.size(); ++index)
    {
        cout << input[index] << ", ";
    }
    return;
}

void createGraph()
{
    cin >> vertices >> edges;
    int vertex0, vertex1;
    for (int edge = 1; edge <= edges; ++edge)
    {
        cin >> vertex0 >> vertex1;
        graph[vertex0].push_back(vertex1);
        graph[vertex1].push_back(vertex0);
    }
    return;
}

// Computes, for each vertex, the length of the longest path descending from it.
void depthFirstSearch(int currentVertex, int previousVertex)
{
    if (visited[currentVertex] == 1)
    {
        return;
    }
    visited[currentVertex] = 1;
    distances[currentVertex] = 0;
    for (int nextVertex : graph[currentVertex])
    {
        if (nextVertex == previousVertex)
        {
            continue;
        }
        depthFirstSearch(nextVertex, currentVertex);
        distances[currentVertex] = max(distances[currentVertex], distances[nextVertex] + 1);
    }
    return;
}

void solve()
{
    createGraph();
    depthFirstSearch(1, 0);
    cout << "distances <- ";
    showContentVector1D(distances);
    cout << endl;
    return;
}

int main()
{
    solve();
    return 0;
}
Input:
6 5
1 4
2 4
3 4
4 5
5 6
The first line of the input is 6 5, where 6 is the number of vertices in the graph and 5 is the number of edges.
The pairs 1 4, 2 4, 3 4, 4 5, 5 6 are the edges of the provided undirected graph.
Output:
distances <- 0, 3, 0, 0, 2, 1, 0, 0, 0, 0,
The value 3 is stored at index 1, which corresponds to vertex 1 of the provided undirected graph (tree). Distances 2 and 1 are likewise stored at indices 4 and 5 respectively. As you can see, the maximum distance from vertex 1 is 3, reached at vertex 6.
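Note that the code above reports distances from vertex 1 only. For the general "max path between any two vertices" of a tree (its diameter), a standard trick is to run the same kind of DFS twice; a self-contained sketch (independent of the code above, 0-indexed):

```cpp
#include <vector>
using namespace std;

// DFS tracking the farthest (deepest) vertex from the start.
void dfs(const vector<vector<int>>& g, int u, int parent, int depth,
         int& bestV, int& bestD) {
    if (depth > bestD) { bestD = depth; bestV = u; }
    for (int v : g[u])
        if (v != parent) dfs(g, v, u, depth + 1, bestV, bestD);
}

// Tree diameter: find the farthest vertex from an arbitrary start,
// then the farthest distance from that vertex.
int diameter(const vector<vector<int>>& g) {
    int v = 0, d = 0;
    dfs(g, 0, -1, 0, v, d);   // first pass: find one endpoint of a longest path
    int w = v;
    d = 0;
    dfs(g, v, -1, 0, w, d);   // second pass: its farthest distance = diameter
    return d;
}
```

For the example tree above (edges 1-4, 2-4, 3-4, 4-5, 5-6, shifted to 0-indexed), the first pass finds an endpoint such as vertex 6 and the second pass returns the diameter 3.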


Time for 2 nodes to collide

We are given a graph of N nodes (1..N), where each node has exactly one directed edge to some node (possibly itself).
We need to answer queries of the form (A, B): the time required for two objects to collide if one starts at A and the other at B. Both move 1 hop per second. If they can never collide, the answer is -1.
Time: from X to Y, 1 hop = 1 second.
Constraints :
N, Q <= 10^5 (number of nodes, number of queries).
Example: for the given graph

A -> B -> C -> D -> E
               ^    |
               K <- F

Query(A, E): 3 seconds, as at time t = 3 both will be on node D.
Query(C, D): -1, as they will never collide.
What's the optimal way to answer each query?
Brute Force Approach: time - O(Q * N)
Improved solution using binary lifting technique: time - O(Q * log(N))
private static int[] collisionTime(int N, int Q, int[] A, int[][] queries) {
    // ancestor matrix: creation time O(n * log(n))
    int M = (int) (Math.ceil(Math.log10(N) / Math.log10(2))) + 1;
    int[][] ancestor = new int[N + 1][M];
    for (int i = 1; i <= N; i++) {
        ancestor[i][0] = A[i]; // 2^0-th ancestor
    }
    for (int j = 1; j < M; j++) {
        for (int i = 1; i <= N; i++) {
            ancestor[i][j] = ancestor[ancestor[i][j - 1]][j - 1];
        }
    }
    int[] answer = new int[Q];
    for (int i = 0; i < Q; i++) {
        int u = queries[i][0];
        int v = queries[i][1];
        answer[i] = timeToCollide(u, v, ancestor);
    }
    return answer;
}
// using binary lifting: time O(log(n))
private static int timeToCollide(int u, int v, int[][] ancestor) {
    int m = ancestor[0].length;
    // edge cases
    if (u == v) // already in collision state
        return 0;
    if (ancestor[u][m - 1] != ancestor[v][m - 1]) // different topmost ancestors: they can never collide
        return -1;
    int t = 0;
    for (int j = m - 1; j >= 0; j--) {
        if (ancestor[u][j] != ancestor[v][j]) {
            u = ancestor[u][j];
            v = ancestor[v][j];
            t += Math.pow(2, j);
        }
    }
    return t + 1;
}
1. Find all the terminal cycles and designate an arbitrary vertex in each terminal cycle as the cycle root (O(N)).
2. For each vertex, record the length of its terminal cycle, its distance to entry into the terminal cycle, and the distance to the terminal cycle root (O(N)).
3. Sever the outgoing link from each cycle root. This turns the graph into a forest.
4. For each query, find the closest (lowest) common ancestor of the two query nodes in this forest.
5. From the information saved about each query node and the lowest common ancestor, you can figure out the time to collision in constant time.
Step (4), the lowest common ancestor query, is a very well-studied problem. The best algorithms require only linear preprocessing time and provide constant query time, leading to O(N + Q) time for this problem altogether.
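Step (1) can be sketched as follows; this is an assumed implementation of the terminal-cycle search, not code from the answer:

```cpp
#include <vector>
using namespace std;

// For every node of a functional graph (one out-edge per node), compute the
// designated root of its terminal cycle.
// state: 0 = unvisited, 1 = on the current walk, 2 = finished.
vector<int> findCycleRoots(const vector<int>& next) {
    int n = next.size();
    vector<int> state(n, 0), cycleRoot(n, -1);
    for (int s = 0; s < n; ++s) {
        if (state[s]) continue;
        vector<int> path;
        int u = s;
        while (state[u] == 0) {          // walk until we hit something known
            state[u] = 1;
            path.push_back(u);
            u = next[u];
        }
        // either we closed a brand-new cycle (u is on the current walk),
        // or we ran into an already-finished node whose root is known
        int root = (state[u] == 1) ? u : cycleRoot[u];
        for (int v : path) { state[v] = 2; cycleRoot[v] = root; }
    }
    return cycleRoot;
}
```

Every node is pushed and finalized exactly once, so the whole pass is O(N).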
I believe that the following approach technically achieves O(N+Q) time complexity.
Observations
Subgraphs: The graph is not necessarily contiguous. Every such graph consists of one or more disjoint, contiguous, complete subgraphs, meaning:
No nodes are shared between subgraphs ("disjoint")
All of the nodes in a subgraph are connected ("contiguous")
There are no paths connecting different subgraphs ("complete")
I will hereafter refer to these as the subgraphs of the graph or just "subgraphs". These subgraphs have the following additional properties, which are a consequence of their definition (above) and the types of nodes in the graph (they are all "parent-pointer nodes" with exactly one out-edge/pointer):
All such subgraphs must have exactly one cycle in them (because a cycle is the only way that they can terminate or be closed)
The cycle can be of any length cycle.Len >= 1
Additionally, there may be any number (t >= 0) of trees attached to the cycle at their root (base)
All nodes are either in the cycle or in one of these trees (the roots of the trees are in the cycle, but also counted as part of a tree)
Terms:
cycle Length: The number of nodes in a cycle.
cycle Base: An arbitrarily chosen node in the cycle, used to measure distances between two nodes in the same cycle, and more generally between any two nodes in the same subgraph.
tree Base: The base or root node of one of the trees attached to the cycle. As the tree base is also the node that attaches it to the cycle, tree base nodes are counted as being in the cycle (and also part of their tree).
Distance: For a node in the cycle, this is the distance (number of hops) from that node to the cycle Base (zero if it is the cycle Base). For a node in a tree (not counting tree Base nodes, which count as in the cycle), this is the distance from that node to the tree Base node.
Collisions Cases
Trivial
There are many ways or "forms" of collisions possible in a graph, but we can identify two trivial cases upfront:
| (A, B) Relation | Collide? | Collision Distance |
|---|---|---|
| same node | Yes | 0 |
| different subgraphs | No | -1 |
Obviously, if A and B are the same node, then they trivially collide at distance zero. Likewise, if they are in two different subgraphs, then they can never collide because there are no connections between the subgraphs. For the collision cases that follow I will be assuming that these two cases have already been filtered out so that:
A and B are assumed to be different nodes, and
A and B are assumed to be in the same subgraph
Non-Trivial
The following table covers all of the other, non-trivial, cases of the relation between two nodes.
| (A, B) Relation | Collide? | Collision Distance | Notes |
|---|---|---|---|
| same cycle | No | -1 | nodes in a cycle always stay the same distance apart |
| A in a tree & B in the cycle (or vice versa) | if they both arrive at A's treeBase at the same time | -1 OR A.treeDist | if B.cycleDist = (A.treeDist MOD cycle.Len) |
| A and B in different trees | if A's and B's distances to their cycle.Base are equal MOD cycle.Len | MAX(A.treeDist, B.treeDist) | they meet when the farther one reaches the cycle (its tree base) |
| A & B in the same tree, with different treeDists | if their treeDists are equal MOD cycle.Len | MAX(A.treeDist, B.treeDist) | they meet when the farther one reaches the cycle (tree base) |
| A & B in the same tree, with the same treeDist | Yes | at their lowest common ancestor (LCA) in the tree | have to search up the tree |
One important principle applied several times above is that two different nodes in a cycle can never collide. This is because when each node follows its path around the cycle, they will always stay the same distance apart, there is no way for one node's path to "catch-up" to another's in the cycle. They can only "collide" if they start out in the cycle at the same node.
The consequences of this are that:
Two different nodes in the cycle can never collide.
A node in a tree can only collide with a node in the cycle if their total distances to the cycle base are the same modulo the cycle length (that is, the same remainder when divided by the cycle length).
The same holds for two nodes in different trees, and for two nodes in the same tree with different distances to their tree base.
In all of these cases (from #2 and #3), they collide when the node that is farthest from its tree base reaches the cycle (at its tree base). Because nodes in the cycle cannot catch up to each other, they stay the same distance apart once both are in the cycle, so they can only meet at the moment the farther one finally enters it.
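The rule condenses into a few lines (illustrative names, not the answer's code): treeDist is a node's distance to its tree base (0 if it is already in the cycle), and totalDist its total distance to the cycle base. The remaining same-tree, same-distance case still needs the LCA search and is not covered here.

```cpp
#include <algorithm>

// Collision time for the modulo cases above: the nodes collide iff their
// total distances to the cycle base agree mod cycleLen, and then they meet
// when the farther one enters the cycle.
int collisionTime(int treeDist1, int totalDist1,
                  int treeDist2, int totalDist2, int cycleLen) {
    if (totalDist1 % cycleLen != totalDist2 % cycleLen)
        return -1;                         // phases never line up: no collision
    return std::max(treeDist1, treeDist2); // farther node reaches its tree base
}
```

For example, with cycleLen = 3, a node 2 hops from its tree base and 4 from the cycle base collides with a node 5 hops from its tree base and 7 from the cycle base (4 and 7 agree mod 3) after 5 steps.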
Another important consequence is that every case in both tables above, except the last one, can be answered in O(1) time, simply by annotating the nodes with some easily determined information:
their Base node (tree or cycle)
their Distance to that base node
the Subgraph they belong to
the Length of their subgraph's Cycle
These can all be easily determined when initializing the graph in O(1) time per node (or O(N) total time).
Approach
Nodes
After the nodes are initially loaded into the graph (the MPDGraph object), I annotate them with the additional information listed above. This process (called "Mapping" in the code) proceeds as follows:
1. Pick any node.
2. Follow its path until it "terminates" by reaching a node already in its path, or a node that was previously mapped.
3. If #2 crossed its own path, then we've found a new cycle. Designate the node we crossed as the base node of the cycle and fill in its mapping properties (cycle, base node, distance, etc.). Unwind the path one node at a time, filling in each node and marking it as InCycle, until we reach the base node again. Now we are at the base of the tree that our path followed into the cycle, so when we move back to the previous node in the path we switch to marking it as a tree node, until we return to the first node in the path.
4. If instead #2 reached an already-mapped node, we attach our path to that node and copy its tree/cycle, base, etc. information to our current node. Then we return back up the path as in #3, setting the mapping properties of each node as we go.
5. If there are any unmapped nodes, pick one and go to #2.
This all takes O(N) time.
Queries
We have a method (MPDGraph.FindCollision) that, given two nodes, applies the rules in the two Collision Cases tables above and returns the result. For every case except the last (nodes in the same tree at the same distance), the answer can be determined in O(1) time using the mapping properties.
If the two nodes are in the same tree and also the same distance from their tree base, then they could meet anywhere between themselves and their common treeBase node. For this case FindCollision(A, B) calls findTreeDistance(A, B), which:
Returns zero if they are the same node.
Otherwise checks a cache of previously calculated distances to see if the value has already been computed for these two nodes; if so, it returns that value.
Otherwise calls findTreeDistance on the parents of the two current nodes, adds one to that distance, stores the result in the cache, and returns it.
Without this memoization (i.e., the cache), each query of this type would take approximately O(log N) on average. With the memoization it is hard to calculate, but I would guess no worse than O(log log N), and for Q much larger than N it converges to O(1).
This makes the query processing time complexity somewhere between O(Q log log N) and O(Q), and the total time between O(N + Q(log log N)) and O(N + Q).
Code
public static int[] collisionTime(int N, int Q, int[] A, int[,] queries)
{
    // create the graph and fill in the mapping attributes for all nodes
    var graph = new MPDGraph(A);
    graph.MapAllNodes();

    int[] answers = new int[queries.GetLength(0)];
    for (int i = 0; i < answers.Length; i++)
    {
        answers[i] = graph.FindCollision(queries[i, 0], queries[i, 1]);
    }
    return answers;
}
This utilizes the following classes,
MPDGraph Class:
// MPDG: Mono-Pointing, Directed Graph
// An MPDG is a directed graph where every node has exactly one out-edge.
// (MPDG is my own term, I don't know the real name for these)
class MPDGraph
{
    public Node[] Nodes;
    Dictionary<(Node, Node), int> cachedDistances = new Dictionary<(Node, Node), int>();

    // constructor
    public MPDGraph(int[] Pointers)
    {
        Nodes = new Node[Pointers.Length];
        // first fill the array with objects
        for (int i = 0; i < Nodes.Length; i++) { Nodes[i] = new Node(i); }
        // apply their pointers
        for (int i = 0; i < Nodes.Length; i++) { Nodes[i].toNode = Nodes[Pointers[i]]; }
    }

    // map all of the nodes by filling their mapping properties
    public void MapAllNodes()
    {
        for (int i = 0; i < Nodes.Length; i++)
        {
            if (!Nodes[i].isMapped)
                MapPath(Nodes[i], 1);
        }
    }

    // recursively map a path of unmapped nodes, starting from curr
    // returns true if curr is in a cycle, false otherwise
    public Boolean MapPath(Node curr, int pathNo)
    {
        Boolean inCycle = false;
        curr.pathNo = pathNo;
        Node next = curr.toNode;
        if (next.IsInPath)
        {
            // we have found a new cycle
            Cycle Cycle = new Cycle(this, next, curr.pathNo - next.pathNo + 1);
            curr.Map(Cycle);
            return true;
        }
        else if (next.isMapped)
        {
            // we are joining an already partially mapped tree
            if (next.IsInCycle)
            {
                // next is a tree-base, the top of our tree and also in the cycle
                curr.Map(next.Cycle, false, next, 1);
            }
            else
            {
                // next is a normal tree-node
                curr.Map(next.Cycle, false, next.BaseNode, next.Distance + 1);
            }
            return false;
        }
        else
        {
            // continue forward on the path, recurse to the next node
            inCycle = MapPath(next, pathNo + 1);
            if (inCycle)
            {
                if (next.Cycle.Base == next || next.Distance == 0)
                {
                    // we have returned from the cycleBase, which is also a treeBase,
                    // so switch from Cycle to Tree
                    curr.Map(next.Cycle, false, next, 1);
                    return false;
                }
                else
                {
                    // still in the cycle
                    curr.Map(next.Cycle);
                }
            }
            else
            {
                // returned in tree
                curr.Map(next.Cycle, false, next.BaseNode, next.Distance + 1);
            }
            return inCycle;
        }
    }

    // Given two starting nodes, determine how many steps it takes until their
    // paths collide. Returns -1 if they will never collide.
    public int FindCollision(int index1, int index2)
    {
        Node node1 = Nodes[index1];
        Node node2 = Nodes[index2];

        // eliminate trivial cases
        if (node1.Cycle != node2.Cycle)
            return -1;  // can't collide if they're in different subgraphs
        else if (node1 == node2)
            return 0;   // if they're the same node, then distance = 0
        else if (node1.IsInCycle && node2.IsInCycle)
            return -1;  // different nodes in a cycle never collide
        else
        {   // they're both in the same subgraph, use math to tell if they collide
            // get their distances to the cycle base
            int dist1 = node1.Distance + (node1.IsInCycle ? 0 : node1.BaseNode.Distance);
            int dist2 = node2.Distance + (node2.IsInCycle ? 0 : node2.BaseNode.Distance);
            int cycleLen = node1.Cycle.Length;

            // use math: modulo(cycle length)
            if ((dist1 % cycleLen) != (dist2 % cycleLen))
            {
                return -1; // incompatible distances: cannot possibly collide
            }
            else
            {
                // they must collide somewhere, figure out how far that is
                if (node1.IsInCycle || node2.IsInCycle)
                {
                    // if one is in the cycle, they will collide when
                    // the other one reaches the cycle (its treeBase)
                    return (!node1.IsInCycle ? node1.Distance : node2.Distance);
                }
                else if (node1.BaseNode != node2.BaseNode)
                {
                    // they are in different trees: they will collide at
                    // the treeBase of the node that is farther
                    return Math.Max(node1.Distance, node2.Distance);
                }
                else
                {
                    // they are in the same tree:
                    if (node1.Distance != node2.Distance)
                    {
                        // same tree but different distances to the treeBase:
                        // they collide when the farther one arrives at the treeBase
                        return Math.Max(node1.Distance, node2.Distance);
                    }
                    else
                    {
                        // the hard case: walk down their paths
                        // to find their LCA (Lowest Common Ancestor)
                        return findTreeDistance(node1, node2);
                    }
                }
            }
        }
    }

    int findTreeDistance(Node node1, Node node2)
    {
        if (node1 == node2) return 0;

        // normalize their order
        if (node1.index > node2.index)
        {
            var tmp = node1;
            node1 = node2;
            node2 = tmp;
        }

        // check the cache
        int dist;
        if (cachedDistances.ContainsKey((node1, node2)))
        {
            dist = cachedDistances[(node1, node2)];
        }
        else
        {
            // keep recursing to find where they meet
            dist = findTreeDistance(node1.toNode, node2.toNode) + 1;
            // cache this new distance
            cachedDistances.Add((node1, node2), dist);
        }
        return dist;
    }
}
Node Class:
// Represents a node in the MPDG (Mono-Pointing Directed Graph) with the constraint
// that all nodes have exactly one out-edge ("toNode").
class Node
{
    // Primary properties (from input)
    public int index { get; set; }   // this node's unique index (into the original array)
    public Node toNode { get; set; } // what our single out-edge is pointing to

    public Node(int Index_) { this.index = Index_; }

    // Mapping properties
    // (these must be filled in to finish mapping the node)

    // The unique cycle of this node's subgraph (all MPDG-subgraphs have exactly one)
    public Cycle Cycle;

    // Every node is either in its subgraph's cycle or in one of the inverted
    // trees whose apex (base) is attached to the cycle. Only valid when BaseNode is set.
    // (tree base nodes are counted as being in the cycle)
    public Boolean IsInCycle = false;

    // The base node of the tree or cycle that this node is in.
    // If (IsInCycle) it points to this cycle's base node (cycleBase),
    // else it points to the base node of this node's tree (treeBase).
    public Node BaseNode;

    // The distance (hops) from this node to the BaseNode
    public int Distance = -1; // -1 if not yet mapped

    // Total distance from this node to the cycleBase
    public int TotalDistance { get { return Distance + (IsInCycle ? 0 : BaseNode.Distance); } }

    // housekeeping for mapping nodes
    public int pathNo = -1; // order in our working path

    // Derived (implicit) properties
    public Boolean isMapped { get { return Cycle != null; } }
    public Boolean IsInPath { get { return (pathNo >= 0); } }

    public void Map(Cycle Cycle, bool InCycle = true, Node BaseNode = null, int distance_ = -1)
    {
        this.Cycle = Cycle;
        IsInCycle = InCycle;
        if (InCycle)
        {
            this.BaseNode = Cycle.Base;
            Distance = (Cycle.Length + (Cycle.Base.pathNo - pathNo)) % Cycle.Length;
        }
        else
        {
            this.BaseNode = BaseNode;
            Distance = distance_;
        }
        pathNo = -1; // clean up the path once we're done
    }
}
Cycle Class:
// represents the cycle of a unique MPDG-subgraph
// (should have one of these for each subgraph)
class Cycle
{
    public MPDGraph Graph; // the MPDG that contains this cycle
    public Node Base;      // the base node of a subgraph's cycle
    public int Length;     // the length of this cycle

    public Cycle(MPDGraph graph_, Node base_, int length_)
    {
        Graph = graph_;
        Base = base_;
        Length = length_;
    }
}
Performance Measurements:
| Node Count | Build & Map Graph, mean µs | Question Count | All Questions, mean µs | Per Question, mean µs | Total, mean µs |
|---|---|---|---|---|---|
| 50 | 0.9 | 1225 | 26 | 0.0212 | 26.9 |
| 500 | 10.1 | 124750 | 2267 | 0.0182 | 2277.1 |
| 1000 | 23.4 | 499500 | 8720 | 0.0175 | 8743.4 |
| 5000 | 159.6 | 12497500 | 229000 | 0.0183 | 229159.6 |
| 10000 | 345.3 | 49995000 | 793212 | 0.0159 | 793557.3 |
It is only possible for a collision to occur on a node that has more than 1 link leading to it. Node D in your example.
Let's call these nodes "crash sites"
So you can prune your graph down to just the crash site nodes. The nodes that lead to the crash site nodes become attributes of the crash site nodes.
Like this for your example:
D : { A,B,C }, { E,F,K }
A collision can ONLY occur if the starting nodes are on two different attribute lists of the same crash site node.
Once you are sure a crash can occur, you can check that both starting nodes are the same distance from the crash site.
Algorithm:
Prune graph to crash site nodes
LOOP over questions
    Get 2 start nodes
    LOOP over crash sites
        IF start nodes on two different attributes of crash site
            IF start nodes are equidistant from crash site
                report crash time
                BREAK from crash site loop
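The distance test in the inner loop relies on walking each start node's single out-edge chain; a minimal sketch of that helper (assumed, not from the linked repository):

```cpp
#include <vector>

// Walk a node's single out-edge chain and return the step at which it first
// reaches the crash site, or -1 if it never does. N+1 hops without arriving
// means the walk has entered a cycle avoiding the target.
int stepsTo(const std::vector<int>& next, int start, int target) {
    int cur = start;
    for (int step = 0; step <= (int)next.size(); ++step) {
        if (cur == target) return step;
        cur = next[cur];
    }
    return -1;
}
```

Two start nodes collide at a crash site exactly when both values are non-negative and equal.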
Here is a randomly generated graph with 50 nodes where every node has one out edge connected to another node chosen randomly
The collision sites are
5 7 8 9 10 11 18 19 23 25 31 33 36 37 39
So the algorithm need only loop over 15 nodes, at most, instead of 50.
The answer to the question 'do two particular nodes collide?' is almost always 'NO', which is kind of boring. So let's ask a slightly different question: 'for a particular graph, which pairs of nodes result in a collision?' This requires the same algorithm (applied to every pair of nodes) but always produces an interesting answer.
for this graph
I get this answer
0 and 29 collide at 41
1 and 5 collide at 40
2 and 23 collide at 13
8 and 16 collide at 34
8 and 22 collide at 34
8 and 39 collide at 34
9 and 30 collide at 37
10 and 31 collide at 25
11 and 47 collide at 23
12 and 28 collide at 25
12 and 35 collide at 25
12 and 49 collide at 25
13 and 38 collide at 27
14 and 44 collide at 1
15 and 17 collide at 0
15 and 18 collide at 0
15 and 37 collide at 0
16 and 22 collide at 34
16 and 39 collide at 34
17 and 18 collide at 0
17 and 37 collide at 0
18 and 37 collide at 0
20 and 26 collide at 9
20 and 42 collide at 9
20 and 43 collide at 9
21 and 45 collide at 24
22 and 39 collide at 34
25 and 34 collide at 3
26 and 42 collide at 9
26 and 43 collide at 9
28 and 35 collide at 25
28 and 49 collide at 25
32 and 48 collide at 34
33 and 36 collide at 7
35 and 49 collide at 25
42 and 43 collide at 9
Some timing results
| Node Count | Crash Sites, millisecs | Question Count | Per Question, mean µs |
|---|---|---|---|
| 50 | 0.4 | 1225 | 0.02 |
| 500 | 50 | 124750 | 0.02 |
| 5000 | 5500 | ~12M | 0.02 |
| 10000 | 30000 | ~50M | 0.03 |
| 30000 | 181000 | ~450M | 0.6 |
Notes:
The mean time for a question is the average of checking every possible pair of nodes for a possible collision.
Answering a single question is extremely fast, about 20 nanoseconds for moderately sized graphs ( < 10,000 nodes ) [ A previous timing report included outputting the results when a collision was found, which takes much longer than detecting the collision. These results were taken with all output to the console commented out. ]
Setting up the crash sites and their tributaries gets slow with moderately sized graphs ( > 5,000 nodes ). It is only worth doing if a lot of questions are going to be asked.
The code for this is available at https://github.com/JamesBremner/PathFinder

Back edges in a graph

I'm having a hard time understanding Tarjan's algorithm for articulation points. I'm currently following this tutorial here: https://www.hackerearth.com/practice/algorithms/graphs/articulation-points-and-bridges/tutorial/. What I really can't see, and couldn't see in any other tutorial, is what exactly a "back edge" means. Considering the graph given there, I know 3-1 and 4-2 are back edges, but are 2-1, 3-2, and 4-3 back edges too? Thank you.
...a Back Edge is an edge that connects a vertex to a vertex that is discovered before it's parent.
from your source.
Think about it like this: When you apply a DFS on a graph you fix some path that the algorithm chooses. Now in the given case: 0->1->2->3->4. As in the article mentioned, the source graph contains the edges 4-2 and 3-1. When the DFS reaches 3 it could choose 1 but 1 is already in your path so it is a back edge and therefore, as mentioned in the source, a possible alternative path.
Addressing your second question: Are 2-1, 3-2, and 4-3 back edges too? For a different path they can be. Suppose your DFS chooses 0->1->3->2->4 then 2-1 and 4-3 are back edges.
Consider the following (directed) graph traversal with DFS. Here the colors of the nodes represent the following:
The floral-white nodes are the ones that are yet to be visited
The gray nodes are the nodes that are visited and on stack
The black nodes are the ones that are popped from the stack.
Notice that when the node 13 discovers the node 0 through the edge 13->0 the node 0 is still on the stack. Here, 13->0 is a back edge and it denotes the existence of a cycle (the triangle 0->1->13).
In essence, when you do a DFS: if your graph has a cycle between nodes A, B and C, and you have discovered the edge A-B and later the edge B-C, then on reaching node C you will discover the edge C-A; you must ignore this path in your search to avoid infinite loops. In that search, A-B and B-C were not back edges, but C-A is a back edge, since it forms a cycle back to an already visited node.
From the article mentioned:
Given a DFS tree of a graph, a Back Edge is an edge that connects a
vertex to a vertex that is discovered before it's parent.
2-1, 3-2, 4-3 are not "Back edge" because they link the vertices with their parents in DFS tree.
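Once a DFS has assigned discovery/finish times, the classification can be stated as an interval test (a standard CLRS-style characterization, not taken from the linked tutorial): an edge u->v is a back edge exactly when v's interval encloses u's.

```cpp
// startX/finishX are the DFS discovery and finish times of vertex X.
// u->v is a back edge iff v is an ancestor of u in the DFS tree,
// i.e. [startU, finishU] nests inside [startV, finishV].
bool isBackEdge(int startU, int finishU, int startV, int finishV) {
    return startV <= startU && finishU <= finishV;
}
```

Tree, forward, and cross edges all fail this test, since for them v's interval is nested inside u's or disjoint from it.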
Here is some code for a better understanding:
#include <bits/stdc++.h>
using namespace std;

struct vertex {
    int node;
    int start;
    int finish;
    int color;
    int parent;
};

int WHITE = 0, BLACK = 1, GREY = 2;
vector<int> adjList[8];
int num_of_verts = 8;
struct vertex vertices[8];
int t = 0;

bool DFS_visit(int u) {
    bool cycleExists = false;
    vertices[u].color = GREY;
    t++;
    vertices[u].start = t;
    for (int i = 0; adjList[u][i] != -1; i++) {
        if (vertices[adjList[u][i]].color == WHITE) {
            if (!cycleExists) cycleExists = DFS_visit(adjList[u][i]);
            else DFS_visit(adjList[u][i]);
        }
        else if (vertices[adjList[u][i]].color == GREY) {
            // back edge: the target is still on the recursion stack
            cout << "Cycle detected at edge - (" << u << ", " << adjList[u][i] << ")" << endl;
            cycleExists = true;
        }
        // a BLACK target is a forward/cross edge, not a back edge
    }
    vertices[u].color = BLACK;
    t++;
    vertices[u].finish = t;
    return cycleExists;
}

void DFS() {
    for (int i = 0; i < num_of_verts; i++) {
        vertices[i].color = WHITE;
        vertices[i].parent = -1; // parent is an int, so use -1 (not NULL) for "none"
    }
    t = 0;
    for (int i = 0; i < num_of_verts; i++) {
        if (vertices[i].color == WHITE) {
            cout << "Traversing component " << i << "-" << endl;
            bool cycle = DFS_visit(i);
            cycle ? cout << "Cycle Exists\n\n" : cout << "Cycle does not exist\n\n";
        }
    }
}

int main() {
    adjList[0] = {4, -1};
    adjList[1] = {0, 5, -1};
    adjList[2] = {1, 5, -1};
    adjList[3] = {6, 7, -1};
    adjList[4] = {1, -1};
    adjList[5] = {-1};
    adjList[6] = {2, 5, -1};
    adjList[7] = {3, 6, -1};
    DFS();
    return 0;
}

Create 2 pillars of equal height from an array of bricks

Problem Statement:
There are N bricks (a1, a2, ..., aN), where brick ai has length Li. Build the 2 tallest parallel pillars (of equal height) possible using the bricks provided.
Constraints:
There are N bricks. 5<=N<=50
Length of each brick. 1<=L<=1000
Sum of the bricks lengths <= 1000
The brick lengths are not given in sorted order. Multiple bricks may have the same length. Not all bricks have to be used to create the pillars.
Example:
1st Example-
N = 5
2, 3, 4, 1, 6
Possible Sets:
(2, 6) and (3, 4, 1)
Answer: 8
My Approach:
Find the maximum possible height of the 2 parallel pillars, i.e. floor(S/2), where S is the total brick length. Then use DP to find all the sums that can be formed from the bricks. Starting with the highest possible sum <= floor(S/2), I take a single subset of elements that forms that sum. Then I repeat the DP approach to check whether the same sum can be formed from the remaining elements. If it can, the output is that sum; otherwise, using the first DP, I check the next highest possible sum and iterate the whole process.
The problem with the above approach is that it checks only one subset of elements forming the required sum. All subsets that can form the required sum should be checked, and for each of them it should be checked whether the remaining elements can form the same sum. The trouble is implementing this in my current approach.
2nd Example-
N = 6
3, 2, 6, 4, 7, 1
Possible Sets:
(3, 2, 6) and (7, 4)
Answer: 11
The problem in my code can appear in this case, depending on the order in which the elements (brick lengths) are given as input. It is possible that the first set found is (3, 7, 1) = 11 but the second set (2, 6, 4) cannot form sum = 11. Hence, my code moves on to the next possible maximum sum, i.e. 10, which is wrong.
Can someone suggest better approaches or possible improvements to my current approach?
I think you can solve this with dynamic programming where for each pair (x, y) you work out whether it is possible to build pillars of length x and y using different bricks from the bricks considered so far.
Consider each brick in turn. At the start only (0, 0) is possible. When you see a brick of length L then for every possible pillar (x, y) there are three descendants - (x, y) (ignore the brick), (x + L, y) (use the brick on the first pillar), and (x, y + L) (use the brick on the second pillar).
So after you have considered all the bricks you will have a long list of possible pairs and you simply choose the pair which has two pillars of the same length and as long as possible. This will be practical as long as there are never too many different pairs (you can remove any duplicates from the list).
Assuming that the brick lengths are integers and the maximum pillar length is 1000 there are only 1001 * 1001 possible pairs, so this should be practical, and in fact it is probably easiest if you store pairs by having an array of size [1001, 1001] and setting entries [x, y] to 1 if pair (x, y) is possible and 0 otherwise.
For the first few steps of the example the reachable states are
(0,0) considering nothing
(0,3) (3,0) (0,0) considering 3
(0,5) (2,3) (0,3) (5,0) (3,0) (0,2) (2,0) (0,0) considering 3 and 2
The number of reachable states grows very fast at first, but since we are only considering values from 0..1000 and we only care about whether a pair is reachable or not, we can maintain them using a boolean array of dimension 1001x1001.
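The reachable-pairs idea above can be sketched directly (an illustrative implementation, with the 1001 bound taken from the problem's constraints):

```cpp
#include <vector>

// reach[x][y] == 1 iff two disjoint subsets of the bricks seen so far sum to
// x and y respectively. The 1001 bound comes from "sum of lengths <= 1000".
int tallestEqualPillars(const std::vector<int>& bricks) {
    const int M = 1001;
    std::vector<std::vector<char>> reach(M, std::vector<char>(M, 0));
    reach[0][0] = 1;
    for (int L : bricks) {
        // sweep downwards so each brick extends only pairs built without it
        for (int x = M - 1; x >= 0; --x)
            for (int y = M - 1; y >= 0; --y)
                if (reach[x][y]) {
                    if (x + L < M) reach[x + L][y] = 1; // brick on pillar 1
                    if (y + L < M) reach[x][y + L] = 1; // brick on pillar 2
                }
    }
    for (int h = M - 1; h >= 1; --h)
        if (reach[h][h]) return h; // tallest reachable equal pair
    return 0;
}
```

The downward sweep guarantees each brick lands on at most one pillar (leaving reach[x][y] set also covers "brick unused"), and the final scan picks the tallest h with (h, h) reachable. Total work is O(N * 1001 * 1001).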
Recently I tried to solve this problem while preparing for the Samsung Competency Test. I built a solution, and it might help you practice. Thanks to https://stackoverflow.com/users/240457/mcdowella, whose strategy I used.
public class Pillars {
    public static int h = 0, mh = 0;

    public static void main(String[] args) {
        java.util.Scanner scan = new java.util.Scanner(System.in);
        int n = scan.nextInt();
        int a[] = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = scan.nextInt();
            mh += a[i];
        }
        mh /= 2; // a single pillar can never be taller than half the total brick length
        maxHeight(0, 0, a, 0);
        // reached only if the upper bound mh was never attained;
        // prints the best equal height found (0 if none)
        System.out.println("Maximum Height Formed with the given Data " + h);
    }

    public static void maxHeight(int x, int y, int a[], int i) {
        if (x == y && x != 0 && x > h) { // the two heights are equal and improve the best
            h = x;
            if (h == mh) { // hit the upper bound: print and exit early
                System.out.println("Maximum Height Formed with the given Data " + h);
                System.exit(0);
            }
        }
        if (i < a.length) {
            maxHeight(x + a[i], y, a, i + 1);
            maxHeight(x, y + a[i], a, i + 1);
            maxHeight(x, y, a, i + 1); // also allow skipping this brick,
                                       // since not all bricks must be used
        }
    }
}
Well, this question can be solved simply using recursion, but that will not be efficient for large values of n. Here is my code:
#include <iostream>
#include <vector>
using namespace std;

void solve(int a[], vector<int>& vis, int p1, int p2, int n, int& ans) {
    if (p1 == p2 && p1 > ans) {
        ans = p1;
    }
    for (int i = 0; i < n; ++i) {
        if (vis[i] == 0) {
            vis[i] = 1;
            solve(a, vis, p1 + a[i], p2, n, ans); // brick i on the first pillar
            solve(a, vis, p1, p2 + a[i], n, ans); // brick i on the second pillar
            vis[i] = 0; // backtrack
        }
    }
}

int main() {
    int n;
    cin >> n;
    vector<int> a(n);
    for (int i = 0; i < n; ++i) {
        cin >> a[i];
    }
    vector<int> vis(n, 0);
    int ans = 0;
    solve(a.data(), vis, 0, 0, n, ans);
    cout << ans;
    return 0;
}

How to solve weighted Activity selection with use of Segment Trees and Binary search?

Given N jobs where every job is represented by following three elements of it.
1) Start Time
2) Finish Time.
3) Profit or Value Associated.
Find the maximum profit subset of jobs such that no two jobs in the subset overlap.
I know a dynamic programming solution of complexity O(N^2) (similar to LIS: for each interval, check all previous intervals compatible with it and take the one whose value plus the current interval's profit is maximum). This solution can be further improved to O(N log N) using sorting and binary search.
But my friend told me it can also be solved using a segment tree and binary search. I have no clue where a segment tree would be used here, or how.
Can you help?
On request (sorry, it's not commented):
What I am doing is sorting by start index, and storing at dp[i] the maximum value obtainable up to interval i by merging it with compatible previous intervals.
#include <cstdio>
#include <cstring>
#include <algorithm>
#include <utility>

void solve()
{
    int n, i, j, high;
    scanf("%d", &n);
    std::pair<std::pair<int, int>, int> arr[n + 1]; // ((l, r), cost)
    int dp[n + 1];
    memset(dp, 0, sizeof(dp));
    for (i = 0; i < n; i++)
        scanf("%d%d%d", &arr[i].first.first, &arr[i].first.second, &arr[i].second);
    std::sort(arr, arr + n); // by default sorts by starting index
    for (i = 0; i < n; i++)
    {
        high = arr[i].second;
        for (j = 0; j < i; j++) // check all previous mergable intervals; use their DP[] (optimal substructure)
        {
            if (arr[i].first.first >= arr[j].first.second)
                high = std::max(high, dp[j] + arr[i].second);
        }
        dp[i] = high;
    }
    for (i = 0; i < n; i++)
        dp[n - 1] = std::max(dp[n - 1], dp[i]);
    printf("%d\n", dp[n - 1]);
}

int main() { solve(); return 0; }
EDIT:
Here is my working code; it took me 3 hours to debug, though! Moreover, this code is slower than the sorting-plus-binary-search one due to a larger constant and a bad implementation :P (just for reference)
#include <stdio.h>
#include <algorithm>
#include <vector>
#include <cstring>
#include <iostream>
#include <climits>
#define lc(idx) (2*idx+1)
#define rc(idx) (2*idx+2)
#define mid(l,r) ((l+r)/2)
using namespace std;

int Tree[4*2*10000-1];

// Point update: walk down from the root to position qe, refreshing the max along the path.
void update(int L, int R, int qe, int idx, int value)
{
    if (value > Tree[0])
        Tree[0] = value;
    while (L < R)
    {
        if (qe <= mid(L, R))
        {
            idx = lc(idx);
            R = mid(L, R);
        }
        else
        {
            idx = rc(idx);
            L = mid(L, R) + 1;
        }
        if (value > Tree[idx])
            Tree[idx] = value;
    }
}

// Prefix query: maximum stored value over positions [L, q].
int Get(int L, int R, int idx, int q)
{
    if (q < L)
        return 0;
    if (R <= q)
        return Tree[idx];
    return max(Get(L, mid(L, R), lc(idx), q), Get(mid(L, R) + 1, R, rc(idx), q));
}

bool cmp(pair<pair<int, int>, int> A, pair<pair<int, int>, int> B)
{
    return A.first.second < B.first.second; // sort by finish time
}

int main()
{
    int N, i;
    scanf("%d", &N);
    pair<pair<int, int>, int> P[N];
    vector<int> V;
    for (i = 0; i < N; i++)
    {
        scanf("%d%d%d", &P[i].first.first, &P[i].first.second, &P[i].second);
        V.push_back(P[i].first.first);
        V.push_back(P[i].first.second);
    }
    sort(V.begin(), V.end());
    // coordinate-compress all endpoints into [0, 2N)
    for (i = 0; i < N; i++)
    {
        int &l = P[i].first.first, &r = P[i].first.second;
        l = lower_bound(V.begin(), V.end(), l) - V.begin();
        r = lower_bound(V.begin(), V.end(), r) - V.begin();
    }
    sort(P, P + N, cmp);
    int ans = 0;
    memset(Tree, 0, sizeof(Tree));
    for (i = 0; i < N; i++)
    {
        int aux = Get(0, 2 * N - 1, 0, P[i].first.first) + P[i].second;
        if (aux > ans)
            ans = aux;
        update(0, 2 * N - 1, P[i].first.second, 0, ans);
    }
    printf("%d\n", ans);
    return 0;
}
high = arr[i].second;
for (j = 0; j < i; j++) // check all previous mergable intervals; use their DP[] (optimal substructure)
{
    if (arr[i].first.first >= arr[j].first.second)
        high = std::max(high, dp[j] + arr[i].second);
}
dp[i] = high;
This can be done in O(log n) with a segment tree.
First of all, let's rewrite it a bit. The max you are taking is a bit complicated, because it is a maximum of sums involving both i and j. But i is fixed inside this loop, so the arr[i].second term can be taken out:
high = 0;
for (j = 0; j < i; j++) // check all previous mergable intervals; use their DP[] (optimal substructure)
{
    if (arr[i].first.first >= arr[j].first.second)
        high = std::max(high, dp[j]);
}
dp[i] = high + arr[i].second;
Great, now we have reduced the problem to determining the maximum of dp[j] over j in [0, i - 1], restricted to the j that satisfy your if condition.
If we didn't have the if, it would be a simple application of segment trees.
Now there are two choices.
1. Deal with O(log V) query time and O(V) memory for the segment tree
Where V is the maximum size of an interval's endpoint.
You can build a segment tree into which you insert the intervals' end points as you move your i, storing dp[j] at position arr[j].first.second. Then you query over the prefix of positions up to the current start. Something like this, where the segment tree is initialized to -infinity and has size O(V):
Update(node, index, value):
    if node.associated_interval == [index, index]:
        node.max = value
        return
    if index in node.left.associated_interval:
        Update(node.left, index, value)
    else:
        Update(node.right, index, value)
    node.max = max(node.left.max, node.right.max)

Query(node, left, right):
    if [left, right] does not intersect node.associated_interval:
        return -infinity
    if node.associated_interval included in [left, right]:
        return node.max
    return max(Query(node.left, left, right),
               Query(node.right, left, right))

[...]
high = Query(tree, 0, arr[i].first.first)
dp[i] = high + arr[i].second
Update(tree, arr[i].first.second, dp[i])
2. Reducing to O(log n) query time and O(n) memory for the segment tree
Since the number of intervals might be significantly less than their length, it's reasonable to think that we might be able to encode them better somehow, so that their length is also O(n). Indeed, we can.
This involves normalizing your intervals in the range [1, 2*n]. Consider the following intervals
8 100
3 50
90 92
Let's plot them on a line. They'd look like this:
3 8 50 90 92 100
Now replace each of them with their index:
1 2 3 4 5 6
3 8 50 90 92 100
And write your new intervals:
2 6
1 3
4 5
Note that they retain the properties of your initial intervals: the same ones overlap, the same ones are included in each other etc.
This can be done with a sort. You can now apply the same segment tree algorithm, except you declare the segment tree for the size 2*n.

Find all chordless cycles in an undirected graph

How to find all chordless cycles in an undirected graph?
For example, given the graph
0 --- 1
| | \
| | \
4 --- 3 - 2
the algorithm should return 1-2-3 and 0-1-3-4, but never 0-1-2-3-4.
(Note: [1] This question is not the same as small cycle finding in a planar graph because the graph is not necessarily planar. [2] I have read the paper Generating all cycles, chordless cycles, and Hamiltonian cycles with the principle of exclusion but I don't understand what they're doing :). [3] I have tried CYPATH but the program only gives the count, algorithm EnumChordlessPath in readme.txt has significant typos, and the C code is a mess. [4] I am not trying to find an arbitrary set of fundamental cycles. Cycle basis can have chords.)
1. Assign numbers to nodes from 1 to n.
2. Pick node number 1. Call it 'A'.
3. Enumerate pairs of links coming out of 'A'.
4. Pick one pair. Call the adjacent nodes 'B' and 'C', with B less than C.
5. If B and C are connected, output the cycle ABC, return to step 3 and pick a different pair. If B and C are not connected: enumerate all nodes connected to B. Suppose B is connected to D, E, and F. Create a list of vectors CABD, CABE, CABF. For each of these:
   - if the last node is connected to any internal node except C and B, discard the vector;
   - if the last node is connected to C, output the cycle and discard the vector;
   - if it is connected to neither, create a new list of vectors, appending every node to which the last node is connected.
   Repeat until you run out of vectors.
6. Repeat steps 3-5 with all pairs.
7. Remove node 1 and all links that lead to it. Pick the next node and go back to step 2.
Edit: and you can do away with one nested loop.
This seems to work at first sight; there may be bugs, but you should get the idea:
#include <algorithm>
#include <iostream>
#include <list>
#include <vector>
using namespace std;

void chordless_cycles(int* adjacency, int dim)
{
    for(int i=0; i<dim-2; i++)
    {
        for(int j=i+1; j<dim-1; j++)
        {
            if(!adjacency[i+j*dim])
                continue;
            list<vector<int> > candidates;
            for(int k=j+1; k<dim; k++)
            {
                if(!adjacency[i+k*dim])
                    continue;
                if(adjacency[j+k*dim])
                {
                    // triangle i-j-k is always chordless
                    cout << i+1 << " " << j+1 << " " << k+1 << endl;
                    continue;
                }
                vector<int> v;
                v.resize(3);
                v[0]=j;
                v[1]=i;
                v[2]=k;
                candidates.push_back(v);
            }
            while(!candidates.empty())
            {
                vector<int> v = candidates.front();
                candidates.pop_front();
                int k = v.back();
                for(int m=i+1; m<dim; m++)
                {
                    if(find(v.begin(), v.end(), m) != v.end())
                        continue;
                    if(!adjacency[m+k*dim])
                        continue;
                    // reject extensions that would create a chord
                    bool chord = false;
                    int n;
                    for(n=1; n<(int)v.size()-1; n++)
                        if(adjacency[m+v[n]*dim])
                            chord = true;
                    if(chord)
                        continue;
                    if(adjacency[m+j*dim])
                    {
                        // the path closes back to j: output the chordless cycle
                        for(n=0; n<(int)v.size(); n++)
                            cout << v[n]+1 << " ";
                        cout << m+1 << endl;
                        continue;
                    }
                    vector<int> w = v;
                    w.push_back(m);
                    candidates.push_back(w);
                }
            }
        }
    }
}
@aioobe has a point. Just find all the cycles and then exclude the non-chordless ones. This may be too inefficient, but the search space can be pruned along the way to reduce the inefficiency. Here is a general algorithm:
void printChordlessCycles(ChordlessCycle path) {
    System.out.println(path.toString());
    for (Node n : path.lastNode().neighbors()) {
        if (path.canAdd(n)) {
            path.add(n);
            printChordlessCycles(path);
            path.remove(n);
        }
    }
}

Graph g = loadGraph(...);
ChordlessCycle p = new ChordlessCycle();
for (Node n : g.getNodes()) {
    p.add(n);
    printChordlessCycles(p);
    p.remove(n);
}

class ChordlessCycle {
    private CountedSet<Node> connected_nodes; // how many path nodes each vertex is adjacent to
    private List<Node> path;
    ...

    public void add(Node n) {
        for (Node neighbor : n.getNeighbors()) {
            connected_nodes.increment(neighbor);
        }
        path.add(n);
    }

    public void remove(Node n) {
        for (Node neighbor : n.getNeighbors()) {
            connected_nodes.decrement(neighbor);
        }
        path.remove(n);
    }

    public boolean canAdd(Node n) {
        // n may extend the path only if its sole neighbour on the path is the last node
        return (connected_nodes.getCount(n) == 1);
    }
}
Just a thought:
Let's say you are enumerating cycles on your example graph and you are starting from node 0.
If you do a breadth-first search for each given edge, e.g. 0 - 1, you reach a fork at 1. Then the cycles that reach 0 again first are chordless, and the rest are not and can be eliminated... at least I think this is the case.
Could you use an approach like this? Or is there a counterexample?
How about this. First, reduce the problem to finding all chordless cycles that pass through a given vertex A. Once you've found all of those, you can remove A from the graph, and repeat with another point until there's nothing left.
And how to find all the chordless cycles that pass through vertex A? Reduce this to finding all chordless paths from B to A, given a list of permitted vertices, and search either breadth-first or depth-first. Note that when iterating over the vertices reachable (in one step) from B, when you choose one of them you must remove all of the others from the list of permitted vertices (take special care when B=A, so as not to eliminate three-edge paths).
Find all cycles.
By definition, a chordless cycle is a cycle whose vertex set does not contain the vertex set of any smaller cycle. So once you have all the cycles, the problem is simply to eliminate cycles that contain a sub-cycle.
For efficiency, for each cycle you find, loop through all existing cycles and check whether either is a subset of the other; if so, eliminate the larger cycle.
Beyond that, the only difficulty is writing an algorithm that determines whether one set is a subset of another.
