Implementing a graph program - algorithm

I was trying to implement a graph via an adjacency matrix using arrays, but I stumbled upon an issue.
My graph has vertices labelled 5, 7, 3, 6 (they are not in order, so they don't map directly to array indices).
I know how to implement a graph when the vertex labels are the same as the array indices.
I thought about making another lookup array mapping vertices to array indices, but I worried that would add a lot of time complexity.
I tried searching on Google too, but everything I found assumed the vertices were array indices.
Any suggestion would be helpful, adjacency list or matrix.

I can think of two straightforward options:
1. Vertex Tagging
For each vertex you read, identify it with a tag. In your case:
vertex | tag
-------|-----
5 | 0
7 | 1
3 | 2
6 | 3
In order to get good performance here, you should map the vertices with an unordered_map (or hash map) in both the forward and inverse directions, so that when you are given a vertex you can get its tag in O(1) with forward[vertex], and when you are given a tag you can find its vertex in O(1) with backwards[tag]:
int tag = 0;
unordered_map<int, int> forward, backwards;
for (int v : vertices) {
    forward[v] = tag;
    backwards[tag] = v;
    tag++;
}
Lastly, represent your graph (adjacency matrix or list) just as you already know how to, using the tags, which are 0, 1, ..., n - 1.
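For example, here is a minimal sketch of how the tags plug into the usual adjacency-matrix code; the vertex labels match your question, but the edge list is invented for illustration:

#include <unordered_map>
#include <utility>
#include <vector>
using namespace std;

int main() {
    vector<int> vertices = {5, 7, 3, 6};
    vector<pair<int, int>> edges = {{5, 7}, {7, 3}, {3, 6}}; // hypothetical edge list

    // Tag each vertex with 0, 1, ..., n - 1 as described above.
    int tag = 0;
    unordered_map<int, int> forward, backwards;
    for (int v : vertices) {
        forward[v] = tag;
        backwards[tag] = v;
        tag++;
    }

    // Build the adjacency matrix over the tags.
    int n = vertices.size();
    vector<vector<int>> adj(n, vector<int>(n, 0));
    for (auto& e : edges) {
        adj[forward[e.first]][forward[e.second]] = 1;
        adj[forward[e.second]][forward[e.first]] = 1; // drop this line if the graph is directed
    }
}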
2. Direct Mapping
This is easier. Instead of using an array for your vertices, use an unordered_map that maps a vertex value to its adjacency list:
unordered_map<int, vector<int>> G;
for (int v : vertices) {
    G[v] = vector<int>();
    vector<int>& adj = G[v];
    // add all nodes adjacent to v to adj
}
If you want to iterate through the adjacency list of 5, just do:
vector<int>& adjList = G[5];
for (int v : adjList) {
    // stuff
}

Related

How to search for the largest subset where every pair meets criteria?

I hope this isn't more of a statistics question...
Suppose I have an interface:
public interface PairValidatable<T>
{
    public boolean isValidWith(T other);
}
Now if I have a large array of PairValidatables, how do I find the largest subset of that array where every pair passes the isValidWith test?
To clarify, if there are three entries in a subset, then elements 0 and 1 should pass isValidWith, elements 1 and 2 should pass isValidWith, and elements 0 and 2 should pass isValidWith.
Example,
public class Point implements PairValidatable<Point>
{
    int x;
    int y;

    public Point(int xIn, int yIn)
    {
        x = xIn;
        y = yIn;
    }

    public boolean isValidWith(Point other)
    {
        // whichever has the greater x must have the lesser (or equal) y
        return x > other.x != y > other.y;
    }
}
The intuitive idea is to keep a vector of Points, add array element 0, and compare each remaining array element against the vector, adding it to the vector if it passes the validation with every element already there... but the problem is that element 0 might be very restrictive. For example,
Point[] arr = new Point[5];
arr[0] = new Point(1000, 1000);
arr[1] = new Point(10, 10);
arr[2] = new Point(15, 7);
arr[3] = new Point(3, 6);
arr[4] = new Point(18, 6);
Iterating through as above would give us a subset containing only element 0, but the subset of elements 1, 2 and 4 is a larger subset where every pair passes the validation. The algorithm should then return the points stored in elements 1, 2 and 4. Though elements 3 and 4 are valid with each other and elements 1 and 4 are valid with each other, elements 2 and 3 are not, nor elements 1 and 3. The subset containing 1, 2 and 4 is a larger subset than 3 and 4.
I would guess some tree or graph algorithm would be best for solving this but I'm not sure how to set it up.
The solution doesn't have to be Java-specific, and preferably could be implemented in any language instead of relying on Java built-ins. I just used Java-like pseudocode above for familiarity reasons.
Presumably isValidWith is commutative -- that is, if x.isValidWith(y) then y.isValidWith(x). If you know nothing more than that, you have an instance of the maximum clique problem, which is known to be NP-complete:
Skiena, S. S. "Clique and Independent Set" and "Clique." §6.2.3 and §8.5.1 in The Algorithm Design Manual. New York: Springer-Verlag, pp. 144 and 312-314, 1997.
Therefore, if you want an efficient algorithm, you will have to hope that your specific isValidWith function has more structure than mere commutativity, and you will have to exploit that structure.
For your specific problem, you should be able to do the following:
Sort your points in increasing order of x coordinate.
Find the longest non-increasing subsequence of the y coordinates in the sorted list (non-increasing rather than strictly decreasing, since equal y values are valid together).
Each operation can be performed in O(n log n) time, so your particular problem is efficiently solvable.
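A minimal sketch of those two steps, assuming all x coordinates are distinct; the longest non-increasing subsequence is found by negating the y values and running the standard binary-search (patience sorting) technique. The helper name largestValidSubset is made up for illustration:

#include <algorithm>
#include <cstdio>
#include <utility>
#include <vector>

// Returns the size of the largest pairwise-valid subset.
int largestValidSubset(std::vector<std::pair<int, int>> pts) {
    std::sort(pts.begin(), pts.end()); // step 1: sort by increasing x
    // step 2: longest non-increasing subsequence of y, via negation
    std::vector<int> tails; // tails[k] = smallest tail of a subsequence of length k + 1
    for (const auto& p : pts) {
        int y = -p.second;
        auto it = std::upper_bound(tails.begin(), tails.end(), y);
        if (it == tails.end()) tails.push_back(y);
        else *it = y;
    }
    return (int)tails.size();
}

int main() {
    std::vector<std::pair<int, int>> pts =
        {{1000, 1000}, {10, 10}, {15, 7}, {3, 6}, {18, 6}};
    printf("%d\n", largestValidSubset(pts)); // prints 3: {(10,10), (15,7), (18,6)}
}

Recovering the actual points rather than just the count is the usual LIS predecessor-tracking exercise.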

Boost Graph identifying vertices from its created matrix

I have a table of vertices and edges, and from these tables I created a Boost graph. Each vertex has its id assigned to it, and the edges also contain lengths. Now I want to prune the graph by removing nodes. My algorithm does this by creating a matrix of num_vertices. My problem is how to associate my matrix with boost::vertices, that is, how do I know which matrix column corresponds to which vertex in the graph, since the matrix has no id? I hope I am not overcomplicating this.
void Nodekiller::build_matrix() {
    int ndsize = num_vertices(graph);
    // Note: variable-length arrays like this are a compiler extension, not standard C++.
    double matrixtb[ndsize][ndsize];
    for (int i = 0; i < ndsize; i++) {
        for (int j = 0; j < ndsize; j++) {
            if (i == j) { matrixtb[i][j] = 0; }
            else {
                matrixtb[i][j] = addEdgeValue(); // if none, add a random value
            }
        }
    }
}
// I want to sum each column and then prioritize the columns based on the resulting values.
So I don't know how to associate boost::vertices(graph) with the matrix in order to be able to prune the graph.
The question is not very clear. Do I understand right:
You have a Boost graph.
You create a matrix from that graph?
So a first trivial question (maybe outside the scope): do you really need two representations of the same graph, one as a boost::graph and another as your matrix?
You can add and remove edges from a boost::graph easily. The easiest representation is the adjacency list: http://www.boost.org/doc/libs/1_55_0/libs/graph/doc/adjacency_list.html
Maybe a starting point could be this answer: adding custom vertices to a boost graph
You can create all your nodes, iterate over every pair of nodes, and add an edge only if the two nodes are different. Something like:
boost::graph_traits<Graph>::vertex_iterator vi, vi_end, vi2, vi2_end;
for (boost::tie(vi, vi_end) = boost::vertices(g); vi != vi_end; ++vi) {
    for (boost::tie(vi2, vi2_end) = boost::vertices(g); vi2 != vi2_end; ++vi2) {
        if (*vi != *vi2) {
            edge_t e;
            bool b;
            boost::tie(e, b) = boost::add_edge(*vi, *vi2, g);
            g[e] = addEdgeValue();
        }
    }
}
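Regarding the original question of matching matrix columns to vertices: with the default vecS vertex storage, vertex descriptors already are the integers 0 .. num_vertices(g) - 1, so they can index the matrix directly. A small sketch (graph type and edges invented for illustration) that makes the mapping explicit through the built-in vertex_index property map:

#include <boost/graph/adjacency_list.hpp>
#include <cstdio>

typedef boost::adjacency_list<boost::vecS, boost::vecS, boost::undirectedS> Graph;

int main() {
    Graph g(4);
    boost::add_edge(0, 1, g);
    boost::add_edge(1, 2, g);

    // With vecS storage this map sends each descriptor to its 0-based position,
    // which is exactly the matrix row/column it should own.
    boost::property_map<Graph, boost::vertex_index_t>::type
        index = boost::get(boost::vertex_index, g);

    boost::graph_traits<Graph>::vertex_iterator vi, vi_end;
    for (boost::tie(vi, vi_end) = boost::vertices(g); vi != vi_end; ++vi)
        printf("vertex -> matrix row/column %lu\n", (unsigned long)index[*vi]);
}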

Number of minimum vertex covers in a tree

There are [at least] three algorithms which find a minimum vertex cover in a tree in linear (O(n)) time. What I'm interested in is a modification of all of these algorithms so that I also get the number of such minimum vertex covers.
For example, for the tree P4 (a path with 4 nodes) the number of MVCs is 3, because we can choose the nodes: 1 and 3, 2 and 4, or 2 and 3.
Of course you can describe the solution for any one of the three algorithms - not all 3. I'm just interested in all of them, but if you have anything to add, don't hesitate.
I'll describe the algorithms that I know to make it easier for you.
1. Greedy algorithm.
We can notice that for every edge we have to include one of its endpoints. Which one should we choose? Assume we have an edge between a "normal" node and a leaf; which node is better to pick? Not the leaf, of course, as the other node might help us cover one more edge. The algorithm is as follows:
Start from any node which is not a leaf.
For each child, make a DFS call, and when it returns check whether either the parent or the child is marked as a node in the vertex cover. If neither is, you have to choose one of them, so choose the parent (and mark it).
For a leaf do nothing.
Here's the code: https://ideone.com/mV4bqg.
#include <stdio.h>
#include <vector>
using namespace std;

vector<int> graph[100019];
int mvc[100019]; // 0 = unvisited, -1 = visited but not in cover, 1 = in cover

int mvc_tree(int v)
{
    mvc[v] = -1;
    if (graph[v].size() == 1) // leaf: do nothing
        return 0;
    int x = 0;
    for (int i = 0; i < graph[v].size(); ++i)
        if (!mvc[graph[v][i]])
        {
            x += mvc_tree(graph[v][i]);
            // neither endpoint covered: put the parent into the cover
            if (mvc[v] < 1 && mvc[graph[v][i]] < 1)
            {
                ++x;
                mvc[v] = 1;
            }
        }
    return x;
}

int main()
{
    int t, n, a, b, i;
    scanf("%d", &t);
    while (t--)
    {
        scanf("%d", &n);
        for (i = 1; i <= n; ++i)
            graph[i].clear();
        for (i = 1; i < n; ++i)
        {
            scanf("%d%d", &a, &b);
            graph[a].push_back(b);
            graph[b].push_back(a);
            mvc[i] = 0;
        }
        mvc[n] = 0;
        if (n < 3)
        {
            puts("1");
            continue;
        }
        // start from any node which is not a leaf
        for (i = 1; i <= n; ++i)
            if (graph[i].size() > 1)
                break;
        printf("%d\n", mvc_tree(i));
    }
    return 0;
}
2. Dynamic programming algorithm.
We can also use recursion to solve the task.
MVC(v) = min(
1 + sum(MVC(child) for child in v.children),
v.children.size + sum(MVC(grandchild) for grandchild in v.grandchildren)
)
When we are at node v, it can either be in the MVC or not. If it is, we add to our result 1 (because we include v) plus the subresults for the subtrees of all of v's children. If, on the other hand, it's not in the MVC, then all of its children have to be in the MVC, so we add to the result the number of children, and for each child we add the subresults of that child's children (so v's grandchildren).
The algorithm is linear because, with the results memoised, each node is evaluated at most twice: once for its parent and once for its grandparent.
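A sketch of this recursion in C++, memoised so each vertex is evaluated once; the adjacency list and memo table are assumed to be set up by the caller, and the tree is rooted at whatever vertex the first call uses:

#include <algorithm>
#include <vector>

std::vector<std::vector<int>> adj; // tree adjacency list
std::vector<int> memo;             // initialised to -1 = not computed yet

int mvc(int v, int parent) {
    if (memo[v] != -1) return memo[v];
    int take = 1; // v is in the cover
    int skip = 0; // v is not in the cover
    for (int c : adj[v]) {
        if (c == parent) continue;
        take += mvc(c, v);
        ++skip; // child c is forced into the cover...
        for (int gc : adj[c])
            if (gc != v) skip += mvc(gc, c); // ...and we continue from the grandchildren
    }
    return memo[v] = std::min(take, skip);
}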
3. Dynamic programming no 2.
Instead of 2 states for node v (in MVC / not in MVC) we can use 3 by adding "maybe in MVC". How does that help? First, we call MVC(v, "maybe") on some starting node, as we don't know whether v should be in the MVC or not. The result for "maybe" is the minimum of the results for "yes" and "no". The result for "yes" is 1 + sum(MVC(child, "maybe") for child in v.children). And the result for "no" is sum(MVC(child, "yes") for child in v.children). I think it's pretty clear why. If not, ask in the comments.
The formula is therefore:
MVC(v, "maybe") = min(MVC(v, "yes"), MVC(v, "no"))
MVC(v, "yes") = 1 + sum(MVC(child, "maybe") for child in v.children)
MVC(v, "no") = sum(MVC(child, "yes") for child in v.children)
The complexity is also O(n) because every node is checked twice - with "yes" and with "no".
Dynamic programming solution
This solution expands on your third algorithm, "dynamic programming no 2". We recursively define six functions:
cover_maybe(v) := min(cover_no(v), cover_yes(v))
cover_no(v)    := sum(cover_yes(child) for child in v.children)
cover_yes(v)   := sum(cover_maybe(child) for child in v.children) + 1

count_maybe(v) :=
    count_no(v)                if cover_no(v) < cover_yes(v)
    count_yes(v)               if cover_no(v) > cover_yes(v)
    count_no(v) + count_yes(v) if cover_no(v) == cover_yes(v)
count_no(v)    := product(count_yes(child) for child in v.children)
count_yes(v)   := product(count_maybe(child) for child in v.children)
The first three functions cover_maybe, cover_no and cover_yes precisely correspond to your function MVC for the states "maybe", "no" and "yes". They count the minimum number of vertices that need to be included into a vertex cover of the sub-tree below v:
cover_maybe(v) determines the minimal vertex cover for the sub-tree below v.
cover_no(v): MVC for the sub-tree below v with the condition that v is not included in this cover.
cover_yes(v): MVC for the sub-tree below v with the condition that v is included in this cover.
Explanations:
cover_maybe(v): In any vertex cover, v is either included in the cover or not. MVC picks a solution with a minimal number of included vertices: the minimum of cover_no(v) and cover_yes(v).
cover_no(v): If v is not included in the cover, then all children must be included in the cover (in order to cover the edges from v to the children). Therefore, we add up the included vertices in cover_yes(child) for all children of v.
cover_yes(v): Because v is included in the cover, it already covers the edges from v to the children, so we are not restricted in whether to include a child in the cover or not, and hence add cover_maybe(child) for all children of v.
The next three functions count the number of solutions for these MVC problems:
count_maybe(v) counts the number of MVC solutions for the sub-tree below v.
count_no(v) counts the number of MVC solutions with the condition that v is not included in the covers.
count_yes(v) counts the number of MVC solutions with the condition that v is contained in the covers.
Explanations:
count_maybe(v): We need to consider three separate cases: If cover_no(v) is less than cover_yes(v), then it is always better to exclude v from the cover: count_maybe(v) = count_no(v). Similarly, if cover_yes(v) is less than cover_no(v), we always include v in the cover and set count_maybe(v) = count_yes(v). But if cover_no(v) is equal to cover_yes(v), then we can either include or exclude v from the cover. The number of possibilities is the sum: count_maybe(v) = count_no(v) + count_yes(v).
count_no(v) and count_yes(v): Because we already know whether to include or exclude the node v into the cover, we are left with independent sub-trees for the children. The number of possible solutions is the product of solution counts for each sub-tree. The choice of the correct sub-problem (count_yes or count_maybe) is as explained above (for cover_no(v) and cover_yes(v)).
Two notes regarding the implementation:
As usual in dynamic programming, you must cache the results of each function: the first time a result is calculated, it is stored in a cache; when the same query is asked again, the result is read out of the cache instead of being recalculated. Through this caching, the run time of this algorithm is O(n), because each of the six functions can be computed at most once for each node.
You must start the calculation at the root of the tree (not at a random node, as you suggest in your question): even though the problem is defined on an undirected tree, our "divide and conquer" algorithm picks one root node and arranges the children of each node according to their distance from this root.
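A sketch of these six functions in C++. Rather than six separately cached functions, one post-order DFS computes the cover_no/cover_yes and count_no/count_yes pairs for every vertex, and the "maybe" values are derived where needed. Counts can overflow for large trees, so in practice you would take them modulo a prime:

#include <algorithm>
#include <array>
#include <vector>

std::vector<std::vector<int>> adj;          // tree adjacency list
std::vector<std::array<long long, 2>> cov;  // cov[v] = {cover_no(v), cover_yes(v)}
std::vector<std::array<long long, 2>> cnt;  // cnt[v] = {count_no(v), count_yes(v)}

void dfs(int v, int parent) {
    cov[v] = {0, 1}; // "yes" starts at 1: v itself is included
    cnt[v] = {1, 1}; // empty products
    for (int c : adj[v]) {
        if (c == parent) continue;
        dfs(c, v);
        long long coverMaybe = std::min(cov[c][0], cov[c][1]);
        long long countMaybe = 0; // count_maybe(c), per the case analysis above
        if (cov[c][0] == coverMaybe) countMaybe += cnt[c][0];
        if (cov[c][1] == coverMaybe) countMaybe += cnt[c][1];
        cov[v][0] += cov[c][1];   // v excluded: every child must be included
        cnt[v][0] *= cnt[c][1];
        cov[v][1] += coverMaybe;  // v included: children are free
        cnt[v][1] *= countMaybe;
    }
}

// Usage: dfs(root, -1); then
//   best   = min(cov[root][0], cov[root][1]);
//   answer = (cov[root][0] == best ? cnt[root][0] : 0)
//          + (cov[root][1] == best ? cnt[root][1] : 0);

For the path P4 this yields best = 2 with answer = 3 covers, matching the example in the question.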

How to reverse a graph in linear time?

I know there are two ways to represent my graph: one is using a matrix, and the other one is using a list.
If I use a matrix, I have to flip all the bits in the matrix. Doesn't that take O(V^2) time?
If I use a list, wouldn't I have to traverse each list, one by one, and create a new set? That would seem to take O(V+E) time which is linear. Am I correct?
So, I have another question here. Consider, for example, running Dijkstra's algorithm on my graph (either a matrix or a list) and using a priority queue as the data structure behind the scenes. Is there any relation between the graph representation and the choice of data structure? Will it affect the performance of the algorithm?
Suppose I use a list representation and a priority queue for Dijkstra's algorithm; would there be any difference compared to using a matrix with a priority queue?
I guess it only affects the makeQueue operation? Or is there no difference at all?
Reversing the adjacency lists of a directed graph can be done in linear time. We traverse the graph only once, so the order of complexity is O(|V|+|E|).
Maintain a HashMap of adjacency lists, where the key is the vertex label and the value is an ArrayList of the adjacent vertices of the key vertex.
For the reversal, create a new HashMap of the same kind. Scan the original hash map, and for each key you come across, traverse the corresponding list.
For each vertex found in the value list, add a key to the new HashMap, putting the key of the original HashMap as an entry in the ArrayList corresponding to the new key in the new HashMap.
public static HashMap<Character, ArrayList<Character>> getReversedAdjLists(RGraph g)
{
    HashMap<Character, ArrayList<Character>> revAdjListMap = new HashMap<Character, ArrayList<Character>>();
    Set<Character> oldLabelSet = g.adjListMap.keySet();
    for (char oldLabel : oldLabelSet)
    {
        ArrayList<Character> oldLabelList = g.adjListMap.get(oldLabel);
        for (char newLabel : oldLabelList)
        {
            ArrayList<Character> newLabelList = revAdjListMap.get(newLabel);
            if (newLabelList == null)
            {
                newLabelList = new ArrayList<Character>();
                newLabelList.add(oldLabel);
            }
            else if (!newLabelList.contains(oldLabel))
            {
                newLabelList.add(oldLabel);
            }
            revAdjListMap.put(newLabel, newLabelList);
        }
    }
    return revAdjListMap;
}
I think reversing the graph by traversing the list takes O(V^2), since for each vertex you must add or delete up to (V-1) edges.
As for Dijkstra's algorithm, as I understand it, if you represent the graph as a matrix or a list the algorithm takes O(V^2), but some other data structures are faster. The fastest known is a Fibonacci heap, which gives O(E + V log V).
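For concreteness, here is a sketch (not from the original thread) of Dijkstra's algorithm over an adjacency list with a binary-heap priority queue, the combination discussed above. This runs in O((V + E) log V), whereas an adjacency matrix forces an O(V^2) edge scan regardless of the queue:

#include <functional>
#include <queue>
#include <utility>
#include <vector>

typedef std::pair<long long, int> State; // (distance, vertex)

// adj[v] holds (neighbour, weight) pairs.
std::vector<long long> dijkstra(
        const std::vector<std::vector<std::pair<int, int>>>& adj, int src) {
    const long long INF = 1000000000000000000LL;
    std::vector<long long> dist(adj.size(), INF);
    std::priority_queue<State, std::vector<State>, std::greater<State>> pq;
    dist[src] = 0;
    pq.push({0, src});
    while (!pq.empty()) {
        State top = pq.top(); pq.pop();
        long long d = top.first;
        int v = top.second;
        if (d != dist[v]) continue; // stale queue entry; skip it
        for (const auto& e : adj[v])
            if (d + e.second < dist[e.first]) {
                dist[e.first] = d + e.second;
                pq.push({dist[e.first], e.first});
            }
    }
    return dist;
}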
G = {"A": ["B", "C", "D"], "B": ["C", "E"], "C": ["D", "E"], "D": [], "E": ["D"]}
res = {}
for k in G.keys():
    for val in G[k]:
        if val not in res.keys():
            res[val] = [k]
        else:
            res[val].append(k)
print(res)  # note: vertices with no incoming edges ("A" here) do not appear as keys
Since I see a couple of comments asking about an in-place graph transpose (reversal), here is my version of it. Please note this will only work on DAGs. Feedback and suggestions for improvement would be welcome.
def transpose(G):
    """
    Return the transpose of a directed graph, i.e. all the edges are reversed (in place).
    """
    # Note: topological_sort is not a standard-library function afaik; you would
    # have to implement it, but that should be easy.
    topsort = topological_sort(G)
    topsort.reverse()  # we want to process starting from the sink nodes
    for v in topsort:
        for node in G[v]:
            G[node].append(v)
        # remove all old out-edges of vertex v
        G[v] = []
    print(G)

A fast way to find connected component in a 1-NN graph?

First of all, I have an N*N distance matrix. For each point, I calculated its nearest neighbor, so we have an N*2 matrix that looks like this:
0 -> 1
1 -> 2
2 -> 3
3 -> 2
4 -> 2
5 -> 6
6 -> 7
7 -> 6
8 -> 6
9 -> 8
The second column is the nearest neighbor's index. So this is a special kind of directed graph, in which each vertex has exactly one outgoing edge.
Of course, we could first transform the N*2 matrix into a standard graph representation and perform BFS/DFS to get the connected components.
But, given the characteristics of this special graph, is there any other fast way to do the job?
Any help would be appreciated.
Update:
I've implemented a simple algorithm for this case here.
Note that I did not use a union-find algorithm, because the data structure may make things not that easy, and I doubt whether it's the fastest way in my case (I mean practically).
You could argue that the _merge process could be time-consuming, but if we swap the edges into a contiguous place while assigning new labels, the merging may cost little; it does, however, need another N spaces to trace the original indices.
The fastest algorithm for finding connected components given an edge list is the union-find algorithm: for each node, hold a pointer to a node in the same set, with all edges converging to the same node; if you find a path of length at least 2, reconnect the bottom node upwards.
This will definitely run in linear time:
- push all edges into a union-find structure: O(n)
- store each node in its set (the union-find root)
and update the set of non-empty sets: O(n)
- return the set of non-empty sets (graph components).
Since the list of edges already almost forms a union-find tree, it is possible to skip the first step:
for each node
- if the node is not marked as collected
-- walk along the edges until you find an order-1 or order-2 loop,
collecting nodes en-route
-- reconnect all nodes to the end of the path and consider it a root for the set.
-- store all nodes in the set for the root.
-- update the set of non-empty sets.
-- mark all nodes as collected.
return the set of non-empty sets
The second algorithm is linear as well, but only a benchmark will tell if it's actually faster. The strength of the union-find algorithm is its optimization. This delays the optimization to the second step but removes the first step completely.
You can probably squeeze out a little more performance if you join the union step with the nearest neighbor calculation, then collect the sets in the second pass.
If you want to do it sequentially, you can use weighted quick union with path compression; the complexity is O(N + M log(log N)). Check the link referenced below.
Here is the pseudocode, honoring @pycho's words:
public class QuickUnion
{
    private int[] id;

    public QuickUnion(int N)
    {
        id = new int[N];
        for (int i = 0; i < N; i++) id[i] = i;
    }

    public int root(int i)
    {
        while (i != id[i])
        {
            id[i] = id[id[i]]; // path compression (halving)
            i = id[i];
        }
        return i;
    }

    public boolean find(int p, int q)
    {
        return root(p) == root(q);
    }

    // note: this version applies path compression but omits the weighting step
    public void unite(int p, int q)
    {
        int i = root(p);
        int j = root(q);
        id[i] = j;
    }
}
Reference: https://www.cs.princeton.edu/~rs/AlgsDS07/01UnionFind.pdf
If you want to find connected components in parallel, the asymptotic complexity can be reduced to O(log(log(N))) time using pointer jumping and weighted quick union with path compression. Check this link:
https://vishwasshanbhog.wordpress.com/2016/05/04/efficient-parallel-algorithm-to-find-the-connected-components-of-the-graphs/
Since each node has only one outgoing edge, you can just traverse the graph one edge at a time until you get to a vertex you've already visited. An out-degree of 1 means any further traversal at this point will only take you where you've already been. The traversed vertices in that path are all in the same component.
In your example:
0->1->2->3->2, so [0,1,2,3] is a component
4->2, so update the component to [0,1,2,3,4]
5->6->7->6, so [5,6,7] is a component
8->6, so update the component to [5,6,7,8]
9->8, so update the component to [5,6,7,8,9]
You can visit each node exactly once, so time is O(n). Space is O(n) since all you need is a component id for each node, and a list of component ids.
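A sketch of that traversal in C++, where nn[v] is the nearest-neighbour column from the question. Each vertex is visited once; a provisional mark distinguishes "on the current walk" from "already labelled", so a walk either closes a new cycle (fresh component id) or runs into an existing component (reuse its id):

#include <cstdio>
#include <vector>

std::vector<int> components(const std::vector<int>& nn) {
    const int n = (int)nn.size();
    std::vector<int> comp(n, -1); // -1 = unvisited, -2 = on the current walk
    int next_id = 0;
    for (int s = 0; s < n; ++s) {
        if (comp[s] != -1) continue;
        std::vector<int> path;
        int v = s;
        while (comp[v] == -1) { // follow the single out-edge
            comp[v] = -2;
            path.push_back(v);
            v = nn[v];
        }
        // Either we closed a fresh cycle (new id) or hit an older component.
        int id = (comp[v] == -2) ? next_id++ : comp[v];
        for (int u : path) comp[u] = id;
    }
    return comp;
}

int main() {
    std::vector<int> nn = {1, 2, 3, 2, 2, 6, 7, 6, 6, 8}; // the example above
    std::vector<int> comp = components(nn);
    for (int v = 0; v < (int)comp.size(); ++v)
        printf("%d is in component %d\n", v, comp[v]); // 0-4 -> 0, 5-9 -> 1
}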
