Generate random graph with probability p - algorithm

Write a function in main.cpp, which creates a random graph of a certain size as follows. The function takes two parameters. The first parameter is the number of vertices n. The second parameter p (1 >= p >= 0) is the probability that an edge exists between a pair of nodes. In particular, after instantiating a graph with n vertices and 0 edges, go over all possible vertex pairs one by one, and for each such pair, put an edge between the vertices with probability p.
How to know if an edge exists between two vertices .
Here is the full question
PS: I don't need the code implementation

The problem statement clearly says that the first input parameter is the number of nodes and the second parameter is the probability p that an edge exists between any 2 nodes.
What you need to do is as follows (Updated to amend a mistake that was pointed out by #user17732522):
1- Create a bool matrix (2d nested array) of size n*n initialized with false.
2- Run a loop over the rows:
- Run an inner loop over the columns:
- if row_index != col_index do:
- curr_p = random() // random() returns a number between 0 and 1 inclusive
- if curr_p <= p: set matrix[row_index][col_index] = true
else: set matrix[row_index][col_index] = false
- For an undirected graph, also set matrix[col_index][row_index] = true/false based on curr_p
Note: Since we are setting both cells (both directions) in the matrix in case of a probability hit, we could potentially set an edge 2 times. This doesn't corrupt the correctnes of the probability and isn't much additional work. It helps to keep the code clean.
If you want to optimize this solution, you could run the loop such that you only visit the lower-left triangle (excluding the diagonal) and just mirror the results you get for those cells to the upper-right triangle.
That's it.

Related

Find largest ones after setting a coordinate to one

Interview Question:
You are given a grid of ones and zeros. You can arbitrarily select any point in that grid. You have to write a function which does two things:
If you choose e.g. coordinate (3,4) and it is zero you need to flip
that to a one. If it is a one you need to flip that to a zero.
You need to return the largest contiguous region
with the most ones i.e. ones have to be at least connected to
another one.
E.g.
[0,0,0,0]
[0,1,1,0]
[1,0,1,0]
We have the largest region being the 3 ones. We have another region which have only one one (found at coordinate (2,0)).
You are to find an algorithm that will solve this where you will call that function many times. You need to ensure that your amortized run time is the lowest you can achieve.
My Solution which has Time Complexity:O(num_row*num_col) each time this function is called:
def get_all_coordinates_of_ones(grid):
set_ones = set()
for i in range(len(grid[0])):
for j in range(len(grid)):
if grid[i][j]:
set_ones.add((i, j))
return set_ones
def get_largest_region(x, y, grid):
num_col = len(grid)
num_row = len(grid[0])
one_or_zero = grid[x][y]
if not grid[x][y]:
grid[x][y] = 1 - grid[x][y]
# get the coordinates of ones in the grid
# Worst Case O(num_col * num_row)
coordinates_ones = get_all_coordinates_of_ones(grid)
while coordinates_ones:
queue = collections.deque([coordinates_ones.pop()])
largest_one = float('-inf')
count_one = 1
visited = set()
while queue:
x, y = queue.popleft()
visited.add((x, y))
for new_x, new_y in ((x, y + 1), (x, y - 1), (x + 1, y), (x - 1, y)):
if (0 <= new_x < num_row and 0 <= new_y < num_col):
if grid[new_x][new_y] == 1 and (new_x, new_y) not in visited:
count_one += 1
if (new_x, new_y) in coordinates_ones:-
coordinates_ones.remove((new_x, new_y))
queue.append((new_x, new_y))
largest_one = max(largest_one, count_one)
return largest_one
My Proposed modifications:
Use Union Find by rank. Encountered a problem. Union all the ones that are adjacent to each other. Now when one of the
coordinates is flipped e.g. from zero to one I will need to remove that coordinate from the region that it is connected to.
Questions are:
What is the fastest algorithm in terms of time complexity?
Using Union Find with rank entails removing a node. Is this the way to do improve the time complexity. If so, is there an implementation of removing a node in union find online?
------------------------ EDIT ---------------------------------
Should we always subtract one from the degree from sum(degree-1 of each 'cut' vertex). Here are two examples the first one where we need to subtract one and the second one where we do not need to subtract one:
Block Cut Tree example 1
Cut vertex is vertex B. Degree of vertex B in the block cut tree is 2.
Sum(cardinality of each 'block' vertex) : 2(A,B) + 1(B) + 3 (B,C,D) = 6
Sum(degree of each 'cut' vertex) : 1 (B)
Block cut size: 6 – 1 = 5 but should be 4 (A. B, C, D, E, F). Here need to subtract one more.
Block Cut Tree Example 2
Sum(cardinality of each 'block' vertex) : 3 (A,B,C) + 1(C) + 1(D) + 3 (D, E, F) = 8
Sum(degree of each 'cut' vertex) : 2 (C and D)
Block cut size: 8 – 2 = 6 which is (A. B, C, D, E, F). Here no need to subtract one.
Without preprocessing:
Flip the cell in the matrix.
Consider the matrix as a graph where each '1' represents a node, and neighbor nodes are connected with an edge.
Find all connected components. For each connected component - store its cardinality.
Return the highest cardinality.
Note that O(V) = O(E) = O(num_row*num_col).
Step 3 takes O(V+E)=O(num_row*num_col), which is similar to your solution.
You are to find an algorithm that will solve this where you will call that function many times. You need to ensure that your amortized run time is the lowest you can achieve.
That hints that you can benefit from preprocessing:
Preprocessing:
Consider the original matrix as a graph G where each '1' represents a node, and neighbor nodes are connected with an edge.
Find all connected components
Construct the set of block-cut trees (section 5.2) of G (also here, here and here) (one block-cut tree for each connected component of G). Construction: see here.
Processing:
If you flip a '0' cell to '1':
Find neighbor connected components (0 to 4)
Remove old block-cut trees, construct a new block-cut tree for the merged component (Optimizations are possible: in some cases, previous tree(s) may be updated instead of reconstructed).
If you flip a '1' cell to '0':
If this cell is a 'cut' in a block-cut tree:
remove it from the block-cut-tree
remove it from each neighbor 'cut' vertex
split the block-cut-tree into several block-cut trees
Otherwise (this cell is part of only one 'block vertex')
remove it from the 'block' vertex; if empty - remove vertex. If block-cut-tree empty - remove it from the set of trees.
The size of a block-cut tree = sum(cardinality of each 'block' vertex) - sum(neighbor_blocks-1 of each 'cut' vertex).
Block-cut trees are not 'well known' as other data structures, so I'm not sure if this is what the interviewer had in mind. If it is - they're really looking for someone well experienced with graph algorithms.

implementing stochastic ACO algorithm

I am trying to implement a stochastic ant colony optimisation algorithm, and I'm having trouble working out how to implement movement choices based on probabilities.
the standard (greedy) version that I have implemented so far is that an ant m at a vertex i on a graph G = (V,E) where E is the set of edges (i, j), will choose the next vertex j based on the following criteria:
j = argmax(<fitness function for j>)
such that j is connected to i
the problem I am having is in trying to implement a stochastic version of this, so that now the criteria for choosing a new vertex, j is:
P(j) = <fitness function for j>/sum(<fitness function for J>)
where P(j) is the probability of choosing vertex j,
such j is connected to i,
and J is the set of all vertices connected to i
I understand the mathematics behind it, I am just having trouble working out how i should actually implement it.
if, say, i have 3 vertices connected to i, each with a probability of 0.2, 0.3, 0.5 - what is the best way to make the selection? should I just randomly select a vertex j, then generate a random number r in the range (0,1) and if r >= P(j), select vertex j? or is there a better way?
Looking at the problem statement, I think you are not trying to visit all nodes (connected to i (say) ), but some of the nodes based on some probability distribution. Lets take an example:
You have a node i and connected to it are 5 nodes, a1...a5, with probabilities p1...p5, such that sum(p_i) = 1. No, say the precision of probabilities that you consider is 2 places after decimal. Also, you dont want to visit all 5 nodes, but only k of them. Lets say, in this example, k = 2. So, since 2 places of decimal is your probability precision, add 3 to it to increase normality of probability distribution in the random function. (You can change this 3 to any number of your choice, as far as performance is concerned) (Since you have not tagged any language, I'll take example of java's nextInt() function to generate random numbers.)
Lets give some values:
p1...p5 = {0.17, 0.11, 0.45, 0.03, 0.24}
Now, in a loop from 1 to k, generate a random number from (0...10^5). {5 = 2 + 3, ie. precision + 3}. If the generated number is from 0 to 16999, go with node a1, 17000 to 27999, go with a2, 28000 to 72999, go with a3...and so on. You get the idea.
What you're trying to implement is a weighted random choice depending on the probabilities for the components of the solution, or a random proportional selection rule on ACO terms. Here is an snippet of the implementation of this rule on the Isula Framework:
double value = random.nextDouble();
while (componentWithProbabilitiesIterator.hasNext()) {
Map.Entry<C, Double> componentWithProbability = componentWithProbabilitiesIterator
.next();
Double probability = componentWithProbability.getValue();
total += probability;
if (total >= value) {
nextNode = componentWithProbability.getKey();
getAnt().visitNode(nextNode);
return true;
}
}
You just need to generate a random value between 0 and 1 (stored in value), and start accumulating the probabilities of the components (on the total variable). When the total exceeds the threshold defined in value, we have found the component to add to the solution.

Pathfinding while forcing unique node attributes -- which algorithm should I use?

Update 2011-12-28: Here's a blog post with a less vague description of the problem I was trying to solve, my work on it, and my current solution: Watching Every MLB Team Play A Game
I'm trying to solve a kind of strange pathfinding challenge. I have an acyclic directional graph, and every edge has a distance value. And I want to find a shortest path. Simple, right? Well, there are a couple of reasons I can't just use Dijkstra's or A*.
I don't care at all what the starting node of my path is, nor the ending node. I just need a path that includes exactly 10 nodes. But:
Each node has an attribute, let's say it's color. Each node has one of 20 different possible colors.
The path I'm trying to find is the shortest path with exactly 10 nodes, where each node is a different color. I don't want any of the nodes in my path to have the same color as any other node.
It'd be nice to be able to force my path to have one value for one of the attributes ("at least one node must be blue", for instance), but that's not really necessary.
This is a simplified example. My full data set actually has three different attributes for each node that must all be unique, and I have 2k+ nodes each with an average of 35 outgoing edges. Since getting a perfect "shortest path" may be exponential or factorial time, an exhaustive search is really not an option. What I'm really looking for is some approximation of a "good path" that meets the criterion under #3.
Can anyone point me towards an algorithm that I might be able to use (even modified)?
Some stats on my full data set:
Total nodes: 2430
Total edges: 86524
Nodes with no incoming edges: 19
Nodes with no outgoing edges: 32
Most outgoing edges: 42
Average edges per node: 35.6 (in each direction)
Due to the nature of the data, I know that the graph is acyclic
And in the full data set, I'm looking for a path of length 15, not 10
It is the case when the question actually contains most of the answer.
Do a breadth-first search starting from all root nodes. When the number of parallelly searched paths exceeds some limit, drop the longest paths. Path length may be weighed: last edges may have weight 10, edges passed 9 hops ago - weight 1. Also it is possible to assign lesser weight to all paths having the preferred attribute or paths going through the weakly connected nodes. Store last 10 nodes in the path to the hash table to avoid duplication. And keep somewhere the minimum sum of the last 9 edge lengths along with the shortest path.
If the number of possible values is low, you can use the Floyd algorithm with a slight modification: for each path you store a bitmap that represents the different values already visited. (In your case the bitmap will be 20 bits wide per path.
Then when you perform the length comparison, you also AND your bitmaps to check whether it's a valid path and if it is, you OR them together and store that as the new bitmap for the path.
Have you tried a straight-forward approach and failed? From your description of the problem, I see no reason a simple greedy algorithm like depth-first search might be just fine:
Pick a start node.
Check the immediate neighbors, are there any nodes that are ok to append to the path? Expand the path with one of them and repeat the process for that node.
If you fail, backtrack to the last successful state and try a new neighbor.
If you run out of neighbors to check, this node cannot be the start node of a path. Try a new one.
If you have 10 nodes, you're done.
Good heuristics for picking a start node is hard to give without any knowledge about how the attributes are distributed, but it is possible that it is beneficial to nodes with high degree first.
It looks like a greedy depth first search will be your best bet. With a reasonable distribution of attribute values, I think finding a single valid sequence is E[O(1)] time, that is expected constant time. I could probably prove that, but it might take some time. The proof would use the assumption that there is a non-zero probability that a valid next segment of the sequence could be found at every step.
The greedy search would backtracking whenever the unique attribute value constraint is violated. The search stops when a 15 segment path is found. If we accept my hunch that each sequence can be found in E[O(1)], then it is a matter of determining how many parallel searches to undertake.
For those who want to experiment, here is a (postgres) sql script to generate some fake data.
SET search_path='tmp';
-- DROP TABLE nodes CASCADE;
CREATE TABLE nodes
( num INTEGER NOT NULL PRIMARY KEY
, color INTEGER
-- Redundant fields to flag {begin,end} of paths
, is_root boolean DEFAULT false
, is_terminal boolean DEFAULT false
);
-- DROP TABLE edges CASCADE;
CREATE TABLE edges
( numfrom INTEGER NOT NULL REFERENCES nodes(num)
, numto INTEGER NOT NULL REFERENCES nodes(num)
, cost INTEGER NOT NULL DEFAULT 0
);
-- Generate some nodes, set color randomly
INSERT INTO nodes (num)
SELECT n
FROM generate_series(1,2430) n
WHERE 1=1
;
UPDATE nodes SET COLOR= 1+TRUNC(20*random() );
-- (partial) cartesian product nodes*nodes. The ordering guarantees a DAG.
INSERT INTO edges(numfrom,numto,cost)
SELECT n1.num ,n2.num, 0
FROM nodes n1 ,nodes n2
WHERE n1.num < n2.num
AND random() < 0.029
;
UPDATE edges SET cost = 1+ 1000 * random();
ALTER TABLE edges
ADD PRIMARY KEY (numfrom,numto)
;
ALTER TABLE edges
ADD UNIQUE (numto,numfrom)
;
UPDATE nodes no SET is_root = true
WHERE NOT EXISTS (
SELECT * FROM edges ed
WHERE ed.numfrom = no.num
);
UPDATE nodes no SET is_terminal = true
WHERE NOT EXISTS (
SELECT * FROM edges ed
WHERE ed.numto = no.num
);
SELECT COUNT(*) AS nnode FROM nodes;
SELECT COUNT(*) AS nedge FROM edges;
SELECT color, COUNT(*) AS cnt FROM nodes GROUP BY color ORDER BY color;
SELECT COUNT(*) AS nterm FROM nodes no WHERE is_terminal = true;
SELECT COUNT(*) AS nroot FROM nodes no WHERE is_root = true;
WITH zzz AS (
SELECT numto, COUNT(*) AS fanin
FROM edges
GROUP BY numto
)
SELECT zzz.fanin , COUNT(*) AS cnt
FROM zzz
GROUP BY zzz.fanin
ORDER BY zzz.fanin
;
WITH zzz AS (
SELECT numfrom, COUNT(*) AS fanout
FROM edges
GROUP BY numfrom
)
SELECT zzz.fanout , COUNT(*) AS cnt
FROM zzz
GROUP BY zzz.fanout
ORDER BY zzz.fanout
;
COPY nodes(num,color,is_root,is_terminal)
TO '/tmp/nodes.dmp';
COPY edges(numfrom,numto, cost)
TO '/tmp/edges.dmp';
The problem may be solving by dynamic programming as follows. Let's start by formally defining its solution.
Given a DAG G = (V, E), let C the be set of colors of vertices visited so far and let w[i, j] and c[i] be respectively the weight (distance) associated to edge (i, j) and the color of a vertex i. Note that w[i, j] is zero if the edge (i, j) does not belong to E.
Now define the distance d for going from vertex i to vertex j taking into account C as
d[i, j, C] = w[i, j] if i is not equal to j and c[j] does not belong to C
= 0 if i = j
= infinite if i is not equal to j and c[j] belongs to C
We are now ready to define our subproblems as follows:
A[i, j, k, C] = shortest path from i to j that uses exactly k edges and respects the colors in C so that no two vertices in the path are colored using the same color (one of the colors in C)
Let m be the maximum number of edges permitted in the path and assume that the vertices are labeled 1, 2, ..., n. Let P[i,j,k] be the predecessor vertex of j in the shortest path satisfying the constraints from i to j. The following algorithm solves the problem.
for k = 1 to m
for i = 1 to n
for j = 1 to n
A[i,j,k,C] = min over x belonging to V {d[i,x,C] + A[x,j,k-1,C union c[x]]}
P[i,j,k] = the vertex x that minimized A[i,j,k,C] in the previous statement
Set the initial conditions as follows:
A[i,j,k,C] = 0 for k = 0
A[i,j,k,C] = 0 if i is equal to j
A[i,j,k,C] = infinite in all of the other cases
The overall computational complexity of the algorithm is O(m n^3); taking into account that in your particular case m = 14 (since you want exactly 15 nodes), it follows that m = O(1) so that the complexity actually is O(n^3). To represent the set C use an hash table so that insertion and membership testing require O(1) on average. Note that in the algorithm the operation C union c[x] is actually an insert operation in which you add the color of vertex x into the hash table for C. However, since you are inserting just an element, the set union operation leads to exactly the same result (if the color is not in the set, it is added; otherwise, it is simply discarded and the set does not change). Finally, to represent the DAG, use the adjacency matrix.
Once the algorithm is done, to find the minimum shortest path among all possible vertices i and j, simply find the minimum among the values A[i,j,m,C]. Note that if this value is infinite, then no valid shortest path exists. If a valid shortest path exists, then you can actually determine it by using the P[i,j,k] values and tracing backwards through predecessor vertices. For instance, starting from a = P[i,j,m] the last edge on the shortest path is (a,j), the previous edge is given by b = P[i,a,m-1] and its is (b,a) and so on.

Finding the path with the maximum minimal weight

I'm trying to work out an algorithm for finding a path across a directed graph. It's not a conventional path and I can't find any references to anything like this being done already.
I want to find the path which has the maximum minimum weight.
I.e. If there are two paths with weights 10->1->10 and 2->2->2 then the second path is considered better than the first because the minimum weight (2) is greater than the minimum weight of the first (1).
If anyone can work out a way to do this, or just point me in the direction of some reference material it would be incredibly useful :)
EDIT:: It seems I forgot to mention that I'm trying to get from a specific vertex to another specific vertex. Quite important point there :/
EDIT2:: As someone below pointed out, I should highlight that edge weights are non negative.
I am copying this answer and adding also adding my proof of correctness for the algorithm:
I would use some variant of Dijkstra's. I took the pseudo code below directly from Wikipedia and only changed 5 small things:
Renamed dist to width (from line 3 on)
Initialized each width to -infinity (line 3)
Initialized the width of the source to infinity (line 8)
Set the finish criterion to -infinity (line 14)
Modified the update function and sign (line 20 + 21)
1 function Dijkstra(Graph, source):
2 for each vertex v in Graph: // Initializations
3 width[v] := -infinity ; // Unknown width function from
4 // source to v
5 previous[v] := undefined ; // Previous node in optimal path
6 end for // from source
7
8 width[source] := infinity ; // Width from source to source
9 Q := the set of all nodes in Graph ; // All nodes in the graph are
10 // unoptimized – thus are in Q
11 while Q is not empty: // The main loop
12 u := vertex in Q with largest width in width[] ; // Source node in first case
13 remove u from Q ;
14 if width[u] = -infinity:
15 break ; // all remaining vertices are
16 end if // inaccessible from source
17
18 for each neighbor v of u: // where v has not yet been
19 // removed from Q.
20 alt := max(width[v], min(width[u], width_between(u, v))) ;
21 if alt > width[v]: // Relax (u,v,a)
22 width[v] := alt ;
23 previous[v] := u ;
24 decrease-key v in Q; // Reorder v in the Queue
25 end if
26 end for
27 end while
28 return width;
29 endfunction
Some (handwaving) explanation why this works: you start with the source. From there, you have infinite capacity to itself. Now you check all neighbors of the source. Assume the edges don't all have the same capacity (in your example, say (s, a) = 300). Then, there is no better way to reach b then via (s, b), so you know the best case capacity of b. You continue going to the best neighbors of the known set of vertices, until you reach all vertices.
Proof of correctness of algorithm:
At any point in the algorithm, there will be 2 sets of vertices A and B. The vertices in A will be the vertices to which the correct maximum minimum capacity path has been found. And set B has vertices to which we haven't found the answer.
Inductive Hypothesis: At any step, all vertices in set A have the correct values of maximum minimum capacity path to them. ie., all previous iterations are correct.
Correctness of base case: When the set A has the vertex S only. Then the value to S is infinity, which is correct.
In current iteration, we set
val[W] = max(val[W], min(val[V], width_between(V-W)))
Inductive step: Suppose, W is the vertex in set B with the largest val[W]. And W is dequeued from the queue and W has been set the answer val[W].
Now, we need to show that every other S-W path has a width <= val[W]. This will be always true because all other ways of reaching W will go through some other vertex (call it X) in the set B.
And for all other vertices X in set B, val[X] <= val[W]
Thus any other path to W will be constrained by val[X], which is never greater than val[W].
Thus the current estimate of val[W] is optimum and hence algorithm computes the correct values for all the vertices.
You could also use the "binary search on the answer" paradigm. That is, do a binary search on the weights, testing for each weight w whether you can find a path in the graph using only edges of weight greater than w.
The largest w for which you can (found through binary search) gives the answer. Note that you only need to check if a path exists, so just an O(|E|) breadth-first/depth-first search, not a shortest-path. So it's O(|E|*log(max W)) in all, comparable to the Dijkstra/Kruskal/Prim's O(|E|log |V|) (and I can't immediately see a proof of those, too).
Use either Prim's or Kruskal's algorithm. Just modify them so they stop when they find out that the vertices you ask about are connected.
EDIT: You ask for maximum minimum, but your example looks like you want minimum maximum. In case of maximum minimum Kruskal's algorithm won't work.
EDIT: The example is okay, my mistake. Only Prim's algorithm will work then.
I am not sure that Prim will work here. Take this counterexample:
V = {1, 2, 3, 4}
E = {(1, 2), (2, 3), (1, 4), (4, 2)}
weight function w:
w((1,2)) = .1,
w((2,3)) = .3
w((1,4)) = .2
w((4,2)) = .25
If you apply Prim to find the maxmin path from 1 to 3, starting from 1 will select the 1 --> 2 --> 3 path, while the max-min distance is attained for the path that goes through 4.
This can be solved using a BFS style algorithm, however you need two variations:
Instead of marking each node as "visited", you mark it with the minimum weight along the path you took to reach it.
For example, if I and J are neighbors, I has value w1, and the weight of the edge between them is w2, then J=min(w1, w2).
If you reach a marked node with value w1, you might need to remark and process it again, if assigning a new value w2 (and w2>w1). This is required to make sure you get the maximum of all minimums.
For example, if I and J are neighbors, I has value w1, J has value w2, and the weight of the edge between them is w3, then if min(w2, w3) > w1 you must remark J and process all it's neighbors again.
Ok, answering my own question here just to try and get a bit of feedback I had on the tentative solution I worked out before posting here:
Each node stores a "path fragment", this is the entire path to itself so far.
0) set current vertex to the starting vertex
1) Generate all path fragments from this vertex and add them to a priority queue
2) Take the fragment off the top off the priority queue, and set the current vertex to the ending vertex of that path
3) If the current vertex is the target vertex, then return the path
4) goto 1
I'm not sure this will find the best path though, I think the exit condition in step three is a little ambitious. I can't think of a better exit condition though, since this algorithm doesn't close vertices (a vertex can be referenced in as many path fragments as it likes) you can't just wait until all vertices are closed (like Dijkstra's for example)
You can still use Dijkstra's!
Instead of using +, use the min() operator.
In addition, you'll want to orient the heap/priority_queue so that the biggest things are on top.
Something like this should work: (i've probably missed some implementation details)
let pq = priority queue of <node, minimum edge>, sorted by min. edge descending
push (start, infinity) on queue
mark start as visited
while !queue.empty:
current = pq.top()
pq.pop()
for all neighbors of current.node:
if neighbor has not been visited
pq.decrease_key(neighbor, min(current.weight, edge.weight))
It is guaranteed that whenever you get to a node you followed an optimal path (since you find all possibilities in decreasing order, and you can never improve your path by adding an edge)
The time bounds are the same as Dijkstra's - O(Vlog(E)).
EDIT: oh wait, this is basically what you posted. LOL.

Algorithm to establish ordering amongst a set of items

I have a set of students (referred to as items in the title for generality). Amongst these students, some have a reputation for being rambunctious. We are told about a set of hate relationships of the form 'i hates j'. 'i hates j' does not imply 'j hates i'. We are supposed to arrange the students in rows (front most row numbered 1) in a way such that if 'i hates j' then i should be put in a row that is strictly lesser numbered than that of j (in other words: in some row that is in front of j's row) so that i doesn't throw anything at j (Turning back is not allowed). What would be an efficient algorithm to find the minimum number of rows needed (each row need not have the same number of students)?
We will make the following assumptions:
1) If we model this as a directed graph, there are no cycles in the graph. The most basic cycle would be: if 'i hates j' is true, 'j hates i' is false. Because otherwise, I think the ordering would become impossible.
2) Every student in the group is at least hated by one other student OR at least hates one other student. Of course, there would be students who are both hated by some and who in turn hate other students. This means that there are no stray students who don't form part of the graph.
Update: I have already thought of constructing a directed graph with i --> j if 'i hates j and doing topological sorting. However, since the general topological sort would suit better if I had to line all the students in a single line. Since there is a variation of the rows here, I am trying to figure out how to factor in the change into topological sort so it gives me what I want.
When you answer, please state the complexity of your solution. If anybody is giving code and you don't mind the language, then I'd prefer Java but of course any other language is just as fine.
JFYI This is not for any kind of homework (I am not a student btw :)).
It sounds to me that you need to investigate topological sorting.
This problem is basically another way to put the longest path in a directed graph problem. The number of rows is actually number of nodes in path (number of edges + 1).
Assuming the graph is acyclic, the solution is topological sort.
Acyclic is a bit stronger the your assumption 1. Not only A -> B and B -> A is invalid. Also A -> B, B -> C, C -> A and any cycle of any length.
HINT: the question is how many rows are needed, not which student in which row. The answer to the question is the length of the longest path.
It's from a project management theory (or scheduling theory, I don't know the exact term). There the task is about sorting jobs (vertex is a job, arc is a job order relationship).
Obviously we have some connected oriented graph without loops. There is an arc from vertex a to vertex b if and only if a hates b. Let's assume there is a source (without incoming arcs) and destination (without outgoing arcs) vertex. If that is not the case, just add imaginary ones. Now we want to find length of a longest path from source to destination (it will be number of rows - 1, but mind the imaginary verteces).
We will define vertex rank (r[v]) as number of arcs in a longest path between source and this vertex v. Obviously we want to know r[destination]. Algorithm for finding rank:
0) r_0[v] := 0 for all verteces v
repeat
t) r_t[end(j)] := max( r_{t-1}[end(j)], r_{t-1}[start(j)] + 1 ) for all arcs j
until for all arcs j r_{t+1}[end(j)] = r_t[end(j)] // i.e. no changes on this iteration
On each step at least one vertex increases its rank. Therefore in this form complexity is O(n^3).
By the way, this algorithm also gives you student distribution among rows. Just group students by their respective ranks.
Edit: Another code with the same idea. Possibly it is better understandable.
# Python
# V is a list of vertex indices, let it be something like V = range(N)
# source has index 0, destination has index N-1
# E is a list of edges, i.e. tuples of the form (start vertex, end vertex)
R = [0] * len(V)
do:
changes = False
for e in E:
if R[e[1]] < R[e[0]] + 1:
changes = True
R[e[1]] = R[e[0]] + 1
while changes
# The answer is derived from value of R[N-1]
Of course this is the simplest implementation. It can be optimized, and time estimate can be better.
Edit2: obvious optimization - update only verteces adjacent to those that were updated on the previous step. I.e. introduce a queue with verteces whose rank was updated. Also for edge storing one should use adjacency lists. With such optimization complexity would be O(N^2). Indeed, each vertex may appear in the queue at most rank times. But vertex rank never exceeds N - number of verteces. Therefore total number of algorithm steps will not exceed O(N^2).
Essentailly the important thing in assumption #1 is that there must not be any cycles in this graph. If there are any cycles you can't solve this problem.
I would start by seating all of the students that do not hate any other students in the back row. Then you can seat the students who hate these students in the next row and etc.
The number of rows is the length of the longest path in the directed graph, plus one. As a limit case, if there is no hate relationship everyone can fit on the same row.
To allocate the rows, put everyone who is not hated by anyone else on the row one. These are the "roots" of your graph. Everyone else is put on row N + 1 if N is the length of the longest path from any of the roots to that person (this path is of length one at least).
A simple O(N^3) algorithm is the following:
S = set of students
for s in S: s.row = -1 # initialize row field
rownum = 0 # start from first row below
flag = true # when to finish
while (flag):
rownum = rownum + 1 # proceed to next row
flag = false
for s in S:
if (s.row != -1) continue # already allocated
ok = true
foreach q in S:
# Check if there is student q who will sit
# on this or later row who hates s
if ((q.row == -1 or q.row = rownum)
and s hated by q) ok = false; break
if (ok): # can put s here
s.row = rownum
flag = true
Simple answer = 1 row.
Put all students in the same row.
Actually that might not solve the question as stated - lesser row, rather than equal row...
Put all students in row 1
For each hate relation, put the not-hating student in a row behind the hating student
Iterate till you have no activity, or iterate Num(relation) times.
But I'm sure there are better algorithms - look at acyclic graphs.
Construct a relationship graph where i hates j will have a directed edge from i to j. So end result is a directed graph. It should be a DAG otherwise no solutions as it's not possible to resolve circular hate relations ship.
Now simply do a DFS search and during the post node callbacks, means the once the DFS of all the children are done and before returning from the DFS call to this node, simply check the row number of all the children and assign the row number of this node as row max row of the child + 1. Incase if there is some one who doesn't hate anyone basically node with no adjacency list simply assign him row 0.
Once all the nodes are processed reverse the row numbers. This should be easy as this is just about finding the max and assigning the row numbers as max-already assigned row numbers.
Here is the sample code.
postNodeCb( graph g, int node )
{
if ( /* No adj list */ )
row[ node ] = 0;
else
row[ node ] = max( row number of all children ) + 1;
}
main()
{
.
.
for ( int i = 0; i < NUM_VER; i++ )
if ( !visited[ i ] )
graphTraverseDfs( g, i );`enter code here`
.
.
}

Resources