What is the diameter of a graph with just one node? - algorithm

I'm trying to find an answer to a problem in my Distributed Algorithms course, and to do so I want to get something clarified.
What is the diameter of a graph with one node, with an edge to itself? Is it 1 or 0?
If you are interested, the question to which I'm trying to find an answer is this:
In terms of n (# nodes), the number of messages (= diam * |E|) used in
the FloodMax algorithm is easily seen to be O(n^3). Produce a class of
digraphs in which the product (diam * |E|) really is Omega(n^3).
The digraph I came up with is a graph with just one node, which has a directed edge to itself. That way |E| would be 1 which is n^2, and if the diam is 1, it satisfies the second condition where diam = 1 = n as well. So it gives me a class of digraphs with message complexity being Omega(n^3).
So am I correct in my thinking, that in such a graph the diameter is 1?

Two things:
It seems to be 0 according to this, which says:
In other words, a graph's diameter is the largest number of vertices which must be traversed in order to travel from one vertex to another when paths which backtrack, detour, or loop are excluded from consideration.
Your solution to the given problem should describe how to build a graph (or rather say what type of known graph has that property, since it says "produce a class") with n nodes, not a graph with however many nodes you manually figured out a solution for. I can do the same for 2 nodes:
1 -- 2
|E| = 1 = (1/4)*2^2 = (1/4)*n^2 = O(n^2)
diam = 1 = 2 - 1 = n - 1 = O(n)
Or here's how we can make your solution work even if the diameter is 0: 0 = 1 - 1 = n - 1 = O(n) => your solution still works!
So even if you considered paths with loops as well, I would still deem your solution incorrect.

O(n^3) and Omega(n^3) do not mean cn^3, and there is no problem with a function that is 0 at finitely many nonzero values of n being in O(n^3) and Omega(n^3). For example, n^3-100 is in both, as is n^3-100n^2. For the purposes of asymptotics, it is unimportant what the diameter is for a single example. You are asked to find an infinite family of graphs with large enough diameters, and a single example of a graph doesn't affect the asymptotics of the infinite family.
That said, the diameter of a graph (or strongly connected digraph) can be defined in a few ways. One possibility is the greatest value of half of the minimum of the length of a round trip from v to w and back over all pairs v and w, and that is 0 when v and w coincide. So, the diameter of a graph with one vertex is 0.
Again, this doesn't help at all with the exercise that you have, to construct an infinite family. A family with one node and lots of edges back to itself isn't going to cut it. Think of how you might add many edges to graphs with large diameter, such as an n-cycle or path, without decreasing the diameter that much.
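To experiment with candidate families, here is a minimal sketch that computes the diameter of a strongly connected digraph by running a BFS from every node. The names `eccentricity` and `diameter` and the adjacency-dict representation are my own choices for illustration; it uses the shortest-path definition, under which a self-loop never shortens or lengthens any distance.

```python
from collections import deque

def eccentricity(adj, source):
    """Largest shortest-path distance from source, or None if a node is unreachable."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    if len(dist) < len(adj):
        return None
    return max(dist.values())

def diameter(adj):
    """Maximum eccentricity over all nodes, or None if not strongly connected."""
    eccs = [eccentricity(adj, s) for s in adj]
    if any(e is None for e in eccs):
        return None
    return max(eccs)

single = {0: [0]}                             # one node with a self-loop
cycle = {i: [(i + 1) % 5] for i in range(5)}  # directed 5-cycle
print(diameter(single))  # 0
print(diameter(cycle))   # 4, i.e. n - 1
```

Multiplying `diameter(adj)` by the edge count for growing n gives a quick sanity check of whether a family really grows like n^3.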


Minimum number of flips to get adjacent 1's in a matrix

Given a binary matrix (values of 0 or 1), adjacent entries of 1 denote “hills”. Also, given some number k, find the minimum number of 0's you need to “flip” to 1 in order to form a hill of at least size k.
Edit: For clarification, adjacent means left-right-up-down neighborhoods. Diagonals do not count as adjacent. For example,
[0 1
0 1]
is one hill of size 2,
[0 1
1 0]
defines 2 hills of size 1,
[0 1
1 1]
defines 1 hill of size 3, and
[1 1
1 1]
defines 1 hill of size 4.
Also for clarification, size is defined by the area formed by the adjacent blob of 1's.
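To make the definition concrete, here is a small flood-fill sketch that lists the sizes of all hills in a matrix, using 4-neighbour adjacency only. The function name `hill_sizes` is just for illustration.

```python
def hill_sizes(matrix):
    """Return the sizes of all 4-connected blobs of 1's, in row-major discovery order."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if matrix[r][c] != 1 or seen[r][c]:
                continue
            # Iterative flood fill starting from an unvisited 1.
            seen[r][c] = True
            stack = [(r, c)]
            size = 0
            while stack:
                y, x = stack.pop()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and matrix[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            sizes.append(size)
    return sizes

print(hill_sizes([[0, 1], [0, 1]]))  # [2]
print(hill_sizes([[0, 1], [1, 0]]))  # [1, 1]
print(hill_sizes([[0, 1], [1, 1]]))  # [3]
print(hill_sizes([[1, 1], [1, 1]]))  # [4]
```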
My initial solution has to do with transforming each existing hill into nodes of a graph, and the cost to be the minimal path to each other node. Then, performing a DFS (or similar algorithm) to find the minimum cost.
This fails in cases where choosing some path reduces the cost for another edge, and solutions to combat this (that I can think of) are too close to a brute force solution.
Your problem is closely related to the rectilinear Steiner tree problem.
A Steiner tree connects a set of points together using line segments, minimising the total length of the line segments. The line segments can meet in arbitrary places, not necessarily at points in the set (so it is not the same thing as a minimum spanning tree). For example, given three points at the corners of an equilateral triangle, the Euclidean Steiner tree connects them by meeting in the middle.
A rectilinear Steiner tree is the same, except you minimise the total Manhattan distance instead of the total Euclidean distance.
In your problem, instead of joining your hills with line segments whose length is measured by Euclidean distance, you are joining your hills by adding pixels. The total number of 0s you need to flip to join two cells in your array is equal to the Manhattan distance between those two cells, minus 1.
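As a tiny illustration of the "Manhattan distance minus 1" cost, here is a sketch for two cells that are already 1. The helper name `flips_to_join` is hypothetical, and it assumes the cells on the connecting path are all 0.

```python
def flips_to_join(a, b):
    """Number of 0 cells to flip so that cells a and b (both already 1) become connected."""
    (r1, c1), (r2, c2) = a, b
    manhattan = abs(r1 - r2) + abs(c1 - c2)
    # Adjacent or identical cells need no flips at all.
    return max(manhattan - 1, 0)

print(flips_to_join((0, 0), (0, 1)))  # 0
print(flips_to_join((0, 0), (1, 1)))  # 1
print(flips_to_join((0, 0), (2, 3)))  # 4
```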
The rectilinear Steiner tree problem is known to be NP-complete, even when restricted to points with integer coordinates. Your problem is a generalisation, except for two differences:
The "minus 1" part when measuring the Manhattan distance. I doubt that this subtle difference is enough to bring the problem into a lower complexity class, though I don't have a proof for you.
The coordinates of your integer points are bounded by the size of the matrix (as pointed out by Albert Hendriks in the comments). This does matter — it means that pseudo-polynomial time for the rectilinear Steiner tree problem would be polynomial time for your problem.
This means that your problem may or may not be NP-hard, depending on whether the rectilinear Steiner tree problem is weakly NP-complete or strongly NP-complete. I wasn't able to find a definitive answer to this in the literature, and there isn't much information about the problem other than in academic literature. It does at least appear that there isn't a known pseudo-polynomial time algorithm, as far as I can tell.
Given that, your most likely options are some kind of backtracking search for an exact solution, or applying a heuristic to get a "good enough" solution. One possible heuristic as described by Wikipedia is to compute a rectilinear minimum spanning tree and then try to improve on the RMST using an iterative improvement method. The RMST itself gives a solution within a constant factor of 1.5 of the true optimum.
A hill is composed of four sequences of 1's extending from a central cell:
The right sequence is composed of r 'bits', the up sequence has u bits, and so on.
A hill of size k is k= 1 + r + l + u + d (1 central + sequences), where each value is 0 <= v < k.
The problem is combinatorial. For each cell all possible combinations of {r,l,u,d} that satisfy the former relation should be tested.
When testing a combination in a cell, you must count the existing 1's covered by each value of the combination, since they do not need to be flipped. This also lets you skip some other combinations early.

Finding the cycle of smallest average weight in a directed graph

I'm looking for an algorithm that takes a directed, weighted graph (with positive integer weights) and finds the cycle in the graph with the smallest average weight (as opposed to total weight).
Based on similar questions (but for total weight), I considered applying a modification of the Floyd–Warshall algorithm, but it would rely on the following property, which does not hold (thank you Ron Teller for providing a counterexample to show this): "For vertices U,V,W, if there are paths p1,p2 from U to V, and paths p3,p4 from V to W, then the optimal combination of these paths to get from U to W is the better of p1,p2 followed by the better of p3,p4."
What other algorithms might I consider that don't rely on this property?
Edit: Moved the following paragraph, which is no longer relevant, below the question.
While this property seems intuitive, it doesn't seem to hold in the case of two paths that are equally desirable. For example, if p1 has total weight 2 and length 2, and p2 has total weight 3 and length 3, neither one is better than the other. However, if p3 and p4 have greater total weights than lengths, p2 is preferable to p1. In my desired application, weights of each edge are positive integers, so this property is enforced and I think I can assume that, in the case of a tie, longer paths are better. However, I still can't prove that this works, so I can't verify the correctness of any algorithm relying on it.
"While this property seems intuitive, it doesn't seem to hold in the
case of two paths that are equally desirable."
Actually, when you are considering two parameters (weight, length), it doesn't hold in any case. Here is an example where P1, which on its own has a smaller average than P2, can be either better (example 1) or worse (example 2) for the final solution, depending on P3 and P4.
Example 1:
L(P1) = 9, W(P1) = 10
L(P2) = 1, W(P2) = 1
L(P3) = 1, W(P3) = 1
L(P4) = 1, W(P4) = 1
Example 2:
L(P1) = 9, W(P1) = 10
L(P2) = 1, W(P2) = 1
L(P3) = 5, W(P3) = 10
L(P4) = 5, W(P4) = 10
These two parameters have an effect on your objective function that cannot be determined locally, thus the Floyd–Warshall algorithm with any modification won't work.
Since you are considering cycles only, you might want to consider a brute-force algorithm that computes the average weight of each cycle in the graph. Note that a graph can contain exponentially many simple cycles, so this is only practical for small graphs. See:
Finding all cycles in graph
I can suggest another algorithm.
Let's fix a value C and subtract C from all weights. How does the answer change? Since the same number is subtracted from every weight, the average weight of each cycle decreases by exactly C. Now check whether there is a cycle with negative average weight. A cycle has negative average weight if and only if it has negative total weight, so it is enough to check whether there is a negative-weight cycle, which can be done with the Bellman-Ford algorithm. If there is such a cycle, then the answer is less than C.
So now we can find the answer via binary search. The resulting complexity will be O(VE log(MaxWeight)).
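A rough sketch of that idea, under some assumptions of mine: vertices are numbered 0..n-1, edges are (u, v, w) triples, the bisection runs a fixed number of iterations on floats, and a small tolerance `eps` guards against rounding noise. The function names are illustrative only.

```python
def has_negative_cycle(n, edges, shift, eps=1e-12):
    """Bellman-Ford check for a negative cycle after subtracting shift from every weight."""
    # Starting all distances at 0 acts like a virtual source connected to every vertex.
    dist = [0.0] * n
    for _ in range(n):
        changed = False
        for u, v, w in edges:
            candidate = dist[u] + (w - shift)
            if candidate < dist[v] - eps:
                dist[v] = candidate
                changed = True
        if not changed:
            return False
    return True  # still relaxing after n passes, so a negative cycle exists

def min_mean_cycle(n, edges, iterations=60):
    """Approximate the smallest average cycle weight, or return None if there is no cycle."""
    if not edges:
        return None
    lo = min(w for _, _, w in edges)
    hi = max(w for _, _, w in edges) + 1
    # With shift = max weight + 1 every edge is negative, so any cycle is negative.
    if not has_negative_cycle(n, edges, hi):
        return None
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if has_negative_cycle(n, edges, mid):
            hi = mid  # some cycle has mean below mid
        else:
            lo = mid
    return hi

edges = [(0, 1, 4), (1, 0, 4), (1, 2, 1), (2, 3, 2), (3, 1, 3)]
print(round(min_mean_cycle(4, edges), 6))  # 2.0
```

The result is only approximate. If an exact rational answer is needed, the search can be stopped once the interval is small enough to pin down a single fraction with denominator at most V.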
The problem you describe is called the minimum mean cycle problem and can be solved efficiently. Further, there's some really nice optimization theory to check out if you're interested (start with the standard reference AMO93).

Breadth first search branching factor

The run time of BFS is O(b^d)
b is the branching factor
d is the depth (number of levels) of the graph from the starting node.
I googled for a while, but I still don't see anyone mention how they figure out this "b".
So I know the branching factor means the number of children that each node has.
E.g., the branching factor for a binary tree is 2.
So for BFS on a graph, is b the average of the branching factors of all nodes in the graph,
or is b the maximum branching factor over all nodes?
Also, no matter which way we pick b, it still seems ambiguous how to arrive at the run time.
For example, suppose our graph has 30000 nodes, only 5 nodes have a branching factor of 10000, all the remaining 29995 nodes have a branching factor of just 10, and the depth is set to 100.
O(b^d) does not seem to make sense in this case.
Can someone explain a little bit? Thank you!
The runtime that is more often quoted is that BFS is O(m + n) where m is the number of edges and n the number of nodes. This is because each vertex is processed once and each edge at most twice.
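To see where the O(m + n) bound comes from, here is a small sketch that counts how many vertices BFS dequeues and how many adjacency entries it scans. The name `bfs_work` and the adjacency-dict format are assumptions for illustration.

```python
from collections import deque

def bfs_work(adj, start):
    """Run BFS from start; return (vertices dequeued, adjacency entries scanned)."""
    visited = {start}
    queue = deque([start])
    vertices_processed = 0
    edge_scans = 0
    while queue:
        u = queue.popleft()
        vertices_processed += 1
        for v in adj[u]:
            edge_scans += 1  # each undirected edge shows up in two adjacency lists
            if v not in visited:
                visited.add(v)
                queue.append(v)
    return vertices_processed, edge_scans

# Undirected path 0 - 1 - 2 - 3: n = 4 vertices, m = 3 edges.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(bfs_work(path, 0))  # (4, 6)
```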
I think O(b^d) is used when using BFS on, say, brute-forcing a game of chess, where each position has a relatively constant branching factor and your engine needs to search a certain number of positions deep. For example, b is about 35 for chess and Deep Blue had a search depth of 6-8 (going up to 20).
In such cases, because the graph is relatively acyclic, b^d is roughly the same as m + n (they are equal for trees). O(b^d) is more useful as b is fixed and d is something you control.
For graphs, the b in O(b^d) is the maximum branching factor, since the bound describes the worst case. See the time complexity section of this page from Princeton: http://www.princeton.edu/~achaney/tmve/wiki100k/docs/Breadth-first_search.html
To quote from Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig:
Time and space complexity are always considered with respect to some measure of the problem difficulty. In theoretical computer science, the typical measure is the size of the state space graph, |V| + |E|, where V is the set of vertices (nodes) of the graph and E is the set of edges (links). This is appropriate when the graph is an explicit data structure that is input to the search program. (The map of Romania is an example of this.) In AI, the graph is often represented implicitly by the initial state, actions, and transition model and is frequently infinite. For these reasons, complexity is expressed in terms of three quantities: b, the branching factor or maximum number of successors of any node; d, the depth of the shallowest goal node (i.e., the number of steps along the path from the root); and m, the maximum length of any path in the state space. Time is often measured in terms of the number of nodes generated during the search, and space in terms of the maximum number of nodes stored in memory. For the most part, we describe time and space complexity for search on a tree; for a graph, the answer depends on how “redundant” the paths in the state space are.
This should give you a clear insight into the difference between O(|V|+|E|) and O(b^d).
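For the implicit-tree view, a quick sketch shows how the number of generated nodes grows with b and d. It assumes a uniform tree in which every node has exactly b successors, which is an idealisation; `nodes_generated` is a name made up for this example.

```python
from collections import deque

def nodes_generated(b, d):
    """BFS over an implicit uniform tree; count nodes generated down to depth d."""
    generated = 1          # the root
    queue = deque([0])     # only depths are stored, since the tree is implicit
    while queue:
        depth = queue.popleft()
        if depth == d:
            continue       # do not expand below the target depth
        for _ in range(b):
            generated += 1
            queue.append(depth + 1)
    return generated

print(nodes_generated(2, 3))   # 15 = 1 + 2 + 4 + 8
print(nodes_generated(10, 3))  # 1111
```

The count is 1 + b + b^2 + ... + b^d, which is dominated by the last term, hence O(b^d).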

Find sub-array of objects with maximum distance between elements

Given an array of objects [a, b, c, d, ...] and a function distance(x, y) that gives a numeric value showing how 'different' objects x and y are,
I'm looking for an algorithm to find the subset of the array of length n that maximizes the minimum distance between the elements of that subset.
Of course, I can't simply sort the array by the minimum of the differences with other elements and take the n highest entries, since removing an element can very well change the distances. For instance, if a=b, then removing a means the minimum distance of b with another element will change dramatically.
So far, the only solutions I could find were either to iteratively remove the element with the lowest minimum distance, recalculating the distances at each iteration until only n elements are left, or, vice versa, to iteratively pick new elements, recalculate the distances, and add the new pick or replace an existing one based on the minimum distances.
Does anybody know how I could get the same results without those iterations?
PS: here is an example, the matrix shows the 'distance' between each element...
a b c d
a - 1 3 2
b 1 - 4 2
c 3 4 - 5
d 2 2 5 -
If we kept only 2 elements, that would be c and d; if we kept 3, that would be a or b, plus c and d.
This problem is NP-hard, so no-one knows an efficient (polynomial time) algorithm for solving it.
Here's a quick sketch that it is NP-hard, by reduction from CLIQUE.
Suppose we have an instance of CLIQUE in the form of a graph G and a number n and we want to know whether there is a clique of size n in G. Construct a distance matrix d such that d(i, j) = 1 if vertices i and j are connected in G, or 0 if they are not. Then find a subset of the vertices of G of size n that maximizes the minimum distance between elements (your problem). If the minimum distance between vertices in this subset is 1, then G has a clique of size n; otherwise it does not.
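Given the NP-hardness, an exact answer for small inputs can still be found by exhaustive search. Here is a minimal brute-force sketch, exponential in the worst case, run on the distance matrix from the question. The name `best_subset` and the dictionary encoding of the matrix are my own choices, and it assumes n >= 2.

```python
from itertools import combinations

def best_subset(items, distance, n):
    """Return the size-n subset whose smallest pairwise distance is largest (brute force)."""
    def min_pairwise(subset):
        return min(distance(x, y) for x, y in combinations(subset, 2))
    # max() keeps the first maximal subset in lexicographic order of positions.
    return max(combinations(items, n), key=min_pairwise)

# Symmetric distances from the example matrix in the question.
D = {
    frozenset("ab"): 1, frozenset("ac"): 3, frozenset("ad"): 2,
    frozenset("bc"): 4, frozenset("bd"): 2, frozenset("cd"): 5,
}

def distance(x, y):
    return D[frozenset((x, y))]

print(best_subset("abcd", distance, 2))  # ('c', 'd')
print(best_subset("abcd", distance, 3))  # ('a', 'c', 'd'), tied with ('b', 'c', 'd')
```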
As Gareth said, this is an NP-hard problem. However, there has been a lot of research into solving these kinds of problems, and as such better methods than brute force have been found. Unfortunately, this is such a large area that you could spend forever looking at the possible implementations of a solution.
However, if you are interested in a heuristic way of solving this, I would suggest looking into Ant Colony Optimization (ACO), which has proven fairly effective at finding optimum paths within graphs.

Minimum number of days required to solve a list of questions

There are N problems numbered 1..N which you need to complete. You've arranged the problems in increasing difficulty order, and the ith problem has estimated difficulty level i. You have also assigned a rating vi to each problem. Problems with similar vi values are similar in nature. On each day, you will choose a subset of the problems and solve them. You've decided that each subsequent problem solved on the day should be tougher than the previous problem you solved on that day. Also, to not make it boring, consecutive problems you solve should differ in their vi rating by at least K. What is the least number of days in which you can solve all problems?
The first line contains the number of test cases T. T test cases follow. Each case contains an integer N and K on the first line, followed by integers v1,...,vn on the second line.
Output T lines, one for each test case, containing the minimum number of days in which all problems can be solved.
1 <= T <= 100
1 <= N <= 300
1 <= vi <= 1000
1 <= K <= 1000
Sample Input:
3 2
5 4 7
5 1
5 3 4 5 6
Sample Output:
This is one of the challenges from interviewstreet.
Below is my approach:
Start from the first question, find the maximum possible number of questions that can be solved, and remove these questions from the question list. Then start again from the first element of the remaining list, and repeat until the question list is empty.
I am getting a wrong answer with this method, so I am looking for an algorithm to solve this challenge.
Construct a DAG of problems in the following way. Let pi and pj be two different problems. Then we will draw a directed edge from pi to pj if and only if pj can be solved directly after pi on the same day, consecutively. Namely, the following conditions have to be satisfied:
i < j, because you should solve the less difficult problem earlier.
|vi - vj| >= K (the rating requirement).
Now notice that each subset of problems that is chosen to be solved on some day corresponds to a directed path in that DAG. You choose your first problem, and then you follow the edges step by step; each edge in the path corresponds to a pair of problems that are solved consecutively on the same day. Also, each problem can be solved only once, so any node in our DAG may appear in exactly one path. And you have to solve all the problems, so these paths should cover the whole DAG.
Now we have the following problem: given a DAG of n nodes, find the minimal number of non-crossing (vertex-disjoint) directed paths that cover this DAG completely. This is a well-known problem called path cover. Generally speaking, it is NP-hard. However, our directed graph is acyclic, and for acyclic graphs it can be solved in polynomial time by reduction to the bipartite matching problem. The maximum matching problem, in turn, can be solved using the Hopcroft-Karp algorithm, for example. The exact reduction is easy and can be read, say, on Wikipedia. For each directed edge (u, v) of the original DAG, add an undirected edge (a_u, b_v) to the bipartite graph, where {a_i} and {b_i} are two parts of size n. The minimum number of paths is then n minus the size of the maximum matching.
The number of nodes in each part of the resulting bipartite graph is equal to the number of nodes in the original DAG, n. We know that the Hopcroft-Karp algorithm runs in O(n^2.5) in the worst case, and 300^2.5 ≈ 1,558,845. For 100 tests this algorithm should take under 1 second in total.
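A minimal sketch of this reduction follows. To keep it short, it uses the simple augmenting-path (Kuhn) matching algorithm instead of Hopcroft-Karp; that is slower in the worst case but should still be fine for N <= 300. The function name `min_days` is made up for illustration, and the answer is n minus the size of the maximum matching.

```python
def min_days(values, k):
    """Minimum path cover of the DAG with an edge i -> j iff i < j and |v_i - v_j| >= k."""
    n = len(values)
    # succ[i] lists the problems that may directly follow problem i on the same day.
    succ = [[j for j in range(i + 1, n) if abs(values[i] - values[j]) >= k]
            for i in range(n)]
    match_right = [-1] * n  # match_right[j] = i means j directly follows i

    def try_augment(i, seen):
        """Try to give problem i a successor, re-routing earlier matches if needed."""
        for j in succ[i]:
            if j in seen:
                continue
            seen.add(j)
            if match_right[j] == -1 or try_augment(match_right[j], seen):
                match_right[j] = i
                return True
        return False

    matching = sum(try_augment(i, set()) for i in range(n))
    # Each matched edge merges two paths, so the cover needs n - matching paths.
    return n - matching

print(min_days([5, 4, 7], 2))        # 2
print(min_days([5, 3, 4, 5, 6], 1))  # 1
```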
The algorithm is simple. First, sort the problems by v_i, then, for each problem, find the number of problems in the interval (v_i-K, v_i]. The maximum of those numbers is the result. The second phase can be done in O(n), so the most costly operation is sorting, making the whole algorithm O(n log n). Look here for a demonstration of the work of the algorithm on your data and K=35 in a spreadsheet.
Why does this work
Let's reformulate the problem to the problem of graph coloring. We create graph G as follows: vertices will be the problems and there will be an edge between two problems iff |v_i - v_j| < K.
In such a graph, independent sets exactly correspond to sets of problems doable on the same day. (<=) If the set can be done on a day, it is surely an independent set. (=>) If the set doesn't contain two problems not satisfying the K-difference criterion, you can just sort them according to the difficulty and solve them in this order. Both conditions will be satisfied this way.
Therefore, it easily follows that colorings of graph G exactly correspond to schedules of the problems on different days, with each color corresponding to one day.
So, we want to find the chromaticity (chromatic number) of graph G. This will be easy once we recognize that the graph is an interval graph, which is a perfect graph. Perfect graphs have chromaticity equal to cliqueness (clique number), and both can be found by a simple algorithm.
Interval graphs are graphs of intervals on the real line, edges are between intervals that intersect. Our graph, as can be easily seen, is an interval graph (for each problem, assign an interval (v_i-K, v_i]. It can be easily seen that the edges of this interval graph are exactly the edges of our graph).
Lemma 1: In an interval graph, there exists a vertex whose neighbors form a clique.
Proof is easy. You just take the interval with the lowest upper bound (or highest lower bound) of all. Any intervals intersecting it have a higher upper bound; therefore, the upper bound of the first interval is contained in the intersection of them all. Therefore, they intersect each other and form a clique. qed
Lemma 2: In a family of graphs closed under induced subgraphs and having the property from lemma 1 (existence of a vertex whose neighbors form a clique), the following algorithm produces a minimal coloring:
1. Find the vertex x whose neighbors form a clique.
2. Remove x from the graph, making its subgraph G'.
3. Color G' recursively.
4. Color x by the least color not found on its neighbors.
Proof: In (3), the algorithm produces an optimal coloring of the subgraph G' by the induction hypothesis plus the closure of our family under induced subgraphs. In (4), the algorithm only chooses a new color n if there is a clique of size n-1 on the neighbors of x. That means, with x, there is a clique of size n in G, so its chromaticity must be at least n. Therefore, the color given by the algorithm to a vertex is always <= chromaticity(G), which means the coloring is optimal. (Obviously, the algorithm produces a valid coloring). qed
Corollary: Interval graphs are perfect (perfect <=> chromaticity == cliqueness)
So we just have to find the cliqueness of G. That is easy for interval graphs: you just process the segments of the real line not containing interval boundaries and count the number of intervals intersecting there, which is even easier in your case, where the intervals have uniform length. This leads to the algorithm outlined at the beginning of this post.
Do we really need to go to path cover? Can't we just follow similar strategy as LIS.
The input is in increasing order of complexity. We just have to maintain a bunch of queues for tasks to be performed each day. Every element in the input will be assigned to a day by comparing the last elements of all queues. Wherever we find a difference of 'k' we append the task to that list.
For ex: 5 3 4 5 6
1) Input -> 5 (Empty lists so start a new one)
2) 3 (the only list ends in 5, and abs(5-3) is 2 (k), so append 3)
5--> 3
3) 4 (the only list ends in 3, and abs(3-4) < k, so start a new list)
5--> 3
4) 5 again (abs(3-5) = k, so append)
5) 6 again (abs(5-6) < k but abs(4-6) = k, so append to the second list)
We just need to maintain an array with the last elements of each list. Since the order of the days (when tasks are to be done) is not important, we can maintain a sorted list of last tasks; searching for a place to insert the new task is then just looking for the value abs(vi-k), which can be done via binary search.
The loop runs for N elements. In the worst case , we may end up querying ceil value using binary search (log i) for many input[i].
Therefore, T(n) < O( log N! ) = O(N log N). Analyse to ensure that the upper and lower bounds are also O( N log N ). The complexity is THETA (N log N).
The minimum number of days required will be the same as the length of the longest path in the complementary (undirected) graph of G (DAG). This can be solved using Dilworth's theorem.
