What is the time complexity of the algorithm below?

I just want to have confirmation of the time complexity of the algorithm below.
EDIT: in what follows, we do not assume that the optimal data structures are being used. However, feel free to propose solutions that use such structures (clearly mentioning what data structures are used and what beneficial impact they have on complexity).
Notation: in what follows n=|V|, m=|E|, and avg_neigh denotes the average number of neighbors of any given node in the graph.
Assumption: the graph is unweighted, undirected, and we have its adjacency list representation loaded in memory (a list of lists containing the neighbors of each vertex).
Here is what we have so far:
Line 1: computing the degrees is O(n), as it simply involves getting the length of each sublist in the adjacency list representation, i.e., performing n O(1) operations.
Line 3: finding the lowest value requires checking all values, which is O(n). Since this is nested in the while loop which visits all nodes once, it becomes O(n^2).
Lines 6-7: removing vertex v is O(avg_neigh^2) as we know the neighbors of v from the adjacency list representation and removing v from each of the neighbors' sublists is O(avg_neigh). Lines 6-7 are nested in the while loop, so it becomes O(n * avg_neigh^2).
Line 9: it is O(1), because it simply involves getting the length of one list. It is nested in the for loop and while loop so it becomes O(n * avg_neigh).
Summary: the total complexity is O(n) + O(n^2) + O(n * avg_neigh^2) + O(n * avg_neigh) = O(n^2).
Note 1: if the length of each sublist is not available (e.g., because the adjacency list cannot be loaded in memory), computing the degrees in line 1 is O(n * avg_neigh), as each sublist features avg_neigh elements on average and there are n sublists. Likewise, line 9 becomes O(avg_neigh) per call, so its total contribution becomes O(n * avg_neigh^2).
Note 2: if the graph is weighted, we can store the edge weights in the adjacency list representation. However, getting the degrees in line 1 then requires summing over each sublist, which is O(n * avg_neigh) if the adjacency list is loaded in RAM and O(n * avg_neigh^2) otherwise. Similarly, line 9 becomes O(n * avg_neigh^2) or O(n * avg_neigh^3).

There is an algorithm that (1) is recognizable as an implementation of Algorithm 1 and (2) runs in time O(|E| + |V|).
First, let's consider the essence of Algorithm 1. Until the graph is empty, do the following: record the priority of the node with the lowest priority as the core number of that node, and delete that node. The priority of a node is defined dynamically as the maximum over (1) its degree in the residual graph and (2) the core numbers of its deleted neighbors. Observe that the priority of a node never increases, since its degree never increases, and a higher-priority neighbor would not be deleted first. Also, priorities always decrease by exactly one in each outer loop iteration.
The data structure support sufficient to achieve O(|E| + |V|) time splits nicely into
A graph that, starting from the initial graph (initialization time O(|E| + |V|)), supports
Deleting a node (O(degree + 1), summing to O(|E| + |V|) over all nodes),
Getting the residual degree of a node (O(1)).
A priority queue that, starting from the initial degrees (initialization time O(|V|)), supports
Popping the minimum priority element (O(1) amortized),
Decreasing the priority of an element by one (O(1)), never less than zero.
A suitable graph implementation uses doubly linked adjacency lists. Each undirected edge has two corresponding list nodes, which are linked to each other in addition to the previous/next nodes originating from the same vertex. Degrees can be stored in a hash table. It turns out we only need the residual degrees to implement this algorithm, so don't bother storing the residual graph.
Since the priorities are numbers between 0 and |V| - 1, a bucket queue works very nicely. This implementation consists of a |V|-element vector of doubly linked lists, where the list at index p contains the elements of priority p. We also track a lower bound on the minimum priority of an element in the queue. To find the minimum priority element, we examine the lists in order starting from this bound. In the worst case, this costs O(|V|), but if we credit the algorithm whenever it increases the bound and debit it whenever the bound decreases, we obtain an amortization scheme that offsets the variable cost of this scanning.
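To make this concrete, here is a minimal Python sketch of the linear-time scheme just described (function and variable names are mine, not from Algorithm 1). It uses Python sets as the buckets instead of doubly linked lists, tracks only residual degrees rather than the residual graph, and uses the guard degree[u] > d in place of an explicit maximum with deleted neighbors' core numbers; the bucket queue and the forward-moving lower bound d are as described above.

    def core_numbers(adj):
        # adj: list of lists, adj[v] = neighbours of v (undirected, no self-loops)
        n = len(adj)
        degree = [len(adj[v]) for v in range(n)]          # residual degrees
        max_deg = max(degree) if n else 0
        buckets = [set() for _ in range(max_deg + 1)]     # buckets[p] = vertices with priority p
        for v in range(n):
            buckets[degree[v]].add(v)
        core = [0] * n
        removed = [False] * n
        d = 0                                             # lower bound on the minimum priority
        for _ in range(n):
            while not buckets[d]:                         # d only moves forward: O(|V|) total scanning
                d += 1
            v = buckets[d].pop()                          # pop a minimum-priority vertex
            removed[v] = True
            core[v] = d
            for u in adj[v]:
                if not removed[u] and degree[u] > d:      # priorities never drop below d
                    buckets[degree[u]].discard(u)
                    degree[u] -= 1                        # decrease the priority by one
                    buckets[degree[u]].add(u)
        return core

For example, core_numbers([[1, 2], [0, 2], [0, 1, 3], [2]]) (a triangle with a pendant vertex) returns [2, 2, 2, 1]. Each edge is touched a constant number of times and the bucket pointer only advances, giving the stated O(|E| + |V|).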

If you choose a data structure that supports efficient lookup of the minimum value and efficient deletions (such as a self-balancing binary tree or a min-heap), this can be done in O(|E| log |V|).
Creating the data structure in line 1 is done in O(|V|) or O(|V| log |V|).
The loop goes exactly once over each node v in V.
For each such node, you need to peek at the min value and adjust the data structure so you can peek at the next min efficiently in the next iteration. O(log|V|) is required for that (otherwise, you could do heapsort in o(n log n) - note the little-o notation here).
Getting the neighbors and removing items from V and E can be done in O(|neighbors|) using an efficient set implementation, such as a hash table. Since each edge and node is processed exactly once in this step, it sums to O(|V| + |E|) over all iterations.
The for loop over neighbors, again, sums to O(|E|), thanks to the fact that each edge is counted exactly once here.
In this loop, however, you might need to update the priority, and such an update costs O(log|V|), so this sums to O(|E| log |V|).
This sums to O(|V| log |V| + |E| log |V|). Assuming the graph is not very sparse (|E| is in Omega(|V|)), this is O(|E| log |V|).
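A sketch of this approach in Python (my own naming, not from the original pseudocode), using heapq with lazy deletion in place of an explicit decrease-key: every degree update pushes a fresh heap entry, and stale entries are skipped when popped. The heap holds O(|E| + |V|) entries, so every push and pop costs O(log |V|) up to constant factors, matching the analysis above.

    import heapq

    def core_numbers_heap(adj):
        # adj: list of lists, adj[v] = neighbours of v (undirected, no self-loops)
        n = len(adj)
        degree = [len(adj[v]) for v in range(n)]
        heap = [(degree[v], v) for v in range(n)]
        heapq.heapify(heap)                               # building the structure: O(|V|)
        core = [0] * n
        removed = [False] * n
        k = 0
        while heap:
            d, v = heapq.heappop(heap)                    # extract-min: O(log |V|)
            if removed[v] or d != degree[v]:
                continue                                  # stale entry (lazy deletion), skip it
            removed[v] = True
            k = max(k, d)
            core[v] = k
            for u in adj[v]:
                if not removed[u]:
                    degree[u] -= 1
                    heapq.heappush(heap, (degree[u], u))  # the "update priority" step: O(log |V|) per edge
        return core

The lazy-deletion trick stands in for the peek-min / adjust discussion above: at most one stale entry per edge is ever pushed, which is exactly where the O(|E| log |V|) term comes from.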

Related

What's the runtime complexity of a recursive tree algorithm which is quadratic in the number of distinct child subtree sizes?

I have an algorithm which operates on a rooted tree. It first recursively computes results for each of the root's child subtrees. It then does some work to combine them. The amount of work at the root is K^2 where K is the number of distinct values among the sizes of the subtrees.
What's the best bound on its runtime complexity? I haven't been able to construct a case in which it does more than linear work in the size of the tree.
This is governed by the Master Theorem for divide-and-conquer algorithms. For this particular case (reading between the lines of what you have described), it is mainly determined by how much work it takes at a single node to combine the results compiled for K values in the subtrees. Specifically, if it takes less than K work, then the cost is dominated by the cost at the lowest level and would be O(K) in total; if the work at a given level is O(K), then the total work becomes O(K log(K)). For work at a level higher than O(K), the total is dominated by the work at the highest level. We therefore have that your algorithm has a runtime complexity of O(K^2).

Space complexity of Adjacency List representation of Graph

I read here that for an undirected graph the space complexity is O(V + E) when it is represented as an adjacency list, where V and E are the number of vertices and edges respectively.
My analysis is: for a completely connected graph, each entry of the list will contain |V|-1 nodes, and there are |V| vertices in total, hence the space complexity seems to be O(|V| * (|V|-1)), which seems to be O(|V|^2). What am I missing here?
Your analysis is correct for a completely connected graph. However, note that for a completely connected graph the number of edges E is O(V^2) itself, so the notation O(V+E) for the space complexity is still correct too.
However, the real advantage of adjacency lists is that they allow you to save space for graphs that are not really densely connected. If the number of edges is much smaller than V^2, then adjacency lists will take O(V+E), and not O(V^2), space.
Note that when you talk about O-notation, you usually have three types of variables (or, well, input data in general). First are the variables whose influence you are studying; second are the variables that are considered constant; and third are the "free" variables, which you usually assume to take their worst-case values. For example, if you talk about sorting an array of N integers, you usually want to study the dependence of sorting time on N, so N is of the first kind. You usually consider the size of the integers to be constant (that is, you assume that comparison is done in O(1), etc.), and you usually consider the particular array elements to be "free", that is, you study the runtime for the worst possible combination of particular array elements. However, you might want to study the same algorithm from a different point of view, and it will lead to a different expression of complexity.
For graph algorithms, you can, of course, consider the number of vertices V to be of the first kind and the number of edges to be of the third kind, and study the space complexity for a given V and the worst-case number of edges. Then you indeed get O(V^2). But it is also often useful to treat both V and E as variables of the first kind, thus getting the complexity expression O(V+E).
The size of the array is |V| (|V| is the number of nodes). Each of these |V| lists has length equal to the degree of its vertex, denoted deg(v). We add up all those lengths and apply the Handshaking Lemma:
∑ deg(v) = 2|E|.
So you have |V| references (to |V| lists) plus the number of nodes in the lists, which never exceeds 2|E|. Therefore, the worst-case space (storage) complexity of an adjacency list is O(|V| + 2|E|) = O(|V| + |E|).
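A quick sanity check of this counting argument in Python (the example graph is arbitrary): the adjacency list stores |V| list headers plus one entry per edge endpoint, i.e. a total of sum(deg(v)) = 2|E| entries.

    edges = [(0, 1), (0, 2), (1, 2), (2, 3)]              # |V| = 4, |E| = 4
    V = 4
    adj = [[] for _ in range(V)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    print(sum(len(neigh) for neigh in adj))               # prints 8 = 2|E| entries, plus |V| list headers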
Hope this helps
According to your logic, total space = O(V^2 - V),
since the total number of connections E = V^2 - V for a complete graph,
so space = O(E).
I used to think the same.
But suppose the total number of connections (E) is much smaller than the number of vertices (V):
say out of 10 people (V = 10), only a couple of them know each other (E = 2).
Then according to your logic,
space = O(E) = O(2),
but in reality we have to allocate much larger space,
which is
space = O(V + E) = O(V + 2) = O(V).
That's why
space = O(V + E); this works whether V > E or E > V.

Kth minimum in a Range

Given an array of integers and some query operations.
The query operations are of 2 types:
1. Update the value at the ith index to x.
2. Given 2 integers i and j, find the kth minimum in the range [i, j] (both ends inclusive).
I can answer range minimum queries using a segment tree but could not do so for the kth minimum.
Can anyone help me?
Here is an O(polylog n) per query solution that does not actually assume a constant k, so k can vary between queries. The main idea is to use a segment tree, where every node represents an interval of array indices and contains a multiset (balanced binary search tree) of the values in the represented array segment. The update operation is pretty straightforward:
Walk up the segment tree from the leaf (the array index you're updating). You will encounter all nodes that represent an interval of array indices that contain the updated index. At every node, remove the old value from the multiset and insert the new value into the multiset. Complexity: O(log^2 n)
Update the array itself.
We notice that every array element will be in O(log n) multisets, so the total space usage is O(n log n). With linear-time merging of multisets we can build the initial segment tree in O(n log n) as well (there's O(n) work per level).
What about queries? We are given a range [i, j] and a rank k and want to find the k-th smallest element in a[i..j]. How do we do that?
Find a disjoint coverage of the query range using the standard segment tree query procedure. We get O(log n) disjoint nodes, the union of whose multisets is exactly the multiset of values in the query range. Let's call those multisets s_1, ..., s_m (with m <= ceil(log_2 n)). Finding the s_i takes O(log n) time.
Do a select(k) query on the union of s_1, ..., s_m. See below.
So how does the selection algorithm work? There is one really simple algorithm to do this.
We have s_1, ..., s_m and k given and want to find the smallest x in a such that s_1.rank(x) + ... + s_m.rank(x) >= k - 1, where rank returns the number of elements smaller than x in the respective BBST (this can be implemented in O(log n) if we store subtree sizes).
Let's just use binary search to find x! We walk through the BBST of the root, do a couple of rank queries and check whether their sum is larger than or equal to k. It's a predicate monotone in x, so binary search works. The answer is then the minimum of the successors of x in any of the s_i.
Complexity: O(n log n) preprocessing and O(log^3 n) per query.
So in total we get a runtime of O(n log n + q log^3 n) for q queries. I'm sure we could get it down to O(q log^2 n) with a cleverer selection algorithm.
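Here is a rough Python sketch of this data structure (class and method names are mine). For brevity the per-node multisets are plain sorted Python lists manipulated with bisect, so an update costs more per node than the balanced BSTs assumed above, but the update walk, the rank counting over the O(log n) covering nodes, and the binary search over the root's multiset follow the description; the selection uses the equivalent predicate "count of elements <= x in the range is >= k" instead of the rank sum >= k - 1.

    import bisect

    class KthInRange:
        def __init__(self, a):
            self.n = len(a)
            self.a = list(a)
            self.tree = [[] for _ in range(4 * self.n)]   # tree[node] = sorted values of its segment
            self._build(1, 0, self.n - 1)

        def _build(self, node, lo, hi):
            if lo == hi:
                self.tree[node] = [self.a[lo]]
                return
            mid = (lo + hi) // 2
            self._build(2 * node, lo, mid)
            self._build(2 * node + 1, mid + 1, hi)
            self.tree[node] = sorted(self.tree[2 * node] + self.tree[2 * node + 1])

        def update(self, i, x):
            # visit every node whose segment contains index i, swapping old value for new
            self._update(1, 0, self.n - 1, i, self.a[i], x)
            self.a[i] = x

        def _update(self, node, lo, hi, i, old, new):
            lst = self.tree[node]
            lst.pop(bisect.bisect_left(lst, old))
            bisect.insort(lst, new)
            if lo == hi:
                return
            mid = (lo + hi) // 2
            if i <= mid:
                self._update(2 * node, lo, mid, i, old, new)
            else:
                self._update(2 * node + 1, mid + 1, hi, i, old, new)

        def _count_leq(self, node, lo, hi, i, j, x):
            # number of elements <= x in a[i..j]: O(log n) covering nodes, one rank query each
            if j < lo or hi < i:
                return 0
            if i <= lo and hi <= j:
                return bisect.bisect_right(self.tree[node], x)
            mid = (lo + hi) // 2
            return (self._count_leq(2 * node, lo, mid, i, j, x)
                    + self._count_leq(2 * node + 1, mid + 1, hi, i, j, x))

        def kth_min(self, i, j, k):
            # binary search over the root's sorted multiset for the k-th smallest of a[i..j]
            root = self.tree[1]
            lo, hi = 0, len(root) - 1
            while lo < hi:
                mid = (lo + hi) // 2
                if self._count_leq(1, 0, self.n - 1, i, j, root[mid]) >= k:
                    hi = mid
                else:
                    lo = mid + 1
            return root[lo]

For example, with t = KthInRange([3, 1, 4, 1, 5, 9, 2, 6]), t.kth_min(1, 4, 2) returns 1; after t.update(3, 7) the same query returns 4.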
UPDATE: If we are looking for an offline algorithm that can process all queries at once, we can get O((n + q) * log n * log (q + n)) using the following algorithm:
Preprocess all queries and create a set of all values that ever occurred in the array. The number of those will be at most q + n.
Build a segment tree, but this time not on the array, but on the set of possible values.
Every node in the segment tree represents an interval of values and maintains a set of the positions where these values occur.
To answer a query, start at the root of the segment tree. Check how many positions in the left child of the root lie in the query interval (we can do that by doing two searches in the BBST of positions). Let that number be m. If k <= m, recurse into the left child. Otherwise recurse into the right child, with k decremented by m.
For updates, remove the position from the O(log (q + n)) nodes that cover the old value and insert it into the nodes that cover the new value.
The advantage of this approach is that we don't need subtree sizes, so we can implement this with most standard library implementations of balanced binary search trees (e.g. set<int> in C++).
We can turn this into an online algorithm by changing the segment tree out for a weight-balanced tree such as a BB[α] tree. It has logarithmic operations like other balanced binary search trees, but allows us to rebuild an entire subtree from scratch when it becomes unbalanced by charging the rebuilding cost to the operations that must have caused the imbalance.
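For completeness, here is a rough Python sketch of the offline variant (class and method names are mine). The per-node position sets are plain sorted Python lists manipulated with bisect, standing in for balanced BSTs, so insertions are not truly logarithmic; the structure, however, is as described: a segment tree over the value universe, a descent that compares k against the count of positions in the left child, and updates that move one position between value nodes. All values that can ever occur must be known in advance, which is what makes it offline.

    import bisect

    class ValueTreeKth:
        # Segment tree over the sorted value universe; each node keeps a sorted list of the
        # positions at which values from its value range currently occur.
        def __init__(self, a, future_values=()):
            self.vals = sorted(set(a) | set(future_values))   # all values that can ever occur
            self.m = len(self.vals)
            self.a = list(a)
            self.pos = [[] for _ in range(4 * self.m)]
            for idx, x in enumerate(a):
                self._insert(1, 0, self.m - 1, self._rank(x), idx)

        def _rank(self, x):
            return bisect.bisect_left(self.vals, x)

        def _insert(self, node, lo, hi, vr, idx):
            bisect.insort(self.pos[node], idx)
            if lo == hi:
                return
            mid = (lo + hi) // 2
            if vr <= mid:
                self._insert(2 * node, lo, mid, vr, idx)
            else:
                self._insert(2 * node + 1, mid + 1, hi, vr, idx)

        def _remove(self, node, lo, hi, vr, idx):
            self.pos[node].pop(bisect.bisect_left(self.pos[node], idx))
            if lo == hi:
                return
            mid = (lo + hi) // 2
            if vr <= mid:
                self._remove(2 * node, lo, mid, vr, idx)
            else:
                self._remove(2 * node + 1, mid + 1, hi, vr, idx)

        def update(self, i, x):
            self._remove(1, 0, self.m - 1, self._rank(self.a[i]), i)
            self.a[i] = x
            self._insert(1, 0, self.m - 1, self._rank(x), i)

        def kth_min(self, i, j, k):
            node, lo, hi = 1, 0, self.m - 1
            while lo < hi:
                mid = (lo + hi) // 2
                left = self.pos[2 * node]
                cnt = bisect.bisect_right(left, j) - bisect.bisect_left(left, i)
                if k <= cnt:                       # enough positions fall in the smaller half of values
                    node, hi = 2 * node, mid
                else:                              # skip the left half and adjust k
                    k -= cnt
                    node, lo = 2 * node + 1, mid + 1
            return self.vals[lo]

For example, ValueTreeKth([3, 1, 4, 1, 5]).kth_min(1, 3, 2) returns 1.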
If this is a programming contest problem, then you might be able to get away with the following O(n log(n) + q n^0.5 log(n)^1.5)-time algorithm. It is set up to use the C++ STL well and has a much better big-O constant than Niklas's (previous?) answer on account of using much less space and indirection.
Divide the array into k chunks of length n/k. Copy each chunk into the corresponding locations of a second array and sort it. To update: copy the chunk that changed into the second array and sort it again (time O((n/k) log(n/k))). To query: copy to a scratch array the at most 2(n/k - 1) elements that belong to a chunk partially overlapping the query interval. Sort them. Use one of the answers to this question to select the element of the requested rank out of the union of the sorted scratch array and the fully overlapping chunks, in time O(k log(n/k)^2). The optimum setting of k in theory is (n/log(n))^0.5. It's possible to shave another log(n)^0.5 factor using the complicated algorithm of Frederickson and Johnson.
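A Python sketch of this chunking idea (names and the block-size heuristic are mine; it assumes integer values, as in the problem statement, so the selection step can binary search directly over the value range instead of using the Frederickson and Johnson machinery):

    import bisect, math

    class ChunkedKth:
        def __init__(self, a, block_size=None):
            self.a = list(a)
            n = len(a)
            # block length ~ sqrt(n log n), i.e. ~ (n / log n)^0.5 blocks, as suggested above
            self.B = block_size or max(1, int(math.sqrt(n * max(1.0, math.log2(n + 1)))))
            self.blocks = [sorted(self.a[s:s + self.B]) for s in range(0, n, self.B)]

        def update(self, i, x):
            self.a[i] = x
            b = i // self.B
            self.blocks[b] = sorted(self.a[b * self.B:(b + 1) * self.B])   # re-sort the touched block

        def kth_min(self, i, j, k):
            # collect sorted copies of fully covered blocks and loose elements of partial blocks
            full, partial = [], []
            b = i // self.B
            while b * self.B <= j:
                s, e = b * self.B, min(len(self.a), (b + 1) * self.B) - 1
                if i <= s and e <= j:
                    full.append(self.blocks[b])
                else:
                    partial.extend(self.a[max(i, s):min(j, e) + 1])
                b += 1
            partial.sort()
            lists = full + [partial]
            # select the k-th smallest: binary search on the (integer) value,
            # counting elements <= mid with one bisect per sorted list
            lo = min(l[0] for l in lists if l)
            hi = max(l[-1] for l in lists if l)
            while lo < hi:
                mid = (lo + hi) // 2
                if sum(bisect.bisect_right(l, mid) for l in lists) >= k:
                    hi = mid
                else:
                    lo = mid + 1
            return lo

For example, ChunkedKth([3, 1, 4, 1, 5, 9, 2, 6]).kth_min(1, 4, 2) returns 1.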
Perform a modification of bucket sort: create a bucket that contains the numbers in the range you want, then sort only this bucket and find the kth minimum.
Damn, this solution can't update an element, but it at least finds that k-th element; maybe it gives you some ideas for a solution that supports updates. Try pointer-based B-trees.
This is O(n log n) space and O(q log^2 n) time complexity; further down I explain how to reduce it to O(log n) per query.
So, you'll need to do the following:
1) Build a "segment tree" over the given array.
2) For every node, instead of storing one number, store a whole array. The size of that array equals the number of leaves below the node, and it contains (as you guessed) the values of those bottom nodes (the numbers from that segment), but sorted.
3) To make such an array, merge the two arrays of the node's two children in the segment tree. In addition, for every element of the array you have just made (by merging), you need to remember its position before the merge (basically, which child array it comes from and its position in it), and a pointer to the next element that was not inserted from the same child array.
4) With this structure, you can count how many numbers lower than a given value x lie in some segment S. You find (with binary search) the first number in the root node's array that is >= x. Then, using the pointers you have made, you can get the answer to the same question for the two child arrays in O(1), and you keep descending. You stop descending at each node whose segment lies entirely inside or entirely outside the given segment S. The time complexity is O(log n): O(log n) to find the first element that is >= x, plus O(1) for each of the O(log n) segments in the decomposition of S.
5) Do a binary search over the solution value.
This was the solution with O(log^2 n) per query. But you can reduce it to O(log n):
1) Before doing all of the above, you need to transform the problem. Sort all the numbers and remember, for each, its position in the original array. These positions now form the array you are working on. Call that array P.
If the bounds of the query segment are a and b, you need to find the k-th element in P that is between a and b by value (not by index). That element represents the index of your result in the original array.
2) To find that k-th element, you do a kind of backtracking with complexity O(log n). You will be asking how many elements between index 0 and (some other index) are between a and b by value.
3) Suppose that you know the answer to such a question for some segment (0, h). Get answers to the same type of question for all segments in the tree that begin at h, starting from the largest one. Keep getting those answers as long as the current answer (from segment (0, h)) plus the answer you got last is greater than k. Then update h. Keep updating h until there is only one segment in the tree that begins at h. That h is the index of the number you are looking for in the problem you stated.
Getting the answer to such a question for some segment of the tree takes only O(1) time, because you already know the answer for its parent's segment, and using the pointers I explained in the first algorithm you can get the answer for the current segment in O(1).

Prim's algorithm when range of Edge weights is known

Suppose that all the edge weights in a graph are integers in the range from 1 to |V|. How fast can you make Prim's algorithm run? What if edge weights are integers in the range 1 to W for some constant W?
I think that since Prim's algorithm is based on a min-heap implementation, knowledge about the range of the edge weights will not help speed up the procedure. Is this correct?
With this constraint, you can implement a heap that uses O(V) / O(W) space respectively but has O(1) insert and O(1) extract-min operations. Actually you can get O(1) for all the operations required by Prim's algorithm. Since the time complexity of the heap influences the complexity of the main algorithm, you can do better than the default generic implementation.
I think the main idea for solving this problem is to remember that W is a constant, so if you represent your priority queue as some structure whose size is bounded by W, traversing the entire structure at each iteration will not change the time complexity of your algorithm...
For example, if you represent your priority queue as an array T with W + 1 positions, having a linked list of vertices in each position such that T[i] is a list of all the vertices with priority equal to i, and use T[W + 1] to store vertices with priority equal to infinity, you will take
O(V) to build your priority queue (just insert all the vertices in the list T[W+1])
O(W) to extract the minimum element (just travel T searching for the first position non empty)
O(1) to decrease a key (if vertex v had key equal to i and it was updated to j, just remove v from list T[i] and insert it at the first position of list T[j]).
So, it will give you complexity O(VW + E) instead of O(V logV + E).
(Of course, it will not help if the range is from 1 to V, because V^2 + E is greater than V log V + E.)
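A Python sketch of that bucket-queue variant of Prim's algorithm (names mine; the graph is assumed undirected and connected, with integer weights in 1..W). Extraction scans all W + 2 buckets each time, which gives the O(VW) term, while decrease-key is O(1) per edge, giving the O(E) term.

    def prim_bounded_weights(adj, W, root=0):
        # adj[v] = list of (u, w) pairs, weights are integers in 1..W
        n = len(adj)
        INF = W + 1                                # bucket W + 1 plays the role of "infinity"
        key = [INF] * n                            # cheapest edge weight connecting v to the tree so far
        in_tree = [False] * n
        buckets = [set() for _ in range(W + 2)]    # buckets[i] = vertices whose key is i
        key[root] = 0
        for v in range(n):
            buckets[key[v]].add(v)                 # building the queue: O(V)
        total = 0
        for _ in range(n):
            i = 0
            while not buckets[i]:                  # extract-min: scan the buckets, O(W)
                i += 1
            v = buckets[i].pop()
            in_tree[v] = True
            total += key[v]                        # 0 for the root, the chosen MST edge weight otherwise
            for u, w in adj[v]:
                if not in_tree[u] and w < key[u]:
                    buckets[key[u]].discard(u)     # decrease-key: move u to a lower bucket, O(1)
                    key[u] = w
                    buckets[w].add(u)
        return total                               # total weight of the minimum spanning tree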
Pseudocode for Prim's algorithm (not tied to a binary heap) can be found in Cormen, Introduction to Algorithms, 3rd edition.
Knowing the range is 1...k, we can create an array of size k and walk through the edge list, adding each edge to the slot corresponding to its weight. By the nature of this storage, the edges end up sorted by weight. This takes O(n + m) time.
Relying on the pseudocode for Prim's algorithm in Cormen, we can analyze its complexity to be O(n log n + m log n) = O((n + m) log n) time (Cormen page 636). Specifically, steps 7 and 11 contribute the log n factor that is iterated over the n- and m-sized loops. The n log n term comes from the EXTRACT-MIN operation, and the m log n term from the "implicit DECREASE-KEY" operation. Both can be replaced with our edge-weight array, a loop of O(k). As such, with our modified Prim's algorithm, we would have an O(nk + mk) = O(k(n + m)) algorithm.

Range query for a semigroup operator (union)

I'm looking to implement an algorithm, which is given an array of integers and a list of ranges (intervals) in that array, returns the number of distinct elements in each interval. That is, given the array A and a range [i,j] returns the size of the set {A[i],A[i+1],...,A[j]}.
Obviously, the naive approach (iterate from i to j and count while ignoring duplicates) is too slow. Range-Sum seems inapplicable, since (A ∪ B) - B isn't always equal to A.
I've looked up Range Queries in Wikipedia, and it hints that Yao (in '82) showed an algorithm that does this for semigroup operators (which union seems to be) with linear preprocessing time and space and almost constant query time. The article, unfortunately, is not available freely.
Edit: it appears this exact problem is available at http://www.spoj.com/problems/DQUERY/
There's a rather simple algorithm which uses O(N log N) time and space for preprocessing and O(log N) time per query. First, create a persistent segment tree for answering range sum queries (initially, it should contain zeroes at all positions). Then iterate through all the elements of the given array, keeping track of the latest position of each number. At each iteration, create a new version of the persistent segment tree by putting a 1 at the latest position of each element (at each iteration the position of only one element can be updated, so only one position's value in the segment tree changes, and the update can be done in O(log N)). To answer a query (l, r), you just need to find the sum over the (l, r) segment in the version of the tree that was created when iterating through the r-th element of the initial array.
Hope this algorithm is fast enough.
Upd. There's a little mistake in my explanation: at each step, at most two positions' values in the segment tree might change (because it's necessary to put a 0 at the previous latest position of a number when it is updated). However, it doesn't change the complexity.
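A Python sketch of this persistent segment tree (names are mine; it follows the corrected version above, clearing the previous latest position of a value before marking the new one). Nodes are stored in parallel arrays, and node 0 is the shared empty tree.

    class PersistentDistinct:
        def __init__(self, a):
            self.n = len(a)
            # nodes stored as parallel lists: left child, right child, subtree sum; node 0 = empty tree
            self.left, self.right, self.sum = [0], [0], [0]
            self.roots = []
            last = {}                         # value -> latest position seen so far
            root = 0
            for r, x in enumerate(a):
                if x in last:
                    root = self._set(root, 0, self.n - 1, last[x], 0)   # clear the previous latest position
                root = self._set(root, 0, self.n - 1, r, 1)             # mark the new latest position
                last[x] = r
                self.roots.append(root)       # version r of the tree

        def _new(self, l, r, s):
            self.left.append(l); self.right.append(r); self.sum.append(s)
            return len(self.sum) - 1

        def _set(self, node, lo, hi, pos, val):
            if lo == hi:
                return self._new(0, 0, val)
            mid = (lo + hi) // 2
            l, r = self.left[node], self.right[node]
            if pos <= mid:
                l = self._set(l, lo, mid, pos, val)
            else:
                r = self._set(r, mid + 1, hi, pos, val)
            return self._new(l, r, self.sum[l] + self.sum[r])

        def _query(self, node, lo, hi, i, j):
            if node == 0 or j < lo or hi < i:
                return 0
            if i <= lo and hi <= j:
                return self.sum[node]
            mid = (lo + hi) // 2
            return (self._query(self.left[node], lo, mid, i, j)
                    + self._query(self.right[node], mid + 1, hi, i, j))

        def distinct(self, i, j):
            # number of distinct values in a[i..j]: range sum on version j
            return self._query(self.roots[j], 0, self.n - 1, i, j)

For example, PersistentDistinct([1, 1, 2, 1, 3]).distinct(1, 3) returns 2 (the distinct values in a[1..3] are 1 and 2).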
You can answer any of your queries in constant time by performing a quadratic-time precomputation:
For every i from 0 to n-1:
    S <- new empty set backed by a hashtable
    C <- 0
    For every j from i to n-1:
        If A[j] does not belong to S, increment C and add A[j] to S.
        Store C as the answer for the query associated to interval i..j.
This algorithm takes quadratic time since for each interval we perform a bounded number of operations, each one taking constant time (note that the set S is backed by a hashtable), and there's a quadratic number of intervals.
If you don't have additional information about the queries (total number of queries, distribution of intervals), you cannot do essentially better, since the total number of intervals is already quadratic.
You can trade the quadratic precomputation for n linear on-the-fly computations: after receiving a query of the form A[i..j], precompute (in O(n) time) the answer for all intervals A[i..k], k >= i. This guarantees that the amortized complexity remains quadratic, and you are not forced to perform the complete quadratic precomputation at the beginning.
Note that the obvious algorithm (the one you call obvious in the statement) is cubic, since you scan every interval completely.
Here is another approach which might be quite closely related to the segment tree. Think of the elements of the array as the leaves of a full binary tree. If there are 2^n elements in the array, there are n levels of that full tree. At each internal node of the tree, store the union of the points that lie in the leaves beneath it. Each number in the array appears once at each level (fewer times if there are duplicates), so the cost in space is a logarithmic factor.
Consider a range A..B of length K. You can work out the union of points in this range by forming the union of sets associated with leaves and nodes, picking nodes as high up the tree as possible, as long as the subtree beneath those nodes is entirely contained in the range. If you step along the range picking subtrees that are as big as possible you will find that the size of the subtrees first increases and then decreases, and the number of subtrees required grows only with the logarithm of the size of the range - at the beginning if you could only take a subtree of size 2^k it will end on a boundary divisible by 2^(k+1) and you will have the chance of a subtree of size at least 2^(k+1) as the next step if your range is big enough.
So the number of semigroup operations required to answer a query is O(log n) - but note that the semigroup operations may be expensive as you may be forming the union of two large sets.
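A small Python sketch of this tree-of-unions idea (an iterative segment tree storing a Python set at each node; names are mine). The query unions O(log n) node sets, mirroring the decomposition into maximal aligned subtrees, though as noted the set unions themselves can be expensive for large ranges.

    def build_union_tree(a):
        n = len(a)
        tree = [set() for _ in range(2 * n)]
        for i, x in enumerate(a):
            tree[n + i] = {x}                          # leaves
        for i in range(n - 1, 0, -1):
            tree[i] = tree[2 * i] | tree[2 * i + 1]    # each node stores the union of its leaves
        return tree

    def distinct_in_range(tree, n, i, j):
        # number of distinct values in a[i..j]: union of O(log n) maximal aligned subtrees
        result = set()
        lo, hi = i + n, j + n + 1                      # half-open interval over the leaves
        while lo < hi:
            if lo & 1:
                result |= tree[lo]
                lo += 1
            if hi & 1:
                hi -= 1
                result |= tree[hi]
            lo //= 2
            hi //= 2
        return len(result)

For example, with a = [1, 2, 1, 3], distinct_in_range(build_union_tree(a), 4, 0, 2) returns 2.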
