What is the best way to calculate the runtime complexity of an arbitrary method? It's easy for non-recursive methods, like bubble sort:
outer-for loop
{
    inner-for loop
    {
        compare and exchange
    }
}
To check, the best way is to put a counter in the innermost loop. But where should the counter go when the method is recursive, for instance merge sort:
sort(int[] array)
{
    left = first-half
    right = second-half
    sort(left);
    sort(right);
    return merge(left, right);
}
merge(int[] left, int[] right)
{
    count = length(left) + length(right);
    int[] result;
    loop count times
    {
        compare and put in result;
    }
    return result;
}
Since this is merge sort, the Big-O is O(n log n), so an array of 100 ints should give a Big-O of exactly 200. Where should the counter go? If I put it at the top of sort(..), I get values such as 250, 280, 300, which I assume is wrong. What is the best place for this counter?
Reference: http://en.wikipedia.org/wiki/Mergesort
Thanks.
Since this is merge sort, the big(o) is o(n log n), so an array of 100 ints should return a big-o of 200 exactly.
Not even close to right.
Computational complexity expressed in big-O notation does not tell you exactly how many steps or operations will be executed. There's a reason it's called asymptotic and not exact complexity: it only gives you a function that bounds the running time of the algorithm from above (up to a constant factor) as the size of the input grows.
So O(n log n) doesn't mean that 200 operations will be performed for 100 elements (and why, by the way, should the base of the logarithm be 10?). It tells you that as the input size n grows, the running time grows at most proportionally to n log n; neither the constant factor nor the base of the logarithm is specified, since changing the base only changes the constant.
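To see what the bound does and does not promise, here is a minimal Java sketch (a hypothetical `MergeSortCount` class, not the asker's code) that counts element comparisons in a top-down merge sort on random input. The count is not "n log n exactly" for any n, but the ratio of comparisons to n log2 n stays below 1 as n grows, which is all that O(n log n) asserts.

```java
import java.util.Random;

public class MergeSortCount {
    private long comparisons = 0;

    // Sorts a[lo..hi) in place, using tmp as scratch space.
    private void sort(int[] a, int[] tmp, int lo, int hi) {
        if (hi - lo < 2) return;                 // 0 or 1 element: already sorted
        int mid = lo + (hi - lo) / 2;
        sort(a, tmp, lo, mid);
        sort(a, tmp, mid, hi);
        int i = lo, j = mid, k = lo;
        while (i < mid && j < hi) {
            comparisons++;                        // one element comparison
            tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
        }
        while (i < mid) tmp[k++] = a[i++];        // leftovers need no comparisons
        while (j < hi) tmp[k++] = a[j++];
        System.arraycopy(tmp, lo, a, lo, hi - lo);
    }

    // Sorts a in place and returns the number of comparisons performed.
    public static long countComparisons(int[] a) {
        MergeSortCount m = new MergeSortCount();
        m.sort(a, new int[a.length], 0, a.length);
        return m.comparisons;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        for (int n : new int[] {100, 1000, 10000, 100000}) {
            int[] a = rnd.ints(n).toArray();
            long c = countComparisons(a);
            double nLog2N = n * (Math.log(n) / Math.log(2));
            System.out.printf("n=%d comparisons=%d ratio=%.3f%n", n, c, c / nLog2N);
        }
    }
}
```

The exact count depends on the data; only its growth rate is pinned down by the Big-O class.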
To the point: if you want to count the number of calls to a recursive function, pass the counter as an argument, like this:
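void merge_sort(int array[], size_t length, int *counter)
{
    (*counter)++;                 /* count this call */
    if (length < 2)
        return;                   /* base case: nothing to sort */
    size_t half = length / 2;
    merge_sort(array, half, counter);
    merge_sort(array + half, length - half, counter);
    /* merge the two sorted halves here (merge step omitted) */
}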
and call it like this:
int num_calls = 0;
merge_sort(array, sizeof(array) / sizeof(array[0]), &num_calls);
printf("Called %d times\n", num_calls);
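The same idea in Java, as a sketch: Java has no pointers, so a one-element array (the hypothetical `calls` parameter) stands in for `int *counter`. Because every call on two or more elements makes exactly two recursive calls and every leaf holds one element, the recursion tree has n leaves and 2n - 1 nodes, so sorting 100 elements is always counted as 199 calls, regardless of the data.

```java
import java.util.Arrays;
import java.util.Random;

public class CountCalls {
    // calls[0] plays the role of the C version's int *counter.
    static int[] mergeSort(int[] array, int[] calls) {
        calls[0]++;                                   // count this invocation
        if (array.length < 2) return array;           // base case
        int half = array.length / 2;
        int[] left = mergeSort(Arrays.copyOfRange(array, 0, half), calls);
        int[] right = mergeSort(Arrays.copyOfRange(array, half, array.length), calls);
        return merge(left, right);
    }

    static int[] merge(int[] left, int[] right) {
        int[] result = new int[left.length + right.length];
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length)
            result[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
        while (i < left.length) result[k++] = left[i++];
        while (j < right.length) result[k++] = right[j++];
        return result;
    }

    public static void main(String[] args) {
        int[] data = new Random(7).ints(100, 0, 1000).toArray();
        int[] calls = {0};
        mergeSort(data, calls);
        System.out.println("Called " + calls[0] + " times");  // Called 199 times
    }
}
```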
I think you have slightly misunderstood the concept of Big-O notation. If the complexity is O(n log n) and n is 100, there is no rule that the program must take exactly 200 steps; Big-O only gives an upper bound. For example, consider insertion sort, which is O(n^2). Even with n = 100, a counter inside the inner loop will not give you 100^2 if the list is already sorted. So the answers you get (250, 280, 300, etc.) are perfectly valid, because they are all bounded by k * n log n for some constant k.
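A minimal Java sketch of that point, using insertion sort (whose comparison count, unlike selection sort's, depends on how sorted the input already is). The class name `InsertionSortCount` is hypothetical. For n = 100, a sorted input costs 99 comparisons and a reversed input 4950, and neither equals 100^2.

```java
public class InsertionSortCount {
    // Sorts a in place and returns the number of element comparisons made.
    static long sortAndCount(int[] a) {
        long comparisons = 0;
        for (int i = 1; i < a.length; i++) {
            for (int j = i; j > 0; j--) {
                comparisons++;                    // compare a[j-1] with a[j]
                if (a[j - 1] <= a[j]) break;      // already in place: stop early
                int t = a[j - 1]; a[j - 1] = a[j]; a[j] = t;
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        int n = 100;
        int[] sorted = new int[n], reversed = new int[n];
        for (int i = 0; i < n; i++) { sorted[i] = i; reversed[i] = n - i; }
        System.out.println(sortAndCount(sorted));    // 99
        System.out.println(sortAndCount(reversed));  // 4950
    }
}
```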
I'm trying to solve question 11.1 in Elements of Programming Interviews (EPI) in Java: Search a Sorted Array for First Occurrence of K.
The problem description from the book:
Write a method that takes a sorted array and a key and returns the index of the first occurrence of that key in the array.
The solution they provide in the book is a modified binary search that runs in O(log n) time. I wrote my own algorithm, also based on a modified binary search, with one difference: it uses recursion. The problem is that I don't know how to determine its time complexity. My best guess is O(log n), because each call halves the range of candidate values. I've tested it against the 314 test cases provided by the EPI Judge, so I know it works; I just don't know the time complexity. Here is the code:
public static int searchFirstOfKUtility(List<Integer> A, int k, int Lower, int Upper, Integer Index)
{
    while (Lower <= Upper) {
        int M = Lower + (Upper - Lower) / 2;
        if (A.get(M) < k)
            Lower = M + 1;
        else if (A.get(M) == k) {
            Index = M;
            if (Lower != Upper)
                Index = searchFirstOfKUtility(A, k, Lower, M - 1, Index);
            return Index;
        }
        else
            Upper = M - 1;
    }
    return Index;
}
Here is the code that the tests cases call to exercise my function:
public static int searchFirstOfK(List<Integer> A, int k) {
    Integer foundKey = -1;
    return searchFirstOfKUtility(A, k, 0, A.size() - 1, foundKey);
}
So, can anyone tell me what the time complexity of my algorithm would be?
Assuming that passing arguments is O(1) instead of O(n), performance is O(log(n)).
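To check the claim empirically, here is a sketch (hypothetical `FirstOfKProbes` class) of the asker's function with a probe counter added; it reads `A.get(m)` once per loop iteration instead of twice and uses `int` for the index. Each probe narrows the active range to one side of the midpoint, exactly as in plain binary search, so the probe count grows like log2 n even when every element equals k and the recursive branch is taken at every step.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class FirstOfKProbes {
    static int probes = 0;  // number of A.get(...) lookups

    // Same logic as searchFirstOfKUtility, with a probe counter.
    static int search(List<Integer> A, int k, int lower, int upper, int index) {
        while (lower <= upper) {
            int m = lower + (upper - lower) / 2;
            probes++;
            int v = A.get(m);
            if (v < k) {
                lower = m + 1;
            } else if (v == k) {
                index = m;                                  // candidate; look further left
                if (lower != upper) index = search(A, k, lower, m - 1, index);
                return index;
            } else {
                upper = m - 1;
            }
        }
        return index;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1 << 10, 1 << 16, 1 << 20}) {
            List<Integer> same = new ArrayList<>(Collections.nCopies(n, 7));
            probes = 0;
            int first = search(same, 7, 0, n - 1, -1);
            System.out.println("n=" + n + " first=" + first + " probes=" + probes);
        }
    }
}
```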
The usual theoretical tool for analyzing recursion is the Master Theorem. It says that if the running time of a recursive algorithm satisfies a recurrence of the form
T(n) = a T(n/b) + f(n)
then there are 3 cases. In plain English they correspond to:
1. Performance is dominated by all the calls at the bottom of the recursion, so it is proportional to how many of those there are.
2. Performance is equal between each level of recursion, and so it is proportional to how many levels of recursion there are, times the cost of any one level.
3. Performance is dominated by the work done in the very first call, and so it is proportional to f(n).
You are in case 2. Each recursive call costs the same, and so performance is dominated by the fact that there are O(log(n)) levels of recursion times the cost of each level. Assuming that passing a fixed number of arguments is O(1), that will indeed be O(log(n)).
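As a worked check of case 2, here is the recurrence unrolled, assuming each call does a constant amount of work c apart from its recursive call and that n is a power of 2:

```latex
\begin{aligned}
T(n) &= T(n/2) + c \qquad \bigl(a = 1,\ b = 2,\ f(n) = c = \Theta(n^{\log_2 1})\bigr) \\
     &= T(n/4) + 2c = \cdots = T(n/2^{j}) + j\,c \\
     &= T(1) + c \log_2 n = \Theta(\log n) \qquad \text{(taking } j = \log_2 n\text{)}
\end{aligned}
```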
Note that this assumption holds in Java because you don't make a copy of the list before passing it. But it is important to be aware that it does not hold in all languages. For example, I recently did a bunch of work in PL/pgSQL, where arrays are passed by value, so every call would copy the whole array and your algorithm would have been O(n log(n)).
Most recursive functions I have seen asked about (e.g. Fibonacci or Towers of Hanoi) have O(1) base cases, but what would the time complexity be if the base case were O(n) instead?
For example, a recursive Fibonacci with O(n) base case:
class Fibonacci {
    static int fib(int n) {
        if (n <= 1) {
            // base case: the loop runs n times, i.e. at most once since n <= 1
            for (int i = 0; i < n; i++) {
                // something
            }
            return n;
        }
        return fib(n - 1) + fib(n - 2);
    }

    public static void main(String[] args) {
        int n = 9;
        System.out.println(fib(n));
    }
}
The base case for the function that you’ve written here actually still has time complexity O(1). The reason for this is that if the base case triggers here, then n ≤ 1, so the for loop here will run at most once.
Because base cases usually trigger when the input size is small, it's comparatively rare to get a base case whose runtime is, say, O(n) when the input to the algorithm has size n. That would require the base case's cost to depend on the size of the original input rather than on the (small) size of the current subproblem, which can happen but is somewhat unusual.
A more common situation (though I think still fairly uncommon) is a recursive function with two parameters, say n and k, where the recursion reduces n but leaves k unchanged. For example, imagine taking your code and replacing the for loop over n in the base case with a for loop over k. What happens then?
This turns out to be an interesting question. For this particular function, the total work is O(k) times the number of base cases triggered, plus O(1) times the number of non-base-case recursive calls. For the Fibonacci recursion, computing F_n triggers F_{n+1} base cases and makes F_{n+1} - 1 non-base-case calls, so the overall runtime is Θ(k F_{n+1} + F_{n+1}) = Θ(k φ^n). For the Towers of Hanoi you'd see a similar scaling effect, with overall runtime Θ(k 2^n). For other recursive functions the runtime might vary in different ways, depending on how those functions are structured.
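A small Java sketch of the two-parameter variant (the class `FibWithK` and its counters are hypothetical): the base case loops k times, and the counters confirm the accounting for n = 9, namely F_10 = 55 base cases and 54 non-base calls, so the work is 55k + 54.

```java
public class FibWithK {
    static long baseCalls = 0, innerCalls = 0, workUnits = 0;

    // Like fib above, but the base case loops over k instead of n.
    static int fib(int n, int k) {
        if (n <= 1) {
            baseCalls++;
            for (int i = 0; i < k; i++) workUnits++;   // O(k) base-case work
            return n;
        }
        innerCalls++;
        workUnits++;                                     // O(1) work per non-base call
        return fib(n - 1, k) + fib(n - 2, k);
    }

    public static void main(String[] args) {
        int n = 9, k = 1000;
        System.out.println(fib(n, k));   // 34
        System.out.println(baseCalls);   // 55 = F_10
        System.out.println(innerCalls);  // 54 = F_10 - 1
        System.out.println(workUnits);   // 55054 = 1000 * 55 + 54
    }
}
```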
Hope this helps!
For the purpose of my question, I'll include a sample problem.
Say we need to iterate through a vector of N elements and remove duplicates. We'd probably use a set, right? (Let's use a C++ std::set, which is a tree.)
It costs O(N) to iterate through the elements, inserting each one into the set.
My question: each insert into the set costs log N, and we insert N times. Is this algorithm O(N log N) or simply O(N)? I was discussing this with a professor, and I'm not sure. The LeetCode/SO/online community seems to disregard data structure costs, but from an academic point of view, N inserts into a red-black tree with O(log N) worst case is log N, N times, no?
For clarification: yes, it would make more sense to use unordered_set, but that doesn't invalidate my question.
Complexities express the count of some reference operation.
For example, you can count the inserts into some black-box structure and find that there are O(N) of them.
But if you focus on, say, comparisons and you know that an insert costs Log N comparisons on average, the total number of comparisons is O(N Log N).
Now if you are comparing strings of Log N characters, you will count O(N Log²N) character comparisons...
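To make "count some reference operation" concrete, here is a Java sketch that removes duplicates through `java.util.TreeSet` (a red-black tree) with a comparator that counts its own calls. The class and method names are hypothetical. The number of inserts is exactly N, while the number of comparisons per insert grows with log N, so the comparison total is O(N log N).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class DedupComparisons {
    // Puts the distinct values of input (in sorted order) into out and
    // returns how many comparisons the tree performed along the way.
    static long dedupe(List<Integer> input, List<Integer> out) {
        long[] comparisons = {0};
        Comparator<Integer> counting = (a, b) -> {
            comparisons[0]++;
            return Integer.compare(a, b);
        };
        TreeSet<Integer> set = new TreeSet<>(counting);
        for (int x : input) set.add(x);      // N inserts, O(log N) comparisons each
        out.addAll(set);
        return comparisons[0];
    }

    public static void main(String[] args) {
        for (int n : new int[] {1 << 10, 1 << 14, 1 << 18}) {
            List<Integer> input = new ArrayList<>();
            for (int i = 0; i < n; i++) input.add(i % (n / 4));   // each value 4 times
            List<Integer> unique = new ArrayList<>();
            long c = dedupe(input, unique);
            System.out.printf("N=%d unique=%d comparisons/insert=%.1f log2(N)=%.1f%n",
                    n, unique.size(), (double) c / n, Math.log(n) / Math.log(2));
        }
    }
}
```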
Yes, it is O(n * log(n)). If you have a method like
public void foo(int n) {
    for (int i = 0; i < n; i++) {
        // Call a method that is in O(log n)
        someLogNMethod();
    }
}
then the method foo runs in O(n * log n) time.
Example
There are many natural, non-contrived examples, like computing the median of an array of integers. Take a look at the following solution, which sorts the array first. Comparison-based sorting takes Theta(n log n) time in the worst case (see comparison-based sorting).
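public int median(int[] values) {
    // Sort a copy so the caller's array is untouched; the sort dominates the cost
    int[] sortedValues = java.util.Arrays.copyOf(values, values.length);
    java.util.Arrays.sort(sortedValues);
    // Let's ignore special cases (even, empty, ...) for simplicity
    int indexOfMedian = values.length / 2;
    return sortedValues[indexOfMedian];
}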
Obviously you wouldn't say this median method is in Theta(1), even though everything it does apart from the sort runs in constant time.
The method depends on the sort: you can't find the median of a general array in O(1), so the sort must be included in the analysis. The method thus runs in Theta(n log n + 1), which is Theta(n log n).
Note that the problem can actually be solved in Theta(n) (see Find median of unsorted array in O(n) time).
I have 4 arrays A, B, C, D of size n. n is at most 4000. The elements of each array are 30 bit (positive/negative) numbers. I want to know the number of ways, A[i]+B[j]+C[k]+D[l] = 0 can be formed where 0 <= i,j,k,l < n.
The best algorithm I derived is O(n^2 lg n), is there a faster algorithm?
OK, here is my O(n^2 lg(n^2)) algorithm:
Suppose there are four arrays A[], B[], C[], D[]. We want to find the number of ways A[i]+B[j]+C[k]+D[l] = 0 can be formed, where 0 <= i,j,k,l < n.
First, compute every pairwise sum of A[] and B[] and store them in another array E[] with n*n elements.
int k = 0;
for (i = 0; i < n; i++)
{
    for (j = 0; j < n; j++)
    {
        E[k++] = A[i] + B[j];
    }
}
The complexity of the above code is O(n^2).
Do the same thing for C[] and D[], storing the sums in AUX[]:
int l = 0;
for (i = 0; i < n; i++)
{
    for (j = 0; j < n; j++)
    {
        AUX[l++] = C[i] + D[j];
    }
}
The complexity of the above code is O(n^2).
Now sort AUX[] so that you can easily count the occurrences of each unique element in AUX[].
The complexity of sorting AUX[] is O(n^2 lg(n^2)).
Now declare a structure:
struct myHash
{
    int value;
    int valueOccuredNumberOfTimes;
} F[4000 * 4000];   /* up to n*n distinct sums, n <= 4000 */
Now store in F[] each unique element of AUX[] together with the number of times it appears.
Its complexity is O(n^2).
long long possibleQuardtupple = 0;   /* the count can reach n^4, so use 64 bits */
Now, for each element of E[], do the following:
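for (i = 0; i < k; i++)
{
    x = E[i];
    /* binary search for -x in F[0..m-1], sorted by value;
       m = number of unique values stored in F[] (hypothetical name) */
    int lo = 0, hi = m - 1;
    while (lo <= hi)
    {
        int mid = lo + (hi - lo) / 2;
        if (F[mid].value == -x) {
            possibleQuardtupple += F[mid].valueOccuredNumberOfTimes;
            break;
        }
        else if (F[mid].value < -x) lo = mid + 1;
        else hi = mid - 1;
    }
}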
The loop over i performs n^2 iterations, and each iteration does a binary search with lg(n^2) comparisons, so the overall complexity is O(n^2 lg(n^2)), which equals O(n^2 lg n) since lg(n^2) = 2 lg n.
The number of ways 0 can be reached is possibleQuardtupple.
You could also use an STL map instead of binary search, but std::map is slow, so it's better to use binary search.
Hope my explanation is clear enough to understand.
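A runnable Java sketch of the same approach (hypothetical `FourSumCount` class). Instead of building the F[] table of unique values, it counts the occurrences of -x in the sorted AUX[] with a lower-bound/upper-bound pair of binary searches, which has the same O(n^2 lg(n^2)) cost. Sums are stored as `long` so that adding 30-bit values cannot overflow; for n = 4000 the two arrays take roughly 256 MB.

```java
import java.util.Arrays;

public class FourSumCount {
    // Counts index tuples (i, j, k, l) with A[i] + B[j] + C[k] + D[l] == 0.
    static long countZeroSums(int[] A, int[] B, int[] C, int[] D) {
        int n = A.length;
        long[] e = new long[n * n];      // all A[i] + B[j]
        long[] aux = new long[n * n];    // all C[i] + D[j]
        int p = 0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                e[p] = (long) A[i] + B[j];
                aux[p] = (long) C[i] + D[j];
                p++;
            }
        Arrays.sort(aux);                // O(n^2 log(n^2))
        long count = 0;
        for (long x : e)                 // n^2 lookups, O(log(n^2)) each
            count += upperBound(aux, -x) - lowerBound(aux, -x);  // occurrences of -x
        return count;
    }

    // First index whose element is >= key.
    static int lowerBound(long[] a, long key) {
        int lo = 0, hi = a.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (a[mid] < key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // First index whose element is > key.
    static int upperBound(long[] a, long key) {
        int lo = 0, hi = a.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (a[mid] <= key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    public static void main(String[] args) {
        int[] A = {1, -1, 2}, B = {0, 3, -2}, C = {-1, 1, 0}, D = {0, -3, 2};
        System.out.println(countZeroSums(A, B, C, D));
    }
}
```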
I disagree that your solution is as efficient as you say. In your solution, populating E[] and AUX[] is O(N^2) each, so 2N^2 in total. Each will have N^2 elements.
Generating x = O(N)
Sorting AUX = O((2N)*log((2N)))
The binary search for E[i] in AUX[] is based on N^2 elements to be found in N^2 elements.
Thus you are still doing N^4 work, plus extra work generating the intermediate arrays and sorting the N^2 elements in AUX[].
I have a solution (work in progress) but I find it very difficult to calculate how much work it is. I deleted my previous answer. I will post something when I am more sure of myself.
I need to find a way to compare O(X) + O(Z) + O(X^3) + O(X^2) + O(Z^3) + O(Z^2) + X log(X) + Z log(Z) to O(N^4), where X + Z = N.
It is clearly less than O(N^4), but by how much? My math is failing me here.
The judgement is wrong. The supplied solution generates arrays with size N^2. It then operates on these arrays (sorting, etc).
Therefore the order of work, which would normally be O(n^2 log(n)), should have n substituted with n^2. The result is therefore O((n^2)^2 log(n^2)).
How can we remove the median of a set in O(log n) time? Any ideas?
If the set is sorted, finding the median requires O(1) item retrievals. If the items are in arbitrary sequence, it will not be possible to identify the median with certainty without examining the majority of the items. If one has examined most, but not all, of the items, that will allow one to guarantee that the median will be within some range [if the list contains duplicates, the upper and lower bounds may match], but examining the majority of the items in a list implies O(n) item retrievals.
If one has the information in a collection that is not fully ordered, but where certain ordering relationships are known, then the time required may be anywhere between O(1) and O(n) item retrievals, depending on the nature of the known ordering relation.
For unsorted lists, repeatedly do an O(n) partial sort until the element at the median position is known. This is at least O(n), though.
Is there any information about the elements being sorted?
For a general, unsorted set, it is impossible to reliably find the median in better than O(n) time. You can find the median of a sorted set in O(1), or you can sort the set yourself in O(n log n) time and then find the median in O(1), giving an O(n log n) algorithm. Finally, there are cleverer selection algorithms that work by partitioning instead of sorting and yield O(n) performance.
But if the set has no special properties and you are not allowed any pre-processing step, you will never get below O(n) by the simple fact that you will need to examine all of the elements at least once to ensure that your median is correct.
Here's a solution in Java, based on TreeSet:
import java.util.SortedSet;
import java.util.TreeSet;

public class SetWithMedian {
    private SortedSet<Integer> s = new TreeSet<Integer>();
    private Integer m = null;

    public boolean contains(int e) {
        return s.contains(e);
    }

    public Integer getMedian() {
        return m;
    }

    public void add(int e) {
        s.add(e);
        updateMedian();
    }

    public void remove(int e) {
        s.remove(e);
        updateMedian();
    }

    private void updateMedian() {
        if (s.size() == 0) {
            m = null;
        } else if (s.size() == 1) {
            m = s.first();
        } else {
            SortedSet<Integer> h = s.headSet(m);
            SortedSet<Integer> t = s.tailSet(m + 1);
            int x = 1 - s.size() % 2;
            if (h.size() < t.size() + x)
                m = t.first();
            else if (h.size() > t.size() + x)
                m = h.last();
        }
    }
}
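Removing the median (i.e. "set.remove(set.getMedian())") takes O(log n) tree operations. One caveat: with java.util.TreeSet, calling size() on a headSet or tailSet view iterates over the view, so updateMedian() is actually O(n) per call; a true O(log n) bound needs size queries that do not walk the view.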
Edit: To help understand the code, here's the invariant condition of the class attributes:
private boolean isGood() {
    if (s.isEmpty()) {
        return m == null;
    } else {
        return s.contains(m) && s.headSet(m).size() + s.size() % 2 == s.tailSet(m).size();
    }
}
In human-readable form:
If the set "s" is empty, then "m" must be null.
If the set "s" is not empty, then it must contain "m".
Let x be the number of elements strictly less than "m", and let y be the number of elements greater than or equal to "m". Then, if the total number of elements is even, x must be equal to y; otherwise, x+1 must be equal to y.
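Because size() on a whole TreeSet is O(1) while size() on a headSet/tailSet view is not, here is a hypothetical alternative (the `TwoSetMedian` class is not from the answer above) that keeps the smaller and larger halves in two separate TreeSets. Every operation is a constant number of O(log n) TreeSet calls, and, like SetWithMedian, it reports the upper median when the size is even.

```java
import java.util.TreeSet;

public class TwoSetMedian {
    // All of lower < all of upper; the median is upper.first().
    // Invariant: upper.size() == lower.size() or upper.size() == lower.size() + 1.
    private final TreeSet<Integer> lower = new TreeSet<>();
    private final TreeSet<Integer> upper = new TreeSet<>();

    public boolean contains(int e) {
        return lower.contains(e) || upper.contains(e);
    }

    public Integer getMedian() {
        return upper.isEmpty() ? null : upper.first();
    }

    public void add(int e) {
        if (contains(e)) return;                              // set semantics
        if (upper.isEmpty() || e > upper.first()) upper.add(e); else lower.add(e);
        rebalance();
    }

    public void remove(int e) {
        if (!lower.remove(e)) upper.remove(e);
        rebalance();
    }

    public Integer removeMedian() {
        Integer m = getMedian();
        if (m != null) remove(m);
        return m;
    }

    // After one add/remove at most one element has to move between the halves.
    private void rebalance() {
        if (lower.size() > upper.size()) upper.add(lower.pollLast());
        else if (upper.size() > lower.size() + 1) lower.add(upper.pollFirst());
    }
}
```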
Try a red-black tree. It should work quite well, and with a binary search you get your log(n). Insertion and removal are also O(log n), and rebalancing is done in O(log n) as well.
As mentioned in previous answers, there is no way to find the median without touching every element of the data structure. If the algorithm must run sequentially, the best you can do is O(n). The deterministic selection algorithm (median of medians, also known as the BFPRT algorithm) solves the problem with a worst case of O(n). You can find more about it here: http://en.wikipedia.org/wiki/Selection_algorithm#Linear_general_selection_algorithm_-_Median_of_Medians_algorithm
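A sequential sketch of median-of-medians selection in Java, with groups of 5 and a three-way partition around the pivot (the `MedianOfMedians` class and method names are hypothetical). It reorders a copy of the input and returns the upper median when n is even.

```java
import java.util.Arrays;

public class MedianOfMedians {
    // Returns the element that would sit at index k if a[lo..hi) were sorted.
    static int select(int[] a, int lo, int hi, int k) {
        while (true) {
            if (hi - lo <= 5) {                       // tiny range: just sort it
                Arrays.sort(a, lo, hi);
                return a[k];
            }
            int pivot = pivotOfMedians(a, lo, hi);
            // Three-way partition of a[lo..hi): [< pivot | == pivot | > pivot]
            int lt = lo, i = lo, gt = hi;
            while (i < gt) {
                if (a[i] < pivot) swap(a, lt++, i++);
                else if (a[i] > pivot) swap(a, i, --gt);
                else i++;
            }
            if (k < lt) hi = lt;                      // answer lies in the "<" part
            else if (k >= gt) lo = gt;                // answer lies in the ">" part
            else return pivot;                        // k falls in the "==" part
        }
    }

    // Median of the group-of-5 medians, found recursively with select().
    private static int pivotOfMedians(int[] a, int lo, int hi) {
        int m = lo;                                   // medians are packed into a[lo..m)
        for (int g = lo; g < hi; g += 5) {
            int end = Math.min(g + 5, hi);
            Arrays.sort(a, g, end);                   // at most 5 elements
            swap(a, m++, g + (end - g) / 2);          // move the group median forward
        }
        return select(a, lo, m, lo + (m - lo) / 2);
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static int median(int[] values) {
        if (values.length == 0) throw new IllegalArgumentException("empty input");
        int[] a = values.clone();
        return select(a, 0, a.length, a.length / 2);
    }
}
```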
However, the median-of-medians algorithm can be made to run faster than O(n) by parallelizing it. Due to its divide-and-conquer nature, the algorithm can be "easily" parallelized. For instance, when dividing the input array into groups of 5 elements, you could potentially launch a thread for each group, sort it and find its median within that thread. When this step finishes, the threads are joined and the algorithm is run again on the newly formed array of medians.
Note that such a design would only be beneficial for really large data sets; the overhead of spawning and joining threads makes it impractical for smaller sets. This has a bit of insight: http://www.umiacs.umd.edu/research/EXPAR/papers/3494/node18.html
Note that you can find asymptotically faster algorithms out there, however they are not practical enough for daily use. Your best bet is the already mentioned sequential median-of-medians algorithm.
Master Yoda's randomized algorithm has, of course, a minimum complexity of n like any other, an expected complexity of n (not log n) and a maximum complexity of n squared like Quicksort. It's still very good.
In practice, the "random" pivot choice might sometimes be a fixed location (without involving a RNG) because the initial array elements are known to be random enough (e.g. a random permutation of distinct values, or independent and identically distributed) or deduced from an approximate or exactly known distribution of input values.
I know a randomized algorithm with O(n) expected time complexity.
Here is the algorithm:
Input: array of n numbers A[1..n] (without loss of generality, assume n is even)
Output: the (n/2)-th element of the sorted array.
Algorithm Select(A[1..n], k = n/2):
Pick a pivot index p uniformly at random from 1..n
Divide the array into 3 parts:
L - elements < A[p]
E - elements == A[p]
R - elements > A[p]
if (k <= |L|) recurse on (L, k)
else if (k <= |L| + |E|) A[p] is the answer; stop
else recurse on (R, k - |L| - |E|)
Complexity:
O(n) in expectation.
The proof is purely mathematical and about one page long; ping me if you are interested.
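A Java sketch of this randomized selection (quickselect) algorithm, with a hypothetical `RandomizedSelect` class. It splits into elements less than, equal to, and greater than the pivot, which keeps it correct when values repeat; k is 1-based. For clarity it copies the parts with streams rather than partitioning in place, so it also uses O(n) expected extra memory.

```java
import java.util.Arrays;
import java.util.Random;

public class RandomizedSelect {
    private static final Random RNG = new Random();

    // Returns the k-th smallest element of a (1 <= k <= a.length).
    static int select(int[] a, int k) {
        int pivot = a[RNG.nextInt(a.length)];                                // uniform pivot
        int[] less = Arrays.stream(a).filter(x -> x < pivot).toArray();      // L
        int equal = (int) Arrays.stream(a).filter(x -> x == pivot).count();  // |E|
        if (k <= less.length) return select(less, k);
        if (k <= less.length + equal) return pivot;
        int[] greater = Arrays.stream(a).filter(x -> x > pivot).toArray();   // R
        return select(greater, k - less.length - equal);
    }

    // (n + 1) / 2 equals n / 2 for even n and gives the true median for odd n.
    static int median(int[] a) {
        return select(a, (a.length + 1) / 2);
    }
}
```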
To expand on rwong's answer, here is some example code:
// partial_sort example
#include <iostream>
#include <algorithm>
#include <vector>
using namespace std;

int main () {
    int myints[] = {9,8,7,6,5,4,3,2,1};
    vector<int> myvector (myints, myints+9);
    vector<int>::iterator it;

    partial_sort (myvector.begin(), myvector.begin()+5, myvector.end());

    // print out content:
    cout << "myvector contains:";
    for (it=myvector.begin(); it!=myvector.end(); ++it)
        cout << " " << *it;
    cout << endl;

    return 0;
}
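Output (partial_sort leaves the elements after the first five in unspecified order, so the tail may differ between implementations):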
myvector contains: 1 2 3 4 5 9 8 7 6
The element in the middle position (the 5th of 9, here 5) is the median.