I'm trying to solve question 11.1 in Elements of Programming Interviews (EPI) in Java: Search a Sorted Array for First Occurrence of K.
The problem description from the book:
Write a method that takes a sorted array and a key and returns the index of the first occurrence of that key in the array.
The solution they provide in the book is a modified binary search algorithm that runs in O(log n) time. I wrote my own algorithm, also based on a modified binary search, with a slight difference: it uses recursion. The problem is that I don't know how to determine the time complexity of my algorithm. My best guess is that it runs in O(log n) time, because each time the function is called it halves the number of candidate values. I've tested my algorithm against the 314 EPI test cases provided by the EPI Judge, so I know it works; I just don't know the time complexity. Here is the code:
public static int searchFirstOfKUtility(List<Integer> A, int k, int Lower, int Upper, Integer Index) {
    while (Lower <= Upper) {
        int M = Lower + (Upper - Lower) / 2;
        if (A.get(M) < k) {
            Lower = M + 1;
        } else if (A.get(M) == k) {
            Index = M;
            if (Lower != Upper)
                Index = searchFirstOfKUtility(A, k, Lower, M - 1, Index);
            return Index;
        } else {
            Upper = M - 1;
        }
    }
    return Index;
}
Here is the code that the test cases call to exercise my function:
public static int searchFirstOfK(List<Integer> A, int k) {
    Integer foundKey = -1;
    return searchFirstOfKUtility(A, k, 0, A.size() - 1, foundKey);
}
So, can anyone tell me what the time complexity of my algorithm would be?
Assuming that passing arguments is O(1) instead of O(n), performance is O(log(n)).
The usual theoretical approach for analyzing recursion is to call on the Master Theorem. It says that if the performance of a recursive algorithm follows a relation:
T(n) = a T(n/b) + f(n)
then there are 3 cases. In plain English they correspond to:
Performance is dominated by all the calls at the bottom of the recursion, so is proportional to how many of those there are.
Performance is equal between each level of recursion, and so is proportional to how many levels of recursion there are, times the cost of any layer of recursion.
Performance is dominated by the work done in the very first call, and so is proportional to f(n).
You are in case 2. Each recursive call costs the same, and so performance is dominated by the fact that there are O(log(n)) levels of recursion times the cost of each level. Assuming that passing a fixed number of arguments is O(1), that will indeed be O(log(n)).
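Concretely, if we model each halving step (whether the while loop shrinks the range in place or the function recurses on the left half) as one unit of work, the recurrence is roughly:
T(n) = T(n/2) + O(1)
which is case 2 with a = 1, b = 2 and f(n) = O(1), and the Master Theorem then gives T(n) = O(log(n)).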
Note that this assumption is true for Java because you don't make a complete copy of the array before passing it. But it is important to be aware that it is not true in all languages. For example, I recently did a bunch of work in PL/pgSQL, where arrays are passed by value, meaning that your algorithm would have been O(n log(n)).
Suppose we have a recursive function which only terminates if a randomly generated parameter meets some condition, e.g.:
define (some-recursive-function)
{
    x = (random in range of 1 to 100000);
    if (x == 10)
    {
        return "this function has terminated";
    }
    else
    {
        (some-recursive-function)
    }
}
I understand that for infinite loops there would be no defined complexity. What about a function that definitely terminates, but after an unknown amount of time?
Finding the average time complexity for this would be fine. How would one go about finding the worst-case time complexity, if one exists?
Thank you in advance!
EDIT: As several have pointed out, I've completely missed the fact that there is no input to this function. Suppose instead, we have:
define (some-recursive-function n)
{
    x = (random in range of 1 to n);
    if (x == 10)
    {
        return "this function has terminated";
    }
    else
    {
        (some-recursive-function n)
    }
}
Would this change anything?
If there is no function of n which bounds the runtime of the function from above, then there just isn't an upper bound on the runtime. There could be a lower bound on the runtime, depending on the case. We can also speak about the expected runtime, and even put bounds on the expected runtime, but that is distinct from, on the one hand, bounds on the average case and, on the other hand, bounds on the runtime itself.
As it's currently written, there are no bounds at all when n is under 10: the function just doesn't terminate in any event. For n >= 10, there is still no upper bound on any of the cases - it can take arbitrarily long to finish - but the lower bound in any case is linear in the input size (you must at least read the value of n, which consists of N = ceiling(log n) bits; your method of choosing a random number no greater than n may require additional time and/or space). The case behavior here is fairly uninteresting.
If we consider the expected runtime of the function in terms of the value (not length) of the input, we observe that there is a 1/n chance that any particular invocation picks the right random number (again, for n >= 10); we recognize that the number of times we need to try to get one is given by a geometric distribution and that the expectation is 1/(1/n) = n. So, the expected recursion depth is a linear function of the value of the input, n, and therefore an exponential function of the input size, N = log n. We recover an exact expression for the expectation; the upper and lower bounds are therefore both linear as well, and this covers all cases (best, worst, average, etc.) I say recursion depth since the runtime will also have an additional factor of N = log n, or more, owing to the observation in the preceding paragraph.
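If it helps to sanity-check the expectation of n empirically, here is a small Java experiment (an iterative stand-in for the recursion, so the stack doesn't overflow; the class name and the chosen constants are mine):

import java.util.Random;

public class ExpectedCallsEstimate {
    public static void main(String[] args) {
        Random rng = new Random();
        int n = 1000;          // range of the random draw, must be >= 10
        int trials = 100_000;  // number of independent runs to average over
        long totalCalls = 0;
        for (int t = 0; t < trials; t++) {
            int calls = 0;
            int x;
            do {
                calls++;
                x = rng.nextInt(n) + 1; // random in range 1 to n
            } while (x != 10);
            totalCalls += calls;
        }
        // The geometric distribution predicts an average of about n calls.
        System.out.println("average calls: " + (double) totalCalls / trials);
    }
}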
You need to know that there are "simple" formulas to calculate the complexity of a recursive algorithm, using recurrence relations, of course.
In this case we obviously need to know what that recursive algorithm is, because in the best case it is O(1) (time complexity), but in the worst case we need to add O(n) (taking into account that numbers may repeat) to the complexity of the algorithm itself.
I'll link this question/answer for reference:
Determining complexity for recursive functions (Big O notation)
Assume the tree T is a binary tree.
Algorithm computeDepths(node, depth)
  Input: a node and its depth. To compute all depths, call computeDepths(T.root, 0)
  Output: the depths of all the nodes of T
  if node != null
    node.depth ← depth
    computeDepths(node.left, depth + 1)
    computeDepths(node.right, depth + 1)
    return depth
  end if
I ran it on paper with a full and complete binary tree containing 7 elements, but I still can't get my head around its time complexity. If I had to guess, I'd say it's O(n*log n).
It is O(n).
To get an idea of the time complexity, we need to compare the amount of work done by the algorithm with the size of the input. In this algorithm, the work done per function call is constant (only assigning a given value to a variable). So let's count how many times the function is called.
The first time the function is called, it's called on the root.
Then, for any subsequent call, the function checks whether the node is null; if it is not, it sets the node's depth accordingly and recurses on both children.
Now note that the function is called once per node in the tree, plus once for every null child pointer the recursion reaches. A binary tree with n nodes has exactly n + 1 null child pointers, so the total number of function calls is:
n + (n + 1) = 2n + 1
So this is the amount of work done by the algorithm. And so the time complexity is O(n).
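To see the counting concretely, here is a minimal Java sketch of the same algorithm with a call counter (the Node class and field names are just assumptions for illustration):

class Node {
    Node left, right;
    int depth;
}

class DepthComputer {
    static int calls = 0;

    static void computeDepths(Node node, int depth) {
        calls++;                    // count every invocation, including the null ones
        if (node != null) {
            node.depth = depth;     // constant work per call
            computeDepths(node.left, depth + 1);
            computeDepths(node.right, depth + 1);
        }
    }
}

For a tree with n nodes, calls ends up at 2n + 1, matching the count above.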
Good morning. I am studying algorithms and how to calculate the complexity of recursive calls, but I cannot find a reference on how a level limit on recursive calls affects the complexity calculation. For instance, this code:
countFamilyMembers(int level, ...., int count) {
    if (noOperationCondition) { // for example, no need to process this item because of business rules like "member already counted"
        return count;
    } else if (level >= MAX_LEVEL) { // level validation: we only want to look up to a certain level
        return ++count; // last level to visit, so no more recursion
    } else {
        for (...each memberRelatives...) { // can be a database lookup for relatives to explore
            count = countFamilyMembers(++level, ..., ++count);
        }
        return count;
    }
}
I think this is O(2^n) because of the recursive call in the loop. However, I have two main questions:
1. What happens if the number of loop iterations is not related to the original input at all? Can that be considered "n" as well?
2. The level validation certainly limits the recursive calls; how does this affect the complexity calculation?
Thanks for the clarifications. So we'll take n as some "best metric" on the number of relatives; this is also known as the "fan-out" in some paradigms.
Thus, you'll have 1 person at level 0, n at level 1, n^2 at level 2, and so on. A rough estimate of the return value ... and the number of operations (node visits, increments, etc.) is the sum of n^level for level ranging 0 to MAX_LEVEL. The dominant term is the highest exponent, n^MAX_LEVEL.
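Spelling the sum out as a geometric series (for n > 1):
1 + n + n^2 + ... + n^MAX_LEVEL = (n^(MAX_LEVEL + 1) - 1) / (n - 1) = O(n^MAX_LEVEL)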
With the given information, I believe that's your answer: O(n^MAX_LEVEL), a.k.a. polynomial time.
Note that, if you happen to be given a value for n, even an upper bound for n, then this becomes a constant, and the complexity is O(1).
I came across an interesting algorithm question in an interview. I gave my answer, but I'm not sure whether there is a better idea. So I welcome everyone to write something about his/her ideas.
You have an empty set. Now elements are put into the set one by one. We assume all the elements are integers and they are distinct (according to the definition of set, we don't consider two elements with the same value).
Every time a new element is added to the set, the set's median value is asked. The median value is defined the same as in math: the middle element in a sorted list. Here, specifically, when the size of the set is even, say size = 2*x, the median element is the x-th element of the set.
An example:
Start with an empty set,
when 12 is added, the median is 12,
when 7 is added, the median is 7,
when 8 is added, the median is 8,
when 11 is added, the median is 8,
when 5 is added, the median is 8,
when 16 is added, the median is 8,
...
Notice that, first, elements are added to the set one by one and, second, we don't know the elements that are going to be added.
My answer.
Since it is a question about finding the median, sorting is needed. The easiest solution is to use a normal array and keep it sorted. When a new element comes, use binary search to find its position (log n) and add the element to the array. Since it is a normal array, the rest of the array has to be shifted, which takes n time. Once the element is inserted, we can read off the median in constant time.
The WORST time complexity is: log n + n + 1, i.e. O(n).
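In Java, a rough sketch of this first approach (the class and method names are mine) could look like:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortedArrayMedian {
    private final List<Integer> a = new ArrayList<>();

    void add(int x) {
        int pos = Collections.binarySearch(a, x); // O(log n)
        if (pos < 0)
            pos = -pos - 1;                       // insertion point when x is not present
        a.add(pos, x);                            // O(n) because of the shift
    }

    int median() {
        // Lower middle element, matching the definition above for even sizes.
        return a.get((a.size() - 1) / 2);
    }
}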
Another solution is to use a linked list. The reason for using a linked list is to remove the need to shift the array. But finding the location of the new element requires a linear search. Adding the element then takes constant time, and we find the median by walking through half of the list, which always takes n/2 time.
The WORST time complexity is: n + 1 + n/2, i.e. O(n).
The third solution is to use a binary search tree. Using a tree, we avoid shifting the array. But using a plain binary search tree to find the median is not very attractive. So I change the binary search tree so that the left subtree and the right subtree are always balanced against each other: at any time, either they have the same number of nodes, or the right subtree has one node more than the left subtree. In other words, it is ensured that at any time the root element is the median. Of course this requires changes in the way the tree is built; the technical details are similar to rotating a red-black tree.
If the tree is maintained properly, the WORST time complexity is O(n).
So the three algorithms are all linear in the size of the set. If no sub-linear algorithm exists, the three algorithms can be regarded as optimal. Since they don't differ much from each other, the best is the easiest to implement, which is the second one, using a linked list.
So what I really wonder is: is there a sub-linear algorithm for this problem, and if so, what would it look like? Any ideas, guys?
Steve.
Your complexity analysis is confusing. Let's say that n items total are added; we want to output the stream of n medians (where the ith in the stream is the median of the first i items) efficiently.
I believe this can be done in O(n*lg n) time using two priority queues (e.g. binary or Fibonacci heaps): one queue for the items below the current median (so the largest element is at the top), and the other for items above it (in this heap, the smallest is at the top). Note that in Fibonacci (and other) heaps, insertion is O(1) amortized; it's only popping an element that's O(lg n).
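A minimal Java sketch of the two-heap idea (using java.util.PriorityQueue, which is a binary heap, so insertion here is O(log n) rather than the amortized O(1) of a Fibonacci heap; the names are mine):

import java.util.Collections;
import java.util.PriorityQueue;

class TwoHeapMedian {
    // Max-heap for the lower half, min-heap for the upper half.
    private final PriorityQueue<Integer> lower = new PriorityQueue<>(Collections.reverseOrder());
    private final PriorityQueue<Integer> upper = new PriorityQueue<>();

    void add(int x) {
        if (lower.isEmpty() || x <= lower.peek())
            lower.add(x);
        else
            upper.add(x);
        // Keep lower the same size as upper, or exactly one element larger.
        if (lower.size() > upper.size() + 1)
            upper.add(lower.poll());
        else if (upper.size() > lower.size())
            lower.add(upper.poll());
    }

    int median() {
        // Lower middle element, matching the asker's definition for even sizes.
        return lower.peek();
    }
}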
This would be called an "online median selection" algorithm, although Wikipedia only talks about online min/max selection. Here's an approximate algorithm, and a lower bound on deterministic and approximate online median selection (a lower bound means no faster algorithm is possible!)
If there are a small number of possible values compared to n, you can probably break the comparison-based lower bound just like you can for sorting.
I received the same interview question and came up with the two-heap solution in wrang-wrang's post. As he says, the time per operation is O(log n) worst-case. The expected time is also O(log n) because you have to "pop an element" 1/4 of the time assuming random inputs.
I subsequently thought about it further and figured out how to get constant expected time; indeed, the expected number of comparisons per element becomes 2+o(1). You can see my writeup at http://denenberg.com/omf.pdf .
BTW, the solutions discussed here all require space O(n), since you must save all the elements. A completely different approach, requiring only O(log n) space, gives you an approximation to the median (not the exact median). Sorry I can't post a link (I'm limited to one link per post) but my paper has pointers.
Although wrang-wrang already answered, I wish to describe a modification of your binary search tree method that is sub-linear.
We use a binary search tree that is balanced (AVL/Red-Black/etc.), but not super-balanced like you described. So adding an item is O(log n).
One modification to the tree: for every node we also store the number of nodes in its subtree. This doesn't change the complexity. (For a leaf this count would be 1, for a node with two leaf children this would be 3, etc)
We can now access the Kth smallest element in O(log n) using these counts:
def get_kth_item(subtree, k):
    # k is 0-based; assumes 0 <= k < subtree.size
    left_size = 0 if subtree.left is None else subtree.left.size
    if k < left_size:
        return get_kth_item(subtree.left, k)
    elif k == left_size:
        return subtree.value
    else:  # k > left_size
        return get_kth_item(subtree.right, k - 1 - left_size)
A median is a special case of Kth smallest element (given that you know the size of the set).
So all in all this is another O(log n) solution.
We can define a min-heap and a max-heap to store the numbers. Additionally, we define a class DynamicArray for the number set, with two functions: Insert and GetMedian. The time to insert a new number is O(log n), while the time to get the median is O(1).
This solution is implemented in C++ as follows:
#include <algorithm>
#include <functional>
#include <stdexcept>
#include <vector>
using namespace std;

template<typename T> class DynamicArray
{
public:
    void Insert(T num)
    {
        if(((minHeap.size() + maxHeap.size()) & 1) == 0)
        {
            // Even count so far: the new element goes into minHeap (upper half),
            // but first make sure it is not smaller than the top of maxHeap.
            if(maxHeap.size() > 0 && num < maxHeap[0])
            {
                maxHeap.push_back(num);
                push_heap(maxHeap.begin(), maxHeap.end(), less<T>());
                num = maxHeap[0];
                pop_heap(maxHeap.begin(), maxHeap.end(), less<T>());
                maxHeap.pop_back();
            }
            minHeap.push_back(num);
            push_heap(minHeap.begin(), minHeap.end(), greater<T>());
        }
        else
        {
            // Odd count so far: the new element goes into maxHeap (lower half),
            // but first make sure it is not larger than the top of minHeap.
            if(minHeap.size() > 0 && minHeap[0] < num)
            {
                minHeap.push_back(num);
                push_heap(minHeap.begin(), minHeap.end(), greater<T>());
                num = minHeap[0];
                pop_heap(minHeap.begin(), minHeap.end(), greater<T>());
                minHeap.pop_back();
            }
            maxHeap.push_back(num);
            push_heap(maxHeap.begin(), maxHeap.end(), less<T>());
        }
    }

    T GetMedian()
    {
        size_t size = minHeap.size() + maxHeap.size();
        if(size == 0)
            throw runtime_error("No numbers are available");

        T median = 0;
        if((size & 1) == 1)
            median = minHeap[0];
        else
            median = (minHeap[0] + maxHeap[0]) / 2;

        return median;
    }

private:
    vector<T> minHeap; // upper half, minimum at minHeap[0]
    vector<T> maxHeap; // lower half, maximum at maxHeap[0]
};
For more detailed analysis, please refer to my blog: http://codercareer.blogspot.com/2012/01/no-30-median-in-stream.html.
1) As with the previous suggestions, keep two heaps and cache their respective sizes. The left heap keeps values below the median, the right heap keeps values above the median. If you simply negate the values in the right heap, the smallest value will be at the root, so there is no need to create a special data structure.
2) When you add a new number, you determine the new median from the size of your two heaps, the current median, and the two roots of the L&R heaps, which just takes constant time.
3) Call a private threaded method to perform the actual insert and update work, but return immediately with the new median value. You only need to block until the heap roots are updated. Then, the thread doing the insert just needs to maintain a lock on the traversing grandparent node as it traverses the tree; this will ensure that you can insert and rebalance without blocking other inserting threads working on other sub-branches.
Getting the median becomes a constant-time procedure; of course, you may now have to wait on synchronization from further adds.
Rob
A balanced tree (e.g. an R/B tree) with an augmented size field should find the median in O(log n) time in the worst case. I think it is in Chapter 14 of the classic algorithms textbook (CLRS).
To keep the explanation brief, you can efficiently augment a BST to select a key of a specified rank in O(h) by having each node store the number of nodes in its left subtree. If you can guarantee that the tree is balanced, you can reduce this to O(log(n)). Consider using an AVL tree, which is height-balanced (or a red-black tree, which is roughly balanced); then you can select any key in O(log(n)). When you insert or delete a node in the AVL tree, you can increment or decrement a variable that keeps track of the total number of nodes in the tree to determine the rank of the median, which you can then select in O(log(n)).
In order to find the median in linear time you can try this (it just came to my mind). You need to store some values every time you add a number to your set, and you won't need sorting. Here it goes.
#include <limits.h>

#define VERY_BIG_NUMBER INT_MAX /* sentinel: "no odd-count median found yet" */

typedef struct
{
    int number;
    int lesser;
    int greater;
} record;

/* numbers[] already holds `count` records; n is the new number being added.
   Returns the median of the count + 1 numbers. */
int median(record numbers[], int count, int n)
{
    int i;
    int m = VERY_BIG_NUMBER;
    int a = 0, b = 0;

    numbers[count].number = n;
    numbers[count].lesser = 0;
    numbers[count].greater = 0;

    /* Update the lesser/greater counters of every old record and of the new one. */
    for (i = 0; i < count; i++)
    {
        if (n < numbers[i].number)
        {
            numbers[i].lesser++;
            numbers[count].greater++;
        }
        else
        {
            numbers[i].greater++;
            numbers[count].lesser++;
        }
    }

    /* Odd total: the median is the record with equally many numbers on each side. */
    for (i = 0; i < count + 1; i++)
        if (numbers[i].greater - numbers[i].lesser == 0)
            m = numbers[i].number;

    /* Even total: average the two middle records. */
    if (m == VERY_BIG_NUMBER)
    {
        for (i = 0; i < count + 1; i++)
        {
            if (numbers[i].greater - numbers[i].lesser == -1)
                a = numbers[i].number;
            if (numbers[i].greater - numbers[i].lesser == 1)
                b = numbers[i].number;
        }
        m = (a + b) / 2;
    }
    return m;
}
What this does is: each time you add a number to the set, you keep track of how many numbers are less than it and how many are greater. So, if a number has the same "lesser than" and "greater than" counts, it is in the very middle of the set, without having to sort anything. In the case that you have an even amount of numbers, you have two candidates for the median, so you just return the mean of those two. BTW, this is C code; I hope it helps.