A college instructor here. I am trying to find a meaningful (practical) code example to illustrate different time complexities for beginners in an ELI5 manner. The code should start with constant complexity and then, by adding a small piece of code at each step, increase in complexity: ..., log n, n, n log n, n^2, 2^n, ...
I think I can explain it better with one example that has small incremental changes rather than switching the context from searching to sorting to brute-force algorithms.
Any example will be artificial. But here is one that does reasonably well.
Let vec be a sorted array of numbers, i an integer, and x another number. Answer the following questions.
O(1) What is the value of vec[i]?
O(n) Is x in a range from vec by linear search?
O(log(n)) Is x in a range from vec by binary search?
O(n^2) Is x the sum of two elements in a range of vec, by a double loop?
O(n log(n)) Is x the sum of two elements of vec, by linear search on the first with a binary search on the second? (Simplifying trick: do a linear search on the smaller element and a binary search on the larger one. Then you can reuse your code from 3.)
O(2^n) Is x the sum of any subset of elements of vec by recursion?
(pseudopolynomial) Memoize the previous solution. Discuss memory vs speed tradeoffs.
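The ladder above could be sketched roughly like this. This is only one possible rendering, assuming vec holds integers in ascending order; all function names are made up for illustration.

```python
from bisect import bisect_left
from functools import lru_cache

def value_at(vec, i):                      # O(1): one index operation
    return vec[i]

def contains_linear(vec, x):               # O(n): look at every element
    for v in vec:
        if v == x:
            return True
    return False

def contains_binary(vec, x):               # O(log n): needs vec to be sorted
    k = bisect_left(vec, x)
    return k < len(vec) and vec[k] == x

def pair_sum_quadratic(vec, x):            # O(n^2): try every pair
    n = len(vec)
    for a in range(n):
        for b in range(a + 1, n):
            if vec[a] + vec[b] == x:
                return True
    return False

def pair_sum_nlogn(vec, x):                # O(n log n): loop + binary search
    for a in range(len(vec)):
        target = x - vec[a]
        # the partner is the larger element, so search only to the right of a
        k = bisect_left(vec, target, a + 1)
        if k < len(vec) and vec[k] == target:
            return True
    return False

def subset_sum_exponential(vec, x, i=0):   # O(2^n): take or skip each element
    if i == len(vec):
        return x == 0                      # note: the empty subset sums to 0
    return (subset_sum_exponential(vec, x - vec[i], i + 1)
            or subset_sum_exponential(vec, x, i + 1))

def subset_sum_memo(vec, x):               # pseudopolynomial for integers
    vec = tuple(vec)

    @lru_cache(maxsize=None)               # trades memory for speed
    def go(i, remaining):
        if i == len(vec):
            return remaining == 0
        return go(i + 1, remaining - vec[i]) or go(i + 1, remaining)

    return go(0, x)
```

Each function adds only a few lines to the previous one, which is the point of using a single running example.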
I have the following question as part of my revision for a final exam:
For each of the following problems give the worst-case running time in
Big-O notation.
(i) Adding n numbers
(ii) Finding the minimum of n numbers in an unordered array
(iii) Finding an item in a binary heap
(iv) Sorting an unordered list of items using merge sort
(v) Finding the median (The value of a numerical set that equally divides
the number of values that are larger and smaller) of an array of sorted
items
Are my current ideas correct?
(i) This would be O(n) because you are adding n numbers.
(ii) This again would be O(n). You have to check every element in this list.
(iii) Not 100% sure here, but I assume it would be worst case O(n log n) as most things are with binary heaps.
(iv) This would be O(n log n)
(v) Again I am not sure on this, maybe O(log n) since the array is sorted so you only need to search half the values, essentially a binary chop.
Could anybody point me in the right direction if any of my answers are incorrect?
Thanks,
Chris.
(v) Finding the median (The value of a numerical set that equally divides
the number of values that are larger and smaller) of an array of sorted
items
(v) Again I am not sure on this, maybe O(log n) since the array is sorted so you only need to search half the values, essentially a binary chop.
This one is O(1). You are interested in the item that is in the middle of the sorted array (for N odd), or the average of the two "closest" to the middle (for N even).
Since the data is ordered, you can simply examine the one or two elements needed for the result.
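A minimal sketch of that constant-time lookup, assuming the input is a non-empty list already sorted in ascending order (the function name is just for illustration):

```python
def median_sorted(a):
    """Median of an already sorted list in O(1): no searching needed."""
    n = len(a)
    if n == 0:
        raise ValueError("median of an empty array is undefined")
    mid = n // 2
    if n % 2 == 1:
        return a[mid]                    # odd length: the middle element
    return (a[mid - 1] + a[mid]) / 2     # even length: average the two middle ones
```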
(iii) Finding an item in a binary heap
(iii) Not 100% sure here, but I assume it would be worst case O(n log
n) as most things are with binary heaps.
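This is actually O(N). A heap only orders each parent relative to its children and says nothing about siblings, so in the worst case you have to traverse the whole binary tree the heap is built on.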
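Here is a small sketch of such a traversal, assuming a min-heap stored in a list the way Python's heapq lays it out (children of index i at 2i+1 and 2i+2). The function name is hypothetical. The pruning helps in practice, but the worst case still visits every node.

```python
import heapq

def heap_find(heap, x, i=0):
    """Return an index holding x in a list-based min-heap, or -1 if absent."""
    # Everything below a node is >= that node, so a node larger than x
    # lets us skip its whole subtree. That does not improve the worst case.
    if i >= len(heap) or heap[i] > x:
        return -1
    if heap[i] == x:
        return i
    left = heap_find(heap, x, 2 * i + 1)
    if left != -1:
        return left
    return heap_find(heap, x, 2 * i + 2)
```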
Given an array of integers and some query operations.
The query operations are of 2 types
1. Update the value at the ith index to x.
2. Given 2 integers, find the kth minimum in that range. (E.g. if the 2 integers are i and j, we have to find the kth minimum between i and j, both inclusive.)
I can answer range minimum queries using a segment tree but could not do so for the kth minimum.
Can anyone help me?
Here is an O(polylog n) per query solution that does not assume a constant k, so k can vary between queries. The main idea is to use a segment tree, where every node represents an interval of array indices and contains a multiset (balanced binary search tree) of the values in the represented array segment. The update operation is pretty straightforward:
Walk up the segment tree from the leaf (the array index you're updating). You will encounter all nodes that represent an interval of array indices that contain the updated index. At every node, remove the old value from the multiset and insert the new value into the multiset. Complexity: O(log^2 n)
Update the array itself.
We notice that every array element will be in O(log n) multisets, so the total space usage is O(n log n). With linear-time merging of multisets we can build the initial segment tree in O(n log n) as well (there's O(n) work per level).
What about queries? We are given a range [i, j] and a rank k and want to find the k-th smallest element in a[i..j]. How do we do that?
Find a disjoint coverage of the query range using the standard segment tree query procedure. We get O(log n) disjoint nodes, the union of whose multisets is exactly the multiset of values in the query range. Let's call those multisets s_1, ..., s_m (with m = O(log n)). Finding the s_i takes O(log n) time.
Do a select(k) query on the union of s_1, ..., s_m. See below.
So how does the selection algorithm work? There is one really simple algorithm to do this.
We have s_1, ..., s_m and k given and want to find the largest x in a such that s_1.rank(x) + ... + s_m.rank(x) <= k - 1, where rank returns the number of elements smaller than x in the respective BBST (this can be implemented in O(log n) if we store subtree sizes).
Let's just use binary search to find x! We walk down the BBST of the root, do m rank queries at each step and check whether their sum is at most k - 1. It's a predicate monotone in x, so binary search works. The largest x that passes is the answer: the k-th smallest element itself has at most k - 1 range elements below it, and any larger value has at least k.
Complexity: O(n log n) preprocessing and O(log^3 n) per query.
So in total we get a runtime of O(n log n + q log^3 n) for q queries. I'm sure we could get it down to O(q log^2 n) with a cleverer selection algorithm.
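To make the idea concrete, here is a rough sketch that swaps the balanced BSTs for plain sorted Python lists, so rank becomes bisect_left. That keeps the O(log^3 n) query, but an update degrades to O(n) because list insertion shifts elements, so treat it as an illustration rather than the real structure. The class name KthTree is made up. The query searches for the largest x in the root list whose total rank over the covering nodes is at most k - 1.

```python
from bisect import bisect_left, insort

class KthTree:
    def __init__(self, a):
        self.a = list(a)
        self.size = 1
        while self.size < len(self.a):
            self.size *= 2
        # node[p] holds the sorted values of the array segment covered by p
        self.node = [[] for _ in range(2 * self.size)]
        for i, v in enumerate(self.a):
            self.node[self.size + i] = [v]
        for p in range(self.size - 1, 0, -1):
            self.node[p] = sorted(self.node[2 * p] + self.node[2 * p + 1])

    def update(self, i, x):
        """Set a[i] = x, fixing every node on the leaf-to-root path."""
        old, self.a[i] = self.a[i], x
        p = self.size + i
        while p >= 1:
            lst = self.node[p]
            del lst[bisect_left(lst, old)]   # remove one copy of the old value
            insort(lst, x)                   # insert the new value
            p //= 2

    def _cover(self, lo, hi):
        """Sorted lists of the O(log n) nodes that exactly cover [lo, hi]."""
        parts = []
        l, r = lo + self.size, hi + self.size + 1
        while l < r:
            if l & 1:
                parts.append(self.node[l])
                l += 1
            if r & 1:
                r -= 1
                parts.append(self.node[r])
            l //= 2
            r //= 2
        return parts

    def kth(self, lo, hi, k):
        """k-th smallest (1-based) of a[lo..hi], both ends inclusive."""
        if not (0 <= lo <= hi < len(self.a)) or not (1 <= k <= hi - lo + 1):
            raise ValueError("invalid range or rank")
        parts = self._cover(lo, hi)
        values = self.node[1]                # all values, sorted
        left, right, answer = 0, len(values) - 1, None
        while left <= right:
            mid = (left + right) // 2
            x = values[mid]
            below = sum(bisect_left(s, x) for s in parts)
            if below <= k - 1:               # x is not too large: keep it
                answer, left = x, mid + 1
            else:
                right = mid - 1
        return answer
```

For example, with t = KthTree([5, 1, 4, 2, 3]), the call t.kth(1, 3, 2) looks at the values 1, 4, 2 and returns 2.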
UPDATE: If we are looking for an offline algorithm that can process all queries at once, we can get O((n + q) * log n * log (q + n)) using the following algorithm:
Preprocess all queries, create a set of all values that ever occurred in the array. The number of those will be at most q + n.
Build a segment tree, but this time not on the array, but on the set of possible values.
Every node in the segment tree represents an interval of values and maintains a set of positions where these values occur.
To answer a query, start at the root of the segment tree. Check how many positions in the left child of the root lie in the query interval (we can do that by doing two searches in the BBST of positions). Let that number be m. If k <= m, recurse into the left child. Otherwise recurse into the right child, with k decremented by m.
For updates, remove the position from the O(log (q + n)) nodes that cover the old value and insert it into the nodes that cover the new value.
The advantage of this approach is that we don't need subtree sizes, so we can implement this with most standard library implementations of balanced binary search trees (e.g. set<int> in C++).
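The offline variant might look something like the sketch below. It assumes every value that will ever be written is known up front (passed in as all_values) and again uses sorted lists of positions where a real implementation would use a balanced BST, so updates here cost O(n) in the worst case. The class name ValueTree is hypothetical.

```python
from bisect import bisect_left, bisect_right, insort

class ValueTree:
    def __init__(self, a, all_values):
        self.a = list(a)
        # compress every value that can ever appear into 0..len(vals)-1
        self.vals = sorted(set(all_values) | set(self.a))
        self.rank = {v: r for r, v in enumerate(self.vals)}
        self.size = 1
        while self.size < len(self.vals):
            self.size *= 2
        # pos[p] holds the sorted positions whose value lies in p's value range
        self.pos = [[] for _ in range(2 * self.size)]
        for i, v in enumerate(self.a):
            self._add(i, v)

    def _add(self, i, v):
        p = self.size + self.rank[v]
        while p >= 1:
            insort(self.pos[p], i)
            p //= 2

    def _remove(self, i, v):
        p = self.size + self.rank[v]
        while p >= 1:
            lst = self.pos[p]
            del lst[bisect_left(lst, i)]
            p //= 2

    def update(self, i, x):
        """Set a[i] = x; x must be one of the values known at build time."""
        self._remove(i, self.a[i])
        self.a[i] = x
        self._add(i, x)

    def kth(self, lo, hi, k):
        """k-th smallest (1-based) of a[lo..hi]; assumes 1 <= k <= hi - lo + 1."""
        p = 1
        while p < self.size:
            left = self.pos[2 * p]
            # two searches count the positions of the left child inside [lo, hi]
            m = bisect_right(left, hi) - bisect_left(left, lo)
            if k <= m:
                p = 2 * p
            else:
                k -= m
                p = 2 * p + 1
        return self.vals[p - self.size]
```

Note that only plain ordered containers are needed here, no subtree sizes.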
We can turn this into an online algorithm by changing the segment tree out for a weight-balanced tree such as a BB[α] tree. It has logarithmic operations like other balanced binary search trees, but allows us to rebuild an entire subtree from scratch when it becomes unbalanced by charging the rebuilding cost to the operations that must have caused the imbalance.
If this is a programming contest problem, then you might be able to get away with the following O(n log(n) + q n^0.5 log(n)^1.5)-time algorithm. It is set up to use the C++ STL well and has a much better big-O constant than Niklas's (previous?) answer on account of using much less space and indirection.
Divide the array into k chunks of length n/k. Copy each chunk into the corresponding locations of a second array and sort it. To update: copy the chunk that changed into the second array and sort it again (time O((n/k) log(n/k))). To query: copy to a scratch array the at most 2 (n/k - 1) elements that belong to a chunk partially overlapping the query interval. Sort them. Use one of the answers to this question to select the element of the requested rank out of the union of the sorted scratch array and fully overlapping chunks, in time O(k log(n/k)^2). The optimum setting of k in theory is (n/log(n))^0.5. It's possible to shave another log(n)^0.5 using the complicated algorithm of Frederickson and Johnson.
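A rough sketch of the chunking scheme follows. Two simplifications are my own assumptions: the values are integers, and the selection step is a plain binary search over the value range (counting with bisect in each sorted piece) rather than the multi-array selection algorithm referred to above. It also treats the first and last touched chunk as partial even when they are fully covered. The class name ChunkedKth is made up.

```python
from bisect import bisect_right
from math import isqrt

class ChunkedKth:
    def __init__(self, a, chunk_len=None):
        self.a = list(a)
        self.b = chunk_len or max(1, isqrt(len(self.a)))
        # the "second array": one sorted copy per chunk
        self.chunks = [sorted(self.a[s:s + self.b])
                       for s in range(0, len(self.a), self.b)]

    def update(self, i, x):
        """Set a[i] = x and re-sort only the chunk that contains i."""
        self.a[i] = x
        c = i // self.b
        self.chunks[c] = sorted(self.a[c * self.b:(c + 1) * self.b])

    def kth(self, lo, hi, k):
        """k-th smallest (1-based) of a[lo..hi]; assumes integer values."""
        first, last = lo // self.b, hi // self.b
        if first == last:
            scratch = sorted(self.a[lo:hi + 1])
            parts = [scratch]
        else:
            # elements of the two boundary chunks go to a sorted scratch list
            scratch = sorted(self.a[lo:(first + 1) * self.b]
                             + self.a[last * self.b:hi + 1])
            parts = [scratch] + self.chunks[first + 1:last]
        low = min(p[0] for p in parts)
        high = max(p[-1] for p in parts)
        # smallest integer v with at least k range elements <= v
        while low < high:
            mid = (low + high) // 2
            at_most = sum(bisect_right(p, mid) for p in parts)
            if at_most >= k:
                high = mid
            else:
                low = mid + 1
        return low
```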
Perform a modification of the bucket sort: create a bucket that contains the numbers in the range you want and then sort this bucket only and find the kth minimum.
Damn, this solution can't update an element, but at least it finds that k-th element. Here you'll get some ideas so you can think of a solution that supports updates. Try pointer-based B-trees.
This is O(n log n) space and O(q log^2 n) time complexity. Later I explained the same with O(log n) per query.
So, you'll need to do the following:
1) Make a "segment tree" over given array.
2) For every node, instead of storing one number, you would store a whole array. The size of that array has to be equal to the number of leaves below the node (the numbers from its segment). That array (as you guessed) has to contain the values of those bottom nodes, but sorted.
3) To make such an array, you would merge two arrays from its two sons from segment tree. But not only that, for every element from the array you have just made (by merging), you need to remember the position of the number before its insertion in merged array (basically, the array from which it comes, and position in it). And a pointer to the first next element that is not inserted from the same array.
4) With this structure, you can check how many numbers there are that are lower than a given value x, in some segment S. You find (with binary search) the first number in the array of the root node that is >= x. And then, using the pointers you have made, you can find the results for the same question for the two children arrays (arrays of nodes that are children to the previous node) in O(1). You stop descending at each node whose segment is wholly inside or wholly outside the given segment S. The time complexity is O(log n): O(log n) to find the first element that is >= x, and O(log n) for all segments of the decomposition of S.
5) Do a binary search over the solution.
This was the solution with O(log^2 n) per query. But you can reduce it to O(log n):
1) Before doing all I wrote above, you need to transform the problem. You need to sort all numbers and remember the positions for each in original array. Now these positions are representing the array you are working on. Call that array P.
If the bounds of the query segment are a and b, you need to find the k-th element in P that is between a and b by value (not by index). That element represents the index of your result in the original array.
2) To find that k-th element, you would do some type of back-tracking with complexity of O(log n). You will be asking the number of elements between index 0 and (some other index) that are between a and b by value.
3) Suppose that you know the answer for such a question for some segment (0,h). Get answers on same type of questions for all segments in tree that begin on h, starting from the greatest one. Keep getting those answers as long as the current answer (from segment (0,h)) plus the answer you got the last are greater than k. Then update h. Keep updating h, until there is only one segment in tree that begins with h. That h is the index of the number you are looking for in the problem you have stated.
To get the answer to such a question for some segment from the tree you will spend exactly O(1) time, because you already know the answer for its parent's segment, and using the pointers I explained in the first algorithm you can get the answer for the current segment in O(1).
Preprocess an array A in O(n log n) time so that you can answer queries of the form
findmax(i,j): find the maximum value in an interval [i; j] (that is, the maximum value
among the array elements A[i],A[i + 1],...,A[j]) in O(1) time per query.
Additional question: Show how to preprocess in O(n) time so that you can answer the above queries in O(log n) time.
The problem is known as range minimum (maximum) query - RMQ. The link basically answers both of your questions.
The classic solutions are dynamic programming and segment trees.
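As a sketch of both classic approaches: the dynamic programming one is usually called a sparse table (O(n log n) preprocessing, O(1) per query), and the segment tree covers the additional question (O(n) preprocessing, O(log n) per query). Class names are made up, and both assume a non-empty array with inclusive bounds i <= j.

```python
class SparseTableMax:
    """table[k][i] = max of A[i .. i + 2^k - 1], built by doubling."""
    def __init__(self, a):
        n = len(a)
        self.table = [list(a)]
        k = 1
        while (1 << k) <= n:
            prev, half = self.table[-1], 1 << (k - 1)
            self.table.append([max(prev[i], prev[i + half])
                               for i in range(n - (1 << k) + 1)])
            k += 1

    def findmax(self, i, j):
        # two overlapping power-of-two blocks cover [i, j]; overlap is fine for max
        k = (j - i + 1).bit_length() - 1
        row = self.table[k]
        return max(row[i], row[j - (1 << k) + 1])


class SegmentTreeMax:
    """Bottom-up segment tree: leaves at tree[n .. 2n-1], built in O(n)."""
    def __init__(self, a):
        self.n = len(a)
        self.tree = [None] * self.n + list(a)   # tree[0] is unused
        for p in range(self.n - 1, 0, -1):
            self.tree[p] = max(self.tree[2 * p], self.tree[2 * p + 1])

    def findmax(self, i, j):
        l, r = i + self.n, j + self.n + 1        # half-open [l, r) over leaves
        best = self.tree[l]
        while l < r:
            if l & 1:
                best = max(best, self.tree[l])
                l += 1
            if r & 1:
                r -= 1
                best = max(best, self.tree[r])
            l //= 2
            r //= 2
        return best
```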
I have a situation (where performance is critical) where I need to check whether a customer account number fits in any one of a set of valid number ranges (about a thousand of them). What would be the most efficient way to do that? How (and where) should I store the ranges, and how do I search through them?
You should consider using the Segment Tree to do this.
A segment tree for a set I of n intervals uses O(n log n) storage and can be built in O(n log n) time. Segment trees support searching for all the intervals that contain a query point in O(log n + k), k being the number of retrieved intervals or segments.[1]
So it's really efficient for this kind of range query.
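If all you need is a yes/no answer (not the list of matching ranges), a lighter alternative to a segment tree may be enough: merge the ranges once, keep them sorted, and binary search on the start points. To be clear, this is a different technique from the segment tree, sketched under the assumption that account numbers are integers and ranges are inclusive (lo, hi) pairs. The names are made up.

```python
from bisect import bisect_right

def merge_ranges(ranges):
    """Merge overlapping or adjacent inclusive integer ranges."""
    merged = []
    for lo, hi in sorted(ranges):
        if merged and lo <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    return merged

class RangeSet:
    def __init__(self, ranges):
        merged = merge_ranges(ranges)            # O(n log n), done once
        self.starts = [lo for lo, _ in merged]
        self.ends = [hi for _, hi in merged]

    def __contains__(self, x):
        # last merged range that starts at or before x, found in O(log n)
        k = bisect_right(self.starts, x) - 1
        return k >= 0 and x <= self.ends[k]
```

With about a thousand ranges that is roughly ten comparisons per lookup, and the two flat lists fit comfortably in memory.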