O(n log n) algorithm problems - sorting

So I was just finishing up reading on time complexity, and I ran into a few questions that I can't solve.
"You’re given an array of n integers, and must answer a series of n
queries, each of the form: “how many elements of the array have value between
L and R?”, where L and R are integers. Design an O(n log n) algorithm that
answers all of these queries."
Thanks.

You can use a segment tree.
Or use a balanced tree such as an AVL tree and store in each node the sizes of its left and right subtrees. For each query, find L and R in O(log n) and count the nodes that fall in the range between them. Construction takes O(n log n), and the n queries also take O(n log n) in total.
Or use a BIT (binary indexed tree) with the same approach as the second idea.
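A simpler route to the same bound, offered here as an alternative to the tree-based ideas above rather than as what the answer describes: sort the array once in O(n log n), then answer each query with two binary searches. The function names and the choice of treating "between L and R" as inclusive on both ends are assumptions made for this sketch.

```python
import bisect

def count_in_range(sorted_values, low, high):
    """Count elements v with low <= v <= high (inclusive bounds are an assumption)."""
    left = bisect.bisect_left(sorted_values, low)     # first index with value >= low
    right = bisect.bisect_right(sorted_values, high)  # first index with value > high
    return max(0, right - left)                       # guards against low > high

def answer_queries(values, queries):
    """Sort once in O(n log n), then spend O(log n) per (low, high) query."""
    sorted_values = sorted(values)
    return [count_in_range(sorted_values, low, high) for low, high in queries]
```

With n queries, the total is O(n log n) for the sort plus O(n log n) for the searches.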

Related

Example of an algorithm whose runtime is O(n^2 log n)?

I have to construct an algorithm whose upper bound is O(n^2 log n). Can anyone provide any examples of what an O(n^2 log n) algorithm would look like? I cannot seem to wrap my mind around it.
My mental image of it would be two nested for loops, with a log n operation performed inside the second loop. Is this correct?
There are many ways to get a runtime of O(n^2 log n) in an algorithm. Here's a sampler.
Sorting a list of n^2 items efficiently. For example, if you take n items, form all n^2 pairs of those items, and then sort them using something like heapsort, the runtime will be O(n^2 log n^2) = O(n^2 log n). This follows from properties of logarithms: log n^2 = 2 log n = O(log n). More generally, running an O(n log n)-time algorithm on an input of size n^2 will give you an O(n^2 log n) runtime.
Running Dijkstra's algorithm on a dense graph using a binary heap. The runtime of Dijkstra's algorithm on a graph with n nodes and m edges, using a binary heap, is O(m log n). A dense graph is one where m = Θ(n^2), so Dijkstra's algorithm would take time O(n^2 log n) in this case. This is also the time bound for running some other graph algorithms on dense graphs, such as Prim's algorithm when using a binary heap.
Certain divide-and-conquer algorithms. A divide-and-conquer algorithm whose recurrence is T(n) = 2T(n / √2) + O(n^2) has a runtime of O(n^2 log n). This comes up, for example, as a subroutine in the Karger-Stein minimum cut algorithm.
Performing n^2 searches on a binary tree of n items. The cost of each search is O(log n), so this would work out to O(n^2 log n) total work. More generally, doing any O(log n)-time operation a total of O(n^2) times will give you this bound.
Naive construction of a suffix array. A suffix array is an array of all the suffixes of a string in sorted order. Naively sorting the suffixes requires O(n log n) comparisons, but since comparing two suffixes can take time O(n), the total cost is O(n^2 log n).
Constructing a 2D range tree. The range tree data structure allows for fast querying of all points in k-D space within an axis-aligned box. In two dimensions, the construction time is O(n^2 log n), though this can be improved to O(n log n) using some more clever techniques.
This is, of course, not a comprehensive list, but it gives a sampler of where O(n^2 log n) runtimes pop up in practice.
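To make one item from the list concrete, here is a minimal sketch of the naive suffix array construction. The function name is made up for illustration. The sort performs O(n log n) comparisons, and each comparison between two suffixes can cost O(n) characters, which is where the O(n^2 log n) worst case comes from.

```python
def naive_suffix_array(text):
    """Return the starting indexes of all suffixes of text, in sorted suffix order.

    Building each slice text[i:] and comparing two suffixes are both O(n),
    and the sort makes O(n log n) comparisons, so the worst case is O(n^2 log n).
    """
    return sorted(range(len(text)), key=lambda i: text[i:])
```

For the string "banana", the suffixes in sorted order are "a", "ana", "anana", "banana", "na", "nana", so the function returns their starting positions in that order.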
Technically, any algorithm that is asymptotically no slower than n^2 log n is O(n^2 log n). Examples include the "do nothing" algorithm Theta(1), binary search Theta(log n), linear search Theta(n), and bubble sort Theta(n^2).
The algorithm you describe would be O(n^2 log n) too while also being Omega(n^2 log n) and thus Theta(n^2 log n):
    import bisect

    for i in range(n):
        for j in range(n):
            bisect.bisect_left(arr, j)  # binary search in a sorted array arr of size n
One approach to constructing an O(n^2 log n) algorithm is to start with an O(n^3) algorithm and optimize it so one of the loops runs in log n steps instead of n.
That could be non-trivial though, so searching Google turns up the question Why is the Big-O of this algorithm N^2*log N? The problem there is:
Fill array a from a[0] to a[n-1]: generate random numbers until you
get one that is not already in the previous indexes.
Even though there are faster algorithms to solve this problem, the one presented is O(n^2 log n).
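A minimal sketch of that fill procedure, under the assumption (not stated in the excerpt above) that the random numbers are drawn from 0 to n-1, so the result is a random permutation. The function name is hypothetical. When i slots are already filled, a draw is new with probability (n - i)/n, so the expected number of draws is n/(n - i); summed over all slots that is about n ln n draws, and each draw costs an O(n) scan of the earlier entries, giving O(n^2 log n) expected time.

```python
import random

def fill_with_distinct_randoms(n, rng=random):
    """Fill a[0..n-1] by redrawing until the value is not among earlier entries.

    Assumes values are drawn uniformly from range(n).
    """
    a = []
    for _ in range(n):
        while True:
            candidate = rng.randrange(n)
            if candidate not in a:   # linear scan of the previous indexes: O(n)
                a.append(candidate)
                break
    return a
```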

Merging search trees and finding elements in them

I've got this problem in my theoretical homework on algorithms and data structures related to search trees:
Given n numbers a1, ..., an, initially each in its own set. There are two types of queries:
unite two sets;
find the smallest element bigger than x in a specific set.
In these queries, a set is specified by the index in {ai} of one of its elements. The task is to process q queries in O(n + q log(n)) time.
I've tried using AVL trees to store sets' elements, but this approach results in O(n log(n)) or O(n) merge time, so the overall time complexity requirement is not satisfied. At the moment I have only these few ideas (but actually they don't quite help):
There are at most n unite queries.
If q > n, eventually, we'll need to build a search tree containing all n elements of {ai} to process the last (q - n) queries of type (2). Thus, it seems to be reasonable to first solve the problem with q ≤ n and then naturally extend the solution to q > n.
To create a set containing (k + 1) elements, at least k merge operations are needed (this is easy to prove by mathematical induction), so at each step of processing queries we need to work with "not-so-big" sets only. This might yield some tight asymptotic estimates.
Probably there is a way to somehow scan several first queries before processing them, understand which sets are involved in type (2) queries, and merge them only, ignoring other unite requests.
There is no memory limit, so this might be abused in some way.
Actually your solution of using self-balancing binary search trees to represent the sets was correct, and your ideas (1) - (3) are essential to achieve a tighter asymptotic bound.
Setting up the sets initially is O(n), and searching (finding the smallest element larger than some x) within each set is O(log n), so q searches have a cost of O(q log n).
Now let's consider the merge operations. To merge two binary search trees of size a and b, insert all elements of the smaller tree into the larger tree. This can be done in O(min(a,b)*log(max(a,b)+1)).
But what is the complexity of q successive merge operations, if we start with singleton sets? We can prove by induction that for q < n, the cost is O(q log n). (As you have noted, there cannot be any more merge operations apart from merging a set with itself, which is a no-op.)
So the cost of q merge operations is the cost of q-1 merges plus the cost of the last merge. By the inductive hypothesis, the cost of q-1 merges is O((q-1)log n).
The cost of the last merge is O(min(a,b)*log(max(a,b)+1)). But a and b are less than q, so for the last merge we get an upper bound of O(q * log(q + 1)). Since q < n, this is a subset of O(q log n). So the total cost of q merge operations is O((q-1) log n + q log n) = O(q log n).
Therefore, the total complexity is bounded by O(n + q log n).
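The "insert the smaller set into the larger one" idea can be sketched as follows. This is only an illustration of the merge direction and of the successor query, not an implementation that meets the stated bound: Python has no built-in balanced search tree, so sorted lists with bisect stand in for the trees, and list insertion costs O(size) rather than O(log size). The class name, method names, and the use of a union-find parent array to locate a set from an element index are all assumptions of this sketch.

```python
import bisect

class MergeableSets:
    def __init__(self, values):
        self.parent = list(range(len(values)))   # union-find parents (sketch assumption)
        self.members = [[v] for v in values]     # sorted list per set representative

    def find(self, i):
        """Return the representative index of the set containing element index i."""
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def unite(self, i, j):
        """Merge the sets containing indexes i and j, smaller into larger."""
        a, b = self.find(i), self.find(j)
        if a == b:
            return                                # merging a set with itself is a no-op
        if len(self.members[a]) < len(self.members[b]):
            a, b = b, a                           # make a the larger set
        for v in self.members[b]:
            bisect.insort(self.members[a], v)     # a real BST insert would be O(log n)
        self.members[b] = []
        self.parent[b] = a

    def successor(self, i, x):
        """Smallest element bigger than x in the set containing index i, or None."""
        s = self.members[self.find(i)]
        k = bisect.bisect_right(s, x)
        return s[k] if k < len(s) else None
```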

Best way to join two red-black trees

The easiest way is to store the two trees in two arrays, merge them, and build a new red-black tree from the sorted array, which takes O(m + n) time.
Is there an algorithm with less time complexity?
You can merge two red-black trees in time O(m log(n/m + 1)) where n and m are the input sizes and, WLOG, m ≤ n. Notice that this bound is tighter than O(m+n). Here's some intuition:
When the two trees are similar in size (m ≈ n), the bound is approximately O(m) = O(n) = O(n + m).
When one tree is significantly larger than the other (m ≪ n), the bound is approximately O(log n).
You can find a brief description of the algorithm here. A more in-depth description which generalizes to other balancing schemes (AVL, BB[α], Treap, ...) can be found in a recent paper.
I think that if you have generic sets (so generic red-black trees), you can't use the solution suggested by @Sam Westrick, because it assumes that all elements in the first set are less than the elements in the second set. Cormen (the best book for learning algorithms and data structures) also specifies this condition for joining two red-black trees.
Because you need to compare each element in both the m-node and n-node red-black trees, you will have to deal with at least O(m+n) time complexity. There is a way to do it with O(1) space complexity, but that is a separate matter that has nothing to do with your question. If you do not iterate over and check each element in each red-black tree, you cannot guarantee that your new red-black tree will be sorted. I can think of another way of merging two red-black trees, called "in-place merge using DLL", but this one also results in O(m+n) time complexity:
Convert the given two Red-Black Trees into Doubly Linked List, which has O(m+n) time complexity.
Merge the two sorted Linked Lists, which has O(m+n) time complexity.
Build a Balanced Red-Black Tree from the merged list created in step 2, which has O(m+n) time complexity.
Time complexity of this method is also O(m+n).
So because you have to compare the elements of each tree with the elements of the other tree, you will end up with at least O(m+n).
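The O(m + n) flatten, merge, and rebuild approach described in this thread can be sketched as below. This is a simplified illustration: it flattens each tree into a Python list instead of a doubly linked list, it builds a plain height-balanced binary search tree, and it leaves out red-black coloring entirely. The Node class and all function names are hypothetical.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def to_sorted_list(node, out):
    """In-order traversal: appends the keys of a BST to out in sorted order."""
    if node is not None:
        to_sorted_list(node.left, out)
        out.append(node.key)
        to_sorted_list(node.right, out)
    return out

def merge_sorted(a, b):
    """Standard two-pointer merge of two sorted lists in O(len(a) + len(b))."""
    merged, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            merged.append(a[i])
            i += 1
        else:
            merged.append(b[j])
            j += 1
    merged.extend(a[i:])
    merged.extend(b[j:])
    return merged

def build_balanced(keys, lo=0, hi=None):
    """Build a height-balanced BST from keys[lo:hi] in linear time."""
    if hi is None:
        hi = len(keys)
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    return Node(keys[mid],
                build_balanced(keys, lo, mid),
                build_balanced(keys, mid + 1, hi))

def merge_trees(t1, t2):
    """Flatten both trees, merge the sorted key lists, rebuild: O(m + n) overall."""
    return build_balanced(merge_sorted(to_sorted_list(t1, []), to_sorted_list(t2, [])))
```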

Find minimum element of a subsequence

Given a sequence S of n integer elements, I need a function min(i,j) that finds the minimum element of the sequence between index i and index j (both inclusive) such that:
Initialization takes O(n);
Memory space O(n);
min(i,j) takes O(log(n)).
Please suggest an algorithm for this.
A segment tree is what you need, because it fulfils all your requirements:
Initialisation takes O(n) with Segment Tree
Memory is also O(n)
Queries can be done in O(log n)
Besides this, the tree is dynamic and supports updates in O(log n). This means one can modify the value at some index i in O(log n) and still retrieve the minimum.
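A minimal sketch of such a segment tree, using the common array-based bottom-up layout (one possible layout among several; the class and method names are made up for illustration). It builds in O(n), uses 2n slots of memory, and answers min(i, j) with both ends inclusive in O(log n).

```python
class MinSegmentTree:
    def __init__(self, values):
        values = list(values)
        self.n = len(values)
        # Leaves live at indexes n..2n-1; internal node k covers children 2k and 2k+1.
        self.tree = [float("inf")] * self.n + values
        for k in range(self.n - 1, 0, -1):          # O(n) bottom-up build
            self.tree[k] = min(self.tree[2 * k], self.tree[2 * k + 1])

    def update(self, i, value):
        """Set element i to value and repair its ancestors: O(log n)."""
        k = i + self.n
        self.tree[k] = value
        k //= 2
        while k >= 1:
            self.tree[k] = min(self.tree[2 * k], self.tree[2 * k + 1])
            k //= 2

    def query(self, i, j):
        """Minimum of elements i..j, both inclusive: O(log n)."""
        lo, hi = i + self.n, j + self.n + 1         # half-open range over the leaves
        result = float("inf")
        while lo < hi:
            if lo & 1:                              # lo is a right child: take it, step right
                result = min(result, self.tree[lo])
                lo += 1
            if hi & 1:                              # hi is a right boundary: step left, take it
                hi -= 1
                result = min(result, self.tree[hi])
            lo //= 2
            hi //= 2
        return result
```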
This TopCoder tutorial section, "An <O(n), O(1)> approach", discusses your problem in more detail. In the notation <f(n), g(n)>, the approach takes f(n) time to set up and g(n) time per query.
Also, this post walks through the algorithm again: Range Minimum Query <O(n), O(1)> approach (from tree to restricted RMQ).
Hope they clarify your question :)
A segment tree is just what you need (it can be built in O(n) time and one query takes O(log n) time).
Here is an article about it: http://wcipeg.com/wiki/Segment_tree.
Even though there is an algorithm that uses O(n) time for initialization and O(1) time per query, segment tree can be a good choice because it is much simpler.

How to calculate order (big O) for more complex algorithms (eg quicksort)

I know there are quite a bunch of questions about big O notation, I have already checked:
Plain english explanation of Big O
Big O, how do you calculate/approximate it?
Big O Notation Homework--Code Fragment Algorithm Analysis?
to name a few.
I know by "intuition" how to calculate it for n, n^2, n! and so on, however I am completely lost on how to calculate it for algorithms that are log n, n log n, n log log n and so on.
What I mean is, I know that Quick Sort is n log n (on average).. but, why? Same thing for merge/comb, etc.
Could anybody explain to me, in a not too math-y way, how you calculate this?
The main reason is that I'm about to have a big interview and I'm pretty sure they'll ask for this kind of stuff. I have researched for a few days now, and everybody seems to have either an explanation of why bubble sort is n^2 or the unreadable explanation (for me) on Wikipedia.
The logarithm is the inverse operation of exponentiation. An example of exponentiation is when you double the number of items at each step. Thus, a logarithmic algorithm often halves the number of items at each step. For example, binary search falls into this category.
Many algorithms require a logarithmic number of big steps, but each big step requires O(n) units of work. Mergesort falls into this category.
Usually you can identify these kinds of problems by visualizing them as a balanced binary tree. For example, here's merge sort:
6 2 0 4 1 3 7 5
2 6 0 4 1 3 5 7
0 2 4 6 1 3 5 7
0 1 2 3 4 5 6 7
At the top is the input, as leaves of the tree. The algorithm creates a new node by sorting the two nodes above it. We know the height of a balanced binary tree is O(log n) so there are O(log n) big steps. However, creating each new row takes O(n) work. O(log n) big steps of O(n) work each means that mergesort is O(n log n) overall.
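The rows in that picture can be reproduced with a bottom-up merge sort that records the array after each "big step". This is a sketch with hypothetical names. Each pass merges neighbouring runs of the current width, touching all n elements once, and the width doubles each pass, so there are about log2(n) passes.

```python
import heapq

def merge_sort_rows(values):
    """Return the list of rows: the input, then the array after each merge pass."""
    current = list(values)
    rows = [current]
    n = len(current)
    width = 1
    while width < n:
        merged = []
        for start in range(0, n, 2 * width):
            left = current[start:start + width]
            right = current[start + width:start + 2 * width]
            merged.extend(heapq.merge(left, right))  # linear merge of two sorted runs
        current = merged
        rows.append(current)
        width *= 2                                    # doubling gives O(log n) passes
    return rows
```

For the 8-element input shown above this yields 4 rows: the input plus log2(8) = 3 merge passes.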
Generally, O(log n) algorithms look like the function below. They get to discard half of the data at each step.
    def function(data, n):
        if n <= constant:
            return do_simple_case(data, n)
        if some_condition():
            return function(data[:n // 2], n // 2)  # Recurse on first half of data
        else:
            return function(data[n // 2:], n - n // 2)  # Recurse on second half of data
While O(n log n) algorithms look like the function below. They also split the data in half, but they need to consider both halves.
    def function(data, n):
        if n <= constant:
            return do_simple_case(data, n)
        part1 = function(data[:n // 2], n // 2)  # Recurse on first half of data
        part2 = function(data[n // 2:], n - n // 2)  # Recurse on second half of data
        return combine(part1, part2)
Where do_simple_case() takes O(1) time and combine() takes no more than O(n) time.
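As concrete instances of the two templates, here is a sketch of binary search (recurse on one half) and merge sort (recurse on both halves, then combine in linear time). The names are illustrative. Binary search passes index bounds instead of slicing, because slicing a Python list copies it and would itself cost O(n) per step.

```python
def binary_search(data, target, lo=0, hi=None):
    """O(log n): return an index of target in sorted data[lo:hi], or -1 if absent."""
    if hi is None:
        hi = len(data)
    if lo >= hi:                       # simple case: empty range
        return -1
    mid = (lo + hi) // 2
    if data[mid] == target:
        return mid
    if target < data[mid]:
        return binary_search(data, target, lo, mid)      # discard the second half
    return binary_search(data, target, mid + 1, hi)      # discard the first half

def combine(a, b):
    """Merge two sorted lists in O(len(a) + len(b))."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    return out + a[i:] + b[j:]

def merge_sort(data):
    """O(n log n): both halves are processed, then combined in linear time."""
    if len(data) <= 1:                 # simple case
        return list(data)
    mid = len(data) // 2
    part1 = merge_sort(data[:mid])     # first half
    part2 = merge_sort(data[mid:])     # second half
    return combine(part1, part2)
```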
The algorithms don't need to split the data exactly in half. They could split it into one-third and two-thirds, and that would be fine. For average-case performance, splitting it in half on average is sufficient (like QuickSort). As long as the recursion is done on pieces of (n/something) and (n - n/something), it's okay. If instead it breaks the data into pieces of size k and n-k for a constant k, then the height of the tree will be O(n) and not O(log n).
You can usually claim log n for algorithms that halve the space/time each time they run. A good example of this is any binary algorithm (e.g., binary search). You pick either left or right, which then cuts the space you're searching in half. The pattern of repeatedly halving is log n.
For some algorithms, getting a tight bound for the running time through intuition is close to impossible (I don't think I'll ever be able to intuit a O(n log log n) running time, for instance, and I doubt anyone will ever expect you to). If you can get your hands on the CLRS Introduction to Algorithms text, you'll find a pretty thorough treatment of asymptotic notation which is appropriately rigorous without being completely opaque.
If the algorithm is recursive, one simple way to derive a bound is to write out a recurrence and then set out to solve it, either iteratively or using the Master Theorem or some other way. For instance, if you're not looking to be super rigorous about it, the easiest way to get QuickSort's running time is through the Master Theorem -- QuickSort entails partitioning the array into two relatively equal subarrays (it should be fairly intuitive to see that this is O(n)), and then calling QuickSort recursively on those two subarrays. Then if we let T(n) denote the running time, we have T(n) = 2T(n/2) + O(n), which by the Master Method is O(n log n).
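For readers who want to see where the n log n comes from without invoking the Master Theorem, the same recurrence can be unrolled by hand. This is a sketch that assumes the O(n) term is at most cn for some constant c and that n is a power of 2.

```latex
\begin{align*}
T(n) &= 2\,T(n/2) + cn \\
     &= 4\,T(n/4) + 2cn \\
     &= 8\,T(n/8) + 3cn \\
     &\;\;\vdots \\
     &= 2^k\,T(n/2^k) + k\,cn .
\end{align*}
% The expansion stops when n/2^k = 1, i.e. k = \log_2 n:
\[
T(n) = n\,T(1) + cn\log_2 n = O(n \log n).
\]
```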
Check out the "phone book" example given here: What is a plain English explanation of "Big O" notation?
Remember that Big-O is all about scale: how many more operations will this algorithm require as the data set grows?
O(log n) generally means you can cut the dataset in half with each iteration (e.g. binary search)
O(n log n) means you're performing an O(log n) operation for each item in your dataset
O(n log log n) is a legitimate bound, but it is much rarer than O(n log n) and does not simplify to it, since log log n grows strictly more slowly than log n.
I'll attempt to do an intuitive analysis of why Mergesort is n log n and if you can give me an example of an n log log n algorithm, I can work through it as well.
Mergesort is a sorting algorithm that works by splitting a list of elements repeatedly until only single elements remain and then merging these lists back together. The primary operation in each of these merges is comparison, and each merge requires at most n comparisons, where n is the combined length of the two lists. From this you can derive the recurrence and easily solve it, but we'll avoid that method.
Instead, consider how Mergesort is going to behave: we take a list and split it, then take those halves and split them again, until we have n partitions of length 1. I hope it's easy to see that this recursion will only go log(n) deep before we have split the list into our n partitions.
Now each of these n partitions needs to be merged, then once those are merged the next level needs to be merged, and so on until we have a list of length n again. Refer to Wikipedia's graphic for a simple example of this process: http://en.wikipedia.org/wiki/File:Merge_sort_algorithm_diagram.svg.
Now consider the amount of time this process takes: we have log(n) levels, and at each level we have to merge all of the lists. As it turns out, each level takes n time to merge, because we merge a total of n elements each time. So sorting an array with mergesort takes n log(n) time if you take comparison to be the dominant operation.
If anything is unclear or I skipped somewhere please let me know and I can try to be more verbose.
Edit Second Explanation:
Let me think if I can explain this better.
The problem is broken into a bunch of smaller lists and then the smaller lists are sorted and merged until you return to the original list which is now sorted.
When you break up the problem you get several levels of sizes. First you'll have two lists of size n/2, n/2. At the next level you'll have four lists of size n/4, n/4, n/4, n/4. At the level after that you'll have eight lists of size n/8, and so on. This continues until n/2^k is equal to 1 (each subdivision is the length divided by a power of 2; not all lengths will be divisible by four, so it won't be quite this pretty). This is repeated division by two and can continue at most log_2(n) times, because 2^(log_2(n)) = n, so any further division by 2 would yield a list of size less than one.
Now the important thing to note is that at every level we have n elements so for each level the merge will take n time, because merge is a linear operation. If there are log(n) levels of the recursion then we will perform this linear operation log(n) times, therefore our running time will be n log(n).
Sorry if that isn't helpful either.
When applying a divide-and-conquer algorithm where you partition the problem into sub-problems until it is so simple that it is trivial, if the partitioning goes well, the size of each sub-problem is n/2 or thereabout. This is often the origin of the log(n) that crops up in big-O complexity: O(log(n)) is the number of recursive calls needed when partitioning goes well.

Resources