I have an algorithm that runs in O(m) time. This algorithm takes in a set of data containing m entries.
The data is generated randomly by specifying a strictly positive integer input n. The number of entries generated is O(n log n).
Edit
Alone, the time complexity of generating the data is independent of n (i.e. O(1)), which means that given the integer n, the entries are instantly and randomly generated. The number of resulting entries is random, but is O(n log n). E.g. if n = 10, then the number of entries generated is some constant times 10 log 10.
The data is generated beforehand. The resulting m entries are then fed into the algorithm as input.
Question
Can I then assume that the algorithm runs in O(n log n) time?
There are some ambiguities in your question that were either deliberately placed to help you internalize the relationship between input size and run-time complexity, or simply caused by miscommunication.
As best I can interpret this scenario:
Your algorithm complexity O(m) is linear with respect to m.
Since we assume that generating the data is independent of the input (i.e. O(1)), your time complexity depends only on the n you specify to generate the entries.
So yes, you can say that the algorithm runs in O(n log n) time, since its work is linear in m and m is O(n log n).
In response to your updated question:
It's still hard to follow because some key words refer to different things. But in general I think this is what you are getting at:
You have a data set as input, of size O(n log n), given some specific n.
This data set is used as input only: it's either pre-generated, or generated by some black box that runs in O(1) time regardless of what n is given to the black box. (We aren't interested in the black box for this question.)
This data set is then fed to the algorithm that we are actually interested in analyzing.
The algorithm has time-complexity O(m), for an input of size m.
Since your input has size O(n log n) with respect to n, then by extension your O(m) linear-time algorithm has time complexity O(n log n), with respect to n.
To see the difference: Suppose your algorithm wasn't linear but rather quadratic O(m^2), then it would have time-complexity O(n^2 log^2 n) with respect to n.
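A minimal sketch of this argument, assuming a hypothetical generator `generate_entries(n)` that returns exactly floor(3 · n · log2 n) entries (the constant 3 is arbitrary; the question only says the count is O(n log n)). The step counters show that a linear algorithm does m steps and a quadratic one m² steps, so as functions of n they grow like n log n and n² log² n respectively.

```python
import math
import random

def generate_entries(n):
    """Hypothetical generator: returns floor(3 * n * log2(n)) random entries (at least 1)."""
    m = max(1, int(3 * n * math.log2(n)))
    return [random.random() for _ in range(m)]

def linear_algorithm(entries):
    """An O(m) pass: one step per entry."""
    steps = 0
    for _ in entries:
        steps += 1
    return steps

def quadratic_algorithm(entries):
    """An O(m^2) pass: one step per ordered pair of entries."""
    steps = 0
    for _ in entries:
        for _ in entries:
            steps += 1
    return steps

for n in (4, 16, 32):
    entries = generate_entries(n)
    m = len(entries)
    # m grows like n log n; the linear count equals m, the quadratic count equals m^2
    print(n, m, linear_algorithm(entries), quadratic_algorithm(entries) == m * m)
```

Only the size of the input matters here, not how it was produced, which is why the generator's O(1) cost drops out of the analysis.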
Related
I have to construct an algorithm whose upper bound is O(n^2 log n). Can anyone provide any examples of what an O(n^2 log n) algorithm would look like? I cannot seem to wrap my mind around it.
My mental image of it would be two nested for loops and within the second loop a log n operation is performed. Is this correct?
There are many ways to get a runtime of O(n2 log n) in an algorithm. Here's a sampler.
Sorting a list of n^2 items efficiently. For example, if you take n items, form all n^2 pairs of those items, and then sort them using something like heapsort, the runtime will be O(n^2 log(n^2)) = O(n^2 log n). This follows from properties of logarithms: log(n^2) = 2 log n = O(log n). More generally, running an O(n log n)-time algorithm on an input of size n^2 will give you an O(n^2 log n) runtime.
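The pair-sorting example can be sketched with Python's `heapq` module, which is a binary heap; `heapify` followed by n^2 pops is a heapsort. The function name `sort_all_pairs` is only illustrative.

```python
import heapq
from itertools import product

def sort_all_pairs(items):
    """Form all n^2 ordered pairs of items and heapsort them.

    heapify is O(n^2); each of the n^2 pops costs O(log(n^2)) = O(log n),
    so the total is O(n^2 log n).
    """
    heap = list(product(items, repeat=2))  # all n^2 ordered pairs
    heapq.heapify(heap)                     # O(n^2)
    return [heapq.heappop(heap) for _ in range(len(heap))]
```

For example, `sort_all_pairs([2, 1])` returns `[(1, 1), (1, 2), (2, 1), (2, 2)]`.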
Running Dijkstra's algorithm on a dense graph using a binary heap. The runtime of Dijkstra's algorithm on a graph with n nodes and m edges, using a binary heap, is O(m log n). A dense graph is one where m = Θ(n^2), so Dijkstra's algorithm would take time O(n^2 log n) in this case. This is also the time bound for running some other graph algorithms on dense graphs, such as Prim's algorithm when using a binary heap.
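A minimal sketch of binary-heap Dijkstra, assuming an adjacency-list representation where `adj[u]` is a list of `(v, weight)` pairs with non-negative weights. Python's `heapq` has no decrease-key operation, so this version pushes a new entry on every improvement and skips stale ones (lazy deletion). There are at most m + 1 pushes, so the cost is O(m log m) = O(m log n) because m ≤ n^2, which is O(n^2 log n) on a dense graph.

```python
import heapq

def dijkstra(adj, source):
    """Shortest distances from source using a binary heap with lazy deletion."""
    dist = {source: 0}
    heap = [(0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale entry left behind by an earlier improvement
        done.add(u)
        for v, w in adj[u]:
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# A dense (complete) graph on 4 nodes with weight |u - v| on every edge
adj = {u: [(v, abs(u - v)) for v in range(4) if v != u] for u in range(4)}
print(dijkstra(adj, 0))
```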
Certain divide-and-conquer algorithms. A divide-and-conquer algorithm whose recurrence is T(n) = 2T(n / √2) + O(n^2) has a runtime of O(n^2 log n). This comes up, for example, as a subroutine in the Karger-Stein minimum cut algorithm.
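One way to check the recurrence's bound is the master theorem (or, equivalently, a recursion tree): the non-recursive work matches n^{log_b a}, so every level of the recursion costs the same and there are Θ(log n) levels.

```latex
T(n) = 2\,T\!\left(\frac{n}{\sqrt{2}}\right) + O(n^2),\qquad
a = 2,\quad b = \sqrt{2},\quad \log_b a = \log_{\sqrt{2}} 2 = 2.

\text{Level } i:\; 2^i \text{ subproblems of size } \frac{n}{(\sqrt{2})^i}
\;\Rightarrow\; 2^i \cdot O\!\left(\frac{n^2}{2^i}\right) = O(n^2) \text{ per level},

\text{depth} = \log_{\sqrt{2}} n = 2\log_2 n
\;\Rightarrow\; T(n) = O(n^2) \cdot 2\log_2 n = O(n^2 \log n).
```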
Performing n^2 searches on a balanced binary search tree of n items. The cost of each search is O(log n), so this would work out to O(n^2 log n) total work. More generally, doing any O(log n)-time operation a total of O(n^2) times will give you this bound.
Naive construction of a suffix array. A suffix array is an array of all the suffixes of a string in sorted order. Naively sorting the suffixes requires O(n log n) comparisons, but since comparing two suffixes can take time O(n), the total cost is O(n^2 log n).
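The naive construction can be sketched in one line of Python: sort the starting indices using each suffix as the key. This is only the naive approach the paragraph describes, not an efficient suffix-array algorithm.

```python
def naive_suffix_array(s):
    """Return the starting indices of the suffixes of s in sorted order.

    sorted() makes O(n log n) comparisons, and comparing two suffix strings can
    take O(n) character comparisons, for O(n^2 log n) total. Building the keys
    s[i:] also takes O(n^2) time and memory.
    """
    return sorted(range(len(s)), key=lambda i: s[i:])
```

For example, `naive_suffix_array("banana")` returns `[5, 3, 1, 0, 4, 2]` (the suffixes a, ana, anana, banana, na, nana).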
Constructing a 2D range tree. The range tree data structure allows for fast querying of all points in k-D space within an axis-aligned box. In two dimensions, the construction time is O(n^2 log n), though this can be improved to O(n log n) using some more clever techniques.
This is, of course, not a comprehensive list, but it gives a sampler of where O(n^2 log n) runtimes pop up in practice.
Technically, any algorithm whose running time grows asymptotically no faster than n^2 log n is O(n^2 log n). Examples include the "do nothing" algorithm Theta(1), binary search Theta(log n), linear search Theta(n), and bubble sort Theta(n^2).
The algorithm you describe would be O(n^2 log n) too while also being Omega(n^2 log n) and thus Theta(n^2 log n):
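```python
import bisect
n = 1000  # example size
arr = list(range(n))  # a sorted array of size n
for i in range(n):
    for j in range(n):
        bisect.bisect_left(arr, j)  # binary search in array of size n: O(log n)
```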
One approach to constructing an O(n^2 log n) algorithm is to start with an O(n^3) algorithm and optimize it so that one of the loops runs in log n steps instead of n.
That could be non-trivial, though. Searching Google turns up the question "Why is the Big-O of this algorithm N^2*log N?" The problem there is:
Fill array a from a[0] to a[n-1]: generate random numbers until you get one that is not already in the previous indexes.
Even though there are faster algorithms to solve this problem, the one presented is O(n^2 log n).
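A minimal sketch of that fill procedure, assuming (as an illustration, since the quoted problem does not state the range) that the random numbers are drawn uniformly from 1..n, so the result is a random permutation. For slot i, each draw is checked against the i earlier entries by a linear scan, and the expected number of draws is n / (n - i). Summing gives about n · H_n = O(n log n) expected draws, each with an O(n) check, hence O(n^2 log n) expected time.

```python
import random

def random_fill(n, rng=random):
    """Fill a[0..n-1] with a random permutation of 1..n by rejection sampling."""
    a = [0] * n
    for i in range(n):
        while True:
            candidate = rng.randint(1, n)  # assumed range 1..n
            if candidate not in a[:i]:     # linear scan of the previous indexes: O(i)
                a[i] = candidate
                break
    return a
```

Keeping a boolean "already used" array instead of scanning would make each check O(1), which is one of the faster approaches the answer alludes to.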
Stumbled across a (terrible) algorithm for computing the square root of a number. Got into a small argument about the time complexity. I assert that the time complexity is O(n^2) because for n input, it will be multiplied n times. My friend asserts that the time complexity is actually O(n). Who is right and why?
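```python
def squareRoot(x):
    """Return the integer square root of x if x is a perfect square, else None."""
    if x < 0:
        return "undefined"
    elif x == 0:
        return 0
    # Scan 0..x inclusive (the original range(0, x) missed x == 1)
    for i in range(0, x + 1):
        if i * i == x:
            return i
    return None  # x is not a perfect square
```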
It's O(n) because, in the worst case, you perform x multiplications and tests, so your computation time grows linearly with your input.
First we need to know what the complexity of integer multiplication is.
For simplicity, we'll use the Schönhage–Strassen algorithm. Its complexity is O(n log n log log n) for numbers with n digits.
Of course, the number n actually has O(log n) digits, so multiplying numbers of value n takes O(log n log log n log log log n) time.
There are n numbers, each of value up to n, to multiply, so overall it is O(n log n log log n log log log n).
Thus O(n^2) was more correct, in the same way that O(n!) was correct. Correct but unhelpful. And O(n) is simply wrong, without several assumptions that were not made.
Do note that you can do better with better multiplication algorithms.
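Written out, the count in this answer looks like the following, using n for the input value as the answer does and M(d) for the cost of multiplying two d-digit numbers with Schönhage–Strassen. The O(log n) term is the equality test i*i == n, which is dominated by the multiplication.

```latex
M(d) = O(d \log d \log\log d), \qquad d = O(\log n) \text{ digits per operand}

\text{cost}(n) \;=\; \sum_{i < n} \Big[ M\big(O(\log n)\big) + O(\log n) \Big]
\;=\; O\big(n \cdot \log n \cdot \log\log n \cdot \log\log\log n\big).
```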
Let us first see how this algorithm works.
In the worst case, that is, when x is not a perfect square (for example, when x is prime), the time complexity will be O(n).
In the best case, that is, when x is a perfect square, it will be on the order of sqrt(n).
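To see both cases concretely, here is a hypothetical instrumented variant, `square_root_with_count`, that performs the same linear scan but also returns how many loop iterations ran. It scans 0..x inclusive so that x = 1 is handled. A perfect square k^2 stops after k + 1 iterations (about sqrt(x)), while a non-square runs all x + 1.

```python
def square_root_with_count(x):
    """Linear-scan integer square root that also reports the loop iteration count."""
    if x < 0:
        return "undefined", 0
    if x == 0:
        return 0, 0
    iterations = 0
    for i in range(0, x + 1):
        iterations += 1
        if i * i == x:
            return i, iterations  # perfect square: stops at i = sqrt(x)
    return None, iterations       # not a perfect square: scanned every i
```

For example, `square_root_with_count(49)` returns `(7, 8)`, while `square_root_with_count(50)` returns `(None, 51)`.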
Are there any famous algorithms with this complexity?
I was thinking maybe of a skip list where the levels of the nodes are not determined by the number of tails in coin tosses, but instead by a number drawn uniformly at random from the range [1, log n]. Such a data structure would have a find(x) operation with complexity O(n / log n) (I think, at least). I was curious whether there was anything else.
It's common to see algorithms whose runtime is of the form O(n^k / log n) or O(log n / log log n) when using the method of Four Russians to speed up an existing algorithm. The classic Four Russians speedup reduces the cost of doing a matrix/vector product on Boolean matrices from O(n^2) to O(n^2 / log n). The standard dynamic programming algorithm for sequence alignment on two length-n strings runs in time O(n^2), which can be decreased to O(n^2 / log n) by using a similar trick.
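A minimal sketch of the Four Russians idea for a Boolean matrix/vector product, under some stated assumptions: columns of A are stored as Python ints used as row bitmasks, the vector x is also a bitmask, and n ≥ 1. The columns are split into blocks of b ≈ (log2 n)/2, and for each block the OR of every subset of its columns is precomputed once. Measured in OR operations on whole n-bit columns, a naive product uses up to n of them, while the lookup version uses n/b = O(n / log n), giving the O(n^2 / log n) bit-operation bound per product once A has been preprocessed (the usual setting where the same A is applied to many vectors).

```python
import math

def four_russians_preprocess(columns, n):
    """columns[j] is an int bitmask of the rows i with A[i][j] == 1.

    Returns (b, tables) where tables[k][s] is the OR of the columns in block k
    selected by the bits of s.
    """
    b = max(1, int(math.log2(n)) // 2)  # block size ~ (log2 n) / 2
    tables = []
    for start in range(0, n, b):
        block = columns[start:start + b]
        table = [0] * (1 << len(block))
        for s in range(1, len(table)):
            low = s & -s                  # lowest set bit of s
            j = low.bit_length() - 1      # index of that column within the block
            table[s] = table[s ^ low] | block[j]
        tables.append(table)
    return b, tables

def four_russians_multiply(b, tables, x_bits):
    """Boolean product y = A x (OR of ANDs), with x and y as int bitmasks."""
    y = 0
    for k, table in enumerate(tables):
        chunk = (x_bits >> (k * b)) & (len(table) - 1)  # the bits of x for block k
        y |= table[chunk]                              # one lookup replaces up to b column ORs
    return y
```

For example, with columns `[0b001, 0b010, 0b001]` (n = 3) and x = `0b110`, `four_russians_multiply(*four_russians_preprocess([0b001, 0b010, 0b001], 3), 0b110)` returns `3`, meaning rows 0 and 1 of y are true.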
Similarly, the prefix parity problem, in which you need to maintain a sequence of Boolean values while supporting "flip" and "parity of a prefix of the sequence" operations, can be solved in time O(log n / log log n) by using a Four-Russians speedup. (Notice that if you express the runtime as a function of k = log n, this is O(k / log k).)
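To make the two operations concrete, here is the ordinary O(log n) baseline for prefix parity, a Fenwick (binary indexed) tree over XOR. This is a swapped-in illustration of the problem, not the Four-Russians O(log n / log log n) structure; positions are 0-based and all bits start at 0.

```python
class PrefixParity:
    """Baseline O(log n) prefix parity using a Fenwick tree over XOR."""

    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)  # 1-based Fenwick array

    def flip(self, i):
        """Toggle bit i in O(log n)."""
        j = i + 1
        while j <= self.n:
            self.tree[j] ^= 1
            j += j & -j

    def parity(self, i):
        """Return the XOR of bits 0..i (inclusive) in O(log n)."""
        j = i + 1
        p = 0
        while j > 0:
            p ^= self.tree[j]
            j -= j & -j
        return p
```

For example, after `pp = PrefixParity(10); pp.flip(2); pp.flip(5)`, `pp.parity(4)` returns `1` and `pp.parity(5)` returns `0`.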
I'm relatively new to the practice of determining algorithm runtimes using big-O notation, and I have a question regarding the runtime of a sorting algorithm. Let's say I have a set of pairs (a, b) in an array and I sort the data using a known sorting algorithm that runs in O(n log n). Next, I take a subset of some number of the n data points and run the same sorting algorithm on that subset (so theoretically I could sort the entire array twice: the first sort would compare the a's and the second sort would compare the b's). In other words, my code is:
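```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// pairArray holds n pairs (a, b); subset <= n
std::vector<std::pair<int, int>> sortTwice(std::vector<std::pair<int, int>>& pairArray, std::size_t subset) {
    std::sort(pairArray.begin(), pairArray.end());  // compares the a's first: O(n log n)
    // copying the first `subset` pairs is O(subset)
    std::vector<std::pair<int, int>> subsetArray(pairArray.begin(), pairArray.begin() + subset);
    std::sort(subsetArray.begin(), subsetArray.end(),  // compares the b's: O(n log n) at most
              [](const std::pair<int, int>& l, const std::pair<int, int>& r) { return l.second < r.second; });
    return subsetArray;
}
```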
Is the runtime of this code still O(n log n)? I guess I have two questions: does running an O(something) sort twice increase complexity from the original "something", and does the iteration to reassign to a different array increase complexity? I'm more worried about the first one as the iteration can be eliminated with pointers.
Constant factors are ignored in big-O notation. Sorting twice is still O(n log n).
The loop with the assignment you are doing is an O(n) operation. This is also ignored. Only the largest term is mentioned in big-O notation.
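Spelled out, assuming the sort costs at most c · k · log2 k steps on k items for some constant c, letting s = subset with 1 ≤ s ≤ n, and ignoring constant overheads, the three phases add up to a constant multiple of n log n:

```latex
T(n) \;\le\; \underbrace{c\,n\log_2 n}_{\text{first sort}}
\;+\; \underbrace{s}_{\text{copy loop}}
\;+\; \underbrace{c\,s\log_2 s}_{\text{second sort}}
\;\le\; 2c\,n\log_2 n + n
\;\le\; (2c+1)\,n\log_2 n \quad (n \ge 2),

\text{so } T(n) = O(n \log n).
```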
If you want to decide which of two algorithms is better but their big-O is the same then you can use performance measurements on realistic data. When measuring actual performance you can see if one algorithm is typically twice as slow as another. This cannot be seen from the big-O notation.
Binary search has an average-case performance of O(log n), and Quicksort has O(n log n). Is O(n log n) the same as O(n) + O(log n)?
Imagine a database with every person in the world. That's 6.7 billion entries. O(log n) is a lookup on an indexed column (e.g. a primary key). O(n log n) is returning the entire population in sorted order on an unindexed column.
O(log n) was finished before you finished reading the first word of that sentence.
O(n log n) is still calculating...
Another way to imagine it:
log n is proportional to the number of digits in n.
n log n is n times greater.
Try writing the number 1000 once versus writing it one thousand times. The first takes O(log n) time, the second takes O(n log n) time.
Now try that again with 6700000000. Writing it once is still trivial. Now try writing it 6.7 billion times. Even if you could write it once per second you'd be dead before you finished.
You could visualize it in a plot, see here for example:
No, O(n log n) = O(n) * O(log n)
In mathematics, when an expression has no operator between two terms (e.g. E = mc^2), you multiply.
Normally the way to visualize O(n log n) is "do something which takes log n computations n times."
If you had an algorithm which first iterated over a list and then did a binary search of that list (which would be n + log n), you could express that simply as O(n), because n dwarfs log n for large values of n.
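A small step-counting sketch contrasts the two shapes described in this answer: "do a log n operation n times" versus "one linear pass, then one log n operation". The function names are illustrative; `binary_search_steps` counts loop iterations of a standard binary search on a sorted list.

```python
def binary_search_steps(arr, target):
    """Binary search on a sorted list; returns the number of loop iterations used."""
    lo, hi, steps = 0, len(arr) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if arr[mid] == target:
            break
        if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps

def n_searches(arr):
    """A binary search for every element: n * O(log n) = O(n log n) steps."""
    return sum(binary_search_steps(arr, x) for x in arr)

def one_pass_then_one_search(arr):
    """One pass (n steps), then one binary search (O(log n)): O(n + log n) = O(n)."""
    steps = 0
    for _ in arr:
        steps += 1
    return steps + binary_search_steps(arr, arr[-1])

arr = list(range(1024))
print(n_searches(arr), one_pass_then_one_search(arr))
```

For 1024 elements, each search takes at most 11 iterations, so the first count is bounded by 1024 · 11 while the second stays at most 1024 + 11.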
A log n plot increases, but is concave downward, which means:
- It increases as n gets larger.
- Its rate of increase decreases as n gets larger.
An n log n plot increases, and is (slightly) concave upward, which means:
- It increases as n gets larger.
- Its rate of increase (slightly) increases as n gets larger.
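The concavity claims can be checked numerically with second differences, f(n+1) - 2f(n) + f(n-1): they are negative for a concave-downward curve and positive for a concave-upward one. This is a quick sketch over n = 2..999 using base-2 logarithms.

```python
import math

def second_differences(f, start, stop):
    """f(n+1) - 2*f(n) + f(n-1) for n in [start, stop)."""
    return [f(n + 1) - 2 * f(n) + f(n - 1) for n in range(start, stop)]

log_curve = second_differences(math.log2, 2, 1000)
nlogn_curve = second_differences(lambda n: n * math.log2(n), 2, 1000)
# log n: every second difference is negative (concave downward)
# n log n: every second difference is positive (concave upward)
print(max(log_curve) < 0, min(nlogn_curve) > 0)
```

The n log n differences shrink roughly like 1/n, which is why the upward bend is only "slight".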
Depends on whether you tend to visualize n as having a concrete value.
If you tend to visualize n as having a concrete value, and the units of f(n) are time or instructions, then O(log n) is n times faster than O(n log n) for a given task of size n. For memory or space units, then O(log n) is n times smaller for a given task of size n. In this case, you are focusing on the codomain of f(n) for some known n. You are visualizing answers to questions about how long something will take or how much memory will this operation consume.
If you tend to visualize n as a parameter having any value, then O(log n) is n times more scalable. O(log n) can complete n times as many tasks of size n. In this case, you are focused on the domain of f(n). You are visualizing answers to questions about how big n can get, or how many instances of f(n) you can run in parallel.
Neither perspective is better than the other. The former can be used to compare approaches to solving a specific problem. The latter can be used to compare the practical limitations of the given approaches.