Complex algorithmic analysis - algorithm

So, in my data structures class we recently learned about algorithmic analysis and Big-O analysis. We really only applied it to sorting algorithms so far, which are relatively simple to analyze. I was curious about how to analyze more complex algorithms.
For example, I wrote this python algorithm for a program I'm working on to read all bytes in from a file and divide them into blocks using 4 byte tags that separate the data. Each tag starts with an 'h' and I have a separate list of the possible tags that I use when determining if a 4 byte sequence is a tag. The algorithm is defined below
data = file.read()
blocks = []
tagIndexes = []
i = data.index(b'h')
try:
while 1:
if data[i:i+4] in tags:
tagIndexes += [i]
i = data.index(b'h', i+1)
except ValueError:
pass
for j in range(len(tagIndexes) - 1):
index = tagIndexes[j]
nextIndex = tagIndexes[j+1]
blocks += [block(data[index:index+4], data[index+4:nextIndex])]
lastIndex = tagIndexes[len(tagIndexes) - 1]
blocks += [block(data[lastIndex:lastIndex+4], data[lastIndex+4:])]
return blocks
I'm not asking about comments on how I might improve the algorithm. I'll be able to do that myself if necessary later. My question is how might I determine the worst case scenario or the Big-O notation for this algorithm. There are several sub-algorithms within it, and it's quite easy to see the worst case for most of the smaller ones. For example, the worst case of python's list.index(val) method would be if there were none of the specified values in the list, in which case it would just loop through the whole thing and raise an error O(n). However, the worst case of the loop around that method would be if every byte was an 'h' O(n). But in that case each call to data.index() would be very fast and immediately return a value O(1). And then the worst case for the second loop would be if every 4 bytes was a tag O(n/4).
How do I analyze this for the worst possible case encompassing the whole algorithm, not just the parts?

The two most important tips for this analysis are:
Remember that only the most dominant summand matters and that constant factors can be ignored.
Analyze the loops from the inside out.
So the steps are:
The first 4 lines are all in O(n).
The inside of the while loop is in O(1+k) = O(k):
in tags is in O(t) with t being the number of known tags. As that number is not related to n, this is the same as O(1).
tagIndexes += [i] is in O(1) [source]
data.index() is in O(k), with k being the average distance of tags in the input data
The number of loop iterations is n/k. Now you mulitply the number of iterations with the cost of one iteration and you have O(n) for the first loop.
The inside of the for loop is in O(k) (assuming block(a,b) is O(len(a)+len(b))):
The first two index accesses are O(1)
data[index+4:nextIndex] is O(k-4) = O(k). block(...) is also O(k). Thats actually 2 k, but as we can ignore constant factors, this whole line is O(k).
The loop once again runs n/k times, so it’s also in O(n).
So the total time of the algorithm is O(n) as once again constant factors and all smaller summands are are ignored.
Hope that helps – leave a comment if you have any questions.
Apart from that, here are two small style-related pointers for your code:
It’s while True, not while 1.
You access the last element of a list via list[-1], not list[len(list)-1].

What you already realized is that while finding worst case scenarios for each part of the algorithm is relatively easy, they sometimes are somewhat incompatible, i.e. they can't occur together. But worst case analysis doesn't care about that; its purpose is to give you an upper bound on the running time/space usage, so it is alright if the bound you end up with is so bad that it will never actually happen. We always are interested in as tight a bound as possible, but sometimes there is no really tight bound that can be proven by a straightforward approach; in those cases, the analysis tends to end up worse than the algorithm actually performs, and that's why worst case analysis is often times called a "pessimistic" approach.
To deal with this problem of worst case bounds not being as tight as we would like them to be, there are a couple approaches you can try: there's amortized analysis, which deals with the cost of sequences of operations, which sometimes gives tighter bounds than just looking at a single operation; there's probabilistic analysis, which uses probabilities to see whether the frequency with which the expensive stuff needs to be done can help to bring down the expected running time/space usage; and there's average case analysis, which is basically probabilistic analysis where you assume that all possible inputs have the same probability. Those three methods have varying difficulty depending on the algorithm at hand and tend to be used for more advanced algorithms and data structures, because those often times don't lend themselves to the straightforward worst case approach.
So to sum up: worst case analysis isn't always going to give you tight bounds, that's why it's sometimes called pessimistic. There are other analysis techniques you can use to try and get tighter bounds, but even those won't always help. Given that you just started out with algorithms and their analysis, I very much doubt you have heard of those techniques before, let alone applied them, so going with your worst case analysis is perfectly fine, don't worry about the result not being a tight bound.

Related

Time Complexity (Big O) - Can value of N decides whether the time complexity is O(1) or O(N) when we have 2 nested FOR loops?

Suppose that I have 2 nested for loops, and 1 array of size N as shown in my code below:
int result = 0;
for( int i = 0; i < N ; i++)
{
for( int j = i; j < N ; j++)
{
result = array[i] + array[j]; // just some funny operation
}
}
Here are 2 cases:
(1) if the constraint is that N >= 1,000,000 strictly, then we can definitely say that the time complexity is O(N^2). This is true for sure as we all know.
(2) Now, if the constraint is that N < 25 strictly, then people could probably say that because we know that definitely, N is always too small, the time complexity is estimated to be O(1) since it takes very little time to run and complete these 2 for loops WITH MODERN COMPUTERS ? Does that sound right ?
Please tell me if the value of N plays a role in deciding the outcome of the time complexity O(N) ? If yes, then how big the value N needs to be in order to play that role (1,000 ? 5,000 ? 20,000 ? 500,000 ?) In other words, what is the general rule of thumb here ?
INTERESTING THEORETICAL QUESTION: If 15 years from now, the computer is so fast that even if N = 25,000,000, these 2 for loops can be completed in 1 second. At that time, can we say that the time complexity would be O(1) even for N = 25,000,000 ? I suppose the answer would be YES at that time. Do you agree ?
tl:dr No. The value of N has no effect on time complexity. O(1) versus O(N) is a statement about "all N" or how the amount of computation increases when N increases.
Great question! It reminds me of when I was first trying to understand time complexity. I think many people have to go through a similar journey before it ever starts to make sense so I hope this discussion can help others.
First of all, your "funny operation" is actually funnier than you think since your entire nested for-loops can be replaced with:
result = array[N - 1] + array[N - 1]; // just some hilarious operation hahaha ha ha
Since result is overwritten each time, only the last iteration effects the outcome. We'll come back to this.
As far as what you're really asking here, the purpose of Big-O is to provide a meaningful way to compare algorithms in a way that is indenependent of input size and independent of the computer's processing speed. In other words, O(1) versus O(N) has nothing to with the size of N and nothing to do with how "modern" your computer is. That all effects execution time of the algorithm on a particular machine with a particular input, but does not effect time complexity, i.e. O(1) versus O(N).
It is actually a statement about the algorithm itself, so a math discussion is unavoidable, as dxiv has so graciously alluded to in his comment. Disclaimer: I'm going to omit certain nuances in the math since the critical stuff is already a lot to explain and I'll defer to the mountains of complete explanations elsewhere on the web and textbooks.
Your code is a great example to understand what Big-O does tell us. The way you wrote it, its complexity is O(N^2). That means that no matter what machine or what era you run your code in, if you were to count the number of operations the computer has to do, for each N, and graph it as a function, say f(N), there exists some quadratic function, say g(N)=9999N^2+99999N+999 that is greater than f(N) for all N.
But wait, if we just need to find big enough coefficients in order for g(N) to be an upper bound, can't we just claim that the algorithm is O(N) and find some g(N)=aN+b with gigantic enough coefficients that its an upper bound of f(N)??? THE ANSWER TO THIS IS THE MOST IMPORTANT MATH OBSERVATION YOU NEED TO UNDERSTAND TO REALLY UNDERSTAND BIG-O NOTATION. Spoiler alert. The answer is no.
For visuals, try this graph on Desmos where you can adjust the coefficients:[https://www.desmos.com/calculator/3ppk6shwem][1]
No matter what coefficients you choose, a function of the form aN^2+bN+c will ALWAYS eventually outgrow a function of the form aN+b (both having positive a). You can push a line as high as you want like g(N)=99999N+99999, but even the function f(N)=0.01N^2+0.01N+0.01 crosses that line and grows past it after N=9999900. There is no linear function that is an upper bound to a quadratic. Similarly, there is no constant function that is an upper bound to a linear function or quadratic function. Yet, we can find a quadratic upper bound to this f(N) such as h(N)=0.01N^2+0.01N+0.02, so f(N) is in O(N^2). This observation is what allows us to just say O(1) and O(N^2) without having to distinguish between O(1), O(3), O(999), O(4N+3), O(23N+2), O(34N^2+4+e^N), etc. By using phrases like "there exists a function such that" we can brush all the constant coefficients under the rug.
So having a quadratic upper bound, aka being in O(N^2), means that the function f(N) is no bigger than quadratic and in this case happens to be exactly quadratic. It sounds like this just comes down to comparing the degree of polynomials, why not just say that the algorithm is a degree-2 algorithm? Why do we need this super abstract "there exists an upper bound function such that bla bla bla..."? This is the generalization necessary for Big-O to account for non-polynomial functions, some common ones being logN, NlogN, and e^N.
For example if the number of operations required by your algorithm is given by f(N)=floor(50+50*sin(N)), we would say that it's O(1) because there is a constant function, e.g. g(N)=101 that is an upper bound to f(N). In this example, you have some bizarre algorithm with oscillating execution times, but you can convey to someone else how much it doesn't slow down for large inputs by simply saying that it's O(1). Neat. Plus we have a way to meaningfully say that this algorithm with trigonometric execution time is more efficient than one with linear complexity O(N). Neat. Notice how it doesn't matter how fast the computer is because we're not measuring in seconds, we're measuring in operations. So you can evaluate the algorithm by hand on paper and it's still O(1) even if it takes you all day.
As for the example in your question, we know it's O(N^2) because there are aN^2+bN+c operations involved for some a, b, c. It can't be O(1) because no matter what aN+b you pick, I can find a large enough input size N such that your algorithm requires more than aN+b operations. On any computer, in any time zone, with any chance of rain outside. Nothing physical effects O(1) versus O(N) versus (N^2). What changes it to O(1) is changing the algorithm itself to the one-liner that I provided above where you just add two numbers and spit out the result no matter what N is. Let's say for N=10 it takes 4 operations to do both array lookups, the addition, and the variable assignment. If you run it again on the same machine with N=10000000 it's still doing the same 4 operations. The amount of operations required by the algorithm doesn't grow with N. That's why the algorithm is O(1).
It's why problems like finding a O(NlogN) algorithm to sort an array are math problems and not nano-technology problems. Big-O doesn't even assume you have a computer with electronics.
Hopefully this rant gives you a hint as to what you don't understand so you can do more effective studying for a complete understanding. There's no way to cover everything needed in one post here. It was some good soul-searching for me, so thanks.

Difference between average case and amortized analysis

I am reading an article on amortized analysis of algorithms. The following is a text snippet.
Amortized analysis is similar to average-case analysis in that it is
concerned with the cost averaged over a sequence of operations.
However, average case analysis relies on probabilistic assumptions
about the data structures and operations in order to compute an
expected running time of an algorithm. Its applicability is therefore
dependent on certain assumptions about the probability distribution of
algorithm inputs.
An average case bound does not preclude the possibility that one will
get “unlucky” and encounter an input that requires more-than-expected
time even if the assumptions for probability distribution of inputs are
valid.
My questions about above text snippet are:
In the first paragraph, how does average-case analysis “rely on probabilistic assumptions about data structures and operations?” I know average-case analysis depends on probability of input, but what does the above statement mean?
What does the author mean in the second paragraph that average case is not valid even if the input distribution is valid?
Thanks!
Average case analysis makes assumptions about the input that may not be met in certain cases. Therefore, if your input is not random, in the worst case the actual performace of an algorithm may be much slower than the average case.
Amortized analysis makes no such assumptions, but it considers the total performance of a sequence of operations instead of just one operation.
Dynamic array insertion provides a simple example of amortized analysis. One algorithm is to allocate a fixed size array, and as new elements are inserted, allocate a fixed size array of double the old length when necessary. In the worst case a insertion can require time proportional to the length of the entire list, so in the worst case insertion is an O(n) operation. However, you can guarantee that such a worst case is infrequent, so insertion is an O(1) operation using amortized analysis. Amortized analysis holds no matter what the input is.
To get the average-case time complexity, you need to make assumptions about what the "average case" is. If inputs are strings, what's the "average string"? Does only length matter? If so, what is the average length of strings I will get? If not, what is the average character(s) in these strings? It becomes difficult to answer these questions definitively if the strings are, for instance, last names. What is the average last name?
In most interesting statistical samples, the maximum value is greater than the mean. This means that your average case analysis will sometimes underestimate the time/resources needed for certain inputs (which are problematic). If you think about it, for a symmetrical PDF, average case analysis should underestimate as much as it overestimates. Worst case analysis, OTOH, considers only the most problematic case(s), and so is guaranteed to overestimate.
Consider the computation of the minimum in an unsorted array. Maybe you know that it has O(n) running time but if we want be more precise it does n/2 comparison in the average case. Why this? because we are doing an assumption on data; we are assuming that the minimum can be in every position with the same probability.
if we change this assumption, and we say for example that the probability of being in the i position is for example increasing with i, we could prove a different comparison number, even a different asymptotical bound.
In the second paragraph the author say that with average case analysis we can be very unlucky and have a measured average case greater than the therotical case; recalling the previous example, if we are unlucky on m different arrays of size n, and the minimum is every time in the last position, than we'll measure a n average case and not a n/2. This can't just happen when a amortized bound is proven.

Count the number of operations for a sorting algorithm

This is my assignment question:
Explain with an example quick sort , merge sort and heap sort .
further count the number of operations, by each of these sorting methods.
I don't understand what exactly i have to answer in the context of " count the number of operations " ?
I found something in coremen book in chapter 2, they have explained insertions sort the running time of an algorithm by calculating run time of each statement ....
do i have to do in similar way ?
To count the number of operations is also known as to analyze the algorithm complexity. The idea is to have a rough idea how many operations are in the worst case needed to execute the algorithm on an input of size N, which gives you the upper bound of the computational resources required for that algorithm. And since each operation by itself (like multiplication or comparison for example) is a finite operation and takes deterministic time (even though it might be different on different machines), to get an idea of how good or bad an algorithm is, especially compared to other algorithms, all you need to know is the rough number of operations.
Here's an example with bubble sort. Let's say you have an array of two numbers. To sort it you need to compare both numbers and potentially exchange them. Since comparing and exchanging are single operations, the exact time for executing them is minimal and not important by itself. Thus, you can say that with N=2, the number of operations is O(N)=1. For three numbers, though, you need three operations in the worst case - compare the first and the second one and potentially exchange them, then compare the second one and the third one and exchange them, then compare the first one with the second one again. When you continue to generalize the bubble sort, you will find out that potentially to sort N numbers, you need to do N operations for the first number, N-1 for the second and so on. In other words, O(N) = N + (N-1) + ... + 2 + 1 = N * (N-1) / 2, which for big enough N can be simplified to O(N) = N^2.
Of course, you could just cheat and find out on the web the O(N) number for each of the three sort algorithms, but I would urge you to spend the time and try to come up with that number yourself at first. Even if you get it wrong, comparing your estimate and how you got it with the actual way to estimate their complexity will help you understand better the process of analyzing the complexity of particular piece of software you write in future.
This is called the big O notation.
This page shows you the most common sorting algorithms and their comparison expressed through big O.
Computational complexity (worst,
average and best number of comparisons
for several typical test cases, see
below). Typically, good average number
of comparisons/operations is O(n log
n) and bad is O(n^2)
From http://www.softpanorama.org/Algorithms/sorting.shtml
I think this assignment is to give you an idea that how a complexity of an algorithm is calculated. For example bubble sort algorithm has a complexity of O(n^2).
// Bubble sort method.
// ref: [http://www.metalshell.com/source_code/105/Bubble_Sort.html][1]
for(x = 0; x < ARRAY_SIZE; x++)
for(y = 0; y < ARRAY_SIZE-1; y++)
if(iarray[y] > iarray[y+1]) {
holder = iarray[y+1];
iarray[y+1] = iarray[y];
iarray[y] = holder;
}
As you see above, two loops are used to sort the array. Let ARRAY_SIZE be n. Then the number of operations is n*(n-1). That makes n^2-n which is denoted by O(N^2). That is big O notation. We just take the n that has the largest exponent, the highest growth rate. If it were 2n^2+2n, that would be still O(N^2) because constants are also omitted in calculating complexity. The wikipedia article on Big O Notation is really helpful (as Leniel mentioned in his post).
That's your homework so I did not get into details of algorithms you mentioned. But you need to do the math like this. But I think what you are asked is the actual number of operations. So, for the example above, if ARRAY_SIZE is 10, the answer gets 10*9=90. To see the differences you need to use the same array in your sample codes.

O(log N) == O(1) - Why not?

Whenever I consider algorithms/data structures I tend to replace the log(N) parts by constants. Oh, I know log(N) diverges - but does it matter in real world applications?
log(infinity) < 100 for all practical purposes.
I am really curious for real world examples where this doesn't hold.
To clarify:
I understand O(f(N))
I am curious about real world examples where the asymptotic behaviour matters more than the constants of the actual performance.
If log(N) can be replaced by a constant it still can be replaced by a constant in O( N log N).
This question is for the sake of (a) entertainment and (b) to gather arguments to use if I run (again) into a controversy about the performance of a design.
Big O notation tells you about how your algorithm changes with growing input. O(1) tells you it doesn't matter how much your input grows, the algorithm will always be just as fast. O(logn) says that the algorithm will be fast, but as your input grows it will take a little longer.
O(1) and O(logn) makes a big diference when you start to combine algorithms.
Take doing joins with indexes for example. If you could do a join in O(1) instead of O(logn) you would have huge performance gains. For example with O(1) you can join any amount of times and you still have O(1). But with O(logn) you need to multiply the operation count by logn each time.
For large inputs, if you had an algorithm that was O(n^2) already, you would much rather do an operation that was O(1) inside, and not O(logn) inside.
Also remember that Big-O of anything can have a constant overhead. Let's say that constant overhead is 1 million. With O(1) that constant overhead does not amplify the number of operations as much as O(logn) does.
Another point is that everyone thinks of O(logn) representing n elements of a tree data structure for example. But it could be anything including bytes in a file.
I think this is a pragmatic approach; O(logN) will never be more than 64. In practice, whenever terms get as 'small' as O(logN), you have to measure to see if the constant factors win out. See also
Uses of Ackermann function?
To quote myself from comments on another answer:
[Big-Oh] 'Analysis' only matters for factors
that are at least O(N). For any
smaller factor, big-oh analysis is
useless and you must measure.
and
"With O(logN) your input size does
matter." This is the whole point of
the question. Of course it matters...
in theory. The question the OP asks
is, does it matter in practice? I
contend that the answer is no, there
is not, and never will be, a data set
for which logN will grow so fast as to
always be beaten a constant-time
algorithm. Even for the largest
practical dataset imaginable in the
lifetimes of our grandchildren, a logN
algorithm has a fair chance of beating
a constant time algorithm - you must
always measure.
EDIT
A good talk:
http://www.infoq.com/presentations/Value-Identity-State-Rich-Hickey
about halfway through, Rich discusses Clojure's hash tries, which are clearly O(logN), but the base of the logarithm is large and so the depth of the trie is at most 6 even if it contains 4 billion values. Here "6" is still an O(logN) value, but it is an incredibly small value, and so choosing to discard this awesome data structure because "I really need O(1)" is a foolish thing to do. This emphasizes how most of the other answers to this question are simply wrong from the perspective of the pragmatist who wants their algorithm to "run fast" and "scale well", regardless of what the "theory" says.
EDIT
See also
http://queue.acm.org/detail.cfm?id=1814327
which says
What good is an O(log2(n)) algorithm
if those operations cause page faults
and slow disk operations? For most
relevant datasets an O(n) or even an
O(n^2) algorithm, which avoids page
faults, will run circles around it.
(but go read the article for context).
This is a common mistake - remember Big O notation is NOT telling you about the absolute performance of an algorithm at a given value, it's simply telling you the behavior of an algorithm as you increase the size of the input.
When you take it in that context it becomes clear why an algorithm A ~ O(logN) and an algorithm B ~ O(1) algorithm are different:
if I run A on an input of size a, then on an input of size 1000000*a, I can expect the second input to take log(1,000,000) times as long as the first input
if I run B on an input of size a, then on an input of size 1000000*a, I can expect the second input to take about the same amount of time as the first input
EDIT: Thinking over your question some more, I do think there's some wisdom to be had in it. While I would never say it's correct to say O(lgN) == O(1), It IS possible that an O(lgN) algorithm might be used over an O(1) algorithm. This draws back to the point about absolute performance above: Just knowing one algorithm is O(1) and another algorithm is O(lgN) is NOT enough to declare you should use the O(1) over the O(lgN), it's certainly possible given your range of possible inputs an O(lgN) might serve you best.
You asked for a real-world example. I'll give you one. Computational biology. One strand of DNA encoded in ASCII is somewhere on the level of gigabytes in space. A typical database will obviously have many thousands of such strands.
Now, in the case of an indexing/searching algorithm, that log(n) multiple makes a large difference when coupled with constants. The reason why? This is one of the applications where the size of your input is astronomical. Additionally, the input size will always continue to grow.
Admittedly, these type of problems are rare. There are only so many applications this large. In those circumstances, though... it makes a world of difference.
Equality, the way you're describing it, is a common abuse of notation.
To clarify: we usually write f(x) = O(logN) to imply "f(x) is O(logN)".
At any rate, O(1) means a constant number of steps/time (as an upper bound) to perform an action regardless of how large the input set is. But for O(logN), number of steps/time still grows as a function of the input size (the logarithm of it), it just grows very slowly. For most real world applications you may be safe in assuming that this number of steps will not exceed 100, however I'd bet there are multiple examples of datasets large enough to mark your statement both dangerous and void (packet traces, environmental measurements, and many more).
For small enough N, O(N^N) can in practice be replaced with 1. Not O(1) (by definition), but for N=2 you can see it as one operation with 4 parts, or a constant-time operation.
What if all operations take 1hour? The difference between O(log N) and O(1) is then large, even with small N.
Or if you need to run the algorithm ten million times? Ok, that took 30minutes, so when I run it on a dataset a hundred times as large it should still take 30minutes because O(logN) is "the same" as O(1).... eh...what?
Your statement that "I understand O(f(N))" is clearly false.
Real world applications, oh... I don't know.... EVERY USE OF O()-notation EVER?
Binary search in sorted list of 10 million items for example. It's the very REASON we use hash tables when the data gets big enough. If you think O(logN) is the same as O(1), then why would you EVER use a hash instead of a binary tree?
As many have already said, for the real world, you need to look at the constant factors first, before even worrying about factors of O(log N).
Then, consider what you will expect N to be. If you have good reason to think that N<10, you can use a linear search instead of a binary one. That's O(N) instead of O(log N), which according to your lights would be significant -- but a linear search that moves found elements to the front may well outperform a more complicated balanced tree, depending on the application.
On the other hand, note that, even if log N is not likely to exceed 50, a performance factor of 10 is really huge -- if you're compute-bound, a factor like that can easily make or break your application. If that's not enough for you, you'll frequently see factors of (log N)^2 or (logN)^3 in algorithms, so even if you think you can ignore one factor of (log N), that doesn't mean you can ignore more of them.
Finally, note that the simplex algorithm for linear programming has a worst case performance of O(2^n). However, for practical problems, the worst case never comes up; in practice, the simplex algorithm is fast, relatively simple, and consequently very popular.
About 30 years ago, someone developed a polynomial-time algorithm for linear programming, but it was not initially practical because the result was too slow.
Nowadays, there are practical alternative algorithms for linear programming (with polynomial-time wost-case, for what that's worth), which can outperform the simplex method in practice. But, depending on the problem, the simplex method is still competitive.
The observation that O(log n) is oftentimes indistinguishable from O(1) is a good one.
As a familiar example, suppose we wanted to find a single element in a sorted array of one 1,000,000,000,000 elements:
with linear search, the search takes on average 500,000,000,000 steps
with binary search, the search takes on average 40 steps
Suppose we added a single element to the array we are searching, and now we must search for another element:
with linear search, the search takes on average 500,000,000,001 steps (indistinguishable change)
with binary search, the search takes on average 40 steps (indistinguishable change)
Suppose we doubled the number of elements in the array we are searching, and now we must search for another element:
with linear search, the search takes on average 1,000,000,000,000 steps (extraordinarily noticeable change)
with binary search, the search takes on average 41 steps (indistinguishable change)
As we can see from this example, for all intents and purposes, an O(log n) algorithm like binary search is oftentimes indistinguishable from an O(1) algorithm like omniscience.
The takeaway point is this: *we use O(log n) algorithms because they are often indistinguishable from constant time, and because they often perform phenomenally better than linear time algorithms.
Obviously, these examples assume reasonable constants. Obviously, these are generic observations and do not apply to all cases. Obviously, these points apply at the asymptotic end of the curve, not the n=3 end.
But this observation explains why, for example, we use such techniques as tuning a query to do an index seek rather than a table scan - because an index seek operates in nearly constant time no matter the size of the dataset, while a table scan is crushingly slow on sufficiently large datasets. Index seek is O(log n).
You might be interested in Soft-O, which ignores logarithmic cost. Check this paragraph in Wikipedia.
What do you mean by whether or not it "matters"?
If you're faced with the choice of an O(1) algorithm and a O(lg n) one, then you should not assume they're equal. You should choose the constant-time one. Why wouldn't you?
And if no constant-time algorithm exists, then the logarithmic-time one is usually the best you can get. Again, does it then matter? You just have to take the fastest you can find.
Can you give me a situation where you'd gain anything by defining the two as equal? At best, it'd make no difference, and at worst, you'd hide some real scalability characteristics. Because usually, a constant-time algorithm will be faster than a logarithmic one.
Even if, as you say, lg(n) < 100 for all practical purposes, that's still a factor 100 on top of your other overhead. If I call your function, N times, then it starts to matter whether your function runs logarithmic time or constant, because the total complexity is then O(n lg n) or O(n).
So rather than asking if "it matters" that you assume logarithmic complexity to be constant in "the real world", I'd ask if there's any point in doing that.
Often you can assume that logarithmic algorithms are fast enough, but what do you gain by considering them constant?
O(logN)*O(logN)*O(logN) is very different. O(1) * O(1) * O(1) is still constant.
Also a simple quicksort-style O(nlogn) is different than O(n O(1))=O(n). Try sorting 1000 and 1000000 elements. The latter isn't 1000 times slower, it's 2000 times, because log(n^2)=2log(n)
The title of the question is misleading (well chosen to drum up debate, mind you).
O(log N) == O(1) is obviously wrong (and the poster is aware of this). Big O notation, by definition, regards asymptotic analysis. When you see O(N), N is taken to approach infinity. If N is assigned a constant, it's not Big O.
Note, this isn't just a nitpicky detail that only theoretical computer scientists need to care about. All of the arithmetic used to determine the O function for an algorithm relies on it. When you publish the O function for your algorithm, you might be omitting a lot of information about it's performance.
Big O analysis is cool, because it lets you compare algorithms without getting bogged down in platform specific issues (word sizes, instructions per operation, memory speed versus disk speed). When N goes to infinity, those issues disappear. But when N is 10000, 1000, 100, those issues, along with all of the other constants that we left out of the O function, start to matter.
To answer the question of the poster: O(log N) != O(1), and you're right, algorithms with O(1) are sometimes not much better than algorithms with O(log N), depending on the size of the input, and all of those internal constants that got omitted during Big O analysis.
If you know you're going to be cranking up N, then use Big O analysis. If you're not, then you'll need some empirical tests.
In theory
Yes, in practical situations log(n) is bounded by a constant, we'll say 100. However, replacing log(n) by 100 in situations where it's correct is still throwing away information, making the upper bound on operations that you have calculated looser and less useful. Replacing an O(log(n)) by an O(1) in your analysis could result in your large n case performing 100 times worse than you expected based on your small n case. Your theoretical analysis could have been more accurate and could have predicted an issue before you'd built the system.
I would argue that the practical purpose of big-O analysis is to try and predict the execution time of your algorithm as early as possible. You can make your analysis easier by crossing out the log(n) terms, but then you've reduced the predictive power of the estimate.
In practice
If you read the original papers by Larry Page and Sergey Brin on the Google architecture, they talk about using hash tables for everything to ensure that e.g. the lookup of a cached web page only takes one hard-disk seek. If you used B-tree indices to lookup you might need four or five hard-disk seeks to do an uncached lookup [*]. Quadrupling your disk requirements on your cached web page storage is worth caring about from a business perspective, and predictable if you don't cast out all the O(log(n)) terms.
P.S. Sorry for using Google as an example, they're like Hitler in the computer science version of Godwin's law.
[*] Assuming 4KB reads from disk, 100bn web pages in the index, ~ 16 bytes per key in a B-tree node.
As others have pointed out, Big-O tells you about how the performance of your problem scales. Trust me - it matters. I have encountered several times algorithms that were just terrible and failed to meet the customers demands because they were too slow. Understanding the difference and finding an O(1) solution is a lot of times a huge improvement.
However, of course, that is not the whole story - for instance, you may notice that quicksort algorithms will always switch to insertion sort for small elements (Wikipedia says 8 - 20) because of the behaviour of both algorithms on small datasets.
So it's a matter of understanding what tradeoffs you will be doing which involves a thorough understanding of the problem, the architecture, & experience to understand which to use, and how to adjust the constants involved.
No one is saying that O(1) is always better than O(log N). However, I can guarantee you that an O(1) algorithm will also scale way better, so even if you make incorrect assumptions about how many users will be on the system, or the size of the data to process, it won't matter to the algorithm.
Yes, log(N) < 100 for most practical purposes, and No, you can not always replace it by constant.
For example, this may lead to serious errors in estimating performance of your program. If O(N) program processed array of 1000 elements in 1 ms, then you are sure it will process 106 elements in 1 second (or so). If, though, the program is O(N*logN), then it will take it ~2 secs to process 106 elements. This difference may be crucial - for example, you may think you've got enough server power because you get 3000 requests per hour and you think your server can handle up to 3600.
Another example. Imagine you have function f() working in O(logN), and on each iteration calling function g(), which works in O(logN) as well. Then, if you replace both logs by constants, you think that your program works in constant time. Reality will be cruel though - two logs may give you up to 100*100 multiplicator.
The rules of determining the Big-O notation are simpler when you don't decide that O(log n) = O(1).
As krzysio said, you may accumulate O(log n)s and then they would make a very noticeable difference. Imagine you do a binary search: O(log n) comparisons, and then imagine that each comparison's complexity O(log n). If you neglect both you get O(1) instead of O(log2n). Similarly you may somehow arrive at O(log10n) and then you'll notice a big difference for not too large "n"s.
Assume that in your entire application, one algorithm accounts for 90% of the time the user waits for the most common operation.
Suppose in real time the O(1) operation takes a second on your architecture, and the O(logN) operation is basically .5 seconds * log(N). Well, at this point I'd really like to draw you a graph with an arrow at the intersection of the curve and the line, saying, "It matters right here." You want to use the log(N) op for small datasets and the O(1) op for large datasets, in such a scenario.
Big-O notation and performance optimization is an academic exercise rather than delivering real value to the user for operations that are already cheap, but if it's an expensive operation on a critical path, then you bet it matters!
For any algorithm that can take inputs of different sizes N, the number of operations it takes is upper-bounded by some function f(N).
All big-O tells you is the shape of that function.
O(1) means there is some number A such that f(N) < A for large N.
O(N) means there is some A such that f(N) < AN for large N.
O(N^2) means there is some A such that f(N) < AN^2 for large N.
O(log(N)) means there is some A such that f(N) < AlogN for large N.
Big-O says nothing about how big A is (i.e. how fast the algorithm is), or where these functions cross each other. It only says that when you are comparing two algorithms, if their big-Os differ, then there is a value of N (which may be small or it may be very large) where one algorithm will start to outperform the other.
you are right, in many cases it does not matter for pracitcal purposes. but the key question is "how fast GROWS N". most algorithms we know of take the size of the input, so it grows linearily.
but some algorithms have the value of N derived in a complex way. if N is "the number of possible lottery combinations for a lottery with X distinct numbers" it suddenly matters if your algorithm is O(1) or O(logN)
Big-OH tells you that one algorithm is faster than another given some constant factor. If your input implies a sufficiently small constant factor, you could see great performance gains by going with a linear search rather than a log(n) search of some base.
O(log N) can be misleading. Take for example the operations on Red-Black trees.
The operations are O(logN) but rather complex, which means many low level operations.
Whenever N is the amount of objects that is stored in some kind of memory, you're correct. After all, a binary search through EVERY byte representable by a 64-bit pointer can be achieved in just 64 steps. Actually, it's possible to do a binary search of all Planck volumes in the observable universe in just 618 steps.
So in almost all cases, it's safe to approximate O(log N) with O(N) as long as N is (or could be) a physical quantity, and we know for certain that as long as N is (or could be) a physical quantity, then log N < 618
But that is assuming N is that. It may represent something else. Note that it's not always clear what it is. Just as an example, take matrix multiplication, and assume square matrices for simplicity. The time complexity for matrix multiplication is O(N^3) for a trivial algorithm. But what is N here? It is the side length. It is a reasonable way of measuring the input size, but it would also be quite reasonable to use the number of elements in the matrix, which is N^2. Let M=N^2, and now we can say that the time complexity for trivial matrix multiplication is O(M^(3/2)) where M is the number of elements in a matrix.
Unfortunately, I don't have any real world problem per se, which was what you asked. But at least I can make up something that makes some sort of sense:
Let f(S) be a function that returns the sum of the hashes of all the elements in the power set of S. Here is some pesudo:
f(S):
ret = 0
for s = powerset(S))
ret += hash(s)
Here, hash is simply the hash function, and powerset is a generator function. Each time it's called, it will generate the next (according to some order) subset of S. A generator is necessary, because we would not be able to store the lists for huge data otherwise. Btw, here is a python example of such a power set generator:
def powerset(seq):
"""
Returns all the subsets of this set. This is a generator.
"""
if len(seq) <= 1:
yield seq
yield []
else:
for item in powerset(seq[1:]):
yield [seq[0]]+item
yield item
https://www.technomancy.org/python/powerset-generator-python/
So what is the time complexity for f? As with the matrix multiplication, we can choose N to represent many things, but at least two makes a lot of sense. One is number of elements in S, in which case the time complexity is O(2^N), but another sensible way of measuring it is that N is the number of element in the power set of S. In this case the time complexity is O(N)
So what will log N be for sensible sizes of S? Well, list with a million elements are not unusual. If n is the size of S and N is the size of P(S), then N=2^n. So O(log N) = O(log 2^n) = O(n * log 2) = O(n)
In this case it would matter, because it's rare that O(n) == O(log n) in the real world.
I do not believe algorithms where you can freely choose between O(1) with a large constant and O(logN) really exists. If there is N elements to work with at the beginning, it is just plain impossible to make it O(1), the only thing that is possible is move your N to some other part of your code.
What I try to say is that in all real cases I know off you have some space/time tradeoff, or some pre-treatment such as compiling data to a more efficient form.
That is, you do not really go O(1), you just move the N part elsewhere. Either you exchange performance of some part of your code with some memory amount either you exchange performance of one part of your algorithm with another one. To stay sane you should always look at the larger picture.
My point is that if you have N items they can't disappear. In other words you can choose between inefficient O(n^2) algorithms or worse and O(n.logN) : it's a real choice. But you never really go O(1).
What I try to point out is that for every problem and initial data state there is a 'best' algorithm. You can do worse but never better. With some experience you can have a good guessing of what is this intrisic complexity. Then if your overall treatment match that complexity you know you have something. You won't be able to reduce that complexity, but only to move it around.
If problem is O(n) it won't become O(logN) or O(1), you'll merely add some pre-treatment such that the overall complexity is unchanged or worse, and potentially a later step will be improved. Say you want the smaller element of an array, you can search in O(N) or sort the array using any common O(NLogN) sort treatment then have the first using O(1).
Is it a good idea to do that casually ? Only if your problem asked also for second, third, etc. elements. Then your initial problem was truly O(NLogN), not O(N).
And it's not the same if you wait ten times or twenty times longer for your result because you simplified saying O(1) = O(LogN).
I'm waiting for a counter-example ;-) that is any real case where you have choice between O(1) and O(LogN) and where every O(LogN) step won't compare to the O(1). All you can do is take a worse algorithm instead of the natural one or move some heavy treatment to some other part of the larger pictures (pre-computing results, using storage space, etc.)
Let's say you use an image-processing algorithm that runs in O(log N), where N is the number of images. Now... stating that it runs in constant time would make one believe that no matter how many images there are, it would still complete its task it about the same amount of time. If running the algorithm on a single image would hypothetically take a whole day, and assuming that O(logN) will never be more than 100... imagine the surprise of that person that would try to run the algorithm on a very large image database - he would expect it to be done in a day or so... yet it'll take months for it to finish.

How do I find out out the fundamental operation when calculating run-time complexity?

I am trying to get the worst run-time complexity order on a couple of algorithms created. However I have run into a problem that I keep tending to select the wrong or wrong amount of fundamental operations for an algorithm.
To me it appears to be that the selection of the fundamental operation is more of an art than a science. After googling and reading my text boxes, I still have not found a good definition. So far I have defined it as "An operation that always occurs within an algorithms execution" such as a comparison or array manipulation.
But algorithms often have many comparisons that are always executed so which operation do you pick?
I agree to some degree it's an art, so you should always clarify when writing documentation, etc.. But usually it's a "visit" to the underlying data structure. So like you said, for an array it's a comparison or a swap, for a hash map it may be a manual examination of a key, for a graph it's a visit to a vertex or edge, etc.
Even practicing complexity theorists have disagreements about this sort of thing, so what follows may be a bit subjective: http://blog.computationalcomplexity.org/2009/05/shaving-logs-with-unit-cost.html
The purpose of big-O notation is to summarize the efficiency of an algorithm for the reader. In practical contexts, I am most concerned with how many clock cycles an algorithm takes, assuming that the big-O constant is neither extremely small or large (and ignoring the effects of the memory hierarchy); this is the "unit-cost" model alluded to in the linked post.
The reason to count comparisons for sorting algorithms is that the cost of a comparison depends on the type of the input data. You could say that a sorting algorithm takes O(c n log n) cycles where c is the expense of a comparison, but it's simpler in this case to count comparisons instead because the other work performed by the algorithm is O(n log n). There's a sorting algorithm that sorts the concatenation of n sorted arrays of length n in n^2 log n steps and n^2 comparisons; here, I would expect that the number of comparisons and the computational overhead be stated separately, because neither necessarily dominates the other.
This only works when You have actually implemented the algorithm, but You could just use a profiler to see which operation is the bottleneck. That's a practical point of view. In theory, some assume that everything that is not the fundamental operation runs in zero time.
The somewhat simple definition I have heard is:
The operation which is executed at least as many times as any other
operation in the algorithm.
For example, in a sorting algorithm, these tend to be comparisons rather than assignments as you almost always have to visit and 'check' an element before you re-order it, but the check may not result in a re-ordering. So there will always be at-least as many comparisons as assignments.

Resources