Difference between average case and amortized analysis - algorithm

I am reading an article on amortized analysis of algorithms. The following is a text snippet.
Amortized analysis is similar to average-case analysis in that it is
concerned with the cost averaged over a sequence of operations.
However, average case analysis relies on probabilistic assumptions
about the data structures and operations in order to compute an
expected running time of an algorithm. Its applicability is therefore
dependent on certain assumptions about the probability distribution of
algorithm inputs.
An average case bound does not preclude the possibility that one will
get “unlucky” and encounter an input that requires more-than-expected
time even if the assumptions for probability distribution of inputs are
valid.
My questions about above text snippet are:
In the first paragraph, how does average-case analysis “rely on probabilistic assumptions about data structures and operations?” I know average-case analysis depends on probability of input, but what does the above statement mean?
What does the author mean in the second paragraph that average case is not valid even if the input distribution is valid?
Thanks!

Average-case analysis makes assumptions about the input that may not hold in practice. If your input does not match those assumptions, the actual performance of an algorithm may be much worse than the average case.
Amortized analysis makes no such assumptions, but it considers the total performance of a sequence of operations instead of just one operation.
Dynamic array insertion provides a simple example of amortized analysis. One scheme is to allocate a fixed-size array and, whenever it fills up, allocate a new array of double the old length and copy the elements over (see the sketch below). In the worst case an insertion can require time proportional to the length of the entire list, so in the worst case insertion is an O(n) operation. However, you can guarantee that such a worst case is infrequent, so insertion is an O(1) operation using amortized analysis. Amortized analysis holds no matter what the input is.
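Here is a minimal sketch of that doubling scheme (the DynamicArray class and its copy counter are my own illustration, not a standard library type). It counts how many element copies n appends actually cause; the total stays below 2n, which is the amortized O(1) claim in concrete form:

class DynamicArray:
    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.data = [None]
        self.copies = 0                  # elements copied during regrowth

    def append(self, x):
        if self.size == self.capacity:
            # Worst case for a single append: copy every existing element.
            self.capacity *= 2
            new_data = [None] * self.capacity
            for i in range(self.size):
                new_data[i] = self.data[i]
                self.copies += 1
            self.data = new_data
        self.data[self.size] = x
        self.size += 1

arr = DynamicArray()
n = 1_000_000
for i in range(n):
    arr.append(i)
print(arr.copies / n)   # about 1.05 here, and never more than 2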

To get the average-case time complexity, you need to make assumptions about what the "average case" is. If inputs are strings, what's the "average string"? Does only length matter? If so, what is the average length of strings I will get? If not, what is the average character(s) in these strings? It becomes difficult to answer these questions definitively if the strings are, for instance, last names. What is the average last name?
In most interesting statistical samples, the maximum value is greater than the mean. This means that your average case analysis will sometimes underestimate the time/resources needed for certain inputs (which are problematic). If you think about it, for a symmetrical PDF, average case analysis should underestimate as much as it overestimates. Worst case analysis, OTOH, considers only the most problematic case(s), and so is guaranteed to overestimate.

Consider computing the minimum of an unsorted array. You may know that it has O(n) running time, but if we want to be more precise, the claim that it does about n/2 comparisons in the average case rests on an assumption about the data: we assume that the minimum is equally likely to be in any position.
If we change this assumption, and say, for example, that the probability of the minimum being in position i increases with i, we could prove a different comparison count, and possibly even a different asymptotic bound.
In the second paragraph the author says that with average-case analysis we can be very unlucky and measure an average greater than the theoretical one; recalling the previous example, if we are unlucky on m different arrays of size n and the minimum is in the last position every time, then we will measure an average of about n rather than n/2. This simply cannot happen when an amortized bound has been proven.
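To see concretely how the assumed distribution changes an average-case figure, here is a rough simulation. It uses linear search for a target value (an analogous example, not the minimum computation above), and the two position distributions are invented purely for illustration:

import random

def average_comparisons(n, trials, draw_position):
    # A linear scan inspects (position + 1) elements before it finds the target.
    return sum(draw_position(n) + 1 for _ in range(trials)) / trials

def uniform(n):
    # Assumption 1: the target is equally likely to be in any position.
    return random.randrange(n)

def skewed(n):
    # Assumption 2: the target tends to sit towards the end of the array.
    return max(random.randrange(n), random.randrange(n))

n, trials = 1000, 100_000
print(average_comparisons(n, trials, uniform))   # about n/2, i.e. ~500
print(average_comparisons(n, trials, skewed))    # about 2n/3, i.e. ~667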

Related

Combinations of Asymptotic Time Complexities with Best, Average and Worst Case Inputs

I'm confused by numerous claims that asymptotic notation has nothing to do with best-case, average-case and worst-case time complexity. If this is the case, then presumably the following combinations are all valid:
Time Complexity: O(n)
Best case - upper bound for the best case input
For the best possible input, the number of basic operations carried out by this algorithm will never exceed some constant multiple of n.
Average case - upper bound for average case input
For an average input, the number of basic operations carried out by this algorithm will never exceed some constant multiple of n.
Worst case - upper bound for worst case input
For the worst possible input, the number of basic operations carried out by this algorithm will never exceed some constant multiple of n.
Time Complexity: Θ(n)
Best case - tight bound for the best case input
For the best possible input, the number of basic operations carried out by this algorithm will never exceed some constant multiple of n, nor fall below some (smaller) constant multiple of n.
Average case - tight bound for average case input
For an average input, the number of basic operations carried out by this algorithm will never exceed some constant multiple of n, nor fall below some (smaller) constant multiple of n.
Worst case - tight bound for worst case input
For the worst possible input, the number of basic operations carried out by this algorithm will never exceed some constant multiple of n, nor fall below some (smaller) constant multiple of n.
Time Complexity: Ω(n)
Best case - lower bound for the best case input
For the best possible input, the number of basic operations carried out by this algorithm will never be less than some constant multiple of n.
Average case - lower bound for average case input
For an average input, the number of basic operations carried out by this algorithm will never be less than some constant multiple of n.
Worst case - lower bound for worst case input
For the worst possible input, the number of basic operations carried out by this algorithm will never be less than some constant multiple of n.
Which of the above make sense? Which are generally used in practice when assessing the efficiency of an algorithm in terms of time taken to execute as input grows? As far as I can tell, several of them are redundant and/or contradictory.
I'm really not seeing how the concepts of upper, tight and lower bounds have nothing to do with best, average and worst case inputs. This is one of those topics that the further I look into it, the more confused I become. I would be very grateful if someone could provide some clarity on the matter for me.
All of the statements make sense.
In practice, unless otherwise stated, we should be talking about the worst case input in all cases.
We're often concerned about the average case input, although the definition of "average case" gets a little dodgy. It's usually better to talk about the "expected time", since that is a more precise mathematical definition that corresponds to what people usually mean by "average case". People are unfortunately often sloppy here, and you'll often see complexity statements that refer to expected time instead of worst case time, but don't mention it.
We're rarely concerned about the best case input.
Unfortunately, it's also common to see people confusing the notions of best-vs-worst-case input vs upper-vs-lower bounds, especially on SO and other informal sites.
You can always go back to the definitions, as you have already done, to figure out what statements really mean.
"This algorithm runs in X(n2) time" Means that given the function f(n) for the worst-case execution time vs problem size in any environment, that function will be in the set X(n2).
"This algorithm runs in X(n2) expected time" Means that given the function f(n) for the mathematical expectation of execution time vs problem size in any environment, that function will be in the set X(n2).
Finally, note that "in any environment" actually only applies to specific computing models. We usually assume a random access machine, so complexity statements are not valid for, e.g., Turing machine implementations.
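To ground those definitions, here is a small sketch of my own using linear search as the example algorithm; best, average and worst case are each just functions of n, and O, Θ and Ω are then statements about any one of those functions:

def linear_search(xs, target):
    comparisons = 0
    for x in xs:
        comparisons += 1
        if x == target:
            break
    return comparisons

n = 1000
xs = list(range(n))

best = linear_search(xs, 0)                            # 1 comparison
worst = linear_search(xs, n - 1)                       # n comparisons
average = sum(linear_search(xs, t) for t in xs) / n    # (n + 1) / 2 comparisons

print(best, worst, average)
# The worst case here is Theta(n); the best case is Theta(1), and saying it is
# O(n) is also true, just loose. That is the sense in which the bound
# (O/Theta/Omega) and the case (best/average/worst) are independent choices.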

Amortized complexity in layman's terms?

Can someone explain amortized complexity in layman's terms? I've been having a hard time finding a precise definition online and I don't know how it entirely relates to the analysis of algorithms. Anything useful, even if externally referenced, would be highly appreciated.
Amortized complexity is the total expense per operation, evaluated over a sequence of operations.
The idea is to guarantee the total expense of the entire sequence, while permitting individual operations to be much more expensive than the amortized cost.
Example:
The behavior of C++ std::vector<>. When push_back() increases the vector size above its pre-allocated value, it doubles the allocated length.
So a single push_back() may take O(N) time to execute (as the contents of the array are copied to the new memory allocation).
However, because the size of the allocation was doubled, the next N-1 calls to push_back() will each take O(1) time to execute. So, the total of N operations will still take O(N) time; thereby giving push_back() an amortized cost of O(1) per operation.
Unless otherwise specified, amortized complexity is an asymptotic worst-case guarantee for any sequence of operations. This means:
Just as with non-amortized complexity, the big-O notation used for amortized complexity ignores both fixed initial overhead and constant performance factors. So, for the purpose of evaluating big-O amortized performance, you can generally assume that any sequence of amortized operations will be "long enough" to amortize away a fixed startup expense. Specifically, for the std::vector<> example, this is why you don't need to worry about whether you will actually encounter N additional operations: the asymptotic nature of the analysis already assumes that you will.
Besides arbitrary length, amortized analysis doesn't make assumptions about the sequence of operations whose cost you are measuring -- it is a worst-case guarantee on any possible sequence of operations. No matter how badly the operations are chosen (say, by a malicious adversary!), an amortized analysis guarantees that a long enough sequence of operations cannot cost more, in total, than the sum of their amortized costs. This is why (unless specifically mentioned as a qualifier) "probability" and "average case" are not relevant to amortized analysis -- any more than they are to an ordinary worst-case big-O analysis!
In an amortized analysis, the time required to perform a sequence of data-structure operations is averaged over all the operations performed... Amortized analysis differs from average-case analysis in that probability is not involved; an amortized analysis guarantees the average performance of each operation in the worst case.
(from Cormen et al., "Introduction to Algorithms")
That might be a bit confusing since it says both that the time is averaged, and that it's not an average-case analysis. So let me try to explain this with a financial analogy (indeed, "amortized" is a word most commonly associated with banking and accounting.)
Suppose that you are operating a lottery. (Not buying a lottery ticket, which we'll get to in a moment, but operating the lottery itself.) You print 100,000 tickets, which you will sell for 1 currency unit each. One of those tickets will entitle the purchaser to 40,000 currency units.
Now, assuming you can sell all the tickets, you stand to earn 60,000 currency units: 100,000 currency units in sales, minus the 40,000 currency unit prize. For you, the value of each ticket is 0.60 currency units, amortized over all the tickets. This is a reliable value; you can bank on it. If you get tired of selling the tickets yourself, and someone comes along and offers to sell them for 0.30 currency units each, you know exactly where you stand.
For the lottery purchaser, the situation is different. The purchaser has an expected loss of 0.60 currency units when they purchase a lottery ticket. But that's probabilistic: the purchaser might buy ten lottery tickets every day for 30 years (a bit more than 100,000 tickets) without ever winning. Or they might spontaneously buy a single ticket one day, and win 39,999 currency units.
Applied to datastructure analysis, we're talking about the first case, where we amortize the cost of some datastructure operation (say, insert) over all the operations of that kind. Average-case analysis deals with the expected value of a stochastic operation (say, search), where we cannot compute the total cost of all the operations, but we can provide a probabilistic analysis of the expected cost of a single one.
It's often stated that amortized analysis applies to the situation where a high-cost operation is rare, and that's often the case. But not always. Consider, for example, the so-called "banker's queue", which is a first-in-first-out (FIFO) queue, made out of two stacks. (It's a classic functional data-structure; you can build cheap LIFO stacks out of immutable single-linked nodes, but cheap FIFOs are not so obvious). The operations are implemented as follows:
put(x):  Push x on the right-hand stack.

y = get():  If the left-hand stack is empty:
                Pop each element off the right-hand stack and push it onto
                the left-hand stack. (This effectively reverses the
                right-hand stack onto the left-hand stack.)
            Pop and return the top element of the left-hand stack.
Now, I claim that the amortized cost of put and get is O(1), assuming that I start and end with an empty queue. The analysis is simple: I always put onto the right-hand stack, and get from the left-hand stack. So aside from the If clause, each put is a push, and each get is a pop, both of which are O(1). I don't know how many times I will execute the If clause -- it depends on the pattern of puts and gets -- but I know that every element moves exactly once from the right-hand stack to the left-hand stack. So the total cost over the entire sequence of n puts and n gets is: n pushes, n pops, and n moves, where a move is a pop followed by a push: in other words, the 2n operations (n puts and n gets) result in 2n pushes and 2n pops. So the amortized cost of a single put or get is one push and one pop.
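Here is a sketch of that two-stack queue (the class and method names are mine, not a library API), with counters that make the accounting above concrete:

class BankersQueue:
    def __init__(self):
        self.right = []      # put() pushes here
        self.left = []       # get() pops from here
        self.pushes = 0
        self.pops = 0

    def put(self, x):
        self.right.append(x)
        self.pushes += 1

    def get(self):
        if not self.left:
            # Reverse the right-hand stack onto the left-hand stack.
            while self.right:
                self.left.append(self.right.pop())
                self.pops += 1
                self.pushes += 1
        self.pops += 1
        return self.left.pop()

q = BankersQueue()
n = 100_000
for i in range(n):
    q.put(i)
assert all(q.get() == i for i in range(n))   # FIFO order preserved
print(q.pushes, q.pops)                      # 2n pushes and 2n pops for the 2n operations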
Note that banker's queues are called that precisely because of the amortized complexity analysis (and the association of the word "amortized" with finance). Banker's queues are the answer to what used to be a common interview question, although I think it's now considered too well-known: Come up with a queue which implements the following three operations in amortized O(1) time:
1) Get and remove the oldest element of the queue,
2) Put a new element onto the queue,
3) Find the value of the current maximum element.
The principle of "amortized complexity" is that although something may be quite expensive when you do it, it is done so rarely that it is considered "not expensive" overall. For example, suppose you create a binary tree that needs rebalancing from time to time, say every time its size doubles (e.g. once at insertion number 256, then again at the 512th, the 1024th, and so on). Although rebalancing the tree is quite expensive, it only happens once in every n insertions; on all the other insertions the complexity is O(1). Yes, it takes O(n) once every n insertions, but that is only a 1/n fraction of the operations, so we multiply O(n) by 1/n and get O(1). That is said to be an "amortized complexity of O(1)", because as you add more elements, the time consumed rebalancing the tree remains a small fraction of the total.
Amortized means divided over repeated runs. The worst-case behavior is guaranteed not to happen with much frequency. For example if the slowest case is O(N), but the chance of that happening is just O(1/N), and otherwise the process is O(1), then the algorithm would still have amortized constant O(1) time. Just consider the work of each O(N) run to be parceled out to N other runs.
The concept depends on having enough runs to divide the total time over. If the algorithm is only run once, or it has to meet a deadline each time it runs, then the worst-case complexity is more relevant.
Say you are trying to find the kth smallest element of an unsorted array.
Sorting the array would be O(n log n).
Then finding the kth smallest number is just locating an index, which is O(1).
Since the array is already sorted, we never have to sort again. We will never hit the worst-case scenario more than once.
If we perform n queries for the kth smallest, the total is still O(n log n), because the sort dominates the O(1) lookups. If we average that total over the operations we get:
(n log n)/n, or O(log n). That is, total time complexity divided by the number of operations.
This is amortized complexity.
I think this is how it goes; I'm just learning it too.
It is somewhat similar to multiplying worst case complexity of different branches in an algorithm with the probability of executing that branch, and adding the results. So if some branch is very unlikely to be taken, it contributes less to the complexity.

What is amortized analysis of algorithms? [closed]

How is it different from asymptotic analysis? When do you use it, and why?
I've read some articles that seem to have been written well, like these:
http://www.ugrad.cs.ubc.ca/~cs320/2010W2/handouts/aa-nutshell.pdf
http://www.cs.princeton.edu/~fiebrink/423/AmortizedAnalysisExplained_Fiebrink.pdf
but I've still not fully understood these concepts.
So, can anyone please simplify it for me?
Amortized analysis doesn't naively multiply the number of invocations with the worst case for one invocation.
For example, for a dynamic array that doubles in size when needed, normal asymptotic analysis would only conclude that adding an item to it costs O(n), because it might need to grow and copy all elements to the new array. Amortized analysis takes into account that in order for the array to have to grow, n/2 items must have been added without triggering a grow since the previous one, so adding an item really only takes O(1) (the O(n) cost is amortized over n/2 insertions).
Amortized analysis is not the same as "average performance" - amortized analysis gives a hard guarantee on what the total performance will be if you perform that many operations.
There are a lot of answers to "what", but none to "why".
As everyone else has said, asymptotic analysis is about how the performance of a given operation scales to a large data set. Amortized analysis is about how the average of the performance of all of the operations on a large data set scales. Amortized analysis never gives worse bounds than asymptotic, and sometimes gives much better ones.
If you are concerned with the total running time of a longer job, the better bounds of amortized analysis are probably what you care about. This is why scripting languages (for instance) are often happy to grow arrays and hash tables by some factor even though that is an expensive operation. (The growing can be an O(n) operation, but amortized it is O(1) because you do it rarely.)
If you are doing real time programming (individual operations must complete in a predictable time), then the better bounds from amortized analysis don't matter. It doesn't matter if the operation on average was fast, if you failed to finish it in time to get back and adjust the bandsaw before it cut too far...
Which one matters in your case depends on exactly what your programming problem is.
Asymptotic analysis
This term refers to the analysis of algorithm performance under the assumption that the data the algorithm operates on (the input) is, in layman's terms, "large enough that making it larger will not change the conclusion". Although the exact size of the input does not need to be specified (we only need an upper bound), the data set itself has to be specified.
Note that so far we have only talked about the method of analysis; we have not specified exactly which quantity we are analyzing (time complexity? space complexity?), and neither have we specified which metric we are interested in (worst case? best case? average?).
In practice the term asymptotic analysis commonly refers to the upper-bound time complexity of an algorithm, i.e. the worst-case performance measured by total running time, which is represented by big-Oh notation (e.g. a sorting algorithm might be O(n log n)).
Amortized analysis
This term refers to the analysis of algorithm performance based on a specific sequence of operations that targets the worst case scenario -- that is, amortized analysis does imply that the metric is worst case performance (although it still does not say which quantity is being measured). To perform this analysis, we need to specify the size of the input, but we do not need to make any assumptions about its form.
In layman's terms, amortized analysis is picking an arbitrary size for the input and then "playing through" the algorithm. Whenever a decision that depends on the input must be made, the worst path is taken¹. After the algorithm has run to completion we divide the calculated complexity by the size of the input to produce the final result.
¹note: To be precise, the worst path that is theoretically possible. If you have a vector that dynamically doubles in size each time its capacity is exhausted, "worst case" does not mean to assume that it will need to double upon every insertion because the insertions are processed as a sequence. We are allowed to (and indeed must) use known state to mathematically eliminate as many "even worse" cases as we can, even while the input remains unknown.
The most important difference
The critical difference between asymptotic and amortized analysis is that the former is dependent on the input itself, while the latter is dependent on the sequence of operations the algorithm will execute.
Therefore:
asymptotic analysis allows us to assert that the complexity of the algorithm when it is given a best/worst/average case input of size approaching N is bounded by some function F(N) -- where N is a variable
amortized analysis allows us to assert that the complexity of the algorithm when it is given an input of unknown characteristics but known size N is no worse than the value of a function F(N) -- where N is a known value
The answer to this is succinctly defined by the first sentence of the Amortized Analysis chapter in the book - Introduction to Algorithms:
In an amortized analysis, the time required to perform a sequence of
data-structure operations is averaged over all the operations
performed.
We represent a program's growth in complexity with asymptotic analysis - bounding the program's growth by a function and defining the worst, best or average case of that.
But this can be misleading when there is only the occasional case in which the program's complexity reaches a peak, while in general the program doesn't take much computation.
Hence, it makes more sense to average the cost over a sequence of operations, even though a single operation might be expensive. This is amortized analysis!
Amortized analysis is an alternative to the asymptotic technique for calculating complexity. It helps us compute a complexity that is closer to what happens in practice, so we can compare and decide between two or more algorithms.
The best reference I've found so far for understanding the amortized analysis of algorithms, is in the book Introduction to Algorithms, third edition, chapter 17: "Amortized Analysis". It's all there, explained much better than what can be found in a Stack Overflow post. You'll find the book in the library of any decent University.
Regular asymptotic analysis looks at the performance of an individual operation asymptotically, as a function of the size of the problem. The O() notation is what indicates an asymptotic analysis.
Amortized analysis (which is also an asymptotic analysis) looks at the total performance of multiple operations on a shared datastructure.
The difference is, amortized analysis typically proves that the total computation required for M operations has a better performance guarantee than M times the worst case for the individual operation.
For example, an individual operation on a splay tree of size N can take up to O(N) time. However, a sequence of M operations on a tree of size N is bounded by O(M(1 + log N) + N log N) time, which is roughly O(log N) per operation. Note also that an amortized analysis is much stricter than an "average-case" analysis: it proves that any possible sequence of operations will satisfy its asymptotic worst case.
Amortised analysis deals with the total cost over a number of runs of the routine, and the benefits that can be gained therein. For example, searching an unsorted array of n items for a single match may take up to n comparisons, and hence is O(n). However, if we know the same array is going to be searched for m items, repeating the whole task each time would have complexity O(m*n). If instead we sort the array in advance, the one-off cost is O(n log(n)), and each successive search takes only O(log(n)). Thus the total amortised cost of this approach for m searches is O(n*log(n) + m*log(n)). For m >= n this is O(m*log(n)), compared to O(m*n) without sorting; with m = n that is O(n*log(n)) versus O(n^2). Thus the amortised cost is cheaper.
Put simply, by spending a bit extra early on, we can save a lot later.
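A quick sketch of that trade-off, counting model comparisons rather than measuring real time (the sizes and the log-based cost formulas are just the rough model from the paragraph above):

import math
import random
from bisect import bisect_left

n, m = 10_000, 10_000
data = [random.random() for _ in range(n)]
queries = [random.choice(data) for _ in range(m)]

# Strategy 1: m independent linear scans, about m*n comparisons in the worst case.
linear_cost = m * n

# Strategy 2: sort once (~n log n), then m binary searches (~log n each);
# the one-off sort is amortised over the m searches.
sorted_data = sorted(data)
amortised_cost = n * math.log2(n) + m * math.log2(n)

found = sum(sorted_data[bisect_left(sorted_data, q)] == q for q in queries)
print(found == m)                              # every query is found
print(linear_cost, round(amortised_cost))      # 100,000,000 vs about 265,754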

Expected running time in randomized algorithm

In most of the calculation analysis of running times, we have assumed
that all inputs are equally likely. This is not true, because nearly
sorted input, for instance, occurs much more often than is
statistically expected, and this causes problems, particularly for
quicksort and binary search trees.
By using a randomized algorithm, the particular input is no longer
important. The random numbers are important, and we can get an
expected running time, where we now average over all possible random
numbers instead of over all possible inputs. Using quicksort with a
random pivot gives an O(n log n)-expected-time algorithm. This means
that for any input, including already-sorted input, the running time
is expected to be O(n log n), based on the statistics of random
numbers. An expected running time bound is somewhat stronger than an
average-case bound but, of course, is weaker than the corresponding
worst-case bound.
First, we will see a novel scheme for supporting the binary search
tree operations in O(log n) expected time. Once again, this means that
there are no bad inputs, just bad random numbers. From a theoretical
point of view, this is not terribly exciting, since balanced search
trees achieve this bound in the worst case. Nevertheless, the use of
randomization leads to relatively simple algorithms for searching,
inserting, and especially deleting.
My question on above text is
What does the author mean by "An expected running time bound is somewhat stronger than an average-case bound but, of course, is weaker than the corresponding worst-case bound" in the above text?
Regarding binary search trees, what does the author mean by "since balanced search trees achieve this bound in the worst case"? My understanding is that for binary search trees the worst case is O(d), where d is the depth of the node, and d can be N, i.e. O(N). What does the author mean by saying this is the same as that worst case?
Thanks!
As the author explained in the preceding sentence: an expected-time bound must hold for any input. The average case is averaged over all inputs; that is, it assumes you get a reasonably mediocre input. Expected time means that no matter how bad the input is, the algorithm must stay within the bound, provided the random-number god is nice (i.e. gives you your expected value, and not the worst possible thing, like she often does).
Balanced binary search trees. They can't reach depth N because they are balanced.
The author means that the average-case bound still allows particular inputs to be slower than O(n log n), even though the average over all inputs is O(n log n). (This is not an issue for all sorting algorithms; e.g. for merge sort, expected time == average time == O(n log n), and no randomization is needed.)
O(d) = O(log n) for balanced trees
PS who is the author?
In randomized quicksort we can't produce a bad input (one that causes the worst case) even intentionally, since the random permutation makes the input order irrelevant. The randomized algorithm performs badly only if the random-number generator produces an unlucky permutation to be sorted. Nearly all permutations cause quicksort to perform close to the average case; very few cause near-worst-case behavior, so the probability of hitting a worst case is very low, and the algorithm almost always performs in O(n log n).
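For concreteness, here is a sketch of the random-pivot variant mentioned in the quoted text (written for clarity rather than speed; it allocates new lists at every level):

import random

def randomized_quicksort(xs):
    if len(xs) <= 1:
        return xs
    pivot = random.choice(xs)     # the randomness lives in the algorithm, not the input
    less = [x for x in xs if x < pivot]
    equal = [x for x in xs if x == pivot]
    greater = [x for x in xs if x > pivot]
    return randomized_quicksort(less) + equal + randomized_quicksort(greater)

# An already-sorted input is a classic worst case for "first element as pivot"
# quicksort, but here it runs in expected O(n log n) like any other input.
print(randomized_quicksort(list(range(1000))) == list(range(1000)))   # True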

O(log N) == O(1) - Why not?

Whenever I consider algorithms/data structures I tend to replace the log(N) parts by constants. Oh, I know log(N) diverges - but does it matter in real world applications?
log(infinity) < 100 for all practical purposes.
I am really curious for real world examples where this doesn't hold.
To clarify:
I understand O(f(N))
I am curious about real world examples where the asymptotic behaviour matters more than the constants of the actual performance.
If log(N) can be replaced by a constant it still can be replaced by a constant in O( N log N).
This question is for the sake of (a) entertainment and (b) to gather arguments to use if I run (again) into a controversy about the performance of a design.
Big O notation tells you about how your algorithm changes with growing input. O(1) tells you it doesn't matter how much your input grows, the algorithm will always be just as fast. O(logn) says that the algorithm will be fast, but as your input grows it will take a little longer.
O(1) and O(logn) make a big difference when you start to combine algorithms.
Take doing joins with indexes for example. If you could do a join in O(1) instead of O(logn) you would have huge performance gains. For example, with O(1) you can join any number of times and you still have O(1). But with O(logn) you need to multiply the operation count by logn each time.
For large inputs, if you had an algorithm that was O(n^2) already, you would much rather do an operation that was O(1) inside, and not O(logn) inside.
Also remember that Big-O of anything can have a constant overhead. Let's say that constant overhead is 1 million. With O(1) that constant overhead does not amplify the number of operations as much as O(logn) does.
Another point is that everyone thinks of O(logn) representing n elements of a tree data structure for example. But it could be anything including bytes in a file.
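Going back to the join example above, here is a rough toy model of my own that counts lookups only (it is not how a real database engine measures cost): n probes against a hash index cost about n operations, while n probes against a sorted index searched by bisection cost about n * log n.

import math
from bisect import bisect_left

n = 1_000_000
keys = list(range(n))

hash_index = {k: k for k in keys}     # O(1) expected per probe
sorted_index = keys                   # already sorted; O(log n) per probe

hash_probe_cost = n * 1
tree_probe_cost = n * math.ceil(math.log2(n))

print(hash_probe_cost)   # 1,000,000 model operations
print(tree_probe_cost)   # 20,000,000 model operations: the log n factor multiplies in

# The lookups themselves return the same answer either way:
assert hash_index[123456] == sorted_index[bisect_left(sorted_index, 123456)]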
I think this is a pragmatic approach; O(logN) will never be more than 64. In practice, whenever terms get as 'small' as O(logN), you have to measure to see if the constant factors win out. See also
Uses of Ackermann function?
To quote myself from comments on another answer:
[Big-Oh] 'Analysis' only matters for factors that are at least O(N). For any smaller factor, big-oh analysis is useless and you must measure.
and
"With O(logN) your input size does
matter." This is the whole point of
the question. Of course it matters...
in theory. The question the OP asks
is, does it matter in practice? I
contend that the answer is no, there
is not, and never will be, a data set
for which logN will grow so fast as to
always be beaten a constant-time
algorithm. Even for the largest
practical dataset imaginable in the
lifetimes of our grandchildren, a logN
algorithm has a fair chance of beating
a constant time algorithm - you must
always measure.
EDIT
A good talk:
http://www.infoq.com/presentations/Value-Identity-State-Rich-Hickey
about halfway through, Rich discusses Clojure's hash tries, which are clearly O(logN), but the base of the logarithm is large and so the depth of the trie is at most 6 even if it contains 4 billion values. Here "6" is still an O(logN) value, but it is an incredibly small value, and so choosing to discard this awesome data structure because "I really need O(1)" is a foolish thing to do. This emphasizes how most of the other answers to this question are simply wrong from the perspective of the pragmatist who wants their algorithm to "run fast" and "scale well", regardless of what the "theory" says.
EDIT
See also
http://queue.acm.org/detail.cfm?id=1814327
which says
What good is an O(log2(n)) algorithm if those operations cause page faults and slow disk operations? For most relevant datasets an O(n) or even an O(n^2) algorithm, which avoids page faults, will run circles around it.
(but go read the article for context).
This is a common mistake - remember Big O notation is NOT telling you about the absolute performance of an algorithm at a given value, it's simply telling you the behavior of an algorithm as you increase the size of the input.
When you take it in that context, it becomes clear why an algorithm A ~ O(logN) and an algorithm B ~ O(1) are different:
if I run A on an input of size a, then on an input of size 1000000*a, I can expect the second run to take measurably longer than the first - roughly log(1,000,000) additional steps' worth of work
if I run B on an input of size a, then on an input of size 1000000*a, I can expect the second input to take about the same amount of time as the first input
EDIT: Thinking over your question some more, I do think there's some wisdom to be had in it. While I would never say it's correct to say O(lgN) == O(1), it IS possible that an O(lgN) algorithm might be used over an O(1) algorithm. This goes back to the point about absolute performance above: just knowing that one algorithm is O(1) and another is O(lgN) is NOT enough to declare that you should use the O(1) over the O(lgN); it's certainly possible that, given your range of possible inputs, the O(lgN) algorithm serves you best.
You asked for a real-world example. I'll give you one. Computational biology. One strand of DNA encoded in ASCII is somewhere on the level of gigabytes in space. A typical database will obviously have many thousands of such strands.
Now, in the case of an indexing/searching algorithm, that log(n) multiple makes a large difference when coupled with constants. The reason why? This is one of the applications where the size of your input is astronomical. Additionally, the input size will always continue to grow.
Admittedly, these type of problems are rare. There are only so many applications this large. In those circumstances, though... it makes a world of difference.
Equality, the way you're describing it, is a common abuse of notation.
To clarify: we usually write f(x) = O(logN) to imply "f(x) is O(logN)".
At any rate, O(1) means a constant number of steps/time (as an upper bound) to perform an action regardless of how large the input set is. But for O(logN), number of steps/time still grows as a function of the input size (the logarithm of it), it just grows very slowly. For most real world applications you may be safe in assuming that this number of steps will not exceed 100, however I'd bet there are multiple examples of datasets large enough to mark your statement both dangerous and void (packet traces, environmental measurements, and many more).
For small enough N, O(N^N) can in practice be replaced with 1. Not O(1) (by definition), but for N=2 you can see it as one operation with 4 parts, or a constant-time operation.
What if all operations take 1 hour? The difference between O(log N) and O(1) is then large, even with small N.
Or what if you need to run the algorithm ten million times? Ok, that took 30 minutes, so when I run it on a dataset a hundred times as large it should still take 30 minutes, because O(logN) is "the same" as O(1)... eh... what?
Your statement that "I understand O(f(N))" is clearly false.
Real world applications, oh... I don't know.... EVERY USE OF O()-notation EVER?
Binary search in a sorted list of 10 million items, for example. It's the very REASON we use hash tables when the data gets big enough. If you think O(logN) is the same as O(1), then why would you EVER use a hash instead of a binary tree?
As many have already said, for the real world, you need to look at the constant factors first, before even worrying about factors of O(log N).
Then, consider what you will expect N to be. If you have good reason to think that N<10, you can use a linear search instead of a binary one. That's O(N) instead of O(log N), which according to your lights would be significant -- but a linear search that moves found elements to the front may well outperform a more complicated balanced tree, depending on the application.
On the other hand, note that, even if log N is not likely to exceed 50, a performance factor of 10 is really huge -- if you're compute-bound, a factor like that can easily make or break your application. If that's not enough for you, you'll frequently see factors of (log N)^2 or (logN)^3 in algorithms, so even if you think you can ignore one factor of (log N), that doesn't mean you can ignore more of them.
Finally, note that the simplex algorithm for linear programming has a worst case performance of O(2^n). However, for practical problems, the worst case never comes up; in practice, the simplex algorithm is fast, relatively simple, and consequently very popular.
About 30 years ago, someone developed a polynomial-time algorithm for linear programming, but it was not initially practical because it was too slow.
Nowadays, there are practical alternative algorithms for linear programming (with polynomial-time worst case, for what that's worth), which can outperform the simplex method in practice. But, depending on the problem, the simplex method is still competitive.
The observation that O(log n) is oftentimes indistinguishable from O(1) is a good one.
As a familiar example, suppose we wanted to find a single element in a sorted array of 1,000,000,000,000 elements:
with linear search, the search takes on average 500,000,000,000 steps
with binary search, the search takes on average 40 steps
Suppose we added a single element to the array we are searching, and now we must search for another element:
with linear search, the search takes on average 500,000,000,001 steps (indistinguishable change)
with binary search, the search takes on average 40 steps (indistinguishable change)
Suppose we doubled the number of elements in the array we are searching, and now we must search for another element:
with linear search, the search takes on average 1,000,000,000,000 steps (extraordinarily noticeable change)
with binary search, the search takes on average 41 steps (indistinguishable change)
As we can see from this example, for all intents and purposes, an O(log n) algorithm like binary search is oftentimes indistinguishable from an O(1) algorithm like omniscience.
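A tiny sketch that reproduces those step counts (no trillion-element array is actually built; only the usual formulas for average linear-search steps and worst-case binary-search steps are evaluated):

import math

def linear_steps(n):
    return n / 2                       # average successful linear search

def binary_steps(n):
    return math.ceil(math.log2(n))     # binary search, worst case

n = 1_000_000_000_000
print(linear_steps(n), binary_steps(n))            # 5e11 steps vs 40 steps
print(linear_steps(n + 1), binary_steps(n + 1))    # one more element: no visible change
print(linear_steps(2 * n), binary_steps(2 * n))    # doubled: 1e12 steps vs 41 steps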
The takeaway point is this: we use O(log n) algorithms because they are often indistinguishable from constant time, and because they often perform phenomenally better than linear time algorithms.
Obviously, these examples assume reasonable constants. Obviously, these are generic observations and do not apply to all cases. Obviously, these points apply at the asymptotic end of the curve, not the n=3 end.
But this observation explains why, for example, we use such techniques as tuning a query to do an index seek rather than a table scan - because an index seek operates in nearly constant time no matter the size of the dataset, while a table scan is crushingly slow on sufficiently large datasets. Index seek is O(log n).
You might be interested in Soft-O, which ignores logarithmic cost. Check this paragraph in Wikipedia.
What do you mean by whether or not it "matters"?
If you're faced with the choice of an O(1) algorithm and a O(lg n) one, then you should not assume they're equal. You should choose the constant-time one. Why wouldn't you?
And if no constant-time algorithm exists, then the logarithmic-time one is usually the best you can get. Again, does it then matter? You just have to take the fastest you can find.
Can you give me a situation where you'd gain anything by defining the two as equal? At best, it'd make no difference, and at worst, you'd hide some real scalability characteristics. Because usually, a constant-time algorithm will be faster than a logarithmic one.
Even if, as you say, lg(n) < 100 for all practical purposes, that's still a factor of 100 on top of your other overhead. If I call your function N times, then it starts to matter whether your function runs in logarithmic or constant time, because the total complexity is then O(n lg n) or O(n).
So rather than asking if "it matters" that you assume logarithmic complexity to be constant in "the real world", I'd ask if there's any point in doing that.
Often you can assume that logarithmic algorithms are fast enough, but what do you gain by considering them constant?
O(logN)*O(logN)*O(logN) is very different. O(1) * O(1) * O(1) is still constant.
Also, a simple quicksort-style O(n log n) is different from O(n * O(1)) = O(n). Try sorting 1,000 and 1,000,000 elements. The latter isn't 1,000 times slower, it's about 2,000 times slower, because 1,000,000 = 1,000^2 and log(n^2) = 2 log(n).
The title of the question is misleading (well chosen to drum up debate, mind you).
O(log N) == O(1) is obviously wrong (and the poster is aware of this). Big O notation, by definition, regards asymptotic analysis. When you see O(N), N is taken to approach infinity. If N is assigned a constant, it's not Big O.
Note, this isn't just a nitpicky detail that only theoretical computer scientists need to care about. All of the arithmetic used to determine the O function for an algorithm relies on it. When you publish the O function for your algorithm, you might be omitting a lot of information about its performance.
Big O analysis is cool, because it lets you compare algorithms without getting bogged down in platform specific issues (word sizes, instructions per operation, memory speed versus disk speed). When N goes to infinity, those issues disappear. But when N is 10000, 1000, 100, those issues, along with all of the other constants that we left out of the O function, start to matter.
To answer the question of the poster: O(log N) != O(1), and you're right, algorithms with O(1) are sometimes not much better than algorithms with O(log N), depending on the size of the input, and all of those internal constants that got omitted during Big O analysis.
If you know you're going to be cranking up N, then use Big O analysis. If you're not, then you'll need some empirical tests.
In theory
Yes, in practical situations log(n) is bounded by a constant, we'll say 100. However, replacing log(n) by 100 in situations where it's correct is still throwing away information, making the upper bound on operations that you have calculated looser and less useful. Replacing an O(log(n)) by an O(1) in your analysis could result in your large n case performing 100 times worse than you expected based on your small n case. Your theoretical analysis could have been more accurate and could have predicted an issue before you'd built the system.
I would argue that the practical purpose of big-O analysis is to try and predict the execution time of your algorithm as early as possible. You can make your analysis easier by crossing out the log(n) terms, but then you've reduced the predictive power of the estimate.
In practice
If you read the original papers by Larry Page and Sergey Brin on the Google architecture, they talk about using hash tables for everything to ensure that e.g. the lookup of a cached web page only takes one hard-disk seek. If you used B-tree indices to lookup you might need four or five hard-disk seeks to do an uncached lookup [*]. Quadrupling your disk requirements on your cached web page storage is worth caring about from a business perspective, and predictable if you don't cast out all the O(log(n)) terms.
P.S. Sorry for using Google as an example, they're like Hitler in the computer science version of Godwin's law.
[*] Assuming 4KB reads from disk, 100bn web pages in the index, ~ 16 bytes per key in a B-tree node.
As others have pointed out, Big-O tells you about how the performance of your problem scales. Trust me - it matters. I have several times encountered algorithms that were just terrible and failed to meet the customers' demands because they were too slow. Understanding the difference and finding an O(1) solution is a lot of times a huge improvement.
However, of course, that is not the whole story - for instance, you may notice that quicksort algorithms will always switch to insertion sort for small elements (Wikipedia says 8 - 20) because of the behaviour of both algorithms on small datasets.
So it's a matter of understanding what tradeoffs you will be doing which involves a thorough understanding of the problem, the architecture, & experience to understand which to use, and how to adjust the constants involved.
No one is saying that O(1) is always better than O(log N). However, I can guarantee you that an O(1) algorithm will also scale way better, so even if you make incorrect assumptions about how many users will be on the system, or the size of the data to process, it won't matter to the algorithm.
Yes, log(N) < 100 for most practical purposes, and No, you can not always replace it by constant.
For example, this may lead to serious errors in estimating the performance of your program. If an O(N) program processed an array of 1000 elements in 1 ms, then you can be sure it will process 10^6 elements in about 1 second. If, though, the program is O(N*logN), then it will take ~2 seconds to process 10^6 elements. This difference may be crucial - for example, you may think you've got enough server power because you get 3000 requests per hour and you think your server can handle up to 3600.
Another example. Imagine you have a function f() that works in O(logN) and on each iteration calls a function g(), which works in O(logN) as well. Then, if you replace both logs by constants, you think that your program works in constant time. Reality will be cruel, though - the two logs together may give you a multiplier of up to 100*100.
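A rough sketch of that first estimation error, taking the 1 ms measurement above at face value and assuming base-2 logarithms (the constants are illustrative only):

import math

n_small, t_small = 1_000, 0.001     # measured: 1 ms for 1000 elements
n_big = 1_000_000

# If the program is really O(N): time scales linearly.
t_linear = t_small * (n_big / n_small)

# If the program is really O(N*logN): the log factor grows as well.
t_nlogn = t_small * (n_big * math.log2(n_big)) / (n_small * math.log2(n_small))

print(t_linear)   # 1.0 second
print(t_nlogn)    # about 2.0 seconds - the difference the paragraph above warns about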
The rules of determining the Big-O notation are simpler when you don't decide that O(log n) = O(1).
As krzysio said, you may accumulate O(log n)s and then they make a very noticeable difference. Imagine you do a binary search: O(log n) comparisons, and then imagine that each comparison's complexity is O(log n). If you neglect both you get O(1) instead of O(log^2 n). Similarly you may somehow arrive at O(log^10 n), and then you'll notice a big difference even for not-too-large "n"s.
Assume that in your entire application, one algorithm accounts for 90% of the time the user waits for the most common operation.
Suppose in real time the O(1) operation takes a second on your architecture, and the O(logN) operation is basically .5 seconds * log(N). Well, at this point I'd really like to draw you a graph with an arrow at the intersection of the curve and the line, saying, "It matters right here." You want to use the log(N) op for small datasets and the O(1) op for large datasets, in such a scenario.
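In place of that graph, here is a small sketch that locates the crossover numerically; the log base is not specified above, so base 2 and base 10 are both shown, and all the numbers are purely illustrative:

import math

def t_constant(n):
    return 1.0                        # the O(1) operation: 1 second each time

def t_log(n, log):
    return 0.5 * log(n)               # the O(logN) operation: 0.5 s * log(N)

for log in (math.log2, math.log10):
    crossover = next(n for n in range(2, 10**7) if t_log(n, log) >= t_constant(n))
    print(log.__name__, crossover)    # log2 -> 4, log10 -> 100

# Below the crossover the O(logN) operation is actually the faster one; above
# it the O(1) operation wins. Which side your real N falls on is what matters.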
For operations that are already cheap, Big-O notation and performance optimization are an academic exercise rather than something that delivers real value to the user; but if it's an expensive operation on a critical path, then you bet it matters!
For any algorithm that can take inputs of different sizes N, the number of operations it takes is upper-bounded by some function f(N).
All big-O tells you is the shape of that function.
O(1) means there is some number A such that f(N) < A for large N.
O(N) means there is some A such that f(N) < AN for large N.
O(N^2) means there is some A such that f(N) < AN^2 for large N.
O(log(N)) means there is some A such that f(N) < A log N for large N.
Big-O says nothing about how big A is (i.e. how fast the algorithm is), or where these functions cross each other. It only says that when you are comparing two algorithms, if their big-Os differ, then there is a value of N (which may be small or it may be very large) where one algorithm will start to outperform the other.
You are right, in many cases it does not matter for practical purposes. But the key question is "how fast does N grow?". For most algorithms we know of, N is the size of the input, so it grows linearly.
But some algorithms have the value of N derived in a more complex way. If N is "the number of possible lottery combinations for a lottery with X distinct numbers", it suddenly matters whether your algorithm is O(1) or O(logN).
Big-O only tells you that one algorithm is faster than another up to some constant factor. If your input is small enough that the constant factors dominate, you could see great performance gains by going with a linear search rather than a log(n) search of some base.
O(log N) can be misleading. Take for example the operations on Red-Black trees.
The operations are O(logN) but rather complex, which means many low level operations.
Whenever N is the number of objects stored in some kind of memory, you're correct. After all, a binary search through EVERY byte representable by a 64-bit pointer can be achieved in just 64 steps. Actually, it's possible to do a binary search of all Planck volumes in the observable universe in just 618 steps.
So in almost all cases, it's safe to approximate O(log N) with O(1) as long as N is (or could be) a physical quantity, and we know for certain that as long as N is (or could be) a physical quantity, then log N < 618.
But that is assuming N is such a quantity; it may represent something else, and it's not always clear what. Just as an example, take matrix multiplication, and assume square matrices for simplicity. The time complexity for matrix multiplication is O(N^3) for a trivial algorithm. But what is N here? It is the side length. It is a reasonable way of measuring the input size, but it would also be quite reasonable to use the number of elements in the matrix, which is N^2. Let M = N^2, and now we can say that the time complexity for trivial matrix multiplication is O(M^(3/2)), where M is the number of elements in a matrix.
Unfortunately, I don't have any real world problem per se, which was what you asked. But at least I can make up something that makes some sort of sense:
Let f(S) be a function that returns the sum of the hashes of all the elements in the power set of S. Here is a sketch of it in Python:
def f(S):
    ret = 0
    for s in powerset(S):
        ret += hash(tuple(s))   # tuple() because Python lists aren't hashable
    return ret
Here, hash is simply the hash function, and powerset is a generator function. Each time it's called, it will generate the next (according to some order) subset of S. A generator is necessary, because we would not be able to store the lists for huge data otherwise. Btw, here is a python example of such a power set generator:
def powerset(seq):
    """
    Returns all the subsets of this set. This is a generator.
    """
    if len(seq) <= 1:
        yield seq
        yield []
    else:
        for item in powerset(seq[1:]):
            yield [seq[0]] + item
            yield item
https://www.technomancy.org/python/powerset-generator-python/
So what is the time complexity for f? As with the matrix multiplication, we can choose N to represent many things, but at least two make a lot of sense. One is the number of elements in S, in which case the time complexity is O(2^N); another sensible choice is to let N be the number of elements in the power set of S, in which case the time complexity is O(N).
So what will log N be for sensible sizes of S? Well, lists with a million elements are not unusual. If n is the size of S and N is the size of P(S), then N = 2^n, so O(log N) = O(log 2^n) = O(n * log 2) = O(n).
In this case it would matter, because it's rare that O(n) == O(log n) in the real world.
I do not believe that algorithms where you can freely choose between O(1) with a large constant and O(logN) really exist. If there are N elements to work with at the beginning, it is just plain impossible to make it O(1); the only thing that is possible is to move your N to some other part of your code.
What I am trying to say is that in all real cases I know of you have some space/time tradeoff, or some pre-processing such as compiling the data into a more efficient form.
That is, you do not really go O(1), you just move the N part elsewhere. Either you trade performance of some part of your code for some amount of memory, or you trade performance of one part of your algorithm against another. To stay sane you should always look at the larger picture.
My point is that if you have N items, they can't disappear. In other words, you can choose between an inefficient O(N^2) algorithm (or worse) and O(N logN): it's a real choice. But you never really go O(1).
What I am trying to point out is that for every problem and initial data state there is a 'best' algorithm. You can do worse but never better. With some experience you can make a good guess at this intrinsic complexity. Then, if your overall treatment matches that complexity, you know you have something. You won't be able to reduce that complexity, only to move it around.
If a problem is O(N), it won't become O(logN) or O(1); you'll merely add some pre-processing such that the overall complexity is unchanged or worse, with potentially a later step improved. Say you want the smallest element of an array: you can search it in O(N), or sort the array with any common O(NLogN) sort and then take the first element in O(1).
Is it a good idea to do that casually? Only if your problem also asks for the second, third, etc. elements. Then your initial problem was truly O(NLogN), not O(N).
And it's not the same thing if you wait ten or twenty times longer for your result because you simplified by saying O(1) = O(LogN).
I'm waiting for a counter-example ;-) that is, any real case where you have a choice between O(1) and O(LogN) and where the O(LogN) steps don't end up comparable to the O(1) one. All you can do is take a worse algorithm instead of the natural one, or move some heavy treatment to some other part of the larger picture (pre-computing results, using storage space, etc.).
Let's say you use an image-processing algorithm that runs in O(log N), where N is the number of images. Now... stating that it runs in constant time would make one believe that no matter how many images there are, it would still complete its task in about the same amount of time. If running the algorithm on a single image would hypothetically take a whole day, and assuming that O(logN) will never be more than 100... imagine the surprise of the person who tries to run the algorithm on a very large image database - they would expect it to be done in a day or so... yet it will take months to finish.

Resources