Shouldn't the average search time for a linked list be O(N/2)? - algorithm

I keep seeing the search time for linked lists listed as O(N) but if you have 100 elements in a list aren't you on average only comparing against 50 of them before you've found a match?
So is O(N/2) being rounded to O(N) or am I just wrong in thinking it's N/2 on average for a linked list lookup?
Thanks!

The thing is, the order is really only talking about how the time increases as n increases.
So O(N) means that you have linear growth. If you double N then the time taken also doubles. N/2 and N both have the same growth behaviour so in terms of Order they are identical.
Functions like log(N) and N^2, on the other hand, have non-linear growth. N^2, for example, means that if you double N the time taken quadruples.
It is all about ratios. If something on average takes 1 minute for 1 item will it on average take 2 minutes or 4 minutes for 2 items? O(N) will be 2 minutes, O(N^2) will take 4 minutes. If the original took 1 second then O(N) will take 2 seconds, O(N^2) 4 seconds.
The algorithm that takes 1 minute and the algorithm that takes 1 second are both O(N)!

The other answers make the main points required for the answer, but there's a detail I'd like to add: if it's not explicitly stated, complexity statements typically describe the worst-case behaviour rather than the average, simply because the worst case is frequently much easier to analyse than the average case. So if I say, without any additional qualification, that search in a linked list is O(N), I'm admittedly being sloppy, but I mean that in the worst case, searching a linked list takes a number of steps linear in N.

The O means something. The average number of elements that need to be traversed to find something in a linked list is N/2, and N/2 = O(N).
Note that saying search in a linked list takes n/2 operations on average is itself imprecise, because "operation" is not defined. I could argue that for each node you need to read its value, compare it to what you are searching for, and then read the pointer to the next node, and thus that the algorithm performs 3N/2 operations on average. Using O notation allows us to ignore such insignificant details.
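To see this concretely, here is a minimal Python sketch (purely illustrative, not code from any of the posts): a linked-list search that counts comparisons. For N = 100 the average lands near 50, but doubling N doubles that average too, which is exactly the linear growth O(N) describes.

import random

class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def build_list(values):
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head

def search(head, target):
    """Return the number of comparisons made while searching for target."""
    comparisons = 0
    node = head
    while node is not None:
        comparisons += 1
        if node.value == target:
            return comparisons
        node = node.next
    return comparisons  # target not found: all N nodes were compared

N = 100
values = list(range(N))
head = build_list(values)

# Searching for a uniformly random element touches about N/2 nodes on average.
avg = sum(search(head, random.choice(values)) for _ in range(10_000)) / 10_000
print(f"average comparisons for N={N}: {avg:.1f}")  # roughly 50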

Related

Is anything less than n considered to be log n?

Consider two solutions to a problem.
One executes n/2 times, i.e. if n = 100 then it executes 50 times.
The other executes sqrt(n) times, i.e. if n = 100 then it executes 10 times.
Can both solutions be called O(log N)?
If so, then there is a huge difference between sqrt(N) and N/2.
If we can't say O(log N), can we say they are O(N)?
But the problem is the difference in growth rate between these two. In the usual chart of complexity classes, which class does each of these solutions fall under?
Please help me with this.
Consider the three cases.
Executes n/2 times. That means each time we increase n by a factor of 100, the execution time increases by a factor of 100.
Executes sqrt(n) times. That means each time we increase n by a factor of 100, the execution time increases by a factor of 10.
Executes log(n) times. That means each time we increase n by a factor of 100, the execution time increases by a constant amount.
No, these three things aren't even close to the same. The first is much worse than the second. The third is much better than the second.
Neither of them is O(log n).
Here is an example of an O(log n) algorithm: binary search.
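A short sketch of it in Python (my own illustration): each step halves the remaining range, so the number of steps is about log2(n).

def binary_search(sorted_list, target):
    """Return the index of target in sorted_list, or -1 if absent. O(log n)."""
    lo, hi = 0, len(sorted_list) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_list[mid] == target:
            return mid
        elif sorted_list[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

print(binary_search(list(range(100)), 42))  # 42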
The best algorithm is the best algorithm for the data that you have. If you don't know what data you have, consider a massively large amount of data, say n = 1 billion. Would you rather do roughly sqrt(n) ≈ 31,623 operations or n/2 = 500,000,000 operations? Graph the comparison and find where your data size sits.
If your dataset were n = 4, the two algorithms are identical (sqrt(4) = 4/2 = 2). If you get into the details, the sqrt(n) algorithm may actually take longer because of the operations it performs.
You can have O(1), which is the fastest. One such example is a lookup in a hash map, but your memory usage may suffer, so you should consider space constraints as well as time constraints.
You are also misunderstanding and overanalyzing complexity classification. O(n) algorithms are not algorithms that execute with exactly n operations; any constant multiplier does not affect the order of the classification. What is important is the growth of the number of operations as the problem grows. Consider two search algorithms.
A) Scan a sorted list sequentially from index 0 to (n-1) to find the number.
B) Scan a sorted list from index 0 to (n-1), skipping by 2, and backtracking if necessary.
Clearly A takes at most n operations, and B takes about n/2 + 1 operations. Yet they are both O(n). You can say algorithm B is faster, but I might run it on my machine, which is twice as fast. So complexity is a general class; one isn't supposed to be overly finicky about the details of the operations.
If you were trying to develop a better algorithm, it would be much more useful to search for one with a better complexity class, than one with slightly fewer operations.
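In code, the two scans might look roughly like this (a sketch under my own reading of the descriptions above; the exact backtracking details are an assumption):

def search_a(sorted_list, target):
    """Algorithm A: plain sequential scan. At most n comparisons -> O(n)."""
    for i, value in enumerate(sorted_list):
        if value == target:
            return i
        if value > target:      # list is sorted, so we can stop early
            return -1
    return -1

def search_b(sorted_list, target):
    """Algorithm B: check every second element, backtracking one step if we
    overshoot. Roughly n/2 + 1 comparisons -> still O(n)."""
    n = len(sorted_list)
    for i in range(0, n, 2):
        if sorted_list[i] == target:
            return i
        if sorted_list[i] > target:
            # overshot: the target can only be at the skipped position i - 1
            if i > 0 and sorted_list[i - 1] == target:
                return i - 1
            return -1
    # reached the end on even indices; the last odd index may remain unchecked
    if n % 2 == 0 and sorted_list[n - 1] == target:
        return n - 1
    return -1

data = [1, 3, 4, 7, 9, 12, 15]
print(search_a(data, 7), search_b(data, 7))   # 3 3
print(search_a(data, 8), search_b(data, 8))   # -1 -1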

Are highest order of growth functions the slowest?

Do the higher-order growth functions take the longest time? So does an O(n^2) algorithm take less time than an O(2^n) one?
Do the highest-order growth functions take the longest to actually run when N is a large number?
The idea of Big O Notation is to express the worst case scenario of algorithm complexity. For instance, a function using a loop may be described as O(n) even if it contains several O(1) statements, since it may have to run the entire loop over n items.
n is the number of inputs to the algorithm you are measuring. Big-O in terms of n tells you how that algorithm will perform as the number of inputs gets increasingly large. This may mean that it will take more time to execute, or that something will take more space to store. If you notice your algorithm has a high Big-O (like O(2^n) or O(n!)), you should consider an alternate implementation that scales better (assuming n will ever get large -- if n is always small it doesn't matter). The key take-away here is that Big-O is useful for showing you which of two algorithms will scale better, or possibly just the input size for which one algorithm would become a serious bottleneck on performance.
Here is an example image comparing several polynomials which might give you an idea of their growth rates in terms of Big-O. The growth time is as n approaches infinity, which in graphical terms is how sharply the function curves upwards along the y-axis as x grows large.
In case it is not clear, the x-axis here is your n, and the y-axis the time taken. You can see from this how much more quickly, for instance, something O(n^2) would eat up time (or space, or whatever) than something O(n). If you graph more of them and zoom out, you will see the incredible difference in, say, O(2^n) and O(n^3).
Attempting a Concrete Example
Using your example of comparing two string arrays of size 20, let's say we do this (pseudocode since this is language agnostic):
for each needle in string_array_1:
    for each haystack in string_array_2:
        if needle == haystack:
            print 'Found!'
            break
This is O(n^2). In the worst case scenario, it has to run completely through the second loop (in case no match is found), which happens on every iteration of the first loop. For two arrays of size 20, this is 400 total iterations. If each array was increased by just one string to size 21, the total number of iterations in the worst case grows to 441! Obviously, this could get out of hand quickly. What if we had arrays with 1000 or more members? Note that it's not really correct to think of n as being 20 here, because the arrays could be of different sizes. n is an abstraction to help you see how bad things could get under more and more load. Even if string_array_1 was size 20 and string_array_2 was size 10 (or 30, or 5!), this is still O(n^2).
O-time is only meaningful relative to other O-times, but 2^n will grow faster than n^2.
Compare as n grows:
 N    n^2        2^n
 1      1          2
 2      4          4
 3      9          8
 4     16         16
 5     25         32
 6     36         64
...
10    100       1024
20    400    1048576
Relevant Link: Wolfram Alpha
You can think in reverse: suppose an algorithm takes time T(n) = t; how many more elements can I process in time 2t?
O(n^2) -> about 41% more elements, since (n + 0.41n)^2 ≈ 2n^2
O(2^n) -> a single element more, since 2^(n+1) = 2 * 2^n
Or in time 1,000,000 * t?
O(n^2) -> 1000 times more elements
O(2^n) -> only about 20 more elements
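A quick numerical check of those figures (my own sketch, just plugging in the algebra above):

import math

# If T(n) = n^2 equals t, then in time 2t we can handle m with m^2 = 2 * n^2,
# i.e. m = sqrt(2) * n, about 41% more elements.
n = 1000
m = math.sqrt(2) * n
print(f"O(n^2): {m / n - 1:.0%} more elements in time 2t")          # ~41%

# If T(n) = 2^n equals t, time 1,000,000 * t buys only log2(1,000,000) extra
# elements, because 2^(n+k) = 1,000,000 * 2^n means k = log2(1,000,000).
print(f"O(2^n): {math.log2(1_000_000):.1f} more elements in time 1,000,000*t")  # ~19.9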

What does Wikipedia mean when it says the complexity of inserting an item at the end of a dynamic array is O(1) amortized?

http://en.wikipedia.org/wiki/Dynamic_array#Performance
What exactly does it mean?
I thought inserting at the end would be O(n), as you'd have to allocate say, twice the space of the original array, and then move all the items to that location and finally insert the item. How is this O(1)?
Amortized O(1) efficiency means that the sum of the runtimes of n insertions will be O(n), even if any individual operation may take a lot longer.
You are absolutely correct that appending an element can take O(n) time because of the work required to copy everything over. However, because the array is doubled each time it is expanded, expensive doubling steps happen exponentially less and less frequently. As a result, the total work done in n inserts comes out to be O(n) rather than O(n^2).
To elaborate: suppose you want to insert a total of n elements. The total amount of work done copying elements when resizing the vector will be at most
1 + 2 + 4 + 8 + ... + n ≤ 2n - 1
This is because first you copy one element, then twice that, then twice that, etc., and in the absolute worst case copy over all n elements. The sum of this geometric series works out to 2n - 1, so at most O(n) elements get moved across all copy steps. Since you do n inserts and only O(n) total work copying across all of them, the amortized efficiency is O(1) per operation. This doesn't say each operation takes O(1) time, but that n operations takes O(n) time total.
For a graphical intuition behind this, as well as a rationale for doubling the array versus just increasing it by a small amount, you might want to check out these lecture slides. The pictures toward the end might be very relevant.
Hope this helps!
Each reallocation in isolation is O(N), yes. But then on the next N insertions, you don't need to do anything. So the "average" cost per insertion is O(1). We say that "the cost is amortized across multiple operations".
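To make the doubling argument concrete, here is a toy dynamic array in Python (a sketch of the idea only, not how any real vector/list is implemented) that counts how many elements get copied during resizes; for n appends the total stays below 2n, i.e. O(1) amortized per append.

class DynamicArray:
    """Toy growable array backed by a fixed-size buffer that doubles when full."""

    def __init__(self):
        self._capacity = 1
        self._size = 0
        self._buffer = [None] * self._capacity
        self.copies = 0              # total elements moved during resizes

    def append(self, value):
        if self._size == self._capacity:
            self._resize(2 * self._capacity)
        self._buffer[self._size] = value
        self._size += 1

    def _resize(self, new_capacity):
        new_buffer = [None] * new_capacity
        for i in range(self._size):  # copy every existing element: O(current size)
            new_buffer[i] = self._buffer[i]
            self.copies += 1
        self._buffer = new_buffer
        self._capacity = new_capacity

n = 1_000_000
arr = DynamicArray()
for i in range(n):
    arr.append(i)

# Total copies stay below 2n, so the amortized cost per append is O(1).
print(arr.copies, "<", 2 * n)   # 1048575 < 2000000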

Trying to prove/disprove complexity analysis of an algorithm

I am not looking for an algorithm to the above question. I just want someone to comment on my answer.
I was asked the following question in an interview:
How to get top 100 numbers out of a large set of numbers (can't fit in
memory)
And this is what I said:
Divide the numbers into batches of 1000 each. Sort each batch in "O(1)" time; the total time taken is O(n) up to this point. Now take the first 100 numbers from the 1st and 2nd batches (in O(1)). Take the first 100 from those and the 3rd batch, and so on. This takes O(n) in total - so it is an O(n) algorithm.
The interviewer replied that sorting a batch of 1000 numbers won't take O(1) time, and neither will picking the first 100 out of a batch. After a lot of discussion he said he doesn't have a problem with the algorithm taking O(n) time; he just has a problem with me saying that sorting a batch takes O(1) time.
My explanation was that 1000 doesn't depend on the input (n). Irrespective of what n is, I'll always make batches of 1000 numbers, and if you have to calculate it, sorting one batch takes O(1000 * log(1000)) time, which is essentially O(1).
If you have to make proper calculations, it would be:
1000 * log(1000) to sort one batch
there are n/1000 such batches to sort
total: 1000 * log(1000) * (n/1000) = O(n * log(1000)) = O(n) time
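In code, this batching scheme might look roughly like the following Python sketch (the helper name and the way batches are read are just for illustration):

from itertools import islice

def top_100_by_batches(numbers, batch_size=1000, k=100):
    """Sort each fixed-size batch (constant work per batch, independent of n)
    and keep only the running top k while streaming through the input."""
    numbers = iter(numbers)
    best = []                                 # current top-k candidates
    while True:
        batch = list(islice(numbers, batch_size))
        if not batch:
            break
        batch.sort(reverse=True)              # O(batch_size * log(batch_size))
        best = sorted(best + batch[:k], reverse=True)[:k]
    return best

print(top_100_by_batches(range(1_000_000))[:5])  # [999999, 999998, 999997, 999996, 999995]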
I also asked a lot of my friends about this, and they only partially agreed with me.
So I want to know if my reasoning is 100% accurate (please criticize even if it is 99% correct).
Just remember, this post is not asking for the answer to the above posted question. I have already found a better answer at Retrieving the top 100 numbers from one hundred million of numbers
The interviewer is wrong, but it's useful to consider why. What you're saying is correct, but there is an unstated assumption that you depend on. Possibly, the interviewer is making a different assumption.
If we say that sorting 1000 numbers is O(1), we're being a bit informal. Specifically, what we mean is that, in the limit as N goes to infinity, there is a constant greater than or equal to the cost of sorting the 1000 numbers. Since the cost of sorting the fixed-size set is independent of N, the limit isn't going to depend on N, either. Thus, it's O(1) as N goes to infinity.
A generous interpretation is that the interviewer wanted you to treat the sorting step differently. You could be more precise and say that it was O(M*log(M)) as M goes to infinity (or M goes to N, if you prefer), with M representing the size of the batches of numbers. That would make an overall O(N*log(M)) for your approach, as N and M both approach infinity. Of course, that wasn't the limit you described.
Strictly speaking, it's meaningless to say that something is O(1) without specifying the limit. One usually doesn't need to bother for algorithms, because it's clear from the context: the limit commonly taken is as a single parameter approaches infinity. Your description is correct when considering only N, but you could consider more than just N.
It is indeed O(n) - but the constants are very high, especially considering that you will need to read each element from the filesystem twice [once in the sort, and once in the second phase], and filesystem access is much slower than memory access. Since this will probably be the bottleneck of the algorithm, your solution will likely run about twice as slowly as one using a priority queue.
Note that for a constant top 100, even the naive solution is O(n):
for each i in range(1, 100):
    x <- find highest element
    remove x from the list
    append x to the solution
This solution is also O(n): you have 100 iterations, and each iteration needs 2 traversals of the list [with some optimisations, 1 traversal per iteration is enough]. So the total number of traversals is bounded by a constant (at most 200), and there are no other factors that depend on the size; thus the solution is O(n) - but it is definitely a terrible solution.
I think the interviewer meant that your solution - though O(n) - has very large constants.
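For reference, the priority-queue alternative mentioned above can be sketched in a few lines with Python's heapq module (illustrative only; the standard library's heapq.nlargest does essentially the same thing):

import heapq

def top_100_with_heap(numbers, k=100):
    """Keep a min-heap of the k largest values seen so far: one pass, O(n log k)."""
    heap = []
    for x in numbers:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:                 # x beats the smallest of the current top k
            heapq.heapreplace(heap, x)
    return sorted(heap, reverse=True)

print(top_100_with_heap(range(1_000_000))[:5])  # [999999, 999998, 999997, 999996, 999995]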

what does O(N) mean [duplicate]

Possible Duplicate:
What is Big O notation? Do you use it?
Hi all,
fairly basic scalability notation question.
I recently received a comment on a post about my Python ordered-list implementation:
"but beware that your 'ordered set' implementation is O(N) for insertions"
Which is great to know, but I'm not sure what this means.
I've seen notation such as n(o) o(N), N(o-1) or N(o*o)
what does the above notation refer to?
The comment was referring to the Big-O Notation.
Briefly:
O(1) means constant time - independent of the number of items.
O(N) means time in proportion to the number of items.
O(log N) means time proportional to log(N).
Basically any 'O' notation means an operation will take time up to a maximum of k*f(N)
where:
k is a constant multiplier
f() is a function that depends on N
O(n) is Big O Notation and refers to the complexity of a given algorithm. n refers to the size of the input, in your case it's the number of items in your list.
O(n) means that your algorithm will take on the order of n operations to insert an item. e.g. looping through the list once (or a constant number of times such as twice or only looping through half).
O(1) means it takes a constant time, that it is not dependent on how many items are in the list.
O(n^2) means that for every insert, it takes n*n operations: i.e. 1 operation for 1 item, 4 operations for 2 items, 9 operations for 3 items. As you can see, O(n^2) algorithms become inefficient for handling large numbers of items.
For lists, O(n) is not bad for insertion, but not the quickest. Also note that O(n/2) is considered the same as O(n), because they both grow at the same rate with n.
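As a concrete illustration (my own sketch, not the poster's actual ordered-set code): inserting into a sorted Python list is O(n) even though finding the position can be done in O(log n), because the elements after the insertion point still have to shift.

import bisect

def ordered_insert(sorted_list, value):
    """Insert value keeping the list sorted.

    bisect_left finds the position in O(log n) comparisons, but list.insert
    still has to shift every later element, so the overall cost is O(n).
    """
    position = bisect.bisect_left(sorted_list, value)
    sorted_list.insert(position, value)

items = [1, 3, 7, 9]
ordered_insert(items, 4)
print(items)  # [1, 3, 4, 7, 9]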
It's called Big O Notation: http://en.wikipedia.org/wiki/Big_O_notation
So saying that insertion is O(n) means that you have to walk through the whole list (or half of it -- big O notation ignores constant factors) to perform the insertion.
This looks like a nice introduction: http://rob-bell.net/2009/06/a-beginners-guide-to-big-o-notation/
Specifically, O(n) means that if there are 2x as many items in the list, it'll take no more than twice as long; if there are 50 times as many, it'll take no more than 50 times as long. See the wikipedia article dreeves pointed out for more details.
Edit (the "no more than" above): It was pointed out that Big-O does represent the upper bound, so if there are twice as many elements in the list, insertion will take at most twice as long, and if there are 50 times as many elements, it will take at most 50 times as long.
If it was additionally Ω(n) (Big Omega of n) then it would take at least twice as long for a list that is twice as big. If your implementation is both O(n) and Ω(n), meaning that it'll take both at least and at most twice as long for a list twice as big, then it can be said to be Θ(n) (Big Theta of n), meaning that it'll take exactly twice as long if there are twice as many elements.
According to Wikipedia (and personal experience, being guilty of it myself) Big-O is often used where Big-Theta is what is meant. It would be technically correct to call your function O(n^n^n^n) because all Big-O says is that your function is no slower than that, but no one would actually say that other than to prove a point because it's not very useful and misleading information, despite it being technically accurate.
It refers to how complex your program is, i.e., how many operations it takes to actually solve a problem. O(n) means that an operation takes as many steps as there are items in your list, which, for insertion, is very slow. Likewise, O(n^2) means that an operation takes "n" squared steps to accomplish, and so on... The "O" is for Order of Magnitude, and the expression in the parentheses is always related to the number of items being manipulated in the procedure.
Short answer: It means that the processing time is in linear relation to the size of input. E.g if the size of input (length of list) triples, the processing time (roughly) triples. And if it increases thousandfold, the processing time also increases in the same magnitude.
Long answer: See the links provided by Ian P and dreeves
This may help:
http://en.wikipedia.org/wiki/Big_O_notation#Orders_of_common_functions
O(n): Finding an item in an unsorted list or a malformed tree (worst case); adding two n-digit numbers
Good luck!
Wikipedia explains it far better than I can; in short, it means that if your list size is N, it takes at most N loops/iterations to insert an item. (In effect, you have to iterate over the whole list.)
If you want a better understanding, there is a free book from Berkeley that goes more in-depth about the notation.
