Unable to deduce a table about algorithms' effectiveness

I am not completely sure about the following table
(Table image: http://files.getdropbox.com/u/175564/algTranslation.png)
The table gives the size of problem that can be solved within the time limit shown in the left-hand column, for an algorithm of the given complexity.
I am interested in how the table was deduced.
The table suggests to me that:
O(n) = 10M in a second (this seems to be the power of current computers)
n is the number of items to process # Thanks to Guffa!
I am not sure how the values in the O(n * log(n)) column have been deduced.
How can you deduce the value 0.5M for O(n * log(n)), or 3000 for O(n^2)?

No, n is not the number of seconds, it's the number of items to process.
O(n) means that the time to process the items is linear to the number of items.
O(n²) means that the time to process the items is relative to the square of the number of items. If you double the number of items, the processing time will be four times longer.
See: Big O notation
The table assumes that there is a fixed amount of work per item. Big O notation only specifies how an algorithm reacts to a change in the number of items; it doesn't tell you anything about how much work there is per item.
Edit:
The values along the x axis of the table are just approximations based on the assumption that the work per item is the same. For example, the value 3000 for O(n²) is rounded from the square root of 10 million, which is ~3162.28. The cube root of 10 million is not 200, it's ~215.44.
In a real situation, two algorithms rarely do the same amount of work per item. An algorithm with O(log n) typically does more work per item than an O(n) algorithm for the same purpose, but it's still preferable in most situations because it scales a lot better.
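As a quick check of those roundings, here is a minimal Python sketch; the 10-million-operations-per-second budget is the assumption behind the table, not something measured:

    budget = 10_000_000        # assumed: ~10M operations per second (the O(n) column)
    print(budget ** 0.5)       # ~3162.28 -> rounded to 3000 in the O(n^2) column
    print(budget ** (1 / 3))   # ~215.44  -> listed as roughly 200 in the O(n^3) column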

I think that this table simply gives a very approximate illustration of how big n can be for different kinds of complexity when you have a fixed amount of time (1 second, 1 minute, 1 hour, 1 day or 1 year) at your disposal.
For example O(n^3):
1 second: 200^3 = 8 000 000 (roughly 10 million, given in the O(n) column)
1 minute: 850^3 = 614 125 000 (roughly 600 million, given in the O(n) column)
1 hour: 3000^3 = 27 000 000 000 (somewhat roughly 35 billion, given in the O(n) column)
As you can see, the numbers are very rough approximations. It seems that the author wanted to use nice round numbers to illustrate his point.

If you can do 10,000,000 ops per second, then when you set n = 500,000 and calculate n * log(n) = 500,000 * log2(500,000) ≈ 500,000 * 18 = 9,000,000 ops, which is roughly 10,000,000 for the purposes of the "seconds" classification.
Similarly, with n = 3,000 you get n^2 = 9,000,000. So on every line the number of operations is roughly the same.
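A minimal Python sketch of that reasoning: given an assumed budget of operations, find the largest n a given complexity allows. The helper name largest_n and the exact budget are my own illustration, not taken from the table:

    import math

    OPS_PER_SECOND = 10_000_000  # assumed budget: ~10M operations per second

    def largest_n(cost, budget=OPS_PER_SECOND):
        """Largest n with cost(n) <= budget, found by doubling then binary search."""
        lo, hi = 1, 1
        while cost(hi) <= budget:
            hi *= 2
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if cost(mid) <= budget:
                lo = mid
            else:
                hi = mid - 1
        return lo

    print(largest_n(lambda n: n))                 # 10,000,000 for O(n)
    print(largest_n(lambda n: n * math.log2(n)))  # ~500,000 for O(n log n)
    print(largest_n(lambda n: n ** 2))            # ~3,162 for O(n^2)
    print(largest_n(lambda n: n ** 3))            # ~215 for O(n^3)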

Related

Big O Notation O(n^2) what does it mean?

For example, it says that in 1 sec 3000 numbers are sorted with selection sort. How can we predict how many numbers are going to be sorted in 10 sec?
I checked that selection sort needs O(n^2), but I don't understand how I am going to calculate how many numbers will be sorted in 10 sec.
We cannot use big O to reliably extrapolate actual running times or input sizes (whichever is the unknown).
Imagine the same code running on two machines A and B, different parsers, compilers, hardware, operating system, array implementation, ...etc.
Let's say they both can parse and run the following code:
procedure sort(reference A)
    declare i, j, n, x
    i ← 1
    n ← length(A)
    while i < n
        x ← A[i]
        j ← i - 1
        while j >= 0 and A[j] > x
            A[j+1] ← A[j]
            j ← j - 1
        end while
        A[j+1] ← x
        i ← i + 1
    end while
end procedure
Now system A spends 0.40 seconds on the initialisation part before the loop starts, independent of what A is, because on that configuration the initiation of the function's execution context, including the allocation of the variables, is a very, very expensive operation. It also needs to spend 0.40 seconds on the de-allocation of the declared variables and the call stack frame when it arrives at the end of the procedure, again because on that configuration the memory management is very expensive. Furthermore, the length function is costly as well, and takes 0.19 seconds. That's a total overhead of 0.99 seconds.
On system B this memory allocation and de-allocation is cheap and takes 1 microsecond. Also the length function is fast and needs 1 microsecond. That's a total overhead of 2 microseconds.
System A is however much faster on the rest of the statements in comparison with system B.
Both implementations happen to need 1 second to sort an array A having 3000 values.
If we now take the reasoning that we could predict the array size that can be sorted in 10 seconds based on the results for 1 second, we would say:
𝑛 = 3000, and the duration is 1 second which corresponds to 𝑛² = 9 000 000 operations. So if 9 000 000 operations corresponds to 1 second, then 90 000 000 operations correspond to 10 seconds, and 𝑛 = √(𝑛²) ~= 9 487 (the size of the array that can be sorted in 10 seconds).
However, if we follow the same reasoning, we can look at the time needed for completing the outer loop only (without the initialisation overhead), which also is O(𝑛²) and thus the same reasoning can be followed:
𝑛 = 3000, and the duration of the loop in system A is 0.01 second, which corresponds to 𝑛² = 9 000 000 operations. So if 9 000 000 operations can be executed in 0.01 second, then in 10 - 0.99 = 9.01 seconds (the overhead is subtracted) we can execute 9.01 / 0.01 = 901 times as many operations, i.e. 𝑛² = 8 109 000 000 operations, and now 𝑛 = √(𝑛²) ~= 90 050.
The problem is that using the same reasoning on big O, the predicted outcomes differ by a factor of about 10!
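For concreteness, here are the two extrapolations side by side as a small Python sketch, using the hypothetical numbers from the example above:

    import math

    n, t_total, overhead = 3000, 1.0, 0.99     # system A: 1 s in total, 0.99 s fixed overhead

    # Naive model: the whole second scales as n^2.
    ops_per_sec = n ** 2 / t_total
    n_naive = math.isqrt(int(ops_per_sec * 10))

    # Overhead-aware model: only the loop time (0.01 s) scales as n^2.
    loop_ops_per_sec = n ** 2 / (t_total - overhead)
    n_aware = math.isqrt(int(loop_ops_per_sec * (10 - overhead)))

    print(n_naive, n_aware)   # ~9486 vs ~90050: the predictions differ by a factor of about 10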
We may be tempted to think that this is now only a "problem" of constant overhead, but similar things can be said about operations in the outer loop. For instance it might be that x ← A[i] has a relatively high cost for some reason on some system. These are factors that are not revealed in the big O notation, which only retains the most significant factor, omitting linear and constant factors that play a role.
The actual running time for an actual input size is dependent on a more complex function that is likely close to polynomial, like 𝑛² + 𝑎𝑛 + 𝑏. These coefficients 𝑎, and 𝑏 would be needed to make a more reasonable prediction possible. There might even be function components that are non-polynomial, like 𝑛² + 𝑎𝑛 + 𝑏 + 𝑐√𝑛... This may seem unlikely, but systems on which the code runs may do all kinds of optimisations while code runs which may have such or similar effect on actual running time.
The conclusion is that this type of reasoning gives no guarantee that the prediction is anywhere near the reality -- without more information about the actual code, system on which it runs,... etc, it is nothing more than a guess. Big O is a measure for asymptotic behaviour.
As the comments say, big-oh notation has nothing to do with specific time measurements; however, the question still makes sense, because the big-oh notation is perfectly usable as a relative factor in time calculations.
Big-oh notation gives us an indication of how the number of elementary operations performed by an algorithm varies as the number of items to process varies.
Simple algorithms perform a fixed number of operations per item, but in more complicated algorithms the number of operations that need to be performed per item varies as the number of items varies. Sorting algorithms are a typical example of such complicated algorithms.
The great thing about big-oh notation is that it belongs to the realm of science, rather than technology, because it is completely independent of your hardware, and of the speed at which your hardware is capable of performing a single operation.
However, the question tells us exactly how much time it took for some hypothetical hardware to process a certain number of items, so we have an idea of how much time that hardware takes to perform a single operation, so we can reason based on this.
If 3000 numbers are sorted in 1 second, and the algorithm operates with O( N ^ 2 ), this means that the algorithm performed 3000 ^ 2 = 9,000,000 operations within that second.
If given 10 seconds to work, the algorithm will perform ten times that many operations within that time, which is 90,000,000 operations.
Since the algorithm works in O( N ^ 2 ) time, this means that after 90,000,000 operations it will have sorted Sqrt( 90,000,000 ) = 9,486 numbers.
To verify: 9,000,000 operations within a second means 1.11e-7 seconds per operation. Since the algorithm works at O( N ^ 2 ), this means that to process 9,486 numbers it will require 9,486 ^ 2 operations, which is roughly equal to 90,000,000 operations. At 1.11e-7 seconds per operation, 90,000,000 operations will be done in roughly 10 seconds, so we are arriving at the same result via a different avenue.
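The same arithmetic as a small Python sketch; the 3000-numbers-in-1-second figure is the question's hypothetical, not a benchmark:

    import math

    n1, t1 = 3000, 1.0                 # 3000 numbers sorted in 1 second
    ops1 = n1 ** 2                     # 9,000,000 operations under the O(n^2) model
    sec_per_op = t1 / ops1             # ~1.11e-7 seconds per operation

    ops10 = ops1 * 10                  # operations that fit into 10 seconds
    n10 = math.isqrt(ops10)            # ~9,486 numbers sortable in 10 seconds

    # Cross-check via the other avenue: time needed to sort n10 numbers.
    print(n10, n10 ** 2 * sec_per_op)  # 9486, roughly 10 seconds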
If you are seriously pursuing computer science or programming I would recommend reading up on big-oh notation, because it is a) very important and b) a very big subject which cannot be covered in stackoverflow questions and answers.

Finding Big O Notation Clarification without function

So this example was on this site and it was pretty clear, but what if you have this instead:
N        average sec
1,000    2.7
2,000    3.04
4,000    3.6
8,000    3.7
16,000   4
N doubles every time (2*N) and the average time starts to level off. I can guess from looking at the examples below (O(logN)), but can someone clarify how you would calculate the problem?
O(1): known as Constant complexity
1 item: 1 second
10 items: 1 second
100 items: 1 second
The number of items is still increasing by a factor of 10, but the scaling factor of O(1) is always 1.
O(log n): known as Logarithmic complexity
1 item: 1 second
10 items: 2 seconds
100 items: 3 seconds
1000 items: 4 seconds
You'd do a regression analysis based on a log curve fit. You can start by plotting your data to get a visual confirmation.
A log fit in Wolfram Alpha, for example, produces a curve that follows these data points closely (plot not reproduced here).
This suggests that you're right and the growth seems to be logarithmic (for the provided data).
However, be aware that time measurements are not equal to an actual complexity analysis which is a formal proof rather than a curve fit to empirical data (which can be distorted for a number of reasons).
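As a minimal sketch of such a fit (using NumPy rather than Wolfram Alpha, with the timings from the question):

    import numpy as np

    N = np.array([1000, 2000, 4000, 8000, 16000], dtype=float)
    t = np.array([2.7, 3.04, 3.6, 3.7, 4.0])

    # Least-squares fit of t ≈ a*ln(N) + b; a good fit supports the O(log N) guess.
    a, b = np.polyfit(np.log(N), t, 1)
    print(f"t ≈ {a:.3f} * ln(N) + {b:.3f}")
    print("residuals:", np.round(t - (a * np.log(N) + b), 3))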

Finding the constant c in the time complexity of certain algorithms

I need help finding and approximating the constant c in the complexity of insertion sort (cn^2) and merge sort (cnlgn) by inspecting the results of their running times.
A bit of background, my purpose was to "implement insertion sort and merge sort (decreasing order) algorithms and measure the performance of these two algorithms. For each algorithm, and for each n = 100, 200, 300, 400, 500, 1000, 2000, 4000, measure its running time when the input is
already sorted, i.e. n, n-1, …, 3, 2,1;
reversely sorted 1, 2, 3, … n;
random permutation of 1, 2, …, n.
The running time should exclude the time for initialization."
I have done the code for both algorithms and put the measurements (microseconds) in a spreadsheet. Now, I'm not sure how to find this c due to differing values for each condition of each algorithm.
For reference, the time table:
n      InsertionSort             MergeSort
       AS     RS      Random     AS      RS      Random
100    12     419     231        192     191     211
200    13     2559    1398       1303    1299    1263
300    20     236     94         113     113     123
400    25     436     293        536     641     556
500    32     504     246        91      81      105
1000   65     1991    995        169     246     214
2000   9      8186    4003       361     370     454
4000   17     31777   15797      774     751     952

(AS = already sorted, RS = reversely sorted; times in microseconds)
I can provide the code if necessary.
It's hardly possible to determine the values of these constants, especially for modern processors that use caches, pipelines, and other "performance things".
Of course, you can try to find an approximation, and then you'll need Excel or any other spreadsheet.
Enter your data, create a chart, and then add a trendline. The spreadsheet calculates the values of the constants for you.
The first thing to understand is that complexity and running time are not the same thing and may not have very much to do with each other.
Complexity is a theoretical measure that gives an idea of how an algorithm slows down on bigger inputs compared to smaller inputs, or compared to other algorithms.
The running time depends on the exact implementation, the computer it is running on, the other programs that run on the same computer and many other things. You will also notice that the running time slows down if the input is too big for your cache, and jumps again if it is also too big for your RAM. As you can see, for n = 200 you got some weird running times. This will not help you find the constants.
In cases where you don't have the code, you have no choice but to use the running times to approximate the complexity. Then you should use only big inputs (1000 should be the smallest input in your case). If your algorithm is deterministic, just input the worst case. Random cases can be good or bad, so you never learn anything about the real complexity from them. Another problem is that the complexity counts "operations", so evaluating an if-statement or incrementing a variable counts the same, while in running time an if needs more time than an increment.
So what you can do is plot your complexity and the values you measured and look for a factor that holds...
E.g. this is a plot of n² scaled by 1/500 together with the points from your chart (plot not reproduced here).
First some notes:
you have very small n
Algorithmic complexity starts corresponding to runtime only if n is big enough. For n=4000 that is ~4 KB of data, which can still fit into most CPU caches, so increasing n to at least 1,000,000 can and will change the relation between runtime and n considerably!
Runtime measurement
For random data you need an average runtime measurement, not a single one, so for any n do at least 5 measurements, each with a different dataset, and use the average time of them all.
Now how to obtain c
If a program has complexity O(n^2), it means that for big enough n the runtime is:
t(n) = c*n^2
So take a few measurements. I chose the last 3 from your insertion sort, reverse sorted, because that should match the worst-case O(n^2) complexity if I am not mistaken, so (times converted to milliseconds):
c*n^2 = t(n)
c*1000^2 = 1.991 ms
c*2000^2 = 8.186 ms
c*4000^2 = 31.777 ms
Solve the equations:
c = t(n)/(n^2)
c = 1.991 ms / 1,000,000 = 1.991 ns
c = 8.186 ms / 4,000,000 = 2.0465 ns
c = 31.777 ms / 16,000,000 = 1.9860625 ns
If everything is alright then the c for different n should be roughly the same. In your case it is around 2 ns per n^2 step, but as I mentioned above, with increasing n this will change due to CACHE usage. Also, if any dynamic container is used then you have to include the complexity of its usage in the algorithm, which can sometimes be significant!
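The same computation as a small Python sketch, using the reverse-sorted insertion sort timings from the table above (in microseconds):

    # n -> runtime in microseconds, taken from the InsertionSort "RS" column
    measurements = {1000: 1991, 2000: 8186, 4000: 31777}

    for n, t_us in measurements.items():
        c_ns = t_us * 1000 / n ** 2      # convert us -> ns, then divide by n^2
        print(f"n={n}: c ~ {c_ns:.4f} ns per n^2 step")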
Take the case of 4000 elements and divide the time by the respective complexity estimate, 4000² or 4000 Lg 4000.
This is not worse than any other method.
For safety, you should check anyway that the last values align on a relatively smooth curve, so that the value for 4000 is representative.
As others commented, this is rather poor methodology. You should also consider the standard deviation of the running times, or even better, the histogram of running times, and cover a larger range of sizes.
On the other hand, getting accurate values is not so important, since knowing the values of the constants does not help much when comparing the two algorithms.

Data structure for storing data and calculating average

In this problem, we are interested in a data structure that supports storing an unbounded number of vectors parallel to the Y axis.
Each node contains location (X axis value) and height (Y axis value). We can assume there are no two vectors in the same location.
Please advise for an efficient data structure that supports:
init((x1,y1),(x2,y2),(x3,y3),...,(xn,yn)) - the DS will contain all n vectors, where VECTOR#i's location is xi and VECTOR#i's height is yi.
We also know that x1 < x2 < x3 < ... < xn (nothing is known about the y) - complexity = O(n) on average
insert(x,y) - add a vector with location x and height y. - complexity = O(log n) amortized on average
update(x,y) - update vector#x's height to y. - complexity = O(log n) worst case
average_around(x) - return the average height of the log n neighbours of x - complexity = O(1) on average
Space Complexity: O(n)
I can't provide a full answer, but it might be a hint into the right direction.
Basic ideas:
Let's assume you've calculated the average of n numbers a_1,...,a_n, then this average is avg=(a_1+...+a_n)/n. If we now replace a_n by b, we can recalculate the new average as follows: avg'=(a_1+...+a_(n-1)+b)/n, or - simpler - avg'=((avg*n)-a_n+b)/n. That means, if we exchange one element, we can recompute the average using the original average value by simple, fast operations, and don't need to re-iterate over all elements participating in the average.
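A tiny Python check of that replacement formula (the numbers are arbitrary):

    a = [4, 8, 15, 16, 23]
    avg = sum(a) / len(a)                          # 13.2
    b = 42                                         # replace the last element a_n with b
    avg_new = (avg * len(a) - a[-1] + b) / len(a)  # recompute without re-summing
    print(avg_new, sum(a[:-1] + [b]) / len(a))     # both 17.0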
Note: I assume that you want to have log n neighbours on each side, i.e. in total we have 2 log(n) neighbours. You can simply adapt it if you want to have log(n) neighbours in total. Moreover, since log n in most cases won't be a natural number, I assume that you are talking about floor(log n), but I'll just write log n for simplicity.
The main thing I'm considering is the fact that you have to report the average around element x in O(1). Thus, I suppose you have to somehow precompute this average and store it. So, I would store the following in a node:
x value
y value
average around
Note that update(x,y) runs strictly in O(log n) if you have this structure: If you update element x to height y, you have to consider the 2log(n) neighbours whose average is affected by this change. You can recalculate each of these averages in O(1):
Let's assume, update(x,y) affects an element b, whose average is to be updated as well. Then, you simply multiply average(b) by the number of neighbours (2log(n) as stated above). Then, we subtract the old y-value of element x, and add the new (updated) y-value of x. After that, we divide by 2 log(n). This ensures that we now have the updated average for element b. This involved only some calculations and can thus be done in O(1). Since we have 2log n neighbours, update runs in O(2log n)=O(log n).
When you insert a new element e, you have to update the average of all elements affected by this new element e. This is essentially done like in the update routine. However, you have to be careful when log n (or precisely floor(log n)) changes its value. If floor(log n) stays the same (which it will, in most cases), then you can just do the analogous steps described in update; however, you will have to "remove" the height of one element and "add" the height of the newly added element. In these "good" cases, the run time is again strictly O(log n).
Now, when floor(log n) is changing (incrementing by 1), you have to perform an update for all elements. That is, you have to do an O(1) operation for n elements, resulting in a running time of O(n). However, it is very seldom the case that floor(log n) increments by 1 (you need to double the value of n to increment floor(log n) by 1 - assuming we are talking about log to base 2, which is not uncommon in computer science). We denote this time by c*n or simply cn.
Thus, let's consider a sequence of inserts: the first insert needs an update: 1*c; the second insert needs an update: 2*c. The next time an expensive insert occurs is the fourth insert: 4*c, then the eighth insert: 8*c, then the sixteenth insert: 16*c. The distance between two expensive inserts doubles each time:
insert #    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15    16   17   18  ..
cost       1c   2c    1   4c    1    1    1   8c    1    1    1    1    1    1    1   16c    1    1  ..
Since no remove is required, we can continue with our analysis without any "special cases" and consider only a sequence of inserts. You see that most inserts cost 1, while few are expensive (1,2,4,8,16,32,...). So, if we have m inserts in total, we have roughly log m expensive inserts, and roughly m-log m cheap inserts. For simplicity, we assume simply m cheap inserts.
Then, we can compute the cost for m inserts:
m*1 + Σ_{i=0}^{log m} 2^i
m*1 counts the cheap operations, the sum the expensive ones. It can be shown that the whole thing is at most 4m (in fact you can even show better estimates quite easily, but for us this suffices).
Thus, we see that m insert operations cost at most 4m in total. Thus, a single insert operation costs at most 4m/m=4, thus is O(1) amortized.
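A quick numeric check of that amortised bound in Python, modelling an expensive insert as costing i at insert number i = 1, 2, 4, 8, ... and everything else as costing 1 (this is my simplified model of the argument above, not code for the data structure itself):

    def total_cost(m):
        cost, next_expensive = 0, 1
        for i in range(1, m + 1):
            if i == next_expensive:   # floor(log n) grows: all stored averages get touched
                cost += i
                next_expensive *= 2
            else:
                cost += 1
        return cost

    for m in (10, 100, 10_000, 1_000_000):
        print(m, total_cost(m), 4 * m)   # the total cost stays below 4*m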
So, there are 2 things left:
How to store all the entries?
How to initialize the data structure in O(n)?
I suggest storing all entries in a skip-list, or some tree that guarantees logarithmic search-operations (otherwise, insert and update require more than O(log n) for finding the correct position). Note that the data structure must be buildable in O(n) - which should be no big problem assuming the elements are sorted according to their x-coordinate.
To initialize the data structure in O(n), I suggest beginning at the element at index log n and computing its average the simple way (sum up the 2 log n neighbours, divide by 2 log n).
Then you move the index one further and compute average(index) from average(index-1): average(index) = (average(index-1)*2log(n) - y(index-1-log(n)) + y(index+log(n))) / (2log(n)).
That is, we follow a similar approach as in update. This means that computing the averages costs O(log n + n*1)=O(n). Thus, we can compute the averages in O(n).
Note that you have to take some details into account which I haven't described here (e.g. border cases: element at index 1 does not have log(n) neighbours on both sides - how do you proceed with this?).
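Here is a rough Python sketch of the initialisation and update parts under the assumptions above (k = floor(log2 n) neighbours on each side; border elements just use whatever neighbours exist). The class and method names are my own illustration, not a complete answer to the exercise:

    import math
    from bisect import bisect_left

    class AverageAround:
        def __init__(self, points):            # points: non-empty [(x1, y1), ...] with x1 < x2 < ... < xn
            self.xs = [x for x, _ in points]
            self.ys = [y for _, y in points]
            n = len(self.ys)
            self.k = int(math.log2(n)) if n > 1 else 0
            self._init_averages()              # O(n) via a sliding window

        def _window(self, i):                  # index range of the neighbours of element i
            return max(0, i - self.k), min(len(self.ys), i + self.k + 1)

        def _init_averages(self):
            n = len(self.ys)
            self.avg = [0.0] * n
            lo, hi = self._window(0)
            s, cnt = sum(self.ys[lo:hi]), hi - lo
            self.avg[0] = s / cnt
            for i in range(1, n):              # slide the window one step to the right
                new_lo, new_hi = self._window(i)
                if new_lo > lo:
                    s -= self.ys[lo]; cnt -= 1
                if new_hi > hi:
                    s += self.ys[hi]; cnt += 1
                lo, hi = new_lo, new_hi
                self.avg[i] = s / cnt

        def average_around(self, x):           # O(log n) here; a hash map on x would give O(1)
            return self.avg[bisect_left(self.xs, x)]

        def update(self, x, y):                # O(log n): only O(log n) stored averages change
            i = bisect_left(self.xs, x)
            delta = y - self.ys[i]
            self.ys[i] = y
            for j in range(*self._window(i)):  # exactly the elements whose window contains i
                lo, hi = self._window(j)
                self.avg[j] += delta / (hi - lo)

The insert operation and the occasional full rebuild when floor(log n) grows are omitted here; they would follow the amortised scheme described above.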

Big O confusion

I'm testing out some functions I made and I am trying to figure out the time complexity.
My problem is that even after reading up on some articles on Big O I can't figure out what the following should be:
1000 loops : 15000 objects : time 6
1000 loops : 30000 objects : time 9
1000 loops : 60000 objects : time 15
1000 loops : 120000 objects : time 75
The difference between the first 2 is 3 ms, then 6 ms, and then 60, so the time doubles up with each iteration. I know this isn't O(n), and I think it's not O(log n).
When I try out different sets of data, the time doesn't always go up. For example take this sequence (ms): 11-17-26-90-78-173-300
The 78 ms looks out of place. Is this even possible?
Edit:
NVM, I'll just have to talk this over with my college tutor.
The output of time differs too much with different variables.
Thanks for those who tried to help though!
Big O notation is not about how long it takes exactly for an operation to complete. It is a (very rough) estimation of how various algorithms compare asymptotically with respect to changing input sizes, expressed in generic "steps". That is "how many steps does my algorithm do for an input of N elements?".
Having said this, note that in the Big O notation constants are ignored. Therefore a loop over N elements doing 100 calculations at each iteration would be 100 * N but still equal to O(N). Similarly, a loop doing 10000 calculations would still be O(N).
Hence in your example, if you have something like:
for(int i = 0; i < 1000; i++)
    for(int j = 0; j < N; j++)
        // computations
it would be 1000 * N = O(N).
Big O is just a simplified algorithm running time estimation, which basically says that if an algorithm has running time O(N) and another one has O(N^2) then the first one will eventually be faster than the second one for some value of N. This estimation of course does not take into account anything related to the underlying platform like CPU speed, caching, I/O bottlenecks, etc.
Assuming you can't get the big O from theory alone, I think you need to look over more orders of magnitude in n -- at least three, preferably six or more (you will just need to experiment to see what variation in n is required). Leave the thing running overnight if you have to. Then plot the results logarithmically.
Basically I suspect you are looking at noise right now.
Without seeing your actual algorithm, I can only guess:
If you allow a constant initialisation overhead of 3ms, you end up with
1000x15,000 = (OH:3) + 3
1000x30,000 = (OH:3) + 6
1000x60,000 = (OH:3) + 12
This, to me, appears to be O(n)
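A quick way to sanity-check that model against the measured times (the 3 ms overhead and the per-object rate are this answer's assumption, not measured facts):

    overhead_ms = 3
    rate_ms_per_object = 3 / 15_000      # 3 ms of "real work" for 15,000 objects

    for objects, measured_ms in [(15_000, 6), (30_000, 9), (60_000, 15), (120_000, 75)]:
        predicted_ms = overhead_ms + rate_ms_per_object * objects
        print(f"{objects:>7} objects: predicted {predicted_ms:4.0f} ms, measured {measured_ms} ms")

The first three rows match; the 120,000-object row clearly does not, which is the kind of disparity addressed next.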
The disparity in your timestamping of differing datasets could be due to any number of factors.
